demo_utils.generator module

This module houses the data generator and all of the different Datum object types.

This module is easily extended: create a new datum type by subclassing AbstractDatum and implementing its methods.

Configurations can be read through JSON strings or JSON files.

The configuration itself must be a list of JSON objects. Each JSON object must have at least the following two fields:

  • fieldName (str): A simple user-defined name for this field
  • type (str): A Datum which has been implemented in the data generator

Any other properties of the object are specific to the type of datum (see datum documentation below).
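Since a configuration can be given either as a path to a JSON file or as a JSON string, the generator has to tell the two apart. The sketch below shows one way to do that with the standard json and os modules. load_schema is a hypothetical helper, and the real DataGenerator may distinguish the two forms differently.

```python
import json
import os


def load_schema(schema):
    """Return the parsed schema list from a file path or a JSON string.

    Hypothetical helper: the real DataGenerator may decide between the
    two input forms differently.
    """
    if os.path.isfile(schema):
        # Treat the argument as a path to a JSON file on disk.
        with open(schema) as fh:
            return json.load(fh)
    # Otherwise treat the argument as a JSON document held in a string.
    return json.loads(schema)


schema = load_schema('[{"fieldName": "field1", "type": "string", "values": ["a", "b"]}]')
print(schema[0]["fieldName"])  # field1
```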

Example

from demo_utils.generator import DataGenerator

rand_schema = '/path/to/schema.json'
# or
# import json
# js_schema = [
#               {
#                 "fieldName": "field1",
#                 "type": "string",
#                 "values": ["a", "b", "c", "d"]
#               }
#             ]
# rand_schema = json.dumps(js_schema)


gen = DataGenerator(rand_schema)
# Generate a single 'row' of data
print(gen.generate())

class demo_utils.generator.AbstractDatum(field)

Bases: object

An abstract base class that defines the methods every Datum type must implement.

This class should NOT be instantiated.

check()

Abstract method. Subclasses must override it to validate their datum's properties.

Raises: NotImplementedError – Always raised when called on this class directly. This class should never be instantiated on its own.
check_for_key(key_name)

Ensure that the key is present in the datum object.

Parameters: key_name (str) – The key to check for in the datum object
Returns: True if the key is present. Otherwise, an error is raised.
Return type: bool
Raises: KeyError – If the key is not present.
generate(rand)

Abstract method. Subclasses must override it to return one randomly generated value.

Raises: NotImplementedError – Always raised when called on this class directly. This class should never be instantiated on its own.
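As a rough illustration of extending the module, the sketch below subclasses a stand-in AbstractDatum that mirrors the documented interface (check(), check_for_key(), generate(rand)). The stand-in base class, the ConstantDatum type, its 'value' property, and the assumption that rand is a random.Random instance are all hypothetical. How the real generator maps a 'type' string to a new class is not documented here and is not shown.

```python
import random


class AbstractDatum(object):
    """Hypothetical stand-in mirroring the documented interface, not the real source."""

    def __init__(self, field):
        self.field = field  # the datum's JSON object, as a dict

    def check_for_key(self, key_name):
        # Documented behavior: return True, or raise KeyError if missing.
        if key_name not in self.field:
            raise KeyError(key_name)
        return True

    def check(self):
        raise NotImplementedError()

    def generate(self, rand):
        raise NotImplementedError()


class ConstantDatum(AbstractDatum):
    """Hypothetical datum type that always returns its configured 'value'."""

    def check(self):
        # Validate the type-specific property this datum needs.
        self.check_for_key('value')

    def generate(self, rand):
        # 'rand' is assumed to be a random.Random instance; unused here.
        return self.field['value']


datum = ConstantDatum({"fieldName": "f", "type": "constant", "value": 7})
datum.check()
print(datum.generate(random.Random(0)))  # 7
```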
class demo_utils.generator.BooleanDatum(field)

Bases: demo_utils.generator.AbstractDatum

check()
generate(rand)
class demo_utils.generator.DataGenerator(schema, seed='')

The generator object. Pass the schema to the constructor, then call generate() to produce a random row of data.

Parameters:
  • schema (str) – The schema file or JSON string which defines the data to be generated.
  • seed (str, optional) – The seed value for the generator
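A string seed makes runs reproducible when it is used to seed Python's random.Random, which accepts str seeds. The sketch assumes the generator works this way and that an empty seed means "unseeded". make_rng is a hypothetical helper, not part of the module.

```python
import random


def make_rng(seed=''):
    """Hypothetical helper: an empty seed falls back to fresh OS entropy."""
    return random.Random(seed if seed else None)


a = make_rng('demo-seed')
b = make_rng('demo-seed')
# Two generators built from the same string seed yield the same sequence.
print([a.random() for _ in range(3)] == [b.random() for _ in range(3)])  # True
```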
check_schema(schema)

Checks the entire schema for any incorrect or missing parameters.

Parameters:

schema (str) – The file path or JSON string of the schema

Returns:

N/A

Raises:
  • TypeError – If the root of JSON schema is not a list.
  • KeyError – If ‘fieldName’ or ‘type’ are not found in a datum
  • ValueError – If there are duplicate fieldNames
  • RuntimeError – If a datum’s ‘type’ does not match any datum type implemented in the generator
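The four checks listed above can be sketched as follows. This version takes an already-parsed list rather than a file path or JSON string, and KNOWN_TYPES is a hypothetical registry built only from the type names used in this page's examples; the real generator's set of types may differ.

```python
# Hypothetical registry: only the type names that appear in this page's examples.
KNOWN_TYPES = {"string", "int", "decimal", "map"}


def check_schema(schema_list):
    """Hypothetical sketch of the documented schema checks."""
    if not isinstance(schema_list, list):
        raise TypeError("schema root must be a list")
    seen = set()
    for datum in schema_list:
        # Every datum needs both required fields.
        for key in ("fieldName", "type"):
            if key not in datum:
                raise KeyError(key)
        name = datum["fieldName"]
        if name in seen:
            raise ValueError("duplicate fieldName: " + name)
        seen.add(name)
        if datum["type"] not in KNOWN_TYPES:
            raise RuntimeError("unknown type: " + datum["type"])


try:
    check_schema([{"fieldName": "a", "type": "int"},
                  {"fieldName": "a", "type": "int"}])
except ValueError as err:
    print(err)  # duplicate fieldName: a
```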
generate()

Produce a single row of data. One random value for each datum.

Returns: A dictionary with one key per fieldName in the schema, each mapped to a value generated according to that field's definition.
Return type: dict
class demo_utils.generator.DecimalDatum(field)

Bases: demo_utils.generator.NumberDatum

The DecimalDatum uses the NumberDatum to generate numbers. It does not truncate any decimal places.

For configuration examples please see the documentation on NumberDatum

class demo_utils.generator.IntDatum(field)

Bases: demo_utils.generator.NumberDatum

The IntDatum uses the NumberDatum to generate numbers. It rounds each number to the nearest whole number and then converts it to an int before returning it.

For configuration examples please see the documentation on NumberDatum

generate(rand)
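The rounding step can be approximated with Python's built-in round(). This is an assumption about the implementation. Note that Python 3's round() resolves exact .5 ties to the nearest even integer, so 2.5 becomes 2 rather than 3. int_from_number is a hypothetical name.

```python
import random


def int_from_number(value):
    """Hypothetical sketch of IntDatum's round-then-convert step."""
    # round() with no ndigits already returns an int in Python 3; the
    # explicit int() mirrors the documented "convert to an int" step.
    # Exact .5 ties go to the nearest even integer.
    return int(round(value))


rng = random.Random(42)
x = rng.gauss(50, 10)
print(x, int_from_number(x))
```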
class demo_utils.generator.MapDatum(field)

Bases: demo_utils.generator.AbstractDatum

Allows you to Map certain values from one field to another.

Using two special fields, map and mapFromField, this datum produces a value determined by another field’s value. The keys in map must be strings.

Example:

{
  "fieldName": "mapped_field",
  "type": "map",
  "mapFromField": "RandomIntField1",
  "map": {
    "1": "small",
    "2": "small",
    "3": "small",
    "4": "small",
    "5": "small",
    "6": "large",
    "7": "large",
    "8": "large",
    "9": "large",
    "10": "large"
  }
}

Another Example using gender:

{
  "fieldName": "gender",
  "type": "map",
  "mapFromField": "fName",
  "map": {
    "Jen":    "F",
    "Susan":  "F",
    "Mary":   "F",
    "John":   "M",
    "Mike":   "M",
    "Joe":    "M"
  }
}
check()

Not to be used externally

Checks to make sure the “map” and “mapFromField” are present in the datum

generate(data)

Not to be used externally.

Generates the data from the map
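A minimal sketch of the lookup, assuming generate(data) receives the row generated so far and that the source value is converted to a string before it is looked up in map (which would explain why the first example uses "1" through "10" as keys). It also assumes the mapFromField field is generated before the map field. map_value is a hypothetical helper, and the real datum's behavior for values missing from map is not documented here.

```python
def map_value(field, row):
    """Hypothetical sketch of a map lookup against an already-generated row."""
    source_value = row[field["mapFromField"]]
    # JSON object keys are always strings, so convert the source value first.
    return field["map"][str(source_value)]


size_field = {
    "fieldName": "mapped_field",
    "type": "map",
    "mapFromField": "RandomIntField1",
    # Same mapping as the first example: 1-5 -> "small", 6-10 -> "large".
    "map": {str(i): ("small" if i <= 5 else "large") for i in range(1, 11)},
}
row = {"RandomIntField1": 7}
print(map_value(size_field, row))  # large
```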

class demo_utils.generator.NumberDatum(field)

Bases: demo_utils.generator.AbstractDatum

Create a random number from a schema object.

This datum shouldn’t be used. Use the IntDatum or DecimalDatum instead.

Along with ‘fieldName’ and ‘type’ a number datum requires a ‘distribution’ field.

Four distributions are currently available, each with its own optional arguments:

Parameters:
  • uniform

    A distribution which produces numbers between a and b with equal probability.

    • a: lower bound. Defaults to 0.
    • b: upper bound. Defaults to 1.
  • exponential

    A distribution which produces lower values with high probability.

    • lambda: The rate parameter; must be greater than 0. Smaller values of lambda produce larger numbers on average, and larger values produce smaller numbers (the mean is 1/lambda). Defaults to 1.
  • gaussian

    A distribution which produces numbers in a normal (bell) curve. About 99.7% of the values produced fall within 3 standard deviations of the mean.

    • mu: The mean of all numbers which will be produced. Defaults to 0.
    • sigma: The standard deviation of numbers produced. Defaults to 1.
  • gamma

    A distribution which produces numbers following a gamma distribution.

    • alpha: defaults to 1
    • beta: defaults to 1
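Each of these distributions has a counterpart in Python's random module. The sketch below assumes the generator delegates to those functions with the documented defaults; this page does not confirm that. In particular, random.gammavariate treats beta as a scale parameter, and whether the generator's beta is a scale or a rate is not stated here. sample is a hypothetical function name.

```python
import random


def sample(field, rng):
    """Draw one number for a NumberDatum-style field (hypothetical sketch)."""
    dist = field["distribution"]
    if dist == "uniform":
        return rng.uniform(field.get("a", 0), field.get("b", 1))
    if dist == "exponential":
        # expovariate takes the rate lambda; the mean is 1 / lambda.
        return rng.expovariate(field.get("lambda", 1))
    if dist == "gaussian":
        return rng.gauss(field.get("mu", 0), field.get("sigma", 1))
    if dist == "gamma":
        # gammavariate(alpha, beta): alpha is the shape, beta the scale.
        return rng.gammavariate(field.get("alpha", 1), field.get("beta", 1))
    raise ValueError("unknown distribution: " + dist)


rng = random.Random(0)
print(sample({"distribution": "gaussian", "mu": 50, "sigma": 10}, rng))
```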

Example

Gaussian Example:

{
  "fieldName": "number_field",
  "type": "int",
  "distribution": "gaussian",
  "mu": 50,
  "sigma": 10
}

Exponential Example:

{
  "fieldName": "number_field",
  "type": "decimal",
  "distribution": "exponential",
  "lambda": 0.5
}
check()

Not to be used externally

generate(rand)

Generates a random number based on the distribution parameters given in the schema.

class demo_utils.generator.StringDatum(field)

Bases: demo_utils.generator.AbstractDatum

A datum that will randomly generate strings

There are two ways to write the schema for this datum.

The first is using a list of strings. This causes each element to appear with equal probability.

{
  "fieldName": "test_field",
  "type": "string",
  "values": ["a", "b", "c", "d"]
}

The second is using an object with different keys, and specifying a probability for each key.

{
  "fieldName": "test_field",
  "type": "string",
  "values": {
    "a": 0.1,
    "b": 0.2,
    "c": 0.7
  }
}
Parameters: field (dict) – The datum object (represented as a dict)
check()

Ensure that the fieldName and values are valid.

Shouldn’t be called externally

generate(rand)

Generate a string from the given values, picking one using either equal probabilities (list form) or the given probabilities (object form).

Shouldn’t be called externally
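The two schema forms map naturally onto random.choice (equal probabilities) and random.choices (weighted). The sketch assumes the generator works roughly this way; pick_string is a hypothetical helper. Because random.choices normalizes its weights, this version would still work if the probabilities did not sum to 1, but the real StringDatum may validate them instead.

```python
import random


def pick_string(values, rng):
    """Hypothetical sketch of StringDatum-style selection."""
    if isinstance(values, list):
        # List form: every element is equally likely.
        return rng.choice(values)
    # Object form: keys are the candidates, values are their weights.
    keys = list(values)
    return rng.choices(keys, weights=[values[k] for k in keys], k=1)[0]


rng = random.Random(3)
print(pick_string(["a", "b", "c", "d"], rng))
print(pick_string({"a": 0.1, "b": 0.2, "c": 0.7}, rng))
```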