demo_utils.generator module
The module which houses the data generator and all of the different types of Datum Objects.
This module is easily extendable by simply implementing the methods in the AbstractDatum class.
Configurations can be read through JSON strings or JSON files.
A configuration object itself should be a list of JSON objects. Each JSON object should have at least two fields:
fieldName (str): A simple user-defined name for this field
type (str): A Datum which has been implemented in the data generator
Any other properties of the object are specific to the type of datum (see datum documentation below).
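As a rough illustration of accepting either a JSON file or a JSON string, the sketch below parses both forms into the list of field objects described above. The helper name load_schema is hypothetical and is not part of demo_utils; how the library itself tells the two forms apart is not documented here.

```python
import json
import os


def load_schema(schema):
    """Return the parsed schema from a file path or a JSON string (hypothetical helper)."""
    if os.path.isfile(schema):
        # Treat the argument as a path to a JSON file.
        with open(schema) as handle:
            return json.load(handle)
    # Otherwise treat the argument itself as JSON text.
    return json.loads(schema)


fields = load_schema('[{"fieldName": "field1", "type": "string", "values": ["a", "b"]}]')
print([field["fieldName"] for field in fields])
```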
Example
from demo_utils.generator import DataGenerator

rand_schema = '/path/to/schema.json'
# or
# import json
# js_schema = [
#     {
#         "fieldName": "field1",
#         "type": "string",
#         "values": ["a", "b", "c", "d"]
#     }
# ]
# rand_schema = json.dumps(js_schema)
gen = DataGenerator(rand_schema)
# Generate a single 'row' of data
print(gen.generate())
-
class demo_utils.generator.AbstractDatum(field)
Bases: object
An abstract object which defines some methods for other Datum classes to implement.
This class should NOT be instantiated.
-
check()
This method should not be used.
Raises: NotImplementedError – This method should not be called by an instance of this class. (In fact, there should never be an instance of just this class.)
-
check_for_key(key_name)
Ensure the key is in the field.
Parameters: key_name (str) – The key to check for in the datum object
Returns: True if the key is in the object. Otherwise an error will be raised.
Return type: bool
Raises: KeyError – If the key is not present, this error is raised.
-
generate(rand)
This method should not be used.
Raises: NotImplementedError – This method should not be called by an instance of this class. (In fact, there should never be an instance of just this class.)
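To make the extension pattern concrete, here is a minimal sketch of a custom datum that implements check() and generate(rand). The AbstractDatum class below is a local stand-in written only so the example runs on its own, CoinDatum is a hypothetical type, and treating rand as a random.Random-like object is an assumption. How a new type is registered with the DataGenerator is not covered by this documentation and is not shown.

```python
import random


class AbstractDatum(object):
    """Local stand-in for demo_utils.generator.AbstractDatum (an assumption)."""

    def __init__(self, field):
        self.field = field

    def check_for_key(self, key_name):
        # Mirror the documented behavior: True if present, KeyError otherwise.
        if key_name not in self.field:
            raise KeyError("Missing key: " + key_name)
        return True

    def check(self):
        raise NotImplementedError

    def generate(self, rand):
        raise NotImplementedError


class CoinDatum(AbstractDatum):
    """Hypothetical datum that returns 'heads' or 'tails'."""

    def check(self):
        self.check_for_key("fieldName")
        self.check_for_key("type")

    def generate(self, rand):
        # `rand` is assumed to behave like a random.Random instance.
        return rand.choice(["heads", "tails"])


datum = CoinDatum({"fieldName": "flip", "type": "coin"})
datum.check()
value = datum.generate(random.Random(0))
print(value)
```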
-
-
class demo_utils.generator.BooleanDatum(field)
Bases: demo_utils.generator.AbstractDatum
-
check()
-
generate(rand)
-
-
class demo_utils.generator.DataGenerator(schema, seed='')
The generator object. Pass the configuration here. Use the generate() method to get a random object.
Parameters:
- schema (str) – The schema file or JSON string which defines the data to be generated.
- seed (str, optional) – The seed value for the generator
-
check_schema(schema)
Checks the entire schema for any incorrect or missing parameters.
Parameters: schema (str) – The file path or JSON string of the schema
Returns: N/A
Raises:
- TypeError – If the root of the JSON schema is not a list.
- KeyError – If ‘fieldName’ or ‘type’ is not found in a datum.
- ValueError – If there are duplicate fieldNames.
- RuntimeError – If a certain ‘type’ isn’t found.
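The four error conditions above can be sketched as a standalone function operating on an already-parsed schema. This is an illustration rather than the library's code: the function is written from the documented exceptions only, and the KNOWN_TYPES set is an assumption ("string", "int", "decimal" and "map" appear in the examples on this page; "boolean" is a guess).

```python
KNOWN_TYPES = {"string", "int", "decimal", "boolean", "map"}  # assumed type names


def check_schema(fields):
    """Validate a parsed schema, raising the documented exception types."""
    if not isinstance(fields, list):
        raise TypeError("The root of the schema must be a list")
    seen = set()
    for field in fields:
        for key in ("fieldName", "type"):
            if key not in field:
                raise KeyError(key)
        if field["fieldName"] in seen:
            raise ValueError("Duplicate fieldName: " + field["fieldName"])
        seen.add(field["fieldName"])
        if field["type"] not in KNOWN_TYPES:
            raise RuntimeError("Unknown type: " + field["type"])


check_schema([{"fieldName": "field1", "type": "string", "values": ["a", "b"]}])
```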
-
generate()
Produce a single row of data, with one random value for each datum.
Returns: A dictionary with a key for each fieldName in the schema and values generated according to the schema.
Return type: dict
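The shape of a generated row, and the effect of a seed, can be illustrated with a small standalone sketch. The generate_row function is hypothetical and only handles list-valued string fields; backing the seed with random.Random is an assumption about how the seed parameter might be used.

```python
import random


def generate_row(fields, rand):
    """Build one row: a dict with one value per fieldName (hypothetical helper)."""
    row = {}
    for field in fields:
        # Only list-valued "string" fields are handled in this sketch.
        row[field["fieldName"]] = rand.choice(field["values"])
    return row


schema = [
    {"fieldName": "letter", "type": "string", "values": ["a", "b", "c", "d"]},
    {"fieldName": "size", "type": "string", "values": ["small", "large"]},
]

first = generate_row(schema, random.Random("demo"))
second = generate_row(schema, random.Random("demo"))
print(first == second)  # True: the same seed yields the same row
```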
-
class demo_utils.generator.DecimalDatum(field)
Bases: demo_utils.generator.NumberDatum
The DecimalDatum uses the NumberDatum to generate numbers. It does not truncate any decimal places.
For configuration examples please see the documentation on NumberDatum.
-
class demo_utils.generator.IntDatum(field)
Bases: demo_utils.generator.NumberDatum
The IntDatum uses the NumberDatum to generate numbers. It rounds to the nearest whole number and then converts the result to an int before returning it.
For configuration examples please see the documentation on NumberDatum.
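The round-then-convert step might look like the following. Using Python's built-in round() is an assumption about the implementation; if it is what the library does, note that Python 3 rounds exact halves to the nearest even number.

```python
def to_int(value):
    """Round to the nearest whole number, then convert to int (hypothetical helper)."""
    return int(round(value))


print(to_int(49.6))  # 50
print(to_int(-1.2))  # -1
print(to_int(2.5))   # 2 in Python 3, which rounds halves to the nearest even number
```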
-
generate(rand)
-
-
class demo_utils.generator.MapDatum(field)
Bases: demo_utils.generator.AbstractDatum
Allows you to map certain values from one field to another.
Using two special fields, map and mapFromField, you can use this datum to generate a certain piece of data based on another field’s value. Must be a string.
Example:
{
  "fieldName": "mapped_field",
  "type": "map",
  "mapFromField": "RandomIntField1",
  "map": {
    "1": "small", "2": "small", "3": "small", "4": "small", "5": "small",
    "6": "large", "7": "large", "8": "large", "9": "large", "10": "large"
  }
}
Another example using gender:
{
  "fieldName": "gender",
  "type": "map",
  "mapFromField": "fName",
  "map": {
    "Jen": "F", "Susan": "F", "Mary": "F",
    "John": "M", "Mike": "M", "Joe": "M"
  }
}
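The lookup that a map field implies can be sketched as below. The function generate_mapped is hypothetical; converting the source value to a string before the lookup is an assumption (JSON object keys are always strings), and the behavior for a value missing from the map is not documented, so this sketch simply lets the KeyError propagate.

```python
def generate_mapped(field, row):
    """Look up the mapped value for the source field's current value (hypothetical helper)."""
    source_value = row[field["mapFromField"]]
    # JSON object keys are always strings, so convert before the lookup.
    return field["map"][str(source_value)]


field = {
    "fieldName": "mapped_field",
    "type": "map",
    "mapFromField": "RandomIntField1",
    "map": {"1": "small", "6": "large"},
}
print(generate_mapped(field, {"RandomIntField1": 6}))  # large
```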
-
check()
Not to be used externally.
Checks to make sure that “map” and “mapFromField” are present in the datum.
-
generate(data)
Not to be used externally.
Generates the data from the map.
-
-
class demo_utils.generator.NumberDatum(field)
Bases: demo_utils.generator.AbstractDatum
Create a random number from a schema object.
This datum shouldn’t be used. Use the IntDatum or DecimalDatum instead.
Along with ‘fieldName’ and ‘type’, a number datum requires a ‘distribution’ field.
There are 4 types of distributions currently available (along with the extra arguments that can be supplied for each):
Parameters:
- uniform – A distribution which produces numbers between a and b with equal probability.
  a: Lower bound. Defaults to 0.
  b: Upper bound. Defaults to 1.
- exponential – A distribution which produces lower values with high probability.
  lambda: Lower values of lambda (greater than 0) result in higher numbers. Higher values of lambda result in lower numbers. Defaults to 1.
- gaussian – A distribution which produces numbers in a normal curve. The values produced will fall within 3 standard deviations of the mean about 99.7% of the time.
  mu: The mean of all numbers which will be produced. Defaults to 0.
  sigma: The standard deviation of numbers produced. Defaults to 1.
- gamma – Results in a gamma distribution.
  alpha: Defaults to 1.
  beta: Defaults to 1.
Example
Gaussian Example:
{ "fieldName": "number_field", "type": "int", "distribution": "gaussian", "mu": 50, "sigma": 10 }
Exponential Example:
{ "fieldName": "number_field", "type": "decimal", "distribution": "exponential", "lambda": 0.5 }
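For intuition about the four distributions and their defaults, the sketch below maps each distribution name to the matching function in Python's standard random module. Whether the library uses these exact calls is an assumption, and draw_number is a hypothetical helper.

```python
import random


def draw_number(field, rand):
    """Draw one number for a schema field using the documented defaults (hypothetical helper)."""
    dist = field["distribution"]
    if dist == "uniform":
        return rand.uniform(field.get("a", 0), field.get("b", 1))
    if dist == "exponential":
        return rand.expovariate(field.get("lambda", 1))
    if dist == "gaussian":
        return rand.gauss(field.get("mu", 0), field.get("sigma", 1))
    if dist == "gamma":
        return rand.gammavariate(field.get("alpha", 1), field.get("beta", 1))
    raise ValueError("Unknown distribution: " + dist)


gaussian_field = {
    "fieldName": "number_field",
    "type": "int",
    "distribution": "gaussian",
    "mu": 50,
    "sigma": 10,
}
print(draw_number(gaussian_field, random.Random(42)))
```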
-
check()
Not to be used externally.
-
generate(rand)
Generates a random number based on the distribution parameters given in the schema.
-
class demo_utils.generator.StringDatum(field)
Bases: demo_utils.generator.AbstractDatum
A datum that will randomly generate strings.
There are two options for writing the schema.
The first is using a list of strings. This causes each element to appear with equal probability.
{ "fieldName": "test_field", "type": "string", "values": ["a", "b", "c", "d"] }
The second is using an object with different keys, and specifying a probability for each key.
{ "fieldName": "test_field", "type": "string", "values": { "a": 0.1, "b": 0.2, "c": 0.5 } }
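The two forms of values can be sketched with the standard random module: a uniform pick for a list and a weighted pick for an object. The helper pick_string is hypothetical. How the library treats probabilities that do not sum to 1 is not documented; random.choices, used here, treats the weights as relative.

```python
import random


def pick_string(values, rand):
    """Pick one string from a list (equal odds) or a dict of weights (hypothetical helper)."""
    if isinstance(values, dict):
        keys = list(values)
        weights = [values[key] for key in keys]
        # random.choices treats weights as relative, so they need not sum to 1.
        return rand.choices(keys, weights=weights, k=1)[0]
    return rand.choice(values)


rand = random.Random(7)
print(pick_string(["a", "b", "c", "d"], rand))
print(pick_string({"a": 0.1, "b": 0.2, "c": 0.5}, rand))
```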
Parameters: field (dict) – The datum object (represented as a dict)
-
check()
Ensure that the fieldName and values are proper.
Shouldn’t be called externally.
-
generate(rand)
Generate a string from the given list of values, picking one with either equal or the given probabilities.
Shouldn’t be called externally.
-