Data Generator

Index

How to Start Data Generator?
Edit Table Patterns
Pattern Repository
Generator Errors

Boolean
Numeric Patterns
Sequence
Skip
Date & Timestamp

Values from File
Foreign Key Values
Reverse Regular Expression Text
Combine Values from two Patterns
Groovy Patterns

Use the Data Generator to fill database tables with random data. The random generator is using patterns for setting how the generated data should look like. There are dedicated patterns for numbers, date, booleans, starting with a keyword like int, short, sequence, boolean, groovy, etc. Any other pattern which doesn't start with one of these keywords will be interpreted as reverse regular expressions. Here are the used patterns:

boolean:
date: timestamp:
int: short: double: long: float:
sequence: - generate numbers in a row
identity:
file: - values from file
skip: - skip the column from generator (do not generate)
load_values_from_pk - foreign key columns (involved in a foreign key, referring other columns) should use this pattern. They will pick up a value from the Primary Key table.
json_list: json_map: - we use this for MongoDb
uuid:
groovy: - use own script to generate values
$patternA $patternB - combine two patterns
ref: - reference another generator
All other patterns will be interpreted as Reverse Regular Expressions

The first time you access the Data Generator feature, DbSchema will try to find a pattern for you. This is not always the best, so please review them and set the best one. You can choose and edit for each column one of the predefined patterns from the Pattern Repository.

The generator patterns are saved for each column in the model file. Define them once and save the .dbs model file.

How to Start The Data Generator

First, create a diagram with the tables for which you want to generate random data. Each time you need a different set of tables, you have to create a new diagram. Saving the model file will save the diagrams as well as the data generator settings.

Start the Data Generator from the Data Tools application menu or by right-clicking any table header.

Set the number of data rows to be generated for each table and the table order. The dialog is also showing the table status: if it currently contains data or not. Double-click any of the tables for editing the generator settings.

The table order is important for minimizing errors caused by foreign keys. For example, if the table City refers to the table Country, the table City should be the first in the list. DbSchema will try to order of the tables, which sometimes may fail, if for example, there are foreign keys in a loop.

Database Data Data Generator

Tip about checking existing data before generating random data

After opening the dialog, DbSchema will check if the tables contain data or not. When clicking the 'Generate' button, it will ask to drop or not the current data. Please make sure you don't drop data on production databases! Carefully read DbSchema messages.

Edit Table Patterns

In the next dialog, you can edit the table patterns, the percent of null values, and the seed. If the column is not mandatory, this field is marked with a red '*'. Two columns with the same pattern and seed will generate identical values.

Pattern Repository

The pattern repository is storing predefined patterns. DbSchema tries to find one pattern for each column, but this does not always work. You can choose a pattern from the repository and edit it in the table pattern dialog.

Generator Errors

Different errors may occur when generating the data. In the Generator Dialog, select one of the tables, and the errors will show at the bottom. To read further details, click the error.

At this point you have options to view the sample generated data or view the actual data in the table. Fixing the problem would mean to edit the pattern for the problematic column.

Generator Patterns

As explained in the Table Generator Dialog, you can edit for each column the:

Pattern - is the generator pattern. See below a description of available patterns.
Nulls - a red (*) if the column is mandatory ( NOT NULL). In this case, the Data Generator will always generate a value. If not, you can set a percentage of values to be NULL.
Seed - is an integer. If you set two columns with the same value, they will generate the same sequence of values.

The generator pattern can be any text. With some exceptions, the text is interpreted as reverse regular expression text. Here are the different possibilities:

Boolean

May use percent_true as double between 0 and 1 to set the density of 'true' versus 'false' values. Percent_true default value is 0.5 .


boolean:percent_true=0.7

Numeric Patterns

Use a pattern like int:from=0;to=10000;. Instead of an 'int' you can use 'integer', 'long', 'double', 'float' or 'short'. The 'from' and 'to' values are the interval where the values are generated.

For double and float you can use double:from=0;to=10000;format=####.##. The format is used to format the decimal digits. It is based on the java class java.text.DecimalFormat.

Samples:

int:from=0;to=100

Generated data : 12, 64, 34....

double:from=0.5;to=5.5;format=#.##

Will generate random double values with two digits after comma : 0.53, 0.12....

Sequence

sequence:from=0;step=1;

Generates number starting at 'from' and increasing with 'step'.

Identity

identity:

Set this pattern for identity columns to be left empty and ignored from generation.

Skip

skip:

Set this pattern to skip one column from being generated.

Date and Timestamp

Dates and timestamps have a pattern like date:from='01.01.2008';to='01.01.2009'; or timestamp:from='01.01.2008 00:00:00 000';to='01.01.2009 00:00:00 000';

The date and timestamp formats are specified in the Edit Menu / Configuration Dialog.

Values from File

list:<path_to_file>

The path_to_file should point to a text file. The generator will randomly pick up a text line from that file and use it as the generated value. The default files included in DbSchema are located in the dbschema.jar in the /generator folder. Add new files to the application in /Users/v<CurrentUser> /.DbSchema /config /generator.

Foreign Key Columns: 'load_values_from_pk'

If a column refers to a primary key column via a foreign key, that column should contain only values from the primary key column. Using the pattern 'load_values_from_pk', the data generator will load the existing values from the primary key column and use them for generating values in the referring column.

The primary key table should contain at least one record for the pattern to work. If the actual referring column is unique, the maximal number of records is limited to the number of records in the primary key column.

Combine Values from Two Patterns

Some Text $Sequnece

Use $PatternName to refer a predefined pattern from the pattern repository. This will generate 'Some Text 1', 'Some Text 2', etc.

Reference Generator from the Repository

ref:name=generatorName

This may help if you want to share the same generator over multiple columns. Modifying the generator will apply the changes on all columns.

Reverse Regular Expression Text

Any other text is treated as a reverse regular expression string. The random text is generated based on the current text.

Sample:

(My|Your|Their) friend (John|Mike)

Will generate one of :
My friend John
Your friend Mike
....

How to choose predefined patterns from the pattern repository

You can choose a predefined pattern from the Pattern Repository Dialog by clicking the '...' button.

Reverse Regular Expressions

The Reverse Regular Expressions are from the Java programming language. Here is a list of known symbols:

.           Matches any single sign, includes everything
^xxx        Matches xxx regex at the beginning of the line
xxx$        Matches regex xxx at the end of the line
[abc]       Can match any of the letter a, b or c. [] are known as character classes.
[abc][12]   Can match a, b or c followed by 1 or 2
[^abc]      When ^ is the first character in [], it negates the pattern, matches anything except a, b or c
[a-e1-8]    Matches ranges between a to e or 1 to 8
xx|yy       Matches regex xx or yy

Regular Expression Quantifiers

Quantifiers specify the number of occurrences of a character to match against.

x?          x occurs once or not at all
X*          X occurs zero or more times
X+          X occurs one or more times
X{n}        X occurs exactly n times
X{n,}       X occurs n or more times
X{n,m}      X occurs at least n times but not more than m times

Expressions Metacharacters

\d	        Any digits, short of [0-9]
\D	        Any non-digit, short for [^0-9]
\s	        Any whitespace character, short for [\t\n\x0B\f\r]
\S	        Any non-whitespace character, short for [^\s]
\w	        Any word character, short for [a-zA-Z_0-9]
\W	        Any non-word character, short for [^\w]
\b	        A word boundary
\B	        A non word boundary

Groovy Patterns

Use this pattern to programmatic generate one value.

Groovy is a Java Scripting language, known as is used by Gradle. You can use Java Groovy to generate complex values, such as JSON objects. For example:

groovy:
    def map = [firstname: 30, lastname:generate('(I|John|Cindy)')]
    groovy.json.JsonOutput.toJson( map )

Here we generate a map object, and we fill it with two random values. Then we convert the map to a JSON string, which can be inserted in a JSON column in MySql. Object to String and vice-versa can be done using the Groovy JSON Slurper Library.

The function 'generate(String pattern)' can be used to generate random values. The pattern can be any reverse regular expression or pattern, like 'int:from:0;to=100'. Look in the Pattern Repository for other examples. The method can be called with :

generate( String pattern )
generate( String pattern, int nullPercent ) // nullPercent can be between 0 and 100, 0 means no null values
generate( String pattern, int nullPercent, int seed )

For the Cassandra database, you can store map objects directly in the database. The next example will generate a map and fill it with two values.

groovy:[firstname: '(Anna|John|Cindy)', lastname:generate('(Schwarz|Danin)')]

Use Groovy to combine values from two different other columns.

groovy: income_cost + vat_cost

Use Groovy patterns to compute custom values from values already generated for other columns. For example if you have two columns: 'income_cost' and 'vat_cost', you can generate a computed value into another column using:

groovy: if ( cost > 10 ) return cost*2; else return cost +5;

Suppose two columns 'start_date' and 'end_date' and generate an 'end_date' after 'start_date' with no more than 1 week, using:

groovy: return start_date + (int)(Math.random()*7)

Groovy is a Java-based scripting language. Please check the Help / Code Samples dialog.