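This is a Java library that, given a regex pattern, can generate random matching strings, iterate over unique matching strings, and generate non-matching strings.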
Status
Try it now
Usage
Supported Syntax
Configuration
Limitations
Other similar libraries
Support
Follow the link to an online IDE with a ready-made project: JDoodle. Enter your pattern and see the results.
<dependency>
<groupId>com.github.curious-odd-man</groupId>
<artifactId>rgxgen</artifactId>
<version>1.3</version>
</dependency>
<project>
<repositories>
<repository>
<id>snapshots-repository</id>
<url>https://oss.sonatype.org/content/repositories/snapshots/</url>
</repository>
</repositories>
<!-- .... -->
<dependency>
<groupId>com.github.curious-odd-man</groupId>
<artifactId>rgxgen</artifactId>
<version>1.4-SNAPSHOT</version>
</dependency>
</project>
Changes in snapshot:
a?b|c #61
import java.math.BigInteger;
import java.util.Optional;

import com.github.curiousoddman.rgxgen.RgxGen;
import com.github.curiousoddman.rgxgen.iterators.StringIterator;

public class Main {
    public static void main(String[] args) {
        RgxGen rgxGen = new RgxGen("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");    // Create generator
        String s = rgxGen.generate();                                   // Generate a new random value
        // Estimate (not exact, see Limitations) of how many unique values the pattern can generate
        Optional<BigInteger> estimation = rgxGen.getUniqueEstimation();
        StringIterator uniqueStrings = rgxGen.iterateUnique();          // Iterate over unique values (not exact, see Limitations)
        String notMatching = rgxGen.generateNotMatching();              // Generate a non-matching string
    }
}
import java.util.Random;

import com.github.curiousoddman.rgxgen.RgxGen;

public class Main {
    public static void main(String[] args) {
        RgxGen rgxGen = new RgxGen("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");    // Create generator
        Random rnd = new Random(1234);                                  // Fixed seed
        String s = rgxGen.generate(rnd);                                // Generate first value
        String s1 = rgxGen.generate(rnd);                               // Generate second value
        String s2 = rgxGen.generate(rnd);                               // Generate third value
        String notMatching = rgxGen.generateNotMatching(rnd);           // Generate a non-matching string
        // Because the seed is fixed, s, s1 and s2 will be the same on each launch
    }
}
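The reproducibility in the example above rests on `java.util.Random`: two generators created with the same seed and used in the same call order produce the same sequence. The sketch below shows this with the standard library only. `randomDigits` is a hypothetical stand-in for a generator and is not part of RgxGen.

```java
import java.util.Random;

class SeededRandomSketch {
    // Hypothetical stand-in for a generator: builds a string of random decimal digits.
    static String randomDigits(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('0' + rnd.nextInt(10)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random first = new Random(1234);
        Random second = new Random(1234);
        // Same seed, same call order: the generated values are pairwise equal.
        for (int i = 0; i < 3; i++) {
            String a = randomDigits(first, 8);
            String b = randomDigits(second, 8);
            System.out.println(a + " / " + b + " equal: " + a.equals(b));
        }
    }
}
```

If the same `Random` instance is shared with other code that also draws values from it, the call order changes and the generated strings will differ between runs.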
Pattern | Description |
---|---|
. | Any symbol |
? | One or zero occurrences |
+ | One or more occurrences |
* | Zero or more occurrences |
\r | Carriage return (CR) character |
\t | Tab character |
\n | Line feed (LF) character |
\d | A digit. Equivalent to [0-9] |
\D | Not a digit. Equivalent to [^0-9] |
\s | Carriage return, space, tab, newline, vertical tab, form feed |
\S | Anything but carriage return, space, tab, newline, vertical tab, form feed |
\w | Any word character. Equivalent to [a-zA-Z0-9_] |
\W | Anything but a word character. Equivalent to [^a-zA-Z0-9_] |
\i | Inserts the same value as the capture group with index i, where i is any integer |
\Q and \E | Any characters between \Q and \E, including metacharacters, are treated as literals |
\b and \B | These characters are ignored. No validation is performed! |
\xXX and \x{XXXX} | Unicode character given by its 2- or 4-digit hexadecimal value |
{a} and {a,b} | Repeat exactly a times, or between a and b times. Use {n,} to repeat at least n times |
[...] | A single character from those inside the brackets. Ranges with a dash, such as [a-zA-Z], are also supported |
[^...] | A single character except those inside the brackets. [^a] is any symbol except 'a' |
() | Groups multiple characters for repetition |
foo(?=bar) and (?<=foo)bar | Positive lookahead and lookbehind. These are equivalent to foobar |
foo(?!bar) and (?<!foo)bar | Negative lookahead and lookbehind |
(a|b) | Alternatives |
\ | Escape character (use \\ (double backslash) to generate a single \ character) |
RgxGen treats any other characters as literals - those are generated as is.
RgxGen can be configured at the global or instance level.
Please refer to the following enum for all available properties: com.github.curiousoddman.rgxgen.config.RgxGenOption.
Each property value is looked up in this order: the local properties of the RgxGen instance first, then the global properties, then the default value of the option.
Use new RgxGenProperties() to create a properties object. RgxGenProperties extends java.util.Properties and can be used in all the same ways.
import com.github.curiousoddman.rgxgen.config.RgxGenOption;
import com.github.curiousoddman.rgxgen.config.RgxGenProperties;

public class Main {
    public static void main(String[] args) {
        // Create properties object (RgxGenProperties extends java.util.Properties)
        RgxGenProperties properties = new RgxGenProperties();
        // Set value "20" for INFINITE_PATTERN_REPETITION option in properties
        RgxGenOption.INFINITE_PATTERN_REPETITION.setInProperties(properties, 20);
        // ... now properties can be passed to RgxGen
    }
}
Set a global configuration using RgxGen.setDefaultProperties(properties);
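import com.github.curiousoddman.rgxgen.RgxGen;
import com.github.curiousoddman.rgxgen.config.RgxGenOption;
import com.github.curiousoddman.rgxgen.config.RgxGenProperties;

public class Main {
    public static void main(String[] args) {
        // Create and configure the properties object as in the previous example
        RgxGenProperties properties = new RgxGenProperties();
        RgxGenOption.INFINITE_PATTERN_REPETITION.setInProperties(properties, 20);
        RgxGen rgxGen_1 = new RgxGen("xxx");    // Created for example purposes
        // Set default properties.
        // NOTE! Only instances created after setDefaultProperties are affected.
        // E.g. rgxGen_1 keeps the default value of the INFINITE_PATTERN_REPETITION option,
        // and rgxGen_2 has the value "20" for it, unless a local config is specified.
        RgxGen.setDefaultProperties(properties);
        RgxGen rgxGen_2 = new RgxGen("xxx");
    }
}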
Set a local configuration using rgxGen.setProperties(localProperties); on an existing RgxGen instance.
import com.github.curiousoddman.rgxgen.RgxGen;
import com.github.curiousoddman.rgxgen.config.RgxGenOption;
import com.github.curiousoddman.rgxgen.config.RgxGenProperties;

public class Main {
    public static void main(String[] args) {
        // Global properties, configured as in the previous examples
        RgxGenProperties properties = new RgxGenProperties();
        RgxGenOption.INFINITE_PATTERN_REPETITION.setInProperties(properties, 20);
        RgxGen.setDefaultProperties(properties);
        // Local properties (the value 5 is an arbitrary example)
        RgxGenProperties localProperties = new RgxGenProperties();
        RgxGenOption.INFINITE_PATTERN_REPETITION.setInProperties(localProperties, 5);
        RgxGen rgxGen_3 = new RgxGen("xxx");
        // Set local configuration for rgxGen_3.
        // Options that are not defined in localProperties are looked up in properties,
        // because those were set globally before rgxGen_3 was created.
        rgxGen_3.setProperties(localProperties);
    }
}
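The fallback from local to global configuration described above resembles the defaults chain of plain `java.util.Properties`. The following is a minimal sketch of that idea using only the standard library; it is not RgxGen's actual implementation, and the key name and the `lookup` helper are hypothetical. The built-in value of 100 mirrors the default repetition limit mentioned in the Limitations section.

```java
import java.util.Properties;

class PropertyLookupSketch {
    static final String KEY = "infinite.repetition"; // hypothetical key name
    static final String BUILT_IN_DEFAULT = "100";    // assumed built-in default

    // Looks in the given properties, then in their defaults chain,
    // and finally falls back to the built-in default.
    static String lookup(Properties props, String key) {
        return props.getProperty(key, BUILT_IN_DEFAULT);
    }

    public static void main(String[] args) {
        Properties global = new Properties();
        global.setProperty(KEY, "20");

        // The global properties act as the fallback for the local ones.
        Properties local = new Properties(global);
        System.out.println(lookup(local, KEY));            // 20 (from global)

        local.setProperty(KEY, "5");
        System.out.println(lookup(local, KEY));            // 5 (local wins)

        System.out.println(lookup(new Properties(), KEY)); // 100 (built-in default)
    }
}
```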
Lookahead and lookbehind currently have very limited support. Please refer to #63. I'm currently working on a solution, but I cannot say when I will come up with something.
rgxGen.getUniqueEstimation() might not be accurate, because it does not count actual unique values; it only counts the different states of each building block of the expression. For example, "(a{0,2}|b{0,2})" will be estimated as 6, though the actual number of unique values is 5, because the left and right alternatives can both produce the same value (the empty string). At the same time, "(|(a{1,2}|b{1,2}))" will be correctly estimated as 5, though it generates the same values.
For similar reasons, the requested iterator over unique values can contain duplicates.
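The gap between the estimate and the real count for "(a{0,2}|b{0,2})" can be reproduced by hand. The sketch below enumerates the states of each alternative and compares that count with the number of distinct strings. It uses only the standard library and is an illustration of the counting argument, not RgxGen code; all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class EstimationSketch {
    // All strings produced by c{min,max} for a single character c.
    static List<String> repeat(char c, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                sb.append(c);
            }
            out.add(sb.toString());
        }
        return out;
    }

    // Values of (a{0,2}|b{0,2}): one entry per state of each alternative.
    static List<String> candidates() {
        List<String> all = new ArrayList<>(repeat('a', 0, 2));
        all.addAll(repeat('b', 0, 2));
        return all;
    }

    public static void main(String[] args) {
        List<String> all = candidates();
        Set<String> unique = new LinkedHashSet<>(all);
        System.out.println("states counted: " + all.size());    // 6
        System.out.println("unique values:  " + unique.size()); // 5
    }
}
```

The empty string appears once per alternative, which is why counting states overshoots by one here. Collecting iterated values into a `Set` is one way to drop such duplicates on the caller's side.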
By design, the a+, a* and a{n,} patterns in regex imply that an infinite number of characters can be matched. When generating data, that would mean values of infinite length might be generated. It is highly doubtful anyone would require a string of infinite length, so I've artificially limited repetitions in such patterns to 100 symbols when generating random values. This value can be changed; please refer to the Configuration section.
In contrast, when generating unique values, the maximum number of repetitions is Integer.MAX_VALUE.
Use a{n,m} if you require a specific number of repetitions. It is suggested to avoid such infinite patterns when generating data from a regex.
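To make the effect of the repetition limit concrete, here is a minimal sketch of capping an open-ended repetition such as a{n,}. It assumes the count is drawn uniformly between the minimum and the cap, and that a minimum above the cap simply wins; neither detail is confirmed for RgxGen, and every name in the sketch is hypothetical.

```java
import java.util.Random;

class RepetitionCapSketch {
    static final int DEFAULT_CAP = 100; // mirrors the default limit described above

    // Generates a value for c{min,} with the repetition count limited by cap.
    static String generateAtLeast(char c, int min, int cap, Random rnd) {
        int upper = Math.max(min, cap);                 // assumption: min wins if it exceeds cap
        int count = min + rnd.nextInt(upper - min + 1); // uniform in [min, upper]
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        // a* with the default cap: length is between 0 and 100
        System.out.println(generateAtLeast('a', 0, DEFAULT_CAP, rnd).length());
        // a+ with a cap of 20: length is between 1 and 20
        System.out.println(generateAtLeast('a', 1, 20, rnd).length());
    }
}
```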
The general rule is that I try to generate non-matching strings of the same length as matching strings would have, though this is not always possible. For example, the pattern . (any symbol) would yield an empty string as the non-matching string. Another example is a{0,2}: this pattern can match an empty string, but the non-matching strings would be only 1 or 2 symbols long. I chose these approaches because they seem predictable and are easier to implement.
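The two examples above can be checked with `java.util.regex.Pattern` from the standard library. The candidate strings below are picked by hand for illustration and are not output of RgxGen; the `matches` helper is hypothetical. The same kind of check can be applied to real generated values.

```java
import java.util.regex.Pattern;

class NotMatchingCheckSketch {
    // True if the whole candidate string matches the regex.
    static boolean matches(String regex, String candidate) {
        return Pattern.matches(regex, candidate);
    }

    public static void main(String[] args) {
        // The empty string is a valid non-matching value for "."
        System.out.println(matches(".", ""));        // false
        // For a{0,2} the empty string matches, so non-matching values must be longer
        System.out.println(matches("a{0,2}", ""));   // true
        System.out.println(matches("a{0,2}", "b"));  // false
        System.out.println(matches("a{0,2}", "bb")); // false
    }
}
```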
There are 2 more libraries available to achieve same goal:
However, I found they have the following issues:
a{60000}, or IPV6 regex pattern.
With (a|b), the probability of a and b is equal. For (a|b|c), one would expect a, b or c with a probability of 33.(3)% each, but the actual probabilities are a=50%, b=25% and c=25%. For longer alternatives you might never get the last alternative.
I plan to support this library, so you're welcome to open issues or reach me by e-mail in case of any questions. Any suggestions, feature requests or bug reports are welcome!
Please vote up my answer on StackOverflow to help others find this library.