Regular Expressions

Regular expression searching provides a way to search for complex patterns of characters. A regular expression included in a search request must be enclosed in quotation marks and must begin with ##. Examples:

Apple and "##199[0-9]"

Apple and "##19[0-9]+"
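The quoting rule above can be sketched in a few lines of Python. This is only an illustration of how such a request string might be assembled: the helper name regex_term is hypothetical and is not part of any dtSearch API, and rejecting embedded quotation marks is an assumption made to keep the sketch simple.

```python
def regex_term(pattern: str) -> str:
    """Wrap a regular expression as a quoted search term that begins with ##."""
    # Assumption: this sketch rejects embedded double quotes rather than
    # guessing at an escaping rule that the text does not describe.
    if '"' in pattern:
        raise ValueError("pattern must not contain a double quote")
    return f'"##{pattern}"'


# Combine an ordinary word with a regular-expression term.
request = "Apple and " + regex_term("199[0-9]")
print(request)  # Apple and "##199[0-9]"
```

The resulting string has the same shape as the first example above.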

In addition to searching, dtSearch can use regular expressions in File Segmentation and Text Fields rules.  

When used in searching, a regular expression must match a single whole word. For example, the regular expression "##app.*ie" cannot find "apple pie", because that phrase spans two words. The beginning-of-line and end-of-line regular expression markers ^ and $ cannot be used in searches.
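The whole-word rule can be approximated as follows. This is a rough emulation, not dtSearch's own matching code: it assumes words are separated by whitespace, it uses Python's re module rather than the TR1 engine, and the function name matching_words is hypothetical.

```python
import re


def matching_words(pattern: str, text: str) -> list:
    """Return the words in text that the pattern matches in full."""
    # Assumption: words are split on whitespace; dtSearch's own
    # word-breaking rules may differ.
    return [word for word in text.split() if re.fullmatch(pattern, word)]


# The pattern is tested against "apple" and "pie" separately, so it
# can never span both words.
print(matching_words(r"app.*ie", "apple pie"))            # []
print(matching_words(r"app.*e", "apple pie"))             # ['apple']
print(matching_words(r"19[0-9]+", "Apple 1984 and 2001")) # ['1984']
```

Because each word is matched on its own and in full, no ^ or $ anchors are needed in the pattern.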

Special characters in a regular expression are:

Regular expression       Effect
.  (period)              Matches any single character. Example: "sampl." would match "sample" or "samplZ".
\                        Treats the next character literally. Example: in "\$100", the \ indicates that the pattern is "$100", not end-of-line ($) followed by "100".
[abc]                    Brackets indicate a set of characters, one of which must be present. For example, "sampl[ae]" would match "sample" or "sampla", but not "samplx".
[a-z]                    Inside brackets, a dash indicates a range of characters. For example, "[a-z]" matches any single lower-case letter.
[^a-z]                   Matches any single character except the ones in the bracketed range.
.* (period, asterisk)    An asterisk means "0 or more" of something, so .* would match any string of characters, or nothing.
.+ (period, plus)        A plus means "1 or more" of something, so .+ would match any string of at least one character.
[a-z]+                   Any sequence of one or more lower-case letters.
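The special characters listed above can be tried out with a short script. This is a sketch that uses Python's re module as a stand-in for the TR1 engine; the two agree on these basic constructs, but that equivalence is an assumption for anything more advanced. Matching here is case-sensitive because that is Python's default, which says nothing about how dtSearch treats case. The names cases and whole_match are illustrative only.

```python
import re

# Each entry: (pattern, word, whether the pattern should match the whole word)
cases = [
    (r"sampl.",    "sample", True),
    (r"sampl.",    "samplZ", True),
    (r"sampl.",    "sampl",  False),  # . requires exactly one character
    (r"\$100",     "$100",   True),   # \ makes $ a literal dollar sign
    (r"sampl[ae]", "sampla", True),
    (r"sampl[ae]", "samplx", False),
    (r"[a-z]",     "q",      True),
    (r"[^a-z]",    "q",      False),
    (r"[^a-z]",    "7",      True),
    (r".*",        "",       True),   # 0 or more characters
    (r".+",        "",       False),  # at least one character is required
    (r"[a-z]+",    "pie",    True),
    (r"[a-z]+",    "Pie",    False),  # upper-case P is outside a-z
]


def whole_match(pattern: str, word: str) -> bool:
    """Return True if the pattern matches the entire word."""
    return re.fullmatch(pattern, word) is not None


for pattern, word, expected in cases:
    result = whole_match(pattern, word)
    status = "ok" if result == expected else "UNEXPECTED"
    print(f"{pattern!r:14} {word!r:10} {result!s:6} {status}")
```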

dtSearch uses the TR1 implementation of regular expressions, which provides many capabilities beyond those described above. For more details on TR1 regular expression capabilities, see this Microsoft article:

http://msdn.microsoft.com/en-us/library/bb982727.aspx