Automatic recognition of dates, email addresses and credit card numbers

Menu option: Options > Preferences > Indexing Options

Dates

Date recognition looks for anything that appears to be a date, using English-language months (including common abbreviations) and numerical formats.  Examples of date formats that are recognized include:

January 15, 2006

15 Jan 06

2006/01/15

1/15/06

1-15-06

The fifteenth of January, two thousand six

To search for a date, put "date()" around the date expression or range.  For example, to find any of the expressions above near the word "apple", search for:

date(jan 15 2006) w/10 apple

To search for a range of dates near the word "apple", search for:

date(jan 10 2006 to jan 20 2006) w/10 apple

A field search for a date expression is written the same way as a field search for a word:

DateField contains date(jan 10 2006 to jan 20 2006)

Open-ended ranges are not supported. To search for any date before or after a particular date, enter a bounded range and use the minimum or maximum supported year for the other bound. The maximum value for a year is 2900, and the minimum value is 1000. For example, to find any date from January 10, 2006 onward:

DateField contains date(jan 10 2006 to jan 1 2900)
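
When search requests are generated by a program rather than typed, the date syntax above can be assembled with a few string helpers. The Python sketch below is illustrative only. The function names (date_expr, open_range_after, open_range_before, field_contains, near) are hypothetical and not part of dtSearch, and the code only builds request strings. It assumes the lowercase "mon d yyyy" form used in the examples, and it assumes that January 1 of the limit year is an acceptable stand-in for an open end.

```python
from datetime import date
from typing import Optional

# Year limits stated in the text above.
MIN_YEAR = 1000
MAX_YEAR = 2900

_MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
           "jul", "aug", "sep", "oct", "nov", "dec"]


def _fmt(d: date) -> str:
    """Format a date as 'jan 10 2006', the style used in the examples."""
    if not MIN_YEAR <= d.year <= MAX_YEAR:
        raise ValueError(f"year {d.year} is outside {MIN_YEAR}-{MAX_YEAR}")
    return f"{_MONTHS[d.month - 1]} {d.day} {d.year}"


def date_expr(start: date, end: Optional[date] = None) -> str:
    """Build date(...) for a single date or a bounded range."""
    if end is None:
        return f"date({_fmt(start)})"
    return f"date({_fmt(start)} to {_fmt(end)})"


def open_range_after(start: date) -> str:
    """Emulate 'any date from start onward' with the maximum year."""
    return date_expr(start, date(MAX_YEAR, 1, 1))


def open_range_before(end: date) -> str:
    """Emulate 'any date up to end' with the minimum year (assumed bound)."""
    return date_expr(date(MIN_YEAR, 1, 1), end)


def field_contains(field: str, expr: str) -> str:
    """Build a field search such as 'DateField contains date(...)'."""
    return f"{field} contains {expr}"


def near(expr: str, word: str, distance: int) -> str:
    """Build a proximity search such as 'date(...) w/10 apple'."""
    return f"{expr} w/{distance} {word}"


print(near(date_expr(date(2006, 1, 15)), "apple", 10))
print(field_contains("DateField", open_range_after(date(2006, 1, 10))))
```

The two print calls reproduce the proximity example and the open-ended field example shown above.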

Email Addresses

Email address recognition looks for text that follows the syntax for a valid email address (example:  sales@dtsearch.com).  This makes it possible to search for a specific email address regardless of the alphabet settings for the @ and . characters, as well as any other punctuation that may be present in an email address.  Also, this makes it possible to use the word listing functions in dtSearch to enumerate all email addresses in a document collection.

To search for an email address, put "mail()" around the address.  The * and ? wildcards are supported inside the parentheses.  Examples:

mail(sales@dtsearch.com)

mail(sa*@dtsearch.com)
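
To get a feel for which addresses a wildcard pattern would select, the pattern can be tried against a sample list outside dtSearch. The Python sketch below is an approximation, not dtSearch's own matching logic. It assumes that * stands for any run of characters, that ? stands for exactly one character, and that matching is case-insensitive. The helper names mail_request and matching_addresses are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def mail_request(pattern: str) -> str:
    """Wrap an address or wildcard pattern in mail(), as in the examples."""
    return f"mail({pattern})"


def matching_addresses(pattern: str, addresses: Iterable[str]) -> List[str]:
    """Return the addresses that a * / ? pattern would select.

    Assumption: case-insensitive matching. fnmatchcase also treats
    '[' specially, so this sketch is only suitable for patterns
    without square brackets.
    """
    lowered = pattern.lower()
    return [a for a in addresses if fnmatchcase(a.lower(), lowered)]


sample = [
    "sales@dtsearch.com",
    "sam@dtsearch.com",
    "support@dtsearch.com",
    "sales@example.com",
]
print(mail_request("sa*@dtsearch.com"))
print(matching_addresses("sa*@dtsearch.com", sample))
```

With the sample list above, the pattern sa*@dtsearch.com selects sales@dtsearch.com and sam@dtsearch.com, while sa?@dtsearch.com would select only sam@dtsearch.com.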

Credit Card Numbers

Credit card number recognition looks for any sequence of numbers that appears to satisfy the criteria for a valid credit card number issued by one of the major credit card issuers.  Credit card numbers are recognized regardless of the pattern of spaces or punctuation embedded in the number.  Examples:

1234-5678-1234-5678

1234567812345678

1234 5678 1234 5678

The numerical validity tests that credit card issuers apply to card numbers are used to exclude sequences of numbers that are not credit card numbers.  These tests are not conclusive, however, so the credit card number recognition feature may still pick up some numbers that are not really credit card numbers.
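
This page does not say which numerical tests are applied. One widely used test of this kind is the Luhn checksum, and the Python sketch below shows it as an illustration of how such a test works, not as a description of dtSearch's implementation. The function names are hypothetical. Spaces and punctuation are stripped first, which mirrors the statement that numbers are recognized regardless of embedded separators.

```python
def digits_only(text: str) -> str:
    """Drop spaces, hyphens and any other non-digit characters."""
    return "".join(ch for ch in text if ch in "0123456789")


def passes_luhn(text: str) -> bool:
    """Return True if the digits in text satisfy the Luhn checksum.

    Working from the rightmost digit, every second digit is doubled;
    if doubling gives a two-digit result, 9 is subtracted. The number
    passes when the total is a multiple of 10.
    """
    digits = digits_only(text)
    if not digits:
        return False
    total = 0
    for position, ch in enumerate(reversed(digits)):
        value = int(ch)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


# 4111 1111 1111 1111 is a commonly published test number.
for candidate in ("4111 1111 1111 1111", "4111-1111-1111-1111",
                  "4111111111111112"):
    print(candidate, passes_luhn(candidate))
```

A checksum like this rejects most random digit strings, but roughly one in ten arbitrary numbers of the right length still passes, which is consistent with the false positives described above.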

To search for a credit card number, put "creditcard()" around the number.  Example:

creditcard(1234*)

Other numerical patterns

To search for other numerical patterns such as social security numbers, you can use the = wildcard, which matches any single digit. For example, if hyphens are indexed as spaces, then the following search request would find U.S. social security numbers:

=== == ====
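
To preview what a digit pattern like this would select in sample text, it can be translated into a regular expression. The Python sketch below is an approximation for experimenting outside dtSearch. The function name digit_pattern_to_regex is hypothetical, and treating a space in the pattern as "space or hyphen" is an assumption that stands in for the "hyphens are indexed as spaces" setting.

```python
import re


def digit_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a pattern of = wildcards into a compiled regex.

    '=' becomes one ASCII digit; a space becomes 'space or hyphen'
    (assumption, see lead-in); anything else is matched literally.
    Lookarounds keep the match from starting or ending inside a
    longer run of digits.
    """
    parts = []
    for ch in pattern:
        if ch == "=":
            parts.append("[0-9]")
        elif ch == " ":
            parts.append("[ -]")
        else:
            parts.append(re.escape(ch))
    return re.compile("(?<![0-9])" + "".join(parts) + "(?![0-9])")


ssn = digit_pattern_to_regex("=== == ====")
for text in ("SSN: 123-45-6789", "123 45 6789", "12-345-6789"):
    print(text, bool(ssn.search(text)))
```

The first two sample strings match and the third does not, because its digit groups are 2, 3 and 4 digits long rather than 3, 2 and 4.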

Enabling automatic recognition of dates, email addresses, and credit card numbers

In dtSearch Desktop, click Options > Preferences > Indexing Options, and check the box to "Automatically recognize dates, email addresses, and credit card numbers in text."

Word lists

To list dates, credit card numbers or email addresses in an index, click Index > List Index Contents.  The same syntax used in search requests works in the listing functions, so listing words matching "creditcard(*)" will list all credit card numbers in the index.

Effect on performance

Indexing will be slower with the recognition feature enabled.