Indexing Options

Menu option: Options > Preferences > Indexing options

Automatically recognize dates, email addresses and credit card numbers in text
Check this box to have the dtSearch indexer scan for anything that looks like a date, email address, or credit card number during indexing.  With this option enabled, you can search specifically for text matching credit card numbers, email addresses, or ranges of dates.  See: Automatic recognition of dates, email addresses, and credit card numbers.

Index numbers
If your documents contain a lot of numbers and you do not expect to search for them, clear this checkbox to make dtSearch exclude numbers from your index.  This will make your indexes smaller and will speed up indexing.

Index document properties
If checked, dtSearch will index document summary information fields in Office, PDF and WordPerfect documents and META tags in HTML files.

Index filenames as text
If checked, dtSearch will append the filename of each document to the end of the text during indexing, so text in a filename will be searchable like other document text.  If "Include path information" is checked, then the full path and filename will be searchable, instead of just the filename.

Add file type name (Word, Excel, etc.) to documents
The file type name will appear at the end of each document in a searchable "File Type" field.  With this option enabled, you can include file type criteria in a search.  Example:  "apple and (file type contains Excel)".  

Index MIME headers in emails
Basic email properties such as Subject, To, From, and Date are always indexed.  Check this box to also index the text of all MIME headers transmitted with a message.

Index HTML scripts, styles, links, and comments
Normally HTML scripts, styles, links and comments are not indexed and dtSearch will index only visible text and META tags in HTML files.   Check this box to make these hidden HTML elements searchable.   

Index attachments in PDF files
PDF files can have attachments such as embedded Word documents.  Check this box to have dtSearch detect and index attachments in PDF files.  This option will affect the way PDF files with attachments are displayed in dtSearch.  Because Adobe Reader cannot display PDF attachments with hit highlighting, any PDF files with attachments that are indexed will be displayed as text in dtSearch.

Index links in PDF files
If checked, dtSearch will index any links embedded in PDF files.

Index field names in XML files
Index field attributes in XML files

Check these boxes to have dtSearch index field names or field attributes in XML files.  If both boxes are unchecked, dtSearch will only index field values in XML.

Ignore common HTML field names (<P>, <I>, <B>, etc.) in XML data
Malformed XML data can sometimes contain HTML tags in text fields.  Check this box to have dtSearch detect and ignore these tags.

Index XML files as plain text (without field searching)
With this option selected, XML files are indexed without including field attributes.   All of the text, including field names, remains searchable, but XML content is treated as plain text, which makes indexing and searching faster.

Index database files as plain text (without field searching)
With this option selected, database files such as Microsoft Access (*.mdb, *.accdb) and CSV files are indexed without treating each row as a separate document, and without including field attributes.   All of the text, including field names, remains searchable, but database content is combined into a single plain text document, which makes indexing and searching faster.

Index properties of images embedded in documents
When images are indexed as individual files, their properties are always indexed.  Check this box to also index properties of images that are embedded in other documents.

Enable numeric range searching
By default, dtSearch indexes numbers both as text and as numeric values, which is necessary for numeric range searching. Clear this checkbox to suppress indexing of numeric values if you do not need numeric range searching. Numbers will still be searchable as text if the "Index numbers" option is checked. Suppressing numeric values can reduce the size of your indexes by about 20%.

Generate and index MD5 hashes for documents
MD5 hashes are unique numerical codes that are sometimes used in forensics to identify files.  Check this box if you need to be able to search for files by their numerical MD5 hash.  dtSearch will generate a hash for each document as it is indexed and index the hash in a field named MD5Hash.  The hash will be formatted as a single 32-digit hexadecimal string.  Generating hashes will make indexing slower.
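
To find a known file by its hash, you first need the hash in the same 32-digit hexadecimal form described above.  The following is a minimal Python sketch of one way to compute it outside dtSearch using the standard hashlib module.  The function name md5_hex is hypothetical and is not part of dtSearch, and the sketch assumes that the indexed hash is the MD5 of the raw file bytes.

```python
import hashlib
import io


def md5_hex(stream, chunk_size=1 << 20):
    """Return the MD5 of a binary stream as a 32-digit hexadecimal string.

    Hypothetical helper for illustration; not a dtSearch function.
    Reads in chunks so that large files do not have to fit in memory.
    """
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Example with in-memory data; for a real file, pass open(path, "rb").
value = md5_hex(io.BytesIO(b"hello world"))
print(len(value), value)
```

The resulting string could then be used as the search term against the MD5Hash field.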

Generate and index SHA-256 hashes for documents
SHA-256 hashes are unique numerical codes that are sometimes used in forensics to identify files.  Check this box if you need to be able to search for files by their numerical SHA-256 hash.  dtSearch will generate a hash for each document as it is indexed and index the hash in a field named Sha256Hash.  The hash will be formatted as a pair of 32-digit hexadecimal strings so it will be searchable with the default maximum word length of 32 characters.  Generating hashes will make indexing slower.
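
A SHA-256 hash is 64 hexadecimal digits long, which is why it is stored as two 32-digit strings.  The Python sketch below shows one way to produce the two halves from a hash computed outside dtSearch.  The function name sha256_halves is hypothetical, and the sketch assumes the pair is simply the first and second 32 digits of the usual hexadecimal form; how dtSearch separates the two strings in the field is not specified here.

```python
import hashlib


def sha256_halves(data):
    """Return the SHA-256 of `data` as two 32-digit hexadecimal strings.

    Hypothetical helper for illustration; assumes the pair is the first
    and second half of the standard 64-digit hexadecimal digest.
    """
    full = hashlib.sha256(data).hexdigest()  # always 64 hex digits
    return full[:32], full[32:]


first, second = sha256_halves(b"example document bytes")
print(first)
print(second)
```

Each half fits within the default maximum word length of 32 characters, so either half can be used as a search term on its own.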

Index hidden content in Office documents (such as macros)
In addition to the normally visible text, Office documents can contain a wide range of other embedded data, such as macros, viruses, or other embedded documents.   Check this box to make these items visible in dtSearch.

Index NTFS Summary Information streams
Check this box to have dtSearch index NTFS Summary Information data for each document indexed.   NTFS Summary Information properties were created in older versions of Windows when you right-clicked a document in Windows Explorer and entered values in the Summary Information fields (Author, Subject, etc.).  

Compatibility mode for Parallels Desktop for Mac
Parallels Desktop is a program that enables Windows to run on Mac computers.  This option enables dtSearch to work around an issue in Parallels Desktop that causes Parallels to report the modification date of files located in Mac folders as January 1, 1601.  When this box is checked, dtSearch alters the indexing process to ensure that the correct dates are detected.  
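
For background, January 1, 1601 is the starting point of the Windows FILETIME timestamp format, which counts 100-nanosecond intervals from that date, so a timestamp of zero displays as January 1, 1601.  The Python sketch below illustrates the conversion.  It is only an illustration of the date format; it is an assumption, not something stated in this documentation, that the Parallels issue involves a zero timestamp, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Start of the Windows FILETIME epoch (UTC).
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a FILETIME value (100-nanosecond ticks) to a UTC datetime.

    Hypothetical helper for illustration; not a dtSearch function.
    """
    # 10 ticks of 100 ns make one microsecond.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


# A zero timestamp maps to January 1, 1601.
print(filetime_to_datetime(0).date())
```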

Index lists of filenames in ZIP and RAR archives
This option provides a way to search on the list of files in a ZIP or RAR archive, even if the individual files may be inaccessible due to encryption. When dtSearch indexes a ZIP or RAR archive, in addition to the files actually present in the ZIP or RAR archive, it will also make a list of all of the files in the archive and index it with the name ArchiveFileList.html.  This is only done if the filenames themselves are not encrypted in the archive.

The original file is not modified but the ArchiveFileList.html file is searchable as if it were part of the ZIP or RAR file. The file consists of a list of the names of the files inside the archive.  
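
To give a sense of what such a list contains, the Python sketch below reads the filenames from a ZIP archive with the standard zipfile module and writes them as a simple HTML list.  The HTML layout and the function name archive_file_list_html are assumptions for illustration; the actual format of the ArchiveFileList.html that dtSearch generates is not described here, and the sketch does not cover RAR archives.

```python
import html
import io
import zipfile


def archive_file_list_html(source):
    """Return a simple HTML list of the filenames in a ZIP archive.

    Hypothetical helper for illustration; `source` is a path or a
    binary file object. Only the archive directory is read, so the
    member files themselves are never opened or modified.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
    items = "\n".join(f"<li>{html.escape(name)}</li>" for name in names)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


# Build a small archive in memory to demonstrate.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.docx", b"placeholder")
    archive.writestr("data/figures.xlsx", b"placeholder")
buffer.seek(0)

print(archive_file_list_html(buffer))
```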

Default location for new indexes
By default, indexes will be created in your dtSearch UserData folder.  You can specify a different location here.   (In the Create Index dialog box, you can override this setting for each index that you create.)