Sets dtSearch Engine indexing and searching option settings.
File
File: dtsearch.h
Syntax
C++
struct dtsOptions {
long binaryFiles;
char binaryFilterTextChars[256];
long hyphens;
char alphabetFile[FileNameLen];
long indexNumbers;
char noiseWordFile[FileNameLen];
char stemmingRulesFile[FileNameLen];
long maxWordsToRetrieve;
long maxStoredFieldSize;
long titleSize;
char xmlIgnoreTags[512];
long maxWordLength;
char segmentationRulesFile[FileNameLen];
char textFieldsFile[FileNameLen];
char userThesaurusFile[FileNameLen];
long updateFiles;
long lzwEnableCode;
char homeDir[FileNameLen];
char privateDir[FileNameLen];
char booleanConnectors[512];
char fileTypeTableFile[FileNameLen];
long textFlags;
long maxFieldNesting;
long autoFilterSizeMB;
char macroChar;
char fuzzyChar;
char phonicChar;
char stemmingChar;
char synonymChar;
char weightChar;
char matchDigitChar;
char storedFieldDelimiterChar;
long fieldFlags;
char unicodeFilterRanges[256];
long unicodeFilterBlockSize;
long unicodeFilterFlags;
long unicodeFilterMinTextSize;
dtsLanguageAnalyzerInterface * pAnalyzer;
long unicodeFilterWordOverlapAmount;
char tempFileDir[200];
};
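Many of these members are fixed-size character arrays, so assigning a path or file name means copying into a bounded buffer. The following is a minimal sketch of one way to do that safely. The helper name `setOptionString` is hypothetical and is not part of the dtSearch API.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not part of dtsearch.h): copy a string into a
// fixed-size char array such as alphabetFile or tempFileDir.
// The destination is always null-terminated.
// Returns false if the value did not fit and was truncated.
template <std::size_t N>
bool setOptionString(char (&dest)[N], const std::string& value) {
    // Reserve one byte for the terminating '\0'.
    const std::size_t n = value.size() < N - 1 ? value.size() : N - 1;
    std::memcpy(dest, value.data(), n);
    dest[n] = '\0';
    return n == value.size();
}
```

Because the array size is deduced from the destination, a call such as `setOptionString(opts.tempFileDir, path)` would pick up the buffer length without the caller repeating it.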
Data Members
Data Member | Description |
---|---|
alphabetFile | Name of the dtSearch alphabet file to use when parsing text into words. |
autoFilterSizeMB | Size of files that are always processed using the Unicode filtering algorithm, in megabytes. Files larger than autoFilterSizeMB are assumed to be non-document files, such as forensically recovered disk images or slack space, and are indexed using the Unicode filtering algorithm. If zero, the Unicode filtering algorithm is never applied based on file size. |
binaryFiles | BinaryFilesSettings value specifying the treatment of binary files. |
binaryFilterTextChars | Defines the characters considered to be text if binaryFiles is set to dtsoFilterBinary. |
booleanConnectors | Use to replace the default connectors used in search requests. |
fieldFlags | FieldFlags values that control indexing of metadata. |
fileTypeTableFile | Name of the file containing a table of filename patterns for file formats that dtSearch cannot detect automatically, such as older versions of WordStar. The FileTypeTableFile is an XML file. To create the file, start dtSearch Desktop, click Options > Preferences > File Types and use the dialog box to set up the file type definitions. The XML file will be saved as filetype.xml in your dtSearch UserData folder. See File Types in the dtSearch Desktop help for information on this setting. |
fuzzyChar | Character that enables fuzzy searching for a search term (default "%"); see Redefining Search Operators. |
homeDir | Directory where the dtSearch Engine and support files are located. |
hyphens | HyphenSettings value specifying the treatment of hyphens in text. |
indexNumbers | If false, any word that begins with a digit will not be indexed. |
lzwEnableCode | Obsolete. This was used in older versions to enable LZW decompression in file parsers, which is now always enabled. |
macroChar | Character that indicates that a search term is a macro (default "@"); see Redefining Search Operators. |
matchDigitChar | Wildcard character that matches a single digit (default "="). |
maxFieldNesting | Maximum depth of nested fields (value must be between 1 and 32). |
maxStoredFieldSize | Maximum size of a single stored field. Stored fields are field data collected during indexing that is returned in search results. |
maxWordLength | Words longer than maxWordLength will be truncated when indexing. The default maxWordLength is 32. The maximum value is 128. |
maxWordsToRetrieve | Maximum number of words that can be matched in a search. This can be any value from 16 to 512k (32-bit) or 4,096k (64-bit). The default is 64k. If a search matches more unique words than the maxWordsToRetrieve limit, the error code dtsErMaxWords (137) will be returned. |
noiseWordFile | List of noise words to skip during indexing (default: "noise.dat"). A noise word is a word such as "the" or "if" that is so common that it is not useful in searches. To save time, noise words are not indexed and are ignored in index searches. When an index is created, dtSearch copies the list of words from noise.dat into the index directory and also builds the word list into other index files. After an index is created, subsequent changes to the noise word list will not affect indexing for that index. |
pAnalyzer | Pointer to the language analyzer to use for word breaking. |
phonicChar | Character that enables phonic searching for a search term (default "#"); see Redefining Search Operators. The regular expression mark ("##") is a doubling of the phonicChar. |
privateDir | A directory that the dtSearch Engine can use to store per-user settings and temporary files. |
segmentationRulesFile | File segmentation rules, used to split up long text files into logical subdocuments during indexing. The SegmentationRulesFile is an XML file. To create the file, start dtSearch Desktop, click Options > Preferences > File Segmentation Rules and use the dialog box to set up the rules. The XML file will be saved as fileseg.xml in your dtSearch UserData folder. See File Segmentation Rules in the dtSearch Desktop help for information on this setting. |
stemmingChar | Character that enables stemming for a search term; see Redefining Search Operators. The expression used for range searching (default "~~") is a doubling of the stemmingChar. |
stemmingRulesFile | Stemming rules for stemming searches (default: "stemming.dat"). The stemming.dat file uses a plain text format and includes comments that describe the file format. |
storedFieldDelimiterChar | Character to insert between multiple instances of a stored field in a single document. |
synonymChar | Character that enables synonym searching for a search term (default "&"); see Redefining Search Operators. |
tempFileDir | Directory to use for temporary files. |
textFieldsFile | Name of the file containing rules for extraction of field data from text files based on markers in the text. The TextFieldsFile is an XML file. To create the file, start dtSearch Desktop, click Options > Preferences > Text Fields and use the dialog box to set up the text field definitions. The XML file will be saved as fields.xml in your dtSearch UserData folder. See Define Text Fields in the dtSearch Desktop help for information on this setting. |
textFlags | Flags that control text-processing options. See the TextFlags enum for values. |
titleSize | Number of characters stored as the "title" property of each document, up to a maximum of 512. By default, the dtSearch Engine collects the first 80 characters of text from a file for the title associated with each document. |
unicodeFilterBlockSize | Specifies how each input file is divided into blocks before being filtered. |
unicodeFilterFlags | UnicodeFilterFlags values controlling the behavior of the Unicode filtering algorithm. |
unicodeFilterMinTextSize | Minimum length of a run of text when applying the Unicode filtering algorithm. |
unicodeFilterRanges | Indicates Unicode ranges that are of interest when filtering. |
unicodeFilterWordOverlapAmount | Amount of overlap when automatically breaking words when applying the Unicode filtering algorithm. |
updateFiles | Set to true to force all configuration files to be re-read. If the contents of a configuration file such as TextFieldsFile change but the filename does not, set UpdateFiles=true to indicate that dtSearch should discard any internally cached copies of configuration files and re-read them from disk. |
userThesaurusFile | User-defined synonym sets. The UserThesaurusFile is an XML file. To create the file, start dtSearch Desktop, click Options > Preferences > User Thesaurus and use the dialog box to set up the synonym definitions. The XML file will be saved as thesaur.xml in your dtSearch UserData folder. See User Thesaurus in the dtSearch Desktop help for information on this setting. |
weightChar | Character used to indicate term weighting (example: apple:5); see Redefining Search Operators. The prefix used to add a field name in front of a word in an xfilter expression is a doubling of the weightChar (default "::"). For example, if you change the WeightChar to !, the field-name prefix in an xfilter expression becomes !!. |
xmlIgnoreTags | Comma-separated list of tags to ignore when indexing XML. |
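Several descriptions above note that a two-character operator is a doubling of a single-character setting: the regular expression mark doubles phonicChar, the range operator doubles stemmingChar, and the xfilter field prefix doubles weightChar. The sketch below illustrates that relationship only. `OperatorChars`, `DerivedOperators`, and `deriveOperators` are hypothetical names, and the single-character defaults for stemmingChar and weightChar are inferred from the documented "~~" and "::" defaults.

```cpp
#include <string>

// Hypothetical, reduced mirror of three operator characters from dtsOptions.
// Defaults for stemmingChar and weightChar are inferred from "~~" and "::".
struct OperatorChars {
    char phonicChar = '#';
    char stemmingChar = '~';
    char weightChar = ':';
};

// The two-character operators that follow from the single characters.
struct DerivedOperators {
    std::string regexMark;      // doubling of phonicChar
    std::string rangeOperator;  // doubling of stemmingChar
    std::string fieldPrefix;    // doubling of weightChar
};

// Build each derived operator by repeating the base character twice.
DerivedOperators deriveOperators(const OperatorChars& c) {
    return DerivedOperators{
        std::string(2, c.phonicChar),
        std::string(2, c.stemmingChar),
        std::string(2, c.weightChar)};
}
```

With the defaults this yields "##", "~~", and "::". Changing weightChar to '!' gives a field prefix of "!!", which is why redefining one character also changes the operator built from it.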
Remarks
Deprecated -- use dtsOptions2 instead of dtsOptions.
To change option settings:
- Call dtssGetOptions to get the option settings currently in effect.
- Change the values in dtsOptions as needed.
- Call dtssSetOptions to apply the changes.
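The get, modify, set sequence can be sketched as follows. To stay self-contained, the example does not include dtsearch.h. Everything in the `sketch` namespace is a hypothetical stand-in: `Options` mirrors only two members of dtsOptions, `getOptions` and `setOptions` play the roles of dtssGetOptions and dtssSetOptions, and both the reference-plus-result-code signature and the use of 0 for success are assumptions to check against the header.

```cpp
namespace sketch {

// Reduced, hypothetical stand-in for dtsOptions (two members only).
struct Options {
    long maxWordLength;
    long titleSize;
};

// Stand-in for the settings currently in effect, seeded with the
// documented defaults (maxWordLength 32, titleSize 80).
Options g_current = {32, 80};

// Stand-ins for dtssGetOptions / dtssSetOptions.
// ASSUMPTION: the signature shape and "0 means success" are illustrative.
void getOptions(Options& dest, short& result) {
    dest = g_current;
    result = 0;
}

void setOptions(const Options& src, short& result) {
    g_current = src;
    result = 0;
}

// Get the current settings, change one value, and apply the whole struct.
// Fetching first keeps every other setting at its current value.
bool setTitleSize(long newTitleSize) {
    short result = 0;
    Options opts;
    getOptions(opts, result);
    if (result != 0) return false;

    opts.titleSize = newTitleSize;

    setOptions(opts, result);
    return result == 0;
}

}  // namespace sketch
```

Starting from the current settings rather than a freshly zeroed struct matters because the set call replaces all options at once, so any member left uninitialized would overwrite a setting that was meant to stay unchanged.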
Option settings are not persisted anywhere, so changes must be made each time a new program instance starts.
Option settings apply to the current thread and any threads created after the current thread. Therefore, each thread can have its own settings.