Letters and Words

Menu option: Options > Preferences > Letters and words

Changes to these settings take effect when you create a new index and will not affect existing indexes.

Alphabet file
The alphabet file determines how dtSearch interprets certain characters in your documents (Unicode characters in the range 32-127). Other character properties conform to the Unicode Standard and cannot be modified. The default alphabet file included with dtSearch is DEFAULT.ABC.

To modify the alphabet file (for example, to make a character such as + searchable), click the Edit... button.

Noise word list
The noise word list contains words that are generally too common to be useful in searching (such as "the"). See "Noise Words" for more information. To modify the noise word list, or to select a noise word list in a language other than English, click the Edit... button.
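
As a rough illustration of what a noise word list does, the Python sketch below drops common words before terms are added to an index. It is not dtSearch code: the NOISE_WORDS set and the index_terms function are hypothetical names, and the word list is a small made-up sample rather than the list shipped with dtSearch.

```python
# Illustrative sample only; not the noise word list shipped with dtSearch.
NOISE_WORDS = {"the", "a", "an", "and", "of", "to"}

def index_terms(words, noise_words=NOISE_WORDS):
    """Return the words that would be kept after removing noise words."""
    # Comparison is case-insensitive, so "The" and "the" are both dropped.
    return [w for w in words if w.lower() not in noise_words]

kept = index_terms(["The", "history", "of", "the", "index"])
print(kept)
```

In this sketch, only "history" and "index" survive, which is why a search for a noise word alone would find nothing in such an index.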

Maximum word length
This setting is the maximum number of letters dtSearch will consider when indexing a long word.

Hyphens
By default, dtSearch treats hyphens as spaces in indexed text and in search requests. For example, "first-class" would be treated like "first class". This option lets you select an alternative treatment. Treating hyphens as spaces is recommended. For technical details on the other options, please see "Hyphenation options."
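
The default behavior can be pictured with a small Python sketch. This is an assumption-laden illustration, not dtSearch's implementation: the tokenize function and its hyphens_as_spaces parameter are hypothetical, and the word pattern is deliberately simplified to ASCII letters and digits.

```python
import re

def tokenize(text, hyphens_as_spaces=True):
    """Split text into lowercase words (hypothetical helper, not dtSearch code)."""
    text = text.lower()
    if hyphens_as_spaces:
        # Default treatment: a hyphen acts exactly like a space.
        text = text.replace("-", " ")
    # A word is a run of letters/digits, optionally joined by single hyphens.
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text)

# With hyphens treated as spaces, both spellings yield the same words,
# so a search for either one matches documents containing the other.
print(tokenize("first-class"))
print(tokenize("first class"))
print(tokenize("first-class", hyphens_as_spaces=False))
```

Because the same treatment is applied to indexed text and to search requests, the two sides stay consistent in this sketch; keeping the hyphen instead would make "first-class" a distinct word from "first" and "class".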

Insert word breaks between Chinese, Japanese, and Korean characters in text
Check this box if you are searching Chinese, Japanese, or Korean documents that do not contain word breaks.  

Some Chinese, Japanese, and Korean text does not include word breaks. Instead, the text appears as lines of characters with no spaces between the words. Because there are no spaces separating the words on each line, dtSearch sees each line of text as a single long word. To make this type of text searchable, enable automatic insertion of word breaks around Chinese, Japanese, and Korean characters, so that each character is treated as a single word.