How to make the '+', '#', or other characters searchable

Article: dts0131

 

dtSearch uses an "alphabet" file to determine which characters are searchable, which cause a word break, and which are ignored.  By changing the alphabet settings, you can make characters such as # or + searchable, so that a search for "C#" or "C++" matches those terms rather than just "C".
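As a rough illustration of the three character classes, the following Python sketch models an alphabet as a mapping from characters to a class and splits text into words accordingly.  It is not dtSearch code and does not read the alphabet file format.  The names `classify`, `split_words`, `LETTER`, `BREAK`, and `IGNORE` are hypothetical, and the fallback for unlisted characters (alphanumerics are letters, everything else is a word break) is an assumption made only for this example.

```python
# Toy model of an alphabet: each character is a letter, a word break, or ignored.
LETTER, BREAK, IGNORE = "letter", "break", "ignore"


def classify(ch, alphabet):
    """Return the class of ch, using an assumed fallback for unlisted characters."""
    if ch in alphabet:
        return alphabet[ch]
    return LETTER if ch.isalnum() else BREAK


def split_words(text, alphabet):
    """Split text into words according to the character classes in alphabet."""
    words, current = [], []
    for ch in text:
        kind = classify(ch, alphabet)
        if kind == LETTER:
            current.append(ch)
        elif kind == BREAK:
            if current:
                words.append("".join(current))
                current = []
        # IGNORE: the character is dropped without ending the current word
    if current:
        words.append("".join(current))
    return words


default_alphabet = {}                          # + and # fall back to word breaks
custom_alphabet = {"+": LETTER, "#": LETTER}   # + and # treated as letters

text = "C, C++ and C# compared"
print(split_words(text, default_alphabet))  # ['C', 'C', 'and', 'C', 'compared']
print(split_words(text, custom_alphabet))   # ['C', 'C++', 'and', 'C#', 'compared']
```

With the default mapping, "C++" and "C#" are both reduced to "C".  Once + and # are classified as letters, they survive as distinct words.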

Changing alphabet settings in dtSearch Desktop

1.  In dtSearch Desktop, click Options > Preferences > Letters and Words, and click the Edit button next to the Alphabet filename box.

2.  Locate the character you want to make searchable in the list of characters and select it.

3.  Set the Character type to Letter.

4.  Click Save and then Close.

5.  Rebuild your indexes with the Clear index before adding documents box checked.  This is necessary because each index stores a private copy of the alphabet settings when the index is created.  Therefore, changing the alphabet settings only affects indexes that are created or cleared after the change.
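The reason a rebuild is needed can be sketched with a toy model in Python.  This is a conceptual illustration only, not a description of how dtSearch stores its indexes.  `ToyIndex` and its methods are hypothetical names, and the alphabet is simplified to a mapping from a character to whether it counts as a letter.

```python
class ToyIndex:
    """Toy index that keeps a private copy of the alphabet settings."""

    def __init__(self, alphabet):
        # The settings are copied when the index is created.
        self._alphabet = dict(alphabet)

    def clear(self, alphabet):
        # Clearing the index picks up the settings in effect at that time.
        self._alphabet = dict(alphabet)

    def is_letter(self, ch):
        # Unlisted characters fall back to an assumed alphanumeric rule.
        return self._alphabet.get(ch, ch.isalnum())


alphabet = {"#": False}
index = ToyIndex(alphabet)

alphabet["#"] = True          # settings changed after the index was created
print(index.is_letter("#"))   # False: the index still uses its own copy

index.clear(alphabet)         # rebuild with the index cleared first
print(index.is_letter("#"))   # True
```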

For more information on changing your alphabet settings, see:
Alphabet Customization

Changing Search Punctuation Characters

A few characters require special treatment: &, %, #, :, =, and ~. These characters are used in dtSearch search requests to indicate search features like fuzzy searching, phonic searching, etc. To make one of these characters searchable:

1.  Go through the steps above to make the character a "Letter".

2.  Exit dtSearch and run RegEdit to access your dtSearch registry entries. Open the key [HKEY_CURRENT_USER\Software\dtSearch Corp.\dtSearch\Settings]

3.  Use these registry settings to redefine the characters that you want to change:

MacroChar = "@"

FuzzyChar = "%"

StemmingChar = "~"

PhonicChar = "#"

SynonymChar = "&"

WeightChar = ":"

MatchDigitChar = "="

For example, to make the & character searchable, you could change the SynonymChar to $.  Unicode characters (characters with a character code above 127) cannot be used for these symbols, and ! cannot be used for MatchDigitChar, FuzzyChar, or PhonicChar.
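Before editing the registry, it can help to check a proposed replacement against the restrictions above.  The Python sketch below does only that: it does not touch the registry or call dtSearch.  The function name `check_operator_char` is hypothetical, the single-character requirement is an assumption, and the code treats any character code above 127 as Unicode, which is the stricter reading of the rule.

```python
# Default search punctuation characters, as listed in the registry settings above.
DEFAULT_OPERATORS = {
    "MacroChar": "@",
    "FuzzyChar": "%",
    "StemmingChar": "~",
    "PhonicChar": "#",
    "SynonymChar": "&",
    "WeightChar": ":",
    "MatchDigitChar": "=",
}

# Settings for which ! is not allowed.
NO_EXCLAMATION = {"MatchDigitChar", "FuzzyChar", "PhonicChar"}


def check_operator_char(setting, ch):
    """Return a list of problems with using ch for setting (empty if none found)."""
    problems = []
    if setting not in DEFAULT_OPERATORS:
        problems.append(f"unknown setting: {setting}")
    if len(ch) != 1:
        # Assumption: each setting holds exactly one character.
        problems.append("value must be a single character")
        return problems
    if ord(ch) > 127:
        problems.append("Unicode characters cannot be used")
    if ch == "!" and setting in NO_EXCLAMATION:
        problems.append(f"! cannot be used for {setting}")
    return problems


print(check_operator_char("SynonymChar", "$"))  # []
print(check_operator_char("FuzzyChar", "!"))    # ['! cannot be used for FuzzyChar']
```

An empty list means only that none of the restrictions stated here is violated.  It does not confirm that the new character is otherwise a good choice.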

Changing Alphabet Settings using the dtSearch Engine API

1.  Follow the steps above to customize the dtSearch alphabet file. The default name for the alphabet file is default.abc, but after changing the settings you can save the file under a different name.

2.  Use the Options object (in .NET or Java) or the DOptions or dtsOptions object (C++) to tell dtSearch to use the new alphabet file. You can also use these objects to change the MacroChar, FuzzyChar, etc. as described above.

Unicode characters

The alphabet file controls the processing of characters in the range from 33 to 127.  Characters above 127 are processed according to the Unicode specification.  For information on overriding the Unicode classification for individual characters, please see this article: Alphabet Settings

Unsearchable characters

The following characters cannot be made searchable in dtSearch: ", (, ), *, and ?.
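The rules in this article can be summarized as a small decision helper.  The Python sketch below is an illustration only: `steps_to_make_searchable` is a hypothetical name, the returned strings are informal summaries of the sections above, and the handling of characters below code 33 is an assumption based on the range the alphabet file is said to cover.

```python
# Characters that cannot be made searchable, and characters that double as
# search punctuation (both lists taken from this article).
UNSEARCHABLE = set('"()*?')
SEARCH_PUNCTUATION = set("&%#:=~")


def steps_to_make_searchable(ch):
    """Summarize what this article says is needed to make ch searchable."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch in UNSEARCHABLE:
        return "cannot be made searchable"
    code = ord(ch)
    if code > 127:
        return "governed by Unicode classification, not the alphabet file"
    if code < 33:
        # Assumption: characters below 33 are outside the alphabet file's range.
        return "outside the range covered by the alphabet file"
    if ch in SEARCH_PUNCTUATION:
        return "set to Letter in the alphabet, then reassign the search punctuation character"
    return "set to Letter in the alphabet"


for ch in "+#*":
    print(ch, "->", steps_to_make_searchable(ch))
```

For example, + needs only an alphabet change, # also requires reassigning PhonicChar, and * cannot be made searchable at all.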