Troubleshooting Synonym and Stemming Searches

Article: dts0190

Applies to: dtSearch Engine Web

Synonyms

dtSearch provides three kinds of synonym expansion:

(1) "User" synonyms are based on the user thesaurus, thesaur.xml, which is an XML file with a table of synonym sets.  To create a set of User synonyms, run dtSearch Desktop and click Options > Preferences > User Thesaurus.   After you create some synonym groups and save them, the data will be stored in thesaur.xml in your dtSearch user data folder.

(2) "WordNet Synonyms" are synonyms defined in the WordNet thesaurus, which is included with dtSearch.

(3) "WordNet Related Words" are related words (antonyms, sub-categories, etc.) defined in the WordNet thesaurus.

Symptom:  Synonym expansion does not work

(1) Check that you are setting the synonym flags correctly.   To enable synonym searching, the dtsSearchSynonyms flag must be set, AND at least one of the following flags must also be set to indicate the type of synonym expansion to perform:  dtsSearchUserSynonyms, dtsSearchWordNetSynonyms, or dtsSearchWordNetRelated.

(2) Check that dtSearch is finding the thesaurus that corresponds to the type of synonym searching that you are using.

dtSearch gets the user thesaurus filename from the Options object's UserThesaurusFile property.  This should be the full path and filename of the thesaur.xml file that contains your user thesaurus.

The WordNet thesaurus must be in a folder named WordNet that is a subfolder of either the HomeDir or the PrivateDir.  A debug log of the search will tell you where dtSearch is looking for the thesaurus files and what settings are being used as the HomeDir, PrivateDir, and UserThesaurusFile (look for <dtsOptions> in the log).
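Before generating a debug log, the file locations described above can be checked by hand or with a short script. The following is a minimal Python sketch, not part of the dtSearch API: the function names are made up for illustration, and the paths you pass in are whatever your application assigns to HomeDir, PrivateDir, and UserThesaurusFile.

```python
from pathlib import Path


def find_wordnet_dirs(home_dir, private_dir):
    """Return the WordNet subfolders that exist under HomeDir or PrivateDir.

    Either argument may be None or empty if that setting is not used.
    """
    candidates = [Path(d) / "WordNet" for d in (home_dir, private_dir) if d]
    return [c for c in candidates if c.is_dir()]


def user_thesaurus_exists(user_thesaurus_file):
    """Return True if the UserThesaurusFile value points at an existing file."""
    return bool(user_thesaurus_file) and Path(user_thesaurus_file).is_file()
```

An empty list from find_wordnet_dirs means neither folder contains a WordNet subfolder, which would explain WordNet synonym expansion silently doing nothing. The debug log remains the authoritative record of which settings the engine actually received.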

Symptom: Incorrect synonym expansion

(1) User Thesaurus:  Check that the synonym rules are set up correctly.   A common error is to include a word in the name of a synonym group, but not in the list of synonyms.   For example, this is an item from a thesaur.xml file:

<Item>

<Name>"Example"</Name>

<Synonyms>"Sample" "exemplar" "model" "case in point"</Synonyms>

</Item>

For this item, a search for "Example" would not be expanded to any synonyms, because "Example" is not one of the terms listed in the <Synonyms> section.  Another common error is to omit the quotation marks around phrases.

(2) WordNet Thesaurus:  If you see puzzling results from a synonym search, start dtSearch Desktop and click Search > Thesaurus, then enter the search word that produced the puzzling results.   The Thesaurus dialog box will list all of the meanings of the word, along with the synonyms associated with each.   For example, "find" as a synonym for "rule" seems incorrect, but in fact it is a synonym for one of the verb senses of "rule" (to "rule" in a judicial case, as in "The court finds the defendant guilty as charged").

(3) Combined search options.   Search features such as stemming that apply to a search also apply to synonym lookup.   This means, for example, that a stemming search for "ruling" will find "rule" and also "find" (see above).   If you combine phonic searching with synonym searching, you will get back all synonyms of words that sound like the search term, which is likely to be an absurdly broad set of results.  
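The user thesaurus error described in item (1) above, a group name that is missing from its own synonym list, can be detected mechanically. The Python sketch below rests on several assumptions: it relies only on the <Item>, <Name>, and <Synonyms> elements shown in the sample, the wrapper element in the test data is a placeholder (the real root element of thesaur.xml is not shown in this article), and the comparison is case-insensitive, which may or may not match how dtSearch compares terms.

```python
import re
import xml.etree.ElementTree as ET

# A term is either a double-quoted string (which may contain spaces)
# or a bare run of non-space characters.
TERM = re.compile(r'"([^"]*)"|(\S+)')


def split_terms(text):
    """Split a thesaurus field into terms, keeping quoted phrases together."""
    return [q or b for q, b in TERM.findall(text or "") if q or b]


def names_missing_from_synonyms(xml_text):
    """Return each group name that does not appear in its own <Synonyms> list."""
    root = ET.fromstring(xml_text)
    missing = []
    for item in root.iter("Item"):
        synonyms = {t.lower() for t in split_terms(item.findtext("Synonyms"))}
        for name in split_terms(item.findtext("Name")):
            if name.lower() not in synonyms:
                missing.append(name)
    return missing


# "Wrapper" is a placeholder root element used only for this test data.
SAMPLE = """<Wrapper>
<Item>
<Name>"Example"</Name>
<Synonyms>"Sample" "exemplar" "model" "case in point"</Synonyms>
</Item>
</Wrapper>"""

print(names_missing_from_synonyms(SAMPLE))  # prints ['Example']
```

Because unquoted words are split on spaces, a phrase entered without quotation marks shows up as several single-word terms in the output of split_terms, which makes the second common error easy to spot as well.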

Symptom: Synonym searches crash

Check for corruption of the WordNet thesaurus files.   Some of the .dat files are stored in a text format, so they can be altered when they are copied or processed through tools like SourceSafe.  Even minor changes in linefeed or carriage-return characters in the .dat files can cause the binary offsets in the index to become invalid.  To check for this, perform a binary comparison of the WordNet thesaurus files you are using with the files from a standard dtSearch Desktop installation.
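Any binary comparison tool will do for this check. As one possible approach, the Python sketch below compares every file in a reference WordNet folder against the folder in use, byte for byte, using only the standard library. The function name and the two folder arguments are illustrative; nothing here is part of dtSearch.

```python
import filecmp
from pathlib import Path


def differing_files(in_use_dir, reference_dir):
    """Compare each file in reference_dir with the same-named file in in_use_dir.

    Returns a list of (filename, problem) pairs, where problem is "missing"
    or "differs". An empty list means every reference file has a
    byte-identical copy in in_use_dir.
    """
    problems = []
    for ref in sorted(Path(reference_dir).iterdir()):
        if not ref.is_file():
            continue
        candidate = Path(in_use_dir) / ref.name
        if not candidate.is_file():
            problems.append((ref.name, "missing"))
        elif not filecmp.cmp(ref, candidate, shallow=False):
            # shallow=False forces a comparison of file contents,
            # not just size and modification time.
            problems.append((ref.name, "differs"))
    return problems
```

A file reported as "differs" whose size has grown or shrunk by roughly one byte per line is a strong hint that a tool converted its line endings.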

Stemming

Stemming rules are stored in a file named stemming.dat, which dtSearch looks for in the HomeDir or PrivateDir.  You can also specify the location of the stemming rules file in Options.StemmingRulesFile.

Symptom: Stemming does not work

(1) Check that dtSearch is finding the stemming rules.  To do this, generate a debug log and search the log for stemming.dat to see where dtSearch is trying to read the stemming rules and whether the file can be accessed.

(2) Check that you are setting the dtsSearchStemming flag in your SearchJob.   A debug log will tell you what search flags are actually being passed to the dtSearch Engine.
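Debug logs can be long, so it may help to pull out only the lines that mention the item you are checking. The Python sketch below is a plain case-insensitive text filter. It assumes nothing about the debug log format beyond it being line-oriented text, and the function name is made up for illustration.

```python
def lines_mentioning(log_path, needle):
    """Return (line_number, text) for each log line that contains needle.

    The match ignores case. Undecodable bytes are replaced rather than
    raising, because the log's encoding is not assumed here.
    """
    hits = []
    wanted = needle.lower()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if wanted in line.lower():
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling lines_mentioning(path, "stemming.dat") lists the places where the engine refers to the stemming rules file. The same filter with "<dtsOptions>" or "thesaur.xml" as the search text applies to the synonym checks earlier in this article.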