The Options.Hyphens setting is a HyphenSettings value that determines how hyphen characters are indexed.
The dtSearch Engine supports four options for the treatment of hyphens when indexing documents: spaces, searchable text, ignored, and "all three".
For most applications, treating hyphens as spaces is the best option. Hyphens are translated to spaces during indexing and during searches. For example, if you index "first-class mail" and search for "first class mail", "first-class-mail", or "first-class mail", you will find the phrase correctly.
| HyphenSettings Value | Meaning |
|---|---|
| dtsoHyphenAsIgnore | index "first-class" as "firstclass" |
| dtsoHyphenAsHyphen | index "first-class" as "first-class" |
| dtsoHyphenAsSpace | index "first-class" as "first" and "class" |
| dtsoHyphenAll | index "first-class" all three ways |
Effect on Indexes
When an index is created, the hyphenation option currently in effect is stored in the index, and cannot be changed without re-creating that index. Therefore, the hyphenation option you select affects any indexes you create in the future, but it does not affect indexes that already exist.
When a user searches an index, the hyphenation option for that index applies to the user's search request.
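The relationship between the current setting and existing indexes can be sketched with a small model. This is illustrative Python, not the dtSearch API: the `Settings` and `Index` classes, the `create_index` function, and the mode labels "space" and "all" are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Hypothetical stand-in for the hyphenation option currently in effect.
    hyphens: str = "space"


@dataclass(frozen=True)
class Index:
    # frozen=True models the rule that the stored option cannot be changed
    # without re-creating the index.
    name: str
    hyphens: str


def create_index(name, settings):
    # The option in effect at creation time is copied into the index.
    return Index(name=name, hyphens=settings.hyphens)


settings = Settings()
old_index = create_index("old", settings)

# Changing the setting affects only indexes created afterwards.
settings.hyphens = "all"
new_index = create_index("new", settings)

print(old_index.hyphens, new_index.hyphens)
```

In this model a search would read the option from the index being searched, never from `settings`.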
How the hyphens option applies during indexing
During indexing, dtSearch extracts a stream of words from each document, and each word is assigned a number that represents that word's position in the file. The first word is assigned the position "1", the second word is assigned the position "2", and so forth. Consider a document that starts with the sentence, "I sent it by first-class mail". The following describes how the document would be treated under each of the hyphenation options:
Hyphens as spaces: I (1), sent (2), it (3), by (4), first (5), class (6), mail (7)
Hyphens as searchable text: I (1), sent (2), it (3), by (4), first-class (5), mail (6)
Hyphens ignored: I (1), sent (2), it (3), by (4), firstclass (5), mail (6)
All three: I (1), sent (2), it (3), by (4), first-class (5), first-class (6), first (5), class (6), firstclass (5), firstclass (6), mail (7)
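The position assignments above can be reproduced with a small model. The following Python sketch is not dtSearch code and does not call the dtSearch API; the function name `index_positions`, the mode labels ("space", "hyphen", "ignore", "all"), and the tokenizing regular expression are assumptions made for illustration. For the "all three" case it assumes that a hyphenated word occupies one position per part, and that the hyphenated and joined forms are recorded at each of those positions, which is consistent with the example sentence.

```python
import re

# Assumed tokenizer: runs of letters/digits, optionally joined by hyphens.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def index_positions(text, mode):
    """Return a list of (term, position) pairs for the given hyphen mode."""
    postings = []
    pos = 0  # position of the most recently indexed word
    for token in TOKEN.findall(text):
        parts = token.split("-")
        joined = "".join(parts)
        if len(parts) == 1 or mode == "hyphen":
            # Plain word, or hyphen kept as searchable text: one term.
            pos += 1
            postings.append((token, pos))
        elif mode == "space":
            # Each part becomes its own word at its own position.
            for part in parts:
                pos += 1
                postings.append((part, pos))
        elif mode == "ignore":
            # Hyphen dropped: the parts are fused into one word.
            pos += 1
            postings.append((joined, pos))
        elif mode == "all":
            # One position per part; compound forms stored at every position.
            span = range(pos + 1, pos + len(parts) + 1)
            postings.extend((token, p) for p in span)
            postings.extend(zip(parts, span))
            postings.extend((joined, p) for p in span)
            pos += len(parts)
        else:
            raise ValueError(f"unknown hyphen mode: {mode!r}")
    return postings


for mode in ("space", "hyphen", "ignore", "all"):
    print(mode, index_positions("I sent it by first-class mail", mode))
```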
How the hyphens option applies during searching
During a search, dtSearch translates the search request according to the hyphenation option for the index being searched. For example, if you search for "first-class" in an index created with hyphens treated as spaces, the search request is translated into "first class".
During a search of an index created with the "all three" option, the search request is not modified. For example, if you search for "first-class", dtSearch will not search for "firstclass" or "first class".
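The query translation described above can be modeled in a few lines. This is an illustrative Python sketch, not the dtSearch API: `translate_query` and the mode labels are hypothetical. The "space" and "all" branches follow the text directly; the "ignore" and "hyphen" branches are assumptions inferred from how each option indexes "first-class".

```python
def translate_query(query, mode):
    """Rewrite a search request for an index built with the given hyphen mode."""
    if mode == "space":
        # Hyphens are translated to spaces, as during indexing.
        return query.replace("-", " ")
    if mode == "ignore":
        # Assumption: hyphens are dropped, mirroring the indexed form.
        return query.replace("-", "")
    if mode in ("hyphen", "all"):
        # "all three": the request is not modified.
        # "hyphen": assumed unchanged, since the hyphen is searchable text.
        return query
    raise ValueError(f"unknown hyphen mode: {mode!r}")


queries = ["first class mail", "first-class-mail", "first-class mail"]
print({translate_query(q, "space") for q in queries})
print(translate_query("first-class", "all"))
```

Under the "space" mode all three spellings of the request collapse to the same phrase, which is why they all find "first-class mail". Under "all" the request stays "first-class", so it is never expanded to "firstclass" or "first class".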
Effects of the "all three" option
The "all three" option has one advantage over treating hyphens as spaces: it will return a document containing "first-class" in a search for "firstclass". Otherwise, it provides no benefit over treating hyphens as spaces, and it has some significant disadvantages:
Copyright (c) 1995-2008 dtSearch Corp. All rights reserved.