Close
dtSearch Engine API for .NET Framework 2.x-4.x 2024.01
Options::UnicodeFilterWordOverlapAmount Property

Amount of overlap when automatically breaking words when applying the Unicode Filtering algorithm.

public int UnicodeFilterWordOverlapAmount;

Unicode Filtering can automatically break long runs of letters into words each time more than Options.MaxWordLength consecutive letters are found. By default, a word break is inserted and the next word starts with the following character. Set UnicodeFilterWordOverlapAmount and also set the dtsoUfAutoWordBreakOverlapWords flag in UnicodeFilterFlags to start the next word before the end of the previous word.  

For example, suppose the maximum word length is set to 8, and the following run of letters is found: aaaaahiddenaaaaa. By default, this would be indexed as aaaaahid and denaaaa, which means that a search for *hidden* would not find it. With a word overlap of 4, this would be indexed as: aaaaahid, ahiddena, denaaaaa which would allow the embedded word "hidden" to be found in a search for *hidden*.