dtsOptions2::unicodeFilterWordOverlapAmount Data Member
dtSearch Text Retrieval Engine Programmer's Reference

Number of characters by which consecutive words overlap when the Unicode Filtering algorithm automatically breaks a long run of letters into words.

int unicodeFilterWordOverlapAmount;

Unicode Filtering can automatically break long runs of letters into words each time more than Options.MaxWordLength consecutive letters are found. By default, a word break is inserted and the next word starts with the following character. To start the next word before the end of the previous word, set unicodeFilterWordOverlapAmount to the number of overlapping characters and also set the dtsoUfAutoWordBreakOverlapWords flag in UnicodeFilterFlags.

For example, suppose the maximum word length is set to 8, and the following run of letters is found: aaaaahiddenaaaaa. By default, this would be indexed as aaaaahid and denaaaaa, which means that a search for *hidden* would not find it. With a word overlap of 4, this would be indexed as aaaaahid, ahiddena, and denaaaaa, which would allow the embedded word "hidden" to be found in a search for *hidden*.
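
The splitting in this example can be reproduced with a short stand-alone sketch. It is only an illustration of the behavior described above, not the dtSearch implementation: breakRun is a hypothetical helper that is not part of the dtSearch API, and the way it clamps the overlap and handles the end of the run are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of the dtSearch API: splits a run of letters
// into words of at most maxWordLength characters, where each new word starts
// `overlap` characters before the end of the previous one.
std::vector<std::string> breakRun(const std::string& run,
                                  std::size_t maxWordLength,
                                  std::size_t overlap)
{
    std::vector<std::string> words;
    if (run.empty() || maxWordLength == 0)
        return words;

    // Assumption: an overlap as large as the word length would never advance
    // through the run, so clamp it to maxWordLength - 1.
    overlap = std::min(overlap, maxWordLength - 1);
    const std::size_t step = maxWordLength - overlap;

    for (std::size_t start = 0; ; start += step) {
        words.push_back(run.substr(start, maxWordLength));
        if (start + maxWordLength >= run.size())
            break;  // this word reached the end of the run
    }
    return words;
}
```

Under these assumptions, breakRun("aaaaahiddenaaaaa", 8, 0) produces aaaaahid and denaaaaa, while breakRun("aaaaahiddenaaaaa", 8, 4) produces aaaaahid, ahiddena, and denaaaaa, matching the example above.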

Copyright (c) 1995-2022 dtSearch Corp. All rights reserved.