Values for Options.UnicodeFilterFlags (.NET), dtsOptions.unicodeFilterFlags (C++), or Options.setUnicodeFilterFlags (Java).
enum UnicodeFilterFlags {
    dtsoUfExtractAsHtml = 0x0001,
    dtsoUfOverlapBlocks = 0x0002,
    dtsoUfAutoWordBreakByLength = 0x0004,
    dtsoUfAutoWordBreakByCase = 0x0008,
    dtsoUfAutoWordBreakOnDigit = 0x0010,
    dtsoUfAutoWordBreakOverlapWords = 0x0020,
    dtsoUfFilterFailedDocs = 0x0040,
    dtsoUfFilterAllDocs = 0x0080
};
dtsearch.h
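Each value in the enum is a distinct bit, so several flags can presumably be combined with a bitwise OR before being stored in the options field. The following is a minimal sketch under that assumption. The enum is redeclared locally so that the snippet compiles without dtsearch.h, and `exampleFlags` and `hasFlag` are hypothetical helpers for illustration, not part of the dtSearch API.

```cpp
// Local copy of the values listed above, so the sketch is self-contained.
// In real code, include dtsearch.h instead of redeclaring the enum.
enum UnicodeFilterFlags {
    dtsoUfExtractAsHtml             = 0x0001,
    dtsoUfOverlapBlocks             = 0x0002,
    dtsoUfAutoWordBreakByLength     = 0x0004,
    dtsoUfAutoWordBreakByCase       = 0x0008,
    dtsoUfAutoWordBreakOnDigit      = 0x0010,
    dtsoUfAutoWordBreakOverlapWords = 0x0020,
    dtsoUfFilterFailedDocs          = 0x0040,
    dtsoUfFilterAllDocs             = 0x0080
};

// Hypothetical helper: combine three flags into one value with bitwise OR.
inline long exampleFlags() {
    return dtsoUfOverlapBlocks
         | dtsoUfAutoWordBreakByCase
         | dtsoUfAutoWordBreakOnDigit;
}

// Hypothetical helper: test whether a given flag bit is set.
inline bool hasFlag(long flags, UnicodeFilterFlags flag) {
    return (flags & flag) != 0;
}
```

In an application, a value built this way would be assigned to dtsOptions.unicodeFilterFlags (C++) or the equivalent property in .NET or Java.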
| Members | Description |
| dtsoUfExtractAsHtml = 0x0001 | Extracting blocks as HTML has no effect on the text that is extracted, but it adds additional information in HTML comments to each extracted block. |
| dtsoUfOverlapBlocks = 0x0002 | Overlapping blocks prevents text that crosses a block boundary from being missed in the filtering process. With overlapping enabled, each block extends 256 characters past the start of the previous block. |
| dtsoUfAutoWordBreakByLength = 0x0004 | Automatically insert a word break in long sequences of letters. A word break will be inserted when the word length reaches Options.MaxWordLength. |
| dtsoUfAutoWordBreakByCase = 0x0008 | Automatically insert a word break when a capital letter appears following lower-case letters. |
| dtsoUfAutoWordBreakOnDigit = 0x0010 | Automatically insert a word break when a digit follows letters. |
| dtsoUfAutoWordBreakOverlapWords = 0x0020 | When a word break is automatically inserted due to dtsoUfAutoWordBreakByLength, overlap the two words generated by the word break. |
| dtsoUfFilterFailedDocs = 0x0040 | When a document cannot be indexed due to file corruption or encryption, apply the filtering algorithm to extract text from the file. |
| dtsoUfFilterAllDocs = 0x0080 | Ignore file format information and apply Unicode Filtering to all documents. |
UnicodeFilterFlags control the behavior of the Unicode Filtering algorithm when it is used to filter text from binary data. See Filtering Options.
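The three automatic word-break rules described in the table can be illustrated with a small sketch. This is only an approximation of the documented behavior for ASCII text, not the dtSearch implementation. `splitWord`, the `k...` constants, and the `maxWordLength` parameter (standing in for Options.MaxWordLength) are hypothetical names introduced here. The overlap behavior of dtsoUfAutoWordBreakOverlapWords is omitted because the amount of overlap is not specified.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Flag values copied from the UnicodeFilterFlags enum.
const long kBreakByLength = 0x0004;  // dtsoUfAutoWordBreakByLength
const long kBreakByCase   = 0x0008;  // dtsoUfAutoWordBreakByCase
const long kBreakOnDigit  = 0x0010;  // dtsoUfAutoWordBreakOnDigit

// Hypothetical illustration (ASCII only): split one run of characters
// into words according to the enabled auto-word-break flags.
std::vector<std::string> splitWord(const std::string& run, long flags,
                                   std::size_t maxWordLength) {
    std::vector<std::string> words;
    std::string current;
    for (char ch : run) {
        const unsigned char c = static_cast<unsigned char>(ch);
        bool breakHere = false;
        if (!current.empty()) {
            const unsigned char prev =
                static_cast<unsigned char>(current.back());
            // Capital letter following a lower-case letter.
            if ((flags & kBreakByCase) && std::islower(prev) && std::isupper(c))
                breakHere = true;
            // Digit following a letter.
            if ((flags & kBreakOnDigit) && std::isalpha(prev) && std::isdigit(c))
                breakHere = true;
            // Current word has reached the maximum word length.
            if ((flags & kBreakByLength) && maxWordLength > 0 &&
                current.size() >= maxWordLength)
                breakHere = true;
        }
        if (breakHere) {
            words.push_back(current);
            current.clear();
        }
        current.push_back(ch);
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

With the case and digit flags set, `splitWord("fooBar42", kBreakByCase | kBreakOnDigit, 32)` yields the words `foo`, `Bar`, and `42`. With only the length flag and a maximum length of 3, `abcdefgh` is split into `abc`, `def`, and `gh`.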
Copyright (c) 1995-2012 dtSearch Corp. All rights reserved.