Values for Options.UnicodeFilterFlags
[public] enum UnicodeFilterFlags {
    dtsoUfExtractAsHtml = 0x0001,
    dtsoUfOverlapBlocks = 0x0002,
    dtsoUfAutoWordBreakByLength = 0x0004,
    dtsoUfAutoWordBreakByCase = 0x0008,
    dtsoUfAutoWordBreakOnDigit = 0x0010,
    dtsoUfAutoWordBreakOverlapWords = 0x0020,
    dtsoUfFilterFailedDocs = 0x0040,
    dtsoUfFilterAllDocs = 0x0080
};
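Each member occupies its own bit, so several options can be combined into a single value with bitwise OR and tested with bitwise AND. The following is a minimal C++ sketch of that arithmetic only. The enum is a local stand-in that copies the values from this page, and `hasFlag` and `exampleFlags` are hypothetical helpers, not part of the dtSearch API. How the resulting value is assigned to Options.UnicodeFilterFlags depends on the language binding in use and is not shown.

```cpp
#include <cstdint>

// Local stand-in that mirrors the values listed on this page so the example
// compiles on its own. Real code would use the declaration that ships with
// the API instead of redefining it.
enum UnicodeFilterFlags : std::uint32_t {
    dtsoUfExtractAsHtml             = 0x0001,
    dtsoUfOverlapBlocks             = 0x0002,
    dtsoUfAutoWordBreakByLength     = 0x0004,
    dtsoUfAutoWordBreakByCase       = 0x0008,
    dtsoUfAutoWordBreakOnDigit      = 0x0010,
    dtsoUfAutoWordBreakOverlapWords = 0x0020,
    dtsoUfFilterFailedDocs          = 0x0040,
    dtsoUfFilterAllDocs             = 0x0080
};

// Hypothetical helper: true when `flag` is set in the combined value.
constexpr bool hasFlag(std::uint32_t flags, UnicodeFilterFlags flag) {
    return (flags & flag) != 0;
}

// Hypothetical helper: one possible combination, chosen for illustration.
// Overlap blocks, break long letter runs with overlapping words, and fall
// back to filtering for documents that could not be indexed.
constexpr std::uint32_t exampleFlags() {
    return dtsoUfOverlapBlocks
         | dtsoUfAutoWordBreakByLength
         | dtsoUfAutoWordBreakOverlapWords
         | dtsoUfFilterFailedDocs;
}
```

With these values, `exampleFlags()` evaluates to 0x0066, and `hasFlag(exampleFlags(), dtsoUfFilterAllDocs)` is false because bit 0x0080 was not included.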
| Members | Description |
| --- | --- |
| dtsoUfExtractAsHtml = 0x0001 | Extracting blocks as HTML does not change the text that is extracted, but it adds information in HTML comments to each extracted block. |
| dtsoUfOverlapBlocks = 0x0002 | Overlapping blocks prevents text that crosses a block boundary from being missed during filtering. |
| dtsoUfAutoWordBreakByLength = 0x0004 | Automatically insert a word break in long sequences of letters. |
| dtsoUfAutoWordBreakByCase = 0x0008 | Automatically insert a word break when a capital letter follows lower-case letters. |
| dtsoUfAutoWordBreakOnDigit = 0x0010 | Automatically insert a word break when a digit follows letters. |
| dtsoUfAutoWordBreakOverlapWords = 0x0020 | When a word break is automatically inserted due to dtsoUfAutoWordBreakByLength, overlap the two words generated by the word break. |
| dtsoUfFilterFailedDocs = 0x0040 | When a document cannot be indexed due to file corruption or encryption, apply the filtering algorithm to extract text from the file. |
| dtsoUfFilterAllDocs = 0x0080 | Ignore file format information and apply Unicode Filtering to all documents. |
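The two character-class rules described for dtsoUfAutoWordBreakByCase and dtsoUfAutoWordBreakOnDigit can be pictured with a short sketch. This is an illustration of the described behavior under simplifying assumptions, not the filter's actual implementation: it handles ASCII only, it represents a word break as a space, and `insertWordBreaks` and its parameters are hypothetical names. dtsoUfAutoWordBreakByLength is left out because this page does not state the length at which a break is inserted.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative only (hypothetical function, ASCII input assumed).
// Inserts a space, standing in for a word break, when:
//   byCase  is set and a capital letter follows a lower-case letter;
//   onDigit is set and a digit follows a letter.
std::string insertWordBreaks(const std::string& text, bool byCase, bool onDigit) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i > 0) {
            // Cast to unsigned char before calling the <cctype> classifiers.
            const unsigned char prev = static_cast<unsigned char>(text[i - 1]);
            const unsigned char cur  = static_cast<unsigned char>(text[i]);
            const bool caseBreak  = byCase  && std::islower(prev) && std::isupper(cur);
            const bool digitBreak = onDigit && std::isalpha(prev) && std::isdigit(cur);
            if (caseBreak || digitBreak) {
                out += ' ';
            }
        }
        out += text[i];
    }
    return out;
}
```

Under these assumptions, `insertWordBreaks("invoiceNumber2012", true, true)` returns `invoice Number 2012`, while passing `false` for both options returns the input unchanged.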
Copyright (c) 1995-2012 dtSearch Corp. All rights reserved.