Indicates Unicode ranges that are of interest when filtering.
Syntax
Structure
Remarks
If unicodeFilterRanges is set to 1 and 8, the filtering algorithm looks for characters in U+0100-U+01FF and U+0800-U+08FF. In other words, each value selects the block of 256 code points whose high byte equals that value.
The ranges help the filtering algorithm distinguish text from non-text data. They are only a hint: if the text extraction algorithm detects text in another language with a sufficient level of confidence, it returns that text even if that language's range was not selected.
In the C++ API, a 256-byte array is used to specify the ranges, with each byte set to a nonzero value to indicate that the corresponding range should be included.
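As a rough illustration of the array layout described above, the following sketch builds a 256-byte array with ranges 1 and 8 selected and checks whether a given character falls into a selected range. The names UnicodeFilterRanges, makeFilterRanges, rangeIndex, and isInSelectedRange are hypothetical helpers invented for this example and are not part of the API. The mapping from a character to its range index (the high byte of a 16-bit code unit) is inferred from the example ranges given above. How the array is actually passed to the API is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical alias: one byte per block of 256 code points (U+xx00-U+xxFF).
using UnicodeFilterRanges = std::array<std::uint8_t, 256>;

// Build an array in which the given range indices are marked as included.
inline UnicodeFilterRanges makeFilterRanges(std::initializer_list<unsigned> ranges) {
    UnicodeFilterRanges result{};  // value-initialized: every range excluded
    for (unsigned r : ranges) {
        assert(r < result.size());
        result[r] = 1;  // any nonzero value marks the range as included
    }
    return result;
}

// Assumed mapping: the range index is the high byte of the 16-bit code unit.
inline unsigned rangeIndex(char16_t codeUnit) {
    return static_cast<unsigned>(codeUnit) >> 8;
}

// True if the character lies in one of the selected ranges.
inline bool isInSelectedRange(const UnicodeFilterRanges& ranges, char16_t codeUnit) {
    return ranges[rangeIndex(codeUnit)] != 0;
}
```

With makeFilterRanges({1, 8}), characters such as U+0100 and U+08FF fall into selected ranges, while U+0041 and U+0200 do not. Because the ranges are only a hint, text outside them may still be returned.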
See Also