dtsOptions2::unicodeFilterRanges Data Member
dtSearch Text Retrieval Engine Programmer's Reference

Indicates Unicode ranges that are of interest when filtering.

Syntax
C++
char unicodeFilterRanges[256];

If ranges 1 and 8 are selected in unicodeFilterRanges, the filtering algorithm will look for characters in U+0100-U+01FF and U+0800-U+08FF.

This setting helps the filtering algorithm distinguish text from non-text data. It is used only as a hint: if the text extraction algorithm detects text in another language with a sufficient level of confidence, it will return that text even if the ranges for that language were not selected.

In the C++ API, a 256-byte array is used to specify the ranges, with each byte set to a nonzero value to indicate that the corresponding range should be included.
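The byte-per-range layout can be illustrated with a minimal sketch. It does not use the dtSearch headers: FilterRangeOptions is a hypothetical stand-in that holds only the documented member, and the helper functions are illustrative names, not part of the API. The mapping of array index NN to the block U+NN00-U+NNFF is inferred from the example above (1 and 8), so treat it as an assumption.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the single member documented here. The real
// dtsOptions2 structure comes from the dtSearch headers and has other members.
struct FilterRangeOptions {
    char unicodeFilterRanges[256];
};

// Set every byte to zero so that no range is selected.
inline void clearUnicodeRanges(FilterRangeOptions& opts) {
    std::memset(opts.unicodeFilterRanges, 0, sizeof opts.unicodeFilterRanges);
}

// Mark one range as selected by storing a nonzero value in its byte.
// Assumed mapping: index NN corresponds to U+NN00-U+NNFF.
inline void selectUnicodeBlock(FilterRangeOptions& opts, unsigned char block) {
    opts.unicodeFilterRanges[block] = 1;
}

// Report whether a range is currently selected (nonzero byte).
inline bool isBlockSelected(const FilterRangeOptions& opts, unsigned char block) {
    return opts.unicodeFilterRanges[block] != 0;
}

// Select the range containing a code point. Returns false for code points
// above U+FFFF, which a 256-entry array of 256-character blocks cannot index.
inline bool selectBlockContaining(FilterRangeOptions& opts, unsigned long codePoint) {
    if (codePoint > 0xFFFFUL) {
        return false;
    }
    selectUnicodeBlock(opts, static_cast<unsigned char>(codePoint >> 8));
    return true;
}

// Reproduce the example from the text: select ranges 1 and 8 only.
inline FilterRangeOptions makeExampleOptions() {
    FilterRangeOptions opts;
    clearUnicodeRanges(opts);
    selectUnicodeBlock(opts, 0x01);  // U+0100-U+01FF
    selectUnicodeBlock(opts, 0x08);  // U+0800-U+08FF
    assert(isBlockSelected(opts, 0x01) && isBlockSelected(opts, 0x08));
    return opts;
}
```

With the real structure, the same idea would amount to zeroing the array and then setting the bytes for the ranges of interest. How the populated options are passed to the engine is outside the scope of this sketch.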

Copyright (c) 1995-2021 dtSearch Corp. All rights reserved.