dtSearch Text Retrieval Engine Programmer's Reference
TextFlags Enumeration

File: dtsearch.h

Syntax
C++
enum TextFlags {
    dtsoTfSkipNumericValues = 0x0001,
    dtsoTfSkipXFirstAndLast = 0x0002,
    dtsoTfRecognizeDates = 0x0004,
    dtsoTfRecognizeDatesPresumeDMY = 0x0008,
    dtsoTfRecognizeDatesPresumeYMD = 0x0010,
    dtsoTfAutoBreakCJK = 0x0020,
    dtsoTfXmlAsText = 0x100,
    dtsoTfDatabasesAsText = 0x200,
    dtsoTfHideRevisions = 0x800,
    dtsoTfHtmlSkipNav = 0x1000,
    dtsoTfHtmlIgnoreNoIndex = 0x2000,
    dtsoTfUseEmailDateAsFileDate = 0x4000,
    dtsoTfUseIcu = 0x8000,
    dtsoTfMimeFollowContentDispositionForText = 0x40000
};
Members
Description
dtsoTfSkipNumericValues = 0x0001
By default, dtSearch indexes numbers both as text and as numeric values, which is necessary for numeric range searching. Use this flag to suppress indexing of numeric values in applications that do not require numeric range searching. This setting can reduce the size of the index by about 20%.
dtsoTfSkipXFirstAndLast = 0x0002
Suppress automatic generation of xfirstword and xlastword. By default, xfirstword is defined as the first word in each document, and xlastword as the last word in each document. Because these words are generated when an index is created, this flag must be set during indexing to suppress them.
dtsoTfRecognizeDates = 0x0004
Automatically recognize dates, email addresses, and credit card numbers in text as it is indexed. See Recognition of dates, email addresses, and credit card numbers.
dtsoTfRecognizeDatesPresumeDMY = 0x0008
Presume DD/MM/YY format for dates (default is MM/DD/YY). See Recognition of dates, email addresses, and credit card numbers.
dtsoTfRecognizeDatesPresumeYMD = 0x0010
Presume YY/MM/DD format for dates (default is MM/DD/YY). See Recognition of dates, email addresses, and credit card numbers.
dtsoTfAutoBreakCJK = 0x0020
Automatically insert a word break around characters in the Chinese, Japanese, and Korean Unicode ranges, which makes it possible to search text in documents that do not contain word breaks. Like the hyphenation setting, this setting is stored in the index's alphabet, so it takes effect only when an index is created. See Alphabet Settings.
dtsoTfXmlAsText = 0x100
Index XML files as text, without applying field attributes to the content.
dtsoTfDatabasesAsText = 0x200
Index database files (*.dbf, *.csv, *.mdb, and *.accdb) as text, without applying field attributes to the content or splitting rows into separate documents.
dtsoTfHideRevisions = 0x800
Remove strikeout text and redlining from Microsoft Word documents edited using "Track Changes".
dtsoTfHtmlSkipNav = 0x1000
Skip navigation sections of HTML files (<nav>...</nav>).
dtsoTfHtmlIgnoreNoIndex = 0x2000
Ignore BeginNoIndex and EndNoIndex tags in HTML.
dtsoTfUseEmailDateAsFileDate = 0x4000
Use the internal message date as the file date for standalone .eml and .msg files.
dtsoTfUseIcu = 0x8000
Enable use of ICU for Unicode processing. See "ICU Integration" for more information.
dtsoTfMimeFollowContentDispositionForText = 0x40000
When indexing MIME messages, if a text or HTML message part has a Content-Disposition header specifying that it is an attachment, then treat the content as an attachment rather than rendering the content inline.
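The following is a minimal sketch of the per-part decision that dtsoTfMimeFollowContentDispositionForText describes. It is not dtSearch's MIME parser: the names handleTextPart and PartHandling are hypothetical, treating only text/plain and text/html as "text or HTML" parts is an assumption, and the bit value is copied from the enum above rather than taken from dtsearch.h.

```cpp
#include <cstddef>
#include <string_view>

// Bit value copied from the enum above (dtsoTfMimeFollowContentDispositionForText).
constexpr long kMimeFollowContentDispositionForText = 0x40000;

// ASCII-only lowercase conversion; MIME type and disposition tokens are case-insensitive.
constexpr char toLowerAscii(char c) {
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Case-insensitive "starts with" check on ASCII header values.
constexpr bool startsWithNoCase(std::string_view s, std::string_view prefix) {
    if (s.size() < prefix.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (toLowerAscii(s[i]) != toLowerAscii(prefix[i])) return false;
    }
    return true;
}

// Hypothetical outcome of examining one MIME part.
enum class PartHandling { NotTextPart, Inline, Attachment };

// Hypothetical rule: a text or HTML part whose Content-Disposition says
// "attachment" is treated as an attachment only when the flag is set;
// otherwise it is rendered inline. Other part types are not modeled.
constexpr PartHandling handleTextPart(std::string_view contentType,
                                      std::string_view contentDisposition,
                                      long textFlags) {
    const bool isTextOrHtml = startsWithNoCase(contentType, "text/plain") ||
                              startsWithNoCase(contentType, "text/html");
    if (!isTextOrHtml) return PartHandling::NotTextPart;

    const bool markedAttachment = startsWithNoCase(contentDisposition, "attachment");
    if (markedAttachment && (textFlags & kMimeFollowContentDispositionForText) != 0) {
        return PartHandling::Attachment;
    }
    return PartHandling::Inline;
}
```

For example, a text/plain part with "attachment; filename=a.txt" yields Attachment when the flag bit is set and Inline when it is not.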

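To make the date-order flags concrete, this hypothetical sketch assigns the three fields of an ambiguous numeric date such as 03/04/05 under each presumed order: MM/DD/YY by default, DD/MM/YY with dtsoTfRecognizeDatesPresumeDMY, and YY/MM/DD with dtsoTfRecognizeDatesPresumeYMD. DateOrder, SimpleDate, and interpret are illustrative names, not dtSearch APIs, and the sketch does not attempt to reproduce how dtSearch's recognizer parses dates or expands two-digit years.

```cpp
// Which order the three numeric fields of a date are presumed to follow.
enum class DateOrder { MDY, DMY, YMD };  // MDY corresponds to the documented default

struct SimpleDate {
    int year;
    int month;
    int day;
};

// Assign fields a, b, c (in the order written, e.g. 03/04/05) according to
// the presumed order. Two-digit years are returned unexpanded.
constexpr SimpleDate interpret(int a, int b, int c, DateOrder order) {
    switch (order) {
        case DateOrder::DMY:
            return SimpleDate{c, b, a};  // DD/MM/YY
        case DateOrder::YMD:
            return SimpleDate{a, b, c};  // YY/MM/DD
        case DateOrder::MDY:
        default:
            return SimpleDate{c, a, b};  // MM/DD/YY
    }
}
```

For example, interpret(3, 4, 5, DateOrder::DMY) yields day 3, month 4, year 5, while the default MDY order reads the same digits as month 3, day 4.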
The TextFlags values are used with Options.TextFlags (.NET), Options.setTextFlags() (Java), and dtsOptions.textFlags (C++).
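Because each TextFlags member is a distinct bit, flags can be combined with bitwise OR and tested with bitwise AND before the mask is stored in the options field named above. This sketch mirrors the documented values in a local enum so it compiles without dtsearch.h; the helper names (withFlag, withoutFlag, hasFlag, flagsAreDisjointBits) and the use of long for the mask are assumptions, and the actual type of dtsOptions.textFlags should be taken from dtsearch.h.

```cpp
// Local mirror of the documented TextFlags values so this sketch compiles on
// its own; a real program would include dtsearch.h instead of redeclaring it.
enum TextFlags {
    dtsoTfSkipNumericValues = 0x0001,
    dtsoTfSkipXFirstAndLast = 0x0002,
    dtsoTfRecognizeDates = 0x0004,
    dtsoTfRecognizeDatesPresumeDMY = 0x0008,
    dtsoTfRecognizeDatesPresumeYMD = 0x0010,
    dtsoTfAutoBreakCJK = 0x0020,
    dtsoTfXmlAsText = 0x100,
    dtsoTfDatabasesAsText = 0x200,
    dtsoTfHideRevisions = 0x800,
    dtsoTfHtmlSkipNav = 0x1000,
    dtsoTfHtmlIgnoreNoIndex = 0x2000,
    dtsoTfUseEmailDateAsFileDate = 0x4000,
    dtsoTfUseIcu = 0x8000,
    dtsoTfMimeFollowContentDispositionForText = 0x40000
};

// Every member, used to check that no two flags share a bit.
constexpr long kAllTextFlags[] = {
    dtsoTfSkipNumericValues, dtsoTfSkipXFirstAndLast, dtsoTfRecognizeDates,
    dtsoTfRecognizeDatesPresumeDMY, dtsoTfRecognizeDatesPresumeYMD,
    dtsoTfAutoBreakCJK, dtsoTfXmlAsText, dtsoTfDatabasesAsText,
    dtsoTfHideRevisions, dtsoTfHtmlSkipNav, dtsoTfHtmlIgnoreNoIndex,
    dtsoTfUseEmailDateAsFileDate, dtsoTfUseIcu,
    dtsoTfMimeFollowContentDispositionForText};

// True if every flag is exactly one bit and no bit is used twice, which is
// what makes OR-combining flags unambiguous.
constexpr bool flagsAreDisjointBits() {
    long seen = 0;
    for (long f : kAllTextFlags) {
        if (f == 0 || (f & (f - 1)) != 0) return false;  // must be exactly one bit
        if ((seen & f) != 0) return false;               // must not reuse a bit
        seen |= f;
    }
    return true;
}

// Set, clear, and test a single flag in an integer mask.
constexpr long withFlag(long mask, TextFlags flag) { return mask | flag; }
constexpr long withoutFlag(long mask, TextFlags flag) { return mask & ~static_cast<long>(flag); }
constexpr bool hasFlag(long mask, TextFlags flag) { return (mask & flag) != 0; }
```

For example, withFlag(dtsoTfRecognizeDates, dtsoTfRecognizeDatesPresumeDMY) produces 0x000C, which requests date recognition with DD/MM/YY presumed.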

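As an illustration of dtsoTfAutoBreakCJK, this sketch counts the words a simple tokenizer would see in UTF-32 text when a break is inserted around every CJK character. The code point ranges are a small subset chosen for the example (CJK Unified Ideographs, Hiragana and Katakana, Hangul Syllables), only the ASCII space is treated as a separator, and isCjk and countWords are hypothetical helpers rather than dtSearch's word breaker.

```cpp
#include <cstddef>
#include <string_view>

// Small, illustrative subset of CJK code point ranges (not dtSearch's list).
constexpr bool isCjk(char32_t c) {
    return (c >= 0x4E00 && c <= 0x9FFF)    // CJK Unified Ideographs
        || (c >= 0x3040 && c <= 0x30FF)    // Hiragana and Katakana
        || (c >= 0xAC00 && c <= 0xD7AF);   // Hangul Syllables
}

// Count words, treating ASCII spaces as separators. With autoBreakCjk, a break
// is presumed on both sides of every CJK character, so each one is its own word.
constexpr std::size_t countWords(std::u32string_view text, bool autoBreakCjk) {
    std::size_t words = 0;
    bool inWord = false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char32_t c = text[i];
        if (c == U' ') {
            inWord = false;              // a space ends the current word
        } else if (autoBreakCjk && isCjk(c)) {
            ++words;                     // the CJK character is a word by itself
            inWord = false;              // and the next character starts a new one
        } else if (!inWord) {
            ++words;                     // first character of a new word
            inWord = true;
        }
    }
    return words;
}
```

For U"\u4E2D\u6587abc" (two ideographs followed by "abc" with no spaces), countWords returns 3 with breaking enabled and 1 without, which is why unbroken CJK text is hard to search unless breaks are inserted.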
Copyright (c) 1995-2021 dtSearch Corp. All rights reserved.