dtSearch Text Retrieval Engine .NET interface

BinaryFilesSettings Enumeration

Values for Options.BinaryFiles

public enum BinaryFilesSettings

Remarks

Use Options.BinaryFiles to specify whether dtSearch should index binary files as plain text, skip them entirely, or filter them so that only the text they contain is indexed.

Binary files are files that dtSearch does not recognize as documents. Examples of binary files include executable programs, fragments of documents recovered through an "undelete" process, or blocks of unallocated or recovered data obtained through computer forensics. Content in these files may be stored in a variety of formats, such as plain text, Unicode text, or fragments of .DOC or .XLS files. Many different fragments with different encodings may be present in the same binary file. Indexing such a file as if it were a simple text file would miss most of the content.

The dtSearch filtering algorithm scans a binary file for anything that looks like text, using multiple encoding detection methods. Because the algorithm can detect sequences of text with different encodings or formats in the same file, it extracts content from recovered or corrupt data far more reliably than a simple text scan. Input files can be up to 2 GB in size.
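To make the idea of scanning for text in several encodings concrete, here is a much-simplified sketch in C#. It is not the dtSearch algorithm, which is not described here in detail; it is a basic "strings"-style scan that looks only for printable ASCII stored as single-byte text or as UTF-16LE. The class name BinaryTextScan, the method ExtractRuns, and the minLength parameter are all hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative only: a simplified scan for text runs, NOT dtSearch's algorithm.
public static class BinaryTextScan
{
    // Hypothetical helper: returns runs of printable ASCII characters that are
    // at least minLength characters long, stored either as single bytes or as
    // UTF-16LE code units (at even or odd byte offsets).
    public static List<string> ExtractRuns(byte[] data, int minLength = 4)
    {
        var runs = new List<string>();
        CollectRuns(data, 0, 1, minLength, runs); // single-byte text
        CollectRuns(data, 0, 2, minLength, runs); // UTF-16LE at even offsets
        CollectRuns(data, 1, 2, minLength, runs); // UTF-16LE at odd offsets
        return runs;
    }

    private static void CollectRuns(byte[] data, int start, int width,
                                    int minLength, List<string> runs)
    {
        var current = new StringBuilder();
        for (int i = start; i + width <= data.Length; i += width)
        {
            // A unit counts as text if its low byte is printable ASCII and,
            // for two-byte units, its high byte is zero.
            bool printable = data[i] >= 0x20 && data[i] <= 0x7E
                             && (width == 1 || data[i + 1] == 0);
            if (printable)
            {
                current.Append((char)data[i]);
            }
            else
            {
                Flush(current, minLength, runs);
            }
        }
        Flush(current, minLength, runs);
    }

    // Keeps the pending run if it is long enough, then resets the buffer.
    private static void Flush(StringBuilder current, int minLength, List<string> runs)
    {
        if (current.Length >= minLength)
        {
            runs.Add(current.ToString());
        }
        current.Clear();
    }
}
```

Calling BinaryTextScan.ExtractRuns on a buffer that mixes an ASCII fragment and a UTF-16LE fragment returns both fragments, whereas reading the same buffer as single-byte text alone would recover only the first. A real filter has to handle many more encodings and document formats than this sketch does.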

Members

Member Name    Description
dtsoFilterBinaryUnicode    Filter text from binary files using a text extraction algorithm that scans for sequences of single-byte, UTF-8, or Unicode text in the input. This option is recommended for working with forensic data, particularly when searching for non-English text.
dtsoIndexSkipBinary    Do not index binary files.
dtsoIndexBinary    Index all contents of binary files as single-byte text.
dtsoFilterBinary    Filter text from binary files using the character array in binaryFilterTextChars to determine which characters are text.
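A minimal sketch of selecting one of these values in C# might look like the following. It assumes a project that references dtSearchNetApi.dll. Options.BinaryFiles and the enumeration members come from this page; the call to Save() to apply the settings is an assumption that should be checked against the Options class reference, and the class and method names BinaryFilesExample and UseUnicodeFiltering are hypothetical.

```csharp
using dtSearch.Engine;

public static class BinaryFilesExample
{
    // Hypothetical helper: configure the engine to filter binary files
    // with the Unicode-aware text extraction described above.
    public static void UseUnicodeFiltering()
    {
        Options options = new Options();

        // Choose how binary files are handled during indexing.
        options.BinaryFiles = BinaryFilesSettings.dtsoFilterBinaryUnicode;

        // Assumption: Save() applies the option values to the engine.
        options.Save();
    }
}
```

dtsoFilterBinaryUnicode is used here because the table above recommends it for forensic data; either of the other members can be substituted in the same assignment.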

Requirements

Namespace: dtSearch.Engine

Assembly: dtSearchNetApi (in dtSearchNetApi.dll)

See Also

dtSearch.Engine Namespace