Values for Options.BinaryFiles
Use Options.BinaryFiles to specify whether dtSearch should index binary files as plain text, skip them entirely, or extract only the text they contain.
Binary files are files that dtSearch does not recognize as documents. Examples of binary files include executable programs, fragments of documents recovered through an "undelete" process, or blocks of unallocated or recovered data obtained through computer forensics. Content in these files may be stored in a variety of formats, such as plain text, Unicode text, or fragments of .DOC or .XLS files. Many different fragments with different encodings may be present in the same binary file. Indexing such a file as if it were a simple text file would miss most of the content.
The dtSearch filtering algorithm scans a binary file for anything that looks like text, using multiple encoding detection methods. Because it can detect sequences of text in different encodings or formats within the same file, it extracts content from recovered or corrupt data far more effectively than a simple text scan. Input files can be up to 2 GB in size.
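To make the idea of a multi-encoding scan concrete, here is a minimal sketch of the general technique (similar in spirit to the Unix `strings` tool). It is not dtSearch's algorithm: it assumes that "text" means printable ASCII, recognizes only single-byte runs and ASCII-range UTF-16LE runs (no UTF-8, no .DOC/.XLS fragments), and the function names and the `min_len` threshold are hypothetical.

```python
import string

# Assumption for illustration: bytes that count as "text" are printable ASCII
# (letters, digits, punctuation, whitespace). dtSearch's real rules differ.
TEXT_BYTES = frozenset(string.printable.encode("ascii"))


def single_byte_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of at least min_len consecutive single-byte text characters."""
    runs: list[str] = []
    start = None
    for i, b in enumerate(data):
        if b in TEXT_BYTES:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append(data[start:i].decode("ascii"))
            start = None
    if start is not None and len(data) - start >= min_len:
        runs.append(data[start:].decode("ascii"))
    return runs


def utf16le_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return UTF-16LE runs limited to the ASCII range (a text byte followed by 0x00)."""
    runs: list[str] = []
    for offset in (0, 1):  # a UTF-16 fragment may start at an even or an odd offset
        chars: list[str] = []
        i = offset
        while i + 1 < len(data):
            if data[i] in TEXT_BYTES and data[i + 1] == 0:
                chars.append(chr(data[i]))
            else:
                if len(chars) >= min_len:
                    runs.append("".join(chars))
                chars = []
            i += 2
        if len(chars) >= min_len:
            runs.append("".join(chars))
    return runs


def extract_text(data: bytes, min_len: int = 4) -> list[str]:
    """Run both scans so fragments in either encoding are recovered from one file."""
    return single_byte_runs(data, min_len) + utf16le_runs(data, min_len)


# A "binary" blob mixing junk bytes, single-byte text, and UTF-16LE text.
sample = b"\x01\x02Hello, binary\x00\xff" + "Unicode text".encode("utf-16-le") + b"\xfe\xfe"
print(extract_text(sample))  # ['Hello, binary', 'Unicode text']
```

A single-byte scan alone would see the UTF-16LE fragment only as isolated letters separated by zero bytes, which is why scanning with more than one encoding recovers more of the content.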
| Member Name | Description |
|---|---|
| dtsoFilterBinaryUnicode | Filter text from binary files using a text extraction algorithm that scans for sequences of single-byte, UTF-8, or Unicode text in the input. This option is recommended for working with forensic data, particularly when searching for non-English text. |
| dtsoIndexSkipBinary | Do not index binary files. |
| dtsoIndexBinary | Index all contents of binary files as single-byte text. |
| dtsoFilterBinary | Filter text from binary files using the character array in binaryFilterTextChars to determine which characters are text. |
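The difference between skipping, indexing everything, and filtering with a character table can be sketched as follows. This is a hypothetical Python model, not the dtSearch API: `BinaryMode` only mirrors three of the member names above, the 256-entry table merely imitates the role described for binaryFilterTextChars, and replacing each non-text byte with a separator is an assumption about how filtered output is assembled.

```python
from enum import Enum
from typing import List, Optional


class BinaryMode(Enum):
    """Hypothetical stand-in for three of the Options.BinaryFiles values."""
    INDEX_SKIP_BINARY = "dtsoIndexSkipBinary"
    INDEX_BINARY = "dtsoIndexBinary"
    FILTER_BINARY = "dtsoFilterBinary"


def make_text_chars() -> List[bool]:
    """Build a 256-entry table: entry b is True if byte value b counts as text.
    Assumption: ASCII letters, digits, space, and a little punctuation."""
    table = [False] * 256
    for b in b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,'-":
        table[b] = True
    return table


def text_for_index(data: bytes, mode: BinaryMode, text_chars: List[bool]) -> Optional[str]:
    """Return the text that would be indexed under each mode in this model."""
    if mode is BinaryMode.INDEX_SKIP_BINARY:
        return None  # the file is not indexed at all
    if mode is BinaryMode.INDEX_BINARY:
        # Every byte becomes a single-byte character, noise included.
        return data.decode("latin-1")
    # FILTER_BINARY: keep bytes the table marks as text, turn the rest into
    # separators (an assumption), then collapse runs of whitespace.
    kept = "".join(chr(b) if text_chars[b] else " " for b in data)
    return " ".join(kept.split())


blob = b"\x00\x07Invoice 42\xff\xfe\x01total\x00"
table = make_text_chars()
print(text_for_index(blob, BinaryMode.FILTER_BINARY, table))  # Invoice 42 total
```

Indexing the same blob with the INDEX_BINARY model keeps the control and high bytes as characters, which is the noise that filtering is meant to remove.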
Namespace: dtSearch.Engine
Assembly: dtSearchNetApi (in dtSearchNetApi.dll)