Documents indexed with multiple numbered filenames

Article: dts0231

When a document that is stored inside a container is retrieved in a search, the filename that is returned describes the path to the document through the containers in which it is found.

The path consists of the name of the disk file where the container is stored, followed by one or more strings, each identifying an item to be extracted from a container. Each string consists of the item's ordinal (in hexadecimal), a comma, the type id of the container (also in hexadecimal), a | delimiter, and a text identifier for the item. The strings are separated by >.

For example, if "smith.docx" is stored as the fourth item in "c:\zips\november.zip", the filename would be:

c:\zips\november.zip>4,df|smith.docx

When containers are nested, one inside another, the filename contains one container expression for each level of nesting.
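
The path format described above can be taken apart with a few string splits. The following Python is a minimal sketch, not part of the dtSearch API: the names ContainerItem and parse_container_path are hypothetical, and the code assumes that neither the disk filename nor an item's text identifier contains a > character.

```python
from dataclasses import dataclass


@dataclass
class ContainerItem:
    ordinal: int   # ordinal of the item, parsed from hexadecimal
    type_id: int   # type id of the container, parsed from hexadecimal
    name: str      # text identifier for the item


def parse_container_path(path):
    """Split a container-style filename into (disk_file, [ContainerItem, ...]).

    Hypothetical helper: assumes ">" never appears inside the disk
    filename or inside an item's text identifier.
    """
    disk_file, *segments = path.split(">")
    items = []
    for segment in segments:
        # Split on the first "|" only, so an identifier may itself contain "|".
        header, name = segment.split("|", 1)
        ordinal_hex, type_hex = header.split(",", 1)
        items.append(ContainerItem(int(ordinal_hex, 16), int(type_hex, 16), name))
    return disk_file, items


# The example filename from this article.
disk_file, items = parse_container_path(r"c:\zips\november.zip>4,df|smith.docx")

# A made-up nested filename (one container inside another), for illustration only.
nested_file, nested_items = parse_container_path(
    r"c:\zips\outer.zip>2,df|inner.zip>a,df|report.pdf"
)
```

A filename with no container expression parses to the disk filename and an empty list, so the same function can be applied to every filename returned by a search.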

The file formats that dtSearch treats as containers include ZIP, RAR, DBF, CSV, Microsoft Access (MDB and ACCDB), MBOX message archives, Outlook Express DBX, and Outlook PST.

Additionally, files indexed using the dtSearch filtering algorithm, which extracts segments of text from data files in unrecognized binary formats, can be treated as containers if they are larger than the Options.UnicodeFilterBlockSize setting.

Related articles

What file formats does dtSearch support?
dtSearch Engine filtering options
dtSearch Desktop filtering options
Container file types