dtSearch indexes some file types, such as ZIP archives (.zip) and Microsoft Access databases (*.mdb), as containers, generating multiple documents for each file.
When a document stored inside a container is retrieved in a search, the filename that is returned describes the path to the document through the containers in which it is found. The path consists of the name of the disk file where the outermost container is stored, followed by one or more strings, each identifying an item to be extracted from a container. Each string consists of an ordinal (in hex), a comma, the type ID of the container (also in hex), a | delimiter, and a text identifier for the item. The strings are separated by >. For example, if "smith.doc" is stored as the fourth item in "c:\zips\november.zip", the filename would be:
c:\zips\november.zip>4,df|smith.doc
Nested containers can result in multiple levels of container expressions in a filename.
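The filename format described above can be split back into its parts with a few lines of code. The following is a minimal Python sketch, not part of the dtSearch API: the names ContainerStep and parse_container_filename are hypothetical, and the sketch assumes that item identifiers never contain the > delimiter.

```python
from dataclasses import dataclass


@dataclass
class ContainerStep:
    """One level of a container filename (hypothetical helper type)."""
    ordinal: int   # position of the item in its container, parsed from hex
    type_id: int   # type ID of the container, parsed from hex
    name: str      # text identifier of the item


def parse_container_filename(filename):
    """Split a container filename into (disk_file, [ContainerStep, ...]).

    Assumption: no item identifier contains the '>' delimiter.
    """
    disk_file, *parts = filename.split(">")
    steps = []
    for part in parts:
        # Each part looks like "<ordinal hex>,<type id hex>|<item name>".
        header, bar, name = part.partition("|")
        ordinal_hex, comma, type_hex = header.partition(",")
        if not bar or not comma:
            raise ValueError(f"malformed container expression: {part!r}")
        steps.append(ContainerStep(int(ordinal_hex, 16), int(type_hex, 16), name))
    return disk_file, steps


if __name__ == "__main__":
    disk_file, steps = parse_container_filename(
        r"c:\zips\november.zip>4,df|smith.doc"
    )
    print(disk_file)
    for step in steps:
        print(step.ordinal, hex(step.type_id), step.name)
```

A filename with no > in it yields an empty list of steps, which is a convenient way to tell an ordinary disk file from an item inside a container.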
The file formats that are treated as containers in dtSearch include:
ZIP
RAR
DBF
CSV
Microsoft Access (MDB and ACCDB)
MBOX message archives
Outlook Express DBX
Outlook PST
Additionally, files indexed using the Unicode Filtering algorithm, which extracts segments of text from data files in unrecognized binary formats, can be treated as containers if they are longer than the Options.UnicodeFilterBlockSize setting.
The FileConverter object knows how to extract items from a container, so if you pass in a container filename such as c:\zips\november.zip>4,df|smith.doc as the InputFile, FileConverter will extract the file from the ZIP and then apply the conversion to the extracted file.
You can also use FileConverter to recursively unpack and convert all items in a container, using the dtsConvertInlineContainer flag. This option generates a single output stream from a container file, including items that may be nested many layers within the container, such as a document inside a ZIP file that is inside another ZIP file.
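To illustrate what recursive, inline unpacking of nested containers means, the sketch below walks a ZIP file with Python's standard zipfile module and descends into any ZIP found inside it. It is only a conceptual illustration under stated assumptions: it does not call dtSearch, it handles ZIP containers only, the function name iter_items_inline is hypothetical, and the paths it builds use a simplified name>name form rather than the ordinal and type ID expressions that dtSearch returns.

```python
import io
import zipfile


def iter_items_inline(data, path=""):
    """Yield (path, payload) for every non-ZIP item, descending into nested ZIPs.

    `data` is the raw bytes of a ZIP file. Paths are a simplified
    illustration, not the dtSearch container filename format.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            item_path = f"{path}>{info.filename}" if path else info.filename
            payload = zf.read(info)
            if zipfile.is_zipfile(io.BytesIO(payload)):
                # The item is itself a ZIP: unpack it in place.
                yield from iter_items_inline(payload, item_path)
            else:
                yield item_path, payload


if __name__ == "__main__":
    # Build a ZIP inside a ZIP in memory to exercise the walker.
    inner = io.BytesIO()
    with zipfile.ZipFile(inner, "w") as zf:
        zf.writestr("smith.txt", "hello")
    outer = io.BytesIO()
    with zipfile.ZipFile(outer, "w") as zf:
        zf.writestr("inner.zip", inner.getvalue())
        zf.writestr("readme.txt", "top")

    for item_path, payload in iter_items_inline(outer.getvalue(), "outer.zip"):
        print(item_path, len(payload))
```

Concatenating the yielded payloads in order gives a single stream covering every nested item, which is the general shape of the output described above.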