Use the File Parser API to add support for custom file formats to the dtSearch Engine.
The File Parser API makes it possible to add support for custom file formats to the dtSearch Engine. For information on integrating an external file parser DLL into dtSearch Desktop, see the dtsMakeViewerInfo help topic. The File Parser API is currently available from C/C++ only.
Documents and Containers
There are two types of parsers: Document parsers and Container parsers. A Document parser extracts text from a document. A Container parser enumerates and extracts Documents stored in a container file. A WordPerfect parser would be an example of a Document parser. A PKZIP parser would be an example of a Container parser.
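The division of labor between the two parser types can be sketched in C++ as follows. This is an illustration only: the real File Parser API works through C function pointers in a dtsViewerInfo, and every name below (`DocumentParser`, `ContainerParser`, `PrintableTextParser`, `MemoryContainer`, `extractAllText`) is hypothetical and not part of the dtSearch Engine.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interfaces for illustration; not the dtSearch API.

// A Document parser extracts text from the raw bytes of one document.
struct DocumentParser {
    virtual ~DocumentParser() = default;
    virtual std::string extractText(const std::string& rawBytes) const = 0;
};

// A Container parser enumerates and extracts the documents stored in a container.
struct ContainerParser {
    virtual ~ContainerParser() = default;
    virtual std::size_t documentCount() const = 0;
    virtual std::string documentName(std::size_t index) const = 0;
    virtual std::string extractDocument(std::size_t index) const = 0;
};

// Toy Document parser: keeps printable ASCII, tabs, and newlines; drops other bytes.
struct PrintableTextParser : DocumentParser {
    std::string extractText(const std::string& rawBytes) const override {
        std::string text;
        for (unsigned char c : rawBytes) {
            const bool keep = c == '\n' || c == '\t' || (c >= 0x20 && c < 0x7f);
            if (keep) text.push_back(static_cast<char>(c));
        }
        return text;
    }
};

// Toy Container parser: an in-memory list standing in for an archive file.
struct MemoryContainer : ContainerParser {
    struct Entry { std::string name; std::string bytes; };
    std::vector<Entry> entries;

    std::size_t documentCount() const override { return entries.size(); }
    std::string documentName(std::size_t index) const override {
        assert(index < entries.size());
        return entries[index].name;
    }
    std::string extractDocument(std::size_t index) const override {
        assert(index < entries.size());
        return entries[index].bytes;
    }
};

// Enumerate the container, extract each stored document, then hand each
// one to the Document parser to obtain its text.
std::vector<std::string> extractAllText(const ContainerParser& container,
                                        const DocumentParser& parser) {
    std::vector<std::string> texts;
    for (std::size_t i = 0; i < container.documentCount(); ++i)
        texts.push_back(parser.extractText(container.extractDocument(i)));
    return texts;
}
```

The point of the sketch is that a Container parser never produces text itself; it only yields documents, which are then handled by whichever Document parser applies to them.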
Overview of the File Parser API
A parser is added to the dtSearch indexing engine by passing a dtsViewerInfo structure to dtsRegisterViewer. The dtsViewerInfo contains information and function pointers that tell the dtSearch Engine how to determine whether the parser should be used for a particular document and how to extract text from that document. For containers, it also supplies the function pointers used to enumerate and extract the documents in the container.
For an example of a complete file parser, see the ExternalFileParser sample included with the dtSearch Engine. This parser works with a file format in which the letters a-m are translated to n-z and the letters n-z are translated to a-m (ROT13).
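For reference, the letter translation described above can be written as a short C++ function. This is a sketch of the ROT13 transform only, not code taken from the ExternalFileParser sample; the function name `rot13` is hypothetical, and applying the same rotation to uppercase letters is an assumption, since the text above mentions only a-m and n-z.

```cpp
#include <string>

// ROT13: a-m become n-z and n-z become a-m.
// Assumption: uppercase letters are rotated the same way.
// All other characters are left unchanged.
std::string rot13(const std::string& input) {
    std::string output = input;
    for (char& c : output) {
        if (c >= 'a' && c <= 'z') {
            c = static_cast<char>('a' + (c - 'a' + 13) % 26);
        } else if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>('A' + (c - 'A' + 13) % 26);
        }
    }
    return output;
}
```

Because the alphabet has 26 letters, applying `rot13` twice returns the original text, so the same function serves for both encoding and decoding.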
Everything dtSearch needs to know about a parser (a "viewer" in the API's naming) is contained in the dtsViewerInfo structure. At startup, an external parser must create a dtsViewerInfo and register it by calling dtsRegisterViewer. The calling application is responsible for making sure this happens.
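The register-then-dispatch pattern described in this overview can be sketched with a structure of function pointers. The types below are hypothetical stand-ins written for this illustration: `ParserInfo` is not dtsViewerInfo, `ParserRegistry::registerParser` is not dtsRegisterViewer, and the field names, signatures, and the "DEMO1" file signature are all assumptions. Consult the dtsViewerInfo topic for the real structure layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a parser description (NOT the real dtsViewerInfo).
struct ParserInfo {
    const char* name;
    // Returns true if this parser should handle a file with this content.
    bool (*recognize)(const char* data, std::size_t size);
    // Extracts the text of a file this parser has recognized.
    std::string (*extractText)(const char* data, std::size_t size);
};

// Hypothetical stand-in for the engine's table of registered parsers.
class ParserRegistry {
public:
    void registerParser(const ParserInfo& info) {
        assert(info.recognize != nullptr && info.extractText != nullptr);
        parsers_.push_back(info);
    }

    // Returns the first registered parser that recognizes the data, or nullptr.
    // The pointer is valid only until the next registerParser call.
    const ParserInfo* findParser(const char* data, std::size_t size) const {
        for (const ParserInfo& parser : parsers_)
            if (parser.recognize(data, size)) return &parser;
        return nullptr;
    }

private:
    std::vector<ParserInfo> parsers_;
};

// Example parser for a made-up format whose files begin with "DEMO1".
const char kSignature[] = "DEMO1";
const std::size_t kSignatureSize = sizeof(kSignature) - 1;

bool demoRecognize(const char* data, std::size_t size) {
    return size >= kSignatureSize &&
           std::memcmp(data, kSignature, kSignatureSize) == 0;
}

std::string demoExtractText(const char* data, std::size_t size) {
    if (size < kSignatureSize) return std::string();
    // Everything after the signature is treated as the document text.
    return std::string(data + kSignatureSize, size - kSignatureSize);
}
```

In this sketch the application would call `registerParser` once at startup, and the engine side would call `findParser` for each file before invoking `extractText`, mirroring the responsibilities described above.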
How the dtSearch Engine Works with File Parsers
For each file that the dtSearch Engine indexes or searches, the following process is used:
Topics

A document file parser translates an input document into the requested output format.
A container file parser provides an interface to enumerate the documents within a container.
Each file parser provides a dtsViewerInfo that describes which documents that file parser should handle.
File parsers can return text in RTF or UTF-8.

Copyright (c) 1995-2012 dtSearch Corp. All rights reserved.