dtSearch Text Retrieval Engine Programmer's Reference
Conversion output formatting

Options to control the formatting of FileConverter output.

Output format

Use FileConverter.OutputFormat to specify the type of output to generate. Supported output formats:

Format             Description
itAnsi             ANSI text format. ANSI text can represent only a very limited set of characters, so for plain text output, itUTF8 is recommended instead.
itHTML             HTML output. For HTML input, the conversion will leave the original tags in place, with some exceptions. See Highlighting hits in HTML files for more information.
itRTF              Rich Text Format (RTF) output.
itUnformattedHTML  Unformatted HTML output uses HTML encoding for characters such as < and > and is otherwise the same as plain text output. It is intended for use when generating a synopsis to be included in search results, so all formatting, including line breaks, fonts, etc., is removed.
itUTF8             Plain text output, encoded as UTF-8. The output will not include a UTF-8 byte-order mark (BOM) unless you set the flag dtsConvertIncludeBOM in FileConverter.
itXML              itXML output can only be generated from XML input. See Highlighting hits in XML files.
itContentAsXml     The itContentAsXml format organizes document content, metadata, and attachments into a standard XML format. This format is intended for content extraction rather than hit highlighting.
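
The differences between the plain text formats come down to character encoding. The following Python sketch does not call dtSearch at all; it only illustrates the three encoding points made in the table: why a single-byte ANSI code page loses characters, what a UTF-8 byte-order mark looks like, and what "HTML encoding for characters such as < and >" means. Windows-1252 is used as a stand-in for "ANSI" (an assumption, since the actual code page depends on the system), and `html.escape` is used only to illustrate the kind of encoding described, not as dtSearch's own implementation.

```python
import codecs
import html

# Sample text: accented Latin letters, two CJK characters, and HTML-special characters.
sample = "R\u00e9sum\u00e9 for \u7530\u4e2d <draft> & notes"


def fits_in_ansi(text: str, code_page: str = "cp1252") -> bool:
    """Return True if every character can be encoded in the given code page.

    cp1252 is an assumed stand-in for "ANSI"; the real code page is system-dependent.
    """
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 byte-order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# UTF-8 can represent the whole sample, so it round-trips without loss.
utf8_bytes = sample.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# The same bytes with a BOM prepended, as a reader of BOM-prefixed output would see them.
utf8_with_bom = codecs.BOM_UTF8 + utf8_bytes

# HTML encoding of <, > and & so the text is safe to embed in a results page.
escaped = html.escape(sample, quote=False)
```

A consumer of itUTF8 output could use a check like `has_utf8_bom` to decide whether to skip the first three bytes before decoding, or decode with Python's `utf-8-sig` codec, which accepts input with or without a BOM.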

Metadata

dtSearch automatically detects and extracts metadata from converted documents, such as the Subject, To, and From fields of an email, or the document properties of a Word document. For details on the types of metadata extracted, see Supported File Formats.

To control metadata extraction, set Options.FieldFlags before executing a conversion. If you are highlighting hits in a retrieved document, the value of FieldFlags should be identical to the value in effect when that document was indexed. Otherwise, the extracted content may differ from the indexed content, and the hit offsets recorded in the index can point to the wrong words, resulting in incorrect hit highlighting.
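
One way to keep FieldFlags consistent between indexing and highlighting is to record the value alongside the index and read it back before each conversion. The sketch below is a minimal illustration of that idea in plain Python. The sidecar file name, the function names, and the integer value used in the usage note are all hypothetical; dtSearch does not provide or require this file, and the step that assigns the value to Options.FieldFlags is left as a comment because it depends on the API binding in use.

```python
import json
from pathlib import Path

# Hypothetical sidecar file stored next to the index; not part of dtSearch.
SETTINGS_FILE = "conversion_settings.json"


def save_field_flags(index_dir: Path, field_flags: int) -> None:
    """Record the FieldFlags value that was in effect when the index was built."""
    payload = {"field_flags": field_flags}
    (index_dir / SETTINGS_FILE).write_text(json.dumps(payload), encoding="utf-8")


def load_field_flags(index_dir: Path) -> int:
    """Read back the recorded FieldFlags value for use before a conversion."""
    text = (index_dir / SETTINGS_FILE).read_text(encoding="utf-8")
    return int(json.loads(text)["field_flags"])


def flags_match(index_dir: Path, current_flags: int) -> bool:
    """Return True if the flags about to be used equal the flags used at index time."""
    return load_field_flags(index_dir) == current_flags
```

In use, `save_field_flags(index_dir, flags)` would run once after indexing, and the highlighting code would call `load_field_flags(index_dir)` and assign the result to Options.FieldFlags before executing the conversion. The integer stored is whatever combination of flag constants the application chose; no particular values are assumed here.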

Attachments and images

FileConverter can extract embedded attachments, images, or other content such as OLE objects from the input file when performing a conversion. To enable extraction of embedded content, set FileConverter.ExtractionOptions to an ExtractionOptions object specifying the types of content to extract and the location for the extracted files.

HTML output options

To specify content to go inside the <HEAD>...</HEAD> tags in HTML output, use FileConverter.HtmlHead. 

To specify a tag such as <!DOCTYPE html> to go before the default <HTML> tag at the top of the file, use FileConverter.DocTypeTag. 

To specify a <BASE> href for HTML output, use FileConverter.BaseHRef. 
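
The three settings above are plain strings, so they can be prepared and checked before being handed to FileConverter. The following Python sketch builds them without calling dtSearch. `HtmlOutputSettings` and `build_html_settings` are hypothetical names; the fields merely mirror the FileConverter property names, and the sketch assumes that BaseHRef takes the href value itself rather than a complete <BASE> tag.

```python
import html
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class HtmlOutputSettings:
    """Hypothetical holder for values intended for FileConverter properties."""
    html_head: str      # intended for FileConverter.HtmlHead
    doc_type_tag: str   # intended for FileConverter.DocTypeTag
    base_href: str      # intended for FileConverter.BaseHRef


def build_html_settings(title: str, base_url: str) -> HtmlOutputSettings:
    """Build head content, a doctype tag, and a base href for HTML output."""
    # A base href without a scheme would not give relative links a usable root.
    if not urlparse(base_url).scheme:
        raise ValueError("base_url must be an absolute URL")

    # Escape the title so characters such as < and & cannot break the markup.
    head = (
        '<meta name="viewport" content="width=device-width, initial-scale=1">\n'
        f"<title>{html.escape(title)}</title>"
    )
    return HtmlOutputSettings(
        html_head=head,
        doc_type_tag="<!DOCTYPE html>",
        base_href=base_url,
    )
```

Escaping the title matters when it comes from a document name or user input, since the head content is inserted into the output as markup.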

To control the formatting used for metadata tables and attachment delimiters in HTML output, set the dtsConvertUseStyles flag in FileConverter, and include CSS styles in FileConverter.HtmlHead in <style>...</style> tags. Currently, the following standard styles are used in HTML output when dtsConvertUseStyles is set:

CSS style name              Description
dts-field-table             Table containing metadata names and values, such as document properties and email to/from/subject/date.
dts-field-table-name-cell   Table cell containing a field name, such as Subject, To, From, or Author.
dts-field-table-value-cell  Table cell containing a field value, such as the subject of an email.
dts-begin-attachment        The name of an attachment included in the conversion output.
dts-begin-file              The name of an embedded file included in the conversion output.
dts-section-break           Break between logical divisions in a document, such as slides, worksheets, or pages.
dts-begin-worksheet         The name of a worksheet in a spreadsheet.

For an example of a style sheet implementing these styles, see the DocStyles.css file included with dtSearch, which is installed in the dtSearch templates folder.
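
As an alternative to copying DocStyles.css by hand, the style block can be generated in code. The Python sketch below builds a <style>...</style> string covering the seven style names listed above, suitable for inclusion in the value assigned to FileConverter.HtmlHead. It assumes the style names are applied as CSS classes (hence the leading dot in each selector), and the declarations are illustrative choices, not the contents of DocStyles.css.

```python
from typing import Mapping

# Style names come from the table above; the declarations are illustrative only.
DTS_STYLES = {
    "dts-field-table": "border-collapse: collapse; margin: 0.5em 0;",
    "dts-field-table-name-cell": "font-weight: bold; padding: 2px 8px; vertical-align: top;",
    "dts-field-table-value-cell": "padding: 2px 8px;",
    "dts-begin-attachment": "font-weight: bold; border-top: 1px solid #999; margin-top: 1em;",
    "dts-begin-file": "font-weight: bold; border-top: 1px solid #999; margin-top: 1em;",
    "dts-section-break": "border-top: 1px dashed #ccc; margin: 1em 0;",
    "dts-begin-worksheet": "font-style: italic; margin-top: 1em;",
}


def build_style_block(styles: Mapping[str, str]) -> str:
    """Return a <style> element with one class rule per style name.

    Assumes each name is used as a class attribute in the HTML output.
    """
    rules = [f".{name} {{ {declarations} }}" for name, declarations in styles.items()]
    return "<style>\n" + "\n".join(rules) + "\n</style>"


style_block = build_style_block(DTS_STYLES)
```

The resulting string would be included in FileConverter.HtmlHead, with the dtsConvertUseStyles flag set so that the output uses these styles.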