Conversion output

Options to control the formatting of FileConverter output.

Remarks

Output format

Use FileConverter.OutputFormat to specify the type of output to generate. Supported output formats:

Format	Description
itAnsi	Ansi text format. Ansi text can represent only a limited set of characters, so itUTF8 is recommended for plain text output.
itHTML	HTML output. For HTML input, the conversion will leave the original tags in place, with some exceptions. See Highlighting hits in HTML files for more information.
itRTF	Rich Text Format (RTF) output.
itUnformattedHTML	Plain text output in which characters such as < and > are HTML-encoded (for example, < becomes &lt;). It is intended for generating a synopsis to include in search results, so all formatting, including line breaks and fonts, is removed.
itUTF8	Plain text output, encoded as UTF-8. The output will not include a UTF-8 byte-order mark (BOM) unless you set the flag dtsConvertIncludeBOM in FileConverter.
itXML	XML output. This format can only be generated from XML input. See Highlighting hits in XML files.
itContentAsXml	The itContentAsXml format organizes document content, metadata, and attachments into a standard XML format. This format is intended for content extraction rather than hit highlighting.
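Two of the formats above have details that matter when you consume the output: itUTF8 output may or may not begin with a byte-order mark depending on dtsConvertIncludeBOM, and itUnformattedHTML output is plain text with HTML-encoded characters and no line breaks. The sketch below shows a way to decode itUTF8 bytes whether or not a BOM is present, and a rough approximation of what an itUnformattedHTML synopsis looks like. It does not call dtSearch, and approximate_synopsis is a hypothetical helper for illustration only, not dtSearch's implementation; the exact characters dtSearch encodes may differ.

```python
import html
import re

UTF8_BOM = b"\xef\xbb\xbf"


def decode_utf8_output(data: bytes) -> str:
    """Decode itUTF8 output bytes, dropping a leading BOM if present.

    The "utf-8-sig" codec strips an initial UTF-8 BOM and otherwise behaves
    like plain UTF-8, so this works whether or not dtsConvertIncludeBOM was set.
    """
    return data.decode("utf-8-sig")


def approximate_synopsis(text: str) -> str:
    """Hypothetical approximation of itUnformattedHTML output.

    Collapses all whitespace runs (including line breaks) to single spaces,
    then HTML-encodes &, <, > and quote characters.
    """
    collapsed = re.sub(r"\s+", " ", text).strip()
    return html.escape(collapsed)


if __name__ == "__main__":
    print(decode_utf8_output(UTF8_BOM + "caf\u00e9".encode("utf-8")))
    print(approximate_synopsis("if a < b\nthen b > a"))  # if a &lt; b then b &gt; a
```

Decoding with "utf-8-sig" avoids having to know which flag setting produced the file, which is convenient when output from several conversions is processed together.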

Metadata

dtSearch automatically detects and extracts metadata from converted documents, such as the Subject, To, and From fields of an email or the document properties of a Word document. For details on the types of metadata extracted, see Supported File Formats.

To control metadata extraction, set Options.FieldFlags before executing a conversion. If you are highlighting hits in a retrieved document, the value of FieldFlags should be identical to the value in effect when the documents were indexed. Otherwise, the change in extracted content could result in incorrect hit highlighting.
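Because highlighting is only reliable when FieldFlags matches the value used at indexing time, an application may want to record that value and check it before highlighting. The following is a minimal sketch under stated assumptions: it treats FieldFlags as an opaque integer, and the sidecar file name index_settings.json and the helpers record_field_flags and check_field_flags are hypothetical, not part of the dtSearch API. Reading Options.FieldFlags and passing it in is left to your application.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sidecar file stored next to the index.
SETTINGS_NAME = "index_settings.json"


def record_field_flags(index_dir: Path, field_flags: int) -> None:
    """Save the FieldFlags value that was in effect when the index was built."""
    (index_dir / SETTINGS_NAME).write_text(json.dumps({"field_flags": field_flags}))


def check_field_flags(index_dir: Path, current_flags: int) -> None:
    """Raise ValueError if the FieldFlags about to be used for highlighting
    differ from the value recorded at indexing time."""
    saved = json.loads((index_dir / SETTINGS_NAME).read_text())["field_flags"]
    if saved != current_flags:
        raise ValueError(
            f"FieldFlags mismatch: index built with {saved:#x}, "
            f"highlighting with {current_flags:#x}"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        record_field_flags(Path(d), 0x4)
        check_field_flags(Path(d), 0x4)  # same value: passes silently
```

Failing early with an explicit error is usually preferable to producing highlights that are silently offset from the actual hits.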

Attachments and images

FileConverter can extract embedded attachments, images, or other content such as OLE objects from the input file when performing a conversion. To enable extraction of embedded content, set FileConverter.ExtractionOptions to an ExtractionOptions object specifying the types of content to extract and the location for the extracted files.

HTML output options

To specify content to go inside the <HEAD>...</HEAD> tags in HTML output, use FileConverter.HtmlHead.

To specify a tag such as <!DOCTYPE html> to go before the default <HTML> tag at the top of the file, use FileConverter.DocTypeTag.

To specify a <BASE> href for HTML output, use FileConverter.BaseHRef.

To control the formatting used for metadata tables and attachment delimiters in HTML output, set the dtsConvertUseStyles flag in FileConverter, and include CSS styles in FileConverter.HtmlHead in <style>...</style> tags. Currently, the following standard styles are used in HTML output when dtsConvertUseStyles is set:

CSS style name	Description
dts-field-table	Table containing metadata names and values, such as document properties and email to/from/subject/date.
dts-field-table-name-cell	Table cell containing a field name, such as Subject, To, From, or Author.
dts-field-table-value-cell	Table cell containing a field value, such as the subject of an email.
dts-begin-attachment	The name of an attachment included in the conversion output.
dts-begin-file	The name of an embedded file included in the conversion output.
dts-section-break	Break between logical divisions in a document, such as slides, worksheets, or pages.
dts-begin-worksheet	The name of a worksheet in a spreadsheet.

For an example of a style sheet implementing these styles, see the DocStyles.css file included with dtSearch, which is installed in the dtSearch templates folder.
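If you prefer to generate the styles in code rather than copy DocStyles.css, the sketch below builds a string intended for FileConverter.HtmlHead containing a <style> block with one rule per style name listed above. It assumes the style names are emitted as CSS class names (check DocStyles.css for the exact selectors), and the property values are purely illustrative; build_html_head is a hypothetical helper, and setting dtsConvertUseStyles and assigning the result to FileConverter.HtmlHead are left to your application.

```python
import html

# Illustrative rules for the dts-* style names used when dtsConvertUseStyles
# is set. The declarations are examples only; adjust them to taste.
DTS_STYLES = {
    "dts-field-table": "border-collapse: collapse; margin-bottom: 1em;",
    "dts-field-table-name-cell": "font-weight: bold; padding-right: 1em;",
    "dts-field-table-value-cell": "padding-right: 1em;",
    "dts-begin-attachment": "margin-top: 2em; border-top: 1px solid #999;",
    "dts-begin-file": "margin-top: 2em; font-style: italic;",
    "dts-section-break": "margin: 1.5em 0; border-top: 1px dashed #ccc;",
    "dts-begin-worksheet": "margin-top: 1em; font-weight: bold;",
}


def build_html_head(styles: dict[str, str], title: str = "") -> str:
    """Return markup for the <HEAD> section: an optional <title>
    followed by a single <style> block with one class rule per entry."""
    rules = "\n".join(f".{name} {{ {decl} }}" for name, decl in styles.items())
    parts = []
    if title:
        parts.append(f"<title>{html.escape(title)}</title>")
    parts.append(f"<style>\n{rules}\n</style>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_html_head(DTS_STYLES, title="Converted document"))
```

Keeping the rules in one dictionary makes it easy to override a single style, such as dts-section-break, without editing a separate style sheet.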
