FileConverter Class

Convert files to HTML, RTF, XML, or text, optionally marking hits with caller-supplied tags.

dtSearch.Engine.JobBase | dtSearch.Engine.OutputBase | dtSearch.Engine.FileConverter

public class FileConverter : OutputBase

Remarks

Highlighting Hits

Most commonly, FileConverter is used after a search to highlight hits in a retrieved document. To highlight hits in a document, FileConverter needs:

The input document.
The word offsets of the hits, as returned in the search results.
The location of the alphabet file to use for word breaking.
The location of the index the document was found in.
The document id of the document in the index.
The output format (HTML, RTF, XML, or text).
Tags to insert around each hit.
The location of the output to create.

The first five items all come from the SearchResults object with the results of the search, so you can set them all in a single step by calling FileConverter.SetInputItem() with the SearchResults object and the ordinal of the document to select.

SetInputItem will set InputFile, InputTypeId, InputDocId, Hits, AlphabetLocation, and IndexRetrievedFrom. If the index was built with caching of documents, SetInputItem will also set up FileConverter to retrieve the cached version of the document from the index.
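The setup described above can be sketched in C# as follows. This is a minimal sketch rather than a definitive implementation: the HighlightExample class, the HighlightToHtml helper name, and the <b> marker tags are illustrative assumptions, and the SearchResults object is assumed to come from a search that has already completed. Only members named on this page are used.

```csharp
using System;
using dtSearch.Engine;

public static class HighlightExample
{
    // Hypothetical helper: converts one search result to HTML with hits marked.
    // 'results' is assumed to hold the results of a completed search, and
    // 'ordinal' is the position of the document within those results.
    public static string HighlightToHtml(SearchResults results, int ordinal)
    {
        using (FileConverter converter = new FileConverter())
        {
            // Sets InputFile, InputTypeId, InputDocId, Hits, AlphabetLocation,
            // and IndexRetrievedFrom in a single step.
            converter.SetInputItem(results, ordinal);

            // Tags inserted around each hit (an illustrative choice).
            converter.BeforeHit = "<b>";
            converter.AfterHit = "</b>";

            // Keep the output in memory instead of writing to OutputFile.
            // HTML is the default OutputFormat, so it is not set here.
            converter.OutputToString = true;

            converter.Execute();

            if (converter.Failed())
            {
                // Details of the failure are available in converter.Errors.
                throw new InvalidOperationException(
                    "Conversion failed; check FileConverter.Errors for details.");
            }

            return converter.OutputString;
        }
    }
}
```

Returning the output as a string suits a web application that sends the highlighted document straight to the browser. To write a disk file instead, leave OutputToString false and set OutputFile.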

Conversion Input

The document data to convert can consist of one binary document file, such as a Word document, and any number of field-value pairs in InputFields. InputText can be used to provide additional text to include in the converted output.

You can pass the binary document to FileConverter in several ways:

To get the document from a disk file, set InputFile to the name of the file.
To pass the document as a stream of bytes, set InputBytes to an array of bytes containing the document data.
To pass the document as a .NET Stream object (such as a FileStream), set InputStream to the Stream object to use.

InputText and InputFields may only contain plain text. If HTML, RTF, or other text-like document data is passed in InputText, the HTML or RTF tags will be interpreted as text and included in the conversion output.

InputFile must be an accessible disk file. UNC paths will work, provided that the network resource can be accessed, but HTTP paths will not. To convert data accessed by HTTP, download the data to a memory buffer and supply it in InputBytes or InputStream.

Even when InputBytes or InputStream is used, a filename should be provided in InputFile if possible to tell dtSearch the original filename extension, which can provide useful information about the document format.
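As a rough illustration of supplying a document from memory, the sketch below passes a byte array in InputBytes and the original filename in InputFile. The InMemoryInputExample class and ConvertBytes helper are hypothetical names, and the caller is assumed to have obtained the bytes already (for example, by downloading them over HTTP).

```csharp
using System;
using dtSearch.Engine;

public static class InMemoryInputExample
{
    // Hypothetical helper: converts a document held in memory to HTML.
    // 'documentBytes' is the raw document data and 'originalFileName'
    // is the name the document had at its source (for example "report.docx").
    public static string ConvertBytes(byte[] documentBytes, string originalFileName)
    {
        using (FileConverter converter = new FileConverter())
        {
            // The document data itself comes from the memory buffer.
            converter.InputBytes = documentBytes;

            // The filename tells dtSearch the original extension,
            // which can help identify the document format.
            converter.InputFile = originalFileName;

            // No hits are supplied, so this is a plain conversion to the
            // default output format (HTML), returned as a string.
            converter.OutputToString = true;

            converter.Execute();

            if (converter.Failed())
            {
                throw new InvalidOperationException(
                    "Conversion failed; check FileConverter.Errors for details.");
            }

            return converter.OutputString;
        }
    }
}
```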

Cached documents

When you build an index, you can request that the documents be cached in the index, in which case dtSearch will zip-compress each document and store it in the index folder. This can be done with any type of indexed data, including dynamically-generated data returned through the DataSource API. To have FileConverter use the cached document as input, use SetInputItem to set up FileConverter as described above, and set the flag dtsConvertGetFromCache in FileConverter.Flags.
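A minimal sketch of selecting the cached copy is shown below. It assumes the flag is a member of an enumeration named ConvertFlags, which this page does not state explicitly; the CachedInputExample class and UseCachedDocument helper are hypothetical names.

```csharp
using dtSearch.Engine;

public static class CachedInputExample
{
    // Hypothetical helper: prepares a FileConverter to read the document
    // from the index cache rather than from the original file.
    public static void UseCachedDocument(
        FileConverter converter, SearchResults results, int ordinal)
    {
        // Identify the document, its hits, and the index it came from.
        converter.SetInputItem(results, ordinal);

        // Assumption: the flag is exposed through an enum named ConvertFlags.
        // Add it to any flags already set rather than overwriting them.
        converter.Flags |= ConvertFlags.dtsConvertGetFromCache;
    }
}
```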

DataSource input

If the original data was indexed using the DataSource indexing API, then to highlight hits set InputBytes, InputFields, and InputText to the same values that were returned from the data source as DocBytes, DocFields, and DocText when the document was indexed. Alternatively, you can build the index with caching of documents enabled, and then use the cached document to highlight hits (see above).

Conversion Output

The BeforeHit and AfterHit markers are inserted before and after each hit word. The BeforeHit and AfterHit markers can contain hypertext links or other HTML tags. To facilitate creation of hit navigation markers, the strings "%%ThisHit%%", "%%NextHit%%", and "%%PrevHit%%" will be replaced with ordinals representing the current hit, the next hit, and the previous hit in the document.
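One way the substitution strings might be used for hit navigation in HTML output is sketched below. The anchor naming scheme ("hit" followed by the ordinal), the link text, and the HitNavigationExample class are illustrative assumptions, not part of the API.

```csharp
using dtSearch.Engine;

public static class HitNavigationExample
{
    // Hypothetical helper: marks each hit with a named anchor and adds
    // previous/next links so a reader can step from hit to hit.
    public static void AddHitNavigation(FileConverter converter)
    {
        // Before each hit: a link back to the previous hit, then a named
        // anchor for this hit, then the start of the highlight.
        converter.BeforeHit =
            "<a href=\"#hit%%PrevHit%%\">&lt;</a>" +
            "<a name=\"hit%%ThisHit%%\"></a><b>";

        // After each hit: the end of the highlight and a link to the next hit.
        converter.AfterHit =
            "</b><a href=\"#hit%%NextHit%%\">&gt;</a>";
    }
}
```

When the conversion runs, each placeholder is replaced with the corresponding hit ordinal, so every link targets the anchor written for a neighboring hit.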

For more information on conversion output options, see:

Highlighting hits - overview

Conversion output formatting

Recommended Flags for HTML Output

Set dtsConvertAutoUpdateSearch to have dtSearch automatically correct out-of-date hit highlighting information.

Set dtsConvertRemoveScripts to disable JavaScript in HTML input documents.

Set dtsConvertUseStyles to have CSS styles included in output, and add a style sheet based on the dtSearch DocStyles.css file to specify the appearance of each style.
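The three recommended flags can be combined in a single assignment, as in the sketch below. It assumes the flags belong to an enumeration named ConvertFlags, which this page does not state explicitly; the HtmlFlagsExample class and ApplyRecommendedHtmlFlags helper are hypothetical names.

```csharp
using dtSearch.Engine;

public static class HtmlFlagsExample
{
    // Hypothetical helper: applies the flags recommended for HTML output.
    public static void ApplyRecommendedHtmlFlags(FileConverter converter)
    {
        // Assumption: the flags are exposed through an enum named ConvertFlags.
        converter.Flags |=
            ConvertFlags.dtsConvertAutoUpdateSearch | // correct out-of-date hit information
            ConvertFlags.dtsConvertRemoveScripts |    // disable JavaScript in HTML input
            ConvertFlags.dtsConvertUseStyles;         // include CSS styles in the output
    }
}
```

With dtsConvertUseStyles set, the page that displays the output also needs a style sheet based on DocStyles.css, since the flag only adds style names to the output.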

IDisposable

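FileConverter implements IDisposable. Dispose of each FileConverter when you are finished with it, for example by creating it in a using statement.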

See also

Highlighting hits - overview

Caching documents

Topics

Topic	Description
JobBase Members	The following tables list the members exposed by JobBase.
JobBase Methods	The methods of the JobBase class are listed here.
JobBase Properties	The properties of the JobBase class are listed here.

OutputBase Class

Topic	Description
OutputBase Members	The following tables list the members exposed by OutputBase.
OutputBase Properties	The properties of the OutputBase class are listed here.

FileConverter Class

Topic	Description
FileConverter Members	The following tables list the members exposed by FileConverter.
FileConverter Methods	The methods of the FileConverter class are listed here.
FileConverter Properties	The properties of the FileConverter class are listed here.

JobBase Methods

JobBase Methods	Description
Failed	True if any errors occurred during execution of the job. Check the JobErrorInfo Errors object for details. (Inherited from JobBase)

FileConverter Methods

FileConverter Methods	Description
Execute	Performs the conversion.
SetInputItem	Provides a quick way to set up a FileConverter with a particular item from a SearchResults list.

JobBase Properties

JobBase Properties	Description
Errors	Contains any errors that occurred during execution of the job. (Inherited from JobBase)
TimeoutSeconds	Set to a non-zero value to make the job terminate automatically after a specified number of seconds. (Inherited from JobBase)

OutputBase Properties

OutputBase Properties	Description
AfterHit	If an array of hit offsets has been provided in Hits, then the BeforeHit and AfterHit strings will be used to mark each hit in the document in the converted output. (Inherited from OutputBase)
BaseHRef	For HTML output, an HREF for a BASE tag to be inserted in the header. (Inherited from OutputBase)
BeforeHit	If an array of hit offsets has been provided in Hits, then the BeforeHit and AfterHit strings will be used to mark each hit in the document in the converted output. (Inherited from OutputBase)
DocTypeTag	For HTML output, a DocType tag such as <!DOCTYPE html> to go before the first tag in the output. (Inherited from OutputBase)
Footer	The Footer will be appended to the conversion output and can use tags in the output format, such as HTML tags in a document converted to HTML. (Inherited from OutputBase)
Header	The Header will appear at the top of the conversion output and can use tags in the output format, such as HTML tags in a document converted to HTML. (Inherited from OutputBase)
HtmlHead	Use HtmlHead to supply HTML data to appear inside the HEAD section of the output. (Inherited from OutputBase)
OutputFile	Name of the converted file to create. (Inherited from OutputBase)
OutputFormat	By default, a FileConverter converts the input file to HTML. Other supported options are: itRTF, itUTF8 (Unicode text), itAnsi, and itXML (for XML input data only). (Inherited from OutputBase)
OutputString	If OutputToString is true, output will be stored in OutputString rather than in a disk file. (Inherited from OutputBase)
OutputStringMaxSize	When output is directed to an in-memory string, you may wish to limit the maximum amount of memory used. To do this, set OutputStringMaxSize to the maximum size you want to allow. (Inherited from OutputBase)
OutputToString	If true, output will be stored in an in-memory string variable rather than a disk file. (OutputFile will be ignored.) After the Execute method is done, the output will be in the OutputString property. (Inherited from OutputBase)
WasTruncated	The output was truncated because of the OutputStringMaxSize setting. (Inherited from OutputBase)

FileConverter Properties

FileConverter Properties	Description
AlphabetLocation	The location of the dtSearch alphabet file to use when highlighting hits. SetInputItem() will set this based on information in SearchResults.
DetectedTypeId	File type of input document detected by dtSearch file parsers.
ExtractionOptions	Options for extraction of embedded images and attachments.
Flags	Flags that control the conversion.
Hits	Word offsets of the hits to highlight using the BeforeHit and AfterHit marks.
HitsByWord	Information returned in SearchResultsItem.HitsByWord.
IndexCache	Use an IndexCache for faster extraction of cached documents from indexes.
IndexRetrievedFrom	The index in which the document was found. SetInputItem will set this based on information in SearchResults.
InputBytes	Use InputBytes to provide a document in a memory buffer rather than as a disk file.
InputDocId	The doc id of the document being converted. This is used when the document is being extracted from cached data in the index rather than from InputBytes or InputText. SetInputItem() will set this based on information in SearchResults.
InputFields	If the document was indexed using a DataSource object, supply the same fields in InputFields that the DataSource returned for this document in the DocFields property.
InputFile	Name of the file to convert. This can be a local disk file or a UNC path, but not an HTTP file.
InputStream	Use InputStream to provide the binary document data as a .NET Stream object (such as a FileStream).
InputText	If the document was indexed using a DataSource object, supply the same text in InputText that the DataSource returned for this document in the DocText property.
InputTypeId	The file type of the input document when it was indexed. SetInputItem() will set this based on information in SearchResults.