FileConverter converts files to HTML, RTF, XML, or text, optionally marking hits with caller-supplied tags.
Most commonly, FileConverter is used after a search to highlight hits in a retrieved document. To highlight hits in a document, FileConverter needs:
The first five items all come from the SearchResults object with the results of the search, so you can set them all in a single step by calling FileConverter.SetInputItem() with the SearchResults object and the ordinal of the document to select.
SetInputItem will set InputFile, InputTypeId, InputDocId, Hits, AlphabetLocation, and IndexRetrievedFrom. If the index was built with caching of documents, SetInputItem will also set up FileConverter to retrieve the cached version of the document from the index.
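As a rough picture of what SetInputItem saves you, the following Python sketch models it as one step that copies each per-document value from the selected search result into the converter's inputs. ResultItem, ConverterInputs, and set_input_item are hypothetical names invented for this illustration, not part of the dtSearch API, and the zero-based ordinal is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ResultItem:
    """Hypothetical stand-in for one document in a SearchResults list."""
    filename: str
    type_id: int
    doc_id: int
    hits: List[int]          # word offsets of each hit in the document
    alphabet_location: str
    index_retrieved_from: str


@dataclass
class ConverterInputs:
    """Hypothetical subset of the FileConverter input properties."""
    input_file: Optional[str] = None
    input_type_id: Optional[int] = None
    input_doc_id: Optional[int] = None
    hits: List[int] = field(default_factory=list)
    alphabet_location: Optional[str] = None
    index_retrieved_from: Optional[str] = None

    def set_input_item(self, results: List[ResultItem], ordinal: int) -> None:
        # Assumes a zero-based ordinal; copies every per-document value
        # from the selected result in a single call.
        item = results[ordinal]
        self.input_file = item.filename
        self.input_type_id = item.type_id
        self.input_doc_id = item.doc_id
        self.hits = list(item.hits)
        self.alphabet_location = item.alphabet_location
        self.index_retrieved_from = item.index_retrieved_from
```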
The document data to convert can consist of one binary document file, such as a Word document, and any number of field-value pairs in InputFields. InputText can be used to provide additional text to include in the converted output.
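You can pass the binary document to FileConverter in several ways: as a disk file named in InputFile, as a memory buffer in InputBytes, or as a stream in InputStream.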
InputText and InputFields may only contain plain text. If HTML, RTF, or other text-like document data is passed in InputText, the HTML or RTF tags will be interpreted as text and included in the conversion output.
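To see why markup passed in InputText shows up literally, consider how plain text has to be carried into HTML output: characters such as < and > are escaped so that they display as text instead of being rendered. The sketch below uses Python's html.escape to model that outcome for HTML output; it illustrates the effect only, is not dtSearch's conversion code, and plain_text_to_html is a hypothetical helper.

```python
import html


def plain_text_to_html(text: str) -> str:
    # Plain text is taken literally: markup characters are escaped,
    # so any tags in the input become visible text in the output.
    return "<p>" + html.escape(text) + "</p>"


print(plain_text_to_html("<b>Quarterly</b> report"))
```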
InputFile must be an accessible disk file. UNC paths will work, provided that the network resource can be accessed, but HTTP paths will not. To convert data accessed by HTTP, download the data to a memory buffer and supply it in InputBytes or InputStream.
Even when InputBytes or InputStream is used, a filename should be provided in InputFile if possible to tell dtSearch the original filename extension, which can provide useful information about the document format.
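For HTTP data, a minimal Python sketch of the download step follows. download reads the whole response body into memory with urllib.request, and filename_hint recovers the original file name from the URL so that it can be supplied in InputFile for its extension. Both helpers are hypothetical; assigning the results to InputBytes and InputFile happens in the dtSearch API and is not shown here.

```python
import posixpath
import urllib.parse
import urllib.request


def filename_hint(url: str) -> str:
    """Return the last path segment of a URL (query string ignored)."""
    path = urllib.parse.urlparse(url).path
    return posixpath.basename(urllib.parse.unquote(path))


def download(url: str, timeout: float = 30.0) -> bytes:
    """Read the entire HTTP response body into a memory buffer."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


# The bytes would go in InputBytes and the name in InputFile, e.g.:
# data = download("https://example.com/files/report.docx")
# name = filename_hint("https://example.com/files/report.docx")
```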
Cached documents
When you build an index, you can request that the documents be cached in the index, in which case dtSearch will zip-compress each document and store it in the index folder. This can be done with any type of indexed data, including dynamically-generated data returned through the DataSource API. To have FileConverter use the cached document as input, use SetInputItem to set up FileConverter as described above, and set the flag dtsConvertGetFromCache in FileConverter.Flags.
DataSource input
If the original data was indexed using the DataSource indexing API, highlight hits by setting InputBytes, InputFields, and InputText to the same values that the data source returned as DocBytes, DocFields, and DocText when the document was indexed. Alternatively, build the index with caching of documents enabled, and use the cached document to highlight hits (see above).
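The values must match because hits are recorded as word offsets into the text that was indexed. This simplified Python sketch, which splits on whitespace and uses a hypothetical word_at helper, shows how the same offset points to a different word once the text supplied for conversion differs from the text that was indexed:

```python
def word_at(text: str, offset: int) -> str:
    """Return the word at a zero-based word offset.

    Simplified tokenizer for illustration only: dtSearch's own word
    breaking is more involved than splitting on whitespace.
    """
    return text.split()[offset]


# Text the data source returned (DocText) when the document was indexed.
indexed_text = "Invoice total due on receipt"
# Text supplied to the converter later, after the record changed.
later_text = "Updated: Invoice total due on receipt"

hit = 1  # word offset recorded in the index for the term "total"
print(word_at(indexed_text, hit))  # the intended hit word
print(word_at(later_text, hit))    # a different word: the marker would be misplaced
```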
The BeforeHit and AfterHit markers are inserted before and after each hit word. The BeforeHit and AfterHit markers can contain hypertext links or other HTML tags. To facilitate creation of hit navigation markers, the strings "%%ThisHit%%", "%%NextHit%%", and "%%PrevHit%%" will be replaced with ordinals representing the current hit, the next hit, and the previous hit in the document.
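The following Python sketch models how the hit markers and navigation placeholders fit together: each hit word is wrapped in the BeforeHit and AfterHit strings, and the three placeholders are replaced with hit ordinals. mark_hits and fill are hypothetical helpers; the zero-based numbering, the clamping of NextHit and PrevHit at the last and first hits, and the whitespace tokenizer are assumptions of the sketch, not documented dtSearch behavior.

```python
from typing import List


def fill(template: str, this_hit: int, next_hit: int, prev_hit: int) -> str:
    """Replace the three navigation placeholders in a marker string."""
    return (template.replace("%%ThisHit%%", str(this_hit))
                    .replace("%%NextHit%%", str(next_hit))
                    .replace("%%PrevHit%%", str(prev_hit)))


def mark_hits(text: str, hits: List[int], before: str, after: str) -> str:
    """Wrap each hit word (given by zero-based word offset) in markers.

    Original spacing is not preserved; words are rejoined with spaces.
    """
    words = text.split()
    ordered = sorted(set(hits))
    ordinal_of = {offset: n for n, offset in enumerate(ordered)}
    last = len(ordered) - 1
    out = []
    for offset, word in enumerate(words):
        n = ordinal_of.get(offset)
        if n is None:
            out.append(word)
            continue
        nxt = min(n + 1, last)   # assumed: stay on the last hit
        prev = max(n - 1, 0)     # assumed: stay on the first hit
        out.append(fill(before, n, nxt, prev) + word + fill(after, n, nxt, prev))
    return " ".join(out)


html_out = mark_hits("the quick fox saw the dog", [1, 5],
                     '<a name="hit%%ThisHit%%" href="#hit%%NextHit%%"><b>',
                     "</b></a>")
```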
For more information on conversion output options, see:
Set dtsConvertAutoUpdateSearch to have dtSearch automatically correct out-of-date hit highlighting information.
Set dtsConvertRemoveScripts to disable JavaScript in HTML input documents.
Set dtsConvertUseStyles to have CSS styles included in output, and add a style sheet based on the dtSearch DocStyles.css file to specify the appearance of each style.
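Several of these options can be set at once. Assuming, as the word "flag" suggests, that the values in FileConverter.Flags are bit flags, they combine with bitwise OR and are tested with bitwise AND. In the Python sketch below the member names mirror the flags described above, but the numeric values are placeholders chosen for illustration; use the constants defined by the dtSearch API.

```python
from enum import IntFlag


class ConvertFlags(IntFlag):
    # Names mirror the flags described above; the numeric values are
    # placeholders, not dtSearch's actual constants.
    dtsConvertGetFromCache = 1
    dtsConvertAutoUpdateSearch = 2
    dtsConvertRemoveScripts = 4
    dtsConvertUseStyles = 8


# Combine several options into one Flags value with bitwise OR.
flags = (ConvertFlags.dtsConvertAutoUpdateSearch
         | ConvertFlags.dtsConvertRemoveScripts
         | ConvertFlags.dtsConvertUseStyles)

# Test an individual option with bitwise AND.
uses_cache = bool(flags & ConvertFlags.dtsConvertGetFromCache)
```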
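FileConverter requires the IDisposable pattern: call Dispose on each FileConverter when you are finished with it so that its resources are released.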
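In C#, the usual way to follow this pattern is a using block, which calls Dispose when the block exits, even if an exception is thrown. The Python sketch below shows the same guarantee with a context manager, a swapped-in technique since the examples here are in Python; Converter is a hypothetical stand-in, not the dtSearch class.

```python
class Converter:
    """Hypothetical stand-in for a disposable converter object."""

    def __init__(self) -> None:
        self.disposed = False

    def execute(self) -> str:
        if self.disposed:
            raise RuntimeError("converter already disposed")
        return "converted"

    def dispose(self) -> None:
        # Release resources; a real implementation would free native handles.
        self.disposed = True

    def __enter__(self) -> "Converter":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Runs whether the block finished normally or raised;
        # returning None lets any exception propagate.
        self.dispose()


with Converter() as conv:
    result = conv.execute()
```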
The following tables list the members exposed by FileConverter: its methods, then its properties, including properties inherited from OutputBase. Separate member lists are available for the JobBase and OutputBase classes.
FileConverter Class |
Description |
Performs the conversion. | |
SetInputItem provides a quick way to set up a FileConverter with a particular item from a SearchResults list. |
OutputBase Class |
Description |
If an array of hit offsets has been provided in Hits, then the BeforeHit and AfterHit strings will be used to mark each hit in the document in the converted output. (Inherited from OutputBase) | |
For HTML output, an HREF for a BASE tag to be inserted in the header. (Inherited from OutputBase) | |
If an array of hit offsets has been provided in Hits, then the BeforeHit and AfterHit strings will be used to mark each hit in the document in the converted output. (Inherited from OutputBase) | |
For HTML output, a DocType tag such as <!DOCTYPE html> to go before the first tag in the output. (Inherited from OutputBase) | |
The Footer will be appended to the conversion output and can use tags in the output format, such as HTML tags in a document converted to HTML. (Inherited from OutputBase) | |
The Header will appear at the top of the conversion output and can use tags in the output format, such as HTML tags in a document converted to HTML. (Inherited from OutputBase) | |
Use HtmlHead to supply HTML data to appear inside the HEAD section of the output. (Inherited from OutputBase) | |
Name of the converted file to create. (Inherited from OutputBase) | |
By default, a FileConverter converts the input file to HTML. Other supported options are: itRTF, itUTF8 (Unicode text), itAnsi, and itXML (for XML input data only). (Inherited from OutputBase) | |
If OutputToString is true, output will be stored in OutputString rather than in a disk file. (Inherited from OutputBase) | |
When output is directed to an in-memory string, you may wish to limit the maximum amount of memory used. To do this, set OutputStringMaxSize to the maximum size you want to allow. (Inherited from OutputBase) | |
If true, output will be stored in an in-memory string variable rather than a disk file. (OutputFile will be ignored.) After the Execute method is done, the output will be in the OutputString property. (Inherited from OutputBase) |
FileConverter Class |
Description |
The location of the dtSearch alphabet file to use when highlighting hits. SetInputItem() will set this based on information in SearchResults. | |
File type of input document detected by dtSearch file parsers. | |
Options for extraction of embedded images and attachments. | |
Flags that control the conversion. | |
Information returned in SearchResultsItem.HitsByWord. | |
Use an IndexCache for faster extraction of cached documents from indexes. | |
The index in which the document was found. SetInputItem will set this based on information in SearchResults. | |
Use InputBytes to provide a document in a memory buffer rather than as a disk file. | |
The doc id of the document being converted. This is used when the document is being extracted from cached data in the index rather than from InputBytes or InputText. SetInputItem() will set this based on information in SearchResults. | |
If the document was indexed using a DataSource object, supply the same fields in InputFields that the DataSource returned for this document in the DocFields property. | |
Name of the file to convert. This can be a local disk file or a UNC path, but not an HTTP URL. | |
Use InputStream to provide access to binary document data for this document as a stream. | |
If the document was indexed using a DataSource object, supply the same text in InputText that the DataSource returned for this document in the DocText property. | |
The file type of the input document when it was indexed. SetInputItem() will set this based on information in SearchResults. |
Copyright (c) 1998-2023 dtSearch Corp. All rights reserved.