FileConverter Class

FileConverter converts files to HTML, RTF, or text, optionally marking hits with caller-supplied tags.

File

File: FileConverter.java

Syntax

Java

public class FileConverter;

Methods

Method	Description
execute	Call execute() to execute the conversion.
getDetectedTypeId	File type of input document detected by dtSearch file parsers.
getErrors	After execute() returns, use getErrors to access error information.
getInputFile	Name of the file to convert
getOutputFile	Name of the file to create from the input file. Use setOutputToString() to request conversion to a memory buffer.
getOutputFormat	The output format can be it_HTML (226), it_Ansi (202), it_Utf8 (238), it_RTF (212), or it_XML (234)
getOutputString	Conversion output can be directed to a string or to a disk file. For string output, use setOutputStringMaxSize to set the maximum size of the output string, and getOutputString after conversion to access the result.
setAfterHit	If an array of hit offsets has been provided using setHits, then the beforeHit and afterHit strings will be used to mark each hit in the document in the converted output. The strings must be appropriate for the output format. For example, to use an angle bracket in HTML output, use &gt;
setAlphabetLocation	The location of the dtSearch alphabet file to use when highlighting hits. The alphabet file determines how dtSearch counts words, so it is important that the same alphabet file used to index or search a file also be used to highlight hits. For more information on how hit highlighting works, see Highlighting Hits in the online help. To ensure that the same alphabet used to index a file is used to highlight hits in that file, set the alphabetLocation to the folder where the index is located. The alphabet definition will be stored in this folder (in a file named...
setBaseHref	For HTML output, an HREF for a BASE tag to be inserted in the header.
setBeforeHit	If an array of hit offsets has been provided using setHits, then the beforeHit and afterHit strings will be used to mark each hit in the document in the converted output. The strings must be appropriate for the output format. For example, to use an angle bracket in HTML output, use &lt;
setDocBytes	Use setDocBytes to provide a document in a memory buffer rather than as a disk file. The byte array input must contain exactly the same bytes as the representation of this document on disk. When a byte array is provided through setDocBytes, the filename is disregarded.
setDocFields	Use setDocFields to provide fields associated with the input document, to highlight hits in data as it was passed through the DataSource API's getDocFields method during indexing. DocFields consists of a series of pairs of field names and values, with tab characters (chr$(9)) between them.
setDocText	Use setDocText to provide text associated with the input document, to highlight hits in data as it was passed through the DataSource API's getDocText method during indexing. DocText content is always interpreted as plain text. Data in a format that includes tags, such as HTML or RTF, should be passed through DocBytes.
setExtractionOptions	Options for extraction of embedded images and attachments
setFlags	Set to ConvertFlags values to control file conversion.
setFooter	The footer will be appended to the conversion output and can use tags in the output format, such as HTML tags in a document converted to HTML.
setHeader	The header will appear at the top of the conversion output and can use tags in the output format, such as HTML tags in a document converted to HTML.
setHits	To request hit highlighting using the beforeHit and afterHit strings, provide an array of hit offsets using setHits. The array returned from the SearchResults getHits method can be used for this purpose.
setHitsByWord	Information generated by setting the flags dtsSearchWantHitsByWord and dtsSearchWantHitsArray in SearchJob, used when applying different highlight attributes to each search term (see the dtsConvertMultiHighlight flag).
setInputFile	Name of the file to convert
setInputItem	Select an item from search results to use as input for the FileConverter. setInputItem will set the name of the input file, the alphabet location, and the hits.
setOutputFile	Name of the file to create from the input file. Use setOutputToString() to request conversion to a memory buffer.
setOutputFormat	The output format can be it_HTML (226), it_Ansi (202), it_Utf8 (238), it_RTF (212), or it_XML (234)
setOutputStringMaxSize	Conversion output can be directed to a string or to a disk file. For string output, use setOutputStringMaxSize to set the maximum size of the output string, and getOutputString after conversion to access the result.
setOutputToString	Conversion output can be directed to a string or to a disk file. For string output, use setOutputStringMaxSize to set the maximum size of the output string, and getOutputString after conversion to access the result.
setTimeoutSeconds	Set timeoutSeconds to the maximum amount of time you want to permit. When this time is exceeded, execution will halt, leaving incomplete results in the output file or output string. If timeoutSeconds is 0 (the default), no time limit will be set. After a timeout has occurred, getErrors() will return the error code dtsErTimeout.
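
The setDocFields description above calls for a series of field name and value pairs separated by tab characters. The following is a minimal sketch of how such a string might be assembled in plain Java. DocFieldsSketch and toDocFields are hypothetical names, not part of the dtSearch API, and replacing embedded tabs with spaces is an assumption made here to keep the pairs aligned.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the dtSearch API.
final class DocFieldsSketch {

    // Joins the map as name<TAB>value<TAB>name<TAB>value, in insertion order.
    static String toDocFields(Map<String, String> fields) {
        StringJoiner joined = new StringJoiner("\t");
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            joined.add(clean(entry.getKey()));
            joined.add(clean(entry.getValue()));
        }
        return joined.toString();
    }

    // Assumption: a tab inside a name or value would break the pairing,
    // so it is replaced with a space. Null becomes an empty string.
    private static String clean(String s) {
        return s == null ? "" : s.replace('\t', ' ');
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Author", "Jane Smith");
        fields.put("Title", "Quarterly Report");
        // Tabs are shown as <TAB> so the separators are visible when printed.
        System.out.println(toDocFields(fields).replace("\t", "<TAB>"));
    }
}
```

A LinkedHashMap is used so the fields come out in the order they were added. The resulting string has the shape the setDocFields description outlines, and should match the fields supplied for the same document during indexing.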

Remarks

For general information on implementing hit highlighting and hit navigation, see:

Highlighting Hits

To convert a file, create a FileConverter, use the properties of the FileConverter to describe the conversion task you want to perform, and call the execute() method.

When highlighting hits from search results, use setInputItem to initialize the FileConverter with information obtained from SearchResults.

BeforeHit, AfterHit, Header, and Footer control the appearance of converted text. Header and Footer are inserted before and after the body of the document. The BeforeHit and AfterHit markers are inserted before and after each hit word. The BeforeHit and AfterHit markers can contain hypertext links. To facilitate creation of hit navigation markers, the strings "%%ThisHit%%", "%%NextHit%%", and "%%PrevHit%%" will be replaced with ordinals representing the current hit, the next hit, and the previous hit in the document.
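
To make the placeholder behavior concrete, here is a minimal sketch in plain Java that mimics the substitution described above. It does not call dtSearch. HitMarkerSketch, its constants, and expand are hypothetical names, and the ordinals passed in (2, 1, 3) are assumed values, since this page does not state how hits are numbered or what the first and last hits link to.

```java
// Illustration only: imitates the %%ThisHit%%, %%PrevHit%%, and %%NextHit%%
// substitution in plain Java. Not the dtSearch implementation.
final class HitMarkerSketch {

    // Marker strings of the kind that could be passed to setBeforeHit and
    // setAfterHit for HTML output. Angle brackets shown as link text are
    // written as the entities &lt; and &gt;.
    static final String BEFORE_HIT = "<a name=\"hit%%ThisHit%%\"></a><b>";
    static final String AFTER_HIT =
            "</b><a href=\"#hit%%PrevHit%%\">&lt;</a><a href=\"#hit%%NextHit%%\">&gt;</a>";

    // Replaces each placeholder with the corresponding ordinal.
    static String expand(String template, int thisHit, int prevHit, int nextHit) {
        return template
                .replace("%%ThisHit%%", Integer.toString(thisHit))
                .replace("%%PrevHit%%", Integer.toString(prevHit))
                .replace("%%NextHit%%", Integer.toString(nextHit));
    }

    public static void main(String[] args) {
        // Markers as they might appear around the second hit in a document.
        String marked = expand(BEFORE_HIT, 2, 1, 3) + "word" + expand(AFTER_HIT, 2, 1, 3);
        System.out.println(marked);
    }
}
```

With markers like these, each hit carries a named anchor plus links to the previous and next hits, which is one way to build hit navigation in the converted HTML.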

Example

// results:  SearchResults from a previous search
// whichDoc:  integer from 0 to results.getCount()-1 identifying the document to display

// Select the item to display
results.getNthDoc(whichDoc);
String f = results.getDocName();

com.dtsearch.engine.FileConverter fc = new com.dtsearch.engine.FileConverter();

// Set up FileConverter to use the selected item from search results
fc.setInputItem(results, whichDoc);

// If the file is HTML, this ensures that it has a BASE tag preserving relative links
fc.setBaseHref(f);

// Generate HTML output in a string
fc.setOutputToString(true);
fc.setOutputFormat(Constants.it_HTML);

// Highlight hits by making them bold
fc.setBeforeHit("<b>");
fc.setAfterHit("</b>");

// Perform the conversion
fc.execute();

// Display the result
setHtml(fc.getOutputString());

Class Hierarchy

com.dtsearch.engine.FileConverter