FileConverter converts files to HTML, RTF, or text, optionally marking hits with caller-supplied tags.
File: FileConverter.java
Package: com.dtsearch.engine
Method | Description |
---|---|
Call execute() to execute the conversion. | |
File type of input document detected by dtSearch file parsers. | |
After execute() returns, use getErrors to access error information. | |
Name of the file to convert | |
Name of the file to create from the input file. Use setOutputToString() to request conversion to a memory buffer. | |
Conversion output can be directed to a string or to a disk file. For string output, use setOutputStringMaxSize to set the maximum size of the output string, and getOutputString after conversion to access the result. | |
If an array of hit offsets has been provided using setHits, then the beforeHit and afterHit strings will be used to mark each hit in the document in the converted output. The strings must be appropriate for the output format. For example, to use an angle bracket in HTML output, use &gt; | |
The location of the dtSearch alphabet file to use when highlighting hits. The alphabet file determines how dtSearch counts words, so it is important that the same alphabet file used to index or search a file also be used to highlight hits. For more information on how hit highlighting works, see Highlighting Hits in the online help. To ensure that the same alphabet used to index a file is used to highlight hits in that file, set the alphabetLocation to the folder where the index is located. The alphabet definition will be stored in this folder (in a file named...) | |
For HTML output, an HREF for a BASE tag to be inserted in the header. | |
If an array of hit offsets has been provided using setHits, then the beforeHit and afterHit strings will be used to mark each hit in the document in the converted output. The strings must be appropriate for the output format. For example, to use an angle bracket in HTML output, use &lt; | |
Use setDocBytes to provide a document in a memory buffer rather than as a disk file. The byte array input must contain exactly the same bytes as the representation of this document on disk. When a byte array is provided through setDocBytes, the filename is disregarded. | |
Use setDocFields to provide fields associated with the input document, to highlight hits in data as it was passed through the DataSource API's getDocFields method during indexing. DocFields consists of a series of pairs of field names and values, with tab characters (chr$(9), written "\t" in Java) between them. | |
Use setDocText to provide text associated with the input document, to highlight hits in data as it was passed through the DataSource API's getDocText method during indexing. DocText content is always interpreted as plain text. Data in a format that includes tags, such as HTML or RTF, should be passed through DocBytes. | |
Options for extraction of embedded images and attachments | |
Set to ConvertFlags values to control file conversion. | |
The footer will be appended to the conversion output and can use tags in the output format, such as HTML tags in a document converted to HTML. | |
The header will appear at the top of the conversion output and can use tags in the output format, such as HTML tags in a document converted to HTML. | |
To request hit highlighting using the beforeHit and afterHit strings, provide an array of hit offsets using setHits. The array returned from the SearchResults getHits method can be used for this purpose. | |
Information generated by setting the flags dtsSearchWantHitsByWord and dtsSearchWantHitsArray in SearchJob, used when applying different highlight attributes to each search term (see the dtsConvertMultiHighlight flag). | |
Name of the file to convert | |
Select an item from search results to use as input for the FileConverter. setInputItem will set the name of the input file, the alphabet location, and the hits. | |
Name of the file to create from the input file. Use setOutputToString() to request conversion to a memory buffer. | |
Conversion output can be directed to a string or to a disk file. For string output, use setOutputStringMaxSize to set the maximum size of the output string, and getOutputString after conversion to access the result. | |
Conversion output can be directed to a string or to a disk file. For string output, use setOutputStringMaxSize to set the maximum size of the output string, and getOutputString after conversion to access the result. | |
Set timeoutSeconds to the maximum amount of time you want to permit. When this time is exceeded, execution will halt, leaving incomplete results in the output file or output string. If timeoutSeconds is 0 (the default), no time limit will be set. After a timeout has occurred, getErrors() will return the error code dtsErTimeout. | |
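
The DocFields format described in the table (field names and values separated by tab characters) can be assembled with plain Java. The following is a minimal sketch, not part of the dtSearch API: the class name DocFieldsSketch and the helper buildDocFields are hypothetical, and replacing embedded tabs with spaces is an assumption made here so that name/value pairing stays intact. It is also assumed that the resulting string is what would be handed to setDocFields.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DocFieldsSketch {

    /**
     * Joins field names and values into a single string of the form
     * name TAB value TAB name TAB value ...
     * (hypothetical helper, not a dtSearch method).
     */
    public static String buildDocFields(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner("\t");
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            // Assumption: a tab inside a name or value would shift the
            // pairing, so it is replaced with a space before joining.
            joiner.add(entry.getKey().replace('\t', ' '));
            joiner.add(entry.getValue().replace('\t', ' '));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Author", "Jane Smith");
        fields.put("Subject", "Quarterly report");

        String docFields = buildDocFields(fields);
        // Make the tabs visible when printing.
        System.out.println(docFields.replace("\t", "<TAB>"));
    }
}
```

For hits to line up, the fields supplied here should match what the DataSource's getDocFields method returned for the same document during indexing.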
For general information on implementing hit highlighting and hit navigation, see:
Highlighting Hits
To convert a file, create a FileConverter, use the properties of the FileConverter to describe the conversion task you want to perform, and call the execute() method.
When highlighting hits from search results, use setInputItem to initialize the FileConverter with information obtained from SearchResults.
BeforeHit, AfterHit, Header, and Footer control the appearance of converted text. Header and Footer are inserted before and after the body of the document. The BeforeHit and AfterHit markers are inserted before and after each hit word. The BeforeHit and AfterHit markers can contain hypertext links. To facilitate creation of hit navigation markers, the strings "%%ThisHit%%", "%%NextHit%%", and "%%PrevHit%%" will be replaced with ordinals representing the current hit, the next hit, and the previous hit in the document.
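
The hit navigation placeholders can be illustrated with a small simulation. The sketch below defines a pair of HTML marker templates and a hypothetical expand helper that mimics the substitution for one hit, so the resulting markup is visible. The real substitution is performed by dtSearch during conversion. The class name, the anchor naming scheme (hit1, hit2, ...), and the assumption that the next and previous ordinals are n + 1 and n - 1 are all illustrative; how the first and last hits are handled is not specified here.

```java
public class HitMarkerSketch {

    // Template for BeforeHit: an anchor for this hit, a link back to the
    // previous hit, and an opening bold tag. Literal angle brackets shown
    // to the reader are written as &lt; so they are valid in HTML output.
    public static final String BEFORE_HIT =
        "<a id=\"hit%%ThisHit%%\"></a>"
        + "<a href=\"#hit%%PrevHit%%\">&lt;&lt;</a><b>";

    // Template for AfterHit: close the bold tag and link to the next hit.
    public static final String AFTER_HIT =
        "</b><a href=\"#hit%%NextHit%%\">&gt;&gt;</a>";

    /**
     * Simulates placeholder replacement for hit number n.
     * Assumption: next is n + 1 and previous is n - 1.
     */
    public static String expand(String template, int n) {
        return template
            .replace("%%ThisHit%%", Integer.toString(n))
            .replace("%%NextHit%%", Integer.toString(n + 1))
            .replace("%%PrevHit%%", Integer.toString(n - 1));
    }

    public static void main(String[] args) {
        // Show what the second hit in a document might look like.
        String marked = expand(BEFORE_HIT, 2) + "word" + expand(AFTER_HIT, 2);
        System.out.println(marked);
    }
}
```

The unexpanded BEFORE_HIT and AFTER_HIT strings are the values that would be supplied as the BeforeHit and AfterHit markers; expand exists only to preview the output.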