How to display retrieved documents with hits highlighted.
To display a retrieved document with hits highlighted, the dtSearch Engine provides APIs that can convert documents to easily-displayed formats (HTML, RTF, or text) with caller-supplied highlight markings around the hits.
To highlight hits in a document, dtSearch needs the following information:
When highlighting hits in a document retrieved from a search, use FileConverter.SetInputItem() to transfer all document properties from search results to the FileConverter in one step, eliminating the need to set items (1) through (4) individually.
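As a rough illustration of what "caller-supplied highlight markings around the hits" means, the sketch below wraps selected words of a plain-text string in before/after marks. It is not the dtSearch API: the function name highlight_hits, its parameters, and the 1-based word numbering are all hypothetical choices made for this example, and words are assumed to be separated by whitespace.

```python
import html
import re


def highlight_hits(text, hit_word_offsets, before_hit="<b>", after_hit="</b>"):
    """Wrap the words at the given 1-based word offsets in caller-supplied marks.

    Hypothetical helper for illustration only; words are whitespace-delimited.
    """
    hits = set(hit_word_offsets)
    out = []
    word_index = 0
    # Alternate between whitespace runs and word runs so spacing is preserved.
    for token in re.findall(r"\s+|\S+", text):
        if token.isspace():
            out.append(token)
            continue
        word_index += 1
        escaped = html.escape(token)  # keep the document text safe inside HTML
        if word_index in hits:
            out.append(before_hit + escaped + after_hit)
        else:
            out.append(escaped)
    return "".join(out)


print(highlight_hits("the quick brown fox", [2, 4]))
```

The point of the sketch is that highlighting depends on both the document text and the hit positions agreeing with each other, which is why the document properties from the search results need to travel together to the converter.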
For information on highlighting hits in specific formats, see:
For information on implementing hit navigation, see:
If you see incorrect hit highlighting in a document after a search, check the following:
(1) Check that you are using FileConverter.SetInputItem to ensure that all necessary document properties are transferred from SearchResults.
(2) Check that the document was not changed since it was last indexed.
(3) Check that the dtSearch version used to index the document is the same as the version being used to highlight hits.
Using indexes created with older dtSearch versions can result in hit highlighting errors if the newer version includes file parser changes that affect text extraction or word breaking.
dtSearch can automatically correct for these types of errors by re-scanning the document for the search request when highlighting hits. To enable this option, set the flag dtsConvertAutoUpdateSearch in FileConverter.
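The following sketch shows, in simplified form, why a changed document or a change in word breaking can shift highlights, and why re-scanning the current text for the search term corrects them. It assumes hits are recorded as 1-based word positions and uses hypothetical helpers (words, find_hits, mark); it does not reproduce how dtSearch stores hits or how dtsConvertAutoUpdateSearch is implemented.

```python
import re


def words(text):
    """Break text into words (a stand-in for a real word-breaking step)."""
    return re.findall(r"\w+", text)


def find_hits(text, term):
    """Return the 1-based word positions where term occurs, ignoring case."""
    return [i for i, w in enumerate(words(text), start=1) if w.lower() == term.lower()]


def mark(text, offsets):
    """Bracket the words at the given positions to show what would be highlighted."""
    hits = set(offsets)
    return " ".join(f"[{w}]" if i in hits else w for i, w in enumerate(words(text), start=1))


indexed_text = "annual report on search quality"
current_text = "draft annual report on search quality"  # a word was added after indexing

stale = find_hits(indexed_text, "search")  # positions recorded at index time
fresh = find_hits(current_text, "search")  # positions from re-scanning the current text

print(mark(current_text, stale))  # highlights the wrong word
print(mark(current_text, fresh))  # highlights the intended word
```

With the stale positions the bracket lands one word early, because every word after the inserted one has moved. Re-scanning costs an extra search of the document at display time, which is the trade-off for getting correct highlights from an out-of-date index.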
HTML output from FileConverter may not be well-formed. For example, it may contain more than one <HTML>...</HTML> pair of tags. The reason is that dtSearch extracts pieces of HTML from different places depending on the file format and has to splice them all together. For example, an email will often include one or more message bodies in HTML, attachments that may be in HTML, and attachments in other formats that have to be converted to HTML. While it would be possible to scan the HTML output for errors in HTML syntax, this would require an additional full pass through the converted data, potentially costly in both time and memory, before anything is returned. If your application needs well-formed HTML, you can use the library provided by this open-source project to add a post-conversion step that cleans up the HTML: Html Tidy Project - http://tidy.sourceforge.net
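Before adding a clean-up step, an application may want to check whether a given piece of converted output actually has the spliced-document shape described above. The sketch below is one minimal way to do that with Python's standard html.parser module. The class HtmlRootCounter and the function count_html_pairs are hypothetical names for this example, and the check only counts <HTML> tags; it is not a validator and does not replace a tool such as HTML Tidy.

```python
from html.parser import HTMLParser


class HtmlRootCounter(HTMLParser):
    """Count opening and closing <html> tags in a piece of markup."""

    def __init__(self):
        super().__init__()
        self.open_count = 0
        self.close_count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag == "html":
            self.open_count += 1

    def handle_endtag(self, tag):
        if tag == "html":
            self.close_count += 1


def count_html_pairs(markup):
    """Return (opening, closing) counts of <html> tags found in markup."""
    parser = HtmlRootCounter()
    parser.feed(markup)
    parser.close()
    return parser.open_count, parser.close_count


spliced = "<HTML><body>message</body></HTML><html><body>attachment</body></html>"
opened, closed = count_html_pairs(spliced)
if opened > 1 or opened != closed:
    print("output contains spliced HTML fragments; consider a clean-up pass")
```

Because this check is itself a full pass over the output, it carries the same kind of cost the paragraph above describes, so it is probably best reserved for cases where downstream code genuinely requires a single well-formed document.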