dtSearch Text Retrieval Engine Programmer's Reference
Highlighting hits in XML files

How to display retrieved XML documents with hits highlighted.

If you display a retrieved XML document by having dtSearch convert it to HTML, the result looks like this:

<Record> This is the text of the "Record" field. This is a highlighted hit. <Item> This is the text of the "Item" field inside "Record" ...

dtSearch can also highlight hits in XML without conversion (like the HTML-to-HTML hit highlighting feature for HTML files). This provides pure XML output with caller-specified highlight markings around each hit. Example:

<Record> This is the text of the "Record" field. This is a <hit>highlighted hit</hit>. <Item> This is the text of the "Item" field inside "Record"</Item> </Record>

The advantage of obtaining pure XML is that you can use XSL to format the content for display. The BeforeHit and AfterHit markings can be either XML tags (such as <hit> and </hit>) or unique markers (such as {{{ and }}}) that are replaced with HTML formatting instructions after XSL processing.
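As a rough illustration of the unique-marker approach, the Python sketch below replaces {{{ and }}} in already-rendered HTML with highlighting tags. It is not part of the dtSearch API: the function name markers_to_html, the <span class="hit"> markup, and the sample string are assumptions made for this example, and the XSL step is assumed to have run already.

```python
import re

def markers_to_html(rendered, before="{{{", after="}}}",
                    open_tag='<span class="hit">', close_tag="</span>"):
    """Wrap each balanced before/after marker pair in highlighting tags.

    Unbalanced markers are left untouched so they stay visible for debugging.
    """
    pattern = re.compile(re.escape(before) + r"(.*?)" + re.escape(after), re.DOTALL)
    # A function replacement avoids backslash-escape surprises in the tag strings.
    return pattern.sub(lambda m: open_tag + m.group(1) + close_tag, rendered)

# Hypothetical HTML as it might look after XSL processing.
rendered = "<p>This is a {{{highlighted hit}}}.</p>"
print(markers_to_html(rendered))
# prints: <p>This is a <span class="hit">highlighted hit</span>.</p>
```

Plain-text markers pass through XSL as ordinary character data, which is why they can be swapped for HTML afterwards. Pick markers that are unlikely to occur in the documents themselves.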

Requirements

To use XML-to-XML hit highlighting, the original XML must have been indexed with two flags set in Options.FieldFlags: dtsoFfXmlHideFieldNames and dtsoFfXmlSkipAttributes.

If these flags were not set when the XML was indexed, the wrong words will be highlighted because the word counts will not match the way the document was indexed. 
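The mismatch can be pictured with a toy model in Python. This is only a sketch of the general idea, not dtSearch's actual word-counting logic: it assumes a hit is recorded as a word position, and that without the two flags each element name and attribute value is counted as a word. Both functions and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

WORD = re.compile(r"\w+")

def words_text_only(xml_text):
    """Words counted from element text only (assumed effect of both flags)."""
    root = ET.fromstring(xml_text)
    return [w for chunk in root.itertext() for w in WORD.findall(chunk)]

def words_with_names(xml_text):
    """Words counted when element names and attribute values are included (assumed)."""
    words = []

    def walk(element):
        words.append(element.tag)
        for value in element.attrib.values():
            words.extend(WORD.findall(value))
        words.extend(WORD.findall(element.text or ""))
        for child in element:
            walk(child)
            words.extend(WORD.findall(child.tail or ""))

    walk(ET.fromstring(xml_text))
    return words

doc = '<Record id="7">alpha beta <Item>gamma</Item> delta epsilon zeta</Record>'

# Position of "gamma" if names and attributes were counted at index time.
indexed_position = words_with_names(doc).index("gamma")

# Applying that position to the text-only word list lands on a different word.
print(indexed_position, words_text_only(doc)[indexed_position])
# prints: 5 zeta
```

In this model the hit on "gamma" is stored at position 5, but a highlighter that counts only element text finds "zeta" at position 5, so the wrong word is marked.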

To set up a FileConverter or DFileConvertJob for XML-to-XML highlighting,

  1. Set the flags to dtsConvertXmlToXml, and
  2. Set the output format to it_XML (the it_XML output format cannot be used with any input data other than XML).

Web sites generated from XML

If the XML is being used with XSL on a web site, a simpler way to implement hit highlighting might be to use the dtSearch Spider to crawl the web site. This way, the dtSearch Spider will see the HTML output produced by the XSL stylesheets rather than the original XML. After a search, the URL that generates the HTML output will be retrieved, so dtSearch will be able to highlight hits in the generated HTML.