How to display retrieved XML documents with XSL formatting

Article: dts0183

Applies to: dtSearch Web

The default format for display of XML documents looks like this:

<Record>  This is the text of the "Record" field

    <Item>   This is the text of the "Item" field inside "Record"

...

To display XML with XSL styles, there are three options: (1) display the XML without hit-highlighting, so that XSL formatting will be applied automatically in the browser, (2) index the content in HTML format after the XSL formatting has been applied, or (3) add hit highlight tags to the XML.

1.  Displaying XML without highlighted hits

When XML is displayed without highlighted hits, the browser will automatically apply any style sheet that is specified in the XML file.  

To display XML without highlighted hits in dtSearch Web, edit the dtsearch_options.html file to add XML to the list of "Unconverted Types", like this:

<!-- $Begin UnconvertedTypes -->
PDF XML
<!-- $End -->

With this change, dtSearch Web search results link directly to the XML files instead of to hit-highlighted copies, so the browser applies the XSL styles.

To display XML without highlighted hits in dtSearch Desktop:

(1) In dtSearch Desktop, click Options > Preferences > External Viewers

(2) Click New... and set up a new item named "XML", with "*.XML" as the filename filter, and check the option to "Display file in dtSearch without hits highlighted."

2.  Displaying XML as HTML with highlighted hits

To highlight hits in XML with style sheet formatting, index the XML documents with the dtSearch Spider, and configure the web server to apply the XSL style sheets.  The dtSearch Spider then sees the HTML output produced by the XSL styles rather than the original XML.  After a search, dtSearch retrieves the URL that generates the HTML output, so it can highlight hits in the generated HTML.

For more information on indexing dynamically-generated content using the dtSearch Spider, see:

How to use dtSearch Web with dynamically-generated content

3.  Highlighting hits in XML

dtSearch (version 6.21 and later) can also highlight hits in XML without converting it to HTML, similar to the HTML-to-HTML hit highlighting feature for HTML files.  The output is pure XML with caller-specified highlight markings around each hit.  Example:

<Record> This is the text of the "Record" field. 

    <Item> This is the text of the "Item" field inside "Record".  This is a <hit>highlighted hit</hit>.</Item>

</Record>

The advantage of obtaining pure XML is that you can use XSL to format the content for display.  The BeforeHit and AfterHit markings can be either an XML tag or a unique marker (such as {{{ and }}}).  A marker is replaced with HTML formatting instructions after XSL processing.
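With the marker approach, the last step is a plain string replacement on the HTML that the XSL produces.  The sketch below is a minimal Python version of that step.  It assumes BeforeHit was set to {{{ and AfterHit to }}}.  The function name `markers_to_html` and the `hit` CSS class are hypothetical; substitute whatever your pages use.

```python
# Minimal sketch: turn hit markers that passed through XSL processing into
# HTML highlighting. Assumes BeforeHit = "{{{" and AfterHit = "}}}".
# The function name and the "hit" CSS class are hypothetical.

BEFORE_HIT = "{{{"
AFTER_HIT = "}}}"


def markers_to_html(html: str,
                    open_tag: str = '<span class="hit">',
                    close_tag: str = "</span>") -> str:
    """Replace every BeforeHit/AfterHit marker with the given HTML tags."""
    return html.replace(BEFORE_HIT, open_tag).replace(AFTER_HIT, close_tag)


if __name__ == "__main__":
    transformed = "<p>This is a {{{highlighted hit}}}.</p>"  # XSL output
    print(markers_to_html(transformed))
    # <p>This is a <span class="hit">highlighted hit</span>.</p>
```

There is one reason to prefer text markers over tags.  If a template outputs a field with xsl:value-of, any hit tags inside that field are dropped, because xsl:value-of returns only the field's text.  Markers such as {{{ and }}} survive as ordinary text.  Choose markers that cannot occur in the documents themselves.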

Requirements:

(1) The XML must have been indexed with XML field names and attributes skipped. 

(2) XML-to-XML conversion must be enabled when the documents are displayed.

Indexing without field names or attributes

XML-to-XML hit highlighting requires that the XML be indexed with field names and attributes made non-searchable.  The highlighting algorithm locates hits by word count, and it assumes that only field data (the text inside the tags) was counted.  If these flags were not set when the XML was indexed, the word counts will not match the way the document was indexed, and the wrong words will be highlighted.
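The following Python sketch shows how the counts drift.  It is a simplified model, not dtSearch's actual tokenizer.  It splits text on `\w+` and, when asked, also counts tag names, attribute names, and attribute values as words.  `words` is a hypothetical helper.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

WORD = re.compile(r"\w+")  # simplified word definition


def words(xml_text: str, include_names_and_attrs: bool) -> list[str]:
    """List a document's words in order, optionally counting tag names and attributes."""
    root = ET.fromstring(xml_text)
    out: list[str] = []

    def walk(elem: ET.Element) -> None:
        if include_names_and_attrs:
            out.extend(WORD.findall(elem.tag))
            for name, value in elem.attrib.items():
                out.extend(WORD.findall(name))
                out.extend(WORD.findall(value))
        out.extend(WORD.findall(elem.text or ""))
        for child in elem:
            walk(child)
            # text that follows a child element belongs to the parent
            out.extend(WORD.findall(child.tail or ""))

    walk(root)
    return out


if __name__ == "__main__":
    doc = "<Record><Item>hit</Item> one two three four</Record>"
    indexed = words(doc, include_names_and_attrs=True)     # names counted
    displayed = words(doc, include_names_and_attrs=False)  # text only
    pos = indexed.index("hit")
    print(pos, displayed[pos])  # 2 two
```

In this example, an index that counted field names records the hit at word 2.  A highlighter that counts only field text finds "two" at word 2, so it would mark the wrong word.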

dtSearch Desktop: Click Options > Preferences > Indexing Options, and clear these two check boxes:
"Index field names in XML files"
"Index field attributes in XML files"

dtSearch Developer API: Before indexing documents, set both of these flags in Options.FieldFlags: dtsoFfXmlHideFieldNames and dtsoFfXmlSkipAttributes.

Enabling XML-to-XML conversion

dtSearch Web or dtSearch Publish: To enable XML-to-XML conversion for retrieved XML documents, change this setting in the dtsearch_options.html file:

<!-- $Begin EnableXmlToXmlHighlighting -->
1
<!-- $End -->

dtSearch Developer API: To set up a FileConverter or DFileConvertJob for XML-to-XML highlighting:
(1) Set the flags to dtsConvertXmlToXml.
(2) Set the output format to it_XML.  (it_XML can be used only when the input data is XML.)
(3) Set the BeforeHit and AfterHit values to the XML tags to place around hits, such as "<hit>" and "</hit>".
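The Python sketch below models the output these settings produce.  It is not the dtSearch API, and `highlight_xml` is a hypothetical helper.  It makes three assumptions: hit positions are word indexes counted over element text only (matching an index built with field names and attributes hidden), each hit word is wrapped individually, and every tag is copied through unchanged.

```python
import re

# Markup is anything between < and >. Comments, CDATA, and entity
# references such as &amp; get no special handling (a simplification).
TAG = re.compile(r"(<[^>]*>)")
WORD = re.compile(r"\w+")


def highlight_xml(xml_text, hit_positions, before_hit="<hit>", after_hit="</hit>"):
    """Wrap the words at hit_positions (counted over element text only)."""
    hits = set(hit_positions)
    counter = 0

    def mark(match):
        nonlocal counter
        word = match.group(0)
        is_hit = counter in hits
        counter += 1
        return before_hit + word + after_hit if is_hit else word

    # With one capturing group, re.split returns text at even indexes
    # and tags at odd indexes, so only text pieces are rewritten.
    pieces = TAG.split(xml_text)
    for i in range(0, len(pieces), 2):
        pieces[i] = WORD.sub(mark, pieces[i])
    return "".join(pieces)


if __name__ == "__main__":
    doc = "<Record>Intro text.<Item>This is a highlighted hit.</Item></Record>"
    print(highlight_xml(doc, [5, 6]))
    # <Record>Intro text.<Item>This is a <hit>highlighted</hit> <hit>hit</hit>.</Item></Record>
```

Tag names and attribute values never advance the word counter, so the output stays well-formed XML.  This is what allows an XSL style sheet to process the result afterwards.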

dtSearch Desktop: XML-to-XML hit-highlighting is not possible in dtSearch Desktop.

Notes

1.  XML style sheet references must be absolute to work after highlighting.   When a document has been hit-highlighted, it no longer has a location to serve as a base for relative references, so if your XML file references "sample.xsl", the styles will not be found.  Instead, insert an absolute reference like "/folder/sample.xsl".

2.  The style sheet must define a style for the tag used to mark hits.  dtSearch Web places <dtSearchHit> and </dtSearchHit> around hits in XML.

3.  Hit navigation (the "Next Hit" and "Prev Hit" buttons) is not currently available for data that is displayed as XML rather than HTML.
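Notes 1 and 2 can be checked before the documents are deployed.  The Python sketch below uses two hypothetical helpers that are not part of dtSearch:

- `absolutize_stylesheet_href` rewrites a relative href in the xml-stylesheet processing instruction into a site-absolute path.  It assumes you know the URL path the XML is served from.
- `has_hit_template` reports whether an XSL file has an xsl:template whose match pattern names dtSearchHit.

The document path "/folder/data.xml" and the template body are illustrative.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
# Matches the href of an xml-stylesheet PI (double-quoted values only).
PI_HREF = re.compile(r'(<\?xml-stylesheet\b[^?]*?\bhref=")([^"]*)(")')

SAMPLE_XSL = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="dtSearchHit">
    <span style="background-color: yellow"><xsl:apply-templates/></span>
  </xsl:template>
</xsl:stylesheet>"""


def absolutize_stylesheet_href(xml_text: str, document_path: str) -> str:
    """Resolve a relative style sheet href against the document's own path."""
    def fix(m):
        return m.group(1) + urljoin(document_path, m.group(2)) + m.group(3)
    return PI_HREF.sub(fix, xml_text, count=1)


def has_hit_template(xsl_text: str, tag: str = "dtSearchHit") -> bool:
    """True if some xsl:template's match pattern names the hit tag."""
    root = ET.fromstring(xsl_text)
    pattern = re.compile(rf"\b{re.escape(tag)}\b")
    return any(pattern.search(t.get("match", ""))
               for t in root.iter(f"{{{XSL_NS}}}template"))


if __name__ == "__main__":
    doc = '<?xml-stylesheet type="text/xsl" href="sample.xsl"?><Record/>'
    print(absolutize_stylesheet_href(doc, "/folder/data.xml"))
    # <?xml-stylesheet type="text/xsl" href="/folder/sample.xsl"?><Record/>
    print(has_hit_template(SAMPLE_XSL))  # True
```

A dtSearchHit template takes effect only if the templates for the enclosing elements process their children with xsl:apply-templates.  Where a field is output with xsl:value-of, only its text is copied, and the hit tags inside it are dropped.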