Highlighting hits in PDF files

How to display retrieved PDF documents with hits highlighted.

Remarks

Highlighting hits in Adobe Reader

dtSearch can highlight hits in Adobe Reader and Adobe Acrobat using a plug-in. For information on the plug-in and a link to download the current version, see https://www.dtsearch.com/pdfhl/

For information on using the plug-in to highlight hits in Adobe Reader either in server-based or client-based applications, please see:

https://www.dtsearch.com/pdfhl/PdfHighlighter.pdf

URL syntax for passing hit offsets

The hit highlighting mechanism that the plug-in uses is based on a feature that was included in Adobe Reader versions 9 and earlier, which used a URL-based syntax to pass hit offset information to Adobe Reader. The URL format looks like this:

http://www.dtsearch.com/sample.pdf#xml=http://www.dtsearch.com/hits.xml

The #xml= portion of the link points to a URL that returns an XML stream describing the location of the hits in the PDF file. The format of the XML file is described in Adobe Technical Note 5172 -- Highlight File Format.

The dtSearch plug-in uses the same format for the hit highlighting information and the same URL syntax. The SearchResults.MakePdfWebHighlightFile function in the dtSearch Engine API generates the hit highlighting data that the plug-in uses.
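
As a rough illustration of the URL syntax described above, the following Python sketch joins a PDF URL and a hit-data URL using the #xml= form. The helper name build_highlight_url is hypothetical and is not part of the dtSearch Engine API. The sketch assumes the hit-data URL is appended as-is; whether any encoding is required is not covered here.

```python
def build_highlight_url(pdf_url: str, hits_url: str) -> str:
    """Return pdf_url with an #xml= fragment pointing at the hit data.

    Hypothetical helper for illustration only; not a dtSearch API call.
    """
    if not pdf_url or not hits_url:
        raise ValueError("both the PDF URL and the hit-data URL are required")
    # Drop any fragment already present so that only one '#' appears.
    base, _, _ = pdf_url.partition("#")
    # Assumption: the hit-data URL is appended unmodified.
    return f"{base}#xml={hits_url}"


if __name__ == "__main__":
    print(build_highlight_url(
        "http://www.dtsearch.com/sample.pdf",
        "http://www.dtsearch.com/hits.xml",
    ))
```

In a real application, the hit-data URL would point at a resource that returns the data generated by SearchResults.MakePdfWebHighlightFile.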

Highlighting hits in a desktop application

In a desktop application, use a WebBrowser control to display PDF files. When Adobe Reader is installed, navigating the WebBrowser control to a PDF file opens the file in Adobe Reader.

The path to the PDF file should include a suffix like this: ?xml=<path to highlighting data>. The dtSearch PDF highlighting plug-in will see this suffix and use it to retrieve the highlighting data.
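
The suffix described above can be sketched in Python as a simple string operation. The function name and the example paths are hypothetical, and the sketch assumes both paths are passed through unchanged; the DesktopSearch sample remains the authoritative reference for how the navigation target is actually built.

```python
def add_highlight_suffix(pdf_path: str, highlight_data_path: str) -> str:
    """Append the ?xml=<path to highlighting data> suffix to a PDF path.

    Hypothetical helper for illustration only; not a dtSearch API call.
    """
    if not pdf_path or not highlight_data_path:
        raise ValueError("both paths are required")
    # Assumption: neither path needs escaping before it is combined.
    return f"{pdf_path}?xml={highlight_data_path}"


if __name__ == "__main__":
    # Example paths are placeholders, not locations used by dtSearch.
    print(add_highlight_suffix(r"C:\docs\sample.pdf", r"C:\temp\hits.xml"))
```

The resulting string is what the WebBrowser control would be navigated to.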

For sample code demonstrating how to use the PDF highlighting feature in a desktop application, see the DesktopSearch sample in examples\cs4\DesktopSearch.

Highlighting hits in a server application

Because the dtSearch PDF highlighter works with Adobe Reader, it can be used only with web browsers that use Adobe Reader to display PDF files. Currently the only web browser that allows Adobe Reader to open PDF files is Internet Explorer. Chrome, Edge, and Firefox each use their own PDF viewer, which does not work with the dtSearch hit highlighter.

lbvProt.dll and dts_svr.exe

These components were used in older versions and are no longer needed to implement hit highlighting. They were needed to highlight hits in older Adobe Reader versions because Adobe Reader required that the hit highlighting data be passed by HTTP. The dtSearch plug-in can pass hit highlighting information through local files, eliminating the need to implement a local version of HTTP for this purpose.

PDF files with attachments

PDF files can contain attachments, which can be in any file format. If a PDF file has attachments, Adobe Reader cannot be used to display the file with hits highlighted, because Adobe Reader can only highlight hits in PDF content. Therefore, when a PDF file has attachments, hit highlighting can only be done by file conversion.

PDF files with attachments will have the TypeId it_PdfWithAttachments instead of it_PDF.

To make it possible to treat PDF files with attachments like other PDF files, you can suppress indexing of attachments. In this case, only the pages and properties of the PDF file itself will be indexed. To suppress indexing of attachments, set the flag dtsoFfPdfSkipAttachments in Options.FieldFlags.

Language Analyzer API

PDF hit highlighting inside Adobe Reader does not currently work if documents were indexed using a word breaker integrated using the language analyzer API. The only kind of hit highlighting that is supported in combination with the language analyzer API is conversion of files using FileConverter.
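
The restrictions on PDF files with attachments and on the language analyzer API can be combined into one decision, sketched below in Python. This is illustrative only: the type identifiers are represented as plain strings rather than the real API constants, and the function name and return values are hypothetical.

```python
PLUGIN = "adobe-reader-plug-in"
FILE_CONVERSION = "file-conversion"


def choose_highlight_method(type_id: str, used_language_analyzer: bool) -> str:
    """Pick a highlighting method for a retrieved PDF document.

    type_id is the name of the document's TypeId as a string
    (an assumption made for this sketch; the API uses constants).
    """
    if type_id not in ("it_PDF", "it_PdfWithAttachments"):
        raise ValueError(f"not a PDF type id: {type_id}")
    # Documents indexed through a language analyzer word breaker
    # can only be highlighted by converting the file.
    if used_language_analyzer:
        return FILE_CONVERSION
    # Adobe Reader can only highlight hits in PDF content, so a PDF
    # with attachments also falls back to file conversion.
    if type_id == "it_PdfWithAttachments":
        return FILE_CONVERSION
    return PLUGIN


if __name__ == "__main__":
    print(choose_highlight_method("it_PDF", False))
    print(choose_highlight_method("it_PdfWithAttachments", False))
```

If attachments were suppressed at index time with dtsoFfPdfSkipAttachments, the text above implies such files can be treated like other PDF files, so the attachment check would no longer trigger for them.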
