How to display retrieved PDF documents with hits highlighted
While it is possible to convert PDF files to HTML, it is better to highlight hits directly in Adobe Reader because doing so preserves every aspect of the PDF file's appearance.
When a user clicks a link to a PDF file on a web page, the browser loads Adobe Reader as a plug-in and uses it to display the file. Adobe Reader recognizes a type of URL that carries hit highlight information. The URL format looks like this:
http://www.dtsearch.com/sample.pdf#xml=http://www.dtsearch.com/hits.xml
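Building this URL is plain string concatenation: the PDF address, then a fragment made of #xml= and the address of the highlight data. The Python sketch below uses a hypothetical helper, make_highlight_url, and strips any existing fragment first, since a URL can carry only one. Browsers do not send the fragment to the web server, so the request for the PDF itself is unaffected. As in the example URL, the XML address is appended without percent-encoding.

```python
def make_highlight_url(pdf_url: str, xml_url: str) -> str:
    """Append an #xml= fragment that tells Adobe Reader where to fetch
    the hit-highlight XML. (Hypothetical helper, for illustration only.)"""
    # A URL has at most one fragment, so drop any existing one
    # (for example "#page=2") before adding the #xml= fragment.
    base = pdf_url.split("#", 1)[0]
    return f"{base}#xml={xml_url}"


print(make_highlight_url("http://www.dtsearch.com/sample.pdf",
                         "http://www.dtsearch.com/hits.xml"))
```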
The #xml= portion of the link points to a URL that returns an XML stream describing the locations of the hits in the PDF file. The format of this XML stream is described in the following Adobe technical note, which is also included in the Acrobat SDK:
Adobe Technical Note 5172 -- Highlight File Format
http://partners.adobe.com/public/developer/en/pdf/HighlightFileFormat.pdf
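The XML stream is essentially a list of hit locations. The Python sketch below generates a stream in the layout as we understand it from Technical Note 5172: a Body element with units=characters, and one loc element per hit giving the page (pg), character offset (pos), and length (len). These element and attribute names, the zero-based page numbering, and the other Body attributes (color, mode, version) are assumptions to check against the note before use. In a dtSearch application, the MakePdfWebHighlightFile method generates this stream for you; Hit and make_highlight_xml are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Hit:
    page: int    # page number (assumed zero-based)
    pos: int     # character offset of the first highlighted character on the page
    length: int  # number of characters to highlight


def make_highlight_xml(hits: Iterable[Hit], color: str = "#ffff00") -> str:
    """Build a highlight stream in the layout assumed from Technical Note 5172.
    Element and attribute names are assumptions; verify them against the note."""
    lines = [
        "<XML>",
        f"<Body units=characters color={color} mode=active version=2>",
        "<Highlight>",
    ]
    for hit in hits:
        # One <loc> entry per hit; in the assumed layout, values are unquoted.
        lines.append(f"<loc pg={hit.page} pos={hit.pos} len={hit.length}>")
    lines += ["</Highlight>", "</Body>", "</XML>"]
    return "\n".join(lines)


print(make_highlight_xml([Hit(0, 19, 7), Hit(1, 3, 7)]))
```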
When hit highlighting works, hits are highlighted in Adobe Reader inside the user's browser, and the hit navigation buttons on the Adobe Reader toolbar let the user move from hit to hit.
Usually the #xml= portion of the URL does not point to a static XML file but instead requests the XML from a script or program, like this:
http://www.dtsearch.com/sample.pdf#xml=http://www.dtsearch.com/dtsearch.asp?cmd=getPdfHits&idoc=5
The dtSearch Engine provides a MakePdfWebHighlightFile method in the SearchResults object to generate this XML stream. For sample code demonstrating this, please see the dtsearch.asp sample included with the dtSearch Engine.
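The server side of such a request can be sketched as follows. This minimal Python sketch is not the dtsearch.asp sample: it parses a query string like cmd=getPdfHits&idoc=5 and returns the XML with a text/xml content type. The get_xml callback is hypothetical; in a dtSearch application it would wrap the code that calls MakePdfWebHighlightFile. Returning an explicit error status for a bad cmd or idoc makes scripting errors easier to spot when the XML URL is opened directly in a browser.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import parse_qs


def handle_pdf_hits_request(
    query: str, get_xml: Callable[[int], Optional[str]]
) -> Tuple[int, str, str]:
    """Return (HTTP status, content type, body) for a query string such as
    'cmd=getPdfHits&idoc=5'. get_xml is a hypothetical callback that returns
    the highlight XML for a document index, or None if there is none."""
    params = parse_qs(query)
    if params.get("cmd", [""])[0] != "getPdfHits":
        return 400, "text/plain", "unsupported cmd"
    try:
        idoc = int(params.get("idoc", [""])[0])
    except ValueError:
        return 400, "text/plain", "idoc must be an integer"
    xml = get_xml(idoc)
    if xml is None:
        return 404, "text/plain", f"no hits for document {idoc}"
    return 200, "text/xml", xml


# Usage with a stand-in lookup table instead of real search results.
stored = {5: "<XML></XML>"}
print(handle_pdf_hits_request("cmd=getPdfHits&idoc=5", stored.get))
```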
This mechanism for highlighting hits is difficult to troubleshoot because it involves interaction among the web server, the browser, Adobe Reader, and your application. The most common problem is a scripting error in the code that implements the #xml= portion of the URL. For suggestions on resolving problems with PDF hit highlighting, see this article on the dtSearch web site: http://support.dtsearch.com/faq/dts0152.htm
The same browser-based interface can be used to view PDF files in a client application. To display a PDF file, the application can embed a WebBrowser control and call its Navigate() method with a URL like the ones used on web sites (above).
The Adobe interface used to highlight hits works consistently only when a PDF file is accessed via HTTP. Therefore, to highlight hits in a local PDF file, it is still necessary to deliver the PDF file and the highlight information to Adobe Reader via HTTP.
The dtSearch Engine includes tools that support two mechanisms for doing this: (1) lbvProt.dll, an in-process COM object that implements an Asynchronous Pluggable Protocol; and (2) dts_svr.exe, an out-of-process, local-only HTTP server.
Currently lbvProt.dll is the preferred mechanism. Because it does not use any ports and integrates directly with the browser, it does not trigger any firewall warnings.
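What "local-only" means for the second mechanism can be illustrated with a small Python server. This sketch does not reproduce dts_svr.exe; it only shows the general idea: bind to the loopback address 127.0.0.1 so no other machine can connect, serve the PDF and its highlight XML over HTTP, and point Adobe Reader at a URL on that port. The CONTENT table and its placeholder bytes are hypothetical stand-ins for a real local file and real highlight data.

```python
import http.client
import http.server
import threading

# Hypothetical in-memory content standing in for a local PDF file and the
# highlight XML generated for the current search; the bytes are placeholders.
CONTENT = {
    "/doc.pdf": ("application/pdf", b"%PDF-1.4 placeholder"),
    "/hits.xml": ("text/xml", b"<XML></XML>"),
}


class LocalHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore any query string and look up the path in the in-memory table.
        entry = CONTENT.get(self.path.split("?", 1)[0])
        if entry is None:
            self.send_error(404)
            return
        content_type, body = entry
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this sketch


def start_local_server() -> http.server.ThreadingHTTPServer:
    # Binding to 127.0.0.1 (not 0.0.0.0) makes the server reachable only from
    # this machine; port 0 asks the OS for any free port.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), LocalHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_local_server()
    port = server.server_address[1]
    print(f"http://127.0.0.1:{port}/doc.pdf#xml=http://127.0.0.1:{port}/hits.xml")
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/hits.xml")
    print(conn.getresponse().read())
    conn.close()
    server.shutdown()
```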
Highlighting hits in a client application involves interaction among your program, the embedded web browser control, the Adobe Reader instance embedded in that control, and any security software or firewalls installed on the end-user system. A change or unexpected behavior in any one of these components can prevent highlighting from working. Therefore, for widely distributed applications, it may be a good precaution to support both mechanisms and provide a user-controllable option to choose between them.
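A user-controllable option of this kind might be read as shown in the hypothetical Python sketch below. The setting values "lbvprot" and "dts_svr" and the enum names are illustrative assumptions, not part of the dtSearch Engine; the default follows the recommendation above to prefer lbvProt.dll, and an unrecognized value falls back to that default rather than disabling highlighting.

```python
from enum import Enum
from typing import Optional


class HighlightMechanism(Enum):
    PROTOCOL = "lbvprot"      # Asynchronous Pluggable Protocol (lbvProt.dll)
    LOCAL_SERVER = "dts_svr"  # local-only HTTP server (dts_svr.exe)


# lbvProt.dll is the preferred mechanism, so it is the default.
DEFAULT_MECHANISM = HighlightMechanism.PROTOCOL


def mechanism_from_setting(value: Optional[str]) -> HighlightMechanism:
    """Map a stored option string (hypothetical values "lbvprot" / "dts_svr")
    to a mechanism, falling back to the default if missing or unrecognized."""
    if value is None:
        return DEFAULT_MECHANISM
    try:
        return HighlightMechanism(value.strip().lower())
    except ValueError:
        return DEFAULT_MECHANISM


print(mechanism_from_setting("DTS_SVR"))
```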
| Topic | Description |
| dts_svr.exe | Provides a way to highlight hits in PDF files using a local-only HTTP server. |
| lbvProt.dll | Provides a way to highlight hits in PDF files using an Asynchronous Pluggable Protocol. |
Copyright (c) 1995-2008 dtSearch Corp. All rights reserved.