Last Reviewed: March 16, 2013
Applies to: dtSearch Engine 6, 7
This article covers problems that developers may encounter displaying PDF files with hit highlighting in Adobe Reader through a browser interface.
Troubleshooting PDF Indexing
PDF file appears as blank page in dtSearch Web
PDF Files appear without hit highlighting in dtSearch Web
When a user clicks on a link to a PDF file on a web page, the browser loads Adobe Reader as a plug-in and uses it to display the page. The URL with the PDF filename can include a second URL that specifies words to highlight. The URL format looks like this:
The #xml= portion of the link points to a URL that returns an XML stream describing the location of the hits in the PDF file. The format of the XML file is described in this document:
Adobe Reader X and XI require a plug-in to enable hit highlighting. For information on the plug-in and a link to download it, please see https://www.dtsearch.com/pdfhl/
Adobe Reader 9 requires a change to its option settings to enable hit highlighting. Please click here for more information.
Adobe Reader versions 8 and older support hit highlighting without the plug-in.
When the hit highlighting API works, hits will be highlighted in Adobe Reader inside the user's browser and hit navigation buttons on the Adobe Reader toolbar will let the user navigate from hit to hit. If it does not work, often the only symptom you will see is the absence of hit highlighting.
Usually the #xml portion of the URL does not point to a text file but instead requests the XML from a script or program, like this:
The dtSearch Engine provides a MakePdfWebHighlightFile method in the SearchResults object to generate this XML stream.
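As a rough illustration of the link structure described above, the following Python sketch joins a PDF URL and a highlight-data URL with #xml=. The host name, the file names, the hits.py script name and its doc parameter are hypothetical placeholders chosen for this example; they are not dtSearch names. In a real application the script would return XML such as that produced by MakePdfWebHighlightFile.

```python
from urllib.parse import urlencode


def build_highlight_link(pdf_url: str, highlight_url: str) -> str:
    """Append the highlight-data URL to the PDF URL as an #xml= fragment."""
    if "#" in pdf_url:
        raise ValueError("pdf_url must not already contain a fragment")
    return f"{pdf_url}#xml={highlight_url}"


def build_script_url(script_url: str, **params: str) -> str:
    """Build a URL that asks a (hypothetical) script for the highlight XML."""
    return f"{script_url}?{urlencode(params)}"


# Case 1: the highlight data is a static XML file (hypothetical names).
static_link = build_highlight_link(
    "http://www.example.com/docs/sample.pdf",
    "http://www.example.com/docs/sample_hits.xml",
)

# Case 2: the highlight data is generated by a script (hypothetical names).
script_link = build_highlight_link(
    "http://www.example.com/docs/sample.pdf",
    build_script_url("http://www.example.com/hits.py", doc="sample.pdf"),
)

print(static_link)
print(script_link)
```

Printing the generated links and comparing them with the links your application emits is a quick way to spot a missing or misplaced #xml= fragment.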
To test PDF highlighting on the client machine, click on this link: http://support.dtsearch.com/pdftest
If PDF highlighting does not work on the client machine, please see these articles for troubleshooting steps:
Troubleshooting PDF viewing problems in dtSearch Desktop/Network
Troubleshooting PDF hit highlighting problems in dtSearch Web
Index the PDF files with dtSearch Desktop and try searching. If searching does not work, please see:
Troubleshooting PDF indexing
Check that the URLs your application generates have the right format. The format for the URLs that provide hit highlighting information is:
The #xml= portion of the link points to a URL that returns an XML stream describing the location of the hits in the PDF file.
To prevent use of the plug-in to send forged requests to web sites, the plug-in will send a standard validation request to make sure the target URL is really a PDF search highlighter. The validation request replaces the query in the original URL with "IsPdfHighlighter", and expects a response that contains "YesPdfHighlighter".
For example, suppose a user clicks this link:
Adobe Reader will open the PDF file, and the dtSearch plug-in will see the #xml= in the URL. To verify the target URL, before requesting the highlighting data, the dtSearch plug-in will first send this request:
To test your script, you can enter the URL for your highlighting script in a browser window with the IsPdfHighlighter request added and check that the response includes YesPdfHighlighter.
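The validation handshake described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the plug-in's actual code: it assumes the validation request is formed by replacing the whole query string with IsPdfHighlighter, as the description suggests, and the URL, function names and make_xml callback are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

VALIDATION_QUERY = "IsPdfHighlighter"
VALIDATION_REPLY = "YesPdfHighlighter"


def make_validation_url(highlight_url: str) -> str:
    """Replace the query of the highlight URL with the validation query."""
    parts = urlsplit(highlight_url)
    # Keep scheme, host and path; swap the query and drop any fragment.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, VALIDATION_QUERY, "")
    )


def is_valid_reply(body: str) -> bool:
    """True if the response body contains the expected validation reply."""
    return VALIDATION_REPLY in body


def answer_request(query: str, make_xml) -> str:
    """Server-side sketch: answer the validation probe, else return XML.

    make_xml is a hypothetical callback that produces the highlight XML
    for an ordinary (non-validation) query string.
    """
    if query == VALIDATION_QUERY:
        return VALIDATION_REPLY
    return make_xml(query)


# Hypothetical highlighting script URL, used only for illustration.
probe = make_validation_url("http://www.example.com/hits.py?doc=sample.pdf&hits=3")
print(probe)
```

Entering the printed URL in a browser corresponds to the manual test described above. A script that builds its response along the lines of answer_request would return YesPdfHighlighter for that URL.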
See How to generate a diagnostic log from the dtSearch PDF Search Highlighter, below, for instructions to turn on diagnostic logging. In the diagnostic log:
- Check that the URL detected in the log includes the #xml= syntax. If you do not see the #xml= syntax in the log, either your application may not be generating the #xml= links, or, if you are using Internet Explorer, then the dtSearch PDF Search Highlighter BHO may be disabled in Internet Explorer.
- Check for any error messages recorded in the log that identify security issues that prevented highlighting from working.
- Check that the IsPdfHighlighter query was processed correctly with a response that includes "YesPdfHighlighter".
- Check that the XML your application returns is correctly formatted and does not include any extra content such as HTML headers or messages.
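The last check in the list above can be partly automated. The following Python sketch tests only that a response body is well-formed XML with no leading junk or HTML markup; it does not validate the body against the Adobe highlight file format. The function name and the sample bodies are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_highlight_response(body: str) -> list[str]:
    """Return a list of problems found in a highlight response body."""
    problems = []
    # Ignore a byte-order mark and leading whitespace before the markup.
    text = body.lstrip("\ufeff \t\r\n")
    if not text.startswith("<"):
        problems.append("response does not start with XML markup")
    lowered = text.lower()
    if "<html" in lowered or "<!doctype html" in lowered:
        problems.append("response contains HTML markup")
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        problems.append(f"not well-formed XML: {err}")
    return problems


# Illustrative bodies only; the element names are not the Adobe schema.
clean_body = "<hits><hit page='0' start='10' length='4'/></hits>"
header_leak = "Content-Type: text/xml\n\n<hits/>"
html_error = "<html><body>Server error</body></html>"

for sample in (clean_body, header_leak, html_error):
    print(check_highlight_response(sample))
```

An empty list means the body passed these basic checks; any entry points to extra content of the kind mentioned above, such as headers written into the body or an HTML error page returned in place of the XML.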
Save the results of the view-source URL above from Notepad to a file named test.xml in the root folder of your web server. Save the PDF file to test.pdf in the same folder. Open your browser and enter the following URL, replacing "localhost" with the address of your web site, if appropriate:
If the XML stream is correct, test.pdf should appear in a browser window with hits highlighted. If hits are not highlighted, check the format of the information in test.xml against the Adobe documentation of the Highlight File Format (see link above).
To generate a diagnostic log:
(1) Run the dtspdfcfg.exe utility (click Start > Programs > dtSearch Pdf Search Highlighter > dtSearch PDF Search Highlighter Options).
(2) Click Diagnostics... and check the box to Enable diagnostic logging.
(3) Try to open a PDF file that should have highlighting.
(4) Close all browser windows and all Adobe Reader windows.
(5) In the dtSearch PDF Search Highlighter Options program, click Zip logs for email to find the diagnostic logs.
Monitoring the log using dbgview.exe
You can also monitor the log in real time using the dbgview utility from the Microsoft Sysinternals web site. To use dbgview.exe to monitor highlighting, first open dbgview.exe and then open a browser window and execute a search. You should see diagnostic messages from the highlighter appear in dbgview.exe as soon as a PDF file opens.
Because of browser or Adobe Reader sandboxing, you may need to run dbgview as a limited user to see the log. Currently this is necessary when using Internet Explorer, but not with Chrome. To run dbgview as a limited user, use the psexec utility (also available from the Microsoft Sysinternals web site) to launch dbgview.exe with the -l command-line switch, like this:
psexec -l dbgview.exe