How to prevent filenames and document properties from appearing in documents

Article: dts0173

Applies to: dtSearch 6 and later

Automatically-generated fields

Some document formats contain fields or properties in addition to the document text. For example, Word documents contain a Document Summary area with information such as the Author and Subject of a document.  dtSearch automatically recognizes these fields and indexes them so you can search for text across the whole document or just in a particular field.  These fields are added to the end of the document text during indexing and are displayed at the end of the file when it is opened after a search.  For details on the types of document fields that dtSearch currently recognizes, see "What file formats does dtSearch support?"

In addition to these automatically-detected fields, dtSearch also adds the document filename as a field named "Filename".  This makes it possible to find a document if a word appears in the filename, even if the word does not appear in the document text.

dtSearch Desktop/Network

In dtSearch Desktop, click Options > Preferences > Indexing Options to disable automatically-generated fields.

The following options are available:

"Index document properties" -- Controls indexing and display of document summary information fields in Microsoft Office files, document properties in PDF and WordPerfect files, and similar metadata in other document formats.

"Index properties of images embedded in documents" -- Controls indexing and display of image metadata for images that are embedded in other documents.  By default, image metadata for embedded images is not indexed or displayed.  This setting does not affect indexing or display of metadata in image files that are not embedded in other files.

"Index filenames as text" -- Controls indexing and display of the document filename as a "Filename" field at the end of the document.  Check the "Include path information" box to include the complete path and filename in the Filename field.  

"Add file type name to documents" -- If enabled, dtSearch adds a "File Type" field containing the name of the file type (Microsoft Word, WordPerfect, etc.) to the end of each document.

"Index MIME headers in emails" -- Controls indexing and display of the complete MIME headers in Outlook and EML email files.  The MIME headers will appear at the end of the email.  This setting does not affect display of basic email metadata such as "To", "From", "Subject", etc.

dtSearch Engine

Programmers using the dtSearch Engine developer API can control indexing and display of these fields using the FieldFlags setting:

(.NET) Use the Options.FieldFlags property

(C/C++) Use the fieldFlags member of dtsOptions

(Java) Use the fieldFlags member of the Options object
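FieldFlags is a bit mask, so individual flags are combined with bitwise OR. The sketch below uses Python purely for illustration. It defines the two flag values named in this article (dtsoFfSkipFilenameField = 1, dtsoFfSkipDocumentProperties = 2) as local constants and shows how to combine and decode them. The class and helper names are hypothetical and are not part of the dtSearch API; in real code, use the constants exposed by your .NET, C/C++, or Java binding.

```python
from enum import IntFlag


class FieldFlags(IntFlag):
    """Local mirror of the two FieldFlags bits named in this article.

    Hypothetical stand-in: the real constants come from the dtSearch
    Engine binding for your language (.NET, C/C++, or Java).
    """
    dtsoFfSkipFilenameField = 1       # omit the "Filename" field
    dtsoFfSkipDocumentProperties = 2  # omit document properties / summary info


def build_field_flags(skip_filename: bool, skip_properties: bool) -> int:
    """Combine the requested flags into a single FieldFlags integer."""
    flags = FieldFlags(0)
    if skip_filename:
        flags |= FieldFlags.dtsoFfSkipFilenameField
    if skip_properties:
        flags |= FieldFlags.dtsoFfSkipDocumentProperties
    return int(flags)


def describe_field_flags(value: int) -> list[str]:
    """Return the names of the documented flags that are set in value."""
    return [f.name for f in FieldFlags if value & f]
```

For example, `build_field_flags(True, True)` yields 3, the same value used in the dtsearch_options.html sample in this article.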

In dtSearch Desktop, a registry entry can also be used to change the FieldFlags setting. The key is: HKEY_CURRENT_USER\Software\dtSearch Corp.\dtSearch\Settings\FieldFlags
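As an illustration, the sketch below builds the text of a .reg file that sets this entry, so the change can be reviewed before it is imported with Registry Editor. It assumes that FieldFlags is stored as a DWORD value named FieldFlags under the ...\dtSearch\Settings key; that storage type is an assumption, so compare it with the existing entry in Registry Editor before importing. The function name is hypothetical.

```python
# Settings key from the article; FieldFlags is ASSUMED to be a DWORD value
# stored under this key (verify in Registry Editor before importing).
SETTINGS_KEY = r"HKEY_CURRENT_USER\Software\dtSearch Corp.\dtSearch\Settings"


def field_flags_reg_file(field_flags: int) -> str:
    """Build .reg file text that sets FieldFlags (assumed REG_DWORD)."""
    if not 0 <= field_flags <= 0xFFFFFFFF:
        raise ValueError("FieldFlags must fit in a 32-bit DWORD")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{SETTINGS_KEY}]\r\n"
        f'"FieldFlags"=dword:{field_flags:08x}\r\n'
    )
```

For example, `field_flags_reg_file(3)` produces an entry that skips both the Filename field (1) and document properties (2). Version 5.00 .reg files are Unicode, so save the text with `encoding="utf-16"` and `newline=""` before importing it.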

The dtSearch Engine does not permanently store the FieldFlags value anywhere, so if you set Options.FieldFlags before indexing documents, you will need to set it to the same value before invoking FileConverter to highlight hits.  
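Because the engine does not remember the value, one option is for the application to record it alongside the index. The sketch below keeps the FieldFlags value in a small JSON file in the index folder and reads it back before highlighting. The file name `dtsearch_fieldflags.json` and both helper functions are hypothetical; dtSearch does not create or read this file, and the actual assignment to Options.FieldFlags would still happen in your .NET, C/C++, or Java code.

```python
import json
from pathlib import Path

# Hypothetical sidecar file kept in the index folder by your application;
# dtSearch itself does not create or read this file.
SIDECAR_NAME = "dtsearch_fieldflags.json"


def save_field_flags(index_dir: str, field_flags: int) -> None:
    """Record the FieldFlags value used when the index was built."""
    path = Path(index_dir) / SIDECAR_NAME
    path.write_text(json.dumps({"FieldFlags": field_flags}), encoding="utf-8")


def load_field_flags(index_dir: str, default: int = 0) -> int:
    """Return the FieldFlags value recorded for this index.

    Falls back to `default` when nothing was recorded. Set Options.FieldFlags
    to the returned value before invoking FileConverter so that the
    highlighted document matches the text that was indexed.
    """
    path = Path(index_dir) / SIDECAR_NAME
    if not path.exists():
        return default
    return int(json.loads(path.read_text(encoding="utf-8"))["FieldFlags"])
```

Keeping the value next to the index means indexing and highlighting code read it from one place, rather than relying on two hard-coded constants staying in sync.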

dtSearch Web and dtSearch Publish

The same FieldFlags values used in the dtSearch Engine can also be used with dtSearch Web and dtSearch Publish. To specify the FieldFlags to apply when displaying documents, edit the dtsearch_options.html file for your search form and look for the FieldFlags section of the file:

<BR><HR><I>Field flags: </I>
<!-- $Begin FieldFlags -->
3
<!-- $End -->

The value shown, 3, indicates that both dtsoFfSkipDocumentProperties (2) and dtsoFfSkipFilenameField (1) are set. FieldFlags is a bit mask, so the values of the individual flags are combined (2 + 1 = 3).
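If you maintain several search forms, the FieldFlags section can be updated by a script instead of by hand. The sketch below assumes the section looks like the fragment shown earlier: a number between the `<!-- $Begin FieldFlags -->` and `<!-- $End -->` comments, possibly surrounded by whitespace or line breaks. It reads and replaces that number with a regular expression. The function names are hypothetical.

```python
import re

# Matches the number between the FieldFlags markers, allowing whitespace
# and line breaks around it. Group 2 is the FieldFlags value.
_FIELD_FLAGS_RE = re.compile(
    r"(<!--\s*\$Begin FieldFlags\s*-->\s*)(\d+)(\s*<!--\s*\$End\s*-->)"
)


def get_field_flags(html: str) -> int:
    """Return the FieldFlags value from dtsearch_options.html text."""
    match = _FIELD_FLAGS_RE.search(html)
    if match is None:
        raise ValueError("FieldFlags section not found")
    return int(match.group(2))


def set_field_flags(html: str, field_flags: int) -> str:
    """Return a copy of the text with the FieldFlags value replaced."""
    # A function replacement avoids backreference ambiguity such as "\11".
    new_html, count = _FIELD_FLAGS_RE.subn(
        lambda m: f"{m.group(1)}{field_flags}{m.group(3)}", html, count=1
    )
    if count == 0:
        raise ValueError("FieldFlags section not found")
    return new_html
```

For example, `set_field_flags(text, 1)` would keep document properties visible but drop the Filename field, leaving the rest of the file unchanged.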