How to exclude folders from an index

Last Reviewed: August 4, 2013

Article: DTS0135

Applies to: dtSearch Desktop

When you specify a folder to be indexed in dtSearch, by default all subfolders of that folder will also be indexed. You can change this and have dtSearch index just the top-level folder, but sometimes you may want to exclude a few folders from a large tree of folders and subfolders.

To do this, use the Exclude filters in the Update Index dialog box. In addition to the usual type of filters -- *.exe, *.tif, etc. -- you can include filters with directory information. For example, to exclude everything in the c:\docs\archive\oldfiles folder, add the exclude filter "c:\docs\archive\oldfiles\*".

If a filter contains a space, it must be quoted, like this: "*\name with space\*".
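
As an illustration only, the effect of the quoting rule can be sketched in Python. The helper name split_filters and the regular expression are assumptions made for this example; they are not dtSearch's actual parser. The sketch treats a filter list as space-separated items, where text inside quotation marks stays together as one filter.

```python
import re

def split_filters(filter_text):
    """Split a space-separated filter list, keeping quoted filters whole.

    Hypothetical helper for illustration; not part of dtSearch.
    """
    # Either a run of characters inside quotation marks, or a run of
    # non-space characters.
    pairs = re.findall(r'"([^"]*)"|(\S+)', filter_text)
    return [quoted or bare for quoted, bare in pairs]

# Quoted: one filter. Unquoted: the same text splits at each space.
quoted = split_filters(r'*.exe "*\name with space\*"')
unquoted = split_filters(r'*.exe *\name with space\*')
print(len(quoted), len(unquoted))  # 2 4
```

In the unquoted case the folder filter breaks into three unrelated filters, which is why the quotation marks are required.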

Exclude filters also apply to files inside a ZIP archive. For example, if you exclude *.doc from an index, and a ZIP archive contains sample.doc, then the indexer will skip sample.doc because of the exclude filter, even though it is inside a ZIP.

Additional Information

When a filename filter includes a \, dtSearch will compare it to the full path and filename of a document, but if the filename filter does not include a \, dtSearch will compare it to the filename alone. Examples:

s*.doc

Matches c:\docs\sample.doc, but not s:\docs\other.doc

*\oldfiles\*

Matches any file in a folder named oldfiles

"*\Old Files\*"

Matches any file in a folder named "Old Files". The quotation marks are needed because "Old Files" has a space, and without the quotation marks, dtSearch will see the filter as two separate filters, *\Old and Files\*.
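
The comparison rule above can be approximated in Python as a rough sketch. The function name filter_matches is hypothetical, Python's fnmatch module stands in for dtSearch's own wildcard matching, and case-insensitive comparison is an assumption; dtSearch's actual behavior may differ in details.

```python
import fnmatch
import ntpath

def filter_matches(filename_filter, full_path):
    """Approximate the rule: a filter containing a backslash is compared
    to the full path, otherwise to the filename alone.

    Hypothetical helper for illustration; not part of dtSearch.
    """
    if "\\" in filename_filter:
        target = full_path
    else:
        # ntpath handles Windows-style paths on any platform.
        target = ntpath.basename(full_path)
    # Assumption: matching ignores case, as Windows paths usually do.
    return fnmatch.fnmatchcase(target.lower(), filename_filter.lower())

print(filter_matches("s*.doc", r"c:\docs\sample.doc"))                  # True
print(filter_matches("s*.doc", r"s:\docs\other.doc"))                   # False
print(filter_matches(r"*\oldfiles\*", r"c:\docs\oldfiles\report.txt"))  # True
```

The second call returns False because the filter has no backslash, so only the filename other.doc is tested and the s: drive letter is never seen.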

Web Sites and the dtSearch Spider

You can use the same type of filters to control which pages of a web site the dtSearch Spider will visit. For example, if you have a filename filter "*/OnlyThisFolder/*", then only documents under the "OnlyThisFolder" folder will be indexed.

Additionally, the dtSearch Spider obeys the instructions in a robots.txt file if one is present on the web site, and it also obeys instructions in a robots META tag if one is present in an HTML file. This makes it possible to create a page of links that dtSearch will follow without indexing the page itself (for example, with a robots META tag whose content is "noindex, follow"). For more information on robots.txt and robots META tags, see:

http://www.robotstxt.org/wc/exclusion.html
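
To see how robots.txt rules are typically evaluated, the following sketch uses the robots.txt parser from Python's standard library. It shows the general exclusion rules only and is not the dtSearch Spider's own code; the rule text, the user-agent name "ExampleSpider", and the example.com URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: keep all crawlers out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def spider_may_visit(url, user_agent="ExampleSpider"):
    """Return True if the rules above allow this user agent to fetch url."""
    return parser.can_fetch(user_agent, url)

print(spider_may_visit("http://www.example.com/private/page.html"))  # False
print(spider_may_visit("http://www.example.com/public/page.html"))   # True
```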