Caching text and documents in an index

Article: dts0226

In addition to storing word locations to enable fast searching, dtSearch indexes can also store the text of documents so that they open faster after a search.  A dtSearch index can optionally store documents in either or both of two ways: (1) the entire original file can be stored, or (2) just the text of the file can be stored.  Option settings in the "Create Index (Advanced)" dialog box enable these features when an index is created.

Storing the text of documents makes generation of search reports much faster, including generation of the brief hits-in-context synopsis in search results.

Storing complete documents is useful in situations where the documents may not be accessible at search time, or where access to the documents may be slow or unreliable.  Examples include:

- Indexes of web sites created using the dtSearch Spider

- Indexes of Outlook message stores

- Indexes of network shares that may be offline or inaccessible for other reasons

Performance implications of caching documents and text 

Search speed:  No effect

Search reports:  Substantially faster if text is stored; no effect if only complete documents are cached

Opening documents after a search:  Can be substantially faster if complete documents are cached and access to the original documents is slow (for example, on a web site).

Indexing speed:  Indexing will be slower due to the need to compress and store additional data in the index.

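Index size:  The index will be larger because it contains the cached data.  Cached documents and text are compressed using ZIP compression to limit the additional space required.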
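To get a rough idea of how much compressed data caching might add to an index, the sketch below compresses a set of files with Python's zlib module, which implements DEFLATE, the compression method most commonly used in ZIP files.  This is an assumption-laden estimate, not dtSearch's actual storage format: the function name `estimate_cached_text_size` is hypothetical, and the real overhead depends on dtSearch's own storage and compression settings.

```python
import zlib
from pathlib import Path


def estimate_cached_text_size(folder: str, pattern: str = "*.txt") -> tuple[int, int]:
    """Return (original_bytes, compressed_bytes) for files matching pattern.

    Hypothetical helper: compresses each matching file under folder (recursively)
    with DEFLATE via zlib as a rough stand-in for ZIP compression. It does NOT
    reproduce dtSearch's internal index format.
    """
    original = 0
    compressed = 0
    for path in Path(folder).rglob(pattern):
        if path.is_file():
            data = path.read_bytes()
            original += len(data)
            # Level 6 is zlib's usual default trade-off between speed and size.
            compressed += len(zlib.compress(data, 6))
    return original, compressed
```

For example, `estimate_cached_text_size(r"C:\Docs", "*.*")` approximates the cost of caching complete original files.  For cached text of formats such as PDF or Word, the extracted text differs from the file bytes, so the result is only a rough guide.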

Security implications of caching documents and text 

A user who is able to search an index will also be able to open any documents that are cached in the index.  Therefore, if documents that are subject to security restrictions are stored in an index, the same security restrictions should be applied to the index folder.
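As one illustration of checking this on a POSIX system, the sketch below compares the permission bits of a document folder and an index folder and reports any read access that the index folder grants more broadly.  The function `index_more_permissive` is hypothetical, and the check is deliberately narrow: it looks only at POSIX group and others read bits, and does not reflect Windows NTFS ACLs, network share permissions, or ownership differences.

```python
import os
import stat


def index_more_permissive(doc_folder: str, index_folder: str) -> list[str]:
    """Hypothetical check: list read permissions that index_folder grants
    but doc_folder does not, using POSIX mode bits only."""
    checks = {"group read": stat.S_IRGRP, "others read": stat.S_IROTH}
    doc_mode = os.stat(doc_folder).st_mode
    idx_mode = os.stat(index_folder).st_mode
    # Report a permission only if the index allows it and the documents do not.
    return [name for name, bit in checks.items()
            if (idx_mode & bit) and not (doc_mode & bit)]
```

An empty result means the index folder is at least as restrictive as the document folder for these bits; anything else suggests that cached documents may be readable by users who cannot read the originals.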

See Also

Caching Documents (Developer API Reference)