Last Reviewed: August 27, 2008
Article: DTS0221
Applies to: dtSearch Desktop, dtSearch Network, dtSearch Engine
Indexing data on network drives
dtSearch can index documents in any accessible network share, and only read access is needed. When indexing data on a network drive, each document indexed has to be read once, so the amount of network traffic generated will approximately equal the size of the documents being indexed. Because the documents are accessed read-only, there is no risk of damage to the documents being indexed.
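Because the traffic generated is roughly the total size of the documents, it can be estimated before indexing by summing the file sizes under the folder to be indexed. The following is a minimal sketch in Python and is not part of dtSearch; the function name estimate_read_bytes and the example share path are hypothetical.

```python
import os


def estimate_read_bytes(root: str) -> int:
    """Sum the sizes of all files under root, in bytes.

    This approximates the network traffic needed to read every
    document once. It is only an estimate of the read volume.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # Skip files that disappear or cannot be accessed.
                pass
    return total


if __name__ == "__main__":
    # Hypothetical share path; replace with the folder to be indexed.
    size = estimate_read_bytes(r"\\server\share\documents")
    print(f"Approximate data to read: {size / (1024 ** 2):.1f} MB")
```

The result is a lower bound on elapsed time only when divided by the real throughput of the network, which this sketch does not measure.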
Building indexes on network drives
Indexes can be located in any writable folder, including network drives and external drives.
Building an index on a network drive requires a great deal of network I/O, because index data is both read and written. Data errors can occur when writing across a network connection, and they are much more likely when the amount of network I/O exceeds the capacity of the network or storage hardware. A write can also fail on the external device without any notification to the calling application (for example, delayed write errors on network drives). As a result, network I/O errors can corrupt an index even though dtSearch uses a transaction wrapper to protect the index against failed updates.
To minimize the risk of index corruption when building an index on a network drive:
(1) Configure dtSearch to store the temporary files created during indexing on a local drive. This reduces the amount of network I/O required by more than 50%. In dtSearch Desktop, click Options > Preferences > Indexing Resources, and set the location for "Temporary Files" to a folder on the C: drive or another internal drive. In applications using the dtSearch Engine API, use IndexJob.TempFileDir to specify a folder that is located on an internal drive.
(2) Avoid updating multiple indexes on the same network drive at the same time.
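An application that sets IndexJob.TempFileDir may want to check first that the chosen folder is not an obvious network location. The sketch below is a hypothetical Python helper, not part of the dtSearch Engine API. It assumes Windows-style paths and only detects UNC paths of the form \\server\share; it cannot tell whether a drive letter is mapped to a network share, so that case still has to be checked by other means.

```python
from pathlib import PureWindowsPath


def is_unc_path(path: str) -> bool:
    """Return True if path is a UNC path such as \\\\server\\share\\folder.

    Limitation: a mapped drive letter (for example Z:) that points to a
    network share is NOT detected by this check.
    """
    drive = PureWindowsPath(path).drive
    return drive.startswith("\\\\")


if __name__ == "__main__":
    # Hypothetical candidate folders for temporary index files.
    for candidate in (r"C:\dtSearch\Temp", r"\\fileserver\indexes\temp"):
        kind = "network (UNC)" if is_unc_path(candidate) else "not a UNC path"
        print(f"{candidate}: {kind}")
```

A folder that passes this check could then be supplied as the temporary file location, subject to the mapped-drive limitation noted above.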
On external drives, data errors are more likely with USB and FireWire drives. We have not had reports of data errors with eSATA drives.
For more information on indexing large document collections, see Optimizing indexing of large document collections.