Using dtSearch with network storage devices

Last Reviewed: March 15, 2016

Article: dts0221

Indexing data on network drives

dtSearch can index documents in any accessible network share, and only read access is needed.  When indexing data on a network drive, each document is read once, so the network traffic generated is approximately equal to the total size of the documents being indexed.  Because the documents are opened read-only, indexing cannot damage them.
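As a rough way to gauge that traffic before indexing, the total size of the files under a share can be added up.  The following Python sketch is illustrative only and is not part of dtSearch; the function name is hypothetical, and it assumes every file under the folder will be indexed (file filters would lower the figure).

```python
import os


def estimate_read_bytes(root):
    """Return the total size in bytes of all files under root.

    Assumption: every file found will be indexed and read once, so this
    total approximates the network traffic generated by indexing.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # Skip files that disappear or cannot be accessed.
                continue
    return total
```

For example, estimate_read_bytes(r"\\server\share\docs") (a placeholder path) returns the byte count for that folder tree.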

Building indexes on network drives

Indexes can be located in any writable folder, including network drives and external drives.  

Building an index requires a great deal of network I/O, and data is both read and written.  Data errors can occur when writing across a network connection, and they are much more likely when the amount of network I/O exceeds the capacity of the network or storage hardware.  Writes to external devices can also fail on the device without any notification to the calling application (for example, delayed write errors on network drives).  Because of this, network I/O errors can cause corrupt indexes even though dtSearch uses a transaction wrapper to protect the index against failed updates.

To minimize the risk of index corruption when building an index on a network drive:

(1) Configure dtSearch to store the temporary files created during indexing on a local drive.  This will reduce the amount of network I/O required by over 50%.  In dtSearch Desktop, click Options > Preferences > Indexing Resources, and set the location for "Temporary Files" to a folder on the C: drive or another internal drive.  In applications using the dtSearch Engine API, use IndexJob.TempFileDir to specify a folder that is located on an internal drive.

(2) Avoid updating multiple indexes on the same network drive at the same time.  
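For step (1), an application could check a candidate temporary folder before assigning it to IndexJob.TempFileDir.  The Python sketch below is a minimal illustration with hypothetical function names; it does not call the dtSearch API.  It only recognizes UNC paths (\\server\share\...), so a drive letter mapped to a network share would not be detected.

```python
import ntpath


def is_unc_path(path):
    """Return True if path is a UNC path such as \\\\server\\share\\dir.

    Limitation: a mapped drive letter (for example Z:) that points to a
    network share is not detected by this string-based check.
    """
    drive, _rest = ntpath.splitdrive(path)
    return drive[:2] in ("\\\\", "//")


def pick_local_temp_dir(candidates):
    """Return the first candidate that is not a UNC path, or None."""
    for path in candidates:
        if not is_unc_path(path):
            return path
    return None
```

The value returned by pick_local_temp_dir would then be used as the temporary file folder.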

On external drives, data errors are more likely with USB and FireWire drives.  We have not had reports of data errors with eSATA drives.

For more information on indexing large document collections, see "Optimizing indexing of large document collections".

Symptoms of network indexing problems

The exceptions.ix file in the index folder records the I/O errors that dtSearch detected during indexing and was able to log.  Errors listed in exceptions.ix do not necessarily mean that the index is corrupt, because they are the errors that dtSearch was able to detect and handle.

The most common symptom of network indexing problems is the error "The specified network name is no longer available (64)" during an index update.  This error indicates that dtSearch detected that the network connection was lost during the indexing operation and cancelled the index update.  The error will only be logged in exceptions.ix if the dropped connection was intermittent (otherwise dtSearch would have no way to write the message to the log).
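A simple check for this symptom is to search the log for the error text.  The Python sketch below assumes that exceptions.ix can be read as text with one entry per line; this article does not describe the file's layout, so treat that as an assumption.  The function and constant names are hypothetical.

```python
NETWORK_DROP_MESSAGE = "The specified network name is no longer available"


def find_network_drop_lines(lines):
    """Return the lines that mention the dropped-connection error.

    Assumption: the log is line-oriented text.  Matching is
    case-insensitive and ignores the trailing "(64)" error code.
    """
    needle = NETWORK_DROP_MESSAGE.lower()
    return [line.rstrip("\r\n") for line in lines if needle in line.lower()]
```

A caller might open the file with open(path, errors="replace") and pass the file object to find_network_drop_lines.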

Searching indexes on network drives

Searching reads from indexes but does not write anything, so delayed write errors are not possible.  However, some Windows network settings can cause intermittent "Unable to access index" errors when searching indexes that receive a high volume of concurrent updates.  Windows can be set to cache network metadata for relatively long periods of time, which can prevent search users from seeing a consistent view of the index folder.  If intermittent "Unable to access index" errors occur during searches, and if the index is being updated frequently, change the DirectoryCacheLifetime, FileNotFoundCacheLifetime, and FileInfoCacheLifetime settings as described in this Microsoft article:

SMB2 Client Redirector Caches Explained
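As an illustration of what such a change might look like, the Python sketch below builds "reg add" command lines for the three settings.  The registry key path and the choice of 0 (intended to disable the cache) are assumptions that should be verified against the Microsoft article before use; the function name is hypothetical, and the sketch only builds the command text without running anything.

```python
# Assumed location of the SMB client settings; verify against the article.
SMB_CLIENT_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters"

CACHE_SETTINGS = (
    "DirectoryCacheLifetime",
    "FileNotFoundCacheLifetime",
    "FileInfoCacheLifetime",
)


def build_reg_commands(seconds=0):
    """Return one 'reg add' command per cache setting.

    seconds is the cache lifetime to set; 0 is assumed to disable caching.
    """
    if seconds < 0:
        raise ValueError("seconds must be zero or greater")
    return [
        f'reg add "{SMB_CLIENT_KEY}" /v {name} /t REG_DWORD /d {seconds} /f'
        for name in CACHE_SETTINGS
    ]
```

The resulting commands would be run from an elevated command prompt on each computer that searches the index.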


Copyright © 1991-2022 dtSearch Corp. All Rights Reserved.