Multithreaded operations

Article: dts0233

Applies to: dtSearch Engine

The dtSearch Engine supports multithreaded indexing, searching, and file conversion. As a general rule, any API function can be called from any thread at any time. However, access to API objects such as SearchResults, IndexJob, and FileConvertJob is not synchronized, so a single API object should not be accessed from more than one thread at a time.
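One common way to follow this rule is thread confinement: each thread creates and uses its own job object, and only the final results cross thread boundaries under a lock. The following is a minimal sketch of that pattern in Python. HypotheticalSearchJob is an invented stand-in for any unsynchronized object and is not part of the dtSearch Engine API.

```python
import threading


class HypotheticalSearchJob:
    """Stand-in for an unsynchronized job object (NOT the dtSearch API)."""

    def __init__(self, request):
        self.request = request
        self.hits = []

    def execute(self, documents):
        # Mutates internal state without locking, so the object
        # must only ever be touched by one thread.
        self.hits = [d for d in documents if self.request in d]
        return len(self.hits)


def run_searches(requests, documents):
    """Run one search per thread, each with its own private job object."""
    counts = {}
    counts_lock = threading.Lock()

    def worker(request):
        # Created and used on this thread only.
        job = HypotheticalSearchJob(request)
        n = job.execute(documents)
        # The shared dict is the only cross-thread state, so guard it.
        with counts_lock:
            counts[request] = n

    threads = [threading.Thread(target=worker, args=(r,)) for r in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```

Because no job object is ever shared, the jobs themselves need no locking; the only synchronization is around the result dictionary.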

Multithreaded Indexing

dtSearch includes support for using multiple threads to build a single index, greatly improving indexing speed on 64-bit Windows and Linux systems. At least 16 GB of memory is required. (For example, on an i9-12900K Windows computer with 64 GB of memory, indexing 97 GB of documents was about 6 times faster with the multithreaded indexer enabled.)

In dtSearch Desktop/Network, click Options > Preferences > Indexing Resources to enable this option.

Using the dtSearch Engine API, set the new flag dtsIndexMultithread in IndexJob. For sample code, see the FolderDataSource sample application in the examples\NetStd folder.

For best performance, this feature should be used in combination with an external scalable allocator such as the Intel TBB library. For more information, see External allocator integration.

Concurrent index access

Any number of SearchJobs can target the same index, or different indexes, concurrently in different threads or in different processes. Searching is done without any need for file or record locks, so aside from the need to share CPU and other hardware resources, one search user has no effect on another concurrent search user.

The rules for concurrent access to indexes are: any number of users can search an index at the same time, even while it is being updated, merged, or compressed, but only one user at a time can have write access to an index. Therefore, while an index is being updated, merged, or compressed, no other user can modify that index. A "user" is a single process or thread, so in a multithreaded application, only one thread at a time can have write access to an index.

A user who starts a search while an index is being updated will see the index as it existed before the update started. Once an index update commits, searches that begin after the commit will see changes made to the index by that update. If an update is terminated abnormally, the index reverts to its state as of the last successful commit. Searches will remain unaffected by the failed update. At the start of the next index update, any invalid data added by the failed update will be removed before new data is added. For information on resource contention issues when running multiple concurrent indexing operations, please see Optimizing indexing of large document collections.
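The access and visibility rules above can be pictured with a small toy model: searches read the last committed state without taking a lock, one writer at a time prepares changes privately, and an aborted update leaves the committed state untouched. This is only an illustrative sketch. ToyIndex and ToyUpdate are invented names, and the model makes no claim about how dtSearch actually stores or commits index data.

```python
import threading


class ToyIndex:
    """Illustrative model only; not how dtSearch stores indexes."""

    def __init__(self):
        self._committed = frozenset()          # last successfully committed state
        self._writer_lock = threading.Lock()   # at most one writer at a time

    def snapshot(self):
        # A search sees the committed state as of the moment it starts.
        # No lock is taken, so searches never block each other or the writer.
        return self._committed

    def begin_update(self):
        if not self._writer_lock.acquire(blocking=False):
            raise RuntimeError("index is already open for writing")
        return ToyUpdate(self)


class ToyUpdate:
    """Pending changes that stay invisible to searches until commit()."""

    def __init__(self, index):
        self._index = index
        self._pending = set(index._committed)

    def add(self, doc):
        self._pending.add(doc)

    def commit(self):
        # Publish the new state in one step; later searches will see it.
        self._index._committed = frozenset(self._pending)
        self._index._writer_lock.release()

    def abort(self):
        # Discard pending changes; the committed state is unchanged.
        self._index._writer_lock.release()
```

In this model, a snapshot taken before commit() keeps showing the old contents, a second begin_update() fails while one is in progress, and abort() behaves like an update that terminated abnormally.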

Options

Option settings are maintained separately for each thread, so changes to options in one thread will not affect a job in progress on another thread. When a new thread is started, it will inherit the most recent option settings from other threads in the process.
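As a rough mental model of this behavior, the sketch below keeps a private copy of the options for each thread and a process-wide record of the most recently saved settings, which a new thread copies the first time it reads its options. All names here (get_options, set_option, the "fuzziness" option) are hypothetical and are not part of the dtSearch Engine API. The sketch also copies settings on first use rather than at thread start, which is a simplification.

```python
import threading

_latest_lock = threading.Lock()
_latest = {"fuzziness": 0}       # hypothetical option; most recent settings in the process
_local = threading.local()       # holds each thread's private copy


def get_options():
    """Return this thread's private options, copying the latest settings on first use."""
    if not hasattr(_local, "options"):
        with _latest_lock:
            _local.options = dict(_latest)
    return _local.options


def set_option(name, value):
    """Change an option for the calling thread only, and record it as the most recent."""
    opts = get_options()
    opts[name] = value
    with _latest_lock:
        _latest.clear()
        _latest.update(opts)
```

A change made through set_option in one thread does not alter the dictionary another thread already holds, so a job in progress elsewhere keeps the settings it started with.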