dtSearch .NET Standard API 2024.02
DataSource Interface

Interface for the DataSourceToIndex property of IndexJob, used to index non-file data sources such as databases.

dtSearch.Engine.DataSource
public interface DataSource;
Overview

An IndexJob provides two ways to specify the text to index: by files (the FoldersToIndex, IncludeFilters, and ExcludeFilters properties) and by data source (the DataSourceToIndex property). Most commonly, the text exists in disk files, in which case you specify the files to index using folder names and include and exclude filters.

In some situations, however, the text to be indexed may not be readily available as disk files. For example, it may exist as rows in a remote SQL database or in Microsoft Exchange message stores. To supply this text to the dtSearch indexing engine, create an object that implements DataSource and retrieves the text, then assign that object to the IndexJob's DataSourceToIndex property.

GetNextDoc

The dtSearch Engine calls the GetNextDoc method of your DataSource implementation to obtain documents to index. On each call, dtSearch uses the properties your implementation supplies (DocName, DocModifiedDate, DocFields, DocBytes, and so on) to set up a document object to index.

On each call to GetNextDoc, the DocTypeId, DocId, and DocWordCount properties will be filled in with the results of the previous document indexed. This enables the calling application to know the file type and document id assigned to each document after it has been indexed. (The document id is a unique integer identifying each document in an index, and can be used in SearchFilter objects to limit searches to a subset of the documents in the index.) 
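The C# sketch below illustrates the call order described above, under stated assumptions. To stay runnable without the dtSearch assembly, `RowDataSource` reuses member names from this page but does not implement `dtSearch.Engine.DataSource` (a real implementation would, and the interface definition should be checked for its full member list and exact signatures). `Row` is a hypothetical stand-in for a database record, and `FakeIndexer` is a hypothetical stand-in for the engine, not dtSearch code. The sketch also assumes that GetNextDoc returns false once no documents remain. Because results arrive one call late, the id of the last document is only available on the final call, the one that returns false.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a database row.
public record Row(string Key, DateTime Modified, string Body);

// Sketch only: uses member names from the DataSource documentation, but is not
// the real dtSearch.Engine.DataSource interface (check it for exact signatures).
public class RowDataSource
{
    private readonly IReadOnlyList<Row> rows;
    private int next;                 // index of the next row to hand out
    private bool hasPrevious;         // true once a document has been returned
    private string previousKey = "";  // DocName of the previously returned document

    // Document ids reported back by the indexer, keyed by DocName.
    public Dictionary<string, int> IdsByName { get; } = new Dictionary<string, int>();

    // Set by the data source; read by the indexer to build the next document.
    public string DocName { get; set; } = "";
    public DateTime DocModifiedDate { get; set; }
    public string DocText { get; set; } = "";

    // Set by the indexer with the results for the previously returned document.
    public int DocId { get; set; }
    public int DocTypeId { get; set; }
    public int DocWordCount { get; set; }

    public RowDataSource(IReadOnlyList<Row> rows) => this.rows = rows;

    public bool GetNextDoc()
    {
        // DocId now describes the document handed out on the previous call.
        if (hasPrevious)
            IdsByName[previousKey] = DocId;

        if (next >= rows.Count)
        {
            hasPrevious = false;
            return false; // assumption: false signals that no documents remain
        }

        Row row = rows[next++];
        DocName = row.Key;
        DocModifiedDate = row.Modified;
        DocText = row.Body;
        previousKey = row.Key;
        hasPrevious = true;
        return true;
    }
}

// Hypothetical stand-in for the engine's side of the loop; not dtSearch code.
public static class FakeIndexer
{
    public static void Run(RowDataSource source)
    {
        int nextId = 1;
        while (source.GetNextDoc())
        {
            // "Index" the document, then report results before the next call.
            source.DocId = nextId++;
            source.DocWordCount = source.DocText
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Length;
        }
    }
}
```

The design point is that `previousKey` remembers which document the incoming DocId belongs to; without it, the id reported on each call could not be matched back to its row.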

If the IndexingFlags.dtsAlwaysAdd flag is not set in the IndexJob, the DocName and DocModifiedDate will be used to determine whether the document is already in the index with the same date, and, if so, the document will not be reindexed. In this case, the DocTypeId, DocId, and DocWordCount properties will be set to the values assigned when the document was originally indexed. 

When using the multithreaded DataSource API, the indexer indexes every document returned from GetNextDoc, even if it has not changed since it was last indexed. To prevent redundant indexing, the application should return only new or modified documents from its DataSource.
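One way to follow this advice is to keep a watermark (the latest modification time already indexed) and hand the indexer only rows changed after it. The C# sketch below shows that filtering step in isolation. `SourceRow`, `ChangeFilter`, and the watermark itself are hypothetical names, not part of the dtSearch API, and in a real database application the comparison would normally be done in the query's WHERE clause rather than in memory. It assumes modification times are stored in UTC and that a row's timestamp always increases when the row changes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database row with a UTC modification time.
public record SourceRow(string Key, DateTime ModifiedUtc, string Body);

public static class ChangeFilter
{
    // Rows modified after the watermark, oldest first. Only these would be
    // returned from GetNextDoc; unchanged rows are never handed to the indexer.
    public static List<SourceRow> ChangedSince(IEnumerable<SourceRow> rows, DateTime watermarkUtc) =>
        rows.Where(r => r.ModifiedUtc > watermarkUtc)
            .OrderBy(r => r.ModifiedUtc)
            .ToList();

    // Watermark to persist after a successful run: the newest modification time
    // indexed, or the previous watermark if nothing changed.
    public static DateTime NextWatermark(IEnumerable<SourceRow> indexed, DateTime previousUtc) =>
        indexed.Select(r => r.ModifiedUtc)
               .DefaultIfEmpty(previousUtc)
               .Max();
}
```

After a run completes, the application would persist the value returned by NextWatermark and pass it to ChangedSince at the start of the next run.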

The IncludeFilters and ExcludeFilters in IndexJob do not apply to content returned from a data source.

Fields

The DocFields property lets you attach metadata to the document text. Fields can be searchable or non-searchable, and can be designated as "stored" so that they are returned as document properties in search results (for example, to store a row id for easy access after a search). Field names can also be nested, so instead of just "Author" or "Subject" you could use "Meta/Author" and "Meta/Subject".

Plain text

The DocText property can be used to add plain-text content to the document. DocText is assumed to contain text only, so if it holds text-based formats such as RTF, HTML, or MIME-encoded email, their markup is indexed as plain text rather than being interpreted as RTF, HTML, or MIME.

See also

Overview - Indexing Databases in dtSearchApiRef.chm

Copyright (c) 1998-2023 dtSearch Corp. All rights reserved.