Interface for the DataSourceToIndex member of IndexJob, for indexing non-file data sources such as databases.
An IndexJob provides two ways to specify the text to index: by files (the FoldersToIndex, IncludeFilters, and ExcludeFilters properties) and by data source (the DataSourceToIndex property). Most commonly, the text exists in disk files, in which case you would specify the files to be indexed using folder names and include and exclude filters.
In some situations, however, the text to be indexed may not be readily available as disk files. For example, the text may exist as rows in a remote SQL database or in Microsoft Exchange message stores. To supply this text to the dtSearch indexing engine, you can create an object that accesses the text and then attach the object to an IndexJob as the DataSourceToIndex property.
The dtSearch Engine will call the GetNextDoc method of your DataSource implementation to obtain documents to index. On each call, dtSearch will use the properties supplied (DocName, DocModifiedDate, DocFields, DocBytes, etc.) to set up a document object to index.
On each call to GetNextDoc, the DocTypeId, DocId, and DocWordCount properties will be filled in with the results of the previous document indexed. This enables the calling application to know the file type and document id assigned to each document after it has been indexed. (The document id is a unique integer identifying each document in an index, and can be used in SearchFilter objects to limit searches to a subset of the documents in the index.)
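The call sequence described above can be sketched in a language-neutral way. The following Python model is only an illustration under stated assumptions: RowDataSource and toy_engine_index are hypothetical names, DocFields is simplified to a dictionary, and the DocTypeId value is a placeholder. It is not the dtSearch API. It shows that feedback for a document becomes visible at the start of the next GetNextDoc call, including the final call that reports no more documents.

```python
class RowDataSource:
    """Hypothetical stand-in for a DataSource implementation (not the dtSearch class)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0
        self._last_name = None
        # Properties the data source supplies for each document.
        self.DocName = ""
        self.DocModifiedDate = None
        self.DocFields = {}   # simplified to a dict for this sketch
        self.DocBytes = b""
        # Properties the engine fills in for the previously indexed document.
        self.DocId = 0
        self.DocTypeId = 0
        self.DocWordCount = 0
        # (name, doc id) pairs collected from the engine's feedback.
        self.indexed = []

    def GetNextDoc(self):
        # Results for the previous document are available at the start of this call.
        if self._last_name is not None:
            self.indexed.append((self._last_name, self.DocId))
            self._last_name = None
        if self._pos >= len(self._rows):
            return False  # no more documents
        row = self._rows[self._pos]
        self._pos += 1
        self.DocName = f"rows/{row['id']}.txt"
        self.DocModifiedDate = row.get("modified")
        self.DocFields = {"RowId": str(row["id"])}
        self.DocBytes = row["body"].encode("utf-8")
        self._last_name = self.DocName
        return True


def toy_engine_index(source):
    """Toy model of the engine loop: pull documents until GetNextDoc returns False."""
    next_id = 1
    while source.GetNextDoc():
        text = source.DocBytes.decode("utf-8")
        # After "indexing", report the results back through the source's properties.
        source.DocId = next_id
        source.DocTypeId = 1  # placeholder value, not a real type id
        source.DocWordCount = len(text.split())
        next_id += 1
    return next_id - 1  # number of documents indexed


rows = [{"id": 1, "body": "hello world"}, {"id": 2, "body": "one two three"}]
src = RowDataSource(rows)
count = toy_engine_index(src)
```

In this model the data source records each document id only on the following call, which mirrors the "results of the previous document" behavior described above.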
If the IndexingFlags.dtsAlwaysAdd flag is not set in the IndexJob, the DocName and DocModifiedDate will be used to determine whether the document is already in the index with the same date, and, if so, the document will not be reindexed. In this case, the DocTypeId, DocId, and DocWordCount properties will be set to the values assigned when the document was originally indexed.
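The skip rule above can be illustrated with a small decision function. This is a minimal sketch, assuming a toy index represented as a dictionary keyed by document name; decide_reindex and the dictionary layout are hypothetical and do not reflect how dtSearch stores its index.

```python
from datetime import datetime


def decide_reindex(index, doc_name, doc_modified, always_add=False):
    """Return (reindex, reported_doc_id) for one document.

    index maps a document name to {"modified": datetime, "doc_id": int};
    this layout is an assumption made for the sketch.
    """
    entry = index.get(doc_name)
    if not always_add and entry is not None and entry["modified"] == doc_modified:
        # Already indexed with the same date: skip it and report the id
        # that was assigned when the document was originally indexed.
        return False, entry["doc_id"]
    # New, changed, or forced: the document is indexed and the engine
    # assigns its id, so none is known at this point.
    return True, None


toy_index = {"rows/1.txt": {"modified": datetime(2024, 1, 1), "doc_id": 7}}
unchanged = decide_reindex(toy_index, "rows/1.txt", datetime(2024, 1, 1))
changed = decide_reindex(toy_index, "rows/1.txt", datetime(2024, 2, 1))
forced = decide_reindex(toy_index, "rows/1.txt", datetime(2024, 1, 1), always_add=True)
```

Here the always_add argument plays the role of the IndexingFlags.dtsAlwaysAdd flag: when it is set, the name and date comparison is bypassed.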
DocName should preferably include the extension of the original file, because the extension provides useful information about the original document format in some cases.
Note: The IncludeFilters and ExcludeFilters in IndexJob do not apply to content returned from a data source.
The DocFields property lets you add meta-data to the document text. Fields can be searchable or non-searchable, and can be designated as "stored" so they will be returned as document properties in search results (for example, to store a row id for easy access after a search). Field names can also include nesting, so instead of just "Author" or "Subject" you could use "Meta/Author" and "Meta/Subject".
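As an illustration of nested field names, the sketch below builds name and value pairs with a "Meta/" prefix from one database row. The helper nested_fields is hypothetical, and the sketch deliberately stops at a list of pairs: how those pairs are encoded into the DocFields property is not covered in this section and is not assumed here.

```python
def nested_fields(group, values):
    """Prefix each field name with a group, producing names like "Meta/Author"."""
    return [(f"{group}/{name}", str(value)) for name, value in values.items()]


row_metadata = {"Author": "Smith", "Subject": "Indexing"}
pairs = nested_fields("Meta", row_metadata)
```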
For sample code demonstrating the DataSource API, see:
C:\Program Files\dtSearch Developer\examples\cs2\ado_demo (C# sample) and
C:\Program Files\dtSearch Developer\examples\vb.net2\ado_demo (VB.NET sample)
See also: Indexing Databases in dtSearchApiRef.chm