DataSource Interface

Interface for the DataSourceToIndex property of IndexJob, used to index non-file data sources such as databases.

dtSearch.Engine.DataSource

public interface DataSource;

Remarks

Overview

An IndexJob provides two ways to specify the text to index: by files (the FoldersToIndex, IncludeFilters, and ExcludeFilters properties) and by data source (the DataSourceToIndex property). Most commonly, the text exists in disk files, in which case you specify the files to index using folder names and include and exclude filters.

In some situations, however, the text to be indexed may not be readily available as disk files. For example, the text may exist as rows in a remote SQL database or in Microsoft Exchange message stores. To supply this text to the dtSearch indexing engine, you can create an object that accesses the text and then attach the object to an IndexJob as the DataSourceToIndex property.
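The following is a minimal sketch of such an object for the SQL case. It is a hypothetical Python model, not the dtSearch .NET API: the member names (Rewind, GetNextDoc, DocName, DocDisplayName, DocModifiedDate, DocText) mirror the documented interface, while the `articles` table, its columns, and the `articles/<id>` naming scheme are assumptions made for illustration. It also assumes GetNextDoc returns true while a document is available and false when the source is exhausted.

```python
import sqlite3
from datetime import datetime


class RowDataSource:
    """Hypothetical model of a DataSource that serves rows of a SQL table.

    Member names mirror the documented DataSource interface; this class is
    not the dtSearch API. The 'articles' table layout is an assumption.
    """

    def __init__(self, conn):
        self.conn = conn
        self._cursor = None
        self.Rewind()

    def Rewind(self):
        # Re-run the query so the next GetNextDoc() returns the first row.
        self._cursor = self.conn.execute(
            "SELECT id, title, body, modified FROM articles ORDER BY id")
        return True

    def GetNextDoc(self):
        row = self._cursor.fetchone()
        if row is None:
            return False  # no more documents
        row_id, title, body, modified = row
        self.DocName = f"articles/{row_id}"  # stable, unique name per row
        self.DocDisplayName = title
        self.DocModifiedDate = datetime.fromisoformat(modified)
        self.DocText = body
        return True
```

Using a stable name derived from the row id matters because DocName, together with DocModifiedDate, is what identifies the document on later indexing runs.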

GetNextDoc

The dtSearch Engine will call the GetNextDoc method of your DataSource implementation to obtain documents to index. On each call, dtSearch will use the properties supplied (DocName, DocModifiedDate, DocFields, DocBytes, etc.) to set up a document object to index.
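The calling pattern can be modeled roughly as follows. `ListDataSource` and `drive` are hypothetical names; `drive` only imitates the engine side by calling Rewind() once and then GetNextDoc() until it returns false, snapshotting the properties after each successful call. The assumption that the engine calls Rewind() before the first GetNextDoc() follows from the Rewind description in the methods table, but the real engine's internals may differ.

```python
class ListDataSource:
    """Hypothetical DataSource model backed by a list of dicts."""

    def __init__(self, docs):
        self.docs = list(docs)
        self.Rewind()

    def Rewind(self):
        self._pos = 0
        return True

    def GetNextDoc(self):
        if self._pos >= len(self.docs):
            return False
        doc = self.docs[self._pos]
        self._pos += 1
        # Set the properties the engine reads for this document.
        self.DocName = doc["name"]
        self.DocModifiedDate = doc.get("modified")
        self.DocFields = doc.get("fields", "")
        self.DocText = doc.get("text", "")
        return True


def drive(source):
    """Rough model of the engine loop: pull documents until GetNextDoc() is False."""
    source.Rewind()
    documents = []
    while source.GetNextDoc():
        documents.append({
            "name": source.DocName,
            "modified": source.DocModifiedDate,
            "fields": source.DocFields,
            "text": source.DocText,
        })
    return documents
```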

On each call to GetNextDoc, the DocTypeId, DocId, and DocWordCount properties will be filled in with the results for the previously indexed document. This lets the calling application learn the file type and document id assigned to each document after it has been indexed. (The document id is a unique integer identifying each document in an index, and can be used in SearchFilter objects to limit searches to a subset of the documents in the index.)
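Because results arrive one call late, a data source that wants to record the document id for each row has to remember which document it returned last. The sketch below is a hypothetical Python model: `IdTrackingSource` stores the reported DocId and DocWordCount under the previous DocName at the start of each call, including the final call that returns false. `FakeEngine` is a stand-in that assigns sequential ids purely for illustration; real document ids are assigned by the engine.

```python
class IdTrackingSource:
    """Hypothetical model: records the DocId/DocWordCount reported for each document."""

    def __init__(self, docs):
        self.docs = docs  # list of (name, text) pairs
        self.ids = {}
        self.Rewind()

    def Rewind(self):
        self._pos = 0
        self._pending = None  # name of the document whose results are still due
        return True

    def GetNextDoc(self):
        # Results visible on this call describe the previous document, if any.
        if self._pending is not None:
            self.ids[self._pending] = (self.DocId, self.DocWordCount)
            self._pending = None
        if self._pos >= len(self.docs):
            return False
        self.DocName, self.DocText = self.docs[self._pos]
        self._pos += 1
        self._pending = self.DocName
        return True


class FakeEngine:
    """Stand-in for the indexer; sequential ids are a simplification."""

    def __init__(self):
        self.next_id = 1

    def index(self, source):
        source.Rewind()
        while source.GetNextDoc():
            # Fill in results before the next GetNextDoc() call.
            source.DocId = self.next_id
            source.DocWordCount = len(source.DocText.split())
            self.next_id += 1
```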

If the IndexingFlags.dtsAlwaysAdd flag is not set in the IndexJob, the DocName and DocModifiedDate will be used to determine whether the document is already in the index with the same date, and, if so, the document will not be reindexed. In this case, the DocTypeId, DocId, and DocWordCount properties will be set to the values assigned when the document was originally indexed.
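The skip rule can be expressed as a small decision function. `ModelIndex` is a hypothetical toy that maps DocName to the stored modification date and document id; the `always_add` parameter stands in for IndexingFlags.dtsAlwaysAdd. How the real engine assigns ids to reindexed documents is not described here, so the new-id behavior in this sketch is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ModelIndex:
    """Toy stand-in for an index: DocName -> (DocModifiedDate, DocId)."""
    docs: dict = field(default_factory=dict)
    next_id: int = 1

    def add(self, name, modified, always_add=False):
        """Return (doc_id, was_indexed) for one document offered by GetNextDoc."""
        known = self.docs.get(name)
        if not always_add and known is not None and known[0] == modified:
            # Same name and same date: skip it and report the original DocId.
            return known[1], False
        # New or changed document (or always_add): index it. The id scheme is a model detail.
        doc_id = self.next_id
        self.next_id += 1
        self.docs[name] = (modified, doc_id)
        return doc_id, True
```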

When using the multithreaded DataSource API, the indexer indexes every document returned from GetNextDoc, even if it has not changed since it was last indexed. To prevent redundant indexing, the indexing application should return only new or modified documents from the DataSource.
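One common way to return only new or modified documents is to keep a timestamp of the last successful indexing run and filter on it. The sketch below is a hypothetical Python model: `ChangedRowsSource`, the row dictionaries, and the `last_run` watermark are assumptions, and persisting the watermark between runs is left to the application.

```python
from datetime import datetime


class ChangedRowsSource:
    """Hypothetical DataSource model that only yields rows changed after last_run."""

    def __init__(self, rows, last_run):
        self.rows = rows
        self.last_run = last_run
        self.Rewind()

    def Rewind(self):
        # Select the changed rows once per pass.
        self._pending = [r for r in self.rows if r["modified"] > self.last_run]
        self._pos = 0
        return True

    def GetNextDoc(self):
        if self._pos >= len(self._pending):
            return False
        row = self._pending[self._pos]
        self._pos += 1
        self.DocName = row["name"]
        self.DocModifiedDate = row["modified"]
        self.DocText = row["text"]
        return True
```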

The IncludeFilters and ExcludeFilters in IndexJob do not apply to content returned from a data source.
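If a data source needs include/exclude behavior, it has to apply it itself before returning a document. A minimal sketch, assuming wildcard patterns matched against DocName; `passes_filters` is a hypothetical helper, and it lowercases both sides to make matching case-insensitive, which may or may not match the file-based filters' exact semantics.

```python
from fnmatch import fnmatchcase


def passes_filters(name, include=("*",), exclude=()):
    """Hypothetical include/exclude check on a DocName, applied inside the data source."""
    lowered = name.lower()
    if not any(fnmatchcase(lowered, pat.lower()) for pat in include):
        return False
    return not any(fnmatchcase(lowered, pat.lower()) for pat in exclude)
```

Inside GetNextDoc, rows whose names fail the check would simply be skipped before any Doc* property is set.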

Fields

The DocFields property lets you add metadata to the document text. Fields can be searchable or non-searchable, and can be designated as "stored" so they are returned as document properties in search results (for example, to store a row id for easy access after a search). Field names can also express nesting, so instead of just "Author" or "Subject" you could use "Meta/Author" and "Meta/Subject".
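A small helper for assembling field data is sketched below. It assumes DocFields takes a single string of alternating field names and values separated by tab characters; that layout, and the hypothetical `build_doc_fields` helper, should be checked against the DocFields property reference, which also describes how searchable and stored fields are designated (not modeled here).

```python
def build_doc_fields(fields):
    """Join (name, value) pairs into one tab-delimited string.

    ASSUMPTION: DocFields uses the layout name<TAB>value<TAB>name<TAB>value...
    Tabs inside values are replaced with spaces so they cannot break the pairing.
    """
    parts = []
    for name, value in fields:
        if "\t" in name:
            raise ValueError(f"field name contains a tab: {name!r}")
        parts.append(name)
        parts.append(str(value).replace("\t", " "))
    return "\t".join(parts)
```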

Plain text

The DocText property can be used to add plain-text content to the document. DocText is always treated as plain text, so if it contains formatted content such as RTF, HTML, or MIME-encoded email, the markup is indexed as literal text rather than interpreted as RTF, HTML, or MIME.
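If the source holds HTML but only the visible text should be searchable through DocText, one option is to strip the markup before assigning it. A minimal sketch using Python's standard `html.parser`; `html_to_doc_text` is a hypothetical helper that drops tags and the contents of script and style elements, decodes entities, and collapses whitespace.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content, ignoring <script> and <style> bodies."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_doc_text(html):
    """Return whitespace-normalized visible text suitable for DocText."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```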

See also

Overview - Indexing Databases in dtSearchApiRef.chm

Topics

Topic	Description
DataSource Members	The following tables list the members exposed by DataSource.
DataSource Methods	The methods of the DataSource interface are listed here.
DataSource Properties	The properties of the DataSource interface are listed here.

DataSource Methods

Method	Description
GetNextDoc	Get the next document from the data source.
Rewind	Initialize the data source so the next GetNextDoc call will return the first document.

DataSource Properties

Property	Description
DocBytes	Use DocBytes to provide an array of bytes for dtSearch to use as the binary contents of this document.
DocCreatedDate	The date that the document was originally created.
DocDisplayName	The DocDisplayName is a user-friendly version of the filename, which the dtSearch end-user product displays in search results.
DocError	If WasDocError is true, DocError will contain a string providing details on the nature of the error.
DocFields	In DocFields, supply any fielded data you want the dtSearch Engine to index.
DocId	Each time GetNextDoc() is called, DocId will contain the doc id of the previous document.
DocIsFile	If true, DocName will be interpreted as the name of a file to be indexed, and dtSearch will index the contents of the file along with any data provided in DocText and DocFields.
DocModifiedDate	The date that the document was last modified.
DocName	The DocName is the name of the document, as you want it to appear in search results.
DocStream	Use DocStream to provide access to binary document data for this document in the data source.
DocText	In DocText, supply the text you want the dtSearch Engine to index.
DocTypeId	Each time GetNextDoc() is called, DocTypeId will return an integer identifying the file type of the previous document.
DocWordCount	Each time GetNextDoc() is called, DocWordCount will contain the number of words in the previous document.
WasDocError	Each time GetNextDoc() is called, WasDocError will be true if there was an error processing the previous document (such as a file parsing error).
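Like DocId, WasDocError and DocError describe the previous document, so a data source that logs failures must remember what it returned last. The sketch below is a hypothetical Python model: `ErrorLoggingSource` records errors at the start of each GetNextDoc() call, and `fake_engine` is a stand-in that treats bytes which are not valid UTF-8 as a parse error purely for illustration.

```python
class ErrorLoggingSource:
    """Hypothetical model: records DocError for documents flagged by WasDocError."""

    def __init__(self, docs):
        self.docs = docs  # list of (name, bytes) pairs
        self.errors = {}
        self.Rewind()

    def Rewind(self):
        self._pos = 0
        self._prev = None
        self.WasDocError, self.DocError = False, ""
        return True

    def GetNextDoc(self):
        # Error state visible now belongs to the previously returned document.
        if self._prev is not None and self.WasDocError:
            self.errors[self._prev] = self.DocError
        if self._pos >= len(self.docs):
            self._prev = None
            return False
        self.DocName, self.DocBytes = self.docs[self._pos]
        self._pos += 1
        self._prev = self.DocName
        return True


def fake_engine(source):
    """Stand-in indexer: reports a parse error for bytes that are not UTF-8."""
    source.Rewind()
    while source.GetNextDoc():
        try:
            source.DocBytes.decode("utf-8")
            source.WasDocError, source.DocError = False, ""
        except UnicodeDecodeError as exc:
            source.WasDocError, source.DocError = True, f"parse error: {exc.reason}"
```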