Interface for the dataSourceToIndex member of IndexJob, for indexing non-file data sources such as databases.
public interface DataSource;
DataSource.java
An IndexJob provides two ways to specify the text you want to index: by files (the toAdd* properties) and by data source (the DataSourceToIndex property). Most commonly, the text exists in disk files, in which case you specify the files to be indexed using folder names and include and exclude filters. In some situations, however, the text to be indexed may not be readily available as disk files. For example, the text may exist as rows in a remote SQL database or in Microsoft Exchange message stores. You could copy the text from the database to local disk files and index those files, but the dtSearch Engine provides a more direct and efficient solution. To supply this text to the dtSearch indexing engine, you create an object that accesses the text and then attach the object to an IndexJob using setDataSourceToIndex.
A data source is an object that implements the methods in the DataSource or DataSource2 interfaces (DataSource2 provides some extended capabilities).
The dtSearch Engine will call the getNextDoc method of your DataSource implementation to obtain documents to index. On each call, dtSearch will use the properties supplied (getDocName, getDocModifiedDate, getDocFields) to set up a document object to index. To index BLOB data in fields, use setDocBytes in the DataSource2 interface.
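The call pattern described above can be sketched as follows. This is a minimal illustration, not the dtSearch API itself: DataSourceSketch is a hypothetical local stand-in that declares only the four methods named in the text, and its return types (boolean, String, Date) are assumptions. Row and RowDataSource are likewise invented for the example. A real implementation would implement the dtSearch DataSource interface and be attached with setDataSourceToIndex.

```java
import java.util.Date;
import java.util.List;

// Hypothetical stand-in for the methods named in the text.
// The return types are assumptions, not taken from the dtSearch API.
interface DataSourceSketch {
    boolean getNextDoc();
    String getDocName();
    Date getDocModifiedDate();
    String getDocFields();
}

// One database row (hypothetical shape, for illustration only).
final class Row {
    final String id;
    final String title;
    final String body;
    final Date modified;

    Row(String id, String title, String body, Date modified) {
        this.id = id;
        this.title = title;
        this.body = body;
        this.modified = modified;
    }
}

// Serves in-memory rows one at a time, the way an indexer would pull them.
final class RowDataSource implements DataSourceSketch {
    private final List<Row> rows;
    private int pos = -1; // index of the current row; -1 before the first call

    RowDataSource(List<Row> rows) {
        this.rows = rows;
    }

    // Advance to the next row; return false once the rows are exhausted.
    public boolean getNextDoc() {
        if (pos + 1 >= rows.size()) {
            return false;
        }
        pos++;
        return true;
    }

    // Name under which the current row would appear in search results.
    public String getDocName() {
        return "db/rows/" + rows.get(pos).id;
    }

    public Date getDocModifiedDate() {
        return rows.get(pos).modified;
    }

    // Fielded data as tab-separated name/value pairs.
    public String getDocFields() {
        Row r = rows.get(pos);
        return "Title\t" + r.title + "\tBody\t" + r.body;
    }
}
```

The getters are only valid after getNextDoc has returned true, which mirrors the order of calls described above.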
By default, field names are searchable along with field text. For example, if DocFields contains SampleField<TAB>Some Text, then you can find the document in a search either for "SampleField contains Text" or just "SampleField". To prevent a field name from being searchable, add * (asterisk) in front, like this:
*SampleField<TAB>Some Text
When a field name begins with *, the text of the field is searchable but the name is not. Therefore, you can find the document in a search for "SampleField contains Text" but not by searching for just "SampleField". The * is not considered part of the field name for purposes of searching or designating stored fields.
When a field name begins with **, the field is considered a "hidden stored" field. The contents of a hidden stored field are not searchable at all, and are automatically stored in the index as document properties when the document is indexed.
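The prefix rules above can be summarized in a small helper. This is only a sketch: FieldNameSketch and its methods are hypothetical names invented for illustration and are not part of the dtSearch API. Stripping the ** marker in bareName is an assumption, since the text states only that a single * is not part of the field name.

```java
// Hypothetical helper that mirrors the field-name prefix rules in the text.
final class FieldNameSketch {
    enum Kind { NAME_AND_TEXT_SEARCHABLE, TEXT_ONLY_SEARCHABLE, HIDDEN_STORED }

    // Classify a raw field name by its leading asterisks.
    // "**" must be checked first because it also starts with "*".
    static Kind kindOf(String rawName) {
        if (rawName.startsWith("**")) {
            return Kind.HIDDEN_STORED;
        }
        if (rawName.startsWith("*")) {
            return Kind.TEXT_ONLY_SEARCHABLE;
        }
        return Kind.NAME_AND_TEXT_SEARCHABLE;
    }

    // Field name without its marker (at most two leading asterisks removed).
    // Removing "**" as well as "*" is an assumption made for this sketch.
    static String bareName(String rawName) {
        int i = 0;
        while (i < 2 && i < rawName.length() && rawName.charAt(i) == '*') {
            i++;
        }
        return rawName.substring(i);
    }

    // One DocFields entry: name, a tab character, then the field text.
    static String entry(String rawName, String text) {
        return rawName + "\t" + text;
    }
}
```

For example, FieldNameSketch.entry("*SampleField", "Some Text") produces the entry shown earlier, with a tab character between the name and the text.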
Field names can include nesting. Example:
Meta/Subject<TAB> This is the subject<TAB>Meta/Author<TAB> This is the author
In this example, you could search across both fields by searching for "Meta contains (something)", or you could search for "Author contains (something)", or you could search for "Meta/Author contains (something)" to distinguish this "author" field from any other "author" fields that might be present in the document.
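As an illustration of the nested example above, the following sketch builds a tab-separated fields string from an ordered map and lists the names under which a two-level field can be addressed. NestedFieldsSketch and its methods are hypothetical helpers, not part of the dtSearch API. The searchableNames method covers only the two-level case described in the text, and how deeper nesting behaves is not assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for building and inspecting nested field names.
final class NestedFieldsSketch {
    // Join name/value pairs with tab characters, in the map's iteration order.
    static String join(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('\t');
            }
            sb.append(e.getKey()).append('\t').append(e.getValue());
        }
        return sb.toString();
    }

    // Names usable in a "contains" search for a field path, per the text:
    // each path component on its own, plus the full path when it is nested.
    static List<String> searchableNames(String path) {
        List<String> names = new ArrayList<>();
        for (String part : path.split("/")) {
            names.add(part);
        }
        if (path.contains("/")) {
            names.add(path);
        }
        return names;
    }
}
```

Passing a LinkedHashMap to join keeps the fields in insertion order, so Meta/Subject precedes Meta/Author as in the example.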
| Method | Description |
| | The date that the document was originally created. |
| | The DocDisplayName is a user-friendly version of the filename, which the dtSearch end-user product displays in search results. |
| getDocFields | In DocFields, supply any fielded data you want the dtSearch Engine to index. |
| | If True, DocName will be interpreted as the name of a file to be indexed, and dtSearch will index the contents of the file along with any data provided in DocText and DocFields. |
| getDocModifiedDate | The date that the document was last modified. |
| getDocName | The DocName is the name of the document, as you want it to appear in search results. |
| | In DocText, supply the text you want the dtSearch Engine to index. |
| getNextDoc | Get the next document from the data source. |
| | Initialize the data source so that the next GetNextDoc call will return the first document. |
Copyright (c) 1998-2012 dtSearch Corp. All rights reserved.