DataSource2 provides a way for data source indexing applications to obtain information about each document that is indexed, and provides better support for indexing binary data than DataSource.
File
File: DataSource2.java
Package: com.dtsearch.engine
Syntax
Methods
Method | Description |
---|---|
getDocError() | Each time getNextDoc() is called, getDocError() will return the error message if there was an error processing the previous document, such as a file parsing error. Use wasDocError() to check whether an error occurred. |
getDocId() | Each time getNextDoc() is called, getDocId() will return the doc id of the last document indexed. A doc id can be used to identify a document in a SearchFilter. |
getDocTypeId() | Each time getNextDoc() is called, getDocTypeId() will return an integer identifying the file type of the last document indexed. |
getDocWordCount() | Each time getNextDoc() is called, getDocWordCount() will return the number of words in the last document indexed. |
setDocBytes() | Use setDocBytes() to provide an array of bytes for dtSearch to use as the binary contents of this document. To tell dtSearch to check for an array of bytes, the data source must return true from haveDocBytes(). The calling program should call setDocBytes() to provide the binary contents of a file to be indexed before returning from getNextDoc(). While getDocText() can only return a stream of plain text, setDocBytes() can supply any type of binary data, such as the contents of a Word document or a PDF file. |
wasDocError() | Each time getNextDoc() is called, wasDocError() will return true if there was an error processing the previous document, such as a file parsing error, and getDocError() will return the error message. |
Remarks
If an IndexJob.dataSourceToIndex is based on DataSource2 instead of DataSource, then on each call to getNextDoc(), the application can call getDocWordCount(), getDocId(), and getDocTypeId() to obtain information on the previously indexed document. It can also call wasDocError() and getDocError() to find out whether that document failed to index.
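The pattern of reading information about the previous document inside getNextDoc() can be sketched as follows. This is a minimal, self-contained illustration, not the dtSearch library: DataSource2Sketch, LoggingSource, EngineSketch, and recordResult() are hypothetical stand-ins, and the doc ids, type ids, and the "empty document" error are invented values. Only the getter names are taken from the Methods table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for com.dtsearch.engine.DataSource2 (NOT the real class).
// It models only the behavior described in the text.
abstract class DataSource2Sketch {
    private int docId;
    private int docTypeId;
    private int docWordCount;
    private String docError = "";

    // Assumed hook: the simulated engine stores its results here after each document.
    void recordResult(int id, int typeId, int wordCount, String error) {
        docId = id;
        docTypeId = typeId;
        docWordCount = wordCount;
        docError = (error == null) ? "" : error;
    }

    int getDocId() { return docId; }
    int getDocTypeId() { return docTypeId; }
    int getDocWordCount() { return docWordCount; }
    boolean wasDocError() { return !docError.isEmpty(); }
    String getDocError() { return docError; }

    abstract boolean getNextDoc();  // true while another document is available
    abstract String getDocText();
}

// A data source that logs what happened to the previous document on each call.
class LoggingSource extends DataSource2Sketch {
    private final String[] texts;
    private int next = 0;
    final List<String> log = new ArrayList<>();

    LoggingSource(String... texts) { this.texts = texts; }

    @Override
    boolean getNextDoc() {
        // From the second call onward, the getters describe the previous document.
        if (next > 0) {
            log.add(wasDocError()
                ? "doc " + getDocId() + " failed: " + getDocError()
                : "doc " + getDocId() + " type " + getDocTypeId()
                    + " words " + getDocWordCount());
        }
        if (next >= texts.length) return false;
        next++;
        return true;
    }

    @Override
    String getDocText() { return texts[next - 1]; }
}

// Toy indexing loop standing in for the engine; ids and type ids are made up.
class EngineSketch {
    static void index(DataSource2Sketch source) {
        int id = 0;
        while (source.getNextDoc()) {
            id++;
            String text = source.getDocText().trim();
            if (text.isEmpty()) {
                source.recordResult(id, 0, 0, "empty document");
            } else {
                source.recordResult(id, 1, text.split("\\s+").length, null);
            }
        }
    }

    public static void main(String[] args) {
        LoggingSource source = new LoggingSource("alpha beta gamma", "", "delta epsilon");
        index(source);
        source.log.forEach(System.out::println);
    }
}
```

Note that in this sketch the final call to getNextDoc(), the one that returns false, is what reports on the last document, so the logging happens before the end-of-data check.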
Additionally, the data source can return binary data (such as document files) using setDocBytes() without the need to create a temporary file.
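The binary-data flow can be sketched in the same spirit: the data source returns true from haveDocBytes() and calls setDocBytes() before returning from getNextDoc(), so no temporary file is involved. Again, this is a self-contained illustration under assumptions rather than the real library: BinarySourceSketch, BlobSource, BlobEngineSketch, and takeDocBytes() are hypothetical, and the in-memory map stands in for wherever the application keeps its binary documents (for example, database rows).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the binary-data part of DataSource2 (NOT the real class).
abstract class BinarySourceSketch {
    private byte[] docBytes;

    // Per the text: called by the data source before returning from getNextDoc().
    void setDocBytes(byte[] bytes) { docBytes = bytes; }

    // Assumed accessor used only by the toy engine below.
    byte[] takeDocBytes() {
        byte[] bytes = docBytes;
        docBytes = null;
        return bytes;
    }

    // Per the text: return true to tell the engine to check for a byte array.
    boolean haveDocBytes() { return false; }

    abstract boolean getNextDoc();
    abstract String getDocName();
}

// Serves documents straight from memory instead of writing temporary files.
class BlobSource extends BinarySourceSketch {
    private final Iterator<Map.Entry<String, byte[]>> rows;
    private String currentName;

    BlobSource(Map<String, byte[]> blobs) { rows = blobs.entrySet().iterator(); }

    @Override
    boolean haveDocBytes() { return true; }

    @Override
    boolean getNextDoc() {
        if (!rows.hasNext()) return false;
        Map.Entry<String, byte[]> row = rows.next();
        currentName = row.getKey();
        setDocBytes(row.getValue());  // hand over the raw bytes for this document
        return true;
    }

    @Override
    String getDocName() { return currentName; }
}

// Toy engine loop: records the name and byte length of each document it receives.
class BlobEngineSketch {
    static List<String> index(BinarySourceSketch source) {
        List<String> seen = new ArrayList<>();
        while (source.getNextDoc()) {
            if (source.haveDocBytes()) {
                byte[] bytes = source.takeDocBytes();
                seen.add(source.getDocName() + ":" + bytes.length);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, byte[]> blobs = new LinkedHashMap<>();
        blobs.put("a.pdf", new byte[] {1, 2, 3});
        blobs.put("b.docx", new byte[] {0x50, 0x4B, 0x03, 0x04});
        System.out.println(index(new BlobSource(blobs)));
    }
}
```

In a real application the byte arrays would hold complete document files, such as a Word document or a PDF, and the engine would parse them according to their file type.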
Class Hierarchy
com.dtsearch.engine.DataSource2