Lists the words, fields, or filenames in an index, writing the output to a file or to a memory buffer
File: ListIndexJob.java
Package: com.dtsearch.engine
Method | Description |
---|---|
Call execute() to generate the list | |
After execute() returns, use getErrors() to access error information. | |
Index to list | |
Name of the file to create. | |
Output can be directed to a string or to a disk file. For string output, use setOutputStringMaxSize to set the maximum size of the output string, and use getOutputString to access the result. | |
Maximum size of the output string to create (0 = no limit) | |
Set to ListIndexFlags values to control what is listed. | |
Set to a value from 1 to 10 to list using fuzzy matching | |
Index to list | |
Name of the file to create. | |
Maximum size of the output string to create (0 = no limit) | |
Output can be directed to a string or to a disk file. For string output, use setOutputStringMaxSize to set the maximum size of the output string, and use getOutputString to access the result. | |
Set to a non-zero value to force the search to halt after a specified time. | |
Optional search expression specifying the text to match against items being listed. | |
You can use ListIndexFlags to specify the type of information included in the output.
When listing words, if dtsListIndexIncludeField is not set, multiple instances of a word in different fields are aggregated into a single entry. For example, if "smith" occurs once in the "author" field and once in the "subject" field, the output reports a document count of 2 and a hit count of 2. The document count is then a sum of per-field counts, so it overstates the number of distinct documents when both instances occur in the same document. To avoid this inaccuracy, set the dtsListIndexIncludeField flag so that instances in different fields are listed separately.
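The "smith" example can be modeled with a short, self-contained Java sketch. It does not call the dtSearch Engine: the class `FieldAggregationSketch`, the `Entry` type, and the `word [field]` key format are hypothetical names chosen for illustration, and the sketch only assumes that per-field counts are summed when fields are merged.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only; not part of com.dtsearch.engine. */
final class FieldAggregationSketch {

    /** Hypothetical per-field entry: counts for one word in one field. */
    static final class Entry {
        final String word;
        final String field;
        final int docCount;
        final int hitCount;

        Entry(String word, String field, int docCount, int hitCount) {
            this.word = word;
            this.field = field;
            this.docCount = docCount;
            this.hitCount = hitCount;
        }
    }

    /**
     * Sums per-field counts into output lines. Each value is {docCount, hitCount}.
     * With includeField == false, entries for the same word are merged across
     * fields; with includeField == true, each (word, field) pair keeps its own line.
     */
    static Map<String, int[]> list(List<Entry> entries, boolean includeField) {
        Map<String, int[]> out = new LinkedHashMap<>();
        for (Entry e : entries) {
            String key = includeField ? e.word + " [" + e.field + "]" : e.word;
            out.merge(key, new int[] {e.docCount, e.hitCount},
                    (a, b) -> new int[] {a[0] + b[0], a[1] + b[1]});
        }
        return out;
    }

    public static void main(String[] args) {
        // Both occurrences may come from the same document; the per-field
        // counts alone cannot tell.
        List<Entry> entries = Arrays.asList(
                new Entry("smith", "author", 1, 1),
                new Entry("smith", "subject", 1, 1));

        for (boolean includeField : new boolean[] {false, true}) {
            System.out.println("includeField=" + includeField);
            for (Map.Entry<String, int[]> line : list(entries, includeField).entrySet()) {
                int[] v = line.getValue();
                System.out.println("  " + line.getKey() + " docs=" + v[0] + " hits=" + v[1]);
            }
        }
    }
}
```

Without the field in the key, the two entries collapse into one line with a document count of 2, even if only one document is involved. With the field in the key, each line reports a count of 1, which is accurate for that field.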
For speed, ListIndexJob does not enumerate the references for each word; it relies on counts stored incrementally in the index. The reported counts may therefore include artifacts of the indexing process, such as references to reindexed or removed documents, and may be higher than the actual number of references in the index. Compressing an index removes these extra references.
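The gap between stored counts and actual references can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not reflect dtSearch's actual index format, and every name in it (`IncrementalCountSketch`, `fastCount`, `exactCount`, `compress`) is hypothetical. It assumes a counter that is incremented whenever a document adds a word and is left untouched when a document is removed or reindexed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only; it does not reflect the real index structure. */
final class IncrementalCountSketch {
    // word -> ids of documents that currently contain it (the actual references)
    private final Map<String, Set<Integer>> live = new HashMap<>();
    // word -> running counter, incremented on add and never decremented
    private final Map<String, Integer> stored = new HashMap<>();

    void addDocument(int docId, String... words) {
        for (String w : words) {
            if (live.computeIfAbsent(w, k -> new HashSet<>()).add(docId)) {
                stored.merge(w, 1, Integer::sum);
            }
        }
    }

    void removeDocument(int docId) {
        for (Set<Integer> ids : live.values()) {
            ids.remove(docId);
        }
        // The stored counters are deliberately left unchanged.
    }

    void reindexDocument(int docId, String... words) {
        removeDocument(docId);
        addDocument(docId, words);
    }

    /** Fast path: read the stored counter without enumerating references. */
    int fastCount(String word) {
        return stored.getOrDefault(word, 0);
    }

    /** Slow path: count the references that actually exist. */
    int exactCount(String word) {
        Set<Integer> ids = live.get(word);
        return ids == null ? 0 : ids.size();
    }

    /** Rebuilds the counters from live references, dropping stale ones. */
    void compress() {
        stored.clear();
        for (Map.Entry<String, Set<Integer>> e : live.entrySet()) {
            if (!e.getValue().isEmpty()) {
                stored.put(e.getKey(), e.getValue().size());
            }
        }
    }

    public static void main(String[] args) {
        IncrementalCountSketch index = new IncrementalCountSketch();
        index.addDocument(1, "smith");
        index.addDocument(2, "smith");
        index.reindexDocument(1, "smith"); // stale reference stays counted
        index.removeDocument(2);           // removal does not lower the counter
        System.out.println("fast=" + index.fastCount("smith")
                + " exact=" + index.exactCount("smith")); // fast=3 exact=1
        index.compress();
        System.out.println("fast=" + index.fastCount("smith")
                + " exact=" + index.exactCount("smith")); // fast=1 exact=1
    }
}
```

In this model the fast count drifts upward with every reindex or removal, while the exact count would require visiting every reference. Rebuilding the counters, as `compress` does here, brings the two back into agreement.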