List words, fields, or filenames in an index to a file or to a memory buffer
File: dtsearch.h
| Members | Description |
|---|---|
| `long outputStringMaxSize;` | Maximum size of the output string to generate. |
| `long listFlags;` | ListIndexFlags specifying the type of list to generate. |
| `long searchFlags;` | SearchFlags specifying the search features, such as fuzziness and stemming, to use when matching the toMatch expression. |
| `long fuzziness;` | If the dtsSearchFuzzy flag is set in searchFlags, set the fuzziness value from 1 to 10 to specify the level of fuzzy searching to apply. |
| `const char * toMatch;` | Optional search expression specifying the text to match against the items being listed. For example, to list all field names starting with "A", set listFlags to dtsListIndexFields and set toMatch to "A*". |
| `const char * indexPath;` | Location of the index to list. |
| `const char * outputFile;` | Name of the file to create (if the dtsListIndexReturnString flag is not set). |
| `dtsStringHandle outputString;` | If the dtsListIndexReturnString flag is set, the list is returned through a dtsStringHandle, which the caller must release. |
| `long fOutputStringWasTruncated;` | If true, output was halted because it reached outputStringMaxSize before all items were listed. |
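The interaction between toMatch, outputStringMaxSize, and fOutputStringWasTruncated can be modeled in plain C. The sketch below does not call dtSearch: `prefix_match` and `list_matching` are hypothetical helpers, pattern support is limited to an exact word or a literal prefix followed by a trailing `*` (far simpler than dtSearch's search syntax), matching is assumed to be case-insensitive, and the one-item-per-line output format is also an assumption. It only illustrates the documented behavior: items are written until the next one would exceed the size limit, and the truncation flag is then set.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher: an exact word or a literal prefix ending in '*',
   compared case-insensitively. dtSearch's toMatch syntax is far richer. */
static int prefix_match(const char *pattern, const char *item)
{
    size_t n = strlen(pattern);
    int wildcard = (n > 0 && pattern[n - 1] == '*');
    if (wildcard)
        n--;                                   /* ignore the trailing '*' */
    for (size_t i = 0; i < n; i++) {
        if (item[i] == '\0' ||
            tolower((unsigned char)pattern[i]) != tolower((unsigned char)item[i]))
            return 0;
    }
    return wildcard || item[n] == '\0';        /* exact match needs same length */
}

/* Hypothetical model of listing into a bounded output string.
   Writes matching items, one per line, into out (capacity maxSize bytes,
   including the terminating NUL). Stops at the first matching item that
   does not fit and sets *truncated, mirroring fOutputStringWasTruncated.
   Returns the number of items written. */
static size_t list_matching(const char *const *items, size_t count,
                            const char *toMatch, char *out, size_t maxSize,
                            long *truncated)
{
    size_t used = 0, written = 0;
    assert(truncated != NULL);
    *truncated = 0;
    if (maxSize > 0)
        out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        if (!prefix_match(toMatch, items[i]))
            continue;
        size_t len = strlen(items[i]);
        if (used + len + 2 > maxSize) {        /* item + '\n' + NUL */
            *truncated = 1;
            break;
        }
        memcpy(out + used, items[i], len);
        out[used + len] = '\n';
        used += len + 1;
        out[used] = '\0';
        written++;
    }
    return written;
}
```

For instance, listing `{ "Author", "Subject", "Abstract", "Address", "Title" }` with `"A*"` into a 16-byte buffer writes only `Author` and sets the flag, whereas a 64-byte buffer receives all three matching names.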
You can use ListIndexFlags to specify the type of information included in the output.
When listing words, if dtsListIndexIncludeField is not set, multiple instances of a word in different fields are aggregated. For example, if "smith" occurs once in the "author" field and once in the "subject" field, the listing reports a document count of 2 and a hit count of 2. The document count can therefore overstate the number of documents containing the word, because both instances may occur in the same document. To avoid this inaccuracy, set the dtsListIndexIncludeField flag so that instances in different fields are listed separately.
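The over-count can be reproduced with a toy posting list. In this hypothetical model (not dtSearch's index format), each occurrence is a (word, field, document) triple; `field_counts`, `aggregated_counts`, and `distinct_docs` are illustrative helpers. Summing per-field counts corresponds to the aggregated listing described above, while `distinct_docs` counts each document once regardless of field.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical occurrence record: one hit of a word in a field of a document. */
struct occurrence {
    const char *word;
    const char *field;
    int doc_id;
};

struct counts {
    long doc_count;
    long hit_count;
};

/* Hits and distinct documents for word within a single field. */
static struct counts field_counts(const struct occurrence *occ, size_t n,
                                  const char *word, const char *field)
{
    struct counts c = { 0, 0 };
    assert(occ != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(occ[i].word, word) != 0 || strcmp(occ[i].field, field) != 0)
            continue;
        c.hit_count++;
        int seen = 0;                 /* document already counted in this field? */
        for (size_t j = 0; j < i && !seen; j++)
            seen = occ[j].doc_id == occ[i].doc_id &&
                   strcmp(occ[j].word, word) == 0 &&
                   strcmp(occ[j].field, field) == 0;
        if (!seen)
            c.doc_count++;
    }
    return c;
}

/* Aggregated listing: per-field counts are summed, so a document that
   contains the word in two fields contributes 2 to doc_count. */
static struct counts aggregated_counts(const struct occurrence *occ, size_t n,
                                       const char *word,
                                       const char *const *fields, size_t nfields)
{
    struct counts total = { 0, 0 };
    for (size_t f = 0; f < nfields; f++) {
        struct counts c = field_counts(occ, n, word, fields[f]);
        total.doc_count += c.doc_count;
        total.hit_count += c.hit_count;
    }
    return total;
}

/* Exact number of distinct documents containing word in any field. */
static long distinct_docs(const struct occurrence *occ, size_t n, const char *word)
{
    long docs = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(occ[i].word, word) != 0)
            continue;
        int seen = 0;
        for (size_t j = 0; j < i && !seen; j++)
            seen = strcmp(occ[j].word, word) == 0 && occ[j].doc_id == occ[i].doc_id;
        if (!seen)
            docs++;
    }
    return docs;
}
```

With both "smith" occurrences in document 1 (one in "author", one in "subject"), `aggregated_counts` reports a document count of 2 and a hit count of 2, while `distinct_docs` returns 1.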
For speed, ListIndexJob does not enumerate the references for each word; instead, it relies on counts stored incrementally in the index. The reported counts may therefore include artifacts of the indexing process, such as references from reindexed or removed documents, and can be higher than the actual number of references in the index. Compressing the index removes these extra references.
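Why the stored counts can drift upward is easier to see in a simplified model. The sketch assumes, for illustration only, that the index keeps a running hit counter per word that grows whenever a document is indexed or reindexed and is never decremented when a document is replaced or removed; the hypothetical `compress_index` then rebuilds the counter from live references. The struct and function names are invented for this example and are not dtSearch's data structures.

```c
#include <assert.h>

#define MAX_DOCS 16

/* Toy index that tracks a single word (hypothetical model). */
struct toy_index {
    long live_hits[MAX_DOCS];   /* hits in the current version of each doc; 0 if absent */
    long stored_hit_count;      /* incremental counter read by the fast listing path */
};

/* Index or reindex a document containing `hits` occurrences of the word. */
static void index_doc(struct toy_index *ix, int doc, long hits)
{
    assert(doc >= 0 && doc < MAX_DOCS);
    ix->live_hits[doc] = hits;       /* the new version replaces the old one... */
    ix->stored_hit_count += hits;    /* ...but the old references stay counted */
}

/* Remove a document; the stored counter is left unchanged. */
static void remove_doc(struct toy_index *ix, int doc)
{
    assert(doc >= 0 && doc < MAX_DOCS);
    ix->live_hits[doc] = 0;
}

/* Exact count obtained by enumerating live references (the slow path). */
static long actual_hit_count(const struct toy_index *ix)
{
    long total = 0;
    for (int d = 0; d < MAX_DOCS; d++)
        total += ix->live_hits[d];
    return total;
}

/* Compression rebuilds the counter from live references, dropping the extras. */
static void compress_index(struct toy_index *ix)
{
    ix->stored_hit_count = actual_hit_count(ix);
}
```

Indexing document 0 with 3 hits and document 1 with 2 hits, reindexing document 0, and then removing document 1 leaves a stored count of 8 against an actual count of 3; after `compress_index` the two agree.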