Enumerable Fields

How to use "enumerable" fields to quickly enumerate document properties for documents retrieved in a search.

Remarks

Stored Fields

All fields in documents are automatically searchable once you have indexed the documents. There is no limit on the amount of field data that dtSearch can search in each file, and no limit on the size of each field.

In some situations it is useful not only to be able to search for the contents of fields, but to have the contents of one or more fields returned along with other document properties in search results. dtSearch can return fields in search results if they are designated when the document is indexed as "stored" fields. See Retrieving Fields in Search Results for more information on stored fields.

Quickly Enumerating Stored Fields

While stored fields are a good way to add field values to search results, in some cases an application may need a way to quickly enumerate and aggregate the field values for all documents in search results. For example, when implementing faceted search, the application will need a count of the number of documents with each field value. To generate this, you could collect and aggregate the values from every document retrieved in the search, but that can be slow when the number of documents is large.
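To make the cost of the collect-and-aggregate approach concrete, the following is a minimal sketch in Python. It does not use the dtSearch API: the `results` list of dictionaries and the `facet_counts` helper are hypothetical stand-ins for search results whose stored fields have already been retrieved. The point is that every retrieved document has to be visited once.

```python
from collections import Counter

def facet_counts(results, field):
    """Count how many documents carry each value of `field`.

    `results` is a hypothetical list of dicts mapping a field name to a
    list of values; it stands in for stored fields returned with search
    results and is not a dtSearch data structure.
    """
    counts = Counter()
    for doc in results:
        # A document may lack the field entirely.
        values = doc.get(field, [])
        # Count each distinct value once per document.
        counts.update(set(values))
    return counts

results = [
    {"Author": ["Smith"], "Year": ["2019"]},
    {"Author": ["Jones"], "Year": ["2019"]},
    {"Author": ["Smith"]},
]

author_counts = facet_counts(results, "Author")
year_counts = facet_counts(results, "Year")
```

The work grows linearly with the number of documents retrieved, which is the cost that the enumerable fields feature is meant to avoid.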

The enumerable fields feature in the dtSearch Engine provides a faster way to get this type of information. When you index the documents, you designate in the IndexJob which fields should be enumerable, using IndexJob.EnumerableFields. Then after a search, you can use the WordListBuilder object to quickly enumerate document properties for documents retrieved in a search. This feature can be used to implement a faceted search interface, in which search results summarize values and document counts for one or more categories of document metadata after a search.

Indexing

When documents are indexed, set IndexJob.EnumerableFields to a comma-separated list of field names. EnumerableFields are fields whose values will be stored in the index in a way that permits fast enumeration. All EnumerableFields are also StoredFields. The EnumerableFields setting has no effect on document retrieval.

Searching

(1) In SearchJob, set WantResultsAsFilter=true so a SearchFilter will be returned along with the SearchResults. The SearchFilter will be a bit vector identifying all of the documents that match the search request. (If you are executing a search specifically to create a SearchFilter and do not need SearchResults from the search, set the flag dtsSearchFastSearchFilterOnly in SearchJob to improve search speed.)

(2) To enumerate the values of a field for all documents retrieved in the search, create a WordListBuilder object and set it up as follows:

- Set IndexPath to the index that was searched.

- Call WordListBuilder.SetFilter() to limit the values returned to the documents that were returned from the search.

- Call WordListBuilder.ListFieldValues() with the field name to enumerate.
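The idea behind combining a filter with a field-value listing can be illustrated with a small conceptual model in Python. This is only a sketch under stated assumptions: it does not call the dtSearch API and does not describe how dtSearch stores enumerable fields internally. The names `make_filter`, `value_index`, and `count_values_in_filter` are hypothetical. Each field value is mapped to a bit vector of the documents that contain it, and the search filter is another bit vector, so counting documents per value is a bitwise AND followed by a bit count.

```python
def make_filter(doc_ids):
    """Build a bit vector (a Python int) with one bit set per document id."""
    bits = 0
    for doc_id in doc_ids:
        bits |= 1 << doc_id
    return bits

def count_values_in_filter(value_index, filter_bits):
    """Return {value: document count}, limited to documents in the filter.

    `value_index` is a hypothetical mapping from a field value to the bit
    vector of documents that contain that value.
    """
    counts = {}
    for value, doc_bits in value_index.items():
        # Keep only the documents that are also in the search filter.
        matched = bin(doc_bits & filter_bits).count("1")
        if matched:
            counts[value] = matched
    return counts

# Hypothetical "Author" field across documents 0..5.
value_index = {
    "Smith": make_filter([0, 2, 5]),
    "Jones": make_filter([1, 5]),
    "Lee": make_filter([3]),
}

# Pretend the search matched documents 0, 1, and 2.
search_filter = make_filter([0, 1, 2])
counts = count_values_in_filter(value_index, search_filter)
```

In this model the cost depends on the number of distinct field values rather than on retrieving each matching document, which is why a filter-based enumeration can be faster than aggregating stored fields one document at a time.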

A sample application demonstrating EnumerableFields is installed here:

C:\Program Files\dtSearch Developer\examples\cs2\ListFields

To use the sample:

(1) Build an index of some faceted data with one or more fields listed in EnumerableFields (enter * under EnumerableFields to make all fields enumerable).

(2) Use the Search and Browse Fields dialogs to search and browse field values within the results of a search.

Multivalue Fields

If you set the flag dtsIndexTokenizeEnumerableFields in IndexJob.IndexingFlags, dtSearch will use Options.StoredFieldDelimiterChar to tokenize enumerable fields into separate values when indexing. For example, if StoredFieldDelimiterChar is '|' and the SampleField contains the value "First|Second|Third", the indexer will create three separate enumerable field instances for SampleField in the document, "First", "Second", and "Third", instead of a single instance containing "First|Second|Third". Because StoredFieldDelimiterChar is used to delimit multiple instances of the same field in a document, the same result will occur if the document contains three separate instances of SampleField, containing "First", "Second", and "Third".
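The equivalence described above can be sketched in a few lines of Python. This models the behavior only and is not dtSearch code: `tokenize_enumerable_field` is a hypothetical helper, and the handling of edge cases such as empty values is an assumption. Multiple instances of a field are joined with the delimiter, and tokenizing splits on that same delimiter, so one delimited instance and three separate instances produce the same values.

```python
def tokenize_enumerable_field(instances, delimiter="|"):
    """Model how a field's instances become separate enumerable values.

    `instances` is a list of the raw values of one field in one document.
    Instances are joined with the delimiter (as the text says the
    delimiter separates multiple instances of a field) and then split on it.
    """
    stored = delimiter.join(instances)
    return stored.split(delimiter)

# A single instance containing delimited values.
single = tokenize_enumerable_field(["First|Second|Third"])

# Three separate instances of the same field.
separate = tokenize_enumerable_field(["First", "Second", "Third"])
```

Both calls yield the list "First", "Second", "Third", matching the two cases described in the paragraph above.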
