dtSearch Text Retrieval Engine Programmer's Reference
Document Ids

Each document in an index is assigned a unique integer identifier called a document id or "DocId". The first document added to an index has the DocId 1, and subsequent documents will have sequentially numbered DocIds 2, 3, 4, and so forth. When a document is reindexed, its DocId is "cancelled" and a new DocId is assigned. 

Compressing an index renumbers all of the DocIds, so after an index has been compressed, a document's DocId may change. 
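
These rules can be modeled in a few lines. The following Python sketch is a toy, in-memory model, not the dtSearch API: the class `ToyIndex` and its methods are hypothetical. It assumes that compression renumbers the surviving documents in their original DocId order, starting again from the index's starting DocId; the reference says only that DocIds are renumbered. The `keep_existing_doc_ids` parameter stands in for the dtsIndexKeepExistingDocIds flag described under "When do DocIds change".

```python
# Toy model of DocId assignment; NOT the dtSearch API (all names hypothetical).

class ToyIndex:
    def __init__(self, starting_doc_id=1):
        self.start = starting_doc_id      # plays the role of IndexJob.StartingDocId
        self.next_id = starting_doc_id    # next DocId to assign
        self.live = {}                    # document name -> current DocId
        self.cancelled = set()            # DocIds cancelled by reindexing

    def add(self, name):
        """Index (or reindex) a document and return its new DocId."""
        if name in self.live:
            # Reindexing: the old DocId is cancelled, never reused.
            self.cancelled.add(self.live[name])
        self.live[name] = self.next_id
        self.next_id += 1
        return self.live[name]

    def compress(self, keep_existing_doc_ids=False):
        """Remove cancelled DocIds; renumber survivors unless asked not to."""
        self.cancelled.clear()
        if keep_existing_doc_ids:
            return  # stands in for dtsIndexKeepExistingDocIds
        # Assumption: survivors keep their relative order and restart at self.start.
        survivors = sorted(self.live, key=self.live.get)
        self.live = {name: self.start + i for i, name in enumerate(survivors)}
        self.next_id = self.start + len(survivors)


idx = ToyIndex()
for name in ["a.doc", "b.doc", "c.doc"]:
    idx.add(name)            # DocIds 1, 2, 3
idx.add("a.doc")             # reindexed: DocId 1 cancelled, new DocId 4
idx.compress()
print(idx.live)              # {'b.doc': 1, 'c.doc': 2, 'a.doc': 3}
```

After compression, "a.doc" has moved from DocId 4 to DocId 3, which is why a DocId saved before compressing an index should not be relied on afterwards.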

When an index is created, you can specify a starting DocId other than 1 using IndexJob.StartingDocId. This makes it possible to ensure that indexes have non-overlapping ranges of DocIds so DocIds can be preserved after the indexes are merged. 
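
Non-overlapping ranges can be planned ahead of time. In this Python sketch (the helper names are hypothetical, not dtSearch functions), each index is given a capacity, and its StartingDocId is the first value after the previous index's range. Because every reindexed document consumes a new DocId, a capacity should allow for expected reindexing, not just the document count; oversized capacities, on the other hand, push DocId values higher and so enlarge SearchFilters.

```python
# Hypothetical helpers for planning non-overlapping DocId ranges (not dtSearch API).

def plan_starting_doc_ids(capacities, first=1):
    """Return one StartingDocId per index so that the ranges do not overlap.

    capacities[i] is how many DocIds index i may consume, including the
    new DocIds assigned each time a document is reindexed.
    """
    starts = []
    next_start = first
    for capacity in capacities:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        starts.append(next_start)
        next_start += capacity
    return starts


def ranges_overlap(a, b):
    """True if inclusive DocId ranges a=(lo, hi) and b=(lo, hi) share a value."""
    return a[0] <= b[1] and b[0] <= a[1]


starts = plan_starting_doc_ids([1000, 5000, 2000])
print(starts)  # [1, 1001, 6001]
```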

DocIds are provided along with other document properties in search results. 

DocIds are used in:

  • SearchFilter objects, which use DocIds to identify the documents selected in the filter.
  • SearchResults.AddDoc, which lets you add a document to a SearchResults list by providing the index path and the DocId. You can use this to get the properties of the document from its DocId.
  • The IndexJob.ToRemoveListName list, which lets you remove documents from an index either by name or by DocId.

Each DocId value in an index requires one bit in a SearchFilter, so a SearchFilter for an index with DocIds ranging from 1 to 1,000,000 requires 125,000 bytes of memory (1,000,000 bits / 8). Because the filter's size grows with the highest DocId value rather than with the number of documents, unnecessarily high values for IndexJob.StartingDocId carry a memory and performance cost.
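
The arithmetic can be checked with a toy one-bit-per-DocId filter. This Python sketch is not the SearchFilter implementation; `ToyDocIdFilter` and `filter_bytes` are hypothetical, and the sketch assumes, as the example above implies, that the filter spans every DocId value from 1 up to the highest DocId. It shows why a small index with a high StartingDocId still needs a large filter: 1,000 documents numbered from 5,000,001 need 625,125 bytes, about five times the size of a filter for 1,000,000 documents numbered from 1.

```python
# Toy one-bit-per-DocId filter; NOT the dtSearch SearchFilter implementation.

def filter_bytes(highest_doc_id):
    """Bytes needed for one bit per DocId value 1..highest_doc_id."""
    return (highest_doc_id + 7) // 8   # round up to whole bytes


class ToyDocIdFilter:
    def __init__(self, highest_doc_id):
        self.highest = highest_doc_id
        self.bits = bytearray(filter_bytes(highest_doc_id))

    def select(self, doc_id):
        """Mark a document as selected in the filter."""
        if not 1 <= doc_id <= self.highest:
            raise ValueError("DocId out of range")
        i = doc_id - 1                       # DocId 1 maps to bit 0
        self.bits[i // 8] |= 1 << (i % 8)

    def is_selected(self, doc_id):
        """Return True if the document has been selected."""
        if not 1 <= doc_id <= self.highest:
            return False
        i = doc_id - 1
        return bool((self.bits[i // 8] >> (i % 8)) & 1)


print(filter_bytes(1_000_000))            # 125000
print(filter_bytes(5_000_000 + 1_000))    # 625125: 1,000 docs starting at 5,000,001
```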

How to get the DocId for a document

Searching

The DocId for any document can be obtained from search results as follows:

Visual Basic: DocDetailItem("_docid") 

C++: dtsSearchResultsItem.docId 

Java: getDocDetailItem("_docid") 

.NET: SearchResultsItem.DocId 

Indexing

When a document is indexed using the data source API, the data source receives a notification of the DocId assigned to that document. (Note: this notification is provided only for documents that are not containers; it is not sent for container files such as ZIP archives.)

Additionally, in the C++ and .NET interfaces, the indexing status notification for each document includes the DocId. For more information, see dtsIndexProgressInfo (C++) and IndexFileInfo (.NET). 

When DocIds change

When a document is added to an index, it is assigned a DocId, and DocIds are always numbered sequentially. 

When a document is reindexed, the old DocId is cancelled and a new DocId is assigned. 

When an index is compressed, all DocIds in the index are renumbered to remove the cancelled DocIds unless the dtsIndexKeepExistingDocIds flag is set in IndexJob. 

When an index is merged into another index, DocIds in the target index are never changed. The documents merged into the target index are all assigned new, sequentially numbered DocIds, unless (a) the dtsIndexKeepExistingDocIds flag is set in IndexJob and (b) the indexes have non-overlapping ranges of DocIds.
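
The merge rules can be sketched the same way. In this hypothetical Python function (not part of the dtSearch API), the target's DocIds are left alone and the source's DocIds are either kept or renumbered. It assumes that renumbered documents continue after the highest DocId in the target, in their original source order, and it compares each index's DocIds as a range from lowest to highest value; the reference does not spell out either detail.

```python
# Hypothetical model of how merging treats DocIds; NOT the dtSearch API.

def merge_doc_ids(target_ids, source_ids, keep_existing_doc_ids=False):
    """Merge source DocIds into a target.

    Returns (all DocIds in the merged target, {source DocId: DocId in target}).
    Target DocIds are never changed.
    """
    target_ids = sorted(target_ids)
    source_ids = sorted(source_ids)
    # Ranges are compared as [lowest, highest]; either index may be empty.
    non_overlapping = (not target_ids or not source_ids
                       or source_ids[-1] < target_ids[0]
                       or source_ids[0] > target_ids[-1])
    if keep_existing_doc_ids and non_overlapping:
        mapping = {d: d for d in source_ids}      # DocIds preserved
    else:
        # Assumption: new DocIds continue after the target's highest DocId.
        next_id = target_ids[-1] + 1 if target_ids else 1
        mapping = {d: next_id + i for i, d in enumerate(source_ids)}
    return sorted(target_ids + list(mapping.values())), mapping


print(merge_doc_ids([1, 2, 3], [1, 2])[1])               # {1: 4, 2: 5}
print(merge_doc_ids([1, 2, 3], [1001, 1002], True)[1])   # {1001: 1001, 1002: 1002}
```

Note that setting the flag alone is not enough: if the source range overlaps the target range, the source documents are renumbered anyway, which is why StartingDocId should be used to separate the ranges before the indexes are built.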