dtSearch Text Retrieval Engine Programmer's Reference
dtsOnIndexWordInfo Structure

Used to modify words, or create additional words, as text is indexed.

File: dtsearch.h

struct dtsOnIndexWordInfo {
    const char * originalWord;
    long positionInFile;
    long iWhichCallback;
    long fShouldIndex;
    long nAdditionalWords;
    char wordToIndex[512];
};
const char * originalWord;
The original text of the word being indexed. Read-only.
long positionInFile;
The word offset (one-based) of the word being indexed. Read-only.
long iWhichCallback;
The first time the callback is called for a word, iWhichCallback is zero. If the callback sets nAdditionalWords to request additional callbacks, iWhichCallback is incremented by one on each subsequent callback.
long fShouldIndex;
The callback function can set fShouldIndex to false to prevent a word from being indexed. The dtSearch Indexer then skips the word as if it were a noise word.
long nAdditionalWords;
To index additional variations of the word at the same word position in the file, set nAdditionalWords to the number of variations to add. The callback function will then be called once more for each variation, and on each of these callbacks it should fill in wordToIndex with an alternative form of the word.
char wordToIndex[512];
wordToIndex can be used on the first callback to modify the text of the word to be indexed, and on additional callbacks it can be used to provide alternative forms of a word (alternative spellings, synonyms, equivalents in other languages, etc.) to be indexed in the same word position.
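As a sketch of how nAdditionalWords, iWhichCallback, and wordToIndex work together, the callback below indexes a few British spellings along with their American variants at the same word position. The struct is a local stand-in copied from the declaration above so the sketch compiles without dtsearch.h. The variant table and the SimulateIndexWord driver are hypothetical: the driver only mimics the calling sequence described above (and assumes wordToIndex starts out holding the original word); it is not how the dtSearch Indexer is implemented.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Local stand-in mirroring the documented declaration; a real project
// would include dtsearch.h instead.
struct dtsOnIndexWordInfo {
    const char *originalWord;
    long positionInFile;
    long iWhichCallback;
    long fShouldIndex;
    long nAdditionalWords;
    char wordToIndex[512];
};

// Hypothetical spelling-variant table, used only for this illustration.
struct Variant {
    const char *word;
    const char *alternative;
};
static const Variant kVariants[] = {
    {"colour", "color"},
    {"grey", "gray"},
};

static const char *FindVariant(const char *word) {
    for (const Variant &v : kVariants)
        if (std::strcmp(v.word, word) == 0)
            return v.alternative;
    return nullptr;
}

// Indexes each listed word together with its alternative spelling.
void OnIndexWord_Variants(void * /*pData*/, dtsOnIndexWordInfo *pInfo) {
    const char *alt = FindVariant(pInfo->originalWord);
    if (alt == nullptr)
        return;  // no variant: the word is indexed unchanged
    if (pInfo->iWhichCallback == 0) {
        pInfo->nAdditionalWords = 1;  // request one more callback
    } else if (pInfo->iWhichCallback == 1) {
        // Copy the variant, always leaving room for the terminator.
        std::strncpy(pInfo->wordToIndex, alt, sizeof pInfo->wordToIndex - 1);
        pInfo->wordToIndex[sizeof pInfo->wordToIndex - 1] = '\0';
    }
}

// Hypothetical driver that mimics the documented calling sequence and
// returns every string that would be indexed at this word position.
std::vector<std::string> SimulateIndexWord(const char *word) {
    std::vector<std::string> indexed;
    dtsOnIndexWordInfo info{};  // zero-initialized, so the buffer ends in '\0'
    info.originalWord = word;
    info.positionInFile = 1;
    info.fShouldIndex = 1;
    std::strncpy(info.wordToIndex, word, sizeof info.wordToIndex - 1);

    OnIndexWord_Variants(nullptr, &info);  // first call: iWhichCallback == 0
    if (info.fShouldIndex)
        indexed.push_back(info.wordToIndex);
    const long extra = info.nAdditionalWords;
    for (long i = 1; i <= extra; ++i) {
        info.iWhichCallback = i;
        info.wordToIndex[0] = '\0';
        OnIndexWord_Variants(nullptr, &info);
        indexed.push_back(info.wordToIndex);
    }
    return indexed;
}
```

With this driver, SimulateIndexWord("colour") yields "colour" followed by "color", while a word not in the table, such as "paint", yields only itself.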

The dtsOnIndexWordInfo structure is used with the pOnIndexWordFn callback in dtsIndexJob to provide a way to modify the words in a document as they are being indexed. Some possible uses of this callback include: (1) customizing the handling of certain characters; (2) excluding certain words from being indexed; and (3) adding variations on a word.
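The first two uses can be sketched as separate callbacks; an index job would register one of them (or a combination) through pOnIndexWordFn. Both the set of characters to strip and the "digits only" exclusion rule are hypothetical choices made for illustration, and whether characters such as "." ever reach the callback inside a word may depend on the index's alphabet settings. The struct is again a local stand-in for the declaration in dtsearch.h.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>

// Local stand-in mirroring the documented declaration; a real project
// would include dtsearch.h instead.
struct dtsOnIndexWordInfo {
    const char *originalWord;
    long positionInFile;
    long iWhichCallback;
    long fShouldIndex;
    long nAdditionalWords;
    char wordToIndex[512];
};

// Hypothetical choice for illustration: characters to drop from each word.
static const char kCharsToRemove[] = "._";

// Use (1): rewrite the word on the first callback, dropping the characters
// listed in kCharsToRemove (for example, "U.S.A" becomes "USA").
void OnIndexWord_StripChars(void * /*pData*/, dtsOnIndexWordInfo *pInfo) {
    if (pInfo->iWhichCallback != 0)
        return;
    const std::size_t cap = sizeof pInfo->wordToIndex - 1;  // room for '\0'
    std::size_t out = 0;
    for (const char *p = pInfo->originalWord; *p != '\0' && out < cap; ++p) {
        if (std::strchr(kCharsToRemove, *p) == nullptr)
            pInfo->wordToIndex[out++] = *p;
    }
    pInfo->wordToIndex[out] = '\0';
}

// Use (2): exclude words made up only of digits (a hypothetical rule).
void OnIndexWord_SkipNumbers(void * /*pData*/, dtsOnIndexWordInfo *pInfo) {
    if (pInfo->iWhichCallback != 0)
        return;
    const char *p = pInfo->originalWord;
    if (*p == '\0')
        return;
    for (; *p != '\0'; ++p) {
        if (!std::isdigit(static_cast<unsigned char>(*p)))
            return;  // contains a non-digit: leave fShouldIndex unchanged
    }
    pInfo->fShouldIndex = false;  // skipped as if it were a noise word
}
```

Both callbacks act only when iWhichCallback is zero, so they never interfere with extra callbacks requested through nAdditionalWords.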

Example (from the textdemo sample application): 


#include <string.h>
#include "dtsearch.h"

// Demonstrates how the OnIndexWord callback function can be used to
// modify text being indexed. In this case, each word is indexed along with an
// inverted version of the word.
void OnIndexWord_Inverter(void *pData, dtsOnIndexWordInfo* pInfo)
{
    // First time we are called for a word, just request another callback
    if (pInfo->iWhichCallback == 0) {
        pInfo->nAdditionalWords = 1;
        return;
    }
    // Second time we are called for a word, return the inverted version
    if (pInfo->iWhichCallback == 1) {
        size_t l = strlen(pInfo->originalWord);
        // Truncate so the reversed word and its terminator fit in wordToIndex
        if (l >= sizeof pInfo->wordToIndex)
            l = sizeof pInfo->wordToIndex - 1;
        for (size_t i = 0; i < l; ++i)
            pInfo->wordToIndex[i] = pInfo->originalWord[l - i - 1];
        pInfo->wordToIndex[l] = '\0';
    }
}
Copyright (c) 1995-2023 dtSearch Corp. All rights reserved.