Used to modify words, or create additional words, as text is indexed.
File
dtsearch.h
Syntax
```cpp
struct dtsOnIndexWordInfo {
    const char * originalWord;
    long positionInFile;
    long iWhichCallback;
    long fShouldIndex;
    long nAdditionalWords;
    char wordToIndex[512];
};
```
Members
Members | Description |
---|---|
const char * originalWord; | The original text of the word being indexed. Read-only. |
long positionInFile; | The word offset (one-based) of the word being indexed. Read-only. |
long iWhichCallback; | Zero on the first callback for a word. If the callback sets nAdditionalWords to request additional callbacks, iWhichCallback is incremented by one for each of them, so the additional callbacks see 1, 2, and so on. |
long fShouldIndex; | Set fShouldIndex to false (zero) to prevent the word from being indexed. The dtSearch Indexer then skips the word as if it were a noise word. |
long nAdditionalWords; | To add variations of the word at the same word position in the file, set nAdditionalWords to the number of variations. The callback function is then called once more for each variation, and on each of those callbacks it should fill in wordToIndex with one alternative form of the word. |
char wordToIndex[512]; | On the first callback, wordToIndex can be used to change the text of the word to be indexed. On additional callbacks, it provides the alternative forms of the word (alternative spellings, synonyms, equivalents in other languages, etc.) to be indexed at the same word position. The buffer is 512 bytes, including the terminating null character. |
Remarks
The dtsOnIndexWordInfo structure is used with the pOnIndexWordFn callback in dtsIndexJob to modify the words in a document as they are being indexed. Possible uses of this callback include (1) customizing the handling of certain characters, (2) excluding certain words from being indexed, and (3) adding variations of a word.
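For use (1), a callback can rewrite wordToIndex on the first callback. The sketch below removes underscores, so "max_value" is indexed as "maxvalue". It assumes the indexer's alphabet settings pass '_' through as part of a word; OnIndexWord_StripUnderscores is a hypothetical name, and the struct is a local mirror of the Syntax declaration (include dtsearch.h instead in a real build).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Local mirror of the declaration under Syntax (assumption: lets this sketch
// compile without dtsearch.h; include dtsearch.h in a real build instead).
struct dtsOnIndexWordInfo {
    const char *originalWord;
    long positionInFile;
    long iWhichCallback;
    long fShouldIndex;
    long nAdditionalWords;
    char wordToIndex[512];
};

// Hypothetical callback: on the first callback, writes the word with every
// underscore removed into wordToIndex. A word made up only of underscores
// would become empty, so it is skipped by clearing fShouldIndex instead.
void OnIndexWord_StripUnderscores(void * /*pData*/, dtsOnIndexWordInfo *pInfo)
{
    assert(pInfo != nullptr);
    if (pInfo->iWhichCallback != 0 || std::strchr(pInfo->originalWord, '_') == nullptr)
        return;  // words without underscores are left alone

    const std::size_t capacity = sizeof pInfo->wordToIndex - 1;  // room for '\0'
    std::size_t out = 0;
    for (const char *p = pInfo->originalWord; *p != '\0' && out < capacity; ++p) {
        if (*p != '_')
            pInfo->wordToIndex[out++] = *p;
    }
    pInfo->wordToIndex[out] = '\0';

    if (out == 0)
        pInfo->fShouldIndex = 0;  // nothing left to index
}
```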
Example (from the textdemo sample application):
```cpp
#include <cstring>
#include "dtsearch.h"

// Demonstrates how the OnIndexWord callback function can be used to
// modify text being indexed. In this case, each word is indexed along with an
// inverted (reversed) version of the word.
void OnIndexWord_Inverter(void *pData, dtsOnIndexWordInfo *pInfo)
{
    // First time we are called for a word, just request another callback
    if (pInfo->iWhichCallback == 0) {
        pInfo->nAdditionalWords = 1;
        return;
    }
    // Second time we are called for a word, return the inverted version
    if (pInfo->iWhichCallback == 1) {
        size_t len = strlen(pInfo->originalWord);
        // Truncate so the reversed word and its terminator fit in wordToIndex
        if (len >= sizeof pInfo->wordToIndex)
            len = sizeof pInfo->wordToIndex - 1;
        for (size_t i = 0; i < len; ++i)
            pInfo->wordToIndex[i] = pInfo->originalWord[len - i - 1];
        pInfo->wordToIndex[len] = '\0';  // null-terminate the inverted word
    }
}
```
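The Inverter requests a single extra callback, but the same pattern extends to several variations per word, such as synonyms. The sketch below is hypothetical: OnIndexWord_Synonyms, SynonymEntry, kSynonyms, and FindEntry are illustrative names, the lookup is exact and case-sensitive, and the struct is a local mirror of the Syntax declaration (include dtsearch.h instead in a real build). Like the Inverter, it assumes originalWord still holds the original text on every additional callback.

```cpp
#include <cassert>
#include <cstring>

// Local mirror of the declaration under Syntax (assumption: lets this sketch
// compile without dtsearch.h; include dtsearch.h in a real build instead).
struct dtsOnIndexWordInfo {
    const char *originalWord;
    long positionInFile;
    long iWhichCallback;
    long fShouldIndex;
    long nAdditionalWords;
    char wordToIndex[512];
};

// Hypothetical synonym table; a real application might pass its own via pData.
const int kMaxVariants = 2;
struct SynonymEntry {
    const char *word;
    const char *variants[kMaxVariants];  // unused slots are nullptr
};
static const SynonymEntry kSynonyms[] = {
    {"car", {"auto", "automobile"}},
    {"colour", {"color", nullptr}},
};

// Exact, case-sensitive lookup; returns nullptr when the word has no entry.
static const SynonymEntry *FindEntry(const char *word)
{
    for (const SynonymEntry &entry : kSynonyms) {
        if (std::strcmp(entry.word, word) == 0)
            return &entry;
    }
    return nullptr;
}

// Hypothetical callback: callback 0 requests one extra callback per variant;
// callback k (k >= 1) writes variant k-1 into wordToIndex.
void OnIndexWord_Synonyms(void * /*pData*/, dtsOnIndexWordInfo *pInfo)
{
    assert(pInfo != nullptr);
    const SynonymEntry *entry = FindEntry(pInfo->originalWord);
    if (entry == nullptr)
        return;  // no variants: leave the word unchanged

    if (pInfo->iWhichCallback == 0) {
        long count = 0;
        while (count < kMaxVariants && entry->variants[count] != nullptr)
            ++count;
        pInfo->nAdditionalWords = count;
        return;
    }

    long k = pInfo->iWhichCallback - 1;
    if (k < 0 || k >= kMaxVariants || entry->variants[k] == nullptr)
        return;
    std::strncpy(pInfo->wordToIndex, entry->variants[k], sizeof pInfo->wordToIndex - 1);
    pInfo->wordToIndex[sizeof pInfo->wordToIndex - 1] = '\0';
}
```

Registered through pOnIndexWordFn in dtsIndexJob, a callback like this should index "car", "auto", and "automobile" at the same word position.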