dtSearch Text Retrieval Engine Programmer's Reference

dtSearch provides several mechanisms to control how documents are scored for relevance during a search.

Relevance affects both which items are included in search results and how the results are sorted after a search. Selection occurs when SearchJob.MaxFilesToRetrieve limits the number of documents to be returned after a search. For example, if MaxFilesToRetrieve is 10, only the 10 most relevant documents will be returned. Once a set of documents is returned in SearchResults, you can then sort the SearchResults list by other criteria such as filename.

Sorting by hit count

If no term weighting, field weighting, or positional scoring is applied, relevance is determined by simply counting hits. Every word that matches the search request counts as one hit. Each word in a phrase counts separately, so "first class mail" would be three hits.
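The counting rule can be illustrated with a short Python sketch. This is not dtSearch code: the function name count_hits and the simple letters-and-digits tokenizer are assumptions made for illustration, and the sketch does not check that the words of a phrase are adjacent.

```python
import re

def count_hits(text: str, request_words: list[str]) -> int:
    """Count every word in `text` that matches one of the request words."""
    wanted = {w.lower() for w in request_words}
    # Simplified tokenizer (an assumption): runs of letters and digits.
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    return sum(1 for w in words if w in wanted)

doc = "Send it by first class mail."
# Each word of the phrase counts separately, so this is three hits.
print(count_hits(doc, ["first", "class", "mail"]))  # 3
```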

Automatic relevance scoring

For relevancy-ranked searches, dtSearch uses a "vector-space" algorithm to calculate a score for each document that takes into account the relative rarity of the search terms and their density in the retrieved file. Infrequent terms count more heavily than common terms, and N hits in a short document count more heavily than N hits in a long document. 
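The exact formula dtSearch uses is not given here. As a rough illustration of the two effects described (rarity and density), the following Python sketch uses a generic TF-IDF-style score. The function toy_score, the doc_freq table, and the sample documents are all hypothetical.

```python
import math

def toy_score(doc_words, query_terms, doc_freq, total_docs):
    """Toy TF-IDF-style score: rare terms and dense hits score higher."""
    score = 0.0
    for term in query_terms:
        hits = doc_words.count(term)
        if hits == 0:
            continue
        # N hits in a short document give a higher density than in a long one.
        density = hits / len(doc_words)
        # Terms found in few documents get a larger rarity factor.
        rarity = math.log(total_docs / doc_freq[term])
        score += density * rarity
    return score

# Hypothetical collection statistics: number of documents containing each term.
doc_freq = {"apple": 50, "quince": 2}
short_doc = ["apple", "pie"]
long_doc = ["apple"] + ["filler"] * 9
rare_doc = ["quince", "pie"]

# One hit in a short document outscores one hit in a long document.
print(toy_score(short_doc, ["apple"], doc_freq, 100) > toy_score(long_doc, ["apple"], doc_freq, 100))
# A hit on a rare term outscores a hit on a common term.
print(toy_score(rare_doc, ["quince"], doc_freq, 100) > toy_score(short_doc, ["apple"], doc_freq, 100))
```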

An additional positional scoring mechanism increases the score when hits occur close to each other or close to the top of the file. With positional scoring, hits that are close together and at the top of the file count much more heavily than other hits. 

In the dtSearch Engine API, two flags in SearchJob control these types of scoring: use dtsSearchAutoTermWeight to enable the vector-space scoring, and dtsSearchPositionalScoring to enable the positional scoring. The two options can be combined, and using both is recommended.

Scaling of relevance scores

The percentage shown in search results is each document's score expressed as a percentage of the score of the highest-scoring document in the search results list. (The best-matching document always has a score of 100%.)
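The scaling amounts to dividing each score by the highest score in the list. A minimal Python sketch follows. The function to_percentages, the sample raw scores, and the rounding to whole percentages are assumptions, not part of the dtSearch API.

```python
def to_percentages(raw_scores: dict[str, float]) -> dict[str, int]:
    """Express each score as a percentage of the highest score in the list."""
    top = max(raw_scores.values(), default=0)
    if top <= 0:
        return {name: 0 for name in raw_scores}
    return {name: round(100 * s / top) for name, s in raw_scores.items()}

print(to_percentages({"a.doc": 40, "b.doc": 10, "c.doc": 20}))
# {'a.doc': 100, 'b.doc': 25, 'c.doc': 50}
```

Because the percentages are relative to the best match in each result list, they are not comparable across different searches.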

Term and field weighting

In addition to automatic relevance scoring, dtSearch provides options to apply weights to specific terms or fields. These can be combined with positional scoring and automatic term weighting. 

(1) Applying weights to search terms

To apply a weight to a search term, append a colon (:) and a numeric weight to the term in the search request, like this:

apple:1 and pear:5

This would count hits on "pear" five times as much as hits on "apple". 
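To make the arithmetic concrete, here is a small Python sketch of weighted hit counting. The function weighted_hits and the sample hit counts are hypothetical, and the sketch covers only the weighting itself, not how dtSearch combines weights with automatic or positional scoring.

```python
def weighted_hits(hit_counts: dict[str, int], term_weights: dict[str, int]) -> int:
    """Sum the hits for each term, multiplied by that term's weight (default 1)."""
    return sum(count * term_weights.get(term, 1) for term, count in hit_counts.items())

weights = {"apple": 1, "pear": 5}  # corresponds to: apple:1 and pear:5

# Three hits on "apple" and one on "pear": 3*1 + 1*5
print(weighted_hits({"apple": 3, "pear": 1}, weights))  # 8
# One hit on "apple" and three on "pear": 1*1 + 3*5
print(weighted_hits({"apple": 1, "pear": 3}, weights))  # 16
```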

(2) Applying weights to fields

A weight can also apply to everything inside a "contains" expression for a field search, like this:

subject:5 contains (apple and pear)

This is the same as:

subject contains (apple:5 and pear:5)

(3) Specifying the weight for hits that occur in fields 

Another way to specify weights is to weight hits according to which field they occur in, even if the search request does not specify a field. For example, you might want hits in the HTML <TITLE> or a "Subject" field to count more than other hits. To specify this type of weighting, use SearchJob.FieldWeights to list the field weights, and enter the search request without any weighting. FieldWeights is a list of field names separated by commas, with a colon and a weight following each name. Example:

searchJob.FieldWeights = "HtmlTitle:20,Subject:10"
searchJob.Request = "apple"

In this example, the search would return documents containing "apple" anywhere, but hits in the HTML <TITLE> would count 20 times as much as other hits, and hits in the Subject field would count 10 times as much as other hits. 
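If field weights are kept in application configuration, the FieldWeights string can be generated rather than typed by hand. The Python sketch below only builds the string in the format shown above. The helper format_field_weights is hypothetical, and rejecting names that contain a comma or colon is an assumption made to keep the output unambiguous.

```python
def format_field_weights(weights: dict[str, int]) -> str:
    """Build a comma-separated 'Field:weight' list from a mapping."""
    for field in weights:
        # Assumption: a comma or colon in a field name would make the list ambiguous.
        if "," in field or ":" in field:
            raise ValueError(f"unsupported character in field name: {field!r}")
    return ",".join(f"{field}:{weight}" for field, weight in weights.items())

print(format_field_weights({"HtmlTitle": 20, "Subject": 10}))
# HtmlTitle:20,Subject:10
```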

This type of weighting can be used in combination with page metatags to promote specific pages to the top of search results for a list of keywords. For example, if you have a web page that should be ranked higher in a search for "apple" or "pear", you could add a metatag to the page with a "Promote" field, like this:

<meta name="Promote" content="apple pear">

If you then set up a SearchJob with FieldWeights = "Promote:1000", any hits in the "Promote" field will count 1000 times as much as other hits.

Document promotion

The dtSearch Engine API has several ways to promote certain documents to the top of search results lists: 

(1) Unconditionally promoting documents. If the promoted documents should always appear ahead of all other documents, you can put them in a separate index and search this index first, report documents from this index, and then search other lower-priority indexes. 

(2) Promotion by keyword. If the promoted documents should appear ahead of other documents only based on certain keywords, you can add these keywords to a field in the promoted documents and use field weighting to weight hits in this field very highly. For information on adding metadata to documents during indexing, see Adding Fields to Documents.

(3) Promotion by document weighting. If the promoted documents should be given additional weight but should not necessarily go ahead of all other documents, you can add a keyword to a field in the documents and then add this keyword as a search criterion to all searches, with a higher weight given to hits in the field. Use the andany connector to add this term to a request without affecting document selection. For example, you could name the field "promotefield" and put a keyword xpromote in the field when you index the documents. Then in a search you could use term weighting to add weight to the promoted term, like this: 

(user request) andany (promotefield contains xpromote:100)
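Since the promotion clause in (3) has to be added to every search, an application would typically wrap the user's request before submitting it. A minimal Python sketch of that string composition follows. The helper add_promotion is hypothetical, and its defaults simply reuse the field name, keyword, and weight from the example above.

```python
def add_promotion(user_request: str,
                  field: str = "promotefield",
                  keyword: str = "xpromote",
                  weight: int = 100) -> str:
    """Wrap a user request so promoted documents gain weight without changing selection."""
    # Parentheses keep the user's own connectors from binding to the andany clause.
    return f"({user_request}) andany ({field} contains {keyword}:{weight})"

print(add_promotion("apple and pear"))
# (apple and pear) andany (promotefield contains xpromote:100)
```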

Copyright (c) 1995-2022 dtSearch Corp. All rights reserved.