How to sort search results after a search
The default sort order of search results depends on whether the number of files retrieved is limited and on the type of selection being done.
If SearchJob.MaxFilesToRetrieve is zero, then all documents found will be included in search results, with no sorting.
If MaxFilesToRetrieve is greater than zero, then only the best-matching files, up to that limit, are included in search results. These results are sorted by relevance or, if the dtsSearchSelectMostRecent flag is set, by modification date.
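The following Python sketch models these default rules on an in-memory list. It does not call the dtSearch API. The Doc record, its relevance field, and the default_results function are hypothetical names for illustration, and in dtSearch the relevance scores come from the engine's own computation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    name: str
    relevance: int   # hypothetical relevance score (higher is better)
    modified: date   # file modification date


def default_results(found, max_files_to_retrieve, select_most_recent=False):
    """Hypothetical model of the default selection and ordering rules.

    found: documents in the order the search found them.
    """
    if max_files_to_retrieve == 0:
        # Zero means no limit: every document is returned in the
        # order found, with no sorting.
        return list(found)
    # Otherwise keep only the best matches, best first, using either
    # modification date (most recent) or relevance as the criterion.
    if select_most_recent:
        key = lambda d: d.modified
    else:
        key = lambda d: d.relevance
    return sorted(found, key=key, reverse=True)[:max_files_to_retrieve]
```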
For information on how dtSearch computes relevance, and for weighting and other options to control this computation, please see: Relevance.
To sort results in a different order, use the Sort() method of the SearchResults object.
Sort() takes two arguments: flags and field. flags is a combination of SortFlags values that specifies the sort criteria. field is the name of the field to sort by, and is used only when flags includes dtsSortByField. A field used for sorting must be designated as a stored field during indexing so that its value is available for sorting.
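To illustrate how the flags and field arguments interact, here is a Python sketch that models Sort() on a list of result records. The SortFlags members below (BY_NAME, BY_FIELD, DESCENDING) are hypothetical stand-ins, not the real dtSearch constants or their values, and treating a missing stored value as an empty string is an assumption of the sketch, not documented dtSearch behavior.

```python
from enum import IntFlag


class SortFlags(IntFlag):
    # Hypothetical stand-ins for a few sort flags; these are NOT the
    # real dtSearch SortFlags names or numeric values.
    BY_NAME = 1
    BY_FIELD = 2
    DESCENDING = 4


def sort_results(items, flags, field=""):
    """Sort result records in place, modelling Sort(flags, field).

    Each item is a dict with a "name" and a "fields" dict that holds
    only the fields that were stored at indexing time.
    """
    if flags & SortFlags.BY_FIELD:
        # field is consulted only when BY_FIELD is set.
        # Sketch assumption: a missing stored value sorts as "".
        key = lambda item: item["fields"].get(field, "")
    elif flags & SortFlags.BY_NAME:
        key = lambda item: item["name"]
    else:
        raise ValueError("flags must include a sort criterion")
    items.sort(key=key, reverse=bool(flags & SortFlags.DESCENDING))
```

Flags combine with the | operator, so sort_results(items, SortFlags.BY_NAME | SortFlags.DESCENDING) sorts by name in reverse order, while the field argument is ignored unless BY_FIELD is present.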
To sort search results by a combination of values, use SearchResults.SetSortKey() to set the sort key for each item in search results, and then call SearchResults.Sort(dtsSortBySortKey, "") to sort by the sort key.
For example, suppose you want to sort by file date and then by filename. For each item in search results, you would generate a string combining these two values (example: "2005-07-22 Filename.doc"), and call SearchResults.SetSortKey to assign the generated sort key to that item. Example (vbscript):
Once every item in search results has been assigned a key, call SearchResults.Sort(dtsSortBySortKey, "") to sort the results by key. For a complete example, see this sample included with the dtSearch Engine:
C:\Program Files\dtSearch Developer\examples\vbs\sort.vbs
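The two-step pattern (assign a key to every item, then sort by key) can be sketched in Python as follows. The function names are hypothetical, and the code only models the effect of SetSortKey and Sort(dtsSortBySortKey, ""); it does not call them. The sketch assumes keys are compared as plain strings, which is why the date is written as YYYY-MM-DD: in that format, string order matches chronological order.

```python
from datetime import date


def make_sort_key(modified, filename):
    # ISO 8601 dates (YYYY-MM-DD) compare chronologically as strings,
    # so the date prefix orders by date and the filename breaks ties.
    return f"{modified.isoformat()} {filename}"


def sort_by_date_then_name(results):
    """results: list of (modified_date, filename) pairs.

    Step 1 models assigning a key to every item (SetSortKey);
    step 2 models sorting on that key (Sort with dtsSortBySortKey).
    """
    keyed = [(make_sort_key(d, name), name) for d, name in results]  # step 1
    keyed.sort(key=lambda entry: entry[0])                           # step 2
    return [name for _, name in keyed]
```

For instance, make_sort_key(date(2005, 7, 22), "Filename.doc") returns "2005-07-22 Filename.doc". Python compares strings by code point, so uppercase filenames sort before lowercase ones; whether dtSearch compares sort keys the same way is not established here.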
When sorting by something other than hits or relevance, keep in mind the difference between sorting and selection. During a search, dtSearch first selects the most relevant documents, comparing them by hit count, relevance score, or date. After this selection is complete, the selected results can be sorted by filename, file size, and so on.
The difference between sorting and selection becomes significant when you are displaying search results in pages.
For example, suppose you have a web searching application that displays search results in pages of 10 items. To implement this, each page has a "Next page" link that points back to the searching script and repeats the search, passing a variable that indicates which page of search results should be displayed. For the first page, the script sets MaxFilesToRetrieve = 10 and displays the first 10 hits. For the second page, the script sets MaxFilesToRetrieve = 20 and displays the next 10 hits. As long as results are sorted by hit count or relevance, this works, because the criterion used to select items during the search is the same as the one used to sort them afterward.
Now suppose you try the same approach in a search that is sorted by filename. When you click "Next Page" to get the second page of hits, you will see results that may overlap with the first page. This is because the first page contains the 10 most relevant files, sorted by filename, while the second page contains the 20 most relevant files, sorted by filename. Items 11-20 in the second search results set may contain items that were reported in the first results set. For example, suppose the top-ranked document in search results is named "zzz.doc". It will appear in both sets of search results, and when sorted by filename, it will appear at the end of the list. This means it will appear as item #10 when you display the top 10 documents, sorted by filename, and it will appear as item #20 when you display items 11-20 from the top 20 documents, sorted by filename.
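The overlap can be reproduced with a small Python model of select-then-sort. The functions are hypothetical; a page size of 2 stands in for 10, and the retrieval limit grows with the page number as in the approach described above.

```python
def select_then_sort(docs, max_files):
    """Model of one search: select the max_files most relevant
    documents, then sort that selection by filename.

    docs: list of (filename, relevance) pairs.
    """
    selected = sorted(docs, key=lambda d: d[1], reverse=True)[:max_files]
    return sorted(name for name, _ in selected)


def page_growing_limit(docs, page, page_size):
    # Flawed paging: the limit grows with the page number (10, 20, ...),
    # so each page is cut from a different sorted list.
    results = select_then_sort(docs, page * page_size)
    start = (page - 1) * page_size
    return results[start : start + page_size]
```

For example, with zzz.doc, aaa.doc, bbb.doc, ccc.doc, and ddd.doc ranked in that order by relevance and a page size of 2, page 1 is aaa.doc, zzz.doc and page 2 is ccc.doc, zzz.doc: the top-ranked zzz.doc appears on both pages, and bbb.doc is shown on neither.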
To avoid this problem, use the same MaxFilesToRetrieve value for every page, and make it much larger than the page size, so that each page displays a different range of items from the same sorted list. For example, instead of setting MaxFilesToRetrieve to 10 for the first page and 20 for the second, set it to 200 for every page and display a different range of items each time. Each page will then be consistent with the others.
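Here is the same kind of Python model with the fix applied: every request uses one fixed limit (200 by default, matching the example above), so each page is a different slice of the same sorted list. The function names are hypothetical, and each call stands in for rerunning the search.

```python
def select_then_sort(docs, max_files):
    """Select the max_files most relevant (filename, relevance) pairs,
    then sort the selection by filename."""
    selected = sorted(docs, key=lambda d: d[1], reverse=True)[:max_files]
    return sorted(name for name, _ in selected)


def page_fixed_limit(docs, page, page_size, max_files=200):
    # Consistent paging: every request uses the same limit, so every
    # page is a slice of one and the same sorted list.
    results = select_then_sort(docs, max_files)
    start = (page - 1) * page_size
    return results[start : start + page_size]
```

With five documents named zzz.doc, aaa.doc, bbb.doc, ccc.doc, and ddd.doc and a page size of 2, the pages are aaa.doc, bbb.doc; then ccc.doc, ddd.doc; then zzz.doc, with no overlaps or gaps, regardless of how the documents are ranked by relevance.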