dtSearch Text Retrieval Engine -- Java API
IndexJob Class

The IndexJob object handles creation and updating of dtSearch indexes.

Class Hierarchy
public class IndexJob;
File

IndexJob.java

Remarks

To create or update an index, create an IndexJob object, use its properties to describe the indexing task you want the engine to perform, and call the Execute method. IndexJob provides two ways to specify the text to be indexed: (1) the setIncludeFilters, setExcludeFilters, and setFoldersToIndex members let you specify directories and filename filters identifying a set of disk files to index, and (2) the DataSourceToIndex member lets you supply text data directly to the dtSearch Engine for indexing. DataSourceToIndex is useful for indexing data from non-file sources such as message stores, SQL databases, dynamically generated data, or any other non-file data accessible to your program.

Actions

The setActionXXX flags specify the actions you want the engine to perform. If more than one action is specified, the engine will perform the actions in the following order: create, removeDeleted, removeListed, add, compress, merge, verify. 
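The fixed execution order can be modelled with an enum whose declaration order matches the sequence listed above. This is only a sketch for reasoning about combined actions. ActionOrderSketch, IndexAction, and inEngineOrder are hypothetical names and are not part of the dtSearch API.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not a dtSearch class.
class ActionOrderSketch {
    // Declared in the order in which the engine is documented to perform the actions.
    enum IndexAction { CREATE, REMOVE_DELETED, REMOVE_LISTED, ADD, COMPRESS, MERGE, VERIFY }

    // Returns the requested actions in the documented execution order,
    // regardless of the order in which they were requested.
    static List<IndexAction> inEngineOrder(Set<IndexAction> requested) {
        if (requested.isEmpty()) {
            return new ArrayList<>();
        }
        // An EnumSet always iterates in declaration order.
        return new ArrayList<>(EnumSet.copyOf(requested));
    }

    public static void main(String[] args) {
        Set<IndexAction> requested =
                EnumSet.of(IndexAction.VERIFY, IndexAction.ADD, IndexAction.CREATE);
        System.out.println(inEngineOrder(requested)); // [CREATE, ADD, VERIFY]
    }
}
```

For example, requesting both create and add on one job means the existing index is replaced first and the documents are then added to the new, empty index.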

A brief summary of the meaning of each actionXXX flag is given below. For more information, see "Building and Maintaining Indexes" in the Overviews section. 

Add: Add documents to an existing index. 

Compress: Remove obsolete information from the index. 

Create: Create a new index. If an index already exists in the specified directory, the index will be destroyed and replaced with a new, empty index. 

RemoveDeleted: Check that each file in the index still exists on disk and remove from the index any files that no longer exist. 

RemoveListed: Remove the files listed in ToRemoveList from the index. 

Merge: Merge one or more indexes into the target index. Use setIndexPath to specify the location of the target index, and setIndexesToMerge to specify the location of the indexes to merge. Merging indexes combines two or more indexes into a single index, which contains any document that was in any of the merged indexes. If the same document appears in more than one of the merged indexes, only the most recent document will appear in the merged index. 

Verify: Scan all structures in the index and perform detailed checks to validate all data in the index for consistency. 
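The Merge rule for duplicate documents can be pictured with a small in-memory model. The sketch below is not how the engine works internally. It models each index as a map from document name to a timestamp and assumes that "most recent" means the larger timestamp. MergeSemanticsSketch and merge are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the documented merge outcome; not a dtSearch class.
class MergeSemanticsSketch {
    // Each "index" is modelled as document name -> timestamp (assumed meaning of "most recent").
    static Map<String, Long> merge(List<Map<String, Long>> indexes) {
        Map<String, Long> merged = new LinkedHashMap<>();
        for (Map<String, Long> index : indexes) {
            for (Map.Entry<String, Long> doc : index.entrySet()) {
                // Keep the newer timestamp when the same document appears more than once.
                merged.merge(doc.getKey(), doc.getValue(), Long::max);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Long> first = new LinkedHashMap<>();
        first.put("a.doc", 100L);
        first.put("b.doc", 200L);
        Map<String, Long> second = new LinkedHashMap<>();
        second.put("b.doc", 300L);
        second.put("c.doc", 50L);
        System.out.println(merge(Arrays.asList(first, second))); // {a.doc=100, b.doc=300, c.doc=50}
    }
}
```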

Related Articles

Building and Maintaining Indexes in Overviews 

Database and Field Searching in Overviews (covers indexing databases)

Group
Methods
Method 
Description 
Performs the indexing job and returns 0 if the job is successful or -1 if an error occurred. 
After an indexing job is done, the Errors property will contain a JobErrorInfo object with any error messages generated during the indexing job. 
Returns a table of values describing the index in IndexPath. 
Returns a number identifying the current indexing step in progress. The value will be one of the following:

  • ixStepBegin (1) Indexing started
  • ixStepCreatingIndex (2) Creating index
  • ixStepCheckingFiles (3) Checking files to see which files need to be reindexed
  • ixStepReadingFiles (4) Reading files
  • ixStepStoringWords (5) Storing word references in the index
  • ixStepMerging (6) Merging words into the index
  • ixStepCompressing (7) Compressing the index to remove obsolete information
  • ixStepMergingIndexes (9) Merging two or more indexes into a single index (if ActionMerge=true)
  • ixStepVerifyingIndex (10) The index is being verified (if ActionVerify=true)
  • ixStepDone (8) Done indexing
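A status display usually needs to turn the step number into text. The sketch below maps the numeric values from the list above to short labels. IndexStepSketch and describeStep are hypothetical names; only the numbers and their meanings come from this page.

```java
// Hypothetical helper; the numeric codes are taken from the list above.
class IndexStepSketch {
    static String describeStep(int step) {
        switch (step) {
            case 1:  return "Indexing started";
            case 2:  return "Creating index";
            case 3:  return "Checking files";
            case 4:  return "Reading files";
            case 5:  return "Storing word references";
            case 6:  return "Merging words into the index";
            case 7:  return "Compressing the index";
            case 8:  return "Done indexing";
            case 9:  return "Merging indexes";
            case 10: return "Verifying the index";
            default: return "Unknown step " + step;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeStep(4));  // Reading files
        System.out.println(describeStep(8));  // Done indexing
    }
}
```

Note that the values are not in chronological order: 8 (done) is reported after 9 and 10, so a display should not assume that a larger number means a later step.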

 

Returns the percentage of the index job that has been completed. This value can be used during a callback through the StatusHandler property. 
Index will preserve accents when indexing words. Otherwise, accents are stripped from words being indexed. Stripping of accents is done using information in the dtSearch alphabet file. 
Index will treat words with different capitalization as different words. (apple and Apple would be two different words.) 
Use relative rather than absolute paths in storing document locations. 
The dataSourceToIndex property provides a way to supply text to be indexed to the dtSearch Engine when the text is not accessible as a disk file. You can use this to index databases or other non-file data. 

A file will be indexed if it matches one of the IncludeFilters and does not match any of the ExcludeFilters. Each string can contain one or more filename filters, separated by spaces, to apply to files in the directories selected. (ExcludeFilters can be blank.) If a filename filter contains a space, put it in quotation marks. A filename filter that does not contain a backslash is compared to the name of each file. A filename filter that contains a backslash is compared to the fully-qualified pathname of each file.
Example:  
Use setFoldersToIndex to list names of the folders to be indexed, with a space between each folder name. If a folder name contains a space, it should be quoted. A <+> at the end of a folder name indicates that subfolders should be included.
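The quoting rule for space-separated lists can be sketched as a small helper that quotes only the entries containing a space. QuotedListSketch and joinQuoted are hypothetical names, and the folder paths are made up for illustration. The helper only builds the string; it does not call the dtSearch Engine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for building space-separated lists; not a dtSearch class.
class QuotedListSketch {
    // Joins the entries with single spaces, quoting any entry that contains a space.
    static String joinQuoted(List<String> entries) {
        List<String> parts = new ArrayList<>();
        for (String entry : entries) {
            parts.add(entry.contains(" ") ? "\"" + entry + "\"" : entry);
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Illustrative paths only; the first one carries the <+> subfolder marker.
        String folders = joinQuoted(Arrays.asList("C:\\docs<+>", "C:\\My Files"));
        System.out.println(folders); // C:\docs<+> "C:\My Files"
    }
}
```

The same quoting rule is described on this page for filename filters and for the list of indexes to merge, so one helper of this kind could serve all three.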



Example:  

List of indexes to merge into the target index (IndexPath). ActionMerge must be true for the merge to occur.

For each index, provide the full path to the index. If more than one index path is provided, separate the paths with spaces. If a path contains a space, use quotation marks around the path.
Example:  
IndexingFlags values controlling the indexing of documents 
Name of the index 
The directory where the index will be stored. The index will consist of a set of files named INDEX_*.IX. 
If non-zero, the first doc id to assign to documents in this index. 
The statusHandler is an object that will receive status updates from the engine during indexing. This can be useful to give the user a status display (such as a progress bar) and an opportunity to cancel an indexing job in progress. The statusHandler object must implement a CheckForAbort method that returns int. If it returns 0, the index job continues. If it returns 1, the index job is halted after the data indexed so far is saved. If it returns 2, the index job is halted immediately. On each callback, the value of getStatusPercentDone can be checked to update a progress... 
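The three return codes can be sketched as a small cancellation policy shared between a user interface thread and the indexing callback. This is an assumption-laden outline: AbortPolicySketch and requestStop are hypothetical names, the class does not implement any dtSearch interface (the interface the engine expects is not shown on this page), and the callback is written as checkForAbort following Java naming convention even though the text above calls it CheckForAbort.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; it does not implement any dtSearch interface.
class AbortPolicySketch {
    // Return codes described in the text above.
    static final int CONTINUE = 0;          // the index job continues
    static final int STOP_AND_SAVE = 1;     // halt after the data indexed so far is saved
    static final int STOP_IMMEDIATELY = 2;  // halt immediately

    // Written by a UI thread, read from the indexing callback.
    private final AtomicInteger requested = new AtomicInteger(CONTINUE);

    // Called when the user asks to cancel the job.
    void requestStop(boolean saveIndexedData) {
        requested.set(saveIndexedData ? STOP_AND_SAVE : STOP_IMMEDIATELY);
    }

    // Stand-in for the callback the text calls CheckForAbort.
    int checkForAbort() {
        return requested.get();
    }

    public static void main(String[] args) {
        AbortPolicySketch handler = new AbortPolicySketch();
        System.out.println(handler.checkForAbort()); // 0
        handler.requestStop(true);
        System.out.println(handler.checkForAbort()); // 1
    }
}
```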
Used to specify that the text in certain fields should be collected, stored in the index, and returned in searches. To specify the fields to be stored, set StoredFields to a space-delimited list of field names (quote any field names that contain spaces). The field names in the list can contain wildcards (* and ?). A set containing a single entry "*" would match all fields, causing the text of every field to be stored in the index. 
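To preview which field names a StoredFields list would select, the wildcards can be translated into regular expressions. The sketch below assumes that * matches any run of characters, that ? matches exactly one character, and that matching is case-sensitive; this page does not state the engine's exact matching rules. StoredFieldMatchSketch, toPattern, and matchesAny are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of wildcard matching; not a dtSearch class.
class StoredFieldMatchSketch {
    // Translates a wildcard expression into a regular expression.
    static Pattern toPattern(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");   // assumed: any run of characters
            } else if (c == '?') {
                regex.append('.');    // assumed: exactly one character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Returns true if the field name matches at least one wildcard entry.
    static boolean matchesAny(String fieldName, String... wildcards) {
        for (String wildcard : wildcards) {
            if (toPattern(wildcard).matcher(fieldName).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesAny("Author", "Auth*", "Title")); // true
        System.out.println(matchesAny("Subject", "Auth*", "Title")); // false
        System.out.println(matchesAny("Subject", "*"));              // true
    }
}
```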
Set TimeoutSeconds to the maximum amount of time you want to permit the index job to run. The default value, zero, allows the index job to continue until cancelled or complete. 
Name of file containing list of files to remove from the index. 
Copyright (c) 1998-2008 dtSearch Corp. All rights reserved.