Implementation of a web site indexing spider to be used with dtSearch.Engine.IndexJob.
For a list of all members of this type, see HttpDataSource Members.
System.Object
   dtSearch.Engine.DataSource
      dtSearch.Spider.HttpDataSource
To use HttpDataSource, call Add() once for each WebSite that specifies a site to crawl, create a dtSearch.Engine.IndexJob, and set the IndexJob's DataSourceToIndex property to the HttpDataSource. Then call HttpDataSource.StartCrawl() to start the Spider, and execute the IndexJob. For complete sample code, see the SpiderDemo sample.
Note: Before the Spider is used for the first time in a program, SpiderInit::Initialized must be called. SpiderInit::Terminate must be called before the program exits.
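using dtSearch.Engine;
using dtSearch.Spider;

// indexJob and dataSource are fields of the form: ExecuteInThread() runs the
// job on a separate thread, so both objects must outlive this method.
// Create the IndexJob
indexJob = new IndexJob();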
indexJob.ActionCreate = true;
indexJob.ActionAdd = true;
indexJob.IndexPath = this.IndexPathEdit.Text;
indexJob.IndexingFlags = IndexingFlags.dtsIndexCacheText | IndexingFlags.dtsIndexCacheOriginalFile;
// Create a data source that crawls the web sites listed in webSiteList
dataSource = new HttpDataSource();
foreach (WebSite ws in webSiteList)
{
    dataSource.Add(ws);
}
// Start the Spider
dataSource.StartCrawl();
// Attach the Spider's DataSource to the IndexJob
indexJob.DataSourceToIndex = dataSource;
// Start indexing. The indexer will repeatedly call dataSource.GetNextDoc() to obtain pages
// to index until dataSource.GetNextDoc() returns false.
indexJob.ExecuteInThread();
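The comments in the sample describe a pull model: the indexer asks the data source for one page at a time and stops when GetNextDoc() returns false. The minimal, self-contained C# sketch below illustrates only that contract. It does not use the dtSearch API; PageSource, ListPageSource, CurrentName and IndexAll are hypothetical stand-ins for the real DataSource and IndexJob, which do considerably more (crawling, retrieving page content, and so on).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a data source. This is NOT dtSearch.Engine.DataSource;
// it only mimics the "return false when there are no more documents" contract.
abstract class PageSource
{
    // Hypothetical: name of the page most recently produced by GetNextDoc().
    public string CurrentName { get; protected set; } = "";

    // Advances to the next page; returns false once the source is exhausted.
    public abstract bool GetNextDoc();
}

// Hypothetical source that serves a fixed list of URLs in order.
class ListPageSource : PageSource
{
    private readonly Queue<string> pending;

    public ListPageSource(IEnumerable<string> urls)
    {
        pending = new Queue<string>(urls);
    }

    public override bool GetNextDoc()
    {
        if (pending.Count == 0)
            return false;              // no more pages: the consumer stops pulling
        CurrentName = pending.Dequeue();
        return true;
    }
}

static class PullLoopDemo
{
    // Hypothetical consumer playing the indexer's role: it keeps calling
    // GetNextDoc() until it returns false and records each page it "indexed".
    public static List<string> IndexAll(PageSource source)
    {
        var indexed = new List<string>();
        while (source.GetNextDoc())
            indexed.Add(source.CurrentName);
        return indexed;
    }

    static void Main()
    {
        var source = new ListPageSource(new[] { "http://example.com/", "http://example.com/about" });
        foreach (string name in IndexAll(source))
            Console.WriteLine(name);
    }
}
```

Running the program prints the two example URLs in the order they were queued. Keeping the loop on the consumer side means the source decides how documents are produced (in the real Spider, by crawling), while the consumer only needs to know when to stop.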
Namespace: dtSearch.Spider
Assembly: dtSearch.Spider (in dtSearch.Spider.dll)
See Also: HttpDataSource Members | dtSearch.Spider Namespace