dtSearch Text Retrieval Engine .NET interface

HttpDataSource Class

Implementation of a web site indexing spider to be used with dtSearch.Engine.IndexJob.

For a list of all members of this type, see HttpDataSource Members.

System.Object
   dtSearch.Engine.DataSource
      dtSearch.Spider.HttpDataSource

public class HttpDataSource : DataSource, IDisposable

Remarks

To use HttpDataSource, call Add() with one or more WebSite objects specifying the site(s) to crawl, then create a dtSearch.Engine.IndexJob and set its DataSourceToIndex to the HttpDataSource. Call HttpDataSource.StartCrawl() to start the Spider, then execute the IndexJob. For sample code, see the SpiderDemo sample.

Note: Before the first time the Spider is used in a program, SpiderInit::Initialized must be called, and SpiderInit::Terminate must be called before the program exits.

This example, from the SpiderDemo sample application, demonstrates how to set up the Spider to index a list of web sites. The dtsIndexCacheText and dtsIndexCacheOriginalFile flags are not required to index web sites, but they make hit-highlighting much faster and easier to implement.

// Make IndexJob
indexJob = new IndexJob();
indexJob.ActionCreate = true;
indexJob.ActionAdd = true;
indexJob.IndexPath = this.IndexPathEdit.Text;
indexJob.IndexingFlags = IndexingFlags.dtsIndexCacheText | IndexingFlags.dtsIndexCacheOriginalFile;

// Make data source to crawl the web sites listed in webSiteList
dataSource = new HttpDataSource();
foreach (WebSite ws in webSiteList)
{
    dataSource.Add(ws);
}

// Start the Spider
dataSource.StartCrawl();

// Attach the Spider's DataSource to the IndexJob
indexJob.DataSourceToIndex = dataSource;

// Start indexing.  The indexer will repeatedly call dataSource.GetNextDoc() to obtain pages
// to index until dataSource.GetNextDoc() returns false.
indexJob.ExecuteInThread();
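
The pull protocol described in the comments above (the indexer calls GetNextDoc() repeatedly until it returns false) can be illustrated without the dtSearch assemblies. The following is a minimal, self-contained sketch and not dtSearch code: PullSourceSketch, ListSourceSketch, PullLoopSketch, and the CurrentName property are hypothetical stand-ins invented for illustration, and the real DataSource class may expose different or additional members.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a pull-style data source; NOT the dtSearch API.
public abstract class PullSourceSketch
{
    // Advance to the next document; return false when none remain.
    public abstract bool GetNextDoc();

    // Name of the document most recently produced by GetNextDoc().
    public string CurrentName { get; protected set; } = "";
}

// Serves a fixed list of page names, one per GetNextDoc() call.
public sealed class ListSourceSketch : PullSourceSketch
{
    private readonly Queue<string> pending;

    public ListSourceSketch(IEnumerable<string> pages)
    {
        pending = new Queue<string>(pages);
    }

    public override bool GetNextDoc()
    {
        if (pending.Count == 0)
        {
            return false; // nothing left: tells the consumer to stop
        }
        CurrentName = pending.Dequeue();
        return true;
    }
}

public static class PullLoopSketch
{
    // Plays the indexer's role: pull until the source reports it is empty.
    public static List<string> Drain(PullSourceSketch source)
    {
        var seen = new List<string>();
        while (source.GetNextDoc())
        {
            seen.Add(source.CurrentName);
        }
        return seen;
    }

    public static void Main()
    {
        var source = new ListSourceSketch(new[] { "index.html", "about.html" });
        foreach (string name in Drain(source))
        {
            Console.WriteLine(name);
        }
    }
}
```

In this sketch the consumer never asks how many documents exist. It keeps pulling until the source returns false, which is why a crawler that discovers pages as it goes can be plugged in behind the same call.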

Requirements

Namespace: dtSearch.Spider

Assembly: dtSearch.Spider (in dtSearch.Spider.dll)

See Also

HttpDataSource Members | dtSearch.Spider Namespace