dtSearch Engine API for .NET Framework 2.x-4.x 2024.01
HttpDataSource Class

Implementation of a web site indexing spider to be used with dtSearch.Engine.IndexJob.

dtSearch::Spider::HttpDataSource
public class HttpDataSource : dtSearch::Engine::DataSource, IDisposable;

To use HttpDataSource:

1. Create an HttpDataSource. 

2. Call Add() with one or more WebSites specifying the site(s) to crawl. 

3. Create a dtSearch.Engine.IndexJob.

4. Set the IndexJob's DataSourceToIndex to the HttpDataSource. 

5. Call HttpDataSource.StartCrawl() to start the Spider. 

6. Call IndexJob.Execute() to start the indexer. 

For sample code, see the SpiderDemo sample.

This example, from the SpiderDemo sample application, demonstrates how to set up the Spider to index a list of web sites. The dtsIndexCacheText and dtsIndexCacheOriginalFile flags are not necessary to index web sites, but using these flags makes hit-highlighting much faster and easier to implement.

// Make IndexJob
indexJob = new IndexJob();
indexJob.ActionCreate = true;
indexJob.ActionAdd = true;
indexJob.IndexPath = this.IndexPathEdit.Text;
indexJob.IndexingFlags = IndexingFlags.dtsIndexCacheText |
    IndexingFlags.dtsIndexCacheOriginalFile;

// Make data source to crawl the web sites listed in webSiteList
dataSource = new HttpDataSource();
foreach (WebSite ws in webSiteList) {
    dataSource.Add(ws);
}

// Start the Spider
dataSource.StartCrawl();

// Attach the Spider's DataSource to the IndexJob
indexJob.DataSourceToIndex = dataSource;

// Start indexing. The indexer will repeatedly call dataSource.GetNextDoc() to obtain pages
// to index until dataSource.GetNextDoc() returns false.
indexJob.ExecuteInThread();
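
The final comment in the sample describes a pull-style contract: the indexer keeps asking the data source for the next page until GetNextDoc() returns false. The following is a minimal, self-contained sketch of that contract only. ToyPageSource, ToyIndexer, CurrentUrl, and IndexAll are hypothetical names invented for illustration; they are not part of the dtSearch API, and the sketch does no crawling or threading.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a pull-style data source. NOT the dtSearch API.
public class ToyPageSource
{
    private readonly Queue<string> pending;

    // URL of the page most recently returned by GetNextDoc(), or null.
    public string CurrentUrl { get; private set; }

    public ToyPageSource(IEnumerable<string> urls)
    {
        pending = new Queue<string>(urls);
    }

    // Advances to the next page; returns false once no pages are left.
    public bool GetNextDoc()
    {
        if (pending.Count == 0)
        {
            CurrentUrl = null;
            return false;
        }
        CurrentUrl = pending.Dequeue();
        return true;
    }
}

// Hypothetical consumer that models the indexer's side of the loop.
public static class ToyIndexer
{
    // Pulls pages until the source reports that it is exhausted.
    public static List<string> IndexAll(ToyPageSource source)
    {
        List<string> indexed = new List<string>();
        while (source.GetNextDoc())
        {
            indexed.Add(source.CurrentUrl);
        }
        return indexed;
    }
}

public static class Program
{
    public static void Main()
    {
        ToyPageSource source = new ToyPageSource(
            new string[] { "http://example.com/", "http://example.com/about" });
        List<string> indexed = ToyIndexer.IndexAll(source);
        Console.WriteLine(indexed.Count); // prints 2
    }
}
```

In this toy version the consumer drives the loop and the source decides when it ends, which is the division of responsibility the sample's comment describes between the indexer and the data source.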