dtSearch Text Retrieval Engine -- .NET API (.NET 2.0-4.0) 7.70
HttpDataSource Class

Implements a web site indexing spider (crawler) for use with dtSearch.Engine.IndexJob.

dtSearch::Spider::HttpDataSource
C#
public class HttpDataSource : dtSearch.Engine.DataSource, IDisposable
Visual Basic
Public Class HttpDataSource
    Inherits dtSearch.Engine.DataSource
    Implements IDisposable
Remarks

To use HttpDataSource:

1. Create an HttpDataSource.

2. Call Add() with one or more WebSite objects specifying the site(s) to crawl.

3. Create a dtSearch.Engine.IndexJob.

4. Set the IndexJob's DataSourceToIndex property to the HttpDataSource.

5. Call HttpDataSource.StartCrawl() to start the Spider.

6. Call IndexJob.Execute() to start the indexer, or IndexJob.ExecuteInThread() to run it on a separate thread, as the example below does.

For sample code, see the SpiderDemo sample.

Example

This example, from the SpiderDemo sample application, shows how to set up the Spider to index a list of web sites. The dtsIndexCacheText and dtsIndexCacheOriginalFile flags are not required for indexing web sites. They store each page's text and original file in the index, so hit highlighting can work from the cached copy instead of retrieving the page again. This makes highlighting faster and easier to implement.

    using dtSearch.Engine;   // IndexJob, IndexingFlags
    using dtSearch.Spider;   // HttpDataSource, WebSite

    // indexJob, dataSource, and webSiteList are declared elsewhere in the sample's form class.

    // Make IndexJob
    indexJob = new IndexJob();
    indexJob.ActionCreate = true;
    indexJob.ActionAdd = true;
    indexJob.IndexPath = this.IndexPathEdit.Text;
    indexJob.IndexingFlags = IndexingFlags.dtsIndexCacheText | IndexingFlags.dtsIndexCacheOriginalFile;

    // Make a data source to crawl the web sites listed in webSiteList
    dataSource = new HttpDataSource();
    foreach (WebSite ws in webSiteList)
    {
        dataSource.Add(ws);
    }

    // Start the Spider
    dataSource.StartCrawl();

    // Attach the Spider's DataSource to the IndexJob
    indexJob.DataSourceToIndex = dataSource;

    // Start indexing on a separate thread. The indexer repeatedly calls
    // dataSource.GetNextDoc() to obtain pages to index until GetNextDoc() returns false.
    indexJob.ExecuteInThread();
Copyright (c) 1998-2012 dtSearch Corp. All rights reserved.