dtSearch::Spider::WebSite Structure

Describes a single web site to be crawled.

public struct WebSite {
    public String Url;
    public bool IgnoreRobotsTxt;
    public int CrawlDepth;
    public int MaxItemsToIndex;
    public int MaxSizeToIndex;
    public int SiteTimeoutSeconds;
    public int PageTimeoutSeconds;
    public int WaitBetweenPagesMillis;
    public String IncludeFilters;
    public String ExcludeFilters;
    public String ServerFilters;
    public String UserAgent;
    public ProxyInfo Proxy;
    public AuthenticationInfo Authentication;
    public FormAuthenticationInfo FormAuthentication;
}

Members

Members	Description
Url	Starting URL for the crawl.
IgnoreRobotsTxt	If true, the Spider will crawl areas of the site even if robots.txt excludes them.
CrawlDepth	Maximum number of link levels to follow from the start page.
MaxItemsToIndex	Use this setting to limit the number of pages the Spider should index on this web site.
MaxSizeToIndex	Use this setting to limit the size of files that the Spider will attempt to access.
SiteTimeoutSeconds	Use this setting to limit the amount of time the Spider will spend crawling pages on this web site.
PageTimeoutSeconds	Number of seconds to wait before timing out when trying to download a single page.
WaitBetweenPagesMillis	Number of milliseconds to wait between page downloads.
IncludeFilters	Filename filters indicating which pages should be indexed.
ExcludeFilters	Filename filters indicating which pages should not be indexed.
ServerFilters	List of server names other than the starting server that the Spider can visit.
UserAgent	Name to use to identify this program to the web server.
Proxy	Proxy settings to use to connect to a web site.
Authentication	Authentication settings to use to connect to a web site.
FormAuthentication	Form authentication settings to use to connect to a web site that uses HTTP GET or POST requests for authentication.
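
Example

The members above could be populated along the following lines. This is a minimal sketch rather than a definitive usage pattern: it assumes the project references the dtSearch assembly that defines the dtSearch.Spider namespace, and every value shown (the URLs, limits, filter patterns and user agent string) is hypothetical. The wildcard syntax used for the filters is also an assumption, since this page does not specify the filter format. MaxSizeToIndex is left unset because its units are not stated here, and Proxy, Authentication and FormAuthentication are left at their defaults because their members are documented separately.

```csharp
using System;
using dtSearch.Spider; // namespace taken from the topic title; requires the dtSearch assembly

class WebSiteExample
{
    static void Main()
    {
        // WebSite is declared as a struct, so new WebSite() starts with
        // zero, false and null values in every field.
        WebSite site = new WebSite();

        // Where to start and how far to go (hypothetical values).
        site.Url = "https://www.example.com/docs/";
        site.CrawlDepth = 2;
        site.MaxItemsToIndex = 500;

        // Respect robots.txt exclusions.
        site.IgnoreRobotsTxt = false;

        // Time limits and politeness delay (hypothetical values).
        site.SiteTimeoutSeconds = 600;     // whole-site crawl budget
        site.PageTimeoutSeconds = 30;      // per-page download timeout
        site.WaitBetweenPagesMillis = 250; // pause between page downloads

        // Filters: the pattern syntax below is an assumption.
        site.IncludeFilters = "*.html";
        site.ExcludeFilters = "*.zip";

        // Additional server the Spider may visit (hypothetical name).
        site.ServerFilters = "static.example.com";

        // Identification sent to the web server (hypothetical string).
        site.UserAgent = "ExampleCrawler/1.0";

        Console.WriteLine("Configured crawl of {0} to depth {1}",
                          site.Url, site.CrawlDepth);
    }
}
```

Because WebSite is a value type, assigning it to another variable or passing it to a method copies the whole structure, so later changes to the original do not affect the copy.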