dtSearch Engine API for .NET Framework 2.x-4.x 2024.01
dtSearch::Spider::WebSite Structure

Describes a single web site to be crawled.

public struct WebSite {
    public String Url;
    public bool IgnoreRobotsTxt;
    public int CrawlDepth;
    public int MaxItemsToIndex;
    public int MaxSizeToIndex;
    public int SiteTimeoutSeconds;
    public int PageTimeoutSeconds;
    public int WaitBetweenPagesMillis;
    public String IncludeFilters;
    public String ExcludeFilters;
    public String ServerFilters;
    public String UserAgent;
    public ProxyInfo Proxy;
    public AuthenticationInfo Authentication;
    public FormAuthenticationInfo FormAuthentication;
}
Members
Description
Url
Starting URL for the crawl.
IgnoreRobotsTxt
If true, the Spider will crawl areas of the site even if robots.txt excludes them.
CrawlDepth
Number of link levels to follow from the start page.
MaxItemsToIndex
Use this setting to limit the number of pages the Spider should index on this web site.
MaxSizeToIndex
Use this setting to limit the size of files that the Spider will attempt to access.
SiteTimeoutSeconds
Use this setting to limit the amount of time the Spider will spend crawling pages on this web site.
PageTimeoutSeconds
Number of seconds to wait before timing out when trying to download a single page.
WaitBetweenPagesMillis
Number of milliseconds to wait between page downloads.
IncludeFilters
Filename filters indicating which pages should be indexed.
ExcludeFilters
Filename filters indicating which pages should not be indexed.
ServerFilters
List of server names other than the starting server that the Spider can visit.
UserAgent
Name to use to identify this program to the web server.
Proxy
Proxy settings to use to connect to a web site.
Authentication
Authentication settings to use to connect to a web site.
FormAuthentication
Form authentication settings to use to connect to a web site that uses HTTP GET or POST requests for authentication.
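Example

The members above can be set directly as public fields. The following is a minimal sketch, not a definitive usage pattern. It assumes that the C++-style name dtSearch::Spider::WebSite corresponds to the C# namespace dtSearch.Spider and that the project references the dtSearch Engine .NET assembly. The URL, user agent string, numeric limits, and the "*.zip" filter pattern are all illustrative assumptions; this page does not describe the filter string syntax or the units of MaxSizeToIndex, so MaxSizeToIndex is left at its default. The class name WebSiteExample and the helper BuildSite are hypothetical.

```csharp
using System;
using dtSearch.Spider;   // assumed C# namespace for dtSearch::Spider

class WebSiteExample
{
    // Hypothetical helper: fills in a WebSite using only the members listed above.
    static WebSite BuildSite()
    {
        WebSite site = new WebSite();

        site.Url = "https://www.example.com/";   // starting URL (placeholder)
        site.IgnoreRobotsTxt = false;            // respect robots.txt exclusions
        site.CrawlDepth = 2;                     // illustrative value
        site.MaxItemsToIndex = 500;              // stop after 500 pages (illustrative)

        site.SiteTimeoutSeconds = 600;           // whole-site time budget: 10 minutes
        site.PageTimeoutSeconds = 30;            // per-page download timeout
        site.WaitBetweenPagesMillis = 250;       // pause between page downloads

        // Assumption: a single wildcard pattern is a valid filter string.
        site.ExcludeFilters = "*.zip";

        site.UserAgent = "ExampleCrawler/1.0";   // placeholder identifier

        // Proxy, Authentication, and FormAuthentication keep their default
        // values here; their members are documented on their own pages.
        return site;
    }

    static void Main()
    {
        WebSite site = BuildSite();
        Console.WriteLine("Configured crawl of {0} to depth {1}",
                          site.Url, site.CrawlDepth);
    }
}
```

With these illustrative values, WaitBetweenPagesMillis alone adds 500 × 0.25 = 125 seconds of waiting across the full 500-page limit, which fits well inside the 600-second SiteTimeoutSeconds budget. How a populated WebSite is handed to the Spider is outside the scope of this page.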