Describes a single web site to be crawled.
```csharp
public struct WebSite {
    public String Url;
    public bool IgnoreRobotsTxt;
    public int CrawlDepth;
    public int MaxItemsToIndex;
    public int MaxSizeToIndex;
    public int SiteTimeoutSeconds;
    public int PageTimeoutSeconds;
    public int WaitBetweenPagesMillis;
    public String IncludeFilters;
    public String ExcludeFilters;
    public String ServerFilters;
    public String UserAgent;
    public ProxyInfo Proxy;
    public AuthenticationInfo Authentication;
    public FormAuthenticationInfo FormAuthentication;
}
```
Members

| Member | Description |
|---|---|
| Url | Starting URL for the crawl. |
| IgnoreRobotsTxt | If true, the Spider will crawl areas of the site even if robots.txt excludes them. |
| CrawlDepth | Number of links from the start page to follow. |
| MaxItemsToIndex | Maximum number of pages the Spider will index on this web site. |
| MaxSizeToIndex | Maximum size of files the Spider will attempt to access. |
| SiteTimeoutSeconds | Maximum amount of time the Spider will spend crawling pages on this web site. |
| PageTimeoutSeconds | Number of seconds to wait before timing out when downloading a single page. |
| WaitBetweenPagesMillis | Number of milliseconds to wait between page downloads. |
| IncludeFilters | Filename filters indicating which pages should be indexed. |
| ExcludeFilters | Filename filters indicating which pages should not be indexed. |
| ServerFilters | List of server names other than the starting server that the Spider may visit. |
| UserAgent | Name used to identify this program to the web server. |
| Proxy | Proxy settings to use to connect to a web site. |
| Authentication | Authentication settings to use to connect to a web site. |
| FormAuthentication | Form authentication settings for a web site that uses HTTP GET or POST requests for authentication. |
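As a sketch of how these members fit together, the hypothetical snippet below fills in a `WebSite` value for a shallow, polite crawl of a single site. The specific values, the filter string format (space-separated patterns here), and the API call that would consume the struct are illustrative assumptions, not part of this reference.

```csharp
// Hypothetical example: configure a WebSite for a shallow, polite crawl.
// Field names come from the struct above; the values are assumptions chosen
// for illustration, not recommended defaults.
WebSite site = new WebSite();
site.Url = "http://www.example.com/";
site.IgnoreRobotsTxt = false;        // respect robots.txt exclusions
site.CrawlDepth = 2;                 // follow links up to 2 hops from the start page
site.MaxItemsToIndex = 500;          // index at most 500 pages on this site
site.MaxSizeToIndex = 1048576;       // skip files above this size limit
site.SiteTimeoutSeconds = 600;       // spend at most 10 minutes on this site
site.PageTimeoutSeconds = 30;        // give up on a single page after 30 seconds
site.WaitBetweenPagesMillis = 250;   // pause 250 ms between downloads
site.IncludeFilters = "*.htm *.html";                      // assumed filter syntax
site.ExcludeFilters = "*/private/*";                       // assumed filter syntax
site.ServerFilters = "www.example.com static.example.com"; // extra servers allowed
site.UserAgent = "ExampleSpider/1.0";
// Proxy, Authentication, and FormAuthentication are left unset here;
// they would be populated only for sites that require them.
```

Setting `WaitBetweenPagesMillis` and leaving `IgnoreRobotsTxt` false keeps the crawl polite; the three size/count/time limits (`MaxItemsToIndex`, `MaxSizeToIndex`, `SiteTimeoutSeconds`) bound the total work the Spider can do on one site.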