Describes a single web site to be crawled.
```csharp
public struct WebSite {
    public String Url;
    public bool IgnoreRobotsTxt;
    public int CrawlDepth;
    public int MaxItemsToIndex;
    public int MaxSizeToIndex;
    public int SiteTimeoutSeconds;
    public int PageTimeoutSeconds;
    public int WaitBetweenPagesMillis;
    public String IncludeFilters;
    public String ExcludeFilters;
    public String ServerFilters;
    public String UserAgent;
    public ProxyInfo Proxy;
    public AuthenticationInfo Authentication;
    public FormAuthenticationInfo FormAuthentication;
}
```
Members

| Member | Description |
|---|---|
| Url | Starting URL for the crawl. |
| IgnoreRobotsTxt | If true, the Spider will crawl areas of the site even if robots.txt excludes them. |
| CrawlDepth | Number of links from the start page to follow. |
| MaxItemsToIndex | Limits the number of pages the Spider will index on this web site. |
| MaxSizeToIndex | Limits the maximum size of files that the Spider will attempt to access. |
| SiteTimeoutSeconds | Limits the amount of time the Spider will spend crawling pages on this web site. |
| PageTimeoutSeconds | Number of seconds to wait before timing out when downloading a single page. |
| WaitBetweenPagesMillis | Number of milliseconds to wait between page downloads. |
| IncludeFilters | Filename filters indicating which pages should be indexed. |
| ExcludeFilters | Filename filters indicating which pages should not be indexed. |
| ServerFilters | List of server names, other than the starting server, that the Spider may visit. |
| UserAgent | Name used to identify this program to the web server. |
| Proxy | Proxy settings to use when connecting to the web site. |
| Authentication | Authentication settings to use when connecting to the web site. |
| FormAuthentication | Form authentication settings for a web site that uses HTTP GET or POST requests for authentication. |
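
The sketch below shows how a WebSite value might be populated before starting a crawl. The specific values, the filter syntax, and the helper method name are illustrative assumptions, not part of this reference.

```csharp
// Sketch: fill in a WebSite description for a single crawl target.
// Values are examples only; the filter syntax (space-separated patterns)
// is assumed rather than documented here.
static WebSite BuildExampleSite()
{
    WebSite site = new WebSite();
    site.Url = "http://www.example.com/";   // starting URL for the crawl
    site.IgnoreRobotsTxt = false;           // honor robots.txt exclusions
    site.CrawlDepth = 3;                    // follow links up to 3 hops from the start page
    site.MaxItemsToIndex = 1000;            // stop after indexing 1000 pages
    site.MaxSizeToIndex = 4 * 1024 * 1024;  // skip files larger than 4 MB
    site.SiteTimeoutSeconds = 3600;         // spend at most one hour on this site
    site.PageTimeoutSeconds = 30;           // give up on a single page after 30 seconds
    site.WaitBetweenPagesMillis = 500;      // pause half a second between downloads
    site.IncludeFilters = "*.htm *.html";   // index only HTML pages (syntax assumed)
    site.ExcludeFilters = "*/private/*";    // skip pages under /private/ (syntax assumed)
    site.ServerFilters = "";                // stay on the starting server
    site.UserAgent = "ExampleSpider/1.0";   // identify the crawler to web servers
    return site;
}
```

Leaving Proxy, Authentication, and FormAuthentication unset is intended for sites that require no proxy or login; see their respective types for sites that do.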
See Also