Troubleshooting -- Spider forms authentication problems

Article: dts0214

Applies to: dtSearch Spider

If you are trying to index your own web site, the following are some ways to configure the web site so the dtSearch Spider can access it:

(1) Change the login page so that it also accepts credentials passed in the URL, like this:

https://www.example.com/login.aspx?user=abc&password=def

The start URL for your crawl can then embed the authentication information.

(2) Index the content directly from the folders where it is stored (through the file system) rather than by crawling the site. This works if the content consists of documents such as PDF files or static HTML pages, but not if it is dynamically generated.

(3) Change your authentication process to allow the Spider to bypass your login form in a way that does not compromise security. For example, you could allow the bypass only when the IP address of the request is that of the machine the Spider runs from.
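
As a rough illustration of options (1) and (3), the following Python sketch shows how a start URL with embedded credentials might be built, and how a site might decide whether a request comes from the Spider's machine. It is only a sketch under assumptions: the function names build_start_url and may_bypass_login, the parameter names user and password, and the address 192.0.2.10 are hypothetical placeholders, not part of dtSearch or of any particular web framework.

```python
import ipaddress
from urllib.parse import urlencode

# Hypothetical address of the machine the Spider runs from (assumption).
SPIDER_ADDRESSES = {ipaddress.ip_address("192.0.2.10")}


def build_start_url(login_url: str, user: str, password: str) -> str:
    """Option (1): return a crawl start URL with credentials in the query string."""
    # urlencode percent-encodes characters such as '&' or '=' that would
    # otherwise break the query string.
    query = urlencode({"user": user, "password": password})
    return f"{login_url}?{query}"


def may_bypass_login(remote_addr: str) -> bool:
    """Option (3): allow the login bypass only for the Spider's own address."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Anything that is not a valid IP address is never allowed through.
        return False
    return addr in SPIDER_ADDRESSES


if __name__ == "__main__":
    print(build_start_url("https://www.example.com/login.aspx", "abc", "def"))
    print(may_bypass_login("192.0.2.10"))
```

If a check like may_bypass_login is used, the address passed to it should be the peer address of the connection as seen by the web server. A value taken from a request header such as X-Forwarded-For can be set by the client and should not be trusted unless it comes from a proxy you control.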