How to index SharePoint sites

Article: dts0240

 

dtSearch can index SharePoint content in three ways: (1) replicating the content locally and indexing the local copy; (2) crawling the site as a web site using the dtSearch Spider; and (3) using the dtSearch Engine with the DataSource API.

(1) Replicating the content locally

Please see the following Microsoft articles for information on maintaining a local folder with content from SharePoint.  Any content that is stored in the file system can be indexed like any other folder using the dtSearch Indexer.

Sync SharePoint and Teams files with your computer - Microsoft Support

Copy or move library files by using Open with Explorer - Microsoft Support (for older versions of SharePoint)

(2) Indexing as a web site with the dtSearch Spider

If the SharePoint data is all accessible through links on a web site, you can index it like any other web site using the dtSearch Spider.  For information on indexing web sites, please see: How to index a web site with the dtSearch Spider

The dtSearch Spider has a .NET API that you can use to implement web site crawling in your application.  For API documentation, please see:  dtSearch Spider API.  Using this API, you can have your application crawl SharePoint sites using HTTP.
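The condition that the data be "accessible through links" matters because a crawler can only index what it can reach by following links from its starting page. The following Python sketch illustrates that idea with a breadth-first walk over a small in-memory link graph. It is not the dtSearch Spider's implementation and does not use the dtSearch Spider API; the function name reachable_pages and the SITE map are hypothetical and exist only for illustration.

```python
from collections import deque
from typing import Dict, List


def reachable_pages(start: str, links: Dict[str, List[str]]) -> List[str]:
    """Breadth-first walk over a link graph, returning pages in visit order."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        # Follow each outgoing link once; unknown pages have no links.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# Hypothetical site map: "/orphan.docx" exists, but no page links to it.
SITE = {
    "/": ["/docs", "/news"],
    "/docs": ["/docs/a.pdf", "/"],
    "/news": [],
    "/orphan.docx": [],
}

if __name__ == "__main__":
    for page in reachable_pages("/", SITE):
        print(page)
```

In this sketch "/orphan.docx" is never visited, because no page links to it. SharePoint content in that position would need one of the other two approaches.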

(3) Indexing using the DataSource API

For a more direct connection with one of the SharePoint APIs, you can use the dtSearch Engine's DataSource API.  This API lets you pass binary documents (Word, PDF, etc.) directly to the dtSearch Engine along with a set of field-value pairs that will be indexed with the document as metadata.  For information on the DataSource API, please see:

How to index databases with the dtSearch Engine

API Overview -- Indexing Databases

.NET DataSource API documentation

For sample C# code demonstrating how to use the DataSource API to connect with the SharePoint client API, see the C:\Program Files\dtSearch Developer\examples\cs4\SharePointDemo folder.  Sample code demonstrating how to index using the SharePoint server API is also available in the codeproject.com article "dtSearch's DataSource API for indexing SharePoint Site Collections".
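To illustrate the general idea of handing binary documents and field-value metadata to an indexer, here is a minimal Python sketch of a pull-style data source. It is a conceptual model only and does not use the dtSearch Engine or its DataSource API. All names in it (SourceDocument, InMemorySharePointSource, rewind, get_next_doc, collect_for_indexing) are hypothetical, and the pull-style shape is an assumption made for illustration. A real implementation would fetch items through one of the SharePoint APIs rather than from an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SourceDocument:
    """One item handed to an indexer: raw bytes plus metadata fields."""
    name: str                      # identifier for the item, e.g. its URL
    content: bytes                 # binary document (Word, PDF, etc.)
    fields: Dict[str, str] = field(default_factory=dict)  # field-value pairs


class InMemorySharePointSource:
    """Hypothetical stand-in for a SharePoint connection (in-memory list)."""

    def __init__(self, items: List[SourceDocument]) -> None:
        self._items = items
        self._pos = 0

    def rewind(self) -> None:
        """Restart enumeration from the first item."""
        self._pos = 0

    def get_next_doc(self) -> Optional[SourceDocument]:
        """Return the next item, or None when the source is exhausted."""
        if self._pos >= len(self._items):
            return None
        doc = self._items[self._pos]
        self._pos += 1
        return doc


def collect_for_indexing(source: InMemorySharePointSource) -> List[SourceDocument]:
    """Drain the source the way an indexer would: rewind, then pull each item."""
    source.rewind()
    docs: List[SourceDocument] = []
    while (doc := source.get_next_doc()) is not None:
        docs.append(doc)
    return docs


if __name__ == "__main__":
    source = InMemorySharePointSource([
        SourceDocument("report.docx", b"...", {"Author": "A. Writer"}),
        SourceDocument("summary.pdf", b"...", {"Library": "Shared Documents"}),
    ])
    for doc in collect_for_indexing(source):
        print(doc.name, len(doc.content), doc.fields)
```

Each item carries its metadata alongside its bytes, so fields such as an author or library name can be indexed with the document itself.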