Indexing desktop database formats such as Access, XBase, and CSV.
The dtSearch Engine uses internal file parsers to index XBase (*.dbf), CSV (*.csv), and Microsoft Access (*.mdb, *.accdb) data.
By default, dtSearch indexes each record of database files (*.mdb, *.accdb, *.csv, *.dbf) as a separate document. For example, if you index an Access database with 1000 records in one table and 500 records in another table, the index will contain separate document entries for each of the 1500 records.
The naming convention for the records is based on the File Parser API for items contained in a container. Example:
The following explains the components of this filename:
Component |
Meaning |
c:\docs\sample.mdb |
Path and filename of the database |
> |
Delimiter used in the File Parser API to separate the filename and subitem of a document stored inside a container. |
1cf |
Hex ordinal identifying the subitem (can be ignored) |
| |
Delimiter used in the File Parser API to separate the ordinal and name components of an item stored inside a container. |
Customer#128 |
Name of the table and row identifier |
The row identifier for a row is generated by each file parser in a different way according to the requirements of that format. For example, for CSV data the row identifier includes the byte offset of the start of the row.
Data in database rows is automatically searchable by field name, so if you have an Access database with a "Name" column, you could search for "Smith" in the Name column like this:
Indexing each record separately makes it possible to retrieve individual records with searches on terms in the records, but it also adds a lot of documents to the index and it can make indexing slower. An alternative option is to index each database file as single document containing all of the records in the database. All of the records will still be searchable, but as part of a single long document rather than many small documents.
To tell dtSearch to index all database files (Access, CSV, and XBase) as plain text , set the flag dtsoTfDatabasesAsText in Options.TextFlags. Each database will be indexed as a single text file, and searches by field name such as "Name contains Smith" will not be possible.
To index some databases as single documents, you can use the File Type Table to enable the file parsers that do this. The type ids that control this are:
- Microsoft Access, as records: it_MicrosoftAccess2 (288)
- Microsoft Access, as single document: it_MicrosoftAccessAsDocument (335)
- XBase, as records: it_XBase (263)
- XBase, as single document: it_XBaseAsDocument (336)
- CSV, as records: it_CSV (261)
- CSV, as single document, list format: it_CsvAsDocument(334)
- CSV, as single document, report format: it_CsvAsReport (333)
Example: The following filetype.xml would cause all XBase, Access, and CSV files to be indexed as single documents:
In each entry in the file type table, the "Name" is essentially a comment, and the "Flags" value 4 indicates that a disabled file parser should be enabled. Note that the CSV entry requires a filename filter (*.csv) because CSV files do not have a binary signature that allows for automatic detection.
There are two single-document parsers for CSV files: it_CsvAsDocument (334), and it_CsvAsReport (333). The "Report" format is intended for the types of CSV files that are created when you save an Excel spreadsheet as CSV. In this case, the layout is more like a report, with headings, rows, and columns, rather than records. The "Document" format is a single document containing a list of records, with each record formatted as a table of field-value pairs.
Another way to prevent CSV files from being indexed as databases is to disable the default CSV file parser entirely. When the default CSV file parser is disabled, CSV files will be indexed as plain text. To disable the default CSV file parser, create a file type table entry for it_CSV (261), using the Flags value 2, which disables a file parser:
dtSearch versions prior to 7.54 relied on ODBC and an external file parser, dtv_odbc.dll, to parse Microsoft Access (*.mdb) databases. This is no longer required unless you are indexing data created with Microsoft Access versions prior to 1995. To enable ODBC database indexing, put dtv_odbc.dll in a subdirectory named "viewers" of the dtSearch Engine's "Home" directory, which is usually the directory where dten600.dll is located.