Close
dtSearch Text Retrieval Engine Programmer's Reference
Database files (*.mdb, *.dbf, *.csv)

Indexing desktop database formats such as Access, XBase, and CSV.

The dtSearch Engine uses internal file parsers to index XBase (*.dbf), CSV (*.csv), and Microsoft Access (*.mdb, *.accdb) data.

Database Records

By default, dtSearch indexes each record of database files (*.mdb, *.accdb, *.csv, *.dbf) as a separate document. For example, if you index an Access database with 1000 records in one table and 500 records in another table, the index will contain separate document entries for each of the 1500 records. 

The naming convention for the records is based on the File Parser API for items contained in a container. Example:

c:\docs\sample.mdb>1cf|sample.customer#128

The following explains the components of this filename:

Component
Meaning
c:\docs\sample.mdb
Path and filename of the database
>
Delimiter used in the File Parser API to separate the filename and subitem of a document stored inside a container.
1cf
Hex ordinal identifying the subitem (can be ignored)
|
Delimiter used in the File Parser API to separate the ordinal and name components of an item stored inside a container.
Customer#128
Name of the table and row identifier

The row identifier for a row is generated by each file parser in a different way according to the requirements of that format. For example, for CSV data the row identifier includes the byte offset of the start of the row. 

Data in database rows is automatically searchable by field name, so if you have an Access database with a "Name" column, you could search for "Smith" in the Name column like this:

Name contains Smith
How to index each database as a single document

Indexing each record separately makes it possible to retrieve individual records with searches on terms in the records, but it also adds a lot of documents to the index and it can make indexing slower. An alternative option is to index each database file as single document containing all of the records in the database. All of the records will still be searchable, but as part of a single long document rather than many small documents.

(a) Indexing all databases as plain text

To tell dtSearch to index all database files (Access, CSV, and XBase) as plain text , set the flag dtsoTfDatabasesAsText in Options.TextFlags. Each database will be indexed as a single text file, and searches by field name such as "Name contains Smith" will not be possible.

(b) Specifying rules for individual database formats

To index some databases as single documents, you can use the File Type Table to enable the file parsers that do this. The type ids that control this are:

  • Microsoft Access, as records: it_MicrosoftAccess2 (288)
  • Microsoft Access, as single document: it_MicrosoftAccessAsDocument (335)
  • XBase, as records: it_XBase (263)
  • XBase, as single document: it_XBaseAsDocument (336)
  • CSV, as records: it_CSV (261)
  • CSV, as single document, list format: it_CsvAsDocument(334)
  • CSV, as single document, report format: it_CsvAsReport (333)

Example: The following filetype.xml would cause all XBase, Access, and CSV files to be indexed as single documents:

<?xml version="1.0" encoding="UTF-8" ?> <dtSearchFileTypeRules> <DefaultEncoding>Auto-detect (Recommended)</DefaultEncoding> <Item> <Name>Access as single document</Name> <TypeId>335</TypeId> <Filters></Filters> <Flags>4</Flags> </Item> <Item> <Name>DBF as single document</Name> <TypeId>336</TypeId> <Filters></Filters> <Flags>4</Flags> </Item> <Item> <Name>CSV as single document</Name> <TypeId>334</TypeId> <Filters>*.csv</Filters> <Flags>4</Flags> </Item> </dtSearchFileTypeRules>

 

In each entry in the file type table, the "Name" is essentially a comment, and the "Flags" value 4 indicates that a disabled file parser should be enabled. Note that the CSV entry requires a filename filter (*.csv) because CSV files do not have a binary signature that allows for automatic detection.

CSV Options

There are two single-document parsers for CSV files: it_CsvAsDocument (334), and it_CsvAsReport (333). The "Report" format is intended for the types of CSV files that are created when you save an Excel spreadsheet as CSV. In this case, the layout is more like a report, with headings, rows, and columns, rather than records. The "Document" format is a single document containing a list of records, with each record formatted as a table of field-value pairs. 

Another way to prevent CSV files from being indexed as databases is to disable the default CSV file parser entirely. When the default CSV file parser is disabled, CSV files will be indexed as plain text. To disable the default CSV file parser, create a file type table entry for it_CSV (261), using the Flags value 2, which disables a file parser:

<?xml version="1.0" encoding="UTF-8" ?> <dtSearchFileTypeRules> <DefaultEncoding>Auto-detect (Recommended)</DefaultEncoding> <Item> <Name>Disable the CSV parser</Name> <TypeId>261</TypeId> <Filters></Filters> <Flags>2</Flags> </Item> </dtSearchFileTypeRules>
ODBC

dtSearch versions prior to 7.54 relied on ODBC and an external file parser, dtv_odbc.dll, to parse Microsoft Access (*.mdb) databases. This is no longer required unless you are indexing data created with Microsoft Access versions prior to 1995. To enable ODBC database indexing, put dtv_odbc.dll in a subdirectory named "viewers" of the dtSearch Engine's "Home" directory, which is usually the directory where dten600.dll is located.