dtSearch Text Retrieval Engine Programmer's Reference
Supported File Formats

File format support included with the dtSearch Engine.

dtSearch can automatically recognize, index, and search the following formats. While file types are detected for the most part by the binary contents of the file, the filename extension often provides useful information as well, especially to resolve ambiguities. Therefore, where possible the original filename extension should be preserved when passing document contents to the dtSearch Engine.
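
The sketch below shows why the extension matters when contents alone are ambiguous. It is not the dtSearch Engine's detection logic. It relies only on a few widely known file signatures, and the function name `guess_format` and the lookup tables are hypothetical. Several distinct formats share the same leading bytes (Office Open XML and OpenDocument files are ZIP containers, and older Office files are OLE compound files), so a detector that sees only the first bytes needs the extension to break the tie.

```python
from pathlib import PurePath

# Well-known leading-byte signatures shared by several formats.
ZIP_MAGIC = b"PK\x03\x04"
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"

# Hypothetical tables: extensions that disambiguate a shared signature.
ZIP_BASED = {
    ".docx": "Word Office Open XML",
    ".xlsx": "Excel Office Open XML",
    ".pptx": "PowerPoint Office Open XML",
    ".odt": "OpenDocument text",
}
OLE_BASED = {
    ".doc": "Word (binary)",
    ".xls": "Excel (binary)",
    ".ppt": "PowerPoint (binary)",
    ".msg": "Outlook message",
}


def guess_format(filename: str, head: bytes) -> str:
    """Guess a format from the first bytes, using the extension as a tie-breaker."""
    ext = PurePath(filename).suffix.lower()
    if head.startswith(b"%PDF-"):
        return "PDF"
    if head.startswith(b"\x1f\x8b"):
        return "GZIP"
    if head.startswith(ZIP_MAGIC):
        # Without a known extension, all we can say is "some ZIP container".
        return ZIP_BASED.get(ext, "ZIP archive")
    if head.startswith(OLE_MAGIC):
        return OLE_BASED.get(ext, "OLE compound file (type unknown)")
    return "unrecognized"


print(guess_format("report.docx", ZIP_MAGIC))  # extension resolves the ambiguity
print(guess_format("upload.tmp", ZIP_MAGIC))   # extension lost: generic answer
```

A real parser would look deeper into the container before deciding. The point here is only that discarding the original filename removes a cheap and often decisive hint.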

Adobe Framemaker MIF (*.mif)
Adobe Photoshop images (metadata only) (*.psd)
Ami Pro (*.sam)
Ansi Text (*.txt)
Apple iWork KeyNote 2009 (*.key)
Apple iWork Numbers 2009 (*.numbers)
Apple iWork Pages 2009 (*.pages)
ASCII Text
ASF media files (metadata only) (*.asf)
CSV (Comma-separated values) (*.csv)
DBF (*.dbf)
EBCDIC
EML (emails saved by Outlook Express) (*.eml)
Enhanced Metafile Format (*.emf)
EMF Spool (*.spl)
Eudora MBX message files (*.mbx)
Flash (*.swf)
GZIP (*.gz)
HTML (*.htm, *.html)
iCalendar (*.ics)
Ichitaro (versions 5 and later) (*.jtd, *.jbw)
JPEG (*.jpg)
Lotus 1-2-3 (*.123, *.wk?)
MBOX email archives such as Thunderbird, including attachments (see note 5) (*.mbx)
MHT archives (HTML archives saved by Internet Explorer) (*.mht)
MIME messages, including attachments (see note 5)
MSG (emails saved by Outlook), including attachments (see note 5) (*.msg)
Microsoft Access 95, 97, 2000, 2003, 2007, 2010, 2013, and 2016 MDB (see note 1) (*.mdb, *.accdb)
Microsoft Document Imaging (*.mdi)
Microsoft Excel for Mac 2.2, 3, 4, 5, 98, 2001, X, 2004, 2008, 2011
Microsoft Excel for Windows 2, 3, 4, 5
Microsoft Excel 95, 97, 2000, XP, 2003, 2007, 2010, 2013, 2016 (*.xls)
Microsoft Excel 2003 XML (*.xml)
Microsoft Excel Office Open XML 2007, 2010, 2013, and 2016 (*.xlsx)
Microsoft OneNote 2007, 2010, 2013, and 2016 (*.one)
Microsoft Outlook 97, 2000, 2003, 2007, 2010, 2013, and 2016 data files, including attachments (see note 5) (*.PST, *.OST)
Microsoft Outlook/Exchange Messages, Notes, Contacts, Appointments, and Tasks (see note 2)
Microsoft Outlook Express 5 and 6 (*.dbx) message stores
Microsoft PowerPoint 3, 4, 95, 97, 98, 2000, 2001, 2002, 2003, 2004, 2007, 2008, 2010, 2011, 2013, 2016 (*.ppt)
Microsoft PowerPoint Office Open XML 2007, 2010, 2013, and 2016 (*.pptx)
Microsoft Rich Text Format (*.rtf)
Microsoft Searchable Tiff (*.tiff)
Microsoft Word for DOS 1, 2, 3, 4, 5, 6 (*.doc)
Microsoft Word for Mac 1, 3, 4, 5, 6, 98, 2001, X, 2004, 2008, 2011
Microsoft Word for Windows 1, 2, 6 (*.doc)
Microsoft Word 95, 97, 98, 2000, 2002, 2003, 2007, 2010, 2013, 2016 (*.doc)
Microsoft Word 2003 XML (*.xml)
Microsoft Word Office Open XML 2007, 2010, 2013, 2016 (*.docx)
Microsoft Works WP (*.wks)
MP3 (metadata only) (*.mp3)
Multimate Advantage II (*.dox)
Multimate version 4 (*.doc)
OpenOffice/LibreOffice versions 1, 2, 3, 4, and 5 documents, spreadsheets, and presentations (*.sxc, *.sxd, *.sxi, *.sxw, *.sxg, *.stc, *.sti, *.stw, *.stm, *.odt, *.ott, *.odg, *.otg, *.odp, *.otp, *.ods, *.ots, *.odf) (includes OASIS Open Document Format for Office Applications)
PDF files (*.pdf) (see note 6)
PDF Portfolio files (*.pdf), including embedded non-PDF documents
Quattro Pro (*.wb1, *.wb2, *.wb3, *.qpw)
QuickTime (*.mov, *.m4a, *.m4v)
RAR (*.rar) (see note 4)
TAR (*.tar)
TIFF (metadata only) (*.tif)
TNEF (winmail.dat)
Treepad HJT files (*.hjt)
Unicode (UCS16, Mac or Windows byte order, or UTF-8)
Visio XML files (*.vdx)
Windows Metafile Format (*.wmf)
WMA media files (metadata only) (*.wma)
WMV video files (metadata only) (*.wmv)
WordPerfect 4.2 (*.wpd, *.wpf)
WordPerfect (5.0 and later) (*.wpd, *.wpf)
WordStar version 1, 2, 3 (*.ws)
WordStar versions 4, 5, 6 (*.ws)
WordStar 2000
Write (*.wri)
XBase (including FoxPro, dBase, and other XBase-compatible formats) (*.dbf)
XML (*.xml)
XML Paper Specification (*.xps)
XSL
XyWrite
ZIP (*.zip) (PKZIP 2.0-compatible)

Notes

[1] Databases. Each record in a database is treated as a separate document. Previous versions of dtSearch used ODBC to index Microsoft Access databases. Versions 7.54 and later have internal parsers for Access databases, so ODBC is no longer needed. For information on indexing SQL databases, see "Indexing Databases". 

 

[2] Outlook and Exchange.  dtSearch Desktop/Network can index Outlook and Exchange message stores using MAPI. dtSearch versions 7.77 and later can also index Outlook PST and OST files directly, without using Outlook or MAPI. 

 

[3] Web Sites. dtSearch products include a spider that can index and search dynamically generated or static content on web sites.

 

[4] RAR Support. RAR support currently applies to the Windows and Linux versions of dtSearch only. 

 

[5] Attachments. In all supported email formats, attachments, including nested attachments (for example, a .doc inside a ZIP attached to an email), are indexed as part of the main document by default.
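
The nesting described in note 5 (an email that carries a ZIP that carries a document) can be pictured as a recursive walk over containers. The sketch below is an illustration using only the Python standard library, not the dtSearch Engine's implementation. The names `list_nested` and `build_sample_email` are hypothetical, and the sample "document" is placeholder bytes rather than a real Word file.

```python
import io
import zipfile
from email import message_from_bytes
from email.message import EmailMessage


def list_nested(name: str, data: bytes, depth: int = 0):
    """Yield (depth, name) for an item and for everything nested inside it."""
    yield depth, name
    lower = name.lower()
    if lower.endswith(".eml"):
        msg = message_from_bytes(data)
        for part in msg.walk():
            fname = part.get_filename()
            if fname:  # only parts that carry a filename are attachments here
                payload = part.get_payload(decode=True) or b""
                yield from list_nested(fname, payload, depth + 1)
    elif lower.endswith(".zip") and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from list_nested(info.filename, zf.read(info), depth + 1)


def build_sample_email() -> bytes:
    """Build an in-memory email with a ZIP attachment that holds one file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("report.doc", b"placeholder bytes, not a real Word file")
    msg = EmailMessage()
    msg["Subject"] = "Quarterly report"
    msg.set_content("See attached.")
    msg.add_attachment(buf.getvalue(), maintype="application",
                       subtype="zip", filename="files.zip")
    return msg.as_bytes()


for depth, name in list_nested("message.eml", build_sample_email()):
    print("  " * depth + name)
```

Each level of the walk belongs to the same top-level item, which mirrors the default behavior described in the note: nested content is treated as part of the main document rather than as separate documents.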

 

[6] PDF Support. Encrypted PDF files cannot be indexed unless they can be opened without a password and their permissions allow text extraction. dtSearch versions 7.92 and earlier support PDF 1.x. dtSearch version 7.93 adds support for the PDF 2.0 standard.

Copyright (c) 1995-2021 dtSearch Corp. All rights reserved.