dtSearch Text Retrieval Engine Programmer's Reference
Supported File Formats

File format support included with the dtSearch Engine.

dtSearch can automatically recognize, index, and search the following formats. While file types are detected for the most part by the binary contents of the file, the filename extension often provides useful information as well, especially to resolve ambiguities. Therefore, where possible the original filename extension should be preserved when passing document contents to the dtSearch Engine. 
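The following is a minimal sketch of why the extension helps, not a description of how the dtSearch Engine itself detects formats. Several Office formats share a container signature (for example, .docx, .xlsx, and .pptx files all begin with the ZIP signature), so the leading bytes alone can leave the type ambiguous. The function name `guess_format` and both lookup tables are hypothetical and cover only a small illustrative subset of formats.

```python
from pathlib import Path

# Leading-byte signatures for a few formats (illustrative subset, not dtSearch's tables).
SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip-container",                      # ZIP, also used by Office Open XML
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "ole2-container",  # legacy Office, Outlook .msg
}

# Extensions that resolve formats sharing one container signature.
CONTAINER_EXTENSIONS = {
    "zip-container": {".docx": "docx", ".xlsx": "xlsx", ".pptx": "pptx", ".zip": "zip"},
    "ole2-container": {".doc": "doc", ".xls": "xls", ".ppt": "ppt", ".msg": "msg"},
}


def guess_format(filename: str, head: bytes) -> str:
    """Guess a format from the first bytes, using the extension only to break ties."""
    kind = "unknown"
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            kind = name
            break
    ext = Path(filename).suffix.lower()
    # If the signature is a shared container, the extension narrows it down;
    # otherwise the signature-based result is returned unchanged.
    return CONTAINER_EXTENSIONS.get(kind, {}).get(ext, kind)
```

With the extension preserved, `guess_format("report.docx", data[:8])` can return a specific type; if the same bytes arrive under a name with no extension, the sketch can only report the generic container.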

For more information on file format support, see "What file formats does dtSearch Support?".

Adobe Framemaker MIF (*.mif)
Ami Pro (*.sam)
Ansi Text (*.txt)
Apple iWork KeyNote 2009 (*.key)
Apple iWork Numbers 2009 (*.numbers)
Apple iWork Pages 2009 (*.pages)
ASCII Text
CSV (Comma-separated values) (*.csv)
DBF (*.dbf)
EBCDIC
EML (emails saved by Outlook Express) (*.eml)
Enhanced Metafile Format (*.emf)
EMF Spool (*.spl)
Eudora MBX message files (*.mbx)
Flash (*.swf)
GZIP (*.gz)
Hancom Hanword (*.hwp)
Hancom Hanword 97 (*.hwp)
Hancom Hanword (*.hwpx) (versions 2021.02 and later)
HTML (*.htm, *.html)
iCalendar (*.ics)
Ichitaro (versions 5 and later) (*.jtd, *.jbw)
Lotus 1-2-3 (*.123, *.wk?)
MBOX email archives such as Thunderbird, including attachments (see note 5) (*.mbx)
MHT archives (web pages saved by Internet Explorer in the "Web archive, single file" format) (*.mht)
MIME messages, including attachments (see note 5)
MSG (emails saved by Outlook), including attachments (see note 5) (*.msg)
Microsoft Access 95, 97, 2000, 2003, 2007, 2010, 2013, and 2016 MDB (see note 1) (*.mdb, *.accdb)
Microsoft Excel for Mac 2.2, 3, 4, 5, 98, 2001, X, 2004, 2008, 2011
Microsoft Excel for Windows 2, 3, 4, 5
Microsoft Excel 95, 97, 2000, XP, 2003, 2007, 2010, 2013, 2016 (*.xls)
Microsoft Excel 2003 XML (*.xml)
Microsoft Excel Office Open XML 2007, 2010, 2013, and 2016 (*.xlsx)
Microsoft OneNote 2007, 2010, 2013, and 2016 (*.one)
Microsoft Outlook 97, 2000, 2003, 2007, 2010, 2013, and 2016 data files, including attachments (see note 5) (*.PST, *.OST)
Microsoft Outlook/Exchange Messages, Notes, Contacts, Appointments, and Tasks (see note 2)
Microsoft Outlook Express 5 and 6 (*.dbx) message stores
Microsoft PowerPoint 3, 4, 95, 97, 98, 2000, 2001, 2002, 2003, 2004, 2007, 2008, 2010, 2011, 2013, 2016 (*.ppt)
Microsoft PowerPoint Office Open XML 2007, 2010, 2013, and 2016 (*.pptx)
Microsoft Rich Text Format (*.rtf)
Microsoft Word for DOS 1, 2, 3, 4, 5, 6 (*.doc)
Microsoft Word for Mac 1, 3, 4, 5, 6, 98, 2001, X, 2004, 2008, 2011
Microsoft Word for Windows 1, 2, 6 (*.doc)
Microsoft Word 95, 97, 98, 2000, 2002, 2003, 2007, 2010, 2013, 2016 (*.doc)
Microsoft Word 2003 XML (*.xml)
Microsoft Word Office Open XML 2007, 2010, 2013, 2016 (*.docx)
Microsoft Works WP (*.wks)
Multimate Advantage II (*.dox)
Multimate version 4 (*.doc)
OpenOffice/LibreOffice versions 1, 2, 3, 4, and 5 documents, spreadsheets, and presentations (*.sxc, *.sxd, *.sxi, *.sxw, *.sxg, *.stc, *.sti, *.stw, *.stm, *.odt, *.ott, *.odg, *.otg, *.odp, *.otp, *.ods, *.ots, *.odf) (includes OASIS Open Document Format for Office Applications)
PDF 1.x files (*.pdf) (see note 6)
PDF 2.x files (*.pdf) (see note 7)
PDF Portfolio files (*.pdf), including embedded non-PDF documents
Quattro Pro (*.wb1, *.wb2, *.wb3, *.qpw)
RAR (*.rar) (see note 4)
TAR (*.tar)
TNEF (winmail.dat)
Treepad HJT files (*.hjt)
Unicode (UCS16, Mac or Windows byte order, or UTF-8)
Visio XML files (*.vdx)
Windows Metafile Format (*.wmf)
WordPerfect 4.2 (*.wpd, *.wpf)
WordPerfect (5.0 and later) (*.wpd, *.wpf)
WordStar version 1, 2, 3 (*.ws)
WordStar versions 4, 5, 6 (*.ws)
WordStar 2000
Write (*.wri)
XBase (including FoxPro, dBase, and other XBase-compatible formats) (*.dbf)
XML (*.xml)
XML Paper Specification (*.xps)
XSL
XyWrite
ZIP (*.zip) (PKZIP 2.0-compatible)

Media formats - metadata only

Adobe Photoshop images (*.psd)
APE (*.ape) (versions 2023.02 and later)
Audio Interchange Format (*.aiff) (versions 2023.02 and later)
ASF media files (*.asf)
Free Lossless Audio Codec (*.flac) (versions 2023.02 and later)
GIF (*.gif) (versions 2023.02 and later)
HEIF (*.heif) (versions 2023.02 and later)
JPEG (*.jpg)
Microsoft Searchable Tiff (*.tiff)
Microsoft Document Imaging (*.mdi)
MP3 (*.mp3)
MPEG-4 (*.m4a)
OGG (*.ogg) (versions 2023.02 and later)
OPUS (*.opus) (versions 2023.02 and later)
QuickTime (*.mov, *.m4a, *.m4v)
TIFF (*.tif)
WEBP (*.webp) (versions 2023.02 and later)
WAV (*.wav) (versions 2023.02 and later)
WMA media files (*.wma)
WMV video files (*.wmv)

[1] Databases. Each record in a database is treated as a separate document. Previous versions of dtSearch used ODBC to index Microsoft Access databases. Versions 7.54 and later have internal parsers for Access databases, so ODBC is no longer needed. For information on indexing SQL databases, see "Indexing Databases". 


[2] Outlook and Exchange.  dtSearch Desktop/Network can index Outlook and Exchange message stores using MAPI. dtSearch versions 7.77 and later can also index Outlook PST and OST files directly, without using Outlook or MAPI. 


[3] Web Sites. dtSearch products include a spider that can index and search dynamically-generated content or static content on web sites. 


[4] RAR Support. RAR support currently applies to the Windows and Linux versions of dtSearch only. 


[5] Attachments. In all supported email formats, attachments, including nested attachments (for example, a .doc inside a ZIP attached to an email), are indexed as part of the main document by default.


[6] PDF Support. Encrypted PDF files cannot be indexed unless the file can be opened without a password and its permissions allow text extraction.


[7] dtSearch versions 7.92 and earlier support PDF 1.x. dtSearch version 7.93 adds support for the new PDF 2.0 standard. 


[8] Office 365. Supported Microsoft Office formats are also supported when saved from Office 365.