Virtualization Technology News and Information
dtSearch 2022 Predictions: RIP Organizing Enterprise Data to Find Things


Industry executives and experts share their predictions for 2022. Read them in this 14th annual series exclusive.

2022: RIP Organizing Enterprise Data to Find Things

By Elizabeth Thede, director of sales at dtSearch Corp.

Meticulously sorting enterprise documents, emails and the like into specific folders and subfolders so end-users can locate items they need has gone the way of relying on a telegraph to send messages. A search engine enables multiple users to concurrently search across terabytes employing a wide range of instant search options - no prior human organizing required.

Note that these are not "search the Internet" search engines like Google. Rather, these are precision, enterprise search engines like dtSearch. An enterprise search engine instantly searches terabytes of data after indexing. Indexing takes no human effort: simply point to the high-level folders to cover, and the search engine automatically does the rest.
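To make the "point to a folder" idea concrete, here is a minimal sketch in Python of what an indexing pass might look like. It is an illustration only, not how dtSearch is implemented: the function name `index_folder` is hypothetical, and the sketch assumes every file is plain text, whereas a real enterprise search engine parses each file format properly.

```python
import os
import re
from collections import defaultdict

def index_folder(root):
    """Walk every subfolder of root and map each word to the files containing it."""
    index = defaultdict(set)
    for folder, _subfolders, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(folder, filename)
            with open(path, "rb") as handle:
                # Assumption: treat the bytes as plain text and skip undecodable bytes.
                text = handle.read().decode("utf-8", errors="ignore")
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(path)
    return index

# Hypothetical usage: one call covers the folder and everything beneath it.
# index = index_folder("/path/to/shared/drive")
# print(sorted(index["budget"]))
```

Once the index exists, a lookup is a dictionary access rather than a scan of every file, which is why searching after indexing can be fast even on large collections.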

Some technical background on how a search engine operates might be helpful. When you retrieve a word processing document in Microsoft Word, the Word application takes the binary file and displays it through the lens of the Word application. In contrast, a search engine goes directly to the binary file, bypassing the application entirely.

A search engine first needs to figure out the correct parsing specification to apply to the binary file. .DOC has a completely separate parsing specification from .DOCX, and both are very different from the specifications for Outlook/Exchange, PDF, PowerPoint, Access, Excel, OneNote, etc. To choose the right one, a search engine has to look inside the binary file contents, not at the file extension. (It is all too easy to save a PDF with a Microsoft Word extension or an Access database with an Excel extension.)
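A minimal sketch of identifying a file by its contents rather than its extension follows. It checks three widely documented leading byte signatures: `%PDF-` for PDF, `PK\x03\x04` for the ZIP container that .DOCX, .XLSX and .PPTX files use, and the eight-byte OLE2 compound file header used by legacy .DOC, .XLS and .PPT files. The function name `sniff_format` is hypothetical, and a production engine would need to inspect far more than the first few bytes.

```python
# Well-known leading byte signatures ("magic numbers").
PDF_MAGIC = b"%PDF-"
ZIP_MAGIC = b"PK\x03\x04"  # ZIP container: .docx, .xlsx, .pptx and plain .zip
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc, .xls, .ppt

def sniff_format(data: bytes) -> str:
    """Guess the container format from the first bytes, ignoring any file name."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(ZIP_MAGIC):
        return "zip"
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"

# A PDF saved as "report.doc" still begins with the PDF signature.
print(sniff_format(b"%PDF-1.7\n"))  # prints "pdf"
```

Note that a "zip" result only narrows things down: telling a .DOCX from an .XLSX would require a further look at the members inside the archive.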

But the advantage to a search engine of directly accessing binary files does not stop there. Metadata that may require extensive "clicking around" to see in an application view is immediately accessible in binary format. Black on black, white on white, or green on green text that may be hard to spot in an application view is "plain as day" in binary format. Even parsing multilayer nested files, like an email with a ZIP attachment containing an Access database with an Excel spreadsheet literally embedded inside, works better in binary format.
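The nested-file case can be sketched with a short recursive walk. The example below handles only ZIP archives inside ZIP archives, using Python's standard `zipfile` module; email messages, Access databases and embedded spreadsheets would each need their own parser, which this sketch does not attempt. The names `walk_nested` and `make_zip` are hypothetical helpers for illustration.

```python
import io
import zipfile

def walk_nested(name, data):
    """Yield (path, bytes) for every leaf item, descending into ZIPs inside ZIPs."""
    buffer = io.BytesIO(data)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            for member in archive.namelist():
                if member.endswith("/"):
                    continue  # skip directory entries
                yield from walk_nested(f"{name}/{member}", archive.read(member))
    else:
        yield name, data

def make_zip(files):
    """Build an in-memory ZIP from a {member name: bytes} mapping (demo helper)."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as archive:
        for member, content in files.items():
            archive.writestr(member, content)
    return out.getvalue()

# A ZIP attachment that itself contains another ZIP.
inner = make_zip({"sheet.csv": b"a,b\n1,2\n"})
outer = make_zip({"inner.zip": inner, "readme.txt": b"hello"})
for path, content in walk_nested("attachment.zip", outer):
    print(path, len(content))
```

Because the walk works on bytes in memory, nothing has to be extracted to disk or opened in its native application along the way.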

Organizing data in folders and subfolders typically relies on simple attributes like filename. By contrast, a search engine can instantly process over 25 different types of search requests. An end user can enter a "plain English" natural language search or craft a precision full-text or metadata search with expressions involving words or phrases in Boolean (and/or/not) or proximity relationships to other words or phrases.
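As a rough illustration of Boolean and proximity searching, the sketch below builds a small positional index and queries it. It is a simplified model under stated assumptions, not dtSearch's query engine or syntax: the helper names (`build_positions`, `docs_with`, `within`) and the sample documents are hypothetical.

```python
import re
from collections import defaultdict

def build_positions(docs):
    """Map word -> {document name -> [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word][name].append(pos)
    return index

def docs_with(index, word):
    """Return the set of documents that contain the word."""
    return set(index.get(word, {}))

def within(index, left, right, distance):
    """Return documents where the two words occur at most `distance` words apart."""
    hits = set()
    for name in docs_with(index, left) & docs_with(index, right):
        if any(abs(a - b) <= distance
               for a in index[left][name] for b in index[right][name]):
            hits.add(name)
    return hits

docs = {
    "a.txt": "apple pie recipe with fresh pear slices",
    "b.txt": "apple and pear",
    "c.txt": "pear tart",
}
index = build_positions(docs)

# Boolean and/or/not map directly onto set operations.
print(sorted(docs_with(index, "apple") & docs_with(index, "pear")))  # ['a.txt', 'b.txt']
print(sorted(docs_with(index, "apple") | docs_with(index, "tart")))  # ['a.txt', 'b.txt', 'c.txt']
print(sorted(docs_with(index, "pear") - docs_with(index, "apple")))  # ['c.txt']

# Proximity: "apple" within 2 words of "pear".
print(sorted(within(index, "apple", "pear", 2)))  # ['b.txt']
```

Storing word positions, not just document names, is what makes the proximity query possible without rereading the documents.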

Fuzzy searching can sift through spelling errors that can arise from OCR or simply mistyping in an email. Searching works not only with English, but also with any of the hundreds of languages that Unicode supports. Beyond words, the search engine can search for numbers, number ranges or numeric expressions; dates or date ranges; and hash values. A search engine can even find credit card numbers in data.
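Two of these ideas are easy to sketch. The article does not say which algorithms dtSearch uses, so the example below swaps in two common, well-known techniques as assumptions: Levenshtein edit distance for fuzzy matching, and the Luhn checksum for filtering candidate card numbers. All function names are hypothetical, and `4111111111111111` is a widely used test number, not a real card.

```python
import re

def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def fuzzy_find(term, vocabulary, max_edits=1):
    """Return vocabulary words within max_edits of the search term."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_edits]

def luhn_valid(digits):
    """Check a digit string against the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text):
    """Return 13 to 19 digit runs (spaces or hyphens allowed) that pass Luhn."""
    found = []
    for match in re.finditer(r"\b\d(?:[ -]?\d){12,18}\b", text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found

print(fuzzy_find("search", ["seach", "serch", "sea", "research"]))
print(find_card_numbers("Card 4111 1111 1111 1111 on file, ref 1234567890123."))
```

The checksum step matters because most long digit runs in real data are reference numbers or IDs; Luhn filters out the bulk of those, though a passing checksum alone does not prove a number is a real card.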

The search engine has multiple options for relevancy ranking and other search results sorting (and immediate re-sorting), while displaying a full copy of retrieved items with highlighted hits. While a search engine can run on individual PCs, it can also enable concurrent searching across a network or an Internet or Intranet site. The site could be hosted "on premises" or hosted remotely in the cloud such as on Azure or AWS. Finally, an index can automatically update itself at specified intervals to accommodate content changes, without affecting continued concurrent searching.

So, as we enter 2022, say RIP to organizing enterprise data by hand, and try a search engine in the new year.



Elizabeth Thede 

Elizabeth Thede is director of sales at dtSearch Corp. The company offers enterprise and developer products running "on premises" or in the cloud to instantly search terabytes with over 25 search options. dtSearch's own document filters support files, emails, databases and web data.

Published Monday, December 13, 2021 7:34 AM by David Marshall