Virtualization Technology News and Information
VMblog's Expert Interviews: Archive360 Talks How Machine Learning and Cloud Computing Will Transform Information Governance


Data categorization and management present formidable challenges for organizations due to regulatory, legal, and business concerns. Yet new technologies, such as predictive analytics and cloud computing, provide a path towards the 'holy grail' of information governance: completely autonomous, computer-driven data categorization and management. Bill Tolson, Vice President of Marketing at Archive360, shares his insight into the past, present and future of this field.

VMblog:  What key points do organizations need to know regarding information governance?

Bill Tolson:  First, note that in the past, information governance was a task for the end user of the data. Even in organizations with well-defined information governance policies, the actual decisions regarding data - for example, which documents to keep, which ones to discard, and when - fell to the end user.

So when storage grew full, and the IT department informed end users they would be unable to save or send further documents until freeing up storage, it was up to the end user to decide what to save and what to delete. This was often done using irrelevant metrics, such as the creation date or file size, and without regard to regulatory, legal, or business requirements for archiving.

VMblog:  This sounds like an area ripe for technological innovation.  How did technology first begin to play a role in the information governance field?

Tolson:  Around the turn of the millennium, tech firms started to offer 'records management systems' designed to simplify information management. However, these electronic solutions were largely ineffective, since they lacked the ability to decide whether a file needed to be kept - for legal, regulatory, or business reasons - or could be discarded. They also could not process data into different streams - for example, to note how long a file should be stored, or move it into long-term storage.

Even today, the most sophisticated organizations - private, public, and non-profit - are struggling to take control of their information governance. The explosion of data makes it practically impossible for end users to implement information governance, even with software support.

VMblog:  Since in your view it's nearly impossible for organizations to maintain a full grasp on the information in their possession, do you think they should simply focus on the most crucial data, such as information that falls under legal or regulatory requirements, and ignore the rest?

Tolson:  This is the priority most organizations focus on - though it typically accounts for a mere 6-10% of the overall data in the organization. That still leaves a vast amount of data for end users to process, and in practice that means this data is not managed at all - it's simply not possible for the end user to properly categorize this volume of data. Instead, they stick it in a folder in their email or on their desktop - they don't delete it, but they do forget about it...

VMblog:  So, organizations can't simply ignore this data, or leave it to the end user.  What prospects do you see for how this situation could be solved?

Tolson:  The ideal information governance solution would be entirely automated, able to make intelligent decisions with fresh data, and do so in a highly accurate manner. Both the organization and the end users would benefit from this - after all, information governance is not a core component of most positions, and those end users don't see extra compensation or bonuses for effectively managing their data.

Microsoft is actively working on this: in the opening keynote at a previous Microsoft Inspire conference, Microsoft CEO Satya Nadella introduced the application of predictive automation to information governance. He discussed how predictive intelligence, and archiving in the cloud, could address data issues before they occur.

For example, predictive automation could analyze data to decide whether it is subject to regulatory, compliance, or legal mandates, where it should be stored, for how long, and any limitations regarding access or security. This would free end users from the responsibility of data governance, and ultimately ensure more efficient and accurate information management.
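
As a rough illustration of the decisions listed above, the sketch below shows the kind of record such a system might emit for each item. It is not a description of any Microsoft or Archive360 product: the names GovernanceDecision, MANDATE_TERMS, and decide, the storage labels, and the retention periods are all hypothetical, and a simple keyword check stands in for the predictive model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceDecision:
    subject_to_mandate: bool  # regulatory, compliance, or legal mandate applies
    storage_location: str     # where the item should be stored
    retention_years: int      # how long to keep it
    restricted_access: bool   # whether access or security limits apply


# Hypothetical trigger terms; a real system would use a trained model instead.
MANDATE_TERMS = {"invoice", "contract", "subpoena"}


def decide(text: str) -> GovernanceDecision:
    """Placeholder decision rule; all returned values are illustrative assumptions."""
    words = set(text.lower().split())
    if words & MANDATE_TERMS:
        return GovernanceDecision(True, "compliance-archive", 7, True)
    return GovernanceDecision(False, "general-storage", 1, False)


print(decide("Signed contract with vendor"))
print(decide("team lunch photos"))
```

The point of the sketch is only that every item receives all four decisions at once, without the end user being asked.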

VMblog:  It seems like this is an ideal solution for organizations facing the typical data explosion.  How would a predictive automation solution work in practice?

Tolson:  Here's an illustration: I previously worked in the eDiscovery industry. We used predictive coding to automate the eDiscovery data review process, where documentation is reviewed for relevance to a legal proceeding.

Prior to the predictive analytics innovation, eDiscovery organizations gathered vast repositories of information, and performed an initial sort using keywords. They then assigned teams of attorneys and paralegals to read each document and consider its relevance to the case at hand. This was a costly and inefficient endeavor: just a few years ago, the average cost for an eDiscovery review was approximately $1.5 million - not including the actual trial or judgment process.

Our solution applied supervised machine learning to automate the eDiscovery process. We collected previous eDiscovery results, and used these sets to train computers to recognize relevant data points and interpret their meaning. We trained the computers using anywhere from 2 to 50 training cycles, and each cycle helped the computer better recognize relevant information.

Typically, additional training cycles lower the error rate of the program, so the computer can more accurately recognize relevant case material. Manual reviews have an error rate anywhere from 20 to 50%, but our predictive coding system had an error rate as low as 2%. Given this highly accurate performance, the courts began to accept predictive analytics as a legally acceptable tool for discovery.
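
The idea of repeated training on reviewer-labeled documents can be illustrated with a minimal sketch. This is not the system Tolson's team built: it assumes a simple naive Bayes word-count model, and the class name RelevanceClassifier, the labels, and the toy documents are all hypothetical. Each call to train stands for one training cycle, and the error rate is measured on documents the model has not seen.

```python
import math
from collections import Counter


class RelevanceClassifier:
    """Tiny naive Bayes text model with two labels (an illustrative assumption)."""

    def __init__(self):
        self.word_counts = {"relevant": Counter(), "not_relevant": Counter()}
        self.doc_counts = Counter()

    def train(self, labeled_docs):
        """One training cycle: fold a batch of reviewer-labeled documents into the counts."""
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            self.word_counts[label].update(text.lower().split())

    def predict(self, text):
        vocab = set(self.word_counts["relevant"]) | set(self.word_counts["not_relevant"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Log prior, smoothed so a label with no documents never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(counts.values())
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing out a label.
                score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)


def error_rate(model, labeled_docs):
    """Fraction of documents whose predicted label differs from the reviewer's label."""
    wrong = sum(1 for text, label in labeled_docs if model.predict(text) != label)
    return wrong / len(labeled_docs)


# Toy, made-up data: two training cycles and a small held-out set.
cycle_1 = [("contract breach claim against supplier", "relevant"),
           ("lunch menu for friday", "not_relevant")]
cycle_2 = [("damages owed under the supply contract", "relevant"),
           ("office party parking reminder", "not_relevant")]
held_out = [("damages owed under the agreement", "relevant"),
            ("friday lunch reminder", "not_relevant")]

model = RelevanceClassifier()
rates = []
for batch in (cycle_1, cycle_2):
    model.train(batch)
    rates.append(error_rate(model, held_out))
    print(f"held-out error rate after this cycle: {rates[-1]:.0%}")
```

On data this small the numbers mean nothing in themselves; the sketch only shows the mechanism by which an additional labeled batch can lower the error rate on unseen documents.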

VMblog:  I understand you're able to recognize the type of data you're dealing with, using predictive coding.  Yet it seems like you have to be present to help train the computer, so it is not a truly automated system.  Is there any way to completely eliminate the need for human monitoring, so the computer is able to manage the information governance process independently?

Tolson:  What you're describing is the 'holy grail' of predictive information governance: a completely independent computer system that can recognize documents, and manage them correctly, with no human input. In technical terms, we call this 'unsupervised machine learning.'

A fully developed, unsupervised machine learning system would be the realization of a truly automated predictive information governance system. It could gather, manage, store, safeguard, and decide whether to keep or delete information.

Even better would be such a system built on the cloud. A cloud-based solution would lower the cost and complexity of managing and storing data, since the cost of this capability would be shared by many organizations within the public cloud environment.

Fortunately, we are almost there. Microsoft's Cloud and Azure services are bringing us within sight of the holy grail of information governance: a fully automated, predictive governance system. Azure includes machine learning to help organizations develop self-adapting security and analytics, among other capabilities. Full data governance automation is on the horizon.


Bill Tolson is Vice President of Marketing for Archive360. He has more than 25 years of experience with multinational corporations and technology start-ups, including 15-plus years in the archiving, ECM, information governance, regulations compliance and legal eDiscovery markets. Prior to joining Archive360, Bill held leadership positions at Actiance, Recommind, Hewlett Packard, Iron Mountain, Mimosa Systems, and StorageTek.

Published Monday, March 12, 2018 7:40 AM by David Marshall