Virtualization Technology News and Information
Pilosa Launches Breakthrough Open Source Software to Dramatically Accelerate Data Queries

Pilosa, an open source distributed bitmap index, today launched into public beta. Pilosa decouples the index from data storage and optimizes it for massive scale. The result is dramatically accelerated query speeds across multiple, massive data sets. Pilosa is available today on GitHub.

Pilosa solves a fundamental problem in data science. The volume of enterprise data has grown faster than Moore's Law, yet the speed at which we can read it has stagnated. Despite several years of major advances in databases, the technology that retrieves data has gone untouched and read speeds have lagged far behind write speeds. Pilosa's technology addresses this problem head-on, dramatically speeding up both queries to existing databases and the process of joining data from multiple stores.

"The next wave of scientific breakthroughs will come from research projects that work with datasets of a terabyte or more," said Higinio (H.O.) Maycotte, CEO of Pilosa. "We know how to store that data, but nobody has focused on accelerating access to that data. That changes today. Our commitment to open source ensures that this fundamental problem is solved once and for all."

Because Pilosa is a bitmap index, it is relatively small in volume and runs in memory rather than on disk. The first version includes production-tested features such as single- and multi-node index support, replication, algorithm plugins, a data importer, and basic cluster management. There are eight patents in the first version alone.
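
For readers unfamiliar with the term, the general idea behind a bitmap index can be sketched in a few lines of Python. This is an illustration of the technique only, not Pilosa's implementation or API: the class ToyBitmapIndex, the helper record_ids, and the sample fields are all hypothetical names chosen for this sketch, and a real system would typically use compressed bitmaps rather than plain integers.

```python
from collections import defaultdict


class ToyBitmapIndex:
    """Toy bitmap index: one integer bitset per (field, value) pair.

    Hypothetical illustration; not Pilosa's data model or API.
    """

    def __init__(self):
        # Maps (field, value) -> int whose bit i is set when record i matches.
        self.bitmaps = defaultdict(int)

    def set_bit(self, field, value, record_id):
        """Mark record_id as having `value` for `field`."""
        self.bitmaps[(field, value)] |= 1 << record_id

    def bitmap(self, field, value):
        """Return the bitset for (field, value), or 0 if nothing matches."""
        return self.bitmaps.get((field, value), 0)


def record_ids(bits):
    """Expand a bitset into the sorted list of record IDs it contains."""
    ids = []
    position = 0
    while bits:
        if bits & 1:
            ids.append(position)
        bits >>= 1
        position += 1
    return ids


# Made-up sample data: three records described by two fields.
index = ToyBitmapIndex()
index.set_bit("species", "human", 0)
index.set_bit("species", "human", 2)
index.set_bit("species", "mouse", 1)
index.set_bit("tissue", "liver", 0)
index.set_bit("tissue", "liver", 1)
index.set_bit("tissue", "brain", 2)

# Intersection (AND) and union (OR) are each a single bitwise operation.
human_liver = index.bitmap("species", "human") & index.bitmap("tissue", "liver")
human_or_liver = index.bitmap("species", "human") | index.bitmap("tissue", "liver")

print(record_ids(human_liver))
print(record_ids(human_or_liver))
```

The index stores only which records match each value, not the records themselves, which is why an index of this kind can stay small and be kept separate from the underlying data store.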

The software helps data scientists and engineers make sense of multiple, massive data sets without purchasing more hardware and without hours-long batch job wait times. Benchmark tests indicate that Pilosa queries remain consistently fast even at high volumes, without increasing complexity or processing rigor. No test exceeded 1.8 seconds, and most queries returned in a fraction of a second. A simple query can traverse more than 2 billion edges in one second on commodity cloud hardware, approximating speeds otherwise seen only with expensive hardware such as GPUs.

"With Pilosa, you can work with all of your data, all at once. It's exhilarating to experience this first hand," said Troy Lanier, Vice President of Product at Pilosa. "Our focus today is really on building a community around this technology. We've already seen traction in bioinformatics and information security, but we're excited to see where users take us. If you're working with massive data sets, Pilosa can dramatically change the rules of the game."

Pilosa is free, open-source software available under the Apache 2.0 license. It can be downloaded from GitHub at https://github.com/pilosa. To learn more about Pilosa, visit https://www.pilosa.com/

Published Tuesday, May 02, 2017 1:40 PM by David Marshall