Virtualization Technology News and Information
New O'Reilly Report Explores Tools and Best Practices for Advanced Analytics and Artificial Intelligence

O'Reilly, the premier source for insight-driven learning on technology and business, today announced the results of its "Evolving Data Infrastructure" survey, which explores the tools companies are using for their advanced analytics and Artificial Intelligence (AI) projects - and the best practices they have acquired along the way.

The research, which will be released in full at O'Reilly's upcoming Strata Data Conference in San Francisco, found that more than half (58 percent) of today's companies are either building or evaluating data science platforms, which are essential for companies keen on growing their data science teams and machine learning capabilities. It also found that 85 percent of companies already have data infrastructure in the cloud.

Other key findings from the research include:

  • Companies are building or evaluating solutions in foundational technologies needed to sustain success in analytics and AI. These include data integration and Extract, Transform and Load (ETL) (60 percent), data preparation and cleaning (52 percent), data governance (31 percent), metadata analysis and management (28 percent) and data lineage management (21 percent).
  • Companies are building data infrastructure in the cloud. Eighty-five percent indicated that they had data infrastructure in at least one of the seven top cloud providers, with nearly two-thirds (63 percent) using Amazon Web Services (AWS). The results also showed that users of AWS, Microsoft Azure or Google Cloud Platform (GCP) tended to use multiple cloud providers.
  • The use of durable cloud storage is prevalent. Sixty-two percent of all respondents indicated they used at least one of the following: Amazon S3 or Glacier, Azure Storage, or Google Cloud Storage.
  • Data scientists and data engineers are in demand. When asked what skills their teams needed to strengthen, 44 percent said data science and 41 percent said data engineering.
  • Respondents used a variety of streaming and data processing technologies. Nearly half of the respondents (49 percent) used either Apache Spark or Spark Streaming, while other popular tools included open source projects (Apache Kafka, Apache Hadoop) and their related managed services in the cloud (Elastic MapReduce, AWS Kinesis).
  • Business intelligence uses a mix of open source and managed services. When it comes to SQL, respondents favored open source tools (Spark SQL, Apache Hive) and managed services in the cloud (AWS Redshift, Google BigQuery).
  • Although a majority (60 percent) aren't using serverless technologies, nearly one-third (30 percent) are already using AWS Lambda. In fact, 38 percent indicated that they were using at least one serverless technology, a pattern that remained consistent across geographic regions.

"It is clear that in 2019 companies are planning to invest in implementing analytics, AI and automation tools," said Ben Lorica, O'Reilly's chief data scientist and chair of the Strata Data Conference. "However, in order to do so successfully, initial investments must be made in the foundational technologies and infrastructure needed to sustain success. Our research shows that a majority of companies understand this and are already building - or at the very least evaluating - platform solutions and tools to make this possible."

For more information and to register to download a copy of the report, please visit:

Published Tuesday, January 29, 2019 10:59 AM by David Marshall