Virtualization Technology News and Information
Databricks Expands Platform for Turnkey Production Apache Spark Deployments in the Cloud
Databricks, the company founded by the team that created the popular Apache Spark project, announced new capabilities for its platform that further simplify the production deployment of Spark in the cloud. The production enhancements complement the existing Databricks environment for data science, which enables users to collaboratively analyze data in real time with data science notebooks and immediately deploy them as production Spark jobs and workflows. The announcement was made today at the 2016 Amazon Web Services (AWS) re:Invent conference.

The production features announced today enable users to effortlessly set up and run Spark jobs and workflows via APIs without humans in the loop, monitor performance and troubleshoot errors with detailed logs, manage AWS EC2 costs with AWS Tags, control access to resources with AWS IAM Roles, and increase the scalability of long-running workloads with encrypted AWS Elastic Block Store (EBS) volumes. Databricks is the first and only vendor to offer a SOC2- and HIPAA-compliant Spark platform that provides turnkey deployment of both real-time analysis and production Spark workloads with a seamless transition from analysis to production.

As organizations across industries deploy Apache Spark in the public cloud, the task of minimizing costly downtime of mission-critical workloads, such as applications that predict equipment failure, falls on data engineering teams. Yet building sophisticated systems around Spark to ensure that such workloads are resilient, easy to troubleshoot, and secure requires a high level of technical expertise and meticulous effort that most organizations struggle to spare.

"As enterprises increasingly rely on Apache Spark to power more diverse production workloads supporting more people, it becomes critical to prevent business system outages that could cost millions of dollars," said Nik Rouda, Senior Analyst at Enterprise Strategy Group.

In Databricks' production environment, data engineers can bypass the difficult and tedious tasks of developing, configuring, tuning and securing infrastructure to easily achieve production requirements with features such as:

  • HIPAA and SOC2-compliant Apache Spark clusters fully managed and tuned by the Spark committers at Databricks;
  • REST APIs to orchestrate and monitor sophisticated Spark jobs and workflows programmatically, without humans in the loop;
  • End-to-end logs and performance metrics to easily debug and fine-tune Spark workloads, accessible via APIs programmatically or in the Databricks user interface;
  • Customizable AWS tags to manage the AWS EC2 usage of each Spark cluster;
  • Encrypted AWS Elastic Block Store (EBS) volumes to increase the reliability of long-running Spark jobs on AWS EC2 instances by automatically providing additional storage;
  • AWS IAM Roles integration to provide secure access to AWS resources to diverse user groups in the same organization;
  • Direct integration with the data science environment to let organizations instantly move exploratory work to production without re-engineering;
  • SSH access to give engineers direct access to the production environment to troubleshoot and inspect the Spark clusters.

"Databricks is experiencing unprecedented demand for a robust and secure Apache Spark platform in the cloud to run production workloads," said Ali Ghodsi, CEO and Co-Founder of Databricks. "We are proud to enable one of our core user groups, the data engineers, to meet the most stringent of operational requirements."

Visit databricks.com or Booth #1341 at AWS re:Invent to learn more.

Published Wednesday, November 30, 2016 4:14 PM by David Marshall