In an exclusive pre-event interview with VMblog, Manveer Sahota, Sr. Director of Product Marketing at Starburst, offers a compelling preview of the company's cutting-edge data platform strategies for AWS re:Invent 2024.
Starburst is set to showcase its Open Hybrid Lakehouse platform, Starburst Galaxy, which promises to revolutionize data analytics by addressing critical challenges facing enterprises in 2024, including rising cloud costs, data silos, and the accelerating demand for AI-driven insights.
VMblog: Before we dive into AWS re:Invent specifics,
can you give our readers a brief overview of your company and what sets you
apart from others in the market?
Manveer Sahota: Starburst,
the Open Hybrid Lakehouse, is the leading end-to-end data platform to securely
access, analyze, and share data for analytics and AI across hybrid,
on-premises, and multi-cloud environments. As the leaders in Trino, a modern
open-source SQL engine, Starburst empowers the most data-intensive and
security-conscious organizations like Comcast, Halliburton, Vectra, EMIS
Health, and 7 of the top 10 global banks to democratize data access, enhance
analytics performance, add intelligence to their analytics stack with AI
Agents, and improve architecture optionality. With the Open Hybrid Lakehouse
from Starburst, enterprises globally can easily discover and use all their data
to power business-critical applications like anti-money laundering and fraud
analytics, next-best products, customer 360, log analytics, and ESG
reporting.
VMblog: How can attendees of the event find you? What do you have planned at your booth this
year? What types of things will attendees
be able to do at your booth?
Sahota: Attendees can find us at booth 1175 and we are bringing back our
show-stopping swag! Guests can also meet some of Starburst's executives and
lead solution architects to learn more about our open hybrid lakehouse,
Icehouse, how to effectively create an AI strategy with Data Products, and
leverage AI Agents to build, understand, and power analytics with Data
Products.
VMblog: Have you sponsored AWS re:Invent in the past? If so, what is it about
this show that keeps you coming back as a sponsor?
Sahota: In our seven-year history, Starburst has been a sponsor for four years.
Each year, we continue to be amazed by the energy and the technologists that
the event brings from around the world, who share a passion for data,
technology, analytics, and AI.
VMblog: Do you have any speaking sessions during the
event? If so, can you give us the
details?
Sahota: Attendees can join Adrian Estala, Starburst Field Chief Data Officer, on
Tuesday at 2:30pm PT in Theater 4 (Data & AI/ML Pavilion) for "Feed your AI
Strategy with Data Products" to learn how organizations can leverage concepts
like Data Products to accelerate their AI initiatives from concept to
production.
VMblog: What are the most significant cloud-related
challenges your customers are facing in 2024, and how does your solution
address these pain points?
Sahota:
- Rising costs - Customers seek greater efficiencies in their data, analytics, and
AI stacks. This means they are consolidating onto a platform and
relying on novel point solutions only for very specific business problems.
Starburst is helping its customers by augmenting or completely replacing
their data analytics stacks with our lakehouse, which provides
industry-leading price performance for streaming and interactive
workloads.
- Concerns over lack of tech stack flexibility - Customers also seek greater
flexibility and ownership of their data and analytics. This means they
want to be able to use the right tools for the job and, in the case of
analytics, to use open table formats like Iceberg and Trino-based engines
in their Lakehouse architecture so that they can take ownership of their
data and SQL.
- Data silos - With the surge of AI adoption and a greater need for improved
analytics, data silos are stifling innovation and limiting intelligence
within an organization. Customers are looking for secure and effective
means to discover and access relevant data for analytics and AI without
the data movement or duplication that raises costs and security risks,
especially for non-critical internal workloads.
- Accelerating analytics -
With dynamic market conditions driven by customer demand, competition, and
changing geopolitical environments, businesses are looking to improve
their responsiveness with faster time to insights so they can make informed
decisions without paying the premium of traditional real-time
analytics. Therefore, customers are seeking solutions like Starburst to
enable near real-time analytics where large volumes of data can be
ingested, transformed, governed, and ready to be queried within a minute,
with optimized efficiency for data management and compute costs.
VMblog: Which of your products or solutions will you
be highlighting at AWS re:Invent 2024, and why are they particularly relevant
for today's AWS users?
Sahota: We'll be showcasing Starburst Galaxy, our fully managed Lakehouse
platform that brings together the power of Trino with the flexibility of
Iceberg. Galaxy is relevant for today's AWS users because it simplifies the
data and analytics experience for data practitioners (data engineers, data
scientists, and data analysts) by automating and simplifying complex tasks and
infusing AI Agents to understand and analyze data, while helping business teams
realize value from their distributed data within minutes without incurring massive
cost run-ups.
VMblog: How does your technology specifically
complement or enhance AWS services, and what unique value proposition do you
offer to AWS customers?
Sahota: Starburst helps AWS customers build and maintain an Iceberg-based data
lakehouse while still providing access to their distributed data across
warehouses and other SaaS applications. Starburst easily fits into an
AWS-centric architecture, as highlighted by Gilead Sciences at re:Invent in
2022, and with the latest enhancements, we've drastically improved the
experience. Now, customers of Starburst and AWS can easily ingest streaming
data from Amazon MSK or other Kafka topics into Iceberg tables in S3 at verified
rates of ingestion of 100 GB/second, use
automated data transformation and compaction to make the data usable, apply
necessary governance to secure it, and then analyze the data with a highly
optimized Trino-based SQL engine. Users can also leverage Starburst to
discover, access, and analyze their data in Amazon Redshift and 20+ other
sources and continue to use AWS Glue as their data catalog of choice.
Lastly, Starburst brings AI Agents front and center with Amazon Bedrock to make it
easier than ever to understand, build, and analyze data products. Ultimately,
Starburst helps to accelerate and simplify data management, analytics, and data
sharing within AWS.
VMblog: With AI and machine learning being hot
topics, how is your company incorporating these technologies to improve cloud
operations and management?
Sahota: Starburst includes a few
GenAI capabilities to improve user experiences and help automate some data
engineering tasks. These features are currently in preview.
- AI Agents for Data Products to power intelligent
analytics: This transformational Data Product capability allows organizations
to harness the full potential of ALL of their data assets, regardless of source
and location, for AI-driven/AI-assisted solutions, insights, and
decision-making. An AI Agent within Starburst simplifies discovering the
business context of enterprise data, from schemas down to columns; documents or
enriches the data; creates Data Products; and makes those Data Products available to
Amazon Bedrock for AI assistance, allowing users to analyze the data in natural
language.
- SQL statement generation from business questions
(text-to-SQL): Starburst allows users to ask natural language questions about
their data within a schema or table, resulting in an SQL statement answering
the question. As part of the prompting process, we provide additional metadata
about the source to ground the LLM and help provide more pertinent results.
- Query explanations (SQL-to-text): Starburst not only
explains the query that was generated and executed but
also lets you provide additional context and dig deeper into any questions
you may have.
As a chatbot, SQL-to-text can
be leveraged to generate any type of output you desire, from a simple technical
or domain-specific answer to a question to a summary or comprehensive
documentation that can be used with your data assets or products.
This also means business
continuity can be maintained - critical, but undocumented transformations can
quickly be explained. With added context provided by the user during the
Q&A process, businesses can also more effectively derisk staff turnover.
- Data classification: Starburst ABAC allows administrators to associate access policies with tags, and apply policies to data by assigning tags to the data. The tags and policies are typically created and assigned based on the data’s business context (e.g. a “PII” tag with a masking policy applied to an e-mail or username column).
While ABAC more effectively allows administrators to adhere to the principle of least privilege, it scales less easily than RBAC, particularly in larger organizations, because every piece of data has to be tagged. Starburst’s data classification feature makes ABAC scalable by using AI to assess data and recommend tags that an administrator can apply, modify, or ignore. This reduces the last-mile burden of tagging and allows organizations to scale their ABAC implementation.
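To make the tag-based model concrete, here is a minimal Python sketch of the idea described above: policies are attached to tags, tags are assigned to columns, and a column inherits whatever policy its tags carry. It is only an illustration, not Starburst's implementation or API. The tag names, column names, masking functions, and the `pii_reader` user attribute are all hypothetical.

```python
from typing import Callable, Dict, List, Set


def mask_email(value: str) -> str:
    """Hypothetical masking policy: hide the local part, keep the domain."""
    local, sep, domain = value.partition("@")
    return "***" + sep + domain if sep else "***"


def redact(value: str) -> str:
    """Hypothetical masking policy: hide the whole value."""
    return "***"


# Policies are attached to tags, not to individual columns (assumed names).
TAG_POLICIES: Dict[str, Callable[[str], str]] = {
    "PII_EMAIL": mask_email,
    "PII": redact,
}

# Tags assigned to columns. In the workflow described above, an AI
# classifier would recommend these and an administrator would approve them.
COLUMN_TAGS: Dict[str, List[str]] = {
    "customers.email": ["PII_EMAIL"],
    "customers.username": ["PII"],
    "customers.country": [],
}


def read_value(column: str, value: str, user_attributes: Set[str]) -> str:
    """Return the value as this user is allowed to see it."""
    # Assumed rule: users holding the 'pii_reader' attribute bypass masking.
    if "pii_reader" in user_attributes:
        return value
    for tag in COLUMN_TAGS.get(column, []):
        policy = TAG_POLICIES.get(tag)
        if policy is not None:
            value = policy(value)
    return value


print(read_value("customers.email", "ana@example.com", set()))
print(read_value("customers.email", "ana@example.com", {"pii_reader"}))
```

The point of the design is that adding a new sensitive column only requires assigning an existing tag; no new policy has to be written, which is why tag recommendations reduce the administrative load.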
VMblog: What new product announcements or
demonstrations can attendees expect to see at your booth during AWS re:Invent
2024?
Sahota: Attendees can visit us at Booth 1175 to see Starburst Galaxy, our fully
managed Open Lakehouse built with Trino, in action. Core demos on display will
include:
- AI Agents for Data Products to power intelligent
analytics: This transformational Data Product capability allows organizations
to harness the full potential of ALL of their data assets, regardless of source
and location, for AI-driven/AI-assisted solutions, insights, and
decision-making. Attendees can see how to use an AI Agent to discover the
business context of enterprise data, from schemas down to columns; document or
enrich the data; create Data Products; and
make those Data Products available to Amazon Bedrock for AI assistance, allowing users
to query the data with natural language.
- Near real-time analytics: See how Starburst can
ingest up to 100 GB/second from Amazon MSK or other Kafka topics, land the data in an
Iceberg table in S3, transform and govern it, and make it ready to be analyzed
within a minute by an optimized Trino engine.
- Data federation: Discover and securely access
distributed cloud data across AWS S3, Redshift, Snowflake, BigQuery, and 15+
other sources for interactive or ad hoc analytics.
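For readers unfamiliar with federation in Trino-based engines, the sketch below shows roughly what a federated query looks like: each source is exposed as a catalog, tables are addressed as `catalog.schema.table`, and a single SQL statement can join across catalogs. The Python helper only builds the SQL text. The catalog, schema, table, and column names are hypothetical, and running the query would require a Trino or Starburst cluster with matching catalogs configured.

```python
from typing import Sequence


def qualified(catalog: str, schema: str, table: str) -> str:
    """Build a Trino-style fully qualified name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql(left: str, right: str, key: str,
                       columns: Sequence[str]) -> str:
    """Compose one SQL statement that joins tables from two catalogs."""
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list}\n"
        f"FROM {left} AS l\n"
        f"JOIN {right} AS r ON l.{key} = r.{key}"
    )


# Hypothetical catalogs: an Iceberg lake on S3 and a Snowflake source.
orders = qualified("s3_lake", "sales", "orders")
accounts = qualified("snowflake_crm", "public", "accounts")

sql = federated_join_sql(
    orders, accounts, "account_id", ["l.order_id", "r.account_name"]
)
print(sql)
```

The engine, not the user, is responsible for reading from each source, so the data does not need to be copied into one store before it can be joined.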
VMblog: Cost optimization in the cloud remains a
crucial concern - how does your solution help organizations maximize their AWS
investment?
Sahota: Starburst helps organizations maximize their AWS investment by offering
industry-leading price performance for their analytics, which lowers their
compute costs. Starburst easily integrates into the existing AWS stack,
requiring minimal configuration, and customers can begin realizing value
immediately. Starburst can deliver up to 9.85x cost savings for streaming and
interactive workloads and up to 11.5x faster SQL.
VMblog: Security and compliance are top priorities
for AWS users. How does your solution strengthen an organization's cloud
security posture?
Sahota: Starburst becomes a single access point for organizations' distributed
data, whether in S3, Redshift, or other data stores. At this single point of
access, customers can apply necessary access control policies using RBAC and
ABAC to ensure the right users can access authorized data. Furthermore,
customers can use AWS PrivateLink with Starburst Galaxy for added security.
VMblog: What hands-on experiences or interactive
demonstrations will you be offering at your booth this year?
Sahota: This year, attendees can visit booth 1175 for
interactive demos from expert solution architects on AI Agents for analytics,
data products, Icehouse, federated queries, price-performant SQL, and more.
One of the sexiest demos at the show will be
using AI Agents to build, understand, and analyze Data Products with Amazon
Bedrock and Starburst.
VMblog: Many organizations are adopting multi-cloud
strategies. How does your solution support customers who use AWS alongside
other cloud providers?
Sahota: Starburst can be easily used across AWS, Azure, and GCP, as it's
available as a SaaS or self-managed solution on all three clouds. Furthermore,
customers can deploy on AWS and use federated queries to analyze data in other
non-object store cloud sources like Snowflake for ad-hoc analytics or data
discovery use cases.
VMblog: What specific roles or job functions within
an organization would benefit most from visiting your booth at re:Invent?
Sahota: Practitioners and leaders in data engineering, data science, AI/ML
engineering, and analytics can all benefit from seeing how Starburst's Open
Hybrid Lakehouse makes it extremely easy to discover, access, govern, analyze,
and share data.
VMblog: For attendees who want to learn more, what
special offers, resources, or follow-up opportunities will be available at your
booth during AWS re:Invent 2024?
Sahota: We encourage attendees to visit us at booth 1175 to see Starburst Galaxy
in action and sign up for a free trial to experience the power of Galaxy
firsthand. We'll also have additional resources related to Data Products, Near
Real-time Analytics, Hadoop
Modernization, and more for attendees to pick up or download.
VMblog: Are you giving away any prizes at your booth
or participating in any prize giveaways?
Sahota: No, but we are bringing back our show-stopping
swag that is free for all re:Invent attendees.