Article written by Daniel Kulp, VP of Open
Source Development and Application Integration at Talend
Twenty years ago, the term "open source" was coined,
setting in motion what would become the most significant trend in software
development since. The Open Source Initiative, a non-profit organization that
advocates for open source development and non-proprietary software, pegs the
date of inception at February 3, 1998.
Since its creation, OSS has disrupted the status quo in
groundbreaking ways while also becoming mainstream in the process. According
to Ovum,
open source is now the default option across several big data categories
ranging from storage, analytics and applications to machine learning. In the
latest Black Duck Software and North Bridge's survey,
90 percent of respondents reported they rely on open source "for improved
efficiency, innovation and interoperability," most commonly because of "freedom
from vendor lock-in, competitive features and technical capabilities, ability
to customize, and overall quality."
If you're an IT leader at an organization of any size, you should be
thinking about and planning for incorporating OSS into your infrastructure, or
about your next project if you've already started. OSS can enable
extreme agility and lightning-fast responses to customers, business needs and
market challenges -- but with thousands of successful open source projects
underway, it can be hard to know which ones deserve your attention.
Here are five projects we recommend you investigate and keep an
eye on, given the potential impact they may have on your IT
infrastructure and overall business:
1. Apache Beam is a unified programming model
whose name combines the two big data processing modes it covers, batch and
streaming: Beam = Batch + strEAM.
Under the Beam model, you design a data pipeline once and choose
from multiple processing frameworks later. You don't need to redesign the
pipeline every time you want a different processing engine, which means your
team can pick the right engine for each use case.
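The "define once, run on any engine" idea can be illustrated with a toy sketch in plain Python. This is not Beam's actual API (real Beam pipelines use apache_beam.Pipeline and PTransforms); it only shows the principle that a pipeline is a declarative description of transforms, while interchangeable "runners" execute it:

```python
# Toy illustration of Beam's core idea: the pipeline is a declarative
# list of transforms, and pluggable runners execute it either as a
# bounded batch or one element at a time. Hypothetical names throughout.

def make_pipeline():
    # Defined once, independent of any execution engine.
    return [
        ("map", lambda x: x.lower()),
        ("filter", lambda x: x.startswith("a")),
    ]

def batch_runner(pipeline, records):
    """Run the whole bounded dataset at once."""
    out = list(records)
    for kind, fn in pipeline:
        out = [fn(x) for x in out] if kind == "map" else [x for x in out if fn(x)]
    return out

def stream_runner(pipeline, record):
    """Run the same pipeline on a single streamed element."""
    for kind, fn in pipeline:
        if kind == "map":
            record = fn(record)
        elif kind == "filter" and not fn(record):
            return None
    return record

p = make_pipeline()
print(batch_runner(p, ["Apple", "Banana", "Avocado"]))  # ['apple', 'avocado']
print(stream_runner(p, "Apricot"))                      # 'apricot'
```

The point is that `make_pipeline` never changes when you swap runners -- which is exactly the redesign cost Beam removes.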
2. Apache CarbonData is
an indexed columnar data format for very fast analytics on big data
platforms such as Hadoop and Spark. This file format addresses the
problem of serving different query patterns from the same data. With CarbonData,
the data format is unified, so many kinds of analysis run against a single copy
of the data, using only the computing power each query needs, which makes
queries run much faster.
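To see why an indexed columnar layout helps, here is a minimal pure-Python sketch -- an illustration of the general technique, not CarbonData's actual implementation. Values are stored per column in blocks, each block carries a min/max index, and a range query skips any block whose index rules it out:

```python
# Minimal sketch of columnar blocks with min/max indexes, in the spirit
# of indexed columnar formats like CarbonData (illustration only).

BLOCK_SIZE = 4

def build_column(values, block_size=BLOCK_SIZE):
    """Split one column into blocks, indexing each block by min/max."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append({"min": min(chunk), "max": max(chunk), "data": chunk})
    return blocks

def range_query(blocks, lo, hi):
    """Return matching values plus how many blocks were actually scanned."""
    hits, scanned = [], 0
    for b in blocks:
        if b["max"] < lo or b["min"] > hi:
            continue                      # index proves no match: skip the block
        scanned += 1
        hits.extend(v for v in b["data"] if lo <= v <= hi)
    return hits, scanned

col = build_column([1, 2, 3, 4, 50, 51, 52, 53, 9, 8, 7, 6])
hits, scanned = range_query(col, 5, 10)
print(hits, scanned)   # [9, 8, 7, 6] 1 -- only 1 of 3 blocks scanned
```

Skipping whole blocks by index, rather than scanning every row, is the basic reason this style of format answers analytic queries so quickly.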
3. Apache
Spark is one of the most widely used Apache projects
and a popular choice for very fast big data processing (cluster
computing), with built-in capabilities for real-time data streaming, SQL,
machine learning and graph processing. Spark is optimized to run in memory and
enables interactive streaming analytics. Unlike batch-only processing, it lets
you analyze vast amounts of historical data alongside live data to make
real-time decisions -- fraud detection, predictive analytics, sentiment
analysis and next-best offers, for example.
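The historical-plus-live pattern behind a use case like fraud detection can be sketched in plain Python. This is a toy illustration, not Spark code -- in Spark you would compute the baseline with DataFrames and score the live feed with Structured Streaming at cluster scale:

```python
# Toy fraud-detection sketch: score live events against a baseline
# built from historical data. All names here are hypothetical.

from statistics import mean, pstdev

def build_baseline(history):
    """Per-account mean and std-dev of past transaction amounts."""
    by_account = {}
    for account, amount in history:
        by_account.setdefault(account, []).append(amount)
    return {a: (mean(v), pstdev(v)) for a, v in by_account.items()}

def is_suspicious(baseline, account, amount, threshold=3.0):
    """Flag a live transaction more than `threshold` std-devs off baseline."""
    mu, sigma = baseline.get(account, (0.0, 0.0))
    if sigma == 0:
        return amount > mu  # no variance observed: flag anything larger
    return abs(amount - mu) / sigma > threshold

history = [("acct1", 20), ("acct1", 22), ("acct1", 18), ("acct1", 21)]
baseline = build_baseline(history)
for account, amount in [("acct1", 19), ("acct1", 500)]:   # the "live" feed
    verdict = "FLAG" if is_suspicious(baseline, account, amount) else "ok"
    print(account, amount, verdict)
```

The design point is the split: the expensive pass over historical data happens once, while each live event is scored in constant time as it arrives.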
4. Docker and Kubernetes are containerization and
container-orchestration technologies that speed up deployments of
applications. Using containers makes your architecture
far more flexible and portable, and your DevOps process will benefit from
more efficient continuous deployment.
5. TensorFlow is an extremely
popular open source library for machine intelligence, enabling far more
advanced analytics at scale. TensorFlow is designed for large-scale distributed
training and inference, but it's also flexible enough to support
experimentation with new machine learning models and system-level
optimizations. Before TensorFlow, no single library captured
the breadth and depth of machine learning with such huge
potential. TensorFlow is also very readable and well documented, and its
community continues to grow.
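At its core, TensorFlow automates gradient-based training of models. A minimal, dependency-free sketch of that idea -- not TensorFlow code, which would express the same loop with tensors, automatic differentiation and distributed execution -- is fitting a line by gradient descent:

```python
# Fit y = w*x + b to data by gradient descent: the basic training loop
# that TensorFlow generalizes with tensors, autodiff and distribution.
# Pure-Python illustration only.

def fit_line(xs, ys, lr=0.01, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]          # generated by y = 2x + 1
w, b = fit_line(xs, ys)
print(round(w, 2), round(b, 2))    # recovers roughly 2.0 and 1.0
```

What a library like TensorFlow adds over this sketch is deriving the gradients automatically and running the loop across many machines and accelerators.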
Not all open source projects are created equal, and not just any
open source project will propel your company to the head of the pack. Every
company must still develop its own strategy and choose the open source projects
that best fuel its desired business outcomes. It's important to join the open
source communities relevant to your projects and interests, to educate
yourself, your team and management about the different benefits. OSS is so
valuable in large part because you can leverage the collective minds of the
community instead of reinventing the wheel.
At the end of the day,
change has always been the only constant in human existence and business. But
change in technology is happening faster now than at any other time in history.
By staying open-minded, attuned to open source and aware of the many ways to
use data and analytics, you'll be well prepared for whatever pops up next on
the horizon.
##
About the Author
Daniel
Kulp is an ASF member and a committer on Apache CXF, Apache Aries, Apache Maven,
Apache WebServices, Apache ServiceMix and Apache Camel.
Daniel attended
Northeastern University in Boston where he received degrees in Chemical
Engineering and Computer Science. As the VP of Open Source Development for the
Application Integration Division at Talend, Dan gets to practice his passion
for coding open source at work, and still has time to dedicate to his loving
family.