In a brief video explainer and commentary, Josh Stella, chief architect at Snyk, lays out the five universal fundamentals for running an effective cloud security program.
The word "misconfiguration" can seem quite innocuous - an innocent mistake that's easy to fix, like putting your car into drive while the parking brake is still engaged. You quickly realize what's wrong and release the brake. But what if there are hundreds of misconfigurations throughout your car? No one has the time to manually check the engine, transmission, suspension, brakes and electronics every time they get in their car, even though any one misconfiguration could result in car failure or personal injury.
Cloud infrastructure environments also involve a large number of complex configurations, all of which are the responsibility of the cloud customer to set and maintain. Customer mistakes in the form of cloud misconfigurations can result in system downtime or a major breach. And every enterprise cloud environment is rife with misconfigurations.
According to The State of Cloud Security 2021 Report, 36% of companies suffered a severe cloud security leak or breach due to cloud misconfiguration in just the prior year. The National Security Agency (NSA) warns that "misconfiguration of cloud resources remains the most prevalent cloud vulnerability and can be exploited to access cloud data and services. Often arising from cloud service policy mistakes or misunderstanding shared responsibility, misconfiguration has an impact that varies from denial of service susceptibility to account compromise. The rapid pace of [cloud service provider] innovation creates new functionality but also adds complexity to securely configuring an organization's cloud resources."
Misconfigurations Minor and Major
Identifying and fixing computer misconfigurations is not a new concept for security professionals. In the data center, things like network protocols and firewall ports are configurable components of an IT system, and a misconfiguration here can also represent a security risk that requires fixing.
In the cloud, misconfigurations vary from simple errors involving single resources, like leaving a dangerous port open, to deep architectural design flaws involving multiple resources that can be challenging for security teams to spot. But for all its complexity, the nature of cloud computing as 100% software means these mistakes are entirely preventable. Organizations that successfully prevent misconfigurations address them as a software engineering problem rather than treating them the way they would physical data center infrastructure.
That's because cloud infrastructure isn't physically built, it's programmed. And those programming it, typically developers or DevOps engineers, are making decisions about the configuration of their cloud infrastructure - and then changing it on a daily basis. You want them to have this power because it's critical to their ability to deploy and improve applications rapidly, which is one of the biggest drivers of cloud adoption. But every change brings new risks - and new kinds of risks.
Application programming interfaces (APIs) drive cloud computing, and they play a central role in how we use the cloud - and how attackers exploit it. APIs are the software "middlemen" that allow different applications and cloud resources to interact with each other. There is no fixed IT architecture in a centralized location, and security teams can't rely on any network perimeter to identify and block incoming attacks.
Security teams need to focus their attention on the cloud control plane, which is the API surface used to configure and operate the cloud. For example, cloud customers use this control plane to build a virtual server, modify a network route, and gain access to data. But when attackers get a foothold in a cloud environment, they also use the control plane to learn about the environment, move laterally, and extract data. This is the cloud attack surface.
Every major cloud breach involves attackers compromising the control plane. Unlike data center attacks, which tend to follow a "low and slow" approach to avoid detection on the network, control plane compromise attacks are lightning-fast "smash and grab" exploits that don't traverse traditional networks that can be monitored. Organizations that succeed in preventing these attacks are thinking differently about cloud security, starting with the developers and DevOps engineers who are creating and managing cloud infrastructure.
When developers build applications in the cloud, they're also creating the infrastructure for the applications - as opposed to buying physical infrastructure and shoving apps into it. Building cloud infrastructure is done with code, which means developers and DevOps largely own that process. This new paradigm compels the security team to become the domain experts on secure cloud architecture and impart that knowledge to the developers to help them build securely. The way security teams do this is with policy as code.
Policy as code enables security teams to express security and compliance rules in a programming language that an application can use to check the correctness of configurations. Programs can use policy as code to automatically check other code and running environments for unwanted conditions, including dangerous misconfigurations. This means all cloud stakeholders are operating on the same page on security without ambiguity or disagreement on the rules, and different teams are empowered to apply policy at every stage of the software development life cycle (SDLC).
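As a minimal sketch of the idea, the Python below expresses two rules as plain functions and runs them against resource configurations represented as dictionaries. The resource schema (`type`, `name`, `ingress`, `cidr`, `port`, `public_read`), the rule names, and the list of dangerous ports are all assumptions made for illustration. Real policy-as-code tooling typically uses a dedicated policy language and a provider-specific schema.

```python
from typing import Any, Callable, Dict, List, Optional

Resource = Dict[str, Any]
# A policy returns None when the resource passes, or a message describing the violation.
Policy = Callable[[Resource], Optional[str]]

# Hypothetical set of ports that should never be reachable from the whole internet.
DANGEROUS_PORTS = {22, 3389}


def no_open_dangerous_ports(resource: Resource) -> Optional[str]:
    """Flag firewall rules that expose a dangerous port to 0.0.0.0/0."""
    if resource.get("type") != "firewall_rule":
        return None
    for rule in resource.get("ingress", []):
        if rule["cidr"] == "0.0.0.0/0" and rule["port"] in DANGEROUS_PORTS:
            return f"port {rule['port']} is open to the internet"
    return None


def no_public_buckets(resource: Resource) -> Optional[str]:
    """Flag storage buckets that allow public read access."""
    if resource.get("type") != "storage_bucket":
        return None
    if resource.get("public_read", False):
        return "bucket allows public read access"
    return None


POLICIES: List[Policy] = [no_open_dangerous_ports, no_public_buckets]


def evaluate(resources: List[Resource]) -> List[str]:
    """Run every policy against every resource and collect the violations."""
    violations = []
    for resource in resources:
        for policy in POLICIES:
            message = policy(resource)
            if message is not None:
                violations.append(f"{resource['name']}: {message}")
    return violations


# Hypothetical configurations, as they might be parsed from infrastructure code.
EXAMPLE_RESOURCES: List[Resource] = [
    {"type": "firewall_rule", "name": "admin-access",
     "ingress": [{"cidr": "0.0.0.0/0", "port": 22}]},
    {"type": "firewall_rule", "name": "web-traffic",
     "ingress": [{"cidr": "0.0.0.0/0", "port": 443}]},
    {"type": "storage_bucket", "name": "customer-exports", "public_read": True},
]

if __name__ == "__main__":
    for violation in evaluate(EXAMPLE_RESOURCES):
        print(violation)
```

Because the rules are ordinary code, the same `evaluate` function could in principle run in a developer's editor, in a deployment pipeline, and against a running environment, which is what keeps every team working from one set of rules.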
The scale of cloud services in use across your organization is likely increasing and will continue to do so over the long term. Automating the identification and remediation of cloud misconfigurations is therefore essential to eliminating vulnerabilities before attackers can exploit them, and it reduces the manual burden on security teams that are likely already stretched thin. When that automation is built on policy as code, you can scale it as cloud use and complexity grow. And developers can use those same policies to ensure infrastructure is secure pre-deployment, reducing the frequency of misconfigurations that security teams need to address.
Organizations that have implemented effective cloud security programs share characteristics that any enterprise can emulate. Whenever I'm asked where to start to better secure cloud environments, I first recommend establishing full knowledge of your environment and the SDLC for cloud infrastructure - and starting to think like a hacker in order to identify flaws in your architecture. If you're solely focused on eliminating individual misconfigurations, you need to get it right 100% of the time, whereas hackers only need to get lucky once. You need to understand what an attacker could do should they penetrate your environment.
That leads to the second thing successful organizations focus on: prevention and secure design. Once an attacker has gained entry to your environment and has compromised the control plane, it's too late to detect and stop the attack. The key is to prevent misconfigurations from being deployed and to design cloud environments that deny adversaries access to the control plane and minimize the blast radius of any potential penetration.
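One way to make "blast radius" concrete is to treat the environment as a graph of who can reach what, and ask what becomes reachable from a single compromised resource. The sketch below does this with a breadth-first search. The graph, its node names, and the `blast_radius` function are hypothetical and heavily simplified. A real analysis would need to derive the edges from actual identity and network configuration.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical access graph: each key can reach (assume, read, or call)
# every node in its list.
ACCESS_GRAPH: Dict[str, List[str]] = {
    "web-server": ["web-role"],
    "web-role": ["app-logs-bucket", "customer-data-bucket"],
    "batch-worker": ["batch-role"],
    "batch-role": ["app-logs-bucket"],
}


def blast_radius(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from `start`, excluding `start` itself."""
    reachable: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor != start and neighbor not in reachable:
                reachable.add(neighbor)
                queue.append(neighbor)
    return reachable


if __name__ == "__main__":
    for entry_point in ("web-server", "batch-worker"):
        print(entry_point, "->", sorted(blast_radius(ACCESS_GRAPH, entry_point)))
```

In this toy environment, compromising `web-server` exposes the customer data bucket while compromising `batch-worker` does not, which is the kind of architectural difference that checking individual misconfigurations one at a time would not reveal.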
The third thing we see successful organizations do is empower developers and DevOps engineers with tools that help them design and build cloud infrastructure securely. This is a sea change in the relationship between your security team, developers and operations - with everyone integrating security into their processes. Security teams take on the role of security architects, guiding other teams. Again, the technology enabler here is policy as code.
The fourth thing we see every successful organization do is build cloud security on a foundation of policy as code. I've discussed policy as code at length here, but it is the essential technology foundation of any cloud security program if you want it to scale along with your cloud use, without having to scale up your security team. Without policy as code, you can't distribute policy across the organization consistently, and you can't empower developers with tooling to help them work more securely.
The fifth and final cloud security imperative is measuring what matters. You need to know where you stand today on cloud security, where you want to go, and be able to measure your progress along the way. How much risk are you taking in the cloud? How fast are your teams delivering secure innovation in the cloud? How many engineering hours are you investing in cloud security?
For example, if your developers are waiting around for security teams to manually review and approve deployments, how long are they waiting? How many hours are security teams investing in evaluating and prioritizing cloud misconfigurations and routing those to DevOps teams for remediation? How much time and effort is involved in the rework needed to address architectural security issues, as opposed to producing inherently secure architecture from the start?
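The first of those questions, how long developers wait for a manual review, is simple to compute once submission and approval times are recorded. The sketch below assumes each review is logged as a pair of ISO 8601 timestamps. The sample records and function names are hypothetical, made up purely to show the calculation.

```python
from datetime import datetime
from statistics import mean
from typing import List, Tuple

# Hypothetical review log: (submitted, approved) timestamps for each deployment.
REVIEWS: List[Tuple[str, str]] = [
    ("2024-01-08T09:00", "2024-01-08T15:00"),
    ("2024-01-09T10:00", "2024-01-10T10:00"),
    ("2024-01-10T08:30", "2024-01-10T11:30"),
]


def wait_hours(submitted: str, approved: str) -> float:
    """Hours elapsed between submitting a deployment and its approval."""
    delta = datetime.fromisoformat(approved) - datetime.fromisoformat(submitted)
    return delta.total_seconds() / 3600


def mean_wait_hours(reviews: List[Tuple[str, str]]) -> float:
    """Average review wait across all logged deployments."""
    return mean(wait_hours(submitted, approved) for submitted, approved in reviews)


if __name__ == "__main__":
    print(f"Mean wait for security review: {mean_wait_hours(REVIEWS):.1f} hours")
```

Tracking a number like this over time gives a baseline against which the effect of automated, policy-based checks can be measured.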
Successful organizations view cloud security as an innovation enabler rather than a blocking function because the nature of the cloud means we can address security as a software engineering problem - and create software engineering solutions to help everyone move faster and more securely.
To summarize, organizations that get cloud security right focus on these five fundamentals:
- Know your environment: Understand everything running in your environment in full context, how it's designed and deployed, and how hackers could exploit it.
- Focus on prevention and secure design: Shift your security mentality toward preventing the kinds of cloud vulnerabilities hackers are exploiting through secure design and deployment processes.
- Empower developers: Shift left on cloud security by empowering everyone involved in designing, developing and managing cloud infrastructure with tools to help them get security right up front.
- Align and automate using policy as code: Get all teams operating under the same source of truth regarding security, and build a scalable technology foundation for cloud security.
- Measure what matters: Identify the key metrics you should be tracking around risk, velocity, and security investment. Establish your current baselines and objectives and measure your progress.