A Serverless Explainer for Defenders


What is all this “serverless” stuff and why should security people care? The tl;dr is that serverless compute services – also known as “function-as-a-service” – are efficient at handling events on-demand in applications that don’t have a constant firehose of requests.

Your developers will vibe with not having to worry about anything but their code, your engineering leads will chill from not having to hire more ops headcount, and your executives will thirst over the potential cost savings. This means that, as a security professional, you should pepper your memory bank with an informed perspective on serverless computing lest you stay salty — and to avoid frantically trying to backfill your knowledge if serverless security concerns become a higher priority for your organization.

What it means: “Serverless” does not mean that code magically runs without any servers involved on the backend. It is “serverless” in the sense that the developer never interacts with the servers underlying the computing environment. The service provider deals with provisioning, scaling, upgrading, and managing all the things that lie beneath the application code being run.

The options: AWS Lambda, Google Cloud Functions, and Azure Functions are, unsurprisingly, the top three players in the function-as-a-service (FaaS) market for now. Edge computing services, like Cloudflare Workers and Fastly Compute@Edge, represent an emerging alternative to FaaS.

How it works: At a high level, a serverless system works by receiving a specified event and consequently executing the specified code. FaaS is described as “event-driven” because a function only executes when a particular thing happens. For instance, a user visiting a static webpage delivered via Amazon CloudFront could represent the specified event, and a browser prompt to enter a username and password to access the webpage could be specified as code in the Lambda function.
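To make that example concrete, here is a minimal sketch of what a CloudFront-triggered function (in AWS terms, a Lambda@Edge viewer-request handler) might look like: if the incoming request carries no credentials, the function answers with an HTTP 401 challenge so the browser pops its username/password prompt. Treat this as an illustration of the event-in, response-out shape, not production authentication code.

```python
def handler(event, context):
    """Sketch of a CloudFront viewer-request function that demands credentials.

    If the incoming request has no Authorization header, short-circuit with a
    401 so the browser prompts for a username and password; otherwise let the
    request continue onward.
    """
    request = event["Records"][0]["cf"]["request"]
    headers = request.get("headers", {})

    if "authorization" not in headers:
        # Returning a response object (instead of the request) tells
        # CloudFront to answer the viewer directly without hitting the origin.
        return {
            "status": "401",
            "statusDescription": "Unauthorized",
            "headers": {
                "www-authenticate": [
                    {"key": "WWW-Authenticate", "value": 'Basic realm="protected"'}
                ]
            },
        }

    # Credentials present: pass the (unmodified) request onward.
    return request
```

The function itself is just an ordinary handler taking an event and returning a value; everything about routing the event in and the response out is the provider's problem.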

Of course, this represents an abstraction on top of a more complex process that underpins this functionality. When the FaaS system receives the specified event, it passes it to an API that either automatically finds an existing environment available for running the function’s specified code or spins up the serverless hamsters to create a new environment appropriately configured for running it.

In serverless lingo, this process is described as the function being invoked and routed to an execution environment. Any resulting output from the function invocation is passed back out to the relevant service. For instance, our previous password prompt function would be invoked by CloudFront and routed to an execution environment for it to run the code and return the username and password form to CloudFront for it to deliver to the user. From the user’s perspective, the only ingredients required are:

  • A specified event (or events)
  • A specified identity and access management (IAM) role
  • Some code written to handle the event(s)

The user must determine the type of event(s) to which they want the system to respond with some code. The user must also assign a particular access role to the function so that all the other services can be accessed to perform whatever the function is supposed to do. Naturally, this is a simplification (which is on-brand with the abstraction level of serverless!).
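Those three ingredients map fairly directly onto what you hand the provider at creation time. The sketch below models them as a plain dictionary; the field names are illustrative and provider-neutral, not any vendor's actual API, though they loosely mirror the parameters a real deployment call (e.g. `aws lambda create-function`) would take.

```python
# Illustrative only: a provider-neutral model of the three ingredients a
# user supplies when creating a serverless function. The trigger label,
# role ARN, and handler name are hypothetical placeholders.
function_definition = {
    # 1. The event(s) that should trigger an invocation.
    "triggers": ["cloudfront:viewer-request"],
    # 2. The IAM role the function assumes while it runs, granting it
    #    access to whatever other services it needs.
    "execution_role": "arn:aws:iam::123456789012:role/password-prompt-role",
    # 3. The code that handles the event(s).
    "handler": "prompt.handler",
}

def validate(definition):
    """Return a sorted list of any missing required ingredients."""
    required = {"triggers", "execution_role", "handler"}
    return sorted(required - definition.keys())
```

A real provider enforces roughly this same contract: leave out the trigger, the role, or the code, and there is nothing to invoke, nothing it may access, or nothing to run.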

The security model: The two facets of the serverless security model worth understanding at a basic level to start are isolation and IAM. The good news is that some of the security toil required with other forms of compute is handled by the FaaS provider instead. Notably, the FaaS provider is the one that manages software updating and patching for all the layers beneath the user’s application code itself, down to the hardware level.

Isolation
The general idea behind the isolation model of FaaS across vendor implementations is to sandbox each function so that functions can’t access each other’s execution environments. There are two ways vendors can approach this:

  1. The function runs within a container which runs within a virtual machine (VM)
  2. The function runs within a container which interfaces with an emulated kernel that serves as a broker to the host kernel

What happens in each approach if an attacker uses a container escape to break the function’s isolation boundary? In the first approach, an attacker escaping the container also needs to escape the hypervisor boundary to access the underlying host kernel. If the same account has multiple functions running at the same time, the attacker could potentially access other function execution environments within the same VM – but that’s the extent of their reach.

In the second approach, an attacker will be manipulating an imitation kernel (running as an unprivileged process) rather than the real underlying kernel, limiting what they can execute [1]. If the attacker escapes the function sandbox, they gain access to whatever else is running on the host kernel [2].

The first approach typically employs lightweight, performance-optimized VMs called “MicroVMs.” Firecracker, written in Rust (so hot rn) and used by AWS for Lambda, is one example, but Microsoft Azure Functions, Alibaba Function Compute, and Oracle Functions use the same general approach, too, with different underlying VM types. The second approach, as far as I can tell, is only employed by Google Cloud Functions at present. Google leverages gVisor, written in Go (also so hot rn), whose emulated kernel runs as an unprivileged user-space process rather than relying on a hypervisor.

The deepest level of isolation provided by either approach is at the function version level. You can think of a function version similarly to a VM snapshot (or, if you’re familiar with containers, an image version), capturing the unique code and configuration of the function (though the same event triggers can apply across function versions). Leveraging our prior example, the first version of the password prompt function might accept one password; we then decide to modify the code to accept Hell0_Zuk0_here as the password instead, which creates a second version that receives its own execution environment.
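Continuing the example, the second version’s code might validate the submitted credentials along these lines. This is a hypothetical check (Basic auth sends `username:password` base64-encoded in the Authorization header), using a constant-time comparison so the function doesn’t leak timing information:

```python
import base64
import hmac

EXPECTED_PASSWORD = "Hell0_Zuk0_here"  # version 2 of the function

def check_basic_auth(authorization_header):
    """Return True if a Basic auth header carries the expected password."""
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme != "Basic":
        return False
    try:
        decoded = base64.b64decode(encoded).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return False
    _username, _, password = decoded.partition(":")
    # compare_digest avoids an early-exit string comparison that an
    # attacker could time to guess the password character by character.
    return hmac.compare_digest(password, EXPECTED_PASSWORD)
```

Publishing this change is what creates version two; version one, with its old hardcoded password, keeps its own execution environments until it is retired.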

Neither approach provides isolation per invocation since it would be horrendously inefficient from a performance perspective. If each password prompt accepting Hell0_Zuk0_here required its own execution environment featuring a dedicated kernel, the few hundred milliseconds required for the serverless hamsters to spin up the environment would likely lead to unacceptable latency that could propagate throughout the service.

Nevertheless, the lack of invoke-level isolation creates the potential for lingering state from one invocation to affect the next invocation. A deeper dive into the resulting threat model based on these characteristics is best served as the subject of a future post.
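The lingering-state behavior is easy to demonstrate: anything defined at module scope is initialized once per execution environment, not once per invocation, so it survives across invocations that land in the same warm environment. The counter below is contrived, but the same mechanism is why credentials or user data cached in globals by one invocation can be visible to the next:

```python
# Module scope runs once per execution environment ("cold start"),
# so this state persists across warm invocations.
invocation_count = 0
cache = {}

def handler(event, context):
    global invocation_count
    invocation_count += 1
    # Anything a previous invocation stashed in `cache` is still here.
    previous_keys = sorted(cache.keys())
    cache[event.get("user", "anonymous")] = event
    return {"invocation": invocation_count, "leftover_state": previous_keys}
```

Invoke it twice in the same environment and the second call sees both the incremented counter and the first caller’s cached entry.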

Identity & Access Management
IAM is one of the few security responsibilities assigned to the customer rather than the provider, along with the security of the code you write for the function and, to a certain extent, monitoring and observability. There are two facets of serverless IAM to consider: configuring IAM for human users to access FaaS resources, and configuring IAM for non-human serverless functions to access other resources. Unfortunately, IAM for FaaS is not necessarily straightforward, and the specific implementation depends on the serverless provider used.

At a high level, most serverless systems employ a role-based access control (RBAC) approach. Depending on the provider, there are user roles like Owner / Admin, Editor / Developer, or Reader / Viewer that provide different levels of permissions. Access is typically granted to human users either on a per-function or per-project basis – again, depending on the provider [3]. It is unclear how often FaaS permissioning is borked (such as festering accounts lacking an active owner [4]), but it is likely worth your attention as a defender.

Functions receive their own non-human identity to access any resources required to perform their defined purpose. Generally, the user must (or at the very least, can) configure an execution role when creating a function that defines either the specific actions or specific resources the function can access. For instance, with Lambda, permissions can be defined to allow the function to read events from a Kinesis data stream, send logs to Amazon CloudWatch Logs, or connect to a virtual private cloud (VPC) [5].
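For instance, a narrowly scoped execution role policy for our password prompt function – one that only lets it write its own logs – might look like the following. This is the standard AWS IAM policy document shape with real CloudWatch Logs actions, but the account ID and log group name are placeholders for illustration:

```python
import json

# Standard IAM policy document shape; the account ID and log group name
# below are hypothetical placeholders.
execution_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            # Scoped to this one function's log group, not all log groups.
            "Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/lambda/password-prompt:*",
        }
    ],
}

print(json.dumps(execution_role_policy, indent=2))
```

If the function later needed to read a Kinesis stream, you would add another statement granting only the relevant read actions on that one stream’s ARN, rather than widening this one.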

Unsurprisingly, the classic security principle of Least Privilege is relevant for FaaS, too. Implementing least privilege is easier said than done, and I fully anticipate that misconfiguration of FaaS access control will present a thorny challenge for defenders until there is clearer documentation, less disparate terminology across providers, standardized FaaS IAM patterns, or all of the above. While this problem area may be more boring than isolation model concerns like function escapology via fancy exploitation, it is no less onerous.
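One practical starting point for defenders is mechanically flagging the most common least-privilege violation: wildcard grants. The toy checker below walks a policy document’s Allow statements and reports any `"*"` in Action or Resource. It is illustrative only, not a substitute for a real policy analyzer:

```python
def find_wildcards(policy):
    """Flag '*' in the Action or Resource of any Allow statement."""
    warnings = []
    for i, stmt in enumerate(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        # IAM allows a lone string or a list in both fields; normalize to lists.
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = stmt.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources
        if any(a == "*" for a in actions):
            warnings.append(f"statement {i}: wildcard Action")
        if any(r == "*" for r in resources):
            warnings.append(f"statement {i}: wildcard Resource")
    return warnings
```

Running something like this over every function’s execution role at deploy time is a cheap guardrail while the ecosystem’s IAM patterns remain as fragmented as they are.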

Conclusion: Serverless compute offers a cost-effective, maintainable path for organizations to quickly implement new application functionality – which is why defenders of all stripes, but especially infosec professionals, should start understanding it at sufficient depth to appropriately define a defensive strategy for it.

FaaS presents multiple tangible benefits for defense. A serverless function is like a box within a box, with more extensive isolation boundaries than containers [6]. Vulnerability scanning and patching of everything but the application code is offloaded to the service provider. The ephemeral lifespan of functions makes persistence more challenging for attackers.

However, there are still security challenges for serverless ahead, and it isn’t like attackers just give up and retire when they encounter new infrastructure and defense paradigms (otherwise ASLR would’ve entirely murdered memory corruption as an exploit class). IAM for FaaS is fragmented and non-trivial, and concerns around security observability and compliance require further discussion, too.

----
If you’d like to learn more about serverless, join our live webcast, “Infrastructure Security: Doing More with Serverless,” on March 24 with Capsule8 CTO Ryan Petrich. Ryan will define serverless as well as explore some of the key benefits of deploying this type of infrastructure with a focus on the impact to an organization’s attack surface. Register here.

If you’re already on your serverless journey and exploring your security monitoring strategy, we’re opening up our design partnership program to additional participants interested in influencing the near future of our serverless security observability product. You can contact us here. Otherwise, stay tuned for announcements of new capabilities soon.

[1] For more on gVisor’s memory safety stuff, see https://gvisor.dev/blog/2020/09/18/containing-a-real-vulnerability/

[2] It’s unclear if GCP schedules other customers’ functions on the same host kernel or if a particular host kernel is reserved only for functions from a single account.

[3] Lambda offers AWS-managed policies tied to API actions as an alternative to defining permissions by function or project. Other providers do not seem to offer this capability.

[4] There can be a bias towards IAM mistakes failing open — meaning that operations continue even if an IAM misconfiguration occurs — since delivery of the application to end-users is often higher priority than authenticating access. This means that usage of an account not associated with an active or valid user could somewhat easily go unrecognized without additional scrutiny.

[5] For more examples, see: https://docs.aws.amazon.com/lambda/latest/dg/lambda-intro-execution-role.html

[6] To be clear, one’s cloud-native security strategy should not rely on the isolation boundary of containers holding firm.