What is eBPF?
eBPF stands for extended Berkeley Packet Filter. It is a Linux technology that runs sandboxed programs directly in the kernel and allows for resource-efficient observability. Although a recent spike in eBPF interest may lead one to believe that this is a new approach, eBPF has a rich history that stems from research dating back to 1992.
eBPF was created to address the need for a lightweight, agentless way to monitor the performance of applications. By running at the kernel level, eBPF can collect event-driven metrics from numerous hook locations. Examples include network events, function entry and exit, system calls, and more.
The History of eBPF
The “Berkeley Packet Filter” (BPF) was initially released in 1992 as a tool that allowed programmers to analyze network traffic. It achieves this goal by interfacing with the second layer (the data link layer) of the OSI model. Furthermore, in promiscuous mode, BPF is capable of receiving packets that are destined for other hosts.
In 2007, Robert Watson and Christian Peron made a critical addition to BPF. Their zero-copy buffer extension allowed the kernel packet capture code in the device driver interrupt handler to write directly to user process memory, eliminating the two copies otherwise needed for captured data.
In 2014, David S. Miller, the maintainer of the Linux networking subsystem, merged a rework of the in-kernel BPF interpreter, which had been labeled eBPF. As the technology matured, it gained various features that furthered its ability to monitor network traffic.
Understanding eBPF
Since 2014, eBPF has expanded on the original idea of BPF by delivering tools that load and run code directly in the Linux kernel. In other words, eBPF became attractive for dynamic tracing because of its speed, performance, low intrusiveness, security, and convenience.
A simplified view of the eBPF architecture is shown in the figure above. eBPF allows applications in user space to generate logic and execute it in the Linux kernel. The program must first pass verification; once it is cleared, execution can start. At this point, the user can request metrics from the hooks deployed by the application.
The main check an eBPF program undergoes before it’s executed relates to termination: the verifier rejects programs that could loop forever and hang the kernel. By making sure that every program runs to completion, the verifier ensures continuous operation that won’t cause problems for the other software running on the system.
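The loop check can be illustrated with a toy model. The sketch below is not the kernel verifier: it assumes a made-up instruction format of `(opcode, offset)` tuples and applies only the simplest rule, rejecting any backward jump. The real verifier works on eBPF bytecode, checks much more (memory accesses, register state, helper arguments), and newer kernels also accept loops they can prove are bounded.

```python
from typing import List, Tuple

# Hypothetical instruction format for illustration: (opcode, jump_offset).
# "jmp" jumps relative to the next instruction; "exit" ends the program.
Instruction = Tuple[str, int]


def verify(program: List[Instruction]) -> bool:
    """Accept only programs whose jumps go forward and stay in bounds."""
    if not program or program[-1][0] != "exit":
        return False  # execution must not fall off the end
    for pc, (opcode, offset) in enumerate(program):
        if opcode != "jmp":
            continue
        if offset < 0:
            return False  # back-edge: the program could loop forever
        if pc + 1 + offset >= len(program):
            return False  # jump target lies past the last instruction
    return True


straight_line = [("mov", 0), ("jmp", 1), ("mov", 0), ("exit", 0)]
looping = [("mov", 0), ("jmp", -2), ("exit", 0)]

print(verify(straight_line))  # True
print(verify(looping))        # False
```

Rejecting every back-edge is stricter than necessary, but it makes termination trivially provable, which is why it serves as a reasonable first approximation here.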
eBPF Hooks
As mentioned above, eBPF allows programs to establish hooks into the kernel that execute in an event-driven environment. In other words, a program runs whenever the kernel or an application passes the hook point it is attached to, and it reports data accordingly. What kind of hooks can be used with eBPF?
- System Calls | Programs run when a user space application asks the kernel to perform an operation on its behalf.
- Function Entry & Exit | A program can run just before a function executes and right after it returns.
- Network Events | All inbound and outbound traffic can be monitored through eBPF.
- Kprobes and uprobes | eBPF programs can be attached to probes on kernel functions (kprobes) or user space functions (uprobes).
Note that the examples above aren’t exhaustive.
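To make the event-driven model concrete, here is a small user space simulation. Nothing in it is a real eBPF API: `HookRegistry`, `attach`, `fire`, and the hook names `sys_enter` and `net_rx` are hypothetical stand-ins for kernel hook points. It only shows the control flow: a handler does nothing until its hook fires.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, object]], None]


class HookRegistry:
    """Toy stand-in for the kernel's hook points (not a real eBPF API)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def attach(self, hook: str, handler: Handler) -> None:
        """Register a handler to run each time the named hook fires."""
        self._handlers[hook].append(handler)

    def fire(self, hook: str, event: Dict[str, object]) -> None:
        """Run every handler attached to the hook, in attachment order."""
        for handler in self._handlers[hook]:
            handler(event)


registry = HookRegistry()
syscall_counts: DefaultDict[str, int] = defaultdict(int)


def count_syscall(event: Dict[str, object]) -> None:
    syscall_counts[str(event["name"])] += 1


registry.attach("sys_enter", count_syscall)

for name in ["openat", "read", "read", "close"]:
    registry.fire("sys_enter", {"name": name})
registry.fire("net_rx", {"bytes": 1500})  # no handler attached, nothing runs

print(dict(syscall_counts))  # {'openat': 1, 'read': 2, 'close': 1}
```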
eBPF Helper Functions
When eBPF programs are triggered through the hooks deployed in the kernel, they can call specific helper functions. These functions are vital to the usefulness of eBPF: they give custom code a controlled way to interact with the kernel, which makes eBPF extremely versatile. What kind of tasks can be accomplished by the helper functions?
- Socket-Based Functions | Networking operations such as binding, cookie retrieval, packet sniffing, and more can be performed by these calls.
- Metadata Functions | Retrieve, modify, and send metadata.
- Data Manipulation | Search, delete, modify, and store key-value pairs in various tables.
- eBPF Specific | Tail calls used to chain and execute subsequent eBPF functions.
eBPF Mapping
As discussed above, eBPF aims to connect user applications with what’s happening at the kernel level. eBPF maps allow programs to store state between invocations and share it with user space applications.
eBPF maps can be created using the bpf() syscall with the BPF_MAP_CREATE command. The maps are then accessed through Linux file descriptors.
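As a rough sketch of what map creation involves, the snippet below packs the leading fields of the `bpf_attr` structure that `BPF_MAP_CREATE` expects and wraps the raw syscall. The helper names `pack_map_create_attr` and `create_map` are made up for this example, and the syscall number is an assumption that holds only on x86-64. In practice, a library such as libbpf or BCC would normally do this for you.

```python
import ctypes
import os
import struct

# Constants from the kernel's uapi header linux/bpf.h.
BPF_MAP_CREATE = 0      # first value of enum bpf_cmd
BPF_MAP_TYPE_HASH = 1   # enum bpf_map_type
SYS_BPF_X86_64 = 321    # assumption: x86-64 only; other architectures differ


def pack_map_create_attr(map_type: int, key_size: int, value_size: int,
                         max_entries: int, map_flags: int = 0) -> bytes:
    """Pack the first five __u32 fields of bpf_attr for BPF_MAP_CREATE."""
    return struct.pack("=5I", map_type, key_size, value_size,
                       max_entries, map_flags)


def create_map(attr: bytes) -> int:
    """Call bpf(BPF_MAP_CREATE, ...) and return the map's file descriptor.

    Needs Linux on x86-64 and sufficient privileges (typically root).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    buf = ctypes.create_string_buffer(attr, len(attr))
    fd = libc.syscall(ctypes.c_long(SYS_BPF_X86_64),
                      ctypes.c_int(BPF_MAP_CREATE),
                      buf, ctypes.c_uint(len(attr)))
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return fd


# A hash map with 4-byte keys, 8-byte values, and room for 1024 entries.
attr = pack_map_create_attr(BPF_MAP_TYPE_HASH, key_size=4,
                            value_size=8, max_entries=1024)
print(len(attr))  # 20
```

The example stops at building `attr` so that it runs anywhere. Passing it to `create_map(attr)` on a suitable system should return a file descriptor, which is the handle later lookups and updates would use.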
Why is eBPF Important for Observability?
We’ve had numerous conversations with leaders in the software industry, and we’ve found that a wide range of observability tools is in use in every vertical. These tools aim to reduce downtime, increase development productivity and developer system ownership, and ultimately “do more with less.” The challenge in the current landscape is that these tools are expensive, difficult to deploy, and in many instances provide only high-level metrics. In other words, they don’t always solve the problem and are often costly to implement and maintain. eBPF, by contrast, is lightweight and fast. Its impact on computing power is minimal because it operates without an agent.
Furthermore, deploying without an agent allows eBPF to be integrated faster and without costly downtime. It’s important to note, however, that eBPF has deep access to the kernel. If it, or the applications it’s linked to, were to be compromised, the enterprise would be at risk.
eBPF in the Kernel
eBPF programs are validated and loaded into the Linux kernel. What happens next? To start, the program is now ready to execute and waits for the right triggers. When a trigger fires and the program executes, it’s able to access packets, inputs, outputs, and so on, and to exchange data through eBPF maps or predefined file descriptors.
eBPF programs are written in a “low level” language because they need to read register values and memory in the kernel. The kernel-side code is most commonly written in a restricted subset of C and compiled to eBPF bytecode, while the user-side code is often written in Go or C++. Here’s an example of eBPF code to illustrate the process.
On the user side, the following snippet returns the number of programs executed on the system. Note that, as discussed above, both components must be in place for the application to take advantage of eBPF. The code above runs in the kernel, while the code below captures the response and makes it available to the user application.
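As a hedged illustration of the two halves working together, here is a minimal sketch using BCC’s Python bindings. It assumes BCC is installed and that the script runs with root privileges. The names `exec_counts`, `count_exec`, `total_execs`, and `run` are chosen for this example. The kernel half counts `execve` calls per process in a map; the user half reads the map and sums the counters.

```python
import time

# Kernel side: restricted C that BCC compiles to eBPF bytecode at load time.
BPF_PROGRAM = r"""
BPF_HASH(exec_counts, u32, u64);

int count_exec(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    exec_counts.increment(pid);
    return 0;
}
"""


def total_execs(counts) -> int:
    """User side: sum the per-process counters read back from the map."""
    return sum(value.value for _key, value in counts.items())


def run(seconds: int = 5) -> int:
    """Load the program, attach it to execve, wait, and report the total."""
    from bcc import BPF  # imported here so the module loads without BCC

    bpf = BPF(text=BPF_PROGRAM)
    bpf.attach_kprobe(event=bpf.get_syscall_fnname("execve"),
                      fn_name="count_exec")
    time.sleep(seconds)
    return total_execs(bpf["exec_counts"])
```

Calling `run(10)` as root would watch the system for ten seconds and return how many programs were started in that window. The map `exec_counts` is the only channel between the two halves, which is the division of labor described above.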
Building eBPF Applications - SDKs
It was difficult to build and deploy eBPF programs in the early days. Since its “wide” adoption into the kernel in 2014, various tools have surfaced to make the developer’s job easier than it used to be. Furthermore, we’ve also seen a number of improvements made to the BPF helper functions, as well as eBPF maps.
We’d like to highlight two key tools when it comes to building applications that leverage eBPF: BCC and libbpf.
A number of high-level languages have gained the libraries needed to build eBPF applications. We’re seeing developers use Python, Go, and Rust.
Conclusion on eBPF
eBPF was built with the intent of giving user programs access to kernel metrics that are otherwise difficult to reach. It has come a long way since the early 1990s. It allows programmers to create functions that link the Linux kernel and external software based on specific triggers and maps. A number of libraries and tools have been developed to support observability. We believe that eBPF adoption will continue to grow and bring value to businesses by reducing runtime computing power and decreasing the cost of observability metrics.