Happy New Year! Welcome to this issue of Activation Function.
For those of you who are new, it’s pretty simple: every other week, I introduce you to a new and exciting open-source backend technology (that you’ve probably only kind of heard about) and explain it in 5 minutes or less so you can make better technical decisions moving forward.
In this issue, we’ll explore K8sGPT, an open-source tool that combines expert knowledge and Generative AI to help you identify, understand, and troubleshoot your Kubernetes cluster in plain English.
Before you dismiss it as “just another ChatGPT wrapper,” let me assure you that it’s not. It’s much more than that. I think it’s a thoughtful implementation of GenAI that has the potential to bring tangible benefits and make your life a bit easier.
Alright, let’s dig in!
Quick Facts:
- K8sGPT was launched at KubeCon Amsterdam 2023.
- K8sGPT is open source under the Apache-2.0 license.
- K8sGPT was accepted to CNCF on December 19, 2023, at the Sandbox maturity level.
- K8sGPT now has over 4K GitHub stars and 50 contributors.
Introduction
For years, we’ve used ML for APM and Observability to identify patterns, trends, and anomalies (e.g., behavioral changes in applications or infrastructure), anticipate performance issues, generate early warnings, etc.
But so far, it hasn’t been great at actually helping us understand and fix issues.
The story goes something like this.
You get an alert from your ML-enabled monitoring tool and then spend a lot of time looking through and interpreting complex error messages, trying to communicate those to your team, and iterating through remediation steps until you fix the problem.
Enter K8sGPT. Its primary function is to help you quickly triage and remedy issues by giving you hints and shortcuts in plain English. It does this by leveraging expert knowledge, Natural Language Processing (NLP), and Generative AI.
How K8sGPT Works
K8sGPT uses Analyzers that contain codified knowledge: essentially a series of rules and checks that an experienced Site Reliability Engineer (SRE) might use to diagnose Kubernetes issues such as pod crashes, service failures, and ingress misconfigurations.
Thought: It was a good decision to avoid using GenAI at this stage to prevent speculative guesses, hallucinations, and, ultimately, a lot of noise.
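If you’re curious which checks are available in your installation, you can list the built-in analyzers (exposed as “filters” in the CLI). A minimal sketch; the output varies by version:
# list the analyzers (filters) that will run during a scan
k8sgpt filters list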
Once K8sGPT is connected to your Kubernetes cluster through the API server, the Analyzers start scanning your cluster, looking at logs, error messages, configurations, and pod status to identify potential issues.
Technical Note: K8sGPT ships with a set of built-in analyzers, and you will also be able to write your own.
It then tries to correlate issues and, if necessary, runs additional checks to add context and pinpoint the root cause. Finally, it packages the data and sends it to the AI backend provider, which returns a simple, descriptive explanation of the issue along with potential remediation steps.
Technical Note: As you can see, the use of GenAI is targeted and practical. Of course, the AI can hallucinate and provide incorrect remediation steps, but that’s a minor inconvenience you can spot and ignore easily.
Getting Started with K8sGPT
The easiest way to get started is with the CLI: run a brew command on Linux/Mac or download the binary if you’re on Windows.
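For example, installing with Homebrew looks something like this (a sketch; check the install docs for your platform and the latest instructions):
# add the K8sGPT tap and install the CLI
brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt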
Once installed, you’ll need to authenticate with your chosen AI backend, and within a few minutes, you’ll be ready to run your first scan.
Technical Note: K8sGPT is designed to work with various AI providers, including OpenAI, Cohere, Amazon Bedrock, Amazon SageMaker, Azure OpenAI, Google Gemini, and LocalAI.
You can find how to configure different backends here.
Once everything is configured, you can start using K8sGPT directly from your CLI using the following set of commands:
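As a rough sketch (assuming the OpenAI backend; the model name and flags may differ in your version):
# authenticate with your chosen AI backend (OpenAI shown here)
k8sgpt auth add --backend openai --model gpt-3.5-turbo
# scan the cluster and get plain-English explanations of any issues found
k8sgpt analyze --explain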
Technical Note: You’ll soon be able to debug interactively by asking K8sGPT questions about any issue directly in your CLI. As of writing, this feature isn’t live yet.
K8sGPT Operator
K8sGPT can also be installed as an Operator within your Kubernetes cluster. By using the K8sGPT Operator, you’ll be able to continuously scan your cluster for errors.
The K8sGPT Operator works by leveraging Kubernetes Custom Resources and producing reports that it stores in your cluster as YAML manifests.
Technical Note: You can cache your results locally or offload them to a remote location like AWS S3.
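A rough sketch of installing the Operator with Helm and reading back its reports (chart and resource names follow the upstream docs; double-check them for your version):
# install the operator with Helm
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install k8sgpt-operator k8sgpt/k8sgpt-operator -n k8sgpt-operator-system --create-namespace
# you also need to apply a K8sGPT custom resource pointing at your AI backend (see the operator docs)
# findings are then stored as Result custom resources in the cluster
kubectl get results -n k8sgpt-operator-system -o yaml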
You can also easily customize the analysis and output of the Operator to fit your needs and workflow. For example, you can scope the scan to a specific namespace or resource and pick an output format like JSON, which makes K8sGPT easy to integrate into your CI/CD pipelines and other workflows (see the sketch below).
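On the CLI side, the same kind of scoping looks roughly like this (a sketch; exact flag names may vary between versions):
# limit the scan to one namespace and resource type, and emit JSON for downstream tooling
k8sgpt analyze --explain --namespace default --filter Pod --output json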
Integrations
K8sGPT integrates with observability tools like Grafana and Prometheus. The really cool thing, though, is its integration feature, which lets you extend its scanning and troubleshooting capabilities by writing your own plugins that connect it to other tools.
The first one, contributed by the core team, is the Trivy plugin, which enables you to generate vulnerability scans as part of your analysis and leverage AI to help you understand and triage Common Vulnerabilities and Exposures (CVEs).
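Activating it is essentially a one-liner (a sketch based on the upstream docs; the filter name and subcommands may differ slightly across versions):
# enable the Trivy integration, then include vulnerability findings in the analysis
k8sgpt integration activate trivy
k8sgpt analyze --filter VulnerabilityReport --explain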
Security & Privacy
Alright! I’m sure this is the first or second thing that came to your mind when you heard that K8sGPT sends your data to an external AI API.
Well, the K8sGPT team thought of this and developed a nifty little feature called anonymize, which masks sensitive data like Kubernetes object names and labels before sending it to the AI backend for analysis.
All you need to do is use the --anonymize flag:
k8sgpt analyze --explain --anonymize
During the analysis, K8sGPT masks sensitive data before sending it to the AI backend. Once the solution is returned, the masked data is replaced with the actual Kubernetes object names and labels.
Where to go from here?
K8sGPT is still a nascent project and is quickly evolving thanks to an active community of contributors. As of writing, an exciting new feature on the horizon is the interactive debugging mode that lets you ask questions directly inside your terminal.
Other things on the horizon include Kubernetes auto-remediation with K8sGPT, Karpenter integration, deeper integration with AWS services, and new backend providers such as Hugging Face.
If you want to dig deeper into K8sGPT, here are a few resources for you:
Documentation & Repos
- k8sgpt - The main code repository
- docs - Documentation Website (https://docs.k8sgpt.ai)
- community - Community-related information
Tutorials
- K8SGPT - Kubernetes SRE SUPERPOWERS
- Using AI with Kubernetes | K8sGPT Operator | DEMO
- Full Tutorial: K8sGPT -- SRE superpowers through AI
- K8sGPT + LocalAI: Unlock Kubernetes superpowers for free!
Talks
- AI for Kubernetes with ChatGPT and k8sgpt
- Unleashing the Power of AI in Kubernetes through K8sGPT | Alex Jones
Communities
- Slack: https://k8sgpt.slack.com
Who to Follow
- Alex Jones (Creator of K8sGPT) on LinkedIn and X
- K8sGPT on LinkedIn and X
- Thomas Schuetz (Maintainer) on LinkedIn and X
- Matthis Holleville (Maintainer) on LinkedIn and X
It’s a wrap! I hope this gave you a good overview of K8sGPT and how Gen AI might make your life a bit easier when troubleshooting your Kubernetes clusters. If you want to take K8sGPT for a quick spin, I suggest you check out this Sandbox tutorial.
Until next time!