What is AWS CloudWatch?

Learn about AWS CloudWatch, its main features, and what metrics and value users can expect from the service. 

February 3, 2023

Introduction

Application Performance Monitoring (APM) has become standard practice at large enterprises over the last decade. Since its launch in 2009, AWS CloudWatch has been adopted by major enterprises worldwide, including JP Morgan Chase & Co, Electronic Arts (EA), and Stripe, and has helped them build out their APM practices.

In this article, we will cover AWS CloudWatch, its main features, and what metrics and value users can expect from the service. 

Let’s dive in.

What is AWS CloudWatch?

AWS CloudWatch is a service offered by AWS that collects, monitors, and visualizes logs, metrics, and event data in real time. It allows users to take preventive action and optimize the performance of their applications, whether they run on AWS, on premises, or across multiple clouds.

Anthony Giles, Executive Director - Architecture & SRE Center of Excellence at JP Morgan Chase & Co, sums up the cross-functional value of CloudWatch. 

“Our customers are our most important assets. Amazon CloudWatch provides developers and Site Reliability Engineers with real-time observability of systems and products that reduce customer impact. We collaborated across many Lines of Business, testing Amazon CloudWatch features that help monitor and troubleshoot with a real-time unified view across applications and infrastructure. This has produced game-changing observability and correlation of data on metrics, logs, and traces. Data without context is meaningless, and CloudWatch is helping contextually correlate data making it meaningful and impactful.”

AWS CloudWatch organizes its features around five main stages:

  • Collect
  • Monitor
  • Act
  • Analyze
  • Compliance and Security 

Collect stage

This is where CloudWatch collects logs from your resources, applications, and services in real time.

CloudWatch features used in the Collect Stage are: 

  • Infrastructure and application logging
  • Containers logging
  • Lambda logging 
  • Metric Streams (allows users to continuously stream metrics to a destination of their choice)
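
As a rough illustration of what collecting custom logs involves, the sketch below batches application records into the shape that the CloudWatch Logs PutLogEvents API expects: a millisecond timestamp and a message string per event, ordered chronologically. The log group and stream names are hypothetical, and the upload function assumes boto3 is installed and AWS credentials are configured.

```python
import json


def build_log_events(records):
    """Turn (unix_seconds, payload_dict) pairs into PutLogEvents-style events."""
    events = [
        {"timestamp": int(ts * 1000), "message": json.dumps(payload, sort_keys=True)}
        for ts, payload in records
    ]
    # Events within one batch are expected in chronological order.
    return sorted(events, key=lambda event: event["timestamp"])


def upload(events, group="/my-app/orders", stream="web-1"):
    """Send the batch to CloudWatch Logs. Group and stream names are placeholders."""
    import boto3  # imported lazily so the builder above runs without AWS access

    logs = boto3.client("logs")
    logs.put_log_events(logGroupName=group, logStreamName=stream, logEvents=events)


# Two out-of-order application records, used only as sample input.
sample = build_log_events([
    (1675400002.5, {"level": "ERROR", "msg": "payment declined"}),
    (1675400001.0, {"level": "INFO", "msg": "order received"}),
])
```

In practice the CloudWatch agent or an AWS SDK logging handler usually does this batching for you; the sketch only shows the data that ends up being sent.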

Monitor stage

The main features enabling CloudWatch to monitor logs in complex systems include the following:

  • Cross-account observability across multiple AWS accounts
  • Unified operational view with dashboards
  • Composite alarms (combine multiple alarms to reduce alert fatigue)
  • High-resolution alarms (set thresholds on metrics to trigger alarms)
  • Application Insights
  • Container monitoring insights
  • Internet monitoring (see how internet issues impact application performance)
  • Lambda monitoring insights
  • Anomaly Detection
  • ServiceLens (visualize health, performance, and availability of applications)
  • Synthetics (monitor endpoints)
  • RUM (visibility into client-side performance of applications)
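
To make the composite alarm idea concrete, here is a minimal sketch that builds a rule which fires only when several child alarms are in the ALARM state at once. The alarm names and the SNS topic ARN are hypothetical. The resulting dictionary uses the keyword arguments of boto3's put_composite_alarm call, which is not invoked here.

```python
def alarm_rule(*alarm_names, operator="AND"):
    """Build a composite-alarm rule expression from child alarm names."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    return f" {operator} ".join(f'ALARM("{name}")' for name in alarm_names)


def composite_alarm_params(name, rule, sns_topic_arn):
    """Keyword arguments for cloudwatch.put_composite_alarm(**params)."""
    return {"AlarmName": name, "AlarmRule": rule, "AlarmActions": [sns_topic_arn]}


# Hypothetical child alarms: page on-call only when both are firing.
rule = alarm_rule("api-5xx-high", "db-cpu-high")
params = composite_alarm_params(
    "checkout-degraded", rule, "arn:aws:sns:us-east-1:123456789012:oncall"
)
```

Combining alarms this way means a single noisy metric does not page anyone on its own, which is how composite alarms reduce alert fatigue.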

Act stage

The main features include:

  • Autoscaling
  • Automate response to operational changes with CloudWatch Events (take corrective action in real time)
  • Alarm and automate actions on EKS, ECS, and K8s clusters
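
Event-driven responses are configured with event patterns. The sketch below shows a pattern for EC2 instances entering the stopped state, plus a deliberately simplified local matcher that illustrates how such a pattern selects events. The matcher handles only exact-value matching and is an approximation for illustration, not the service's actual matching engine; the instance ID is made up.

```python
# Pattern for EC2 state-change events where the instance has stopped.
STOPPED_INSTANCE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}


def matches(pattern, event):
    """Simplified matcher: every pattern key must exist in the event and the
    event's value must be one of the listed values (nested dicts recurse)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Sample event with a hypothetical instance ID.
stopped_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
```

A rule with this pattern would typically be wired to a target such as a Lambda function or an SNS topic that takes the corrective action.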

Analyze stage

The main analysis features include:

  • Granular data and extended retention (metrics at up to one-second resolution and 15-month retention)
  • Custom operations on metrics (calculations on existing metrics to create an extra layer of context)
  • Log analytics
  • Analysis of container metrics, logs, and traces
  • Analysis of Lambda metrics, logs, and traces
  • Contributor Insights (time series data to provide a view of top contributors influencing system performance)
  • Metric Insights (SQL-based query engine)
  • Evidently (lets application developers run experiments and canary-style launches to identify unintended consequences of new features before rolling them out for general use)
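
As an example of a calculation on existing metrics, the sketch below builds the query list for a Lambda error-rate percentage from the Errors and Invocations metrics. The function name is hypothetical. The list is shaped for the MetricDataQueries argument of boto3's get_metric_data call, which is not invoked here.

```python
def lambda_error_rate_queries(function_name, period=60):
    """Build metric-math queries: error rate (%) = 100 * errors / invocations."""

    def stat(query_id, metric_name):
        # A raw metric used as an input to the expression, not returned itself.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return [
        stat("errors", "Errors"),
        stat("invocations", "Invocations"),
        {
            "Id": "error_rate",
            "Expression": "100 * errors / invocations",
            "Label": "Error rate (%)",
            "ReturnData": True,
        },
    ]


queries = lambda_error_rate_queries("my-function")  # hypothetical function name
```

Only the derived series is returned, so a dashboard or alarm built on it sees the ratio rather than the two raw counts.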

Compliance and Security 

CloudWatch integrates with AWS IAM to provide user governance and provisioning. CloudWatch is also PCI and FedRAMP compliant.

Log monitoring with CloudWatch

There are three main categories of logs:

  1. Vended logs: Natively published by AWS services on behalf of the customer.
  2. Logs published by AWS services: More than 30 AWS services publish logs to CloudWatch, including Route 53, Lambda, and CloudTrail.
  3. Custom logs: Derived from your applications and on-premises resources.

With CloudWatch, users can monitor for specific phrases, values, metrics, or patterns. 

For example, you can look at request latency graphs for a specific application, review the historic CPU performance of a database, or track CPU utilization for your EC2 instances.

Other use cases of log monitoring with CloudWatch

  • Error monitoring in your Applications:

Users can set thresholds and alerts that notify them when error volume reaches the threshold. This allows Ops or SREs to act there and then rather than being blindsided.

  • EC2 Performance tracking:

In CloudWatch dashboards, users can tag EC2 instances and track CPU utilization, disk space, status, network packets, and more in real time. They can also view historical logs, which helps reconstruct a timeline when something goes wrong.
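
The error-monitoring use case can be sketched as two pieces of configuration: a metric filter that counts log lines containing ERROR, and an alarm on the resulting metric. The log group, namespace, metric name, alarm name, and SNS topic below are all hypothetical. The dictionaries follow the keyword arguments of boto3's put_metric_filter and put_metric_alarm calls, which are not invoked here.

```python
def error_metric_filter(log_group):
    """Arguments for logs.put_metric_filter: count log events containing ERROR."""
    return {
        "logGroupName": log_group,
        "filterName": "error-count",
        "filterPattern": "ERROR",
        "metricTransformations": [{
            "metricName": "ErrorCount",
            "metricNamespace": "MyApp",
            "metricValue": "1",   # each matching log event adds 1
            "defaultValue": 0.0,  # report 0 when nothing matches
        }],
    }


def error_alarm(threshold, sns_topic_arn):
    """Arguments for cloudwatch.put_metric_alarm on the filtered metric."""
    return {
        "AlarmName": "myapp-error-volume",
        "Namespace": "MyApp",
        "MetricName": "ErrorCount",
        "Statistic": "Sum",
        "Period": 300,            # evaluate errors per 5-minute window
        "EvaluationPeriods": 1,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


metric_filter = error_metric_filter("/my-app/orders")
alarm = error_alarm(25, "arn:aws:sns:us-east-1:123456789012:oncall")
```

With these values the alarm would fire when 25 or more error lines appear within five minutes; the right threshold depends on the application's normal error volume.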

Supported CloudWatch logs

For every log event sent to CloudWatch Logs, the service generates five system fields:

System Field | Description
--- | ---
@message | Raw unparsed log event
@timestamp | Event timestamp
@ingestionTime | The time when CloudWatch Logs received the log event
@logStream | The log stream that the log event was added to
@log | Log group identifier, format: account-id:log-group-name

Here is a list of log types and their log fields.

Log Type | Log Fields
--- | ---
Amazon VPC flow logs | @timestamp, @logStream, @message, accountId, endTime, interfaceId, logStatus, startTime, version, action, bytes, dstAddr, dstPort, packets, protocol, srcAddr, srcPort
Route 53 logs | @timestamp, @logStream, @message, edgeLocation, hostedZoneId, protocol, queryName, queryTimestamp, queryType, resolverIp, responseCode, version
Lambda logs | @timestamp, @logStream, @message, @requestId, @duration, @billedDuration, @type, @maxMemoryUsed, @memorySize
Other | @timestamp, @ingestionTime, @logStream, @message, @log

If a Lambda log line contains an X-Ray trace ID, it also includes the fields @xrayTraceId and @xraySegmentId. CloudWatch Logs Insights automatically discovers log fields in Lambda logs, but only for the first embedded JSON fragment in each log event. If a Lambda log event contains multiple JSON fragments, you can parse and extract the log fields using the parse command. For more information, see "Fields in JSON logs" in the AWS documentation.

As the table shows, Lambda logs expose fields such as duration, billed duration, and memory usage, which let users monitor the state of their serverless ecosystem.
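
The discovered fields can be queried directly in CloudWatch Logs Insights. The sketch below prepares a query over the Lambda fields @type, @duration, and @maxMemoryUsed, together with the arguments for boto3's start_query call, which is not invoked here. The log group name is hypothetical.

```python
import time

# Average and peak duration, plus peak memory, in 5-minute buckets.
LAMBDA_DURATION_QUERY = (
    'filter @type = "REPORT" '
    "| stats avg(@duration), max(@duration), max(@maxMemoryUsed) by bin(5m)"
)


def start_query_params(log_group, minutes=60, now=None):
    """Arguments for logs.start_query covering the last `minutes` minutes."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60,  # epoch seconds
        "endTime": end,
        "queryString": LAMBDA_DURATION_QUERY,
    }


query_params = start_query_params("/aws/lambda/my-function", minutes=30, now=1675400000)
```

The start_query call returns a query ID, and results are then fetched with get_query_results once the query has finished running.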

Installing CloudWatch

To collect logs from your Amazon EC2 instances and on-premises servers into CloudWatch Logs, the recommended path is to use the unified CloudWatch agent.

The agent collects both logs and advanced metrics and supports multiple operating systems.

You can install the agent in a few different ways:

  • Using the command line on Amazon Linux 2
  • Using AWS Systems Manager
  • On new instances using AWS CloudFormation

Whichever method you choose, you can also verify the signature of the CloudWatch agent package before installing it.

Metrics collected by the CloudWatch agent

The CloudWatch agent can be installed on EC2 instances, on-premises servers, and computers running Linux, macOS, and Windows Server.

When installed on Windows Server, the agent collects metrics associated with the counters in Windows Performance Monitor. You can learn more about these counters in the Microsoft Windows Server documentation.

When installed on Linux or macOS instances, the agent collects the metrics listed in the CloudWatch agent documentation.
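
Which metrics and log files the agent collects is driven by a JSON configuration file. The sketch below generates a small Linux-oriented configuration; the file path and log group name are hypothetical, and the selected measurements are only an example subset of what the agent supports.

```python
import json

agent_config = {
    "agent": {"metrics_collection_interval": 60},  # seconds between samples
    "metrics": {
        "metrics_collected": {
            "cpu": {"measurement": ["cpu_usage_idle", "cpu_usage_user"], "totalcpu": True},
            "mem": {"measurement": ["mem_used_percent"]},
            "disk": {"measurement": ["used_percent"], "resources": ["/"]},
        }
    },
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [{
                    "file_path": "/var/log/my-app/app.log",  # placeholder path
                    "log_group_name": "/my-app/app",         # placeholder group
                    "log_stream_name": "{instance_id}",      # filled in by the agent
                }]
            }
        }
    },
}

# Serialize to the JSON text that would be saved as the agent's config file.
config_json = json.dumps(agent_config, indent=2)
```

The agent also ships with a configuration wizard that can generate this file interactively, which is usually the easier starting point.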

Benefits of AWS CloudWatch

Interoperability with AWS native services

CloudWatch slots seamlessly into your AWS stack, pulls logs from more than 100 AWS services, and can be managed centrally via the AWS Management Console.

Alerting capabilities 

Users have a lot of flexibility when setting thresholds for alerts and alarms, and can vary the setup by application or service.

Data integrity 

Since 66% of the logs created are from AWS services, the accuracy and integrity of the data are high. AWS also has the infrastructure to handle huge volumes of log data and store it for 15+ months, a key decision criterion for companies in heavily regulated industries.

CloudWatch Limitations

Limited discovery

For services outside of the AWS purview, users must know about them and tag them manually, or else they will go unmonitored by CloudWatch.

Complex user interface

As the system becomes more complex with scale, it can be tricky to keep track of all the alerts, configurations, log streams, and so on. The UX isn’t intuitive, and our research suggests that operations teams can spend a lot of time managing the service itself.

Complex pricing scheme

CloudWatch pricing is made up of ten paid features, each with its own units of measurement. Although AWS has done a good job of creating an online calculator, it is still daunting to complete, let alone to use as the basis for an actual decision.

Conclusion 

CloudWatch is a widely used Application Performance Monitoring tool, mainly used by enterprises to track performance, optimize, and troubleshoot. Given the number of features and the breadth of metrics it can collect, engineering teams need the time and capacity to set it up, or the resources to bring in a third party. Nevertheless, judging by customer interviews and testimonials, CloudWatch is a favorite among existing AWS enterprise customers.

This was just a brief overview of AWS CloudWatch. If you are considering a purchase, I recommend going through the documentation that AWS provides and taking the service for a test drive: https://aws.amazon.com/cloudwatch/
