CloudWatch

Overview

  • Provides metrics for every AWS service.
  • Metrics are grouped by namespace.
  • Dimensions are attributes of a metric (instance id, environment etc).
  • Max 30 dimensions per metric (the limit was 10 until it was raised in 2022).
  • Metrics have timestamps.
  • EC2 metrics by default cover disk, CPU and network (at a high level). No RAM.

EC2 Detailed Monitoring

  • EC2 instance metrics have 5min granularity by default.
  • Detailed monitoring sets granularity to 1min.
  • 10 detailed monitoring metrics on the free tier.
  • EC2 memory usage is not pushed by default. Needs to be a custom metric.
  • Group metric collection is not enabled by default for EC2 instances.

Custom Metrics

  • Standard custom metrics have 1min granularity.
  • Enable high-resolution custom metrics (StorageResolution API parameter) to get granularity down to 1 second.
  • Alarms on high-resolution metrics can use a 10sec or 30sec period, or any multiple of 60sec.
  • Use PutMetricData to send a custom metric to CloudWatch.
  • Use exponential backoff in case of throttle errors.
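
As a rough illustration of the points above, the sketch below builds a PutMetricData entry with StorageResolution set and retries throttled calls with jittered exponential backoff. The namespace "Custom/App", the metric name "MemoryUtilization" and all helper names are made up for the example, and the error codes treated as throttling are an assumption. boto3 already retries some errors on its own, so this only shows the idea.

```python
import random
import time


def build_metric_datum(name, value, dimensions, high_resolution=False):
    """Build one MetricData entry for a PutMetricData call."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Value": float(value),
        "Unit": "Percent",
        # StorageResolution=1 requests a high-resolution metric; 60 is standard.
        "StorageResolution": 1 if high_resolution else 60,
    }


def backoff_delays(max_attempts, base=0.5, cap=20.0):
    """Upper bounds for each wait: base * 2**attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_attempts)]


def is_throttle(exc):
    """Assumption: throttling shows up as one of these botocore error codes."""
    code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
    return code in ("Throttling", "ThrottlingException", "RequestLimitExceeded")


def put_with_backoff(put_fn, max_attempts=5, sleep=time.sleep):
    """Call put_fn, retrying throttle errors with jittered exponential backoff."""
    for attempt, delay in enumerate(backoff_delays(max_attempts)):
        try:
            return put_fn()
        except Exception as exc:
            # Give up on the last attempt or on any non-throttle error.
            if attempt == max_attempts - 1 or not is_throttle(exc):
                raise
            sleep(random.uniform(0, delay))


def send_memory_metric(percent_used, instance_id):
    """Assumes boto3 is installed and AWS credentials are configured."""
    import boto3

    client = boto3.client("cloudwatch")
    datum = build_metric_datum(
        "MemoryUtilization", percent_used, {"InstanceId": instance_id},
        high_resolution=True,
    )
    return put_with_backoff(
        lambda: client.put_metric_data(Namespace="Custom/App", MetricData=[datum])
    )
```

Only send_memory_metric talks to AWS; the other helpers are plain functions and can be tried locally.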

CloudWatch Alarms

  • Alarms can trigger notifications for any metric.
  • Alarm actions can trigger an Auto Scaling action, an EC2 action, or an SNS notification.
  • Alarms can be based on various statistics (sample count, percentage, max, min etc).
  • Alarm states are OK, INSUFFICIENT_DATA, ALARM.
  • Alarm periods are the time window used to evaluate the metric.
  • Missing data can be treated as good (threshold not breached), bad (threshold breached), ignore (don't change alarm state) or missing.
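
A minimal sketch of how the options above map onto PutMetricAlarm parameters, assuming an alarm on the EC2 CPUUtilization metric. The function names, the period and the evaluation count are arbitrary choices for the example. The good/bad/ignore/missing wording from the notes is translated to the API values notBreaching, breaching, ignore and missing.

```python
# Maps the wording used in these notes to TreatMissingData API values.
TREAT_MISSING = {
    "good": "notBreaching",
    "bad": "breaching",
    "ignore": "ignore",
    "missing": "missing",
}


def build_alarm_params(name, instance_id, threshold, missing="missing",
                       sns_topic_arn=None):
    """Build keyword arguments for a PutMetricAlarm call."""
    if missing not in TREAT_MISSING:
        raise ValueError(f"unknown missing-data option: {missing!r}")
    params = {
        "AlarmName": name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # alarm period in seconds
        "EvaluationPeriods": 2,   # consecutive periods that must breach
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": TREAT_MISSING[missing],
    }
    if sns_topic_arn:
        # The alarm action here is an SNS notification.
        params["AlarmActions"] = [sns_topic_arn]
    return params


def create_alarm(params):
    """Assumes boto3 is installed and AWS credentials are configured."""
    import boto3

    return boto3.client("cloudwatch").put_metric_alarm(**params)
```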

CloudWatch Logs

  • Applications can send logs to CloudWatch via the SDK.
  • Logs can be collected from Elastic Beanstalk, ECS, AWS Lambda, VPC Flow Logs, API Gateway, CloudTrail (based on a filter), CloudWatch log agents, Route 53 etc.
  • Logs in CloudWatch can be exported to S3 for archiving or streamed to ElasticSearch for analytics.
  • Supports filter expressions for searching.
  • A log group is an arbitrary name, usually representing an application.
  • A log stream represents an instance within the application, a log file, or a container.
  • A log expiration policy controls data retention (never expire, 30 days etc).
  • Logs can be tailed using the AWS CLI.
  • Make sure IAM permissions are correct to send logs.
  • Logs can be encrypted using KMS at the log group level (use associate-kms-key for an existing group, or create-log-group with a KMS key ID for a new one).
  • Can use a Lambda subscription filter, or an ElasticSearch subscription filter to send logs to Lambda/ElasticSearch for further analysis.
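
Sending logs via the SDK could look roughly like the sketch below, which uses the PutLogEvents call. It assumes the log group and log stream already exist and that the caller has the required IAM permissions. The helper names are made up. Timestamps are expressed in milliseconds since the epoch and the batch is sorted chronologically, which the API expects.

```python
def build_log_events(records):
    """Convert (unix_seconds, message) pairs into PutLogEvents entries.

    Entries are returned sorted by timestamp, oldest first.
    """
    events = [
        {"timestamp": int(ts * 1000), "message": msg}
        for ts, msg in records
    ]
    return sorted(events, key=lambda e: e["timestamp"])


def send_logs(log_group, log_stream, records):
    """Assumes boto3 is installed and AWS credentials are configured."""
    import boto3

    client = boto3.client("logs")
    return client.put_log_events(
        logGroupName=log_group,     # usually the application
        logStreamName=log_stream,   # e.g. an instance, file or container
        logEvents=build_log_events(records),
    )
```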

CloudWatch Agent

  • Need to run the CloudWatch Agent to send logs from EC2 instances to CloudWatch.
  • Requires an IAM role to allow logs to be sent to CloudWatch.
  • CloudWatch Agent can run on-prem as well.

CloudWatch Logs Agent

  • Old version of the agent that can only send to CloudWatch Logs.

CloudWatch Unified Agent

  • New version of the CloudWatch Agent.
  • Can collect metrics like CPU, disk metrics, RAM, Netstat, processes, swap space.
  • Can collect logs to send to CloudWatch Logs.
  • Can use centralized configuration using SSM parameter store to manage the agent.
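
To make the agent configuration more concrete, here is a sketch that builds a small unified agent config (RAM, swap and disk metrics plus one log file) and stores it in SSM Parameter Store. The namespace, parameter name and file paths are placeholders, and only a handful of the available settings are shown.

```python
import json


def build_agent_config(log_file, log_group):
    """Build a small CloudWatch agent configuration as a plain dict."""
    return {
        "metrics": {
            "namespace": "Custom/Agent",  # hypothetical namespace
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "swap": {"measurement": ["swap_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["*"]},
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_file,
                            "log_group_name": log_group,
                            # The agent substitutes the EC2 instance id here.
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


def store_in_parameter_store(name, config):
    """Assumes boto3 is installed and AWS credentials are configured."""
    import boto3

    ssm = boto3.client("ssm")
    return ssm.put_parameter(
        Name=name, Value=json.dumps(config), Type="String", Overwrite=True
    )
```

Instances can then fetch the stored config by parameter name, so one parameter can drive many agents.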

CloudWatch Logs Metric Filter

  • Filter expressions to search logs.
  • Can trigger alarms.
  • Filters don't retroactively filter data. They only publish metric data points for events that occur after the filter was created.
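
The sketch below shows what PutMetricFilter parameters might look like for a filter that counts log lines containing a term, plus a tiny local simulation of the non-retroactive behaviour. The simulation uses plain substring matching, which is a simplification of the real filter pattern syntax, and all names are illustrative.

```python
def build_metric_filter_params(log_group, term, metric_name,
                               namespace="Custom/Logs"):
    """Build keyword arguments for a PutMetricFilter call."""
    return {
        "logGroupName": log_group,
        "filterName": f"{metric_name}-filter",
        "filterPattern": term,
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,  # hypothetical namespace
                "metricValue": "1",            # add 1 per matching event
            }
        ],
    }


def count_published(events, term, filter_created_ms):
    """Simulate which events would produce a data point.

    Events older than the filter are skipped, mirroring the fact that
    metric filters are not applied retroactively.
    """
    return sum(
        1
        for e in events
        if e["timestamp"] >= filter_created_ms and term in e["message"]
    )


def create_metric_filter(params):
    """Assumes boto3 is installed and AWS credentials are configured."""
    import boto3

    return boto3.client("logs").put_metric_filter(**params)
```

An alarm can then be attached to the resulting metric like any other.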

CloudWatch Events

  • Can run on a schedule (cron or rate expression).
  • Can use an event pattern to react to a service doing something (eg: CodePipeline state changed).
  • Triggers Lambda functions, SQS/SNS/Kinesis messages, CodeBuild, CodePipeline, Firehose, ECS task, EC2 TerminateInstances API call etc.
  • Creates a JSON document with some information about the change.
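
As an illustration of the two rule styles above, this sketch builds PutRule parameters for a scheduled rule and for an event-pattern rule reacting to failed CodePipeline executions. The matches helper is a simplified local matcher that only handles exact values and nested objects, not the full pattern syntax. Rule names are placeholders.

```python
import json


def pipeline_failed_pattern():
    """Event pattern for CodePipeline executions entering the FAILED state."""
    return {
        "source": ["aws.codepipeline"],
        "detail-type": ["CodePipeline Pipeline Execution State Change"],
        "detail": {"state": ["FAILED"]},
    }


def build_rule_params(name, pattern=None, schedule=None):
    """Build keyword arguments for a PutRule call."""
    if (pattern is None) == (schedule is None):
        raise ValueError("provide exactly one of pattern or schedule")
    params = {"Name": name, "State": "ENABLED"}
    if pattern is not None:
        params["EventPattern"] = json.dumps(pattern)
    else:
        # e.g. "rate(5 minutes)" or "cron(0 12 * * ? *)"
        params["ScheduleExpression"] = schedule
    return params


def matches(pattern, event):
    """Simplified matcher: every pattern key must be present in the event,
    and the event value must be one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def create_rule(params):
    """Assumes boto3 is installed and AWS credentials are configured."""
    import boto3

    return boto3.client("events").put_rule(**params)
```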

Amazon EventBridge

  • Next evolution of CloudWatch Events.
  • The default event bus receives events generated by AWS services (as in CloudWatch Events).
  • Partner event buses receive events from SaaS services or applications (Zendesk, DataDog, Segment, Auth0 etc).
  • Custom event buses for your own applications.
  • Event buses can be accessed by other AWS accounts.
  • Rules define how to process events (similar to CloudWatch Events).
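
Publishing your own application events to a custom event bus might look like the sketch below, which uses the PutEvents call. The bus name, source and detail type in the usage are up to the application. Entries are sent in batches of at most 10, which is the per-call limit as far as I know.

```python
import json

MAX_ENTRIES_PER_CALL = 10  # PutEvents batch limit


def build_entry(bus_name, source, detail_type, detail):
    """Build one PutEvents entry; Detail must be a JSON string."""
    return {
        "EventBusName": bus_name,
        "Source": source,
        "DetailType": detail_type,
        "Detail": json.dumps(detail),
    }


def chunk(entries, size=MAX_ENTRIES_PER_CALL):
    """Split entries into lists of at most `size` items."""
    return [entries[i:i + size] for i in range(0, len(entries), size)]


def publish(entries):
    """Send all entries and return the number that failed.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3

    client = boto3.client("events")
    failed = 0
    for batch in chunk(entries):
        response = client.put_events(Entries=batch)
        failed += response["FailedEntryCount"]
    return failed
```

Rules on the custom bus can then match on the Source and DetailType values chosen here.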

Amazon EventBridge Schema Registry

  • EventBridge can analyze events in the bus and infer the schema.
  • Schema Registry can generate code for your application so that it knows in advance how data in the event bus is structured.
  • Schemas are versioned.

Last update: June 30, 2021