AWS CloudWatch
- CloudWatch monitors resources and applications, captures logs, and sends events.
- CloudWatch monitoring is the standard mechanism for keeping tabs on AWS resources. A wide range of metrics and dimensions is available via CloudWatch, allowing you to create time-based graphs, alarms, and dashboards.
- Alarms are the most practical use of CloudWatch, allowing you to trigger notifications from any given metric.
- Alarms can trigger SNS notifications, Auto Scaling actions, or EC2 actions.
- Alarms also support alerting when any M out of N datapoints cross the alarm threshold.
- Publish and share graphs of metrics by creating customizable dashboard views.
- Monitor and report on EC2 instance system check failure alarms.
- Using CloudWatch Events:
- Events provide a mechanism to automate actions in various services on AWS. You can create event rules from instance states, AWS APIs, Auto Scaling, Run Command, deployments, or time-based schedules (think cron).
- Triggered events can invoke Lambda functions, send SNS/SQS/Kinesis messages, or perform instance actions (terminate, restart, stop, or snapshot volumes).
- Custom payloads can be sent to targets in JSON format, which is especially useful when triggering Lambda functions.
- Using CloudWatch Logs:
- CloudWatch Logs is a streaming log storage system. By storing logs within AWS you have access to unlimited paid storage, but you also have the option of streaming logs directly to Elasticsearch or custom Lambda functions.
- A log agent installed on your servers will process logs over time and send them to CloudWatch Logs.
- You can export logged data to S3 or stream results to other AWS services.
- CloudWatch Logs can be encrypted using keys managed through KMS.
- Detailed monitoring: Detailed monitoring must be enabled on EC2 instances to get more granular (1-minute) metrics, and is billed under CloudWatch.
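
The "M out of N datapoints" alarm behaviour mentioned above can be illustrated with a small sketch. This is a simplified model, not CloudWatch's actual evaluation logic: the function name `alarm_breaches` is hypothetical, its last two parameters merely mirror the alarm's DatapointsToAlarm (M) and EvaluationPeriods (N) settings, it only covers a "greater than threshold" comparison, and it assumes missing datapoints count as not breaching.

```python
from typing import Optional, Sequence


def alarm_breaches(datapoints: Sequence[Optional[float]],
                   threshold: float,
                   datapoints_to_alarm: int,
                   evaluation_periods: int) -> bool:
    """Return True when at least M of the last N datapoints exceed the threshold.

    Simplified model: None (missing data) is treated as not breaching.
    """
    if not 0 < datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 0 < M <= N")
    # Only the N most recent datapoints are considered.
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window
                    if value is not None and value > threshold)
    return breaching >= datapoints_to_alarm


# Hypothetical CPU utilisation samples, oldest first.
cpu = [35.0, 92.5, 88.0, 40.0, 95.1]

# 3 of the last 5 datapoints are above 80, so a 3-out-of-5 alarm fires.
print(alarm_breaches(cpu, threshold=80.0, datapoints_to_alarm=3, evaluation_periods=5))  # True
# Only 2 of the last 3 are above 80, so a 3-out-of-3 alarm does not.
print(alarm_breaches(cpu, threshold=80.0, datapoints_to_alarm=3, evaluation_periods=3))  # False
```

Choosing M smaller than N makes an alarm tolerant of brief dips back under the threshold, at the cost of reacting to shorter spikes.
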
CloudWatch Tips
- Some very common use cases for CloudWatch are billing alarms, instance or load balancer up/down alarms, and disk usage alerts.
- You can use EC2Config to monitor memory and disk metrics on Windows platform instances. For Linux, there are example scripts that do the same thing.
- You can publish your own (custom) metrics using the AWS API, at additional cost.
- You can stream directly from CloudWatch Logs to a Lambda function or an Elasticsearch cluster by creating subscriptions on Log Groups.
- Don’t forget to take advantage of the CloudWatch non-expiring free tier.
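
As a rough illustration of the disk usage and custom metric tips above, the sketch below builds one datapoint in the shape that `PutMetricData` expects. The metric name `DiskUsedPercent`, the `MountPath` dimension, the namespace `Custom/System`, and the instance ID are all made-up placeholders. The `publish` helper assumes the `boto3` SDK and AWS credentials are available; it is defined but not called here.

```python
import shutil
from typing import Any, Dict


def disk_usage_datum(path: str = "/",
                     instance_id: str = "i-0123456789abcdef0") -> Dict[str, Any]:
    """Build one MetricDatum dict describing disk usage for `path`.

    The instance ID default is a placeholder, not a real instance.
    """
    usage = shutil.disk_usage(path)
    used_percent = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "MetricName": "DiskUsedPercent",  # hypothetical metric name
        "Dimensions": [
            {"Name": "InstanceId", "Value": instance_id},
            {"Name": "MountPath", "Value": path},
        ],
        "Unit": "Percent",
        "Value": round(used_percent, 2),
    }


def publish(datum: Dict[str, Any], namespace: str = "Custom/System") -> None:
    """Send the datum to CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without the SDK

    boto3.client("cloudwatch").put_metric_data(Namespace=namespace,
                                               MetricData=[datum])


datum = disk_usage_datum("/")
print(datum["MetricName"], datum["Value"])
```

Running something like this from cron every few minutes would give an alarm a steady metric to watch; keep in mind that each custom metric is billed.
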
CloudWatch has the following quotas for metrics, alarms, API requests, and alarm email notifications.
Resource | Default Quota |
---|---|
Actions | 5/alarm. This quota cannot be changed. |
Alarms | 10/month/customer for free. 5000 per Region, per account. You can request a quota increase. Alarms based on metric math expressions can have up to 10 metrics. |
Anomaly detection models | 500 per Region, per account. |
API requests | 1,000,000/month/customer for free. |
Canaries | 100 per Region per account in the following Regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo). 20 per Region per account in all other Regions. You can request a quota increase. |
Contributor Insights API requests | GetInsightRuleReport has a quota of 20 transactions per second (TPS), per Region. You can request a quota increase. The following APIs have a quota of 1 TPS per Region. This quota cannot be changed. |
Contributor Insights rules | 100 rules per account. You can request a quota increase. |
Custom metrics | No quota. |
Dashboards | Up to 500 metrics per dashboard widget. Up to 2500 metrics per dashboard, across all widgets. These quotas include all metrics retrieved for use in metric math functions, even if those metrics are not displayed on the graph. These quotas cannot be changed. |
DescribeAlarms | 9 transactions per second (TPS) per Region. The maximum number of operation requests you can make per second without being throttled. You can request a quota increase. |
DeleteAlarms, DescribeAlarmHistory, DisableAlarmActions, EnableAlarmActions, and SetAlarmState requests | 3 TPS per Region for each of these operations. The maximum number of operation requests you can make per second without being throttled. These quotas cannot be changed. |
DescribeAlarmsForMetric request | 9 TPS per Region. The maximum number of operation requests you can make per second without being throttled. This quota cannot be changed. |
DeleteDashboards, GetDashboard, ListDashboards, and PutDashboard requests | 10 TPS per Region for each of these operations. The maximum number of operation requests you can make per second without being throttled. These quotas cannot be changed. |
PutAnomalyDetector | 10 TPS per Region. The maximum number of operation requests you can make per second without being throttled. |
DeleteAnomalyDetector | 5 TPS per Region. The maximum number of operation requests you can make per second without being throttled. |
Dimensions | 10/metric. This quota cannot be changed. |
GetMetricData | 50 TPS per Region. The maximum number of operation requests you can make per second without being throttled. You can request a quota increase. 180,000 Datapoints Per Second (DPS). The DPS is calculated based on estimated data points, not actual data points. The data point estimate is calculated using the requested time range, period, and retention period. This means that if the actual data points in the requested metrics are sparse or empty, throttling still occurs if the estimated data points exceed the quota. The DPS quota is per-Region. |
GetMetricData | A single GetMetricData call can include as many as 500 MetricDataQuery structures. This quota cannot be changed. |
GetMetricStatistics | 400 TPS per Region. The maximum number of operation requests you can make per second without being throttled. You can request a quota increase. |
GetMetricWidgetImage | Up to 500 metrics per image. This quota cannot be changed. 20 TPS per Region. The maximum number of operation requests you can make per second without being throttled. This quota cannot be changed. |
ListMetrics | 25 TPS per Region. The maximum number of operation requests you can make per second without being throttled. You can request a quota increase. |
Metric data storage | 15 months. This quota cannot be changed. |
Metric data values | The value of a metric data point must be within the range of -2^360 to 2^360. Special values (for example, NaN, +Infinity, -Infinity) are not supported. This quota cannot be changed. |
MetricDatum items | 20/PutMetricData request. A MetricDatum object can contain a single value or a StatisticSet object representing many values. This quota cannot be changed. |
Metrics | 10/month/customer for free. |
Period | Maximum value is one day (86,400 seconds). This quota cannot be changed. |
PutMetricAlarm request | 3 TPS per Region. The maximum number of operation requests you can make per second without being throttled. You can request a quota increase. |
PutMetricData request | 40 KB for HTTP POST requests. PutMetricData can handle 150 transactions per second (TPS), which is the maximum number of operation requests you can make per second without being throttled. You can request a quota increase. |
Amazon SNS email notifications | 1,000/month/customer for free. |
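
Because the table caps a single PutMetricData request at 20 MetricDatum items, a publisher with more datapoints has to split them across requests. The following is a minimal sketch of that batching, assuming the 20-item figure listed above. The helper name `batch_metric_data` and the `QueueDepth` metric are hypothetical, and the actual API call is left out.

```python
from typing import Any, Dict, Iterator, List

# Taken from the "MetricDatum items" row of the quota table.
MAX_DATUMS_PER_REQUEST = 20


def batch_metric_data(datums: List[Dict[str, Any]],
                      batch_size: int = MAX_DATUMS_PER_REQUEST
                      ) -> Iterator[List[Dict[str, Any]]]:
    """Yield consecutive slices of at most `batch_size` datums."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(datums), batch_size):
        yield datums[start:start + batch_size]


# 45 made-up datapoints end up in three requests.
datums = [{"MetricName": "QueueDepth", "Value": float(i)} for i in range(45)]
batches = list(batch_metric_data(datums))
print([len(batch) for batch in batches])  # [20, 20, 5]
```

Each batch would then be passed as the `MetricData` argument of one PutMetricData call. The request size and TPS quotas in the table still apply on top of the item count.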