Best practices for migrating self-hosted Prometheus on Amazon EKS to Amazon Managed Service for Prometheus

As Amazon Web Services (AWS) customers adopt Amazon Managed Service for Prometheus (AMP) on Amazon Elastic Kubernetes Service (Amazon EKS), we often receive requests for best practices to follow when moving self-managed Prometheus on Amazon EKS to AMP.

In this article, we’ll examine those best practices, with a focus on the five pillars of the AWS Well-Architected Framework: security, cost optimization, performance efficiency, operational excellence, and reliability. Some of the best practices described will also be applicable for moving from self-managed Prometheus on self-managed Kubernetes (on Amazon EC2) to an AMP environment.

Introduction

Prometheus is a popular open source monitoring tool that provides powerful querying features and has wide support for a variety of workloads. AMP is a fully managed Prometheus-compatible service that makes it easier to monitor environments, such as Amazon EKS, Amazon Elastic Container Service (Amazon ECS), and Amazon Elastic Compute Cloud (Amazon EC2), securely and reliably.

AMP reduces the effort required to build a secure, highly available, and scalable monitoring service for long-term retention of application performance and availability metrics. Data can be sent to AMP by any collector that can write to a Prometheus-compliant remote write endpoint, such as the Prometheus server with remote write enabled or an OpenTelemetry collector. These collectors can gather metrics from any container environment, such as Amazon EKS, Amazon ECS, self-managed Kubernetes on AWS, non-clustered instances such as Amazon EC2, or on-premises infrastructure.

Customers using Prometheus to monitor their container environments face challenges in managing a highly available, scalable, and secure Prometheus server. AMP helps solve these problems by providing a fully managed, secure, and highly available environment that spans multiple Availability Zones. AMP is tightly integrated with AWS Identity and Access Management (IAM) for authentication and access control, with AWS PrivateLink Virtual Private Cloud (VPC) endpoints for easy and secure access, and with AWS CloudTrail for logging API calls. Refer to our getting started with AMP blog post to start setting up AMP.

AMP architecture

The following figure illustrates the overall architecture of AMP and its interaction with other components.

[Figure: Overall architecture of AMP and its interaction with other components]

In this diagram, we show three different agents that can scrape and ingest metrics from Amazon EKS into AMP: Prometheus server, AWS Distro for OpenTelemetry (ADOT), and Grafana Agent. Regardless of which agent you use, you need to configure remote_write to send metrics to the AMP workspace. Additionally, Signature Version 4 (SigV4) is used to authenticate and authorize requests so that any of these agents can write securely to the workspace.
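
As a minimal sketch, a Prometheus remote_write block that signs requests with SigV4 looks roughly like this (the workspace ID, Region, and queue tuning values are placeholders, not taken from this post). Recent Prometheus releases support SigV4 natively in remote_write; older releases rely on the AWS signing proxy described later in this post.

```yaml
# prometheus.yml (fragment) -- hypothetical workspace ID and Region
remote_write:
  - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
    sigv4:
      region: us-east-1            # Region that hosts the AMP workspace
    queue_config:                  # optional tuning; the defaults usually suffice
      max_samples_per_send: 1000
      max_shards: 200
      capacity: 2500
```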

It is common for AWS customers to send metrics from multiple AWS Regions to a single AMP workspace that acts as a central repository. In this case, you may want to set up AMP in a separate account and create an Amazon VPC endpoint to allow cross-account remote writes to it. To learn more about how to set up cross-Region ingestion of data into AMP, read “Set up cross-region metrics collection for Amazon Managed Service for Prometheus workspaces”.

You can deploy the Prometheus server on Amazon EKS using Helm charts, and this is the preferred method for collecting and scraping application metrics and Amazon EKS worker node metrics into an AMP workspace. Another option is to use ADOT, which requires you to set up and configure two OpenTelemetry components: the Prometheus receiver and the AWS Prometheus remote write exporter. For a detailed description of how to set up metrics ingestion with ADOT, refer to the user guide.
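
For orientation only, a trimmed OpenTelemetry collector configuration along these lines pairs a Prometheus receiver with a remote write exporter that signs requests with SigV4. The endpoint, Region, and scrape job below are placeholders, and exporter and extension names vary between ADOT releases, so treat this as a sketch rather than a drop-in configuration:

```yaml
extensions:
  sigv4auth:
    region: us-east-1                      # Region of the AMP workspace
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kubernetes-pods        # hypothetical scrape job
          kubernetes_sd_configs:
            - role: pod
exporters:
  prometheusremotewrite:
    endpoint: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/api/v1/remote_write
    auth:
      authenticator: sigv4auth
service:
  extensions: [sigv4auth]
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
```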

Grafana Agent is a lightweight alternative to running a full-scale Prometheus server on Amazon EKS. The agent supports easier sharding mechanisms that let users shard it across the entire Amazon EKS cluster, lowering the memory requirements per machine. A typical deployment of Grafana Agent for Prometheus metrics can use up to 40 percent less memory than other agents. However, Grafana Agent comes with tradeoffs: it removes some capabilities, such as support for recording and alerting rules. Read “Configuring Grafana Cloud Agent for Amazon Managed Service for Prometheus” for a detailed description of setting up Grafana Cloud Agent.

Best practices

Security

Security is a top priority at AWS, and the security pillar of the AWS Well-Architected Framework focuses on protecting your applications’ information and systems. Key topics include confidentiality and integrity of data, identifying and managing who can do what with permission management, protecting systems, and establishing controls to detect security events.

IAM roles for service accounts

One important security best practice when using AMP is to use IAM roles for service accounts (IRSA). IRSA allows you to assign IAM role capabilities to Kubernetes computing resources, such as pods and jobs. The Prometheus server sends metrics using HTTP requests, which must be signed with valid AWS credentials using the AWS Signature Version 4 algorithm so that the managed service can authenticate and authorize each client request. To facilitate this process, the requests are sent to an instance of the AWS signing proxy, which forwards them to the managed service. The AWS signing proxy can be deployed to an Amazon EKS cluster to run under the identity of a Kubernetes service account. With IRSA, you can associate an IAM role with a Kubernetes service account and thus provide AWS permissions to any pod that uses that service account. By using IRSA, we follow the principle of least privilege when configuring the AWS signing proxy to ingest Prometheus metrics into AMP.
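
As a sketch of how the association looks in the cluster (the account ID, role name, and namespace below are hypothetical), the service account that the collector or signing proxy runs under carries an annotation pointing at the IAM role created for ingestion:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: amp-iamproxy-ingest-service-account   # hypothetical name
  namespace: prometheus
  annotations:
    # IAM role whose policy allows aps:RemoteWrite on the target workspace
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/amp-iamproxy-ingest-role
```

Tools such as eksctl can create both the IAM role and the annotated service account in a single step.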

Amazon VPC interface endpoints

Another security best practice with AMP is to ensure network security by using AWS PrivateLink. Although ingestion of Prometheus metrics using remote write is supported over the public internet endpoint, using PrivateLink ensures that network traffic from your Amazon VPCs stays within the AWS network and never traverses the public internet. In the architecture diagram shown previously, we depicted two Amazon VPC endpoints: one for ingestion of metrics, and the other for querying metrics using Amazon Managed Service for Grafana (AMG). To query metrics from AMP, AMG automatically uses PrivateLink VPC endpoints, whereas open source Grafana must be manually configured to use the PrivateLink VPC endpoint. To create a PrivateLink VPC endpoint for AMP, refer to the user guide; a declarative sketch follows below.
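
To illustrate (the VPC, subnet, and security group IDs are placeholders), an interface VPC endpoint for the AMP workspace APIs can be declared in CloudFormation along these lines:

```yaml
# CloudFormation fragment -- hypothetical network IDs
Resources:
  AmpWorkspacesEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      ServiceName: com.amazonaws.us-east-1.aps-workspaces   # remote write and query APIs
      VpcEndpointType: Interface
      VpcId: vpc-0example
      SubnetIds:
        - subnet-0example1
        - subnet-0example2
      SecurityGroupIds:
        - sg-0example
      PrivateDnsEnabled: true
```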

Cross-account AMP workspace access

Many organizations run workloads globally, spread across multiple AWS accounts, and configure IAM policies to allow cross-account access from those workloads to the AMP workspace. To achieve this, you set up an AMP workspace in a central monitoring account. You then create a role in the monitoring account that trusts the workload accounts and grants write permissions on the AMP workspace. In each workload account, you deploy a Prometheus server into an Amazon EKS cluster to collect metrics and, using the IAM roles for service accounts feature of Amazon EKS, grant it permission to assume the cross-account role in the central monitoring account. If you need to keep the traffic to AMP private, you can use Amazon Virtual Private Cloud (VPC) endpoints, VPC peering, and Amazon Route 53 private hosted zones.
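
A sketch of what this looks like from a workload account (the workspace ID, Region, and role ARN are placeholders): recent Prometheus releases can assume the cross-account role directly in the SigV4 section of remote_write, provided the IRSA role in the workload account is allowed to call sts:AssumeRole on it:

```yaml
# prometheus.yml (fragment) in a workload account -- hypothetical ARNs and IDs
remote_write:
  - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-CENTRAL-EXAMPLE/api/v1/remote_write
    sigv4:
      region: us-east-1
      # Cross-account role in the central monitoring account with aps:RemoteWrite permission
      role_arn: arn:aws:iam::999999999999:role/amp-central-ingest-role
```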

Secure access to AMP using AWS Systems Manager for hybrid environments

Customers running on-premises workloads who want to use AMP for monitoring can face challenges with secure access to AMP because IAM roles are not natively available on premises. A common workaround is to use access keys, which are long-term credentials stored in a secure location and retrieved at startup. This approach makes it harder to comply with credential rotation best practices. A better approach is to use temporary credentials obtained through the AWS Security Token Service (AWS STS); however, this requires identity federation using Security Assertion Markup Language (SAML), OpenID Connect (OIDC), or a similar mechanism.

AWS Systems Manager helps you manage infrastructure on AWS and on premises. When Systems Manager is configured to manage hybrid environments, an agent is deployed onto the on-premises instances, and IAM roles must be created for Systems Manager. Systems Manager performs the activation process over TLS, using either Amazon-issued certificates or private certificates from AWS Certificate Manager (ACM). Read “Collect on-premises metrics using Amazon Managed Service for Prometheus” for details on how to set up Systems Manager and use AMP for monitoring.

Securing Prometheus metrics from self-managed Kubernetes cluster on Amazon EC2

AMP also supports ingesting metrics from a self-managed Kubernetes cluster running on Amazon EC2 instances. The steps for a self-managed Kubernetes cluster on Amazon EC2 are the same, except that you must set up authentication and authorization using an OIDC provider and IAM roles on your own in the self-managed Kubernetes cluster. In Kubernetes version 1.12, support was added for a new ProjectedServiceAccountToken feature, which is an OIDC JSON Web Token that also contains the service account identity and supports a configurable audience. Kubernetes doesn’t provide an OIDC identity provider. You can use an existing public OIDC identity provider, or you can run your own identity provider. For a list of certified providers, refer to OpenID Certification on the OpenID site.

Once you have the issuer URL, which is the URL of the OIDC identity provider that allows the API server to discover public signing keys for verifying tokens, and the client ID, which identifies the client application that makes authentication requests to the OIDC identity provider, you can associate the OIDC provider with the self-managed Kubernetes cluster. The issuer URL of the OIDC identity provider must be publicly accessible so that self-managed Kubernetes can discover the signing keys. OIDC federation allows you to assume IAM roles via the AWS Security Token Service (AWS STS): you authenticate with the OIDC provider, receive a JSON Web Token (JWT), and exchange it for temporary IAM role credentials. Kubernetes, however, can issue so-called projected service account tokens, which happen to be valid OIDC JWTs for pods.
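
For a concrete picture (the audience, image, and mount path below are illustrative), a pod can request such a projected token through a volume, which the kubelet keeps refreshed and which the SigV4 signing components can exchange for IAM credentials:

```yaml
# Pod spec fragment -- illustrative audience, image, and mount path
volumes:
  - name: aws-token
    projected:
      sources:
        - serviceAccountToken:
            audience: sts.amazonaws.com     # audience AWS STS expects
            expirationSeconds: 86400
            path: token
containers:
  - name: prometheus
    image: quay.io/prometheus/prometheus:latest
    volumeMounts:
      - name: aws-token
        mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
        readOnly: true
```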

Our setup equips each pod with a cryptographically signed token that can be verified by AWS STS against the OIDC provider of your choice to establish the pod’s identity. IAM roles for service accounts allow you to assign IAM role capabilities to Kubernetes computing resources, such as pods and jobs. The Prometheus server sends metrics to AMP for long-term storage and for subsequent querying by monitoring tools. The data is sent using HTTP requests, which must be signed with valid AWS credentials using the AWS Signature Version 4 algorithm so that the managed service can authenticate and authorize each client request. To facilitate this, the requests are sent to an instance of the AWS signing proxy, which forwards them to the managed service. The AWS signing proxy can be deployed to the cluster to run under the identity of a Kubernetes service account.

With IRSA, you can associate an IAM role with a Kubernetes service account and thus provide AWS permissions to any pod that uses that service account. This follows the principle of least privilege by using IRSA to configure the AWS signing proxy securely to help ingest Prometheus metrics into AMP.

Cost optimization

The cost optimization pillar focuses on avoiding unnecessary costs. Key topics include understanding and controlling where money is being spent, selecting the most appropriate and right number of resource types, analyzing spend over time, and scaling to meet business needs without overspending.

Single Amazon VPC interface endpoint for multi-account environment

In a multi-account environment, best practices recommend using a single Amazon VPC endpoint for accessing AMP in a centralized networking model. Consider an environment with 100 AWS accounts, each with its own Amazon VPC endpoint for AMP: at roughly $0.90 per endpoint per day, that is about $90 per day, whereas sharing a single VPC endpoint for AMP costs about $0.90 per day, a savings of roughly 99 percent. Read “Centralized DNS management of hybrid cloud with Amazon Route 53 and AWS Transit Gateway” to learn about centralized networking with Amazon VPC endpoints using Amazon Route 53 and AWS Transit Gateway.

Right-sizing Amazon EKS workloads using metrics from Prometheus servers

Pod size is an important factor in controlling Amazon EKS costs, so monitoring the CPU and memory utilization metrics of the Prometheus collector instances is recommended. Also set resource constraints, which helps verify that no program or operator of the Amazon EKS system uses too many resources. A container can’t use more than the resource limit you set. You can specify the required minimum resources, known as requests, and the maximum allowed usage, known as limits.
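
A minimal sketch of per-container requests and limits (the pod name, image, and values below are hypothetical, chosen to match the 512 MiB example that follows):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: checkout                               # hypothetical workload
spec:
  containers:
    - name: app
      image: public.ecr.aws/example/checkout:latest   # placeholder image
      resources:
        requests:                              # minimum guaranteed resources
          cpu: 250m
          memory: 256Mi
        limits:                                # hard cap enforced at runtime
          cpu: 500m
          memory: 512Mi
```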

Remember that resources are declared for each container in the pod individually, not for the pod as a whole; the total resources required by a pod are the sum of the resources required by all of its containers. The kubelet (and container runtime) enforce the memory cap, for example 512 MiB for a particular container, and the container runtime prevents the container from exceeding the configured limit. The following is a screenshot showing metrics from an Amazon EKS cluster queried against an AMP data source.

[Screenshot: metrics from an Amazon EKS cluster queried against an AMP data source]

Performance efficiency

The performance efficiency pillar helps organizations use IT and computing resources efficiently. This pillar specifically helps you build architectures on AWS that efficiently deliver sustained performance over time. With AMP, you can use the same open source Prometheus data model and query language that you use today to monitor the performance of your containerized workloads and also enjoy improved scalability, availability, and security without having to manage the underlying infrastructure. AMP automatically scales the ingestion, storage, and querying of operational metrics as workloads scale up and down.

High cardinality

Both Prometheus and AMP are designed to deal with high-cardinality metrics, where every unique combination of key-value label pairs represents a new time series. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values. Doing so can produce so many individual time series that queries become slow, unusable, or impossible to visualize. For instance, if you add a high-cardinality label such as user_id to a metric, you end up with hundreds of thousands of time series, one for each user, each carrying very few data points. Recording rules let you reduce the cardinality of metrics by pre-aggregating them in your self-managed Prometheus server before sending the metrics to AMP, as shown in the sketch that follows.
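
A minimal recording-rule sketch (the metric and label names are illustrative, not from this post) that pre-aggregates a per-request metric down to one series per job before remote write:

```yaml
# rules.yml -- illustrative metric and label names
groups:
  - name: precompute
    rules:
      - record: job:http_requests_total:rate5m
        # Collapses high-cardinality per-instance series into one series per job
        expr: sum by (job) (rate(http_requests_total[5m]))
```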

Single Amazon VPC interface endpoint for multi-account environment

Ingesting metrics into AMP through Amazon VPC endpoints is recommended. Amazon VPC endpoints not only allow private connectivity between resources in the VPC and supported AWS services, but also lower your network latency to the AMP endpoint because the traffic is not routed through the internet. In a multi-account setup, a hub-and-spoke network using AWS Transit Gateway is recommended to simplify your network routing and security. Read “Use an AWS Transit Gateway to Simplify Your Network Architecture” to learn more about AWS Transit Gateway.

Query optimization

For both Prometheus and AMP, query execution time is dictated by the number of time series, the time range, and the number of data points a query returns. To improve query performance, include as many selectors as possible in the PromQL query to minimize the number of time series scanned, and choose a step interval that is in line with the time range. For longer time ranges, a higher step interval is recommended so that the number of samples returned fits the resolution of the Grafana dashboard panel.

Graphs from Prometheus use the query_range endpoint and, with each query, there is a start time, an end time, and a step. The provided query is evaluated at the start time, then at the start time plus one step, and so forth; the smaller the step, the more samples in the response. For example, over a 7-day period a 15-second step returns 40,320 points per series, whereas a 5-minute step returns 2,016, so raising the step interval from the default can noticeably speed up retrieving query results. Coupled with recording rules, these optimizations allow efficient analysis of high-cardinality data.

Operational excellence

The operational excellence pillar focuses on running and monitoring systems to deliver business value, and continually improving processes and procedures. Key topics include automating changes, responding to events, and defining standards to manage daily operations.

Logging and monitoring the Prometheus collectors

Log forwarders such as Fluent Bit and Fluentd can be used to forward all the logs from the underlying node to log destinations such as Amazon CloudWatch. Per published benchmark results, the Fluent Bit plugin is more resource efficient than other log forwarders such as Fluentd: on average, Fluentd uses more than four times the CPU and six times the memory of the Fluent Bit plugin. Using Fluent Bit to collect application, data plane, host, and performance logs from the corresponding containers is recommended.

CloudWatch log groups are automatically available in CloudWatch Logs Insights, which supports a query language you can use to query your log groups and learn from the logs produced by the Prometheus collectors. You can customize CloudWatch alarm notifications to alert you on the status of the Prometheus collectors by monitoring the query log and checking for status : error. For example, if you create a metric filter on CloudWatch Logs and want to include the event cause in the notification message, you can use the CloudWatch Logs customized alarms approach. Read “Customize Amazon CloudWatch alarm notifications to your local time zone – Part 1” to learn more about setting up customized CloudWatch alarms.

Monitoring the Prometheus collectors themselves, such as the Prometheus server and Grafana Agent, is another recommended best practice, because it lets you take action when a collector fails. Prometheus provides an absent(v instant-vector) function that returns an empty vector if the vector passed to it has any elements, and a single-element vector with the value 1 if the vector passed to it has no elements. Prometheus also records an up metric for every scrape target. Combining the two, you can use an expression such as absent(up{job="<job-name>"}), where <job-name> is the job name in the Prometheus configuration that targets the collector’s endpoint, as in the sketch that follows.
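
A minimal alerting-rule sketch along these lines (the job name and timing are illustrative) fires when no up samples have been seen for the collector job for five minutes:

```yaml
# rules.yml -- illustrative job name and timing
groups:
  - name: collector-health
    rules:
      - alert: PrometheusCollectorAbsent
        # absent() returns 1 only when no matching up series exists
        expr: absent(up{job="prometheus-server"})
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: No up samples received for the prometheus-server job in the last 5 minutes.
```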

The following figure visualizes the result of the absent function applied in combination with the up metric of a Prometheus collector. While the collector reports the up metric periodically, the absent function returns an empty vector because the vector passed to it has elements; if the up samples stop arriving, absent returns 1. With this approach, we can establish a nearly continuous monitoring pattern on Prometheus collectors and set up alerts with Grafana that respond to collector failures. Read “Amazon Managed Service for Grafana – Getting started” to get started with AMG.

[Figure: result of the absent function applied to the up metric of a Prometheus collector]

Monitoring and managing the AMP service quotas

AMP has service quotas on the amount of data that a workspace can receive from Prometheus servers. For example, your remote_write API calls receive a 429 response error when requests exceed the quota for ingestion rate or for the number of active metrics in a workspace.

To proactively monitor and manage these quotas, you can run the following queries and set up Grafana dashboard alerts that notify you before a threshold (for example, 80 percent of the limit) is reached, and then request an increase in the corresponding service quota. You can also query logs from the Prometheus collectors to determine whether you are reaching service quota limits. These service quotas can be increased by opening a support ticket, unless they are listed as hard limits on the quotas page for AMP. Refer to the user guide for more information about AMP service quotas, and read the documentation on creating alerts to set up alerts in Grafana. The table below lists useful queries, and a sample quota alert sketch follows it.

| Type of data | Query to use | Default AMP quota limit |
| --- | --- | --- |
| Current active series | prometheus_tsdb_head_series | 1,000,000 |
| Current ingestion rate | rate(prometheus_tsdb_head_samples_appended_total[5m]) | 70,000 samples/second |
| Most-to-least list of active series per metric name | sort_desc(count by(__name__) ({__name__!=""})) | 200,000 |
| Number of labels per metric series | group by(mylabelname) ({__name__ != ""}) | 70 |
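
As an illustration of the 80 percent threshold idea (the thresholds below are simply 80 percent of the default quotas listed above; adjust them to your actual quota values), a rule file evaluated on the self-managed Prometheus collector could look like this:

```yaml
# rules.yml -- thresholds are 80 percent of the default quotas above (adjust to your own)
groups:
  - name: amp-quota-watch
    rules:
      - alert: ActiveSeriesNearAmpQuota
        expr: prometheus_tsdb_head_series > 800000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Active series are above 80 percent of the AMP default quota.
      - alert: IngestionRateNearAmpQuota
        expr: rate(prometheus_tsdb_head_samples_appended_total[5m]) > 56000
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: Ingestion rate is above 80 percent of the AMP default quota.
```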

Reliability

The reliability pillar focuses on ensuring a workload performs its intended function correctly and consistently when it’s expected to. A resilient workload quickly recovers from failures to meet business and customer demand. Key topics include distributed system design, recovery planning, and how to handle change.

Ingestion, storage, and querying of metrics

The reliability pillar helps organizations prevent and quickly recover from failures to meet business and customer demand. AMP is designed to handle high-cardinality monitoring data with a large volume of tags (Prometheus labels) and dimensions generated by container-based applications.

AMP manages the operational complexity of elastically scaling the ingestion, storage, and querying of metrics. AMP is highly available and data ingested into an AMP workspace is replicated across three Availability Zones in the same Region. AMP stores metrics, metadata, and samples on Amazon Simple Storage Service (Amazon S3), which is designed for durability of 99.999999999 percent of objects across multiple Availability Zones, offering high durability, availability, and performance object storage for frequently accessed data.

AMP automatically scales and optimizes the querying and caching of monitored data to accommodate thousands of end users querying metrics, and it continues to scale during an outage of your services, when query load typically spikes.

Deduplication of metrics collection

For high-availability purposes, you can set up multiple Prometheus servers that discover and collect metrics from the same source and send them to a single AMP workspace. With the correct configuration, AMP deduplicates the metrics. Each Prometheus collector instance can be either a Prometheus server or an ADOT agent. AMP elects one instance as the leader replica and ingests samples from only that replica. If AMP stops receiving samples from the leader for a 30-second period, it automatically switches to another leader and begins ingesting metrics from the new leader; a configuration sketch follows.
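
Deduplication relies on each replica attaching a shared cluster label and a unique __replica__ label in its external labels. A minimal sketch (the label values are placeholders; the replica value is typically the pod name, injected for example via the downward API):

```yaml
# prometheus.yml (fragment) -- identical on every replica except the __replica__ value
global:
  external_labels:
    cluster: prod-eks-cluster      # same value on all replicas scraping the same targets
    __replica__: prometheus-0      # unique per replica, for example the pod name
```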

Conclusion

This blog post outlined the best practices involved in moving a self-managed Prometheus server to use Amazon Managed Service for Prometheus to securely ingest, store, and query Prometheus metrics that were collected from application workloads deployed to an Amazon EKS cluster. The best practices discussed here align with the five pillars of the AWS Well-Architected Framework.