Amazon Web Services Feed
Advanced Connection Pooling with the Heimdall Proxy
By Erik Brandsberg, CTO at Heimdall Data
By Roneel Kumar, Database Solutions Architect at AWS
As databases are often a key component of internet infrastructure, IT departments may be challenged by poor connection management from the application.
There are three areas where connection management can be an issue:
- CPU overhead when an application “thrashes” connections rapidly by opening, closing, and authenticating connections.
. - Memory overhead when applications hold open long-lived connections that are often idle, which would be better used as block cache or may require a larger instance size than CPU requirements dictate.
. - Noisy neighbor congestion for a multi-tenant database; limiting the number of active connections on a per-customer basis ensures fairness.
The Heimdall Proxy helps developers, database administrators, and architects horizontally scale out and optimize connections through connection pooling for Amazon Relational Database Service (Amazon RDS) and Amazon Redshift without any application changes. As a result, you will reduce your database instance size and support higher user counts.
In this post, we explain how these functions work and are configured. Heimdall Data is an AWS Advanced Technology Partner with the AWS Data & Analytics Competency, and AWS Service Ready designations for Amazon RDS and Amazon Redshift.
Solution Overview
The Heimdall Proxy advanced connection pooling includes:
- General connection reuse: New connections leverage already established connections to the database to avoid connection thrashing. This results in lower CPU utilization.
. - Connection multiplexing: Disconnects idle connections from the client to the database, freeing those connections for reuse for other clients. This dramatically reduces connection counts to the database, and frees memory that were being held by these connections.
. - Tenant connection management: The combination of 1) per-user and per database connection limiting, and 2) multiplexing, control the number of active queries a particular tenant can use at a time. This protects database resources and helps ensure the service level agreement (SLA) is met and not disrupted by a noisy neighbor using the same database.
Figure 1 – Heimdall Proxy architecture.
Basic Connection Pooling
A basic connection pooler opens a number of connections upfront (the pool). Then, as an application needs a connection, instead of opening a new connection, it simply borrows a connection from the pool and returns it as soon as it’s not needed.
For most pools to be effective:
- The application is aware pooling will be used, and does not leave connections idle, but instead opens and closes them as needed.
- All connections leverage the same properties, such as the database user and catalog (database instance).
- State is not changed on the connection.
For a typical application server environment (J2EE, for example), basic pooling is supported. If pooling was not part of the initial design, simply inserting a connection pooler can cause more overhead than expected. Instead of connecting directly to the database, for example, all connections are now routed through the proxy/connection pooler, which adds to overhead in the connections.
Let’s go through a few scenarios:
- When multiple users are connected to the database, and each user rarely uses more than a few connections (such as Amazon Redshift or Greenplum), this may open a set of connections per user (such as PgBouncer), or close connections that are retrieved from the pool that do not match the desired properties and open new ones.
.
This results in a large amount of connection thrashing seen in Apache Tomcat pooling and most other poolers. For example, UserA connects to DatabaseA and this opens a connection on the connection pool. However, if the next query connects to DatabaseA, because the connection properties are not the same, this causes multiple connections to be open for the same user.
. - When many catalogs are used, in order to avoid changing the connection state, a discrete pool per catalog is created, allowing an appropriate connection to be reused. For MySQL, this avoids triggering a USE statement before each new request. For Postgres, the catalog cannot be changed once a connection has been established, so a new connection must be created, just as if it was a different user.
. - When you need to limit the total connections to the database and on a per-user basis. For example, if you have 10 users and only want each user to have a maximum of five connections.
Figure 2 – Basic connection pooling.
For basic connection pooling, an active (green) front-side connection is paired with a back-side connection, as shown above in Figure 2.
Additionally, you may have idle (red), unassigned connections in the backend for new connections. As such, you are not reducing the total number of connections, but are reducing the thrashing that occurs as the connections are opened and closed. The main benefit of basic pooling is lower CPU load on the database server.
To configure connection pooling on Heimdall Central Console, select the Data Source tab. The only thing that’s required is to click the checkbox for Enable Pooled Connection.
Connection Multiplexing
Beyond basic pooling, there is connection multiplexing which does not associate a client connection with a fixed database connection. Instead, active queries or transactions are assigned to an existing idle database connection, or else a new connection will be established.
If a client connection is idle, no resources are used on the database, reducing connection load and freeing memory. Shown in Figure 3 below, only active connections (green) are connected to the database via the connection pool. Idle connections (red) are ignored.
Figure 3 – Connection multiplexing.
Multiplexing is a more complicated technology than basic pooling. Therefore, many factors need to be accounted for.
In the following situations, multiplexing will halt and a connection will remain pinned, including:
- If a transaction starts, the connection mapping will remain until the transaction completes with a commit, rollback, or the client connection is terminated.
- If a prepared statement occurs on a connection, this makes the connection stateful and will remain pinned to the database until the connection is terminated.
- If a temporary table is created, the connection will remain pinned until the table is deleted.
To configure multiplexing on the Heimdall Central Console, go to the VirtualDB tab. Under Proxy Configuration, check the box for Multiplex.
In the event that special queries break multiplexing logic, and multiplexing needs to be disabled on the connection, go to the Rules tab for more granular control (along with other pool behaviors).
Below is an example for a Postgres environment. Click on the icon to edit the existing rule.
Figure 4 – Postgres rule to disable multiplexing with a search path.
For a MySQL environment, here’s an example of two rules that:
- Disables multiplexing when locking tables.
- Enables multiplexing when unlocking tables.
Figure 5 – MySQL rule to disable multiplexing with an explicit lock.
Tenant Connection Management
The third use case helps ensure SLAs by enforcing per-tenant limits on connections, and when combined with multiplexing, total active queries. This prevents one user from saturating the database, ensuring fairness of resources for others. A second tier of pool management is activated, that of “user pools.”
In the Data Sources tab, the pool can be managed at two tiers: the user level and the database. Each user can be limited to a number of total connections and idle connections, as demonstrated by the following configuration.
Figure 6 – Multi-tenant connection configuration example.
As shown above, the total connections allowed to the database across all users is 1000, but each user is only allowed 100, and of those, only 10 can be idle. Excess idle connections will be disposed of.
Each time a connection is returned from the pool, there’s a chance the connection will be closed. A value of 1000 means there is a 1/1000 chance the connection will be closed. This behavior is different from most connection poolers that set an absolute connection age, which for large deployments can result in a stampede of many connections closing and reopening at once.
Figure 7 – Multi-tenancy with pooling, multiplexing, and per-tenant connection limits.
The two tenants shown in Figure 7 (with unique usernames or catalogs) allow only active connections (green) to the database when multiplexing is enabled. If Tenant A attempts to perform a third query (blue) while two are active, it will be queued until one of the current active queries completes.
The net result of the combination of pooling, multiplexing, and per-tenant limits is that no single tenant can saturate database capacity, resulting in the SLAs of other customers failing.
Furthermore, beyond a certain point, adding execution threads to the database will result in negative performance. This control can improve overall performance in many cases, allowing more capacity during peak load.
Customer Use Cases
Magento
Magento is an e-commerce package written in PHP. Since PHP does not support efficient connection pooling due to its processing model, each page view opens a connection to the database, requests all of the data it needs, and then closes the connection.
For every page view, it results in a high amount of connection thrashing against the database, and can consume a large percentage of the database CPU. With the Heimdall Proxy, basic connection pooling alone can reduce the load on the database by up to 15 percent.
Odoo
Odoo is an e-commerce platform written in Python. The platform supports connection pooling, but it uses an Object-Relational-Mapping (ORM) to abstract the application layer from the database connections. The ORM uses transactions for every activity, which would normally prevent multiplexing from providing benefit.
The Heimdall Proxy has another advanced feature called “delayed transactions” that works around this problem. It will prevent a true transaction from starting until Data Manipulation Language (DML) is actually required. When activated with Odoo, the connection counts on the database can easily be reduced by up to 20x depending on the load.
Slatwall Commerce
Slatwall is an e-commerce platform written in Java and is natively designed to use pooling. Under heavy load, it can result in the saturation of the allowed connections on Amazon Aurora for MySQL.
In order to support larger user loads, the Heimdall Proxy reduces the connection load by an order of magnitude, resolving connection limits on the database and allowing the CPU load to be the limiting factor in larger deployments. Per Slatwall, Heimdall Proxy offload with multiplexing and pooling resulted in a 10x reduction in connections to the database.
Complementary Features
While the Heimdall Proxy provides connection management for databases, there are other features provided that further improve SQL scale, performance, and security.
- Query caching: The best way to lower database CPU is to never issue a query against the database in the first place. The Heimdall Proxy provides the caching and invalidation logic for Amazon ElastiCache as a look-aside results cache. It’s a way to improve application scalability on Amazon RDS and improve response times without application changes.
. - Automated Read/Write split: When a single database becomes too expensive to upgrade, or there is already a standby reader that is sitting idle, offloading write queries to read replicas improves resource utilization. Additionally, replication lag detection is supported to ensure ACID compliance.
. - Active Directory/LDAP integration: By authenticating against LDAP, the Heimdall Proxy manages connections for a large number of users, and synchronizes the authentication credentials into the database. In environments that require database resources to be accessible to many users in the enterprise, while providing data security, this feature is easy to administer, while preventing individual users from over-taxing resources.
Conclusion
The Heimdall Proxy was designed to support high web-scale environments by minimizing the number of backend connections so your database tier can operate optimally. The granular controls ensures fairness for connected users while ensuring the database is never overwhelmed.
Deployment of the Heimdall Proxy requires no application changes, saving months of deployment and maintenance. To get started, download Heimdall’s Amazon Machine Image (AMI) on AWS Marketplace as an Amazon Elastic Compute Cloud (Amazon EC2) or container instance.
For alternative connection poolers, check out the Amazon RDS Proxy, PgBouncer, PgPool, or ProxySQL.
Resources
- Blog: Using the Heimdall Proxy to Split Reads and Writes for Amazon RDS
- Blog: Automated Query Caching for Amazon RDS and Redshift
- Blog: Heimdall Proxy for Redshift Overview
- Heimdall Data technical documentation
- Contact: info@heimdalldata.com
Heimdall Data – AWS Partner Spotlight
Heimdall Data is an AWS Competency Partner and SQL database proxy for Amazon Redshift, Amazon RDS, and Amazon Aurora. It is transparently deployed to improve your read/write queries without any code changes.
Contact Heimdall Data | Partner Overview | AWS Marketplace
*Already worked with Heimdall Data? Rate the Partner
*To review an AWS Partner, you must be a customer that has worked with them directly on a project.