Monday, April 16, 2012

Connection pools

One of the oft-heard terms in Java is "Connection pool".

This is a very simple idea which is at times blown out of proportion.

Whenever you make an HTTP request to a web application, you need to open a TCP connection. After you are done, you need to close the connection.

Suppose you make several requests to the server. Opening and closing a connection for each one is not very efficient, because every new TCP connection costs a handshake before any data can be sent.

Better to open a connection and leave it open for the duration of the requests. That way you can send all requests on the same connection, saving yourself significant overhead.

Now, pushing this idea further, why not open several connections to the server and keep them ready? You can then send multiple requests to the server simultaneously.

This is a connection pool.
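
To make the idea concrete, here is a minimal sketch of such a pool in Java. The class name `SimplePool` and its methods are made up for illustration, and the type `T` stands in for whatever connection type you use. It assumes the pool size is fixed and that all connections are opened up front.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical fixed-size pool; T stands in for any connection type.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    // Opens `size` connections up front and keeps them ready.
    SimplePool(int size, Supplier<T> opener) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(opener.get());
        }
    }

    // Hands out an idle connection, blocking until one is available.
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    // Puts a borrowed connection back so another request can reuse it.
    void giveBack(T connection) {
        idle.add(connection);
    }

    // Number of connections currently sitting unused in the pool.
    int idleCount() {
        return idle.size();
    }
}
```

A caller borrows a connection, sends its request, and gives the connection back instead of closing it.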

The idea is as simple as that, but it is not without its quirks.

First, how do you know when to close the connections? Leaving a pool of connections open for eternity is not smart and consumes resources. How can you tell when the "last" request has been fulfilled?

Second, how many connections should be in the pool? 5 or 10 or 1000? This is called pool size.

Worse problems surface very soon. What if a connection has been idle for a long time and the server, or a firewall in between, has timed it out and closed it? This leaves dead connections in the pool.
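
One common way to deal with dead connections is to test a connection when it is borrowed and replace it if the test fails. The sketch below assumes the caller supplies a liveness check as a `Predicate`; the class `ValidatingPool` and its method names are hypothetical. With JDBC, for example, the check could be built on `java.sql.Connection.isValid(int)`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical pool that validates a connection before handing it out.
class ValidatingPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> opener;
    private final Predicate<T> isAlive; // caller-supplied liveness check (an assumption)

    ValidatingPool(int size, Supplier<T> opener, Predicate<T> isAlive) {
        this.opener = opener;
        this.isAlive = isAlive;
        for (int i = 0; i < size; i++) {
            idle.push(opener.get());
        }
    }

    // Returns a live connection; a dead one is dropped and replaced by a fresh one.
    synchronized T borrow() {
        if (idle.isEmpty()) {
            throw new IllegalStateException("pool exhausted");
        }
        T candidate = idle.pop();
        return isAlive.test(candidate) ? candidate : opener.get();
    }

    synchronized void giveBack(T connection) {
        idle.push(connection);
    }
}
```

Validating on every borrow costs a little time per request, which is why real pools usually make this check configurable.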

Even more challenging: what if you have 10 connections in the pool and an 11th request is received? Should you queue the request? How many requests should be queued?

It is these questions, none of which has an easy answer, that make connection pools easy to understand but hard to configure.
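
One possible answer, sketched below, is to let a limited number of requests wait for a limited time and to turn away the rest immediately. The class `QueueingPool` and the parameters `maxQueued` and `maxWaitMillis` are hypothetical names chosen for this example; it is one policy among many, not the right one for every application.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical pool with a bounded number of waiters and a bounded wait.
class QueueingPool<T> {
    private final BlockingQueue<T> idle;
    private final Semaphore waitSlots;   // how many requests may wait at once
    private final long maxWaitMillis;    // how long each of them may wait

    // `connections` must be non-empty; it defines the pool size.
    QueueingPool(List<T> connections, int maxQueued, long maxWaitMillis) {
        this.idle = new ArrayBlockingQueue<>(connections.size(), true, connections);
        this.waitSlots = new Semaphore(maxQueued);
        this.maxWaitMillis = maxWaitMillis;
    }

    // Empty result means the request was rejected or timed out while waiting.
    Optional<T> borrow() {
        T ready = idle.poll();           // fast path: a connection is free right now
        if (ready != null) {
            return Optional.of(ready);
        }
        if (!waitSlots.tryAcquire()) {   // too many requests are already waiting
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(idle.poll(maxWaitMillis, TimeUnit.MILLISECONDS));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        } finally {
            waitSlots.release();
        }
    }

    void giveBack(T connection) {
        idle.offer(connection);
    }
}
```

Setting `maxQueued` to 0 gives a fail-fast pool, while a large value with a long wait behaves more like an unbounded queue.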

As a result, most connection pool implementations come with several configuration parameters such as pool size, idle timeout, queuing policy and connection validation.
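
Of these parameters, idle timeout is perhaps the easiest to picture in code. The sketch below closes any connection that has sat unused for longer than the timeout. The class `IdleEvictingPool` is hypothetical, and the current time is passed in explicitly to keep the example simple; a real pool would more likely read the clock itself and run the eviction periodically in the background.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical pool fragment showing only the idle-timeout part.
class IdleEvictingPool<T> {
    // A pooled connection together with the time it became idle.
    private static final class Entry<T> {
        final T connection;
        final long idleSinceMillis;

        Entry(T connection, long idleSinceMillis) {
            this.connection = connection;
            this.idleSinceMillis = idleSinceMillis;
        }
    }

    private final Deque<Entry<T>> idle = new ArrayDeque<>();
    private final long idleTimeoutMillis;
    private final Consumer<T> closer; // how to actually close a connection

    IdleEvictingPool(long idleTimeoutMillis, Consumer<T> closer) {
        this.idleTimeoutMillis = idleTimeoutMillis;
        this.closer = closer;
    }

    // Records when the connection went idle; assumes nowMillis never goes backwards.
    synchronized void giveBack(T connection, long nowMillis) {
        idle.addLast(new Entry<>(connection, nowMillis));
    }

    // Closes connections idle for longer than the timeout; returns how many were closed.
    synchronized int evictIdle(long nowMillis) {
        int closed = 0;
        // The oldest entries are at the head because giveBack appends at the tail.
        while (!idle.isEmpty()
                && nowMillis - idle.peekFirst().idleSinceMillis > idleTimeoutMillis) {
            closer.accept(idle.pollFirst().connection);
            closed++;
        }
        return closed;
    }

    synchronized int idleCount() {
        return idle.size();
    }
}
```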

What makes things worse is that there are usually several pools involved in a typical web application infrastructure: a connection pool maintained by Hibernate, another maintained by the web server, yet another from the app server to the databases, and so on.

Mismatches between their configurations mean you really have no idea what is going on under the hood!
