Load Balancing and Failover

Distributing Load Over a Pool of Database Servers

Many commercial and open source Load Balancers are available today. They are implemented differently, and some have unique features, but they all provide a single IP address for a pool of servers and distribute incoming connections over the pool.

It's possible to set up a pool of database servers behind a Load Balancer. Database clusters such as Oracle RAC come with their own built-in Load Balancer. SQL Relay should work just fine with these systems. Just configure the instance of SQL Relay to connect to the IP address of the pool or cluster, and the Load Balancer will distribute the connections over the pool of servers.

If you don't have a Load Balancer or would rather not use one, SQL Relay can still distribute load over a pool of database servers. In effect, SQL Relay is a very specialized Load Balancer. An instance of SQL Relay can be configured to maintain connections to more than one database server and distribute client sessions over the pool of servers it's connected to.

Note that SQL Relay distributes client sessions, not individual queries. When a client connects to the SQL Relay listener daemon, it will be assigned to one of the database servers in the pool, and as long as it remains connected, all of its queries will be run against the same database server. If the client disconnects and reconnects, it may be assigned to a different database server the next time around.

To configure an instance of SQL Relay to connect to a set of database servers, you have to define multiple connection tags, as follows:

<?xml version="1.0"?>
<!DOCTYPE instances SYSTEM "sqlrelay.dtd">

<instances>

        <instance id="example" port="9000" socket="/tmp/example.socket" dbase="oracle8" connections="15" maxconnections="15" maxqueuelength="0" growby="1" ttl="60" endofsession="commit" sessiontimeout="600" runasuser="nobody" runasgroup="nobody" cursors="5"  deniedips="" allowedips="" debug="none" maxquerysize="65536" maxstringbindvaluelength="4000" maxlobbindvaluelength="71680" idleclienttimeout="-1">
                <users>
                        <user user="user1" password="password1"/>
                </users>
                <connections>
                        <connection connectionid="DB1" string="user=exampleuser1;password=examplepassword1;oracle_sid=EXAMPLE1;" metric="1"/>
                        <connection connectionid="DB2" string="user=exampleuser2;password=examplepassword2;oracle_sid=EXAMPLE2;" metric="1"/>
                        <connection connectionid="DB3" string="user=exampleuser3;password=examplepassword3;oracle_sid=EXAMPLE3;" metric="1"/>
                </connections>
        </instance>

</instances>

In this example, SQL Relay will maintain 15 persistent database connections. Since this instance is configured to connect to 3 different database servers, SQL Relay will maintain 5 persistent connections to each server.

The metric attribute may be used to alter the distribution of connections over the databases in the pool. Let's say that the servers running DB1 and DB2 are old machines, but the server running DB3 is brand new and can handle twice as many connections as DB1 or DB2. Assigning a metric of 1 to DB1 and DB2 and a metric of 2 to DB3 will cause twice as many connections to be started to DB3 as to either DB1 or DB2, making it twice as likely that a client will use DB3 as either DB1 or DB2. In this example, only 15 connections will be started, but 7 or 8 will be started against DB3 and 3 or 4 will be started against each of DB1 and DB2.

For instance:

<?xml version="1.0"?>
<!DOCTYPE instances SYSTEM "sqlrelay.dtd">

<instances>

        <instance id="example" port="9000" socket="/tmp/example.socket" dbase="oracle8" connections="15" maxconnections="15" maxqueuelength="0" growby="1" ttl="60" endofsession="commit" sessiontimeout="600" runasuser="nobody" runasgroup="nobody" cursors="5" deniedips="" allowedips="" debug="none" maxquerysize="65536" maxstringbindvaluelength="4000" maxlobbindvaluelength="71680" idleclienttimeout="-1">
                <users>
                        <user user="user1" password="password1"/>
                </users>
                <connections>
                        <connection connectionid="DB1" string="user=exampleuser1;password=examplepassword1;oracle_sid=EXAMPLE1;" metric="1"/>
                        <connection connectionid="DB2" string="user=exampleuser2;password=examplepassword2;oracle_sid=EXAMPLE2;" metric="1"/>
                        <connection connectionid="DB3" string="user=exampleuser3;password=examplepassword3;oracle_sid=EXAMPLE3;" metric="2"/>
                </connections>
        </instance>

</instances>

If the maxconnections attribute is greater than the connections attribute and conditions are such that new connections need to be spawned, the number of new connections spawned against each database server is proportional to the metric for that database server. In our example, if maxconnections were 25 and 10 new connections were spawned, 5 would be spawned against DB3 and 2 or 3 would be spawned against each of DB1 and DB2.
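
The proportions described above can be checked with a little arithmetic. The following C++ sketch is only an illustration: proportionalShares and the Backend and ShareRange types are hypothetical names, not part of SQL Relay, and it assumes that each server's share is the total multiplied by its metric and divided by the sum of all metrics, rounded either down or up. How SQL Relay itself rounds is not stated here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One <connection> entry: its connectionid and metric attributes.
struct Backend {
    std::string connectionid;
    int metric;
};

// Inclusive range of connection counts a backend could receive.
struct ShareRange {
    int low;
    int high;
};

// For each backend, returns the floor and ceiling of its proportional
// share: total * metric / (sum of all metrics).
// Hypothetical helper; the rounding SQL Relay applies is an assumption.
std::vector<ShareRange> proportionalShares(const std::vector<Backend> &pool,
                                           int total) {
    int metricSum = 0;
    for (const Backend &b : pool) {
        metricSum += b.metric;
    }
    assert(metricSum > 0);

    std::vector<ShareRange> shares;
    for (const Backend &b : pool) {
        int weighted = total * b.metric;
        int low = weighted / metricSum;                     // rounded down
        int high = (weighted + metricSum - 1) / metricSum;  // rounded up
        shares.push_back({low, high});
    }
    return shares;
}
```

With metrics of 1, 1 and 2, a total of 15 connections gives a range of 3 to 4 for each of DB1 and DB2 and 7 to 8 for DB3. A total of 10 new connections gives 2 to 3 for each of DB1 and DB2 and exactly 5 for DB3.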

Database Server Failover

SQL Relay doesn't have any built-in database server failover mechanism. If a database server that SQL Relay is connected to goes down, SQL Relay doesn't currently open new connections to a different "failover" database to make up for it. This is on the TODO list, but has not yet been implemented.

Currently, if an SQL Relay connection daemon notices that the database server it is connected to has gone down, it marks itself unavailable to clients, logs out and loops, attempting to re-connect to that database server. If that connection daemon is configured with the behindloadbalancer attribute set to "no", it also raises a flag that causes all other connection daemons connected to that database server to mark themselves unavailable to clients, close their connections and loop, attempting to re-connect to that database server. When the database server comes back up, each connection daemon marks itself available to clients again as soon as it successfully re-connects. While one database server is down, client sessions are still distributed over the servers that are still up, albeit through a smaller pool of persistent database connections.

On the other hand, some Load Balancers can facilitate failover. Some Load Balancers can be configured to keep track of whether the servers they're distributing load over are up and running. If one of them isn't, the Load Balancer removes it from the pool and adds it back later if it comes back up. If one of the databases that SQL Relay is connected to through a Load Balancer goes down, then when SQL Relay tries to re-connect, it will end up connecting to a different database server.

Distributing Load Over a Pool of SQL Relay Servers

If your pool of application servers and web servers is sufficiently large, you might want to set up a pool of SQL Relay servers between them rather than just a single server.

You can use a Load Balancer to make a pool of SQL Relay servers appear to be a single server. If you don't have a Load Balancer or would rather not use one, you can still set up a pool of SQL Relay servers and distribute client connections over them using Round Robin DNS.

SQL Relay Server Failover

SQL Relay clients can only be configured to connect to a single SQL Relay server. The client can also be configured to attempt to reconnect if the server is unavailable. This ability, combined with a Load Balancer or Round Robin DNS, can facilitate failover. In effect, the SQL Relay client can be coerced into trying to log into a server, then, if that server is down, trying another server, then another, and so on until either all servers are determined to be down or the client successfully logs into one of them.

Some Load Balancers can be configured to keep track of whether the servers they're distributing load over are up and running. If one of them isn't, the Load Balancer removes it from the pool and adds it back later if it comes back up. When running against an SQL Relay server behind a Load Balancer, the client can be configured to attempt to log in to the SQL Relay server several times with a short pause between each attempt. If an attempt fails, the Load Balancer should soon realize that the SQL Relay server is down and remove it from the pool. After that, it's likely that a future attempt to log in to the SQL Relay server will be directed to a server that is running and will succeed.

For example, this C++ client is configured to log into the server sqlrserver on port 9000. It will try to log in 10 times with a 1 second pause between tries.

// last two arguments: 1 second pause between tries, 10 tries
sqlrconnection sqlrcon("sqlrserver",9000,NULL,"sqlruser","sqlrpass",1,10);
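
The retry behavior described above amounts to a bounded loop with a pause between attempts. The following C++ sketch shows that logic in isolation. loginWithRetries and its attemptLogin callback are hypothetical names introduced for illustration; this is not SQL Relay's client code, only a model of "try up to N times, pausing between tries".

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Calls attemptLogin up to `tries` times, sleeping `retrySeconds` between
// failed attempts. Returns true as soon as one attempt succeeds.
// Hypothetical helper for illustration; not part of the SQL Relay API.
bool loginWithRetries(const std::function<bool()> &attemptLogin,
                      int retrySeconds, int tries) {
    assert(tries > 0 && retrySeconds >= 0);
    for (int attempt = 1; attempt <= tries; ++attempt) {
        if (attemptLogin()) {
            return true;
        }
        // No point in pausing after the final failed attempt.
        if (attempt < tries) {
            std::this_thread::sleep_for(std::chrono::seconds(retrySeconds));
        }
    }
    return false;
}
```

Behind a Load Balancer, the pause matters: it gives the Load Balancer time to notice the failed server and remove it from the pool before the next attempt is made.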

If you don't have a Load Balancer or would rather not use one, you can still implement SQL Relay server failover using Round Robin DNS.

The SQL Relay client has a minimal built-in failover facility. If Round Robin DNS is set up, when the SQL Relay client looks up the IP address of the host it's trying to connect to, it actually gets back a list of all of the IP addresses in the pool. It tries to connect to the first address in the list. If it fails, it tries the next address, then the next, etc. Eventually, it will either succeed in connecting to one of the servers or run out of addresses and fail.

For example, this C++ client is configured to log into the server sqlrserver on port 9000. It will try to log in to each address that sqlrserver resolves to.

// last two arguments: 0 second pause between tries, 1 try
sqlrconnection sqlrcon("sqlrserver",9000,NULL,"sqlruser","sqlrpass",0,1);
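
The address-by-address behavior described above can be modeled as a simple loop over the list returned by the DNS lookup. The following C++ sketch is an illustration only: connectToFirstAvailable and its tryConnect callback are hypothetical names, and the real client's internals may differ. The DNS lookup itself is left out; the function takes the already-resolved list of addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Tries each address in order and returns the index of the first one that
// accepts a connection, or -1 if every address fails.
// Hypothetical helper for illustration; not SQL Relay's actual client code.
int connectToFirstAvailable(
        const std::vector<std::string> &addresses,
        const std::function<bool(const std::string &)> &tryConnect) {
    assert(static_cast<bool>(tryConnect));  // a callback must be supplied
    for (std::size_t i = 0; i < addresses.size(); ++i) {
        if (tryConnect(addresses[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;  // ran out of addresses
}
```

For instance, given the addresses 192.0.2.1, 192.0.2.2 and 192.0.2.3 (placeholder values) where only the second server is up, the function returns index 1 after one failed attempt.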