A long time ago, I wrote a blogpost about a solution for Drupal in a MASTER-MASTER replication setup: http://www.jochenhebbrecht.be/site/2011-01-13/drupal/drupal-master-mast…. At the time of writing, I was very happy with that solution. But after some months of experience, I must say I can no longer agree with my own solution. Some people commented on that blogpost, and I really wanted to share my point of view on a new possible architecture. I definitely don't want to spread a bad architecture setup around. The new environment holds some improvements (which I cover at the bottom of this blogpost). To be honest, I must admit that I only implemented the memcache improvement of the new architecture. There was no time/budget at my previous company for the other improvements, but I strongly believe those extra improvements are the way to handle a MASTER-MASTER replication setup.
Why is the old architecture so bad?
As a matter of fact, we had too many issues with the old architecture. We were losing time and the architecture was just unstable. I'll explain why:
- The caching mechanism was initially configured to use the MySQL databases, so all CRUD operations on the cache tables were executed against MySQL. Each web instance had its own cache tables - we didn't replicate them (using the replicate-ignore-table setting in /etc/mysql/my.cnf). The cache tables use string keys as primary keys, so it would have been impossible to avoid conflicts had we replicated them. As a small reminder: MySQL's auto-increment trick (assigning each instance only even or only odd integer IDs) works for integer primary keys only. So we had to switch to a setup where both instances kept their own cache tables, to avoid duplicate SQL entries.
- However, Drupal writes so many things to the cache tables that the MySQL BIN log files were really getting overloaded. As far as I can remember, we got a new 100MB MySQL BIN log file every 7 or 8 minutes. And every time a Drupal node_save() was called, "all" cache was just blown away ... I still don't believe this is a very efficient way of flushing the cache. I started a conversation on D.O.: http://drupal.org/node/115319#comment-5243634, but as you can read, changes would only affect Drupal 8.
- There was no replication mechanism to flush the remote cache on the other web instance, so we had to write a custom one. This mechanism is explained in my other blogpost (I'm not going to repeat it here). I still doubt it was working efficiently ... It worked, yes, but still ... Very amateurish.
- We had issues with duplicate SQL entries in the semaphore table. The reason is again explained in the previous blogpost, and the solution can be found there as well. But this introduced yet another problem: Drupal could now rebuild the menu_router table on both instances at the same time. The lock for this rebuild is an entry in the semaphore table, but that lock wasn't replicated, as we didn't replicate the semaphore table. menu_router uses the path column (a varchar) as its primary key. So again, we got duplicate SQL entries. The only way to solve this was to manually skip the blocking SQL queries (which you don't like doing when it happens at 3 o'clock at night :-) ...). A temporary workaround, whenever we wanted to rebuild the menu_router table (e.g. when flushing the menu cache - and believe me, this can happen a lot, even when you don't know it), was to shut down the second web instance and wait for all traffic to be gone. With no traffic on the second web instance, it would not rebuild the menu_router table. Luckily, the load balancer would switch all traffic for us, and sessions are replicated too, so no worries there. Then we could flush the menu cache, and once menu_router had been rebuilt, we could bring the second web instance back up.
- We were copying content files (located in /sites/##YOUR_SITE_NAME##/files) from server 1 to server 2 (and vice versa) using rsync. As rsync couldn't tell whether a file had been deleted on one side or added on the other, we didn't allow deletes. Our htdocs folder was growing really fast, and the monthly full backup took a lot of gigabytes.
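To illustrate the cache-table setup from the first point, here is a minimal /etc/mysql/my.cnf sketch (the database name "drupal" and the exact table list are example values, not our production config): the cache tables are excluded from replication, and the auto-increment trick gives each instance its own integer ID range.

```ini
# Sketch of the relevant my.cnf settings on instance 1
# (instance 2 would use auto_increment_offset = 2).
[mysqld]
# Keep each instance's cache tables local - do not replicate them:
replicate-ignore-table = drupal.cache
replicate-ignore-table = drupal.cache_menu
replicate-ignore-table = drupal.cache_page

# Classic MASTER-MASTER trick: instance 1 generates odd auto-increment
# IDs, instance 2 even ones, so integer primary keys never collide.
auto_increment_increment = 2
auto_increment_offset    = 1
```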
The new architecture
So, after all these problems, we came up (also thanks to the hosting provider of my previous company) with this new setup, which seems to me a much better solution:
- We moved the caching mechanism to memcache. To learn more about Drupal and memcache, read the documentation at http://drupal.org/project/memcache. Memcache stores all caching data in memory, so it no longer uses the MySQL tables. Memcache is just way too awesome and bloody fast :-). Also, memcache can run in a clustered environment, so there is no need to manually flush the remote cache - the memcache Drupal module and the memcached daemon take care of it.
- Because caching moved to memcache, our databases were no longer under heavy load. Those MySQL cache queries were really hard for our MySQL daemons to process. The MySQL BIN log files now grow very slowly, which is perfect!
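As a sketch of that memcache move: with the Drupal memcache module you point Drupal's cache backend at the memcached daemons in settings.php. The module path and server addresses below are example values (Drupal 7 style configuration):

```php
<?php
// settings.php - route Drupal's cache through the memcache module
// (file path and server addresses are examples).
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
$conf['cache_default_class'] = 'MemCacheDrupal';
// Both web instances talk to the same memcached cluster:
$conf['memcache_servers'] = array(
  '10.0.0.10:11211' => 'default',
  '10.0.0.11:11211' => 'default',
);
```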
Both MySQL daemons are clustered using MySQL High-Availability (MHA).
MHA automates master failover and slave promotion with minimal downtime, usually within 10-30 seconds. It prevents replication consistency problems and saves the expense of acquiring additional servers - all with zero performance degradation, no added complexity (it's easy to install) and no changes required to existing deployments. MHA also provides scheduled online master switching: safely changing the currently running master to a new master within mere seconds (0.5-2 seconds) of downtime (blocking writes only). MHA can be useful in many deployments where high availability, data integrity and near non-stop master maintenance are required.
So how does it work?
- MySQL HA serves one virtual database IP, and both web instances connect to that IP. In the background, two database instances run in MASTER-MASTER replication, but one instance is active and the other is passive. The difference is made with a weight percentage. For example: database 01 gets a weight of 150%, the other one 100%. What does that mean? All queries coming from both web instances go to the MySQL daemon with the highest percentage. The other (passive) one is idle (no requests come in), but it keeps track of all changes on the active one (using the well-known MySQL replication mechanism). If the active one goes down (e.g. too much CPU load, database unavailable, ...), its percentage drops from 150% to 100%, and the percentage of the passive one increases from 100% to 150%. So all requests are now sent to the other database instance.
- This MySQL HA setup has one disadvantage: there's a single point of failure. If the virtual DB IP is unreachable, the web instances cannot contact the database any longer. Also, database 01 and database 02 must each be strong enough to handle the MySQL CRUD queries from both web instances. But as long as you keep caching queries out of MySQL, I think you're fine.
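The weight mechanism described above can be sketched in a few lines. This is only an illustration of the routing idea (all names are made up), not MHA's actual implementation:

```python
# Minimal sketch of weight-based routing: all queries go to the backend
# with the highest weight; on failure, the roles are swapped.

def pick_backend(weights):
    """Return the name of the backend with the highest weight (the active one)."""
    return max(weights, key=weights.get)

def fail_over(weights, failed):
    """Demote the failed backend to 100% and promote the other one to 150%."""
    for name in weights:
        weights[name] = 100 if name == failed else 150
    return weights

weights = {"db01": 150, "db02": 100}
assert pick_backend(weights) == "db01"   # db01 is active
fail_over(weights, "db01")               # db01 goes down
assert pick_backend(weights) == "db02"   # db02 takes over
```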
File storage replication
A last improvement would be to store all data on a NAS file storage.
- The NAS storage holds all data in the sites/##YOUR_SITE_NAME##/files directory. Compared with the previous solution, we no longer need to sync data, and we can delete files - nice! I would suggest keeping the source code on both web instances, and not on the file storage. This makes it possible to release new "source code" (not database!) versions of Drupal modules per instance. Or you can quickly change some lines on a PROD environment for debugging (as long as you block visitor traffic to that web instance, of course ;-)).
- Again, one disadvantage here: if the NAS file storage goes down, no file in your files directory will be served - not by web instance 1, nor by web instance 2.
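In practice such a NAS setup comes down to mounting the same export on both web instances. A hypothetical /etc/fstab entry (host name, export path and mount point are all made up for illustration) could look like:

```
# /etc/fstab on both web instances - mount the shared files directory
# from the NAS (nas01, paths below are examples):
nas01:/export/drupal-files  /var/www/htdocs/sites/example.com/files  nfs  rw,hard,intr  0  0
```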
So, happy to have written this down :-). It took me a while! I hope you enjoyed reading this blogpost. Let me know what you think about this new architecture.
How do we configure the Drupal settings file if we have 2 databases?
For Drupal, there will be only one database: Drupal connects to the virtual DB IP. In the backend, several databases are used, but Drupal is not aware of that - it only sees one database.
So configure your Drupal settings.php like you're used to :-)
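In other words, a Drupal 7 settings.php simply points at the virtual IP. The address and credentials below are placeholders, not real values:

```php
<?php
// settings.php - Drupal only sees the virtual DB IP served by MySQL HA.
$databases['default']['default'] = array(
  'driver'   => 'mysql',
  'database' => 'drupal',
  'username' => 'drupal_user',
  'password' => 'secret',
  'host'     => '10.0.0.100', // the virtual database IP
  'prefix'   => '',
);
```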
Great! Thanks for your quick reply!
I have database1 & database2 (master & master) mapped behind a loadbalancer server, so I tried to use that loadbalancer IP address in the Drupal settings db_url, but it was not accepted because it expects a MySQL username and password.
So as per your suggestion, do I have to configure a MySQL proxy server that has the IP, username and password, which we can then use in the settings.php db_url config?
In the blogpost above, I suggest using MySQL HA for load balancing: http://dev.mysql.com/doc/mysql-ha-scalability/en/index.html.
What kind of software package do you use to load balance MySQL databases 1 and 2?
We are using 'piranha' as the loadbalancer.
If I change the loadbalancer to MySQL HA, will it have the same username and password as the MySQL servers, so that I can use those credentials in the Drupal settings.php db_url?
MySQL HA is a "concept", not an implementation/solution. There are many solutions such as MySQL Replication, DRBD, Oracle VM Template, MySQL Cluster, ...
And indeed, you just use the username/password of the actual database, as the MySQL HA solution is just a virtual layer in front of the real database.
Btw, another solution which I didn't mention is "keepalived". More details on this can be found on this blogpost: http://alexzeng.wordpress.com/2012/10/31/how-to-setup-mysql-ha-by-using…
Hi Jochus, continuing on Jagadees's comment: I am running master-master MySQL with LVS (HA), i.e. I have 3 machines: 2 real MySQL servers (running in master-master mode) and one real HA server on which both MySQL servers are load balanced via a virtual IP address.
My problem is: when I try to connect to the MySQL database via a simple PHP script on the Drupal webserver, I get a positive response, but when I add the virtual server's IP address with the DB name/user/pass details of the MySQL server in settings.php, it does not work. How do I overcome this? Does Drupal support master-master MySQL at all? If yes, how do I achieve that in my case?
What do you mean with: "it does not work"? Can you provide me some details on the error you're receiving?
Can you post me your settings.php $db configuration? (yes, replace URL, username and password please :-))
Apologies for posting twice, experiencing some issues with my browser.
Here are some more details:
DB Master 1
DB Master 2
HA Machine IP
Virtual IP load balancing the above mysql servers on this machine
Now, when I use either of the IPs, i.e. 18.104.22.168 or 22.214.171.124, directly in settings.php, my website loads up and runs perfectly.
When I use the virtual IP 126.96.36.199 with the same user and pass as in the above case, the website fails and I get no error messages - just the white screen of death. No errors on the Apache or MySQL end either.
snippet from settings.php
$databases['default']['default'] = array(
  'driver' => 'mysql',
  'database' => 'abc',
  'username' => 'user1',
  'password' => 'password1',
  'host' => '188.8.131.52',
  'prefix' => '',
);
Which version of Drupal are you running? 6? 7? 8?
You should get some errors if you see a white screen. Check your Apache HTTPD error.log or the "Recent log entries" page in Drupal.
No problem on your double post, easily removed :-).
Running Drupal version 7.21, to be precise.
I get no errors in the HTTPD error.log :(
Not sure where to check the "Recent log entries" - can you help me with that?
Check this page for more details on error reporting: http://drupal.org/node/158043.
If this doesn't work for you, try adding ZendDebugger to your PHP environment. Then, using Eclipse, you can step through your code with a debugger. That way you will see the actual exception.
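One quick way to surface the exception behind a white screen (a common PHP debugging step, suggested here as an assumption about your setup; remove it again afterwards) is to force PHP error display, e.g. at the top of index.php:

```php
<?php
// Temporary debugging only - never leave this enabled in production.
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('display_startup_errors', TRUE);
```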
PS: I only have experience with D6. I left the Drupal world and I'm currently working on Java projects.
Many thanks. As far as I know, error_reporting is ON, but I will look into this and get back to you.