Why Not MySQL?

by Ben Adida, part of the OpenACS Project.

It happens almost every week, and sometimes more often: someone wants to know why we're not using MySQL as the RDBMS behind OpenACS. The ACS Classic team (ArsDigita) gets the same question over and over again on their Discussion Forums. If it's good enough for Slashdot, it must be good enough for OpenACS, right?

Wrong. This quick paper attempts to explain why MySQL is not just the wrong choice for OpenACS, but how it should never be used by any system that handles critical data.

The Purpose of an RDBMS

An RDBMS exists for the purpose of providing a reliable permanent storage mechanism with very strict properties embodied in the ACID test. I will quote directly from Philip Greenspun's great explanation

Atomicity
Results of a transaction's execution are either all committed or all rolled back. All changes take effect, or none do. Suppose that a user is editing a comment. A Web script tells the database to "copy the old comment value to an audit table and update the live table with the new text". If the hard drive fills up after the copy but before the update, the audit table insertion will be rolled back.
Consistency
The database is transformed from one valid state to another valid state. A transaction is legal only if it obeys user-defined integrity constraints. Illegal transactions aren't allowed and, if an integrity constraint can't be satisfied the transaction is rolled back. For example, suppose that you define a rule that postings in a discussion forum table must be tied to a valid user ID. Then you hire Joe Novice to write some admin pages. Joe writes a delete-user page that doesn't bother to check whether or not the deletion will result in an orphaned discussion forum posting. Oracle will check, though, and abort any transaction that would result in you having a discussion forum posting by a deleted user.
Isolation
The results of a transaction are invisible to other transactions until the transaction is complete. For example, suppose you have a page to show new users and their photographs. This page is coded in reliance on the publisher's directive that there will be a mugshot for every user and will present a broken image if there is not. Jane Newuser is registering at your site at the same time that Bill Olduser is viewing the new user page. The script processing Jane's registration does inserts into several tables: users, mugshots, users_demographics. This may take some time if Jane's mugshot is large. If Bill's query starts before Jane's transaction commits, Bill won't see Jane at all on his new-users page, even if Jane's insertion into some of the tables is complete.
Durability
Once committed (completed), the results of a transaction are permanent and survive future system and media failures. Suppose your ecommerce system inserts an order from a customer into a database table and then instructs CyberCash to bill the customer $500. A millisecond later, before your server has heard back from CyberCash, someone trips over the machine's power cord. Oracle will not have forgotten about the new order. Furthermore, if a programmer spills coffee into a disk drive, it will be possible to install a new disk and recover the transactions up to the coffee spill, showing that you tried to bill someone for $500 and still aren't sure what happened over at CyberCash.

If what you want is raw, fast storage, use a filesystem. If you want to share it among multiple boxes, use NFS. If you want simple reliability against simplistic failure, use mirroring. Want a SQL interface to it all? Use MySQL.

Now, if what you want is data storage that guarantees a certain number of invariants in your data set, that allows for complex operations on this data without ever violating those constraints, that isolates simultaneous users from each other's partial work, and that recovers smoothly from just about any kind of failure, then get your self a real RDBMS. Yes, it will be slower than the MySQL file system. Just like TCP is slower than UDP, while providing better service guarantees.

The State and Future of MySQL

Building a true RDBMS is a tough problem, probably tougher than any other systems issue. Most products on the market (Oracle, Sybase, PostgreSQL, Interbase) have been in development for years, sometimes more than 10 or 15.

MySQL claims that they have compromised on certain features to guarantee better performance. While this may be an interesting way to track non-critical data such as clickthrough tracking, compromising on perfect data integrity is not acceptable, even for speed, when dealing with critical data.

The OpenACS team is happy to take a closer look at MySQL as it matures. However, it doesn't seem that the MySQL team understands the concepts and importance of true ACID capabilities: The MySQL Todo mentions "transactions" in a long list that includes questions such as "do sleeping threads take CPU." Furthermore, the MySQL manual claims that MySQL will soon implement "atomic operations" through the use of table locks, but without rollback. This is a blatant misuse of the term "atomic," which implies that either none or all operations will complete. A hardware or power failure in the middle of a set of statements will break the atomicity of the block if there is no rollback capability.

Rollback is not just a convenient feature, it is a critical basis for solid data storage.

There are very good reasons for using MySQL. A need for a reliable, ACID-compliant datastore isn't one of them.

A Few More Details

The Bottom Line

An enterprise-level system will never compromise certain features for speed. The ACID properties of an RDBMS are an absolute necessity for any critical data. Critical web sites that run on non-ACID-compliant systems are asking for trouble.

The OpenACS project refuses to break with the important principles of the ACID test. We are out to build an enterprise-level Open Source web toolkit. PostgreSQL and soon InterBase are appropriate RDBMS candidates for this project. MySQL is just a glorified filesystem with a SQL interface.


ben@mit.edu