<HEAD><TITLE>Why Not MySQL?</TITLE></HEAD>
<BODY bgcolor=white>
<h2>Why Not MySQL?</h2>
by <a href=mailto:ben@mit.edu>Ben Adida</a>, part of the <a href=/>OpenACS Project</a>.
<hr><p>

It happens almost every week, and sometimes more often: someone wants
to know why we're not using MySQL as the RDBMS behind <a href=/>OpenACS</a>. The ACS
Classic team (<a href=http://www.arsdigita.com>ArsDigita</a>) gets the same question over and over again on
their <a href=http://photo.net/bboard>Discussion Forums</a>. If it's
good enough for <a href=http://www.slashdot.org>Slashdot</a>, it must
be good enough for OpenACS, right?
<p>
<b>Wrong</b>. This quick paper attempts to explain why MySQL is not just the
wrong choice for OpenACS, but how it should never be used by any
system that handles critical data.
<p>

<h3>The Purpose of an RDBMS</h3>
An RDBMS exists for the purpose of providing a reliable permanent
storage mechanism with very strict properties embodied in the ACID
test. I will quote directly from Philip Greenspun's <a
href=http://photo.net/wtr/aolserver/introduction-2.html>great explanation</a>
<p>

<blockquote>
<dl>
<dt>Atomicity
<dd>

Results of a transaction's execution are either all committed or all
rolled back. All changes take effect, or none do. Suppose that a user
is
editing a comment.  A Web script tells the database to "copy the old
comment value to an audit table and update the live table with the new
text".  If the hard drive fills up after the copy but before the
update,
the audit table insertion will be rolled back.


<dt>

Consistency 



<dd>


The database is transformed from one valid state to another valid
state. A transaction is legal only if it obeys user-defined integrity
constraints. Illegal transactions aren't allowed and, if an integrity
constraint can't be satisfied the transaction is rolled back. For
example, suppose that you define a rule that postings in a discussion
forum table must be tied to a valid user ID.  Then you hire Joe Novice
to write some admin pages.  Joe writes a delete-user page that doesn't
bother to check whether or not the deletion will result in an orphaned
discussion forum posting.  Oracle will check, though, and abort any
transaction that would result in you having a discussion forum posting
by a deleted user.

<dt>

Isolation


<dd>

The results of a transaction are invisible to other transactions until
the transaction is complete. For example, suppose you have a page to
show new users and their photographs.  This page is coded in reliance
on
the publisher's directive that there will be a mugshot for every user
and will present a broken image if there is not.  Jane Newuser is
registering at your site at the same time that Bill Olduser is viewing
the new user page.  The script processing Jane's registration does
inserts into several tables: <code>users, mugshots,
users_demographics</code>.  This may take some time if Jane's mugshot
is
large.  If Bill's query starts before Jane's transaction commits, Bill
won't see Jane at all on his new-users page, even if Jane's insertion
into some of the tables is complete.

<dt>

Durability 


<dd>

Once committed (completed), the results of a transaction are permanent
and survive future system and media failures. Suppose your ecommerce
system inserts an order from a customer into a database table and then
instructs CyberCash to bill the customer $500.  A millisecond later,
before your server has heard back from CyberCash, someone trips over
the machine's power cord.  Oracle will not have forgotten about the
new
order.  Furthermore, if a programmer spills coffee into a disk
drive, it will be possible to install a new disk and recover the
transactions up to the coffee spill, showing that you tried to bill
someone for $500 and still aren't sure what happened over at
CyberCash.

</dl>
</blockquote>
<p>
 
If what you want is raw, fast storage, use a filesystem. If you want
to share it among multiple boxes, use NFS. If you want simple
reliability against simplistic failure, use mirroring. Want a SQL
interface to it all? Use MySQL.
<p>
Now, if what you want is data storage that guarantees a certain number
of invariants in your data set, that allows for complex operations on
this data without ever violating those constraints, that isolates
simultaneous users from each other's partial work, and that recovers
smoothly from just about any kind of failure, then get your self a
real RDBMS. Yes, it will be slower than the MySQL file system. Just
like TCP is slower than UDP, while providing better service guarantees.

<h3>The State and Future of MySQL</h3>

Building a true RDBMS is a tough problem, probably tougher than any
other systems issue. Most products on the market (Oracle, Sybase,
PostgreSQL, Interbase) have been in development for years, sometimes
more than 10 or 15.

<p>

MySQL claims that they have compromised on certain features to
guarantee better performance. While this may be an interesting way to
track non-critical data such as clickthrough tracking, compromising on
perfect data integrity is not acceptable, even for speed, when dealing
with critical data.

<p>

The OpenACS team is happy to take a closer look at MySQL as it
matures. However, it doesn't seem that the MySQL team understands the
concepts and importance of true ACID capabilities: The <a
href=http://web.mysql.com/Manual_chapter/manual_TODO.html#TODO>MySQL
Todo</a> mentions "transactions" in a long list that includes
questions such as "do sleeping threads take CPU." Furthermore, the <a
href=http://web.mysql.com/Manual_chapter/manual_Compatibility.html#Missing_functions>MySQL
manual</A> claims that MySQL will soon implement "atomic operations"
through the use of table locks, but <i>without rollback</i>. This is a
blatant misuse of the term "atomic," which implies that either none or
all operations will complete. A hardware or power failure in the
middle of a set of statements will break the atomicity of the block
if there is no rollback capability.

<p>

<i>Rollback is not just a convenient feature, it is a critical basis
for solid data storage</i>.

<p>

There are very good reasons for using MySQL. A need for a reliable,
ACID-compliant datastore isn't one of them.

<h3>A Few More Details</h3>
<ul>

<li> <b>MySQL has no subqueries</b>. <br>
Instead of performing one complex query that is entirely processed on
the database end, MySQL users have to perform 2 or more serial queries
that each must go over inter-process or network communication between
the app and the database. This significantly reduces the speed
advantages of MySQL.

<li> <b>MySQL has no stored procedures</b>.<br> 
If a series of DB actions need to be performed in a block, MySQL
requires each SQL statement to be sent from the app, again in a serial
manner, again over IPC or network.

<li> <b>MySQL has no triggers or foreign key constraints</b>. <br>
Data invariants must be maintained by application-level code, which
requires building carefully-planned abstractions to guarantee
integrity (for every means of accessing your DB), and even more
unnecessary back-and-forth communication between the app and the
database.

<li> <b>MySQL only has table-level locking</b>. <br>
Only one user can write to a table at the same time. For web usage,
that falls under the category of "pathetic."

</ul>

<h3>The Bottom Line</h3> 

An enterprise-level system will never compromise certain features for
speed. The ACID properties of an RDBMS are an absolute necessity for
any critical data. Critical web sites that run on non-ACID-compliant
systems are asking for trouble.
<p>
The OpenACS project refuses to break with the important principles of
the ACID test. We are out to build an enterprise-level Open Source web
toolkit. <a href=http://www.postgresql.org>PostgreSQL</a> and soon <a
href=http://www.interbase.com>InterBase</a> are appropriate RDBMS
candidates for this project. MySQL is just a glorified filesystem with
a SQL interface.
<p>



<p>
<hr>
<address>ben@mit.edu</address>
</BODY>