The Database Access API
By Pete Su and Jon Salz. Modified by Roberto Mello.
Overview
One of ACS's great strengths is that code written for it is very close to
the database. It is very easy to interact with the database from anywhere
within ACS. Our goal is to develop a coherent API for database access which
makes this even easier.
The Old Way
Here's a typical block of code from an ACS 3.x dynamic page:
set tcl_var "foo"
set db [ns_db gethandle]
ns_db dml $db "begin transaction"
set sql "select
foo,
bar,
baz
from some_table,
some_other_table
where some_table.id = some_other_table.id
and some_table.condition_p = '$tcl_var'
"
set selection [ns_db select $db $sql]
set count 0
while { [ns_db getrow $db $selection] } {
set_variables_after_query
...
call_some_proc $foo $bar $baz
incr count
}
ns_db releasehandle $db
Writing code like this had the following annoyances:
It was repetitive, tedious and error prone to write the same type of
loops over and over again.
Using Tcl variable interpolation in a literal string, to pass values
from the page to the database, is error prone, relatively inefficient,
and a good way to compromise the security of a web site.
Magic like set_variables_after_query made code confusing.
The scope of transactions is not clear from reading the code.
Passing handles around explicitly made it easy to use them in bad
ways, like holding a handle for too long while returning data to a
user's browser.
Introduced in ACS 3.4, the new Database API is meant to save
developers from the above tedium and provide a more structured syntax
for specifying database operations, including transactions.
The New Way
Here is how you would code up the example above using the new API.
set count 0
set tcl_var "foo"
set sql {
select
foo,
bar,
baz
from some_table, some_other_table
where some_table.id = some_other_table.id
and some_table.condition_p = :tcl_var
}
db_transaction {
db_foreach my_example_query_name $sql {
...
call_some_proc $foo $bar $baz
incr count
}
}
There are several things to note here:
No explicit code for grabbing and releasing handles. Usage of the
Database API implicitly deals with all handle management issues.
The new command db_transaction makes the scope of a
transaction clear. db_transaction takes the code block
argument and automatically runs it in the context of a transaction.
The new command db_foreach writes our old while loop for us.
Every SQL query has a name, meant to be unique within the server
instance (though this is not enforced).
Finally and most importantly, there is a new scheme for passing data
from a Tcl variable to the database, which we'll cover next.
Bind Variables
Bind variables are placeholders for literal values in an SQL query
being sent to the server. Take the example query above: in the old
way, data was generally passed to Oracle directly, via Tcl string
interpolation. So in the example above, the actual query we send would
look like this:
select
foo,
bar,
baz
from some_table, some_other_table
where some_table.id=some_other_table.id
and some_table.condition_p = 'foo'
There are a few problems with this:
First, if the literal value is a
huge string, then we waste a lot of time in the database server doing
useless parsing.
Second, if the literal value contains characters like
single quotes, we have to be careful to double-quote them, because not
quoting them will lead to surprising errors.
Third, no type checking
occurs on the literal value. Finally, if the Tcl variable is passed in
or between web forms or otherwise subject to external modification,
there is nothing keeping malicious users from setting the Tcl variable
to some string that changes the query textually.
This type of attack,
called SQL injection (historically also called SQL smuggling), can be very
damaging: entire tables can be exposed or have their contents deleted, for
example. Another very important reason for using bind variables is
performance. Oracle caches all previously parsed queries. If literal values
appear in the where clause, the query is cached with those literals in
place. Oracle performs bind variable substitution after parsing the SQL
statement. This means that SQL statements that use bind variables will
always match the cached parse (assuming all else is the same), while SQL
statements that do not use bind variables will match only if the literal
values in the statement are exactly the same. Using bind variables can
therefore improve performance considerably.
To fix all these problems, we replace literal values in the query with
a placeholder character, and then send the data along after. So the
query looks like this:
select
foo,
bar,
baz
from some_table, some_other_table
where some_table.id = some_other_table.id
and some_table.condition_p = ?
The '?' character means "This will be filled in later with literal
data". In use, you might write code that looks like this:
set statement [prepare_query "
select
foo,
bar,
baz
from some_table, some_other_table
where some_table.id = some_other_table.id
and some_table.condition_p = ?
"]
bind_param $statement 1 $tcl_var
The above example is meant to be pseudo-Tcl - no API like this
actually exists. What happens is that we first send the SQL statement
to the server for parsing, then later we bind values to the
placeholders, and send those values along separately. This separate
binding step is where the term bind variable comes from.
This split has several advantages. First, type checking happens on the
literal. If the column we are comparing against holds numbers, and we
send a string, we get a nice error. Second, since string literals are
no longer in the query, no extra quoting is required. Third,
substitution of bind variables cannot change the actual text of the
query, only the literal values in the placeholders.
The database API makes bind variables easy to use by hooking them
smoothly into the Tcl runtime. Rather than using a '?' as a generic
placeholder, you use a colon followed by the name of the Tcl variable
that you wish to pass as a literal. So here's the final, real-life
form of the example query:
select
foo,
bar,
baz
from some_table, some_other_table
where some_table.id = some_other_table.id
and some_table.condition_p = :tcl_var
The database API parses the query and pulls out all the bind variable
specifications and replaces them with generic placeholders. It then
automatically pulls the values of the named Tcl vars out of the
runtime environment of the script, and passes them to the database.
Note that while this looks like a simple syntactic change, it really
is very different from how we've written queries in the past. You use
bind variables to replace what would otherwise be a literal value in a
query, and Tcl style string interpolation does not happen. So you
cannot do something like:
set table "baz"
set condition "where foo = bar"
db_foreach my_query { select :table from some_table where :condition }
SQL will not allow a literal to occur where we've put the bind
variables, so the query is syntactically incorrect. You have to
remember that while the bind variable syntax looks similar to variable
interpolation in Tcl, it is not the same thing at all.
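When the structure of the query itself must vary, the only option is ordinary Tcl interpolation for the identifiers, validated against a known-safe list, with bind variables reserved for the literal values. Here is a sketch (the table names and column are hypothetical):

```tcl
# Identifiers (table and column names) cannot be bind variables, so they
# must be checked against a known-safe list and then interpolated.
set table "some_table"
if { [lsearch -exact {some_table some_other_table} $table] == -1 } {
    error "invalid table name: $table"
}

# Literal values, in contrast, are always passed as bind variables.
set condition_value "foo"
db_foreach dynamic_table_query "
    select foo from $table where condition_p = :condition_value
" {
    doc_body_append "<li>foo=$foo\n"
}
```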
Finally, the DB API has several different styles for passing bind
variable values to queries. In general, use the style presented here
because it is the most convenient.
Usage

Every db_* command accepting an SQL statement as an argument
supports bind variables. You can either
Specify the -bind switch with the name of an ns_set containing bind
variable values, or
Specify the -bind switch with a list of alternating bind variable
names and values, or
Not specify a bind variable list at all, in which case local Tcl variables
are used as bind variables.
The default behavior (i.e., if the -bind switch is omitted) is
that these procedures expect to find local variables that correspond in name
to the referenced bind variables, e.g.:
set user_id 123456
set role "administrator"
db_foreach user_group_memberships_by_role {
select g.group_id, g.group_name
from user_groups g, user_group_map map
where g.group_id = map.group_id
and map.user_id = :user_id
and map.role = :role
} {
# do something for each group in which user 123456 has the role
# of "administrator"
}
The value of the local Tcl variable user_id (123456) is bound to
the user_id bind variable.
The -bind switch can take the name of an ns_set
containing keys for each bind variable named in the query, e.g.:
set bind_vars [ns_set create]
ns_set put $bind_vars user_id 123456
ns_set put $bind_vars role "administrator"
db_foreach user_group_memberships_by_role {
select g.group_id, g.group_name
from user_groups g, user_group_map map
where g.group_id = map.group_id
and map.user_id = :user_id
and map.role = :role
} -bind $bind_vars {
# do something for each group in which user 123456 has the role
# of "administrator"
}
Alternatively, as an argument to -bind you can specify a list of
alternating name/value pairs for bind variables:
db_foreach user_group_memberships_by_role {
select g.group_id, g.group_name
from user_groups g, user_group_map map
where g.group_id = map.group_id
and map.user_id = :user_id
and map.role = :role
} -bind [list user_id 123456 role "administrator"] {
# do something for each group in which user 123456 has the role
# of "administrator"
}
Nulls and Bind Variables
When processing a DML statement, Oracle coerces empty strings into
null. (This coercion does not occur in the
WHERE clause of a query, i.e.
col = '' and
col is null are not equivalent.)
As a result, when using bind variables, the only way to make Oracle set a
column value to null is to set the corresponding bind variable
to the empty string, since a bind variable whose value is the string
"null" will be interpreted as the literal string
"null".

These Oracle quirks make it difficult to write clear and abstract
DML. Here is an example that illustrates why:
#
# Given the table:
#
# create table foo (
# bar integer,
# baz varchar(10)
# );
#
set bar ""
set baz ""
db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# the values of the "bar" and "baz" columns in the new row are both
# null, because Oracle has coerced the empty string (even for the
# numeric column "bar") into null in both cases
Since databases other than Oracle do not coerce empty strings into
null, this code has different semantics depending on the
underlying database (i.e., the row that gets inserted may not have null as
its column values), which defeats the purpose of SQL abstraction.
Therefore, the Database Access API provides a database-independent way to
represent null (instead of the Oracle-specific idiom of the
empty string): db_null.

Use it instead of the empty string whenever you want to set a column value
explicitly to null, e.g.:
set bar [db_null]
set baz [db_null]
db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# sets the values for both the "bar" and "baz" columns to null
Sequence Pooling
The database library can transparently maintain pools of sequence values, so
that each request for a new sequence value (using db_nextval)
does not incur a roundtrip to the server. For instance, this functionality is
very useful in the security/sessions library, which very frequently allocates
values from the sec_id_seq sequence. To utilize this
functionality for a particular sequence, register the sequence to be pooled,
either using the db_register_pooled_sequence procedure at server
startup time, or by including a configuration parameter of the form
PoolSequence.sequence_name_seq=count
in any configuration section in the yourservername.ini
file, e.g.:
[ns/server/yourservername/acs/security]
PoolSequence.sec_id_seq=20
The database library will allocate this number of sequence values at server
startup. It will periodically scan pools and allocate new values for
sequences which are less than half-full. (This normally occurs every 60
seconds, and is configurable via the
PooledSequenceUpdateInterval parameter in the
[ns/server/yourservername/acs/database] configuration
section.)
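For instance, pooling for a hypothetical sequence named my_module_id_seq could be enabled at server startup like this:

```tcl
# At server startup (e.g., in a module initialization file), register
# the hypothetical sequence my_module_id_seq with a pool of 20 values.
db_register_pooled_sequence my_module_id_seq 20

# Later requests draw from the in-memory pool when values are available,
# avoiding a round-trip to the database.
set new_id [db_nextval my_module_id_seq]
```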
Basic API
The Database API has several functions that wrap familiar parts of the
AOLserver database API.
Note that you never have to use ns_db anymore (including
ns_db gethandle)! Just start doing stuff, and (if you want) call
db_release_unused_handles when you're done as a hint to
release the database handle.
db_abort_transaction
db_abort_transaction

Aborts all levels of a transaction. That is, if this is called within
several nested transactions, all of them are terminated. Use this instead of
db_dml "abort" "abort transaction".
db_null
db_null
Returns a value which can be used in a bind variable
to represent the SQL value
null.
See Nulls and
Bind Variables above.
db_foreach
db_foreach statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ] \
    code_block [ if_no_rows if_no_rows_block ]
Performs the SQL query sql, executing
code_block
once for each row
with variables set to column values (or a set or array
populated if
-column_array or
-column_set is
specified). If the query returns no rows, executes
if_no_rows_block
(if provided).
Example:
db_foreach select_foo "select foo, bar from greeble" {
doc_body_append "<li>foo=$foo; bar=$bar\n"
} if_no_rows {
doc_body_append "<li>There are no greebles in the database.\n"
}
The code block may contain break statements (which terminate the
loop and flush the database handle) and continue statements
(which continue to the next row of the loop).

db_1row

db_1row statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ]
[ -column_array array_name | -column_set set_name ]
Performs the SQL query sql,
setting variables to column values. Raises an error if the
query does not return exactly 1 row.
Example:
db_1row select_foo "select foo, bar from greeble where greeble_id = :greeble_id"
# Bombs if there's no such greeble!
# Now $foo and $bar are set.
db_0or1row
db_0or1row statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ]
Performs the SQL query
sql.
If a row is returned, sets variables to column values and
returns 1. If no rows are returned, returns 0. If more
than one row is returned, throws an error.
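For example, using the greeble table from the examples above:

```tcl
# Look up a greeble that may or may not exist; db_0or1row returns 1 and
# sets $foo and $bar if a row was found, or returns 0 otherwise.
if { [db_0or1row select_greeble "select foo, bar from greeble where greeble_id = :greeble_id"] } {
    doc_body_append "foo=$foo; bar=$bar\n"
} else {
    doc_body_append "No such greeble.\n"
}
```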
db_nextval

db_nextval sequence-name

Returns the next value for the sequence sequence-name (using a
SQL statement like SELECT sequence-name.nextval FROM
DUAL). If sequence pooling is enabled for the sequence, transparently
uses a value from the pool if available to save a round-trip to the database
(see Sequence Pooling above).
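For example (assuming a hypothetical greeble_id_seq sequence for the greeble table):

```tcl
# Allocate a new primary key value, then use it as a bind variable in
# the subsequent insert.
set greeble_id [db_nextval greeble_id_seq]
set foo "quux"
db_dml greeble_insert "insert into greeble(greeble_id, foo) values(:greeble_id, :foo)"
```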
db_register_pooled_sequence

db_register_pooled_sequence sequence-name pool-size

Registers the sequence sequence-name to be pooled, with a pool
size of pool-size sequence values
(see Sequence Pooling above).
db_string

db_string statement-name sql [ -default default ] [ -bind bind_set_id | -bind bind_value_list ]
Returns the first column of the result of SQL query
sql.
If sql doesn't return a
row, returns
default
(or throws an error if
default is unspecified). Analogous to
database_to_tcl_string and
database_to_tcl_string_or_null.
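A sketch using the greeble table from the earlier examples:

```tcl
# A count(*) query always returns exactly one row, so no -default
# is needed here.
set n_greebles [db_string greeble_count "select count(*) from greeble"]

# This query may return no rows; -default supplies the fallback
# value instead of raising an error.
set foo [db_string select_foo "select foo from greeble where greeble_id = :greeble_id" -default ""]
```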
db_list

db_list statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
Returns a Tcl list of the values in the first column of the result of SQL
query
sql.
If sql doesn't
return any rows, returns an empty list. Analogous to
database_to_tcl_list.
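For example, again using the greeble table:

```tcl
# Build a Tcl list of every foo value in the greeble table.
set foos [db_list select_foos "select foo from greeble"]
foreach foo $foos {
    doc_body_append "<li>$foo\n"
}
```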
db_list_of_lists

db_list_of_lists statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
Returns a Tcl list, each element of which is a list of all column values
in a row of the result of SQL query sql. If
sql doesn't return any rows, returns an empty list.
(Analogous to database_to_tcl_list_list.)
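For example:

```tcl
# Each element of $rows is itself a list of one row's column values.
set rows [db_list_of_lists select_greebles "select foo, bar from greeble"]
foreach row $rows {
    # lindex extracts the individual column values from the row list.
    doc_body_append "<li>foo=[lindex $row 0]; bar=[lindex $row 1]\n"
}
```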
db_dml

db_dml statement-name sql \
[ -bind bind_set_id | -bind bind_value_list ] \
[ -blobs blob_list | -clobs clob_list |
-blob_files blob_file_list | -clob_files clob_file_list ]
Performs the DML or DDL statement sql. If a length-n list of blobs or clobs is provided, then the SQL
should return n blobs or clobs into the bind variables
:1, :2, ... :n.
blobs or clobs, if specified,
should be a list of individual BLOBs or CLOBs to insert;
blob_files or clob_files, if
specified, should be a list of paths to files containing the data to
insert. Only one of -blobs, -clobs,
-blob_files, and -clob_files may be provided.

Example:
db_dml insert_photos "
insert into photos(photo_id, image, thumbnail_image)
values(photo_id_seq.nextval, empty_blob(), empty_blob())
returning image, thumbnail_image into :1, :2
" -blob_files [list "/var/tmp/the_photo" "/var/tmp/the_thumbnail"]
This inserts a new row into the photos table, with the contents
of the files /var/tmp/the_photo and
/var/tmp/the_thumbnail in the image and
thumbnail_image columns, respectively.
db_write_clob, db_write_blob, db_blob_get_file

db_write_clob statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
db_write_blob statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
db_blob_get_file statement-name sql [ -bind bind_set_id | -bind bind_value_list ]

Analogous to ns_ora write_clob/write_blob/blob_get_file.
db_release_unused_handles

db_release_unused_handles

Releases any allocated, unused database handles.

db_transaction

db_transaction code_block [ on_error { code_block } ]
Executes code_block transactionally. Nested
transactions are supported (end transaction is transparently
ns_db dml'ed when the outermost transaction completes). The
db_abort_transaction command can be used to abort all levels of
transactions. It is possible to specify an optional on_error
code block that will be executed if some code in code_block throws
an exception. The variable errmsg will be bound in that scope.
If there is no on_error code, any errors will be propagated. Example:
proc replace_the_foo { col } {
db_transaction {
db_dml "delete from foo"
db_dml "insert into foo(col) values($col)"
}
}
proc print_the_foo {} {
doc_body_append "foo is [db_string "select col from foo"]<br>\n"
}
replace_the_foo 8
print_the_foo ; # Writes out "foo is 8"
db_transaction {
replace_the_foo 14
print_the_foo ; # Writes out "foo is 14"
db_dml "insert into some_other_table(col) values(999)"
...
db_abort_transaction
} on_error {
doc_body_append "Error in transaction: $errmsg"
}
print_the_foo ; # Writes out "foo is 8"
db_resultrows

db_resultrows

Returns the number of rows affected or returned by the previous
statement.
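For example (assuming a hypothetical expired_p flag column on the greeble table):

```tcl
# Delete expired greebles, then report how many rows the DML affected.
db_dml greeble_purge "delete from greeble where expired_p = 't'"
doc_body_append "Purged [db_resultrows] expired greebles.\n"
```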
db_with_handle

db_with_handle var code_block

Places a database handle into the variable var and
executes code_block. This is useful when you don't
want to have to use the new API (db_foreach,
db_1row, etc.), but need to use database handles explicitly.

Example:
proc lookup_the_foo { foo } {
db_with_handle db {
return [db_string unused "select ..."]
}
}
db_with_handle db {
# Now there's a database handle in $db.
set selection [ns_db select $db "select foo from bar"]
while { [ns_db getrow $db $selection] } {
set_variables_after_query
lookup_the_foo $foo
}
}
db_nullify_empty_string
db_nullify_empty_string string

For true SQL purists, we provide the convenience function
db_nullify_empty_string, which returns
[db_null] if its string argument is the empty string
and can be used to encapsulate another Oracle quirk:
set baz ""
# Clean out the foo table
#
db_dml unused "delete from foo"
db_dml unused "insert into foo(baz) values('$baz')"
set n_rows [db_string unused "select count(*) from foo where baz is null"]
#
# $n_rows is 1; in effect, the "baz is null" criterion is matching
# the empty string we just inserted (because of Oracle's coercion
# quirk)
To balance out this asymmetry, you can explicitly set baz to
null by writing:
set baz [db_nullify_empty_string $baz]
db_dml foo_insert "insert into foo(baz) values(:baz)"
Implementation Design (work in progress)

The ideas here are preliminary, so please send feedback to
michael@arsdigita.com. There may well be a much simpler, superior design
that I (Michael) am just missing right now. If so, please let me know!

The basic idea is to translate the logical
statement-name into an actual SQL statement, written in
the appropriate SQL dialect for the RDBMS that is in use. The
sql argument is essentially a convenience that enables
the SQL for the "default dialect" to be written inline. For 3.4, we
will probably use configuration parameters to tell the Database Access API
what the default dialect is and what dialect is actually in use:
[ns/server/server_name/acs]
...
DefaultSQLDialect=oracle8
SQLDialect=postgres7
(An alternative approach would be to use the ACS Package Manager, i.e.,
install a "pseudo-package" with no actual code to indicate what
RDBMS is installed. Then, the Database Access API could query the APM to
figure what SQL dialect to employ.)
For instructing the Database Access API to translate a named statement in
a specific SQL dialect, we may define a new API call:

db_implement_statement statement_location statement_name sql_dialect sql

which would be called at server initialization time. The Database Access API
will then know to use the SQL statement appropriate for the specified
SQLDialect. (The name db_implement_statement is
very tentative.)

Issues:
Is making the caller of db_implement_statement explicitly
specify the statement location (e.g., "/bboard/q-and-a") too much
of a pain? Can we make this more convenient somehow?
In the case that the inline SQL is not in the specified
SQLDialect, reading the rewritten SQL into memory for the life
of the server may not be a good idea. The three basic approaches I can think
of to implement the
db_implement_statement API are:
Cache the rewritten SQL for the appropriate SQL dialect in an
nsv array
Cache the rewritten SQL for the appropriate SQL dialect in a special
database table that we keep pinned in memory
Cache the rewritten SQL for the appropriate SQL dialect in a special
file, maybe even a DBM file
Given the above two issues, should we rethink the
db_implement_statement API
altogether?
One possibility is a file-based approach, where the alternative SQL
statements would live in conventionally named and located files, e.g.,
/bboard/q-and-a.postgres7 would contain Postgres 7 versions of
the SQL statements in /bboard/q-and-a.tcl.) A potential con of
this approach is that the Database Access API would have to perform file I/O
for every SQL statement that's been rewritten. This may be a non-issue; I
don't actually know. (We could augment this approach with caching too,
perhaps a fixed-size LRU cache.)

Another similar approach would be just to have one massive, magic file for
each SQL dialect that maps each statement identifier (location plus name) to
the corresponding statement.

Another larger problem is the fact that this design does not work for
instances where we build a SQL statement based on control flow logic, e.g.,
we sometimes join in an extra table based on the user input. This problem
doesn't mean that the design as a whole is broken; it just means that
this design alone does not get us all the way to full SQL abstraction.
Version 2.1 of the ArsDigita Oracle
Driver adds a set of ns_ora analogs for the following
ns_db calls:
0or1row, 1row,
select, and
dml.
(It also adds ns_ora
array_dml.) Thus, the groundwork for implementing the above API for
ACS/Oracle is already established. ($Id: db-api.xml,v 1.2 2001/12/24 19:31:42 rmello Exp $)