The OpenACS Database Access API

By Pete Su and Jon Salz. Modified by Roberto Mello.

Overview

One of OpenACS's great strengths is that code written for it is very close to the database. It is very easy to interact with the database from anywhere within OpenACS, and we have a coherent API for database access which makes this even easier.

More detailed information about the DB api is available at Database Access API.

DB API Examples

The OpenACS database API is meant to save developers from making common mistakes and to provide a more structured syntax for specifying database operations, including transactions. Here's an example of the API.

set count 0
set tcl_var "foo"
set sql {
      	SELECT foo, bar, baz
       FROM some_table, some_other_table
       WHERE some_table.id = some_other_table.id
         and some_table.condition_p = :tcl_var
}

db_transaction {
    db_foreach my_example_query_name $sql {
        lappend rows [list $foo $bar $baz]
        incr count
    }
    foreach row $rows { 
        call_some_proc $foo $bar $baz
    }
}

There are several things to note here:

  1. No explicit code for grabbing and releasing handles. Usage of the Database API implicitly deals with all handle management issues.

  2. The db_transaction command makes the scope of a transaction clear; db_transaction takes the code block argument and automatically runs it in the context of a transaction. If you use something like db_foreach though, you need to make sure that there are no calls in the code block which would take a second db handle since the transaction is only valid for one handle (thats why we build up a list of returned values and call a second proc outside the db_foreach loop).

  3. The command db_foreach writes our old while loop for us.

  4. Every SQL query has a name, which is used in conjunction with .XQL files to support multiple databases.

  5. Finally and most importantly, there API implements bind variables, which we will cover next.

Bind Variables

Bind variables are placeholders for literal values in an SQL query being sent to the server. In the old way, data was generally passed to directly to the DB backend, via Tcl string interpolation. In the example above, the query would look like:

select foo, bar, baz 
from some_table, some_other_table
where some_table.id=some_other_table.id  
and some_table.condition_p = '$foo'

There are a few problems with this:

  1. If the value of $foo is a huge string, then we waste a lot of time in the database server doing useless parsing.

  2. Second, if the literal value contains characters like single quotes, we have to be careful to properly escape them, because not quoting them will lead to surprising errors.

  3. Third, no type checking occurs on the literal value. Finally, if the Tcl variable is passed in or between web forms or otherwise subject to external modification, there is nothing keeping malicious users from setting the Tcl variable to some string that changes the query textually. This type of attack, called SQL smuggling, can be very damaging - entire tables can be exposed or have their contents deleted, for example.

    Another very important reason for using bind variables is performance. Oracle can cache previously parsed queries. If there are values in the where clause, that is how the query is cached. It also performs bind variable susbstitution after parsing the SQL statement. This means that SQL statements that use bind variables will always match (assuming all else is the same) while SQL statements that do not use bind variables will not match unless the values in the statement are exactly the same. This will improve the query cache considerably, which can make the server much more efficient.

What the DB API (in conjuntion with the database drivers implemented for aolserver) do is send the SQL statement to the server for parsing, then bind values to the variables and sends those values along seperately as a second step. This seperate binding step is where the term bind variable comes from.

This split has several advantages. First, type checking happens on the literal. If the column we are comparing against holds numbers, and we send a string, we get a nice error. Second, since string literals are no longer in the query, no extra quoting is required. Third, substitution of bind variables cannot change the actual text of the query, only the literal values in the placeholders. The database API makes bind variables easy to use by hooking them smoothly into the Tcl runtime so you simply provide :tclvar and the value of $tclvar is sent to the backend to actually execute the query.

The database API parses the query and pulls out all the bind variable specifications and replaces them with generic placeholders. It then automatically pulls the values of the named Tcl vars out of the runtime environment of the script, and passes them to the database.

Note that while this looks like a simple syntactic change, it really is very different from how interpolated text queries work. You use bind variables to replace what would otherwise be a literal value in a query, and Tcl style string interpolation does not happen. So you cannot do something like:

set table "baz"
set condition "where foo = bar"

db_foreach my_query { select :table from some_table where :condition }
    

SQL will not allow a literal to occur where we've put the bind variables, so the query is syntactically incorrect. You have to remember that while the bind variable syntax looks similar to variable interpolation in Tcl, It is not the same thing at all.

Finally, the DB API has several different styles for passing bind variable values to queries. In general, use the style presented here because it is the most convenient.

Usage

Every db_* command accepting a SQL command as an argument supports bind variables. You can either

  • Specify the -bind switch to provide a set with bind variable values, or

  • Specify the -bind switch to explicitly provide a list of bind variable names and values, or

  • Not specify a bind variable list at all, in which case Tcl variables are used as bind variables.

The default behavior (i.e., if the -bind switch is omitted) is that these procedures expect to find local variables that correspond in name to the referenced bind variables, e.g.:


set user_id 123456
set role "administrator"

db_foreach user_group_memberships_by_role {
    select g.group_id, g.group_name
    from user_groups g, user_group_map map
    where g.group_id = map.user_id
    and map.user_id = :user_id
    and map.role = :role
} {
    # do something for each group of which user 123456 is in the role
    # of "administrator"
}

      

The value of the local Tcl variable user_id (123456) is bound to the user_id bind variable.

The -bind switch can takes the name of an ns_set containing keys for each bind variable named in the query, e.g.:


set bind_vars [ns_set create]
ns_set put $bind_vars user_id 123456
ns_set put $bind_vars role "administrator"

db_foreach user_group_memberships_by_role {
    select g.group_id, g.group_name
    from user_groups g, user_group_map map
    where g.group_id = map.user_id
    and map.user_id = :user_id
    and map.role = :role
} -bind $bind_vars {
    # do something for each group in which user 123456 has the role
    # of "administrator"
}

      

Alternatively, as an argument to -bind you can specify a list of alternating name/value pairs for bind variables:

	
db_foreach user_group_memberships_by_role {
    select g.group_id, g.group_name
    from user_groups g, user_group_map map
    where g.group_id = map.user_id
    and map.user_id = :user_id
    and map.role = :role
} -bind [list user_id 123456 role "administrator"] {
    # do something for each group in which user 123456 has the role
    # of "administrator"
}

      

Nulls and Bind Variables

When processing a DML statement, Oracle coerces empty strings into null. (This coercion does not occur in the WHERE clause of a query, i.e. col = '' and col is null are not equivalent.)

As a result, when using bind variables, the only way to make Oracle set a column value to null is to set the corresponding bind variable to the empty string, since a bind variable whose value is the string "null" will be interpreted as the literal string "null".

These Oracle quirks complicate the process of writing clear and abstract DML difficult. Here is an example that illustrates why:


#
# Given the table:
#
#   create table foo (
#           bar        integer,
#           baz        varchar(10)
#   );
#

set bar ""
set baz ""

db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# the values of the "bar" and "baz" columns in the new row are both
# null, because Oracle has coerced the empty string (even for the
# numeric column "bar") into null in both cases

      

Since databases other than Oracle do not coerce empty strings into null, this code has different semantics depending on the underlying database (i.e., the row that gets inserted may not have null as its column values), which defeats the purpose of SQL abstraction.

Therefore, the Database Access API provides a database-independent way to represent null (instead of the Oracle-specific idiom of the empty string): db_null.

Use it instead of the empty string whenever you want to set a column value explicitly to null, e.g.:

set bar [db_null]
set baz [db_null]

db_dml foo_create "insert into foo(bar, baz) values(:bar, :baz)"
#
# sets the values for both the "bar" and "baz" columns to null

Sequence Pooling

The database library can transparently maintain pools of sequence values, so that each request for a new sequence value (using db_nextval) does not incur a roundtrip to the server. For instance, this functionality is very useful in the security/sessions library, which very frequently allocates values from the sec_id_seq sequence. To utilize this functionality for a particular sequence, register the sequence to be pooled, either using the db_register_pooled_sequence procedure at server startup time, or by including a configuration parameter of the form


PoolSequence.sequence_name_seq=count

    

in any configuration section in the yourservername.ini file, e.g.,


[ns/server/yourservername/acs/security]
PoolSequence.sec_id_seq=20

    

The database library will allocate this number of sequence values at server startup. It will periodically scan pools and allocate new values for sequences which are less than half-full. (This normally occurs every 60 seconds, and is configurable via the PooledSequenceUpdateInterval parameter in the [ns/server/ yourservername /acs/database] configuration section.)

Basic API

The Database API has several functions that wrap familiar parts of the AOLserver database API.

Note that you never have to use ns_db anymore (including ns_db gethandle)! Just start doing stuff, and (if you want) call db_release_unused_handles when you're done as a hint to release the database handle.

db_abort_transaction
db_abort_transaction
	  

Aborts all levels of a transaction. That is if this is called within several nested transactions, all of them are terminated. Use this insetead of db_dml "abort" "abort transaction".

db_multirow
db_multirow [ -local ] [ -append ] [ -extend column_list ] \
    var-name statement-name sql \
    [ -bind bind_set_id | -bind bind_value_list ] \
    code_block [ if_no_rows if_no_rows_block ]
	

Performs the SQL query sql, saving results in variables of the form var_name:1, var_name:2, etc, setting var_name:rowcount to the total number of rows, and setting var_name:columns to a list of column names.

Each row also has a column, rownum, automatically added and set to the row number, starting with 1. Note that this will override any column in the SQL statement named 'rownum', also if you're using the Oracle rownum pseudo-column.

If the -local is passed, the variables defined by db_multirow will be set locally (useful if you're compiling dynamic templates in a function or similar situations).

You may supply a code block, which will be executed for each row in the loop. This is very useful if you need to make computations that are better done in Tcl than in SQL, for example using ns_urlencode or ad_quotehtml, etc. When the Tcl code is executed, all the columns from the SQL query will be set as local variables in that code. Any changes made to these local variables will be copied back into the multirow.

You may also add additional, computed columns to the multirow, using the -extend { col_1 col_2 ... } switch. This is useful for things like constructing a URL for the object retrieved by the query.

If you're constructing your multirow through multiple queries with the same set of columns, but with different rows, you can use the -append switch. This causes the rows returned by this query to be appended to the rows already in the multirow, instead of starting a clean multirow, as is the normal behavior. The columns must match the columns in the original multirow, or an error will be thrown.

Your code block may call continue in order to skip a row and not include it in the multirow. Or you can call break to skip this row and quit looping.

Notice the nonstandard numbering (everything else in Tcl starts at 0); the reason is that the graphics designer, a non programmer, may wish to work with row numbers.

Example:

db_multirow -extend { user_url } users users_query {
    select user_id first_names, last_name, email from cc_users
} {
    set user_url [acs_community_member_url -user_id $user_id]
}
    

You can also iterate over a multirow after it has been created - check the documentation for template::multirow

For example,

db_multirow assets assets {
  select asset_id,
    from ...
}

..

set asset_id_l [list]
multirow foreach assets {
  lappend asset_id_l $asset_id
}
          

Technically it's equivalent to using a code block on the end of your db_multirow.

db_null
db_null
	  

Returns a value which can be used in a bind variable to represent the SQL value null. See Nulls and Bind Variables above.

db_foreach
db_foreach statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ] \
    code_block [ if_no_rows if_no_rows_block ]
	  

Performs the SQL query sql , executing code_block once for each row with variables set to column values (or a set or array populated if -column_array or column_set is specified). If the query returns no rows, executes if_no_rows_block (if provided).

Example:


db_foreach select_foo "select foo, bar from greeble" {
    doc_body_append "<li>foo=$foo; bar=$bar\n"
} if_no_rows {
    doc_body_append "<li>There are no greebles in the database.\n"
}

	  

The code block may contain break statements (which terminate the loop and flush the database handle) and continue statements (which continue to the next row of the loop).

db_1row
db_1row statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ]
	  

Performs the SQL query sql, setting variables to column values. Raises an error if the query does not return exactly 1 row.

Example:


db_1row select_foo "select foo, bar from greeble where greeble_id = $greeble_id"
# Bombs if there's no such greeble!
# Now $foo and $bar are set.

	  
db_0or1row
db_0or1row statement-name sql [ -bind bind_set_id | -bind bind_value_list ] \
    [ -column_array array_name | -column_set set_name ]
	  

Performs the SQL query sql. If a row is returned, sets variables to column values and returns 1. If no rows are returned, returns 0. If more than one row is returned, throws an error.

db_nextval
db_nextval sequence-name
	  

Returns the next value for the sequence sequence-name (using a SQL statement like SELECT sequence-name.nextval FROM DUAL). If sequence pooling is enabled for the sequence, transparently uses a value from the pool if available to save a round-trip to the database (see Sequence Pooling).

db_register_pooled_sequence
db_register_pooled_sequence sequence-name pool-size
	  

Registers the sequence sequence-name to be pooled, with a pool size of pool-size sequence values (see Sequence Pooling).

db_string
db_string statement-name sql [ -default default ] [ -bind bind_set_id | -bind bind_value_list ]
	  

Returns the first column of the result of SQL query sql. If sql doesn't return a row, returns default (or throws an error if default is unspecified). Analogous to database_to_tcl_string and database_to_tcl_string_or_null.

db_list
db_list statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
	  

Returns a Tcl list of the values in the first column of the result of SQL query sql. If sql doesn't return any rows, returns an empty list. Analogous to database_to_tcl_list.

db_list_of_lists
db_list_of_lists statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
	  

Returns a Tcl list, each element of which is a list of all column values in a row of the result of SQL query sql. If sql doesn't return any rows, returns an empty list. (Analogous to database_to_tcl_list_list.)

db_dml
db_dml statement-name sql \
    [ -bind bind_set_id | -bind bind_value_list ] \
    [ -blobs blob_list | -clobs clob_list |
      -blob_files blob_file_list | -clob_files clob_file_list ]
	  

Performs the DML or DDL statement sql.

If a length-n list of blobs or clobs is provided, then the SQL should return n blobs or clobs into the bind variables :1, :2, ... :n. blobs or clobs, if specified, should be a list of individual BLOBs or CLOBs to insert; blob_files or clob_files, if specified, should be a list of paths to files containing the data to insert. Only one of -blobs, -clobs, -blob_files, and -clob_files may be provided.

Example:


db_dml insert_photos "
        insert photos(photo_id, image, thumbnail_image)
        values(photo_id_seq.nextval, empty_blob(), empty_blob())
        returning image, thumbnail_image into :1, :2
    "  -blob_files [list "/var/tmp/the_photo" "/var/tmp/the_thumbnail"] 

	  

This inserts a new row into the photos table, with the contents of the files /var/tmp/the_photo and /var/tmp/the_thumbnail in the image and thumbnail columns, respectively.

db_write_clob, db_write_blob, db_blob_get_file
db_write_clob statement-name sql [ -bind bind_set_id | -bind bind_value_list ]

db_write_blob statement-name sql [ -bind bind_set_id | -bind bind_value_list ]

db_blob_get_file statement-name sql [ -bind bind_set_id | -bind bind_value_list ]
	  

Analagous to ns_ora write_clob/write_blob/blob_get_file.

db_release_unused_handles
	    db_release_unused_handles
	  

Releases any allocated, unused database handles.

db_transaction
db_transaction code_block [ on_error { code_block } ]
	  

Executes code_block transactionally. Nested transactions are supported (end transaction is transparently ns_db dml'ed when the outermost transaction completes). The db_abort_transaction command can be used to abort all levels of transactions. It is possible to specify an optional on_error code block that will be executed if some code in code_block throws an exception. The variable errmsg will be bound in that scope. If there is no on_error code, any errors will be propagated.

Example:


proc replace_the_foo { col } {
    db_transaction {
        db_dml "delete from foo"
        db_dml "insert into foo(col) values($col)"
    }
}

proc print_the_foo {} {
    doc_body_append "foo is [db_string "select col from foo"]<br>\n"
}

replace_the_foo 8
print_the_foo ; # Writes out "foo is 8"

db_transaction {
    replace_the_foo 14
    print_the_foo ; # Writes out "foo is 14"
    db_dml "insert into some_other_table(col) values(999)"
    ...
    db_abort_transaction
} on_error {
    doc_body_append "Error in transaction: $errmsg"
}
    

print_the_foo ; # Writes out "foo is 8"

	  
db_resultrows
db_resultrows
	  

Returns the number of rows affected or returned by the previous statement.

db_with_handle
db_with_handle var code_block
	  

Places a database handle into the variable var and executes code_block. This is useful when you don't want to have to use the new API (db_foreach, db_1row, etc.), but need to use database handles explicitly.

Example:


proc lookup_the_foo { foo } {
    db_with_handle db {
        return [db_string unused "select ..."]
    }
}

db_with_handle db {
    # Now there's a database handle in $db.
    set selection [ns_db select $db "select foo from bar"]
    while { [ns_db getrow $db $selection] } {
        set_variables_after_query

        lookup_the_foo $foo
    }
}

	  
db_nullify_empty_string
db_nullify_empty_string string
	  

For true SQL purists, we provide the convenience function db_nullify_empty_string, which returns [db_null] if its string argument is the empty string and can be used to encapsulate another Oracle quirk:


set baz ""

# Clean out the foo table
#
db_dml unused "delete from foo"

db_dml unused "insert into foo(baz) values('$baz')"

set n_rows [db_string unused "select count(*) from foo where baz is null"]
#
# $n_rows is 1; in effect, the "baz is null" criterion is matching
# the empty string we just inserted (because of Oracle's coercion
# quirk)

	  

To balance out this asymmetry, you can explicitly set baz to null by writing:


db_dml foo_insert "insert into foo(baz) values(:1)" {[db_nullify_empty_string $baz]}

	  

($Id: db-api.html,v 1.45 2009/07/12 01:08:26 donb Exp $)

Caching Database API Results

The database API allows for direct caching of query results. Repeated calls will return the cached value until it is either explicitly flushed using db_flush_cache, times out (configured the ns_cache is called to create the cache), or another cached query fills the cache, causing older entries to be flushed.

Values returned by a query are cached if you pass the "-cache_key" switch to the database procedure. The switch value will be used as the key in the ns_cache eval call used to execute the query and processing code. The db_flush proc should be called to flush the cache when appropriate. The "-cache_pool" parameter can be used to specify the cache pool to be used, and defaults to db_cache_pool. The size of the default cache is governed by the kernel parameter "DBCacheSize" in the "caching" section.

Currently db_string, db_list, db_list_of_lists, db_1row, db_0or1row, and db_multirow support caching.

For caching to be effective, one must carefully design a cache_pool and cache_key strategy that uniquely identifies a query within the system, including the relevant objects being referenced by the query. Typically a cache_key should include one or more object_ids and a name that identifies the operation being done.

Here is an example from the layout-manager package:


# Query to return the elements of a page as a list. The prefix "page_" is used to denote
# that this is a page-related query, page_id is used to uniquely identify the query
# by object, and the suffix uniquely defines the operation being performed on the
# page object.

db_list -cache_key page_${page_id}_get_elements get_elements {}

# When the contents of a page are changed, we flush all page-related queries for the given
# page object using db_flush_cache.

db_flush_cache -cache_key_pattern page_${page_id}_*

    
View comments on this page at openacs.org