Index: openacs-4/packages/acs-templating/www/doc/index.html
===================================================================
RCS file: /usr/local/cvsroot/openacs-4/packages/acs-templating/www/doc/index.html,v
diff -u -r1.2 -r1.3
--- openacs-4/packages/acs-templating/www/doc/index.html 10 Sep 2002 13:41:04 -0000 1.2
+++ openacs-4/packages/acs-templating/www/doc/index.html 27 Aug 2003 14:08:12 -0000 1.3
@@ -18,6 +18,11 @@
+ This text is written as a result of our efforts to make the ACS
+ installation for the German Bank project work; it is therefore
+ based on field experience rather than academic discussion. We
+ hope you will find it useful.
+
+ All variables are now quoted by default, except those explicitly
+ protected by ;noquote.
+
+ This means that the only way your code can fail is if the new code
+ quotes a variable which is not meant to be quoted, which is where
+ ;noquote needs to be added. That is all the porting effort
+ that is required.
+
+ This is not hard because most variables will not be affected by + this change. Most variables either need to be quoted (those + containing textual data that comes from the database or from the + user) or are unaffected by quoting (numerical database IDs, + etc.) The variables where this behavior is undesired are + those that contain HTML which is expected to be included + as part of the page, and those that are already quoted by + Tcl code. Such variables should be protected from quoting by + the ;noquote modifier. + +
+ +
+ Hidden form variables.
+
+ Also known as "hidden input fields", hidden form variables are
+ form fields with pre-defined values which are not shown to the
+ user. These days they are used for transferring internal state
+ across several form pages. In HTML, hidden form variables look
+ like this:
+
+    <form>
+      <input type=hidden name=var1 value="value1">
+      <input type=hidden name=var2 value="value2">
+      ... real form stuff ...
+    </form>
+
+ ACS has a convenience function for creating hidden form variables,
+ export_form_vars. It accepts a list of variables and
+ returns the HTML code containing the hidden input tags that map
+ variable names to variable values, as found in the Tcl
+ environment. In that case, the Tcl code would assign the HTML code
+ to a variable:
+
+    set form_vars [export_form_vars var1 var2]
+
+ The ADP will simply refer to the form_vars variable:
+
+    <form>
+      @form_vars@ <!-- WRONG! Needs noquote -->
+      ... real form stuff ...
+    </form>
+
+ This will no longer work as intended because form_vars
+ will be, like any other variable, quoted, and the user will end up
+ seeing raw HTML text of the hidden variables. Even worse, the
+ browser will not be aware of these form fields, and the page will
+ not work. After protecting the variable with ;noquote,
+ everything works as expected:
+
+    <form>
+      @form_vars;noquote@
+      ... real form stuff ...
+    </form>
+
+ Snippets of HTML produced by Tcl code, aka
+ widgets.
+
+ Normally we try to fit all HTML code into the ADP template and
+ have the Tcl code handle the "logic" of the program. And yet,
+ sometimes pieces of relatively convoluted HTML need to be
+ included in many templates. In such cases, it makes sense to
+ generate the widget programmatically and include it into
+ the template as a variable. A typical widget is a date entry
+ widget which provides the user the input and selection boxes for
+ year, month, and day, all of which default to the current date.
+
+ Another example of widgets is the context bar often found
+ on top of ACS pages.
+
+ Obviously, all widgets should be treated as HTML and therefore + adorned with the ;noquote qualifier. This also assumes + that the routines that build the widget are correctly + written and that they will quote the components used to + build the widget. + +
+ Pieces of text that are already quoted.
+
+ This quoting is usually part of a more general preparation for
+ HTML rendering of the text. For instance, a bboard posting can
+ be either HTML or text. If it is HTML, we transmit it as is; if
+ not, we perform quoting, word-wrapping, etc. In both cases it
+ is obvious that quoting performed by the templating system would
+ be redundant, so we must be careful to add ;noquote to
+ the ADP.
+
+
+ The property tag is used to pass a piece of information + to the master template. This is used by the ADP whose writer + consciously chose to let the master template handle a variable + given by the Tcl code. Typically page titles, headings, and + context bars are handled this way. For example: + +
+ master:
+
+    <head>
+      <title>@title@</title>
+    </head>
+    <body bgcolor="#ffffff">
+      <h1>@heading@</h1>
+      <slave>
+    </body>
+
+ slave:
+
+    <master>
+    <property name="title">@title@</property>
+    <property name="heading">@title@</property>
+    ...
+
+ The obvious intention of the master is to allow its slave
+ templates to provide a "title" and a "heading" of the page in a
+ standardized fashion. The obvious intention of our slave template
+ is to allow its corresponding Tcl code to set a single variable,
+ title, which will be used for both title and heading.
+ What's wrong with this code?
+
+ The problem is that title gets quoted twice, once by + the slave template, and once by the master template. This is + the result of how the templating system works: every + occurrence of @variable@ is converted to + [ad_quotehtml $variable], even when it + is used only to set a property and you would expect the quoting + to be suppressed. + +
+
+ + Implementation note: Ideally, the templating system should + avoid this pitfall by quoting the variable (or not) only once, + at the point where the value is passed from the Tcl code to + the templating system. However, no such point in time exists + because what in fact happens is that the template gets compiled + into code that simply takes what it needs from the + environment and then does the quoting. Properties are + passed to the master so that all the property variables are + shoved into an environment; by the time the master template is + executed, all information on which variable came from where + and whether it might have already been quoted is lost. + ++ +
+ This occurrence is often referred to as over-quoting. + Over-quoting is sometimes hard to detect because things seem to + work fine in most cases. To notice the problem in the example + above (and in any other over-quoting example), the title needs + to contain one of the characters <, > or + &. If it does, they will appear quoted to the user + instead of appearing as-is. + +
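Over-quoting can be reproduced outside the templating system. The following is a minimal sketch in plain Tcl, assuming a hypothetical helper `sketch_quotehtml` that stands in for ad_quotehtml (the real procedure may differ in detail). It applies the quoting once, as intended, and then a second time, as happens when both the slave and the master quote the same value.

```tcl
# Hypothetical stand-in for ad_quotehtml; only the three basic characters.
proc sketch_quotehtml {text} {
    # string map scans the input once, so its own output is not re-quoted
    return [string map {& &amp; < &lt; > &gt;} $text]
}

set title {Q&A}
set once  [sketch_quotehtml $title]  ;# quoted once: the browser shows Q&A
set twice [sketch_quotehtml $once]   ;# quoted twice: the browser shows Q&amp;A

puts $once
puts $twice
```

A title without any of the special characters passes through both calls unchanged, which is why the problem tends to go unnoticed.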
+ Over-quoting is resolved by adding ;noquote to one of + the variables. We strongly recommend that you add + ;noquote inside the property tag rather than + in the master. The reason is that, first, it makes sense to do + so because conceptually the master is the one that "shows" the + variable, so it makes sense that it gets to quote it. Secondly, + a property tag is supposed to merely transfer + a piece of text to the master; it is much cleaner and more + maintainable if this transfer is defined to be non-lossy. This + becomes important in practice when there is a hierarchy of + master templates -- e.g. one for the package and one + for the whole site. + +
+ To reiterate, a bug-free version of the slave template looks + like this: + +
+ slave sans over-quoting:
+
+    <master>
+    <property name="title">@title;noquote@</property>
+    <property name="heading">@title;noquote@</property>
+    ...
+
+ The exact same problem occurs when the include statement
+ passes some text. Here is an example:
+
+ Including template:
+
+    <include src="user-kick-form" id=@kicked_id@ reason=@default_reason@>
+
+ Included template:
+
+    <form action="do-kick" method=POST>
+      Kick user @name@.<br>
+      Reason: <textarea name=reason>@reason@</textarea><br>
+      <input type=submit value="Kick">
+    </form>
+
+ Here an include statement is used to include an HTML form widget
+ parts of which are defined with Tcl variables $kicked_id and
+ $default_reason whose values presumably come from the
+ database.
+
+ What happens is that reason that prefills the + textarea is over-quoted. The reasons are the same as + in the last example: it gets quoted once by the includer, and + the second time by the included page. The fix is also similar: + when you transfer non-constant text to an included page, make + sure to add ;noquote. + +
+ Including template, sans over-quoting:
+
+    <include src="user-kick-form" id=@kicked_id@ reason=@default_reason;noquote@>
+
+
+
+
+
+
+
+
+ To get rid of over-quoting, make sure that the variables + don't get quoted in transport, such as in the + property tag or as an attribute of the + include tag. Also, make sure that your Tcl code is + not quoting the variable name. + +
+ To get rid of under-quoting, make sure that your variable + gets quoted exactly once. This can be achieved either by + removing a (presumably overzealous) ;noquote or by + quoting the string from Tcl. The latter is necessary when + building HTML components, such as a context bar, from + strings that come from the database or from the user. +
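As an illustration of quoting the string from Tcl, here is a minimal sketch of a widget builder. Both `sketch_quotehtml` (a stand-in for ad_quotehtml) and `sketch_context_bar` are hypothetical names, and the " : " separator is an assumption made for the example. The builder quotes each component itself, so the finished HTML can be placed in the ADP with ;noquote and every piece of user-supplied text is still quoted exactly once.

```tcl
# Hypothetical stand-in for ad_quotehtml.
proc sketch_quotehtml {text} {
    return [string map {& &amp; < &lt; > &gt; \" &quot;} $text]
}

# Build a context bar from a list of {url label} pairs.
# Labels and URLs are quoted here; the markup around them is not.
proc sketch_context_bar {links} {
    set parts {}
    foreach link $links {
        lassign $link url label
        lappend parts "<a href=\"[sketch_quotehtml $url]\">[sketch_quotehtml $label]</a>"
    }
    return [join $parts " : "]
}

set context_bar [sketch_context_bar {{/ Home} {/forums/ {Q&A <beta>}}}]
puts $context_bar
```

In the template, the result would then be referenced as @context_bar;noquote@.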
+ Templating systems, as deployed by most web software, serve to + distinguish the programming logic of the system from the + presentation that is output to the user. +
+ Before the introduction of a templating system to ACS, pages were
+ built by outputting HTML text directly from Tcl code.
+ Therefore it was hard for a designer or a later reviewer to
+ change the appearance of the page. "Change the color of the
+ table? How do I do that when I cannot even find the body tag?"
+ At this point some suggest embedding Tcl code in the document
+ rather than the other way around, as PHP does. But that
+ doesn't solve the problem, because the code is still tightly
+ coupled with the markup, requiring programmer-level
+ understanding for every change. The only workable solution is
+ to try to uncouple the presentation from the code as much as
+ possible.
ACS 4.0 addressed the problem by introducing a +custom-written templating system loosely based on the already-present +capabilities of the AolServer, the ADP pages. Unlike the ADP system, +which allowed the coder to register his own tags to encapsulate +often-used functionality, the new templating system came with a +pre-programmed set of tags that performed the basic transformations +needed to process the page, and some additional value. +
Comparing ACS templating to other templating systems, it
+is my impression that the former was designed to be useful in real
+life rather than minimalistic -- which only makes sense given the
+tight deadlines most ArsDigita projects have to face. Besides the if
+tag, multiple tag and @variable@ variable substitution, which are
+sufficient to implement any template-based page, it also includes
+features like including one template in another, customizing site- or
+module-wide look using the master templates, directly importing query
+results to the template, facilities for building grid-tables, and
+more. This utilitarian approach to templating urges us to consider the
+quoting issues as an integral part of the system.
+ In the context of HTML, we define quoting as transforming text
+ in such a way that the HTML-rendered version of the transformed
+ text is identical to the original text. Thus one way to quote
+ the text "<i>" is to transform it to
+ "&lt;i&gt;". When a browser renders the transformed
+ text, entities &lt; and &gt; are converted back to <
+ and >, which makes the rendered version of the transformation
+ equal to the original.
The easiest way to guarantee
+ correct transformation in all cases is to "escape" ("quote") all
+ the characters that HTML considers special. In the minimalistic
+ case, it is enough to transform &, <, and > into their
+ quoted equivalents, &amp;, &lt;, and &gt;
+ respectively. For additional usefulness in quoted fields, it's a
+ good idea to also quote double and single quotes into &quot;
+ and &#39; respectively.
All of this assumes that the text to be quoted is not +meant to be rendered as HTML in the first place. So if your text +contains "<i>word</i>", and you expect the word to show up +in italic, you should not quote that entire string. However, if word +in fact comes from the database and you don't want it to, for +instance, close the <i> behind your back, you should quote it, +and then enclose it between <i> and </i>. +
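The transformation described above can be sketched in a few lines of plain Tcl. The procedure name `sketch_quotehtml` is hypothetical; it only illustrates the character mapping and is not claimed to match the actual implementation of any ACS procedure. The last lines show the italic case: the word is quoted, the surrounding markup is not.

```tcl
# Hypothetical helper: escape the characters HTML considers special.
proc sketch_quotehtml {text} {
    # string map makes a single pass over the input, so the "&" that it
    # emits as part of an entity is never escaped a second time.
    return [string map {& &amp; < &lt; > &gt; \" &quot; ' &#39;} $text]
}

# A word from the database that tries to close the <i> tag itself.
set word {a </i> b}
set html "<i>[sketch_quotehtml $word]</i>"
puts $html
```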
The ACS has a procedure that performs HTML quoting, +ad_quotehtml. It accepts the string that needs to be quoted, and +returns the quoted string. In ACS 3.x, properly written code was +expected to call ad_quotehtml every time it published a string to a +web page. For example: +
+doc_body_append "<ul>\n"
+set db [ns_db gethandle]
+set selection [ns_db select $db {SELECT name FROM bboard_forums}]
+while {[ns_db getrow $db $selection]} {
+    set_variables_after_query
+    doc_body_append "<li>Forum: <tt>[ad_quotehtml $name]</tt>\n"
+}
+doc_body_append "</ul>\n"
Obviously, this was very error-prone, and more often than not, + the programmers would forget to quote the variables that come + from the database or from the user. This would "usually" work, + but in some cases it would lead to broken pages and even pose a + security problem. For instance, one could imagine a + mathematicians' forum being named "0 < 1", or an HTML + designers' forum being named "The Woes of <h1>".
In
+ some cases the published variable must not be quoted. Examples
+ of that are the bboard postings that are posted in HTML, or
+ variables containing the result of export_form_vars. All in all,
+ the decision about when to quote had to be made by the
+ programmer on a case-by-case basis, and many programmers simply
+ ignored the issue because the resulting code happened to work in
+ 95% of the cases.
Then came ACS 4. One hoped that ACS 4, with its advanced templating system, would + provide an easy and obvious solution for the (lack of) quoting + problem. It turned out that this did not happen, partly because + no easy solution exists, and partly because the issue was + ignored or postponed. +
Let's review the ACS 3.x code from + above. The most important change is that it comes in two parts: + the presentation template, and the programming logic code. The + template will look like this: +
+<ul>
+  <multiple name=forums>
+    <li>Forum: <tt>@forums.name@</tt>
+  </multiple>
+</ul>
Once you understand the (simple) workings of the multiple tag, + this version strikes you as much more readable than the old + one. But we're not done yet: we need to write the Tcl code that + grabs forum names from the database. The db_multirow proc is + designed exactly for this; it retrieves rows from the database + and assigns variables from each row to template variables in + each pass of a multiple of our choice. +
+db_multirow forums get_forum_names {
+    SELECT name FROM forums
+}
+ At this point the careful reader will wonder at which point the + forum name gets quoted, and if so, how does the templating + system know whether the forum name needs to be quoted or not? + The answer is amazingly blunt: no quoting happens anywhere in + the process. If a forum name contains HTML special characters, + you have a problem. +
There are two remedies for this + situation, and neither is particularly appealing. One can + rewrite the nice db_multirow with a db_foreach loop, manually + create a multirow, and feed it the quoted data in the loop. That + is ugly and error-prone because it is more typing and it + requires you to explicitly name the variables you wish to export + at several points. It is exactly the kind of ugly code that + db_multirow was designed to avoid. +
The alternative approach means less typing, but it's even
+uglier in its own subtle way. The trick is to remember that our
+templating still supports all the ADP features, including embedding
+Tcl code in the template. Thus instead of referring to the multirow
+variable with the @forums.name@ variable substitutions, we use
+<%= [ad_quotehtml @forums.name@] %>. This works
+correctly, but obviously breaks the abstraction barrier between ADP
+and Tcl syntaxes. The practical result of breaking the abstraction is
+that every occurrence of Tcl code in an ADP template will have to be
+painstakingly reviewed and converted once ADPs start being invoked by
+Java code rather than Tcl.
At this point, most programmers simply give up and
+don't quote their variables at all. Quoting is
+handled only in the areas where it is really crucial and where not
+handling it would cause immediate and visible breakage, such as in the
+case of displaying the bodies of bboard articles. This is not an
+exaggeration; it has been proven by auditing ACS 4.0, both
+manually and through grepping for ad_quotehtml. Strangely, this
+otherwise sad fact allows us to deploy a very radical but much more
+robust solution to the problem.
+ At the time when we came to realize how serious the quoting + deficiencies of ACS 4.0 were, we were about two weeks away from + the release of a project for the German Bank. There was simply + no time to hunt all the places where a variable needs to be + quoted and implement one of the above quoting tricks. +
While examining the ADPs, we noticed that most substituted
+ variables fall into one of three categories:
Those that need to be quoted -- names and + descriptions of objects, and in general stuff that + ultimately comes from the user. + +
Those for which it doesn't make a difference whether + they are quoted or not -- e.g. all the database IDs. + +
Those that must not be quoted -- e.g. exported form + vars stored to a variable. + +
Finally we also remembered the fact that almost none of the + variables are quoted in the current source base.
Our + reasoning went further: if it is a fact that most variables + are not quoted, and if the majority of variables either + require quoting or are not harmed by it, then we are in a much + better position if we make the templating system + +quote all variables + by default! That way + the variables from the first and the second category will be + handled correctly, and the variables from the third category + will need to be marked as noquote to function correctly. But + even those should not be a problem, because HTML code that + ends up quoted in the page is immediately visible, and all you + need to do to fix it is add the marker. +
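The quote-by-default rule can be illustrated with a small stand-alone sketch in plain Tcl (8.5 or later). This is not how the templating system is implemented: the real system compiles templates into Tcl code, whereas this example substitutes values with a regular expression at render time. The names `sketch_quotehtml` and `sketch_render` are hypothetical, and variable values are passed in as a dict for the sake of a self-contained example.

```tcl
# Hypothetical stand-in for ad_quotehtml.
proc sketch_quotehtml {text} {
    return [string map {& &amp; < &lt; > &gt; \" &quot;} $text]
}

# Replace @name@ and @name;noquote@ in $template with values from the
# dict $vars. Plain references are quoted; ;noquote references are not.
proc sketch_render {template vars} {
    set out ""
    set pos 0
    set re {@([a-zA-Z0-9_.]+)(;noquote)?@}
    while {[regexp -indices -start $pos $re $template match name flag]} {
        lassign $match mstart mend
        # copy the literal text in front of the reference
        append out [string range $template $pos [expr {$mstart - 1}]]
        set value [dict get $vars [string range $template {*}$name]]
        if {[lindex $flag 0] == -1} {
            # no ;noquote modifier was present: quote by default
            set value [sketch_quotehtml $value]
        }
        append out $value
        set pos [expr {$mend + 1}]
    }
    append out [string range $template $pos end]
    return $out
}

set vars [dict create name {0 < 1} form_vars {<input type=hidden name=a value="1">}]
puts [sketch_render {<tt>@name@</tt> @form_vars;noquote@} $vars]
```

Variables of the first two categories need no change in the template; only the third category needs the explicit marker.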
We decided to test whether the idea will work by +attempting to convert our system to work that way. I spent several +minutes making the change to the templating system. Then we went +through all the ADPs and replaced the instances of @foo@ where foo +contained HTML code with @foo;noquote@. +
The change took two people less than one day for the
+system that consisted of core ACS 4.0.1, and modules bboard, news,
+chat, and bookmarks. (We were also doing other things, so it's hard to
+measure correctly.) During two of the following days, we would find a
+broken page from time to time, typically by spotting the obviously
+visible HTML markup. Such a page would get fixed in a matter of
+seconds by appending ;noquote to the name of the offending variable.
We launched successfully within schedule. +
+After some discussion, it was decided that these changes will be +included into the next ACS release. Since the change is incompatible, +it will be announced to module owners and the general +public. Explanation on how to port your existing modules and the +"gotchas" that one can expect follows in a +separate document +. +
The discussion about speed, i.e. benchmarking results +before and after the change, is +also available +. +
+ +Hrvoje Niksic + + +