Index: openacs-4/packages/acs-templating/www/doc/index.html =================================================================== RCS file: /usr/local/cvsroot/openacs-4/packages/acs-templating/www/doc/index.html,v diff -u -r1.2 -r1.3 --- openacs-4/packages/acs-templating/www/doc/index.html 10 Sep 2002 13:41:04 -0000 1.2 +++ openacs-4/packages/acs-templating/www/doc/index.html 27 Aug 2003 14:08:12 -0000 1.3 @@ -18,6 +18,11 @@ What the template system should do for you. + + Noquote + A revision in 5.0 that escapes all html codes by default. + + Design Gets more specific and discusses the way the templating system integrates with ACS. Gory details. @@ -35,6 +40,11 @@ Template markup tag reference + + Using Noquote + Upgrading and writing new pages with noquote. + + Developer Guide   API for programming the TCL part of a page Index: openacs-4/packages/acs-templating/www/doc/no-quote-upgrade.html =================================================================== RCS file: /usr/local/cvsroot/openacs-4/packages/acs-templating/www/doc/no-quote-upgrade.html,v diff -u --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ openacs-4/packages/acs-templating/www/doc/no-quote-upgrade.html 27 Aug 2003 14:08:12 -0000 1.1 @@ -0,0 +1,393 @@ + + + + Upgrading existing ADPs to noquote templating + + + +

Upgrading existing ADPs to noquote templating

+ +

Introduction.

+ + The variable substitution in the templating system has been changed to + become more friendly towards quoting. The rationale for the + change and the definition of terms like quoting are present + in the quoting article. As it discusses these + concepts in some depth, we see no reason to repeat them here. + Instead, we will assume that you have read the previous article + and focus on the topic of this one: the changes you need to apply + to make your module conformant to the new quoting rules. + +

+ This text is written as a result of our efforts to make the ACS + installation for the German Bank project work, therefore it is + based on field experience rather than academic discussion. We + hope you will find it useful. + +

Recap of the Theory.

+ + The change to the templating system can be expressed in one + sentence: + +
+ All variables are now quoted by default, except those explicitly + protected by ;noquote. +
+ + This means that the only way your code can fail is if the new code + quotes a variable which is not meant to be quoted. That is where + ;noquote needs to be added, and that is all the porting effort + that is required. + +

+ This is not hard because most variables will not be affected by + this change. Most variables either need to be quoted (those + containing textual data that comes from the database or from the + user) or are unaffected by quoting (numerical database IDs, + etc.) The variables where this behavior is undesired are + those that contain HTML which is expected to be included + as part of the page, and those that are already quoted by + Tcl code. Such variables should be protected from quoting by + the ;noquote modifier. + +

+ +

The Most Common Cases.

+ + The most common cases where you need to add ;noquote to + the variable name are easy to recognize and identify. + +

+ Hidden form variables. +
+ Also known as "hidden input fields", hidden form variables are + form fields with pre-defined values which are not shown to the + user. These days they are used for transferring internal state + across several form pages. In HTML, hidden form variables look + like this: + +

+
+<form>
+  <input type=hidden name=var1 value="value1">
+  <input type=hidden name=var2 value="value2">
+  ... real form stuff ...
+</form>
+      
+
+ + ACS has a convenience function for creating hidden form variables, + export_form_vars. It accepts a list of variables and + returns the HTML code containing the hidden input tags that map + variable names to variable values, as found in the Tcl + environment. In that case, the Tcl code would set the HTML code + to a variable: + +
+
+set form_vars [export_form_vars var1 var2]
+      
+
+ + The ADP will simply refer to the form_vars variable: + +
+
+<form>
+  @form_vars@              <!-- WRONG!  Needs noquote -->
+  ... real form stuff ...
+</form>
+      
+
+ + This will no longer work as intended because form_vars + will be, like any other variable, quoted, and the user will end up + seeing raw HTML text of the hidden variables. Even worse, the + browser will not be aware of these form fields, and the page will + not work. After protecting the variable with ;noquote, + everything works as expected: + +
+
+<form>
+  @form_vars;noquote@
+  ... real form stuff ...
+</form>
+      
+
+ +

+ Snippets of HTML produced by Tcl code, aka + widgets. +
+ Normally we try to fit all HTML code into the ADP template and + have the Tcl code handle the "logic" of the program. And yet, + sometimes pieces of relatively convoluted HTML need to be + included in many templates. In such cases, it makes sense to + generate the widget programmatically and include it into + the template as a variable. A typical widget is a date entry + widget which provides the user the input and selection boxes for + year, month, and day, all of which default to the current date. +

+ Another example of widgets is the context bar often found + on top of ACS pages. +

+ Obviously, all widgets should be treated as HTML and therefore + adorned with the ;noquote qualifier. This also assumes + that the routines that build the widget are correctly + written and that they will quote the components used to + build the widget. + +

+ Pieces of text that are already quoted. +
+ This quoting is usually part of a more general preparation for + HTML rendering of the text. For instance, a bboard posting can + be either HTML or text. If it is HTML, we transmit it as is; if + not, we perform quoting, word-wrapping, etc. In both cases it + is obvious that quoting performed by the templating system would + be redundant, so we must be careful to add ;noquote to + the ADP. + +
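The hidden-variable case above can be sketched in plain Tcl. Both procs below are hypothetical stand-ins written for illustration: sketch_quotehtml is not the ACS ad_quotehtml, and sketch_hidden_vars takes explicit name/value pairs rather than reading variables from the caller's environment the way export_form_vars is described as doing. The sketch shows a correctly written widget that quotes its own components, and what the template would do to it without ;noquote.

```tcl
# Hypothetical stand-in for an HTML-quoting routine (not the ACS code).
proc sketch_quotehtml {text} {
    set map [list "&" "&amp;" "<" "&lt;" ">" "&gt;" "\"" "&quot;" "'" "&#39;"]
    return [string map $map $text]
}

# Hypothetical widget builder: one hidden input per name/value pair.
# The components are quoted here, so the result is finished HTML.
proc sketch_hidden_vars {pairs} {
    set html ""
    foreach {name value} $pairs {
        append html [format {<input type="hidden" name="%s" value="%s">} \
            [sketch_quotehtml $name] [sketch_quotehtml $value]] "\n"
    }
    return $html
}

set form_vars [sketch_hidden_vars [list forum_name {0 < 1} page 2]]

# What @form_vars;noquote@ should emit: the markup as built.
puts $form_vars

# What a plain @form_vars@ would emit under quote-by-default:
# the tags themselves are escaped and show up as text on the page.
puts [sketch_quotehtml $form_vars]
```

The second output begins with `&lt;input`, which a browser displays as literal text instead of treating it as a form field.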

The property and include Gotchas.

+ + Transfer of parameters between included ADPs often requires manual + addition of ;noquote. Let's review why. +

+ The property tag is used to pass a piece of information + to the master template. This is used by the ADP whose writer + consciously chose to let the master template handle a variable + given by the Tcl code. Typically page titles, headings, and + context bars are handled this way. For example: + +

+ master: +
+<head>
+  <title>@title@</title>
+</head>
+<body bgcolor="#ffffff">
+  <h1>@heading@</h1>
+  <slave>
+</body>
+      
+ slave: +
+<master>
+<property name="title">@title@</property>
+<property name="heading">@title@</property>
+...
+      
+
+ + The obvious intention of the master is to allow its slave + templates to provide a "title" and a "heading" of the page in a + standardized fashion. The obvious intention of our slave template + is to allow its corresponding Tcl code to set a single variable, + title, which will be used for both title and heading. + What's wrong with this code? + +

+ The problem is that title gets quoted twice, once by + the slave template, and once by the master template. This is + the result of how the templating system works: every + occurrence of @variable@ is converted to + [ad_quotehtml $variable], even when it + is used only to set a property and you would expect the quoting + to be suppressed. + +

+

+ + Implementation note: Ideally, the templating system should + avoid this pitfall by quoting the variable (or not) only once, + at the point where the value is passed from the Tcl code to + the templating system. However, no such point in time exists + because what in fact happens is that the template gets compiled + into code that simply takes what it needs from the + environment and then does the quoting. Properties are + passed to the master so that all the property variables are + shoved into an environment; by the time the master template is + executed, all information on which variable came from where + and whether it might have already been quoted is lost. + +
+ +

+ This occurrence is often referred to as over-quoting. + Over-quoting is sometimes hard to detect because things seem to + work fine in most cases. To notice the problem in the example + above (and in any other over-quoting example), the title needs + to contain one of the characters <, > or + &. If it does, they will appear quoted to the user + instead of appearing as-is. + +
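Over-quoting is easy to reproduce outside the templating system. The sketch below uses a hypothetical sketch_quotehtml proc (a stand-in, not the ACS ad_quotehtml) and simply applies it twice, once for the slave and once for the master.

```tcl
# Hypothetical stand-in for an HTML-quoting routine (not the ACS code).
proc sketch_quotehtml {text} {
    set map [list "&" "&amp;" "<" "&lt;" ">" "&gt;" "\"" "&quot;" "'" "&#39;"]
    return [string map $map $text]
}

set title {Q&A: is 0 < 1?}

# Quoted once, as the master alone would do it.
set once [sketch_quotehtml $title]
# Quoted a second time, as happens when the slave has already quoted it.
set twice [sketch_quotehtml $once]

puts $once    ;# Q&amp;A: is 0 &lt; 1?
puts $twice   ;# Q&amp;amp;A: is 0 &amp;lt; 1?
```

A browser renders the first string as the original title, and the second as the visible text `Q&amp;A: is 0 &lt; 1?`. A title without any of the special characters passes through both rounds unchanged, which is why the bug often goes unnoticed.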

+ Over-quoting is resolved by adding ;noquote to one of + the variables. We strongly recommend that you add + ;noquote inside the property tag rather than + in the master. First, conceptually the master is the one that "shows" the + variable, so it makes sense that it gets to quote it. Secondly, + a property tag is supposed to merely transfer + a piece of text to the master; it is much cleaner and more + maintainable if this transfer is defined to be non-lossy. This + becomes important in practice when there is a hierarchy of + master templates -- e.g. one for the package and one + for the whole site. + +

+ To reiterate, a bug-free version of the slave template looks + like this: + +

+ slave sans over-quoting: +
+<master>
+<property name="title">@title;noquote@</property>
+<property name="heading">@title;noquote@</property>
+...
+      
+
+ +

+ The exact same problem occurs when the include statement + passes some text. Here is an example: + +

+ Including template: +
+<include src="user-kick-form" id=@kicked_id@ reason=@default_reason@>
+      
+ Included template: +
+<form action="do-kick" method=POST>
+  Kick user @name@.<br>
+  Reason: <textarea name=reason>@reason@</textarea><br>
+  <input type=submit value="Kick">
+</form>
+      
+
+ + Here an include statement is used to include an HTML form widget + parts of which are defined with Tcl variables $kicked_id and + $default_reason whose values presumably come from the + database. + +

+ What happens is that reason that prefills the + textarea is over-quoted. The reasons are the same as + in the last example: it gets quoted once by the includer, and + the second time by the included page. The fix is also similar: + when you transfer non-constant text to an included page, make + sure to add ;noquote. + +

+ Including template, sans over-quoting: +
+<include src="user-kick-form" id=@kicked_id@ reason=@default_reason;noquote@>
+      
+
+ +

Upgrade Overview.

+ + Upgrading a module to handle the new quoting rules consists of + applying the process mentioned above to every ADP in the module. + Using the knowledge gained above, we can specify exactly what + needs to be done for each template. The items are sorted + approximately by frequency of occurrence of the problem. + +
    +
  1. + Audit the template for variables that export form variables + and add ;noquote to them. +

    +

  2. + More generally, audit the template for variables that are + known to contain HTML, e.g. those that contain widgets or HTML + content provided by the user. Add ;noquote to them. +

    +

  3. + Add ;noquote to variables used inside the + property tag. +

    +

  4. + Add ;noquote to textual variables whose values are + attributes to the include tag. +

    +

  5. + Audit the template for occurrences of + <%= [ad_quotehtml @variable@] %> + and replace them with @variable@. +

    +

  6. + Audit the Tcl code for occurrences of ad_quotehtml. + If it is used to build an HTML component, leave it, but take + note of the variable the result gets saved to. Otherwise, + remove the quoting. +

    +

  7. + Add ;noquote to the "HTML component" variables noted + in the previous step. +
+ + After that, test that the template behaves as it should, and + you're done. + +
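Steps 3 and 4 of the list above lend themselves to a mechanical first pass. The following is only a sketch: the proc name sketch_audit_adp is hypothetical, it assumes each property or include tag sits on a single line, and it reports candidates for review rather than deciding anything (numeric IDs passed to include, for instance, do not need the modifier).

```tcl
# Hypothetical audit helper: returns a list of {line-number variable} pairs
# for @var@ references that appear on a <property> or <include> line
# without the ;noquote modifier.
proc sketch_audit_adp {adp} {
    set hits {}
    set lineno 0
    foreach line [split $adp "\n"] {
        incr lineno
        # Only look at lines that carry a property or include tag.
        if {![regexp -nocase {<(property|include)[ >]} $line]} {
            continue
        }
        # @name@ matches; @name;noquote@ does not, because ";" is not
        # part of the allowed name characters.
        foreach {ref name} [regexp -all -inline {@([A-Za-z0-9_.]+)@} $line] {
            lappend hits [list $lineno $name]
        }
    }
    return $hits
}

set adp {<master>
<property name="title">@title@</property>
<property name="heading">@title;noquote@</property>
<include src="user-kick-form" id=@kicked_id@ reason=@default_reason;noquote@>
<p>@body@</p>}

puts [sketch_audit_adp $adp]
```

For the sample template this reports line 2 (title) and line 4 (kicked_id); the reference on line 5 is ignored because it is ordinary page content that should stay quoted.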

Testing.

+ + Fortunately, most of the problems with automatic quoting are very + easy to diagnose. The most important point for testing is that it + covers as many cases as possible: ideally testing should cover all + the branches in all the templates. But regardless of the quality + of your coverage, it is important to know how to conduct proper + testing for the quoting changes. Here are the cases you need to + watch out for. + + + +
+
Hrvoje Niksic
+ + +Last modified: Mon Oct 7 12:27:47 CEST 2002 + + + Index: openacs-4/packages/acs-templating/www/doc/noquote.html =================================================================== RCS file: /usr/local/cvsroot/openacs-4/packages/acs-templating/www/doc/noquote.html,v diff -u --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ openacs-4/packages/acs-templating/www/doc/noquote.html 27 Aug 2003 14:08:12 -0000 1.1 @@ -0,0 +1,209 @@ + +HTMLQuoting as Part of the Templating System - Requirements

HTMLQuoting as Part of the Templating System - Requirements


The Templating System.

+ Templating systems, as deployed by most web software, serve to + distinguish the programming logic of the system from the + presentation that is output to the user. +

+ Before the introduction of a templating system to ACS, pages were + built by outputting HTML text directly from Tcl code. + Therefore it was hard for a designer or a later reviewer to + change the appearance of the page. "Change the color of the + table? How do I do that when I cannot even find the body tag?" + At this point some suggest to embed Tcl code in the document + rather than the other way around, like PHP does. But it + doesn't solve the problem, because the code is still tightly + coupled with the markup, requiring programmer-level + understanding for every change. The only workable solution is + to try to uncouple the presentation from the programming logic as much as + possible. +

ACS 4.0 addressed the problem by introducing a +custom-written templating system loosely based on the already-present +capabilities of the AolServer, the ADP pages. Unlike the ADP system, +which allowed the coder to register his own tags to encapsulate +often-used functionality, the new templating system came with a +pre-programmed set of tags that performed the basic transformations +needed to process the page, and some additional value. +

Comparing ACS templating to other templating systems, it +is my impression that the former was designed to be useful in real +life rather than minimalistic -- which only makes sense given the +tight deadlines most ArsDigita projects have to face. Besides the if +tag, multiple tag and @variable@ variable substitution, which are +sufficient to implement any template-based page, it also includes +features like including one template in another, customizing site- or +module-wide look using the master templates, directly importing query +results to the template, facilities for building grid-tables, and +more. This utilitarian approach to templating urges us to consider the +quoting issues as an integral part of the system. +

Quoting. +

+ In the context of HTML, we define quoting as transforming text + in such a way that the HTML-rendered version of the transformed + text is identical to the original text. Thus one way to quote + the text "<i>" is to transform it to + "&lt;i&gt;". When a browser renders the transformed + text, entities &lt; and &gt; are converted back to < + and >, which makes the rendered version of the transformation + equal to the original. +

The easiest way to guarantee + correct transformation in all cases is to "escape" ("quote") all + the characters that HTML considers special. In the minimalistic + case, it is enough to transform &, <, and > into their + quoted equivalents, &amp;, &lt;, and &gt; + respectively. For additional usefulness in quoted fields, it's a + good idea to also quote double and single quotes into &quot; + and &#39; respectively. +
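A minimal sketch of this transformation in Tcl follows. The proc name sketch_quotehtml is hypothetical and this is not the ACS implementation; it only applies the five replacements named above.

```tcl
# Hypothetical quoting routine covering &, <, >, double and single quotes.
proc sketch_quotehtml {text} {
    # string map scans the input once from left to right and never
    # rescans its own output, so the "&" introduced by "&lt;" is not
    # escaped a second time within the same call.
    set map [list "&" "&amp;" "<" "&lt;" ">" "&gt;" "\"" "&quot;" "'" "&#39;"]
    return [string map $map $text]
}

puts [sketch_quotehtml {<i>}]            ;# &lt;i&gt;
puts [sketch_quotehtml {AT&T's "best"}]  ;# AT&amp;T&#39;s &quot;best&quot;
```

Quoting the two kinds of quote characters is what makes the result safe to place inside an attribute value such as value="...", not only in running text.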

All of this assumes that the text to be quoted is not +meant to be rendered as HTML in the first place. So if your text +contains "<i>word</i>", and you expect the word to show up +in italic, you should not quote that entire string. However, if word +in fact comes from the database and you don't want it to, for +instance, close the <i> behind your back, you should quote it, +and then enclose it between <i> and </i>. +

The ACS has a procedure that performs HTML quoting, +ad_quotehtml. It accepts the string that needs to be quoted, and +returns the quoted string. In ACS 3.x, properly written code was +expected to call ad_quotehtml every time it published a string to a +web page. For example: +

+doc_body_append "<ul>\n"
+set db [ns_db gethandle]
+set selection [ns_db select $db {SELECT name FROM bboard_forums}]
+while {[ns_db getrow $db $selection]} {
+    set_variables_after_query
+    doc_body_append "<li>Forum: <tt>[ad_quotehtml $name]</tt>\n"
+}
+doc_body_append "</ul>\n"
+

Obviously, this was very error-prone, and more often than not, + the programmers would forget to quote the variables that come + from the database or from the user. This would "usually" work, + but in some cases it would lead to broken pages and even pose a + security problem. For instance, one could imagine a + mathematicians' forum being named "0 < 1", or an HTML + designers' forum being named "The Woes of <h1>".

In + some cases the published variable must not be quoted. Examples + for that are the bboard postings that are posted in HTML, or + variables containing the result of export_form_vars. All in all, + the decision about when to quote had to be made by the + programmer on a case-by-case basis, and many programmers simply + ignored the issue because the resulting code happened to work in + 95% of the cases. +

Then came ACS 4. One hoped that ACS 4, with its advanced templating system, would + provide an easy and obvious solution for the (lack of) quoting + problem. It turned out that this did not happen, partly because + no easy solution exists, and partly because the issue was + ignored or postponed. +

Let's review the ACS 3.x code from + above. The most important change is that it comes in two parts: + the presentation template, and the programming logic code. The + template will look like this: +

+<ul>
+  <multiple name=forums>
+    <li>Forum: <tt>@forums.name@</tt>
+  </multiple>
+</ul>
+

Once you understand the (simple) workings of the multiple tag, + this version strikes you as much more readable than the old + one. But we're not done yet: we need to write the Tcl code that + grabs forum names from the database. The db_multirow proc is + designed exactly for this; it retrieves rows from the database + and assigns variables from each row to template variables in + each pass of a multiple of our choice. +

+db_multirow forums get_forum_names { SELECT name FROM forums }
+

+ At this point the careful reader will wonder at which point the + forum name gets quoted, and if so, how does the templating + system know whether the forum name needs to be quoted or not? + The answer is amazingly blunt: no quoting happens anywhere in + the process. If a forum name contains HTML special characters, + you have a problem. +

There are two remedies for this + situation, and neither is particularly appealing. One can + rewrite the nice db_multirow with a db_foreach loop, manually + create a multirow, and feed it the quoted data in the loop. That + is ugly and error-prone because it is more typing and it + requires you to explicitly name the variables you wish to export + at several points. It is exactly the kind of ugly code that + db_multirow was designed to avoid. +

The alternative approach means less typing, but it's even +uglier in its own subtle way. The trick is to remember that our +templating still supports all the ADP features, including embedding +Tcl code in the template. Thus instead of referring to the multirow +variable with the @forums.name@ variable substitutions, we use +<%= [ad_quotehtml @forums.name@] %>. This works +correctly, but obviously breaks the abstraction barrier between ADP +and Tcl syntaxes. The practical result of breaking the abstraction is +that every occurrence of Tcl code in an ADP template will have to be +painstakingly reviewed and converted once ADPs start being invoked by +Java code rather than Tcl. +

At this point, most programmers simply give up and +don't quote their variables at all +. Quoting is +handled only in the areas where it is really crucial and where not +handling it would cause immediate and visible breakage, such as in the +case of displaying the bodies of bboard articles. This is not +exaggeration; it has been proven by auditing the ACS 4.0, both +manually and through grepping for ad_quotehtml. Strangely, this +otherwise sad fact allows us to deploy a very radical but much more +robust solution to the problem. +

Quote Always, Except When Told Not to. +

+ At the time when we came to realize how serious the quoting + deficiencies of ACS 4.0 were, we were about two weeks away from + the release of a project for the German Bank. There was simply + no time to hunt all the places where a variable needs to be + quoted and implement one of the above quoting tricks. +

While examining the ADPs, we noticed that most substituted + variables fall into one of three categories: +

  1. Those that need to be quoted -- names and + descriptions of objects, and in general stuff that + ultimately comes from the user. + +

  2. Those for which it doesn't make a difference whether + they are quoted or not -- e.g. all the database IDs. + +

  3. Those that must not be quoted -- e.g. exported form + vars stored to a variable. + +

Finally we also remembered the fact that almost none of the + variables are quoted in the current source base.

Our + reasoning went further: if it is a fact that most variables + are not quoted, and if the majority of variables either + require quoting or are not harmed by it, then we are in a much + better position if we make the templating system + +quote all variables + by default! That way + the variables from the first and the second category will be + handled correctly, and the variables from the third category + will need to be marked as noquote to function correctly. But + even those should not be a problem, because HTML code that + ends up quoted in the page is immediately visible, and all you + need to do to fix it is add the marker. +
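The rule can be illustrated with a miniature substitution routine. This is a sketch only, assuming Tcl 8.5 or later; sketch_substitute and sketch_quotehtml are hypothetical names, and the real templating system compiles templates into code rather than scanning them like this.

```tcl
# Hypothetical stand-in for an HTML-quoting routine (not the ACS code).
proc sketch_quotehtml {text} {
    set map [list "&" "&amp;" "<" "&lt;" ">" "&gt;" "\"" "&quot;" "'" "&#39;"]
    return [string map $map $text]
}

# Replace @name@ with the quoted value and @name;noquote@ with the raw
# value. Values come from a dict keyed by variable name.
proc sketch_substitute {template vars} {
    set out ""
    set pos 0
    set re {@([A-Za-z0-9_.]+)(;noquote)?@}
    while {[regexp -indices -start $pos $re $template whole nameIdx modIdx]} {
        lassign $whole from to
        # Copy the literal text that precedes this reference.
        append out [string range $template $pos [expr {$from - 1}]]
        set value [dict get $vars [string range $template {*}$nameIdx]]
        if {[lindex $modIdx 0] == -1} {
            # No modifier present: quote by default.
            append out [sketch_quotehtml $value]
        } else {
            # ;noquote present: pass the value through untouched.
            append out $value
        }
        set pos [expr {$to + 1}]
    }
    append out [string range $template $pos end]
    return $out
}

set vars [dict create \
    title {0 < 1} \
    form_vars {<input type="hidden" name="id" value="7">}]

puts [sketch_substitute {<h1>@title@</h1> @form_vars;noquote@} $vars]
```

The forum-style title comes out as `0 &lt; 1`, while the hidden input is emitted as markup. Dropping ;noquote from the second reference would make the input tag appear as visible text, which is the immediately noticeable failure described above.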

We decided to test whether the idea will work by +attempting to convert our system to work that way. I spent several +minutes making the change to the templating system. Then we went +through all the ADPs and replaced the instances of @foo@ where foo +contained HTML code with @foo;noquote@. +

The change took two people less than one day for the +system that consisted of core ACS 4.0.1, and modules bboard, news, +chat, and bookmarks. (We were also doing other things, so it's hard to +measure correctly.) During two of the following days, we would find a +broken page from time to time, typically by spotting the obviously +visible HTML markup. Such a page would get fixed in a matter of +seconds by appending ;noquote to the name of the offending variable. +

We launched successfully within schedule. +

Porting the quoting changes to the ACS. +

+After some discussion, it was decided that these changes will be +included into the next ACS release. Since the change is incompatible, +it will be announced to module owners and the general +public. Explanation on how to port your existing modules and the +"gotchas" that one can expect follows in a +separate document +. +

The discussion about speed, i.e. benchmarking results +before and after the change, is +also available +. +

+ +Hrvoje Niksic + + +
