Timing a Templated Page

I. Introduction

One of the requirements for the template system asks for efficiency:

110.0 Performance
The Templating System must not cause any performance problems to a site. It must be fast and efficient, and it must not slow down page load speed by more than 10% versus a TCL page with inline HTML.

This page documents the attempt to verify this requirement.

II. Methods

I wrote a sample page for this test. It expands four real numbers into continued fractions. I created three versions:

chain-frac-0, a TCL page with inline HTML,
chain-frac-1, a templated page, i.e. a TCL and an HTML file, and
chain-frac-2, an ADP page that simply <include>s chain-frac-1.

The reason for creating chain-frac-2.adp is that in this way, the script chain-frac-1.tcl is handled inside the templating system, and hence loaded once and cached. There is hope that this might be faster.

Normally, the templating system re-reads a file whenever the version on disk is out of date. ADP pages are compiled to TCL, and both ADP and TCL pages are cached as TCL procs. The parameter RefreshCache in section template can be set to always or never to affect the cacheing strategy; the latter may be useful for a production site. All timing is carried out for the three settings always, normal, and never; the associated variable is called check.

I created a script in e-Tester that requests the three pages from my development server on dev0103-001. One timing of requesting a page isn't very expressive. A lot of factors affect the page load time. To offset these and get better results, I repeatedly request the pages. For the timing, I have e-Tester iterate this script 200 times. To compesate for varying load on the machine, i ran the iteration twice for each setting of RefreshCache at different times of the day.

The timing information is taken from the error log file entries that the request processor produces with parameter LogDebugP=1. For finer granularity I changed rp_debug to divide the clock clicks (microsecond) difference by 1000.0 instead of 1000. Delivering a page gives us 15 log file entries with timestamps. I treat the first one (? ms) as 0. There must be no other page requests from this AOLserver during the measurement. I note the length of the error log before and after one run of the script. Afterwards I cut out the error log sections indicated by these positions into files never, normal, and always.

The following steps extract the relevant information and bring it in a form suitable for gnuplot.

Extract time from log file sections. This is done in tcsh.

foreach i ( never normal always )
  fgrep '] Notice: RP (' $i                        >  $i-0
  echo                                             >  $i-1
  foreach conn (`cut -d- -f2 always-0 | sort -u`)
    echo "$i '$conn'"
    fgrep "[-$conn-]" $i-0 | cut -d: -f5-          >> $i-1
  end
  cut -d" " -f3 $i-1| cut -c2-| tr \? 0            >  $i-2
end

The three resulting files, ending in -2, contain 18000 numbers, for 2 runs * 200 tries * 3 pages * 15 stages.

Group and sort times. The time one stage of processing takes is given by the difference of two time adjacent entries. This defines stage00 to stage13; I keep the total time in stage14. This is done by the TCL script dev0103-001:/home/brech/prog/tcl/timing.tcl, which generates the directories stage00 to stage14 and the ten files {0,1,2}-{never,normal,always}, and t_max in each of them. Each of the former files contains the 2*200 samples, ordered for graphing.
III. Presentation

Color Codes

The different experiments are color coded as follows.

never normal always

chain-frac-0

chain-frac-1

chain-frac-2

(They lie in the isoluminant plane in the middle of the RGB color space.)
Presenting Distributions

A number of statistic measures can summarize an ensemble of data, in our case, the 400 timings. The average is affected by outliers; the median is much more robust. Maybe some dispersion measure would be of interest, too. Some people plot histograms, but that approach is frought with its own problems. I chose to use all the data and plot estimated distribution functions. Our random variable being time T, its distribution function is defined as
F_T(t) = P[T <= t] . It is sometimes referred to as cumulative density function. Its deriviative p(t) = F'(t) is the distribution density of the random variable T, which histograms approximate.

The curve always increases monotonically from 0 to 1. In case of a normal distribution, you get the erf shape; for a uniform distribution it is a ramp. This form of presentation throws away no information, and it shows all about the distribution of a single variable. I am pretty sure the times that different stages of one request take are statistically dependent, but I don't consider that for the time being. The median is the abscissa t at which the ordinate F(t)=1/2.

The curves for all nine experiments are drawn in the same graph for comparison. Note that the abscissa is scaled very differently for various stages.

Steps
- I scp the stage?? directories to my linux box, where gnuplot is installed. Another approach would be to install gnuplot on the machine that runs the server, i.e., dev0103-001.
- The csh script plot-all goes into each stage?? subdirectory and runs the gnuplot script distrib.gplt:
```
   #! /bin/csh
   foreach i (stage[01][0-9])
     (cd $i; gnuplot ../distrib.gplt > $i.gif)
     echo $i done.
   end
	
```
IV. Results
- graphs from dev0103-001, approximately acs-4-0-beta-R20001001.
- graphs from reusia.boston, approximately acs-4-0-beta-2-R20001009.
- graphs from reusia.boston, from ACS 3.4.6 tarball. This comparison is not completely fair.
V. Conclusion

Currently, the template system doesn't meet the performance requirement.

Earlier on dev0103-001, templated pages loaded fast enough. Although the processing stage seems to a lot be more than 10% slower, the overall performance is rather increased than slowed by templating.

On reusia.boston, we had a much better performance of the request processor. The processing times of the pages proper (stage 10 before, 11 now) didn't change much; we just got clearer results. The total processing time of TCL-only pages is around 155ms, while templated pages take around 220ms, that is 42% longer (or TCL-only pages take 30% less).

Relative times depend on the other components of the pipeline. The difference of 65ms is a large percentage of a total serving time of 155ms; when other parts of the system (e.g., the request processor) were slower, this wasn't that noticeable.

For ACS 3.4, TCL-only chain-frac-0 pages take 115ms, where the templated versisons are much slower, 320ms for chain-frac-1 and 340 for -2.

VI. Further Work

Tune templating in ACS 4.0.

Christian Brechbühler
Last modified: Tue Oct 17 20:11:49 EDT 2000

	never	normal	always
chain-frac-0
chain-frac-1
chain-frac-2