Index: openacs-4/packages/acs-lang/www/doc/i18n-design.adp
===================================================================
RCS file: /usr/local/cvsroot/openacs-4/packages/acs-lang/www/doc/i18n-design.adp,v
diff -u -N -r1.1.2.1 -r1.1.2.2
--- openacs-4/packages/acs-lang/www/doc/i18n-design.adp 20 Aug 2015 17:43:22 -0000 1.1.2.1
+++ openacs-4/packages/acs-lang/www/doc/i18n-design.adp 25 Aug 2015 18:02:07 -0000 1.1.2.2
@@ -2,16 +2,20 @@
For Internationalization to be effective, it needs to be
integrated into every module in the system. Thus making the
@@ -24,6 +28,7 @@
on matching of template files for locales. A set of unit tests are included in the acs-lang
package, to allow automatic testing after installation. We will refer to a Locale by a combination of a language
and country. In the
Java Locale API there is an optional variant which can
-be added to a locale, which we will omit in the Tcl API. The language is a valid ISO Language Code. These
+be added to a locale, which we will omit in the Tcl API. The language is a valid ISO Language Code. These
codes are the lower-case two-letter codes as defined by ISO-639.
You can find a full list of these codes at a number of sites, such
as: The country is a valid ISO Country Code. These
+ The country is a valid ISO Country Code. These
codes are the upper-case two-letter codes as defined by ISO-3166.
You can find a full list of these codes at a number of sites, such
as: Examples are The i18n module figures out the locale for a current request
-makes it accessible via the ad_locale function:ACS 4 Globalization Detailed Design
+
by Henry Minsky
I. Essentials
+
When applicable, each of the following items should receive its own
link:
II. Introduction
III. Historical Considerations
V. Design Tradeoffs
+
+II. Introduction
+III. Historical Considerations
+V. Design Tradeoffs
+
+
Areas of interest to developers:
VI. API
VI.A Locale API
10.30 A Locale object represents a specific geographical,
+
+VI. API
+VI.A Locale API
+10.30
+ A Locale object represents a specific geographical,
political, or cultural region. An operation that requires a Locale
to perform its task is called locale-sensitive and uses the Locale
to tailor information for the user. For example, displaying a
@@ -45,27 +54,36 @@
http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt
-
http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html
-
-en_US English US
ja_JP Japanese
fr_FR France French.
+
Examples are
++en_US English US+
ja_JP Japanese
fr_FR France French.
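
The locale format described above (a lower-case two-letter language code, an underscore, and an upper-case two-letter country code) can be checked with a few lines of plain Tcl. This is only a sketch: `parse_locale` is a hypothetical helper name and is not part of the acs-lang API.

```tcl
# Hypothetical helper, not part of acs-lang: split a locale string such as
# "en_US" into its language and country parts.
proc parse_locale {locale} {
    # Language: two lower-case letters (ISO-639); country: two upper-case
    # letters (ISO-3166), joined by an underscore.
    if {![regexp {^([a-z]{2})_([A-Z]{2})$} $locale -> language country]} {
        error "malformed locale \"$locale\": expected language_COUNTRY"
    }
    return [list $language $country]
}

puts [parse_locale "fr_FR"]   ;# prints: fr FR
```

The sketch only validates the shape of the string; it does not check that the codes are actually registered in ISO-639 or ISO-3166.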
The i18n module figures out the locale for a current request +makes it accessible via the ad_locale function:
+[ad_locale user locale ] => fr_FR [ad_locale subsite locale ] => en_US+ It has not yet been decided how the user's preferred locale will be initialized. For now, there is a site wide default package parameter [parameter::get -parameter DefaultLocale -default -"en_US"], and an API for setting the locale with the +"en_US"] +, and an API for setting the locale with the preference stored in a session variable: The ad_locale_set + function is used to set the user's preferred locale to a desired value. It saves the value in the current session.
@@ -77,32 +95,44 @@+ The request processor should use the ad_locale API to figure out the preferred locale for a request (perhaps combining user preference with subsite defaults in some way). It will make this -information accessible via the ad_conn function: -
ad_conn locale
ad_conn locale+
Content-Type: text/html; charset=iso-8859-1 -
You can obtain the preferred character set for a locale via the -ad_locale API shown below:
++
You can obtain the preferred character set for a locale via the +ad_locale API shown below:
+set locale "en_US" [ad_locale charset $locale ] => "iso-8859-1" or "shift_jis"+ Returns a case-insensitive name of a MIME character set.
We already have an AOLserver function to convert a MIME charset -name to a Tcl encoding name:
+name to a Tcl encoding name: ++[ns_encodingforcharset "iso-8859-1"] => iso8859-1 -Templating
+
For presenting data in multiple languages, there are two basic ways to use templates for a given abstract URL. Say we have the URL "foo", for example. We can provide templates for it in the -following ways:
Have a copy of each template file in each language you support, e.g., foo.en.adp, foo.fr.adp, @@ -118,6 +148,7 @@ template.
Let's say you have a template file "foo.adp" and it contains -calls to look up message strings using the TRN tag:
++ +Let's say you have a template file "foo.adp" and it contains +calls to look up message strings using the TRN tag:
+-If the user requests the page foo, and their -ad_locale is "en_US" then effective locale is + +If the user requests the page foo +, and their +ad_locale + is "en_US" then effective locale + is "en_US". Message lookups are done using the effective locale. If the user's locale is "fr_FR", then the effective locale will be "fr_FR".<master> <trn key=username_prompt>Please enter your username</trn> <input type=text name=username> <p> <trn key=password_prompt>Enter Password:</trn> <input type=password name=passwd>If we evaluate the TRN tags at compile time then we need to associate the effective locale in which the page was -evaluated with the cached compiled page code.
The effective locale of a template page that has an explicit +evaluated with the cached compiled page code.
+The effective locale of a template page that has an explicit locale, such as a file named "foo.en.adp" or "foo.en_US.adp", will be that explicit locale. So for example, even if a user has a preferred locale of "fr_FR", if there is only a page named "foo.en.adp", then that page will be evaluated (and cached) with an -effective locale of en_US.
VI.B Naming of Template Files To Encode Language and Character -Set
10.40 The templating system will use the Locale API to +effective locale of en_US. +VI.B Naming of Template Files To Encode Language and Character +Set
+10.40 + The templating system will use the Locale API to obtain the preferred locale for a page request, and will attempt to find a template file which most closely matches that locale.We will use the following convention for naming template files: -filename.locale_or_language.adp.
Examples:
++filename.locale_or_language.adp. +Examples:
+foo.en_US.adp foo.en.adp @@ -171,13 +217,16 @@ foo.ja_JP.adp foo.ja.adp -The user request has a locale which is of the form +
The user request has a locale which is of the form language_country. If someone wants English, they will implicitly be choosing a default, such as en_US or en_GB. The default locale for a language can be configured in the system locale tables. So for example the default locale for "en" could be -"en_US".
The algorithm for finding the best matching template for a -request in a given locale is given below:
+"en_US". +
+The algorithm for finding the best matching template for a +request in a given locale is given below:
+
- Find the desired target locale using [ad_conn locale] NOTE: This will always be a specific Locale (i.e., language_COUNTRY)
- Look for a template file whose locale suffix matches exactly. @@ -204,46 +253,59 @@ foo.en.adp, use that.
- If no locale-specific template is found, look for a simple .adp file, such as foo.adp.
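
The fallback order in the list above can be sketched in plain Tcl. This is an illustration under stated assumptions, not the templating system's actual code: `find_template` and its arguments are hypothetical, the set of existing files is passed in as a list rather than read from disk, and because the middle steps of the list are elided in this diff, the sketch assumes the order is exact locale, then language only, then the plain .adp file.

```tcl
# Hypothetical sketch; the proc name and arguments are illustrative only.
#   base      - template name without suffix, e.g. "foo"
#   locale    - a specific locale, e.g. "fr_FR" (what [ad_conn locale] yields)
#   available - list of template file names that exist
proc find_template {base locale available} {
    # The language is the part of the locale before the underscore.
    set language [lindex [split $locale "_"] 0]
    # Try the most specific name first, then fall back step by step.
    foreach candidate [list "$base.$locale.adp" "$base.$language.adp" "$base.adp"] {
        if {[lsearch -exact $available $candidate] >= 0} {
            return $candidate
        }
    }
    # Nothing matched at all.
    return ""
}

set files [list foo.en.adp foo.fr_FR.adp foo.adp]
puts [find_template foo fr_FR $files]   ;# prints: foo.fr_FR.adp
puts [find_template foo en_US $files]   ;# prints: foo.en.adp
puts [find_template foo ja_JP $files]   ;# prints: foo.adp
```

A real implementation would presumably test for the files on disk and might also consult the default locale configured for a language; neither is shown here.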
-Once a template file is found we must decide what character set +
Once a template file is found we must decide what character set it is authored in, so that we can correctly load it into Tcl (which -converts it to UTF8 internally).
It would be simplest to mandate that all templates are authored +converts it to UTF8 internally).
+It would be simplest to mandate that all templates are authored in UTF8, but that is just not a practical thing to enforce at this point, I believe. Many designers and other people who actually author the HTML template files will still find it easier to use legacy tools that author in their "native" character sets, such as -ShiftJIS in Japan, or BIG5 in China.
So we make the convention that the template file is authored in +ShiftJIS in Japan, or BIG5 in China.
+So we make the convention that the template file is authored in its effective locale's character set. For multilingual templates, we will load the template in the site default character set as specified by the AOLserver OutputCharset initialization parameter. For now, we will say that authoring generic multilingual adp files can and should be done in ASCII. -Eventually we can switch to using UTF8.
A character set corresponding to a locale can be found using the +Eventually we can switch to using UTF8.
+A character set corresponding to a locale can be found using the [ad_locale charset $locale] command. The templating system should call this right after it computes the effective locale, so it can set up that charset encoding conversion -before reading the template file from disk.
We read the template file using this encoding, and set the +before reading the template file from disk.
+We read the template file using this encoding, and set the default output character set to it as well. Inside of either the .adp page or the parent .tcl page, it is possible for the developer to issue a command to override this default output character set. The way this is done is currently to stick an explicit content-type header in the AOLserver output headers, for example to force the -output to ISO-8859-1, you would do
+output to ISO-8859-1, you would do ++ns_set put [ns_conn outputheaders] "content-type" "text/html; charset=iso-8859-1" -+design questionWe should have an API for this. The hack now is that the adp handler adp_parse_ad_locale user_file looks at the output headers, and if it sees a content type with an explicit charset, it passes -it along to ns_return.The default character set for a template .adp file -should be the default system encoding.
VI.C Loading Regular Tcl Script Files
10.50 By default, tcl and template files in the system will +it along to ns_return.
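
The header inspection described above amounts to pulling the charset parameter out of a content-type value. A minimal sketch in plain Tcl follows; `charset_from_content_type` is a hypothetical helper, not an AOLserver or acs-lang command, and it does not touch the real output headers.

```tcl
# Hypothetical helper: extract the charset from a content-type value.
# Returns the charset in lower case, or "" if no charset is given.
proc charset_from_content_type {content_type} {
    # Match "; charset=value", tolerating spaces and optional double quotes.
    if {[regexp -nocase {;\s*charset\s*=\s*"?([^\s;"]+)} $content_type -> charset]} {
        return [string tolower $charset]
    }
    return ""
}

puts [charset_from_content_type "text/html; charset=iso-8859-1"]   ;# prints: iso-8859-1
```

An empty result would signal that no explicit charset was set, in which case the default output character set applies.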
The default character set for a template .adp file +should be the default system encoding.
+This default can be overridden by setting the AOLserver init parameter for the MIME type of .tcl files to include an explicit character set. If an explicit MIME type is not found, ns_encodingfortype will default to the AOLserver init -parameter value DefaultCharset if it is set.
Example AOLserver .ini configuration file to set default script -file and template file charset to ShiftJIS:
++parameter value DefaultCharset if it is set. +Example AOLserver .ini configuration file to set default script +file and template file charset to ShiftJIS:
+ns_section {ns/mimetypes } ... ns_param .tcl {text/plain; charset=shift_jis} @@ -258,24 +320,31 @@ ns_param HttpOpenCharset shift_jis ns_param DefaultCharset shift_jis -VI.A Message Catalog API
+
For AOLserver/TCL, to make the message catalog more manageable, we will split it into one message catalog per package, plus one default global message namespace in case we need it. So for -example,
Message lookups are done using a combination of a key string and +example,
+Message lookups are done using a combination of a key string and a locale or language, as well as an implicit package prefix on the key string. The API for using the message catalog is as -follows:
+follows: +++ The locale arg can actually be a full locale, or else a simple -language abbrev, such as fr, en, etc. The lookup +language abbrev, such as fr +, en +, etc. The lookup rules for finding strings based on key and locale are tried in order as follows:lang_message_lookuplocalekey [default_string]lang_message_lookup
is abbreviated by the procedure named "_
", which is the convention used by the GNU strings message catalog package.@@ -285,8 +354,10 @@ prefix.
+ Example: You are looking up the message string "Title" in the -notes package. +notes + package.- Lookup is tried with language and key without package prefix.
[lang_message_lookup $locale notes.title "Title"] @@ -298,29 +369,39 @@ [_ $locale title "Title"]+ The string is looked up by the symbolic key notes.title -(or title for short), and the constant value -"Title" is supplied as documentation and as a default + +(or title + for short), and the constant value +"Title" + is supplied as documentation and as a default value. Having a default value allows developers to code their application immediately without waiting to populate the message catalog.Default Package Namespace
+ By default, keys are prefixed with the name of the current package (if a page request is being processed). So a lookup of the key "title" in a page in the bboard package will actually reference the "bboard.title" entry in the message catalog.You can override this behavior by either using a fully qualified key such as bboard.title or else by changing the message -catalog namespace using the lang_set_package command:
+catalog namespace using the lang_set_package command: ++[lang_set_package "bboard"]+ So for example code that runs in a scheduled proc, where there is not necessarily any concept of a "current package", would either use fully qualified keys to look up messages, or else call -lang_set_package before doing a message lookup. +lang_set_package + before doing a message lookup.Message Catalog Definition Files
+ A message catalog is defined by placing a file in the -catalog subdirectory of a package. Each file defines a set +catalog + subdirectory of a package. Each file defines a set of messages in different locales, and the file is written in a character set specified by its file suffix:@@ -329,6 +410,7 @@ bboard.shift_jis bboard.iso-8859-6+ A message catalog file consists of Tcl code to define messages in a given language or locale:@@ -338,19 +420,27 @@ ...+ In the example above, if the catalog file was loaded from the bboard package, all of the keys would be prefixed automatically with -"bboard.
". +"bboard.
+".Loading A Message Catalog At Package Init Time
+ The API functionlang_catalog_loadpackage_key+ Is used to load the message catalogs for a package. The catalog -files are stored in a package subdirectory called catalog. -Their file names have the form *.encoding.cat, -where encoding is the name of a MIME charset encoding -(not a Tcl charset name as was used in a previous version of +files are stored in a package subdirectory called catalog +. +Their file names have the form *.encoding.cat +, +where encoding + is the name of a MIME charset encoding +(not + a Tcl charset name as was used in a previous version of this command)./packages/bboard/catalog @@ -360,26 +450,33 @@ /other.iso8859-1.cat /other.shift_jis.cat /other.iso-8859-6.cat -You can add more pseudo-levels of hierarchy in naming the +
You can add more pseudo-levels of hierarchy in naming the message keys, using any separator character you want, for -example
+example +-+ which will be stored with the full key of -bboard.alerts.mail_notification. +bboard.alerts.mail_notification +._mr fr alerts.mail_notification "Le notification du email"Calling the Message Catalog API from inside of Templates
+ Inside of a template, you can always make a call to the message catalog API via a Tcl escape:<%= [_ $locale bboard.passwordPrompt "Enter Password"]%>+ However, this is awkward and ugly to use. We have defined an ADP tag which invokes the message catalog lookup. As explained in the previous section, since our system precompiles adp templates, we can get a performance improvement if we can cache the message lookups at template compile time.The <TRN> tag is a call to lang_message_lookup that can be -used inside of an ADP file. Here is the documentation:
 Procedure that gets called when the <trn> tag is +used inside of an ADP file. Here is the documentation: +Procedure that gets called when the <trn> tag is encountered on an ADP page. The purpose of the procedure is to register the text string enclosed within a pair of <trn> tags as a message in the catalog, and to display the appropriate @@ -438,15 +535,19 @@ <trn key="hello" static>Hello</trn>VII. Data Model Discussion
Internationalizing the Data Models
+
Tables which are in acs kernel and have user-visible names that may need to be translated in order to create an admin back end in -another language:
+another language: +user groups: group_name @@ -482,14 +583,20 @@ parameter_name section_name+ One approach is to split a table into two tables, one holding language-independent data, and the other holding language-dependent data. This approach was described in the ASJ -Multilingual Site Article. +Multilingual Site Article +.In that case, it is convenient to create a new view which looks like the original table, with the addition of a language column -that you can specify in the queries.
Drawbacks to Splitting Tables
It is not totally transparent to developers
+that you can specify in the queries. +Drawbacks to Splitting Tables
+It is not totally transparent to developers +
+ Every query against the table which requests or modifies language-dependent columns must now include a WHERE clause to select the language. @@ -498,7 +605,11 @@ The extra join of the two tables may cause queries to slow down, although I am not sure what the actual performance hit might be. It shouldn't be too large, because the join is against a fully indexed -table.VIII. User Interface
IX. Configuration/Parameters
X. Code Examples
ad_proc adp_parse_ad_conn_file {} { @@ -521,10 +632,16 @@
The revision history table below is for this template - -modify it as needed for your actual design document.
| Document Revision # | Action Taken, Notes | When? | By Whom? |
|---|---|---|---|
| 0.4 | Definition of effective locale for template caching, documentation of TRN tag | 12/12/2000 | Henry Minsky |