Index: openacs-4/packages/acs-lang/www/doc/i18n-design.html =================================================================== RCS file: /usr/local/cvsroot/openacs-4/packages/acs-lang/www/doc/i18n-design.html,v diff -u -r1.3 -r1.3.2.1 --- openacs-4/packages/acs-lang/www/doc/i18n-design.html 27 Oct 2014 16:39:40 -0000 1.3 +++ openacs-4/packages/acs-lang/www/doc/i18n-design.html 22 Jun 2016 07:45:44 -0000 1.3.2.1 @@ -1,852 +1,852 @@ - - -ACS 4 Globalization Detailed Design - - - -

ACS 4 Globalization Detailed Design

by Henry Minsky -
- - -

I. Essentials

When applicable, each of the following items should -receive its own link: -

-

- -

II. Introduction

- -

- -

III. Historical Considerations

- -

V. Design Tradeoffs

- -Areas of interest to developers: - -

- -

VI. API

- -

VI.A Locale API

- -10.30 - -A Locale object represents a specific geographical, political, or -cultural region. An operation that requires a Locale to perform its -task is called locale-sensitive and uses the Locale to tailor -information for the user. For example, displaying a number is a -locale-sensitive operation--the number should be formatted according -to the customs/conventions of the user's native country, region, or -culture. - -

-We will refer to a Locale by a combination of a language and country. -In the Java Locale API there is an optional variant which can be added to a locale, -which we will omit in the Tcl API. -

- -The language is a valid ISO Language Code. These codes are the -lower-case two-letter codes as defined by ISO-639. You can find a full -list of these codes at a number of sites, such as: -
- -http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt -

-The country is a valid ISO Country Code. These codes are the upper-case two-letter codes as defined by ISO-3166. You can find a full list of these codes at a number of sites, such as: -
-http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html -

- -Examples are

-en_US English US -
- ja_JP Japanese Japan
- fr_FR French France -
- - -

-The i18n module figures out the locale for the current request and -makes it accessible via the ad_locale function: -

-[ad_locale user locale] => fr_FR
-[ad_locale subsite locale] => en_US
-
- -It has not yet been decided how the user's preferred locale will be -initialized. For now, there is a site wide default package parameter -[parameter::get -parameter DefaultLocale -default "en_US"], and an API for setting -the locale with the preference stored in a session variable: - -The ad_locale_set function is used to set the user's preferred locale -to a desired value. It saves the value in the current session. - -
-    ad_locale_set locale "en_US"
-       will also automatically set [ad_locale user language]
-          ( to "en" in this case)
-
-    ad_locale_set timezone "PST"
-
-    
-
- -The request processor should use the ad_locale API to figure out the -preferred locale for a request (perhaps combining user preference with -subsite defaults in some way). It will make this information accessible -via the ad_conn function: - -
-    ad_conn locale 
-
- - -

- -

Character Sets and Encodings

- -We refer to MIME character set names which are the valid values -which can be passed in a MIME header, such as -
-Content-Type: text/html; charset=iso-8859-1
-
- -

You can obtain the preferred character set for a locale via the - ad_locale API shown below: - -

- -

-set locale "en_US"
-[ad_locale charset $locale] => "iso-8859-1" or "shift_jis"
-
- Returns a case-insensitive name of a MIME character set. -

- - -

-We already have an AOLserver function to convert a MIME charset name to a Tcl encoding name: -

-

-[ns_encodingforcharset "iso-8859-1"] => iso8859-1
-
- -

- - -

Templating

- -The goal of templates is to separate program logic from data -presentation. -

-For presenting data in multiple languages, there are two basic ways to -use templates for a given abstract URL. Say we have the URL "foo", for example. -We can provide templates for it in the following ways: - -

- -Both styles of authoring templates will probably be used. For pages -which contain a lot of free-form text content, having a separate -template page for each language would be easiest. - -

-But for a page which has a very fixed format, such as a data entry -form, it would mean a lot less redundant work to use a single template -source page to handle all the languages, and to have all -language-dependent strings be looked up in a message catalog. We can do this either by -creating data sources which call lang_message_lookup, or else -by using the <TRN> tag to do the same thing from within an ADP file. - -

- -

Caching multilingual ADP Templates

Message catalog lookups -can be expensive if many of them are done in a page. The -templating system can already precompile and cache adp pages. -This works fine for a page in a specific language such as -foo.en.adp, but we need to modify the caching mechanism if we -want to use a single template file to target multiple languages. - -

Computing the Effective Locale

- -

-Let's say you have a template file "foo.adp" and it contains calls to -look up message strings using the TRN tag: - -

-
-<master>
-<trn key=username_prompt>Please enter your username</trn>
-<input type=text name=username>
-<p>
-<trn key=password_prompt>Enter Password:</trn>
-<input type=password name=passwd>
-
-
- -If the user requests the page foo, and their ad_locale -is "en_US" then the effective locale is "en_US". Message lookups -are done using the effective locale. If the user's locale is "fr_FR", -then the effective locale will be "fr_FR". - -

- -If we evaluate the TRN tags at compile time then we need to associate -the effective locale in which the page was evaluated with the -cached compiled page code. -

- -The effective locale of a template page that has an explicit locale, -such as a file named "foo.en.adp" or "foo.en_US.adp", will be that -explicit locale. So for example, even if a user has a preferred locale -of "fr_FR", if there is only a page named "foo.en.adp", then that page -will be evaluated (and cached) with an effective locale of en_US. - -

- -

VI.B Naming of Template Files To Encode Language and Character Set

- -10.40 -The templating system will use the Locale API to obtain the preferred -locale for a page request, and will attempt to find a template file which -most closely matches that locale. -

-We will use the following convention for naming template files: -filename.locale_or_language.adp. - -

-Examples: -

-
-foo.en_US.adp
-foo.en.adp
-
-foo.fr_FR.adp
-foo.fr.adp
-
-foo.ja_JP.adp
-foo.ja.adp
-
-
-
-

- - -The user request has a locale which is of the form -language_country. If someone wants English, they will -implicitly be choosing a default, such as en_US or en_GB. The default -locale for a language can be configured in the system locale -tables. So for example the default locale for "en" could be "en_US". -

- - - -The algorithm for finding the best matching template for a request in -a given locale is given below: - -

    - -
  1. Find the desired target locale using [ad_conn locale] - NOTE: This will always be a specific Locale (i.e., language_COUNTRY) - -

    -

  2. Look for a template file whose locale suffix matches exactly. -

    - For example, if the filename in the URL request is simply foo - and [ad_conn locale] returns en_US then look for a file - named foo.en_US.adp. - -

    -

  3. -If an exact match is not found, look for template files whose name - matches the language portion of the target locale. -

    - - For example, if the URL request name is foo and [ad_conn locale] returns - en_US and a file named foo.en_US.adp is not found, then look for - all templates matching "en_*" as well as any template which just has the "en" suffix. - -

    - So for example if the user's locale is en_GB and the following files exist: - -

    - foo.en_US.adp - -

    - then use foo.en_US.adp -

    - - If however both foo.en_US.adp and foo.en.adp -exist, then use foo.en.adp preferentially, i.e., don't -switch locales if you can avoid it. The reasoning here is that people -can be very touchy about switching locales, so if there is a generic -matching language template available for a language, use it rather -than using an incorrect locale-specific template. - - - -

    -

  4. - - If no locale-specific template is found, look for a template matching - just the language -

    - - I.e., if the request is for en_US, and there exists a file - foo.en.adp, use that. - - -

    -

  5. - -If neither a locale-specific nor a language-specific template is found, look for a simple .adp file, - such as foo.adp. -
-

-Once a template file is found we must decide what character set it is -authored in, so that we can correctly load it into Tcl (which converts it -to UTF8 internally). -

- -It would be simplest to mandate that all templates are authored in UTF8, but -that is just not a practical thing to enforce at this point, I believe. Many -designers and other people who actually author the HTML template files -will still find it easier to use legacy tools that author in their -"native" character sets, such as ShiftJIS in Japan, or BIG5 in China. -

- -So we make the convention that the template file is authored in its -effective locale's character set. For multilingual templates, -we will load the template in the site default character set as -specified by the AOLserver OutputCharset initialization -parameter. For now, we will say that authoring generic multilingual -adp files can and should be done in ASCII. Eventually we can switch to -using UTF8. - -

-A character set corresponding to a locale can be found using the -[ad_locale charset $locale] command. The templating -system should call this right after it computes the effective locale, so it -can set up that charset encoding conversion before reading the template file from disk. -

- -We read the template file using this encoding, and set the default -output character set to it as well. Inside of either the .adp page or -the parent .tcl page, it is possible for the developer to issue a -command to override this default output character set. The way this -is done is currently to stick an explicit content-type header in -the AOLserver output headers, for example to force the output to ISO-8859-1, you -would do - - -

-ns_set put [ns_conn outputheaders] "content-type" "text/html; charset=iso-8859-1"	
-
- - -
-Design question: We should have an API for this. The hack now is that the -adp handler adp_parse_ad_locale user_file looks at the output headers, and if it sees a content type with -an explicit charset, it passes it along to ns_return. -
-
- -

-The default character set for a template .adp file should -be the default system encoding. - -

- -

VI.C Loading Regular Tcl Script Files

- -10.50 By default, tcl and template files in the system will be -loaded using the default system encoding. This is generally ISO-8859-1 -for AOLserver running on Unix systems in English. -

-This default can be -overridden by setting the AOLserver init parameter for the MIME type -of .tcl files to include an explicit character set. If an explicit -MIME type is not found, ns_encodingfortype will default to the -AOLserver init parameter value DefaultCharset if it is set. -

- -Example AOLserver .ini configuration file to set default script file -and template file charset to ShiftJIS: - - -

-
-ns_section ns/mimetypes
-...
-ns_param .tcl {text/plain; charset=shift_jis}
-ns_param .adp {text/html; charset=shift_jis}
-
-ns_section ns/parameters
-...
-# charset hacking
-ns_param HackContentType 1
-ns_param URLCharset shift_jis
-ns_param OutputCharset shift_jis
-ns_param HttpOpenCharset shift_jis
-ns_param DefaultCharset shift_jis
-
-
-
- -

VI.D Message Catalog API

- -We want to use something like the Java ResourceBundle, where the -developer can declare a set of resources for a given namespace -and locale. - -

-For AOLserver/TCL, to make the message catalog more manageable, we will -split it into one message catalog per package, plus one default global -message namespace in case we need it. So for example, -

-Message lookups are done using a combination of a key string and a -locale or language, as well as an implicit package prefix on the key -string. The API for using the message catalog is as follows: -

- -

-
-lang_message_lookup locale key [default_string]
-
- -lang_message_lookup is abbreviated by the procedure named "_", -which is the convention used by the GNU gettext message catalog package. -

-

- - -The locale argument can be either a full locale or a simple language abbreviation, such as fr or en. - -The lookup rules for finding strings based on key and locale are tried -in order as follows: -
    - -
  1. Lookup is first tried with the full locale (if present) and package.key - -
  2. Lookup is tried with just the language portion of the locale and - package.key - -
  3. Lookup is tried with the full locale and key without package prefix. - -
  4. Lookup is tried with language and key without package prefix. - -
- -Example: You are looking up the message string "Title" in the notes -package. -

-

-[lang_message_lookup $locale notes.title "Title"]
-
-can be abbreviated by
-[_ $locale notes.title "Title"]
-
-# message key "title" is implicitly with respect to package key
-#  "notes", i.e., notes.title
-[_ $locale title "Title"]
-
-
- -The string is looked up by the symbolic key notes.title (or title for short), and the constant value "Title" is supplied as documentation and -as a default value. Having a default value allows developers to code their application -immediately without waiting to populate the message catalog. -

- -

Default Package Namespace

- -By default, keys are prefixed with the name of the current package (if -a page request is being processed). So a lookup of the key "title" in -a page in the bboard package will actually reference the -"bboard.title" entry in the message catalog. - -

-You can override this behavior by -either using a fully qualified key such as bboard.title or -else by changing the message catalog namespace using the -lang_set_package command: - -

-[lang_set_package "bboard"]
-
- -So for example code that runs in a scheduled proc, where there is not necessarily -any concept of a "current package", would either use fully qualified keys to -look up messages, or else call lang_set_package before doing a message lookup. - - - -

-

Message Catalog Definition Files

- -A message catalog is defined by placing a file in the catalog -subdirectory of a package. Each file defines a set of messages -in different locales, and the file is written in a character set -specified by its file suffix: - -
-/packages/bboard/catalog/
-			 bboard.iso-8859-1
-			 bboard.shift_jis
-			 bboard.iso-8859-6
-
- - -A message catalog file consists of tcl code to define -messages in a given language or locale: - -
-
-_mr en mail_notification "This is an email notification"
-_mr fr mail_notification "Le notification du email"
-...
-
-
- - -In the example above, if the catalog file was loaded from the bboard -package, all of the keys would be prefixed automatically with "bboard.". - -

Loading A Message Catalog At Package Init Time

- -The API function -
-lang_catalog_load package_key
-
- -is used to load the message catalogs for a package. -The catalog files are stored in a package subdirectory called catalog. Their -file names have the form *.encoding.cat, where encoding -is the name of a MIME charset encoding (not a Tcl charset name as was used -in a previous version of this command). - -
-/packages/bboard/catalog
-                        /main.iso-8859-1.cat
-                        /main.shift_jis.cat
-                        /main.iso-8859-6.cat
-                        /other.iso-8859-1.cat
-                        /other.shift_jis.cat
-                        /other.iso-8859-6.cat
-
- - -

- -You can add more pseudo-levels of hierarchy in naming the message keys, using -any separator character you want, for example - -

-
-_mr fr alerts.mail_notification "Le notification du email"
-
-
-which will be stored with the full key of bboard.alerts.mail_notification. - -
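The key-prefixing behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the `catalog_sketch` namespace and its `begin_package` and `mr` procs are hypothetical stand-ins, not the actual `_mr` or `lang_catalog_load` implementation, and the "locale,key" storage layout is invented for the example.

```tcl
# Hypothetical loader state; the real catalog storage is not described here.
namespace eval catalog_sketch {
    variable current_package ""
    variable messages [dict create]
}

# Records the package whose catalog file is about to be loaded.
proc catalog_sketch::begin_package {package_key} {
    variable current_package
    set current_package $package_key
}

# Registers one message, qualifying the key with the current package key.
proc catalog_sketch::mr {locale key message} {
    variable current_package
    variable messages
    dict set messages "$locale,$current_package.$key" $message
}
```

After `catalog_sketch::begin_package bboard`, a call such as `catalog_sketch::mr fr alerts.mail_notification "..."` stores the message under `bboard.alerts.mail_notification` for the language `fr`.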

- -

Calling the Message Catalog API from inside of Templates

- -Inside of a template, you can always make a call to the message -catalog API via a Tcl escape: - -
-<%= [_ $locale bboard.passwordPrompt "Enter Password"]%> 
-
- -However, this is awkward and ugly to use. We have defined an ADP tag -which invokes the message catalog lookup. As explained in the previous -section, since our system precompiles adp templates, we can get a -performance improvement if we can cache the message lookups at -template compile time. -

- -The <TRN> tag is a call to lang_message_lookup that can be used inside -of an ADP file. Here is the documentation: - -

- Procedure that gets called when the <trn> tag is encountered on an ADP page. - The purpose of the procedure is to register the text string enclosed within a - pair of <trn> tags as a message in the catalog, and to display the appropriate - translated string. - Takes three optional parameters: lang, type - and key. - - Example 1: Display the text string Hello on an ADP page (i.e. do nothing special): -
-    <trn>Hello</trn>
-    
- Example 2: Assign the key hello to the text string Hello and display - the translated string in the user's preferred language: -
-    <trn key="hello">Hello</trn>
-    
- Example 3: Specify that Bonjour needs to be registered as the French translation - for the key hello (in addition to displaying the translation in the user's - preferred language): -
-    <trn key="hello" lang="fr">Bonjour</trn>
-    
- Example 4: Register the string and display it in the preferred language of the - current user. Note that the possible values for the type - parameter are determined by what has been implemented in the ad_locale procedure. - By default, only the user type is implemented. An example of a type that - could be implemented is subsite, - for displaying strings in the language of the subsite that owns the current web page. -
-    <trn key="hello" type="user">Hello</trn>
-    
- -

- Example 5: Translates the string once at template compile time, using the effective locale of the page. - -

-    <trn key="hello" static>Hello</trn>
-    
- - - -
- - - -

VII. Data Model Discussion

- -

Internationalizing the Data Models

- -Some data which is stored in ACS package and core database tables may -be presented to users, and thus may need to be stored in multiple -languages. Examples of this are the descriptions of package or site -parameters in the administrative interface, the "pretty names" of -objects, and group names. - -

- -Tables which are in acs kernel and have user-visible names that may -need to be translated in order to create an admin back end in another -language: - -

-user groups:
-   group_name
-
-acs_object_types:
-   pretty_name
-   pretty_plural
-
-acs_attributes:
-   pretty_name
-   pretty_plural
-
-acs_attribute_descriptions
-   description (clob)
-
-procedure add_description- add a lang arg ?
-
-acs_enum_values ? pretty_name
-
-acs_privileges: 
-  pretty_name
-  pretty_plural
-
-apm_package_types
-  pretty_name
-  pretty_plural
-
-
-apm_package "instance_name"? Maybe a given instance
-gets instantiated with a name in the desired language?
-
-
-apm_parameters: 
-   parameter_name
-   section_name
-
- -One approach is to split a table into two tables, one holding -language-independent data, and the other holding language-dependent -data. This approach was described in the ASJ Multilingual Site Article. -

-In that case, it is convenient to create a new view which looks like -the original table, with the addition of a language column that you -can specify in the queries. -

-

Drawbacks to Splitting Tables

- It is not totally transparent -to developers -
Every query against the table which requests or -modifies language-dependent columns must now include a WHERE clause to -select the language. -

-Extra join may slow things down
The extra join of the two -tables may cause queries to slow down, although I am not sure what the -actual performance hit might be. It shouldn't be too large, because -the join is against a fully indexed table. - -

- - -

VIII. User Interface

- -

IX. Configuration/Parameters

- - - -

X. Code Examples

- - -

XI. Future Improvements/Areas of Likely Change

- -

XII. Authors

- - -

XIII. Revision History

-

The revision history table below is for this template - modify it as -needed for your actual design document.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Document Revision #Action Taken, NotesWhen?By Whom?
0.1Creation12/4/2000Henry Minsky
0.2More specific template search algorithm, extended message catalog API to use package -keys or other namespace12/4/2000Henry Minsky
0.3Details on how the <TRN> tag works in templates12/4/2000Henry Minsky
0.4Definition of effective locale for template caching, documentation of TRN tag12/12/2000Henry Minsky
-

-


-hqm@arsdigita.com
- - - + + +ACS 4 Globalization Detailed Design + + + +

ACS 4 Globalization Detailed Design

by Henry Minsky +
+ + +

I. Essentials

When applicable, each of the following items should +receive its own link: +

+

+ +

II. Introduction

+ +

+ +

III. Historical Considerations

+ +

V. Design Tradeoffs

+ +Areas of interest to developers: + +

+ +

VI. API

+ +

VI.A Locale API

+ +10.30 + +A Locale object represents a specific geographical, political, or +cultural region. An operation that requires a Locale to perform its +task is called locale-sensitive and uses the Locale to tailor +information for the user. For example, displaying a number is a +locale-sensitive operation--the number should be formatted according +to the customs/conventions of the user's native country, region, or +culture. + +

+We will refer to a Locale by a combination of a language and country. +In the Java Locale API there is an optional variant which can be added to a locale, +which we will omit in the Tcl API. +

+ +The language is a valid ISO Language Code. These codes are the +lower-case two-letter codes as defined by ISO-639. You can find a full +list of these codes at a number of sites, such as: +
+ +http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt +

+The country is a valid ISO Country Code. These codes are the upper-case two-letter codes as defined by ISO-3166. You can find a full list of these codes at a number of sites, such as: +
+http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html +

+ +Examples are

+en_US English US +
+ ja_JP Japanese Japan
+ fr_FR French France +
+ + +
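As a rough illustration of the language_COUNTRY form described above, the following sketch splits a locale string into its two parts. The proc name `locale_split` is hypothetical and is not part of the ad_locale API; it only checks the shape of the codes (two lower-case letters, optionally followed by an underscore and two upper-case letters), not whether they are registered ISO codes.

```tcl
# Hypothetical helper, not part of the ad_locale API described here.
# Splits "en_US" into {en US}; a bare language such as "en" yields {en {}}.
proc locale_split {locale} {
    if {![regexp {^([a-z]{2})(?:_([A-Z]{2}))?$} $locale -> language country]} {
        error "malformed locale: $locale"
    }
    return [list $language $country]
}
```

For example, `lindex [locale_split fr_FR] 0` gives the language part that a language-only lookup would use.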

+The i18n module figures out the locale for the current request and +makes it accessible via the ad_locale function: +

+[ad_locale user locale] => fr_FR
+[ad_locale subsite locale] => en_US
+
+ +It has not yet been decided how the user's preferred locale will be +initialized. For now, there is a site wide default package parameter +[parameter::get -parameter DefaultLocale -default "en_US"], and an API for setting +the locale with the preference stored in a session variable: + +The ad_locale_set function is used to set the user's preferred locale +to a desired value. It saves the value in the current session. + +
+    ad_locale_set locale "en_US"
+       will also automatically set [ad_locale user language]
+          ( to "en" in this case)
+
+    ad_locale_set timezone "PST"
+
+    
+
+ +The request processor should use the ad_locale API to figure out the +preferred locale for a request (perhaps combining user preference with +subsite defaults in some way). It will make this information accessible +via the ad_conn function: + +
+    ad_conn locale 
+
+ + +

+ +

Character Sets and Encodings

+ +We refer to MIME character set names which are the valid values +which can be passed in a MIME header, such as +
+Content-Type: text/html; charset=iso-8859-1
+
+ +

You can obtain the preferred character set for a locale via the + ad_locale API shown below: + +

+ +

+set locale "en_US"
+[ad_locale charset $locale] => "iso-8859-1" or "shift_jis"
+
+ Returns a case-insensitive name of a MIME character set. +

+ + +

+We already have an AOLserver function to convert a MIME charset name to a Tcl encoding name: +

+

+[ns_encodingforcharset "iso-8859-1"] => iso8859-1
+
+ +

+ + +

Templating

+ +The goal of templates is to separate program logic from data +presentation. +

+For presenting data in multiple languages, there are two basic ways to +use templates for a given abstract URL. Say we have the URL "foo", for example. +We can provide templates for it in the following ways: + +

+ +Both styles of authoring templates will probably be used. For pages +which contain a lot of free-form text content, having a separate +template page for each language would be easiest. + +

+But for a page which has a very fixed format, such as a data entry +form, it would mean a lot less redundant work to use a single template +source page to handle all the languages, and to have all +language-dependent strings be looked up in a message catalog. We can do this either by +creating data sources which call lang_message_lookup, or else +by using the <TRN> tag to do the same thing from within an ADP file. + +

+ +

Caching multilingual ADP Templates

Message catalog lookups +can be expensive if many of them are done in a page. The +templating system can already precompile and cache adp pages. +This works fine for a page in a specific language such as +foo.en.adp, but we need to modify the caching mechanism if we +want to use a single template file to target multiple languages. + +

Computing the Effective Locale

+ +

+Let's say you have a template file "foo.adp" and it contains calls to +look up message strings using the TRN tag: + +

+
+<master>
+<trn key=username_prompt>Please enter your username</trn>
+<input type="text" name=username>
+<p>
+<trn key=password_prompt>Enter Password:</trn>
+<input type=password name=passwd>
+
+
+ +If the user requests the page foo, and their ad_locale +is "en_US" then the effective locale is "en_US". Message lookups +are done using the effective locale. If the user's locale is "fr_FR", +then the effective locale will be "fr_FR". + +

+ +If we evaluate the TRN tags at compile time then we need to associate +the effective locale in which the page was evaluated with the +cached compiled page code. +

+ +The effective locale of a template page that has an explicit locale, +such as a file named "foo.en.adp" or "foo.en_US.adp", will be that +explicit locale. So for example, even if a user has a preferred locale +of "fr_FR", if there is only a page named "foo.en.adp", then that page +will be evaluated (and cached) with an effective locale of en_US. + +
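The rule for the effective locale can be sketched roughly as below. This is a minimal illustration, not the templating system's code: the proc name `effective_locale` and the small table of per-language default locales are assumptions made for the example (the design keeps those defaults in the system locale tables).

```tcl
# Returns the effective locale for a template file: the locale encoded in
# the file name if there is one, otherwise the user's preferred locale.
proc effective_locale {template_file user_locale} {
    # Hypothetical defaults; the real values live in the system locale tables.
    set defaults [dict create en en_US fr fr_FR ja ja_JP]
    set tail [file tail $template_file]

    # Full locale suffix, e.g. foo.en_US.adp
    if {[regexp {\.([a-z]{2})_([A-Z]{2})\.adp$} $tail -> language country]} {
        return "${language}_${country}"
    }
    # Language-only suffix, e.g. foo.en.adp, mapped to its default locale
    if {[regexp {\.([a-z]{2})\.adp$} $tail -> language]} {
        if {[dict exists $defaults $language]} {
            return [dict get $defaults $language]
        }
        return $language
    }
    # No suffix: a multilingual template evaluated in the user's locale
    return $user_locale
}
```

A compiled-page cache could then be keyed on the pair of template path and effective locale, so that a single foo.adp is cached once per locale in which it has been evaluated.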

+ +

VI.B Naming of Template Files To Encode Language and Character Set

+ +10.40 +The templating system will use the Locale API to obtain the preferred +locale for a page request, and will attempt to find a template file which +most closely matches that locale. +

+We will use the following convention for naming template files: +filename.locale_or_language.adp. + +

+Examples: +

+
+foo.en_US.adp
+foo.en.adp
+
+foo.fr_FR.adp
+foo.fr.adp
+
+foo.ja_JP.adp
+foo.ja.adp
+
+
+
+

+ + +The user request has a locale which is of the form +language_country. If someone wants English, they will +implicitly be choosing a default, such as en_US or en_GB. The default +locale for a language can be configured in the system locale +tables. So for example the default locale for "en" could be "en_US". +

+ + + +The algorithm for finding the best matching template for a request in +a given locale is given below: + +

    + +
  1. Find the desired target locale using [ad_conn locale] + NOTE: This will always be a specific Locale (i.e., language_COUNTRY) + +

    +

  2. Look for a template file whose locale suffix matches exactly. +

    + For example, if the filename in the URL request is simply foo + and [ad_conn locale] returns en_US then look for a file + named foo.en_US.adp. + +

    +

  3. +If an exact match is not found, look for template files whose name + matches the language portion of the target locale. +

    + + For example, if the URL request name is foo and [ad_conn locale] returns + en_US and a file named foo.en_US.adp is not found, then look for + all templates matching "en_*" as well as any template which just has the "en" suffix. + +

    + So for example if the user's locale is en_GB and the following files exist: + +

    + foo.en_US.adp + +

    + then use foo.en_US.adp +

    + + If however both foo.en_US.adp and foo.en.adp +exist, then use foo.en.adp preferentially, i.e., don't +switch locales if you can avoid it. The reasoning here is that people +can be very touchy about switching locales, so if there is a generic +matching language template available for a language, use it rather +than using an incorrect locale-specific template. + + + +

    +

  4. + + If no locale-specific template is found, look for a template matching + just the language +

    + + I.e., if the request is for en_US, and there exists a file + foo.en.adp, use that. + + +

    +

  5. + +If neither a locale-specific nor a language-specific template is found, look for a simple .adp file, + such as foo.adp. +
+
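The search order above can be sketched as a small Tcl proc. This is an illustration only: the proc name `find_template` is hypothetical, and it takes the list of available file names as an argument instead of reading the file system. Where several templates for other countries of the same language exist, the text does not say which one to pick; choosing the first in sorted order is an assumption of this sketch.

```tcl
# Picks the best template for $name in $locale from $available, a list of
# file names. Returns "" when nothing matches.
proc find_template {name locale available} {
    set language [lindex [split $locale _] 0]

    # Step 2: exact locale match, e.g. foo.en_US.adp
    set exact "$name.$locale.adp"
    if {[lsearch -exact $available $exact] >= 0} {
        return $exact
    }

    # Steps 3 and 4: prefer the generic language template (foo.en.adp) over
    # a template written for another country of the same language.
    set generic "$name.$language.adp"
    if {[lsearch -exact $available $generic] >= 0} {
        return $generic
    }
    set same_language [lsearch -all -inline -glob $available "$name.${language}_*.adp"]
    if {[llength $same_language] > 0} {
        # Assumption: the tie-break between several candidates is unspecified.
        return [lindex [lsort $same_language] 0]
    }

    # Step 5: plain template with no locale suffix
    set plain "$name.adp"
    if {[lsearch -exact $available $plain] >= 0} {
        return $plain
    }
    return ""
}
```

In a running system the target locale would come from `[ad_conn locale]` and the list of candidates from the template directory.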

+Once a template file is found we must decide what character set it is +authored in, so that we can correctly load it into Tcl (which converts it +to UTF8 internally). +

+ +It would be simplest to mandate that all templates are authored in UTF8, but +that is just not a practical thing to enforce at this point, I believe. Many +designers and other people who actually author the HTML template files +will still find it easier to use legacy tools that author in their +"native" character sets, such as ShiftJIS in Japan, or BIG5 in China. +

+ +So we make the convention that the template file is authored in its +effective locale's character set. For multilingual templates, +we will load the template in the site default character set as +specified by the AOLserver OutputCharset initialization +parameter. For now, we will say that authoring generic multilingual +adp files can and should be done in ASCII. Eventually we can switch to +using UTF8. + +

+A character set corresponding to a locale can be found using the +[ad_locale charset $locale] command. The templating +system should call this right after it computes the effective locale, so it +can set up that charset encoding conversion before reading the template file from disk. +

+ +We read the template file using this encoding, and set the default +output character set to it as well. Inside of either the .adp page or +the parent .tcl page, it is possible for the developer to issue a +command to override this default output character set. The way this +is done is currently to stick an explicit content-type header in +the AOLserver output headers, for example to force the output to ISO-8859-1, you +would do + + +

+ns_set put [ns_conn outputheaders] "content-type" "text/html; charset=iso-8859-1"	
+
+ + +
+Design question: We should have an API for this. The hack now is that the +adp handler adp_parse_ad_locale user_file looks at the output headers, and if it sees a content type with +an explicit charset, it passes it along to ns_return. +
+
+ +

+The default character set for a template .adp file should +be the default system encoding. + +

+ +

VI.C Loading Regular Tcl Script Files

+ +10.50 By default, tcl and template files in the system will be +loaded using the default system encoding. This is generally ISO-8859-1 +for AOLserver running on Unix systems in English. +

+This default can be +overridden by setting the AOLserver init parameter for the MIME type +of .tcl files to include an explicit character set. If an explicit +MIME type is not found, ns_encodingfortype will default to the +AOLserver init parameter value DefaultCharset if it is set. +

+ +Example AOLserver .ini configuration file to set default script file +and template file charset to ShiftJIS: + + +

+
+ns_section {ns/mimetypes }
+...
+ns_param .tcl {text/plain; charset=shift_jis}
+ns_param .adp {text/html; charset=shift_jis}
+
+ns_section ns/parameters
+...
+# charset hacking
+ns_param HackContentType 1
+ns_param URLCharset shift_jis
+ns_param OutputCharset shift_jis
+ns_param HttpOpenCharset shift_jis
+ns_param DefaultCharset shift_jis
+
+
+
+ +

VI.D Message Catalog API

+ +We want to use something like the Java ResourceBundle, where the +developer can declare a set of resources for a given namespace +and locale. + +

+For AOLserver/TCL, to make the message catalog more manageable, we will +split it into one message catalog per package, plus one default global +message namespace in case we need it. So for example, +

+Message lookups are done using a combination of a key string and a +locale or language, as well as an implicit package prefix on the key +string. The API for using the message catalog is as follows: +

+ +

+
+lang_message_lookup locale key [default_string]
+
+ +lang_message_lookup is abbreviated by the procedure named "_", +which is the convention used by the GNU gettext message catalog package. +

+

+ + +The locale argument can be either a full locale or a simple language abbreviation, such as fr or en. + +The lookup rules for finding strings based on key and locale are tried +in order as follows: +
    + +
  1. Lookup is first tried with the full locale (if present) and package.key + +
  2. Lookup is tried with just the language portion of the locale and + package.key + +
  3. Lookup is tried with the full locale and key without package prefix. + +
  4. Lookup is tried with language and key without package prefix. + +
+ +Example: You are looking up the message string "Title" in the notes +package. +

+

+[lang_message_lookup $locale notes.title "Title"]
+
+can be abbreviated by
+[_ $locale notes.title "Title"]
+
+# message key "title" is implicitly with respect to package key
+#  "notes", i.e., notes.title
+[_ $locale title "Title"]
+
+
+ +The string is looked up by the symbolic key notes.title (or title for short), and the constant value "Title" is supplied as documentation and +as a default value. Having a default value allows developers to code their application +immediately without waiting to populate the message catalog. +
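The four lookup rules listed earlier can be sketched in plain Tcl as follows. This is only an illustration under stated assumptions: the `catalog` array, its "locale,key" index format, and the proc name `message_lookup_sketch` are hypothetical, and the package is passed explicitly here rather than taken implicitly from the current request.

```tcl
# Hypothetical in-memory catalog: array indexed by "locale,key".
array set catalog {}

# Sketch of the four lookup rules, tried in order. Returns the default
# string when no catalog entry matches.
proc message_lookup_sketch {locale package key {default_string ""}} {
    global catalog
    # The language is the part of the locale before the underscore;
    # a bare language such as "fr" is used as is.
    set language [lindex [split $locale _] 0]
    set candidates [list \
        "$locale,$package.$key" \
        "$language,$package.$key" \
        "$locale,$key" \
        "$language,$key"]
    foreach candidate $candidates {
        if {[info exists catalog($candidate)]} {
            return $catalog($candidate)
        }
    }
    return $default_string
}
```

Trying the most specific entry first means a regional translation, when present, wins over the generic translation for the language, while the default string keeps the page usable before the catalog is populated.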

+ +

Default Package Namespace

+ +By default, keys are prefixed with the name of the current package (if +a page request is being processed). So a lookup of the key "title" in +a page in the bboard package will actually reference the +"bboard.title" entry in the message catalog. + +

+You can override this behavior by +either using a fully qualified key such as bboard.title or +else by changing the message catalog namespace using the +lang_set_package command: + +

+[lang_set_package "bboard"]
+
+ +So for example code that runs in a scheduled proc, where there is not necessarily +any concept of a "current package", would either use fully qualified keys to +look up messages, or else call lang_set_package before doing a message lookup. + + + +

+

Message Catalog Definition Files

+ +A message catalog is defined by placing a file in the catalog +subdirectory of a package. Each file defines a set of messages +in different locales, and the file is written in a character set +specified by its file suffix: + +
+/packages/bboard/catalog/
+			 bboard.iso-8859-1
+			 bboard.shift_jis
+			 bboard.iso-8859-6
+
+ + +A message catalog file consists of tcl code to define +messages in a given language or locale: + +
+
+_mr en mail_notification "This is an email notification"
+_mr fr mail_notification "Le notification du email"
+...
+
+
+ + +In the example above, if the catalog file was loaded from the bboard +package, all of the keys would be prefixed automatically with "bboard.". + +
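To make the automatic prefixing concrete, here is a small sketch of a registration proc in the spirit of _mr. The names `_mr_sketch`, `current_package`, and the `catalog` array are assumptions for illustration and do not describe the actual acs-lang implementation.

```tcl
# Hypothetical state: the package whose catalog file is being loaded,
# and an in-memory catalog indexed by "locale,package.key".
set current_package "bboard"
array set catalog {}

# Sketch of a message registration proc: the key is stored under the
# package prefix of the catalog file currently being loaded.
proc _mr_sketch {locale key message} {
    global catalog current_package
    set catalog($locale,$current_package.$key) $message
}

_mr_sketch en mail_notification "This is an email notification"
_mr_sketch fr mail_notification "Le notification du email"
```

After these two calls the catalog holds entries for bboard.mail_notification in both en and fr, which is the fully qualified form that message lookups use.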

Loading A Message Catalog At Package Init Time

+ +The API function +
+lang_catalog_load package_key
+
+ +is used to load the message catalogs for a package. +The catalog files are stored in a package subdirectory called catalog. Their +file names have the form *.encoding.cat, where encoding +is the name of a MIME charset encoding (not a Tcl charset name as was used +in a previous version of this command). + +
+/packages/bboard/catalog
+                        /main.iso-8859-1.cat
+                        /main.shift_jis.cat
+                        /main.iso-8859-6.cat
+                        /other.iso-8859-1.cat
+                        /other.shift_jis.cat
+                        /other.iso-8859-6.cat
+
+ + +
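A loader needs to recover the charset from each catalog file name before opening the file. The sketch below shows one way to do that in plain Tcl; the proc name `catalog_file_charset` is hypothetical, and it assumes the part of the file name before the encoding contains no dots.

```tcl
# Hypothetical helper: extract the charset from a catalog file name of
# the form name.encoding.cat. Returns "" if the name does not match.
proc catalog_file_charset {path} {
    set tail [file tail $path]
    if {[regexp {^[^.]+\.(.+)\.cat$} $tail -> charset]} {
        return $charset
    }
    return ""
}
```

The returned value is a MIME charset name, so it would still have to be mapped to a Tcl encoding name before being used to configure the file channel.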

+ +You can add more pseudo-levels of hierarchy in naming the message keys, using +any separator character you want, for example + +

+
+_mr fr alerts.mail_notification "Le notification du email"
+
+
+which will be stored with the full key of bboard.alerts.mail_notification. + +

+ +

Calling the Message Catalog API from inside of Templates

+ +Inside of a template, you can always make a call to the message +catalog API via a Tcl escape: + +
+<%= [_ $locale bboard.passwordPrompt "Enter Password"]%> 
+
+ +However, this is awkward and ugly to use. We have defined an ADP tag +which invokes the message catalog lookup. As explained in the previous +section, since our system precompiles adp templates, we can get a +performance improvement if we can cache the message lookups at +template compile time. +

+ +The <TRN> tag is a call to lang_message_lookup that can be used inside +of an ADP file. Here is the documentation: + +

+ Procedure that gets called when the <trn> tag is encountered on an ADP page. + The purpose of the procedure is to register the text string enclosed within a + pair of <trn> tags as a message in the catalog, and to display the appropriate + translated string. + Takes three optional parameters: lang, type + and key. + + Example 1: Display the text string Hello on an ADP page (i.e. do nothing special): +
+    <trn>Hello</trn>
+    
+ Example 2: Assign the key hello to the text string Hello and display + the translated string in the user's preferred language: +
+    <trn key="hello">Hello</trn>
+    
+ Example 3: Specify that Bonjour needs to be registered as the French translation + for the key hello (in addition to displaying the translation in the user's + preferred language): +
+    <trn key="hello" lang="fr">Bonjour</trn>
+    
+ Example 4: Register the string and display it in the preferred language of the + current user. Note that the possible values for the type + parameter are determined by what has been implemented in the ad_locale procedure. + By default, only the user type is implemented. An example of a type that + could be implemented is subsite, + for displaying strings in the language of the subsite that owns the current web page. +
+    <trn key="hello" type="user">Hello</trn>
+    
+ +

+ Example 5: Translates the string once at template compile time, using the effective locale of the page. + +

+    <trn key="hello" static>Hello</trn>
+    
+ + + +
+ + + +

VII. Data Model Discussion

+ +

Internationalizing the Data Models

+ +Some data which is stored in ACS package and core database tables may +be presented to users, and thus may need to be stored in multiple +languages. Examples of this are the descriptions of package or site +parameters in the administrative interface, the "pretty names" of +objects, and group names. + +

+ +Tables which are in acs kernel and have user-visible names that may +need to be translated in order to create an admin back end in another +language: + +

+user groups:
+   group_name
+
+acs_object_types:
+   pretty_name
+   pretty_plural
+
+acs_attributes:
+   pretty_name
+   pretty_plural
+
+acs_attribute_descriptions
+   description (clob)
+
+procedure add_description- add a lang arg ?
+
+acs_enum_values ? pretty_name
+
+acs_privileges: 
+  pretty_name
+  pretty_plural
+
+apm_package_types
+  pretty_name
+  pretty_plural
+
+
+apm_package "instance_name"? Maybe a given instance
+gets instantiated with a name in the desired language?
+
+
+apm_parameters: 
+   parameter_name
+   section_name
+
+ +One approach is to split a table into two tables, one holding +language-independent data, and the other holding language-dependent +data. This approach was described in the ASJ Multilingual Site Article. +

+In that case, it is convenient to create a new view which looks like +the original table, with the addition of a language column that you +can specify in the queries. +

+

Drawbacks to Splitting Tables

+ It is not totally transparent +to developers +
Every query against the table which requests or +modifies language-dependent columns must now include a WHERE clause to +select the language. +

+Extra join may slow things down
The extra join of the two +tables may cause queries to slow down, although I am not sure what the +actual performance hit might be. It shouldn't be too large, because +the join is against a fully indexed table. + +

+ + +

VIII. User Interface

+ +

IX. Configuration/Parameters

+ + + +

X. Code Examples

+ + +

XI. Future Improvements/Areas of Likely Change

+ +

XII. Authors

+ + +

XIII. Revision History

+

The revision history table below is for this template - modify it as +needed for your actual design document.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Document Revision #Action Taken, NotesWhen?By Whom?
0.1Creation12/4/2000Henry Minsky
0.2More specific template search algorithm, extended message catalog API to use package +keys or other namespace12/4/2000Henry Minsky
0.3Details on how the <TRN> tag works in templates12/4/2000Henry Minsky
0.4Definition of effective locale for template caching, documentation of TRN tag12/12/2000Henry Minsky
+

+


+hqm@arsdigita.com
+ + +