Index: openacs-4/packages/acs-core-docs/www/xml/index.xml =================================================================== RCS file: /usr/local/cvsroot/openacs-4/packages/acs-core-docs/www/xml/index.xml,v diff -u -N -r1.8 -r1.9 --- openacs-4/packages/acs-core-docs/www/xml/index.xml 22 Sep 2002 22:44:28 -0000 1.8 +++ openacs-4/packages/acs-core-docs/www/xml/index.xml 1 Oct 2002 09:42:43 -0000 1.9 @@ -21,6 +21,7 @@ + @@ -32,6 +33,7 @@ + @@ -186,6 +188,7 @@ &templates; &permissions; &subsites; + &i18n-devel; @@ -270,6 +273,7 @@ &subsites-design; &apm-req; &apm-design; + &i18n-req; &security-req; &security-design; Index: openacs-4/packages/acs-core-docs/www/xml/developers-guide/i18n.xml =================================================================== RCS file: /usr/local/cvsroot/openacs-4/packages/acs-core-docs/www/xml/developers-guide/i18n.xml,v diff -u -N --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ openacs-4/packages/acs-core-docs/www/xml/developers-guide/i18n.xml 1 Oct 2002 09:42:46 -0000 1.1 @@ -0,0 +1,105 @@ + + + + + By Peter Marklund + + + + Introduction + + + This document describes how to develop internationalized OpenACS packages. + + + + + Multilingual Text - Using the Message Catalog + + + In this section we present the mechanisms that OpenACS provides to let your + OpenACS packages handle text in multiple languages. + + + + Multilingual OpenACS Parameters + + + The syntax for storing multilingual pieces of text in APM parameters is identical + to the one used for adp templates. Any message catalog keys in APM parameters should + be surrounded by hash marks and will be replaced by the parameter::get procedure if + it is invoked with the -localize flag. 
The following three examples illustrate: + + + + + + + + + + + Parameter Name + Parameter Value + Command used to retrieve Value + Retrieved Value + + + + + class_instance_pages_csv + #dotlrn.class_page_home_title#,Simple 2-Column;#dotlrn.class_page_calendar_title#,Simple 1-Column;#dotlrn.class_page_file_storage_title#,Simple 1-Column + parameter::get -localize -parameter class_instance_pages_csv + Kurs Startseite,Simple 2-Column;Kalender,Simple 1-Column;Dateien,Simple 1-Column + + + departments_pretty_name + #departments_pretty_name# + parameter::get -localize -parameter departments_pretty_name + Abteilung + + + ... + + + departments_pretty_name + #departments_pretty_name# + parameter::get -parameter departments_pretty_name + #departments_pretty_name# + + + +
+ + + The value in the rightmost column in the table above is the value returned by an invocation + of parameter::get. Note that for localization to happen you must use the -localize flag. + The locale used for the message lookup will be + the locale of the current request, or if there is no current request, the site-wide default locale + (set by the parameter SiteWideLocale of the acs-lang package). + + +
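The hash-mark substitution described above can be sketched in a few lines. This is a minimal illustration in Python, not the OpenACS implementation (which is Tcl): the in-memory `CATALOG`, the `localize` and `get_parameter` helpers, and the choice to leave unknown keys untouched are all assumptions made for the example.

```python
import re

# Hypothetical in-memory message catalog: (locale, key) -> translated text.
CATALOG = {
    ("de_DE", "dotlrn.class_page_home_title"): "Kurs Startseite",
    ("de_DE", "dotlrn.class_page_calendar_title"): "Kalender",
}

# A message key is the text between two hash marks.
KEY_PATTERN = re.compile(r"#([A-Za-z0-9_.\-]+)#")


def localize(value, locale, catalog=CATALOG):
    """Replace every #key# in value with its translation for locale.

    Assumption: keys missing from the catalog are left as they are.
    """
    def replace(match):
        return catalog.get((locale, match.group(1)), match.group(0))

    return KEY_PATTERN.sub(replace, value)


def get_parameter(raw_value, locale, localize_flag=False):
    """Mimic the documented behaviour: substitute only when the flag is set."""
    return localize(raw_value, locale) if localize_flag else raw_value
```

Without the flag the raw value, hash marks included, comes back unchanged, which matches the last row of the table above.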
+ + + Multilingual Page Templates (.adp Files) + + + There are two syntaxes to choose from for doing message catalog lookups in adp templates. + Any message catalog keys surrounded by hash marks (i.e. #message_key#) will be replaced + with the corresponding text in the message catalog (the procedure + lang::message::lookup is used for the lookup) using the locale of the request (given by + ad_conn locale). If no message can be retrieved from the message catalog, + a translation-missing message will be used instead. + + + + The other syntax for message lookups in adp pages is <trn key="message_key">default text</trn>. + Use the trn tag if you want to provide a default message + in the template. The default message is in the body of the trn tag and is mandatory. + The default message is only used if no message could be retrieved from the message catalog. + + +
+ +
Index: openacs-4/packages/acs-core-docs/www/xml/kernel/i18n-requirements.xml =================================================================== RCS file: /usr/local/cvsroot/openacs-4/packages/acs-core-docs/www/xml/kernel/i18n-requirements.xml,v diff -u -N --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ openacs-4/packages/acs-core-docs/www/xml/kernel/i18n-requirements.xml 1 Oct 2002 09:42:47 -0000 1.1 @@ -0,0 +1,750 @@ + + OpenACS &version; Internationalization Requirements + + + by Henry Minsky, + Yon Feldman, + Lars Pind, + Peter Marklund, + Christian Hvid, + and others. + + + + Introduction + + + This document describes the requirements for functionality in + the OpenACS platform to support globalization of the core and optional + modules. The goal is to make it possible to support delivery of + applications which work properly in multiple locales with the + lowest development and maintenance cost. + + + + + Definitions + + + + internationalization (i18n) + + + The provision within a computer program of the + capability of making itself adaptable to the requirements of different + native languages, local customs and coded character sets. + + + + + + locale + + + The definition of the subset of a user's environment that depends on + language and cultural conventions. + + + + + + localization (L10n) + + + The process of establishing information within a computer system + specific to the operation of particular native languages, local + customs and coded character sets. + + + + + + globalization + + + A product development approach which ensures that software products + are usable in the worldwide markets through a combination of + internationalization and localization. 
+ + + + + + + + + + Vision Statement + +The Mozilla project suggests keeping two catchy phrases in +mind when thinking about globalization: + + + +One code base for the world + + + +English is just another language + + + +Building an application often involves making a number of +assumptions on the part of the developers which depend on their own +culture. These include constant strings in the user interface and +system error messages, names of countries, cities, order of given +and family names for people, syntax of numeric and date strings and +collation order of strings. + +The ACS should be able to operate in languages and regions +beyond US English. The goal of ACS Globalization is to provide a +clean and efficient way to factor out the locale dependent +functionality from our applications, in order to be able to easily +swap in alternate localizations. + +This in turn will reduce redundant, costly, and error prone +rework when targeting the toolkit or applications built with the +toolkit to another locale. + +The cost of porting the ACS to another locale without some +kind of globalization support would be large and ongoing, since +without a mechanism to incorporate the locale-specific changes +cleanly back into the code base, it would require making a new fork +of the source code for each locale. + +System/Application Overview + +A globalized application will perform some or all of the +following steps to handle a page request for a specific +locale: + + + +Decide what the target locale is for an incoming page +request + + + +Decide which character set encoding the output should be +delivered in + + + +If a script file to handle the request needs to be loaded +from disk, determine if a character set conversion needs to be +performed when loading the script + + + +If needed, locale-specific resources are fetched. These can +include text, graphics, or other resources that would vary with the +target locale. 
+ + + +If content data is fetched from the database, check for +locale-specific versions of the data (e.g. country names). + + + +Source code should use a message catalog API to translate +constant strings in the code to the target locale + + + +Perform locale-specific linguistic sorting on data if +needed + + + +If the user submitted form input data, decide what character +set encoding conversion if any is needed. Parse locale-specific +quantities if needed (number formats, date formats). + + + +If templating is being used, select correct locale-specific +template to merge with content + + + +Format output data quantities in locale-specific manner +(date, time, numeric, currency). If templating is being used, this +may be done either before and/or after merging the data with a +template. + + + +Since the internationalization APIs may potentially be used +on every page in an application, the overhead for adding +internationalization to a module or application must not cause a +significant time delay in handling page requests. + +In many cases there are facilities in Oracle to perform +various localization functions, and also there are facilities in +Java which we will want to move to. So the design to meet the +requirements will tend to rely on these capabilities, or close +approximations to them where possible, in order to make it easier +to maintain Tcl and Java ACS versions. + +Use-cases and User-scenarios + +Here are the cases that we need to be able to handle +efficiently: + + + +A developer needs to author a web site/application in a +language besides English, and possibly a character set besides +ISO-8859-1. This includes the operation of the ACS itself, i.e., +navigation, admin pages for modules, error messages, as well as +additional modules or content supplied by the web site +developer. + +What do they need to modify to make this work? Can their +localization work be easily folded in to future releases of +ACS? 
+ + + +A developer needs to author a web site which operates in +multiple languages simultaneously. For example, arsDigita.com with +content and navigation in English, German, and Japanese. + +The site would have an end-user visible UI to support these +languages, and the content management system must allow articles to +be posted in these languages. In some cases it may be necessary to +make the modules' admin UIs operate in more than one +supported language, while in other cases the backend admin +interface can operate in a single language. + + + +A developer is writing a new module, and wants to make it +easy for someone to localize it. There should be a clear path to +author the module so that future developers can easily add support +for other locales. This would include support for creating +resources such as message catalogs, non-text assets such as +graphics, and use of templates which help to separate application +logic from presentation. + + + +Competitive +Analysis + +Other application servers: ATG Dynamo, Broadvision, Vignette, +... ? Anyone know how they deal with i18n? 
+ +Related +Links + + + +System/Package "coversheet" - where all +documentation for this software is linked off of + + + +Design document + + + +Developer's guide + + + +User's guide + + + +Other-cool-system-related-to-this-one +document + +LI18NUX +2000 Globalization Specification: +http://www.li18nux.net/ + +Mozilla +i18N Guidelines: +http://www.mozilla.org/docs/refList/i18n/l12yGuidelines.html + +ISO +639:1988 Code for the representation of names of languages +http://sunsite.berkeley.edu/amher/iso_639.html + +ISO 3166-1:1997 +Codes for the representation of names of countries and their +subdivisions Part 1: Country codes +http://www.niso.org/3166.html + +IANA +Registry of Character Sets + + + +Test plan + + + +Competitive system(s) + + + +Requirements + +Because the requirements for globalization affect many areas +of the system, we will break up the requirements into phases, with +a base required set of features, and then stages of increasing +functionality. + +Locales + +10.0 +A standard representation of locale will be used throughout +the system. A locale refers to a language and territory, and is +uniquely identified by a combination of ISO language and ISO +country abbreviations. + +
+See +Content +Repository Requirement 100.20 + +10.10 Provide a consistent +representation and API for creating and referencing a locale + +10.20 There will be a Tcl library of +locale-aware formatting and parsing functions for numbers, dates +and times. Note that Java has builtin support for these +already. + +10.30 For each locale there will be +default date, number and currency formats. +
+ +
Associating a Locale with a Request + +20.0 +The request processor must have a mechanism for associating a +locale with each request. This locale is then used to select the +appropriate template for a request, and will also be passed as the +locale argument to the message catalog or locale-specific +formatting functions. + +
+20.10 The locale for a request should be +computed by the following method, in descending order of +priority: + + + +get locale associated with subsite or package id + + + +get locale from user preference + + + +get locale from site wide default + +20.20 An API will be provided for +getting the current request locale from the +ad_conn structure. + + +
+ +
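Requirement 20.10 amounts to a first-match lookup over three sources. The following Python sketch shows that priority order only. The function name, its arguments, and the `en_US` fallback are assumptions for illustration, not part of the OpenACS request processor.

```python
def request_locale(package_locale=None, user_locale=None, site_default="en_US"):
    """Return the locale for a request, in descending order of priority.

    1. locale associated with the subsite or package
    2. locale from the user's preference
    3. site-wide default locale
    """
    for candidate in (package_locale, user_locale):
        if candidate:
            return candidate
    return site_default
```

A caller would pass `None` (or an empty string) for any source that has no locale set, so that the next source in the list is consulted.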
Resource Bundles / Content Repository + +30.0 +A mechanism must be provided for a developer to group a set +of arbitrary content resources together, keyed by a unique +identifier and a locale. + +For example, what approaches could be used to implement a +localizable nav-bar mechanism for a site? A navigation bar might be +made up of a set of text strings and graphics, where the graphics +themselves are locale-specific, such as images of English or +Japanese text (as on www.arsdigita.com). It should be easy to +specify alternate configurations of text and graphics to lay out +the page for different locales. + +Design note: Alternative mechanisms to implement this +functionality might include using templates, Java ResourceBundles, +content-item containers in the Content Repository, or some +convention assigning a common prefix to key strings in the message +catalog. + +Message Catalog for String Translation + +40.0 +A message catalog facility will provide a database of +translations for constant strings for multilingual applications. It +must support the following: + +
+40.10 Each message will be referenced via +a unique key. + +40.20 The key for a message will have +some hierarchical structure to it, so that sets of messages can be +grouped with respect to a module name or package path. + +40.30 The API for lookup of a message +will take a locale and message key as arguments, and return the +appropriate translation of that message for the specified +locale. + +40.40 The API for lookup of a message +will accept an optional default string which can be used if the +message key is not found in the catalog. This lets the developer +get code working and tested in a single language before having to +initialize or update a message catalog. + +40.50 For use within templates, custom +tags which invoke the message lookup API will be provided. + +40.60 Provide a method for importing and +exporting a flat file of translation strings, in order to make it +as easy as possible to create and modify message translations in +bulk without having to use a web interface. + +40.70 Since translations may be in +different character sets, there must be provision for writing and +reading catalog files in different character sets. A mechanism must +exist for identifying the character set of a catalog file before +reading it. + +40.80 There should be a mechanism for +tracking dependencies in the message catalog, so that if a string +is modified, the other translations of that string can be flagged +as needing update. + +40.90 The message lookup must be as +efficient as possible so as not to slow down the delivery of +pages. + +Design question: Is there any reason to implement +the message catalog on top of the content repository as the +underlying storage and retrieval service, with a layer of caching +for performance? Would we get a nice user interface and version +control almost for free? +
+ +
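Requirements 40.10 through 40.40 describe a keyed lookup with an optional default. A minimal Python sketch of such an interface follows. The `MessageCatalog` class, its method names, and the dotted-prefix convention for grouping keys are hypothetical and chosen only to mirror the requirements, not any existing API.

```python
class MessageCatalog:
    """Toy catalog keyed by (locale, message key)."""

    def __init__(self):
        self._messages = {}

    def register(self, locale, key, text):
        # 40.10: each message is stored under a unique key per locale.
        self._messages[(locale, key)] = text

    def lookup(self, locale, key, default=None):
        # 40.30: look up by locale and key.
        # 40.40: fall back to the caller's default when the key is missing.
        try:
            return self._messages[(locale, key)]
        except KeyError:
            if default is not None:
                return default
            raise

    def keys_for_package(self, locale, package):
        # 40.20: assume keys look like "package.message", so a prefix groups them.
        prefix = package + "."
        return sorted(
            key for (loc, key) in self._messages
            if loc == locale and key.startswith(prefix)
        )
```

Raising `KeyError` when no default is given is one possible design. A production catalog might instead return a visible "translation missing" marker so that pages still render.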
Character Set Encoding + +Character Sets +50.0 A locale will have a primary +associated character set which is used to encode text in the +language. When given a locale, we can query the system for the +associated character set to use. + +The assumption is that we are going to use Unicode in our +database to hold all text data. Our current programming +environments (Tcl/Oracle or Java/Oracle) operate on Unicode data +internally. However, since Unicode is not yet commonly used in +browsers and authoring tools, the system must be able to read and +write other character sets. In particular, conversions to and from +Unicode will need to be explicitly performed at the following +times: + + + +Loading source files (.tcl or .adp) or content files from the +filesystem + + + +Accepting form input data from users + + + +Delivering text output to a browser + + + +Composing an email message + + + +Writing data to the filesystem + + + +Design question: Do we want to mandate that all +template files be stored in UTF8? I don't think so, because +most people don't have Unicode editors, or don't want to be +bothered with an extra step to convert files to UTF8 and back when +editing them in their favorite editor. + +Same question for script and template files: how do +we know what language and character set they are authored in? +Should we overload the filename suffix (e.g., +'.shiftjis.adp', +'.ja_JP.euc.adp')? + +The simplest design is probably just to assign a +default mapping from each locale to a character set: e.g. ja_JP +-> ShiftJIS, fr_FR -> ISO-8859-1. +++ (see new ACS/Java +notes) +++ + + + + Tcl Source File Character Set +
+ + There are two classes of Tcl files loaded by the system; + library files loaded at server startup, and page script files, + which are run on each page request. + + Should we require all Tcl files be stored as UTF8? + That seems too much of a burden on developers. + + 50.10 Tcl library files can be authored + in any character set. The system must have a way to determine the + character set before loading the files, probably from the + filename. + + 50.20 Tcl page script files can be + authored in any character set. The system must have a way to + determine the character set before loading the files, probably from + the filename. +
+
+ + + Submitted Form Data Character Set + + 50.30 Data which is submitted with an + HTTP request using a GET or POST method may be in any character + set. The system must be able to determine the encoding of the form + data and convert it to Unicode on demand. + + 50.35 The developer must be able to + override the default system choice of character set when parsing + and validating user form data. + + 50.30.10 Extra hair: In Japan and some + other Asian languages where there are multiple character set + encodings in common use, the server may need to attempt to do an + auto-detection of the character set, because buggy browsers may + submit form data in an unexpected alternate encoding. + + + + Output Character Set + +
+ 50.40 The output character set for a + page request will be determined by default by the locale associated + with the request (see requirement 20.0). + + 50.50 It must be possible for a + developer to manually override the output character set encoding + for a request using an API function. + + +
+ +
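Requirements 50.0, 50.40 and 50.50 together describe a default locale-to-charset mapping with a per-request override. A small Python sketch under stated assumptions: the two mapping entries come from the design note above (ja_JP to ShiftJIS, fr_FR to ISO-8859-1), while the function names and the UTF-8 fallback for unmapped locales are illustrative choices.

```python
# Default output character set per locale, using Python codec names.
DEFAULT_CHARSETS = {
    "ja_JP": "shift_jis",
    "fr_FR": "iso-8859-1",
}


def output_charset(locale, override=None):
    """50.40: default from the request locale; 50.50: explicit override wins."""
    if override:
        return override
    # Assumption: fall back to UTF-8 for locales without a mapping.
    return DEFAULT_CHARSETS.get(locale, "utf-8")


def encode_page(text, locale, override=None):
    """Convert internal Unicode text to bytes in the chosen output charset."""
    charset = output_charset(locale, override)
    return charset, text.encode(charset)
```

The charset name returned alongside the bytes is what a server would also announce in the `Content-Type` header, so that the browser decodes the page with the same encoding.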
+
+ +ACS Kernel Issues + +
+60.10 All ACS error messages must use +the message catalog and the request locale to generate the error +message for the appropriate locale. + +60.20 Web server error messages such as +404, 500, etc must also be delivered in the appropriate +locale. + +60.30 Where files are written or read +from disk, their filenames must use a character set and character +values which are safe for the underlying operating system. +
+ +
Templates + +
+70.0 For a given abstract URL, the +designer may create multiple locale-specific template files +(one per locale or language). + +70.10 For a given page request, the +system must be able to select an appropriate locale-specific +template file to use. The request locale is computed as per +requirement 20.0. + +Design note: this would probably be implemented by +suffixing the locale or a locale abbreviation to the template +filename, such as foo.ja.adp or foo.en_GB.adp. + +70.20 A template file may be created for +a partial locale (language only, without a territory), and the +request processor should be able to find the closest match for the +current request locale. + +70.30 A template file may be created in +any character set. The system must have a way to know which +character set a template file contains, so it can properly process +it. +
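The closest-match rule in 70.10 and 70.20 can be illustrated as a search from the most specific filename to the least specific one. This Python sketch follows the filename convention from the design note (foo.en_GB.adp, foo.ja.adp). The helper names and the final fallback to an unsuffixed foo.adp are assumptions.

```python
def template_candidates(stem, locale):
    """List template filenames from most to least specific for a locale."""
    language = locale.split("_", 1)[0]
    names = [f"{stem}.{locale}.adp"]           # full locale, e.g. foo.en_GB.adp
    if language != locale:
        names.append(f"{stem}.{language}.adp")  # language only, e.g. foo.en.adp
    names.append(f"{stem}.adp")                 # assumed locale-neutral default
    return names


def select_template(stem, locale, available):
    """Return the first candidate that exists in the set of available files."""
    for name in template_candidates(stem, locale):
        if name in available:
            return name
    return None
```

For a request locale of `en_GB` with only `foo.en.adp` and `foo.adp` on disk, the language-only template is chosen over the default.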
+Formatting +Datasource Output in Templates + +70.50 The properties of a datasource +column may include a datatype so that the templating system can +format the output for the current locale. The datatype is defined +by a standard ACS datatype plus a format token or format string, +for example: a date column might be specified as +'current_date:date LONG,' or 'current_date:date +"YYYY-Mon-DD"' + +Forms + +
+70.60 The forms API must support +construction of locale-specific HTML form widgets, such as date +entry widgets, and form validation of user input data for +locale-specific data, such as dates or numbers. + +70.70 For forms which allow users to +upload files, a standard method for a user to indicate the charset +of a text file being uploaded must be provided. + +Design note: this presumably applies to uploading +data to the content repository as well +
+ +
Sorting and Searching + +
+80.10 Support API for correct collation +(sorting order) on lists of strings in locale-dependent way. + +80.20 For the Tcl API, we will say that +locale-dependent sorting will use Oracle SQL operations (i.e., we +won't provide a Tcl API for this). We require a Tcl API +function to return the correct incantation of NLS_SORT to use for a +given locale with ORDER BY clauses in +queries. + +80.40 The system must handle full-text +search in any supported language. +
+ +
Time Zones + +
+90.10 Provide API support for specifying +a time zone + +90.20 Provide an API for computing time +and date operations which are aware of timezones. So for example a +calendar module can properly synchronize items inserted into a +calendar from users in different time zones using their own local +times. + +90.30 Store all dates and times in +universal time zone, UTC. + +90.40 For a registered user, a time +zone preference should be stored. + +90.50 For a non-registered user a time +zone preference should be attached via a session or else UTC should +be used to display every date and time. + +90.60 The default if we can't +determine a time zone is to display all dates and times in some +universal time zone such as GMT. +
+ +
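The store-in-UTC, display-in-local-time pattern from 90.20 through 90.60 can be sketched as follows. This Python example uses fixed UTC offsets to stay self-contained. A real system would more likely use named time zones with daylight-saving rules, and the function names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_naive, utc_offset_hours):
    """90.30: convert a user's local wall-clock time to UTC for storage."""
    user_zone = timezone(timedelta(hours=utc_offset_hours))
    return local_naive.replace(tzinfo=user_zone).astimezone(timezone.utc)


def for_display(stored_utc, utc_offset_hours=None):
    """90.40-90.60: show in the user's zone, or in UTC when none is known."""
    if utc_offset_hours is None:
        return stored_utc
    return stored_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
```

Two users entering 09:00 at UTC+9 and 02:00 at UTC+2 on the same date produce the same stored instant, which is what lets a calendar module line up their entries.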
Database + +
+100.10 Since UTF8 strings can use up to +three (UCS2) or six (UCS4) bytes per character, make sure that +column size declarations in the schema are large enough to +accommodate required data (such as email addresses in +Japanese). +
+ +
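To see why requirement 100.10 matters, it helps to compare character counts with encoded byte counts. A short Python sketch, assuming the column size is declared in bytes. The helper names are made up for the example.

```python
def utf8_length(text):
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_column(text, max_bytes):
    """True if the UTF-8 encoding of text fits a column sized in bytes."""
    return utf8_length(text) <= max_bytes
```

Three Japanese characters need nine bytes, so a column sized for three single-byte characters would reject or truncate them.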
Email and +Messaging + +When sending an email message, just as when delivering +content in a web page over an HTTP connection, it is necessary to be +able to specify what character set encoding to use. + +
+110.10 The email message sending API +will allow for a character set encoding to be specified. + +110.20 The email accepting API +will allow the character set of a message to be parsed correctly (hopefully a well +formatted message will have a MIME character set content type header). +
+ +
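Requirements 110.10 and 110.20 can be illustrated with Python's standard `email` package, which is used here purely as a stand-in for whatever mail API the toolkit provides. The helper names and the `us-ascii` fallback for messages without a charset header are assumptions.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_message(subject, body, charset):
    """110.10: compose a message whose body is encoded in the given charset."""
    msg = EmailMessage()
    msg["Subject"] = subject
    # set_content records the charset in the Content-Type header.
    msg.set_content(body, charset=charset)
    return msg


def received_charset(raw_bytes, fallback="us-ascii"):
    """110.20: read the charset declared in an incoming message, if any."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    return msg.get_content_charset() or fallback
```

When an incoming message carries no charset parameter, the fallback applies. A server handling Japanese mail might need auto-detection at that point instead.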
+ + Implementation Notes + + Because globalization touches many different parts of the system, + we want to reduce the implementation risk by breaking the + implementation into phases. + + + + + Revision History + + + + + + Document Revision # + Action Taken, Notes + When? + By Whom? + + + + 0.4 + converting from HTML to DocBook and importing the document to the OpenACS + kernel documents. This was done as a part of the internationalization of + OpenACS and .LRN for the Heidelberg University in Germany + 12 September 2002 + Peter Marklund + + + + 0.3 + comments from Christian + 1/14/2000 + Henry Minsky + + + + 0.2 + Minor typos fixed, clarifications to wording + 11/14/2000 + Henry Minsky + + + + 0.1 + Creation + 11/08/2000 + Henry Minsky + + + + + + + + +