Changes between Initial Version and Version 1 of Message_Extraction


Timestamp:
Apr 19, 2014, 7:20:00 PM
Author:
Adrián Chaves

  • Message_Extraction

    v1 v1  
     1[[PageOutline(1-100, Table of Contents)]]
     2
     3'''Message extraction''' is the process of parsing the source files searching for strings that need to be translated, and generating a translation template file (POT) that contains those strings. Translators can use the generated POT file to create a translation file (PO) for a specific locale.
     4
     5= Configuring the Generation of Translation Templates =
     6
     7To configure the generation of one or more translation template files (POT) from a set of source files, you must:
     8* Locate or create a folder named `l10n` that will contain the generated translation template file, as well as any translation file created from that template.
     9* Edit or create a `messages.json` file within your `l10n` folder that defines how to generate your translation template file or files.
     10
     11== The `l10n` Folder ==
     12
     13Our convention is to save translation template files (POT) and translation files (PO) into folders named `l10n`. Specifically:
     14
     15* `binaries/data/l10n` contains translation template files and translation files generated from the sources of the game engine. Currently this folder contains only one translation template file, `engine.pot`, but in the future it might contain other translation template files generated from files in the `source/` folder, such as `editor.pot` (for Atlas).
     16
     17* `binaries/data/mods/<mod name>/l10n` contains translation template files and translation files generated from the sources of the <mod name> mod. For example, `binaries/data/mods/public/l10n` contains the translation template files and translation files of the main mod.
     18
     19Determine which of these folders should contain the translation template file that you want to generate, and create the target `l10n` folder if it does not exist yet.
     20
     21== The `messages.json` File ==
     22
     23Each [#Thel10nFolder `l10n` folder] must contain a JSON file named `messages.json`. This file defines the translation template files that must be generated in the `l10n` folder.
     24
     25The `messages.json` file must contain a JSON array of objects:
     26
     27{{{
     28[
     29    {},
     30    {},
     31    …
     32]
     33}}}
     34
     35Each object represents a translation template file, and must provide the following properties:
     36
     37{{{
     38{
     39    "output": "",
     40    "inputRoot": "",
     41    "project": "",
     42    "copyrightHolder": "",
     43    "rules": []
     44}
     45}}}
     46
     47See below for detailed information on how to fill each property.
     48
     49    '''Note:''' Currently all `messages.json` files define a single translation template file, but we support generating multiple files.
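
The structure described above can be checked mechanically. The following is a minimal sketch, not part of the actual tooling: the `check_messages` function and the project name `Example Mod` are hypothetical, and only the five property names come from the description above.

```python
import json

# Properties that the text above lists as required for each template object.
REQUIRED_TEMPLATE_KEYS = {"output", "inputRoot", "project", "copyrightHolder", "rules"}


def check_messages(text):
    """Return a list of problems found in the content of a messages.json file."""
    templates = json.loads(text)
    if not isinstance(templates, list):
        return ["top-level value must be a JSON array"]
    problems = []
    for index, template in enumerate(templates):
        if not isinstance(template, dict):
            problems.append(f"item {index} is not an object")
            continue
        for key in sorted(REQUIRED_TEMPLATE_KEYS - set(template)):
            problems.append(f"item {index} is missing '{key}'")
    return problems


# Hypothetical file content that defines a single translation template file.
example = """
[
    {
        "output": "public.pot",
        "inputRoot": "..",
        "project": "Example Mod",
        "copyrightHolder": "Wildfire Games",
        "rules": []
    }
]
"""
print(check_messages(example))  # []
```

An empty list means that every object provides all the required properties; the sketch does not check the content of `rules`.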
     50
     51=== `output` ===
     52
     53The `output` property must contain the file name of the translation template file. For example:
     54
     55{{{
     56"output": "public.pot",
     57}}}
     58
     59This would save the generated translation template file to `l10n/public.pot`.
     60
     61When you run the game, the game mounts the content of all `l10n` folders into the same virtual folder. Unlike the rest of the content mounted in the game virtual filesystem, translation files are not meant to be overwritten by mods; to avoid this, we use the following naming convention in the `l10n` folder of mods:
     62* Translation template file: `<mod name>.pot`. For example: `public.pot`.
     63* Translation file: `<locale code>.<mod name>.po`. For example: `de.public.po`.
     64
     65    '''Note:''' If a mod has more than one translation template file, you can append something else to the mod name. For example: `public.civilizations.pot`, `public.gui.pot`, `public.units.pot`, and so on.
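
The naming convention above can be written down as two small helpers. This is only an illustrative sketch: `template_name` and `translation_name` are hypothetical names, and applying the optional suffix to translation files as well is an assumption that mirrors the note about template files.

```python
def template_name(mod, part=None):
    """Build '<mod name>.pot', or '<mod name>.<part>.pot' when a part is given."""
    stem = mod if part is None else f"{mod}.{part}"
    return f"{stem}.pot"


def translation_name(locale, mod, part=None):
    """Build '<locale code>.<mod name>.po', with the same optional part (assumed)."""
    stem = mod if part is None else f"{mod}.{part}"
    return f"{locale}.{stem}.po"


print(template_name("public"))           # public.pot
print(translation_name("de", "public"))  # de.public.po
print(template_name("public", "gui"))    # public.gui.pot
```

Because the mod name is part of every file name, files from different mods keep distinct names once all `l10n` folders are mounted into the same virtual folder.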
     66
     67=== `inputRoot` ===
     68
     69The `inputRoot` property must contain the path to a folder that contains all the files that would be parsed to generate the translation template file, relative to the `l10n` folder. For mods, that should always be:
     70
     71{{{
     72"inputRoot": "..",
     73}}}
     74
     75When you later need to specify file masks to find the files to parse, you can specify those file masks relative to the path that you specified here.
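
To illustrate how the relative paths combine, here is a minimal sketch under the assumption that paths are joined and normalized in the usual way; `resolve_input_root` and `resolve_mask` are hypothetical helpers, not functions of the actual scripts.

```python
import posixpath


def resolve_input_root(l10n_folder, input_root):
    """Resolve inputRoot relative to the l10n folder that holds messages.json."""
    return posixpath.normpath(posixpath.join(l10n_folder, input_root))


def resolve_mask(l10n_folder, input_root, mask):
    """Anchor a file mask at the resolved input root."""
    return posixpath.join(resolve_input_root(l10n_folder, input_root), mask)


print(resolve_input_root("binaries/data/mods/public/l10n", ".."))
# binaries/data/mods/public
print(resolve_mask("binaries/data/mods/public/l10n", "..", "maps/scenarios/**.xml"))
# binaries/data/mods/public/maps/scenarios/**.xml
```

With `".."` as the input root, file masks are therefore written relative to the folder of the mod itself.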
     76
     77=== `project` and `copyrightHolder` ===
     78
     79The `project` and `copyrightHolder` properties are required as well. Their values appear in the header of the resulting translation template file. For example:
     80
     81{{{
     82"project": "0 A.D. — Empires Ascendant",
     83"copyrightHolder": "Wildfire Games",
     84}}}
     85
     86This would result in a translation template file with the following header:
     87
     88{{{
     89# Translation template for 0 A.D. — Empires Ascendant.
     90# Copyright © 2014 Wildfire Games
     91# This file is distributed under the same license as the 0 A.D. — Empires
     92# Ascendant project.
     93#
     94msgid ""
     95msgstr ""
     96"Project-Id-Version: 0 A.D. — Empires Ascendant\n"
     97"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
     98"POT-Creation-Date: 2014-04-19 15:52+0200\n"
     99"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
     100"MIME-Version: 1.0\n"
     101"Content-Type: text/plain; charset=utf-8\n"
     102"Content-Transfer-Encoding: 8bit\n"
     103"Generated-By: Potter 1.0\n"
     104}}}
     105
     106=== `rules` ===
     107
     108The `rules` property contains an array of objects:
     109
     110{{{
     111"rules": [
     112    {},
     113    {},
     114    …
     115]
     116}}}
     117
     118Each of those objects defines a set of properties that specify how to extract messages from a set of files:
     119
     120{{{
     121{
     122    "extractor": "",
     123    "filemasks": [],
     124    "options": {}
     125}
     126}}}
     127
     128See below for detailed information on how to fill each property.
     129
     130==== `extractor` ====
     131
     132The `extractor` property is the name of an extractor class, as defined in the `source/tools/i18n/potter/extractors.py` Python script.
     133
     134The currently supported extractors are:
     135* '''cpp''', to extract messages from C++ source files.
     136* '''javascript''', to extract messages from JavaScript source files.
     137* '''json''', to extract messages from JSON data files.
     138* '''txt''', to extract messages from plain text data files.
     139* '''xml''', to extract messages from XML data files.
     140
     141They should be enough, but if they ever aren’t, it should be fairly easy for a Python programmer to create a new extractor.
     142
     143==== `filemasks` ====
     144
     145The `filemasks` property must contain an array of strings. Each string is a file mask that matches the files that the extractor should parse, relative to the specified `inputRoot`. For example:
     146
     147{{{
     148"filemasks": [
     149    "maps/scenarios/**.xml",
     150    "maps/skirmishes/**.xml"
     151],
     152}}}
     153
     154If you also need to specify exclusion file masks, you can alternatively use an object instead of an array as in the following example:
     155
     156{{{
     157"filemasks": {
     158    "includeMasks": ["**.cpp"],
     159    "excludeMasks": ["third_party/**", "tools/**"]
     160},
     161}}}
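
The two forms of `filemasks` can be handled uniformly. The sketch below is an approximation, not the matching code of the actual tool: it uses Python's `fnmatch` module, in which `*` (and therefore `**`) also matches path separators, and that may differ from how the real scripts interpret `**`. The function names and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase


def normalize_masks(filemasks):
    """Accept either the array form or the include/exclude object form."""
    if isinstance(filemasks, dict):
        return filemasks.get("includeMasks", []), filemasks.get("excludeMasks", [])
    return list(filemasks), []


def select_files(paths, filemasks):
    """Keep the paths that match an include mask and no exclude mask."""
    include, exclude = normalize_masks(filemasks)
    return [
        path for path in paths
        if any(fnmatchcase(path, mask) for mask in include)
        and not any(fnmatchcase(path, mask) for mask in exclude)
    ]


# Hypothetical paths, relative to the input root.
paths = ["main.cpp", "ps/Game.cpp", "third_party/lib.cpp", "tools/x/y.cpp", "ps/Game.h"]
masks = {"includeMasks": ["**.cpp"], "excludeMasks": ["third_party/**", "tools/**"]}
print(select_files(paths, masks))  # ['main.cpp', 'ps/Game.cpp']
```

Treating the array form as a set of include masks with no exclusions keeps a single code path for both notations.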
     162
     163==== `options` ====
     164
     165The `options` object contains properties that specify parameters to pass to the extractor. Some of those options are specific to one or another extractor, while other options apply to all extractors.
     166
     167You can use any of the following options:
     168
     169===== `keywords` =====
     170
     171This is the most important option. Every extractor except '''txt''' supports it, although each extractor defines and interprets it differently.
     172
     173When you are defining the settings of a '''javascript''' or '''cpp''' extractor, the `keywords` property is an object where each property name is the name of a function, and its value is an array of the positions of the function parameters that contain translatable strings. The position of the context parameter, if any, is wrapped in an array of its own to tell it apart from the other parameters. For example:
     174
     175{{{
     176"keywords": {
     177    "Translate": [1],
     178    "TranslatePlural": [1, 2],
     179    "TranslateWithContext": [[1], 2],
     180    "TranslatePluralWithContext": [[1], 2, 3]
     181},
     182}}}
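
One way to read such a keyword definition is sketched below. It assumes that the numbers are 1-based parameter positions, that the first plain number is the singular message and that the second, if any, is the plural message; `read_call` and the sample arguments are hypothetical.

```python
def read_call(spec, args):
    """Map the arguments of one call to (context, singular, plural).

    `spec` is the keyword array; a nested one-item array marks the context.
    Positions are assumed to be 1-based.
    """
    context = None
    messages = []
    for entry in spec:
        if isinstance(entry, list):
            context = args[entry[0] - 1]
        else:
            messages.append(args[entry - 1])
    singular = messages[0]
    plural = messages[1] if len(messages) > 1 else None
    return context, singular, plural


keywords = {
    "Translate": [1],
    "TranslatePlural": [1, 2],
    "TranslateWithContext": [[1], 2],
    "TranslatePluralWithContext": [[1], 2, 3],
}

print(read_call(keywords["Translate"], ["Jane"]))
# (None, 'Jane', None)
print(read_call(keywords["TranslatePluralWithContext"],
                ["unit", "%d horse", "%d horses", 3]))
# ('unit', '%d horse', '%d horses')
```

Parameters that are not listed, such as the trailing number in the plural call, are ignored.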
     183
     184When you are defining the settings of a '''json''' extractor, the `keywords` property is an array with the names of the JSON properties to extract from the JSON file. For example:
     185
     186{{{
     187"keywords": [
     188    "Name",
     189    "Description"
     190]
     191}}}
     192
     193When the extractor finds a property in a JSON file with a name from the list above, it checks the type of the property value and acts accordingly:
     194* If the value is a string, the extractor extracts that string. For example, from `"Name": "Jane"`, it extracts `Jane`.
     195* If the value is an array, the extractor extracts every item of the array that is a string. For example, from `"Name": ["Jane", "John"]`, it extracts both `Jane` and `John`.
     196* If the value is an object, the extractor extracts the value of every property of the object that is a string. For example, from `"Name": { "female": "Jane", "male": "John" }`, it extracts both `Jane` and `John`.
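
The three rules above can be sketched as follows. This is an illustration rather than the actual extractor: the function names and sample data are hypothetical, and searching nested objects and arrays for matching property names is an assumption.

```python
import json


def extract_value(value):
    """Apply the three rules above to the value of a matching property."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        return [item for item in value if isinstance(item, str)]
    if isinstance(value, dict):
        return [item for item in value.values() if isinstance(item, str)]
    return []


def extract_messages(node, keywords):
    """Walk parsed JSON and collect messages from properties named in keywords."""
    messages = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name in keywords:
                messages.extend(extract_value(value))
            else:
                messages.extend(extract_messages(value, keywords))
    elif isinstance(node, list):
        for item in node:
            messages.extend(extract_messages(item, keywords))
    return messages


data = json.loads("""
{
    "Name": "Jane",
    "Units": [
        {"Name": ["Ann", "Bob", 7]},
        {"Description": {"female": "Cleo", "male": "Dan", "age": 30}}
    ]
}
""")
print(extract_messages(data, ["Name", "Description"]))
# ['Jane', 'Ann', 'Bob', 'Cleo', 'Dan']
```

Values that are not strings, such as `7` and `30` in the sample data, are skipped.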
     197
     198When you are defining the settings of an '''xml''' extractor, the `keywords` property is an object where each property name is the name of an XML element. The extractor extracts the text within the start and end tags of XML elements with the specified name. The value of the property is an object of optional settings (you can provide an empty object). For example:
     199
     200{{{
     201"keywords": {
     202    "Name": {},
     203    "Description": {}
     204}
     205}}}
     206
     207From `<Name>Jane</Name>`, the settings above would make the extractor extract `Jane`.
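
As a rough illustration of this behavior, the following sketch uses Python's standard `xml.etree.ElementTree` module. It is not the code of the actual extractor: `extract_xml_messages` and the sample document are hypothetical, and the settings objects are ignored here.

```python
import xml.etree.ElementTree as ET


def extract_xml_messages(xml_text, keywords):
    """Collect the text of every element whose tag is a key of `keywords`."""
    root = ET.fromstring(xml_text)
    messages = []
    for element in root.iter():
        text = (element.text or "").strip()
        if element.tag in keywords and text:
            messages.append(text)
    return messages


# Hypothetical XML data file.
document = """
<Entity>
    <Name>Jane</Name>
    <Cost>5</Cost>
    <Description>A brave scout.</Description>
</Entity>
"""
print(extract_xml_messages(document, {"Name": {}, "Description": {}}))
# ['Jane', 'A brave scout.']
```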
     208
     209The settings object may contain any of the following settings:
     210
     211* '''extractJson'''. This property allows you to extract messages from a JSON string defined within an XML element. The value of this property should be an object of options for the '''json''' extractor. For example, from `<Name>{ "female": "Jane", "male": "John" }</Name>`, `"keywords": { "Name": { "extractJson": { "keywords": ["female", "male"] } } }` would make the extractor extract both `Jane` and `John`.
     212
     213* '''locationAttributes'''. This property lets you specify a list of XML attributes that may help to identify the source location of the extracted message. These XML attributes, if found, are appended to the source path of the extracted message in the translation template file. If nothing else, the value of some attributes may help translators understand the context of the message. For example, for some languages it is important to know whether the message comes from a tooltip or a caption; if you use `"locationAttributes": ["id"]` when parsing GUI XML files, translators will see the message source as `path/to/gui/file.xml:123 (caption)` and not just `path/to/gui/file.xml:123`.
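
Both settings can be sketched in a few lines. The helpers below are hypothetical and make several assumptions: the line number is supplied by the caller, several attribute values are joined with commas, and only string values are extracted from the embedded JSON object. The actual extractor may behave differently in each of these respects.

```python
import json
import xml.etree.ElementTree as ET


def describe_location(path, line, element, location_attributes):
    """Append the values of the listed attributes that are present to 'path:line'."""
    values = [element.get(name) for name in location_attributes if element.get(name)]
    location = f"{path}:{line}"
    return f"{location} ({', '.join(values)})" if values else location


def extract_embedded_json(element, json_options):
    """Parse the element text as a JSON object and extract the configured properties."""
    data = json.loads(element.text)
    return [data[key] for key in json_options["keywords"]
            if isinstance(data.get(key), str)]


name = ET.fromstring('<Name>{ "female": "Jane", "male": "John" }</Name>')
print(extract_embedded_json(name, {"keywords": ["female", "male"]}))
# ['Jane', 'John']

# Hypothetical GUI element whose id happens to be "caption".
caption = ET.fromstring('<object id="caption">Start Game</object>')
print(describe_location("path/to/gui/file.xml", 123, caption, ["id"]))
# path/to/gui/file.xml:123 (caption)
```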
     214
     215===== `commentTags` =====
     216
     217This property only works with the '''javascript''' and '''cpp''' extractors.
     218
     219It lets you define a list of strings, “tags”. When the extractor finds a source comment right above a translatable string, and that source comment uses one of the specified tags as a prefix, the extractor adds that source comment to the translation template file. For example:
     220
     221{{{
     222"commentTags": [
     223    "Translation:"
     224]
     225}}}
     226
     227This makes the extractor include the following source comment (`This string is used in X context.`) along with the extracted message (`Jane`):
     228
     229{{{
     230// Translation: This string is used in X context.
     231name = translate("Jane");
     232}}}
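
The behavior can be approximated as shown below. This is a simplified sketch, not the parser of the actual extractors: it only recognizes `//` comments on the line directly above a `translate("…")` call, and `extract_with_comments` is a hypothetical name.

```python
import re

# Simplified pattern for a call with a single double-quoted string argument.
CALL = re.compile(r'translate\("([^"]*)"\)')


def extract_with_comments(source, comment_tags):
    """Return (message, comment) pairs; comment is None without a tagged comment."""
    results = []
    previous = ""
    for line in source.splitlines():
        stripped = line.strip()
        match = CALL.search(stripped)
        if match:
            comment = None
            if previous.startswith("//"):
                body = previous[2:].strip()
                for tag in comment_tags:
                    if body.startswith(tag):
                        comment = body[len(tag):].strip()
                        break
            results.append((match.group(1), comment))
        previous = stripped
    return results


source = """
// Translation: This string is used in X context.
name = translate("Jane");
// Not meant for translators.
other = translate("John");
"""
print(extract_with_comments(source, ["Translation:"]))
# [('Jane', 'This string is used in X context.'), ('John', None)]
```

Comments without a matching tag stay out of the translation template file, so developers can keep notes for translators apart from ordinary code comments.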
     233
     234===== `format` =====
     235
     236The `format` property lets you override the string format flag (such as `c-format` or `python-format`) that the extractor would otherwise assign to the extracted messages based on its own detection.
     237
     238If you do not trust the extractor to ''guess'' the format correctly, or you know for sure that all the extracted strings have a particular format, you can define the format using this property. See [https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html The Format of PO Files] for a list of format identifiers.
     239
     240If you want to force the extracted messages ''not'' to have a format identifier, specify the special format identifier `none` as the value of the `format` property.
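
The precedence between the detected format and the `format` option can be summarized in a short sketch. The `format_flag_line` function is hypothetical; the `#, <flag>` notation is the way PO files mark format flags on an entry.

```python
def format_flag_line(guessed, override=None):
    """Return the '#, <flag>' line for a PO entry, or None when no flag applies.

    `guessed` is the format detected by the extractor (or None); `override` is
    the value of the `format` option, where "none" suppresses the flag.
    """
    chosen = guessed if override is None else override
    if chosen is None or chosen == "none":
        return None
    return f"#, {chosen}"


print(format_flag_line("c-format"))                   # #, c-format
print(format_flag_line("c-format", "python-format"))  # #, python-format
print(format_flag_line("c-format", "none"))           # None
```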
     241
     242= Generating Translation Templates =
     243
     244Once you have the `messages.json` files in their `l10n` folders, you can run the `source/tools/i18n/updateTemplates.py` script to generate the defined translation template files. Each file is generated in the `l10n` folder where it is defined, with the name specified in the `messages.json` file.
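
To make the relation between `messages.json` files and generated files concrete, here is a minimal sketch of how the definitions could be located and mapped to output paths. It does not reproduce what `updateTemplates.py` actually does; `find_message_files` and `planned_outputs` are hypothetical helpers.

```python
from pathlib import Path


def find_message_files(root):
    """Return every messages.json that sits directly inside a folder named l10n."""
    return sorted(
        path for path in Path(root).rglob("messages.json")
        if path.parent.name == "l10n"
    )


def planned_outputs(message_file, templates):
    """Pair each template's output name with the l10n folder that defines it."""
    return [message_file.parent / template["output"] for template in templates]


outputs = planned_outputs(
    Path("binaries/data/mods/public/l10n/messages.json"),
    [{"output": "public.pot"}],
)
print([path.as_posix() for path in outputs])
# ['binaries/data/mods/public/l10n/public.pot']
```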