Changes between Initial Version and Version 1 of Message_Extraction

Timestamp:: Apr 19, 2014, 7:20:00 PM (10 years ago)
Author:: Adrián Chaves
Comment:: --

+[[PageOutline(1-100, Table of Contents)]]
+'''Message extraction''' is the process of parsing source files to find strings that need to be translated, and generating a translation template file (POT) that contains those strings. Translators can use the generated POT file to create a translation file (PO) for a specific locale.
+= Configuring the Generation of Translation Templates =
+To configure the generation of one or more translation template files (POT) from a set of source files, you must:
+* Locate or create a folder named `l10n` that will contain the generated translation template file, as well as any translation file created from that template.
+* Edit or create a `messages.json` file within your `l10n` folder that defines how to generate your translation template file or files.
+== The `l10n` Folder ==
+Our convention is to save translation template files (POT) and translation files (PO) into folders named `l10n`. Specifically:
+* `binaries/data/l10n` contains translation template files and translation files generated from the sources of the game engine. Currently this folder contains only one translation template file, `engine.pot`, but in the future it might contain other translation template files generated from files in the `source/` folder, such as `editor.pot` (for Atlas).
+* `binaries/data/mods/<mod name>/l10n` contains translation template files and translation files generated from the sources of the <mod name> mod. For example, `binaries/data/mods/public/l10n` contains the translation template files and translation files of the main mod.
+Determine which of these folders should contain the translation template file that you want to generate, and create the target `l10n` folder if it does not exist yet.
+== The `messages.json` File ==
+Each [#Thel10nFolder `l10n` folder] must contain a JSON file named `messages.json`. This file defines the translation template files that must be generated in the `l10n` folder.
+The `messages.json` file must contain a JSON array of objects:
+{{{
+[
+    {},
+    {},
+    …
+]
+}}}
+Each object represents a translation template file, and must provide the following properties:
+{{{
+{
+    "output": "",
+    "inputRoot": "",
+    "project": "",
+    "copyrightHolder": "",
+    "rules": []
+}
+}}}
+See below for detailed information on how to fill each property.
+    '''Note:''' Currently all `messages.json` files define a single translation template file, but we support generating multiple files.
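The structure described above can be sketched in Python as follows. This is only an illustration: the helper `missing_properties`, the project and copyright holder names, the `gui/**.js` file mask and the `translate` keyword are hypothetical values chosen for the example, not taken from an actual `messages.json` file.

```python
import json

# The five properties that every template object must provide.
REQUIRED_PROPERTIES = ("output", "inputRoot", "project", "copyrightHolder", "rules")


def missing_properties(template):
    """Return the required property names that a template object lacks."""
    return [name for name in REQUIRED_PROPERTIES if name not in template]


# A messages.json document is an array with one object per template file.
templates = [
    {
        "output": "public.pot",
        "inputRoot": "..",
        "project": "Example Project",        # hypothetical value
        "copyrightHolder": "Example Holder",  # hypothetical value
        "rules": [
            {
                "extractor": "javascript",
                "filemasks": ["gui/**.js"],   # hypothetical file mask
                "options": {"keywords": {"translate": [1]}},
            }
        ],
    }
]

# Serialize the array the way it could be stored in l10n/messages.json.
messages_json = json.dumps(templates, indent=4, ensure_ascii=False)
```

Printing `messages_json` shows a complete file with a single template object and a single rule.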
+=== `output` ===
+The `output` property must contain the file name of the translation template file. For example:
+{{{
+"output": "public.pot",
+}}}
+This would save the generated translation template file to `l10n/public.pot`.
+When you run the game, the game mounts the content of all `l10n` folders into the same virtual folder. Unlike the rest of the content mounted in the game virtual filesystem, translation files are not meant to be overwritten by mods; to avoid this, we use the following naming convention in the `l10n` folder of mods:
+* Translation template file: `<mod name>.pot`. For example: `public.pot`.
+* Translation file: `<locale code>.<mod name>.po`. For example: `de.public.po`.
+    '''Note:''' If a mod has more than one translation template file, you can append something else to the mod name. For example: `public.civilizations.pot`, `public.gui.pot`, `public.units.pot`, and so on.
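The naming convention above can be expressed as two small helpers. This is a minimal sketch: the function names `template_file_name` and `translation_file_name` are hypothetical and are not part of the game tools.

```python
def template_file_name(mod_name, part=None):
    """Return '<mod name>.pot', or '<mod name>.<part>.pot' when a part is given."""
    pieces = [mod_name]
    if part:
        pieces.append(part)
    pieces.append("pot")
    return ".".join(pieces)


def translation_file_name(locale_code, mod_name):
    """Return '<locale code>.<mod name>.po'."""
    return ".".join([locale_code, mod_name, "po"])
```

For example, `template_file_name("public", "gui")` builds the name of a second template file for the main mod, and `translation_file_name("de", "public")` builds the name of its German translation file.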
+=== `inputRoot` ===
+The `inputRoot` property must contain the path to a folder that contains all the files that would be parsed to generate the translation template file, relative to the `l10n` folder. For mods, that should always be:
+{{{
+"inputRoot": "..",
+}}}
+When you later need to specify file masks to find the files to parse, you can specify those file masks relative to the path that you specified here.
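To illustrate how the relative paths combine, the following sketch resolves an `inputRoot` value against an `l10n` folder and then places a file mask under the result. The helper names are hypothetical, and the actual script may resolve paths differently.

```python
import posixpath


def resolve_input_root(l10n_folder, input_root):
    """Resolve inputRoot, which is relative to the l10n folder."""
    return posixpath.normpath(posixpath.join(l10n_folder, input_root))


def resolve_mask(l10n_folder, input_root, mask):
    """Place a file mask under the resolved input root."""
    return posixpath.join(resolve_input_root(l10n_folder, input_root), mask)
```

With `binaries/data/mods/public/l10n` as the `l10n` folder and `..` as the `inputRoot`, file masks are interpreted relative to `binaries/data/mods/public`.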
+=== `project` and `copyrightHolder` ===
+The `project` and `copyrightHolder` properties are required as well. They impact the header of the resulting translation template file. For example:
+{{{
+"project": "0 A.D. — Empires Ascendant",
+"copyrightHolder": "Wildfire Games",
+}}}
+This would result in a translation template file with the following header:
+{{{
+# Translation template for 0 A.D. — Empires Ascendant.
+# Copyright © 2014 Wildfire Games
+# This file is distributed under the same license as the 0 A.D. — Empires
+# Ascendant project.
+#
+msgid ""
+msgstr ""
+"Project-Id-Version: 0 A.D. — Empires Ascendant\n"
+"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
+"POT-Creation-Date: 2014-04-19 15:52+0200\n"
+"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
+"MIME-Version: 1.0\n"
+"Content-Type: text/plain; charset=utf-8\n"
+"Content-Transfer-Encoding: 8bit\n"
+"Generated-By: Potter 1.0\n"
+}}}
+=== `rules` ===
+The `rules` property contains an array of objects:
+{{{
+"rules": [
+    {},
+    {},
+    …
+]
+}}}
+Each of those objects defines a set of properties that specify how to extract messages from a set of files:
+{{{
+{
+    "extractor": "",
+    "filemasks": [],
+    "options": {}
+}
+}}}
+See below for detailed information on how to fill each property.
+==== `extractor` ====
+The `extractor` property is the name of an extractor class, as defined in the `source/tools/i18n/potter/extractors.py` Python script.
+The currently supported extractors are:
+* '''cpp''', to extract messages from C++ source files.
+* '''javascript''', to extract messages from JavaScript source files.
+* '''json''', to extract messages from JSON data files.
+* '''txt''', to extract messages from plain text data files.
+* '''xml''', to extract messages from XML data files.
+They should be enough, but if they ever aren’t, it should be fairly easy for a Python programmer to create a new extractor.
+==== `filemasks` ====
+The `filemasks` property must contain an array of strings. Each string should be a file mask that matches the files that the extractor should parse, relative to the specified `inputRoot`. For example:
+{{{
+"filemasks": [
+    "maps/scenarios/**.xml",
+    "maps/skirmishes/**.xml"
+],
+}}}
+If you also need to specify exclusion file masks, you can alternatively use an object instead of an array as in the following example:
+{{{
+"filemasks": {
+    "includeMasks": ["**.cpp"],
+    "excludeMasks": ["third_party/**", "tools/**"]
+},
+}}}
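The two accepted forms of `filemasks` can be handled uniformly, as in the sketch below. The matching rules are an assumption made for illustration (`**` matches any characters including `/`, `*` matches any characters except `/`, `?` matches one character except `/`); the actual script may match masks differently, and all function names are hypothetical.

```python
import re


def mask_to_regex(mask):
    """Translate a file mask into a compiled regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(mask):
        if mask.startswith("**", i):
            parts.append(".*")       # assumed: crosses folder boundaries
            i += 2
        elif mask[i] == "*":
            parts.append("[^/]*")    # assumed: stays within one folder
            i += 1
        elif mask[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(mask[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def normalize_filemasks(filemasks):
    """Return (include masks, exclude masks) for either accepted form."""
    if isinstance(filemasks, dict):
        return filemasks.get("includeMasks", []), filemasks.get("excludeMasks", [])
    return list(filemasks), []


def is_selected(path, filemasks):
    """Tell whether a path relative to inputRoot should be parsed."""
    include, exclude = normalize_filemasks(filemasks)
    if any(mask_to_regex(mask).match(path) for mask in exclude):
        return False
    return any(mask_to_regex(mask).match(path) for mask in include)
```

Under these assumptions, exclusion masks take precedence over inclusion masks, so a C++ file under `third_party/` is skipped even though it matches `**.cpp`.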
+==== `options` ====
+The `options` object contains properties that specify parameters to pass to the extractor. Some of those options are specific to one or another extractor, while other options apply to all extractors.
+You can use any of the following options:
+===== `keywords` =====
+This is the most important option. Every extractor except '''txt''' supports it, but its definition and behavior differ from one extractor to another.
+When you are defining the settings of a '''javascript''' or '''cpp''' extractor, the `keywords` property is an object. Each property name is the name of a function, and its value is an array that defines the parameters of the function. The context parameter (if any) is wrapped in an array of its own to tell it apart from the other parameters. For example:
+{{{
+"keywords": {
+    "Translate": [1],
+    "TranslatePlural": [1, 2],
+    "TranslateWithContext": [[1], 2],
+    "TranslatePluralWithContext": [[1], 2, 3]
+},
+}}}
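One way to read such a parameter array is sketched below. It assumes that the numbers are 1-based positions of the function arguments, that the first plain number is the singular message and the second one the plural message; these details, the function name `map_arguments` and the sample arguments are assumptions made for the example, not confirmed behavior of the extractors.

```python
def map_arguments(spec, args):
    """Map the arguments of a function call to message fields.

    spec is a keyword value such as [[1], 2, 3]; args is the list of
    string arguments found in the call.
    """
    result = {}
    plain_fields = ["msgid", "msgid_plural"]
    for entry in spec:
        if isinstance(entry, list):
            # A nested array marks the context parameter.
            result["msgctxt"] = args[entry[0] - 1]
        else:
            # Plain numbers are the singular and then the plural message.
            result[plain_fields.pop(0)] = args[entry - 1]
    return result
```

For a call with the arguments `"unit"`, `"%d soldier"` and `"%d soldiers"`, the specification `[[1], 2, 3]` would treat the first argument as the context and the other two as the singular and plural messages.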
+When you are defining the settings of a '''json''' extractor, the `keywords` property is an array with the names of the JSON properties to extract from the JSON file. For example:
+{{{
+"keywords": [
+    "Name",
+    "Description"
+]
+}}}
+When the extractor finds a property in a JSON file with a name from the list above, it checks the type of the property value and acts accordingly:
+* If the value is a string, the extractor extracts that string. For example, from `"Name": "Jane"`, it extracts `Jane`.
+* If the value is an array, the extractor extracts every item from the list that is a string. For example, from `"Name": ["Jane", "John"]`, it extracts both `Jane` and `John`.
+* If the value is an object, the extractor extracts the value of every property of the object that is a string. For example, from `"Name": { "female": "Jane", "male": "John" }`, it extracts both `Jane` and `John`.
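The three rules above can be sketched as follows. The function names are hypothetical, and the recursive search for keyword properties at any depth of the JSON document is an assumption; the actual extractor may traverse the file differently.

```python
import json


def extract_strings(value):
    """Apply the three rules to the value of a keyword property."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        return [item for item in value if isinstance(item, str)]
    if isinstance(value, dict):
        return [item for item in value.values() if isinstance(item, str)]
    return []


def extract_from_json(node, keywords):
    """Collect messages from every keyword property found in the document."""
    messages = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name in keywords:
                messages.extend(extract_strings(value))
            else:
                # Assumed: keep looking for keywords in nested values.
                messages.extend(extract_from_json(value, keywords))
    elif isinstance(node, list):
        for item in node:
            messages.extend(extract_from_json(item, keywords))
    return messages
```

For example, `extract_from_json(json.loads('{"Name": ["Jane", "John"]}'), ["Name"])` collects both names, while values that are not strings, such as numbers inside the array, are ignored.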
+When you are defining the settings of an '''xml''' extractor, the `keywords` property is an object where each property name is the name of an XML element. The extractor extracts the text within the start and end tags of XML elements with the specified name. The value of the property is an object of optional settings (you can provide an empty object). For example:
+{{{
+"keywords": {
+    "Name": {},
+    "Description": {}
+}
+}}}
+From `<Name>Jane</Name>`, the settings above would make the extractor extract `Jane`.
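A minimal sketch of this behavior, using the `xml.etree.ElementTree` module of the Python standard library, could look like this. It ignores the optional settings objects, and the function name `extract_from_xml` and the sample `Entity` and `Cost` elements are hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_from_xml(xml_text, keywords):
    """Collect the text of every element whose name is a keyword."""
    root = ET.fromstring(xml_text)
    messages = []
    for element in root.iter():
        if element.tag in keywords and element.text and element.text.strip():
            messages.append(element.text.strip())
    return messages


sample = "<Entity><Name>Jane</Name><Cost>5</Cost></Entity>"
names = extract_from_xml(sample, {"Name": {}, "Description": {}})
```

Here `names` contains only `Jane`, because `Cost` is not listed in the keywords.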
+The settings object may contain any of the following settings:
+* '''extractJson'''. This property allows you to extract messages from a JSON string defined within an XML element. The value of this property should be an object of options for the '''json''' extractor. For example, from `<Name>{ "female": "Jane", "male": "John" }</Name>`, `"keywords": { "Name": { "extractJson": { "keywords": ["female", "male"] } } }` would make the extractor extract both `Jane` and `John`.
+* '''locationAttributes'''. This property allows you to specify a list of XML attributes that may be helpful to identify the source location of the extracted message. These XML attributes, if found, are appended to the source path of the extracted message in the translation template file. If nothing else, the value of some attributes may help translators understand the context of the message. For example, for some languages it is important to know whether the message comes from a tooltip or a caption; if you use `"locationAttributes": ["id"]` when parsing GUI XML files, translators will see the message source as `path/to/gui/file.xml:123 (caption)` and not just `path/to/gui/file.xml:123`.
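The two settings above can be illustrated with the following sketch. Both function names are hypothetical. The JSON helper only looks at the top-level properties of the embedded object, and joining several attribute values with a comma is an assumption; the actual extractor may behave differently in both respects.

```python
import json


def extract_json_text(element_text, json_options):
    """Extract string values of the listed keywords from a JSON string
    found inside an XML element (top-level properties only)."""
    data = json.loads(element_text)
    if not isinstance(data, dict):
        return []
    return [data[name] for name in json_options["keywords"]
            if isinstance(data.get(name), str)]


def format_location(path, line, attributes, location_attributes):
    """Append the values of the listed XML attributes to the source location."""
    values = [attributes[name] for name in location_attributes if name in attributes]
    location = "{}:{}".format(path, line)
    if values:
        # Assumed separator when more than one attribute is found.
        location += " ({})".format(", ".join(values))
    return location
```

For an element with `id="caption"` found on line 123, `format_location("path/to/gui/file.xml", 123, {"id": "caption"}, ["id"])` produces the location shown in the example above.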
+===== `commentTags` =====
+This property only works with the '''javascript''' and '''cpp''' extractors.
+It lets you define a list of strings, or “tags”. When the extractor finds a source comment above a translatable string, and that source comment uses one of the specified tags as a prefix, the extractor adds that source comment to the translation template file. For example:
+{{{
+"commentTags": [
+    "Translation:"
+]
+}}}
+This makes the extractor include the following source comment (`This string is used in X context.`) along with the extracted message (`Jane`):
+{{{
+// Translation: This string is used in X context.
+name = translate("Jane");
+}}}
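The tag check can be sketched as follows. This simplified example only handles single-line `//` comments, and the function name `tagged_comment` is hypothetical; the actual extractors may support other comment styles.

```python
def tagged_comment(line, tags):
    """Return the comment text after a known tag, or None if the line is
    not a '//' comment that starts with one of the tags."""
    stripped = line.strip()
    if not stripped.startswith("//"):
        return None
    body = stripped[2:].strip()
    for tag in tags:
        if body.startswith(tag):
            # Drop the tag itself and keep the rest of the comment.
            return body[len(tag):].strip()
    return None
```

With the `Translation:` tag, the comment from the example above yields `This string is used in X context.`, while a comment that starts with any other prefix is ignored.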
+===== `format` =====
+The `format` property lets you override the string format flag (such as `c-format` or `python-format`) that the extractor may determine that the extracted messages are using.
+If you do not trust the extractor to ''guess'' the format, or you know for sure that all the extracted strings have a particular format, you can define the format using this property. See [https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html The Format of PO Files] for a list of format identifiers.
+If you want to force the extracted messages ''not'' to have a format identifier, specify the special format identifier `none` as the value of the `format` property.
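The override logic can be summarized with the sketch below, which returns the flag line that a PO or POT file uses for format flags (a line that starts with `#,`). The function name `format_flag_line` and its parameters are hypothetical.

```python
def format_flag_line(guessed_format, options):
    """Return the '#, <format>' flag line for a message, or None.

    guessed_format is the format the extractor determined (or None);
    options is the options object of the rule.
    """
    chosen = options.get("format", guessed_format)
    if chosen is None or chosen == "none":
        # No format was detected, or the rule explicitly disables the flag.
        return None
    return "#, " + chosen
```

When the rule does not define `format`, the guessed format is kept; when the rule sets it to `none`, no flag line is written even if the extractor guessed a format.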
+= Generating Translation Templates =
+Once you have the `messages.json` files in their `l10n` folders, you can run the `source/tools/i18n/updateTemplates.py` script to generate the defined translation template files. Each file is generated in the `l10n` folder where it is defined, with the name specified in the `messages.json` file.
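Before running the script, it may help to check which `l10n` folders define a `messages.json` file. The following sketch lists them; the function name `find_message_configs` is hypothetical, and this is not how `updateTemplates.py` itself is known to locate the files.

```python
import os


def find_message_configs(root):
    """Return the sorted paths of every messages.json file that sits
    directly inside a folder named l10n under root."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        if os.path.basename(folder) == "l10n" and "messages.json" in files:
            found.append(os.path.join(folder, "messages.json"))
    return sorted(found)
```

Calling `find_message_configs("binaries/data")` from the root of a working copy would list the configuration files for the engine and for each mod that provides one.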