Message extraction is the process of parsing source files to find strings that need to be translated, and generating a translation template file (POT) that contains those strings. Translators use the generated POT file to create a translation file (PO) for a specific locale.

Generating Translation Templates

If you only want to update the translation templates (that is, your messages.json files, described below, are already correct), run the source/tools/i18n/updateTemplates.py script to generate the translation template files (POT) they define. Each file is generated in the l10n folder where it is defined, with the name specified in that folder's messages.json file.

After you have updated the POT files locally, and before you upload them, pull the existing PO files from Transifex as a backup using the pullTranslations.py tool. Because the PO and POT files are under version control, you can review the changes in the SVN diff output. Once you are satisfied that the changes to the POT files are sensible, commit all POT and PO files. Transifex is configured to pull changes to the POT files from our SVN repository once a day, so translators receive your updates after a short delay.

Configuring the Generation of Translation Templates

To configure the generation of one or more translation template files (POT) from a set of source files, you must:

  • Locate or create a folder named l10n that will contain the generated translation template file, as well as any translation file created from that template.
  • Edit or create a messages.json file within your l10n folder that defines how to generate your translation template file or files.

The l10n Folder

Our convention is to save translation template files (POT) and translation files (PO) into folders named l10n. Specifically:

  • binaries/data/l10n contains the translation template files and translation files generated from the sources of the game engine. Currently this folder contains only one translation template file, engine.pot, but in the future it might contain other translation template files generated from files in the source/ folder, such as editor.pot (for Atlas).
  • binaries/data/mods/<mod name>/l10n contains translation template files and translation files generated from the sources of the <mod name> mod. For example, binaries/data/mods/public/l10n contains the translation template files and translation files of the main mod.

Determine which of these folders should contain the translation template file that you want to generate, and create the target l10n folder if it does not exist yet.
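The folder convention above can be expressed as a small helper. This is a hypothetical sketch (the function name l10n_folder is not part of the i18n tools); it assumes the repository layout described above and creates the target folder if it is missing.

```python
from pathlib import Path


def l10n_folder(repo_root, mod_name=None):
    """Return (creating it if needed) the l10n folder for the engine or a mod.

    Hypothetical helper: mod_name=None selects the engine folder
    (binaries/data/l10n); otherwise binaries/data/mods/<mod_name>/l10n.
    """
    base = Path(repo_root) / "binaries" / "data"
    folder = base / "l10n" if mod_name is None else base / "mods" / mod_name / "l10n"
    # parents=True creates missing intermediate folders; exist_ok=True makes
    # the call a no-op when the folder already exists.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```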

The messages.json File

Each `l10n` folder must contain a JSON file named messages.json. This file defines the translation template files that must be generated in the l10n folder.

The messages.json file must contain a JSON array of objects:

[
    {},
    {},
    …
]

Each object represents a translation template file, and must provide the following properties:

{
    "output": "",
    "inputRoot": "",
    "project": "",
    "copyrightHolder": "",
    "rules": []
}

See below for detailed information on how to fill each property.

Note: Currently all messages.json files define a single translation template file, but we support generating multiple files.
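To catch structural mistakes before running the generator, a check along the following lines could be used. It is a hypothetical sketch, not part of the i18n tools: it only verifies that the file is a JSON array of objects and that each object defines the five properties listed above.

```python
import json

# Properties that every translation template object must define.
REQUIRED_PROPERTIES = ("output", "inputRoot", "project", "copyrightHolder", "rules")


def check_messages_json(text):
    """Return a list of problems found in the text of a messages.json file."""
    data = json.loads(text)
    if not isinstance(data, list):
        return ["top level must be a JSON array of objects"]
    problems = []
    for index, template in enumerate(data):
        if not isinstance(template, dict):
            problems.append(f"item {index}: not an object")
            continue
        for name in REQUIRED_PROPERTIES:
            if name not in template:
                problems.append(f"item {index}: missing '{name}'")
        # A present but non-array "rules" value is also reported.
        if not isinstance(template.get("rules", []), list):
            problems.append(f"item {index}: 'rules' must be an array")
    return problems
```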

output

The output property must contain the file name of the translation template file. For example:

"output": "public.pot",

This would save the generated translation template file to l10n/public.pot.

When you run the game, it mounts the content of all l10n folders into the same virtual folder. Unlike the rest of the content mounted in the game's virtual filesystem, translation files are not meant to be overridden by mods. To avoid name clashes between mods, we use the following naming convention in the l10n folder of each mod:

  • Translation template file: <mod name>.pot. For example: public.pot.
  • Translation file: <locale code>.<mod name>.po. For example: de.public.po.

Note: If a mod has more than one translation template file, you can append something else to the mod name. For example: public.civilizations.pot, public.gui.pot, public.units.pot, and so on.
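The naming convention can be summarized in two hypothetical helpers (the names template_name and translation_name are illustrative only). The sketch assumes that a PO file for a multi-part template keeps the full template stem, for example de.public.gui.po for public.gui.pot.

```python
def template_name(mod_name, part=None):
    """Name of a translation template (POT) file, e.g. public.pot or public.gui.pot."""
    stem = mod_name if part is None else f"{mod_name}.{part}"
    return f"{stem}.pot"


def translation_name(locale, template):
    """Name of the translation (PO) file for a locale, e.g. de.public.po."""
    if not template.endswith(".pot"):
        raise ValueError(f"not a template file name: {template}")
    # Drop the ".pot" extension and prefix the locale code.
    stem = template[: -len(".pot")]
    return f"{locale}.{stem}.po"
```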

inputRoot

The inputRoot property must contain the path, relative to the l10n folder, of a folder that contains all the files to parse when generating the translation template file. For mods, that should always be:

"inputRoot": "..",

When you later need to specify file masks to find the files to parse, you can specify those file masks relative to the path that you specified here.
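As a sketch of how the two paths combine (the helper name is hypothetical), the folder that file masks are resolved against is the l10n folder joined with inputRoot and normalized, so ".." in a mod's l10n folder points at the mod's root folder:

```python
import os


def resolve_input_root(l10n_folder, input_root):
    """Folder that file masks are resolved against: inputRoot joined to the l10n folder."""
    # normpath collapses "..", so ".../mods/public/l10n" + ".." gives ".../mods/public".
    return os.path.normpath(os.path.join(l10n_folder, input_root))
```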

project and copyrightHolder

The project and copyrightHolder properties are also required. They determine the header of the resulting translation template file. For example:

"project": "0 A.D. — Empires Ascendant",
"copyrightHolder": "Wildfire Games",

This would result in a translation template file with the following header:

# Translation template for 0 A.D. — Empires Ascendant.
# Copyright © 2014 Wildfire Games
# This file is distributed under the same license as the 0 A.D. — Empires
# Ascendant project.
#
msgid ""
msgstr ""
"Project-Id-Version: 0 A.D. — Empires Ascendant\n"
"Report-Msgid-Bugs-To: EMAIL@ADDRESS\n"
"POT-Creation-Date: 2014-04-19 15:52+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Potter 1.0\n"
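To show where the two properties end up, the following hypothetical sketch builds the first lines of such a header. It is not the generator's code: the real output also wraps long comment lines (as in the example above) and fills in the remaining header fields such as POT-Creation-Date and Content-Type.

```python
from datetime import date


def pot_header(project, copyright_holder, year=None):
    """Return the first lines of a POT header built from messages.json properties."""
    if year is None:
        year = date.today().year
    return [
        f"# Translation template for {project}.",
        f"# Copyright © {year} {copyright_holder}",
        f"# This file is distributed under the same license as the {project} project.",
        "#",
        'msgid ""',
        'msgstr ""',
        # "\\n" writes a literal backslash-n, as PO header strings require.
        f'"Project-Id-Version: {project}\\n"',
    ]
```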

rules

The rules property contains an array of objects:

"rules": [
    {},
    {},
    …
]

Each of those objects defines a set of properties that specify how to extract messages from a set of files:

{
    "extractor": "",
    "filemasks": [],
    "options": {}
}

See below for detailed information on how to fill each property.

extractor

The extractor property is the name of an extractor class, as defined in the source/tools/i18n/potter/extractors.py Python script.

The currently supported extractors are:

  • cpp, to extract messages from C++ source files.
  • ini, to extract messages from INI source files.
  • javascript, to extract messages from JavaScript source files.
  • json, to extract messages from JSON data files.
  • txt, to extract messages from plain text data files.
  • xml, to extract messages from XML data files.

They should be enough, but if they ever aren’t, it should be fairly easy for a Python programmer to create a new extractor.
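To give an idea of the work an extractor does, here is a hypothetical, standalone class that treats every non-empty line of a text file as a message. Its interface (a constructor taking options and an extract() generator yielding (message, line number) pairs) is an assumption for illustration; a real extractor must follow the base class defined in extractors.py.

```python
class TxtLikeExtractor:
    """Hypothetical extractor: every non-empty line of a text file is a message.

    The interface is an assumption; it is not the base class in extractors.py.
    """

    def __init__(self, options=None):
        self.options = options or {}

    def extract(self, text):
        """Yield (message, line_number) pairs, with 1-based line numbers."""
        for line_number, line in enumerate(text.splitlines(), start=1):
            message = line.strip()
            if message:
                yield message, line_number
```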

filemasks

The filemasks property must contain an array of strings. Each string is a file mask, a pattern that matches files the extractor should parse, relative to the specified inputRoot. For example:

"filemasks": [
    "maps/scenarios/**.xml",
    "maps/skirmishes/**.xml"
],

If you also need exclusion file masks, use an object instead of an array, as in the following example:

"filemasks": {
    "includeMasks": ["**.cpp"],
    "excludeMasks": ["third_party/**", "tools/**"]
},
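The following sketch shows how both forms of filemasks could select files. It assumes that "**" matches any characters including "/" while "*" stays within one path segment, and that paths are "/"-separated and relative to inputRoot; the real tool's matching rules may differ in details.

```python
import re


def mask_to_regex(mask):
    """Translate a file mask into a compiled regular expression (assumed semantics)."""
    pattern = ""
    # The capturing group keeps "**" and "*" as separate items; "**" is tried first.
    for part in re.split(r"(\*\*|\*)", mask):
        if part == "**":
            pattern += ".*"
        elif part == "*":
            pattern += "[^/]*"
        else:
            pattern += re.escape(part)
    return re.compile(pattern + r"\Z")


def select_files(filemasks, paths):
    """Return the paths selected by an array of masks or an include/exclude object."""
    if isinstance(filemasks, dict):
        include = filemasks.get("includeMasks", [])
        exclude = filemasks.get("excludeMasks", [])
    else:
        include, exclude = filemasks, []

    def matches(masks, path):
        return any(mask_to_regex(mask).match(path) for mask in masks)

    return [p for p in paths if matches(include, p) and not matches(exclude, p)]
```

With the object from the example above, "lib/a.cpp" is selected while "third_party/x/y.cpp" is excluded.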

options

The options object contains properties that specify parameters to pass to the extractor. Some of those options are specific to one or another extractor, while other options apply to all extractors.

You can use any of the following options:

keywords

This is the most important option. Every extractor but txt supports this option, but the option is defined differently for each extractor, and it works differently as well.

If you are defining the settings of an ini extractor, the keywords property is an array where each item is the name of a key in the target INI files. The extractor extracts the values of those keys. For example:

"keywords": [
    "name",
    "version.name"
],
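A minimal sketch of this behavior for simple "key = value" files is shown below. It is not the ini extractor's code: section headers and comment lines are skipped, keys are compared verbatim (so "version.name" is simply a key containing a dot), and stripping surrounding double quotes from values is an assumption.

```python
def extract_ini(text, keywords):
    """Yield (value, line_number) for every listed key in simple "key = value" INI text."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skip blank lines, comments, section headers, and lines without "=".
        if not line or line.startswith((";", "#", "[")) or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in keywords:
            # Assumption: surrounding double quotes are not part of the message.
            yield value.strip('"'), line_number
```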

When you are defining the settings of a javascript or cpp extractor, the keywords property is an object where each property name is the name of a function and its value is an array describing that function's parameters by position, starting at 1. The position of the context parameter, if any, is wrapped in an array of its own to tell it apart from the other parameters. For example:

"keywords": {
    "Translate": [1],
    "TranslatePlural": [1, 2],
    "TranslateWithContext": [[1], 2],
    "TranslatePluralWithContext": [[1], 2, 3]
},
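To make the parameter arrays concrete, the following hypothetical helper maps the arguments of a call to a context, a singular message, and a plural message according to one of the arrays above. It only illustrates how to read the notation; it is not how the extractors are implemented.

```python
def apply_keyword_spec(spec, args):
    """Map call arguments to (context, singular, plural) using a keyword spec.

    spec is a value from the "keywords" object, e.g. [[1], 2, 3]; positions are
    1-based, and a position wrapped in its own list marks the context argument.
    """
    context = None
    messages = []
    for item in spec:
        if isinstance(item, list):
            context = args[item[0] - 1]
        else:
            messages.append(args[item - 1])
    singular = messages[0]
    plural = messages[1] if len(messages) > 1 else None
    return context, singular, plural
```

For example, TranslatePluralWithContext("ctx", "one", "many", n) with [[1], 2, 3] yields the context "ctx", the singular "one", and the plural "many".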

When you are defining the settings of an xml extractor, the keywords property is an object where each property name is the name of an XML element. The extractor extracts the text within the start and end tags of XML elements with the specified name. The value of the property is an object of optional settings (you can provide an empty object). For example:

"keywords": {
    "Name": {},
    "Description": {}
}

From <Name>Jane</Name>, the settings above would make the extractor extract Jane.

The settings object may contain any of the following settings:

  • extractJson. This property allows you to extract messages from a JSON string defined within an XML element. The value of this property should be an object of options for the json extractor. For example, from <Name>{ "female": "Jane", "male": "John" }</Name>, "keywords": { "Name": { "extractJson": { "keywords": ["female", "male"] } } } would make the extractor extract both Jane and John.
  • customContext. Adds the specified custom context to all strings extracted with this keyword.
  • locationAttributes. This property allows you to specify a list of XML attributes that may help identify the source location of the extracted message. These XML attributes, if found, are appended to the source path of the extracted message in the translation template file. The values of some attributes may also help translators understand the context of the message. For example, for some languages it is important to know whether the message comes from a tooltip or a caption; if you use "locationAttributes": ["id"] when parsing GUI XML files, translators will see the message source as path/to/gui/file.xml:123 (caption) and not just path/to/gui/file.xml:123.
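The following sketch approximates the keyword, extractJson, and locationAttributes settings using Python's xml.etree.ElementTree. It is an illustration only, not the xml extractor: ElementTree does not report line numbers, so the sketch returns just the location suffix built from the attributes, assumes extractJson content is a JSON object, and ignores customContext.

```python
import json
import xml.etree.ElementTree as ET


def extract_xml(xml_text, keywords):
    """Yield (message, location_suffix) pairs for elements named in keywords."""
    root = ET.fromstring(xml_text)
    for tag, settings in keywords.items():
        for element in root.iter(tag):
            text = (element.text or "").strip()
            if not text:
                continue
            # Values of the listed attributes, e.g. " (caption)" for id="caption".
            names = settings.get("locationAttributes", [])
            values = [element.get(n) for n in names if element.get(n) is not None]
            suffix = " ({})".format(", ".join(values)) if values else ""
            json_options = settings.get("extractJson")
            if json_options is None:
                yield text, suffix
                continue
            # extractJson: parse the element text and extract the listed keys.
            data = json.loads(text)
            for key in json_options.get("keywords", []):
                if isinstance(data.get(key), str):
                    yield data[key], suffix
```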

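When you are defining the settings of a json extractor, the keywords property is an object where each property name is the name of a JSON property to extract from the JSON file, and its value is an object of optional settings (you can provide an empty object). For example: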

"keywords": {
    "Name": {},
    "Description": {}
}

When the extractor finds a property in a JSON file whose name is listed in keywords, it checks the type of the property value and acts accordingly:

  • If the value is a string, the extractor extracts that string. For example, from "Name": "Jane", it extracts Jane.
  • If the value is an array, the extractor extracts every item from the list that is a string. For example, from "Name": ["Jane", "John"], it extracts both Jane and John.
  • If the value is an object, the extractor extracts the string stored under the "_string" key. You can also set the tagAsContext and customContext settings for the keyword, as with the xml extractor.
  • If the value is an object and the keyword's extractFromInnerKeys setting is true, the extractor instead extracts the value of every property of the object, using the previous rules. For example, from "Name": { "female": "Jane", "male": "John" }, it extracts both Jane and John.
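The rules above can be sketched as a recursive walk over the parsed JSON document. This is an illustration, not the json extractor's code: it handles only the extractFromInnerKeys and customContext settings, and it assumes that properties not listed in keywords are searched recursively.

```python
import json


def _strings(value):
    """Strings held directly by a value: the value itself, or the string items of an array."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, list):
        return [item for item in value if isinstance(item, str)]
    return []


def extract_json(text, keywords):
    """Return (message, context) pairs following the rules listed above."""
    found = []

    def visit(node):
        if isinstance(node, list):
            for item in node:
                visit(item)
        elif isinstance(node, dict):
            for key, value in node.items():
                if key not in keywords:
                    visit(value)  # keep searching nested objects and arrays
                    continue
                settings = keywords[key]
                context = settings.get("customContext")
                if isinstance(value, dict):
                    if settings.get("extractFromInnerKeys"):
                        strings = [s for inner in value.values() for s in _strings(inner)]
                    else:
                        strings = _strings(value.get("_string"))
                else:
                    strings = _strings(value)
                found.extend((s, context) for s in strings)

    visit(json.loads(text))
    return found
```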

commentTags

This property only works with the javascript and cpp extractors.

It lets you define a list of strings, “tags”. When the extractor finds a source comment directly above a translatable string, and that comment starts with one of the specified tags, the extractor adds the comment to the translation template file. For example:

"commentTags": [
    "Translation:"
]

With these settings, the extractor includes the following source comment (This string is used in X context.) along with the extracted message (Jane):

// Translation: This string is used in X context.
name = translate("Jane");
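
A line-based sketch of this behavior follows. It is not the javascript or cpp extractor: it assumes single-line "//" comments, calls of the form translate("...") with a double-quoted literal (escape sequences are left as-is), and that the tag prefix is dropped from the stored comment, as the description above suggests.

```python
import re

# Assumed call shape: translate("...") with a double-quoted string literal.
CALL = re.compile(r'translate\("((?:[^"\\]|\\.)*)"\)')


def extract_with_comments(source, comment_tags):
    """Return (message, comment) pairs; comment is a tagged "//" comment right above the call."""
    results = []
    pending = None  # tagged comment text from the previous line, if any
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("//"):
            body = stripped[2:].strip()
            pending = None
            for tag in comment_tags:
                if body.startswith(tag):
                    pending = body[len(tag):].strip()
                    break
            continue
        for match in CALL.finditer(line):
            results.append((match.group(1), pending))
        pending = None  # a comment only applies to the line right below it
    return results
```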

format

The format property lets you override the string format flag (such as c-format or python-format) that the extractor would otherwise infer for the extracted messages.

If you do not trust the extractor guessing the format, or you just know for sure that all the extracted strings have a particular format, you can define the format using the property. See The Format of PO Files for a list of format identifiers.

If you want to force the extracted messages not to have a format identifier, specify the special format identifier none as the value of the format property.
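The three cases can be summarized in a small hypothetical function that returns the flag comment line written above a message in a PO or POT file ("#," is the standard gettext flag comment). The function and parameter names are illustrative only.

```python
def format_flag_line(guessed_format, format_option=None):
    """Return the "#," flag line for a message, or None when no flag applies.

    guessed_format: what the extractor inferred (e.g. "c-format"), or None.
    format_option: the value of the "format" option, if set in messages.json.
    """
    if format_option == "none":
        return None  # forced: no format flag at all
    chosen = format_option if format_option is not None else guessed_format
    return f"#, {chosen}" if chosen else None
```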

To extract the messages using these rules, see Generating Translation Templates at the top of this page.

Last modified on Feb 19, 2022, 9:25:44 PM