Gettext

优质

小牛编辑

190浏览

2023-12-01

安装

首先，你可能需要通过包管理器（例如 apt-get 或 yum）来安装 Gettext 以及相关的 PHP 库。安装之后，在 php.ini 文件中添加 extension=gettext.so(Linux/Unix) 或者 extension=php_gettext.dll (Windows) 以启用 Gettext。

我们还需要利用 Poedit 来创建翻译文件。一般通过系统自带的包管理器即可安装；针对 Unix、Mac 和 Windows 系统都有对应的版本，也可以在其官方网站下载。

Structure

Types of files

There are three files you usually deal with while working with gettext. The main ones are PO (Portable Object) and MO (Machine Object) files, the first being a list of readable “translated objects” and the second, the corresponding binary to be interpreted by gettext when doing localization. There’s also a POT (Template) file, that simply contains all existing keys from your source files, and can be used as a guide to generate and update all PO files. Those template files are not mandatory: depending on the tool you’re using to do l10n, you can go just fine with only PO/MO files. You’ll always have one pair of PO/MO files per language and region, but only one POT per domain.

Domains

There are some cases, in big projects, where you might need to separate translations when the same words convey different meaning given a context. In those cases, you split them into different domains. They’re basically named groups of POT/PO/MO files, where the filename is the said translation domain. Small and medium-sized projects usually, for simplicity, use only one domain; its name is arbitrary, but we will be using “main” for our code samples.
In Symfony projects, for example, domains are used to separate the translation for validation messages.

Locale code

A locale is simply a code that identifies one version of a language. It’s defined following the ISO 639-1 and ISO 3166-1 alpha-2 specs: two lower-case letters for the language, optionally followed by an underline and two upper-case letters identifying the country or regional code. For rare languages, three letters are used.

For some speakers, the country part may seem redundant. In fact, some languages have dialects in different countries, such as Austrian German (de_AT) or Brazilian Portuguese (pt_BR). The second part is used to distinguish between those dialects - when it’s not present, it’s taken as a “generic” or “hybrid” version of the language.

目录结构

为了使用 Gettext，我们需要创建一个特定的目录结构。首先，你需要在源码仓库中选择任意一个目录作为 l10n 文件的根目录。在这个目录中为每一种本地化语言单独创建一个目录，另外，还要创建一个 LC_MESSAGES 目录存放 PO/MO 文件。例如：

├─ src/

├─ templates/

└─ locales/

├─ forum.pot

├─ site.pot

├─ de/

│ └─ LC_MESSAGES/

│ ├─ forum.mo

│ ├─ forum.po

│ ├─ site.mo

│ └─ site.po

├─ es_ES/

│ └─ LC_MESSAGES/

│ └─ ...

├─ fr/

│ └─ ...

├─ pt_BR/

│ └─ ...

└─ pt_PT/

└─ ...

Plural forms

As we said in the introduction, different languages might sport different plural rules. However, gettext saves us from this trouble once again. When creating a new .po file, you’ll have to declare the plural rules for that language, and translated pieces that are plural-sensitive will have a different form for each of those rules. When calling Gettext in code, you’ll have to specify the number related to the sentence, and it will work out the correct form to use - even using string substitution if needed.

Plural rules include the number of plurals available and a boolean test with n that would define in which rule the given number falls (starting the count with 0). For example:

Japanese: nplurals=1; plural=0 - only one rule

English: nplurals=2; plural=(n != 1); - two rules, first if N is one, second rule otherwise

Brazilian Portuguese: nplurals=2; plural=(n > 1); - two rules, second if N is bigger than one, first otherwise

Now that you understood the basis of how plural rules works - and if you didn’t, please look at a deeper explanation on the LingoHub tutorial -, you might want to copy the ones you need from a list instead of writing them by hand.

When calling out Gettext to do localization on sentences with counters, you’ll have to give him the related number as well. Gettext will work out what rule should be in effect and use the correct localized version. You will need to include in the .po file a different sentence for each plural rule defined.

Sample implementation

After all that theory, let’s get a little practical. Here’s an excerpt of a .po file - don’t mind with its format, but instead the overall content, you’ll learn how to edit it easily later:

msgid "" msgstr "" "Language: pt_BR\n" "Content-Type: text/plain; charset=UTF-8\n" "Plural-Forms: nplurals=2; plural=(n > 1);\n" msgid "We're now translating some strings" msgstr "Nós estamos traduzindo algumas strings agora" msgid "Hello %1$s! Your last visit was on %2$s" msgstr "Olá %1$s! Sua última visita foi em %2$s" msgid "Only one unread message" msgid_plural "%d unread messages" msgstr[0] "Só uma mensagem não lida" msgstr[1] "%d mensagens não lidas"

The first section works like a header, having the msgid and msgstrespecially empty. It describes the file encoding, plural forms and other things that are less relevant. The second section translates a simple string from English to Brazilian Portuguese, and the third does the same, but leveraging string replacement from sprintf so the translation may contain the user name and visit date.
The last section is a sample of pluralization forms, displaying the singular and plural version as msgid in English and their corresponding translations as msgstr 0 and 1 (following the number given by the plural rule). There, string replacement is used as well so the number can be seen directly in the sentence, by using %d. The plural forms always have two msgid (singular and plural), so it’s advised to not use a complex language as the source of translation.

Discussion on l10n keys

As you might have noticed, we’re using as source ID the actual sentence in English. That msgid is the same used throughout all your .pofiles, meaning other languages will have the same format and the same msgid fields but translated msgstr lines.

Talking about translation keys, there are two main “schools” here:

msgid as a real sentence.
The main advantages are:

if there are pieces of the software untranslated in any given language, the key displayed will still maintain some meaning. Example: if you happen to translate by heart from English to Spanish but need help to translate to French, you might publish the new page with missing French sentences, and parts of the website would be displayed in English instead;

it’s much easier for the translator to understand what’s going on and make a proper translation based on the msgid;

it gives you “free” l10n for one language - the source one;

The only disadvantage: if you need to change the actual text, you would need to replace the same msgid across several language files.

msgid as a unique, structured key.
It would describe the sentence role in the application in a structured way, including the template or part where the string is located instead of its content.

it’s a great way to have the code organized, separating the text content from the template logic.

however, that could bring problems to the translator that would miss the context. A source language file would be needed as a basis for other translations. Example: the developer would ideally have an en.po file, that translators would read to understand what to write in fr.po for instance.

missing translations would display meaningless keys on screen (top_menu.welcome instead of Hello there, User! on the said untranslated French page). That’s good it as would force translation to be complete before publishing - but bad as translation issues would be really awful in the interface. Some libraries, though, include an option to specify a given language as “fallback”, having a similar behavior as the other approach.

The Gettext manual favors the first approach as, in general, it’s easier for translators and users in case of trouble. That’s how we will be working here as well. However, the Symfony documentation favors keyword-based translation, to allow for independent changes of all translations without affecting templates as well.

Everyday usage

In a common application, you would use some Gettext functions while writing static text in your pages. Those sentences would then appear in .po files, get translated, compiled into .mo files and then, used by Gettext when rendering the actual interface. Given that, let’s tie together what we have discussed so far in a step-by-step example:

1. A sample template file, including some different gettext calls

<?php 
include 'i18n_setup.php' 
?> 
<div id="header">    
 <h1>
 <?=sprintf(gettext('Welcome, %s!'), $name)?>
 </h1>     
 <!-- code indented this way only for legibility -->     
 <?php if ($unread): ?>         
 <h2>
 <?=sprintf(ngettext('Only one unread message','%d unread messages',$unread),$unread)?>
 </h2>     
 <?php endif ?> 
 </div> 
 <h1><?=gettext('Introduction')?></h1> 
 <p><?=gettext('We\'re now translating some strings')?></p>

gettext() simply translates a msgid into its corresponding msgstr for a given language. There’s also the shorthand function _() that works the same way;

ngettext() does the same but with plural rules;

there’s also dgettext() and dngettext(), that allows you to override the domain for a single call. More on domain configuration in the next example.

2. A sample setup file (i18n_setup.php as used above), selecting the correct locale and configuring Gettext

<?php /** * Verifies if the given $locale is supported in the project * @param string $locale * @return bool */ function valid($locale) { return in_array($locale, ['en_US', 'en', 'pt_BR', 'pt', 'es_ES', 'es']); } //setting the source/default locale, for informational purposes $lang = 'en_US'; if (isset($_GET['lang']) && valid($_GET['lang'])) { // the locale can be changed through the query-string $lang = $_GET['lang']; //you should sanitize this! setcookie('lang', $lang); //it's stored in a cookie so it can be reused } elseif (isset($_COOKIE['lang']) && valid($_COOKIE['lang'])) { // if the cookie is present instead, let's just keep it $lang = $_COOKIE['lang']; //you should sanitize this! } elseif (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) { // default: look for the languages the browser says the user accepts $langs = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']); array_walk($langs, function (&$lang) { $lang = strtr(strtok($lang, ';'), ['-' => '_']); }); foreach ($langs as $browser_lang) { if (valid($browser_lang)) { $lang = $browser_lang; break; } } } // here we define the global system locale given the found language putenv("LANG=$lang"); // this might be useful for date functions (LC_TIME) or money formatting (LC_MONETARY), for instance setlocale(LC_ALL, $lang); // this will make Gettext look for ../locales/<lang>/LC_MESSAGES/main.mo bindtextdomain('main', '../locales'); // indicates in what encoding the file should be read bind_textdomain_codeset('main', 'UTF-8'); // if your application has additional domains, as cited before, you should bind them here as well bindtextdomain('forum', '../locales'); bind_textdomain_codeset('forum', 'UTF-8'); // here we indicate the default domain the gettext() calls will respond to textdomain('main'); // this would look for the string in forum.mo instead of main.mo // echo dgettext('forum', 'Welcome back!'); ?>

3. Preparing translation for the first run

To make matters easier - and one of the powerful advantages Gettext has over custom framework i18n packages - is its custom file type. “Oh man, that’s quite hard to understand and edit by hand, a simple array would be easier!” Make no mistake, applications like Poedit are here to help - a lot. You can get the program from their website, it’s free and available for all platforms. It’s a pretty easy tool to get used to, and a very powerful one at the same time - using all powerful features Gettext has available.

In the first run, you should select “File > New Catalog” from the menu. There you’ll have a small screen where we will set the terrain so everything else runs smoothly. You’ll be able to find those settings later through “Catalog > Properties”:

Project name and version, Translation Team and email address: useful information that goes in the .po file header;

Language: here you should use that format we mentioned before, such as en_US or pt_BR;

Charsets: UTF-8, preferably;

Source charset: set here the charset used by your PHP files - probably UTF-8 as well, right?

plural forms: here go those rules we mentioned before - there’s a link in there with samples as well;

Source paths: here you must include all folders from the project where gettext() (and siblings) will happen - this is usually your templates folder(s)

Source keywords: this last part is filled by default, but you might need to alter it later - and is one of the powerful points of Gettext. The underlying software knows how the gettext() calls look like in several programming languages, but you might as well create your own translation forms. This will be discussed later in the “Tips” section.

After setting those points you’ll be prompted to save the file - using that directory structure we mentioned as well, and then it will run a scan through your source files to find the localization calls. They’ll be fed empty into the translation table, and you’ll start typing in the localized versions of those strings. Save it and a .mo file will be (re)compiled into the same folder and ta-dah: your project is internationalized.

4. Translating strings

As you may have noticed before, there are two main types of localized strings: simple ones and the ones with plural forms. The first ones have simply two boxes: source and localized string. The source string can’t be modified as Gettext/Poedit do not include the powers to alter your source files - you should change the source itself and rescan the files. Tip: you may right-click a translation line and it will hint you with the source files and lines where that string is being used.
On the other hand, plural form strings include two boxes to show the two source strings, and tabs so you can configure the different final forms.

Whenever you change your sources and need to update the translations, just hit Refresh and Poedit will rescan the code, removing non-existent entries, merging the ones that changed and adding new ones. It may also try to guess some translations, based on other ones you did. Those guesses and the changed entries will receive a “Fuzzy” marker, indicating it needs review, being highlighted in the list. It’s also useful if you have a translation team and someone tries to write something they’re not sure about: just mark Fuzzy and someone else will review later.

Finally, it’s advised to leave “View > Untranslated entries first” marked, as it will help you a lot to not forget any entry. From that menu, you can also open parts of the UI that allow you to leave contextual information for translators if needed.

Tips & Tricks

Possible caching issues

If you’re running PHP as a module on Apache (mod_php), you might face issues with the .mo file being cached. It happens the first time it’s read, and then, to update it, you might need to restart the server. On Nginx and PHP5 it usually takes only a couple of page refreshes to refresh the translation cache, and on PHP7 it is rarely needed.

Additional helper functions

As preferred by many people, it’s easier to use _() instead of gettext(). Many custom i18n libraries from frameworks use something similar to t() as well, to make translated code shorter. However, that’s the only function that sports a shortcut. You might want to add in your project some others, such as __() or _n() for ngettext(), or maybe a fancy _r()that would join gettext() and sprintf() calls. Other libraries, such asoscarotero’s Gettext also provide helper functions like these.

In those cases, you’ll need to instruct the Gettext utility on how to extract the strings from those new functions. Don’t be afraid, it’s very easy. It’s just a field in the .po file, or a Settings screen on Poedit. In the editor, that option is inside “Catalog > Properties > Source keywords”. You need to include there the specifications of those new functions, following a specific format:

if you create something like t() that simply returns the translation for a string, you can specify it as t. Gettext will know the only function argument is the string to be translated;

if the function has more than one argument, you can specify in which one the first string is - and if needed, the plural form as well. For instance, if we call our function like this: __('one user', '%d users', $number), the specification would be __:1,2, meaning the first form is the first argument, and the second form is the second argument. If your number comes as the first argument instead, the spec would be __:2,3, indicating the first form is the second argument, and so on.

After including those new rules in the .po file, a new scan will bring in your new strings just as easy as before.