[[!template id=plugin name=po core=0 author="[[intrigeri]]"]]
[[!tag type/format]]

This plugin adds support for multi-lingual wikis, translated with
gettext, using [po4a](http://po4a.alioth.debian.org/).

It depends on the Perl `Locale::Po4a::Po` library (`apt-get install po4a`).

[[!toc levels=2]]

Introduction
============

A language is chosen as the "master" one, and any other supported
language is a "slave" one.

A page written in the "master" language is a "master" page. It can be
of any page type supported by ikiwiki, except `po`. It does not have to be
named in any special way: migrating to this plugin requires no page
renaming.

Example: `bla/page.mdwn` is a "master" Markdown page written in
English; if `usedirs` is enabled, it is rendered as
`bla/page/index.en.html`, else as `bla/page.en.html`.

Any translation of a "master" page into a "slave" language is called
a "slave" page; it is written in the gettext PO format. `po` is now
a page type supported by ikiwiki.

Example: `bla/page.fr.po` is the PO "message catalog" used to
translate `bla/page.mdwn` into French; if `usedirs` is enabled, it is
rendered as `bla/page/index.fr.html`, else as `bla/page.fr.html`.

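The naming rule in the two examples above can be summed up in a few lines of Perl. This is only an illustrative sketch: `rendered_path` is a hypothetical helper, not a function of the po plugin, and it assumes the page name is already stripped of its `.mdwn` or `.LL.po` extension.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the po plugin): maps a page name and
# a language code to the rendered HTML path described in the examples.
sub rendered_path {
    my ($page, $lang, $usedirs) = @_;
    return $usedirs
        ? "$page/index.$lang.html"   # usedirs enabled
        : "$page.$lang.html";        # usedirs disabled
}

print rendered_path('bla/page', 'en', 1), "\n";   # master page, usedirs on
print rendered_path('bla/page', 'fr', 0), "\n";   # French slave page, usedirs off
```
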
Configuration
=============

Supported languages
-------------------

`po_master_language` is used to set the "master" language in
`ikiwiki.setup`, such as:

        po_master_language => { 'code' => 'en', 'name' => 'English' }

`po_slave_languages` is used to set the list of supported "slave"
languages, such as:

        po_slave_languages => { 'fr' => 'Français',
                                'es' => 'Castellano',
                                'de' => 'Deutsch',
        }

Decide which pages are translatable
-----------------------------------

The `po_translatable_pages` setting configures what pages are
translatable. It is a [[ikiwiki/PageSpec]], so you have lots of
control over what kind of pages are translatable.

The `.po` files are not considered as being translatable, so you don't need to
worry about excluding them explicitly from this [[ikiwiki/PageSpec]].

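As an illustration, such a setting might look like the fragment below. The PageSpec values are made-up examples, not recommended defaults, and the `blog/` subtree is hypothetical.

```perl
# Fragment of the settings in ikiwiki.setup (illustrative values only).
# Make every page translatable, except discussion pages:
po_translatable_pages => '* and !*/Discussion',
# Alternatively, restrict translation to the front page and a
# hypothetical blog/ subtree:
# po_translatable_pages => 'index or blog/*',
```
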
Internal links
--------------

The `po_link_to` option in `ikiwiki.setup` is used to decide how
internal links should be generated, depending on web server features
and site-specific preferences.

### Default linking behavior

If `po_link_to` is unset, or set to `default`, ikiwiki's default
linking behavior is preserved: `\[[destpage]]` links to the master
language's page.

### Link to current language

If `po_link_to` is set to `current`, `\[[destpage]]` links to the
`destpage`'s version written in the current page's language, if
available, *i.e.*:

- `foo/destpage/index.LL.html` if `usedirs` is enabled
- `foo/destpage.LL.html` if `usedirs` is disabled

### Link to negotiated language

If `po_link_to` is set to `negotiated`, `\[[page]]` links to the
negotiated preferred language, *i.e.* `foo/page/`.

(In)compatibility notes:

- if `usedirs` is disabled, it does not make sense to set `po_link_to`
  to `negotiated`; this option combination is neither implemented
  nor allowed.
- if the web server does not support Content Negotiation, setting
  `po_link_to` to `negotiated` will produce an unusable website.

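The first compatibility rule can be expressed as a small sanity check. The sketch below is an illustration only, not the plugin's actual code: `check_po_link_to` is a hypothetical helper, and it takes its settings as a plain hash rather than reading ikiwiki's real configuration.

```perl
use strict;
use warnings;

# Hypothetical helper (not the plugin's code): validates the po_link_to
# setting against usedirs, following the rules described above.
sub check_po_link_to {
    my (%config) = @_;

    # An unset po_link_to behaves like 'default'.
    my $mode = defined $config{po_link_to} ? $config{po_link_to} : 'default';

    die "unknown po_link_to value: $mode\n"
        unless grep { $_ eq $mode } qw(default current negotiated);

    # 'negotiated' links to foo/page/, which only exists with usedirs.
    die "po_link_to => 'negotiated' requires usedirs\n"
        if $mode eq 'negotiated' && !$config{usedirs};

    return $mode;
}

print check_po_link_to(usedirs => 1, po_link_to => 'negotiated'), "\n";
print check_po_link_to(usedirs => 0), "\n";
```

Note that the second rule, web server support for content negotiation, cannot be verified from the setup file and is left out of the sketch.
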
Server support
==============

Apache
------

Using Apache `mod_negotiation` makes it really easy to have Apache
serve any page in the client's preferred language, if available.
This is the default Debian Apache configuration.

When `usedirs` is enabled, one has to set `DirectoryIndex index` for
the wiki context.

Setting `DefaultLanguage LL` (replace `LL` with your default MIME
language code) for the wiki context can help to ensure
`bla/page/index.en.html` is served as `Content-Language: LL`.

lighttpd
--------

lighttpd unfortunately does not support content negotiation.

**FIXME**: does `mod_magnet` provide the functionality needed to
emulate this?

Usage
=====

Templates
---------

The `ISTRANSLATION` and `ISTRANSLATABLE` variables can be used to
display things only on translatable or translation pages.

### Display page's versions in other languages

The `OTHERLANGUAGES` loop provides ways to display other languages'
versions of the same page, and the translations' status.

One typically adds the following code to `templates/page.tmpl`:

        <TMPL_IF NAME="OTHERLANGUAGES">
        <div id="otherlanguages">
          <ul>
          <TMPL_LOOP NAME="OTHERLANGUAGES">
            <li>
              <a href="<TMPL_VAR NAME="URL">"><TMPL_VAR NAME="LANGUAGE"></a>
              <TMPL_UNLESS NAME="MASTER">
                (<TMPL_VAR NAME="PERCENT"> %)
              </TMPL_UNLESS>
            </li>
          </TMPL_LOOP>
          </ul>
        </div>
        </TMPL_IF>

The following variables are available inside the loop, for each page
it iterates over:

- `URL` - URL of the page
- `CODE` - two-letter language code
- `LANGUAGE` - language name (as defined in `po_slave_languages`)
- `MASTER` - is true (1) if, and only if, the page is a "master" page
- `PERCENT` - for "slave" pages, is set to the translation completeness, in percent

### Display the current translation status

The `PERCENTTRANSLATED` variable is set to the translation
completeness, expressed in percent, on "slave" pages.

One can use it this way:

        <TMPL_IF NAME="ISTRANSLATION">
        <div id="percenttranslated">
          <TMPL_VAR NAME="PERCENTTRANSLATED">
        </div>
        </TMPL_IF>

Additional PageSpec tests
-------------------------

This plugin enhances the regular [[ikiwiki/PageSpec]] syntax with some
additional tests that are documented [[here|ikiwiki/pagespec/po]].

Automatic PO file update
------------------------

Committing changes to a "master" page:

1. updates the POT file and the PO files for the "slave" languages;
   the updated PO files are then put under version control;
2. triggers a refresh of the corresponding HTML slave pages.

Also, when the plugin has just been enabled, or when a page has just
been declared as being translatable, the needed POT and PO files are
created, and the PO files are checked into version control.

Discussion pages
----------------

Discussion should happen in the language in which the pages are
actually written, *i.e.* the "master" one. If discussion pages are
enabled, "slave" pages therefore link to the "master" page's
discussion page.

Translating
-----------

One can edit the PO files using ikiwiki's CGI (a message-by-message
interface could also be implemented at some point).

If [[tips/untrusted_git_push]] is set up, one can edit the PO files in one's
preferred `$EDITOR`, without needing to be online.

TODO
====

Security checks
---------------

### Security history

The only past security issues I could find in GNU gettext and po4a
are:

- [CVE-2004-0966](http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2004-0966),
  *i.e.* [Debian bug #278283](http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=278283):
  the autopoint and gettextize scripts in the GNU gettext package
  1.14 and later versions, as used in Trustix Secure Linux 1.5
  through 2.1 and other operating systems, allow local users to
  overwrite files via a symlink attack on temporary files.
- [CVE-2007-4462](http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2007-4462):
  `lib/Locale/Po4a/Po.pm` in po4a before 0.32 allows local users to
  overwrite arbitrary files via a symlink attack on the
  `gettextization.failed.po` temporary file.

**FIXME**: check whether this plugin would have been a possible attack
vector to exploit these vulnerabilities.

Depending on my mood, the lack of found security issues can either
indicate that there are none, or reveal that no-one ever bothered to
find (and publish) them.

### PO file features

Can any sort of directives be put in po files that will cause mischief
(i.e., include other files, run commands, crash gettext, whatever)?

> No [documented](http://www.gnu.org/software/gettext/manual/gettext.html#PO-Files)
> directive is supposed to do so. [[--intrigeri]]

### Running po4a on untrusted content

Are there any security issues on running po4a on untrusted content?

To say the least, this issue is not well covered, at least publicly:

- the documentation does not talk about it;
- grep'ing the source code for `security` or `trust` gives no answer.

On the other hand, a po4a developer answered my questions in
a convincing manner, stating that processing untrusted content was not
an initial goal, and analysing in detail the possible issues.

#### Already checked

- the core (`Po.pm`, `Transtractor.pm`) should be safe
- po4a source code was fully checked for other potential symlink
  attacks, after discovery of one such issue
- the only external program run by the core is `diff`, in `Po.pm` (in
  parts of its code we don't use)
- `Locale::gettext`: only used to display translated error messages
- Nicolas François "hopes" `DynaLoader` is safe, and has "no reason to
  think that `Encode` is not safe"
- Nicolas François has "no reason to think that `Encode::Guess` is not
  safe". The po plugin nevertheless avoids using it by defining the
  input charset (`file_in_charset`) before asking `Transtractor` to
  read any file. NB: this hack depends on po4a internals to stay
  the same.

#### To be checked

##### Locale::Po4a modules

The modules we want to use have to be checked, as not all are safe
(e.g. the LaTeX module's behaviour is changed by commands included in
the content); they may use regexps generated from the content.

`Chooser.pm` only loads the plugin we tell it to: currently, this
means the `Text` module only.

`Text` module (I checked the CVS version):

- it does not run any external program
- only `do_paragraph()` builds regexps that expand untrusted
  variables; they seem safe to me, but someone more expert than me
  will need to check. Joey?

##### Text::WrapI18N

`Text::WrapI18N` can cause DoS (see the
[Debian bug #470250](http://bugs.debian.org/470250)), but it is
optional and we do not need the features it provides.

It is loaded if available by `Locale::Po4a::Common`; looking at the
code, I'm not sure we can prevent this at all, but maybe some symbol
table manipulation tricks could work; overriding
`Locale::Po4a::Common::wrapi18n` may be easier. I'm no expert at all
in this field. Joey? [[--intrigeri]]

> Update: Nicolas François suggests we add an option to po4a to
> disable it. It would do the trick, but only for people running
> a brand new po4a (probably too late for Lenny). Anyway, this option
> would have to take effect in a `BEGIN` / `eval` that I'm not
> familiar with. I can learn and do it, in case no Perl wizard
> volunteers to provide the po4a patch. [[--intrigeri]]

##### Term::ReadKey

`Term::ReadKey` is not a hard dependency in our case, *i.e.* po4a
works nicely without it. But the po4a Debian package recommends
`libterm-readkey-perl`, so it will probably be installed on most
systems using the po plugin.

If `$ENV{COLUMNS}` is not set, `Locale::Po4a::Common` uses
`Term::ReadKey::GetTerminalSize()` to get the terminal size. How safe
is this?

Part of `Term::ReadKey` is written in C. Depending on the runtime
platform, this function uses ioctl, environment, or C library function
calls, and may end up running the `resize` command (without
arguments).

IMHO, using `Term::ReadKey` has too far-reaching implications for us to
be able to guarantee anything wrt. security. Since it is of no use in
our case anyway, I suggest we define `$ENV{COLUMNS}` before loading
`Locale::Po4a::Common`, just to be on the safe side. Joey?
[[--intrigeri]]

> Update: adding an option to disable `Text::WrapI18N`, as Nicolas
> François suggested, would as a bonus disable `Term::ReadKey`
> as well. [[--intrigeri]]

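The suggestion of defining `$ENV{COLUMNS}` before loading `Locale::Po4a::Common` could look roughly like the sketch below. It is untested against po4a and only shows the ordering trick: the width of 80 is an arbitrary assumption, and the module is loaded inside an `eval` purely so that the sketch also runs on systems without po4a.

```perl
use strict;
use warnings;

BEGIN {
    # Runs at compile time, before any later module is loaded.
    # Assumption: 80 columns is an arbitrary, harmless default.
    $ENV{COLUMNS} = 80 unless defined $ENV{COLUMNS};
}

# Load po4a's common module only after COLUMNS is defined, so that it
# should have no reason to ask Term::ReadKey for the terminal size.
my $have_po4a = eval { require Locale::Po4a::Common; 1 };

print "COLUMNS is $ENV{COLUMNS}; ",
      ($have_po4a ? "Locale::Po4a::Common loaded\n"
                  : "Locale::Po4a::Common not available\n");
```

Whether this is enough depends on po4a internals, so it would have to be re-checked against each po4a version in use.
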
### msgmerge

`refreshpofiles()` runs this external program. A po4a developer
answered he does "not expect any security issues from it".

### Fuzzing input

I was not able to find any public information about gettext or po4a
having been tested with a fuzzing program, such as `zzuf` or `fusil`.
Moreover, some gettext parsers seem to be quite
[easy to crash](http://fusil.hachoir.org/trac/browser/trunk/fuzzers/fusil-gettext),
so it might be useful to bang msgmerge/po4a's heads against such
a program in order to easily detect some of the most obvious DoS.
[[--intrigeri]]

> po4a was not fuzz-tested, but according to one of its developers,
> "it would be really appreciated". [[--intrigeri]]

gettext/po4a rough corners
--------------------------

- fix infinite loop when synchronizing two ikiwiki instances (when
  checkouts live in different directories): say `bla.fr.po` has been
  updated in repo2; pulling repo2 from repo1 seems to trigger a PO
  update, that changes `bla.fr.po` in repo1; then pushing repo1 to
  repo2 triggers a PO update, that changes `bla.fr.po` in repo2; etc.;
  quickly fixed in `629968fc89bced6727981c0a1138072631751fee`, by
  disabling references in POT files. Using
  `Locale::Po4a::write_if_needed` might be a cleaner solution.
  (warning: this function runs the external `diff` program, have to
  check security)
- new translations created in the web interface must get proper
  charset/encoding gettext metadata, else the next automatic PO update
  removes any non-ASCII chars; possible solution: put such metadata
  into the POT file, and let it propagate; should be fixed in
  `773de05a7a1ee68d2bed173367cf5e716884945a`, time will tell.

Misc. improvements
------------------

### page titles

Use nice page titles from meta plugin in links, as inline already
does. This is actually a duplicate of
[[bugs/pagetitle_function_does_not_respect_meta_titles]], which might
be fixed by something like [[todo/using_meta_titles_for_parentlinks]].

### source files format

Markdown is supported, great, but what about others? The set of file
formats supported both in ikiwiki and po4a probably is greater than
`{markdown}`. Warning: the po4a modules are the place where one can
expect security issues.

Translation quality assurance
-----------------------------

Modifying a PO file via the CGI must be forbidden if the new version
is not a valid PO file. As a bonus, check that it provides a more
complete translation than the existing one.

A new `cansave` type of hook would be needed to implement this.

Note: committing to the underlying repository is a way to bypass
this check.

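One conceivable way to test whether a PO file is valid is to run GNU gettext's `msgfmt` in check mode and look at its exit status. The sketch below only illustrates that idea and is not something the plugin does: `po_is_valid` is a hypothetical helper, it assumes `msgfmt` is on the `PATH` and that `/dev/null` exists, and running `msgfmt` on untrusted content would itself need the kind of security review discussed above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the po plugin): returns 1 if msgfmt
# accepts the file, 0 otherwise (including when msgfmt cannot be run).
sub po_is_valid {
    my ($file) = @_;
    return 0 unless defined $file && -f $file;

    # --check runs msgfmt's format, header and domain checks; the
    # compiled catalog is discarded by writing it to /dev/null.
    # The list form of system() avoids passing $file through a shell.
    my $status = system('msgfmt', '--check', '--output-file=/dev/null', $file);
    return $status == 0 ? 1 : 0;
}

# Usage: perl check_po.pl bla/page.fr.po
print po_is_valid($ARGV[0]) ? "valid\n" : "invalid\n" if @ARGV;
```

Such a check only covers validity; comparing translation completeness with the existing version would need additional logic.
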