[[!template id=plugin name=po core=0 author="[[intrigeri]]"]]
[[!tag type/format]]

This plugin adds support for multi-lingual wikis, translated with
gettext, using [po4a](http://po4a.alioth.debian.org/).

It depends on the Perl `Locale::Po4a::Po` library (`apt-get install po4a`).

[[!toc levels=2]]

Introduction
============

A language is chosen as the "master" one, and any other supported
language is a "slave" one.

A page written in the "master" language is a "master" page. It can be
of any page type supported by ikiwiki, except `po`. It does not have to be
named a special way: migration to this plugin does not imply any page
renaming work.

Example: `bla/page.mdwn` is a "master" Markdown page written in
English; if `usedirs` is enabled, it is rendered as
`bla/page/index.en.html`, else as `bla/page.en.html`.

Any translation of a "master" page into a "slave" language is called
a "slave" page; it is written in the gettext PO format. `po` is now
a page type supported by ikiwiki.

Example: `bla/page.fr.po` is the PO "message catalog" used to
translate `bla/page.mdwn` into French; if `usedirs` is enabled, it is
rendered as `bla/page/index.fr.html`, else as `bla/page.fr.html`.

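The naming rules in the two examples above can be sketched in a few lines of Perl. This is only an illustration: `rendered_path` is a hypothetical helper, not a function provided by ikiwiki or by this plugin, and it assumes two-letter language codes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ikiwiki): map a source file to the
# rendered HTML path, following the naming rules described above.
sub rendered_path {
    my ($source, $master_lang, $usedirs) = @_;
    my ($page, $lang);
    if ($source =~ /^(.+)\.([a-z]{2})\.po$/) {
        # "slave" page: bla/page.fr.po -> page "bla/page", language "fr"
        ($page, $lang) = ($1, $2);
    }
    else {
        # "master" page: strip the page type extension (e.g. ".mdwn")
        ($page = $source) =~ s/\.[^.\/]+$//;
        $lang = $master_lang;
    }
    return $usedirs ? "$page/index.$lang.html" : "$page.$lang.html";
}

print rendered_path('bla/page.mdwn',  'en', 1), "\n";
print rendered_path('bla/page.fr.po', 'en', 0), "\n";
```

The master language code has to be passed in explicitly because a "master" file name carries no language information.
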
Configuration
=============

Supported languages
-------------------

`po_master_language` is used to set the "master" language in
`ikiwiki.setup`, such as:

        po_master_language => { 'code' => 'en', 'name' => 'English' }

`po_slave_languages` is used to set the list of supported "slave"
languages, such as:

        po_slave_languages => { 'fr' => 'Français',
                                'es' => 'Castellano',
                                'de' => 'Deutsch',
        }

Decide which pages are translatable
-----------------------------------

The `po_translatable_pages` setting configures what pages are
translatable. It is a [[ikiwiki/PageSpec]], so you have lots of
control over what kind of pages are translatable.

The `.po` files are not considered as being translatable, so you don't need to
worry about excluding them explicitly from this [[ikiwiki/PageSpec]].

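Put together, the po-related settings might look like the sketch below. It is written as a plain Perl hash so that it can be run on its own; in a real `ikiwiki.setup` the same keys sit inside the setup hash. The hash name `%po_settings` and the PageSpec value are assumptions made for this example only.

```perl
use strict;
use warnings;

# Sketch of the po-related settings. The PageSpec below is only an
# example value (every page except Discussion pages); adapt it.
my %po_settings = (
    add_plugins           => [qw{po}],
    po_master_language    => { 'code' => 'en', 'name' => 'English' },
    po_slave_languages    => {
        'fr' => 'Français',
        'es' => 'Castellano',
        'de' => 'Deutsch',
    },
    po_translatable_pages => '* and !*/Discussion',
);

print "translatable: $po_settings{po_translatable_pages}\n";
```
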
Internal links
--------------

The `po_link_to` option in `ikiwiki.setup` is used to decide how
internal links should be generated, depending on web server features
and site-specific preferences.

### Default linking behavior

If `po_link_to` is unset, or set to `default`, ikiwiki's default
linking behavior is preserved: `\[[destpage]]` links to the master
language's page.

### Link to current language

If `po_link_to` is set to `current`, `\[[destpage]]` links to the
`destpage`'s version written in the current page's language, if
available, *i.e.*:

- `foo/destpage/index.LL.html` if `usedirs` is enabled
- `foo/destpage.LL.html` if `usedirs` is disabled

### Link to negotiated language

If `po_link_to` is set to `negotiated`, `\[[page]]` links to the
negotiated preferred language, *i.e.* `foo/page/`.

(In)compatibility notes:

- if `usedirs` is disabled, it does not make sense to set `po_link_to`
  to `negotiated`; this option combination is neither implemented
  nor allowed.
- if the web server does not support Content Negotiation, setting
  `po_link_to` to `negotiated` will produce an unusable website.

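The three `po_link_to` modes can be summarised by the following sketch. `link_target` and its named arguments are hypothetical and do not mirror the plugin's code; falling back to the master page when no translation is available is an assumption, as the text above only says "if available".

```perl
use strict;
use warnings;

# Hypothetical helper, not the plugin's actual implementation.
sub link_target {
    my (%args) = @_;
    my ($page, $mode, $usedirs) = @args{qw(page po_link_to usedirs)};
    $mode = 'default' unless defined $mode;

    if ($mode eq 'negotiated') {
        # This combination is neither implemented nor allowed.
        die "po_link_to=negotiated requires usedirs\n" unless $usedirs;
        return "$page/";    # the web server picks the language
    }

    # "default" links to the master page; "current" prefers the current
    # page's language when a translation exists (assumed fallback: master).
    my $lang = $args{master_lang};
    if ($mode eq 'current' && $args{available}{ $args{current_lang} }) {
        $lang = $args{current_lang};
    }
    return $usedirs ? "$page/index.$lang.html" : "$page.$lang.html";
}

print link_target(
    page => 'foo/destpage', po_link_to => 'current', usedirs => 1,
    master_lang => 'en', current_lang => 'fr', available => { fr => 1 },
), "\n";
```
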
Server support
==============

Apache
------

Using Apache `mod_negotiation` makes it really easy to have Apache
serve any page in the client's preferred language, if available.
This is the default Debian Apache configuration.

When `usedirs` is enabled, one has to set `DirectoryIndex index` for
the wiki context.

Setting `DefaultLanguage LL` (replace `LL` with your default MIME
language code) for the wiki context can help to ensure
`bla/page/index.en.html` is served as `Content-Language: LL`.

lighttpd
--------

lighttpd unfortunately does not support content negotiation.

**FIXME**: does `mod_magnet` provide the functionality needed to
emulate this?

Usage
=====

Templates
---------

When `po_link_to` is not set to `negotiated`, one should replace some
occurrences of `BASEURL` with `HOMEPAGEURL` to get correct links to
the wiki homepage.

The `ISTRANSLATION` and `ISTRANSLATABLE` variables can be used to
display things only on translatable or translation pages.

### Display page's versions in other languages

The `OTHERLANGUAGES` loop provides ways to display other languages'
versions of the same page, and the translations' status.

One typically adds the following code to `templates/page.tmpl`:

    <TMPL_IF NAME="OTHERLANGUAGES">
    <div id="otherlanguages">
      <ul>
      <TMPL_LOOP NAME="OTHERLANGUAGES">
        <li>
          <a href="<TMPL_VAR NAME="URL">"><TMPL_VAR NAME="LANGUAGE"></a>
          <TMPL_UNLESS NAME="MASTER">
            (<TMPL_VAR NAME="PERCENT"> %)
          </TMPL_UNLESS>
        </li>
      </TMPL_LOOP>
      </ul>
    </div>
    </TMPL_IF>

The following variables are available inside the loop, for each
language version of the page:

- `URL` - URL of the page
- `CODE` - two-letter language code
- `LANGUAGE` - language name (as defined in `po_slave_languages`)
- `MASTER` - is true (1) if, and only if, the page is a "master" page
- `PERCENT` - for "slave" pages, is set to the translation completeness, in percent

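To make the shape of the loop data concrete, here is a sketch of the list of hashes such a loop could be filled with. The `otherlanguages` function below is hypothetical, takes the completeness figures from the caller, and assumes `usedirs` is enabled; it is not the plugin's implementation.

```perl
use strict;
use warnings;

# Hypothetical sketch: one hash per *other* language version of $page.
sub otherlanguages {
    my ($page, $current, $master, $slaves, $percent) = @_;
    my @loop;
    for my $code ($master->{code}, sort keys %$slaves) {
        next if $code eq $current;    # skip the version being displayed
        my $is_master = $code eq $master->{code};
        my %entry = (
            URL      => "$page/index.$code.html",    # assumes usedirs
            CODE     => $code,
            LANGUAGE => $is_master ? $master->{name} : $slaves->{$code},
            MASTER   => $is_master ? 1 : 0,
        );
        # PERCENT is only meaningful for "slave" pages.
        $entry{PERCENT} = $percent->{$code} // 0 unless $is_master;
        push @loop, \%entry;
    }
    return \@loop;
}

my $loop = otherlanguages(
    'bla/page', 'fr',
    { code => 'en', name => 'English' },
    { fr => 'Français', es => 'Castellano', de => 'Deutsch' },
    { es => 40, de => 75 },
);
print "$_->{CODE}: $_->{URL}\n" for @$loop;
```
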
### Display the current translation status

The `PERCENTTRANSLATED` variable is set to the translation
completeness, expressed in percent, on "slave" pages.

One can use it this way:

    <TMPL_IF NAME="ISTRANSLATION">
    <div id="percenttranslated">
      <TMPL_VAR NAME="PERCENTTRANSLATED">
    </div>
    </TMPL_IF>

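As a rough illustration of what "translation completeness" means, the sketch below counts the entries of a PO file that have a non-empty `msgstr`. It is a deliberately naive stand-in written for this page: the function names are made up, and fuzzy flags, plural forms and obsolete entries are ignored, so its result may differ from the figure the plugin obtains through po4a.

```perl
use strict;
use warnings;

# Return the unquoted, concatenated string following $keyword in one PO
# entry, or undef if the keyword is absent. Naive illustration only.
sub po_string {
    my ($entry, $keyword) = @_;
    my ($raw) = $entry =~ /^$keyword[ \t]+("[^\n]*"(?:\n"[^\n]*")*)/m
        or return;
    return join '', $raw =~ /"((?:[^"\\]|\\.)*)"/g;
}

sub percent_translated {
    my ($po_text) = @_;
    my ($total, $translated) = (0, 0);
    # Entries are separated by blank lines.
    for my $entry (split /\n\s*\n/, $po_text) {
        my $msgid  = po_string($entry, 'msgid');
        my $msgstr = po_string($entry, 'msgstr');
        next unless defined $msgid && defined $msgstr;
        next if $msgid eq '';    # the header entry has an empty msgid
        $total++;
        $translated++ if $msgstr ne '';
    }
    return $total ? int(100 * $translated / $total + 0.5) : 0;
}

my $sample = <<'EOT';
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Hello"
msgstr "Bonjour"

msgid "World"
msgstr ""
EOT

print percent_translated($sample), "\n";
```
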
Additional PageSpec tests
-------------------------

This plugin enhances the regular [[ikiwiki/PageSpec]] syntax with some
additional tests that are documented [[here|ikiwiki/pagespec/po]].

Automatic PO file update
------------------------

Committing changes to a "master" page:

1. updates the POT file and the PO files for the "slave" languages;
   the updated PO files are then put under version control;
2. triggers a refresh of the corresponding HTML slave pages.

Also, when the plugin has just been enabled, or when a page has just
been declared as being translatable, the needed POT and PO files are
created, and the PO files are checked into version control.

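The PO update step relies on the external `msgmerge` program. The sketch below only builds the command lines that would bring each "slave" PO file up to date with the POT file; `msgmerge_commands` is a hypothetical helper, and the assumption that the POT file sits next to the master page as `page.pot` is made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: one msgmerge command per "slave" language.
# "-U" updates the PO file in place; "--backup=none" keeps no backup.
sub msgmerge_commands {
    my ($masterfile, @codes) = @_;
    (my $base = $masterfile) =~ s/\.[^.\/]+$//;    # drop ".mdwn" etc.
    my $pot = "$base.pot";                         # assumed POT location
    my @commands;
    for my $code (@codes) {
        push @commands, [ 'msgmerge', '-U', '--backup=none', "$base.$code.po", $pot ];
    }
    return @commands;
}

for my $cmd (msgmerge_commands('bla/page.mdwn', qw(fr es))) {
    print join(' ', @$cmd), "\n";
    # To actually run it: system(@$cmd) == 0 or warn "msgmerge failed\n";
}
```

Passing the command as a list to `system` avoids going through a shell, which matters when file names come from wiki content.
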
Discussion pages
----------------

Discussion should happen in the language in which the pages are
written for real, *i.e.* the "master" one. If discussion pages are
enabled, "slave" pages therefore link to the "master" page's
discussion page.

Translating
-----------

One can edit the PO files using ikiwiki's CGI (a message-by-message
interface could also be implemented at some point).

If [[tips/untrusted_git_push]] is set up, one can edit the PO files in one's
preferred `$EDITOR`, without needing to be online.

TODO
====

Security checks
---------------

### Security history

The only past security issues I could find in GNU gettext and po4a
are:

- [CVE-2004-0966](http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2004-0966),
  *i.e.* [Debian bug #278283](http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=278283):
  the autopoint and gettextize scripts in the GNU gettext package
  1.14 and later versions, as used in Trustix Secure Linux 1.5
  through 2.1 and other operating systems, allow local users to
  overwrite files via a symlink attack on temporary files.
- [CVE-2007-4462](http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2007-4462):
  `lib/Locale/Po4a/Po.pm` in po4a before 0.32 allows local users to
  overwrite arbitrary files via a symlink attack on the
  gettextization.failed.po temporary file.

**FIXME**: check whether this plugin would have been a possible attack
vector to exploit these vulnerabilities.

Depending on my mood, the lack of found security issues can either
indicate that there are none, or reveal that no-one ever bothered to
find (and publish) them.

### PO file features

Can any sort of directives be put in po files that will cause mischief
(i.e., include other files, run commands, crash gettext, whatever)?

> No [documented](http://www.gnu.org/software/gettext/manual/gettext.html#PO-Files)
> directive is supposed to do so. [[--intrigeri]]

### Running po4a on untrusted content

Are there any security issues on running po4a on untrusted content?

To say the least, this issue is not well covered, at least publicly:

- the documentation does not talk about it;
- grep'ing the source code for `security` or `trust` gives no answer.

On the other hand, a po4a developer answered my questions in
a convincing manner, stating that processing untrusted content was not
an initial goal, and analysing in detail the possible issues.

#### Already checked

- the core (`Po.pm`, `TransTractor.pm`) should be safe
- po4a source code was fully checked for other potential symlink
  attacks, after discovery of one such issue
- the only external program run by the core is `diff`, in `Po.pm` (in
  parts of its code we don't use)
- `Locale::gettext`: only used to display translated error messages
- Nicolas François "hopes" `DynaLoader` is safe, and has "no reason to
  think that `Encode` is not safe"
- Nicolas François has "no reason to think that `Encode::Guess` is not
  safe". The po plugin nevertheless avoids using it by defining the
  input charset (`file_in_charset`) before asking `TransTractor` to
  read any file. NB: this hack depends on po4a internals to stay
  the same.

#### To be checked

##### Locale::Po4a modules

The modules we want to use have to be checked, as not all are safe
(e.g. the LaTeX module's behaviour is changed by commands included in
the content); they may use regexps generated from the content.

`Chooser.pm` only loads the plugin we tell it to: currently, this
means the `Text` module only.

`Text` module (I checked the CVS version):

- it does not run any external program
- only `do_paragraph()` builds regexps that expand untrusted
  variables; they seem safe to me, but someone more expert than me
  will need to check. Joey?

> Freaky code, but seems ok due to use of `quotemeta`.

##### Text::WrapI18N

`Text::WrapI18N` can cause DoS (see the
[Debian bug #470250](http://bugs.debian.org/470250)), but it is
optional and we do not need the features it provides.

It is loaded if available by `Locale::Po4a::Common`; looking at the
code, I'm not sure we can prevent this at all, but maybe some symbol
table manipulation tricks could work; overriding
`Locale::Po4a::Common::wrapi18n` may be easier. I'm no expert at all
in this field. Joey? [[--intrigeri]]

> Update: Nicolas François suggests we add an option to po4a to
> disable it. It would do the trick, but only for people running
> a brand new po4a (probably too late for Lenny). Anyway, this option
> would have to take effect in a `BEGIN` / `eval` that I'm not
> familiar with. I can learn and do it, in case no Perl wizard
> volunteers to provide the po4a patch. [[--intrigeri]]

>> That doesn't really need to be in a BEGIN. This patch moves it to
>> `import`, and makes this disable wrapi18n:
>> `use Locale::Po4a::Common q{nowrapi18n}` --[[Joey]]

<pre>
--- /usr/share/perl5/Locale/Po4a/Common.pm	2008-07-21 14:54:52.000000000 -0400
+++ Common.pm	2008-11-11 18:27:34.000000000 -0500
@@ -30,8 +30,16 @@
 use strict;
 use warnings;
 
-BEGIN {
-    if (eval { require Text::WrapI18N }) {
+sub import {
+    my $class=shift;
+    my $wrapi18n=1;
+    if ($_[0] eq 'nowrapi18n') {
+        shift;
+        $wrapi18n=0;
+    }
+    $class->export_to_level(1, $class, @_);
+
+    if ($wrapi18n && eval { require Text::WrapI18N }) {
 
         # Don't bother determining the wrap column if we cannot wrap.
         my $col=$ENV{COLUMNS};
</pre>

##### Term::ReadKey

`Term::ReadKey` is not a hard dependency in our case, *i.e.* po4a
works nicely without it. But the po4a Debian package recommends
`libterm-readkey-perl`, so it will probably be installed on most
systems using the po plugin.

If `$ENV{COLUMNS}` is not set, `Locale::Po4a::Common` uses
`Term::ReadKey::GetTerminalSize()` to get the terminal size. How safe
is this?

Part of `Term::ReadKey` is written in C. Depending on the runtime
platform, this function uses ioctl, environment, or C library function
calls, and may end up running the `resize` command (without
arguments).

IMHO, using `Term::ReadKey` has too far-reaching implications for us to
be able to guarantee anything wrt. security. Since it is anyway of no
use in our case, I suggest we define `$ENV{COLUMNS}` before loading
`Locale::Po4a::Common`, just to be on the safe side. Joey?
[[--intrigeri]]

> Update: adding an option to disable `Text::WrapI18N`, as Nicolas
> François suggested, would as a bonus disable `Term::ReadKey`
> as well. [[--intrigeri]]

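The suggestion of defining `$ENV{COLUMNS}` before loading `Locale::Po4a::Common` could look like the sketch below. The width of 76 is an arbitrary value picked for the example, and the module is loaded inside `eval` only so that the snippet also runs on a system without po4a.

```perl
use strict;
use warnings;

# Define COLUMNS before Locale::Po4a::Common is loaded, so that it has
# no reason to ask Term::ReadKey for the terminal size.
# 76 is an arbitrary width chosen for this sketch.
BEGIN {
    $ENV{COLUMNS} = 76 unless defined $ENV{COLUMNS};
}

# Loaded at run time, inside eval, so the sketch works without po4a too.
my $have_po4a = eval { require Locale::Po4a::Common; 1 } ? 1 : 0;

print "COLUMNS=$ENV{COLUMNS}, po4a available: $have_po4a\n";
```
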
### msgmerge

`refreshpofiles()` runs this external program. A po4a developer
answered he does "not expect any security issues from it".

### Fuzzing input

I was not able to find any public information about gettext or po4a
having been tested with a fuzzing program, such as `zzuf` or `fusil`.
Moreover, some gettext parsers seem to be quite
[easy to crash](http://fusil.hachoir.org/trac/browser/trunk/fuzzers/fusil-gettext),
so it might be useful to bang msgmerge/po4a's heads against such
a program in order to easily detect some of the most obvious DoS.
[[--intrigeri]]

> po4a was not fuzz-tested, but according to one of its developers,
> "it would be really appreciated". [[--intrigeri]]

Test conditions:

- a 21M file containing 100 concatenated copies of all the files in my
  `/usr/share/common-licenses/`; I had no existing PO file or
  translated versions at hand, which renders these tests
  quite incomplete.
- po4a was the Debian 0.34-2 package; the same tests were also run
  after replacing the `Text` module with the CVS one (the core was not
  changed in CVS since 0.34-2 was released), without any significant
  difference in the results.
- Perl 5.10.0-16

#### po4a-gettextize

`po4a-gettextize` uses more or less the same po4a features as our
`refreshpot` function.

Without specifying an input charset, zzuf'ed `po4a-gettextize` quickly
errors out, complaining it was not able to detect the input charset;
it leaves no incomplete file on disk.

So I had to pretend the input was in UTF-8, as does the po plugin.

Two ways of crashing were revealed by this command-line:

        zzuf -vc -s 0:100 -r 0.1:0.5 \
          po4a-gettextize -f text -o markdown -M utf-8 -L utf-8 \
          -m LICENSES >/dev/null

They are:

        Malformed UTF-8 character (UTF-16 surrogate 0xdcc9) in substitution iterator at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
        Malformed UTF-8 character (fatal) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.

and

        Malformed UTF-8 character (UTF-16 surrogate 0xdcec) in substitution (s///) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.
        Malformed UTF-8 character (fatal) at /usr/share/perl5/Locale/Po4a/Po.pm line 1443.

Perl seems to exit cleanly, and an incomplete PO file is written on
disk. I am not sure whether this is a bug in Perl or in `Po.pm`.

> It's fairly standard perl behavior when fed malformed utf-8. As long as it doesn't
> crash ikiwiki, it's probably acceptable. Ikiwiki can do some similar things itself when fed malformed utf-8 (doesn't crash tho) --[[Joey]]

#### po4a-translate

`po4a-translate` uses more or less the same po4a features as our
`filter` function.

Without specifying an input charset, same behaviour as
`po4a-gettextize`, so let's specify UTF-8 as input charset as of now.

        zzuf -cv \
          po4a-translate -d -f text -o markdown -M utf-8 -L utf-8 \
          -k 0 -m LICENSES -p LICENSES.fr.po -l test.fr

... prints tons of occurrences of the following error, but a complete
translated document is written (obviously with some weird chars
inside):

        Use of uninitialized value in string ne at /usr/share/perl5/Locale/Po4a/TransTractor.pm line 854.
        Use of uninitialized value in string ne at /usr/share/perl5/Locale/Po4a/TransTractor.pm line 840.
        Use of uninitialized value in pattern match (m//) at /usr/share/perl5/Locale/Po4a/Po.pm line 1002.

While:

        zzuf -cv -s 0:10 -r 0.001:0.3 \
          po4a-translate -d -f text -o markdown -M utf-8 -L utf-8 \
          -k 0 -m LICENSES -p LICENSES.fr.po -l test.fr

... seems to lose the fight, at the `readpo(LICENSES.fr.po)` step,
against some kind of infinite loop, deadlock, or any similar beast.
It does not seem to eat memory, though.

Whatever format module is used does not change anything. This is thus
probably a bug in po4a's core or in a lib it depends on.

The sub `read`, in `TransTractor.pm`, seems to be a good debugging
starting point.

#### msgmerge

`msgmerge` is run in our `refreshpofiles` function. I did not manage
to crash it with `zzuf`.

gettext/po4a rough corners
--------------------------

- fix infinite loop when synchronizing two ikiwiki instances (when
  checkouts live in different directories): say bla.fr.po has been
  updated in repo2; pulling repo2 from repo1 seems to trigger a PO
  update, that changes bla.fr.po in repo1; then pushing repo1 to repo2
  triggers a PO update, that changes bla.fr.po in repo2; etc.; quickly
  fixed in `629968fc89bced6727981c0a1138072631751fee`, by disabling
  references in POT files. Using `Locale::Po4a::write_if_needed` might
  be a cleaner solution. (warning: this function runs the external
  `diff` program, have to check security)
- new translations created in the web interface must get proper
  charset/encoding gettext metadata, else the next automatic PO update
  removes any non-ASCII chars; possible solution: put such metadata
  into the POT file, and let it propagate; should be fixed in
  `773de05a7a1ee68d2bed173367cf5e716884945a`, time will tell.

Better links
------------

### Subpages

On a translation page, links to subpages should actually be links to
the master page's subpages. They currently appear as broken links.

### Page title in links

Using the page titles set with the [[meta|plugins/meta]] plugin when
rendering links would be much nicer than the current
"filename.LL" format. This is actually a duplicate of
[[bugs/pagetitle_function_does_not_respect_meta_titles]].

Going to work on this in my `meta` branch.

### Translation status in links

See [[contrib/po]].

### Backlinks

They are not updated when the source page changes (e.g. meta title).

Page formats
------------

Markdown is well supported, great, but what about others?

The [[po|plugins/po]] plugin uses `Locale::Po4a::Text` for every page
format; this can be expected to work out of the box with most other
wiki-like formats supported by ikiwiki. Some of their ad-hoc syntax
might be parsed in a strange way, but the worst problems I can imagine
would be wrapping issues; e.g. there is code in po4a dedicated to
preventing re-wrapping of the underlined Markdown headers.

While it would be easy to better support formats such as [[html]] or
LaTeX, by using for each one the dedicated po4a module, this can be
problematic from a security point of view.

**TODO**: test the more popular formats and write proper documentation
about it.

Translation quality assurance
-----------------------------

Modifying a PO file via the CGI must be forbidden if the new version
is not a valid PO file. As a bonus, check that it provides a more
complete translation than the existing one.

A new `cansave` type of hook would be needed to implement this.

Note: committing to the underlying repository is a way to bypass
this check.

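One possible way to test whether submitted content is a valid PO file is to hand it to `msgfmt` from GNU gettext. The sketch below is an assumption about how such a check could be written, not existing ikiwiki code: `po_is_valid` is a hypothetical function, it expects the content as already-encoded bytes, and `msgfmt --check` also verifies the header entry, which may be stricter than wanted.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical check: returns 1 if msgfmt accepts the content, 0 if it
# rejects it, and undef if the check itself could not be performed.
sub po_is_valid {
    my ($content) = @_;
    my ($fh, $tmp) = tempfile(SUFFIX => '.po', UNLINK => 1);
    print {$fh} $content;
    close $fh or return;
    # List form of system(): no shell is involved.
    my $status = system('msgfmt', '--check', '-o', '/dev/null', $tmp);
    return if $status == -1;    # msgfmt could not be run
    return $status == 0 ? 1 : 0;
}

my $verdict = po_is_valid(qq{msgid "unterminated\n});
print defined $verdict ? "valid: $verdict\n" : "msgfmt not available\n";
```

A `cansave`-style hook could then refuse the edit whenever this returns 0, and decide separately what to do when the check cannot be run.
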
Creating new pages on the web
-----------------------------

See [[contrib/po]].

Documentation
-------------

Maybe write separate documentation depending on the people it targets:
translators, wiki administrators, hackers. This plugin may be complex
enough to deserve this.