2006-05-26 10:24:36 +02:00
ikiwiki should support utf-8 pages, both input and output. To test, here's a
utf-8 smiley:

# ☺
2006-04-04 21:34:50 +02:00

Currently ikiwiki is believed to be utf-8 clean itself; it tells perl to use
binmode when reading possibly binary files (such as images) and it uses
utf-8 compatible regexps etc.

utf-8 IO is not enabled by default though. While you can probably embed
utf-8 in pages anyway, ikiwiki will not treat it right in the cases where
it deals with things on a per-character basis (mostly when escaping and
de-escaping special characters in filenames).
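
The per-character problem can be illustrated with a small sketch. The `escape_chars` function below is a hypothetical stand-in for a filename escaper, not ikiwiki's actual code; it only shows how the same smiley is seen as three "characters" when the string is still raw bytes, and as one once it has been decoded.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical escaper (an assumption for illustration only): replace
# every character outside a small safe set with __ORD__, where ORD is
# the character's ordinal as perl sees it.
sub escape_chars {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_-])/'__' . ord($1) . '__'/ge;
    return $s;
}

my $bytes = "\xe2\x98\xba";            # utf-8 encoding of U+263A, as raw bytes
my $chars = decode('UTF-8', $bytes);   # the same smiley as one perl character

print escape_chars($bytes), "\n";      # prints __226____152____186__
print escape_chars($chars), "\n";      # prints __9786__
```

Without utf-8 IO the escaper operates on the individual bytes, so the escaped name no longer corresponds to the character the user typed.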

To enable utf-8, edit ikiwiki and add -CSD to the perl hashbang line.
(This should probably be configurable via a --utf8 or better --encoding=
switch.)
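
As a rough sketch of what such a switch might look like: the option name `--utf8` comes from the note above, but the helper names `parse_utf8_option`, `enable_utf8_io` and `open_utf8` are assumptions made up for this example, and this is not how ikiwiki is actually wired. The idea is to apply encoding layers at run time instead of relying on -CSD in the hashbang line. Note that `:encoding(UTF-8)` is stricter than the `:utf8` layer that -C applies.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: returns 1 if --utf8 was given, 0 otherwise.
sub parse_utf8_option {
    my @args = @_;
    my $utf8 = 0;
    GetOptionsFromArray(\@args, 'utf8!' => \$utf8)
        or die "bad options\n";
    return $utf8;
}

# Put a utf-8 layer on the standard handles (roughly the "S" in -CSD).
sub enable_utf8_io {
    binmode($_, ':encoding(UTF-8)') for \*STDIN, \*STDOUT, \*STDERR;
    return;
}

# Open one file for reading with an explicit utf-8 layer (a per-handle
# analogue of the "D" in -CSD, which sets the default for all opens).
sub open_utf8 {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "cannot open $path: $!\n";
    return $fh;
}

# Possible use at startup:
#   enable_utf8_io() if parse_utf8_option(@ARGV);
```

An `--encoding=` variant would pass the chosen encoding name into the layer string instead of hard-coding UTF-8.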

The following problems have been observed when running ikiwiki this way:

* If invalid utf-8 creeps into a file, ikiwiki will crash while rendering
  it, as follows:

        Malformed UTF-8 character (unexpected continuation byte 0x97, with no preceding start byte) in substitution iterator at /usr/bin/markdown line 1317.
        Malformed UTF-8 character (fatal) at /usr/bin/markdown line 1317.

    In this example, a literal 0x97 character had gotten into a markdown
    file.

    Running this before markdown can avoid it:

        $content = Encode::encode_utf8($content);

    I'm not sure how, or what should be done after markdown to get the string
    back into a form that perl can treat as utf-8.

* Apache "AddDefaultCharset on" settings will not play well with utf-8
  pages, since Apache then labels responses as iso-8859-1.

* CGI::FormBuilder needs to be told to set `charset => "utf-8"` so that
  utf-8 is used in the edit form. (done)
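
For the first problem above (invalid utf-8 and the round trip through markdown), one possible approach is sketched below using the Encode module. The helper names `is_valid_utf8`, `decode_lenient` and `filter_as_bytes` are hypothetical, and the `$filter` code reference merely stands in for the call to markdown; whether this is the right fix for ikiwiki is an open question, as noted in the list.

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8 decode_utf8 FB_DEFAULT FB_CROAK LEAVE_SRC);

# True if the raw bytes are well-formed utf-8. FB_CROAK makes decode die
# on malformed input; LEAVE_SRC keeps it from modifying its argument.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# Decode raw bytes, substituting U+FFFD for malformed sequences such as
# a stray 0x97, instead of dying later inside the renderer.
sub decode_lenient {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes, FB_DEFAULT);
}

# Hypothetical wrapper: encode to bytes before calling a filter (the
# workaround from the list), then decode the result so that perl treats
# it as characters again. Assumes the filter returns utf-8 bytes.
sub filter_as_bytes {
    my ($content, $filter) = @_;
    my $bytes = encode_utf8($content);
    my $out   = $filter->($bytes);
    return decode_utf8($out);
}
```

A caller could check `is_valid_utf8` when a source file is read and either refuse the page or fall back to `decode_lenient`, so that a bad byte shows up as a visible replacement character rather than a crash.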