ikiwiki/doc/todo/utf8.mdwn

ikiwiki should support utf-8 pages, both input and output. To test, here's a
utf-8 smiley:

# ☺

Currently ikiwiki is believed to be utf-8 clean itself; it tells perl to use
binmode when reading possibly binary files (such as images) and it uses
utf-8 compatible regexps etc.

utf-8 IO is not enabled by default though. While you can probably embed
utf-8 in pages anyway, ikiwiki will not treat it right in the cases where
it deals with things on a per-character basis (mostly when escaping and
de-escaping special characters in filenames).
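
As a rough illustration of the per-character problem, the sketch below runs
the same escaping rule over the smiley once as raw bytes and once as a
decoded character. The `escape_name` function is a hypothetical rule made up
for this example, not ikiwiki's actual filename escaping.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Hypothetical escaping rule, for illustration only: replace every character
# outside [A-Za-z0-9] with its ordinal wrapped in double underscores.
sub escape_name {
    my ($name) = @_;
    $name =~ s/([^A-Za-z0-9])/"__" . ord($1) . "__"/eg;
    return $name;
}

my $bytes = "\xE2\x98\xBA";        # the utf-8 encoding of U+263A, as raw bytes
my $chars = decode_utf8($bytes);   # the same data as a single perl character

my $from_bytes = escape_name($bytes);   # escapes three separate bytes
my $from_chars = escape_name($chars);   # escapes one character

print length($bytes), " bytes vs ", length($chars), " character\n";
print "$from_bytes\n$from_chars\n";
```

On undecoded input the rule produces three escapes (`__226____152____186__`),
while on decoded input it produces one (`__9786__`), so the two modes disagree
about the resulting name.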

To enable utf-8, edit ikiwiki and add -CSD to the perl hashbang line.
(This should probably be configurable via a --utf8 or, better, an
--encoding= switch.)
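
A minimal sketch of asking for much the same thing from inside a script,
using the `open` pragma instead of the -CSD switch. This is an assumption
about one way to do it, not what ikiwiki does, and unlike -CSD the pragma
only affects handles opened in its lexical scope. The temporary file name is
made up for the example.

```perl
use strict;
use warnings;
# Roughly what -CSD requests: utf-8 layers on STDIN/STDOUT/STDERR and on
# file handles opened in this lexical scope.
use open qw(:std :utf8);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/page.mdwn";     # hypothetical page file

# Write a heading containing U+263A; the layer encodes it as three bytes.
open(my $out, '>', $path) or die "cannot write $path: $!";
print $out "# \x{263A}\n";
close($out);

# Read it back; the layer decodes the bytes into one character again.
open(my $in, '<', $path) or die "cannot read $path: $!";
my $line = <$in>;
close($in);
chomp $line;

print "characters read: ", length($line), "\n";
print "bytes on disk: ", -s $path, "\n";
```

Newer perls refuse -C on the hashbang line unless it is also given on the
command line, which is another reason a pragma or a config switch may be
easier to live with than editing the hashbang.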

The following problems have been observed when running ikiwiki this way:

* If invalid utf-8 creeps into a file, ikiwiki will crash while rendering
  it, as follows:

	Malformed UTF-8 character (unexpected continuation byte 0x97, with no preceding start byte) in substitution iterator at /usr/bin/markdown line 1317.
	Malformed UTF-8 character (fatal) at /usr/bin/markdown line 1317.

  In this example, a literal 0x97 character had gotten into a markdown
  file.

  Running this before markdown can avoid it:

	$content = Encode::encode_utf8($content);

  I'm not sure what should be done after markdown, or how, to get the string
  back into a form that perl can treat as utf-8.

* Apache "AddDefaultCharset on" settings will not play well with utf-8
  pages, since that setting makes Apache label responses as iso-8859-1.

* CGI::FormBuilder needs to be told to set `charset => "utf-8"` so that
  utf-8 is used in the edit form. (done)
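
For the first item in the list above, one possible answer to the open
question is to decode again after markdown has run, since
`Encode::decode_utf8` is the counterpart of `Encode::encode_utf8`. The sketch
below assumes that approach; `render_stub` is a hypothetical stand-in for the
markdown call, and `is_valid_utf8` is a made-up helper showing how malformed
input could be detected up front instead of crashing later.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8 decode);

# Hypothetical stand-in for markdown: wraps its input in a heading and
# treats the input purely as bytes.
sub render_stub {
    my ($bytes) = @_;
    return "<h1>$bytes</h1>";
}

# Made-up helper: true if the byte string is well-formed utf-8.
# FB_CROAK makes decode die on malformed input; LEAVE_SRC keeps the
# argument from being modified in place.
sub is_valid_utf8 {
    my ($octets) = @_;
    my $ok = eval {
        decode('UTF-8', $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}

my $content = "\x{263A}";                        # character string, as under -CSD
my $bytes   = encode_utf8($content);             # the step suggested in the list
my $html    = decode_utf8(render_stub($bytes));  # assumed inverse step afterwards

print "rendered length in characters: ", length($html), "\n";
print "lone 0x97 valid? ", is_valid_utf8("\x97"), "\n";
print "smiley bytes valid? ", is_valid_utf8($bytes), "\n";
```

Whether a page with malformed bytes should be rejected, or rendered with a
replacement character, is a design choice this sketch does not settle.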