holger reported that decode_utf8 was crashing with perl 5.8.8. Earlier, I
thought that passing 0 to the function avoided this with old perls, but
that was apparently not enough, it still crashes. So, put it inside the
eval, so we can at least recover from it crashing.
The old code actually did the same thing, just obfuscated -- since the eval
use wasn't quoted, it used the modules on load. Thus, the error (not to
mentioned the return) was bypassed, and it just failed on load.
But that seems like the right thing to do, really, so just made it clearer
that's what happens.
Since ikiwiki uses open :utf8, perl assumes that files contain valid utf-8.
If it turns out to be malformed it may later crash while processing strings
read from them, with 'Malformed UTF-8 character (fatal)'.
As at least a quick fix, use utf8::valid as soon as data is read, and if
it's not valid, call encode_utf8 on the string, thus clearing the utf-8
flag. This may cause follow-on encoding problems, but will avoid this
crash, and the input file was broken anyway, so GIGO is a reasonable
response. (I looked at calling decode_utf8 after, but it seemed to cause
more trouble than it was worth. BTW, use open ':encoding(utf8)' avaoids
this problem, but the corrupted data later causes Storable to crash when
writing the index.)
This is a quick fix, clearly imperfect:
- It might be better to explicitly call decode_utf8 when reading files,
rather than using the IO layer.
- Data read other than by readfile() can still sneak in bad utf-8. While
ikiwiki does very little file input not using it, stdin for the CGI
would be one way.