Since ikiwiki uses open :utf8, perl assumes that files contain valid utf-8.
If it turns out to be malformed it may later crash while processing strings
read from them, with 'Malformed UTF-8 character (fatal)'.
As at least a quick fix, use utf8::valid as soon as data is read, and if
it's not valid, call encode_utf8 on the string, thus clearing the utf-8
flag. This may cause follow-on encoding problems, but will avoid this
crash, and the input file was broken anyway, so GIGO is a reasonable
response. (I looked at calling decode_utf8 after, but it seemed to cause
more trouble than it was worth. BTW, use open ':encoding(utf8)' avaoids
this problem, but the corrupted data later causes Storable to crash when
writing the index.)
This is a quick fix, clearly imperfect:
- It might be better to explicitly call decode_utf8 when reading files,
rather than using the IO layer.
- Data read other than by readfile() can still sneak in bad utf-8. While
ikiwiki does very little file input not using it, stdin for the CGI
would be one way.
This is necessary so that things that fork to the background,
like pinger, and inline ping, don't block other cgis from running.
Note that websetup also calls unlockwiki, before refreshing / rebuilding
the wiki. It makes perfect sense for that not to block other cgis.
* Stop busy-waiting in lockwiki, as this could delay ikiwiki from waking up
for up to one second. The bailout code is no longer needed.
* Remove support for unused optional wait parameter from lockwiki.
This fixes a problem exposed by the recent change to tags
(a2839de936). That recorded tag links as
absolute by including a leading slash in the link. The same could also be
done with an absolute wikilink.
In either case, link() would not match such links, unless the leading slash
was included in the link to match. But that's not right, because pagespecs
match absolute by default. So strip the leading slash.
Note that to keep any existing `link(/foo)` pagespecs working after this
change, the leading slash is removed from there, too.
The syslog value from the setup file is purposfully ignored when doing
ikiwiki -setup, so that it will output to stdout (while generating wrappers
that do use the syslog). But that caused -dumpsetup to not preserve
the syslog value from the setup file.
This speeds up web commits by 1/4th of a second or so, since perl does
not have to start up for the post commit hook.
perl's locking is completly FuBar, since it's impossible to tell what perl
flock() really does, and thus difficult to write code in other languages
that interoperates with perl's locking. (Let alone interoperating with
existing fcntl locking from perl...)
In this particular case, I think I was able to find a way to avoid the
insanity, mostly. The C code does a true flock(2), and if perl is using an
incompatable lock method that does not use the same locking primative at
the kernel level, then the C code's test will fail, and it will go ahead
and run the perl code. Then the perl code's test will test the right thing.
On Debian, at least lately, perl's flock() does a true flock(2), so the
optimisation does work.
Add an inject function, that can be used by plugins that want to replace
one of ikiwiki's functions with their own version. (This is a scary thing
that grubs through the symbol table, and replaces all exported occurances
of a function with the injected version.)
external: RPC functions can be injected to replace exported functions.
Removed the stupid displaytime hook, and use injection instead.
The html links already went there, but internally the links were not
recorded as absolute, which could cause confusing backlinks etc.
For example, with tagbase=tags, if blog/tags/bar existed and blog/foo was
tagged bar, it would link to /tags/bar. But, the link would be recorded
simply as a link to tags/bar, and so later blog/tags/bar would appear to
have the backlink.
* Add an underlay for javascript, and add ikiwiki.js containing some utility
code.
* toggle: Stop embedding the full toggle code on each page using it, and
move it to toggle.js in the javascript underlay.
This is the easy part of supporting foo/index.mdwn sources for page foo.
Note that if foo.mdwn exists too, there will be a warning about multiple
sources for the same page, and which is used is indeterminate.
indexpages should also cause web based editing to create index source pages
by default; this and other fallout of the option not yet implemented.
Upgrades to the new index format should be transparent.
The version field is 3, because 1 was the old textual index, 2 was the
pre-versioned format.
This also includes some efficiency improvements to index loading, by
not copying a hash and using a reference.
* htmltidy: Avoid returning undef if tidy fails. Also avoid returning the
untidied content if tidy crashes. In either case, it seems best to tidy
the content to nothing.
* htmltidy: Avoid spewing tidy errors to stderr.
Seems that the problem is that once the \nnn coming from git is converted
to a single character, decode_utf8 decides that this is a standalone
character, and not part of a multibyte utf-8 sequence, and so does nothing.
I tried playing with the utf-8 flag, but that didn't work. Instead, use
decode("utf8"), which doesn't have the same qualms, and successfully
decodes the octets into a utf-8 character.
Rant:
Think for a minute about fact that any and every program that parses git-log,
or git-show, etc output to figure out what files were in a commit needs to
contain this snippet of code, to convert from git-log's wacky output to a
regular character set:
if ($file =~ m/^"(.*)"$/) {
($file=$1) =~ s/\\([0-7]{1,3})/chr(oct($1))/eg;
}
(And it's only that "simple" if you don't care about filenames with
embedded \n or \t or other control characters.)
Does that strike anyone else as putting the parsing and conversion in the
wrong place (ie, in gitweb, ikiwiki, etc, etc)? Doesn't anyone who actually
uses git with utf-8 filenames get a bit pissed off at seeing \xxx\xxx
instead of the utf-8 in git-commit and other output?
I saw this in the wild, apparently a page was not present on disk, but was
in the aggregate db, and not marked as expired either. Not sure how that
happened, but such pages should get marked as expired since they have an
effectively zero ctime.