Since ikiwiki uses open :utf8, perl assumes that files contain valid utf-8.
If it turns out to be malformed it may later crash while processing strings
read from them, with 'Malformed UTF-8 character (fatal)'.
As at least a quick fix, use utf8::valid as soon as data is read, and if
it's not valid, call encode_utf8 on the string, thus clearing the utf-8
flag. This may cause follow-on encoding problems, but will avoid this
crash, and the input file was broken anyway, so GIGO is a reasonable
response. (I looked at calling decode_utf8 after, but it seemed to cause
more trouble than it was worth. BTW, use open ':encoding(utf8)' avaoids
this problem, but the corrupted data later causes Storable to crash when
writing the index.)
This is a quick fix, clearly imperfect:
- It might be better to explicitly call decode_utf8 when reading files,
rather than using the IO layer.
- Data read other than by readfile() can still sneak in bad utf-8. While
ikiwiki does very little file input not using it, stdin for the CGI
would be one way.
This is necessary so that things that fork to the background,
like pinger, and inline ping, don't block other cgis from running.
Note that websetup also calls unlockwiki, before refreshing / rebuilding
the wiki. It makes perfect sense for that not to block other cgis.
* Stop busy-waiting in lockwiki, as this could delay ikiwiki from waking up
for up to one second. The bailout code is no longer needed.
* Remove support for unused optional wait parameter from lockwiki.
This fixes a problem exposed by the recent change to tags
(a2839de936). That recorded tag links as
absolute by including a leading slash in the link. The same could also be
done with an absolute wikilink.
In either case, link() would not match such links, unless the leading slash
was included in the link to match. But that's not right, because pagespecs
match absolute by default. So strip the leading slash.
Note that to keep any existing `link(/foo)` pagespecs working after this
change, the leading slash is removed from there, too.
The syslog value from the setup file is purposfully ignored when doing
ikiwiki -setup, so that it will output to stdout (while generating wrappers
that do use the syslog). But that caused -dumpsetup to not preserve
the syslog value from the setup file.
This speeds up web commits by 1/4th of a second or so, since perl does
not have to start up for the post commit hook.
perl's locking is completly FuBar, since it's impossible to tell what perl
flock() really does, and thus difficult to write code in other languages
that interoperates with perl's locking. (Let alone interoperating with
existing fcntl locking from perl...)
In this particular case, I think I was able to find a way to avoid the
insanity, mostly. The C code does a true flock(2), and if perl is using an
incompatable lock method that does not use the same locking primative at
the kernel level, then the C code's test will fail, and it will go ahead
and run the perl code. Then the perl code's test will test the right thing.
On Debian, at least lately, perl's flock() does a true flock(2), so the
optimisation does work.
Add an inject function, that can be used by plugins that want to replace
one of ikiwiki's functions with their own version. (This is a scary thing
that grubs through the symbol table, and replaces all exported occurances
of a function with the injected version.)
external: RPC functions can be injected to replace exported functions.
Removed the stupid displaytime hook, and use injection instead.
The html links already went there, but internally the links were not
recorded as absolute, which could cause confusing backlinks etc.
For example, with tagbase=tags, if blog/tags/bar existed and blog/foo was
tagged bar, it would link to /tags/bar. But, the link would be recorded
simply as a link to tags/bar, and so later blog/tags/bar would appear to
have the backlink.
* Add an underlay for javascript, and add ikiwiki.js containing some utility
code.
* toggle: Stop embedding the full toggle code on each page using it, and
move it to toggle.js in the javascript underlay.
This is the easy part of supporting foo/index.mdwn sources for page foo.
Note that if foo.mdwn exists too, there will be a warning about multiple
sources for the same page, and which is used is indeterminate.
indexpages should also cause web based editing to create index source pages
by default; this and other fallout of the option not yet implemented.
Upgrades to the new index format should be transparent.
The version field is 3, because 1 was the old textual index, 2 was the
pre-versioned format.
This also includes some efficiency improvements to index loading, by
not copying a hash and using a reference.
* htmltidy: Avoid returning undef if tidy fails. Also avoid returning the
untidied content if tidy crashes. In either case, it seems best to tidy
the content to nothing.
* htmltidy: Avoid spewing tidy errors to stderr.
Seems that the problem is that once the \nnn coming from git is converted
to a single character, decode_utf8 decides that this is a standalone
character, and not part of a multibyte utf-8 sequence, and so does nothing.
I tried playing with the utf-8 flag, but that didn't work. Instead, use
decode("utf8"), which doesn't have the same qualms, and successfully
decodes the octets into a utf-8 character.
Rant:
Think for a minute about fact that any and every program that parses git-log,
or git-show, etc output to figure out what files were in a commit needs to
contain this snippet of code, to convert from git-log's wacky output to a
regular character set:
if ($file =~ m/^"(.*)"$/) {
($file=$1) =~ s/\\([0-7]{1,3})/chr(oct($1))/eg;
}
(And it's only that "simple" if you don't care about filenames with
embedded \n or \t or other control characters.)
Does that strike anyone else as putting the parsing and conversion in the
wrong place (ie, in gitweb, ikiwiki, etc, etc)? Doesn't anyone who actually
uses git with utf-8 filenames get a bit pissed off at seeing \xxx\xxx
instead of the utf-8 in git-commit and other output?
I saw this in the wild, apparently a page was not present on disk, but was
in the aggregate db, and not marked as expired either. Not sure how that
happened, but such pages should get marked as expired since they have an
effectively zero ctime.
* edittemplate: Default new page file type to the same type as the template.
(willu)
* edittemplate: Add "silent" parameter. (Willu)
* edittemplate: Link to template, to allow creating it. (Willu)
Setup output is once again output to stdout in this case.
Implemented by stashing the verbose/syslog values set in the setup file,
and using those values in the generated wrappers, but not allowing them to take
effect during the setup operation itself, so that command-line options,
appearing before or after -setup, are honored.
Also, some cleanups to how %config is generated for wrappers, removing some
fields that do not need to be recorded inside the wrapper.
* goodstuff: Remove otl plugin from the bundle since it needs a significant
external dependency and is not commonly used. If you use otl, make sure
you explicitly enable it now.
* goodstuff: Add more, progress, and table plugins to the bundle.
I don't want to be stuck renameing it later if preprocessor directives are
turned into postprocessor directives. Also, "directives" is shorter and
clearer than "preprocessors".
* teximg: The prefix is configurable, and has changed to not include the
nonstandard mhchem by default. (willu)
* teximg: dvipng is used if available to render images. Its output is
antialiased and better than dvips. If not available, the old dvips+convert
chain will be used. (willu)
* Drop suggests on texlive-science, add suggests on dvipng.
The use of $dummy was not sufficient, because it only stuck around for the
first element after a dummy parent, and was then lost. Instead, use a
$addparent that contains the actual dummy parent, so it can be compared
with the new item to see if we're still under that parent or have moved to
another one.
The locked pages configuration is moving to a locked_pages option in the
setup file, and the allowed attachments configuration to
allowed_attachments. The admin prefs page can still be used for these, but
that's depreacted and will only be shown if there's currently a value.
Implemented for git and svn so far.
Note that rcs_commit_staged does assume that the rcs has the ability to
"stage" multiple changes for a later commit. Support for this varies, but
all we really care about is staging removals and renames, which, AFAIK, all
modern rcs's support.
can be used to avoid a security check that is a good safe default, but
problimatic overkill in some situations.
I decided to underdocument this, because the option looks ugly, and I don't
want people randomly turning it on because it looks like a good idea. So if
you need it, you'll get an error message mentioning how to fix it.
The committer's email address is not used (because leaking email addresses
is not liked by many users). Closes: #451023
A "Web-commit" trailer is added, to allow telling the difference between
web commits and direct commits.
Smileys need to be double-escaped to work, since the smiley plugin runs as
a sanitize hook, and markdown helpfully removes one level of escapes first.
There were some bugs in the smiley handling code that made escaped smileys
still be expanded. After unescaping a smiley, it needed to move pos forward
past it or the next pass would expand it.
Also, once the m//g got to the end, it seemed to loop back through and make
one more pass (a difference in perl 5.10's regexp exngine? I observed that
pos was undefined when this happened, so added a `last unless defined pos`.
This is truely horribly disgusting. CGI::tmpFileName, in current perls, is
an undocumented function (which should be a clue..) that takes the original
filename of an uploaded attachment, and returns the name of the tempfile
that CGI has stored it in.
In old perls, though, CGI::tmpFileName does not take a filename. It takes
a key from the object's {'.tmpfiles'} hash. This key is something
crazy like '*Fh::fh00001group' -- apparently the stringification of a
filehandle object.
Just to add to the fun, tmpFileName doesn't take the key, it expects a
refernce to the key. Argh?!
But the fun doesn't stop there, because in perl 5.8, CGI.pm is also broken
in two other ways. The upload() method is supposed to return a filehandle
to the temp file. It doesn't. The param() method is supposed to return
a filehandle to the temp file, that stringifies to the original filename.
It returns just the original filename, no filehandle.
Combine all these bugs, and you end up with this disgusting commit. Since
I have no way to get the filehandle, I *need* to get the tempfile name.
If I had the filehandle, I could probably pass it into tmpFileName, and
it might strigify to the right key name. But I don't, so the only way to
determine the key is to grub through the .tmpfiles hash ourselves.
And finally, one the temp file name is discovered, a filehandle can finally
be obtained by (re)opening it.
I recommend that this commit be reverted when perl 5.8 is a mercifully
faded memory.
I'm really, really, really glad I'm actually being paid for working on
this right now!
So the problem is that ikiwiki would generate a relative link like
href="colon:problem", which web browsers treat as being in the "colon:"
uri scheme.
The best fix seems to be to make url beautification fix this, by slapping
a "./" in front.
* The editpage form now uses the raw page name, not the page title, in its
'page' cgi parameter. Using the title was ambiguous and made it
impossible to tell between some pages, like "foo/bar" and "foo__47__bar",
sometimes causing the wrong page to be edited.
* This change means that some edit links need to be updated.
Force a rebuild on upgrade to this version.
* Above change also allowed really fixing escaped slashes from the blogpost
form.