The simple implementation of this, which I'd prefer to use, would be:
if we can import LWPx::ParanoidAgent, use it; otherwise, use
LWP::UserAgent.
However, aggregate has historically worked with proxies, and
LWPx::ParanoidAgent quite reasonably refuses to work with proxies
(because it can't know whether those proxies are going to do the same
filtering that LWPx::ParanoidAgent would).
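A sketch of the compromise I have in mind ($proxy and the surrounding setup
are illustrative, not the actual plugin code): only use the paranoid agent
when no proxy is configured, and otherwise keep the historical
LWP::UserAgent behaviour.

    use LWP::UserAgent;

    my $ua;
    if (! defined $proxy) {
        # no proxy configured: try the paranoid agent, which refuses to
        # fetch from private/internal addresses
        eval q{use LWPx::ParanoidAgent};
        if (! $@) {
            $ua = LWPx::ParanoidAgent->new;
        }
    }
    if (! defined $ua) {
        # proxy in use, or LWPx::ParanoidAgent not installed
        $ua = LWP::UserAgent->new;
        $ua->env_proxy;  # honour http_proxy etc., as before
    }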
Signed-off-by: Simon McVittie <smcv@debian.org>
When an aggregated post lacked a title, the code first prepended
$feed->{dir} to the page name, and only then checked whether it had zero
length. That check could never succeed, so it was possible to end up with
$page="dir/", and writing to that would of course fail.
(The same problem could also occur when the whole title was sanitized away
by the wiki_file_regexp.)
Fixed by simply checking earlier if $page is empty.
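A sketch of the reordering (the fallback name and exact variables are
illustrative): test for an empty page name before prepending the feed
directory, not after.

    # $page is derived from the item title, sanitized by wiki_file_regexp
    if (! defined $page || ! length $page) {
        $page = "item";  # hypothetical fallback name
    }
    $page = $feed->{dir}."/".$page if length $feed->{dir};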
Based on a patch by Alexandre Oliva which got lost in a maze of email
folders all alike for over two years despite him mentioning it to me at
least once in person.
As noted in the Try::Tiny man page, eval/$@ can be quite awkward in
corner cases, because $@ has the same properties and problems as C's
errno. While writing a regression test for definetemplate
in which it couldn't find an appropriate template, I received
<span class="error">Error: failed to process template
<span class="createlink">deftmpl</span> </span>
instead of the intended
<span class="error">Error: failed to process template
<span class="createlink">deftmpl</span> template deftmpl not
found</span>
which turned out to be because the "catch"-analogous block called
gettext before it used $@, and gettext can call define_gettext,
which uses eval.
This commit alters all current "catch"-like blocks that use $@, except
those that only do trivial things with $@ (string interpolation, string
concatenation) and then call a function (die, error, print, etc.).
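A sketch of the safer pattern (the template call and message are
illustrative; error() and gettext() are ikiwiki's usual helpers): copy $@
into a lexical immediately, before anything that might run another eval.

    my $output = eval { $template->output };
    my $err = $@;   # copy right away: gettext() below may run its own eval
    if ($err) {
        error(sprintf(gettext("failed to process template %s"), $file)
            .": ".$err);
    }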
While here, mollify http://validator.w3.org/feed/ and
s/dcterms:creator/dc:creator/g, which happens to make rss2email see
and do nice things with authors.
Previously, prune("wiki/srcdir/sandbox/test.mdwn") could delete srcdir
or even wiki, if they happened to be empty. This is rarely what you
want: there's usually some base directory (destdir, srcdir, transientdir
or another subdirectory of wikistatedir) beyond which you do not want to
delete.
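A sketch of the bounded prune, assuming a second, optional argument naming
the directory beyond which nothing may be removed:

    use File::Basename qw(dirname);

    sub prune {
        my ($file, $up_to) = @_;
        unlink($file);
        my $dir = dirname($file);
        # walk upwards removing now-empty directories, but never leave
        # the subtree below $up_to
        while ((! defined $up_to || $dir =~ m{^\Q$up_to\E/}) && rmdir($dir)) {
            $dir = dirname($dir);
        }
    }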
Two problems fixed:
1. Files are written with a .ikiwiki-new suffix, which has to be taken into
account.
2. The length needs to be counted in bytes, not in Unicode characters
   (see the sketch below).
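A sketch of the corrected check, assuming ikiwiki's $config{srcdir} and an
illustrative $filename: look up the filesystem's name limit and compare it
against the UTF-8 byte length of the final name, temporary suffix included.

    use POSIX;
    use Encode qw(encode_utf8);

    # maximum file name length on the target filesystem
    my $max = POSIX::pathconf($config{srcdir}, &POSIX::_PC_NAME_MAX);
    if (defined $max &&
        length(encode_utf8($filename)) + length(".ikiwiki-new") >= $max) {
        # shorten the generated page name before writing
    }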
Since the plugin abuses the checkconfig hook to launch aggregation when in
--aggregate mode, it should give other plugins that have checkconfig hooks
a chance to run first, because those plugins may be used in rendering the
aggregated content.
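Sketched with ikiwiki's hook registration, assuming the optional last
parameter is honoured for checkconfig hooks: ask for the aggregate hook to
run after the others.

    hook(type => "checkconfig", id => "aggregate", call => \&checkconfig,
        last => 1);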
Besides being wrong to do, this could lead to the wrong item
being expired, as follows: If B is added and at the same time
A is changed, then A's ctime may be set to the current time,
while B's is set to its creation time. Thus the new item, B, is
incorrectly removed as older.
(This interacted especially badly with the bug fixed by
90b4d079605b72bb50d1da41402d994960e10937.)
The aggregate state merge code neglected to merge changes to the md5
field of an item. Therefore, if an item's md5 changed after initial
aggregation, it would be updated, and rewritten, each time thereafter.
This was wasteful and indirectly led to some expire problems.
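A hypothetical sketch of the missing merge step, assuming per-item state
hashes keyed by guid (%stored and %fresh are illustrative names, not the
plugin's own):

    # when merging freshly aggregated state over the stored state, carry
    # the md5 across too, or the item keeps looking changed forever
    $stored{$guid}->{md5} = $fresh{$guid}->{md5}
        if exists $fresh{$guid} && exists $fresh{$guid}->{md5};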
See [[bugs/Aggregated_Atom_feeds_are_double-encoded]]. By default,
XML::Atom outputs strings of UTF-8 bytes with the Perl UTF8 flag stripped
off, which IkiWiki assumes to be Latin-1 and re-encodes as UTF-8 on
output. XML::Feed does not currently (0.41-1) set the magic variable to
change this behaviour (I've filed a bug on CPAN), but IkiWiki can
usefully set the same variable as a workaround.
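The workaround, assuming the variable in question is XML::Atom's
$ForceUnicode package variable, is a one-liner:

    # make XML::Atom return Perl character strings (UTF8 flag set) rather
    # than raw UTF-8 bytes
    $XML::Atom::ForceUnicode = 1;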
holger reported that decode_utf8 was crashing with perl 5.8.8. Earlier, I
thought that passing 0 to the function avoided this with old perls, but
that was apparently not enough; it still crashes. So, put it inside the
eval, so we can at least recover from it crashing.
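A sketch of the recovery path (variable names are illustrative): do the
decode inside an eval so a crash in decode_utf8 degrades to keeping the
raw content instead of aborting the whole aggregation run.

    use Encode qw(decode_utf8);

    my $decoded = $content;
    eval { $decoded = decode_utf8($content, 0) };
    if ($@ || ! defined $decoded) {
        # decode_utf8 crashed (seen with perl 5.8.8); keep the raw bytes
        $decoded = $content;
    }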
newpagefile.
Note that newpagefile is not used here (or in recentchanges) because
the internal use pages they generate are transient and unlikely to
benefit from being put each in their own subdir.
I saw this in the wild: apparently a page was not present on disk, but was
in the aggregate db and not marked as expired either. Not sure how that
happened, but such pages should get marked as expired since they have an
effectively zero ctime.