Benchmarking refresh of a a wiki with 25 thousand pages showed
file_pruned() using most of the time. But, when refreshing, ikiwiki already
knows about nearly all the files. So we can skip calling file_pruned() for
those it knows about. While tricky to do, this sped up a refresh (that
otherwise does no work) by 10-50%.
This makes it more efficient.
It also fixes the same bug that I fixed in orphans recently,
that only changes to the set of displayed pages were considered (or amoung),
which missed changes to links on other pages to those.
Probably this bug was never noticed because pagestats is most often put
on a blog type page, which gets updated anyway when posts change,
and thus the tag cloud was updated.
This makes it more efficient.
It also fixes a longstanding bug, where if only a small set of pages were
considered by orphans, changes to links on other pages failed to cause an
update.
Here I was bitten by perl's aliasing of foreach variables
to the loop array contents, and match_link accidentially changed
the contents of %links.
In Jon's testcase, a tag added an absolute link, which was
made relative by the above bug, and then the link was added
again in preprocess, and turned into a duplicate.
I had assumed that an image shown full size did not need add_depends, since
a change would not need a change to the displaying page.
But this is not true if the image is modified and its size changes. Then
the page needs to update its img tag to reflect the current size.
If an image was resized smaller, with width and height specified to values
that did not fit its aspect ratio, the image tag with/height were not
adjusted to the actual size imagemagick chooses.
This was broken by 03449610d6.
To fix it right, it unfortunatly needs to always read the src image now,
in order to determine if the image is being displayed larger, or resized
smaller. When resized smaller, it then always uses the size of the
thumbnail, while for larger it calculates the size.
(Only way to get rid of this sometimes extra image read would be to change
it to not allow displaying images larger.)
I weakended the regexp, so this matches ipv6 addresses too. It does not
ensure that the address is valid, but that should not matter here.
Note that addresses ending in "::" are not matched, so eg, the unspecified
address will not match -- but should never appear here anyway.