git log --follow seems to sometimes show merges from before the file was
ever created. So, skip them, a file shouldn't be first created during a
merge anyway.
This will be a bit more expensive, but --getctime does not need to be fast.
And getting the real creation time a very useful when untangling blog
histories that involve renames.
I made match_* functions whose influences can vary depending on the page
matched set a special "" influence to indicate this.
Then add_depends can try just one page, and if static influences are found,
stop there.
This was tricky because of the caching, and because use_pagespec always
adds a dependency. That would have made year calendars depend on the whole
pagespec, which is overly broad. So I removed the caching, format_month,
and in format_year just look at %pagesources to see if month pages are
available.
In format_month, I make it always call use_pagespec, so each month calendar
gets the right dependency and any influcences added. This means a bit more
work, but the added work is fairly minimal, and presence dependencies
remove a *lot* of work it used to do.
(100% untested!)
This dependency was missing before switching to use_pagespec.
It is correct to add it, but it needs to be combined with the regular
"pages" dependency to ensure that it does not match extra pages.
(Also fixed its dependency type.)
Benchmarking refresh of a a wiki with 25 thousand pages showed
file_pruned() using most of the time. But, when refreshing, ikiwiki already
knows about nearly all the files. So we can skip calling file_pruned() for
those it knows about. While tricky to do, this sped up a refresh (that
otherwise does no work) by 10-50%.
If a pagespec fails to match, I had been throwing the influences away, but
that is not right. Consider `backlink(foo)`, where foo does not exist.
It still needs to be added as an influence, because if it is created, it
will influence the pagespec to match.
But with that fix, `link(bar)` had as influences all pages, whether they
link to bar or not. Which is not necessary, because modifiying a page to
add a link to bar will directly cause the pagespec to match.
So, in match_link (and all the match_* functions for page metadata),
only return an influence if the match succeeds.
match_backlink had been implemented as the inverse of match_link, but that
is no longer completly true. While match_link does not return an influence
on failure, match_backlink does.
match_created_before/after also return the influence on failure, this way
if created_after(foo) currently fails because foo does not exist, it will
still update the page with the pagespec if foo is created.
This is very common, and the code has to test each type differently, since
the list of candidates to test, as well as the test, will vary per type.
Much happier with this code now.
This new method for determining when links on pages
have changed should be more efficient, since it avoids
double calculation of the bestlinks.
It also allows collecting data about the old links, before
the scan pass, so the data is accurate when pages move around
and bestlinks change.
Also got some code back to a saner indent level.