I made match_* functions whose influences can vary depending on the page
matched set a special "" influence to indicate this.
Then add_depends can try just one page, and if static influences are found,
stop there.
Thought of a cleaner way to accumulate all influences in
pagespec_match_list, using the pagespec_match result object as an
accumulator.
(This also accumulates all influences from failed matches, rather than just
one failed match. I'm not sure if the old method was correct.)
Benchmarking refresh of a a wiki with 25 thousand pages showed
file_pruned() using most of the time. But, when refreshing, ikiwiki already
knows about nearly all the files. So we can skip calling file_pruned() for
those it knows about. While tricky to do, this sped up a refresh (that
otherwise does no work) by 10-50%.
If a pagespec fails to match, I had been throwing the influences away, but
that is not right. Consider `backlink(foo)`, where foo does not exist.
It still needs to be added as an influence, because if it is created, it
will influence the pagespec to match.
But with that fix, `link(bar)` had as influences all pages, whether they
link to bar or not. Which is not necessary, because modifiying a page to
add a link to bar will directly cause the pagespec to match.
So, in match_link (and all the match_* functions for page metadata),
only return an influence if the match succeeds.
match_backlink had been implemented as the inverse of match_link, but that
is no longer completly true. While match_link does not return an influence
on failure, match_backlink does.
match_created_before/after also return the influence on failure, this way
if created_after(foo) currently fails because foo does not exist, it will
still update the page with the pagespec if foo is created.
The hash will be used used to record a set of pages that influenced the
result of a pagespec match.
The influences are merged together when boolean and/or are encountered
in a pagespec. That means using a non-short-circuiting OR operator. And
so I use & and | when translating pagespecs, since those bitwise operators
can be overloaded. ("and" and "or" cannot, apparently).
Involved some code refactoring so that same code that detects
link changes for backlinks updating can be used for link dependency
checking. The nice thing is that link dep checking is thus
comopletly free!
When adding a contentless dependency, the pagespec also needs to be one
that does not look at any page content information.
As a first approximation of that, only allow glob-based pagespecs in
contentless dependencies. While there are probably a few other types of
pagespecs that can match contentless, this will work for most of them.
Dependency types are represented by bits in the values of the %depends
and %depends_simple hashes.
Change the dependslist array saved to the index to a depends hash.
depends_simple is also converted from an array to a hash.
Note that the depends field used to be a string, and we still
have compat code to handle upgrades from that, as well as from the arrays.
I didn't use ikiwiki-transition because I don't want ikiwiki to break if
users forget to run it; also we're going to recommend a full rebuild on
upgrade to this version to get the improved dependency handling. So
this compat code can be removed or moved to ikiwiki-transition later.
Here I was bitten by perl's aliasing of foreach variables
to the loop array contents, and match_link accidentially changed
the contents of %links.
In Jon's testcase, a tag added an absolute link, which was
made relative by the above bug, and then the link was added
again in preprocess, and turned into a duplicate.
I weakended the regexp, so this matches ipv6 addresses too. It does not
ensure that the address is valid, but that should not matter here.
Note that addresses ending in "::" are not matched, so eg, the unspecified
address will not match -- but should never appear here anyway.
It's not "exact" since case munging has to be done, and I think
"simple" captures the optimisation better.</pedant>
With apologies to smcv, who probably has to rebuild his wiki now.
Let E be the number of dependencies per page of the form "A depends on B and
nothing else", let D be the number of other dependencies per page,
let P be the total number of pages, and let C be the number of changed
pages in a refresh.
This patch should speed up a refresh from O(E*C*P + D*C*P) to
O(C + E*P + D*C*P), assuming that hash lookups are O(1).
In practice, plugins like inline and map produce a lot of these very simple
dependencies, and my album plugin's combination of inline with a large
number of pages causes it to suffer particularly badly.
In testing on a wiki with about 7000 objects (3500 full pages, 3500
images), a full rebuild continued to take about 5:30, and a refresh
after touching about 350 pages and 350 images reduced from 5:30 to 1:30.
As with my previous optimizations, this change will result in downgrades not
working correctly until the wiki is rebuilt.
Now that dependencies are a list of pagespecs with an implicit "or"
operation, there's no need to try to merge pagespecs under normal use.
ikiwiki-transition contains the only use of the function, so move
it there rather than deleting it entirely (it's used to concatenate all
admins' lists of locked pages).
On a large wiki you can spend a lot of time reading through large lists
of dependencies to see whether files need to be rebuilt (album, with its
one-page-per-photo arrangement, suffers particularly badly from this).
The dependency list is currently a single pagespec, but it's not used like
a normal pagespec - in practice, it's a list of pagespecs joined with the
"or" operator.
Accordingly, change it to be stored as a list of pagespecs. On a wiki
with many tagged photo albums, this reduces the time to refresh after
`touch tags/*.mdwn` from about 31 to 25 seconds.
Getting the benefit of this change on an existing wiki requires a rebuild.