This new method for determining when links on pages
have changed should be more efficient, since it avoids
double calculation of the bestlinks.
It also allows collecting data about the old links, before
the scan pass, so the data is accurate when pages move around
and bestlinks change.
Also got some code back to a saner indent level.
Involved some code refactoring so that same code that detects
link changes for backlinks updating can be used for link dependency
checking. The nice thing is that link dep checking is thus
comopletly free!
Preliminary support, anyway.
If a dependency only includes DEPEND_EXISTS, then only changes that
involved adding or deleting a page can trigger it.
This is complicated by internal pages, since the code did not previously
differentiate between add, delete, and change of internal pages.
Now it tracks change separately from add+delete, so DEPEND_EXISTS pagespecs
that actually match internal pages (which will probably be quite rare in
practice) should work.
As soon as a change happens, we know we will need to rescan all
dependencies from the start, so bail out of the current scan partway to
avoid doing redundant work.
Only problem with this is that ikiwiki sometimes ends up printing out
dependencies that, while correct, are not obvious. Before:
building B, which depends on A
building C, which depends on A
building D, which depends on A
After:
building B, which depends on A
building C, which depends on B
building D, which depends on C
It's not "exact" since case munging has to be done, and I think
"simple" captures the optimisation better.</pedant>
With apologies to smcv, who probably has to rebuild his wiki now.
Let E be the number of dependencies per page of the form "A depends on B and
nothing else", let D be the number of other dependencies per page,
let P be the total number of pages, and let C be the number of changed
pages in a refresh.
This patch should speed up a refresh from O(E*C*P + D*C*P) to
O(C + E*P + D*C*P), assuming that hash lookups are O(1).
In practice, plugins like inline and map produce a lot of these very simple
dependencies, and my album plugin's combination of inline with a large
number of pages causes it to suffer particularly badly.
In testing on a wiki with about 7000 objects (3500 full pages, 3500
images), a full rebuild continued to take about 5:30, and a refresh
after touching about 350 pages and 350 images reduced from 5:30 to 1:30.
As with my previous optimizations, this change will result in downgrades not
working correctly until the wiki is rebuilt.
This should be more efficient than pagespec_match_list since it short-circuits
after the first match is found.
The other problem with using pagespec_match_list here is it may throw an
error if a bad or failing pagespec somehow got into the dependencies.
On a large wiki you can spend a lot of time reading through large lists
of dependencies to see whether files need to be rebuilt (album, with its
one-page-per-photo arrangement, suffers particularly badly from this).
The dependency list is currently a single pagespec, but it's not used like
a normal pagespec - in practice, it's a list of pagespecs joined with the
"or" operator.
Accordingly, change it to be stored as a list of pagespecs. On a wiki
with many tagged photo albums, this reduces the time to refresh after
`touch tags/*.mdwn` from about 31 to 25 seconds.
Getting the benefit of this change on an existing wiki requires a rebuild.
During backlink calulation, all links are examined and broken links can
be detected for free, so store a list of broken links and have brokenlinks
use it.
Exposing the %brokenlinks structure is a bit ugly, but the speedup seems
worth it: Around 1 second for wikis the size of the doc wiki that use
brokenlinks.
By adding this setting, we get both more configurability, and a minor
optimisation too, since gettext does not need to be called continually
to get the Discussion value.
Previously, if a page changed its type but not its mtime
(e.g. mv page.txt page.mdwn), then it would not be rebuilt.
Now, check if the source of a page has changed,
in which case force a rebuild of that page.
(cherry picked from commit b6a3b8a683fed7a7f6d77a5b3f2dfbd14c849843)
can be used to avoid a security check that is a good safe default, but
problimatic overkill in some situations.
I decided to underdocument this, because the option looks ugly, and I don't
want people randomly turning it on because it looks like a good idea. So if
you need it, you'll get an error message mentioning how to fix it.
* Renamed to parentlinks every single variable or function called
pedigree
* Removed the parentlinks function from Render.pm
* Enabled the new parentlinks plugin by default
* Adapted testsuite and documentation to reflate the above facts
Signed-off-by: intrigeri <intrigeri@boum.org>