Benchmarking a refresh of a wiki with 25 thousand pages showed
file_pruned() consuming most of the time. But when refreshing, ikiwiki already
knows about nearly all the files. So we can skip calling file_pruned() for
those it knows about. While tricky to do, this sped up a refresh (that
otherwise does no work) by 10-50%.
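A minimal sketch of the idea, not the actual code; it assumes ikiwiki's
%pagesources (page name to source file), %config, and file_pruned() are
in scope:

    use File::Find;

    # Files already known from the last run can skip the expensive
    # file_pruned() regexp test; only new files need checking.
    my %known = map { $_ => 1 } values %pagesources;
    my @files;
    find({
        no_chdir => 1,
        wanted   => sub {
            my $file = $File::Find::name;
            return unless -f $file;
            $file =~ s/^\Q$config{srcdir}\E\/?//;
            push @files, $file
                if $known{$file} || ! file_pruned($file);
        },
    }, $config{srcdir});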
This is very common, and the code has to test each type differently, since
both the list of candidates to test and the test itself vary per type.
Much happier with this code now.
This new method for determining when links on pages
have changed should be more efficient, since it avoids
double calculation of the bestlinks.
It also allows collecting data about the old links, before
the scan pass, so the data is accurate when pages move around
and bestlinks change.
Also got some code back to a saner indent level.
Involved some code refactoring so that the same code that detects
link changes for backlinks updating can be used for link dependency
checking. The nice thing is that link dep checking is thus
completely free!
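Roughly, the shared comparison looks like this sketch (assuming the old
link lists are captured before the scan pass; the names are illustrative):

    # A page's links changed if the before and after sets differ;
    # only then do backlinks and link dependencies need updating.
    sub links_changed {
        my ($oldlinks, $newlinks) = @_;
        my %old = map { $_ => 1 } @$oldlinks;
        my %new = map { $_ => 1 } @$newlinks;
        foreach my $link (keys %old, keys %new) {
            return 1 unless $old{$link} && $new{$link};
        }
        return 0;
    }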
Preliminary support, anyway.
If a dependency only includes DEPEND_EXISTS, then only changes that
involved adding or deleting a page can trigger it.
This is complicated by internal pages, since the code did not previously
differentiate between add, delete, and change of internal pages.
Now it tracks change separately from add+delete, so DEPEND_EXISTS pagespecs
that actually match internal pages (which will probably be quite rare in
practice) should work.
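A sketch of the distinction; the constants and data shapes here are
illustrative, not ikiwiki's exact ones:

    use constant DEPEND_CONTENT => 1;
    use constant DEPEND_EXISTS  => 2;

    sub dep_triggered {
        my ($spec, $type, $changed, $added_or_deleted) = @_;
        # A DEPEND_EXISTS-only dependency ignores content changes:
        # only pages appearing or disappearing can trigger it.
        my @candidates = $type == DEPEND_EXISTS
            ? @$added_or_deleted
            : (@$changed, @$added_or_deleted);
        foreach my $page (@candidates) {
            return 1 if pagespec_match($page, $spec);
        }
        return 0;
    }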
As soon as a change happens, we know we will need to rescan all
dependencies from the start, so bail out of the current scan partway to
avoid doing redundant work.
Only problem with this is that ikiwiki sometimes ends up printing out
dependencies that, while correct, are not obvious. Before:
  building B, which depends on A
  building C, which depends on A
  building D, which depends on A
After:
  building B, which depends on A
  building C, which depends on B
  building D, which depends on C
It's not "exact" since case munging has to be done, and I think
"simple" captures the optimisation better.</pedant>
With apologies to smcv, who probably has to rebuild his wiki now.
Let E be the number of dependencies per page of the form "A depends on B and
nothing else", let D be the number of other dependencies per page,
let P be the total number of pages, and let C be the number of changed
pages in a refresh.
This patch should speed up a refresh from O(E*C*P + D*C*P) to
O(C + E*P + D*C*P), assuming that hash lookups are O(1).
In practice, plugins like inline and map produce a lot of these very simple
dependencies, and my album plugin's combination of inline with a large
number of pages causes it to suffer particularly badly.
In testing on a wiki with about 7000 objects (3500 full pages, 3500
images), a full rebuild continued to take about 5:30, while a refresh
after touching about 350 pages and 350 images was reduced from 5:30 to 1:30.
As with my previous optimizations, this change will result in downgrades not
working correctly until the wiki is rebuilt.
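The gist, as a hedged sketch (the %simple_deps index and its shape are
my naming, not the patch's):

    # Dependencies of the form "A depends on B and nothing else" are
    # indexed by B up front, costing O(E*P) once; each changed page is
    # then a single hash lookup rather than a pagespec evaluation.
    my %simple_deps;    # $simple_deps{B}{A} = 1
    while (my ($page, $deps) = each %depends) {
        foreach my $spec (@$deps) {
            $simple_deps{$spec}{$page} = 1
                if $spec !~ /[\s*?()!]/;    # a bare page name
        }
    }
    # Only the remaining complex pagespecs need the O(D*C*P) scan.
    my @rebuild = map { keys %{$simple_deps{$_} || {}} } @changed;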
This should be more efficient than pagespec_match_list since it short-circuits
after the first match is found.
The other problem with using pagespec_match_list here is it may throw an
error if a bad or failing pagespec somehow got into the dependencies.
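As a sketch, the custom loop is just (names illustrative):

    # Stop at the first matching pagespec, and treat a pagespec that
    # throws an error as a non-match instead of aborting the refresh.
    sub deps_match {
        my ($file, @pagespecs) = @_;
        foreach my $spec (@pagespecs) {
            my $match = eval { pagespec_match($file, $spec) };
            next if $@;          # bad or failing pagespec: skip it
            return 1 if $match;  # short-circuit on the first match
        }
        return 0;
    }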
On a large wiki you can spend a lot of time reading through large lists
of dependencies to see whether files need to be rebuilt (album, with its
one-page-per-photo arrangement, suffers particularly badly from this).
The dependency list is currently a single pagespec, but it's not used like
a normal pagespec - in practice, it's a list of pagespecs joined with the
"or" operator.
Accordingly, change it to be stored as a list of pagespecs. On a wiki
with many tagged photo albums, this reduces the time to refresh after
`touch tags/*.mdwn` from about 31 to 25 seconds.
Getting the benefit of this change on an existing wiki requires a rebuild.
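Illustratively, the change to how dependencies get stored looks like
this (a sketch, not the patch itself):

    # Old: one ever-growing pagespec string per page.
    $depends{$page} = length $depends{$page}
        ? $depends{$page}." or ($spec)"
        : "($spec)";

    # New: a list of independent pagespecs, checked one at a time.
    push @{$depends{$page}}, $spec;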
During backlink calculation, all links are examined and broken links can
be detected for free, so store a list of broken links and have brokenlinks
use it.
Exposing the %brokenlinks structure is a bit ugly, but the speedup seems
worth it: around 1 second saved on wikis the size of the doc wiki that use
brokenlinks.
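In sketch form, with bestlink() as in ikiwiki and an assumed shape for
%brokenlinks (missing target => pages linking to it):

    sub calculate_backlinks {
        %backlinks   = ();
        %brokenlinks = ();
        foreach my $page (keys %links) {
            foreach my $link (@{$links{$page}}) {
                my $best = bestlink($page, $link);
                if (length $best) {
                    $backlinks{$best}{$page} = 1;
                }
                else {
                    # the link resolves to nothing: broken, recorded
                    # here for free during the existing traversal
                    push @{$brokenlinks{$link}}, $page;
                }
            }
        }
    }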
By adding this setting, we get both more configurability, and a minor
optimisation too, since gettext does not need to be called continually
to get the Discussion value.
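Sketch of the effect, assuming the setting is called discussionpage:

    # Translate the default once at setup time, rather than calling
    # gettext("Discussion") again and again during the build.
    $config{discussionpage} = gettext("Discussion")
        unless defined $config{discussionpage};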
Previously, if a page changed its type but not its mtime
(e.g. mv page.txt page.mdwn), then it would not be rebuilt.
Now, check if the source of a page has changed,
in which case force a rebuild of that page.
(cherry picked from commit b6a3b8a683fed7a7f6d77a5b3f2dfbd14c849843)
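The check amounts to something like this sketch, using ikiwiki's
%pagesources map of page names to source files:

    # An mtime-preserving rename like `mv page.txt page.mdwn` changes
    # the source file name but not its timestamp, so compare names too.
    if (exists $pagesources{$page} && $pagesources{$page} ne $file) {
        push @needsbuild, $file;    # source changed type; rebuild it
    }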
can be used to avoid a security check that is a good safe default, but
problematic overkill in some situations.
I decided to underdocument this, because the option looks ugly, and I don't
want people randomly turning it on because it looks like a good idea. So if
you need it, you'll get an error message mentioning how to fix it.
* Renamed to parentlinks every single variable or function called
pedigree
* Removed the parentlinks function from Render.pm
* Enabled the new parentlinks plugin by default
* Adapted testsuite and documentation to reflect the above facts
Signed-off-by: intrigeri <intrigeri@boum.org>
If hardlinks are enabled, it would hardlink files from the underlay. That
was sorta annoying if you tried to edit by hand for some reason, so let's
not. Files that are hardlinked should be rare enough that a few extra stats
won't hurt.
* The editpage form now uses the raw page name, not the page title, in its
'page' cgi parameter. Using the title was ambiguous and made it
impossible to distinguish between some pages, like "foo/bar" and
"foo__47__bar", sometimes causing the wrong page to be edited (see the
sketch after this list).
* This change means that some edit links need to be updated.
Force a rebuild on upgrade to this version.
* Above change also allowed really fixing escaped slashes from the blogpost
form.
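To illustrate the ambiguity (as titlepage() escaped slashes at the
time; 47 is the character code of "/"):

    # The title of page "foo/bar" is "foo/bar"; mapping it back with
    # titlepage() yields the other page's name:
    titlepage("foo/bar");   # "foo__47__bar"
    # ...indistinguishable from a page literally named foo__47__bar,
    # whereas the raw page name is unambiguous.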
Because the search plugin needed it, and also because it's one of the few
plugins that didn't already have it.
I also considered adding it to htmlize, but I really cannot imagine caring
what the destpage is when htmlizing. (I'll probably be proven wrong later.)
since this leads to too many problems with web caching, especially with
inlined pages. Properly solving this would involve tracking every page
that contributes to a page's content and using the youngest of them all,
as well as special cases for things like the version plugin, and it's just
too complex to do.
* meta: Add pagespec functions to match against title, author, authorurl,
license, and copyright. This can be used to create custom RecentChanges.
* meta: To support the pagespec functions, metadata about pages has to be
retained as pagestate (sketched after this list).
* Fix encoding bug when pagestate values contained spaces.
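A hedged sketch of what one such match function might look like; the
exact pagestate layout shown is an assumption:

    package IkiWiki::PageSpec;

    sub match_license {
        my ($page, $glob) = @_;
        # meta retains its values in %pagestate so pagespecs see them
        my $val = $IkiWiki::pagestate{$page}{meta}{license};
        return IkiWiki::FailReason->new("$page has no license metadata")
            unless defined $val;
        return $val eq $glob    # real matching would honor globs
            ? IkiWiki::SuccessReason->new("license matches")
            : IkiWiki::FailReason->new("license does not match");
    }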
I kept it to a simple global configuration, rather than using the
preprocessor directive for recentchanges, because that had chicken and egg
problems and seemed overcomplicated. This should work reasonably well,
though it would be good to add some more metadata so that more customised
recentchanges pages can be made.
This makes it a lot quicker to deal with lots of recentchanges pages
appearing and disappearing. It avoids needing to clutter up pagespecs with
exclusions for those pages, by making normal pagespecs not match them.
This is important to do because until will_render is called, ikiwiki doesn't
know that the page exists. This avoids recentchanges re-writing every change
page every run.
* Plugins can add new directories to the search path with the add_underlay
function (see the example after this list).
* Split out smiley underlay files into a separate underlay, so if the plugin
isn't used, the wiki isn't bloated with all those files.
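For instance, a plugin can pull in its own underlay at load time, as
the smiley plugin now does (a sketch; paths are assumed to resolve
relative to the base underlay directory):

    package IkiWiki::Plugin::smiley;
    use IkiWiki 2.00;

    sub import {
        # Only wikis that enable the plugin get these files.
        add_underlay("smiley");
    }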
* pagespec_match() has changed to take named parameters, allowing
for extended pagespecs. The old calling convention will still work for
back-compat for now.
* The calling convention for functions in the IkiWiki::PageSpec namespace
has changed so they are passed named parameters (see the sketch after
this list).
* Plugin interface version increased to 2.00 since I don't anticipate any
more interface changes before 2.0.
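Under the new convention a match function looks roughly like this
sketch (the %params keys are illustrative):

    package IkiWiki::PageSpec;

    sub match_glob {
        my $page   = shift;
        my $glob   = shift;
        my %params = @_;    # e.g. location => page the spec is on

        # hand-rolled glob-to-regexp to keep the sketch self-contained
        my $re = quotemeta $glob;
        $re =~ s/\\\*/.*/g;
        $re =~ s/\\\?/./g;
        return $page =~ /^$re$/i ? 1 : 0;
    }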
Thanks to everyone who commented on and supported creating it (especially
Tumov). This adds a "usedirs"
option that makes ikiwiki use foo/index.html instead of foo.html as
output page names. It is not yet enabled by default.
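In the setup file that is just (a sketch):

    # With this on, page "foo" is output as "foo/index.html" and
    # linked as "foo/"; with it off, "foo.html" as before.
    usedirs => 1,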
* Renamed %oldpagemtime to a more accurately named %pagemtime and fixed it
to actually store pages' mtimes.
* Add "mtime" sort parameter to inline plugin.
were titlepage escaped in the urls, and then doubly escaped by the CGI
when editing. To fix this, I removed the titlepage escaping in the edit
urls.
* That means that *every edit link* on the wiki is potentially changed.
Rebuilding wikis on upgrade to this version is therefore necessary; enabled
that in postinst.