Speedup of about 25% for small inlines; could be much larger for inlines of
many, or complex pages.
Not bloating memory with excessive memoization data was the key to this.
The method chosen does not squeeze out every erg of speed possible when
inlines are nested, but that's rare. It uses less memory than other
optimisation hacks (I'm looking at you,
f937c1fb80 !) already used in inline.pm.
Unlike generic meta foo tags, meta description is known to be safe, so can
be special cased to be allowed despite the html scrubber. This makes meta
description much more useful, since it is otherwise limited to being used
by other plugins like map.
My experience is that when inlines are nested, the old behavior of
generating feeds for the nested inlines was never really desired. Since the
feeds were numbered sequentially, the numbers could easily change, and it did
not make sense to subscribe to or use those feeds. And generating those nested
feeds often meant a lot of unnecessary calculation, and data being written.
So, I dropped them.
Looking back, nested feeds originally were a free side effect of properly
handing multiple feeds on one page. Of course, that is still supported.
I chose not to have it override style.css, because style.css is not really
intended to be edited; the one from the underlay is intended to be used as
a base that local.css overrides.
I chose to use a plugin rather than changing the default behavior, both
because I didn't want to have to worry about possibly breaking backwards
compatability (though this seems unlikely), and because it seemed cleaner
to not include style template parameters in the main page template code.
I suppose someone might want a way to not override the toplevel
local.css, but instead include it as well as foo/local.css. Probably the
best way to do that would be to have foo/local.css @import ../local.css
(modulo browser compatability issues). Alternatively, edit page.tmpl
to always include the toplevel local.css, or swap out this plugin for
another one.
When redirecting to a page, ie, after editing, ensure that the url is
uri-encoded. Most browsers other than MSIE don't care, but it's the right
thing to do.
The known failure case involved editing a page that had utf-8 in the name
using MSIE.
Before, the htmllink would display the link to the template as if it were a
wikilink, but what was stored was not, which could lead to confusing
situations.
git log --follow seems to sometimes show merges from before the file was
ever created. So, skip them, a file shouldn't be first created during a
merge anyway.
This will be a bit more expensive, but --getctime does not need to be fast.
And getting the real creation time a very useful when untangling blog
histories that involve renames.
I made match_* functions whose influences can vary depending on the page
matched set a special "" influence to indicate this.
Then add_depends can try just one page, and if static influences are found,
stop there.
This was tricky because of the caching, and because use_pagespec always
adds a dependency. That would have made year calendars depend on the whole
pagespec, which is overly broad. So I removed the caching, format_month,
and in format_year just look at %pagesources to see if month pages are
available.
In format_month, I make it always call use_pagespec, so each month calendar
gets the right dependency and any influcences added. This means a bit more
work, but the added work is fairly minimal, and presence dependencies
remove a *lot* of work it used to do.
(100% untested!)
This dependency was missing before switching to use_pagespec.
It is correct to add it, but it needs to be combined with the regular
"pages" dependency to ensure that it does not match extra pages.
(Also fixed its dependency type.)
Benchmarking refresh of a a wiki with 25 thousand pages showed
file_pruned() using most of the time. But, when refreshing, ikiwiki already
knows about nearly all the files. So we can skip calling file_pruned() for
those it knows about. While tricky to do, this sped up a refresh (that
otherwise does no work) by 10-50%.
If a pagespec fails to match, I had been throwing the influences away, but
that is not right. Consider `backlink(foo)`, where foo does not exist.
It still needs to be added as an influence, because if it is created, it
will influence the pagespec to match.
But with that fix, `link(bar)` had as influences all pages, whether they
link to bar or not. Which is not necessary, because modifiying a page to
add a link to bar will directly cause the pagespec to match.
So, in match_link (and all the match_* functions for page metadata),
only return an influence if the match succeeds.
match_backlink had been implemented as the inverse of match_link, but that
is no longer completly true. While match_link does not return an influence
on failure, match_backlink does.
match_created_before/after also return the influence on failure, this way
if created_after(foo) currently fails because foo does not exist, it will
still update the page with the pagespec if foo is created.
This is very common, and the code has to test each type differently, since
the list of candidates to test, as well as the test, will vary per type.
Much happier with this code now.
This new method for determining when links on pages
have changed should be more efficient, since it avoids
double calculation of the bestlinks.
It also allows collecting data about the old links, before
the scan pass, so the data is accurate when pages move around
and bestlinks change.
Also got some code back to a saner indent level.
This makes it more efficient.
It also fixes the same bug that I fixed in orphans recently,
that only changes to the set of displayed pages were considered (or amoung),
which missed changes to links on other pages to those.
Probably this bug was never noticed because pagestats is most often put
on a blog type page, which gets updated anyway when posts change,
and thus the tag cloud was updated.
This makes it more efficient.
It also fixes a longstanding bug, where if only a small set of pages were
considered by orphans, changes to links on other pages failed to cause an
update.
Involved some code refactoring so that same code that detects
link changes for backlinks updating can be used for link dependency
checking. The nice thing is that link dep checking is thus
comopletly free!
Preliminary support, anyway.
If a dependency only includes DEPEND_EXISTS, then only changes that
involved adding or deleting a page can trigger it.
This is complicated by internal pages, since the code did not previously
differentiate between add, delete, and change of internal pages.
Now it tracks change separately from add+delete, so DEPEND_EXISTS pagespecs
that actually match internal pages (which will probably be quite rare in
practice) should work.
As soon as a change happens, we know we will need to rescan all
dependencies from the start, so bail out of the current scan partway to
avoid doing redundant work.
Only problem with this is that ikiwiki sometimes ends up printing out
dependencies that, while correct, are not obvious. Before:
building B, which depends on A
building C, which depends on A
building D, which depends on A
After:
building B, which depends on A
building C, which depends on B
building D, which depends on C
I had assumed that an image shown full size did not need add_depends, since
a change would not need a change to the displaying page.
But this is not true if the image is modified and its size changes. Then
the page needs to update its img tag to reflect the current size.
If an image was resized smaller, with width and height specified to values
that did not fit its aspect ratio, the image tag with/height were not
adjusted to the actual size imagemagick chooses.
This was broken by 03449610d6.
To fix it right, it unfortunatly needs to always read the src image now,
in order to determine if the image is being displayed larger, or resized
smaller. When resized smaller, it then always uses the size of the
thumbnail, while for larger it calculates the size.
(Only way to get rid of this sometimes extra image read would be to change
it to not allow displaying images larger.)
Through a complex chain of circumstances, that filtering was causing
dumpsetup to trigger undefined warning messages from the po plugin. But
anyway, munging the otl in htmlize is less error-prone and less expensive,
a win all around.
Loading and use of IkiWiki::Receive can all be pushed into the git plugin,
rather than scattered around.
I had at first wanted to make a receive plugin and move it there,
but a plugin was not a good fit; you don't want users to have to manually
load it, and making the git plugin load the receive plugin at the right
times would need more, and ugly code.
calls are warranted. They shouldn't modify the caller's working directory,
though. Use File::chdir to keep the scope of the changes subroutine-local.
The tests now pass without resetting the working directory.
* In Wrapper.pm, add a new hook "wrapperargcheck" to examine argc/argv
and return success or failure. In the failure case, the wrapper
terminates.
* In cvs.pm, implement the new hook to return failure if a directory is
being cvs added.
having to quote, and the possible use of the shell) sucks. Stop
passing args to cvs_runcvs() as an arrayref, since that also sucks
(and was a sop to IPC::Cmd). Instead, use Joey's construction for
temporarily redirecting stderr to /dev/null. Much much simpler and
better. Works on my laptop with bozohttpd, now to test on the NetBSD
wiki.
TeX has configuration options that prevent unsafe things like shell
escapes and insecure file reads/writes. Turn all of them on.
teximg's regex-based blacklist does not suffice. For instance:
[[!teximg code="""
\catcode`\%=0
%input{/etc/passwd}
"""]]
Remove the blacklist, since the TeX configuration options seal off the
underlying mechanisms more safely, and the blacklist blocks other TeX
commands that can prove useful.
Although imagemagick handles even really large sizes sanely, using a page
file, doing so would just waste time and disk space, since the browser
can be told to resize it larger.
checkconfig can run more than once in a single ikiwiki run if setup is
building wrappers. That clobbered the origsub value for bestlink, leading
to infinite recursion
It's not "exact" since case munging has to be done, and I think
"simple" captures the optimisation better.</pedant>
With apologies to smcv, who probably has to rebuild his wiki now.
Let E be the number of dependencies per page of the form "A depends on B and
nothing else", let D be the number of other dependencies per page,
let P be the total number of pages, and let C be the number of changed
pages in a refresh.
This patch should speed up a refresh from O(E*C*P + D*C*P) to
O(C + E*P + D*C*P), assuming that hash lookups are O(1).
In practice, plugins like inline and map produce a lot of these very simple
dependencies, and my album plugin's combination of inline with a large
number of pages causes it to suffer particularly badly.
In testing on a wiki with about 7000 objects (3500 full pages, 3500
images), a full rebuild continued to take about 5:30, and a refresh
after touching about 350 pages and 350 images reduced from 5:30 to 1:30.
As with my previous optimizations, this change will result in downgrades not
working correctly until the wiki is rebuilt.
This is unnecessary and just slows us down (by a factor of 2, in the
pessimal case where every page has an inline with pagenames); it's also
not possible to optimize it into add_depends_exact calls.
Set rootpage to the non-l10n'd rootpage parameter if it is set,
else to the masterpage of the linking page.
Signed-off-by: intrigeri <intrigeri@boum.org>
The po plugin's injected bestlink must do something special when called by this
exact part of inline's code.
Signed-off-by: intrigeri <intrigeri@boum.org>
... else, the recentchanges page shows a link such as "sandbox.es". But,
clicking on it goes to the English (or negotiated language) version of the page.
It is better in this one case if the link goes direct to the translated version
of the page.
(cherry picked from commit 496e8523c6)
... else, the recentchanges page shows a link such as "sandbox.es". But,
clicking on it goes to the English (or negotiated language) version of the page.
It is better in this one case if the link goes direct to the translated version
of the page.
This should be more efficient than pagespec_match_list since it short-circuits
after the first match is found.
The other problem with using pagespec_match_list here is it may throw an
error if a bad or failing pagespec somehow got into the dependencies.
The new dependency handling works better (eliminates more duplicates) if
dependencies are split up. On the same wiki mentioned in the previous
commit, this saves about a second (i.e. 4%) on the same test.
On a large wiki you can spend a lot of time reading through large lists
of dependencies to see whether files need to be rebuilt (album, with its
one-page-per-photo arrangement, suffers particularly badly from this).
The dependency list is currently a single pagespec, but it's not used like
a normal pagespec - in practice, it's a list of pagespecs joined with the
"or" operator.
Accordingly, change it to be stored as a list of pagespecs. On a wiki
with many tagged photo albums, this reduces the time to refresh after
`touch tags/*.mdwn` from about 31 to 25 seconds.
Getting the benefit of this change on an existing wiki requires a rebuild.
can evaluate them, check them in the wrapper right off the bat.
This doesn't prevent the deadlock in web commits that need to cvs
add directories, but I'm committing so Joey can take a look if he
wants.