Avoid adding the page matched against as an influence for
currently failing pagespec matches, while still adding
any other influences.
This avoids bloating depends_simple with lots of bogus influences when
matching eg, "!link(done)". It's only necessary for the page being tested
to be an influence of that if the page matches.
With this change, the <span> with class createlink is always created
around the link text, even when no CGI URL is defined. This allows
styling of these 'links' in this case too. The same class is used as when
CGI URL is defined so that e.g. clones of the same ikiwiki, one with CGI
and one without, display in the same way (modulo the missing question mark
link).
(cherry picked from commit 290d1b498f00f63e6d41218ddb76d87e68ed5081)
Many calls to file_prune were incorrectly calling it with 2 parameters.
In cases where the filename being checked is relative to the srcdir,
that is not needed.
Made absolute filenames be pruned. (This won't work for the 2 parameter call
style.)
Made add_autofile take a generator function, and just register the
autofile, for later possible creation. The testing is moved into Render,
which allows cleaning up some stuff.
* Automatically run --gettime the first time ikiwiki is run on
a given srcdir.
* Optimise --gettime for git, so it's appropriatly screamingly
fast. (This could be done for other backends too.)
* However, --gettime for git no longer follows renames.
* Use above to fix up timestamps on docwiki, as well as ensure that
timestamps on basewiki files shipped in the deb are sane.
* Rename --getctime to --gettime. (The old name still works for
backwards compatability.)
* --gettime now also looks up last modification time.
* Add rcs_getmtime to plugin API; currently only implemented
for git.
This can be a lot faster, since huge numbers of pages are not sorted
only to mostly be thrown away. It sped up a build of my blog by at least
5 minutes.
Both markdown and tidy add paragraph tags around text, that needs to be
stripped when the text is a short, one line fragment that is being inserted
into a larger page. tidy also adds several newlines to the end, and this
broke removal of the paragraph tags.
The reason to do this is basically a user interaction design decision.
It is achieved by adding an entry, associated to the creating plugin, to
%pagestate. To find out if files were deleted a new global hash %del_hash is
%introduced.
add_autofile has to have checks, whether to create the file, anyway, so this
will make things more consistent.
Correcter check for the result of verify_src_file().
Cosmetic rename of a variable $addfile to $autofile.
pagespec_translate may set $@ if it fails to parse a pagespec, but
due to memoization, this is not reliable. If a memoized call is repeated,
and $@ is already set for some other reason previously, it will remain
set through the call to pagespec_translate.
Instead, just check if pagespec_translate returns undef.
Finally removed the last hardcoding of IkiWiki::Setup::Standard.
Take the first "IkiWiki::Setup::*" in the setup file to define the
setuptype, and remember that type to use in dumping later. (But it can be
overridden using --set, etc.)
Also, support setup file types that are not evaled.
As I was adding ngettext support, I realized I could optimize the gettext
functions by memoizing the creation of the gettext object. Note that
the object creation is still deferred until a gettext function is called,
to avoid unnecessary startup penalties on code paths that do not need
gettext.
A side benefit is that separate stub functions are no longer needed to
handle the C language case.
The objective is to provide a sensible way to let plugins add files during the
"scan stage" of the build.
Currently does a little verification and adds the file to the global array
@add_autofiles.
bestlink was looking at whether %links existed for a page in order to tell
if the page exists, but just-deleted pages still have entries in there (for
reasons it may be best not to explore). So bestlink would return
just-deleted pages. Instead, make bestlink use %pagesources.
Also, when finding a deleted page, %pagecase was not cleared of that page.
This, again, made bestlink return just-deleted pages. Now that is cleared.
Fixing bestlink exposed another issue though. The backlink calculation code
uses bestlink. So when a page was deleted, no backlinks to it are found,
and pages that really did backlink to it were not updated, and had broken
links.
To fix that, the code that actually removes deleted pages had to be split
out from find_del_files, so it can run a bit later. It is run just after
backlinks are calculated. This way, backlink calculation still sees the
deleted pages, but everything afterwards does not.
However, it does not address the original bug report that started this
whole thing, [[bugs/bestlink_returns_deleted_pages]]. Because there
bestlink is run in the needsbuild hook. And that happens before backlink
calculation, and so bestlink still returns deleted pages then. Also in the
scan hook.
If bestlink needs to work consistently during those hooks, a more involved
fix will be needed.
This avoids unnecessary influences being recorded from pagespecs
such as "link(done) and bugs/*", when a page cannot ever possibly
match.
A pagespec term that returns a value without influence is an influence
blocker. If such a blocker has a false value (possibly due to being
negated) and is ANDed with another term, it blocks that term's influence
from propigating out.
If the term is ORed, or has a true value, it does not block influence.
(Consider "link(done) or bugs/*" and "link(done) and !nosuchpage")
In the implementation in merge_influence, I had to be careful to never
negate $this or $other when testing if they are an influence blocker,
since negation mutates the object. Thus the slightly weird if statement.
I made match_* functions whose influences can vary depending on the page
matched set a special "" influence to indicate this.
Then add_depends can try just one page, and if static influences are found,
stop there.
Thought of a cleaner way to accumulate all influences in
pagespec_match_list, using the pagespec_match result object as an
accumulator.
(This also accumulates all influences from failed matches, rather than just
one failed match. I'm not sure if the old method was correct.)
Benchmarking refresh of a a wiki with 25 thousand pages showed
file_pruned() using most of the time. But, when refreshing, ikiwiki already
knows about nearly all the files. So we can skip calling file_pruned() for
those it knows about. While tricky to do, this sped up a refresh (that
otherwise does no work) by 10-50%.
If a pagespec fails to match, I had been throwing the influences away, but
that is not right. Consider `backlink(foo)`, where foo does not exist.
It still needs to be added as an influence, because if it is created, it
will influence the pagespec to match.
But with that fix, `link(bar)` had as influences all pages, whether they
link to bar or not. Which is not necessary, because modifiying a page to
add a link to bar will directly cause the pagespec to match.
So, in match_link (and all the match_* functions for page metadata),
only return an influence if the match succeeds.
match_backlink had been implemented as the inverse of match_link, but that
is no longer completly true. While match_link does not return an influence
on failure, match_backlink does.
match_created_before/after also return the influence on failure, this way
if created_after(foo) currently fails because foo does not exist, it will
still update the page with the pagespec if foo is created.
The hash will be used used to record a set of pages that influenced the
result of a pagespec match.
The influences are merged together when boolean and/or are encountered
in a pagespec. That means using a non-short-circuiting OR operator. And
so I use & and | when translating pagespecs, since those bitwise operators
can be overloaded. ("and" and "or" cannot, apparently).
Involved some code refactoring so that same code that detects
link changes for backlinks updating can be used for link dependency
checking. The nice thing is that link dep checking is thus
comopletly free!
When adding a contentless dependency, the pagespec also needs to be one
that does not look at any page content information.
As a first approximation of that, only allow glob-based pagespecs in
contentless dependencies. While there are probably a few other types of
pagespecs that can match contentless, this will work for most of them.
Dependency types are represented by bits in the values of the %depends
and %depends_simple hashes.
Change the dependslist array saved to the index to a depends hash.
depends_simple is also converted from an array to a hash.
Note that the depends field used to be a string, and we still
have compat code to handle upgrades from that, as well as from the arrays.
I didn't use ikiwiki-transition because I don't want ikiwiki to break if
users forget to run it; also we're going to recommend a full rebuild on
upgrade to this version to get the improved dependency handling. So
this compat code can be removed or moved to ikiwiki-transition later.
Here I was bitten by perl's aliasing of foreach variables
to the loop array contents, and match_link accidentially changed
the contents of %links.
In Jon's testcase, a tag added an absolute link, which was
made relative by the above bug, and then the link was added
again in preprocess, and turned into a duplicate.
I weakended the regexp, so this matches ipv6 addresses too. It does not
ensure that the address is valid, but that should not matter here.
Note that addresses ending in "::" are not matched, so eg, the unspecified
address will not match -- but should never appear here anyway.
It's not "exact" since case munging has to be done, and I think
"simple" captures the optimisation better.</pedant>
With apologies to smcv, who probably has to rebuild his wiki now.
Let E be the number of dependencies per page of the form "A depends on B and
nothing else", let D be the number of other dependencies per page,
let P be the total number of pages, and let C be the number of changed
pages in a refresh.
This patch should speed up a refresh from O(E*C*P + D*C*P) to
O(C + E*P + D*C*P), assuming that hash lookups are O(1).
In practice, plugins like inline and map produce a lot of these very simple
dependencies, and my album plugin's combination of inline with a large
number of pages causes it to suffer particularly badly.
In testing on a wiki with about 7000 objects (3500 full pages, 3500
images), a full rebuild continued to take about 5:30, and a refresh
after touching about 350 pages and 350 images reduced from 5:30 to 1:30.
As with my previous optimizations, this change will result in downgrades not
working correctly until the wiki is rebuilt.
Now that dependencies are a list of pagespecs with an implicit "or"
operation, there's no need to try to merge pagespecs under normal use.
ikiwiki-transition contains the only use of the function, so move
it there rather than deleting it entirely (it's used to concatenate all
admins' lists of locked pages).
On a large wiki you can spend a lot of time reading through large lists
of dependencies to see whether files need to be rebuilt (album, with its
one-page-per-photo arrangement, suffers particularly badly from this).
The dependency list is currently a single pagespec, but it's not used like
a normal pagespec - in practice, it's a list of pagespecs joined with the
"or" operator.
Accordingly, change it to be stored as a list of pagespecs. On a wiki
with many tagged photo albums, this reduces the time to refresh after
`touch tags/*.mdwn` from about 31 to 25 seconds.
Getting the benefit of this change on an existing wiki requires a rebuild.
By adding this setting, we get both more configurability, and a minor
optimisation too, since gettext does not need to be called continually
to get the Discussion value.
On various sites I have two IkiWiki instances running from the same
repository: one accessible via http and only accepting openid logins,
and one accessible via authenticated https and only accepting httpauth.
The https version should still pretty-print OpenIDs seen in git history,
even though it does not itself accept OpenID logins.
The test suite was emitting a lot of ugly gettext warnings;
setting LC_ALL didn't solve the problem for all locale setups
(since ikiwiki remaps it to LANG, and ikiwiki didn't know about
the C locale).
People also seem generally annoyed by the messages when
Locale::Gettext is not installed, and I suspect will be
generally happier if it just silently doesn't localize.
The optimisation came about when I noticed that the gettext
sub was doing rather a lot of work each call just to see
if localisation is needed. We can avoid that work by caching,
and the best thing to cache is a version of the gettext sub
that does exactly the right thing.
This was slightly complicated by the locale setting,
which might need to override the original locale (or lack
thereof) after gettext has been called. So it needs to invalidate
the cache in that case. It used to do it via a global variable,
which I am happy to have also gotten rid of.
Do not allow an unterminated """ string to be treated as a series of bare
words. Fixes runaway regexp recursion/backtracking in strange situations.
(See 1d57a21c98 for test case.)
And avoid a whole class of potential security problems (though
none that I know of actually existing..), by avoiding
performing any string interpolation on user-supplied data when translating
pagespecs.
This is sorta an optimisation, and sorta a bug fix. In one
test case I have available, it can speed a page build up from 3
minutes to 3 seconds.
The root of the problem is that $links{$page} contains arrays of
links, rather than hashes of links. And when a link is found,
it is just pushed onto the array, without checking for dups.
Now, the array is emptied before scanning a page, so there
should not be a lot of opportunity for lots of duplicate links
to pile up in it. But, in some cases, they can, and if there
are hundreds of duplicate links in the array, then scanning it
for matching links, as match_link and some other code does,
becomes much more expensive than it needs to be.
Perhaps the real right fix would be to change the data structure
to a hash. But, the list of links is never accessed like that,
you always want to iterate through it.
I also looked at deduping the list in saveindex, but that does
a lot of unnecessary work, and doesn't completly solve the problem.
So, finally, I decided to add an add_link function that handles deduping,
and make ikiwiki-transition remove the old dup links.
This reverts commit 2f96c49bd1.
I forgot about internal pages. We don't want * matching them!
I left the optimisation in pagecount, where it used to live.
Internal pages probably don't matter when they're just being
counted.
* pagespec_match_list: New API function, matches pages in a list
and throws an error if the pagespec is bad.
* inline, brokenlinks, calendar, linkmap, map, orphans, pagecount,
pagestate, postsparkline: Display a handy error message if the pagespec
is erronious.
* Add IkiWiki::ErrorReason objects, and modify pagespecs to return
them in cases where they fail to match due to a configuration or syntax
error.
* inline: Display a handy error message if the inline cannot display any
pages due to such an error.
This is perhaps somewhat incomplete, as other users of pagespecs do not
display the error, and will eventually need similar modifications to inline.
I should probably factor out a pagespec_match_all function and make it throw
ErrorReasons.
The problem was introduced by the recent noextension patches.
Object autovivification caused junk to get into %htmlize,
and all keys of that showed up as page types.
Because getopt::long is used in passthrough mode, if a known
option like --wikiname that needs a parameter is specified w/o
the parameter, it will not be processed, and passed on through.
So in this case the "unknown option" message is innaccurate.
Make it slightly better by noting that the problem can be a missing
parameter.
This modification was initially done in editpage, in commit
a3726968bc, but was then lost while merging
upstream/master branch.
Signed-off-by: intrigeri <intrigeri@boum.org>
It no longer makes sense to keep these functions in editpage, because
serveral plugins now exist that use them, and users may want to disable
editpage, while leaving those plugins enabled.
Most notably, comments uses both functions, and it's entirely appropriate
to disable editpage but still want to have comments enabled.
Less likely, attachments, rename, and remove all use check_canedit -- but
it would be unusual indeed to want to use these w/o editpage.
It used to replace unknown functions with "0" when translating a pagespec.
Instead, replace it with a FailReason object. This way, the pagespec will
still evaluate as before (possibly successfully if other terminals exist),
but a human-readable error will be shown if the result is displayed.
Also, an empty pagespec used to be replaced with "0", to avoid a eval
error. Also use a FailReason here.
It seems to be a failing of i18n in unix that the translation stops at the
commands and the parameters to them, and ikiwiki is no exception with its
currently untranslated directives. So the little bit that's translated sticks
out like a sore thumb. It also breaks building of wikis if a different locale
happens to be set.
I suppose the best thing to do is either give up on the localisation of this
part completly, or make it recognise English in addition to the locale. I've
tenatively chosen the latter.
(Also accept 1 and 0 as input.)
inline has a format hook that is an optimisation hack. Until this hook
runs, the inlined content is not present on the page. This can prevent
other format hooks, that process that content, from acting on inlined
content. In bug ##509710, we discovered this happened commonly for the
embed plugin, but it could in theory happen for many other plugins (color,
cutpaste, etc) that use format to fill in special html after sanitization.
The ordering was essentially random (hash key order). That's kinda a good
thing, because hooks should be independent of other hooks and able to run
in any order. But for things like inline, that just doesn't work.
To fix the immediate problem, let's make hooks able to be registered as
running "first". There was already the ability to make them run "last".
Now, this simple first/middle/last ordering is obviously not going to work
if a lot of things need to run first, or last, since then we'll be back to
being unable to specify ordering inside those sets. But before worrying about
that too much, and considering dependency ordering, etc, observe how few
plugins use last ordering: Exactly one needs it. And, so far, exactly one
needs first ordering. So for now, KISS.
Another implementation note: I could have sorted the plugins with
first/last/middle as the primary key, and plugin name secondary, to get a
guaranteed stable order. Instead, I chose to preserve hash order. Two
opposing things pulled me toward that decision:
1. Since has order is randomish, it will ensure that no accidental
ordering assumptions are made.
2. Assume for a minute that ordering matters a lot more than expected.
Drastically changing the order a particular configuration uses could
result in a lot of subtle bugs cropping up. (I hope this assumption is
false, partly due to #1, but can't rule it out.)