I think there is a problem in my "dependency graph". As an example,
[here](http://poivron.org/~nil/misc/ikiwiki_buggy_index) is the index
ikiwiki generated for [my site](http://poivron.org/~nil/misc/ikiwiki_buggy_index)
(note that the site has changed since this index was generated).

Some **HUGE** dependencies appear, clearly non-optimal, like

    depends = A | B | A | C | A | D | A | E | A | F | A | G | ....

or

    depends = A | B | C | D | A | B | C | D | A | B | C | D | ....

I couldn't isolate the cause, but the problem may be:

* related to the img module
* easily observable on my site because one of my pages includes 80 resized images

Other special things in my templates and site:

* a sidebar with \[[!include pages="notes/\*" template=foo]] while notes.mdwn has
  a \[[!include pages="notes/\*"]] and uses the sidebar; removing it doesn't change anything
* a template (biblio.tmpl) calling the "img" plugin with a template parameter as the
  image filename; removing it doesn't change anything
* some strange games with tags whose page calls a "map" directive to show other tags
  while tags are also used in tagclouds (in the sidebar and in the main pages)
* ...

I observed these problems (same *kind*, I didn't check in detail) on

* ikiwiki 2.00gpa1 + perl v5.8.4 + Debian 3.1
* ikiwiki 2.3 + perl v5.8.8 + Ubuntu 7.04

I can think about reducing the size of my wiki source and making it available online for analysis.

-- NicolasLimare

> As long as these dependencies don't grow over time (i.e., when a page is
> edited and nothing changed that should add a dependency), I wouldn't
> worry about them. There are many things that can cause non-optimal
> dependencies to be recorded. For one thing, if you inline something, ikiwiki
> creates a dependency like:
>
>     (PageSpec) or (file1 or file2 or file3 ...)
>
> Where fileN are all the files that the PageSpec currently matches. (This
> is necessary to detect when a currently inlined file is deleted, and know
> the inlining page needs an update.) Now consider what it does if you have
> a single page with two inline statements, that inline the same set of
> stuff twice:
>
>     ((PageSpec) or (file1 or file2 or file3 ...)) or ((PageSpec) or (file1 or file2 or file3 ...))
>
> Clearly non-optimal, indeed.
>
> Ikiwiki doesn't bother to simplify complex PageSpecs
> because it's difficult to do, and because all they use is some disk
> space. Consider what ikiwiki uses these dependencies for.
> All it wants to know is: does the PageSpec for this page it's considering
> rebuilding match any of the pages that have changed? Determining this is
> a simple operation -- the PageSpec is converted to perl code. The perl
> code is run.
>
> So the total impact of an ugly dependency like this is:
>
> 1. Some extra data read/written to disk.
> 2. Some extra space in memory.
> 3. A bit more data for the PageSpec translation code to handle. But that
>    code is quite fast.
> 4. Typically one extra function call when the generated perl code is run.
>    I.e., when the expression on the left-hand side fails, which typically
>    happens after one (inexpensive) function call, it has to check
>    the identical expression on the right-hand side.
>
> So this is at best a wishlist todo item, not a bug. A PageSpec simplifier
> (or improved `pagespec_merge()` function) could be written and improve
> ikiwiki's memory and disk usage, but would it actually speed it up any?
> We'd have to see the simplifier's code to know.
>
> --[[Joey]]

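A minimal sketch of the duplication Joey describes, written in Python rather than ikiwiki's Perl. The names `inline_dependency` and `merge` are hypothetical stand-ins that follow the description above; they are not ikiwiki's actual functions, and the page names are made up.

```python
def inline_dependency(pagespec, matched_files):
    """Model the dependency recorded for one inline: the PageSpec itself
    plus every file it currently matches (shape assumed from the text above)."""
    return f"({pagespec}) or ({' or '.join(matched_files)})"


def merge(existing, new):
    """Hypothetical naive merge: OR the two specs together, no simplification."""
    if not existing:
        return new
    return f"({existing}) or ({new})"


files = ["notes/a", "notes/b", "notes/c"]
dep = inline_dependency("notes/*", files)

# A single page that inlines the same set twice records the expression twice.
merged = merge(merge("", dep), dep)
print(merged)
print(len(dep), len(merged))
```

The merged string is a little over twice the size of a single dependency, and every further identical inline grows it again, which is the disk and memory cost listed in points 1 and 2.
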
[[!template id=gitbranch branch=smcv/ready/optimize-depends author="[[smcv]]"]]

>> I've been looking at optimizing ikiwiki for a site using
>> [[plugins/contrib/album]] (which produces a lot of pages) and it seems
>> that checking which pages depend on which pages does take a significant
>> amount of time. The optimize-depends branch in my git repository
>> avoids using `pagespec_merge()` for this (indeed it's no longer used
>> at all), and instead represents dependencies as a list of pagespecs
>> rather than a single pagespec. This does turn out to be faster, although
>> not as much as I'd like. --[[smcv]]

>>> I just wanted to note that there is a whole long discussion of dependencies and pagespecs on the [[todo/tracking_bugs_with_dependencies]] page. -- [[Will]]

>>>> Yeah, I had a look at that (as the only other mention of `pagespec_merge`).
>>>> I think I might have solved some of the problems mentioned there,
>>>> actually - `pagespec_merge` no longer needs to exist in my branch (although
>>>> I haven't actually deleted it), because the "or" operation is now done in
>>>> the Perl code, rather than by merging pagespecs and translating. --[[smcv]]

[[!template id=gitbranch branch=smcv/ready/remove-pagespec-merge author="[[smcv]]"]]

>>>>> I've now added a patch to the end of that branch that deletes
>>>>> `pagespec_merge` almost entirely (we do need to keep a copy around, in
>>>>> ikiwiki-transition, but that copy doesn't have to be optimal or support
>>>>> future features like [[tracking_bugs_with_dependencies]]). --[[smcv]]

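The list-of-pagespecs representation discussed in this thread can be sketched roughly as follows. This is a Python model, not the code in the branch: `depends`, `add_depends` and `depends_on_any` are illustrative names here, and `fnmatchcase` is only a stand-in for real PageSpec matching, which supports much more than globs.

```python
from fnmatch import fnmatchcase

# page -> list of pagespecs it depends on (kept separate, never merged)
depends = {}


def add_depends(page, pagespec):
    """Record one more pagespec for a page, skipping exact duplicates."""
    specs = depends.setdefault(page, [])
    if pagespec not in specs:
        specs.append(pagespec)


def depends_on_any(page, changed_pages):
    """True if any recorded pagespec matches any changed page.
    The "or" happens here, in code, instead of inside one merged pagespec."""
    return any(
        fnmatchcase(changed, spec)
        for spec in depends.get(page, [])
        for changed in changed_pages
    )


add_depends("index", "notes/*")
add_depends("index", "notes/*")  # duplicate, ignored
add_depends("index", "sidebar")

print(depends["index"])
print(depends_on_any("index", ["notes/todo"]))
print(depends_on_any("index", ["blog/post"]))
```

Because each pagespec stays a separate list item, identical entries can be dropped by simple comparison, with no need for a PageSpec simplifier.
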
---

Some questions on your optimize-depends branch. --[[Joey]]

In saveindex it still ORs together the depends list, but the `{depends}`
field seems only useful for backwards compatibility (i.e., ikiwiki-transition
still uses it), and otherwise just bloats the index.

Is an array the right data structure? `add_depends` has to loop through the
array to avoid dups; it would be better if a hash were used there. Since
inline (and other plugins) explicitly add all linked pages, each as a
separate item, the list can get rather long, and that single `add_depends`
loop has suddenly become O(N^2) in the number of pages, which is something
to avoid.

> I was also thinking about this (I've been playing with some stuff based on the
> `remove-pagespec-merge` branch). A hash, by itself, is not optimal because
> the dependency list holds two things: page names and page specs. The hash would
> work well for the page names, but you'll still need to iterate through the page specs.
> I was thinking of keeping a list and a hash. You use the list for pagespecs
> and the hash for individual page names. To make this work you need to adjust the
> API so it knows which you're adding. -- [[Will]]

> I wasn't thinking about a lookup hash, just a dedup hash, FWIW.
> --[[Joey]]

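To illustrate the difference between the array scan and a dedup hash, here is a small Python sketch. Both functions are hypothetical illustrations, not ikiwiki's `add_depends`; a Python dict plays the role of the Perl hash.

```python
def add_depends_list(deps, dependency):
    """List version: scans the whole list on every call, so adding
    N dependencies one by one costs O(N^2) comparisons overall."""
    if dependency not in deps:
        deps.append(dependency)


def add_depends_hash(deps, dependency):
    """Dedup-hash version: one average O(1) lookup per call.
    A dict is used so that insertion order is preserved."""
    deps.setdefault(dependency, True)


as_list, as_hash = [], {}
for dep in ["notes/*", "notes/a", "notes/b", "notes/a", "notes/*"]:
    add_depends_list(as_list, dep)
    add_depends_hash(as_hash, dep)

print(as_list)
print(list(as_hash))
```

Both end up with the same three entries. The hash only removes duplicates cheaply; as Will points out, matching still has to iterate over whatever pagespecs are stored.
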
Also, since a lot of places are calling `add_depends` in a loop, it probably
makes sense to just make it accept a list of dependencies to add. It'll be
marginally faster, probably, and should allow for better optimisation
when adding a lot of depends at once.

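One possible shape for a list-accepting version, again as a Python sketch under assumptions: the variadic signature and the return value are inventions for illustration, not the API that ikiwiki exposes.

```python
# page -> {dependency: True}, used as an ordered dedup hash
depends = {}


def add_depends(page, *dependencies):
    """Add any number of dependencies to a page in a single call.
    Returns how many of them were actually new."""
    seen = depends.setdefault(page, {})
    before = len(seen)
    seen.update(dict.fromkeys(dependencies, True))
    return len(seen) - before


inlined = ["notes/a", "notes/b", "notes/a"]

# The caller passes the pagespec and every inlined page at once,
# instead of calling add_depends once per page inside its own loop.
added = add_depends("index", "notes/*", *inlined)
print(added)
print(list(depends["index"]))
```
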
In Render.pm, we now have a triply nested loop, which is a bit
scary for efficiency. It seems there should be a way to
rework this code so it can use the optimised `pagespec_match_list`,
and/or hoist some of the inner loop calculations (like the `pagename`)
out.

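The hoisting idea can be shown on a toy model of such a loop. This Python sketch does not reproduce the code in Render.pm: the `pagename` helper below is a simplified stand-in that just drops a file extension, and `fnmatchcase` again substitutes for real PageSpec matching.

```python
from fnmatch import fnmatchcase


def pagename(filename):
    """Hypothetical stand-in: derive a page name by dropping the extension."""
    return filename.rsplit(".", 1)[0]


def pages_to_rebuild_naive(depends, changed_files):
    """Three nested loops; pagename() is recomputed for every (page, spec) pair."""
    rebuild = set()
    for page, specs in depends.items():
        for spec in specs:
            for f in changed_files:
                if fnmatchcase(pagename(f), spec):
                    rebuild.add(page)
    return rebuild


def pages_to_rebuild_hoisted(depends, changed_files):
    """Same result, with page names computed once and an early exit per page."""
    changed_pages = [pagename(f) for f in changed_files]  # hoisted out of the loops
    rebuild = set()
    for page, specs in depends.items():
        if any(fnmatchcase(p, spec) for spec in specs for p in changed_pages):
            rebuild.add(page)
    return rebuild


depends = {"index": ["notes/*", "sidebar"], "about": ["contact"]}
changed = ["notes/todo.mdwn", "blog/post.mdwn"]

print(pages_to_rebuild_naive(depends, changed))
print(pages_to_rebuild_hoisted(depends, changed))
```

The loop nest is still there in the second version, but the per-file work is done once up front and each page stops checking at its first match.
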
Very good catch on img/meta using the wrong dependency; verified in the wild!
(I've cherry-picked those bug fixes.)

[[!tag wishlist patch patch/core]]