Avoid %links accumulating duplicates. (For TOVA)

This is part optimisation, part bug fix. In one test case I
have available, it can speed a page build up from 3 minutes to
3 seconds.

The root of the problem is that $links{$page} contains arrays of
links, rather than hashes of links. When a link is found, it is
simply pushed onto the array, without checking for duplicates.
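
Schematically, every scan hook used to append blindly; the fix is a
guarded push (the same check the add_link function below uses):

    # Before: appended on every occurrence, even if already present.
    push @{$links{$page}}, $link;

    # After: only append links not already in the list.
    push @{$links{$page}}, $link
        unless grep { $_ eq $link } @{$links{$page}};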

Now, the array is emptied before scanning a page, so there should
not be much opportunity for duplicate links to pile up in it. But
in some cases they can, and if there are hundreds of duplicate
links in the array, then scanning it for matching links, as
match_link and some other code does, becomes much more expensive
than it needs to be.
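
For illustration, the expensive path is a linear scan over the page's
link list, roughly like this (a simplified sketch with a hypothetical
helper, not the actual match_link internals):

    # Each link test walks the whole array, so hundreds of duplicate
    # entries multiply the cost of every lookup.
    sub links_to {    # hypothetical helper
        my ($page, $target)=@_;
        foreach my $link (@{$links{$page}}) {
            return 1 if $link eq $target;
        }
        return 0;
    }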

Perhaps the truly right fix would be to change the data structure
to a hash. But the list of links is never accessed by key; you
always want to iterate through it.

I also looked at deduping the list in saveindex, but that does
a lot of unnecessary work and doesn't completely solve the problem.

So, finally, I decided to add an add_link function that handles
deduping, and to make ikiwiki-transition able to remove the old
duplicate links.
branch: master
author: Joey Hess, 2009-05-05 23:40:09 -04:00
parent: 1c7c9e95f2
commit: 2a7721febd
11 changed files with 70 additions and 17 deletions

IkiWiki.pm

@@ -21,12 +21,12 @@ our @EXPORT = qw(hook debug error template htmlpage add_depends pagespec_match
 	pagespec_match_list bestlink htmllink readfile writefile
 	pagetype srcfile pagename displaytime will_render gettext urlto
 	targetpage add_underlay pagetitle titlepage linkpage
-	newpagefile inject
+	newpagefile inject add_link
 	%config %links %pagestate %wikistate %renderedfiles
 	%pagesources %destsources);
 our $VERSION = 3.00; # plugin interface version, next is ikiwiki version
 our $version='unknown'; # VERSION_AUTOREPLACE done by Makefile, DNE
-our $installdir=''; # INSTALLDIR_AUTOREPLACE done by Makefile, DNE
+our $installdir='/usr'; # INSTALLDIR_AUTOREPLACE done by Makefile, DNE

 # Optimisation.
 use Memoize;
@@ -1757,6 +1757,14 @@ sub inject {
 	use warnings;
 }

+sub add_link ($$) {
+	my $page=shift;
+	my $link=shift;
+
+	push @{$links{$page}}, $link
+		unless grep { $_ eq $link } @{$links{$page}};
+}
+
 sub pagespec_merge ($$) {
 	my $a=shift;
 	my $b=shift;

IkiWiki/Plugin/camelcase.pm

@@ -61,7 +61,7 @@ sub scan (@) {
 	my $content=$params{content};

 	while ($content =~ /$link_regexp/g) {
-		push @{$links{$page}}, linkpage($1) unless ignored($1)
+		add_link($page, linkpage($1)) unless ignored($1)
 	}
 }

IkiWiki/Plugin/img.pm

@@ -43,7 +43,7 @@ sub preprocess (@) {
 		return '';
 	}

-	push @{$links{$params{page}}}, $image;
+	add_link($params{page}, $image);

 	# optimisation: detect scan mode, and avoid generating the image
 	if (! defined wantarray) {
 		return;

IkiWiki/Plugin/link.pm

@@ -86,7 +86,7 @@ sub scan (@) {
 	my $content=$params{content};

 	while ($content =~ /(?<!\\)$link_regexp/g) {
-		push @{$links{$page}}, linkpage($2);
+		add_link($page, linkpage($2));
 	}
 }

IkiWiki/Plugin/meta.pm

@@ -110,7 +110,7 @@ sub preprocess (@) {
 	}
 	elsif ($key eq 'link' && ! %params) {
 		# hidden WikiLink
-		push @{$links{$page}}, $value;
+		add_link($page, $value);
 		return "";
 	}
 	elsif ($key eq 'author') {

IkiWiki/Plugin/tag.pm

@@ -73,7 +73,7 @@ sub preprocess_tag (@) {
 		$tag=linkpage($tag);
 		$tags{$page}{$tag}=1;
 		# hidden WikiLink
-		push @{$links{$page}}, tagpage($tag);
+		add_link($page, tagpage($tag));
 	}

 	return "";
@@ -88,14 +88,14 @@ sub preprocess_taglink (@) {
 		if (/(.*)\|(.*)/) {
 			my $tag=linkpage($2);
 			$tags{$params{page}}{$tag}=1;
-			push @{$links{$params{page}}}, tagpage($tag);
+			add_link($params{page}, tagpage($tag));
 			return taglink($params{page}, $params{destpage}, $tag,
 				linktext => pagetitle($1));
 		}
 		else {
 			my $tag=linkpage($_);
 			$tags{$params{page}}{$tag}=1;
-			push @{$links{$params{page}}}, tagpage($tag);
+			add_link($params{page}, tagpage($tag));
 			return taglink($params{page}, $params{destpage}, $tag);
 		}
 	}

debian/NEWS

@@ -1,3 +1,12 @@
+ikiwiki (3.12) UNRELEASED; urgency=low
+
+  You may want to run `ikiwiki-transition deduplinks /path/to/srcdir`
+  after upgrading to this version of ikiwiki. This command will
+  optimise your wiki's saved state, removing duplicate information
+  that can slow ikiwiki down.
+
+ -- Joey Hess <joeyh@debian.org>  Wed, 06 May 2009 00:25:06 -0400
+
 ikiwiki (3.01) unstable; urgency=low

   If your wiki uses git, and you have a `diffurl` configured in

debian/changelog

@@ -5,6 +5,10 @@ ikiwiki (3.12) UNRELEASED; urgency=low
     fails on nonexistant directories with some broken perl
     versions.
   * inline: Minor optimisation.
+  * add_link: New function, which plugins should use rather than
+    modifying %links directly, to avoid it accumulating duplicates.
+  * ikiwiki-transition: Add a deduplinks action, that can be used
+    to remove duplicate links and optimise a wiki w/o rebuilding it.

 -- Joey Hess <joeyh@debian.org>  Mon, 04 May 2009 19:17:39 -0400

doc/ikiwiki-transition.mdwn

@@ -61,6 +61,13 @@ If this is not done explicitly, a user's plaintext password will be
 automatically converted to a hash when a user logs in for the first time
 after upgrade to ikiwiki 2.48.

+# deduplinks srcdir
+
+In the past, bugs in ikiwiki have allowed duplicate link information
+to be stored in its indexdb. This mode removes such duplicate information,
+which may speed up wikis afflicted by it. Note that rebuilding the wiki
+will have the same effect.
+
 # AUTHOR

 Josh Triplett <josh@freedesktop.org>, Joey Hess <joey@ikiwiki.info>

doc/plugins/write.mdwn

@@ -107,8 +107,8 @@ adding or removing files from it.
 This hook is called early in the process of building the wiki, and is used
 as a first pass scan of the page, to collect metadata about the page. It's
-mostly used to scan the page for [[WikiLinks|ikiwiki/WikiLink]], and add them to `%links`.
-Present in IkiWiki 2.40 and later.
+mostly used to scan the page for [[WikiLinks|ikiwiki/WikiLink]], and add
+them to `%links`. Present in IkiWiki 2.40 and later.

 The function is passed named parameters "page" and "content". Its return
 value is ignored.
@@ -151,11 +151,11 @@ parameter is set to a true value if the page is being previewed.
 If `hook` is passed an optional "scan" parameter, set to a true value, this
 makes the hook be called during the preliminary scan that ikiwiki makes of
 updated pages, before begining to render pages. This should be done if the
-hook modifies data in `%links`. Note that doing so will make the hook be
-run twice per page build, so avoid doing it for expensive hooks. (As an
-optimisation, if your preprocessor hook is called in a void context, you
-can assume it's being run in scan mode, and avoid doing expensive things at
-that point.)
+hook modifies data in `%links` (typically by calling `add_link`). Note that
+doing so will make the hook be run twice per page build, so avoid doing it
+for expensive hooks. (As an optimisation, if your preprocessor hook is
+called in a void context, you can assume it's being run in scan mode, and
+avoid doing expensive things at that point.)

 Note that if the [[htmlscrubber]] is enabled, html in
 preprocessor [[ikiwiki/directive]] output is sanitised, which may limit what
@@ -174,7 +174,8 @@ links. The function is passed named parameters "page", "destpage", and
 and later.

 Plugins that implement linkify must also implement a scan hook, that scans
-for the links on the page and adds them to `%links`.
+for the links on the page and adds them to `%links` (typically by calling
+`add_link`).

 ### htmlize
@@ -753,6 +754,11 @@ Optionally, a third parameter can be passed, to specify the preferred
 filename of the page. For example, `targetpage("foo", "rss", "feed")`
 will yield something like `foo/feed.rss`.

+#### `add_link($$)`
+
+This adds a link to `%links`, ensuring that duplicate links are not
+added. Pass it the page that contains the link, and the link text.
+
 ## Miscellaneous

 ### Internal use pages
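
To make the new interface concrete, here is a minimal sketch of a scan
hook that records links via add_link (the plugin name and link regexp
are invented for illustration):

    package IkiWiki::Plugin::example;

    use warnings;
    use strict;
    use IkiWiki 3.00;

    sub import {
        hook(type => "scan", id => "example", call => \&scan);
    }

    sub scan (@) {
        my %params=@_;
        # Record each [[WikiLink]]-style reference; add_link dedups.
        while ($params{content} =~ /\[\[([^\]\s]+)\]\]/g) {
            add_link($params{page}, linkpage($1));
        }
    }

    1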

ikiwiki-transition

@@ -220,6 +220,21 @@ sub moveprefs {
 	IkiWiki::Setup::dump($setup);
 }

+sub deduplinks {
+	my $dir=shift;
+	if (! defined $dir) {
+		usage();
+	}
+	$config{wikistatedir}=$dir."/.ikiwiki";
+	IkiWiki::loadindex();
+	foreach my $page (keys %links) {
+		my %l;
+		$l{$_}=1 foreach @{$links{$page}};
+		$links{$page}=[keys %l];
+	}
+	IkiWiki::saveindex();
+}
+
 sub usage {
 	print STDERR "Usage: ikiwiki-transition type ...\n";
 	print STDERR "Currently supported transition subcommands:\n";
@@ -229,6 +244,7 @@ sub usage {
 	print STDERR "\tmoveprefs setupfile\n";
 	print STDERR "\thashpassword srcdir\n";
 	print STDERR "\tindexdb srcdir\n";
+	print STDERR "\tdeduplinks srcdir\n";
 	exit 1;
 }

@@ -253,6 +269,9 @@ elsif ($mode eq 'setupformat') {
 elsif ($mode eq 'moveprefs') {
 	moveprefs(@ARGV);
 }
+elsif ($mode eq 'deduplinks') {
+	deduplinks(@ARGV);
+}
 else {
 	usage();
 }
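
A note on the dedup idiom in deduplinks above: collapsing the list
through a temporary hash discards the original order of the links,
which the index does not appear to rely on. If order ever mattered,
an order-preserving variant would be (sketch):

    my %seen;
    @{$links{$page}} = grep { ! $seen{$_}++ } @{$links{$page}};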