During backlink calulation, all links are examined and broken links can
be detected for free, so store a list of broken links and have brokenlinks
use it.
Exposing the %brokenlinks structure is a bit ugly, but the speedup seems
worth it: Around 1 second for wikis the size of the doc wiki that use
brokenlinks.
By adding this setting, we get both more configurability, and a minor
optimisation too, since gettext does not need to be called continually
to get the Discussion value.
This was impressively broken. add_depends was being called with params
backwards, and on parameter was set to the name of the generated
file, which isn't in the source.
Now updates to images will update the page that contains them, thus
updating them. This is unncessary for fullsize images, so skipped.
openiduser previously used a constructor that no longer works in 2.x.
However, all we actually want is the (undocumented) DisplayOfURL function
that is invoked by the display method, so try to use that.
(cherry picked from commit c3dd0ff5c7c10743107f203a5b456fdcd1b171df)
Besides being wrong to do, this could lead to the wrong item
being expired, as follows: If B is added and at the same time
A is changed, then A's ctime may be set to the current time,
while B's is set to its creation time. Thus the new item, A,
is incorrectly removed as older.
(This interacted especially badly with the bug fixed by
90b4d079605b72bb50d1da41402d994960e10937.)
The aggregate state merge code neglected to merge changes to the md5
field of an item. Therefore, if an item's md5 changed after initial
aggregation, it would be updated, and rewritten, each time thereafter.
This was wasteful and indirectly led to some expire problems.
The test suite was emitting a lot of ugly gettext warnings;
setting LC_ALL didn't solve the problem for all locale setups
(since ikiwiki remaps it to LANG, and ikiwiki didn't know about
the C locale).
People also seem generally annoyed by the messages when
Locale::Gettext is not installed, and I suspect will be
generally happier if it just silently doesn't localize.
The optimisation came about when I noticed that the gettext
sub was doing rather a lot of work each call just to see
if localisation is needed. We can avoid that work by caching,
and the best thing to cache is a version of the gettext sub
that does exactly the right thing.
This was slightly complicated by the locale setting,
which might need to override the original locale (or lack
thereof) after gettext has been called. So it needs to invalidate
the cache in that case. It used to do it via a global variable,
which I am happy to have also gotten rid of.
Do not allow an unterminated """ string to be treated as a series of bare
words. Fixes runaway regexp recursion/backtracking in strange situations.
(See 1d57a21c98 for test case.)
Rationalle: Comments need to be user-editable so that they can be posted
via git commit etc.
The _comment directive is still supported, for back-compat.
format: Provide a htmlizefallback hook that other plugins can use to
handle formats that are not suitable for general-purpose htmlize hooks.
highlight: Use the hook to allow formatting of any language/extension,
without it needing to be enabled for standalone source files.
highlight: If the highlight perl binding is not available, fallback
safely to a passthrough mode.
And avoid a whole class of potential security problems (though
none that I know of actually existing..), by avoiding
performing any string interpolation on user-supplied data when translating
pagespecs.
This is sorta an optimisation, and sorta a bug fix. In one
test case I have available, it can speed a page build up from 3
minutes to 3 seconds.
The root of the problem is that $links{$page} contains arrays of
links, rather than hashes of links. And when a link is found,
it is just pushed onto the array, without checking for dups.
Now, the array is emptied before scanning a page, so there
should not be a lot of opportunity for lots of duplicate links
to pile up in it. But, in some cases, they can, and if there
are hundreds of duplicate links in the array, then scanning it
for matching links, as match_link and some other code does,
becomes much more expensive than it needs to be.
Perhaps the real right fix would be to change the data structure
to a hash. But, the list of links is never accessed like that,
you always want to iterate through it.
I also looked at deduping the list in saveindex, but that does
a lot of unnecessary work, and doesn't completly solve the problem.
So, finally, I decided to add an add_link function that handles deduping,
and make ikiwiki-transition remove the old dup links.
When finding the pageurl, it was calling bestlink unnecessarily.
Since at this point $page contains the full name of the page that
is being inlined, there is no need to do bestlink's scan
for it.
This is only a minor optimisation, since bestlink is only called
once per displayed, inlined page.
* pagespec_match_list: New API function, matches pages in a list
and throws an error if the pagespec is bad.
* inline, brokenlinks, calendar, linkmap, map, orphans, pagecount,
pagestate, postsparkline: Display a handy error message if the pagespec
is erronious.
* Add IkiWiki::ErrorReason objects, and modify pagespecs to return
them in cases where they fail to match due to a configuration or syntax
error.
* inline: Display a handy error message if the inline cannot display any
pages due to such an error.
This is perhaps somewhat incomplete, as other users of pagespecs do not
display the error, and will eventually need similar modifications to inline.
I should probably factor out a pagespec_match_all function and make it throw
ErrorReasons.
If the server has a clock running a bit ahead of the web browsing client,
relativedate could cause somewhat confusing displays like "3 seconds from now"
for just posted things.
As a hack, avoid displaying times in the future if they're less than a
small slip forward. I chose 30 minutes because both client and server could
be wrong in different directions, while it's still close enough that "just
now" is not horribly wrong.
This works around an enormous (and, in this context, enormously confusing)
message that git has begun to print when one attempts to push changes into
a non-bare repo.
As a bonus, it now tests whether ikiwiki-makerepo works.
Well, that was a PITA.
Luckily, this doesn't break guids to comments in rss feeds,
though it does change the links.
I haven't put in a warning about needing to rebuild to get
this fix. It's probably good enough for new comments to get the
fix, without a lot of mass rebuilding.
If an inlined page contains a floating element, this ensures
that the footer appears beneath it, and prevents the floating element from
possibly leaking down to the next inlined page.
It would be better to use urlto() here, but will_render
has not yet been called on the feed files at this point, so
it won't work. (And reorganizing so it can be is tricky.)
The problem was introduced by the recent noextension patches.
Object autovivification caused junk to get into %htmlize,
and all keys of that showed up as page types.
I guess what's happening here is that since the name
is passed to git via an environment variable, perl's normal
utf-8 IO layer stuff doesn't work. So we have to explicitly
decode the string from perl's internal representation into
utf-8.
This makes wikis such as zack's much faster in the scan pass.
In that pass, when a template contains an inline, there is no reason to
process the entire inline and all its pages. I'd forgotten to pass
along the flag to let preprocess() know it was in scan mode, leading to
much unncessary churning.
- In 3.05, ikiwiki began expanding templates in scan mode,
for annoying, expensive, but ultimatly necessary reasons
of correctness.
- Smiley processing has a bug: It inserts a span for the smiley,
and then continues searching forward in the content for more,
starting at $end_of_smiley+1. Which means it searches for smilies
in the span too! And if it somehow finds one, we get an infinite loop
here.
- This bug can, probably, only be tickled if a htmllink to
show the smiley fails, because the smiley file doesn't exist,
or because ikiwiki doesn't know about it. In that case,
a link will be inserted to _create_ the missing page,
and that link will include the smiley inside the <a></a>.
- When a template is expanded in scan mode, and it contains
an inline, the sanitize hook is run during scan mode,
which never happened before. That causes the smiley processor
to run, before ikiwiki is, necessarily, aware that all
the smiley files exist (depending on scan order). So
it inserts creation links for them, and triggers the bug.
I've put in the simple fix of jumping forward past the inserted
span, and it does fix the problem. I will need to look in a bit
more detail into why an inline nested inside a template is
fully expanded during the scan pass -- that really shouldn't
be necessary, and it makes things much slower than they need
to be.
This is potentially expensive, but is necessary so that meta and tag
directives, and other links on templates affect the page using the template
reliably.
It no longer makes sense to keep these functions in editpage, because
serveral plugins now exist that use them, and users may want to disable
editpage, while leaving those plugins enabled.
Most notably, comments uses both functions, and it's entirely appropriate
to disable editpage but still want to have comments enabled.
Less likely, attachments, rename, and remove all use check_canedit -- but
it would be unusual indeed to want to use these w/o editpage.
Falls back to looking for shortcuts.mdwn for backwards compatabiity; there
probably exist wikis that have changed the pageext but still use
shortcuts.mdwn.
This may already work with other web servers that have copied apache's
interface, and it should be easy to add support to it for web servers that
use some other interface. So, make the name more general.
That resulted in double encoded display when using perl's stub
readline module. Apparently that module unconditionally upgrades
text to utf8, in a quite braindead way.
(Term::ReadLine::Gnu::Perl worked ok.)
It will set up an ikiwiki instance tuned for use in blogging.
As part of this change, move the example sites into /usr/share/ikiwiki so
they are available even if docs are not installed.
Asking for only the head worked in my tests, but I've found a site where it
didn't -- apparently ikiwiki didn't get a chance to do or finish the
refresh when HEADed. Getting the whole url, waiting for ikiwiki to finish,
avoided the update problem.
* repolist: New plugin to support the rel=vcs-* microformat.
* goodstuff: Include repolist by default. (But it does nothing until
configured with the repository locations.)
It seems to be a failing of i18n in unix that the translation stops at the
commands and the parameters to them, and ikiwiki is no exception with its
currently untranslated directives. So the little bit that's translated sticks
out like a sore thumb. It also breaks building of wikis if a different locale
happens to be set.
I suppose the best thing to do is either give up on the localisation of this
part completly, or make it recognise English in addition to the locale. I've
tenatively chosen the latter.
(Also accept 1 and 0 as input.)
inline has a format hook that is an optimisation hack. Until this hook
runs, the inlined content is not present on the page. This can prevent
other format hooks, that process that content, from acting on inlined
content. In bug ##509710, we discovered this happened commonly for the
embed plugin, but it could in theory happen for many other plugins (color,
cutpaste, etc) that use format to fill in special html after sanitization.
The ordering was essentially random (hash key order). That's kinda a good
thing, because hooks should be independent of other hooks and able to run
in any order. But for things like inline, that just doesn't work.
To fix the immediate problem, let's make hooks able to be registered as
running "first". There was already the ability to make them run "last".
Now, this simple first/middle/last ordering is obviously not going to work
if a lot of things need to run first, or last, since then we'll be back to
being unable to specify ordering inside those sets. But before worrying about
that too much, and considering dependency ordering, etc, observe how few
plugins use last ordering: Exactly one needs it. And, so far, exactly one
needs first ordering. So for now, KISS.
Another implementation note: I could have sorted the plugins with
first/last/middle as the primary key, and plugin name secondary, to get a
guaranteed stable order. Instead, I chose to preserve hash order. Two
opposing things pulled me toward that decision:
1. Since has order is randomish, it will ensure that no accidental
ordering assumptions are made.
2. Assume for a minute that ordering matters a lot more than expected.
Drastically changing the order a particular configuration uses could
result in a lot of subtle bugs cropping up. (I hope this assumption is
false, partly due to #1, but can't rule it out.)
I see that this plugin's lists of safe content are already well out of
date, and htmlscrubber_skip offers a non whitelist based approach, so let's
deprecate this plugin for 3.0.
People seem to be able to expect to enter www.foo.com and get away with it.
The resulting my.wiki/www.foo.com link was not ideal.
To fix it, use URI::Heuristic to expand such things into a real url. It
even looks up hostnames in the DNS if necessary.
Since ikiwiki uses open :utf8, perl assumes that files contain valid utf-8.
If it turns out to be malformed it may later crash while processing strings
read from them, with 'Malformed UTF-8 character (fatal)'.
As at least a quick fix, use utf8::valid as soon as data is read, and if
it's not valid, call encode_utf8 on the string, thus clearing the utf-8
flag. This may cause follow-on encoding problems, but will avoid this
crash, and the input file was broken anyway, so GIGO is a reasonable
response. (I looked at calling decode_utf8 after, but it seemed to cause
more trouble than it was worth. BTW, use open ':encoding(utf8)' avaoids
this problem, but the corrupted data later causes Storable to crash when
writing the index.)
This is a quick fix, clearly imperfect:
- It might be better to explicitly call decode_utf8 when reading files,
rather than using the IO layer.
- Data read other than by readfile() can still sneak in bad utf-8. While
ikiwiki does very little file input not using it, stdin for the CGI
would be one way.
This is necessary so that things that fork to the background,
like pinger, and inline ping, don't block other cgis from running.
Note that websetup also calls unlockwiki, before refreshing / rebuilding
the wiki. It makes perfect sense for that not to block other cgis.
* Stop busy-waiting in lockwiki, as this could delay ikiwiki from waking up
for up to one second. The bailout code is no longer needed.
* Remove support for unused optional wait parameter from lockwiki.
This fixes a problem exposed by the recent change to tags
(a2839de936). That recorded tag links as
absolute by including a leading slash in the link. The same could also be
done with an absolute wikilink.
In either case, link() would not match such links, unless the leading slash
was included in the link to match. But that's not right, because pagespecs
match absolute by default. So strip the leading slash.
Note that to keep any existing `link(/foo)` pagespecs working after this
change, the leading slash is removed from there, too.
The syslog value from the setup file is purposfully ignored when doing
ikiwiki -setup, so that it will output to stdout (while generating wrappers
that do use the syslog). But that caused -dumpsetup to not preserve
the syslog value from the setup file.
This speeds up web commits by 1/4th of a second or so, since perl does
not have to start up for the post commit hook.
perl's locking is completly FuBar, since it's impossible to tell what perl
flock() really does, and thus difficult to write code in other languages
that interoperates with perl's locking. (Let alone interoperating with
existing fcntl locking from perl...)
In this particular case, I think I was able to find a way to avoid the
insanity, mostly. The C code does a true flock(2), and if perl is using an
incompatable lock method that does not use the same locking primative at
the kernel level, then the C code's test will fail, and it will go ahead
and run the perl code. Then the perl code's test will test the right thing.
On Debian, at least lately, perl's flock() does a true flock(2), so the
optimisation does work.
Add an inject function, that can be used by plugins that want to replace
one of ikiwiki's functions with their own version. (This is a scary thing
that grubs through the symbol table, and replaces all exported occurances
of a function with the injected version.)
external: RPC functions can be injected to replace exported functions.
Removed the stupid displaytime hook, and use injection instead.
The html links already went there, but internally the links were not
recorded as absolute, which could cause confusing backlinks etc.
For example, with tagbase=tags, if blog/tags/bar existed and blog/foo was
tagged bar, it would link to /tags/bar. But, the link would be recorded
simply as a link to tags/bar, and so later blog/tags/bar would appear to
have the backlink.
* Add an underlay for javascript, and add ikiwiki.js containing some utility
code.
* toggle: Stop embedding the full toggle code on each page using it, and
move it to toggle.js in the javascript underlay.
This is the easy part of supporting foo/index.mdwn sources for page foo.
Note that if foo.mdwn exists too, there will be a warning about multiple
sources for the same page, and which is used is indeterminate.
indexpages should also cause web based editing to create index source pages
by default; this and other fallout of the option not yet implemented.
Upgrades to the new index format should be transparent.
The version field is 3, because 1 was the old textual index, 2 was the
pre-versioned format.
This also includes some efficiency improvements to index loading, by
not copying a hash and using a reference.
* htmltidy: Avoid returning undef if tidy fails. Also avoid returning the
untidied content if tidy crashes. In either case, it seems best to tidy
the content to nothing.
* htmltidy: Avoid spewing tidy errors to stderr.
Seems that the problem is that once the \nnn coming from git is converted
to a single character, decode_utf8 decides that this is a standalone
character, and not part of a multibyte utf-8 sequence, and so does nothing.
I tried playing with the utf-8 flag, but that didn't work. Instead, use
decode("utf8"), which doesn't have the same qualms, and successfully
decodes the octets into a utf-8 character.
Rant:
Think for a minute about fact that any and every program that parses git-log,
or git-show, etc output to figure out what files were in a commit needs to
contain this snippet of code, to convert from git-log's wacky output to a
regular character set:
if ($file =~ m/^"(.*)"$/) {
($file=$1) =~ s/\\([0-7]{1,3})/chr(oct($1))/eg;
}
(And it's only that "simple" if you don't care about filenames with
embedded \n or \t or other control characters.)
Does that strike anyone else as putting the parsing and conversion in the
wrong place (ie, in gitweb, ikiwiki, etc, etc)? Doesn't anyone who actually
uses git with utf-8 filenames get a bit pissed off at seeing \xxx\xxx
instead of the utf-8 in git-commit and other output?
I saw this in the wild, apparently a page was not present on disk, but was
in the aggregate db, and not marked as expired either. Not sure how that
happened, but such pages should get marked as expired since they have an
effectively zero ctime.
* edittemplate: Default new page file type to the same type as the template.
(willu)
* edittemplate: Add "silent" parameter. (Willu)
* edittemplate: Link to template, to allow creating it. (Willu)
Setup output is once again output to stdout in this case.
Implemented by stashing the verbose/syslog values set in the setup file,
and using those values in the generated wrappers, but not allowing them to take
effect during the setup operation itself, so that command-line options,
appearing before or after -setup, are honored.
Also, some cleanups to how %config is generated for wrappers, removing some
fields that do not need to be recorded inside the wrapper.
* goodstuff: Remove otl plugin from the bundle since it needs a significant
external dependency and is not commonly used. If you use otl, make sure
you explicitly enable it now.
* goodstuff: Add more, progress, and table plugins to the bundle.
I don't want to be stuck renameing it later if preprocessor directives are
turned into postprocessor directives. Also, "directives" is shorter and
clearer than "preprocessors".
* teximg: The prefix is configurable, and has changed to not include the
nonstandard mhchem by default. (willu)
* teximg: dvipng is used if available to render images. Its output is
antialiased and better than dvips. If not available, the old dvips+convert
chain will be used. (willu)
* Drop suggests on texlive-science, add suggests on dvipng.
The use of $dummy was not sufficient, because it only stuck around for the
first element after a dummy parent, and was then lost. Instead, use a
$addparent that contains the actual dummy parent, so it can be compared
with the new item to see if we're still under that parent or have moved to
another one.
The locked pages configuration is moving to a locked_pages option in the
setup file, and the allowed attachments configuration to
allowed_attachments. The admin prefs page can still be used for these, but
that's depreacted and will only be shown if there's currently a value.
Implemented for git and svn so far.
Note that rcs_commit_staged does assume that the rcs has the ability to
"stage" multiple changes for a later commit. Support for this varies, but
all we really care about is staging removals and renames, which, AFAIK, all
modern rcs's support.
can be used to avoid a security check that is a good safe default, but
problimatic overkill in some situations.
I decided to underdocument this, because the option looks ugly, and I don't
want people randomly turning it on because it looks like a good idea. So if
you need it, you'll get an error message mentioning how to fix it.
The committer's email address is not used (because leaking email addresses
is not liked by many users). Closes: #451023
A "Web-commit" trailer is added, to allow telling the difference between
web commits and direct commits.
Smileys need to be double-escaped to work, since the smiley plugin runs as
a sanitize hook, and markdown helpfully removes one level of escapes first.
There were some bugs in the smiley handling code that made escaped smileys
still be expanded. After unescaping a smiley, it needed to move pos forward
past it or the next pass would expand it.
Also, once the m//g got to the end, it seemed to loop back through and make
one more pass (a difference in perl 5.10's regexp exngine? I observed that
pos was undefined when this happened, so added a `last unless defined pos`.
This is truely horribly disgusting. CGI::tmpFileName, in current perls, is
an undocumented function (which should be a clue..) that takes the original
filename of an uploaded attachment, and returns the name of the tempfile
that CGI has stored it in.
In old perls, though, CGI::tmpFileName does not take a filename. It takes
a key from the object's {'.tmpfiles'} hash. This key is something
crazy like '*Fh::fh00001group' -- apparently the stringification of a
filehandle object.
Just to add to the fun, tmpFileName doesn't take the key, it expects a
refernce to the key. Argh?!
But the fun doesn't stop there, because in perl 5.8, CGI.pm is also broken
in two other ways. The upload() method is supposed to return a filehandle
to the temp file. It doesn't. The param() method is supposed to return
a filehandle to the temp file, that stringifies to the original filename.
It returns just the original filename, no filehandle.
Combine all these bugs, and you end up with this disgusting commit. Since
I have no way to get the filehandle, I *need* to get the tempfile name.
If I had the filehandle, I could probably pass it into tmpFileName, and
it might strigify to the right key name. But I don't, so the only way to
determine the key is to grub through the .tmpfiles hash ourselves.
And finally, one the temp file name is discovered, a filehandle can finally
be obtained by (re)opening it.
I recommend that this commit be reverted when perl 5.8 is a mercifully
faded memory.
I'm really, really, really glad I'm actually being paid for working on
this right now!
So the problem is that ikiwiki would generate a relative link like
href="colon:problem", which web browsers treat as being in the "colon:"
uri scheme.
The best fix seems to be to make url beautification fix this, by slapping
a "./" in front.
* The editpage form now uses the raw page name, not the page title, in its
'page' cgi parameter. Using the title was ambiguous and made it
impossible to tell between some pages, like "foo/bar" and "foo__47__bar",
sometimes causing the wrong page to be edited.
* This change means that some edit links need to be updated.
Force a rebuild on upgrade to this version.
* Above change also allowed really fixing escaped slashes from the blogpost
form.
* toc: Revert change in 2.45 that made it run at sanitize time. This breaks
use of toc in a sidebar.
* Call format hooks when generating page previews, thus fixing toc display
there, as well as fixing inlins to again display in page previews, since
it's started using format hooks. This also allows several other things,
like embed, that use format hooks, to work during page preview time.
* Format hooks should not rely on getting an entire html document, as they
will only get the body during page preview.
* toggle: Deal with preview mode when adding javascript.
This special case crops up when generating the parentlink to the toplevel
index page. urlto("") had been generating a link to "./" (or "../" etc)
for that, which is fine, if the web server redirects that to the toplevel
index.html. It's less fine if there is no web server.
I actually ran into the problem first when using gopher. (Yes, yes, don't
laugh.. see upcoming tip.) But it also crops up when browsing local wiki
files.
Of course, the index.html is stripped back off if usedirs is enabled.