This reverts commit 4cf185e781.
That commit broke t/po.t (probably the test case only is testing too
close the the old implementation and needs correcting).
Also, we have not decided how to want to represent it yet, so I'm not
ready for this change.
Conflicts:
IkiWiki/Plugin/po.pm
doc/plugins/po.mdwn
Probably best to store it unsanitized and sanitize as needed on use.
And it already was for comments, leaving only the need to sanitize the
nickname when git committing, to ensure the email address is legal.
... after having audited the po4a Xml and Xhtml modules for security issues.
Signed-off-by: intrigeri <intrigeri@boum.org>
(cherry picked from commit a128c256a5)
This better defines what the filter hook is passed, to only be the raw,
complete text of a page. Not some snippet, or data read in from an
unrelated template.
Several plugins that filtered text that originates from an (already
filtered) page were modified not to do that. Note that this was not
done very consistently before; other plugins that receive text from a
page called preprocess on it w/o first calling filter.
The template plugin gets text from elsewhere, and was also changed not to
filter it. That leads to one known regression -- the embed plugin cannot
be used to embed stuff in templates now. But that plugin is deprecated
anyway.
Later we may want to increase the coverage of what is filtered. Perhaps
a good goal would be to allow writing a filter plugin that filters
out unwanted words, from any input. We're not there yet; not only
does the template plugin load unfiltered text from its templates now,
but so can the table plugin, and other plugins that use templates (like
inline!). I think we can cross that bridge when we come to it. If I wanted
such a censoring plugin, I'd probably make it use a sanitize hook instead,
for the better coverage.
For now I am concentrating on the needs of the two non-deprecated users
of filter. This should fix bugs/po_vs_templates, and it probably fixes
an obscure bug around txt's use of filter for robots.txt.
Set it to true every time IkiWiki::filter is called on a full page's content.
This is a much nicer solution, for the po plugin, than previous whitelisting
using caller().
The protection against processing loops (i.e. the alreadyfiltered stuff) was
playing against us: the template plugin triggered a filter hooks run with the
very same ($page, $destpage) arguments pair that we use to identify a already
filtered page. Processing an included template could then mark the whole
translation page as already filtered, which prevented po_to_markup to be called
on the PO content.
This commit only runs the whole PO filter logic when our filter hook is run by
IkiWiki::render, which only happens when the full page needs to be filtered.
Renamed usershort => nickname.
Note that this means existing user login sessions will not have the nickname
recorded, and so it won't be used for those.
There was some confusion about whether the filename was
relative to srcdir or not. Some test cases, and the bzr
plugin assumed it was relative to the srcdir. Most everything else
assumed it was absolute.
Changed it to relative, for consistency with the rest
of the rcs_ functions.
Using named parameters for these is overdue. Passing the session in a
parameter instead of passing username and IP separately will later allow
storing other session info, like username or part of the email.
Note that these functions are not part of the exported API,
and the prototype change will catch (most) skew, so I am not changing
API versions. Any third-party plugins that call them will need updated
though.
In the process, lost the commits from special usernames
when committing changed po files. Instead of trying to dummy up a session
object for the special username, I just don't pass one, and the commit will
appear to be from whatever user ikiwiki runs as.
Everywhere that REMOTE_ADDR was used, a session object is available, so
instead use its remote_addr method.
In IkiWiki::Receive, stop setting a dummy REMOTE_ADDR.
Note that it's possible for a session cookie to be obtained using one IP
address, and then used from another IP. In this case, the first IP will now
be used. I think that should be ok.
Now the git plugin supports commits with author fields that look like:
Author: http://my.openid/ <me@web>
Then in recentchanges, the short username will be displayed, linking
to the openid.
Particularly useful for the horrible google openids, of course.
This way, an email-like link will be a mailto until a matching page
is created, then it will link to the page. And removing the page will
convert it back to a mailto.
At least two bugfixes in here. First, an old bug;
\[[foo#0]] was displayed as [[foo]], losing the anchor
as the anchor text was false. Secondly, a new bug;
an email like foo#bar@baz should not check bestlink("foo@baz").
The following ways to create a link are supported now:
[[url]]
[[text|url]]
url can be one of the following:
- an internal wikilink: will be handled as before
- any other kind of URL, including mailto: proper links will be created:
<a href="url">url</a>
<a href="url">text</a>
- an email address:
<a href="mailto:url">url</a>
<a href="mailto:url">text</a>
For now, a rebuild is the only way to ensure the changed theme is used.
Ikiwiki normally will not realize style.css has changed, since themes
tend to have the same timestamp for the file.
A short story:
Once there was a unicode string, let's call him Srcdir.
Along came a crufy old File::Find, who went through a tree and pasted each
of the leaves in turn onto Srcdir. But this 90's relic didn't decode the
leaves -- despite some of them using unicode! Poor Srcdir, with these
leaves stuck on him, tainted them with his nice unicode-ness. They didn't
look like leaves at all, but instead garbage.
(In other words, perl's unicode support sucks mightily, and drives
us all to drink and bad storytelling. But we knew that..)
So, srcdir is not normally flagged as unicode, because typically it's pure
ascii. And in that case, things work ok; File::Find finds filenames, which
are not yet decoded to unicode, and appends them to the srcdir, and then
decode_utf8 happily converts the whole thing.
But, if the srcdir does contain utf8 characters, that breaks. Or, if a Yaml
setup file is used, Yaml::Syck's implicitunicode sets the unicode flag of
*all* strings, even those containing only ascii. In either case, srcdir
has the unicode flag set; a non-decoded filename is appended, and the flag
remains set; and decode_utf8 sees the flag and does *nothing*. The result
is that the filename is not decoded, so looks valid and gets skipped.
File::Find only sticks the directory and filenames together in no_chdir
mode .. but we need that mode for security. In order to retain the
security, and avoid the problem, I made it not pass srcdir to File::Find.
Instead, chdir to the srcdir, and pass ".". Since "." is ascii, the problem
is avoided.
Note that chdir srcdir is safe because we check for symlinks in the srcdir
path.
Note that it takes care to chdir back to the starting location. Because
the user may have specified relative paths and so staying in the srcdir
might break. A relative path could even be specifed for an underlay dir, so
it chdirs back after each.
Removing a plugin from add_plugins is not always enough to disable it.
It may have been redundantly added there and also pulled in via goodstuff.
Always add didabled plugins to disable_plugins.
* calendar: Shorten day names, and improve styling of month calendar.
* style.css: Reduced sidebar width back to 20ex from 30; the month calendar
will now fit in the smaller width, and 30 was feeling too large.
I've seen user(http://*) confuse someone who didn't know pagespecs to think
that just http://* would moderate all comments to every page, or something
like that.
Problem is that by the time rendering calls render_dependent, %pagesources
has had deleted files removed from it. So match_comment's lookup of
files in there to see if they had the _comment extension failed.
I had to introduce a hash that temporarily holds filenames of deleted pages
to fix this.
Note that unlike comment(), internal() had avoided this pitfall by being
defined to match both internal and non-internal pages.
If the site is configured to allow comments on *, then the comment post
interface was being added to cgi pages like signin and prefs. This fixes it
w/o requiring more page.tmpl changes. The pagetemplate hook is called by
misctemplate with an empty page name for dynamic pages.
On second thought, misctemplate can use pagetemplate hooks to provide
it, so it's better to keep back-compat, and allow full customisation
of how it's displayed via the template.
So RecentChanges shows on the action bar there,
convert recentchanges to use new pageactions hook,
with compatability code to avoid breaking old templates.
If po is imported twice, bad things happen. Guard against that.
I'm not sure what causes the double import; I saw it when websetup did a
wiki rebuild. Carp failed to show a backtrace for the second call to
import.
* openid: Incorporated a fancy openid-selector signin form.
(http://code.google.com/p/openid-selector/)
* openid: Use "openid_identifier" as the form field, as required
by OpenID Authentication v2.0 spec.
Instead, add a custom do=commentsignin, that calls cgi_signin.
This allows a plugin to inject a custom cgi_signin, that uses a different
do= parameter, and have it be used consitently. (This was the only
place to hardcode a link to do=signin.)
test isinternal first, because match_glob with internal => 1 also returns
non-internal pages that match. This order should also be faster.
Remove test to see if pagesources is set. isinternal will not succeed if it
is not.
Since all forms are wrapped in a template that defines the actual
stylesheets, formbuilder just has to be told to turn on stylesheet mode,
not what file is the style sheet.
* comments: Comments pending moderation are now stored in the srcdir
alongside accepted comments, but with a `._comment_pending` extension.
* This allows easier byhand moderation, as the "_pending" need
only be stripped off and the comment be committed to version control.
* The `comment_pending()` pagespec can be used to match such unmoderated
comments, which makes it easy to add a feed of them, or a counter
indicating how many there are.
* Belatedly added a `comment()` pagespec.
Note that I put comment-header in a <header> despite it being
below the comment. Using a <footer> would be confusing given
the class name. Also, the content is semantically closer to
a header than a footer.
This entailed changing template_params; it no longer takes the template
filename as its first parameter.
Add template_depends to api and replace calls to template() with
template_depends() in appropriate places, where a dependency should be
added on the template.
Other plugins don't use template(), so will need further work.
Also, includes are disabled for security. Enabling includes only when using
templates from the templatedir would be nice, but would add a lot of
complexity to the implementation.
This is needed so that when a negated pagespec like "!author(foo)"
stops matching, due to the page being changed, ikiwiki knows that
the match was influenced by the page content.
The commit that added the (working) support for using /tag to override
tagbase also tried to make ./tag work. Problem is, tags are links,
and ./foo is not a valid link (though I think there's a wishlist about it).
So, using ./tag really resulted in tag creation links that led to a
"bad page name" error. And even if the tag were created in the right place,
the link didn't go to it.
By a stroke of luck, after a long & full day, I happened to
remember that in the morning, I had seen someone on irc mention
that darcs query manifest doesn't like it if its full output
is not consumed.
So contrary to the usual case where bug reports sent via irc are like
messages written in sand before the new tide, this one was seen and
fixed.
(But use http://ikiwiki.info/bugs/ next time!)
$cgi->params('do') may not be defined. The CSRF code may delete all
cgi params. This uninitalized value was introduced when do=register
support was added recently.
Many calls to file_prune were incorrectly calling it with 2 parameters.
In cases where the filename being checked is relative to the srcdir,
that is not needed.
Made absolute filenames be pruned. (This won't work for the 2 parameter call
style.)
A tag like ./foo is searched for relative to the tagging page.
However, if multiple pages use such a tag, the only one sure
to be in common is in the root, so autocreate it there to
avoid scattering redunadant autocreated tags around the tree.
(This is probably not ideal.)
Also renamed the tagpage and taglink functions for clarity.
Made add_autofile take a generator function, and just register the
autofile, for later possible creation. The testing is moved into Render,
which allows cleaning up some stuff.
This is a slow implementation; it runs svn log once per file
still, rather than running svn log once on the whole srcdir.
I did it this way because in my experience, svn log, run on a directory,
does not always list every change to files inside that directory.
I don't know why, and I use svn as little as possible these days.
* Automatically run --gettime the first time ikiwiki is run on
a given srcdir.
* Optimise --gettime for git, so it's appropriatly screamingly
fast. (This could be done for other backends too.)
* However, --gettime for git no longer follows renames.
* Use above to fix up timestamps on docwiki, as well as ensure that
timestamps on basewiki files shipped in the deb are sane.
* Rename --getctime to --gettime. (The old name still works for
backwards compatability.)
* --gettime now also looks up last modification time.
* Add rcs_getmtime to plugin API; currently only implemented
for git.
The pagetemplate hook may be called multiple times, for example when pages
are inlined into a page. Sidebars were being calculated each time that
happened, only to be thrown away when the final pagetemplate hook was
called. Avoid this unnecessary work.
Remove stored sidebar content on use to save some memory.
This way, the example blog always has a sidebar on the index page,
but not the overhead of sidebars on all the other pages. And if a
user wants to, they can enable global_sidebars to switch to sidebars on
every page.
* pagestats: Class parameter can be used to override default class for
custom styling.
* pagestats: Use style=list to get a list of tags, scaled by use like
in a tag cloud. This is useful to put in a sidebar.
* Rework example blog front page.
This makes them consistent with the rest of the meta keys. A wiki rebuild
will be needed on upgrade to this version; until the wiki is rebuilt,
double-escaping will occur in the titles of pages that have not changed.
The meta title data set by comments needs to be encoded the same way that
meta encodes it. (NB The security implications of the missing encoding
are small.)
Note that meta's encoding of title, description, and guid data, and not
other data, is probably a special case that should be removed. Instead,
these values should be encoded when used. I have avoided doing so here
because that would mean forcing a wiki rebuild on upgrade to have the data
consitently encoded.
The output of "bzr log" seems to have changed a bit, so we change the
parsing accordingly. This has not been tested with earlier versions of
bzr.
Several problems seemed to occur, all in the bzr_log subroutine:
1. The @infos list would contain an empty hash, which would confuse the
rest of the program.
2. This was because bzr_log would push an empty anonymous hash to the
list whenever it thought a new record would start.
3. However, a new record marker (now?) also happens at th end of bzr log
output.
4. Now we collect the record to a hash that gets pushed to the list only
if it is not empty.
5. Also, sometimes bzr log outputs "revno: 1234 [merge]", so we catch only
the revision number.
6. Finally, there may be non-headers at the of the output, so we ignore
those.
The reason to do this is basically a user interaction design decision.
It is achieved by adding an entry, associated to the creating plugin, to
%pagestate. To find out if files were deleted a new global hash %del_hash is
%introduced.
add_autofile has to have checks, whether to create the file, anyway, so this
will make things more consistent.
Correcter check for the result of verify_src_file().
Cosmetic rename of a variable $addfile to $autofile.
Colons are not allowed at the start of urls, because it can be interpreted
as a protocol, and allowing arbitrary protocols can be unsafe
(CVE-2008-0809). However, this check was too restrictive, not allowing
use of eg, "video.ogv?t=0:03:00/0:04:00" to seek to a given place in a
video, or "somecgi?foo=bar:baz" to pass parameters with colons.
It's still not allowed to have a filename with a colon in it (ie
"foo:bar.png") -- to link to such a file, a fully qualified url must be
used.
The info is stored in the session database, not the user database.
There should be no reason to need it when a user is not logged in.
Also, hide the email field in the preferences page for openid users.
Note that the email and username are not yet actually used for anything.
The email will be useful for gravatar, while the username might be used
for a more pretty display of the openid.
* moderatedcomments: Added moderate_pagespec that can be used
to control which users or comment locations are moderated.
This can be used, just for example, to moderate http://myopenid.com/*
if you're getting a lot of spammers from one particular openid
provider (who should perhaps answer your emails about them),
while not moderating other users.
* moderatedcomments: The moderate_users setting is deprecated. Instead,
set moderate_pagespec to "!admin()" or "user(*)" instead.
This prevented comments containing some utf-8, including euro sign, from
being submitted. Since md5_hex is a C implementation, the string has to be
converted from perl's internal encoding to utf-8 when it is called. Some
utf-8 happened to work before, apparently by accident.
Note that this will change the checksums returned.
unique_comment_location is only used when posting comments, so the checksum
does not need to be stable there.
I only changed page_to_id for completeness; it is passed a comment page
name, and they can currently never contain utf-8.
In teximg, the bug could perhaps be triggered if the tex source contained
utf-8. If that happens, the checksum will change, and some extra work might
be performed on upgrade to rebuild the image.
This was not doable before, but when I added transitive dependency handling
in the big dependency rewrite, it became possible to include a comment
count when inlining.
This also improves the action link when a page has no comments. It will
link direct to the cgi to allow posting the first comment. And if the page
is locked to prevent posting new comments, the link is no longer shown.
When creating a page, multiple locations are tested to see if they can be
edited. If all fail, one of the failure subs is called, to log the user in
to allow them to proceed with the edit. So far so good.
But, what if some pages fail for one reason, and some for another? This
occurs when httpauth_pagespec is used in conjunction with signinedit (and
openid or something). When the user is not signed in at all
The former will fail to edit a page because the user was not httpauthed.
The latter will fail to edit a different page, because the user was not
signed in. One of their failure methods gets to run first.
The page creation code always ran the failure method corresponding to the
topmost page location. So, when editing a foo/Discussion page, and with
httpauth_pagespec => "*!/Discussion", it ran the httpauth failure method,
which was exactly the wrong thing to do.
I fixed this by making it instead run the failure method for the *best*
page location. In the above example, that's foo/Discussion, so signinedit
runs, as desired, and we get the signin page.
This seems like it will be the right choice, or at least an acceptable
choice. If a user wants to use httpauth they can always choose it on the
signin page.
My logic was right before. Cleaned up some code.
(Page creation is still a problem.)
Also, I removed the Edit url munging, because that is not
necessary with the canedit hook, since canedit will handle
redirection through cgiauthurl if necessary.
Now that openiduser is in IkiWiki core, it's ok to have passwordauth check
for it, and avoid displaying useless password fields when showing
preferences for an openid.
Also improved the styling of the display of the openid in the preferneces
page.
if "tag_autocreate=1" is set in the configuration. The pages will be created in
tagbase, if and only if they do not exist in the srcdir yet. Tag pages will be create from
"autotag.tmpl".
At this stage a second refresh is needed for the tag pages to be rendered.
Add autotag.tmpl template.
Consider a template like:
[[!template type=note text="""
[[!inline pages="*foo*"]]
"""]]
The text parameter is htmlized before being passed into the template (in
case the template wraps it in a <span> that prevents markdown from
htmlizing it later).
But, when markdown sees "*foo*", it turns that into <em>foo</em>.
Later, when preprocessing the inline directive, that leads to suprising
results.
To fix this, I made template parameters be preprocessed (and filtered)
before being htmlized.
Note that I left in the preprocessing (and filtering) of the template
output at the end. That's still relevant when the template itself contains
preprocessor directives.
Note that there is an associated po4a warning when a page is empty:
Use of uninitialized value $file in substitution (s///) at /usr/share/perl5/Locale/Po4a/Text.pm line 205.
I've filed a bug with po4a about that, but the important thing is fixing
the crash here.
The new git-notes feature in git 1.6.6 changes git log output in a way that
broke ikiwiki's parser if notes are added to commits.
I decided to deal with this by disabling notes when ikiwiki uses git,
by setting GIT_NOTES_REF="". AFAICS, looking up notes when dumping logs
will only waste time, since it does not currently seem to make sense for
ikiwiki to do anything with the notes.
I noticed that chromium was not hyperlinking the areas in the object-based
linkmap, while img works ok. Dunno why, but img based is nicer anyway since
it is allowed right through the htmlscrubber with no workarounds.
This way users can use all the other alignment values when not including a
caption. Also, it will work without the standard style, and I don't have to
worry about regressions this way.
This is achieved by preparing CSS definitions that emulates the behavior
of the align attribute, and passing it to the outermost IMG wrapper
(A or TABLE) instead of passing the align value to IMG directly.
On second though, you might want a wide-open wiki with some locked
pages that cannot be edited online.
So, the right thing for lockedit to do when there are no auth plugins is
to just say the page cannot be edited.
Problem here was that no charset http header was being sent.
I fixed this globally by making cgi_custom_failure send the header.
Required changing its parameters.
The crux of the problem is that the cgi object has raw values not converted
to utf-8, and rename was using its fields. Also fixed a missed place where
the form object did not get its fields utf-8 encoded.
Speedup of about 25% for small inlines; could be much larger for inlines of
many, or complex pages.
Not bloating memory with excessive memoization data was the key to this.
The method chosen does not squeeze out every erg of speed possible when
inlines are nested, but that's rare. It uses less memory than other
optimisation hacks (I'm looking at you,
f937c1fb80 !) already used in inline.pm.
Unlike generic meta foo tags, meta description is known to be safe, so can
be special cased to be allowed despite the html scrubber. This makes meta
description much more useful, since it is otherwise limited to being used
by other plugins like map.
My experience is that when inlines are nested, the old behavior of
generating feeds for the nested inlines was never really desired. Since the
feeds were numbered sequentially, the numbers could easily change, and it did
not make sense to subscribe to or use those feeds. And generating those nested
feeds often meant a lot of unnecessary calculation, and data being written.
So, I dropped them.
Looking back, nested feeds originally were a free side effect of properly
handing multiple feeds on one page. Of course, that is still supported.