ikiwiki/doc/plugins/comments/discussion.mdwn

## Why internal pages? (unresolved)

Comments are saved as internal pages, so they can never be edited through the CGI,
only by direct committers.

> So, why do it this way, instead of using regular wiki pages in a
> namespace, such as `$page/comments/*`? Then you could use [[plugins/lockedit]] to
> limit editing of comments in more powerful ways. --[[Joey]]

>> Er... I suppose so. I'd assumed that these pages ought to only exist as inlines
>> rather than as individual pages (same reasoning as aggregated posts), though.
>>
>> lockedit is actually somewhat insufficient, since `check_canedit()`
>> doesn't distinguish between creation and editing; I'd have to continue to use
>> some sort of odd hack to allow creation but not editing.
>>
>> I also can't think of any circumstance where you'd want a user other than
>> admins (~= git committers) and possibly the commenter (who we can't check for
>> at the moment anyway, I don't think?) to be able to edit comments - I think
>> user expectations for something that looks like ordinary blog comments are
>> likely to include "others can't put words into my mouth".
>>
>> My other objection to using a namespace is that I'm not particularly happy about
>> plugins consuming arbitrary pieces of the wiki namespace - /discussion is bad
>> enough already. Indeed, this very page would accidentally get matched by rules
>> aiming to control comment-posting... :-) --[[smcv]]

>>> Thinking about it, perhaps one way to address this would be to have the suffix
>>> (e.g. whether commenting on Sandbox creates `sandbox/comment1` or `sandbox/c1` or
>>> what) be configurable by the wiki admin, in the same way that recentchanges has
>>> `recentchangespage => 'recentchanges'`? I'd like to see fewer hard-coded page
>>> names in general, really - it seems odd to me that shortcuts and smileys
>>> hard-code the name of the page to look at. Perhaps I could add
>>> `discussionpage => 'discussion'` too? --[[smcv]]

>>> (I've now implemented this in my branch. --[[smcv]])

>> The best reason to keep the pages internal seems to me to be that you
>> don't want the overhead of every comment spawning its own wiki page. --[[Joey]]
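
The naming scheme discussed in this section can be sketched roughly as follows. This is a Python model, not the plugin's Perl: `next_comment_page`, its `prefix` parameter and the `existing` set are hypothetical names used only for illustration, and it assumes the first unused number is taken, as described in the access-control discussion below.

```python
def next_comment_page(page, existing, prefix="comment"):
    """Return the first unused comment page name under `page`.

    `existing` is the set of page names already in the wiki (hypothetical
    stand-in for however the wiki tracks its pages). `prefix` models the
    admin-configurable suffix, e.g. "comment" or "c".
    """
    n = 1
    # Increment the number until we reach a name that is not taken.
    while f"{page}/{prefix}{n}" in existing:
        n += 1
    return f"{page}/{prefix}{n}"


if __name__ == "__main__":
    pages = {"sandbox", "sandbox/comment1", "sandbox/comment2"}
    print(next_comment_page("sandbox", pages))
    print(next_comment_page("sandbox", pages, prefix="c"))
```

With the prefix as a parameter, a page such as this one (`comments/discussion`) is only caught by comment rules if the admin picks a prefix that collides with it.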

## Formats (resolved)

The plugin now allows multiple comment formats while still using internal
pages; each comment is saved as a page containing one `\[[!comment]]` directive,
which has a superset of the functionality of [[ikiwiki/directives/format]].

## Access control (unresolved?)

By the way, I think that who can post comments should be controllable by
the existing plugins opendiscussion, anonok, signinedit, and lockedit. Allowing
posting comments without any login, while a nice capability, can lead to
spam problems. So, use `check_canedit` as at least a first-level check?
--[[Joey]]

> This plugin already uses `check_canedit`, but that function doesn't have a concept
> of different actions. The hack I use is that when a user comments on, say, sandbox,
> I call `check_canedit` for the pseudo-page `sandbox[postcomment]`. The
> special `postcomment(glob)` [[ikiwiki/pagespec]] returns true if the page ends with
> `[postcomment]` and the part before it (e.g. sandbox) matches the glob. So, you can
> have `postcomment(blog/*)` or something. (Perhaps instead of taking a glob, postcomment
> should take a pagespec, so you can have `postcomment(link(tags/commentable))`?)
>
> This is why `anonok_pagespec => 'postcomment(*)'` and `locked_pages => '!postcomment(*)'`
> are necessary to allow anonymous and logged-in commenting (respectively).
>
>> I changed that to move the flag out of the page name, and into a variable that the `match_postcomment`
>> function checks for. Other ugliness still applies. :-) --[[Joey]] 
>
> This is ugly - one alternative would be to add `check_permission()` that takes a
> page and a verb (create, edit, rename, remove and maybe comment are the ones I
> can think of so far), use that, and port the plugins you mentioned to use that
> API too. This plugin could either call `check_can("$page/comment1", 'create')` or
> call `check_can($page, 'comment')`.
> 
> One odd effect of the code structure I've used is that we check for the ability to
> create the page before we actually know what page name we're going to use - when
> posting the comment I just increment a number until I reach an unused one - so
> either the code needs restructuring, or the permission check for 'create' would
> always be for 'comment1' and never 'comment123'. --[[smcv]]

>> Now resolved, in fact --[[smcv]]
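
As a rough model of the original pseudo-page hack described above (before the flag was moved out of the page name), here is a Python sketch. It is not the plugin's Perl code: `glob_to_regex` and this Python `match_postcomment` are written for illustration only, and the sketch assumes that `*` and `?` are the only glob wildcards and that matching is case-sensitive.

```python
import re

SUFFIX = "[postcomment]"  # marker appended to the page name by the hack


def glob_to_regex(glob):
    """Translate a glob into a compiled regex (assumption: only * and ? are special)."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def match_postcomment(page, glob):
    """True if `page` is a postcomment pseudo-page whose base name matches `glob`."""
    if not page.endswith(SUFFIX):
        return False
    base = page[: -len(SUFFIX)]
    return glob_to_regex(glob).fullmatch(base) is not None


if __name__ == "__main__":
    print(match_postcomment("sandbox[postcomment]", "*"))
    print(match_postcomment("blog/first[postcomment]", "blog/*"))
    print(match_postcomment("sandbox", "*"))
```

An ordinary edit of `sandbox` never carries the suffix, so a rule written as `postcomment(*)` affects comment posting only.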

> Another possibility is to just check for permission to edit (e.g.) `sandbox/comment1`.
> However, this makes the "comments can only be created, not edited" feature completely
> reliant on the fact that internal pages can't be edited. Perhaps there should be an
> `editable_pages` pagespec, defaulting to `'*'`? --[[smcv]]
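
The verb-based check proposed earlier in this section might look something like the sketch below. It is a Python illustration under stated assumptions, not an existing ikiwiki API: the `POLICY` table and its contents are hypothetical, `check_can` simply borrows the name used in the proposal, and Python's `fnmatchcase` stands in for pagespec matching (it also accepts `[seq]` patterns, which is not claimed of ikiwiki globs).

```python
from fnmatch import fnmatchcase

# Hypothetical policy: each verb maps to the globs of pages it is allowed on.
# A verb that is missing from the table is denied everywhere.
POLICY = {
    "edit": ["*"],                    # cf. an editable_pages default of '*'
    "comment": ["blog/*", "sandbox"],
}


def check_can(page, verb, policy=POLICY):
    """Return True if `verb` is permitted on `page` under `policy`."""
    return any(fnmatchcase(page, glob) for glob in policy.get(verb, ()))


if __name__ == "__main__":
    print(check_can("sandbox", "comment"))
    print(check_can("index", "comment"))
    print(check_can("index", "edit"))
```

Since the verb is passed separately, "may comment" and "may edit" can be answered for the same page without encoding the action in the page name.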

## comments directive vs global setting (resolved?)

When comments have been enabled generally, you still need to mark which pages
can have comments, by including the `\[[!comments]]` directive in them. By default,
this directive expands to a "post a comment" link plus an `\[[!inline]]` with
the comments. [This requirement has now been removed --[[smcv]]]

> I don't like this, because it's hard to explain to someone why they have
> to insert this into every post to their blog. Seems that the model used
> for discussion pages could work -- if comments are enabled, automatically
> add the comment posting form and comments to the end of each page.
> --[[Joey]]

>> I don't think I'd want comments on *every* page (particularly, not the
>> front page). Perhaps a pagespec in the setup file, where the default is "*"?
>> Then control freaks like me could use "link(tags/comments)" and tag pages
>> as allowing comments.
>>
>>> Yes, I think a pagespec is the way to go. --[[Joey]]

>>>> Implemented --[[smcv]]

>> 
>> The model used for discussion pages does require patching the existing
>> page template, which I was trying to avoid - I'm not convinced that having
>> every possible feature hard-coded there really scales (and obviously it's
>> rather annoying while this plugin is on a branch). --[[smcv]]

>>> Using the template would allow customising the html around the comments
>>> which seems like a good thing? --[[Joey]]

>>>> The \[[!comments]] directive is already template-friendly - it expands to
>>>> the contents of the template `comments_embed.tmpl`, possibly with the
>>>> result of an \[[!inline]] appended. I should change `comments_embed.tmpl`
>>>> so it uses a template variable `INLINE` for the inline result rather than
>>>> having the perl code concatenate it, which would allow a bit more
>>>> customization (whether the "post" link was before or after the inline).
>>>> Even if you want comments in `page.tmpl`, keeping the separate `comments_embed.tmpl`
>>>> and having a `COMMENTS` variable in `page.tmpl` might be the way forward,
>>>> since the smaller each template is, the easier it will be for users
>>>> to maintain a patched set of templates. (I think so, anyway, based on what happens
>>>> with dpkg prompts in Debian packages with monolithic vs split
>>>> conffiles.) --[[smcv]]

>>>>> I've switched my branch to use page.tmpl instead; see what you think? --[[smcv]]

## Raw HTML (resolved?)

Raw HTML was not initially allowed by default (this was configurable).

> I'm not sure that raw HTML should be a problem, as long as the
> htmlscrubber and htmlbalance plugins are enabled. I can see filtering
> out directives, as a special case. --[[Joey]]

>> Right, if I sanitize each post individually, with htmlscrubber and either htmltidy
>> or htmlbalance turned on, then there should be no way the user can forge a comment;
>> I was initially wary of allowing meta directives, but I think those are OK, as long
>> as the comment template puts the \[[!meta author]] at the *end*. Disallowing
>> directives is more a way to avoid commenters causing expensive processing than
>> anything else, at this point.
>>
>> I've rebased the plugin on master, made it sanitize individual posts' content
>> and removed the option to disallow raw HTML. Sanitizing individual posts before
>> they've been htmlized required me to preserve whitespace in the htmlbalance
>> plugin, so I did that. Alternatively, we could htmlize immediately and always
>> save out raw HTML? --[[smcv]]

>>> There might be some use cases for other directives, such as img, in
>>> comments.
>>> 
>>> I don't know if meta is "safe" (i.e., guaranteed to be inexpensive and not
>>> allow users to do annoying things) or if it will continue to be in the
>>> future. Hard to predict, really; all that can be said with certainty is that
>>> all directives will continue to be inexpensive and safe enough that it's
>>> sensible to allow users to (ab)use them on open wikis.
>>> --[[Joey]]

----

I have a test ikiwiki setup somewhere to investigate adopting the comments
plugin. It is set up with no auth enabled and I got hammered with a spam attack
over the last weekend (predictably). What surprised me was the scale of the
attack: ikiwiki eventually triggered OOM and brought the box down. When I got
it back up, I checked out a copy of the underlying git repository, and it
measured 280M in size after being packed. Of that, about 300K was data prior
to the spam attack, so the rest was entirely spam text, compressed via git's
efficient delta compression.

I had two thoughts about possible improvements to the comments plugin in the
wake of this:

 * comment pagination - there is a hard-to-define upper limit on the number
   of comments that can be appended to a wiki page whilst the page remains
   legible.  It would be useful if comments could be paginated into sub-pages.

 * crude flood control - aside from spam attacks (and I am aware of
   [[plugins/blogspam]]), people can crap flood or just aggressively flame
   repeatedly. An interesting prevention measure might be to not let an IP
   post more than 3 sequential comments to a page, or to the site, without
   at least one other comment being interleaved. I say 3 rather than 2 since
   correction follow-ups are common.

-- [[Jon]]
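
Both suggestions can be sketched in a few lines. The following is a Python illustration only, not code from the comments plugin: `may_post`, `paginate`, the `history` list of poster IPs and the `per_page` parameter are all hypothetical, and the limit of 3 is the figure suggested above.

```python
def may_post(history, ip, limit=3):
    """Crude flood control for one page (or one site).

    `history` lists the poster IP of each existing comment, oldest first.
    An IP may not add a comment if the last `limit` comments are all its
    own, i.e. nobody else has been interleaved since.
    """
    run = 0
    for poster in reversed(history):
        if poster != ip:
            break
        run += 1
    return run < limit


def paginate(comments, per_page):
    """Split a list of comments into sub-pages of at most `per_page` each."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    return [comments[i:i + per_page] for i in range(0, len(comments), per_page)]


if __name__ == "__main__":
    print(may_post(["10.0.0.1", "10.0.0.1"], "10.0.0.1"))
    print(may_post(["10.0.0.1", "10.0.0.1", "10.0.0.1"], "10.0.0.1"))
    print(paginate(["c1", "c2", "c3", "c4", "c5"], 2))
```

Counting only the trailing run keeps the check cheap and still allows a couple of correction follow-ups; it does nothing against a flood spread across many IPs.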