A short story:
Once there was a unicode string, let's call him Srcdir.
Along came a crufy old File::Find, who went through a tree and pasted each
of the leaves in turn onto Srcdir. But this 90's relic didn't decode the
leaves -- despite some of them using unicode! Poor Srcdir, with these
leaves stuck on him, tainted them with his nice unicode-ness. They didn't
look like leaves at all, but instead garbage.
(In other words, perl's unicode support sucks mightily, and drives
us all to drink and bad storytelling. But we knew that..)
So, srcdir is not normally flagged as unicode, because typically it's pure
ascii. And in that case, things work ok; File::Find finds filenames, which
are not yet decoded to unicode, and appends them to the srcdir, and then
decode_utf8 happily converts the whole thing.
But, if the srcdir does contain utf8 characters, that breaks. Or, if a Yaml
setup file is used, Yaml::Syck's implicitunicode sets the unicode flag of
*all* strings, even those containing only ascii. In either case, srcdir
has the unicode flag set; a non-decoded filename is appended, and the flag
remains set; and decode_utf8 sees the flag and does *nothing*. The result
is that the filename is not decoded, so looks valid and gets skipped.
File::Find only sticks the directory and filenames together in no_chdir
mode .. but we need that mode for security. In order to retain the
security, and avoid the problem, I made it not pass srcdir to File::Find.
Instead, chdir to the srcdir, and pass ".". Since "." is ascii, the problem
is avoided.
Note that chdir srcdir is safe because we check for symlinks in the srcdir
path.
Note that it takes care to chdir back to the starting location. Because
the user may have specified relative paths and so staying in the srcdir
might break. A relative path could even be specifed for an underlay dir, so
it chdirs back after each.
A short story:
Once there was a unicode string, let's call him Srcdir.
Along came a crufy old File::Find, who went through a tree and pasted each
of the leaves in turn onto Srcdir. But this 90's relic didn't decode the
leaves -- despite some of them using unicode! Poor Srcdir, with these
leaves stuck on him, tainted them with his nice unicode-ness. They didn't
look like leaves at all, but instead garbage.
In other words, perl's unicode support sucks mightily, and drives
us all to drink and bad storytelling. But we knew that..
So, srcdir is not normally flagged as unicode, because typically it's pure
ascii. And in that case, things work ok; File::Find finds filenames, which
are not yet decoded to unicode, and appends them to the srcdir, and then
decode_utf8 happily converts the whole thing.
But, if the srcdir does contain utf8 characters, that breaks. Or, if a Yaml
setup file is used, Yaml::Syck's implicitunicode sets the unicode flag of
*all* strings, even those containing only ascii. In either case, srcdir
has the unicode flag set; a non-decoded filename is appended, and
decode_utf8 sees the flag and does *nothing*. The result is that the
filename is not decoded, so looks valid and gets skipped.
File::Find only sticks the directory and filenames together in no_chdir
mode .. but we need that mode for security. In order to retain the
security, and avoid the problem, I made it not pass srcdir to File::Find.
Instead, chdir to the srcdir, and pass ".". Since "." is ascii, the problem
is avoided.
Note that it takes care to chdir back to the starting location. Because
the user may have specified relative paths and so staying in the srcdir
might break. A relative path could even be specifed for an underlay dir, so
it chdirs back after each.
Problem is that by the time rendering calls render_dependent, %pagesources
has had deleted files removed from it. So match_comment's lookup of
files in there to see if they had the _comment extension failed.
I had to introduce a hash that temporarily holds filenames of deleted pages
to fix this.
Note that unlike comment(), internal() had avoided this pitfall by being
defined to match both internal and non-internal pages.
* Ikiwiki can be configured to generate html5 instead of the default xhtml
1.0. The html5 output mode is experimental, not yet fully standards
compliant, and will be subject to rapid change.
Needed to handle the move of the .js files into ikiwiki/, but also this is
a longstanding bug.
Old pagemtime is not remembered in rebuild mode, and changing that would
need a lot of changes. So instead, loop on pagectime, which is remembered.
Change to remembering old pagesources info in rebuild mode. This seems safe
enough.
Rather than wasting resources recording that every page depends on
page.tmpl, add a special case. The special case curretly rebuilds non-page
files too when page.tmpl changes, but that's minor.
This entailed changing template_params; it no longer takes the template
filename as its first parameter.
Add template_depends to api and replace calls to template() with
template_depends() in appropriate places, where a dependency should be
added on the template.
Other plugins don't use template(), so will need further work.
Also, includes are disabled for security. Enabling includes only when using
templates from the templatedir would be nice, but would add a lot of
complexity to the implementation.
loadindex does not bother populating oldtypedlinks if there is no link
type. However, the code in link_types_changed assumed that if oldtypedlinks
is not defined, and typedlinks is, they must differ.
This fixes the problem that it did not remember if an autofile is deleted,
unless a plugin happened to register the autofile at the same time.
With the new code, we just never recreate an autofile more than once.
Only downside is that the list of autofiles is never pruned either.
And I don't really see a way to prune it.
Splitting out this function bothered me. It is conceptially similar to
file_pruned, and yet also very specific to exactly the security needs of
find_src_files.
I liked that it got rid of duplicate code in the latter function. So
instead, put a helper sub in that, which I think allows refactoring
things more cleanly, and with less boilerplate.
As to the needs of gen_autofile, I'm not convinced this needs to handle
the same set of problems that verify_src_file did. So I sat down and
wrote a custom validator for autofiles, which turned out to seem to just
need three things: Make sure the candidate filename is not something
that would be pruned; untaint the candidate filename; and make sure that
srcdir doesn't already have something with its name. (Plus, of course,
all the other checks that were already in gen_autofile.)
(In passing, also fixed a bunch of bugs I had introduced in this branch.)
Made add_autofile take a generator function, and just register the
autofile, for later possible creation. The testing is moved into Render,
which allows cleaning up some stuff.
* Automatically run --gettime the first time ikiwiki is run on
a given srcdir.
* Optimise --gettime for git, so it's appropriatly screamingly
fast. (This could be done for other backends too.)
* However, --gettime for git no longer follows renames.
* Use above to fix up timestamps on docwiki, as well as ensure that
timestamps on basewiki files shipped in the deb are sane.
* Rename --getctime to --gettime. (The old name still works for
backwards compatability.)
* --gettime now also looks up last modification time.
* Add rcs_getmtime to plugin API; currently only implemented
for git.
The reason to do this is basically a user interaction design decision.
It is achieved by adding an entry, associated to the creating plugin, to
%pagestate. To find out if files were deleted a new global hash %del_hash is
%introduced.
pagespec_translate may set $@ if it fails to parse a pagespec, but
due to memoization, this is not reliable. If a memoized call is repeated,
and $@ is already set for some other reason previously, it will remain
set through the call to pagespec_translate.
Instead, just check if pagespec_translate returns undef.
Use `_` to avoid superfluous stat.
Check for `defined $file`, instead of just `$file`.
Add spaces after commas.
Change return values of `verify_src_file()` to not return the tainted filename.
Rename `$f` to `$file_untainted in `verify_src_file()`.
$f changes to `$file` in `find_src_files()`.
This attempts to fix commit f3abeac919.
For discussion see
<http://ikiwiki.info/todo/auto-create_tag_pages_according_to_a_template/>