Tuesday, December 27, 2011


I haven't blogged recently (or really ever), so I thought I'd make a rambling post about an extension I'm currently writing.

Recently, at Amgine's suggestion, I have been working on a new MediaWiki extension called PageInCat. It adds a new parser function, {{#incat:Some category|text if current page is in "Some category"|text if it is not}}. At first glance I thought it would be fairly straightforward, but it turned out to be tricky to get right.

It's fairly easy to determine whether a page is in a specific category: just query the database. The problem is that when the parser reaches the {{#incat:...}}, the page hasn't been saved yet. The database therefore holds the categories of the previous version of the page, not the version we are in the process of saving. This works fine if no categories were added or removed in this revision. However, if the categories did change, the result won't reflect the changes.
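To make the problem concrete, here is a toy Python model. It is not the extension's PHP code. A dictionary named `saved_categories` stands in for the categories MediaWiki has stored for the previous revision. `categories_in` is a hypothetical, deliberately naive helper that pulls `[[Category:...]]` links out of wikitext. The lookup against stored data returns the stale answer while an edit that changes the categories is being saved.

```python
import re

# Hypothetical stand-in for the database: page title -> categories recorded
# for the last *saved* revision.
saved_categories = {"Example": {"Foo"}}

# Naive matcher for [[Category:Name]] links (illustration only; real
# wikitext category handling is more involved).
CATEGORY_RE = re.compile(r"\[\[\s*category\s*:\s*([^\]|]+)", re.IGNORECASE)


def categories_in(wikitext):
    """Return the category names linked from a piece of wikitext."""
    return {name.strip() for name in CATEGORY_RE.findall(wikitext)}


def incat_from_db(title, category):
    """Answer an #incat check the simple way: look at the stored categories."""
    return category in saved_categories.get(title, set())


# The revision being saved drops Foo and adds Bar.
new_text = "Some text [[Category:Bar]]"

print(incat_from_db("Example", "Bar"))   # False: the database still reflects the old revision
print("Bar" in categories_in(new_text))  # True: the text being saved is in Bar
```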

The solution I used was to mark the page as vary-revision. This signals to MediaWiki that the page's output depends on how it is saved to the database. The original purpose of vary-revision was the {{REVISIONID}} magic word, which inserts the page's revision number. That number can only be determined once the page is saved to the database. With the page marked this way, MediaWiki only serves users versions of the page that were rendered after it was saved to the database.
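Here is a toy Python model of the save flow this describes. It rests on some loud assumptions. `ParserOutput`, `render` and `save_and_render` are hypothetical names. The `flags` set only imitates the idea of marking output as vary-revision, and MediaWiki's real save path is considerably more involved. The point is that a render flagged this way gets thrown away and redone once the new categories are in the database.

```python
import re
from dataclasses import dataclass, field

CATEGORY_RE = re.compile(r"\[\[\s*category\s*:\s*([^\]|]+)", re.IGNORECASE)
INCAT_RE = re.compile(r"\{\{#incat:([^|}]*)\|([^|}]*)\|([^|}]*)\}\}")


@dataclass
class ParserOutput:  # hypothetical stand-in, not MediaWiki's class
    html: str
    flags: set = field(default_factory=set)


def render(text, db_categories):
    """Toy parse: expand each #incat against the categories currently stored."""
    out = ParserOutput(html=text)

    def expand(match):
        # The answer depends on saved state, so flag the output.
        out.flags.add("vary-revision")
        category, if_in, if_not = (part.strip() for part in match.groups())
        return if_in if category in db_categories else if_not

    out.html = INCAT_RE.sub(expand, text)
    return out


def save_and_render(text, db_categories):
    """Hypothetical save flow: reuse the pre-save parse unless it is vary-revision."""
    pre_save = render(text, db_categories)  # parse done while the edit is being saved
    # "Commit" the revision: store the categories linked from the text (naively).
    db_categories.clear()
    db_categories.update(name.strip() for name in CATEGORY_RE.findall(text))
    if "vary-revision" in pre_save.flags:
        return render(text, db_categories)  # re-render against the saved state
    return pre_save


db = set()
text = "[[Category:Bar]]{{#incat:Bar|in Bar|not in Bar}}"
print(render(text, db).html)           # [[Category:Bar]]not in Bar  (stale, pre-save)
print(save_and_render(text, db).html)  # [[Category:Bar]]in Bar
```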

vary-revision fixes the problem when saving the page. Previews still don't work, though, because a preview is never saved and so its categories never reach the database. To handle this, the extension hooks into EditPageGetPreviewText, which runs right before the preview is generated. It takes the text from the edit box, parses it, and stores the resulting categories. Then, when MediaWiki does the actual preview parse, the extension hooks into ParserBeforeInternalParse, which runs very early in the parse process. At this point it checks whether categories for this text are already stored. If they are, it uses them to evaluate the #incat calls in the preview.
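The two parses can be modelled in a few lines of Python. This is only a sketch of the idea, not the extension's code. `before_preview` and `preview` are hypothetical stand-ins for the EditPageGetPreviewText and ParserBeforeInternalParse handlers, and the regex-based `parse` is a toy. Keying the cache on the raw edit-box text is also an assumption about how "categories for this text" are matched up.

```python
import re

CATEGORY_RE = re.compile(r"\[\[\s*category\s*:\s*([^\]|]+)", re.IGNORECASE)
INCAT_RE = re.compile(r"\{\{#incat:([^|}]*)\|([^|}]*)\|([^|}]*)\}\}")

# Hypothetical cache: edit-box text -> categories found by the first parse.
preview_categories = {}


def parse(text, known_categories):
    """Toy parser: expand #incat against known_categories, return (html, categories)."""
    def expand(match):
        category, if_in, if_not = (part.strip() for part in match.groups())
        return if_in if category in known_categories else if_not

    html = INCAT_RE.sub(expand, text)
    return html, {name.strip() for name in CATEGORY_RE.findall(html)}


def before_preview(text, db_categories):
    """Plays the role of the EditPageGetPreviewText handler: an extra parse."""
    _, categories = parse(text, db_categories)
    preview_categories[text] = categories


def preview(text, db_categories):
    """The real preview parse; the lookup mimics the ParserBeforeInternalParse step."""
    known = preview_categories.get(text, db_categories)
    html, _ = parse(text, known)
    return html


db = set()  # the page is not yet in Bar in the database
text = "[[Category:Bar]] {{#incat:Bar|in Bar|not in Bar}}"
print(preview(text, db))  # without the first pass: [[Category:Bar]] not in Bar
before_preview(text, db)
print(preview(text, db))  # [[Category:Bar]] in Bar
```

The cost mentioned below is visible here: `before_preview` runs a full parse purely to learn the categories.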

This makes the preview give the correct result, albeit at the price of parsing the preview text twice, slowing down the preview process.

However, there's one more situation where the extension can give wrong results during preview (or during saving, for that matter). What if someone writes something like {{#incat:Foo||[[category:Foo]]}} (read: the page is in category Foo only if it is not in category Foo)? There's no correct answer to whether the page is in category Foo, since the condition contradicts itself, so #incat can't choose the right result. A less pathological case is #incat calls that depend on each other: if the page is in Foo, add category Bar; if it is in Bar, add category Baz; if it is in Baz, add category Fred; and so on. The two-stage approach (first figure out which categories the article is in, then evaluate the #incat calls based on that) can't determine the category memberships here. Each category is only found to be included after the previous category in the chain has been found to be included, which takes another round of parsing for every link in the chain.
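One way to see why two passes are not enough is to treat the parse as a function. It maps "categories assumed by #incat" to "categories the page ends up in", and we can iterate it looking for a fixed point. The Python sketch below is purely illustrative, since the extension does not iterate like this. Each #incat is modelled as a hypothetical tuple of (tested category, categories added if in it, categories added if not). The self-contradictory example flips back and forth forever. The Foo, Bar, Baz, Fred chain only settles after several passes.

```python
def parse_once(assumed, base, incats):
    """One parse: categories the page ends up in, given the categories #incat assumes."""
    result = set(base)
    for category, if_in, if_not in incats:
        result |= if_in if category in assumed else if_not
    return frozenset(result)


def settle(base, incats, start=frozenset(), max_passes=20):
    """Re-parse until the categories stop changing.

    Returns (categories, passes) on success, or (None, passes) if the
    results start repeating (no consistent answer exists) or max_passes is hit.
    """
    seen = set()
    current = frozenset(start)
    for passes in range(1, max_passes + 1):
        following = parse_once(current, base, incats)
        if following == current:
            return current, passes
        seen.add(current)
        if following in seen:
            return None, passes
        current = following
    return None, max_passes


# {{#incat:Foo||[[Category:Foo]]}}: in Foo only if not in Foo.
contradiction = [("Foo", frozenset(), frozenset({"Foo"}))]
print(settle(frozenset(), contradiction))  # (None, 2): flips between {} and {Foo}

# Page is in Foo; in Foo adds Bar, in Bar adds Baz, in Baz adds Fred.
chain = [
    ("Foo", frozenset({"Bar"}), frozenset()),
    ("Bar", frozenset({"Baz"}), frozenset()),
    ("Baz", frozenset({"Fred"}), frozenset()),
]
cats, passes = settle(frozenset({"Foo"}), chain)
print(sorted(cats), passes)  # ['Bar', 'Baz', 'Foo', 'Fred'] 5
```

After only two passes the chain example still assumes the page is in Foo alone, so the #incat for Bar answers wrongly. That matches the failure described above.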

There's really not much we can do in these cases, so instead of trying to prevent them, the extension tries to warn the user. It keeps track of the answer each #incat gave. At the end of the parse (during ParserAfterTidy), it checks whether those answers match the page's actual categories. If they don't, it shows a warning at the top of the page during preview via $parser->getOutput()->addWarning(). This is similar to what happens when someone exceeds the expensive parser function limit. Unlike that case, it doesn't add a tracking category, though it certainly could if that would be useful.
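The end-of-parse check amounts to comparing two sets of facts. Below is a minimal Python sketch of that comparison. It assumes a hypothetical `IncatTracker` that is told each answer #incat gave. In the extension the warning goes through $parser->getOutput()->addWarning(). Here `warnings` just returns strings, and the message wording is made up.

```python
class IncatTracker:
    """Hypothetical record of every answer #incat gave during one parse."""

    def __init__(self):
        self.answers = {}  # category -> set of answers given (True = "is in")

    def record(self, category, answered_in):
        self.answers.setdefault(category, set()).add(answered_in)

    def mismatches(self, final_categories):
        """Categories where some #incat answer disagrees with the final result."""
        return sorted(
            category
            for category, given in self.answers.items()
            if given != {category in final_categories}
        )

    def warnings(self, final_categories):
        # Illustrative message text only; not the extension's actual wording.
        return [
            f"#incat answer for category {cat} does not match the page's final categories"
            for cat in self.mismatches(final_categories)
        ]


tracker = IncatTracker()
tracker.record("Foo", False)  # {{#incat:Foo||[[Category:Foo]]}} answered "not in Foo"...
tracker.record("Bar", True)
final = {"Foo", "Bar"}        # ...but the parse ended with the page in Foo
print(tracker.mismatches(final))  # ['Foo']
```

Storing a set of answers per category also catches the case where two #incat calls for the same category answered differently within one parse.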

Anyways, hopefully the extension is useful to someone :)
