Bawolff's rants - A collection of my opinions/rants/whatever else.

Imagining Future MediaWiki (2024-01-10)

<p>As we roll into 2024, I thought I'd do something a little different on this blog.</p><p>A common product vision exercise is to ask someone: imagine it is 20 years from now, what would the product look like? What missing features would it have? What small (or large) annoyances would it no longer have?</p><p>I wanted to do that exercise with MediaWiki. Sometimes it feels like MediaWiki is a little static. Most of the core ideas were implemented a long time ago. Sure, there is a constant stream of improvements, some quite important, but the core product has stayed essentially the same for quite some time now. People largely interact with MediaWiki the same way they always have. When I think of new fundamental features in MediaWiki, I think of things like Echo, Lua and VisualEditor, which can hardly be considered new at this point (in fairness, maybe DiscussionTools, which is quite recent, should count as a new fundamental feature). Alternatively, I might think of things that are on the edges. Wikidata is a pretty big shift, but it's a separate thing from the main experience and also over a decade old at this point.</p><p>I thought it would be fun to brainstorm some crazy ideas for new features of MediaWiki, primarily in the context of large sites like Wikipedia. I'd love to hear feedback on whether these ideas are so crazy they might work, or just crazy. Hopefully it inspires others to come up with their own crazy ideas.</p><h2 style="text-align: left;">What is MediaWiki to me?<br /></h2><p>Before I start, I suppose I should talk about what I think the goals of the MediaWiki platform are. What is the value that should be provided by MediaWiki as a product, particularly in the context of Wikimedia-type projects?</p><p>Often I hear Wikipedia described as a top-10 document hosting website combined with a medium-scale social network. While I think there is some truth to that, I would divide it differently.<br /></p><p>I see MediaWiki as aiming to serve 4 separate goals:</p><ul style="text-align: left;"><li>A document authoring platform</li><li>A document viewing platform (i.e. some people just want to read the articles)<br /></li><li>A community management tool</li><li>A tool to collect and disseminate knowledge<br /></li></ul><p>The first two are pretty obvious. MediaWiki has to support writing Wikipedia articles. MediaWiki has to support people reading Wikipedia articles. While I often think the difference between readers and editors is overstated (or perhaps even counter-productive, as hiding editing features from readers reduces our recruitment pool), it is true they are different audiences with different needs.</p><p>What I think is sometimes a bit under-appreciated, but just as important, is that MediaWiki is not just about creating individual articles; it is about creating a place where a community of people dedicated to writing articles can thrive. This doesn't just happen; at the scale of tens of thousands of people, all sorts of processes and bureaucracy are needed for such a large group to work together effectively.
While not all of that is in MediaWiki, the bulk of it is.</p><p>One of my favourite things about the wiki world is that it is a socio-technical system. The software does not prescribe specific ways of working, but gives users the tools to create community processes themselves. I think this is one of our biggest strengths, which we must not lose sight of. However, we also shouldn't totally ignore this sector and assume the community is fine on its own - we should still be on the lookout for better tools to allow the community to make better processes.</p><p>Last of all, MediaWiki aims to be a tool to aid in the collection and dissemination of knowledge¹. Wikimedia's mission statement is: "Imagine a world in which every single human being can freely share in the sum of all knowledge." No one site can do that alone, not even Wikipedia. We should aim to make it easy to transfer content between sites. If a 10 billion page treatise on Pokemon is inappropriate for Wikipedia, it should be easy for an interested party to set up their own site that can house knowledge that does not fit in existing sites. We should aim to empower people to do their own thing if Wikimedia is not the right venue. We do not have a monopoly on knowledge, nor should we.</p><p>As anyone who has ever tried to copy a template from Wikipedia can tell you, making forks or splits from Wikipedia is easy in theory but hard in practice. In many ways I feel this is the area where we have most failed to meet the potential of MediaWiki.<br /></p><p>With that in mind, here are my ideas for new fundamental features in MediaWiki:</p><h2 style="text-align: left;">As a document authoring/viewing platform</h2><h3 style="text-align: left;">Interactivity</h3><p style="text-align: left;">Detractors of Wikipedia have often criticized how text-based it is. While there are certainly plenty of pictures to illustrate, Wikipedia has typically been pretty limited when it comes to more complex multimedia. This is especially true of interactive multimedia. While I don't have first-hand experience, in the early days it was often negatively compared to Microsoft Encarta on that front.</p><p style="text-align: left;">We do have certain types of interactive content, such as videos, slippy maps and 3D models, but we don't really have any options for truly interactive content. For example, physics concepts might be better illustrated with "interactive" experiments, e.g. where you can push a pendulum with a mouse and watch what happens.</p><p style="text-align: left;">One of my favourite illustrations on the web is <a href="https://observablehq.com/@tmcw/enigma-machine">this one</a> of an Enigma machine. The Enigma machine, for those not familiar, was a mechanical device used in World War II to encrypt secret messages. The interactive illustration shows how an inputted message goes through various wires and rotates various disks to give the scrambled output. I think this illustrates what an Enigma machine fundamentally is better than any static picture or even video would ever be able to.</p><p style="text-align: left;">Right now there are no satisfactory solutions on Wikipedia for making this kind of content. There was a previous effort to do something in the vein of interactive content in the graph extension, which allowed using the Vega domain-specific language to make interactive graphs. I've <a href="https://www.mediawiki.org/wiki/User:Bawolff/Reflections_on_graphs">previously written</a> about how I think that was a good effort but ultimately missed the mark.
In short, I believe it was too high level, which caused it to lack the flexibility necessary to meet the needs of users, while also being difficult to build simplifying abstractions on top of.</p><p style="text-align: left;">I am a big believer that instead of making complicated projects that prescribe certain ways of doing things, it is better to make simpler, lower level tools that can be combined together in complex ways, as well as abstracted over so that users can make simple interfaces (essentially the Unix philosophy). On wiki, I think this has been borne out by the success of using Lua scripting in templates. Lua is low level (relative to other wiki interfaces), but users were able to use it to accomplish their goals without MediaWiki developers having to think about every possible thing they might want to do. Users were then able to make abstractions that hid the low level details in everyday use.<br /></p><p style="text-align: left;">To that end, what I'd like to see is Lua extended to the client side: special Lua interfaces that allow calling other Lua functions on the client side (run by JS), in order to make parts of the wiki page scriptable while being viewed instead of just while being generated.</p><p style="text-align: left;">I did make some early proof-of-concepts in this direction, see <a href="https://bawolff.net/monstranto/index.php/Main_Page">https://bawolff.net/monstranto/index.php/Main_Page</a> for a demo of <a href="https://www.mediawiki.org/wiki/Extension:Monstranto">Extension:Monstranto</a>. See also a longer piece I <a href="https://www.mediawiki.org/wiki/User:Bawolff/Interactive_rich_media">wrote</a>, as well as <a href="https://meta.wikimedia.org/wiki/User:Yurik/I_Dream_of_Content">an essay</a> by Yurik on the subject that I found inspiring.<br /></p>
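<p style="text-align: left;">To give a flavour of what I mean, here is a rough sketch of what such a module might look like. To be clear, this is purely hypothetical: the mw.clientside interface and everything about it is invented for illustration, and is not the actual Extension:Monstranto API.</p>
<pre>
-- Hypothetical sketch only: mw.clientside and its methods are invented
-- names for illustration, not a real Scribunto interface.
local p = {}

-- Runs server side, while the page is being generated.
function p.render( frame )
    -- Emit a button whose click handler is the client side function below.
    return mw.clientside.button{
        label = 'Push the pendulum',
        onClick = 'push'
    }
end

-- Runs client side (executed in the reader's browser) each time the
-- button is clicked.
function p.push( ui )
    ui:get( 'pendulum' ):nudge( 5 ) -- hypothetical widget method
end

return p
</pre>
<p style="text-align: left;">The point of the sketch is the split: most functions run during parsing as Lua modules do today, while specially marked functions are shipped to the browser and run while the page is being viewed, with user-built abstractions hiding the plumbing just as they do for templates now.</p>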
<h3 style="text-align: left;">Mobile editing<br /></h3><p style="text-align: left;">This is one where I don't really know what the answer is, but if I imagine MW in 20 years, I certainly hope this is better.</p><p style="text-align: left;">It's not just MediaWiki; I don't think any website really has authoring long text documents on mobile figured out.</p><p style="text-align: left;">That said, I have seen some interesting ideas around that I think are worth exploring (none of these are my own ideas).</p><h4 style="text-align: left;">Paragraph or sentence level editing</h4><p style="text-align: left;"><a href="https://www.mediawiki.org/w/index.php?oldid=673034">This idea was originally proposed</a> about 13 years ago by Jan Paul Posma. In fact, he wrote a whole <a href="https://upload.wikimedia.org/wikipedia/commons/7/78/In-line_Editing_thesis.pdf">bachelor's thesis</a> on it.</p><p style="text-align: left;">In essence, mobile editing gets more frustrating the longer the text you are editing is. MediaWiki often works on editing at the granularity of a section, but what about editing at the granularity of a paragraph or a sentence instead? Especially if you just want to fix a typo on mobile, I feel it would be much easier if you could just hit the edit button on a sentence instead of the entire section.</p><p style="text-align: left;">Even better, I suspect that Parsoid makes this a lot easier to implement now than it would have been back in the day.</p><h4 style="text-align: left;">Better text editing UI (e.g. Eloquent)<br /></h4><p style="text-align: left;">A while ago I was linked to <a href="https://jenson.org/text/">a very interesting article</a> by Scott Jenson about the problems with text editing on mobile. I think he articulated the reasons it is frustrating very well, and also proposed a better UI, which he called Eloquent. I highly recommend reading the article and seeing if it makes sense to you.</p><p style="text-align: left;">In many ways, we can't really do this, as it is an Android-level UI, not something we control in a web app. Even if we did manage to make it in a web app somehow, it would probably be a hard sell to ordinary users not used to the new UI. Nonetheless, I think it would be incredibly beneficial to experiment with alternate UIs like these, and see how far we can get. The world is increasingly going mobile, and Wikipedia is increasingly getting left behind.</p><h4 style="text-align: left;">Alternative editing interfaces (e.g. voice)<br /></h4><p>Maybe traditional text editing is not the way of the future. Can we do something with voice control?</p><p>It seems like voice-controlled IDEs are increasingly becoming a thing. For example, <a href="https://www.joshwcomeau.com/blog/hands-free-coding/">here</a> is a blog post by someone who programs with voice programming software called Talon. It seems like there are a couple of other options out there; I see Serenade mentioned quite a bit.</p><p>A project in this space that looks especially interesting is <a href="https://www.cursorless.org/">cursorless</a>. The demo looked really cool, and I could imagine that a power user would find it easier to use a system like this to edit large blobs of wikitext than the normal text editing interface on mobile. Anyway, I recommend watching the demo video to see what you think.<br /></p><p>All this is to say, I think we should look really hard at the possibilities in this space for editing MediaWiki from a phone. On-screen keyboards are always going to suck; we might as well look to other options.</p><h2 style="text-align: left;">As a community building platform</h2><h3 style="text-align: left;">Extensibility</h3><p style="text-align: left;">I think it would be really cool if we had "Lua" extensions. Instead of normal PHP extensions, a user would be able to register/upload some Lua code that subscribes to hooks and does stuff. In this vision, these extension types would not be able to do anything unsafe like raw HTML, but would be able to do all sorts of stuff that users normally use JavaScript for.</p><p style="text-align: left;">This could be per-user or global. Perhaps it could be integrated with a permission system to control what they can and cannot do.<br /></p><p style="text-align: left;">I'd also like to see a super stable API abstraction layer for these (and normal extensions). Right now our extension API is fairly unstable. I would love to see a simple abstraction layer with hard stability guarantees. It wouldn't replace the normal API entirely, but would allow simpler extensions to be written in such a way that they retain stability in the long term.</p><h3 style="text-align: left;">Workflows<br /></h3><p style="text-align: left;">I think we could do more to support user-created workflows. The wiki is full of user-created workflows and processes, some quite complex, others simple.
For example, nominating an article for deletion or !voting in an RFC.</p><p style="text-align: left;">Sometimes the more complicated ones get turned into JavaScript wizards, but I think that's the wrong approach. As I said earlier, I am a fan of simpler tools that can be used by ordinary users, not complex tools that do a specific task but can only be edited by developers and exist "outside" the wiki.</p><p style="text-align: left;">There's already an extension in this area (not used by Wikimedia) called <a href="https://www.mediawiki.org/wiki/Extension:Page_Forms">PageForms</a>. This is in the vein of what I am imagining, but I think still too heavy. Another option in this space is the <a href="https://www.mediawiki.org/wiki/Extension:PageProperties">PageProperties</a> extension, which also doesn't really do what I am thinking of.<br /></p><p style="text-align: left;">What I would really want to see is an extension of the existing InputBox/preload feature.</p><p style="text-align: left;">As it stands right now, when starting a new page or section, <a href="https://www.mediawiki.org/wiki/Manual:Parameters_to_index.php#Options_affecting_the_edit_form">you can</a> give a URL parameter to preload some text, as well as parameters to that text to replace $1 markers.</p><p style="text-align: left;">We also have the <a href="https://www.mediawiki.org/wiki/Extension:InputBox">InputBox extension</a> to provide a text box where you can put in the name of an article to create with specific text pre-loaded.</p><p style="text-align: left;">I'd like to extend this idea, to allow users to add arbitrary widgets² (form elements) to a page, and bind those widgets to specific parameters to be preloaded.</p><p style="text-align: left;">If further processing or complex logic is needed, perhaps an option to allow the new preloaded text to be pre-processed by a Lua module. This would allow complex logic in how the page is edited based on the user's inputs. If there is one theme in this blog post, it is that I wish Lua could be used for more things on wiki.</p><p style="text-align: left;">I still imagine the user would be presented with a diff view and have to press save, in order to prevent shenanigans where users are tricked into doing something they don't intend to.<br /></p><p style="text-align: left;">I believe this is a very light-weight solution that also gives the community a lot of flexibility to create custom workflows in the wiki that are simple for editors to participate in; the sketch below shows roughly what the existing pieces it builds on look like.<br /></p>
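<p style="text-align: left;">For concreteness, here is roughly what the existing InputBox/preload machinery looks like today (the page names are made up for illustration). A wiki page can embed a box that creates a new page with a preloaded skeleton:</p>
<pre>
<inputbox>
type=create
preload=Wikipedia:Articles for deletion/Preload
buttonlabel=Nominate for deletion
</inputbox>
</pre>
<p style="text-align: left;">and the equivalent edit URL, where each preloadparams[] value replaces a $1, $2, ... marker in the preloaded text:</p>
<pre>
/index.php?title=Wikipedia:Articles_for_deletion/Example&action=edit
    &preload=Wikipedia:Articles_for_deletion/Preload
    &preloadparams%5B%5D=Example
</pre>
<p style="text-align: left;">The proposal is essentially to let users place arbitrary form widgets on a page that feed these same preload parameters, rather than being limited to a single title box.</p>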
<h3 style="text-align: left;">Querying, reporting and custom metadata</h3><p style="text-align: left;">This is the big controversial one.</p><p style="text-align: left;">I believe that there should be a way for users to attach custom metadata to pages and do complex queries over that metadata (including aggregation). This is important both for organizing articles and for organizing behind-the-scenes workflows.<br /></p><p style="text-align: left;">In the broader MediaWiki ecosystem, this is usually provided by either the <a href="https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki">SemanticMediaWiki</a> or <a href="https://www.mediawiki.org/wiki/Extension:Cargo">Cargo</a> extensions. Often in third-party wikis this is considered MediaWiki's killer feature. People use them to create complex workflows, including things like task trackers. In essence it turns MediaWiki into a no-code/low-code user-programmable workflow designer.<br /></p><p style="text-align: left;">Unfortunately, these extensions all scale poorly, preventing their use on Wikimedia. Essentially I dream of seeing the features provided by these extensions on Wikipedia.</p><p style="text-align: left;">The existing approaches are as follows:</p><ul style="text-align: left;"><li style="text-align: left;">Vanilla MediaWiki: category pages, and some query pages.</li><ul><li style="text-align: left;">This is extremely limited. Category pages allow an alphabetical list. Query pages allow some limited pre-defined maintenance lists, like the list of double redirects or longest articles. Despite these limitations, Wikipedia makes great use of categories.<br /></li></ul><li style="text-align: left;">Vanilla MediaWiki + bots:</li><ul><li style="text-align: left;">This is essentially Wikipedia's approach to solving this problem. Have programs do queries offsite and put the results on a page. I find this to be a really unsatisfying solution. A Wikipedian once told me that every bot is just a hacky workaround to MediaWiki failing to meet its users' needs, and I tend to agree. Less ideologically, the main issue here is that it's very brittle: when bots break, often nobody knows who has access to the code or how it can be fixed. Additionally, they often have significant latency for updates (if they run once a week, the latency is 7 days), and ordinary users are not really empowered to create their own queries.<br /></li></ul><li style="text-align: left;"><a href="https://www.wikidata.org/wiki/Wikidata:Main_Page">Wikidata</a> (including the <a href="https://query.wikidata.org/">WDQS</a> SPARQL endpoint)<br /></li><ul><li style="text-align: left;">Wikidata is adjacent to this problem, but not quite trying to solve it. It is meant more as a central clearinghouse for facts, not a way to do querying inside Wikipedia. That said, Wikidata does have very powerful query features in the form of SPARQL. Sometimes these are copied into Wikipedia via bots. SPARQL of course has difficult-to-quantify performance characteristics that make it unsuitable for direct embedding into Wikipedia articles in the MediaWiki architecture. Perhaps it could be iframed, but that is far from a full solution.</li></ul><li style="text-align: left;"><a href="https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki">SemanticMediaWiki</a></li><ul><li style="text-align: left;">This allows adding semantic annotations to articles (i.e. subject-verb-object type relations). It then allows querying using a custom semantic query language. The complexity of the query language makes performance hard to reason about, and it often scales poorly.</li></ul><li style="text-align: left;"><a href="https://www.mediawiki.org/wiki/Extension:Cargo">Cargo</a></li><ul><li style="text-align: left;">This is very similar to SemanticMediaWiki, except it uses a relational paradigm instead of a semantic paradigm. Essentially users can define DB tables. Typically the workflow is template-based, where a template is attached to a table, and specific parameters to the template are populated into the database. Users can then use (sanitized) SQL queries to query these tables.
The system uses an indexing strategy of adding one index for every attribute in the relation.</li></ul><li style="text-align: left;"><a href="https://www.mediawiki.org/wiki/Extension:DynamicPageList">DPL</a></li><ul><li style="text-align: left;">DPL is an extension to do complex querying and display using MediaWiki's built-in metadata like categories. There are many different versions of this extension, but all of them have potential queries that scale linearly with the number of pages in the database, and sometimes even worse.</li></ul></ul><p>I believe none of these approaches really work for Wikipedia. They either do not support complex queries, or allow overly complex queries with unpredictable performance. I think the requirements are as follows:</p><ul style="text-align: left;"><li>Good read scalability (by read, I mean scalability when generating pages, during "parse" in MediaWiki speak; on Wikipedia, pages are read and regenerated a lot more often than they are edited).<br /></li><ul><li>We want any sort of query to have very low read latency. Having long pauses waiting for I/O during page parsing is bad in the MediaWiki architecture.</li><li>Queries should scale consistently. They should be at worst roughly O(log n) in the number of pages on the wiki. If using a relational-style database, we would want the number of rows the DBMS has to look at to be no more than some fixed maximum.</li></ul><li>Eventual write consistency</li><ul><li>It is ok if it takes a few minutes for things using the custom metadata to update after it is written. Templates already have a delay for updating.<br /></li><li>That said, it should still be relatively quick - on the order of minutes ideally. If it takes a day, or scales badly with the size of the database, that would be unacceptable.<br /></li><li>Write performance does not have to scale quite as well as read performance, but should still scale reasonably well.<br /></li></ul><li>Predictable performance</li><ul><li>Users should not be able to do anything that negatively impacts site performance.</li><li>Users should not have to be experts in (or have any knowledge of) DB performance or SQL optimization.<br /></li><li>Limits should be predictable. Timeouts suck; they can vary depending on how much load the site is under and other factors. Queries should either work or not work. Their validity should not be run-time dependent. It should be obvious to the user whether their query is acceptable before they try to run it. There should be clear rules about what the limits of the system are.</li></ul><li>Results should be usable for further processing</li><ul><li>e.g. you should be able to use the result inside a Lua module and format it in arbitrary ways.</li></ul><li>[Ideally] Able to be isolated from the main database, shardable, etc.</li><li>Be able to query for a specific page, a range of pages, or aggregates of pages (e.g. count how many pages are in a range, the average of some property, etc.)</li><ul><li>Essentially we want just enough complexity to do interesting user-defined queries, but not enough that the user is able to take any action that affects performance.<br /></li><li>There are some other query types that are more obscure but maybe harder, for example geographic queries. I don't think we need to support those.</li><li>Intersection queries are an interesting case, as they are often useful on wiki.
Ideally we would support those too.<br /></li></ul></ul><p>Given these constraints, I think the CouchDB model might be the best match for on-wiki querying and reporting.</p><p>Much of the CouchDB marketing material is aimed at their local-data eventual-consistency replication story, which is cool and all, but not what I'm interested in here. A good starting point for how their data model works is their documentation on <a href="https://docs.couchdb.org/en/stable/ddocs/views/intro.html">views</a>. To be clear, I'm not necessarily suggesting using CouchDB, just that its data model seems like a good match for the requirements.</p><p>CouchDB is essentially a document database based around the ideas of map-reduce. You can make views, which are similar to an index on a virtual column in MySQL. You can also make reduce functions, which calculate some function over the view. The interesting part is that the reduce function is indexed in a tree fashion, so you can efficiently get the value of the function applied to any contiguous range of the rows in logarithmic time. This allows computing aggregations of the data very efficiently. Essentially all the read queries are very efficient. Potentially write queries are less so, but it is easy to build controls around that. Creating or editing reduce functions is expensive because it requires regenerating the index, but that is expected to be a rare operation, and users can be informed that results may be unreliable until it completes.<br /></p><p>In short, the way the CouchDB data model works as applied to MediaWiki could be as follows (a rough Lua sketch follows the list):</p><ul style="text-align: left;"><li>An emit( relationName, key, data ) function is added to Lua. In many ways this is very similar to adding a page to a category named relationName with a sortkey specified by key. data is optional extra data associated with this item. For performance reasons, there may be a (high) limit on the max number of emit() calls on a page, to prevent the DB size from exploding.<br /></li><li>Lua gets a function query( relationName, startKey, endKey ). This returns all pages between startKey and endKey and their associated data. If there are more than X (e.g. 200) pages, only the first X are returned.</li><li>Lua gets a queryReduced( relationName, reducerName, startKey, endKey ), which returns the reduction function over the specified range. (The main limitation here is that the reduce function's output must be small in size in order to make this efficient.)<br /></li><li>A way is added to associate a Lua module with a relation as a reduce function. Adding or modifying these functions is potentially an expensive operation, but it is probably acceptable to the user that this takes some time.</li></ul>
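<p>As a rough sketch of how this could look from a template's Lua module (the mw.ext.report name and these signatures are invented for illustration; no such interface exists today):</p>
<pre>
-- Hypothetical sketch: mw.ext.report is an invented namespace for the
-- emit/query/queryReduced primitives described above.
local p = {}

-- Called while the page is parsed: index this page under a relation,
-- much like categorizing it with a sortkey, plus some attached data.
function p.record( frame )
    mw.ext.report.emit( 'city-population', tonumber( frame.args.population ), {
        city = mw.title.getCurrentTitle().text
    } )
    return ''
end

-- Range query: list cities with a population between 1 and 5 million
-- (results capped at, say, 200 rows).
function p.listLarge( frame )
    local rows = mw.ext.report.query( 'city-population', 1000000, 5000000 )
    local out = {}
    for _, row in ipairs( rows ) do
        table.insert( out, row.data.city )
    end
    return table.concat( out, ', ' )
end

-- Aggregation: total population over a key range, computed by a
-- pre-indexed, user-written reduce module ('Module:Sum' is made up).
function p.total( frame )
    return mw.ext.report.queryReduced( 'city-population', 'Module:Sum', 0, 5000000 )
end

return p
</pre>
<p>Because every call maps onto either a bounded range scan or a pre-computed tree of reductions, each of these operations stays roughly logarithmic in the number of pages, which is the property the requirements above demand.</p>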
<p>All the query types here are efficient. It is not as powerful as arbitrary SQL or semantic queries, but it is still quite powerful. It allows computing fairly arbitrary aggregation queries, as well as returning results in a user-specified order. The main slow part is when a reduction function is edited or added, which is similar to how a template used on very many pages can take a while to update. Emitting a new item may also be a little slower than reading, since the reducers have to be updated up the tree (with possible contention on the root node); however, that is a much rarer operation, and users would likely see it as similar to current delays in updating templates.</p><p>I suspect such a system could also potentially support intersection queries with reasonable efficiency, subject to a bunch of limitations.</p><p>All the performance limitations are pretty easy for the user to understand. There is some max number of items that can be emit()ed from a page, to prevent someone from emit()ing 1000 things per page. There is a max number of results that can be returned from a query, to prevent querying the entire database, and a max number of queries allowed to be made from a page. The queries involve reading a limited number of rows, often sequential. The system could probably be sharded pretty easily if a lot of data ends up in the database.</p><p>I really do think this sort of query model provides the sweet spot of complex querying with predictable, good performance, and would be ideal for a MediaWiki site running at scale that wanted SMW-style features.<br /></p><h2 style="text-align: left;">As a knowledge collection tool</h2><p style="text-align: left;">Wikipedia can't do everything. One thing I'd love to see is better integration between different MediaWiki servers, to allow people to go to different places if their content doesn't fit in Wikipedia.<br /></p><h3 style="text-align: left;">Template modularity/packaging</h3><p style="text-align: left;">Anyone who has ever tried to use Wikipedia templates on another wiki knows it is a painful process. Tracking down all the dependencies is complex, not to mention when they rely on Wikidata or JsonConfig (the Commons data: namespace).</p><p style="text-align: left;">The templates on a wiki are not just user content, but complex technical systems. I wish we had a better system for packaging and distributing them.</p><p style="text-align: left;">Even within the Wikimedia movement, there is often a call for global templates. A good idea certainly, but it would be less critical if templates could be bundled up and shared. Even then, having distinct boundaries around templates would probably make global templates easier than the current mess of dependencies.</p><p style="text-align: left;">I should note that there are already extensions in this vein, for example Extension:Page_import and Extension:Data_transfer. They are nice and all, but I think it would be cooler to have the concept of discrete template/module units on wiki, so that different components are organized together in a way that is easier to follow.<br /></p><h3 style="text-align: left;">Easy forking</h3><p style="text-align: left;">Freedom to fork is the freedom from which all others flow. In addition to giving people who disagree with the status quo a way to do their own thing, easy forking/mirroring is critical when censorship is at play and people want to mirror Wikipedia somewhere we cannot normally reach. However, running a wiki the size of English Wikipedia is quite hard, even if you don't have any traffic. Simply importing an XML dump into a MySQL DB can be a struggle at the sizes we are talking about.</p><p style="text-align: left;">I think it would be cool if we made ready-to-go SQLite DB dumps.
Perhaps packaged as a phar archive with MediaWiki, so you could essentially just download one huge 100 GB file, plop it somewhere, and have a mirror/fork.</p><p style="text-align: left;">Even better if it could integrate with EventStreams to automatically keep things up to date.<br /></p><h2 style="text-align: left;">Conclusion</h2><p style="text-align: left;">So those are my crazy ideas for what I think is missing in MediaWiki (with an emphasis on the Wikipedia use case and not the third-party use case). Agree? Disagree? Hate it? I'd love to know. Maybe you have your own crazy ideas. You should post them; after all, your crazy idea cannot become reality if you keep it to yourself!<br /></p><h3 style="text-align: left;">Notes:</h3><p>¹ I left out "Free", because as much as I believe in "Free Culture", I believe the free part is Wikimedia's mission but not MediaWiki's.</p><p>² To clarify, by widgets I mean buttons and text boxes. I do not mean widgets in the sense of the MediaWiki extension named "Widgets".<br /></p>

WikiConference North America 2023 (part 1) (2023-11-14)

<div class="separator" style="clear: both; text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1d/WikiConference_North_America_2023_logo.png/305px-WikiConference_North_America_2023_logo.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="480" data-original-width="305" height="480" src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1d/WikiConference_North_America_2023_logo.png/305px-WikiConference_North_America_2023_logo.png" width="305" /></a></div><br /><p>This weekend I attended <a href="https://wikiconference.org/wiki/2023/Main_Page">WikiConference North America</a>. I decided to go somewhat at the last moment, but am really glad I did. This is the first non-technical Wikimedia community conference I have attended since COVID, and it was great to hear what the Wikipedia community has been up to.<br /><br />I was on a bit of a budget, so I decided to get a cheaper hotel that was about an hour away from the venue by public transit. I don't think I'll do that again. Getting back and forth was really smooth - Toronto has great transit. However, it meant an extra hour at the end of the day to get back, and waking up an hour earlier to get there on time, which really added up. By the end I was pretty tired and would much rather have had an extra 2 hours of sleep (or an extra 2 hours chatting with people).<br /><br />Compared to previous iterations of this conference, there was a much heavier focus on on-wiki governance, power users and "lower-case s" Wikipedia (not Wikimedia) strategy. I found this quite refreshing and interesting, since I mostly do MediaWiki dev stuff and do not hear about the internal workings of Wikipedia as much. Previous versions of this conference focused too much (imho) on talks about outreach, which, while important, were often a bit repetitive. The different focus was much more interesting to me.<br /></p><h2 style="text-align: left;">Key take-aways</h2><p>My key take-away from this conference was that there is a lot of nervousness about the future. Especially:</p><ul style="text-align: left;"><li>Wikipedia's power-user demographics curve is shifting in a concerning way.
Particularly around admin promotion.</li><li>AI is changing the way we consume knowledge, potentially cutting Wikipedia out, and this is scary.</li><li>A fear that the world is not as it once was and the conditions that created Wikipedia are no longer present. As the keynote speaker Selena Deckelmann phrased it, "Is Wikipedia a one-generation marvel?"</li></ul><p>However, I don't want to overstate this. It's unclear to me how pervasive this view is. Lots of presenters presented views of that form, but does the average Wikipedian agree? If so, is it more an intellectual agreement, or are people actually nervous? I am unsure. My read on it is that people were vaguely nervous about these things, but by no means was anyone panicking about them. Honestly though, I don't really know. However, I think some of these concerns are undercut by there being a long history of people worrying about similar things, and yet Wikipedia has endured. Before admin demographics, people were panicking about new user retention. Before AI changing the way we consume content, it was mobile (a threat which I think is actually a much bigger deal).</p><h4 style="text-align: left;">Admin demographics</h4><p>That said, I never quite realized the scale of the admin demographic crisis. People always talk about there being fewer admin promotions now than in the past, but I did not realize until it was pointed out that it is not just a little less but allegedly 50 times less. There is no doubt that a good portion of the admin base are people who started a decade (or two) ago, and new admins are fewer and farther between.<br /></p><p>A related thing that struck me at the conference is how the definition of a "young" Wikipedian seems to be getting older. Occasionally I would hear people talk about someone who is in high school as being a young Wikipedian, with the implication that this is somewhat unusual. However, when you talk to people who have been Wikipedians for a long time, often they say they were teenagers when they started. It seems like Wikipedians being teenagers was a really common thing early in the project, but is now becoming rarer.<br /><br />Ultimately though, I suspect the problem will solve itself with time. As more and more admins retire, eventually the workload on those remaining will increase until the mop is handed out more readily out of necessity. I can't help but be reminded of all the panic over new user retention, until eventually people basically decided that it didn't really matter.</p><h4 style="text-align: left;">AI</h4><p>As far as AI goes, hating AI seems to be a little bit of a fad right now. I generally think it is overblown. In the Wikipedia context, this seems to come down to three things:</p><ul style="text-align: left;"><li>Deepfakes and other media manipulation making it harder to have reliable sources (mis/disinformation)</li><li>Using AI to generate articles that get posted, but perhaps are not properly fact-checked, or are otherwise poor quality in ways that aren't immediately obvious or that existing community practice is not yet well prepared to handle</li><li>Voice assistants (Alexa), LLMs (ChatGPT) and other knowledge distribution methods that use Wikipedia data but cut Wikipedia out of the loop (a continuation of the concern that started with Google's knowledge graph)</li></ul><p>I think by and large it is the third point that was the most concerning to people at the conference, although all three were discussed at various points.
The third point is also unique to Wikipedia.<br /><br />There seemed to be two causes of concern on the third point. First, there was worry over lack of attribution, and a feeling that large Silicon Valley companies are exploitatively profiting off the labor of Wikipedians. Second, there is concern that with Wikipedia cut out of the loop, we lose the ability to recruit people when there is no edit button, and maybe even lose brand awareness. While totally unstated, I imagine the inability to show fundraising banners to users consuming via such systems is probably on the mind of the fundraising department of the WMF.<br /><br />My initial reaction to this is probably one of disagreement with the underlying moral basis. The goal was always to collect the world's knowledge for others to freely use. The free knowledge movement literally has free in the name. The knowledge has been collected, and now other people are using it in interesting, useful and unexpected ways. Who are we to tell people what they can and cannot do with it?<br /><br />This is the sort of statement that is very ideologically based. People come to Wikimedia for a variety of reasons; we are not a monolith. I imagine that people either agree with this view or disagree with it, and no amount of argument is going to change anyone's mind about it. Of course, a major sticking point here is that ChatGPT is arguably not complying with our license, and lack of attribution is a reasonable concern.<br /><br />The more pragmatic concerns are interesting though. The project needs new blood to continue over the long term, and if we are cut out of the distribution loop, how do we recruit? I honestly don't know, but I'd like to see actual data confirming the threat before I get too worried.<br /><br />The reason I say that is that I don't think voice assistants and LLMs are going to replace Wikipedia. They may replace Wikipedia for certain use cases, but not all use cases, and especially not the use case that our recruitment base is.<br /><br />Voice assistants are generally good for quick fact questions. "Who is the prime minister of Canada?" type questions. The type of stuff that has a one-sentence answer and is probably stored on Wikidata. LLMs are somewhat longer form, but still best for information that can be summarized in a few paragraphs, maybe a page at most, and has a relatively objective "right" answer (from what I hear; I haven't actually used ChatGPT). Complex, nuanced topics are not well served by these systems. Want to know the historical context that led to the current flare-up in the Middle East? I don't think LLMs will give you what you want.<br /><br />Now think about the average Wikipedia editor. Are they interested in one-paragraph answers? I don't know for sure, but I would posit that they tend to be more interested in the larger, nuanced story. Yes, other distribution models may threaten our ability to recruit from the users using them, but I don't think that is the target audience we would want to focus recruitment on anyway. I suppose time will tell. AI might just be a fad in the end.<br /></p><h2 style="text-align: left;">Conclusion</h2><p>I had a great time. It was awesome to see old friends, but also to meet plenty of new people I did not know. I learned quite a bit, especially about Wikipedia governance. In many ways, it is one of the more surprising wiki conferences I've been to, as it contained quite a bit of content that was new to me.
I plan to write a second blog post about my more raw, unfiltered thoughts on specific presentations. (Edit: I never did make a second post, and I guess it's late enough at this point that I probably won't, so never mind about that.)<br /></p>

The Vector-pocalypse is upon us! (2023-01-20)

<p>tl;dr: [[WP:IDONTLIKEIT]]</p><p>Yesterday, a new version of the Vector skin was made default on English Wikipedia.</p><p>As will shock absolutely no one who pays attention to Wikipedia politics, the new skin is controversial. Personally I'm a <a href="https://en.wikipedia.org/?useskin=timeless">Timeless</a> fan and generally have not liked what I saw of new Vector while it was in development. However, now that it is live, I thought I'd give it another chance and share my thoughts on the new skin. For reference, I am doing this on my desktop computer, which has a large wide-screen monitor. It looks very different on a phone (I actually like it a lot better on the phone). It might even look different on different monitors with different gamuts.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhx5kcRenDxBU6J4j2T9jcna4V1oKD2Lai8PtvPxIBrFkroSoymdA1fIhY7Z3HVYKggBM4QrJnuhT0kJ47t28C_Z1aXGBLFQzCOcnoFIbvJEnv00tdxdR2UAe-VVeIY8WTSfIwSHkT72h_xZTGaU0nnWHj31shUSUlJafNX5_o3MOOBHPs2kb3J4QBO/s3436/newvector-default.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1401" data-original-width="3436" height="259" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhx5kcRenDxBU6J4j2T9jcna4V1oKD2Lai8PtvPxIBrFkroSoymdA1fIhY7Z3HVYKggBM4QrJnuhT0kJ47t28C_Z1aXGBLFQzCOcnoFIbvJEnv00tdxdR2UAe-VVeIY8WTSfIwSHkT72h_xZTGaU0nnWHj31shUSUlJafNX5_o3MOOBHPs2kb3J4QBO/w638-h259/newvector-default.png" width="638" /></a></div><br /><p>So the first thing that jumps out is there is excessive whitespace on either side of the page. There is also a lot more hidden by default, notably the "sidebar", which is a prominent feature on most skins. One minor thing that jumps out to me is that Echo notifications look a little wonky when you have more than 100 of them.</p><p>On the positive side though, the top bar does look very clean. The table of contents is on the left-hand side and sticky (somewhat similar to WikiWand), which I think is a nice change.<br /></p><p>When you scroll, you notice the top bar scrolls with it but changes:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUxeJZzeFy52-Pu64ctnIhLPk20IceZCNLqWgfLGdoyENu1aJCLp_X32jCsiHbmc7QXmOTav8vazGr-kn1s3cJv_OvazGA4YTwKNi7o_XBVh8cSKRqoXrVlMPlBapuOJIBs28yufHRuf0GWeiCvKgs6cuzPAM_0E130brMLrCNnj-6Sze4t8T_ypua/s1769/newvector-scroll-header_000.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="280" data-original-width="1769" height="102" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiUxeJZzeFy52-Pu64ctnIhLPk20IceZCNLqWgfLGdoyENu1aJCLp_X32jCsiHbmc7QXmOTav8vazGr-kn1s3cJv_OvazGA4YTwKNi7o_XBVh8cSKRqoXrVlMPlBapuOJIBs28yufHRuf0GWeiCvKgs6cuzPAM_0E130brMLrCNnj-6Sze4t8T_ypua/w640-h102/newvector-scroll-header_000.png" width="640" /></a></div><p>On one hand, this is quite cool.
However, on reflection, I'm not sure it is quite worth it. It feels like this sticky header is 95% of the way to working, but just not quite there. The alignment with the white padding on the right (I don't mean the off-white margin area, but the area that comes before that) seems slightly off somehow. Perhaps I am explaining it poorly, but it feels like there should be a division there, since the article ends around the pencil icon. Additionally, the sudden change makes it feel like you are in a different context, but it is all the same tools with different icons. On the whole, I think there is a good idea here with the sticky header, but it could maybe use a few more iterations.<br /></p><p>If you expand the sidebar menu, the result feels very ugly and out of place to me:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcnUQ7_1k-WlVjDVx8391Ynid1oU8sbgmEv7hS9CJCghPIWcYJTxiHcXtma22FoGmazS_dQDKAdFoGB9ca27x0MIx1tRPafSRXan1bSvmI-4NQJjZelMk5q5LksmMhOvXeQklhuE-SwOy8aEfzG3tgbCwh9rkMyp5zXd2esBa5vc6292gcUaTZbuMg/s1123/newvector-sidebar.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1123" data-original-width="829" height="640" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgcnUQ7_1k-WlVjDVx8391Ynid1oU8sbgmEv7hS9CJCghPIWcYJTxiHcXtma22FoGmazS_dQDKAdFoGB9ca27x0MIx1tRPafSRXan1bSvmI-4NQJjZelMk5q5LksmMhOvXeQklhuE-SwOy8aEfzG3tgbCwh9rkMyp5zXd2esBa5vc6292gcUaTZbuMg/w472-h640/newvector-sidebar.png" width="472" /></a></div><br /><p>I don't know, I really hate the look of it, and the four levels of different off-whites. More to the point, one of the key features of Wikipedia is that it is edited by users. To get new users, you have to hook people into editing. I worry that hiding things like "learn to edit" will just make it so people never learn that they can edit. I understand there is a counter-point here: overwhelming users with links makes them ignore all of them and prevents focus on the important things. I even agree somewhat that there are probably too many links in Monobook/traditional Vector. However, having all the links hidden doesn't seem right either.</p><p><br /></p><h3 style="text-align: left;">On the fixed width</h3><p style="text-align: left;">One of the common complaints is that the fixed-width design wastes lots of screen real estate. The counter-argument is that studies suggest shorter line lengths improve readability.</p><p style="text-align: left;">As a compromise, there is a button in the bottom right corner to make it use the full screen. It is very tiny. I couldn't find it even knowing that it is supposed to be somewhere; someone had to tell me that it is in the lower-right corner. So it definitely lacks discoverability.</p><p style="text-align: left;">Initially, I thought I hated the fixed-width design too. However, after trying it out, I realized that it is not the fixed width that I hate. What I really hate is:</p><ul style="text-align: left;"><li>The use of an off-white background colour that is extremely close to the main background colour</li><li>Centering the design in the screen</li></ul><p>I really, really don't like the colour scheme chosen. Having it be almost, but not quite, the same white really bothers my eyes.</p><p>I experimented with using a darker colour for more contrast and found that I like the skin much, much better. Tastes vary of course, so perhaps it is just me.</p>
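<p>If you want to try the same experiment, user CSS along these lines should do it. This is just a sketch: I'm assuming the .mw-page-container class from Vector 2022's current markup, which could change.</p>
<pre>
/* In Special:MyPage/vector-2022.css - selector assumed from the skin's markup */
.mw-page-container {
    /* any colour with real contrast against the white content area */
    background-color: #36c;
}
</pre>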
<p>Picking a dark blue colour at random and moving the main content to the left looks something like:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTa7jY3Diw8Qq_MSN6V0sAyoi-ZAnctFtjSrh8sJMPyh9CCSVYG15L89kz3-4VnJAzjIxRc5wLu9Bg4I0kbiyL2JyjRyXFFm7G0ZjtBW6jPg9MZvyxFkvv8FoiAWqSz6nampFqN6dXSCuTgjdmbJRpx_jIBPCbWSATprten6mK1u8DPMMdTO_do7-o/s3436/newvector-dark.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1401" data-original-width="3436" height="260" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgTa7jY3Diw8Qq_MSN6V0sAyoi-ZAnctFtjSrh8sJMPyh9CCSVYG15L89kz3-4VnJAzjIxRc5wLu9Bg4I0kbiyL2JyjRyXFFm7G0ZjtBW6jPg9MZvyxFkvv8FoiAWqSz6nampFqN6dXSCuTgjdmbJRpx_jIBPCbWSATprten6mK1u8DPMMdTO_do7-o/w640-h260/newvector-dark.png" width="640" /></a></div><br /><p>Although I like the contrast of the dark background, my main issue is that in the original the colours are almost identical, so even just making the off-white slightly darker would be fine. If you want to do a throwback to Monobook, something like this looks fine to me as well:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8LMHngbEW2ZP2gyYHHgzywGYWnxMz1BbcL6a8sD7WuQ72whEmdpqXNrIF5ZBceDzuFihWBGsn7aEoGy_Ed7fD6dBYlzauToWV2gCX7ywfoCRZh7QQWlmHPQ8-42UWAQuH6wWeYrFyXbBdB6I4Of7ZOUE-Z1bpTQ8FGFgDD8nXe95U4RedSqtKQF8V/s3436/newvector-monobook.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1401" data-original-width="3436" height="260" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi8LMHngbEW2ZP2gyYHHgzywGYWnxMz1BbcL6a8sD7WuQ72whEmdpqXNrIF5ZBceDzuFihWBGsn7aEoGy_Ed7fD6dBYlzauToWV2gCX7ywfoCRZh7QQWlmHPQ8-42UWAQuH6wWeYrFyXbBdB6I4Of7ZOUE-Z1bpTQ8FGFgDD8nXe95U4RedSqtKQF8V/w640-h260/newvector-monobook.png" width="640" /></a></div><br /><p>I don't really know if this is just my particular taste or if other people agree with me. However, making it more left-aligned and increasing the contrast with the background makes the skin go from something I can't stand to something I can see as usable.<br /></p>

Hardening SQLite against injection in PHP (2022-12-04)

<p><b>tl;dr:</b> What are our options in PHP for making SQLite not write files when given malicious SQL queries, as a hardening measure against SQL injection?</p><p>One of the most famous web application security vulnerabilities is the SQL injection.</p><p>This is where you have code like:</p><p><span style="font-family: courier;">doQuery( "SELECT foo1, foo2 from bar where baz = '" . $_GET['fred'] . "';" );</span></p><p>The attacker goes to a URL like <span style="font-family: courier;">?fred='%20UNION%20ALL%20SELECT%20user%20'foo1',%20password%20'foo2'%20from%20users;--</span></p><p>The end result is: <span style="font-family: courier;">doQuery( "SELECT foo1, foo2 from bar where baz ='' UNION ALL SELECT user 'foo1', password 'foo2' from users ;-- ';" );</span></p><p>and the attacker has all your users' passwords.
<a href="https://portswigger.net/web-security/sql-injection">Portswigger has a really good detailed explanation on how such attacks work.</a><br /></p><p>In addition to dumping all your private info, the usual next step is to try and get code execution. In a PHP environment, often this means getting your DB to write a a php file in the web directory.</p><p>In MariaDB/MySQL this looks like:</p><p><span style="font-family: courier;">SELECT '<?php system($_GET["c"]);?>' INTO OUTFILE "/var/www/html/w/foo.php";</span></p><p>Of course, in a properly setup system, permissions are such that mysqld/mariadbd does not have permission to write in the web directory and the DB user does not have <a href="https://mariadb.com/docs/ent/ref/mdb/privileges/FILE/"><span style="font-family: courier;">FILE</span></a> privileges, so cannot use<span style="font-family: courier;"> <a href="https://mariadb.com/kb/en/select-into-outfile/">INTO OUTFILE</a></span>.</p><p>In SQLite, the equivalent is to use the <span style="font-family: courier;"><a href="https://www.sqlite.org/lang_attach.html">ATTACH</a></span> command to create a new database (or<a href="https://www.sqlite.org/lang_vacuum.html"> <span style="font-family: courier;">VACUUM</span></a>). Thus the SQLite equivalent is:<br /></p><p><span style="font-family: courier;">ATTACH DATABASE '/var/www/html/w/foo.php' AS foo; CREATE TABLE foo.bar (stuff text); INSERT INTO foo.bar VALUES( '<?php system($_GET["c"]);?>' );</span></p><p>This is harder than the MySQL case, since it involves multiple commands and you can't just add it as a suffix but have to inject as a prefix. It is very rare you would get this much control in an SQL injection.</p><p>Nonetheless it seems like the sort of thing we would want to disable in a web application, as a hardening best practice. After all, dynamically attaching multiple databases is rarely needed in this type of application.</p><p>Luckily, SQLite implements a feature called run time limits. There are a number of limits you can set. SQLite docs contain a list of suggestions for paranoid people at <a href="https://www.sqlite.org/security.html">https://www.sqlite.org/security.html</a>. In particular, there is a <span style="font-family: courier;">LIMIT_ATTACH</span> which you can set to 0 to disable attaching databases. There is also a more fine grained <a href="https://www.sqlite.org/c3ref/set_authorizer.html">authorizer</a> API which allows setting a permission callback to check things on a per-statement level.</p><p>Unfortunately PHP PDO-SQLITE supports neither of these things. It does set an authorizer if you have <a href="https://www.php.net/manual/en/ini.core.php#ini.open-basedir">open_basedir</a> on to prevent reading/writing outside the basedir, but it exposes no way that I can see for you to set them yourself. This seems really unfortunate. Paranoid people would want to set runtime limits. People who have special use-cases may even want to raise them. I really wish PDO-SQLITE supported setting these, perhaps as a driver specific connection option in the constructor.</p><p>On the bright side, if instead of using the PDO-SQLITE php extension, you are using the alternative sqlite3 extension there is a solution. You still cannot set runtime limits but you can set a custom authorizer:</p><p><span style="font-family: courier;">$db = new SQLite3($dbFileName);<br />$db->setAuthorizer(function ( $action, $filename ) { <br /> return $action === SQLite3::ATTACH ? 
<p>Unfortunately, PHP PDO-SQLITE supports neither of these things. It does set an authorizer if you have <a href="https://www.php.net/manual/en/ini.core.php#ini.open-basedir">open_basedir</a> on, to prevent reading/writing outside the basedir, but it exposes no way that I can see for you to set them yourself. This seems really unfortunate. Paranoid people would want to set runtime limits. People who have special use cases may even want to raise them. I really wish PDO-SQLITE supported setting these, perhaps as a driver-specific connection option in the constructor.</p><p>On the bright side, if instead of using the PDO-SQLITE PHP extension you are using the alternative SQLite3 extension, there is a solution. You still cannot set runtime limits, but you can set a custom authorizer:</p><p><span style="font-family: courier;">$db = new SQLite3($dbFileName);<br />$db->setAuthorizer(function ( $action, $filename ) {<br /> return $action === SQLite3::ATTACH ? SQLite3::DENY : SQLite3::OK;<br />});</span></p><p>After this, if you try to do an ATTACH you get:</p><p><span style="font-family: courier;">Warning: SQLite3::query(): Unable to prepare statement: 23, not authorized in /var/www/html/w/test.php on line 17</span></p><p>Thus success! No evil SQL can possibly write files.<br /></p>

Why don't we ever talk about volunteer PMs in open source? (2022-09-11)

<p>Recently, on Wikipedia, there was an <a href="https://en.wikipedia.org/wiki/Wikipedia:New_pages_patrol/Coordination/2022_WMF_letter">open letter</a> to the Wikimedia Foundation, asking them to improve the New Page Patrol feature.</p><p>This started the usual debate between "WMF should do something" vs "It is open source, {{<a href="https://en.wikipedia.org/wiki/Template:Sofixit">sofixit</a>}}" (i.e. send a patch). There are valid points on both sides of that debate, which I don't really want to get into.</p><p>However, it occurred to me: the people on the {{sofixit}} side always suggest that people should learn how to program (an unreasonable ask), figure out how to fix something, and do it themselves. On the other hand, in a corporate environment, stuff is never done solely by developers. You usually have either a product manager or a program manager organizing the work.</p><p>Instead of telling users to learn PHP and submit a patch, why don't we say: be the PM for the things you want done, so a programmer can easily just do them without getting bogged down with organizational questions?</p><p>At first glance this may sound crazy - after all, ordinary users have no authority. Being a PM is hard enough when people are paid to listen to you; how could it possibly work if nobody has to listen to you? And I agree - not everything a PM does is applicable here, but I think some things are.</p><p>Some things a volunteer could potentially do:</p><ul style="text-align: left;"><li>Make sure that bugs are clearly described with requirements, so a developer can just do them instead of trying to figure out what the users need</li><li>Make sure tasks are broken down into appropriately sized tickets</li><li>Make a plan of what they wish would happen. A volunteer can't force people to follow their plan, but if you have a plan, people may just follow it. Too often all that is present is a big list of bugs of varying priority, which makes it hard for a developer to figure out what is important and what isn't.</li><ul><li>For example, what I mean is breaking things into a few milestones, with each milestone containing a small number (3-5) of tickets around a similar theme. This could then be used to promote the project to volunteer developers, using language like "Help us achieve milestone 2", and to track progress. Perhaps even gamifying things.<br /></li><li>No plan survives contact with the enemy, of course, and the point isn't to stick to any plan religiously. The point is to have a short list of what the most pressing things to work on right now are. Half the battle is figuring out what to work on and what to work on first.</li></ul><li>Coordinate with other groups as needed. Sometimes work might depend on other work that other people have planned. Or perhaps the current work is dependent on someone else's requirements (e.g.
new extensions require security review). Potentially a volunteer PM could help coordinate this or help ensure that everyone is on the same page about expectations and requirements.</li><li>[not sure about this one] Help find constructive code reviewers. In MediaWiki development, code much be reviewed by another developer to be merged in. Finding knowledgeable people can often be difficult and a lot of effort. Sometimes this comes down to personal relationships and politely nagging people until someone bites. For many developers this is a frustrating part of the software development process. Its not clear how productive a non-developer would be here, as you may need to understand the code to know who to talk to. Nonetheless, potentially this is something a non-programmer volunteer can help with.</li></ul><p>To use the new page patrol feature as an example - Users have a list of <a href="https://en.wikipedia.org/wiki/Wikipedia:Page_Curation/Suggested_improvements">56 feature requests</a>. There's not really any indication of which ones are more important then others. A useful starting point would be to select the 3 most important. There are plenty of volunteer developers in the MediaWiki ecosystem that might work on them. The less time they have to spend figuring out what is wanted, the more likely they might fix one of the things. There are no guarantees of course, but it is a thing that someone who is not a programmer could do to move things forward.<br /></p><p> To be clear, being a good PM is a skill - all of this is hard and takes practice to be good at. People who have not done it before won't be good at it to begin with. But I think it is something we should talk about more, instead of the usual refrain of fix it yourself or be happy with what you got.</p><p><br /></p><p>p.s. None of this should be taken as saying that WMF shouldn't fix anything and it should only be up to the communities, simply that there are things non-programmers could do to {{sofixit}} if they were so inclined.<br /></p><p> <br /></p>Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-52348232608889471462022-07-20T08:07:00.001-07:002022-07-20T08:08:22.108-07:00Interviewed on Between the brackets<p> This week I was interviewed by Yaron Karon for the second time for his MediaWiki podcast <i><a href="https://betweenthebrackets.libsyn.com/">Between the Brackets</a>.</i></p><p>Yaron has been doing this podcast for several years now, and I love how he highlights the different voices of all the different groups that use, interact and develop MediaWiki. He's had some fascinating people on his podcast over the years, and I highly reccomend giving it a listen.</p><p>Anyhow, it's an honour to be on the program again for <a href="https://betweenthebrackets.libsyn.com/episode-117-brian-wolff">episode 117</a>. I was previously on the program 4 years ago for <a href="https://betweenthebrackets.libsyn.com/episode-5-brian-wolff">episode 5</a></p><p><br /></p>Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-55546410897158321272022-07-12T01:46:00.003-07:002022-07-12T01:46:57.432-07:00Making Instant Commons Quick<p> The Wikimedia family of websites includes one known as Wikimedia Commons. Its mission is to collect and organize freely licensed media so that other people can re-use them. 
More pragmatically, it collects all the files needed by different language Wikipedias (and other Wikimedia projects) into one place.</p><p> </p><p></p><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto;"><tbody><tr><td style="text-align: center;"><a href="https://commons.wikimedia.org/wiki/File:Alcedo_atthis_-_Riserve_naturali_e_aree_contigue_della_fascia_fluviale_del_Po.jpg" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="427" data-original-width="640" height="427" src="https://upload.wikimedia.org/wikipedia/commons/thumb/0/01/Alcedo_atthis_-_Riserve_naturali_e_aree_contigue_della_fascia_fluviale_del_Po.jpg/640px-Alcedo_atthis_-_Riserve_naturali_e_aree_contigue_della_fascia_fluviale_del_Po.jpg" width="640" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;"><a href="https://commons.wikimedia.org/wiki/File:Alcedo_atthis_-_Riserve_naturali_e_aree_contigue_della_fascia_fluviale_del_Po.jpg">The 2020 Wikimedia Commons Picture of the Year: Common Kingfisher</a><a href="https://commons.wikimedia.org/wiki/File:Alcedo_atthis_-_Riserve_naturali_e_aree_contigue_della_fascia_fluviale_del_Po.jpg"> by <i></i></a><i><a title="User:Luca Casale">Luca Casale</a> / <a class="extiw" title="creativecommons:by-sa/4.0">CC BY SA 4.0</a></i></td></tr></tbody></table><br /> As you can imagine, it's extremely useful to have a library of freely licensed photos that you can just use to illustrate your articles.<p>However, it is not just useful for people writing encyclopedias. It is also useful for any sort of project.<br /></p><p>To take advantage of this, MediaWiki, the software that powers Wikipedia and friends, comes with a feature to use this collection on your own Wiki. It's an option you can select when installing the software and is quite popular. Alternatively, it can be manually configured via <a href="https://www.mediawiki.org/wiki/Manual:$wgUseInstantCommons">$wgUseInstantCommons</a> or the more advanced <a href="https://www.mediawiki.org/wiki/Manual:$wgForeignFileRepos">$wgForeignFileRepos</a>.</p><h2 style="text-align: left;">The Issue</h2><p style="text-align: left;">Unfortunately, instant commons has a reputation for being rather slow. As a weekend project I thought I'd measure how slow, and see if I could make it faster.</p><h2 style="text-align: left;">How Slow?</h2><p style="text-align: left;">First things first, I'll need a test page. Preferably something with a large (but not extreme) number of images but not much else. A Wikipedia list article sounded ideal. I ended up using the English Wikipedia article: <a href="https://en.wikipedia.org/w/index.php?title=List_of_governors_general_of_Canada&oldid=1054426240">List of Governors General of Canada</a> (Long live the Queen!). 
This has 85 images and not much else, which seemed perfect for my purposes.<br /></p><p style="text-align: left;">I took the expanded Wikitext from <a href="https://en.wikipedia.org/w/index.php?title=List_of_governors_general_of_Canada&oldid=1054426240&action=raw&templates=expand">https://en.wikipedia.org/w/index.php?title=List_of_governors_general_of_Canada&oldid=1054426240&action=raw&templates=expand</a>, pasted it into my test wiki with instant commons turned on in the default config.</p><p style="text-align: left;">And then I waited...</p><p style="text-align: left;">Then I waited some more...</p><p style="text-align: left;">1038.18761 seconds later (17 minutes, 18 seconds) I was able to view a beautiful list of all my viceroys.</p><p style="text-align: left;">Clearly that's pretty bad. 85 images is not a small number, but it is definitely not a huge number either. Imagine how long [[<a href="https://en.wikipedia.org/wiki/Comparison_of_European_road_signs">Comparison_of_European_road_signs</a>]] would take with its 3643 images or [[<a href="https://en.wikipedia.org/wiki/List_of_paintings_by_Claude_Monet">List_of_paintings_by_Claude_Monet</a>]] with 1676.</p><h2 style="text-align: left;">Why Slow?</h2><p style="text-align: left;">This raises the obvious question of why is it so slow. What is it doing for all that time?</p><p style="text-align: left;">When MediaWiki turns wikitext into html, it reads through the text. When it hits an image, it stops reading through the wikitext and looks for that image. Potentially the image is cached, in which case it can go back to rendering the page right away. Otherwise, it has to actually find it. First it will check the local DB to see if the image is there. If not it will look at Foreign image repositories, such as Commons (if configured).</p><p style="text-align: left;">To see if commons has the file we need to start making some HTTPS requests¹:</p><ol style="text-align: left;"><li>We make a metadata request to see if the file is there and get some information about it: <a href="https://commons.wikimedia.org/w/api.php?titles=File%3AExample.png&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata%7Cmime%7Cmediatype%7Cextmetadata&prop=imageinfo&iimetadataversion=2&iiextmetadatamultilang=1&format=json&action=query&redirects=true&uselang=en">https://commons.wikimedia.org/w/api.php?titles=File%3AExample.png&iiprop=timestamp%7Cuser%7Ccomment%7Curl%7Csize%7Csha1%7Cmetadata%7Cmime%7Cmediatype%7Cextmetadata&prop=imageinfo&iimetadataversion=2&iiextmetadatamultilang=1&format=json&action=query&redirects=true&uselang=en</a></li><li> We make an API request to find the url for the thumbnail of the size we need for the article. For commons, this is just to find the url, but on wikis with <a href="https://www.mediawiki.org/wiki/Manual:Thumb_handler.php">404 thumbnail handling</a> disabled, this is also needed to tell the wiki to generate the file we will need: <a href="https://commons.wikimedia.org/w/api.php?titles=File%3AExample.png&iiprop=url%7Ctimestamp&iiurlwidth=300&iiurlheight=-1&iiurlparam=300px&prop=imageinfo&format=json&action=query&redirects=true&uselang=en">https://commons.wikimedia.org/w/api.php?titles=File%3AExample.png&iiprop=url%7Ctimestamp&iiurlwidth=300&iiurlheight=-1&iiurlparam=300px&prop=imageinfo&format=json&action=query&redirects=true&uselang=en</a></li><li> Some devices now have very high resolution screens. Screen displays are made up of dots. 
High resolution screens have more dots per inch, and thus can display more fine detailed. Traditionally 1 pixel equalled one dot on the screen. However if you keep that while increasing the dots-per-inch, suddenly everything on the screen that was measured in pixels is very small and hard to see. Thus these devices now sometimes have 1.5 dots per pixel, so they can display fine detail without shrinking everything. To take advantage of this, we use an image 1.5 times bigger than we normally would, so that when it is displayed in its normal size, we can take advantage of the extra dots and display a much more clear picture. Hence we need the same image but 1.5x bigger: <a href="https://commons.wikimedia.org/w/api.php?titles=File%3AExample.png&iiprop=url%7Ctimestamp&iiurlwidth=450&iiurlheight=-1&iiurlparam=450px&prop=imageinfo&format=json&action=query&redirects=true&uselang=en">https://commons.wikimedia.org/w/api.php?titles=File%3AExample.png&iiprop=url%7Ctimestamp&iiurlwidth=450&iiurlheight=-1&iiurlparam=450px&prop=imageinfo&format=json&action=query&redirects=true&uselang=en</a></li><li>Similarly, some devices are even higher resolution and use 2 dots per pixel, so we also fetch an image double the normal size: <a href="https://commons.wikimedia.org/w/api.php?titles=File%3AExample.png&iiprop=url%7Ctimestamp&iiurlwidth=600&iiurlheight=-1&iiurlparam=600px&prop=imageinfo&format=json&action=query&redirects=true&uselang=en">https://commons.wikimedia.org/w/api.php?titles=File%3AExample.png&iiprop=url%7Ctimestamp&iiurlwidth=600&iiurlheight=-1&iiurlparam=600px&prop=imageinfo&format=json&action=query&redirects=true&uselang=en</a></li></ol><p> </p><p>This is the <b>first problem</b> - for every image we include we have to make 4 api requests. If we have 85 images that's 340 requests.</p><h2 style="text-align: left;">Latency and RTT <br /></h2><p>It gets worse. All of these requests are done in serial. Before doing request 2, we wait until we have the answer to request 1. Before doing request 3 we wait until we get the answer to request 2, and so on.</p><p>Internet speed can be measured in two ways - latency and bandwidth. Bandwidth is the usual measurement we're familiar with: how much data can be transferred in bulk - e.g. 10 Mbps.<br /></p><p>Latency, ping time or round-trip-time (RTT) is another important measure - it's how long it takes your message to get somewhere and come back.</p><p>When we start to send many small messages in serial, latency starts to matter. How big your latency is depends on how close you are to the server you're talking to. For Wikimedia Commons, the data-centers (DCs) are located in San Francisco (<a href="https://wikitech.wikimedia.org/wiki/Ulsfo_cluster">ulsfo</a>), Virginia (<a href="https://wikitech.wikimedia.org/wiki/Eqiad_cluster">eqiad</a>), Texas (<a href="https://wikitech.wikimedia.org/wiki/Codfw_cluster">codfw</a>), Singapore (<a href="https://wikitech.wikimedia.org/wiki/Eqsin_cluster">eqsin</a>) and Amsterdam (<a href="https://wikitech.wikimedia.org/wiki/Esams_cluster">esams</a>). For example, I'm relatively close to SF, so my ping time to the SF servers is about 50ms. For someone with a 50ms ping time, all this back and forth will take at a minimum 17 seconds just from latency.</p><p>However, it gets worse; Your computer doesn't just ask for the page and get a response back, it has to setup the connection first (TCP & TLS handshake). This takes additional round-trips.</p><p>Additionally, not all data centers are equal. 
The Virginia data-center (eqiad)² is the main data center which can handle everything, the other DCs only have varnish servers and can only handle cached requests. This makes browsing Wikipedia when logged out very speedy, but the type of API requests we are making here cannot be handled by these caching DCs³. For requests they can't handle, they have to ask the main DC what the answer is, which adds further latency. When I tried to measure mine, i got 255ms, but I didn't measure very rigorously, so I'm not fully confident in that number. In our particular case, the TLS & TCP handshake are handled by the closer DC, but the actual api response has to be fetched all the way from the DC in Virginia.</p><p>But wait, you might say: Surely you only need to do the TLS & TCP setup once if communicating to the same host. And the answer would normally be yes, which brings us to <b>major problem #2:</b> Each connection is setup and tore down independently, requiring us to re-establish the TCP/TLS session each time. This adds 2 additional RTT. In our 85 image example, we're now up to 1020 round-trips. If you assume 50ms to caching DC and 255ms to Virginia (These numbers are probably quite idealized, there are probably other things I'm not counting), we're up to 2 minutes.<br /></p><p>To put it altogether, here is a diagram representing all the back and forth communication needed just to use a single image:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVq5038lwvYr9QVkoarOsTeJCBGZ8VEOoC0BSV739Wctye0Dv3eMKFK9mvYgF-NC_qBe7d-sjtjptnl5L13e5_bySfQk7pbyL4ZEDKIBXxSW1T2Kajo--DbaR_redOzuWV2szNsDSPFss7CMZ66OqDp3NF0Iy9x9XHFi_qRMNuQauSWd13M4t4Pfb3/s1169/Commons-fetch-diagram.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="1169" data-original-width="450" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiVq5038lwvYr9QVkoarOsTeJCBGZ8VEOoC0BSV739Wctye0Dv3eMKFK9mvYgF-NC_qBe7d-sjtjptnl5L13e5_bySfQk7pbyL4ZEDKIBXxSW1T2Kajo--DbaR_redOzuWV2szNsDSPFss7CMZ66OqDp3NF0Iy9x9XHFi_qRMNuQauSWd13M4t4Pfb3/s16000/Commons-fetch-diagram.png" /></a></div>12 RTT per image used! This is assuming TLS 1.3. Earlier versions of TLS would be even worse.<br /><h2 style="text-align: left;">Introducing HTTP/2</h2><p style="text-align: left;">In 2015, HTTP/2 came on the scene. This was the first major revision to the HTTP protocol in almost 20 years.</p><p style="text-align: left;">The primary purpose of this revision of HTTP, was to minimize the effect of latency when you are requesting many separate small resources around the same time. It works by allowing a single connection to be reused for many requests at the same time and allowing the responses to come in out of order or jumbled together. In HTTP/1.1 you can sometimes be stuck waiting for some request to finish before being allowed to start on the next one (Head of line blocking) resulting in inefficient use of network resources<br /></p><p style="text-align: left;">This is exactly the problem that instant commons was having.</p><p style="text-align: left;">Now I should be clear, instant commons wasn't using HTTP/1.1 in a very efficient way, and it would be possible to do much better even with HTTP/1.1. 
However, HTTP/2 will still be that much better than what an improved usage of HTTP/1.1 would be.</p><p style="text-align: left;">Changing instant commons to use HTTP/2 changed two things:</p><ol style="text-align: left;"><li>Instead of creating a new connection each time, with multiple round trips to set up TCP and TLS, we just use a single HTTP/2 connection that only has to do the setup once.</li><li>If we have multiple requests ready to go, send them all off at once instead of having to wait for each one to finish before sending the next one.</li></ol><p>We still can't do all requests at once, since the MediaWiki parser is serial, and it stops parsing once we hit an image, so we need to get information about the current image before we will know what the next one we need is. However, this still helps as for each image, we send 4 requests (metadata, thumbnail, 1.5dpp thumbnail and 2dpp thumbnail), which we can now send in parallel.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0bpoziioj1KJ-CLz41VClxhKSZEqi-73BBCeJKy9P6SHjVZL0yRJxVWQZPq2QhqYQjWUMIZ8QQOEdFGCXbhJtQwcTNp2c4GNg77r1_oWJyfY1LwRN2ATZJlL5hCcniXjQfID04k9KsX_KA9JxvoYeGlzy6d-ot6OfRphuy5pT0UWZPXGTzLDy6WIX/s832/Commons-fetch-diagram-http2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="832" data-original-width="450" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0bpoziioj1KJ-CLz41VClxhKSZEqi-73BBCeJKy9P6SHjVZL0yRJxVWQZPq2QhqYQjWUMIZ8QQOEdFGCXbhJtQwcTNp2c4GNg77r1_oWJyfY1LwRN2ATZJlL5hCcniXjQfID04k9KsX_KA9JxvoYeGlzy6d-ot6OfRphuy5pT0UWZPXGTzLDy6WIX/s16000/Commons-fetch-diagram-http2.png" /></a></div><br /><p>The results are impressive for such a simple change. Where previously my test page took 17 minutes, now it only takes 2 (139 seconds).<br /></p><p><br /></p><h2 style="text-align: left;">Transform via 404</h2><p style="text-align: left;">In vanilla MediaWiki, you have to request a specific thumbnail size before fetching it; otherwise it might not exist. This is not true on Wikimedia Commons. If you fetch a thumbnail that doesn't exist, Wikimedia Commons will automatically create it on the spot. MediaWiki calls this feature "TransformVia404".</p><p style="text-align: left;">In instant commons, we make requests to create thumbnails at the appropriate sizes. This is all pointless on a wiki where they will automatically be created on the first attempt to fetch them. We can just output <img> tags, and the first user to look at the page will trigger the thumbnail creation. Thus skipping 3 of the requests.</p><p style="text-align: left;">Adding this optimization took the time down from 139 seconds with just HTTP/2 to 18.5 seconds with both this and HTTP/2. 
This is 56 times faster than what we started with!</p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiw8XiEj0JYaYpJAGtvlxXmxLu5tJAtBpcNFNi2De-wq9dcRNLepPrGJF61G3_K5s4Q8vDraVa_qP4Av59aojQIYki8xldIeMin8pJXad5DPb6itWJ3I3nbigOoozjkjmrVlCrDNHhBcRtM6plgKap3_RepODOLdciz9xSVHCwrh6nQns1wkm4hIwW9/s495/Commons-fetch-diagram-404.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="495" data-original-width="450" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiw8XiEj0JYaYpJAGtvlxXmxLu5tJAtBpcNFNi2De-wq9dcRNLepPrGJF61G3_K5s4Q8vDraVa_qP4Av59aojQIYki8xldIeMin8pJXad5DPb6itWJ3I3nbigOoozjkjmrVlCrDNHhBcRtM6plgKap3_RepODOLdciz9xSVHCwrh6nQns1wkm4hIwW9/s16000/Commons-fetch-diagram-404.png" /></a></div><br /><p style="text-align: left;"><br /></p><h2 style="text-align: left;">Prefetching</h2><p style="text-align: left;">18.5 seconds is pretty good. But can we do better?</p><p style="text-align: left;">We might not be able to if we actually have to fetch all the images, but there is a pattern we can exploit.</p><p style="text-align: left;">Generally when people edit an article, they might change a sentence or two, but often don't alter the images. Other times, MediaWiki might re-parse a page, even if there are no changes to it (e.g. Due to a cache expiry). As a result, often the set of images we need is the same or close to the set that we needed for the previous version of the page. This set is already recorded in the database in order to display what pages use an image on the image description page<br /></p><p style="text-align: left;">We can use this. First we retrieve this list of images used on the (previous version) of the page. We can then fetch all of these at once, instead of having to wait for the parser to tell us one at a time which image we need.</p><p style="text-align: left;">It is possible of course, that this list could be totally wrong. Someone could have replaced all the images on the page. If it's right, we speed up by pre-fetching everything we need, all in parallel. If it's wrong, we fetched some things we didn't need, possibly making things slower than if we did nothing.</p><p style="text-align: left;">I believe in the average case, this will be a significant improvement. Even in the case that the list is wrong, we can send off the fetch in the background while MediaWiki does other page processing - the hope being, that MediaWiki does other stuff while this fetch is running, so if it is fetching the wrong things, time is not wasted.</p><p style="text-align: left;">On my test page, using this brings the time to render (Where the previous version had all the same images) down to 1.06 seconds. A <b>980 times</b> speed improvement! It should be noted, that this is time to render in total, not just time to fetch images, so most of that time is probably related to rendering other stuff and not instant commons.</p><h2 style="text-align: left;">Caching</h2><p style="text-align: left;">All the above is assuming a local cache miss. It is wasteful to request information remotely, if we just recently fetched it. It makes more sense to reuse information recently fetched.</p><p style="text-align: left;">In many cases, the parser cache, which in MediaWiki caches the entire rendered page, will mean that instant commons isn't called that often. 
However, some extensions that create dynamic content make the parser cache very short lived, which makes caching in instant commons more important. It is also common for people to use the same images on many pages (e.g. A warning icon in a template). In such a case, caching at the image fetching layer is very important.</p><p style="text-align: left;">There is a downside though, we have no way to tell if upstream has modified the image. This is not that big a deal for most things. Exif data being slightly out of date does not matter that much. However, if the aspect ratio of the image changes, then the image will appear squished until InstantCommons' cache is cleared.</p><p style="text-align: left;">To balance these competing concerns, Quick InstantCommons uses an adaptive cache. If the image has existed for a long time, we cache for a day (configurable). After all, if the image has been stable for years, it seems unlikely it is going to be edited in very soon. However, if the image has been edited recently, we use a dynamically determined shorter time to live. The idea being, if the image was edited 2 minutes ago, there is a much higher possibility that it might be edited a second time. Maybe the previous edit was vandalism, or maybe it just got improved further.</p><p style="text-align: left;">As the cache entry for an image begins to get close to expiring, we refetch it in the background. The hope is that we can use the soon to be expired version now, but as MediaWiki is processing other things, we refetch in background so that next time we have a new version, but at the same time we don't have to stall downloading it when MediaWiki is blocked on getting the image's information. That way things are kept fresh without a negative performance impact.</p><p style="text-align: left;">MediaWiki's built-in instant commons did support caching, however it wasn't configurable and the default time to live was very low. Additionally, the adaptive caching code had a bug in it that prevented it from working correctly. The end result was that often the cache could not be effectively used.</p><h2 style="text-align: left;">Missing MediaHandler Extensions</h2><div style="text-align: left;">In MediaWiki's built-in InstantCommons feature, you need to have the same set of media extensions installed to view all files. For example, PDFs won't render via instant commons without Extension:PDFHandler.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">This is really unnecessary where the file type just renders to a normal image. After all, the complicated bit is all on the other server. My extension fixes that, and does its best to show thumbnails for file types it doesn't understand. It can't support advanced features without the appropriate extension e.g. navigating in 3D models, but it will show a static thumbnail.<br /></div><h2 style="text-align: left;">Conclusion</h2><p style="text-align: left;">In the end, by making a few, relatively small changes, we were able to improve the performance of instant commons significantly. 980 times as fast!</p><p style="text-align: left;">Do you run a MediaWiki wiki? Try out the <a href="https://www.mediawiki.org/wiki/Extension:QuickInstantCommons">extension</a> and let me know what you think.<br /></p><h4 style="text-align: left;">Footnotes:</h4><p>¹ This is assuming default settings and an [object] cache miss. 
This may be different if $<span class="searchmatch">wg</span><span class="searchmatch">Responsive</span><span class="searchmatch">Images is false in which case high-DPI images won't be fetched, or if </span><span class="searchmatch">apiThumbCacheExpiry is set to non-zero in which case thumbnails will be downloaded locally to the wiki server during the page parse instead of being hotlinked.</span></p><p><span class="searchmatch"><br />² This role actually rotates between the Virginia & Texas data center. Additionally, the Texas DC (when not primary) does do some things that the caching DCs don't that isn't particularly relevant to this topic. There are eventual plans to have multiple active DCs which all would be able to respond to the type of API queries being made here, but they are not complete as of this writing - <a href="https://www.mediawiki.org/wiki/Wikimedia_Performance_Team/Active-active_MediaWiki">https://www.mediawiki.org/wiki/Wikimedia_Performance_Team/Active-active_MediaWiki</a></span></p><p><span class="searchmatch"><br />³ The MediaWiki API actually supports an smaxage=<number of seconds> (shared maximum age) url parameter. This tells the API server you don't care if your request is that many seconds out of date, and to serve it from varnish caches in the local caching data center if possible. Unlike with normal Wikipedia page views, there is no cache invalidation here, so it is rarely used and it is not used by instant commons.</span> </p><p><br /></p><br /><p><br /></p><p><br /></p><p></p>Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-27980527243763699182022-05-01T12:23:00.001-07:002022-05-01T12:23:49.835-07:00Finding if an item is a list of ranges stored in a DB<p> Recently, I've decided to experiment with making a reporting extension for MediaWiki (Working name "Sofa") as a side project. There are already several options already in this space, notably <a href="https://www.mediawiki.org/wiki/Extension:Cargo">Cargo</a>, <a href="https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki">Semantic MediaWiki</a> (SMW) and <a href="https://www.mediawiki.org/wiki/Extension:DynamicPageList">DynamicPageLst</a> (DPL). However, I think the design space is still a little under explored. My goal is to experiment with some rather different design choices, and see what I end up with. In particular - I want to make one that respects the core wiki philosophy of "quick" - e.g. changes are reflected immediately, and "<a href="http://meatballwiki.org/wiki/SoftSecurity">soft-security</a>", where nothing users can do, whether accidentally or maliciously, cause any (performance) harm but can simply be reverted. Current solutions often require manual cache purging to see changes reflected and can have unpredictable user-dependent performance characteristics where subtle user choices in extreme cases could even cause a DB overload.<br /></p><p>I don't want to talk too much about the new extension generally, as that's not what this blog post is about and I am still in the early stages. However one design requirement that left me in a bit of a conundrum is the automatic cache purging one. In the model for this extension, we have ordered lists of items and pages that display a range of items from the list. 
In order to support cache purging when someone adds a new entry that would appear in a used range, we need some way to store what ranges of items are used by what pages so that given a specific item, we can query which pages use a range containing that item. This turned out to be surprisingly difficult. I thought I'd write a post about the different methods I considered. </p><h2 style="text-align: left;">The naive approach</h2><p style="text-align: left;">For example we might have the following list of alphabetically ordered items</p>
<table border="1">
<thead><tr><th>id</th><th>item</th></tr></thead><tbody>
<tr><td>17</td><td>Alpha</td></tr>
<tr><td>12</td><td>Atom</td></tr>
<tr><td>21</td><td>Beta</td></tr>
<tr><td>34</td><td>Bobcat</td></tr>
<tr><td>8</td><td>Bonsai Tree</td></tr>
</tbody></table>
<p style="text-align: left;">And a page might want to include all items between Amoeba and Bobcat (Note that Amoeba is not actually in the list).</p><p style="text-align: left;">In this example, we need someway to record that the page is using items between Amoeba and Bobcat, so if someone inserts Badger into the list, the page using the list gets refreshed.</p><p style="text-align: left;">The natural way of doing this would be a MariaDB table like the following:</p>
<table border="1">
<caption>list_cache table</caption>
<thead><tr><th>id</th><th>page_id</th><th>list_start</th><th>list_end</th></tr></thead>
<tbody><tr><td>1</td><td>1234</td><td>Amoeba</td><td>Bobcat</td></tr></tbody>
</table>
<p style="text-align: left;">Along with 2 indicies: one on <i>list_start</i> and the other on <i>list_end</i>.<sup>[1]</sup><br /></p><p style="text-align: left;">In the event <i>Badger</i> gets added to the list, we would need to query what pages to refresh. In this structure we would use a query like: <span style="font-family: courier;">SELECT page_id FROM list_cache WHERE "Badger" BETWEEN list_start AND list_end;</span> </p><p style="text-align: left;">This all works. However there is one problem, to answer that query the DB has to look at a number of rows beside the rows we're interested in. For best performance, we ideally want to make the database look at as few rows as possible to answer our query. Ideally it would only look at rows that are relevant to the answer.</p><p style="text-align: left;">An index in the database is an ordered list of rows from a table - like a phonebook. In this case we have two - one for the start of the range and one for the end. In essence we want information from both, however that's not how it works.</p><p style="text-align: left;"><span style="font-family: courier;">"'badger' BETWEEN list_start AND list_end"<span style="font-family: georgia;"> is really just a fancy way of saying <span style="font-family: courier;">list_start < 'badger' AND list_end > 'badger'.<span style="font-family: georgia;"> The RDBMS then has a choice to make: It can use the list_start index, go to the spot in that index where badger would be, and then work its way down to the bottom of the list, checking each row individually for if the second condition</span></span> is true. Alternatively it can pull up the <i>list_end</i> index and do the same thing but reversed.</span></span></p><p style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;">However, once it picks an index and goes to it starting place, it has to look at every entry until it hits the beginning (Respectively end) of the index. If the starting place is in the middle of the index, it would have to look at roughly half the rows in order to answer the query. If this system was used at scale and the table had 2 million rows, the database might have to look at a million rows, just to find the 3 that are matching. This is definitely not ideal</span></span></p><p style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;"> </span></span></p><p style="text-align: left;"><span style="font-family: courier; font-size: x-small;"><span style="font-family: georgia;">[1] </span></span><span style="font-family: courier; font-size: x-small;"><span style="font-family: georgia;">Potentially you could do a compound index on (list_start,list_end) and
(list_end,list_start) instead which allows the DB to look directly at the index instead of constantly looking up each row in the underlying table. I'm unclear on if the benefit of index-condition-pushdown outweighs the
increased index size, I suspect it would mildly, but either way things are still rather inefficient. </span></span></p><h2 style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;">Interval trees</span></span></h2><p style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;">At this point, like any good programmer, I started googling and trawling stack overflow. Eventually I stumbled upon the article "<a href="https://blogs.solidq.com/en/sqlserver/static-relational-interval-tree/">A Static Relational Interval Tree</a>" by Laurent Martin on the SolidQ blog.</span><span style="font-family: georgia;"></span></span></p><p style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;">At first glance this seemed the exact solution I was looking for. On second glance - not so much. This is a really cool technique; unfortunately it scales with the number of bits needed to represent the data type you are using. If I was looking at ranges of integers, dates or IP addresses, this would be perfect. Unfortunately I am using strings which take a large number of bits. Nonetheless, I'm going to describe the technique, as I think its really cool, and should be more widely known.</span></span></p><p style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;">The core idea, is to separate the different ranges into a number of buckets. Each bucket can be queried separately but efficiently - Any row that the DB has to look at, is a row that would match the query. In order to find all the ranges that contain a specific point, you just need to query all the applicable buckets. The number of buckets that are applicable in the worst case is the same as the number of bits needed to represent the data type (hence the applicability to ranges of ints but not long strings).</span></span></p><p style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;">The end result is: if you are using ranges of 32-bit integer, you would have to do 32 really efficient separate queries (or 1 query after UNIONing them together). For a large DB, this is much better than 1 inefficient query that might read millions of rows or more.</span></span></p><p style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;">For simplicity, in the explanation I will use 4 bit integers (0-16).<br /></span></span></p><h3 style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;">Buckets</span></span></h3><p style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;">So, what are these buckets? They are simply the shared binary prefix of the two ends of the range with a 1 followed by 0's appended to fill out the rest of the binary number. Here are some examples:</span></span></p>
<table border="1">
<thead><tr><th>Range start</th><th>Range end</th><th>Shared prefix</th><th>Bucket</th></tr></thead>
<tbody>
<tr><td>10 (<b>1</b>010<sub>b</sub>)</td><td>13 (<b>1</b>101<sub>b</sub>)</td><td>1</td><td>12 (1100<sub>b</sub>)</td></tr>
<tr><td>9 (<b>10</b>01<sub>b</sub>)</td><td>10 (<b>10</b>10<sub>b</sub>)</td><td>10</td><td>10 (1010<sub>b</sub>)</td></tr>
<tr><td>9 (<b>1001</b><sub>b</sub>)</td><td>9 (<b>1001</b><sub>b</sub>)</td><td>1001</td><td>9 (1001<sub>b</sub>)</td></tr>
</tbody>
</table>
<p>In the database, we can use virtual columns to have mariaDB manage this for us, using the formula <span style="font-family: courier;">range_end - range_end % POWER(2, FLOOR(LOG( (range_start - 1) ^ range_end)/LOG(2)))</span></p>
<pre>CREATE TABLE ranges (
id int unsigned PRIMARY KEY AUTO_INCREMENT,
range_start int NOT NULL,
range_end int NOT NULL,
bucket int as (range_end - range_end % POWER(2, FLOOR(LOG( (range_start - 1) ^ range_end)/LOG(2))))
);
CREATE INDEX start on ranges (bucket, range_start);
CREATE INDEX end on ranges (bucket, range_end);
</pre>
<p>Now we can just insert things like normal, and MariaDB will take care of calculating the bucket number.</p><h3 style="text-align: left;">Querying<br /></h3><p>This has the following useful property: Any range assigned to bucket number <i>n</i> must contain the value <i>n</i>. For example, bucket 12 might have the range [10,13] in it, or [12,12], all of which contain the point 12. It would not be able to have the range [10,11] as that does not go through the point 12.<br /></p><p>This is useful, since if we have a specific number of interest, we can easily query the bucket to see if it is in there. For example, if we wanted to know all the ranges in bucket 12 containing 9, we can query <span style="font-family: courier;">WHERE bucket = 12 AND range_start <= 9;</span> since if a range is in bucket 12, it must contain the value 12, so any range that starts before or equal 9, must go through 9 in order to get to 12. If we have an index on (bucket, range_start), this is a really efficient query that will only look at the rows that are of interest. Similarly, if the number of interest is greater than the bucket number, we do the same thing, just with <span style="font-family: courier;">range_end >=</span> instead.</p><p>Now that we know how to get all the relevant ranges from one bucket efficiently, we just need to figure out what all the relevant buckets are, and we have a solution (remembering that at most log of the data type number of buckets need to be consulted).</p><h3 style="text-align: left;">Finding the buckets<br /></h3><p>To find the buckets, we simply take the first x bits of the binary representation and append 1 followed by 0s to fill out the number, for all x.</p><p>For example, if we wanted to find all the buckets that are relevant to ranges containing 9 (1001<sub>b</sub>):</p>
<table border="1">
<caption>buckets that can contain 9 (1001<sub>b</sub>)</caption>
<thead><tr><th>bits masked</th><th>prefix</th><th>bucket</th><th>top or bottom</th></tr></thead>
<tbody>
<tr><td>0</td><td><br /></td><td>8 (1000)</td><td>top</td></tr>
<tr><td>1</td><td>1<sub>b</sub></td><td>12 (<b>1</b>100)</td><td>bottom</td></tr>
<tr><td>2</td><td>10<sub>b</sub></td><td>10 (<b>10</b>10)</td><td>bottom</td></tr>
<tr><td>3</td><td>100<sub>b</sub></td><td>9 (<b>100</b>1)</td><td>top (either)</td></tr>
</tbody>
</table>
<p>We would have to look at buckets 8,12,10 and 9. If the bit immediately after the prefix is a 1 (or equivalently if our point >= bucket) we have to look at the top of the bucket, otherwise, we look at the bottom. By this, we mean whether the query is range_end <= point or range_start >= point respectively.</p><p>So with that in mind, in our example we would make a query like:</p>
<pre>SELECT id, range_start, range_end FROM ranges WHERE bucket = 8 and range_end <= 9
UNION ALL SELECT id, range_start, range_end FROM ranges WHERE bucket = 12 and range_start >= 9
UNION ALL SELECT id, range_start, range_end FROM ranges WHERE bucket = 10 and range_start >= 9
UNION ALL SELECT id, range_start, range_end FROM ranges WHERE bucket = 9 and range_end <= 9;
</pre>
<p>Which is a union of queries that each individually are very efficient.</p><p>To generate this, you might use code like the following (in php):</p>
<pre>define( "MAX_BITS", 4 );
function makeQuery ( $target ) {
$mask = 0;
$query = '';
for ( $i = MAX_BITS-1; $i >= 0; $i-- ) {
$mask = $mask | (1>>$i);
$tmpMask = $mask ^ (1>>$i);
$pad = 1 >> $i;
$bucket = ($target & $tmpMask) | $pad;
$query .= "SELECT id, range_start, range_end FROM ranges WHERE bucket = $bucket ";
if ( $target < $bucket ) {
$query .= "AND range_start >= $target ";
} else {
$query .= "AND range_end <= $target ";
}
if ( $i != 0 ) {
$query .= "\nUNION ALL ";
}
}
return $query . ";";
}
</pre>
<p> </p><p> <br /></p><span><a name='more'></a></span><p>Unfortunately this technique didn't work for me, as I am using ranges of strings. Nonetheless I was very happy to learn about it as it is quite a cool technique. It can also be easily extended to find intersecting ranges instead of just points in a range. For more information I encourage the interested reader to read the articles "<a href="https://blogs.solidq.com/en/sqlserver/static-relational-interval-tree/">A Static Relational Interval Tree</a>" by Laurent Martin, as well as "<a href="https://www.dbs.ifi.lmu.de/Publikationen/Papers/VLDB2000.pdf">Managing Intervals Efficiently in Object-Relational Databases</a>" by Kriegel et al.</p><p><br /></p><h2 style="text-align: left;">Geospatial (R-Tree) index</h2><p style="text-align: left;">Newer versions of MariaDB support spatial indexes. This is meant to support 2-D coordinates on a globe. If our ranges only involved integers, we would be able to (ab)use this for the 1-dimensional case. We simply use LineStrings that have only two points in them, and always have a y value of 0.<br /></p><p style="text-align: left;"> </p><p style="text-align: left;"><span style="font-family: courier;">CREATE TABLE geo_ranges (</span></p><p style="text-align: left;"><span style="font-family: courier;"> id int unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,</span></p><p style="text-align: left;"><span style="font-family: courier;"> ranges LINESTRING NOT NULL, SPATIAL INDEX (ranges)</span></p><p style="text-align: left;"><span style="font-family: courier;">);</span></p><p style="text-align: left;"><span style="font-family: courier;"><span style="font-family: georgia;">Then we just insert some ranges:</span></span></p><p style="text-align: left;"><span style="font-family: courier;">INSERT INTO geo_ranges (ranges) VALUES</span></p><p style="text-align: left;"><span style="font-family: courier;"> (LineString(Point(17,0),Point(34,0))),</span></p><p style="text-align: left;"><span style="font-family: courier;"> (LineString(Point(28,0),Point(40,0))),</span></p><p style="text-align: left;"><span style="font-family: courier;"> (LineString(Point(50,0),Point(52,0))); </span><br /></p><p style="text-align: left;">Now we can query it using the ST_INTERSECTS function:</p><p style="text-align: left;"><span style="font-family: courier;">SELECT id, x(startpoint(ranges)) AS 'range_start', x(endpoint(ranges)) AS 'range_end'</span></p><p style="text-align: left;"><span style="font-family: courier;">FROM geo_ranges</span></p><p style="text-align: left;"><span style="font-family: courier;">WHERE st_intersects(point(30,0),ranges);</span></p><p style="text-align: left;"><span style="font-family: courier;">+----+-------------+-----------+<br />| id | range_start | range_end |<br />+----+-------------+-----------+<br />| 2 | 28 | 40 |<br />| 1 | 17 | 34 |<br />+----+-------------+-----------+<br />2 rows in set (0.001 sec)</span></p><p style="text-align: left;">I don't know enough about how MariaDB spatial indexes work to analyze the performance of this approach, but I would assume it is reasonably performant. 
Unfortunately it does not work for my usecase, as I want string ranges not floats or ints.</p><h2 style="text-align: left;">Recording every point<br /></h2><p style="text-align: left;">So the last method I'm aware of, is give up on trying to record ranges, and instead record every point within the range.</p><p style="text-align: left;">So if we have the following items:<br /></p><table border="1"><thead><tr><th>id</th><th>item</th></tr></thead><tbody><tr><td>17</td><td>Alpha</td></tr><tr><td>12</td><td>Atom</td></tr><tr><td>21</td><td>Beta</td></tr><tr><td>34</td><td>Bobcat</td></tr><tr><td>8</td><td>Bonsai Tree</td></tr></tbody></table><p style="text-align: left;">And if page number 100 uses the range Amulet to Bog, it would be recorded as follows:</p>
<table border="1">
<caption>list_used table</caption>
<thead><tr><th>page_id</th><th>item_id</th></tr></thead>
<tbody>
<tr><td>100</td><td>12</td></tr>
<tr><td>100</td><td>21</td></tr>
<tr><td>100</td><td>34</td></tr>
</tbody>
</table>
<p style="text-align: left;"><br /></p><p style="text-align: left;">If you insert a new item, like "Anthill", you would look for the items immediately before and after it (Alpha - 17 and Atom 12), and then purge all pages that use item 17 or 12.</p><p style="text-align: left;">The queries are efficient, at the cost of much data duplication. At first, I didn't like this idea: it had to store so much extra data. However after thinking about it, this is mitigated by a couple of factors: First of all, the usage table only has to store id numbers, not the original string, which reduces space quite a bit. Second, if we assume that each entry is due to the item actually being used on the page, that puts some limits on things, as we can generally assume that people do not want to make excessively long pages.</p><p style="text-align: left;">If we supported aggregations over large numbers of items, that could be problematic. However, i have another idea in mind for that case.</p><p style="text-align: left;">In the end, the mild data duplication seems well worth the cost, especially when you consider that saving (writing to the table) happens rarely, relative to how often you might have to check to see if a cache purge is needed. <br /></p><h2 style="text-align: left;">Conclusion</h2><p style="text-align: left;">Overall, I'm surprised I didn't find any option that I completely liked. I would have assumed this is a somewhat common need, so I would have expected more information about doing this type of thing. In the end, I think the recording every point meets my needs the best.</p><p style="text-align: left;">Is there a method I missed?<br /></p><p style="text-align: left;"><br /></p><p style="text-align: left;"><br /></p><p><br /></p>Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-39117106835045877092013-02-21T19:02:00.000-08:002013-02-22T08:26:09.850-08:00Tech related IEG proposalsThe deadline for <a href="https://meta.wikimedia.org/wiki/Grants:IEG#ieg-join">IEG</a> proposals has now passed. IEGs (Individual engagement grants) are a new pilot program by the Wikimedia foundation to give small amounts of money to people who promise to do cool things with it. Ok, the criteria are a bit more complicated than that, but that is the gist of it.
<p>
<br/>
I thought I'd take some time to look through the technical proposals. To be honest, I was hoping to see more programming proposals, something like google summer of code, but for already experienced devs. By and large that did not seem to happen. This may be due to some mixed messages on technical projects - contrast <a href="http://lists.wikimedia.org/pipermail/wikitech-l/2013-January/065950.html"> this mailing list post</a> and this <a href="https://meta.wikimedia.org/w/index.php?title=Template:IEG/Criteria&diff=next&oldid=5241437">late addition to the rules I just discovered today</a>. Also it may simply be because its a new program, and developers weren't its primary audience. Perhaps it has to do with the timing, which would interfere with students going to school (as opposed to google summer of code, which coincides with summer break), resulting in less student programmers participating. Who knows.
<p>
Additionally, of the technical proposals made, only one actually consulted with the developer community :( at large (by which I mean wikitech-l). I should note that some of the proposals I'm listing below as technical, are more of the form "develop a vision statement" which doesn't really require consulting with dev community. However, I still expected more people to be chatting up the devs in relation to IEG proposals.<p>
<br/>
<b>tl;dr</b>: My favourites are: <a href="https://meta.wikimedia.org/wiki/Grants:IEG/Elaborate_Wikisource_strategic_vision">Elaborate Wikisource strategic vision</a>, and <a href="https://meta.wikimedia.org/wiki/Grants:IEG/The_Wikipedia_Adventure">The Wikipedia Adventure</a>. My runner up favourite is <a href="https://meta.wikimedia.org/wiki/Grants:IEG/Backlog_pages_for_all_WikiProjects">backlog pages for WikiProjects</a> (That one is a runner up as its too vague on what actually will be accomplished).
<br/>
<p>
Anyhow, here's my take on the technical proposals that were submitted. Note, I have mostly just read through the proposals once, so if I misunderstand anything in any of the proposals, I apologize in advance.
<h2><a href="https://meta.wikimedia.org/wiki/Grants:IEG/Backlog_pages_for_all_WikiProjects">Backlog pages for WikiProjects</a></h2>
This is an interesting proposal. Basically the author notes that Wikipedia has categories and organizational pages for its various backlogs. However individual WikiProjects do not have such per-project backlog pages, or if they do, they're very limited.
<p>
The actual proposal for what to do is rather vague. It sounds slightly like figuring out what to do is part of the proposal. From what I've gathered the proposal breaks down into two related wants:
<ul>
<li>(Efficient) Category intersection - The ability to get all pages that are in the intersection of a set of categories. There are some tools that do this already - like <a href="http://mediawiki.org/wiki/Extension:DynamicPageList_(Wikimedia)">DPL</a> [Not enabled on Wikipedia, but is on other wikis like meta], and <a href="https://en.wikipedia.org/wiki/Wikipedia:CATSCAN">CATSCAN</a>. Neither scales well once things get big.</li>
<li>A snazzy interface for showing the backlog - The authors point to WikiHow's <a href="http://www.wikihow.com/Special:CommunityDashboard">CommunityDashboard</a> as an example to potentially emulate. This is the first time I've heard of WikiHow's tool, and while I only gave it a brief glance, it is very cool looking.</li>
</ul>
Category intersection is an interesting problem, one that has been wanted for quite a long time by many people. I'm currently the maintainer of the DynamicPageList extension (however I mostly ignore it, and simply fix the rare bug that pops up). DynamicPageList does category intersection in the naive way, which simply does not scale to Wikipedia-size wikis (or even wikis significantly smaller than enwikipedia). (By naive method, I mean doing a bunch of self-joins on the <a href="http://mediawiki.org/wiki/manual:Categorylinks_table">categorylinks table</a>) <a href="https://bugzilla.wikimedia.org/show_bug.cgi?id=5244#c43">Some people have suggested</a> that it may be possible to implement this efficiently using full-text indexes and a program like lucene. There's even a proof of concept extension written using this approach. Adapting DynamicPageList to use this type of method is certainly something I would personally like to investigate if I ever had a large swath of free time.
<p>
The authors of this proposal suggest $1000 to hire a developer to implement their feature requests. While its hard to be certain, as the actual project requirements of this proposal are basically not defined, that seems like way too low a number given the amount of work wanted for the project. Particularly if efficient category intersection is a requirement.
<h2><a href="https://meta.wikimedia.org/wiki/Grants:IEG/Elaborate_Wikisource_strategic_vision">Elaborate Wikisource strategic vision</a></h2>
I really like this proposal. Wikisource has always been a bit of a mystery to me. I know it has something to do with digitizing documents, and proofreading the resulting text, but I don't know much beyond that. In particular, I have almost no knowledge about how their main tool, <a href="https://www.mediawiki.org/wiki/Extension:ProofreadPage">ProofreadPage</a>, actually works.
<p>
Having a strategic vision for Wikisource would help more people understand, and thus appreciate the work of Wikisource. In turn, this may result in more people using Wikisource.
<p>
One of things I noticed about the proposal, is that they make it very clear they want to in the short term concentrate on things that do not need Wikimedia technical staff attention. This is probably a reaction to how Wikisource has been ignored by both the foundation and the larger developer community. The vast majority of work done on Wikisource related extensions has been done by volunteer developers who come from the Wikisource project. Personally I would caution this proposal from ignoring potential wmf tech resources too much. Well its important to consider what is do-able and what is not, it is also good to first decide what is wanted, and then figure out how to do it (Where there's a will there's a way).
Wikisource may even find that once there is a clear picture of what is needed, much more resources are available to them. WMF employees aren't the only developers, there are also (non-wikisource) volunteers. Who knows, perhaps these people would be willing to help if they knew what needed doing (More generally, if you want some new feature for your wiki, a good first step is always to produce a good design document of precisely what is wanted. Developers aren't mind readers, and would much rather code than try to figure out what the user wants. Having a clear statement about what you need may be half the battle to getting what you need). Also just because the WMF isn't willing to devote tech resources to Wikisource, doesn't mean that employees might not help. Employees do have 20% time, and occasionally even commit code unrelated to foundation goals in their free time.
<p>
I really wish this project luck, and should it be accepted, I look forward to reading the final report.
<br/>
<b>Edit:</b> I wrote this section before I saw the new part of the rules where nothing involving WMF-tech resources is allowed. With that in mind, the no-wmf-tech parts of this proposal make much more sense.
<h2><a href="https://meta.wikimedia.org/wiki/Grants:IEG/Mapping_History:_Revision_History_Visualizer_and_Improvement_Suggester_using_Geo-Spatial_Technologies">Mapping History: Revision History Visualizer and Improvement Suggester using Geo-Spatial Technologies</a></h2>
This one gets points for being the only tech proposal to actually talk to the developer community.
<p>
Basically what they want to do, is create a map from an articles edit history, to highlight which region is editing the article the most. Afterwards they want to do some fancy machine learning stuff to see if any automatic inferences can be made from this geo-spatial data (For example, if only one country edits an article, maybe its POV).
<p>
Unfortunately the proposal has several problems. First of all the privacy policy. The authors want to get the IP addresses of logged in users, in order to find out roughly where they live, so they can be plotted on a map. That's not going to happen for privacy reasons, end of story. Hence the visualizations will be a lot less complete (If they only use anon locations). The proposal could perhaps parse user pages for location based infoboxes, but not everyone specifies that sort of information
<p>
Beyond not sufficiently researching the privacy policy, the authors seem not to understand what sort of access different technical projects (extensions vs gadgets vs third party hosted thingies) have, along with what data the API provides. I would expect that someone making such a proposal would understand the limitations of the technology that they intend to use before making the proposal.
<p>
Last of all the $30000 budget request seems a little high relative to the amount of work (I believe would be required) and the impact the project would make.
<h2><a href="https://meta.wikimedia.org/wiki/Grants:IEG/MediaWiki_and_Javanese_script"> MediaWiki and Javanese script</a></h2>
This is an interesting one. It would be interesting to see what someone from wmf's i18n team thought of it.<p>
As far as I understand, the main points are:
<ul>
<li>There are no input methods generally available for the Javanese script except in MediaWiki (which sounds odd to me)</li>
<li>People should be able to type in their own script easily</li>
<li>Therefore we should distribute MediaWiki-on-a-stick (A Wiki on a usb stick, so you can take the wiki with you).</li>
</ul>
First of all, it would be kind of cool if Wiki on a stick was supported for MediaWiki. The author mentions XAMPP, but there might be simpler options (Using PHP's built in webserver, combined with sqlite). However the wiki on a stick part seems to be a means to an end, not the main goal of this project.
<p>
For the actual project, I'm not sure what the end goal is - Have Javanese speaking people start using MediaWiki as a personal word processor? It seems like making an input method for X11/where ever input methods go in general for various operating systems would be much more effective at accomplishing the authors goals. Its also unclear how this benefits Wikimedia, other than wiki on a stick support would benefit MediaWiki. Having more Javanese speakers familiar with MediaWiki might make them more likely to contribute to a Javanese project, but that seems like a rather indirect benefit.
<h2><a href="https://meta.wikimedia.org/wiki/Grants:IEG/Replay_Edits">Replay Edits</a></h2>
This is an interesting proposal.
<p>
From what I understand, what is being proposed is that you could replay the history of an article, having text being added and removed in front of your eyes. Somewhat similar to how edits happen in front of your eyes in etherpad (?) (but replaying the past, not real time editing). This would allow a cool visualization of how articles change with time.
<p>
I'm unclear on this proposal if its meant to operate on the wikitext or on the rendered page. I'm also unclear if as it goes forward in time, does it highlight the changes, or just show the new page. Some of the comments on the talk page, and <a href="https://docs.google.com/file/d/0B1hJO1N6piYFM2FxVlBtakNlRXM/edit?usp=sharing">this mockup</a> suggest it would work on rendered pages. Having diffs that highlight what changed, but on the rendered page instead of the wikitext source (so-called visual diff), is a feature that would be awesome in and of itself. (There was once upon a time <a href="https://www.mediawiki.org/wiki/Visual_Diff"> some experimental support</a> in MW for this, but it was removed due to being incomplete).
<p>
If it is indeed the authors intention to provide visual diffs, then this project becomes quite exciting. It also becomes quite a bit more difficult, and I would be hesitant supporting it, unless the author stated his implementation plans in much more detail, in order to verify he understands the issues involved. If this is more just a visualization of how articles change in time, it is a much lower impact project, but still an interesting one. I would support it, especially because the proposer is only asking for $200 to do this.
<h2><a href="https://meta.wikimedia.org/wiki/Grants:IEG/MediaWiki_data_browser">MediaWiki data browser</a></h2>
This proposal is from Yaron, who is (among other things) a very prominent developer of Semantic MediaWiki. This is by far the most ambitious technical project of any proposed and could potentially have a huge impact.
<p>
At the same time it is a little unclear what is actually being proposed. The author says a framework to create drill down interfaces. Perhaps my confusion stems from only having a vague idea of what a drill-down interface is. A picture of an example interface would really be worth a thousand words.
<p>
With that said, the idea seems to be creating an interface where the user could filter or select pages by some criteria based on information in an infobox. This all sounds really cool, but it also sounds very hand-waving, to really evaluate this proposal, I think I would need to better understand what is actually being proposed. A concrete example of what an app designed with the framework would look like, including what sort of scope in terms of data processing a potential app could have, would be helpful.
<p>
An interesting part of this proposal is that all the processing is done on the client side. The author mentions that (obviously) only a small portion of wikipedia's data would be downloaded. I would be interested to know more about how much data would be downloaded, what data would be downloaded (is it wikitext of relevant pages), how the framework would find the relevant information it needs to download (this is part of my confusion over what the relevant information the framework would be working on is).
<p>
Certainly an interesting proposal, and one with much potential.
<h2><a href="https://meta.wikimedia.org/wiki/Grants:IEG/TapAMap">TapAMap</a></h2>
Basically author has an apple iPhone app that gives you a map. You click somewhere on the map, and it takes you to the nearest article to where you clicked. This is the opposite of most geo-location efforts, although is somewhat similar to WikiMiniAtlas type things that provide wikilinks to various places at their location on the map. It appears this one tries to be different by not showing textual links on the map, but instead concentrating on the geographical location only.
<p>
The developer wants grant money to port his App to Android. Apparently the iPhone version is fairly popular.
<p>
My main concerns with this proposal is that while it is different from other article mapping things, its similar enough to make it relatively low impact. Additionally it seems the author is reluctant to open source his app, or possibly only willing to open source the Android port that would be funded by the grant. I feel this would be a show stopper. Anything funding using Wikimedia money should be Free Software, no ifs ands or buts. I would not support this proposal unless the entire thing (including the existing iPhone app, and the proposed android port) were GPL'd (or another OSI approved license).
<h2><a href="https://meta.wikimedia.org/wiki/Grants:IEG/Wiki_Makes_Video">Wiki Makes Video</a></h2>
I'm only going to briefly mention this, as its mostly non-technical, but does include implementing a video capture/upload? app for phones. Making videos easier is certainly a useful thing, and something we could do much better at. Would probably want to check that the Mobile and TMH tech teams aren't already doing anything in this direction (I don't think they are, but should check).
<h2><a href="https://meta.wikimedia.org/wiki/Grants:IEG/The_Wikipedia_Adventure">The Wikipedia Adventure</a></h2>
Last but certainly not least (Whew, there was actually a lot more of these than I originally thought) comes <i>The Wikipedia Adventure</i>. This is a proposal to continue some work originally started as a fellowship, to create an educational game to show people how to edit Wikipedia.
<p>
This is an interesting approach to help break down barriers. While I am personally a fan of manuals and what not, I understand that most people aren't, and this could serve as a very effective introduction to editing Wikipedia.
<p>
I (very) briefly tried <a href="http://en.wikipediaadventure.org/wiki/Main_Page">the prototype</a>, and I must say its pretty cool. I would be interested to see where people can go with this, if given the proper opportunities to pursue it. This is definitely a proposal I would support.
<div style="border-top:thin black solid"></div>
And that is the end. There were actually quite a few more tech proposals than I thought, and it took a lot longer to read through them then I thought it would. If you've stuck through reading this blog post for this long, thanks for reading :)Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com4tag:blogger.com,1999:blog-7338246785946121007.post-82772578751682376452012-08-21T13:21:00.000-07:002012-08-22T13:51:07.659-07:00On git and gerrit<p>
We've now been using git and gerrit for mediawiki for quite some time. I must say the software has grown on me quite a bit. When we first switched, I hated both of them. Now that I've had some time to adjust, I've discovered that I really like git, and I don't hate gerrit quite as much as I used to.
<p>
First of all git. Git is quite cool from a technology standpoint. In my usecase I like how it has the ability to work on multiple different features at once, since you can have multiple local branches. On SVN when I wanted to do that, I had to use workflows like <code>svn diff > somefile.patch; svn revert</code>. With git I can switch to different branches easily, and its all still contained in the version control system.
<p>
The ability with git to easily work offline is also very nice. I currently don't have an internet connection at home (and no neighbour's wifi to steal either, I don't know what this world is coming to ;). Git makes it much easier to develop features without having to go to somewhere like a library. [Note: The no internet thing is by choice, and is a way to force myself to waste less time on the internet. I can work on things, save (commit) it to a branch, work on more things, commit that, all well being in isolation from the rest of the inter-connected world.
<p>
The only thing I do slightly miss from svn, is incremental revision number. In SVN, each version of the code had an id number, and they went in order. With git it is a totally random sha1 hash (aka <tt>6176d71256aa94a25c471c8696f28820f0b4e8e7</tt>). This is less annoying than it might seem at first glance, however it means when I do <code>git blame</code> I get these sha1 things rather than monotonically increasing revision numbers. This makes it harder to tell what version something was introduced in because I can't just look up the revision in the [[<a href="http://mediawiki.org/wiki/Branch points">mw:Branch points</a>]] page. (To be fair, git also provides a date, which helps somewhat).
<p>
Now on to gerrit. Gerrit has certainly grown on me. I think this is a combination of getting used to it, and the new skin we started to use. However, I still think the interface is horrid, and I miss [[<a href="http://mediawiki.org/wiki/Special:Code">mw:Special:CodeReview</a>]].
<p>
The interface to gerrit is pretty confusing, especially at first. Almost everyone doesn't understand how to save an inline comment the first time around (Hint: you also have to save a non-inline comment for it to go through). I don't like the fact I can't use wildcards when searching for projects (aka cannot do <code>project:mediawiki/*</code> to get mediawiki/core as well as everything in mediawiki/extensions/... ), since I mostly don't care what happens in ops (Although I am very happy to see how ops is becoming more and more open - Good job ops folks!). You also cannot do a search for everything that matches some certain path (AFAIK, however you can set up email alerts based on paths), which was easy to do in Special:CodeReview. Free-form tagging of revisions would also be nice (Another feature missed from Special:CodeReview). Last of all, the gerrit user interface begins to get really clunky when a patchset has been amended multiple times (Also I wish <code>git review</code> should ask for a patch-set message if its not the first version, it is a real kludge to modify the commit message with <i>Patchset 6: rebase</i> type messages). With that said, I do understand from the <a href="https://www.mediawiki.org/wiki/Git/Gerrit_evaluation">gerrit alternatives discussion</a> that gerrit is the only system that really even remotely meets our needs, so I am by no means advocating switching.
<p>
I suppose one of things I most like less about gerrit (But understand the reasons for, and am not advocating changing), is the gated trunk model. For the non-technical audience (although to be honest, I'd be surprised if anyone is actually reading this, and if they are, and are non-technical, that they got this far), the gated trunk model is roughly a spiffier version of FlaggedRevisions/PendingChanges, but for computer programs instead of wikis. I've found it has some of the draw backs that FlaggedRevs detractors were all talking about — namely less instant gratification. In the SVN days, if you had commit access, you coded your feature or bug fix, hit commit, and that was the end of that. Sure someone would eventually come along and review it (In a similar way as how edit patrol works on wikis), and reviewers were not afraid to revert something if there was something wrong with it. However you still had to do something wrong in order for it to be reverted. With gerrit, it requires someone to approve your commit, as opposed to merely someone not finding an issue with it. Thus if nobody cares, your commit could sit in limbo for weeks or even months before anyone approves it.
<p>
So all in all our great glorious git future is growing on me more and more. There are still things I miss from the old system, but with time, perhaps that will no longer be the case.Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com4tag:blogger.com,1999:blog-7338246785946121007.post-22142192422896999592011-12-27T00:24:00.000-08:002011-12-27T01:35:35.989-08:00Extension:PageInCatI haven't blogged recently (or really ever), so I thought I'd make a rambley post about an extension I'm currently writing.<br /><p><br />Recently, at Amgine's suggestion, I have been working on a new MediaWiki extension called <a href="https://mediawiki.org/wiki/Extension:PageInCat">PageInCat</a>. What it does is add a new parser function <code>{{#incat:Some category|text if current page is in "Some categoy"| text if it is not}}</code>. At first glance I thought it'd be fairly straightforward, but it turned out to be tricky to get right.<br /><p><br />It's fairly easy to determine if a page is in a specific category. Just query the database. The problem is that when we're reading the <Code>{{#incat</code>, it is before the page is saved, so the db would have the categories for the previous version of the page, not the version we are in the process of saving. Thus it would work fine if no categories were added/removed in this revision, however if categories did change, the result wouldn't reflect the new changes.<br /><p><br />The solution I used, was to mark the page as <i>vary-revision</i>. This is a signal to MediaWiki that the page varies with how its saved to the database. The original purpose of vary-revision was the {{REVISIONID}} magic word, which inserts what revision number the page is, which can only be determined once the page is saved to the db. With the page marked as such, MediaWiki will only serve users versions of the page rendered after it is saved in the DB.<br /><p><br /><i>vary-revision</i> fixes the problem for saving the page, however, previews still don't work because previews never get saved, so are never inserted into the db. So what the extension does is hook into <tt>EditPageGetPreviewText</tt> which is run right before the preview is generated. It takes the edit box text, parses it, and stores the resulting categories. Next once mediawiki does the actual preview, it hooks into <tt>ParserBeforeInternalParse</tt>, which is a hook run very early in the parse process. At this point it checks if we already have the categories for this text stored, and if so uses those for calculating the <code>#incat</code>'s for the preview.<br /><p><br />This makes the preview give the correct result, albeit at the price of parsing the preview text twice, slowing down the preview process.<br /><br />However, there's one more situation where the extension could give wrong results during preview (or saving for that matter). What if someone does something like <code>{{#incat:Foo||[[category:Foo]]}}</code> (read: The page is in category foo only if it is not in category foo). There's really no correct answer for if the page is in category foo or not (as it is self-contradictory), so <code>#incat</code> can't chose the right result. A less pathological case would be <tt>#incat</tt>'s that depend on each other - if page in foo add cat bar, if page in bar add cat baz, if page in baz add cat fred, and so on. 
The category memberships can't be determined in this case by the two stage, figure out which categories the article is in, and then base the <tt>#incat</tt>'s on that, as each category would only be determined to be included once it was determined the previous category in the chain was included.<br /><p><br />Really there's not much we can do in these cases. Thus instead of trying to prevent it, the extension tries to warn the user. What it does is keeps track of what response <tt>#incat</tt> gave, and then at the end of the parse (during <code>ParserAfterTidy</code>) it checks if the #incat responses match the actual categories of the page. If they don't match, it presents a warning at the top of the page during preview, via $parser->getOutput()->addWarning(), which is similar to what happens if someone exceeds the expensive parser function limit (It doesn't add a tracking category though like expensive parser func exceeded does, but it certainly could if it'd be useful)<br /><br /><br />Anyways, hopefully the extension is useful to someone :)Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-32785338436408691382010-07-20T23:34:00.000-07:002010-07-20T18:38:51.748-07:00image metadataI thought I'd write a blog post about my google summer of code project. I've never been much of a blogger, but I see lots of my fellow gsoc'ers blogging, so I thought I'd write a post. <a href="http://www.mediawiki.org/wiki/User:Bawolff/GSoC2010">My project</a> is to try to improve mediawiki's support for image metadata. Currently mediawiki will extract metadata from an image, and put a little table at the bottom of the image page detailing all the metadata (for example, see <a href="http://commons.wikimedia.org/wiki/File:%C3%89cole_militaire_2545x809.jpg#metadata"> http://commons.wikimedia.org/wiki/File:%C3%89cole_militaire_2545x809.jpg#metadata </a> ).<br /><br />However this is far from all the metadata embedded in an image. In fact mediawiki currently only extracts <a href="http://en.wikipedia.org/wiki/Exif">Exif</a> metadata. Exif metadata is arguably the most popular form of metadata, so if you're going to only extract one, Exif is a good choice. Every time you take a picture with your digital camera, it adds exif data to your picture. Most of this type of data is technical - fNumber, shutter speed, camera model, etc. You can also encode things like Artist, copyright, image description in exif, however that is much more rare.<br /><br />What I'm doing is first of all fixing up the exif support a little bit. Currently some of the exif tags are not supported (<a href="https://bugzilla.wikimedia.org/show_bug.cgi?id=13172">Bug 13172</a>). Most of these are fairly obscure tags no one really cares about, but there are some exceptions like GPSLatitude, GPSLongitude, and UserComment.<br /><br />I'm also (among other things) adding support for <a href="http://en.wikipedia.org/wiki/IPTC_Information_Interchange_Model">iptc-iim</a> tags. IPTC-IIM is a very old format for transmitting news stories between news agencies. Adobe adopted parts of this format to use for embedding metadata in jpeg files with photoshop. Now a days its being slowly replaced by <a href="http://en.wikipedia.org/wiki/Extensible_Metadata_Platform">XMP</a>, but many photos still use it. 
IPTC metadata tends to be more descriptive (stuff like title, author, etc) in nature compared to how exif metadata is technical (aperature, shutter speed) in nature.<br /><br />My code will also try to sort out conflicts. Sometimes there are conflicting values in the different metadata formats. If an image has two different descriptions in the exif and iptc data, which should be displayed? Exif, IPTC, or both? Luckily for me, several companies involved in images got together and thought long and hard about that issue. They then produced a standard for how to act if there is a conflict <a href="http://www.metadataworkinggroup.org/">[1]</a>. For example If both iptc and exif data conflict on the image description, then the exif data wins.<br /><br /><hr/><br />Consider [[<A href="http://commons.wikimedia.org/wiki/File:2005-09-17_10-01_Provence_641_St_R%C3%A9my-de-Provence_-_Glanum.jpg">File:2005-09-17 10-01 Provence 641 St Rémy-de-Provence - Glanum.jpg</a>]]<br /><br />On commons the metadata table looks like:<br /><br /><table class="mw_metadata" style="font-size:smaller" ><tr class="exif-make"><th>Camera manufacturer</th><td><a href="http://en.wikipedia.org/wiki/CASIO_COMPUTER_CO.,LTD" class="extiw" title="w:CASIO COMPUTER CO.,LTD">CASIO COMPUTER CO.,LTD </a></td></tr><tr class="exif-model"><th>Camera model</th><td><a href="http://en.wikipedia.org/wiki/EX-Z55" class="extiw" title="en:EX-Z55">EX-Z55 </a></td></tr><tr class="exif-exposuretime"><th>Exposure time</th><td>1/800 sec (0.00125)</td></tr><tr class="exif-fnumber"><th>F Number</th><td>f/4.3</td></tr><tr class="exif-datetimeoriginal"><th>Date and time of data generation</th><td>14:21, 28 September 2005</td></tr><tr class="exif-focallength"><th>Lens focal length</th><td>5.8 mm</td></tr><tr class="exif-orientation collapsable"><th>Orientation</th><td>Normal</td></tr><tr class="exif-xresolution collapsable"><th>Horizontal resolution</th><td>72 dpi</td></tr><tr class="exif-yresolution collapsable"><th>Vertical resolution</th><td>72 dpi</td></tr><tr class="exif-software collapsable"><th>Software used</th><td><a href="http://en.wikipedia.org/wiki/Microsoft_Pro_Photo_Tools" class="extiw" title="w:Microsoft Pro Photo Tools">Microsoft Pro Photo Tools</a></td></tr><tr class="exif-datetime collapsable"><th>File change date and time</th><td>14:21, 28 September 2005</td></tr><tr class="exif-ycbcrpositioning collapsable"><th>Y and C positioning</th><td>1</td></tr><tr class="exif-exposureprogram collapsable"><th>Exposure Program</th><td>Normal program</td></tr><tr class="exif-exifversion collapsable"><th>Exif version</th><td>2.21</td></tr><tr class="exif-datetimedigitized collapsable"><th>Date and time of digitizing</th><td>14:21, 28 September 2005</td></tr><tr class="exif-compressedbitsperpixel collapsable"><th>Image compression mode</th><td>3.6666666666667</td></tr><tr class="exif-exposurebiasvalue collapsable"><th>Exposure bias</th><td>0</td></tr><tr class="exif-maxaperturevalue collapsable"><th>Maximum land aperture</th><td>2.8</td></tr><tr class="exif-meteringmode collapsable"><th>Metering mode</th><td>Pattern</td></tr><tr class="exif-lightsource collapsable"><th>Light source</th><td>Unknown</td></tr><tr class="exif-flash collapsable"><th>Flash</th><td>Flash did not fire, compulsory flash suppression</td></tr><tr class="exif-colorspace collapsable"><th>Color space</th><td>sRGB</td></tr><tr class="exif-customrendered collapsable"><th>Custom image processing</th><td>Normal process</td></tr><tr class="exif-exposuremode collapsable"><th>Exposure 
mode</th><td>Auto exposure</td></tr><tr class="exif-whitebalance collapsable"><th>White balance</th><td>Auto white balance</td></tr><tr class="exif-focallengthin35mmfilm collapsable"><th>Focal length in 35 mm film</th><td>35</td></tr><tr class="exif-scenecapturetype collapsable"><th>Scene capture type</th><td>Standard</td></tr><tr class="exif-contrast collapsable"><th>Contrast</th><td>Normal</td></tr><tr class="exif-saturation collapsable"><th>Saturation</th><td>Normal</td></tr><tr class="exif-sharpness collapsable"><th>Sharpness</th><td>Normal</td></tr><tr class="exif-gpslatituderef collapsable"><th>North or south latitude</th><td>North latitude</td></tr><tr class="exif-gpslongituderef collapsable"><th>East or west longitude</th><td>East longitude</td></tr></table><br /><br />But on my test wiki the table looks like:<br /><br /><table style="font-size: smaller"><tr class="exif-make"><th>Camera manufacturer</th><td>CASIO COMPUTER CO.,LTD </td></tr><tr class="exif-model"><th>Camera model</th><td>EX-Z55 </td></tr><tr class="exif-exposuretime"><th>Exposure time</th><td>1/800 sec (0.00125)</td></tr><tr class="exif-fnumber"><th>F Number</th><td>f/4.3</td></tr><tr class="exif-datetimeoriginal"><th>Date and time of data generation</th><td>14:21, 28 September 2005</td></tr><tr class="exif-focallength"><th>Lens focal length</th><td>5.8 mm</td></tr><tr class="exif-gpslatitude"><th>Latitude</th><td>43° 46′ 21.35″ N</td></tr><tr class="exif-gpslongitude"><th>Longitude</th><td>4° 50′ 1.34″ E</td></tr><tr class="exif-orientation collapsable"><th>Orientation</th><td>Normal</td></tr><tr class="exif-xresolution collapsable"><th>Horizontal resolution</th><td>72 dpi</td></tr><tr class="exif-yresolution collapsable"><th>Vertical resolution</th><td>72 dpi</td></tr><tr class="exif-software collapsable"><th>Software used</th><td>Microsoft Pro Photo Tools</td></tr><tr class="exif-datetime collapsable"><th>File change date and time</th><td>14:21, 28 September 2005</td></tr><tr class="exif-ycbcrpositioning collapsable"><th>Y and C positioning</th><td>Centered</td></tr><tr class="exif-exposureprogram collapsable"><th>Exposure Program</th><td>Normal program</td></tr><tr class="exif-exifversion collapsable"><th>Exif version</th><td>2.21</td></tr><tr class="exif-datetimedigitized collapsable"><th>Date and time of digitizing</th><td>14:21, 28 September 2005</td></tr><tr class="exif-componentsconfiguration collapsable"><th>Meaning of each component</th><td><ol><li>Y</li><li>Cb</li><li>Cr</li><li>does not exist</li></ol></td></tr><tr class="exif-compressedbitsperpixel collapsable"><th>Image compression mode</th><td>3.66666666667</td></tr><tr class="exif-exposurebiasvalue collapsable"><th>Exposure bias</th><td>0</td></tr><tr class="exif-maxaperturevalue collapsable"><th>Maximum land aperture</th><td>2.8</td></tr><tr class="exif-meteringmode collapsable"><th>Metering mode</th><td>Pattern</td></tr><tr class="exif-lightsource collapsable"><th>Light source</th><td>Unknown</td></tr><tr class="exif-flash collapsable"><th>Flash</th><td>Flash did not fire, compulsory flash suppression</td></tr><tr class="exif-flashpixversion collapsable"><th>Supported Flashpix version</th><td>0,100</td></tr><tr class="exif-colorspace collapsable"><th>Color space</th><td>sRGB</td></tr><tr class="exif-filesource collapsable"><th>File source</th><td>DSC</td></tr><tr class="exif-customrendered collapsable"><th>Custom image processing</th><td>Normal process</td></tr><tr class="exif-exposuremode collapsable"><th>Exposure mode</th><td>Auto 
exposure</td></tr><tr class="exif-whitebalance collapsable"><th>White balance</th><td>Auto white balance</td></tr><tr class="exif-focallengthin35mmfilm collapsable"><th>Focal length in 35 mm film</th><td>35</td></tr><tr class="exif-scenecapturetype collapsable"><th>Scene capture type</th><td>Standard</td></tr><tr class="exif-gaincontrol collapsable"><th>Scene control</th><td>None</td></tr><tr class="exif-contrast collapsable"><th>Contrast</th><td>Normal</td></tr><tr class="exif-saturation collapsable"><th>Saturation</th><td>Normal</td></tr><tr class="exif-sharpness collapsable"><th>Sharpness</th><td>Normal</td></tr></table><br /><br />Most notably, GPS information is now supported. As a note, the wikipedia links for camera model are a commons customization, which is why they don't appear on my test output.<br /><br />As another example, consider [[<a href="http://commons.wikimedia.org/wiki/File:P%C3%B6stlingbahn_TFXV.jpg">file:Pöstlingbahn TFXV.jpg</a>]]. On commons, it has no metadata extracted. (It does have some information about the image on the page, but this was all hand-entered by a human). On my test wiki, the following metadata table is generated:<br /><br /><table class="mw_metadata" style="font-size:smaller"><tr class="exif-imagedescription"><th>Image title</th><td>Triebfahrzeug Nr. XV der Pöstlingbergbahn bei der Rangierfahrt an der Bergstation</td></tr><tr class="exif-artist"><th>Author</th><td>Erich Heuer</td></tr><tr class="exif-datetimeoriginal"><th>Date and time of data generation</th><td>8 April 2006</td></tr><tr class="exif-copyright"><th>Copyright holder</th><td><a href="http://creativecommons.org/licenses/by-sa/2.0/de/deed.de" class="external free" rel="nofollow">http://creativecommons.org/licenses/by-sa/2.0/de/deed.de</a></td></tr><tr class="exif-headline collapsable"><th>Headline</th><td>Pöstlingbergbahn Triebfahrzeug XV</td></tr><tr class="exif-specialinstructions collapsable"><th>Special instructions</th><td>Eastman Kodak Company, Kodak CX7430;<p>1/181 sec; F 9.51; Farbmanagement; 640 x 526 Pixel</p></td></tr><tr class="exif-source collapsable"><th>Source</th><td>Erich Heuer, Dresden</td></tr><tr class="exif-objectname collapsable"><th>Object name</th><td>Pöstlingbahn TF XV</td></tr><tr class="exif-citydest collapsable"><th>City shown</th><td>Linz-Pöstlingberg</td></tr><tr class="exif-provinceorstatedest collapsable"><th>Province or state shown</th><td>Oberösterreich</td></tr><tr class="exif-countrydest collapsable"><th>Country shown</th><td>Republik Österreich</td></tr><tr class="exif-keywords collapsable"><th>Keywords</th><td>Bergbahn, Pöstlingbergbahn, Linz</td></tr></table><br /><br />I'm almost done with iim metadata, and plan to start working on XMP metadata soon. If your curious, all the code is currently in the <a href="http://svn.wikimedia.org/viewvc/mediawiki/branches/img_metadata/phase3/">img_metadata branch</a>. 
You can also look at the <a href="http://www.mediawiki.org/wiki/User:Bawolff/GSoC2010/Status">status</a> page which I will try to update occasionally.<br /><br />Cheers,<br />BawolffBawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com3tag:blogger.com,1999:blog-7338246785946121007.post-5080867125047955202010-06-06T18:33:00.001-07:002010-06-06T18:37:45.137-07:00drama definedDrama-defined:<br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEit8PQeBZMyUY0ta759RHDI4yMn4NXGTHDro5JYXYKkyeP-lZL_NMCNGHg4eLQtWX6uIPtaiuiIQi60U-b21gweJ1NufYIuzGoyJ7ZqBugiQACuxmmhF1wlnNZtO10mNNdxNzsMUX5J9VI/s1600/drama-defined.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 800px; height: 500px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEit8PQeBZMyUY0ta759RHDI4yMn4NXGTHDro5JYXYKkyeP-lZL_NMCNGHg4eLQtWX6uIPtaiuiIQi60U-b21gweJ1NufYIuzGoyJ7ZqBugiQACuxmmhF1wlnNZtO10mNNdxNzsMUX5J9VI/s1600/drama-defined.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5479839103931721906" /></a><br /><br />This is what the rc has looked like all day... :(Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com1tag:blogger.com,1999:blog-7338246785946121007.post-2636791943160413772010-04-14T20:24:00.001-07:002010-04-14T22:02:26.235-07:00restyling the reader feedbackRecently at Wikinews we've been trying to give a more inviting look to the <a href='http://mediawiki.org/wiki/extension:ReaderFeedback'>reader feedback extension</a>. This extension adds a little box at the bottom inviting readers to rate the article. Many people felt that it could do with a little more snaz. The extension makes the form as a bunch of boring old html <select>'s:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60D6ZjP4-fMaV3MY7Pcc2-EDfuOy3NvMh8u7ZifDYDM088137ONIT1kfR5ULb_YB8cJtA_OOQgxrNEbCbS6TdvOkz-lWVp71uLV0Fs6Maa_2sgrlRf6EA7yUc7bEVI6gn2y95IPi6H10/s1600/feedback+form.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand; " src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60D6ZjP4-fMaV3MY7Pcc2-EDfuOy3NvMh8u7ZifDYDM088137ONIT1kfR5ULb_YB8cJtA_OOQgxrNEbCbS6TdvOkz-lWVp71uLV0Fs6Maa_2sgrlRf6EA7yUc7bEVI6gn2y95IPi6H10/s1600/feedback+form.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5460211565132455378" /></a><br /><br />Some people wanted something more like the typical rating systems you find on websites now a days (youtube and newstrust were two prominent examples given of what people were looking for). So we tried experimenting with some custom javascript to give it a new look. First we experimented with unicode stars <big>✧</big>/<big>✦</big> (considering the 9 billion different type of stars in unicode, its amazing how few have filled in and non-filled in variants). 
Then we moved to different star images:<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWBjjtnGr7qDDtmpvXA8TAwDYIZrJZUupbr43csGZ3vAqRxyHOYoZ46iXY27NIVY5DgtWaW0e_oNXMmPbnAzW-ev072L9KjND_17Yvc1flrUanbUoTKKcoAdIR9llg9-OHQK6VJCUgasU/s1600/different+star+types.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 395px; height: 184px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjWBjjtnGr7qDDtmpvXA8TAwDYIZrJZUupbr43csGZ3vAqRxyHOYoZ46iXY27NIVY5DgtWaW0e_oNXMmPbnAzW-ev072L9KjND_17Yvc1flrUanbUoTKKcoAdIR9llg9-OHQK6VJCUgasU/s1600/different+star+types.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5460212970970421266" /></a><br /><br />Eventually we choose to use red stars. On hovering it gives users help text to describe the rating they are giving (you know for the stupid people who think one star means excellent). Here's the final result:<br /><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjM8ZOWtfMh0CyYgPXSpwTn8BK3MenBrF0Mq2QZ49kpTQs6yu8ve1dzVmz-zx0cHjIo-lbRHAj8jewK2TuQEVxu8fNihlqAn4IJSaKyaVPs_p3ycG2In5YxfY0gd_ZoJ8NQoUe2qFnxj5s/s1600/ratingfinal.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 390px; height: 277px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjM8ZOWtfMh0CyYgPXSpwTn8BK3MenBrF0Mq2QZ49kpTQs6yu8ve1dzVmz-zx0cHjIo-lbRHAj8jewK2TuQEVxu8fNihlqAn4IJSaKyaVPs_p3ycG2In5YxfY0gd_ZoJ8NQoUe2qFnxj5s/s400/ratingfinal.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5460216381214249538" /></a><br /><br />You'll also notice in the images a "Comment on this article" box as well. This was the only part of this that was a little ugly (since this is now straying into stuff that should be done at the php level) It still doesn't handle captcha's for anons who include links that well. (it redirects them to an edit page for the moment, eventually it will just prompt people to answer the captcha, when I get around to doing that. for the moment the fallback is ok, as not to many people [read no one as of yet other then me during testing] post links). <br /><br />However if our comment pages are any indication, this feature seems to be quite widely used. There is some non-sense posted, but there is also some nice comments posted via the form. Whats really surprising is people commenting on our older articles <a href='http://en.wikinews.org/wiki/Comments:Ten_US_missionaries_charged_with_child_kidnapping_in_Haiti#Comments_from_feedback_form_-_.22This_article_about_Haiti_is_re....22'>[1]</a>. Often at Wikinews we assume articles have a shelf life of at most a week, and after that almost no one reads them. That appears not to be the case. We were considering having a comment form on the Main Page, but weren't sure where we'd want the comments to go. (Talk:Main Page, Opinions:Main Page, Wikinews:Water cooler/assistence, Wikinews:Geust book, etc - none of them really seem to fit) so currently there is only the rating part on the Main Page.<br /><br />With the adoption of liquid threads, our comment pages have really been taking off. I think the special:newmessages notification on the top right corner, brings commenter back to the comment page to respond to new comments. 
For example [[<a href='http://en.wikinews.org/wiki/Comments:Large_Hadron_Collider_reaches_milestone'>Comments:Large Hadron Collider reaches milestone</a>]] has while not the most intelligent of conversations, still quite the conversation going. Before it used to be somebody posts something, then forgets, now they have a reminder that they have new messages, and thus respond to those who reply to them, and so on.<br /><br />After a couple of days, it really does appear that these changes made a difference. Here is the graph of how people rated the Main Page over the last month. The green/blue line is how they rated us (in reliability), and the red line is how many people rated on a 1:6 scale. Notice how the number of raters per day increased <b>almost 10 times</b>!<br /><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAu-_ScLXcBDPUoukm-VVrStAMEv8Knk6ADem5OIQRXebBgfIbQ6m1i_SlP-MhZgiaHUchG3EZOgmu8PlIEuggJVnR0R7j1e0BpJKrF2jF_yRJuhQgF7lhbup99RXxgM4yCftrkEj1WVk/s1600/reliability.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAu-_ScLXcBDPUoukm-VVrStAMEv8Knk6ADem5OIQRXebBgfIbQ6m1i_SlP-MhZgiaHUchG3EZOgmu8PlIEuggJVnR0R7j1e0BpJKrF2jF_yRJuhQgF7lhbup99RXxgM4yCftrkEj1WVk/s1600/reliability.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5460222200994497778" /></a><br />Source: <A href='http://en.wikinews.org/w/index.php?title=Special:RatingHistory&target=Main_Page'>http://en.wikinews.org/w/index.php?title=Special:RatingHistory&target=Main_Page</a>Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com3tag:blogger.com,1999:blog-7338246785946121007.post-37720173011303438902010-04-11T21:50:00.000-07:002010-04-11T22:14:16.216-07:00mediawiki update brings new goodies for wikinewsNow that Wikimedia got updated, we get to have all the cool new features, which is exciting! I always love software updates. Furthermore, almost none of the javascript broke (well one minor thing we stole from commons did, but otherwise all is well. None of *<b>my</b>* js broke ;) Well actually one thing I did broke due to the mediawiki and user namespace on wiktionary becoming first letter case insensitive, but other then that nothing broke. On the bright side, due to the software update, my WiktLookup gadget now works in IE (or should anyways, haven't tested).<br /><br />The one feature we [at wikinews] were waiting for was changes to DynamicPageList that allows us to put our developing articles on the Main Page without them being picked up by Google news. (Google news assumes any article on our main page with a number in the url, that does not have nofollow is a published news article. Since we allow anyone to create an article, we don't want our articles in progress being picked up by google). Thus {{main devel}} is back on the main page after a long absence.<br /><br />Speaking of <a href="http://mediawiki.org/wiki/extension:Intersection">DynamicPageList</a> (to clarify, the Wikimedia one, not DPL2), it has a number of cool new features for us at Wikinews, and other wikis that use it. (I'm especially happy about this, as I contributed a patch for it, and its really cool to see something I've done go live). 
Among other things, it can now list articles alphabetically (a feature request from wikibooks), and you can specify the date format that the article was added to the category (before it was just a boolean on/off switch). However, one of the coolest new features (imho) is the ability to use image gallery's as an output mode. One can now use DynamicPageList to make a <gallery> of say the first 20 images in both Category X and Category Y but not in Category Z. <br /><br />Here's to all the devs for continuing to do an excellent job with Mediawiki.<br /><br />Cheers,<br />BawolffBawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-83883401163029349082010-04-08T21:26:00.000-07:002010-04-11T21:50:45.715-07:00forging the lqt signatureToday I discovered the little known fact that you can override the automatic signature in liquid thread.<br /><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyh4nZY4icmK2t_W5qIgV1DzyUcD43_nQiMWeHyDpec-_CA9UaHARKviMn2yCuKLm0nHXxXGh6PRAMQCVGsOazwfKvH7Z2qU7D1RFFkXK_5QmEhY7oTFrHEcSwsh8pTBmWVdLw6DXdfSI/s1600/TINC.png"><img style="cursor:pointer; cursor:hand;width: 600px; height: 318px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyh4nZY4icmK2t_W5qIgV1DzyUcD43_nQiMWeHyDpec-_CA9UaHARKviMn2yCuKLm0nHXxXGh6PRAMQCVGsOazwfKvH7Z2qU7D1RFFkXK_5QmEhY7oTFrHEcSwsh8pTBmWVdLw6DXdfSI/s1600/TINC.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5457990511360499442" /></a><br /><small>(from [[<a href="http://en.wikinews.org/wiki/Comments:Red_Shirts_cause_state_of_emergency_in_Thai_capital#TINC_557">Comments:Red Shirts cause state of emergency in Thai capital</a>]])</small><br /><br />This ought to be fun ;)<br /><br />-bawolff<br /><br /><br /><b>update:</b> Apprently this is now in the liquid threads UI. I was feeling very cool about myself when i thought you could only do it with the API ;)Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-12717059539895827062010-01-24T17:22:00.000-08:002010-01-24T17:30:03.046-08:00Wikinews Writing Contest begins!The 2010 Wikinews writing contest is off to a good start, with Blood Red Sandman submitting the first <a href='http://en.wikinews.org/wiki/%22Osama_to_Obama%22:_Bin_Laden_addresses_US_President'>article</a>. Good luck to all the competitors.<br /><br />P.S. Want to sign up? Its not to late, see [[<a href='http://en.wikinews.org/wiki/Wikinews:Writing_contest_2010'>Wikinews:Writing contest 2010</a>]]. Both newbies and old contributors are welcome (newbies get a handicap). There's even a small prize for the winner.Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-23809297819667306582009-12-03T18:14:00.000-08:002009-12-03T18:22:25.937-08:00new bottom mode on wiktionary lookup gadgetJust wanted to mention, I've added some new options to wiktionary lookup gadget. One of them is to display it as a bar at the bottom instead of as a tooltip. (This idea comes from [[<a href="http://fr.wiktionary.org/wiki/Utilisateur:Darkdadaah">:fr:wikt:Utilisateur:Darkdadaah</a>]] who made a gadget on the french wiktionary to display definitions on a bar at the bottom.)<br /><br />Here's what it looks like so far. 
Its the wiktionary logo (or at least in some languages anyways) in the background.<br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXwvEwFXgui2gsIWByVdSPX7NrFAc7ubAvcVAnpWnEYXOAfzciPNkAM1iotASsiy_vkTHhEg1QSoGWLk5jlkq1fiXlClBbxfKubjRBvGLQgqy27IFWvXHK38_eWKosknHfiiDw7im6M8E/s1600-h/wiktLookup-bottomMode.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 250px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXwvEwFXgui2gsIWByVdSPX7NrFAc7ubAvcVAnpWnEYXOAfzciPNkAM1iotASsiy_vkTHhEg1QSoGWLk5jlkq1fiXlClBbxfKubjRBvGLQgqy27IFWvXHK38_eWKosknHfiiDw7im6M8E/s400/wiktLookup-bottomMode.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5411199896976065410" /></a>Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-833041367341029442009-11-27T22:39:00.000-08:002009-11-28T12:30:14.538-08:00Quick update on WiktLookupWiktionary Lookup is a tool that allows you to double click on a word, and have a little popup with its definition from Wiktionary. (Its enabled on this blog, and several wikis). So far it is translated into:<br /><ul><br /><li>English</li><li>Spanish</li><li>Italian</li><li>French</li><li>Japanese</li><li>Dutch</li><li>And several other languages <a href="http://en.wikinews.org/wiki/MediaWiki_talk:Gadget-dictionaryLookupHover.js#Languages_available">pending</a></li><br /></ul><br /><br />The script itself has improved quite a bit. Words can now be selected by highlighting and pressing ctrl+shift+L in addition to double clicking. It should be able to determine what language the word you are clicking on is in (in some cases anyways), to make sure it gets the right definition. There is now also several configuration options:<br /><br /><ul><br /><li>showWord - determine if it should display the word that was looked up (its set to bold on this blog)</li><li>count - number of definitions to return</li><li>height - max height of the popup box</li><li>width - max width</li><li>key - the key combo to look up a word when highlighting</li><li>reverseShift - change the role of the shift key from preventing the popup from appearing when double-clicking, to requiring the shift key be pressed when double clicking</li></ul><br /><br />In the future, there will be even more options (such as say the word if audio pronounciation is available), more languages, and more awesomeness ;) See [[<a href="http://en.wikinews.org/wiki/WN:WiktLookup">n:WN:WiktLookup</a>]] for more details and how to use the script on your own webpage/wiki.Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-62812785021394344792009-10-24T22:52:00.000-07:002009-10-25T08:04:10.870-07:00Introducing Wiktionary lookup. 
Now for blogs<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZe_yXxD_Ot0B15Au0WV2t0bKqd7Ie3e0PdR07RlnELA4YIV_CLCkAvlfN6siQHUx7y6678cIuFVZupu6iDIHbGIlwXVtDHq5okYEgFWa3tPCQFKQvjymmxl7JAn9lyi9hy2xVotiU5K4/s1600-h/wiktLookup2.png"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 320px; height: 103px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZe_yXxD_Ot0B15Au0WV2t0bKqd7Ie3e0PdR07RlnELA4YIV_CLCkAvlfN6siQHUx7y6678cIuFVZupu6iDIHbGIlwXVtDHq5okYEgFWa3tPCQFKQvjymmxl7JAn9lyi9hy2xVotiU5K4/s320/wiktLookup2.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5396425908112477442" /></a><br />At the urging of Amgine, I've been developing a gadget for Wikimedia projects that looks up a word you double click in Wiktioary. (It is based on a gadget from french Wikinews by User:Conrad.Irwin and User:Bequw) Anyways, I've been told that some people want to use it on there blog, so now you can. All you have to do is insert somewhere in the <head> of your html document:<br /><br><br /><code><script src='http://en.wikinews.org/w/index.php?title=MediaWiki%3AWiktionaryLookup-external.js&amp;action=raw&amp;ctype=text/javascript' type='text/javascript'></script></code><br /><br />Then whenever someone double-clicks a word, the definition will popup (See diagram at top). Its currently enabled on this blog (as well as several Wikimedia projects). Please try it out and let me know what you think.<br /><br />Some notes:<br /><ul><br /><li>It detects the language from the lang attribute on <html> (note thats lang, not xml:lang) It will also look in js global variable wgContentLanguage for the current language.</li><br /><li>Supports english (EN), french (fr, frc), Dutch (nl). <a href='http://en.wikinews.org/wiki/MediaWiki_talk:Gadget-dictionaryLookupHover.js#Request_for_translations'>Translations welcome</a>.</li><br /><li>This is still a work in progress. Expect improvements. Bug reports can go on [[<a href='http://en.wikinews.org/wiki/MediaWiki_talk:Gadget-dictionaryLookupHover.js'>n:Mediawiki_talk:Gadget-dictionaryLookupHover.js</a>]]. There are also some generic instructions on [[<a href='http://en.wikinews.org/wiki/Wikinews:Javascript#Wiktionary_lookup_gadget_.28Hover_box_variety.29'>n:WN:WiktLookup</a>]]</li><br /><li>Current browser support is: Full support on Firefox, Safari. Partial support on Konqueror, Internet explorer (requires xslt support for full support. IE doesn't work due to <a href='https://bugzilla.wikimedia.org/show_bug.cgi?id=19528#c13'>mediawiki using a mime type that IE doesn't recognize as xslt</a>). Theoretically should have full support on Opera, but have not tested.</li><br /></li></ul><br />Update: now with compound word support.Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com11tag:blogger.com,1999:blog-7338246785946121007.post-73361830874034506452009-10-09T16:48:00.000-07:002009-10-09T17:00:12.735-07:00Wikinews gets a new Main Page!<p>Wikinews's <a href="http://en.wikinews.org/">Main page</a> has finally been updated to ShakataGaNai's redesign. (The delay was due to me not being avaliable to make the <A href="http://en.wikinews.org/wiki/WN:ML">automated lead generator</a> work with the new main page. I was a little worried today when i logged in i would be beheaded on irc for not being arround to do that for so long.)<p> I think the new design looks absolutely excellent. 
There have been so many failed attempts at redesigning the main page, I'm glad to see we've finally agreed on one and that it looks so nice.Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-37269593667844449492009-09-12T19:23:00.000-07:002009-09-12T19:31:02.657-07:00Go wikinews!Today (Sept 12) Wikinews had a record day of <a href='http://en.wikinews.org/wiki/T:LN'>20 articles</a> [<a href='http://en.wikinews.org/wiki/Wikinews:2009/September/12'>permalink</a>] published in a single day. This is record (while a record since we started requiring all articles be peer reviewed. we had one day back in <a href='http://en.wikinews.org/wiki/Category:September_2,_2005'>sept 2, 2005</a> on the second day of a <a href="http://meta.wikimedia.org/wiki/IWWC">writing contest</a>. with 24 articles).<br /><br />Wohoo! Good work Wikinewsies!Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-26490228940360846682009-08-07T19:45:00.000-07:002009-08-07T20:38:08.305-07:00Wikinews js and IESo i happened to be browsing Wikinews in internet explorer the other day (*shudder*), and apparently there was an error in some of the site JS (relating to the comments namespace) making some of the js not work in IE. The issue was that internet explorer doesn't support the <code>hasAttribute</code> method of dom elements. Which isn't that bad since its fairly trivial to replace <code>hasAttribute</code> with <code>getAttribute</code>. The scary thing is that this issue has been present since November 2007.<br /><br />Apparently (looking through the <a href='http://en.wikinews.org/w/index.php?title=MediaWiki%3AComments.js&diff=565758&oldid=521698'>history</a>) there was an attempt to fix back in January of 2008, by changing <code>hasAttribute('missing')</code> to <code>getAttribute('missing', 2) == ""</code>. (I'm not sure where the 2 came from. All the docs i've read seem to state that <code>getAttribute</code> only takes 1 parameter) This almost would have worked, except for one important fact. In the mediawiki api, if you try to find info on a non-exisistant page, it will give you: <code><page</code> ... <code>missing="" /></code> <a href="http://en.wikinews.org/w/api.php?action=query&format=xml&titles=Comments:Ezer_Weizman_former_Israeli_president_dies_at_the_age_of_81">[1]</a> Thus the normal value for the attribute if present is "", and if its not present, according to the <a href='http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-666EE0F9'>w3c</a> <code>getArribute</code> should return the empty string. Thus testing for the empty string doesn't work, as it returns the empty string in both cases. However in practice this doesn't seem entirely true, as in both Firefox and IE, when <code>getAttribute</code> is used on a missing attribute, the browser returns <code>null</code> instead of <code>""</code> (The previous code of <code>getAttribute('missing', 2) == ""</code> would still of worked since <code>"" == null</code>)<br /><br />To fix the problem, the code in question now first tries to use the <code>hasAttribute()</code> method (for the good little browsers that support it), and if that throws an exception, it will see if <code>getAttribute(attributeName) !== null</code>. 
(assuming that anybody who doesn't support <code>hasAttribute</code> also returns null when getting a non-existant attribute.<br /><br />Anyways, the moral of this story is that we should probably set up some system for tracking problems with the local javascript on wikinews, since this problem was discovered and than totally forgotten. perhaps on <a href="http://en.wikinews.org/wiki/Wikinews:Javascript">[[Wikinews:Javascript]]</a>.Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0tag:blogger.com,1999:blog-7338246785946121007.post-68134742574827480902009-08-06T20:17:00.000-07:002009-08-06T20:25:22.259-07:00TestThis is my first blog post. I plan to use this blog for stuff relating to wikinews. We'll see if i stick to it or not.Bawolffhttp://www.blogger.com/profile/02917810358934543942noreply@blogger.com0