Tuesday, July 20, 2010

image metadata

I thought I'd write a blog post about my Google Summer of Code project. I've never been much of a blogger, but I see lots of my fellow GSoC'ers blogging, so I thought I'd write a post. My project is to improve MediaWiki's support for image metadata. Currently MediaWiki extracts metadata from an image and puts a little table at the bottom of the image page detailing all the metadata (for example, see http://commons.wikimedia.org/wiki/File:%C3%89cole_militaire_2545x809.jpg#metadata ).

However, this is far from all the metadata embedded in an image. In fact, MediaWiki currently only extracts Exif metadata. Exif is arguably the most popular form of metadata, so if you're only going to extract one, Exif is a good choice. Every time you take a picture with your digital camera, it adds Exif data to your picture. Most of this data is technical: f-number, shutter speed, camera model, etc. You can also encode things like artist, copyright, and image description in Exif, but that is much rarer.

First of all, I'm fixing up the Exif support a little bit. Currently some of the Exif tags are not supported (Bug 13172). Most of these are fairly obscure tags no one really cares about, but there are some exceptions like GPSLatitude, GPSLongitude, and UserComment.

I'm also (among other things) adding support for IPTC-IIM tags. IPTC-IIM is a very old format for transmitting news stories between news agencies. Adobe adopted parts of this format for embedding metadata in JPEG files with Photoshop. Nowadays it's slowly being replaced by XMP, but many photos still use it. IPTC metadata tends to be descriptive (stuff like title, author, etc.), whereas Exif metadata tends to be technical (aperture, shutter speed).

My code will also try to sort out conflicts. Sometimes the different metadata formats contain conflicting values. If an image has two different descriptions in the Exif and IPTC data, which should be displayed? Exif, IPTC, or both? Luckily for me, several companies involved in images got together and thought long and hard about that issue. They then produced a standard for how to act if there is a conflict [1]. For example, if the IPTC and Exif data conflict on the image description, then the Exif data wins.
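To make the "Exif wins" rule concrete, here is a rough sketch in Python. It is only an illustration, not the code from my branch: the function name merge_metadata is made up, the description strings are invented sample values, and the real guidelines cover more cases than a simple priority order.

```python
def merge_metadata(exif, iptc):
    """Combine two metadata dicts, letting Exif win on conflicts.

    Hypothetical helper for illustration only.
    """
    merged = dict(iptc)   # start from the IPTC values
    merged.update(exif)   # Exif overwrites any field present in both
    return merged


# Invented sample values: both formats carry a description, and they disagree.
exif = {"ImageDescription": "Roman ruins at Glanum", "Model": "EX-Z55"}
iptc = {"ImageDescription": "Old caption", "Keywords": "Provence"}

merged = merge_metadata(exif, iptc)
print(merged["ImageDescription"])
```

Fields that only exist in one of the two formats (like Keywords here) simply pass through untouched.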

Consider [[File:2005-09-17 10-01 Provence 641 St Rémy-de-Provence - Glanum.jpg]]

On Commons the metadata table looks like:

But on my test wiki the table looks like:

Camera manufacturer: CASIO COMPUTER CO.,LTD
Camera model: EX-Z55
Exposure time: 1/800 sec (0.00125)
F Number: f/4.3
Date and time of data generation: 14:21, 28 September 2005
Lens focal length: 5.8 mm
Latitude: 43° 46′ 21.35″ N
Longitude: 4° 50′ 1.34″ E
Horizontal resolution: 72 dpi
Vertical resolution: 72 dpi
Software used: Microsoft Pro Photo Tools
File change date and time: 14:21, 28 September 2005
Y and C positioning: Centered
Exposure Program: Normal program
Exif version: 2.21
Date and time of digitizing: 14:21, 28 September 2005
Meaning of each component:
  1. Y
  2. Cb
  3. Cr
  4. does not exist
Image compression mode: 3.66666666667
Exposure bias: 0
Maximum land aperture: 2.8
Metering mode: Pattern
Light source: Unknown
Flash: Flash did not fire, compulsory flash suppression
Supported Flashpix version: 0,100
Color space: sRGB
File source: DSC
Custom image processing: Normal process
Exposure mode: Auto exposure
White balance: Auto white balance
Focal length in 35 mm film: 35
Scene capture type: Standard
Scene control: None

Most notably, GPS information is now supported. As a note, the Wikipedia links for the camera model are a Commons customization, which is why they don't appear in my test output.
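For the curious: Exif stores a GPS coordinate as three rational numbers (degrees, minutes, seconds) plus a separate reference letter (N/S or E/W). Here is a small Python sketch of turning that into the display form used in the table above, and into decimal degrees. The function names are made up for this post, and the exact numerator/denominator pairs are my guess at how the latitude of the example image might be stored.

```python
from fractions import Fraction


def format_coordinate(dms, ref):
    """Format Exif-style (degrees, minutes, seconds) rationals for display.

    `dms` is three (numerator, denominator) pairs; `ref` is "N", "S", "E" or "W".
    Assumes degrees and minutes are whole numbers, as in this example.
    """
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    return f"{int(degrees)}° {int(minutes)}′ {float(seconds):.2f}″ {ref}"


def to_decimal(dms, ref):
    """Convert the same input to signed decimal degrees (south/west negative)."""
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return float(-value if ref in ("S", "W") else value)


# Assumed raw values for the latitude shown in the table above.
latitude = ((43, 1), (46, 1), (2135, 100))
print(format_coordinate(latitude, "N"))
print(to_decimal(latitude, "N"))
```

The reference letter has to be read together with the three rationals, since the numbers themselves are always positive.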

As another example, consider [[file:Pöstlingbahn TFXV.jpg]]. On Commons, it has no metadata extracted. (It does have some information about the image on the page, but this was all entered by hand.) On my test wiki, the following metadata table is generated:

I'm almost done with IIM metadata, and plan to start working on XMP metadata soon. If you're curious, all the code is currently in the img_metadata branch. You can also look at the status page, which I will try to update occasionally.


Sunday, June 6, 2010

drama defined


This is what the rc has looked like all day... :(

Wednesday, April 14, 2010

restyling the reader feedback

Recently at Wikinews we've been trying to give a more inviting look to the reader feedback extension. This extension adds a little box at the bottom of the page inviting readers to rate the article. Many people felt that it could do with a little more snazz. The extension builds the form out of a bunch of boring old HTML <select>s:

Some people wanted something more like the typical rating systems you find on websites nowadays (YouTube and NewsTrust were two prominent examples given of what people were looking for). So we tried experimenting with some custom JavaScript to give it a new look. First we experimented with Unicode stars (considering the 9 billion different types of stars in Unicode, it's amazing how few have both filled-in and non-filled-in variants). Then we moved to different star images:

Eventually we chose to use red stars. On hovering, it gives users help text describing the rating they are giving (you know, for the stupid people who think one star means excellent). Here's the final result:

You'll also notice a "Comment on this article" box in the images. This was the only part that was a little ugly, since it strays into stuff that should be done at the PHP level. It still doesn't handle captchas very well for anons who include links. For the moment it redirects them to an edit page; eventually it will just prompt people to answer the captcha, when I get around to doing that. The fallback is OK for now, as not too many people post links (read: no one as of yet other than me during testing).

However, if our comment pages are any indication, this feature seems to be quite widely used. There is some nonsense posted, but there are also some nice comments posted via the form. What's really surprising is people commenting on our older articles [1]. Often at Wikinews we assume articles have a shelf life of at most a week, and after that almost no one reads them. That appears not to be the case. We were considering having a comment form on the Main Page, but weren't sure where we'd want the comments to go (Talk:Main Page, Opinions:Main Page, Wikinews:Water cooler/assistance, Wikinews:Guest book, etc.; none of them really seem to fit), so currently there is only the rating part on the Main Page.

With the adoption of LiquidThreads, our comment pages have really been taking off. I think the Special:NewMessages notification in the top right corner brings commenters back to the comment page to respond to new comments. For example, [[Comments:Large Hadron Collider reaches milestone]] has quite the conversation going, even if it's not the most intelligent one. Before, somebody would post something and then forget about it; now they get a reminder that they have new messages, and thus respond to those who reply to them, and so on.

After a couple of days, it really does appear that these changes made a difference. Here is the graph of how people rated the Main Page over the last month. The green/blue line is how they rated us (in reliability), and the red line is how many people rated on a 1:6 scale. Notice how the number of raters per day increased almost tenfold!

Source: http://en.wikinews.org/w/index.php?title=Special:RatingHistory&target=Main_Page

Sunday, April 11, 2010

mediawiki update brings new goodies for wikinews

Now that Wikimedia got updated, we get to have all the cool new features, which is exciting! I always love software updates. Furthermore, almost none of the JavaScript broke: one minor thing we stole from Commons did, but otherwise all is well. None of *my* JS broke ;) Well, actually one thing of mine did break, because the MediaWiki and User namespaces on Wiktionary became first-letter case-insensitive, but other than that nothing broke. On the bright side, due to the software update, my WiktLookup gadget now works in IE (or should anyway; I haven't tested).

The one feature we [at Wikinews] were waiting for was a change to DynamicPageList that allows us to put our developing articles on the Main Page without them being picked up by Google News. (Google News assumes any article linked from our Main Page with a number in the URL, and without nofollow, is a published news article. Since we allow anyone to create an article, we don't want our articles in progress being picked up by Google.) Thus {{main devel}} is back on the Main Page after a long absence.

Speaking of DynamicPageList (to clarify, the Wikimedia one, not DPL2), it has a number of cool new features for us at Wikinews and for other wikis that use it. (I'm especially happy about this, as I contributed a patch for it, and it's really cool to see something I've done go live.) Among other things, it can now list articles alphabetically (a feature request from Wikibooks), and you can specify the format of the date on which the article was added to the category (before, it was just a boolean on/off switch). However, one of the coolest new features (imho) is the ability to use image galleries as an output mode. One can now use DynamicPageList to make a <gallery> of, say, the first 20 images in both Category X and Category Y but not in Category Z.
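If the "in X and Y but not in Z" part sounds abstract, the selection boils down to set intersection and difference. Here is a tiny Python sketch of that logic. It is not how the extension is implemented (that is a database query in PHP); the function name, the file names, and the alphabetical ordering are all made up for the illustration.

```python
def select_pages(members, include, exclude, limit):
    """Return up to `limit` pages that are in every `include` category
    and in none of the `exclude` categories.

    `members` maps a category name to the set of pages in it.
    Hypothetical helper; needs at least one `include` category.
    """
    wanted = set.intersection(*(members[c] for c in include))
    for c in exclude:
        wanted -= members[c]
    # Sorted alphabetically here just to make the result predictable.
    return sorted(wanted)[:limit]


# Invented category contents.
members = {
    "X": {"File:A.jpg", "File:B.jpg", "File:C.jpg"},
    "Y": {"File:B.jpg", "File:C.jpg", "File:D.jpg"},
    "Z": {"File:C.jpg"},
}

gallery = select_pages(members, ["X", "Y"], ["Z"], 20)
print(gallery)
```

Only File:B.jpg survives: A and D are each missing from one of the required categories, and C is knocked out by Category Z.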

Here's to all the devs for continuing to do an excellent job with MediaWiki.


Thursday, April 8, 2010

forging the lqt signature

Today I discovered the little-known fact that you can override the automatic signature in LiquidThreads.

(from [[Comments:Red Shirts cause state of emergency in Thai capital]])

This ought to be fun ;)


Update: Apparently this is now in the LiquidThreads UI. I was feeling very cool about myself when I thought you could only do it with the API ;)

Sunday, January 24, 2010

Wikinews Writing Contest begins!

The 2010 Wikinews writing contest is off to a good start, with Blood Red Sandman submitting the first article. Good luck to all the competitors.

P.S. Want to sign up? It's not too late; see [[Wikinews:Writing contest 2010]]. Both newbies and old contributors are welcome (newbies get a handicap). There's even a small prize for the winner.