Working with Structured Data on Commons: A Status Report

Originally published at: https://space.wmflabs.org/2019/08/27/working-with-structured-data-on-commons-a-status-report/

File:Editathon_at_national_archive.-American_University_COMM535.JPG (3 March 2014, 11:35:29) by Xiaweiyang, CC-BY-SA-3.0.

English: American University SOC students helping National Archive scan files and photos.

The beginnings of Structured Data on Commons have been available for a little over half a year now, so let’s take a look at how editors can already work with it, and what more is coming soon. (Disclaimer: though the author is a Wikimedia chapter employee, this post is written in a volunteer capacity only.)

What’s already available

You can, of course, edit the structured data (captions and statements) directly on the file pages. Like any other changes, these edits will show up in the page history, in recent changes, on your watchlist, etc., so other editors can see, inspect, patrol, improve or undo them as usual. This is a great way to get started with Structured Data and get a grasp on how it works.

The Upload Wizard supports structured data as well, and you can set captions on each file before uploading it (and, like with the description, categories, etc., you can copy one file’s captions into remaining files, if you want to use the same caption for a whole batch of uploads), as well as edit each file’s statements.

Another way to add Structured Data is offered by the ISA tool, which is focused on improving the metadata of pictures uploaded as part of “Wiki Loves …” campaigns. It allows participants to add captions in different languages, as well as “depicts” statements, to photos that are part of the campaign (as selected by the campaign coordinator via a category). The coordinator can optionally limit a campaign to only captions or statements if they don’t want to overwhelm their participants or they think that only one of those aspects is necessary.

The Wikipedia Android app also allows you to edit the captions of images embedded in Wikipedia articles. (The iOS app doesn’t seem to have any such feature.)

You can also search the structured data in the regular wiki search, using special search keywords. The full documentation is at mw:Help:Extension:WikibaseCirrusSearch, but the most important keywords are hascaption, incaption and haswbstatement. hascaption:en searches for files that have an English caption; incaption:"search text" searches for “search text” in a file’s captions (and not in its description, categories, etc.); and haswbstatement:P180 searches for files that have any statement for that property (here, “depicts”), while haswbstatement:P180=Q146 matches only files where the statement has that specific value. All of these can be combined with other search terms as usual – for example, “adoptado hascaption:es -hascaption:fr haswbstatement:P180=Q146” searches for files that depict cats, whose (non-structured) description contains the word «adoptado» (“adopted” in Spanish), and which have a caption in Spanish but not in French.
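To make the way these keywords combine a bit more concrete, here is a minimal Python sketch that assembles such a search string and turns it into a request URL for the standard MediaWiki search API (action=query with list=search). The helper names build_search_query and build_search_url are made up for this illustration, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def build_search_query(text="", has_caption=(), lacks_caption=(), statements=()):
    """Combine free text with the structured-data search keywords.

    `statements` is a sequence of (property, value) pairs; a value of None
    means "any statement for this property".
    """
    parts = [text] if text else []
    parts += [f"hascaption:{lang}" for lang in has_caption]
    parts += [f"-hascaption:{lang}" for lang in lacks_caption]
    for prop, value in statements:
        if value is None:
            parts.append(f"haswbstatement:{prop}")
        else:
            parts.append(f"haswbstatement:{prop}={value}")
    return " ".join(parts)


def build_search_url(query, limit=10):
    """Build (but do not send) a search request for the File namespace."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srnamespace": 6,  # namespace 6 is "File:"
        "srlimit": limit,
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


query = build_search_query(
    "adoptado",
    has_caption=["es"],
    lacks_caption=["fr"],
    statements=[("P180", "Q146")],
)
print(query)
print(build_search_url(query))
```

The first line printed is the same search string as in the example above; the second is a URL you could open in a browser or fetch with any HTTP client.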

There is also a way to edit the statements of multiple files at once: the user script Add to Commons / Descriptive Claims (AC/DC), written by yours truly, lets you add the same collection of statements (including qualifiers) to a whole set of files. You can use this, for example, to add a suitable “depicts” statement to all the files in a category. (But make sure that all the files actually depict the category subject and are not merely related to it! This wouldn’t work at all for Category:Käthe Kollwitz, for example, because it combines media depicting her with media by her. Sometimes suitable subcategories like Category:Portraits of Käthe Kollwitz exist.)

And finally, if you’re a technical expert you can always use the MediaWiki and Wikibase APIs directly to make any edits you want – for example, User:Multichill did this during the Wikimedia Hackathon 2019 in T223746.
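For a rough idea of what “using the APIs directly” involves, the following Python sketch only builds the POST parameters for two standard Wikibase API modules: wbcreateclaim (to add a “depicts” statement) and wbsetlabel (captions are stored as labels of the file’s MediaInfo entity). It is not how any particular bot does it. The page ID 12345, the token placeholder and the function names are assumptions for illustration; a real edit additionally needs a logged-in session and a CSRF token obtained from the API.

```python
import json


def mediainfo_id(page_id):
    """A file's MediaInfo entity ID is its page ID prefixed with "M"."""
    return f"M{page_id}"


def depicts_claim_params(page_id, item_id, csrf_token):
    """Parameters for wbcreateclaim adding a "depicts" (P180) statement."""
    value = {
        "entity-type": "item",
        "numeric-id": int(item_id[1:]),  # "Q146" -> 146
        "id": item_id,
    }
    return {
        "action": "wbcreateclaim",
        "entity": mediainfo_id(page_id),
        "property": "P180",
        "snaktype": "value",
        "value": json.dumps(value),  # the API expects the value as JSON text
        "token": csrf_token,
        "format": "json",
    }


def caption_params(page_id, language, text, csrf_token):
    """Parameters for wbsetlabel setting a caption in one language."""
    return {
        "action": "wbsetlabel",
        "id": mediainfo_id(page_id),
        "language": language,
        "value": text,
        "token": csrf_token,
        "format": "json",
    }


# Hypothetical page ID and placeholder token, for illustration only.
print(depicts_claim_params(12345, "Q146", "<csrf token>"))
print(caption_params(12345, "en", "A cat sitting on a sofa", "<csrf token>"))
```

Each dictionary would be sent as the body of a POST request to https://commons.wikimedia.org/w/api.php; in practice, please test against a handful of files before running anything in bulk.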

What’s coming soon

A full-featured SPARQL query service for Structured Data on Commons is in the works (T141602); this basically blows the haswbstatement search keyword mentioned earlier out of the water, letting you search not just for simple “has statement” matches but providing a powerful way to query the whole data graph. For example, this will make it possible to search for files that were taken anywhere within a certain city (without having to mention that city on each file – connections from districts etc. to the surrounding city are already on Wikidata), or files depicting animals within a certain family or order. It will also allow users to query the qualifiers of statements, which is not possible in the regular search either. Regular search will remain the best way to search within the file captions (or traditional descriptions), but fortunately the two can be combined using MWAPI.
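Since the query service isn’t available yet, the following is only a guess at what the “animals within a certain family” example might look like, written as a small Python helper that produces the SPARQL text. It assumes that the usual Wikidata prefixes (wd:, wdt:) will work, that “depicts” statements on files will be exposed as wdt:P180 triples, and that the Wikidata taxonomy (P171, “parent taxon”) can be followed from the same query; any of these details may turn out differently, and the function name is made up.

```python
def depicts_taxon_query(taxon_id, limit=50):
    """Return a SPARQL query for files depicting anything within a taxon.

    Assumption: file statements and Wikidata's taxonomy are reachable with
    the standard wd:/wdt: prefixes; the real service may need adjustments.
    """
    if not (taxon_id.startswith("Q") and taxon_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {taxon_id!r}")
    return f"""SELECT ?file ?depicted WHERE {{
  ?file wdt:P180 ?depicted .            # the file depicts something...
  ?depicted wdt:P171* wd:{taxon_id} .   # ...that sits somewhere below the taxon
}}
LIMIT {int(limit)}"""


# Substitute the Wikidata item ID of the family or order you care about;
# Q25265 is assumed here to be the item for the cat family.
print(depicts_taxon_query("Q25265"))
```

The wdt:P171* property path is what does the heavy lifting here: it follows “parent taxon” links any number of times, which is exactly the kind of graph traversal that haswbstatement cannot express.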

Lua support is also underway; this will make it possible to embed the structured data in the wikitext, usually via templates. For example, {{Location}} could be updated to get the coordinates from the structured data (specifically the property coordinates of the point of view) if they are not specified as a template argument, similar to how on many Wikipedias, {{official website}} gets the official website from Wikidata if it’s not specified as a template argument. Other templates could also automatically categorize images based on their structured data, similar to how {{Wikidata infobox}} already adds some parent categories to category pages based on the information in Wikidata. This will be up for discussion and implementation by the community, of course.

We can also expect to see support for Structured Data on Commons in more tools. QuickStatements, the Swiss Army knife for editing Wikidata, will hopefully gain support for editing captions and statements on Commons soon (T181062 – in fact there is some very rudimentary support already, but it’s so fragile that I don’t want to give any guidance on it). This will allow for more fine-grained editing than the AC/DC user script mentioned above, though I hope that AC/DC will remain useful as a more user-friendly tool for a common use-case. Support in the Pywikibot library (T223820) and the Pattypan upload tool (T181057) is also planned. And tools should learn to work better together: PagePile support in VisualFileChange or Cat-a-lot and AC/DC would allow you to select a set of files using the former tools and then add statements to all of them using the latter, by exchanging the selection of files via the PagePile tool.

@Lucas_Werkmeister: How do you add a category? I see that I can name a pagepile or add filenames manually. Is there a way to get all the filenames in the category from the popup or do I need to use another method for that?

There’s no direct way to do that (I might add it if there’s enough demand, as another button in the menu that currently only has “Load PagePile”), but for now you can generate a PagePile for that category using PetScan and then load that. (Optionally, you can first filter that PagePile using PagePile Visual Filter, or slice and dice it using other tools.)

I believe that would be a very neat add-on that could lead to a lot of good edits. Many smaller categories contain only files that fit, but the hassle of generating a PagePile for fewer than 10 files is too much, and so, perhaps, is adding depicts separately on 10 different files. I think these are the kind of categories that might benefit most from a tool like this.

I agree with Ainali, PagePile is far too complicated when you only want to add a statement to 4-5 files from a category. Ideally, you should be able to select files like in Cat-a-lot.

I already said on the talk page that I don’t think it’s useful to add separate visual selection modes to a variety of tools – that’s what PagePile Visual Filter is for. I can add a mode to load all the files in a category, but not one to select files like in Cat-a-lot. Cat-a-lot should gain a PagePile export mode instead.

As for technical support, the wikibase Rust crate already contains (rudimentary) support for MediaInfo entities.

If we want to use PagePile more, I might have to beef it up a bit…

Another benefit of Structured Data:
Classification is made easier in the Commons Android app, thanks to labels and descriptions in the user’s native language, and even thumbnails.
The next step is suggesting depictions using privacy-friendly image recognition.

Any reason it does not look so nice on desktop? Or am I looking in a wrong direction?

That was what I was trying to say: loading ALL files in a category would be a big help for a lot of the smaller categories.

It’s implemented now :)

I noticed, and it is really awesome! Thank you!
