What is the size of the gender gap? Like, actually?

Do we have an estimate of the size of the gender gap on Wikipedia given the current notability criteria? In other words, if every notable living or dead person had an article, what percentage would be about women? For example, would 30% of biographies being about women be a good ballpark figure for a ‘closed’ Wikipedia gender gap?

I feel that a lot of people just assume that the answer is 50%, but once you think about it, that’s not right, seeing how women have faced systematic oppression through the millennia up until today.

From my perspective, having some kind of estimate for the size of the gender gap is important when we want to create a strategy and plan resources for ‘closing’ it. I am thinking of Wikimedia affiliates like Wikimedia UK who have already made commitments in this area.

Note: My question is not about reconsidering Wikipedia’s notability criteria. That would be a possible next discussion point once this question has been answered. Also, I am aware that I am neglecting the issue of intersectionality here.

(I started thinking about this after watching a recent episode of the Cologne Wikipedians’ vlog called ‘Unboxing Wikipedia’.)


You have to compare the gender bias with the bias in each of our present-day societies. As a matter of principle, the Wikipedias cannot be more than a reflection of the societies behind them (plural because of cultural, national, and other variations). And since most, if not all, of our societies are heavily biased (remember, e.g., the recurring Nobel Prize disputes), so are the Wikipedias (remember the Donna Strickland dispute about her not being ‘in Wikipedia’ before she won a Nobel Prize). The flaws originate there, and what we, as a community respectful of Wikipedia’s principles, can do is first not accentuate them, and then try to approach the level of bias in society as it is reflected in the media, in our sources of information. I think this is a first step, and it is an achievable objective.
Being an engaged citizen who promotes gender equality by fighting inequalities in society is a different story. As an individual, you can, for instance, prioritize biographies of women on the Wikipedia you usually edit, and complain loudly to publishers about the lack of publications…


Hmm, I’m surprised it doesn’t seem easy to find such a number. Leaving aside how to interpret it, it seems like the type of statistic that people would find interesting and keep track of. Wikidata should have the information (mostly, with lots of assumptions), but the query I tried for all English Wikipedia articles timed out before it completed.

If you look only at scientists who are in Wikidata, have their gender identified in Wikidata, and have an article on English Wikipedia, you get 99365 men vs 19476 women (and about 50 people with various other labels) [out of 118888 articles in total]. There are a lot of assumptions here: the person may not be properly identified as a scientist in Wikidata, their gender might not be recorded in Wikidata, etc. But it gives a first approximation. https://query.wikidata.org/#%23Number%20of%20scientists%20per%20gender %23added%20before%202016-10 SELECT%20%3Fgender%20(count(distinct%20%3Fhuman)%20as%20%3Fnumber) WHERE { %3Fhuman%20wdt%3AP31%20wd%3AQ5 %3B%20wdt%3AP21%20%3Fgender %3B%20wdt%3AP106%2Fwdt%3AP279*%20wd%3AQ901%20. %20%20%20%20%20%20%20%20%3Farticle%20schema%3Aabout%20%3Fhuman%20. %20%20%20%20%3Farticle%20schema%3AisPartOf%20<https%3A%2F%2Fen.wikipedia.org%2F>. } GROUP%20BY%20%3Fgender LIMIT%2010
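As a quick check on these figures, the shares can be worked out directly from the quoted counts. This is a minimal Python sketch using only the numbers in the post; it assumes each person falls under exactly one gender value, which the query’s per-gender GROUP BY does not strictly guarantee.

```python
# Counts quoted above for scientists with an English Wikipedia article.
MEN = 99365
WOMEN = 19476
TOTAL = 118888  # total quoted in the post, including other gender values


def share(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return 100.0 * count / total


other = TOTAL - MEN - WOMEN                   # people with other gender labels
women_pct = share(WOMEN, TOTAL)               # women as a share of everyone
women_of_binary = share(WOMEN, MEN + WOMEN)   # women as a share of men + women

print(f"other gender values: {other}")
print(f"women, share of all: {women_pct:.1f}%")
print(f"women, share of men + women: {women_of_binary:.1f}%")
```

Either way, women make up roughly 16% of this subset, and the "about 50 other labels" works out to 47 people.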


My assumption is that there is no number we could fix: as more and more women take part in the writing of history, more and more women are discovered to have been notable and then forgotten, or misremembered as men, like the Birka Viking warrior. So the number of notable women is going up not only because societies are becoming more equal and more women get the chance to contribute to society beyond the typical roles for women, but also because we are rediscovering the amazing and inspiring things they did even before that. If the percentage of Wikipedia articles about women does not keep increasing, the gap will widen.

Another thought: what would happen if we put in a bit too much energy and ended up with a few too many articles about women after years of having too few? What would be the result of having too many articles on notable women on Wikipedia? Could we inspire too many girls to feel equal? Could we make too many women feel empowered?

And as long as I find whole categories like this one on de.WP that do not contain a single woman, I believe we have enough work worth doing. (I did not check all categories for cricket players, but I checked most of the larger ones and found only one woman in all of them.)


I’m not surprised you cannot find a number. The idea of a “natural” gender gap number to asymptotically approach assumes there is a theoretical number you could calculate. Sociologists do not give numbers like this; they talk of attainment gaps rather than representation gaps: https://www.weforum.org/agenda/2017/11/the-gender-gap-actually-got-worse-in-2017/
I do not need a theory or numerical target to write content. I do not see any limit on creating yet more biographies; the rate of increase is not rolling over yet, and it will be years before we see a decline in the rate of change in the gap.


Asaf recently posted https://lists.wikimedia.org/pipermail/wikimedia-l/2020-February/094242.html

Thanks, @Bawolff – how would you answer my question?

Well, according to Asaf’s data, 18.23% of articles about people are about women. That doesn’t answer your question, but it is kind of relevant.

Edit: in the interest of full honesty, I guess I didn’t read the question properly at first, and thought it was asking about the current percentages on Wikipedia. I’m not sure how I managed to misread the question so blatantly…

Haha okay! So, what are your thoughts on my question?

given the current notability criteria

In my very humble and personal opinion, the gap starts in these criteria.


Hm – I think I understand, but what is its size?

@Magnus_Manske recently tweeted a graph showing that the gender imbalance in 937 authorities used in Wikipedia has a median of 0.143:1, which I suppose could be interpreted as ‘Wikipedia’s gender gap isn’t bigger than the one in the sources it relies on’.
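To compare the two figures in the same units, the median ratio can be converted into a percentage. A rough Python sketch, assuming the 0.143:1 ratio means women per man and that only those two gender values are counted. Note that a median of per-authority ratios is not the same as a pooled share over all people, and the 18.23% figure is taken from Asaf’s data as quoted above, so the comparison is indicative only.

```python
def ratio_to_share(women_per_man: float) -> float:
    """Turn a women:men ratio r:1 into the percentage of women (two values only)."""
    return 100.0 * women_per_man / (1.0 + women_per_man)


def share_to_ratio(pct_women: float) -> float:
    """Inverse: percentage of women -> women per man."""
    return pct_women / (100.0 - pct_women)


SOURCE_MEDIAN_RATIO = 0.143  # median across 937 authorities (direction assumed)
WIKIPEDIA_PCT = 18.23        # % of articles about people that are about women

source_pct = ratio_to_share(SOURCE_MEDIAN_RATIO)
wikipedia_ratio = share_to_ratio(WIKIPEDIA_PCT)

print(f"median authority: {source_pct:.1f}% women")
print(f"Wikipedia: {WIKIPEDIA_PCT}% women, i.e. about {wikipedia_ratio:.3f}:1")
```

On these assumptions, the median authority sits at about 12.5% women against Wikipedia’s roughly 18% (about 0.223:1), which is consistent with that reading, though not proof of it.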

Am I getting this right? Other thoughts?


More on this, including data source etc:

I’m not sure of the implications myself, especially since Wikidata is a superset of all the other sources.


Here is one for Wikidata.


Your question is the one I had in mind when I built the first version of Denelezh. My idea was to split the problem into smaller ones and to work on subsets to make the study easier. There are subsets where the percentage of women is perfectly known. For instance, here are the percentages of women in the lower house of the French Parliament since 1945. On the French-language Wikipedia, members of parliament automatically meet the notability criteria. Other subsets have to be studied. The tool partly makes this possible by providing statistics about humans depicted in Wikidata along various dimensions that you can combine (for instance, country of citizenship + occupation = French politicians). But merging these subsets is not easy, as they overlap: you can’t simply add the percentages from two sets when some people belong to both. An athlete can also be a politician, someone can have several countries of citizenship, and so on…
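The overlap problem can be illustrated with a toy example. This Python sketch uses made-up Wikidata IDs and genders (all hypothetical, not real Denelezh data); it shows that pooling the raw counts of two overlapping subsets double-counts the people they share, whereas merging them on the item ID first counts each person once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record; frozen so it is hashable and can live in a set."""
    qid: str     # Wikidata item ID (made-up values below)
    gender: str  # e.g. "female" or "male"


def count_women(people) -> int:
    return sum(1 for p in people if p.gender == "female")


def pct_women(people) -> float:
    return 100.0 * count_women(people) / len(people)


# Made-up subsets; Q3 is in both (an athlete who is also a politician).
athletes = {Person("Q1", "male"), Person("Q2", "male"), Person("Q3", "female")}
politicians = {Person("Q3", "female"), Person("Q4", "female"),
               Person("Q5", "male"), Person("Q6", "male")}

# Misleading: pooling raw counts counts Q3 twice (3 women out of 7 "people").
double_counted = 100.0 * (count_women(athletes) + count_women(politicians)) / (
    len(athletes) + len(politicians))

# Merge on the item ID first, so each person is counted once (2 women out of 6).
union = {p.qid: p for p in [*athletes, *politicians]}
merged = pct_women(union.values())

print(f"athletes: {pct_women(athletes):.1f}%, politicians: {pct_women(politicians):.1f}%")
print(f"pooled counts (double-counted): {double_counted:.1f}%")
print(f"merged on item ID: {merged:.1f}%")
```

With real data the overlaps are far larger and less obvious (multiple citizenships, multiple occupations), which is why deduplicating on the item before computing any percentage matters.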

The first version of the tool provided statistics about humans in Wikidata with a gender, a year of birth, and a country of citizenship. The (unverified) assumption was that Wikidata items with all these properties were of better quality than those with one or more missing properties. The problem is that the statistics covered only about 50% of humans depicted in Wikidata, and thus were misleading for people studying the gender gap in Wikimedia projects. The second version, whose development was rushed, solved this problem by no longer filtering Wikidata items on the properties they have, and by providing statistics wherever the data was available. With this change (and the addition of Wikimedia projects as a dimension to filter and combine), it became closer to WHGI. One current problem is that the tool lacks statistics on Wikidata quality (how many Wikidata items depicting humans have the gender property, etc.).
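The difference between the two filtering approaches, and the kind of data-quality statistic described as missing, can be sketched as follows. The `HumanItem` structure and the items are hypothetical, not Denelezh’s actual data model; `None` marks a property that is missing on the Wikidata item, and the tiny sample exaggerates the effect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HumanItem:
    """Hypothetical, simplified view of a Wikidata item about a human."""
    qid: str
    gender: Optional[str]
    birth_year: Optional[int]
    citizenship: Optional[str]


# Made-up items; None means the property is missing on the item.
humans = [
    HumanItem("Q1", "female", 1950, "France"),
    HumanItem("Q2", "male", None, "France"),
    HumanItem("Q3", "female", 1901, None),
    HumanItem("Q4", None, 1980, "Germany"),
]


def has_all_properties(h: HumanItem) -> bool:
    """First-version-style filter: gender, birth year and citizenship all present."""
    return all(v is not None for v in (h.gender, h.birth_year, h.citizenship))


def pct_female(items) -> float:
    return 100.0 * sum(h.gender == "female" for h in items) / len(items)


v1_items = [h for h in humans if has_all_properties(h)]  # strict filter
v2_items = [h for h in humans if h.gender is not None]   # use whatever is available

v1_coverage = 100.0 * len(v1_items) / len(humans)
gender_coverage = 100.0 * len(v2_items) / len(humans)    # a simple quality metric

print(f"strict filter covers {v1_coverage:.0f}% of items, {pct_female(v1_items):.0f}% women")
print(f"items with a gender: {gender_coverage:.0f}%, {pct_female(v2_items):.1f}% women")
```

The strict filter not only shrinks coverage but can shift the resulting percentage, which is the kind of distortion the second version set out to avoid.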

The third version will be a merge of Denelezh and WHGI. Some ideas are already in the pipeline (adding external IDs as a dimension, producing lists of notable people to help Wikimedia editors find subjects to work on, etc.); others are on Phabricator. Feedback welcome :)