The Numbers On Gender Representation On News Websites Are In, and They’re Undeniable: A Large-Scale Analysis

photo credit: visuals, via Unsplash

My name is Isabel Wu, and I am a rising sophomore at Harvard College. This summer, I worked as an intern at GenderAvenger. My assignment was to take a look at major TV news websites and analyze the gender breakdown of bylines and of the sources journalists use in their stories.

The media holds dual power.

I believe the media holds dual power. It is both a product of perceptions and something that shapes perceptions, which makes it an indicator of injustice and a tool for change. I saw my project of looking at gender representation in the news as a way to call attention to the lack of women being represented and to open up an avenue for change in journalism.

I designed a program for larger-scale analysis.

To allow analysis on a larger scale, I designed a program to collect news segments, extract the bylines and the sources cited by the journalists, and assign a gender to each byline and source, giving an estimate of how well news outlets are doing in terms of elevating diverse perspectives. My analysis covered data stored by Media Cloud for four major news outlets over the 24 hours from 9 a.m. on Sunday, July 12 to 9 a.m. on Monday, July 13. The outlets were the television networks ABC, NBC, and Fox News and the cable network CNN, chosen to cover the websites where most people get their news. Bylines were extracted by searching through the webpage HTML, and sources were extracted by analyzing the articles’ sentence structure to find the speakers of quotes or paraphrased quotes.
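As a rough illustration of the byline step described above, and not the program used for this analysis, here is a minimal Python sketch that searches a page's HTML for author names. It assumes bylines appear either in a `<meta name="author">` tag or inside an element whose class contains "byline". Both conventions are assumptions; real news sites differ, and each outlet would likely need its own rules. The names `BylineParser` and `extract_bylines` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BylineParser(HTMLParser):
    """Collect candidate bylines from two assumed HTML conventions."""

    def __init__(self):
        super().__init__()
        self.bylines = []
        self._depth = 0    # greater than 0 while inside a "byline" element
        self._buffer = []  # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumed convention 1: <meta name="author" content="...">
        if tag == "meta" and attrs.get("name") == "author" and attrs.get("content"):
            self._add(attrs["content"])
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        # Assumed convention 2: any element with "byline" in its class
        elif "byline" in (attrs.get("class") or ""):
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self._add("".join(self._buffer))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)

    def _add(self, raw):
        text = " ".join(raw.split())                              # collapse whitespace
        text = re.sub(r"^by\s+", "", text, flags=re.IGNORECASE)   # drop a leading "By "
        if text and text not in self.bylines:
            self.bylines.append(text)


def extract_bylines(html):
    """Return the candidate bylines found in an HTML string, in page order."""
    parser = BylineParser()
    parser.feed(html)
    parser.close()
    return parser.bylines
```

Calling `extract_bylines(page_html)` returns a list of name strings. A page that matches neither convention returns an empty list, which is one way a byline can fail to be collected.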

The program could fail to identify the journalist or a source, or it could fail to assign the correct gender to the names. To test it, I randomly selected 10 articles from each news outlet and hand-checked four metrics: byline collection rate, byline gender accuracy, source collection rate, and source gender accuracy.
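To make the four metrics concrete, here is a small Python sketch of how a collection rate and a gender accuracy could be computed from hand-checked articles. The definitions are assumptions, since the exact formulas are not spelled out above: collection rate is taken as the share of names a human reader found that the program also found, and gender accuracy as the share of collected names given the correct gender. The `HandCheck` record and the numbers in the usage note are purely illustrative and are not data from this study.

```python
from dataclasses import dataclass


@dataclass
class HandCheck:
    """Hand-checked counts for one article (bylines or sources)."""
    true_items: int      # names a human reader found in the article
    collected: int       # of those, how many the program also found
    gender_correct: int  # of the collected names, how many got the right gender


def rates(checks):
    """Pool counts across articles and return (collection_rate, gender_accuracy).

    Either value is None when its denominator is zero.
    """
    total_true = sum(c.true_items for c in checks)
    total_collected = sum(c.collected for c in checks)
    total_correct = sum(c.gender_correct for c in checks)
    collection_rate = total_collected / total_true if total_true else None
    gender_accuracy = total_correct / total_collected if total_collected else None
    return collection_rate, gender_accuracy
```

For example, `rates([HandCheck(2, 2, 2), HandCheck(3, 2, 1)])` pools five true names, four collected, and three gendered correctly, giving a collection rate of 0.8 and a gender accuracy of 0.75. The same function would be run once on byline counts and once on source counts for each outlet.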

An analysis of the samples* indicated that the accuracy in identifying bylines ranged from 80% to 100%, and the gender assigned to those bylines was 100% accurate, while the source information was less successful. The accuracy of the sources identified ranged from 47% to 85%, and the gender assigned to those sources was 88% to 100% accurate. It is likely that this reflects the difficulty of capturing the complexity of human language programmatically, as well as the different writing styles among the media outlets. For example, ABC frequently used “said” in relation to sources and did not use pronouns. CNN used pronouns and a variety of verbs like "told", "wrote", and "added".
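The effect of writing style can be seen in a deliberately simple Python sketch of source extraction. This is not the sentence-structure analysis used in the study. It is a plain regular-expression stand-in that looks for a capitalized multi-word name directly before or after an attribution verb. The pattern, the verb lists, and the function name `find_speakers` are all assumptions for illustration.

```python
import re

# Assumed shape of a name: two or more capitalized words in a row.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)+"


def find_speakers(text, verbs):
    """Return names that sit directly before or after one of the given verbs."""
    verb_group = "|".join(re.escape(v) for v in verbs)
    pattern = re.compile(
        rf"\b(?P<before>{NAME}) (?:{verb_group})\b"    # "Maria Lopez said"
        rf"|\b(?:{verb_group}) (?P<after>{NAME})"      # "wrote David Chen"
    )
    return [m.group("before") or m.group("after") for m in pattern.finditer(text)]


text = (
    'The policy is overdue, Maria Lopez said. "We waited years," she added. '
    "The report, wrote David Chen, was clear."
)

said_only = find_speakers(text, ["said"])
more_verbs = find_speakers(text, ["said", "told", "wrote", "added"])
```

On this made-up passage, `said_only` contains only Maria Lopez, while `more_verbs` also picks up David Chen. Neither call resolves "she added" back to a name, so a style that leans on pronouns and varied verbs leaves more sources uncollected under a pattern like this one.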

Other challenges arose when I looked to account for non-binary individuals and women of color, two groups that GenderAvenger is committed to supporting. Existing tools are not able to differentiate between a non-binary individual and a cisgender person with the same name, and programmatically labelling women of color by inferring their race from their names is fraught with ethical implications.

Given these caveats, here is the data:

A few striking trends came through the data.

Despite the challenges, a few striking trends still come through in the data. Sources from all four news outlets were dominated by men, but the bylines told a more varied story, with CNN and NBC demonstrating good representation in their bylines, while Fox and ABC did not. Thus we can say that journalism has made some strides to ensure that there are not only men in the newsrooms, but it still has a way to go in terms of elevating expert women’s voices. The significant lack of representation in sources cited is not an old one. Adrienne LaFrance from The Atlantic analyzed her own citing habits and found that only 25% of her sources were women, a metric very similar to the numbers I collected in my study. It is also indicative of a multi-layered problem.

Gender inequality and journalists create a self-fulfilling loop.

There are more men dominating the fields that journalists look to for sources, and journalists are not doing enough to find women to cite, which completes a self-fulfilling loop. While it is easy for us to say that there are more men in a certain field, we also have the opportunity to be more conscientious about elevating women’s voices, which makes it much more likely for women cited to gain recognition, get offered spots on panels, and pave the way for women of the future.

The numbers are undeniable.

As an outsider, I saw GenderAvenger as a multi-talented group of individuals extremely dedicated to women’s issues. While the tangible goals are to call out a lack of diversity in public forums, the driving force is to pull systemic biases from our collective unconscious to the forefront and present people with numerical facts so that they have no choice but to change. The numbers I’ve presented here are undeniable.

 

 

* NBC: 100% byline collection rate, 100% byline gender accuracy, 70% source collection rate, 100% source gender accuracy

ABC: 81% byline collection rate, 100% byline gender accuracy, 85% source collection rate, 95% source gender accuracy

CNN: 92% byline collection rate, 100% byline gender accuracy, 47% source collection rate, 100% source gender accuracy

Fox: 80% byline collection rate, 100% byline gender accuracy, 76% source collection rate, 88% source gender accuracy


 

Isabel Wu is a rising sophomore in Adams House planning on concentrating in Economics. She is an active member of HAUSCR, Harvard's Association for US-China Relations, where she helps plan engaging conferences for Chinese high school students. She is also a member of CBE, Consulting for Business and the Environment, and AADT, Asian American Dance Troupe. Isabel is excited to engage with her passion for gender issues and advocacy at GenderAvenger!