Why Data Science Needs Feminism


Photo by Adobe Stock/Artem.

The act of collecting and recording data about people is not new at all. From the registers of the dead that were published by church officials in the early modern era to the counts of Indigenous populations that appeared in colonial accounts of the Americas, data collection has long been employed as a technique of consolidating knowledge about the people whose data are collected, and therefore consolidating power over their lives.  The close relationship between data and power is perhaps most clearly visible in the historical arc that begins with the logs of people captured and placed aboard slave ships, reducing richly lived lives to numbers and names. It passes through the eugenics movement, in the late nineteenth and early twentieth centuries, which sought to employ data to quantify the superiority of white people over all others. It continues today in the proliferation of biometrics technologies that, as sociologist Simone Browne has shown, are disproportionately deployed to surveil Black bodies.

When Edward Snowden, the former US National Security Agency contractor, leaked his cache of classified documents to the press in 2013, he revealed the degree to which the federal government routinely collects data on its citizens—often with minimal regard to legality or ethics. At the municipal level, too, governments are starting to collect data on everything from traffic movement to facial expressions in the interests of making cities “smarter.” This often translates to reinscribing traditional urban patterns of power such as segregation, the overpolicing of communities of color, and the rationing of ever-scarcer city services.

But the government is not alone in these data-collection efforts; corporations do it too—with profit as their guide. The words and phrases we search for on Google, the times of day we are most active on Facebook, and the number of items we add to our Amazon carts are all tracked and stored as data—data that are then converted into corporate financial gain. The most trivial of everyday actions—searching for a way around traffic, liking a friend’s cat video, or even stepping out of our front doors in the morning—are now hot commodities. This is not because any of these actions are exceptionally interesting (although we do make an exception for Catherine’s cats) but because these tiny actions can be combined with other tiny actions to generate targeted advertisements and personalized recommendations—in other words, to give us more things to click on, like, or buy.

This is the data economy, and corporations, often aided by academic researchers, are currently scrambling to see what behaviors, both online and off, remain to be turned into data and then monetized. Nothing is outside of datafication, as this process is sometimes termed: not your search history, or Catherine’s cats, or the butt that Lauren is currently using to sit in her seat. To wit: Shigeomi Koshimizu, a Tokyo-based professor of engineering, has been designing matrices of sensors that collect data at 360 different positions around a rear end while it is comfortably ensconced in a chair. He proposes that people have unique butt signatures, as unique as their fingerprints. In the future, he suggests, our cars could be outfitted with butt-scanners instead of keys or car alarms to identify the driver.

Although datafication may occasionally verge into the realm of the absurd, it remains a very serious issue. Decisions of civic, economic, and individual importance are already and increasingly being made by automated systems sifting through large amounts of data. For example, PredPol, a so-called predictive policing company founded in 2012 by an anthropology professor at the University of California, Los Angeles, has been employed by the City of Los Angeles for nearly a decade to determine which neighborhoods to patrol more heavily, and which neighborhoods to (mostly) ignore. But because PredPol is based on historical crime data and US policing practices have always disproportionately surveilled and patrolled neighborhoods of color, the predictions of where crime will happen in the future look a lot like the racist practices of the past. These systems create what mathematician and writer Cathy O’Neil, in Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, calls a “pernicious feedback loop,” amplifying the effects of racial bias and of the criminalization of poverty that are already endemic to the United States.
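A toy simulation can make the shape of such a feedback loop concrete. The sketch below is not PredPol's algorithm. It is a deliberately simplified model built on stated assumptions: two neighborhoods with the same underlying crime rate, historical records skewed 60/40 by past over-policing, a "prediction" that simply flags whichever neighborhood has more recorded incidents, and incidents that get recorded only where officers are present. The function names, the 70 percent patrol share, and every number are hypothetical and chosen for illustration.

```python
def simulate_feedback_loop(recorded, rounds, hot_spot_percent=70, incidents_per_round=100):
    """Track recorded incidents in two neighborhoods with the SAME true crime rate.

    All names and numbers here are hypothetical and chosen for illustration.
    """
    counts = list(recorded)
    history = [tuple(counts)]
    for _ in range(rounds):
        # "Prediction": whichever neighborhood has more recorded incidents
        # is flagged as the hot spot (ties go to the first neighborhood).
        hot = 0 if counts[0] >= counts[1] else 1
        patrol_percent = [100 - hot_spot_percent] * 2
        patrol_percent[hot] = hot_spot_percent
        # Assumption: an incident is recorded only where officers are present,
        # so new records follow patrol presence, not underlying behavior.
        for i in range(2):
            counts[i] += incidents_per_round * patrol_percent[i] // 100
        history.append(tuple(counts))
    return history


def share_of_first(counts):
    """Fraction of all recorded incidents attributed to the first neighborhood."""
    return counts[0] / sum(counts)


# Historical records are skewed 60/40 by past over-policing, not by behavior.
history = simulate_feedback_loop(recorded=(60, 40), rounds=5)
for round_number, counts in enumerate(history):
    print(round_number, counts, round(share_of_first(counts), 3))
```

Under these assumptions, the first neighborhood's share of recorded incidents climbs from 60 percent to roughly 68 percent after five rounds, even though nothing about the residents' behavior differs. The records diverge only because the patrols follow the records.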

O’Neil’s solution is to open up the computational systems that produce these racist results. Only by knowing what goes in, she argues, can we understand what comes out. This is a key step in the project of mitigating the effects of biased data. Data feminism additionally requires that we trace those biased data back to their source. PredPol and the “three most objective data points” that it employs certainly amplify existing biases, but they are not the root cause.  The cause, rather, is the long history of the criminalization of Blackness in the United States, which produces biased policing practices, which produce biased historical data, which are then used to develop risk models for the future.  Tracing these links to historical and ongoing forces of oppression can help us answer the ethical question, Should this system exist?  In the case of PredPol, the answer is a resounding no.

Understanding this long and complicated chain reaction is what has motivated Yeshimabeit Milner, along with Boston-based activists, organizers, and mathematicians, to found Data for Black Lives, an organization dedicated to “using data science to create concrete and measurable change in the lives of Black communities.” Groups like the Stop LAPD Spying coalition are using explicitly feminist and antiracist methods to quantify and challenge invasive data collection by law enforcement. Data journalists are reverse-engineering algorithms and collecting qualitative data at scale about maternal harm. Artists are inviting participants to perform ecological maps and using AI for making intergenerational family memoirs.

All these projects are data science. Many people think of data as numbers alone, but data can also consist of words or stories, colors or sounds, or any type of information that is systematically collected, organized, and analyzed. The science in data science simply implies a commitment to systematic methods of observation and experiment. Throughout this book, we deliberately place diverse data science examples alongside each other.

They come from individuals and small groups, and from across academic, artistic, nonprofit, journalistic, community-based, and for-profit organizations. This is due to our belief in a capacious definition of data science, one that seeks to include rather than exclude and does not erect barriers based on formal credentials, professional affiliation, size of data, complexity of technical methods, or other external markers of expertise. Such markers, after all, have long been used to prevent women from fully engaging in any number of professional fields, even as those fields—which include data science and computer science, among many others—were largely built on the knowledge that women were required to teach themselves. An attempt to push back against this gendered history is foundational to data feminism, too.

Throughout its own history, feminism has consistently had to work to convince the world that it is relevant to people of all genders. We make the same argument: that data feminism is for everybody. (And here we borrow a line from bell hooks.)  You will notice that the examples we use are not only about women, nor are they created only by women. That’s because data feminism isn’t only about women. It takes more than one gender to have gender inequality and more than one gender to work toward justice.

Likewise, data feminism isn’t only for women. Men, nonbinary, and genderqueer people are proud to call themselves feminists and use feminist thought in their work. Moreover, data feminism isn’t only about gender. Intersectional feminists have keyed us into how race, class, sexuality, ability, age, religion, geography, and more are factors that together influence each person’s experience and opportunities in the world. Finally, data feminism is about power: about who has it and who doesn’t. Intersectional feminism examines unequal power. And in our contemporary world, data is power too. Because the power of data is wielded unjustly, it must be challenged and changed.

Excerpted from Data Feminism by Catherine D’Ignazio and Lauren F. Klein. Reprinted with permission from the MIT Press. Copyright 2020.
