Faculty Op-Ed: Beyond Inclusion in Numbers
Pride month is an opportunity to celebrate, honor, and make the LGBTQIA+ community more visible. Celebrated in June in at least nine countries, including the United States and Canada, Pride is an outlook that strengthens an ongoing fight for rights and dignity. From events and parties to critical reflections on where we are as a society in terms of equity and equality, all of us have something to learn during this time.
As data scientists, we strive to interpret the world, often looking everywhere for solutions to complex problems. As an emerging field, data science brings together people with various backgrounds, worldviews, and experiences to tackle questions whose answers might not yet exist. The idea of data science as a team sport, based on collaboration and learning together, can only be achieved through the inclusion of multiple voices. Pride month highlights the need to look around and ask whose voices are heard in our field and what we want the future of data science to look like.
There are many instances in which gender expressions and identities, as well as sexual orientation, can be approached in our work as data scientists. As an educator, non-computational data scientist, and non-binary member of the trans community, I will present a few reflections on where we are and how we can move forward.
First, it is important to understand that datasets might not accurately represent reality: they are prone to statistical errors, missing links, and the invisibilization of entire communities. If we ask the wrong questions, can we really grasp the complexities around us? Official numbers are fundamental to policymaking and to understanding the world. However, if these numbers only reflect an attempt at measuring people's sex or gender within a binary, they will dismiss whoever does not fit that framework.
In the U.S., according to a recent study, 1.6 million people over the age of 13 identify as transgender. Another study, from 2021, estimated that at least 1 million Americans are non-binary. As crucial as these numbers are, they do not have the authority of official numbers. Unfortunately, when public data and official numbers such as the U.S. census are built upon a binary notion of sex (male or female), they result in policies that overlook or ignore many people. The lack of more granular data about LGBTQIA+ communities, or data at the intersection of race, ethnicity, income, and disabilities, prevents a broader comprehension of gaps and opportunities for improvement of public policy.
Keeping in mind the limitations of certain datasets in representing reality in its fullness is a good starting point. I understand that we, as data scientists, often face limited access to data collection tools and have to work with whatever is available, an issue constantly debated in the classroom. However, we need to find ways to calibrate, question, and comprehend our datasets within these constraints.
Second, we must keep in mind how data influences and is influenced by policy and politics. Disputes around census data that impact entire populations are a good example of questions (not) being asked. Under Brazil’s authoritarian regime led by Bolsonaro, the Brazilian Institute of Geography and Statistics (IBGE) has not only been postponing data collection for the census, which is supposed to happen every ten years, but also said it would not conduct the census at all if questions about gender identity and sexuality were to be included in the survey. The government of Scotland, on the other hand, will ask questions about sexual orientation and trans status or history in its census for the first time in 2022. This is a step toward a better understanding of the nuances of the LGBTQIA+ communities that could lead to improvements in policies, services, and analyses.
While insights on stories not being told matter, visibility might not always be desirable, safe, or possible. The way trans and gender-nonconforming people are portrayed (or not) in datasets needs to be aligned with principles of ethics, responsibility, and digital rights. From body scanners at airports to facial recognition technologies, the scholarship on the impacts of data-centric systems on the transgender community is consolidating a body of work relevant to data scientists committed to advancing ethics in our field. It is crucial that these conversations be started in the classroom, not only by ethicists but by everyone concerned about the impacts of our work on society. Many of my data science students express surprise when learning about the unintended consequences of certain technologies. It is our job to integrate these grey areas into the hard questions we ask.
Third, talking about harm is deeply uncomfortable. It requires a combination of rigorous methods, openness, and kindness. Understanding the contexts in which our data-centric technologies unfold, how they are not always aligned with public interest, and that people can be harmed by our actions is important. It can also be difficult to navigate. It takes time, willingness to learn, and a degree of humility. Giving a voice to and holding space for LGBTQIA+ experiences should be part of this process. As educators, we can bring in materials focused on LGBTQIA+ communities (studies, reports, datasets) and invite people to talk to students: not only those harmed by these technologies, but also the many specialists, advocates, and policymakers committed to a better, inclusive future.
The transnational impacts of data science must be acknowledged, since many of the systems we build and work on are deployed in different countries, under authoritarian regimes and with flawed frameworks to protect human rights. As someone who has worked in various countries over the last 15 years, in the global South and North, I see perspectives on data science beyond national borders as urgent. The technologies contained by decent safeguard measures and robust protections in the U.S. or the European Union are also deployed in countries in which the rights of LGBTQIA+ groups are under attack or non-existent. As our students will possibly be employed by transnational companies, keeping in mind a framework of contextualized responsibility without borders is essential to strengthening ethics in data science.
I hope these conversations will become more common in our field over the next few years. Lots of amazing projects are being developed and, while I cannot mention all of them here, I want to highlight a few: Visualizing Anti-Trans Violence (Gayta Science); a book exploring Queer Data; the conversation we held in March for the International Transgender Day of Visibility; and the establishment of the Center for Applied Transgender Studies, among many others.
We are reaching the end of Pride Month, but that does not mean we cannot keep these conversations going. We can focus on openness to learn together and face the challenges ahead. I believe in my students, colleagues, and the connections we make. Also, visibility in this job is important. Representation matters and being non-binary in data science helps me frame some of these discussions, share them with my students and learn with them. There is a great opportunity to shape this emerging field and I want us to hold space for everyone.