
By Betsy Gardner • January 29, 2020

There are over 2,000 datasets published by the City of New York on the NYC Open Data portal. There are spreadsheets about everything from landmark violation complaints to elder abuse to campaign finance contributions to the Queens library branches. There are enough spreadsheets that a whole “dataset of datasets” exists to catalogue them all. MODA, the New York City Mayor’s Office of Data Analytics, began this work in 2012 after the signing of the city’s Open Data Law. But perhaps some of the most intriguing data the city has published is the 2016 report “Reducing Data Poverty in NYC: Achieving Open Data for All.” Prepared for MODA by a capstone team from New York University’s Center for Urban Science and Progress, this report asks whether there are inequities in who has data featured on the portal and in who accesses the data, and if so, what the city government can do about it.

 

The research team found that representative data matters. They defined representation as data that “captures some aspect of the data user, their community, or neighborhood” and discovered that when users feel that they are “counted” in the open data, they are more likely to seek out datasets and explore the portal. This finding matters for a few reasons, but above all it reminds cities that the bias and discrimination that may be present in the physical landscape can also materialize in the digital one.

 

In “Toward an Open Data Bias Assessment Tool” by the Urban Institute, researchers note that open data can be biased in several ways, with whiter, wealthier areas being over-represented in user-generated data like 311 reports. Conversely, some government-generated data, particularly data on arrests and criminal activity, can over-represent people of color. These biases matter because policy making is increasingly data-driven. Making open data more accessible to underrepresented populations can help guide policy correctly by growing user-generated content and increasing oversight of potentially biased data. Additionally, innovation, problem solving, and data visualization stem from open data access and should include underrepresented groups.

 

One way cities can increase representation, both in the actual data and in the data portals, is through good data storytelling. Instead of simply posting data without engagement and context, cities are drawing users in with narrative and characters. In Gilbert, Arizona, staff knew that just posting municipal data on the open data portal would not suffice. Instead of building the portal and waiting for users to come, the town actively recruited them. The Gilbert open data press release acknowledged that “data is only useful if residents, businesses, and staff engage with it.” And how better to engage than with a story?

 

Gilbert’s Office of Digital Government created Alex, an animated character that humanizes the portal and guides users. Alex “learns” about new datasets and shares them with visitors, offers navigation tips, and acts as the town’s data storyteller. Alex improves civic engagement and increases accessibility, two things that diversify usage by lowering barriers to participation. 

 

Storytelling and design are also key to Detroit’s data innovations. Kat Hartman, director of the city’s Innovation and Emerging Technology (IET) team, came to data by way of visual design, which gives her a distinctive perspective on the user experience. For the IET team, visuals are a way to translate data for users who may be less experienced with sifting through massive datasets. Many of the team’s projects are data viewers, maps, and visual trackers that educate and inform residents through clear visual records. As Hartman explained in an interview with Data-Smart, data storytelling is a top priority for Detroit agencies. When government prioritizes the narrative and visual aspects of open data, it provides an easy and equal access point for all users.

 

Free trainings are another point of access, particularly for those who want to dive deeper into the raw data. In NYC, the MODA team hosts free events like “Visualizing Data for Greater Impact,” “JavaScript 101,” and “Women in Analytics” all over the city. These events are listed on their site calendar and are open to every level of user. Additionally, the NYC School of Data, a “community-driven conference” on open data and civic tech, held a weeklong event in March 2019. The conference was organized by BetaNYC and MODA, and helped attendees “improve their lives and neighborhoods through workshops, panels, demos, and networking.” Free and accessible trainings level the playing field so that everyone can investigate and engage with open data.

 

There are also ways to make the data more representative. For instance, some minority groups might be uncomfortable providing information for a government database. Transparent collection practices and strict data privacy laws can lead to more inclusive data. People of color are also underrepresented in government and in technology, so when these two fields combine, the lack of diversity can be off-putting. It’s important that the individuals and departments that will be collecting and storing the data are representative of the groups being asked to provide information.

 

Many of these underrepresented communities are considered data deserts, and government has a two-pronged task: increasing representative and unbiased data, and increasing data understanding and engagement. Learning about open data, knowing what data does and doesn’t exist, and understanding how to increase representative data will help communities advocate for themselves. It is government’s responsibility to be aware of discrepancies and to correct for them; if data-driven decision making is the end goal, equal access must be the foundation.