Building Data Standards

By Benjamin Weinryb Grohsgal • April 14, 2014

Civic data standards have become an increasingly hot topic in data-driven governance, heralded as a way to bring clarity to what has quickly become a crowded and often chaotic field. They offer a way to impose much-needed form and structure on the untamed Wild West of government-held data, which would allow much greater cooperation among developers, cities, and the civic tech community. So far, however, there have been few truly successful efforts to create a robust data standard that has been widely adopted. Ironically, when data standardization really works, you probably won't even notice that it has happened, despite the substantial changes it can bring.

To back up for a moment, we should explain what data standards are and what their purpose and potential really are. The idea is simple: a data standard is an agreed-upon format for a particular kind of dataset, comprehensive and useful enough that other cities see the benefit of formatting their own data with the same fields, information, and structure. Rather than creating databases ad hoc, formatting data in whatever manner seems appropriate at the time, or carrying over formatting from legacy systems, governments agree to use one approach, one set of fields, and one format. This does not happen spontaneously. Every instance thus far has required a third party – Google, Code for America, and Socrata being the major players – to build a proof of concept with one city or a small group of cities, and then use those outcomes to push other cities to adopt the same standard. Other cities will only adopt it if they see the benefit, which largely comes in the form of ready-to-use apps, software, and other tools built to work directly with the standard. When cities share a data standard, technology can be replicated and transferred seamlessly, because the data is predictable and far less time and effort are needed to modify and customize it.

One of the most successful examples of a data standard is the General Transit Feed Specification (GTFS), developed by Google with the Tri-County Metropolitan Transportation District of Oregon (TriMet) back in 2005. Nowadays we expect to be able to search for transit directions and schedules on a map, or to use a transit app, no matter where we are, but this reality required data standardization first. After the initial partnership, dozens of cities switched to this common format for public transportation schedules, and that number has since grown to hundreds internationally. GTFS allows websites like Google Maps to work for every city without additional coding, and it supplies the foundation for real-time services and third-party apps. Many point to GTFS's success as a proof of concept that shows the value of investing the initial time required to port existing data to a shared format, and the ease with which that data can be used thereafter.
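To make "work for every city without additional coding" concrete, here is a minimal sketch. A GTFS feed is a zip of plain-text CSV files, one of which is stops.txt; the column names used below (stop_id, stop_name, stop_lat, stop_lon) are GTFS fields, but the two "cities" and all of their rows are invented for illustration. Because both feeds share the same columns, a single parsing function handles either one, and extra optional columns are simply ignored.

```python
import csv
import io

# Hypothetical stops.txt excerpts from two made-up agencies. The header
# names follow GTFS stops.txt; every row value is invented.
CITY_A_STOPS = """stop_id,stop_name,stop_lat,stop_lon
A1,Main St & 1st Ave,45.5231,-122.6765
A2,Central Station,45.5289,-122.6767
"""

CITY_B_STOPS = """stop_id,stop_name,stop_lat,stop_lon,wheelchair_boarding
B10,Harbor Terminal,37.7955,-122.3937,1
"""


def load_stops(stops_txt: str) -> list:
    """Parse a GTFS stops.txt body into dicts with numeric coordinates.

    Only the shared standard columns are read, so columns a given
    agency adds (like wheelchair_boarding above) are ignored.
    """
    reader = csv.DictReader(io.StringIO(stops_txt))
    return [
        {
            "id": row["stop_id"],
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
        for row in reader
    ]


# The same code runs unchanged against either city's feed.
for city, feed in [("City A", CITY_A_STOPS), ("City B", CITY_B_STOPS)]:
    for stop in load_stops(feed):
        print(f"{city}: {stop['name']} ({stop['lat']}, {stop['lon']})")
```

Without a shared standard, each city's export would need its own parser; with one, adding a new city means pointing the same function at a new file.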

Following GTFS's success, and with more and more people using the internet and smartphones to reach cities' 311 services, a national effort led by US Federal Chief Information Officer Vivek Kundra and Code for America quickly spread a common Open311 format across cities. 311 services are a relatively recent phenomenon: a single channel through which citizens can easily contact their governments about non-emergency matters. More importantly, this single channel creates the opportunity to track, monitor, and evaluate citizen inquiries and the quality of government services. Open311 data standards build on this effort and make it all the more crucial to changing the way cities serve their constituents. Standardization allows cities to adopt reporting apps like SeeClickFix almost instantly if they so choose, and cities can easily share in-house technology while also opening up to third-party software and data analysis. Almost as important, Open311 data standardization has helped unify reporting across departments in the heavily balkanized world of City Hall – a first step in removing departmental silos. Common reporting also makes it possible for departments to share data and technology. It's not a panacea, but it is a critical first move that could make uniform reporting, and all the conveniences it brings, the default rather than an aberration.
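As a rough illustration of the "track, monitor, and evaluate" point, the sketch below tallies open requests by service type. It assumes a response shaped like an Open311 GeoReport v2 requests.json reply, using field names from that spec (service_request_id, status, service_name, requested_datetime); the sample records themselves are invented, and no real city endpoint is called. Because every participating city returns the same fields, the same tally would work for any of them.

```python
import json
from collections import Counter

# A made-up response body in the shape of a GeoReport v2 requests.json
# reply. Field names follow the spec; the records are invented.
SAMPLE_RESPONSE = """[
  {"service_request_id": "101", "status": "open",
   "service_name": "Pothole", "requested_datetime": "2014-04-01T09:15:00-04:00"},
  {"service_request_id": "102", "status": "closed",
   "service_name": "Graffiti Removal", "requested_datetime": "2014-04-02T14:30:00-04:00"},
  {"service_request_id": "103", "status": "open",
   "service_name": "Pothole", "requested_datetime": "2014-04-03T08:05:00-04:00"}
]"""


def open_requests_by_service(response_body: str) -> Counter:
    """Count requests whose status is "open", grouped by service_name."""
    requests = json.loads(response_body)
    return Counter(
        r["service_name"] for r in requests if r.get("status") == "open"
    )


print(open_requests_by_service(SAMPLE_RESPONSE))
```

A department-level dashboard or a third-party analysis could reuse this logic city by city, which is the kind of sharing the standard is meant to enable.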

Since the game-changing innovations of GTFS and Open311, San Francisco has led two other data standards efforts. LIVES, a standardized format for restaurant inspection scores, is the product of a partnership between the city and Yelp. This standard allows government inspection data to actually reach consumers through integration with websites like Yelp. More recently, the City developed the House Facts Standard with Code for America, a uniform format for health and safety inspection data for buildings. Other cities have already signed on, and the real-estate website Trulia has begun making this information available. Both efforts illustrate how data standards can make government data genuinely useful to citizens when it reaches them through the private websites they already use, rather than being squirreled away on a rarely visited government website.

As inspection data finally becomes accessible – a great thing – and more uniform, voices such as Professor Daniel Ho of Stanford have raised concerns, noting that the inspection process in fact varies from city to city and that standardized data can be misleading when applied across environments that are actually quite different. This is an important caveat, but also an opportunity to standardize the inspection process itself across cities, making the collected data more meaningful through uniformity and predictability. Data standards can mean process standards as well: they can be a leverage point that encourages cities to take a second look at long-established processes and improve them so they better reflect their citizens' real-life experiences. Rules should exist for a reason, a simple fact that too often gets lost in government bureaucracy. Data standards can remind us of that.

Collaborations such as the G7 Cities (Chicago, LA, Boston, NY, Seattle, DC, and SF) continue to develop the public-private partnership model. This approach has already produced a number of documented successes and is a promising way to develop future data standards. We anticipate that data standards will become increasingly prevalent and will change the way governments work with their constituents, private partners, and each other to procure and deliver services.

About the Author

Benjamin Weinryb Grohsgal

Benjamin Weinryb Grohsgal is a joint MPP/MUP candidate at the Kennedy School and the Graduate School of Design at Harvard.