Right now, the City of Chicago is working on documenting all of its data. That’s right: all of Chicago’s public data, across all databases, in all its departments and sister agencies.
The project, called the Chicago Data Dictionary, is a massive public metadata repository (a searchable archive of “data describing data”) that tells users about the variety of data in the City of Chicago’s numerous databases. As the next phase of Chicago’s government transparency initiative, the Data Dictionary complements the City’s open data portal by providing background information on where that data comes from.
While it may not be the city’s flashiest tech initiative, the Data Dictionary is nonetheless an ambitious, colossal project that is improving the city’s data landscape.
Why does a city need a Data Dictionary?
A database is only as good as the data it contains, and its usefulness suffers when that data is not clearly defined. Data dictionaries, or metadata repositories, are important because they give database users key “ground rules” for understanding complex, often jargon-riddled databases. They also let users find data quickly with a simple query.
In the case of a public data dictionary, “users” can include just about anyone who accesses municipal data. Public data dictionaries can help academic researchers and software developers learn what kinds of data a city holds and how to access it for research or application development. They can also help the City staff who manage City databases and work to make them more efficient.
Because government open data initiatives are still new, public data dictionaries are uncommon. In Cambridge, Massachusetts, the Cambridge Information Technology Department (ITD) is creating a data dictionary for its Geographic Information Systems (GIS) division. Cambridge’s dictionary describes the use, coding, history, and other attributes of the city’s geographic data.
Like Cambridge’s program, most metadata repositories cover only a single department, project, or database. The Chicago Data Dictionary is a radical step: it takes the standard metadata repository model and extends it across an entire city.
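To make the idea concrete, here is a minimal, hypothetical sketch of what a data dictionary does: each entry describes one field in one database (where it lives, what type it is, what it means in plain language), and users search those descriptions by keyword. The entry fields, database names, and sample records below are illustrative assumptions, not the Chicago Data Dictionary’s actual schema or contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldEntry:
    """One metadata record: describes a single column in a single database."""
    database: str      # hypothetical source database name
    department: str    # department that maintains the data
    field: str         # column name as it appears in the database
    data_type: str     # stored type, e.g. "date" or "text"
    definition: str    # plain-language meaning of the field


# Hypothetical sample entries, for illustration only.
ENTRIES = [
    FieldEntry("permits_db", "Buildings", "ISSUE_DT", "date",
               "Date the permit was issued to the applicant"),
    FieldEntry("permits_db", "Buildings", "WRK_TYP", "text",
               "Category of construction work covered by the permit"),
    FieldEntry("inspections_db", "Public Health", "INSP_DT", "date",
               "Date the inspection took place"),
]


def search(entries, keyword):
    """Return entries whose field name or definition contains the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [e for e in entries
            if kw in e.field.lower() or kw in e.definition.lower()]


if __name__ == "__main__":
    for entry in search(ENTRIES, "date"):
        print(f"{entry.database}.{entry.field} ({entry.data_type}): {entry.definition}")
```

Here a search for "date" surfaces both `ISSUE_DT` and `INSP_DT`, even though neither cryptic column name spells out what it holds; the plain-language definition does the work. A real repository would add indexing and far richer metadata, but the basic idea of querying descriptions of the data, rather than the data itself, carries over.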
Building a Metadata Repository in Chicago
The Chicago Data Dictionary is part of Mayor Rahm Emanuel’s vision to use technology to make government more efficient and transparent. The initiative also expands upon Chicago’s goal to be the nation’s leader in open data.
In March 2012, the Mayor sponsored an ordinance for the Data Dictionary, and it quickly passed through City Council. With the help of a $300,000 grant from the John D. and Catherine T. MacArthur Foundation, Chapin Hall, a research center at the University of Chicago, led the initiative along with Chicago’s Department of Innovation and Technology (DoIT).
Nine months later, an Executive Order issued by the Mayor mandated that city agencies regularly publish and update their public data on the City’s data portal. The Order specifically mentioned the Data Dictionary as a tool that would “improve City operations, services and analytical decision-making.”
Now one year into the project, Chapin Hall and DoIT are continuing work on the first phase of a three-phase plan to develop the Data Dictionary. In the past year, Chapin Hall has added more than 12 city databases to the Dictionary; it is currently identifying, processing, and cataloguing more than 100 additional municipal databases.
However, Chicago government contains far more than 100 databases. If the new repository is to include every City and sister agency database, how can Chicago ever complete such a herculean task?
This is the wrong question to ask. As the project’s scope implies, compiling the Dictionary is no quick job, nor is it ever a “done” job. Because municipal databases can be added or changed at any time, the Data Dictionary requires continued maintenance to ensure that its users receive useful and up-to-date information.
This brings us to the right question: how can the Data Dictionary improve the way Chicago’s citizens and government understand and use their City’s data?
One way to do so is to make the Data Dictionary available online, even as its development continues. Chapin Hall designed the Dictionary’s homepage to be simple and efficient, making its purpose as a querying tool clear to users.
A second way is to share the design of the Dictionary itself, so that other cities and organizations can benefit by adopting it. As with many of Chicago’s other open-source projects, DoIT will make the source code for the Chicago Data Dictionary available on GitHub for anyone who wishes to build a metadata repository of their own.
Some of Chicago’s open-source initiatives, such as the SmartData predictive analytics platform, are intended to be replicated by other cities; the Data Dictionary model, by contrast, can serve any type of organization. A ready-made API could be a gift to database administrators at nonprofits and private companies alike.
A new tool for the public
When thinking of new and innovative ways data can improve cities, most people don’t think of metadata repositories. But without a better understanding of the “data about the data,” many of these benefits may never materialize in the first place.
Chicago’s Data Dictionary, a bibliographic giant growing bigger by the day, is providing the City with just that resource. The next time someone in Chicago has a question about their city’s data, they will know where to look first.