A decade ago, I held a position in government responsible for overseeing 17 different grant programs. I asked each program director for a list of every grant in our $50 million portfolio. What I got in return was a mix – some leaders shared data in a spreadsheet, some in a database, and some in a Word document – but some had only pieces of paper with lists written out by hand.
At that point, few U.S. cities were using advanced data analytics – few even had it on their radar. That has changed dramatically since. In April 2015, Bloomberg Philanthropies rolled out its What Works Cities program to advance data-driven approaches to municipal government in 100 cities over three years. Within the first six weeks, 40% of eligible cities had applied. Cities are increasingly appointing Chief Data Officers, Chief Analytics Officers, and Chief Performance Officers to unlock the power of data to deliver better services to the public. And yet, there remains a wide variation in data maturity across government, with inconsistent progress over the last decade.
Where does data-driven government begin? With timely, reliable, consistent, high-quality data. Somerville, MA Mayor Joe Curtatone says that governing without data is like “driving a car blindfolded.” Improving the quality of data collected, and the ease of accessing and analyzing the data, takes the blindfold off. With interest growing in applying analytics to government operations, it is worth reflecting on lessons from the leading cities. Examining these successes may help demystify the path and serve as inspiration to others.
The purpose of this paper is to describe why a public leader should move his or her organization from data disarray to data excellence, and to suggest some incremental means of achieving that goal.
I. Data-Driven Government
What does data-driven government look like? Some of the leading cities offer excellent examples.
- The City of Chicago was the first to appoint a Chief Data Officer and has long been a leader in predictive analytics. In 2015, the city developed a mathematical model to predict which of its 15,000+ restaurants and food establishments were most likely to cause foodborne illness. The model enabled restaurant inspectors to prioritize their visits according to risk, visiting first the places most likely to make customers sick. This project used data from across city government, including the Police Department, Department of Public Health, Department of Business Affairs and Consumer Protection, and 311, as well as external data such as weather and social media. The results were impressive – a 25% improvement in operational efficiency and the ability to find critical violations of the health code seven days faster.
- The City of San Francisco is driving culture change toward data-driven decision making by equipping each of the city’s departments with the skills and tools to better understand and use data. The city’s Chief Data Officer (CDO) has created a set of tools and templates designed to spread data literacy throughout city government. Each department has a data coordinator responsible for creating an inventory of datasets and then posting data to the city’s comprehensive open data portal. In partnership with the Controller’s office, the CDO offers training to city staff in data analytics tools. Nearly every class fills to capacity, often the same day it is announced.
- In early 2016, the City of Los Angeles launched its public data mapping portal, the GeoHub. This user-friendly portal allows the public and city employees to visualize data from over 500 city and external sources on maps. Street Wize, one of the tools created for use on the portal, was built to help LA’s water and street departments avoid digging up the same streets for their separate maintenance activities one right after the other. Now that the data is public and easy to visualize, the fire department also uses it to speed their response to emergency calls by avoiding roads under construction.
- The City of Boston’s expansive open data portal includes agency performance dashboards, mapping tools, budget data, vendor contract data, service request data, food establishment inspections data, and a tool to track the progress of all capital projects. The city’s budget app shows every agency’s operational and capital budget, breaking down capital projects by federal, state, and local funding sources, and showing grant dollars and bond funds for each project. It also includes status details for each major project and the ability to locate it on a map.
- In 2015, the City of New Orleans developed a predictive analytics algorithm to identify homes at highest risk of fire fatalities but least likely to have smoke detectors. Using data from the city fire department, along with Census and housing data, the city was able to target which blocks were most in need of free smoke alarms distributed by the fire department.
- Following a mild winter, the City of Somerville, MA was faced with a dramatic increase in rodent complaints. The city’s analytics team created an interagency Rodent Action Taskforce (RAT). Data drove all key decisions of the initiative, right down to the size of trashcans distributed around the city. Development of a predictive model had impressive results – a 66% drop in rodent sightings in the first half of 2015. Predictive analytics allowed comparison of their results to what would have happened under status quo conditions. Using open source software (Google’s CausalImpact), they created a model to predict what would have happened absent the intervention. According to this model, their efforts caused an estimated 40% decrease in rodent complaints.
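The counterfactual comparison described in the Somerville example can be sketched in a few lines. CausalImpact itself is an R package built on Bayesian structural time-series models; the sketch below swaps in a much simpler stand-in (a straight-line trend fitted to the pre-intervention months and projected forward) purely to show the logic of comparing actual results with a "no intervention" prediction. The function names and the monthly counts are hypothetical and are not Somerville's data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def relative_effect(pre, post):
    """Compare actual post-period totals with the projected pre-period trend."""
    intercept, slope = fit_line(range(len(pre)), pre)
    start = len(pre)
    # Counterfactual: what the pre-period trend predicts for the later months.
    expected = [intercept + slope * (start + i) for i in range(len(post))]
    return (sum(post) - sum(expected)) / sum(expected)


# Made-up monthly complaint counts, for illustration only.
before = [90, 95, 100, 105, 110, 115]   # months before the intervention
after = [80, 85, 90]                    # months after the intervention
print(f"Estimated change versus status quo: {relative_effect(before, after):+.0%}")
```

A negative value means complaints came in below what the earlier trend would have predicted. A real analysis would also account for seasonality and uncertainty, which is what a tool such as CausalImpact is designed to do.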
No city has achieved excellence overnight. New York City, among the most accomplished and ambitious data-driven cities, has a long history of commitment to data beginning in the 1990s with the application of statistical analysis to the deployment of police resources, an approach called CompStat (for computer statistics). Today, data feeds the performance management system for city departments, and advanced data analytics improve resource allocation across many policy areas. But it has taken time to achieve such success – in 2018, when the city is set to meet its deadline to have all city data public, it will have been nearly a decade since the 2009 push of data to the open data portal.
This success reflects investment that spans mayoral administrations. When Mayor Michael Bloomberg appointed Stephen Goldsmith as Deputy Mayor in 2010, his charge was to innovate city operations. Goldsmith used data to drive key decisions for city operations. He used analytics to unlock the power of existing data to reveal previously unidentified needs for services. He opened data to the public, increasing transparency. And for the first time, the city plotted requests for city services on neighborhood maps, and worked with community groups to understand how the city was marshalling resources to help. This significantly improved credibility of and community support for city government.
II. Capability Maturity Model for Data-Driven Government
Each leading data-driven city has taken a unique path. What is common at the highest level of data-driven government is a culture that values data and uses it to set priorities and allocate resources via stat programs, performance measurement, and advanced analytics. The framework below describes a generalized path showing that as cities mature in their capability to produce and share quality data, they provide the opportunity for both internal and external users to analyze and use the data, which in turn enables improved government performance. The model applies both to government as a whole and to a department within government, with its own distinct data environment and challenges.
This framework defines open data as a foundation on which a data culture can be built. While not a necessary precursor to creating a data culture, the process of publishing open data provides ample opportunity to assess data availability and quality and to become more familiar with data previously in silos and file drawers. Opening and sharing data can accelerate its use for analytics, management, and resource allocation.
While the stages described by this capability maturity model are discussed in sequence, progress is seldom linear. Rather, even within a single government agency, progress for different activities and datasets can vary dramatically. Nevertheless, it is helpful to have a standard against which to benchmark success, as well as a defined end state of excellence in data-driven government.
III. Publish: Launching an Open Data Program
As shown in the graphic, the first stage of maturity toward excellence in data-driven government is publishing data in a format usable to the public.
Launching an open data program doesn’t automatically improve data quality. But exposing data to the public invites interaction, and users of the data can provide feedback to improve it. The process of publishing may in itself improve quality as staff may choose to double-check the accuracy and completeness of data before making it public. Publishing data builds the appetite for more open data as users inside and outside of government begin to use the data for their own analytic purposes. As the New York City Open Data Plan for 2015 states, “More credible and robust data translates in bolder, bigger, more significant uses, with results to match.”
Starting with a well-defined, narrow scope and building incrementally over time allows initial successes to create momentum for the open data program. Most cities start with data that is already in machine-readable format, data that must by statute be made public, and data that is frequently the subject of Freedom of Information Act (FOIA) requests. Crime data is one of the most commonly published types of data, even among the lowest-ranked cities on the Open Data Census.
A helpful way to get started is to compile an inventory of citywide data assets and then develop a plan for what to release, and when. This should be done department by department, agency by agency, and then rolled into an enterprise-wide inventory and release strategy. San Francisco offers a helpful spreadsheet template for doing an inventory of potential data resources, along with step-by-step instructions and an example for reference.
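As a rough illustration of the inventory-then-plan approach, the sketch below models one inventory row and orders datasets for release using the three starting points mentioned above: data already in machine-readable format, data that must be public by statute, and data frequently requested under FOIA. The field names, the ordering rule, and the sample rows are all hypothetical; this is not San Francisco's template.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """One row of a hypothetical department data inventory."""
    name: str
    department: str
    machine_readable: bool        # already in a spreadsheet or database
    required_by_statute: bool     # must be made public by law
    foia_requests_per_year: int   # how often the public asks for it


def release_order(inventory):
    """Sort datasets so the easiest, most-requested ones are released first."""
    def priority(record):
        # Illustrative ordering only: format first, then statute, then demand.
        return (
            record.machine_readable,
            record.required_by_statute,
            record.foia_requests_per_year,
        )
    return sorted(inventory, key=priority, reverse=True)


# Made-up sample rows for illustration.
inventory = [
    DatasetRecord("Grant awards", "Finance", False, False, 2),
    DatasetRecord("Crime incidents", "Police", True, True, 40),
    DatasetRecord("Restaurant inspections", "Public Health", True, False, 15),
]

for record in release_order(inventory):
    print(record.department, "-", record.name)
```

Each department could maintain its own list in this form, and the lists could then be concatenated into the enterprise-wide inventory before sorting.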
An open data policy demonstrates executive commitment to open data and can set goals and standards for the open data program. An open data policy may describe the timeframe and process for publishing open data, and will typically specify roles and responsibilities for publishing data. It will define the applicable government data to be made public (such as statistics, lists, tables, charts, maps and databases) and will also define the types of data that can be restricted from public disclosure for privacy or security reasons. An open data policy typically establishes that data is open by default, and specifies that data should be updated regularly. Data quality measures and audit routines may be incorporated into the policy. The open data policy should describe the scope of applicable departments and agencies, reflecting that as an open data program matures, it will extend to quasi-public agencies as well. An open data policy helps set the tone of transparency for government.
An open data strategy describes how the goals of an open data policy will be achieved and what will be achieved by when. The strategy will detail specific steps, identify responsible parties, and provide performance targets. Selected leading examples include:
- San Francisco publishes a three-year strategic plan for open data and updates it each year. The city also publishes a detailed progress update on its strategy, showing how many datasets have been published by month, by agency, and by priority level. There is even information on each agency’s progress in creating its data inventory and its data publishing plan.
- New York City has been publishing an annual open data plan for several years and now updates it monthly. The plan includes a detailed inventory of the datasets scheduled for release in the upcoming year, from school bus breakdown data to beach water quality to rodent inspection results to the average time to close a pothole request. The ambitious NYC Open Data Law requires that every city dataset be published by 2018.
These cities are walking the walk – openly sharing their plans for more open data.
In creating an open data strategy, setting priorities and establishing a realistic timeline are important. Code for America worked with the Sunlight Foundation and the Open Knowledge Foundation to create a list of the 18 most important datasets every city should publish, and these are the ones used by the Open Data Census.
At this stage of data-driven government, data is unlikely to be standardized across departments, limiting the capacity for analytics across the enterprise. Across the inventory of datasets, some are likely to be in databases and others in spreadsheets, and not all will include location-based data that enables GIS visualization. The availability and experience level of staff responsible for gathering, publishing, and analyzing data vary across departments, and training resources can be limited.
IV. Polish: Publishing Large Volumes of Data and Improving Data Quality
The second stage of data maturity builds on the foundation of published data by improving its quality – “polishing” it.
At this stage of data maturity, publishing is routine and data volume increases rapidly. Publishing data in greater volume typically generates increased interest in and use of the data by the public, the media, universities, nonprofits, and other government users. These users can be expected to provide feedback that improves the accuracy of data released. Users will often notice inconsistencies in datasets, as well as inconsistencies across related datasets from different source departments. External users provide an essential feedback loop to address challenges with data quality.
Publishing a large volume and variety of data can also fuel curiosity and enable analytics and data-driven decision-making, and can inspire a data culture in government. A great example is found in Santa Monica, CA, where the city last year released its groundbreaking Wellbeing Index, funded by the Bloomberg Philanthropies Mayors Challenge. The Wellbeing Index is a compilation of existing city data, social media data, and public opinion data. City employees are using results to guide strategy development and program priority-setting toward increased public wellbeing, from improving pedestrian safety to providing greater access to fresh fruits and vegetables in underserved neighborhoods. As part of the culture change toward publishing and using data, city employees received expert coaching from a local think tank on how to use data in resource allocation. Staff also asked for easier access to key city demographic data from sources such as the Census, which the city has made available on its open data portal. As city department leaders see positive results from new, wellbeing-oriented pilot projects, they are increasingly seeking ways to use data both to understand and describe public need and to design responsive programs – a true shift toward data-driven decision making.
With an established open data program, feedback from academics, the media, and community organizations can be incorporated in plans for data release so that regular updates create a predictable pattern of incremental progress. This steady flow of data enables increasing civic participation. Such public engagement may be formalized as it is in London, where a Consumer Data Council provides input to the city’s data strategy, or in NYC, where the 2015 Citywide Engagement Tour offered a series of open meetings for the public to ask questions and provide feedback on the city’s open data efforts. When dialog with constituents is established, transparency becomes an increasingly achievable goal.
Typically at this stage, staff data management skills are increasing both in the centralized data management agency (often led by the Chief Information Officer, the Chief Data Officer, or the Chief Technology Officer) and in the departments and quasi-public agencies supplying data. The availability, quality, and consistency of data enable better performance management within departments and across the enterprise.
At this stage of data maturity, data governance becomes more formalized, with clear lines of responsibility for data management, data updates, and data integrity. Audit and quality control processes are in place. Metadata, such as a data dictionary, provides context and makes the data easier for outsiders to use. A data dictionary compiles information about data such as “meaning, relationships to other data, origin, usage, and format,” allowing external users to understand how the data was created and how it relates to other sources.
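To make the idea of a data dictionary concrete, the sketch below represents one entry using the five elements named in the definition quoted above (meaning, relationships to other data, origin, usage, and format), plus a small check that flags columns published without documentation. The class, the function, and the sample values are hypothetical and are not drawn from any city's portal.

```python
from dataclasses import dataclass, field


@dataclass
class DataDictionaryEntry:
    """Documentation for one column of a published dataset."""
    column: str
    meaning: str
    origin: str
    usage: str
    data_format: str
    related_to: list = field(default_factory=list)  # links to other datasets


def missing_documentation(columns, entries):
    """Return the dataset columns that have no data dictionary entry."""
    documented = {entry.column for entry in entries}
    return [c for c in columns if c not in documented]


# Made-up example for illustration.
entries = [
    DataDictionaryEntry(
        column="inspection_date",
        meaning="Date the inspector visited the establishment",
        origin="Entered by inspection staff at the time of the visit",
        usage="Filter or group inspections by time period",
        data_format="YYYY-MM-DD",
        related_to=["business_licenses.license_id"],
    ),
]

print(missing_documentation(["inspection_date", "result"], entries))
```

A check like this could run as part of the audit and quality control routine before a dataset is posted.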
As an open data program becomes more robust, cities often provide to the public not only data but sophisticated tools for accessing data and creating new value from it by enabling data search, comparison, and download features. Some also offer advanced features such as the data visualization tools provided by Los Angeles and Chicago.
Many cities now provide the application programming interfaces (APIs) that enable the civic tech community to build their own apps using public data across a variety of fields, including permitting, housing, transit, and others. Once heralded, hackathons have generated varying degrees of success. Chicago has been particularly effective in leveraging its civic tech community by engaging with weekly hack nights. Volunteer coders have contributed over 200 hours to city projects, including building a model to predict when beaches need to be closed for unsafe levels of E. coli bacteria. NYC’s Big Apps competition, which started in 2009 and is now an annual event, invites developers to create apps for a variety of uses on priority city issues. Last year’s competition included over 100 applications to address issues such as affordable housing, waste reduction, and civic engagement. Opportunities such as these generate energy and increase public engagement with government.
V. Analyze: Using Data to Drive Performance and Improve Transparency
With large volumes of high-quality open data, cities can begin to analyze it, seek patterns and new insights, and drive better decisions.
A strong open data infrastructure lays the foundation for using data to make government better, from measuring performance to instilling accountability and analyzing data to understand programs and to predict and prevent problems. The most common use of data to manage government is through “stat” programs, modeled after CitiStat in Baltimore, the 2004 winner of Harvard’s Innovations in American Government Award. CitiStat built on and expanded the analytical approach of CompStat, which applied data analysis at the neighborhood level to the deployment of police resources and enabled a new management approach based on individual manager accountability. The New York City Police Department won the Innovations in American Government Award in 1996 for this work. The “stat” model has been well documented as a management tool to improve performance at the agency and program level using data.
In addition to stat programs, data can be used for predictive analytics and geospatial modeling.
Predictive analytics is the use of historical and real-time data to predict future events. This enables a city to prevent undesirable events (crime, foodborne illness, etc.) and to allocate resources more efficiently and effectively to address public needs. Chicago is a leader in developing such models to predict or prevent a variety of problems including childhood lead-based paint exposure, rodent infestations, and beach closures due to poor water quality. These models use temporal and spatial data and draw upon both city administrative data, such as 311 calls, 911 calls, and operational inspection results, and weather and air quality data from external sources.
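A very small sketch can show the basic mechanic of turning historical data into a prioritized work list. It is not Chicago's model, which draws on many data sources; it assumes only a count of past inspections and past critical violations for each establishment, and it blends that history with an assumed citywide rate so that places with few inspections are not ranked on one or two visits. The function names, the prior values, and the sample history are hypothetical.

```python
def risk_score(past_inspections, past_violations, prior_rate=0.2, prior_weight=5):
    """Estimate the chance that the next inspection finds a critical violation.

    The establishment's own record is blended with an assumed citywide rate
    (prior_rate), weighted as if it were prior_weight earlier inspections.
    """
    return (past_violations + prior_rate * prior_weight) / (
        past_inspections + prior_weight
    )


def inspection_queue(history):
    """Order establishments from highest to lowest estimated risk."""
    return sorted(history, key=lambda name: risk_score(*history[name]), reverse=True)


# Hypothetical history: name -> (past inspections, past critical violations).
history = {
    "Cafe A": (10, 1),
    "Diner B": (4, 3),
    "Grill C": (0, 0),
}

for name in inspection_queue(history):
    print(f"{name}: {risk_score(*history[name]):.2f}")
```

A production model would add further predictors, such as the weather, 311, and inspection data described above, and would be validated against held-out results before being used to schedule staff.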
Most cities start their analytics programs by optimizing a specific task with a defined set of data from a handful of sources. Data analytics projects rely on large volumes of quality data. For most cities, the complex mathematical modeling and computing power required for this task is challenging, and many cities have developed relationships with universities or other outside experts for assistance. Predictive models that cross department boundaries are leading-edge not only because of the challenges of data matching across departments, but also because of the operational and organizational challenge of breaking down the silos of government.
For geospatial modeling of place-based data, many cities rely on commercial mapping tools such as Esri’s ArcGIS product. Another approach is to build a custom tool suited to local needs. Chicago developed an open source tool called OpenGrid, where large volumes of data can be imported from the city’s Socrata and CKAN open data portals. The tool allows users to visualize multiple layers of complex open data overlaid on interactive maps of the city. In addition to making the source code available to other users, Chicago now makes OpenGrid for Smart Cities available to others through Amazon Web Services. The idea behind making the source code free to others is to let smaller cities take advantage of the unique resources available to Chicago with its deep bench of city analytics talent and its rich civic tech community. Ideally, others will be able to benefit from the tool without having to invest resources to build their own.
Providing tools for both municipal employees and the public to access and analyze location-based data adds value to the raw data. Data visualization can help spot trends and gain insights through comparison of related datasets from different sources (say, obesity rates and access to supermarkets or recreational open space). Layering different types of policy-relevant place-based information on a single map can create an impactful message, helping to identify connections among programs, policies, and outcomes. Visuals can often illuminate both service gaps and locations where results outperform expectations and where insights might be gained through further study.
There is an additional benefit of making large volumes of high-quality data public – it has the potential to erode the silos of government as agencies gain access to information from across the enterprise. Successful data analytics projects can garner positive attention, both inside government and in the media, which creates positive momentum toward a greater data culture. This is true in Chicago, where two analytics projects garnered significant press attention (a rodent abatement predictive model and a restaurant inspections predictive model). Managers across city government took note of the results, and of the public acclaim for improved basic city services. Now many additional departments are seeking support for their analytics projects. Publicity has also generated interest outside the city, with replication of the restaurant inspection analytics model underway in Montgomery County, MD.
Every city is different in the way it undertakes analytics efforts – some appoint a Chief Data Officer or a Chief Analytics Officer, while others assign these duties within existing roles such as Performance Officer or staff from among the mayor’s policy or analytics team. Regardless of how they are accomplished, analytics projects can make government better by helping decision makers understand patterns and predict needs.
At this stage, the skills of data analysts, data coordinators, and data scientists tend to improve as the availability of data provides momentum for more sophisticated analytics. In most cities, few options for data analytics skills development are available for public sector employees. At this stage of data maturity, data science skills are brought into government more often than developed internally, while the skills of data analysts and data coordinators come from either their prior background, or from self-taught or peer-based learning.
VI. Optimize: Embedding Data-Driven Decision Making Across the Enterprise
At the highest level of data-driven government maturity, use of data is optimized throughout the enterprise. The city culture not only accepts but embraces the use of data. At this stage, all levels of government use data to make decisions. Analysis spans departments because work is organized around public problems or issues, not the lines of authority created when the agencies were established. This is an ambitious goal, especially given that in the corporate sector most data stays in silos – a 2014 Forrester study showed that while executives want their data analyzed, they believe they are using only 12% of what’s available, leaving 88% of their data untouched.
At the highest level of data maturity, leaders take a customer-centric, problem-first approach to decision making, seeking data to support ongoing inquiry rather than to prove a point after the fact. This ethos of curiosity is well-described by political scientist James Q. Wilson, who said, “I’ve tried to follow the facts wherever they land. Every topic I’ve written about begins as a question. How do police departments behave? Why do bureaucracies function the way they do? … I can honestly say I didn’t know the answers to those questions when I began looking into them.”
In this stage, the data analytics and data science experts may be centralized as they are in Chicago, Boston and New York City, or can be embedded in each city agency as they are in San Francisco. New Orleans uses an innovative model – analytics capacity is built centrally and then intentionally spread throughout government. Analysts are hired into the central performance and analytics unit, and then when they have gained experience they are placed into operational leadership roles in city departments. Regardless of the structure, governments at this stage view analysts and data scientists as valued assets and therefore provide commensurate investment in training and skills development.
A key part of this stage, not yet fully realized in any city, is to have documented case studies of analytics projects to help guide other cities to replicate and build on the success of their peers. With support from the Arnold Foundation, the Ash Center for Democratic Governance and Innovation at Harvard will be supporting a peer network of Chief Data Officers and publishing case studies and resources over the next several years to help advance the field.
Is the road to analytics excellence easy? No, nor is any transformational change in government. As business change scholar John Kotter notes, “In reality, even successful change efforts are messy and full of surprises.” And yet, data-driven government holds the promise to better align services with needs and make better use of taxpayer resources. Cities beginning to understand and use their valuable data resources are embarking on an important journey toward greater responsiveness, transparency, and efficiency. The future is bright.
News Articles, Press Releases, Research Reports, and Scholarly Journal Articles:
“What Works Cities Brief: The City Hall Data Gap,” Bloomberg Philanthropies, 2016.
“The Forrester Wave: Big Data Hadoop Solutions,” Forrester, February 27, 2014.
“Leading Change: Why Transformation Efforts Fail,” John Kotter, Harvard Business Review, March-April 1995.
“Dictionary of IBM & Computing Terminology,” IBM, accessed May 26, 2016 at https://www-03.ibm.com/ibm/history/documents/pdf/glossary.pdf
“Open Data in San Francisco: Institutionalizing an Initiative,” City and County of San Francisco, Mayor Edwin Lee and Chief Data Officer Joy Bonaguro, July 2014.
“Open Data for All,” City of New York, Commissioner Anne Roest, Department of Information Technology & Telecommunications, and Dr. Amen Ra Mashariki, Chief Analytics Officer and Chief Open Platform Officer, Mayor’s Office of Data Analytics, July 2015.
Obituary of James Q. Wilson, New York Times, Bruce Weber, March 2, 2012.
Joy Bonaguro, Chief Data Officer, City of San Francisco, interview by author, April 1, 2016.
Craig Campbell, Data Fellow, City of New York, Mayor’s Office of Data Analytics, interview by author, May 26, 2016.
Dan Hadley, Chief of Staff, City of Somerville, interview by author, February 23, 2016.
Stephen Goldsmith, Daniel Paul Professor of the Practice of Government and the Director of the Innovations in American Government Program at Harvard's Kennedy School of Government, interview by author, April 26, 2016.
Katherine Hillenbrand, Ash Center for Democratic Governance and Innovation, Harvard Kennedy School, interview by author, April 15, 2016.
Tyler Kleikamp, Chief Data Officer, State of Connecticut, interview by author, February 1, 2016.
Skye Stewart, Director of SomerStat, City of Somerville, interview by author, February 23, 2016.
Sean Thornton, Research Fellow, Ash Center for Democratic Governance and Innovation, Harvard Kennedy School, interview by author, March 16, 2016.
Oliver Wise, Director, Office of Performance and Accountability, City of New Orleans, interview by author, March 7, 2016.
Selected helpful open data sources
Open Data Census surveys and compares progress made by different cities around the globe and the US in releasing open data. Results of the census can be found at: http://census.okfn.org.
The City of San Francisco Data Academy provides education and resources for data coordinators in city departments and all staff interested in learning to analyze and visualize data, http://datasf.org/academy/.
The Center for Government Excellence at Johns Hopkins University “helps local governments build capacity for decision making that is rooted in evidence, transparent accountability, and community engagement.” A variety of helpful resources can be found at: http://govex.jhu.edu.
Code for America “connects technologists with government to create 21st century services and digital platforms” and has created an Open Data Playbook for cities, found at: https://www.codeforamerica.org/practices/open/open-data/
The Sunlight Foundation is committed to open and transparent government, including open data, and provides helpful open data recommendations at: http://sunlightfoundation.com
Selected helpful data analytics sources
The Center for Government Excellence at Johns Hopkins University shares on its web site explanatory information about applying analytics to government performance, and provides case studies which can be found at: http://govex.jhu.edu/practice-area/performance-analytics/
The Center for Data Science and Public Policy at the University of Chicago Harris School was created in 2015 to help government access the power of analytics for optimizing performance. One of the ways they bring analytics to government is through the Data Science for Social Good program. Descriptions of completed DSSG projects can be found at: https://dssg.uchicago.edu/projects/
Source code for completed projects can be found at: https://github.com/dssg
An informative paper describing the history and context for data analysis by city government, including a helpful framework for thinking about ways of applying analytics, was written by the Center for Urban Science and Progress (CUSP) at New York University. The paper can be found at: http://cusp.nyu.edu/wp-content/uploads/2013/07/CUSP-overview-May-30-2013.pdf