The following is excerpted with permission from Beyond Transparency, the new book published this week by Code for America and edited by Brett Goldstein and Lauren Dyson.
College seminars, management consultants, and whole sections of the Wall Street Journal have all started to focus on something called “big data.” The general definition of big data that’s evolving is that it’s an exponentially larger set of information than we’re accustomed to analyzing, generated by machines, produced frequently, and often tagged with geo-location. The applications of big data are often an afterthought, while the conversation focuses on the quantity of data, how we’ll warehouse it, and assumptions along the general ethos of “more is better.” The reality is that big data holds promise, but it should not be confused with being data-driven.
A focus on outcomes is often lost in the discussion of big data because it is so frequently an afterthought. We have a huge fire hose of information, but even a fire hose is only valuable when it’s pointed at a fire. Data by itself is not inherently valuable. Collecting information about traffic patterns in a CSV file is not in itself helpful; the data becomes more valuable when it is used to form traffic-enabled maps and when city planners use the information to redesign traffic patterns. However, what really matters is not the CSV file, the map, or the traffic patterns, but the outcomes: using data to improve traffic and cut down on commute time, reduce automobile traffic and improve our air quality in the city, create crosswalks and bike lanes that decrease the incidence of car and truck accidents with pedestrians and cyclists, and allow us to live faster, cleaner, and safer lives.
If you’re looking for well-managed, focused, and data-driven institutions, look no further than the major American cities. City governments provide the services that are the backbone of modern life: the water we use when we brush our teeth in the morning; the roads, buses, and subways that take us to work; the teams that keep our streets clean and our parks green; the schools where our children are educated; and the police and fire forces that keep us safe. Increasingly, we see that Americans are choosing to live in cities. Attracted by the economic and cultural opportunities, Americans and immigrants are pursuing their dreams right alongside hundreds of thousands, if not millions, of fellow citizens. They’re not drawn to spacious apartments or luxurious commutes; in fact, they’re often making trade-offs on housing and transportation. They’re moving because they are committed to an urban life.
This great urban migration is placing even higher levels of demand on basic city infrastructure: water, sewer, fire, police, housing, healthcare, education, parks, and so on are all in higher demand. Meanwhile, cities have even fewer resources to meet those needs. In response to economic conditions of the last decade, cities have witnessed tax revenues that are lower on a per-capita basis, which means that mayors and city leadership are forced to do more with less. In practice, that means finding new ways to get even better outcomes out of our current systems and processes.
A data-driven city is a city that intelligently uses data to better deliver critical services. Transparency, open data, and innovation are all important parts of the modern civic identity, especially in a city like New York, which is focused on strengthening its position as a tech leader. However, being a data-driven city is really about more efficiently and effectively delivering the core services of the city: smarter, risk-based resource allocation, better sharing of information agency-to-agency to facilitate smart decision-making, and using the data in a way that integrates in the established day-to-day patterns of city agency front line workers. Being data-driven is not primarily a challenge of technology; it is a challenge of direction and organizational leadership.
For New York, a series of 2011 apartment fires helped galvanize our focus on the ability of data—in this case, the data that we already had—to save lives.
Apartment Fires in the Bronx and Brooklyn
In the spring of 2011, a pair of fires in apartment buildings in the Bronx and Brooklyn killed five people as a result of unsafe living conditions. This sort of fire is not an isolated incident. When many people crowd into unsafe apartment conditions, with portable cooking devices, questionable electrical wiring, and inadequate fire escape access, catastrophic fires will take lives. The occurrence is all too common in a densely populated city like New York. The City receives over 20,000 citizen complaints a year about buildings suspected of being unsafe boarding houses.
New York collects an immense amount of information about every single one of our buildings. We know when and how buildings were built; we know if the building is receiving water service and is, therefore, inhabited; and we know if buildings are in good order based upon the location’s history of ECB (Environmental Control Board) violations on quality of life issues. Every day, we receive over 30,000 service requests (complaints) through 311 from New Yorkers, which gives us more location-specific intelligence. We know even more about the neighborhood where the building is located: we know how often 911 runs are made to that block-face, if road construction is being done, if there are accidents in the intersections, and what kinds of businesses are on the block.
In the case of the two buildings, by the time the fires occurred, the City had information on tax liens, building complaints, sanitation violations, and building wall violations. Did we know enough about these buildings before the fires to have raised a red flag? Could we determine which pieces of information are the most valuable predictors of catastrophic outcomes? Our team, the Mayor’s Office of Data Analytics, set to work to answer those questions.
Analyzing the Illegal Conversion Problem
Providing safe, abundant, and affordable housing is a priority for the leader of every community, from the mayor of a town of 25,000 to the mayor of New York City. Every year, more people move to New York City, and as they do, housing demand increases, the price of rent grows, and individuals are often in a bind as they search for affordable housing.
Because of this strain, the City continues to invest in constructing new affordable housing and maintaining our large system of affordable housing buildings. However, unscrupulous landlords often take advantage of the high demand by providing substandard apartments. They create these apartments by subdividing existing space, with disregard to fire exit access. They put deadbolts on bedroom doors in single-family houses and rent them out as hotel rooms. They put a half bath into a garage, seal the door with tape, and rent out the space. They put beds next to boilers in basements, which is an area that is prone to carbon monoxide poisoning and boiler explosions. In general, they allow for gross over-occupation of small spaces without sanitary conditions. The City classifies these substandard apartments as “illegal conversions.”
The New York City Building Code has one primary goal: safety. That code wasn’t created out of thin air; it has been created and refined with hundreds of years of civil enforcement in the city, often in response to catastrophic accidents. Rules around fire escape access, size of space, inhabitation of basements, etc., are all designed to prevent New Yorkers from dying in building accidents. The City enforces that building code with a team of building inspectors, who always examine buildings in the construction process and continue to monitor buildings as they mature. These inspectors are trained professionals. When they find an illegal conversion, they do a great job of enforcing the code by either ensuring that the space is immediately configured for safe living or vacating the space to get the residents out of the path of harm. With new residents moving to the city every day, though, and landlords willing to take advantage of them, especially those who are most vulnerable to exploitation, the City must address a constantly growing and changing stock of illegally converted living spaces.
The City’s single largest source of intelligence on illegal conversions is New Yorkers who phone in (or use the web or mobile app) to 311 with tips. We have millions of eyes and ears on the street, and every day, we get over 30,000 new pieces of intelligence. Often, that intelligence has immediate, direct value; when a New Yorker calls in a street light that’s gone out, we’re able to send a truck and replace the bulb. Almost every single one of those street light complaints is founded, meaning that the light is actually out. That makes sense because you can look at the lamppost and see if it’s shining or not. Seeing an illegal conversion is much more complex. The individual who makes the complaint often has no direct access to the space, and instead, they’re forming their hypothesis based on what they see on the outside of the building in terms of population flow in and out of the building, the number of cars parked on the street, the amount of trash generated by the building, etc. Unfortunately, only eight percent of the specific 311 illegal conversion complaints from the citizenry are actually high-risk illegal conversions.
Illegally converted housing spaces are the worst of the worst because they are the places where we’re most likely to lose lives. When we send out a building inspector to look at an illegal conversion complaint, ninety-two percent of the time, they get there and there’s nothing serious in terms of safety risk. That’s not to say that those ninety-two percent of complaints are worthless. They often send inspectors to places where less serious violations are found, and the very act of sharing intelligence on a location helps us build up the profile of the space. Still, we have a limited number of inspectors, and they have a limited amount of time. What we really want to do is sift through that pile of 311 illegal conversion complaints and find the eight percent of complaints that are the most serious. That’s where we should be sending inspectors immediately.
Thanks to twelve years of leadership by Mayor Bloomberg, the nation’s most data-driven mayor, we have no shortage of data from which to build a catastrophic risk model. By conducting an analysis of historic outcomes (past illegal conversion violations) and identifying similar characteristics in those locations, we were able to create a risk model that takes each inbound 311 illegal conversion complaint, pairs it with existing information we know about that building, and predicts whether or not the complaint is most likely to be founded, meaning there are severely bad living conditions.
It is important to note that while our team has evolved to use sophisticated tools and data, we started this project out with a couple of old desktops and versions of Microsoft Excel that broke down after 36,000 rows of data. Using the rudimentary tools that are found on virtually every business machine, a talented young analyst was able to conduct the correlative analysis that told us what to look for in the 311 complaints.
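To make the idea of a correlative analysis concrete, here is a minimal sketch in Python. It is not the team’s actual method: the records, the field names (`tax_lien`, `foreclosure`, `founded`), and every value are invented for illustration. For each building flag, it compares the share of founded complaints among flagged buildings with the overall founded rate.

```python
# Hypothetical past inspection outcomes; all fields and values are invented.
history = [
    {"tax_lien": True,  "foreclosure": True,  "founded": True},
    {"tax_lien": True,  "foreclosure": False, "founded": True},
    {"tax_lien": True,  "foreclosure": False, "founded": False},
    {"tax_lien": False, "foreclosure": True,  "founded": False},
    {"tax_lien": False, "foreclosure": False, "founded": False},
    {"tax_lien": False, "foreclosure": False, "founded": False},
    {"tax_lien": False, "foreclosure": False, "founded": False},
    {"tax_lien": False, "foreclosure": False, "founded": False},
]


def founded_rate(records):
    """Share of records whose complaint turned out to be founded."""
    if not records:
        return 0.0
    return sum(r["founded"] for r in records) / len(records)


def lift(records, flag):
    """Founded rate among flagged buildings divided by the overall rate."""
    baseline = founded_rate(records)
    flagged = [r for r in records if r[flag]]
    if baseline == 0 or not flagged:
        return 0.0
    return founded_rate(flagged) / baseline


# Rank the flags by how strongly they are associated with founded complaints.
lifts = {flag: lift(history, flag) for flag in ("tax_lien", "foreclosure")}
for flag, value in sorted(lifts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{flag}: lift {value:.2f}")
```

A lift well above 1 suggests the flag is worth looking for in incoming complaints. Nothing here needs more than a spreadsheet could do, which is the point of the paragraph above.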
By prioritizing the complaints that are most likely to be dangerous, we are remediating dangerous conditions faster, without any additional inspector manpower. That is a safety-based resource allocation plan.
Collecting Information That Drives Analytics
The experience of the Department of Buildings’ illegal conversions risk filter demonstrated firsthand for us how difficult it could be to gain access to agency datasets and make sense of them, especially in the context of simultaneously analyzing datasets from different city agencies.
Large organizations are often stove-piped, and few organizations exemplify that problem more than cities. New York City, for instance, has over forty different agencies and over 290,000 employees. Traditionally, these agencies have focused on their chartered responsibilities (policing, fire prevention and response, health, etc.), often independently, and kept data within their walls. Even on special projects, where analysts from multiple agencies conducted a cross-functional analysis, the data sharing was one-off and only allowed for a moment-in-time analysis. There was no ongoing data cooperation that allowed for performance measurement and solution iteration. Half of the effort of becoming data-driven is connecting the data, and that is an organizational challenge, not a technological one.
There is an important distinction between collecting and connecting data. Data collection is based upon the actual operation of services in the field. Our analytics team gets very tactical data, for instance, on the numbers of trees that fall down during a storm. It’s our job to work with the data that is currently collected. For instance, the Parks Department decides how to respond to a tree and how to record that information, and we take it, but we do not let data collection get in the way of critical operations. Using analytics as a reason to change data collection can become a political problem, and at the very least, it is an organizational problem of retraining the front line. Instead of constantly pushing for new data, we rely upon what is already being collected and consult the agencies over time as they change and modernize their practices. Fortunately, cities have moved toward business reporting metrics in the last decade, and there is already a lot of data available. Led by Mayor Bloomberg, all city agencies measure their performance against annual goals and report that performance directly to New Yorkers. Those goals are important, but what we’re really interested in is the underlying data that tracks performance.
Data connection is different. In the past, when the Parks Department removed a tree that fell down on a sidewalk on a Wednesday and the Transportation Department went to repair the sidewalk on a Thursday, we had no way of connecting those two pieces of data. The first problem is that they are not housed together. The second problem is that even if we had them together, we wouldn’t have had a clear way to connect them. Each agency has its own ontology of terms and data that have all been created through reasonable, rational evolution of service, but which sometimes make it nearly impossible to connect that data. One department may use a GIS identifier for the location of the downed tree, whereas another may refer to it by its cross streets.
We found that BBL/BIN (borough block lot/building identification number), along with a specialty geocoding software program one of our analysts wrote, was the Rosetta Stone for connecting the city’s operational intelligence. For most city agencies, BBL and BIN are the standard way of identifying a location; however, they’re not used by all agencies, nor are they universally appropriate. Even so, we can take whatever geo data we have (an address, an intersection, etc.) and geocode it to the nearest BIN/BBL. By focusing on the common denominator, which is structures in specific locations in this case, we’re able to tie together datasets that have never previously been linked.
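The linking step can be sketched roughly as follows. This is an illustration under stated assumptions, not the analyst’s geocoding program: the lookup table stands in for a real geocoder, and the addresses, record fields, and helper names (`make_bbl`, `geocode`, `link_by_bbl`) are all hypothetical. It assumes the conventional 10-digit BBL layout of a 1-digit borough code, a 5-digit block, and a 4-digit lot.

```python
from collections import defaultdict


def make_bbl(borough, block, lot):
    """Build a 10-digit BBL string: 1-digit borough, 5-digit block, 4-digit lot."""
    return f"{borough:d}{block:05d}{lot:04d}"


# Stand-in for a geocoder: maps a normalized address to a BBL.
# A real geocoder would resolve addresses and intersections; this dict is invented.
ADDRESS_TO_BBL = {
    "123 EXAMPLE AVE, BRONX": make_bbl(2, 3456, 78),
}


def normalize_address(address):
    """Uppercase the address and collapse runs of whitespace."""
    return " ".join(address.upper().split())


def geocode(address):
    """Return the BBL for an address, or None if it cannot be resolved."""
    return ADDRESS_TO_BBL.get(normalize_address(address))


complaints = [  # one agency keys its records by free-text address
    {"id": "C-1", "address": "123 Example Ave,  Bronx"},
    {"id": "C-2", "address": "9 Unknown St, Brooklyn"},
]
tax_liens = [  # another agency already keys its records by BBL
    {"bbl": "2034560078", "amount_owed": 1200},
]


def link_by_bbl(complaints, tax_liens):
    """Attach each complaint's BBL and any tax liens recorded for that BBL."""
    liens_by_bbl = defaultdict(list)
    for lien in tax_liens:
        liens_by_bbl[lien["bbl"]].append(lien)
    linked = []
    for c in complaints:
        bbl = geocode(c["address"])
        linked.append({**c, "bbl": bbl, "liens": liens_by_bbl.get(bbl, [])})
    return linked


linked = link_by_bbl(complaints, tax_liens)
```

Records that cannot be geocoded keep a `bbl` of `None` rather than being dropped, so the gaps in coverage stay visible.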
Having integrated data is important because of its application in stronger problem solving. The more information we have through which to run correlative analyses, the better we can form risk filters. In the case of the illegal conversion filter, two of the most important pieces of input are whether the building is current on its property taxes and whether banks have filed any mortgage foreclosures. Those two pieces of information come from two different sources: the New York City Department of Finance (property tax records) and the Office of Court Administration (mortgage default records). Continued access to both is necessary to the ongoing effectiveness of the filter.
The capacity to connect data and analyze it is powerful, but it’s still dependent upon the agencies playing ball by giving us their data to examine. One way to get the data is to demand compliance with our project. Anyone who has ever worked on a team or in a business knows that demanding compliance is rarely the best solution. Instead, we have to sell our service to the agencies. The agencies deliver city services, and because what we really do is help them deliver city services more efficiently, we treat them as our clients. We work toward solutions to their problems in order to make their lives easier. It’s all about them, just as it should be. They are the ones who are actually keeping this city clean and safe every day, and if we can demonstrate that we’ll help them do their jobs better with very little effort and a very small footprint, they’ll want to partner with us. As a result, and without exception, we have never failed to get what we need in order to deliver this service back to them.
It’s important to note that even in our office, we still have lots of city data that is outside of our walls. We don’t yet have granular information from the New York City Department of Education or from internal employee management systems. We also don’t have data on particulate matter at the sewage treatment plants, the pollen counts on a given street, etc. Keep in mind that you don’t need everything to get started, and, conversely, you need a reason to collect and connect the information you ask for. When we have a project that requires particulate matter at the sewage treatment plants, we’ll reach out to the Department of Environmental Protection and collect it, but until then, we’ll work with what we have. A rational, project-based approach to data collection and connection is the best way to build success over time.
Agencies Are Our Clients
When we collect information from agencies, we’re asking for them to give us access to their legacy IT systems and share all of their information. They don’t have to say yes, but they do, for two reasons. First, by participating in the data exchange, they have access to the information of other agencies as well. They’re able to avoid picking up the phone every first Tuesday of the month and calling the IT department of another city agency and asking for a one-off query of information because they’re able to automatically access the information through our data sharing platform. Second, and more importantly, agencies like sharing their data with us because we help them.
Just as data is not valuable without a specific outcome in mind, neither is a centralized analytics team. Intelligently applied, an analytics team does not look for new problems to solve, but works with the teams in the field to solve existing problems in a way that makes their jobs more effective without burdening their work.
It is the agencies, and specifically, the employees at the agencies, who are on the ground and who understand all of the details of service delivery. These are the teams that can give us the best-observed information on what’s going on and how we can work to fix problems. Moreover, these are the teams that are going to implement whatever solution we find through our analysis. Having them on board is fundamentally important to actually delivering more valuable service. The best way to have them on board is to work on a problem that actually impacts their day-to-day lives.
In the case of the building inspectors, that was an intelligent way to automate complaint priority. The building inspectors have an enormous amount of professional experience, and when they are able to read complaints and compare them with their own experience, they’re often able to identify those that are the worst. With fewer and fewer inspectors, more and more illegally occupied buildings, and more and more 311 complaints, devoting the time to successfully risk-assess those complaints one-by-one by hand has become an onerous challenge. When we use a filter to prioritize tickets, we’re not ignoring the experience level of those inspectors. Instead, we’re giving them a leg up by doing an automated first pass on the inspection priority, essentially applying their accumulated institutional knowledge in an automated fashion. They can still read and reorder based on their knowledge set, but we’re starting them off with an intelligent list.
With these agencies, we can talk about the benefits of an analytics approach all day, but what they really care about are the results. We have an ROI-driven mayor, an ROI-driven budget office, and leaders at all of the agencies who are ROI-driven. If we ask them for their time and their data to improve their delivery of service, we should deliver improved service, and at the very minimum, we should be measuring the change in levels of service in order to understand the impact.
Measuring results may require new ways to think about the metrics. The goal of the Department of Buildings illegal conversion risk filter is to reduce the number of deaths through fires and structural collapses. However, the reality is that due to the professional excellence of our agencies, those events are so rare, even in a city as large as New York, that it can be difficult to accurately measure the performance improvement from such a small dataset. Instead, we had to think about the leading indicators of outcomes.
In the case of catastrophic building incidents, “vacate orders” are a leading indicator. In the case of illegal conversions, remember, our building inspectors go out to all of the 311 complaints. Sooner or later, they are going to find all of the illegal conversions that have been reported and remediate that condition. When we re-prioritize the tickets, we are not altering the total number of illegal conversions that will be found. However, the important part is actually the “sooner” rather than “later” piece. In the case of illegally converted structures, which incidentally are at risk of fire, it makes a huge difference to the residents if we inspect the building three days after a complaint comes in or thirty days later. When we increase the speed of finding the worst of the worst by prioritizing the complaint list, we are reducing our time to respond to the most dangerous places, and we are in effect reducing the number of days that residents are living at risk. We calculated that as a reduction in fire-risk days.
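One plausible way to compute such a metric is sketched below. The exact formula the City used is not given here, so this is an assumption: fire-risk days are taken to be the days between a complaint and its inspection, summed over the complaints that turn out to be founded. The dates and field names are invented, and the example reuses the three-day versus thirty-day contrast from the paragraph above.

```python
from datetime import date


def risk_days(cases):
    """Total days between complaint and inspection across founded cases."""
    return sum(
        (c["inspected"] - c["reported"]).days for c in cases if c["founded"]
    )


# Invented example: the same three complaints under two inspection orders.
# Only the first complaint is a founded, high-risk illegal conversion.
chronological = [
    {"reported": date(2011, 5, 1), "inspected": date(2011, 5, 31), "founded": True},
    {"reported": date(2011, 5, 2), "inspected": date(2011, 5, 5), "founded": False},
    {"reported": date(2011, 5, 3), "inspected": date(2011, 5, 20), "founded": False},
]
prioritized = [
    {"reported": date(2011, 5, 1), "inspected": date(2011, 5, 4), "founded": True},
    {"reported": date(2011, 5, 2), "inspected": date(2011, 5, 20), "founded": False},
    {"reported": date(2011, 5, 3), "inspected": date(2011, 5, 31), "founded": False},
]

# The same inspections happen either way; only the order changes.
reduction = risk_days(chronological) - risk_days(prioritized)
print(reduction)
```

Both orderings find the same single illegal conversion, which matches the point that re-prioritizing does not change the total found. What changes is how long residents wait.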
As a result of the success of the program, in our next management report, the Department of Buildings will add two risk-based, outcome-based metrics as their critical indicators of performance measurement. This fundamental shift in how we measure performance is directly attributable to focusing on what is most important in this analytics project: we are reducing the amount of time that people are at increased risk of burning to death and that reduction in time is what we’re tracking.
Routinizing and Operationalizing the Insight
The greatest challenge for the analytics team is moving from insight to action. Insight is powerful, but it’s worthless if the behavior in the field doesn’t change. Getting the analytics into the field is dependent upon creating the lightest footprint possible, so that the intervention doesn’t cause a headache to the worker in the field.
To understand what will or won’t be disruptive, the analytics team needs to get a firm grasp on the way that operations are handled by the front line. When we work with an agency on a project, we shadow them to understand how they actually do their job. Seeing the way that the work is actually done is often very different from how it’s described on paper or in a meeting and is an important step in the process.
Immediately, we discount any intervention that changes the way that the front line works. New training and processes are non-starters because of the immense organizational difficulty in effectively turning battleships and reorienting them around new processes. Even new forms are frowned upon, as they get in the way, or at least change the way, the fieldwork is done.
Our concept is simple—a light footprint means that the solution must be delivered upstream of the front line. If our task is to re-prioritize inspections, we build that automatically into the inspection assignment generation system, so that the assignment is already delineated with a priority level by the time it reaches the inspectors. If our solution is a technological fix that connects two formerly disparate pieces of information and delivers a new piece of information, we make sure that piece of information is being delivered right alongside currently reported data, not in a different, detached method. It sounds simple, but it’s so easy to go wrong. Don’t change the front line process; change the outcome.
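As a rough sketch of what delivering the solution upstream can look like, the Python below attaches a priority to each assignment before the list reaches an inspector. The scoring rule is a toy, not the City’s risk model: it simply counts flags, and the flag names, building keys, and function names are hypothetical.

```python
def risk_score(building):
    """Toy score: the number of risk flags present. Not the actual model."""
    flags = ("tax_lien", "foreclosure", "prior_violations")
    return sum(1 for f in flags if building.get(f))


def generate_assignments(complaints, buildings):
    """Return the inspection list with a priority rank attached, highest risk first."""
    scored = [
        {**c, "score": risk_score(buildings.get(c["bbl"], {}))}
        for c in complaints
    ]
    # sorted() is stable, so complaints with equal scores keep arrival order.
    ordered = sorted(scored, key=lambda c: c["score"], reverse=True)
    return [{**c, "priority": rank} for rank, c in enumerate(ordered, start=1)]


# Invented data: building attributes keyed by a location identifier.
buildings = {
    "A": {"tax_lien": True, "foreclosure": True},
    "B": {},
    "C": {"prior_violations": True},
}
complaints = [
    {"id": 1, "bbl": "B"},
    {"id": 2, "bbl": "A"},
    {"id": 3, "bbl": "C"},
    {"id": 4, "bbl": "Z"},  # unknown building: scored as zero, not dropped
]

assignments = generate_assignments(complaints, buildings)
for a in assignments:
    print(a["priority"], a["id"], a["score"])
```

The inspector receives the same kind of list as before, in the same place, with one extra field. Nothing about the fieldwork itself changes.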
Keep Focused on What Will Work
While the buzz around big data seems to have been generated out of thin air, the outcomes associated with it will only come from hard work, with years and years of effort. Just as with any other business or government process, the steps are incremental in nature.
Analytics is not magic, and it’s not necessarily complicated. Analytics really means intelligence, and intelligence is better information that helps us make better decisions. To the extent that we can automate that information gathering and analysis, for instance, in automatically sorting the priority level of work orders, we’re streamlining the efficacy of the approach. The most important thing to remember, however, is that we are not changing the approach.
An effective analytics project is one that gets in and gets out sight unseen. Let the results speak for the project.
When I first joined the Bloomberg Administration at the end of 2009, it was just me at a cubicle, making phone calls, studying organizational charts and data schematics, surfing our open data page to see what was available, and visiting every office I could in my on and off time to see what was going on. It was six months before I hired my first analyst, a fresh-from-college economics major who had won his rotisserie baseball league three years in a row and was preternaturally affable. We tried a few different projects that didn’t end up going anywhere, but taught us extremely valuable lessons about how to make disparate pieces of city data work moderately harmoniously together to tell us the stories we needed to hear. It wasn’t until spring 2011—almost a year and a half after I started this project—that we delivered our first actionable insight. In the two years since then, we have become a central component of the administration’s approach to government, implementing citywide analytics-based systems for structural safety, emergency response, disaster response and recovery, economic development, and tax enforcement—and we’ve only just started to scale out.
This isn’t triumphalism. Moreover, it was far from easy. Tacked up over my desk since my first day is a quote from Teddy Roosevelt, and more days than not, early on, I found myself reading it over and over again.
It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat. (Roosevelt, 1910)
What I’m trying to stress is you have to start somewhere, while bearing in mind the following lessons we’ve learned:
- You don’t need a lot of specialized personnel.
- You don’t need a lot of high-end technology.
- You don’t need “perfect” data (but you do need the entire set).
- You must have strong executive support.
- You must talk to the people behind the data, and see what they see and experience what they experience.
- You must focus on generating actionable insight for your clients that they can immediately use with minimal disruption to existing logistics chains.
Above all else, you need to be relentless in terms of delivering a quality product, while remaining flexible in terms of how you do it. For New York City’s analytics program, pragmatic, inventive problem solvers are always welcome, but ideologues need not apply. Finally, you need to remember at all times that the point of all this effort is to help your city and its people thrive. Keep all this in mind. Just dive in and do it. You may be amazed at what you find.