What kinds of operations-enhancing questions have cities asked and answered with data and analytics? The catalog below is an ongoing, regularly-updated resource for those interested in knowing what specific use cases can be addressed using more advanced data and analysis techniques.
For examples that are currently being implemented in cities across the country, you can click to expand the question to see additional information about the solution. All other examples represent potential questions that cities could work to address with data and analytics.
We welcome further submissions to the list by email. Submissions can include either current examples of how cities are addressing specific operational or policy issues with data, or ideas for how to address issues that you hope cities will one day be able to answer.
Health & Human Services
- What is the impact of providing an additional service(s) to a client already receiving one city service?
Since the creation of a unified Department of Housing and Human Services (HHS), Boulder County, CO has been a testament to the benefits of holistic human services delivery. Through its integrated service delivery system, Boulder County has been able to expand the number of residents receiving services by 140%, focusing on front-end and early intervention measures to prevent more costly services in the future. Technology has been a key feature of this transition. The Department, as it exists today, was formed after a 2008 merger between the County’s housing and social services agencies. To support this effort, HHS developed an integrated service delivery system, including technological tools that allow employees to track clients’ case histories across programs, refer clients to additional program areas, and collaborate with other department caseworkers. Read more in "Boulder County Colorado: Integrated Service Delivery" by Sam Gill, Indi Dutta-Gupta, and Brendan Roach.
- Who is most likely to apply for a city service(s)?
- Which clients are most likely to apply for multiple services?
- When clients apply for / obtain multiple services, which service do they typically apply for first?
- Can we forecast the number of caseloads for city services?
Five years ago, New York City launched an initiative, HHS-Connect, to collect its social service data in one place. The idea is to allow clients to walk into different social service agencies without having to re-enter their information and complete duplicate paperwork. “We have a vision of a client walking into, for example, a homeless shelter and not having to reapply your information if you had already been to the public welfare office or to the Administration for Children’s Services,” said Kristin Misner, chief of staff to the deputy mayor for health and human services. Read more in "Big Data Gives a Boost to Health and Human Services" by Stephen Goldsmith.
- How do we help clients leaving the criminal justice system, foster care, homeless shelters, etc. get and keep jobs?
- Which clients coming out of the juvenile justice, criminal justice, foster care, homeless services, or substance abuse systems who are placed in employment are most likely to return to city services?
Although lead paint was banned in the United States in the 1970s because of the harmful effects exposure can have on children in particular, there may still be older homes with remnants of the dangerous chemical. The City of Chicago Health Department and the Center for Data Science and Public Policy at the University of Chicago (DSaPP) partnered to tackle this specific issue with the use of machine learning and analytics strategies. The goal of the partnership was to help identify homes that are most likely to still contain lead-based paint hazards. The data scientists at DSaPP built statistical models that predict possible exposure based on factors like the age of the house, historical health data on children previously exposed at certain addresses, and economic conditions of neighborhoods. The data sources used by the predictive model include blood lead level tests and home inspection records, combined with a variety of publicly available data, such as census information about neighborhoods and construction data about the size and structure of individual houses. The paper detailing the project said that the model creates a list of at-risk homes with individual risk scores and that the Chicago Department of Public Health would use this list to prioritize homes it would target with outreach and intervention to engage at-risk families and landlords.
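To make the idea of a ranked list of risk scores concrete, here is a minimal sketch in Python. It is not the DSaPP model: the `Home` fields, the weights, the intercept, and the 1978 cutoff year are all assumptions chosen for illustration, and a real model would learn its parameters from blood lead tests and inspection records.

```python
import math
from dataclasses import dataclass


@dataclass
class Home:
    address: str
    year_built: int
    prior_elevated_tests: int         # past elevated blood lead tests at this address
    neighborhood_poverty_rate: float  # share of residents in poverty, 0.0 to 1.0


# Hypothetical weights for illustration only; not taken from the Chicago project.
WEIGHTS = {"older_home": 1.5, "prior_tests": 0.8, "poverty": 2.0}
INTERCEPT = -3.0
CUTOFF_YEAR = 1978  # assumed cutoff for "older" housing stock


def risk_score(home: Home) -> float:
    """Return a score between 0 and 1 using a logistic function of the features."""
    z = INTERCEPT
    z += WEIGHTS["older_home"] * (1.0 if home.year_built < CUTOFF_YEAR else 0.0)
    z += WEIGHTS["prior_tests"] * home.prior_elevated_tests
    z += WEIGHTS["poverty"] * home.neighborhood_poverty_rate
    return 1.0 / (1.0 + math.exp(-z))


def rank_homes(homes):
    """Sort homes from highest to lowest risk score."""
    return sorted(homes, key=risk_score, reverse=True)
```

A health department could then work down the output of `rank_homes` from the top, which is the prioritization step the paragraph above describes.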
- Which interventions are most effective?
In the span of just one year, Cincinnati managed to decrease its infant mortality rate (IMR) by over 25 percent, from 13.3 deaths per 1,000 live births in 2012 down to 9.9 in 2013. To accomplish this feat, the city has incorporated and leveraged relevant data to concentrate its efforts where they are most needed. Since 2007, this targeted undertaking has tracked various indicators and outcomes, such as mother’s zip code, race/ethnicity, mental health, and smoking habits, as well as the child’s birth spacing and sleeping environment. By using data to zero in on quantifiable risk factors and on at-risk communities, Cincinnati is making major strides on a difficult undertaking, the fight against infant mortality. Read more in "Using Data to Combat Infant Mortality in Cincinnati" by Victoria Kabak. Maryland and Indiana also have notable successes in this area.
Improved analytical capabilities allow agencies to identify and implement more effective services for clients. In Oklahoma, as part of the SoonerCare (Medicaid) program, officials analyzed patient data including comorbidity factors to identify individuals prone to poor health outcomes. Equipped with a list of at-risk Medicaid recipients, managers have worked to sign these individuals up for intensive, managed-care programs. Meanwhile, the Rhode Island Department of Children, Youth, and Families developed the Real Connections program, which analyzes data on a child’s social network. Using this analysis of existing information, the Department is able to identify mentors best suited to enable the best outcome for each child. Read more in "The Technology Opportunity for Human Services" by Sam Gill, Indi Dutta-Gupta, and Brendan Roach. Stephen Goldsmith's "The Nexus Between Data and Public Health" highlights additional examples of health data that can help policymakers improve the health of their communities.
Louisville, KY partnered with Propeller Health in May 2012 to distribute 500 smart inhalers to asthmatic residents. When the devices were used, they sent time and location data to both the patient’s physician and city officials, who used the data to generate “heat maps” of emergency asthma attacks. With the help of data analysts at IBM, public health officials compared the trends against a variety of potential causes — including air quality, pollen outbreaks and traffic congestion — to strategize interventions in the most at-risk areas. Today, the project continues. The city plans to deploy bike-mounted sensors to monitor air quality along routes that are frequented by children during the summer. Read more in "Health Data Isn't Just for Hospitals" by Stephen Goldsmith and "Monitoring Air Quality and the Impacts of Pollution" by Laura Adler, which discusses additional examples of sensor-based air quality monitoring.
Argonne National Laboratory and the Chicago Department of Innovation and Technology partnered to develop the Array of Things, a citywide network of sensors mounted on lampposts. Among other uses, these sensors track the presence of a number of air pollutants, including carbon monoxide, nitrogen dioxide, ozone, and particulate matter, with plans to monitor volatile organic compounds (VOCs) in the near future. Chicago has used this data to predict air quality incidents in order to take preventative action and has released data to the public via the city’s open data portal. Barcelona has pursued a similar strategy with its Barcelona Lighting Masterplan, deploying a smart lighting system with embedded air quality sensors that relay information to city agencies and the public.
More than 65 cities worldwide including Boston, Los Angeles, and Miami have installed Soofa benches, park benches equipped with a solar panel that channels electricity via USB ports to charge devices. These benches not only serve as a social space and sustainable source of energy, but also house sensors that record air quality, temperature, traffic, and radiation.
In partnership with Google, the Environmental Defense Fund (EDF) has used Street View cars to measure methane levels in eleven cities by equipping cars with an intake tube and methane analyzer. Using this data, EDF has created methane maps and identified more than 5,500 leaks. In 2014, Google began exploring broader air quality monitoring, equipping Street View cars with Aclima’s Environmental Intelligence (Ei) mobile platform, which includes sensors that can measure particulate matter, NO2, CO2, black carbon, and more. During a pilot test in Denver, the car collected more than 150 million data points over 750 hours of driving, creating a street-level air quality map of the city.
In 2016, London attached air quality sensors to ten pigeons in order to monitor air quality over three days of flights. The city sent the pigeons across London carrying 25-gram sensors that monitored levels of nitrogen dioxide, ozone, and other volatile compounds. During the flights, Londoners could inquire about pollution levels in their areas by tweeting @PigeonAir, which would respond with readings ranging from moderate to extreme.
In New York City, MIT’s Senseable City Lab used anonymized cellphone data paired with air quality measures to determine the amounts of different chemicals to which New Yorkers are exposed. For example, the study determined that those who live and work in Manhattan are exposed to more pollution than residents who commute to the outer boroughs. This type of analysis goes beyond efforts to merely describe air quality in a city, outlining direct impacts on residents. Read more in “How Cities Are Using the Internet of Things to Map Air Quality” by Chris Bousquet.
In a trial in 2014, the city of Dublin fitted 30 bikes with air sensors measuring carbon dioxide, carbon monoxide, smoke, and particulates. In three days, these bikes gathered data for the entire city, which researchers across the country studied and mapped. Read more in “Pedal-Powered Data” by Daniel Curtis.
- Which individuals and families are most at risk of returning to the homeless services system?
- Which individuals and families placed into permanent housing are most at risk of returning to the homeless services system?
- How can we improve homeless prevention programs by identifying individuals and families most at-risk for homelessness?
New York City's Department of Homeless Services (DHS) has partnered with academics to develop customized risk assessment tools that support caseworkers in determining the best approach for each client during the screening process. DHS is also exploring more proactive approaches, partnering with the SumAll Foundation to analyze data on eviction notices to predict which cases are most likely to result in homelessness. While the pilot is still being tested in specific neighborhoods, the analytics will eventually become part of the City’s data visualization project, which allows staff to visualize neighborhood data such as shelter entries and eviction filings while also telling caseworkers which of the thousands of households or buildings on the map are actually most at risk of shelter entry. By tailoring outreach efforts and reducing barriers to access, DHS can provide more services to more at-risk families. Read more in "Data-Driven Strategies for Reducing Homelessness" by Lyell Sakaue.
- Which public housing residents are most likely to be placed into employment?
- Which city services have the greatest impact on reducing entry into homeless shelters?
Data Science for Social Good partnered with the Illinois Department of Human Services to identify women at risk for having adverse births, which are associated with negative personal, financial, and developmental outcomes for both the mother and the baby. They identified risk factors, including stress, socioeconomic factors, substance use, quality of life indicators, healthcare access, and age, that could be used to predict women at high risk for adverse births. The state then provided targeted programs and assistance to these identified high-risk women, giving them needed assistance. Read more on the DSSG site.
Data Science for Social Good and the Mesa Public School System are using Mesa's education data to identify students who are off-track for their future plans. Using students' classes, grades, test scores, and attendances, they can predict which students are college-ready but may not be applying to college, only applying to two-year programs, or will enroll but not graduate from college. These students can then be given extra support and resources to empower them to apply, graduate, and succeed. Read more on the DSSG site.
- Which client characteristics indicate that a client will leave a homeless shelter without a subsidy?
- Which clients would benefit the most from housing services?
- Which seniors in need of services and resources currently aren't receiving them?
A Texas law requires public schools to record fitness data on every student. Through data-sharing agreements with the school districts, Austin-based nonprofit Children's Optimal Health (COH) gathers metrics on BMI and cardiovascular fitness scores that are geo-tagged with social and economic information. COH converts de-identified person-level data to aggregate neighborhood-level maps that illuminate the conditions faced by families and children in the area, all while protecting personal information. Enhanced with other datasets, these maps tell a more complex story of the factors that influence health outcomes — from proximity to fast food restaurants to the stress of high neighborhood crime rates. Read more in "Austin Targets Youth Obesity with Neighborhood-Level Data" by Stephen Goldsmith.
When Bloomberg Philanthropies launched the Mayors Challenge, issuing a call for the most innovative proposals by cities across the United States, Providence, RI Mayor Angel Taveras seized the opportunity to seek out a new, creative solution to a serious issue in his city. After reviewing a number of ideas, he and his team ultimately developed the program now known as Providence Talks. The team discovered research that shows that high-income children hear an average of 30 million more words than their low-income peers in the first three years. The seminal study of language environment by Betty Hart and Todd Risley showed that the amount of conversation children had with their parents by age three was positively associated with their IQ scores at that age, along with a host of other positive outcomes. Providence decided to apply this important finding by creating a city-led effort to close the word gap using innovative technology: devices that can record and allow measurement of the auditory environment of children. A preliminary study demonstrated that sharing feedback reports generated by the recording devices led caregivers to increase the number of words spoken to their children by 55 percent. Simply having the information about the number of words their children were hearing inspired them to talk more, read more, and interact more with their children. The children’s increased exposure to words was the single greatest predictor of improved language skills and learning readiness before entering school. Read more in "Providence Talks: Progress on Closing the 'Word Gap.'"
West Nile virus, an ailment once rare and relatively unknown in the United States, is now an annual danger in many suburban communities. In Suffolk County, New York, a large suburban and rural county on Long Island, officials began seeing West Nile cases in the early 2000s. The county developed a model to assess the risk of outbreak using a combination of statistical methods and geographic information systems. Through modeling, they found relationships between human West Nile cases, landscape factors, population demographics, and weather patterns. Initial results showed a complex interaction between these factors and human cases of West Nile virus. Using this hot-spot analysis, Vector Control now targets larvicide efforts in established hot-spots and uses aerial adulticide spray only where quantitative evidence supports the use of pesticides. By being strategic in the use of analytics, the agency has saved time and money, while still providing a high level of public safety. Read more in "Predictive Tools for Public Safety" by Stephen Goldsmith.
Zika is another mosquito-borne illness that is making headlines in U.S. cities. In "How U.S. Cities Can Target Zika Risk," Jon Jay analyzed data from Miami and found relationships between vacancy, poverty, and Zika outbreaks. He applied that analysis to New Orleans and Houston, suggesting that vacant housing reduction strategies could help such at-risk cities tackle the challenge of Zika.
Internet of Things technology is helping cities monitor water flow to optimize water pumping and reduce the amount of water lost to leakage. Read more in "Come Drought or High Water" by Laura Adler.
Back in 2011, Transport for London, the transit agency behind the London Tube, collaborated with a group from University College London to study the daily operations of the subway system via the familiar Oyster fare card. The result was a paper detailing how the commuting patterns of individuals coalesce into a massive, crowded network of movement on the Tube, resulting in congestion and strain at important system hubs. The smart card Oyster system allows researchers to collect data on the journeys of individual travelers (with the assurance of safeguards to protect customer privacy) to understand the complex dance of the metro system. The data visualization that the study produced allows researchers to model the effects of various situations on congestion patterns; now planners can determine exactly what would happen if mechanical failures were to slow trains on a particular line or cause other service problems. Read more in "Streamlining the London Tube with Data" by Nick Carney.
- How can we prioritize tree trimmings and removals?
Time, weather, and deferred maintenance have not been kind to many of New York City's East River crossings. The Brooklyn Bridge, an engineering marvel of its time, shows its age through the cracks in the masonry vaults that support the bridge's roadway over Manhattan. Fiber-optic sensors monitor these cracks, as well as other indicators such as temperature fluctuation, to assist structural engineers in determining when the vaults will ultimately need to be replaced. Further up the East River, on the Williamsburg Bridge, a series of interferometric and fiber Bragg grating sensors (both capable of measuring light waves) monitor wire deformation and breakage on the span's century-old suspension cables. Rather than make an annual manual inspection, engineers have access to continuous data, which can tell them if an individual strand in one of the bridge's cables is about to break. Read more in "How to Save America's Crumbling Bridges" by Stephen Goldsmith.
The Santa Clara Valley Water District in Santa Clara County, CA manages a network of natural and man-made infrastructure that supplies 1.8 million residents with water. In an effort to go paperless, district field staff was armed with GIS tablets to survey waterway infrastructure, cataloging and assessing the condition of levees and other assets. These data are now fed back into the district’s asset management software, allowing the agency to not only see infrastructure conditions but to make smart decisions about future investments. According to Esri, more than 4,000 paperless inspections have been processed since 2012. Read more in "Open Data's Road to Better Transit," by Stephen Goldsmith.
- How can we reduce outage rates for agency fleet without purchasing new vehicles? Under what circumstances do outages take place most frequently?
- How can we reduce accidents involving city vehicles? Where and when do most accidents involving city vehicles occur?
A number of US cities have implemented the Vision Zero initiative, a campaign that aims to eliminate traffic fatalities through education, enforcement, and engineering. As a part of the initiative, San Jose partnered with the Department of Transportation (DOT) and used data analytics and GIS analysis to identify 14 priority corridors where most major injury accidents occur. In response, the city has adjusted deployment times for Traffic Enforcement Unit (TEU) officers to align more closely with peak times of traffic collisions and provided them with GIS maps of high-incident intersections. Read more in “San Jose Improves Traffic Safety With Data” by Kevin Miller.
The Green City, Clean Waters program in Philadelphia is a city-wide low-impact development approach to mitigating the city’s combined sewer overflows (CSO). The program integrates very low-tech interventions like rain barrels and street trees with very high-tech data collection and analysis. The city is addressing a problem faced by many others that began before the 1950s: a CSO system wherein there is no physical division between stormwater and the sewer system responsible for wastewater coming from homes and businesses. This means when a big storm rolls in, the system becomes overwhelmed, stormwater and wastewater mix, and a toxic effluent is discharged into waterways, degrading the environmental quality for both plants and animals as well as citizens who may live or recreate near these rivers and streams. Luckily, big data analysis and a profusion of sensors spread within the city’s sewer system provide this vital piece of the puzzle, lending some big-technology insights to what is a purposefully low-tech, low-impact approach to attacking the CSO problem. Since the program’s conception, an extensive and quantitative evaluation plan has been in place. Philadelphia pulls data from sensors throughout the system (originally purposed just to warn departments and citizens of overflows) to see if the approach is really working, and also conducts health quality tests in various bodies of water to check if there is a substantive long-term impact. Real cost comparisons can be made between different elements of the program, allowing the city to adjust its plans over time and maximize the returns of each program dollar spent. Read more in "Low-Tech Solutions Meet Data Analytics in Philadelphia's CSO Approach" by Benjamin Weinryb Grohsgal.
Elsewhere, Chicago has recently begun an initiative to use sensors to manage stormwater. In a pilot project, underground sensors in test areas are collecting data on stormwater runoff to enable targeted deployment of green infrastructure. Read more in "How a Smart City Tackles Rainfall" by Sean Thornton.
And in San Francisco, through a partnership between the Public Utilities Commission (SFPUC), the city data office (DataSF), and Code for San Francisco, the city produced an application called Adopt a Drain that allows residents to claim storm drains and agree to clear any debris in advance of storms. Thanks to this civic-engagement tool and supplies and training from SFPUC, 1,500 drains have been adopted and regularly cleared by San Franciscans. Read more in “Alleviating Flooding in San Francisco With Civic Tech Volunteers” by Blake Valenta.
- How can we better predict where the next major street light cable failure will be?
- Can we predict what areas have more open hydrants?
- Where should snow removal happen first?
Boston's Mayor's Office of New Urban Mechanics created a crowd-sourcing mobile app called Street Bump that helps residents improve their neighborhood streets by collecting road condition data while they drive. With Street Bump, residents' phones can report rough stretches of road to the City automatically as they drive over them, providing the City of Boston with a useful and cost-effective way of identifying which of its streets need work. Read more in "Beyond 311" by Stephen Goldsmith.
Cities have begun using digital monitoring techniques for roads in order to enable more frequent and accurate assessment of infrastructure quality. Some, like Cincinnati, have pursued vehicle-based monitoring, equipping vehicles with cameras, lasers, and sensors that identify road surface issues and create georeferenced images. Other cities, including Boston, have turned to smartphone apps that use devices’ cameras and accelerometers to track road quality. These innovations allow cities to identify road quality issues and target interventions before these problems become major pavement failures. Read more in “Sensors and Smartphones: Technological Solutions for Monitoring Road Conditions” by Paul Lillehaugen.
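As a rough illustration of how accelerometer readings might be turned into a list of rough road locations, the sketch below flags readings whose vertical acceleration deviates sharply from the trip's typical value. This is not the algorithm used by Street Bump or any city; the function name, the input format, and the threshold of 4.0 m/s² are assumptions made for the example.

```python
from statistics import median


def detect_bumps(readings, threshold=4.0):
    """Flag locations where vertical acceleration jumps away from the baseline.

    readings: list of (latitude, longitude, vertical_accel_m_s2) tuples.
    threshold: hypothetical cutoff in m/s^2; a real app would need calibration.
    """
    if not readings:
        return []
    # The median is used as a baseline so that a few large jolts do not shift it.
    baseline = median(accel for _, _, accel in readings)
    return [
        (lat, lon)
        for lat, lon, accel in readings
        if abs(accel - baseline) > threshold
    ]
```

In practice a single jolt can come from a speed bump or a dropped phone, so reports from many drivers at the same location would likely need to be combined before a crew is dispatched.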
- Which intersections are likely to be blocked, and when?
UPS has a program in place called ORION (the On-Road Integrated Optimization and Navigation program), a sophisticated algorithm that ensures UPS vehicles take the most time- and energy-efficient routes. The program saves the company millions of miles each year, which adds up to hundreds of millions in savings and extreme reductions in CO2 emissions. The key to the program’s success is combining computational cost-cutting with functional, driver-friendly goals, like maintaining consistent routes and delivery times. This program provides a viable model for cities looking to improve routes for garbage trucks or other city vehicles. Cities including Boston, Philadelphia, and Raleigh, N.C. have begun taking steps towards route optimization for trash trucks, deploying smart bins that notify the appropriate agency when full to inform routes. Pairing this data with information on how busy a street is at a given time of day, how much a given neighborhood recycles, or where the trash goes after it's collected could make routes much more efficient. Read more in Stephen Goldsmith’s article “What a Brown Delivery Truck Could Teach Government.”
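The smart-bin idea can be sketched with a simple greedy heuristic: visit only the bins that report being nearly full, always driving to the closest one next. This is not ORION and it does not produce an optimal route; the function name, the flat (x, y) coordinates, and the 0.8 fill threshold are assumptions made for illustration.

```python
import math


def plan_route(depot, bins, fill_threshold=0.8):
    """Return an ordered list of bin names using a nearest-neighbor heuristic.

    depot: (x, y) starting point of the truck.
    bins: dict mapping bin name -> (x, y, fill_level), with fill_level in 0.0 to 1.0.
    fill_threshold: hypothetical level at which a bin is worth a stop.
    """
    # Keep only bins that report being at or above the threshold.
    pending = {
        name: (x, y)
        for name, (x, y, fill) in bins.items()
        if fill >= fill_threshold
    }
    route = []
    current = depot
    while pending:
        # Greedy step: drive to whichever remaining bin is closest right now.
        nearest = min(pending, key=lambda name: math.dist(current, pending[name]))
        route.append(nearest)
        current = pending.pop(nearest)
    return route
```

A production system would use street-network travel times rather than straight-line distance, and could fold in the other signals mentioned above, such as time-of-day traffic.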
- Which indicators can help to identify areas with the greatest amounts of idling?
Washington, D.C.'s Urban Forestry Administration is exploring a model combining lidar and elevation data to find the best places to strategically plant trees to mitigate stormwater. Read more in "How D.C. Grew a Data-Driven Tree Strategy" by Stephen Goldsmith.
- Under what circumstances do residents throw recyclables in the trash instead of in recycling bins (and how can we mitigate this in order to increase recycling diversion rates)?
- What are the current refuse locations in the city? Which receive the highest amount of complaints?
In 2013, San Francisco began operating a real-time, web-based case management system across the Departments of Public Health (DPH), Juvenile Probation (JPD), and the Human Services Agency (HSA) to systematically identify at-risk youth that were clients of multiple city social services. Together, these agencies found that “crossover clients” of multiple systems were at strikingly increased risk of committing a serious crime. Fifty-one percent of San Franciscans involved in multiple service systems were convicted of a serious crime; a third had been served by all three agencies; and the overwhelming majority (88 percent) of these youth committed the crime more than 90 days after becoming a crossover client, a critical window during which, the analysis suggested, case workers may be able to intervene. Read more in "Getting Data to the Good Guys" by Christopher Kingsley and Stephen Goldsmith.
Working with the Charlotte-Mecklenburg Police Department in North Carolina in 2015, data scientists from the University of Chicago’s Center for Data Science and Public Policy (DSaPP) developed an Early Intervention System (EIS) to identify individual officers most at risk for adverse interactions with citizens. The team built a machine learning model that creates a ranked list of officers based on historical data and situational factors that police departments can use to target individual officers for specialized training. This provides a more effective measure for prevention against excessive use of force than simply instituting broad training programs. The DSaPP paper on this machine learning model describes a ranking of past use-of-force events in historical data with the designations “not justified, preventable, and sustained.” This data serves as a historical basis for identifying officers at risk for future adverse incidents and is combined with other data on dispatch events, criminal complaints, citations, traffic stops, and arrests. The machine learning model “significantly outperformed” the existing system within the Charlotte-Mecklenburg Police Department used for targeting counseling and training. The model also provided individual risk scores and will allow the department to allocate resources more accurately and reduce unnecessary administrative tasks. The data can also be used at the dispatch level, allowing a dispatcher to recognize an officer that may be less suitable for a specific call because of the associated risk score for adverse citizen interactions.
New Orleans' NOLA for Life campaign analyzes data to determine likelihood of homicide, then targets its campaign components specifically at four neighborhoods where 40 percent of the city's homicides occur despite being home to just 19 percent of New Orleans' residents. And on an even more granular level, the campaign has sought to identify 200 New Orleans students who are most at risk for violence, with the goal of involving them in preventive programs. Read more in "How New Orleans is Winning a War Against Murder" by Stephen Goldsmith.
- Which offenders are most at risk of committing domestic violence?
In 2006, violent re-offenders established Philadelphia as one of the murder capitals of the United States. Philadelphia’s Adult Probation and Parole Department (APPD) oversaw 50,000 individuals, with only 295 probation officers. To manage the escalating crime, the APPD needed a systematic way of identifying the riskiest individuals and dedicating staff resources accordingly. If the APPD could accurately categorize recently paroled individuals as low-, medium-, or high-risk for potential to commit violent crime, the agency could save time and money and reduce the likelihood of violent recidivism. They turned to sociologist Richard Berk, who built a predictive engine based on tens of thousands of individual criminal records, with dozens of variables such as age, gender, previous zip code, number of previous crimes, and type of offense. This intelligent, machine-learning model enables the computer to find patterns and relationships across dozens of variables and constantly reassess those relationships as new data is added. Read more in "Predictive Tools for Public Safety" by Stephen Goldsmith.
The Data-Driven Justice Initiative started as a White House mission in 2015 to break the “cycle of incarceration,” and as many as 67 cities, counties, and states signed on to join the bipartisan effort. The municipalities agreed to install innovative solutions to divert potential incarcerees to mental illness or substance abuse treatment, using data analytics to target the individuals most in need of services.
The Data Science and Public Policy team (DSaPP) at the University of Chicago created an accurate Early Intervention System (EIS) that efficiently identifies individuals at risk of contact with the justice system so that the appropriate local agencies that employ this model can provide the necessary services and interventions. The data-driven EIS analyzes a jurisdiction’s mental health, justice, emergency medical response, and social services data to identify these individuals. In 2015, DSaPP partnered with Johnson County, Kansas to install a prototype EIS, combining data from the county’s mental health services, criminal justice system, and local EMS. The data scientists at DSaPP generated a list of 200 people most at risk of coming into contact with the criminal justice system that equipped county officials with a prioritized group most in need of intervention so as to save money from jail costs and reinvest those funds to bolster preventative measures.
- Which service(s) offered to juvenile delinquents have the greatest impact in reducing recidivism?
In summer 2012, Seattle had an unexpected uptick in gun-related crimes. The city increased the number of officers patrolling the streets. As a result, the gun-related crimes decreased, but at high cost to the city. In response, the city began to consider predictive policing software. In late February of this year, Mayor Mike McGinn announced that Seattle had implemented predictive policing software in two precincts. The idea behind predictive policing is that police departments have a wealth of data that has been collected over a number of years for every neighborhood and block of a city. By using that pre-existing data that can tell a story about past experience, police cruisers can patrol areas that match the same characteristics to prevent crimes from occurring. The software uses data from 2008 to predict potential crime and it is estimated to be twice as effective as a human data analyst working from the same information. At a cost of $73,000 for the software and an additional $45,000 per year for maintenance, the predictive policing software in Seattle will likely limit the need for additional officers on patrol and reduce the number of arrests through place-targeted patrolling and deterrence. Read more in “Seattle’s Predictive Policing Program” by Jessica Casey.
ShotSpotter works with municipalities to provide instantaneous gunfire alerts to police departments across the country. The core of ShotSpotter’s service is a wide-area acoustic surveillance system, supported by software and human ballistics experts, all focused on accurately detecting gunfire. The company mounts waterproof, watermelon-size, acoustic sensors on rooftops across a city. Networked together, an array of sensors can triangulate the incident location accurately in real time. If ten sensors detect a shot, the array can determine the incident location with a two-foot margin of error. ShotSpotter guarantees that it can accurately detect 80 percent of gunfire in coverage areas, although actual detection rates are as high as 95 percent. The technology has been implemented in 75 cities and towns across the United States, including Washington, D.C., and Milwaukee. Read more in "Predictive Tools for Public Safety," by Stephen Goldsmith.
Huntington Beach, CA is monitoring real-time social media data for keywords that suggest problems might occur in order to deploy officers. Read more in "Learning from Location" by Laura Adler.
In 2014, police in Prince George's County, MD, found themselves faced with an alarming increase in armed robberies of commercial establishments. To reduce incidents of armed robbery, police analyzed crime data and identified nine business corridors where the robberies were concentrated, and they also zeroed in on eleven 7-Eleven convenience stores outside the corridors that were the most likely to be targeted. Then they drilled down further, figuring out that Tuesdays, Thursdays and Saturdays were the nights when robberies were most likely to occur. The department deployed personnel based on the times and places where robberies were most likely to happen, but didn't stop there. Message boards on roadways in the targeted areas informed motorists (and warned potential criminals) that police operations were underway. Unoccupied police cars were parked in 7-Eleven parking lots and periodically moved. During the month that the county conducted this trial in innovative policing, armed robberies were reduced 40% compared to the same period the year before. Read more in "Harnessing Data to Fight Crime in Maryland" by Charles Chieppo.
The Fire Department of New York Emergency Medical Service's (EMS) historical databases, already enormous, are steadily becoming far more useful for predictive analytics and other purposes: EMS's improved ability to spot patterns and trends can have a major impact on pre-hospital care. For starters, EMS can now compare the call type assigned to a 911 contact (based on what a caller says under emotional pressure) to the disease or complaint EMS actually finds when it arrives on the scene; knowing how people tend to mis-describe what's going on can help EMS change what operators ask of callers. Better data, better call-center scripts, better patient outcomes. Read more in "Wireless EMS in New York City," by Susan Crawford.
Louisville Metro Emergency Medical Services (LMEMS) has sped up its ambulances’ turnaround times (the amount of time it takes from when an ambulance unloads a patient at a hospital until the crew becomes available to respond to another service call) in two ways. The first is by recording the time intervals for each step of its emergency responses with the Computer Aided Dispatch (CAD) system. This tool not only allows them to find which steps of the emergency response contain the greatest inefficiencies, but also holds ambulance crews accountable. The other is by monitoring the real-time location of the ambulances in the field. Using this tool, they can see the activity of their ambulance fleet, and communicate with crews to help them avoid any potential backups, or find out why straggling ambulances are not up to speed. By using data to identify obstacles to ambulance speed and hold ambulance drivers more accountable, the city has reduced its average ambulance turnaround time dramatically and saved $1.4 million. Read more in "Stretch Goals" by Matthew McClellan.
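The drill-down described above (where, then which nights) amounts to counting incidents by location and by weekday. The following is a minimal sketch of that idea only, not the county's actual analysis; the incident records and location labels are invented for illustration.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical incident log: (date of robbery, location label).
incidents = [
    (date(2014, 3, 4), "Corridor 1"),
    (date(2014, 3, 5), "Corridor 2"),
    (date(2014, 3, 6), "Corridor 1"),
    (date(2014, 3, 8), "Store 17"),
    (date(2014, 3, 11), "Corridor 1"),
    (date(2014, 3, 13), "Corridor 2"),
    (date(2014, 3, 18), "Corridor 1"),
]

# date.weekday() returns 0 for Monday through 6 for Sunday.
by_weekday = Counter(WEEKDAYS[d.weekday()] for d, _ in incidents)
by_location = Counter(loc for _, loc in incidents)

# The most frequent nights and places are candidates for targeted deployment.
top_days = by_weekday.most_common(2)
top_locations = by_location.most_common(1)
```

With real data, the same two counts could be crossed (location by weekday) to schedule patrols for specific places on specific nights.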
- Can we use historical 911 and 311 call volumes to adequately staff and schedule call floors at various times of the day, week, month, and year?
New Orleans is looking to save lives by using data to predict which of the city’s buildings need to be equipped with smoke alarms. Using data from sources like the 2011 American Housing Survey and the 2013 American Community Survey, along with data collected by the Census and NOFD, the city determined that poverty among building inhabitants, building age and how long the residents have lived in a building are the best predictors that a structure may not have a smoke alarm installed. The city then determined that those over 65 and under 5 are most likely to die in building fires. It took the age data, added information about which areas of the city saw the highest concentration of fires over the previous five years, and mapped it. Finally, the likelihood of having a smoke alarm, residents' age and fire-concentration data were combined to rank every zone of the city based on the need for smoke alarms. NOFD is using the data to focus its door-to-door program to install free smoke alarms. Read more in "Predicting Fire Risk: From New Orleans to a Nationwide Tool" by Katherine Hillenbrand.
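The final ranking step, combining the three factors into one score per zone, could look roughly like the sketch below. The source does not say how the factors were weighted, so the equal weighting here is an assumption, and the `Zone` fields, zone names, and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    p_no_alarm: float        # estimated probability a home lacks a smoke alarm (0-1)
    vulnerable_share: float  # share of residents under 5 or over 65 (0-1)
    fire_rate: float         # fires per 1,000 homes over the previous five years


def rank_zones(zones):
    """Return zones ordered from highest to lowest combined need score."""
    # Scale fire rates to 0-1 by the largest observed rate (1.0 guards against all zeros).
    max_fire = max(z.fire_rate for z in zones) or 1.0

    def score(z):
        # Assumption: the three factors are weighted equally.
        return (z.p_no_alarm + z.vulnerable_share + z.fire_rate / max_fire) / 3

    return sorted(zones, key=score, reverse=True)


# Invented example zones.
zones = [
    Zone("A", 0.10, 0.15, 2.0),
    Zone("B", 0.40, 0.30, 8.0),
    Zone("C", 0.35, 0.10, 4.0),
]
ranking = [z.name for z in rank_zones(zones)]
```

A door-to-door program would then work down the ranking from the top, so the weighting choice directly shapes which neighborhoods are visited first.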
Direct Relief developed a social vulnerability index through demographic and housing information, and correlated those data against the constant stream of risk-assessment models generated by FEMA. Direct Relief could forecast where the medical needs would be, even before the storm made landfall. This data-driven modeling helped Direct Relief overcome the communications challenge in the first 48–72 hours after the storm. Health providers were completely out of contact—cell service and phone lines had gone down. There was no way for Direct Relief to know which providers needed assistance. With limited contact, Direct Relief used proxies, such as the electric-grid outage maps and whether local pharmacies were down in a particular area, to predict which groups needed assistance. Direct Relief volunteers were then sent to clinics in these vulnerable areas to confirm on-the-ground needs and coordinate medical-supply delivery. Read more in "Predictive Tools for Public Safety" by Stephen Goldsmith.
In partnership with the Event and Pattern Detection Laboratory (EPD Lab) at Carnegie Mellon University, Chicago’s Department of Innovation and Technology (DoIT) is taking on a predictive approach to the “war on rats.” By using data in innovative ways to help keep rat populations down, Chicago is putting to use a new strategy that can not only enhance rodent control initiatives, but add precision to other strategies that address a wide range of urban problems. Read more in "Using Predictive Analytics to Combat Rodents in Chicago" by Sean Thornton.
- How can we anticipate where vulnerable people will need help evacuating?
- Can we identify power outages in real time and coordinate emergency services' response?
- Can crowdsourced information be used to improve the delivery of pest control services?
The Atlanta Police Department wanted to reduce its dispatch time and improve its efficiency in employing human resources, so it turned to a team of data scientists for help. To find a solution to these problems, the team analyzed five years of data, or approximately five million dispatches. It quickly became clear to the team that the traditional notion of workload (dispatch volume) did not capture the complexity of the work observed during the site visits. To weight dispatches more appropriately, a simple survey was developed by the team and then completed by 30 random dispatchers. Weights were then applied to dispatch types using a distribution from the survey results, effectively turning the notion of workload into an index. Coupling this with several other predictors, the team was able to develop a model to test different scenarios. One scenario that has gained traction as a result of the analysis is the movement of administrative dispatches (e.g. extra job check in and check out) to a single dispatcher, which creates greater availability for other dispatchers to focus on priority dispatches. Read more in "Optimizing Atlanta’s 911 Systems with Data Science" by John Zimmerman and Jon Keen.
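To illustrate why a weighted index differs from raw dispatch volume, here is a minimal sketch. It is not the team's model: the dispatch types and survey scores are invented, the function name `workload_index` is hypothetical, and the weight for each type is assumed to be the mean survey score.

```python
from statistics import mean

# Hypothetical survey responses: each dispatcher rates the effort of a
# dispatch type on a 1-5 scale (values invented for illustration).
survey_scores = {
    "priority_call": [5, 4, 5],
    "traffic_stop": [2, 3, 2],
    "admin_check_in": [1, 1, 2],
}

# Assumption: the weight of a dispatch type is its mean survey score.
weights = {kind: mean(scores) for kind, scores in survey_scores.items()}


def workload_index(dispatch_counts, weights):
    """Sum of dispatch counts, each scaled by the weight of its type."""
    return sum(weights[kind] * count for kind, count in dispatch_counts.items())


# Two hypothetical shifts with the same raw volume (100 dispatches each).
shift_a = {"priority_call": 60, "traffic_stop": 30, "admin_check_in": 10}
shift_b = {"priority_call": 10, "traffic_stop": 30, "admin_check_in": 60}

index_a = workload_index(shift_a, weights)
index_b = workload_index(shift_b, weights)
```

Both shifts handle 100 dispatches, but the index for the first is considerably higher because most of its dispatches are of the heavily weighted type, which is the distinction a volume count alone would miss.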
At the height of Hurricane Sandy, New York City’s 911 switchboard was receiving 20,000 calls an hour, many of which were not emergencies. The call volume led to slow response times and a lack of prioritization; there was no way to distinguish calls for downed tree branches from people in life-threatening situations. An important first step for future preparation is better educating citizens about what qualifies as a 911 call and what can be relegated to a non-emergency 311 call, an effort the city is undertaking now. The 311 hotline could be shifted to function primarily as a reporting mechanism, especially at times of disaster when city services and 911 phone lines are becoming overwhelmed. Even if an immediate answer isn’t guaranteed, text and data analytics of 311 texts, calls, and social media posts could allow these services to give first responders a better picture of where to focus their efforts. Citizens should be informed that any communication they send to the government would be registered in this way, bringing attention to their particular problems as well as those of their neighbors, while enhancing the entire city’s response capacity. Read more in "Getting the Lights on Faster" by Benjamin Weinryb Grohsgal and Stephen Goldsmith.
The Greater Cincinnati area created Raven911, a regional map-based program designed to enhance situational awareness in times of disaster. Read more in "Raven911 Gives Emergency Responders a Bird's Eye View" by Daniel Curtis.
Completed in 2010, Austin’s Flood Early Warning System (FEWS) combines flood maps, real-time data, and predictive modeling to make better evacuation decisions and plans in response to imminent flooding. The new system can predict which streets will become flooded and impassable up to 6 hours beforehand and map flooded areas and road closures, replacing an old system that only displayed flood danger levels of locations and often caused evacuations to take place once flooding had already occurred. Read more in "Forecasting Flooding in Austin" on Data-Smart City Solutions.
In Europe, four cities are experimenting with X-band radar’s capacity to predict flash floods by counting every raindrop as part of the RainGain project. This innovation isn’t just about weather reporting: Real-time quantified rainfall data has the potential to help cities dynamically predict floods and deploy infrastructure to curb damage. Read more in "3 Ways to Optimize Urban Infrastructure" by Stephen Goldsmith.
Smart grids, and particularly smart electric meters, played a promising role in improving disaster response and the speed with which power could be restored after Superstorm Sandy downed power lines across the East Coast in 2012. That role was small-scale and local, since electric utilities' conversion to smart-grid technology has been slower than desired, but the potential is there for the technology to have a much larger impact as these systems are rolled out more widely. At best, phone calls and spotty service-outage reports can slowly piece together a hazy picture of the conditions of a power network. But smart meters, programmed to send out a "last call of distress" when power is lost, can automatically report service cuts. This gives a utility company instant access to regional maps of outages, allowing it to prioritize repair-crew mobilization and begin getting service back to customers without them even having to report an outage. Additionally, smart meters can automatically report coming back online when power is restored, eliminating unnecessary calls between the utility company and customers or follow-up service-crew visits. Repair crews can move on to the next repair rather than spending time checking on their last one, increasing efficiency and reducing system repair time considerably. Read more in "Getting the Lights on Faster" by Benjamin Weinryb Grohsgal and Stephen Goldsmith.
- Can we determine where unsafe housing problems are unlikely to be reported through 311?
- How can we use analytics to prioritize accessibility inspections for building alterations, and make sure they are compliant with municipal building code and state accessibility requirements?
- Who is most likely to be guilty of financial crimes and fraud?
In 2011, Boston created the Problem Properties Task Force, a cross-agency committee that works to address and preempt community disorder by identifying the city’s most risk-prone and risk-causing addresses. Each contributing agency (there were eight) furnishes its existing datasets, which are consolidated with data from the Mayor’s 24-Hour Hotline and then processed by the City’s High-Performance Analytic Appliance (HANA). Each member then contributes his or her department’s records and incoming public complaints to complement the analytics engine results in generating a comprehensive picture of the city’s most problematic residences. Then, using these sources of intelligence, the task force determines the appropriate action to take. This could mean increasing police surveillance, expediting enforcement proceedings by the Air Pollution Control Commission, levying charges to recoup public costs, or commencing foreclosure proceedings against a property owner with delinquent real estate taxes. Read more in "Problem Properties: A Preemptive Strategy Toward Neighborhood Stability" by Craig Campbell.
- How can inspectors reduce response time to maintenance complaints?
Detroit, possibly the city hit hardest by blight and vacancies, is leading the way in attacking the problem with a data-driven method. Detroit’s Blight Removal Task Force, deploying more than 200 people over 14 weeks, has successfully surveyed more than 99 percent of the city’s 380,217 properties. Information collected onsite, including photographs, lot characteristics, condition of structures and the owner, is sent wirelessly to the operations center, where it is checked while the team is still at the property. This information has helped identify candidates for demolition and areas of elevated safety concern. Read more in "Data Helps Calculate the True Costs of Blight" by Stephen Goldsmith.
- How can we prioritize annual elevator safety inspections? For example, can we predict or identify which elevators pass every year and could be outsourced to a 3rd party?
The negative effects of blight and building vacancies can spread through an area in a city quickly, emphasizing the need for proactive data-driven strategies. The University of Chicago team of data scientists at the Center for Data Science and Public Policy (DSaPP) worked with the Cincinnati Department of Buildings and Inspections to develop a predictive model that allows for early intervention by building inspectors at homes and properties most at risk of vacancy or violations. The predictive models the team at DSaPP developed combined data about home values, fire, crime, tax, census, and water shutoff information with historical inspection data to develop a list of properties prioritized by their need for inspection. The logic is that the earlier an inspector can visit a property likely to be in violation of city code, the earlier problems can be addressed, and the more likely it will be that the property is fixed as opposed to abandoned. DSaPP’s blog post detailing the project says that the traditional method, in which property inspections are triggered by citizen complaints, leads to a violation being found in 53 percent of cases. Initial results from 2015 show that using the predictive model increases the likelihood of finding a building code violation at a specific property to 78 percent.
In order to address urban blight, New Orleans has used behavioral science to improve voluntary homeowner compliance. The city’s Office of Performance and Accountability (OPA) tested a new step in the code violation process whereby the city sends homeowners a letter stating that a 311 complaint has been made about their property. In the pilot, the letter resulted in a 6 percent decrease in observed violations and the city has now fully implemented the policy in the Code Enforcement Department. Read more in Katherine Hillenbrand’s piece “New Orleans Brings Data-Driven Tools to Blight Remediation.”
New Orleans noticed that using the traditional system for making decisions about what to do with blighted properties—tasking a single director with working through the complex list of factors and deciding the fate of homes—created a significant backlog of properties awaiting decision. In order to streamline the process, the city’s Office of Performance and Accountability (OPA) partnered with data science startup Enigma to create a machine learning algorithm called the Blight Scorecard that assigns properties a score from 0-100 to aid in the decision making process. To create the tool, the OPA manually scored over 600 test case properties on a number of factors, honed the list of criteria and determined how they affected the outcome, and then tested algorithms to see if a model could be trained on this data. Read more in “New Orleans Brings Data-Driven Tools to Blight Remediation” by Katherine Hillenbrand.
With assistance from the Mayor’s Office of Analytics using a hotspot analysis, New York City's Business Integrity Commission cross-referenced industry data on grease production with restaurant permit data and sewer back-up data from the Departments of Health (DOH) and Environmental Protection (DEP) to better target enforcement and predict illegal activity. Since launching this partnership effort with DEP in the Fall of 2012, they have achieved an increase in violations by 30% while achieving a 60% reduction in manpower dedicated to grease enforcement. Read more in "Enforcement and Data" by Shari Hyman.
- Which construction / renovation projects are the highest risk / should be inspected first?
- Which buildings are the highest risk / should be inspected first?
- Which equipment (such as boilers, elevators, cranes, vehicles, etc.) is the highest risk / should be inspected first?
Diners who suffer food poisoning rarely report it through official channels, even though foodborne illness is a public health concern. However, sick, unhappy customers have incentive to vent their complaints on Yelp, a popular app and website for local business reviews. New York City’s Department of Health and Mental Hygiene recently completed a pilot project in partnership with the company aimed at identifying unreported outbreaks of foodborne illness. Working with software developers at Columbia University, city researchers converted nearly nine months of Yelp reviews into machine-readable data. They were then able to pinpoint potentially hazardous establishments by reviews that included terms such as “sick,” “vomit” or “food poisoning.” Scanning 294,000 restaurant reviews in New York, the software flagged three restaurants that together produced 16 documented illnesses. When health inspectors subsequently visited these establishments, they discovered astonishing health code violations: improperly sanitized surfaces and bare-hand contact with ready-to-eat food at the first two, and live roaches and evidence of mice at the third. Read more in "How Social Media Listening Can Improve Public Health" by Stephen Goldsmith.
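The keyword screen described above can be sketched in a few lines. This is only an illustration of the general approach, not the software built for the pilot: the three terms are the ones quoted in the text, while the sample reviews, the `flag_restaurants` helper, and the `min_hits` threshold are assumptions.

```python
import re
from collections import Counter

# Terms quoted in the text; a real system would likely use a richer list.
ILLNESS_TERMS = ("sick", "vomit", "food poisoning")

# Case-insensitive substring match on any of the terms.
PATTERN = re.compile("|".join(re.escape(t) for t in ILLNESS_TERMS), re.IGNORECASE)


def flag_restaurants(reviews, min_hits=2):
    """Return restaurants with at least `min_hits` reviews mentioning a term."""
    hits = Counter(name for name, text in reviews if PATTERN.search(text))
    return sorted(name for name, n in hits.items() if n >= min_hits)


# Invented sample data: (restaurant name, review text).
reviews = [
    ("Cafe A", "Got food poisoning after the brunch special."),
    ("Cafe A", "I was sick all night."),
    ("Cafe B", "Great coffee, friendly staff."),
    ("Cafe C", "Felt a bit sick, could be unrelated."),
]
flagged = flag_restaurants(reviews)
```

Requiring more than one matching review is one simple way to reduce false alarms from a single ambiguous mention; flagged establishments would still need follow-up by health inspectors, as in the pilot.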
In a recently completed pilot program, Chicago used analytics to improve the process by which health inspectors identify "critical violations" in food establishments, usually related to improper food temperature. Here's how it worked: The city processed relevant data to identify predicting variables associated with violations, developed a model, ran a simulation and then used this forecast to allocate inspections in a way that prioritized likely violators. This data-optimized trial method sped up the process of identifying critical violations by seven days — meaning that restaurant patrons are that much less likely to contract a foodborne illness. Read more in "Chicago's Data-Powered Recipe for Food Safety" by Stephen Goldsmith.
- What variables affect inspector productivity and which can be most easily influenced? What distinctions can be made between inspectors who complete a high number of inspections and those who are at the bottom end?
- Based on the relationship between inspections and violations, what building inspection regimens are most effective at preventing violations from occurring?
- How many inter-agency inspections are conducted each year? Do they effectively detect current violations?
- Which city debts are least likely to be paid?
- Which taxpayers are least likely to pay?
In New York City, a Finance Department auditing team decided to use analytics to increase the productivity of auditors reviewing companies thought to be underpaying their taxes. Using sophisticated data analytics, the commissioner instructed his department to look for patterns—identifying individuals who had businesses similar to others but who stood out as outliers on taxes paid. In so doing, the team reduced the portion of audit cases closing without change: from 37 percent to 22 percent over three years. This represents a 40 percent increase in productivity for the department and a 100 percent reduction of government intrusion for the thousands of companies that would have been catapulted into the audit process, with an end result of no change on their returns. Read more in "Making Data Matter in Administrative Systems" by Stephen Goldsmith.
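One simple way to look for businesses that resemble their peers but stand out on taxes paid is to compare each business's effective tax rate with the median for its peer group. The sketch below assumes that approach; the source does not describe the department's actual method, and the field names, the `threshold` parameter, and the figures are hypothetical.

```python
from collections import defaultdict
from statistics import median


def effective_rate(business):
    """Tax paid as a fraction of reported revenue."""
    return business["tax_paid"] / business["revenue"]


def flag_low_payers(businesses, threshold=0.5):
    """Flag businesses whose rate is below threshold * their peer-group median."""
    rates_by_group = defaultdict(list)
    for b in businesses:
        rates_by_group[b["industry"]].append(effective_rate(b))
    medians = {group: median(rates) for group, rates in rates_by_group.items()}
    return [
        b["name"]
        for b in businesses
        if effective_rate(b) < threshold * medians[b["industry"]]
    ]


# Invented example records.
businesses = [
    {"name": "Deli 1", "industry": "deli", "revenue": 500_000, "tax_paid": 25_000},
    {"name": "Deli 2", "industry": "deli", "revenue": 450_000, "tax_paid": 22_500},
    {"name": "Deli 3", "industry": "deli", "revenue": 480_000, "tax_paid": 4_800},
    {"name": "Salon 1", "industry": "salon", "revenue": 200_000, "tax_paid": 8_000},
    {"name": "Salon 2", "industry": "salon", "revenue": 220_000, "tax_paid": 8_800},
]
audit_candidates = flag_low_payers(businesses)
```

Comparing within a peer group rather than across all businesses keeps a low-margin industry from being flagged wholesale, which is the point of targeting audits at outliers.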
- What city blocks need more inspection enforcement?
- Which businesses are most likely to be violating weights and measures?
In order to predict overcrowded properties, Los Angeles has combined data on demand for housing in a particular neighborhood (including house prices, education, incomes, and weather) with supply estimates based on land area, topographical constraints, and construction labor wages. In areas where demand appears to outstrip supply, overcrowding is more likely. A creative program in one London borough monitors sewage flow and compares expected waste output with observed waste, as well as looking to trash left on the streets and calls to pest control as overcrowding indicators. New York City tracks landlord tax payments, noise violations, and complaints as potential predictors of overcrowding. Read more in “How Can Cities be Preemptive and Effective in Preventing Overcrowding?” by Nyasha Weinberg.
- How can we tap social media for information on illegal businesses?
- What property owners, architects, developers, businesses and landlords need more regulatory enforcement?
- How can we use social media to ensure licensees are conducting legal business?
Washington, D.C.'s Urban Forestry Administration used lidar data to identify illegal tree removal based on the original height of the trees. This vastly improved enforcement of permitting laws. Read more in "How D.C. Grew a Data-Driven Tree Strategy" by Stephen Goldsmith.
- Can we predict which stores sell cigarettes to youth?
- How can we target stores that sell outdated food or expired baby formula?
- Does the order of inspections (building, health, or fire) increase the rate of violation?
This post has been updated over time with additional examples.