By Katharine Robb • March 22, 2021

Housing, Health, and Housing Codes:

Housing is more than just shelter. Where you live can have a profound impact on your physical and mental health and personal safety. Unsafe housing is associated with increased risk for health problems as far ranging as cardiovascular disease, infections, lead poisoning, and mental illness. Housing codes can help break the link between poor housing and poor public health. Housing inspectors visit properties to ensure that rental housing is “up to code,” or that it meets minimum health and safety standards stipulated by city housing codes. Many of today’s codes originated during the Sanitary Reform Movement of the late 1800s, when requirements for basic sanitation, ventilation, and other structural conditions dramatically reduced cities’ rates of infectious disease, fire, and injury.

Modern housing inspection has the potential not just to mitigate the public safety risks of fires and collapses, but also to ensure healthy and safe living conditions for residents in their homes and neighborhoods. However, cities have finite and often insufficient resources for housing inspection. Prioritizing the properties—and residents occupying them—with the greatest housing-related risk could promote equity in code enforcement without compromising efficiency and effectiveness. Using existing city data, machine learning models can nearly double the number of inspections that identify housing code violations that threaten health. As more cities adopt open government data platforms and predictive analytics capabilities, more communities could benefit from fewer housing-related health and safety risks.


Effective, Efficient, and Equitable Code Enforcement:

Routine housing code enforcement falls short of its potential to resolve housing-related health problems. Housing inspection is not as effective as it could be because inspections are usually a reaction to complaints, although some cities also run proactive inspections in limited areas. Tenants, however, may not report problems for fear of landlord retaliation, or may not know that they can file complaints. As a result, housing inspection is often a blunt tool for detecting public health threats. Cities are often unaware of problems until they become crises, and many risks go undetected. Earlier intervention is a critical public health tool.

Inspection is also not as efficient as it could be, because code enforcement often operates within its own silo, with little coordination of data and strategies across health and other city departments. Inspectional Services Departments often must prioritize inspections without data to inform where limited time and resources will yield the most benefits to households and neighborhoods. As a result, precious time and resources are wasted and opportunities to improve public health are lost.

Finally, inspection is not as equitable as it could be. Inspectors have broad professional discretion in how to prioritize properties for inspection. The absence of formal criteria for determining risk and need, the lack of integrated data to inform that process, and the fact that the most vulnerable residents are the least likely to file complaints with the city all increase the likelihood of unequal government protection.

Given the central role of housing to health, even marginally increasing the impact of housing code enforcement is an opportunity for significant public health gains. Investing in data-analytic capabilities can make code enforcement more effective, efficient, and equitable. Cities increasingly have access to data (from tax assessors’ offices, utilities, public works, and other sources) that can be used to identify housing-related health risks and to prioritize properties for inspection and coordinated service provision.


Can Integrated City Data and Machine Learning Improve Housing Inspection Practices?

With this in mind, I sought to determine whether machine learning algorithms, when applied to city data, could identify properties with housing code violations at a higher rate than inspectors’ discretion alone. This work was a partnership between the Harvard Kennedy School’s Innovation Field Lab, led by Jorrit de Jong, and the City of Chelsea, where this study took place. Research assistant Ashley Marcoux and I worked closely with staff at City Hall to identify and digitize administrative city datasets linked by parcel ID. These included cross-departmental data from Billing, Legal, Police and Fire, Code Enforcement, and other sources. In partnership with Tolemi, a data-analytics firm, we generated a dataset of each residential property in Chelsea (N=5,989) and its associated data.
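As a rough illustration of what linking departmental records by parcel ID can look like, the sketch below counts records per parcel across several sources. The source names, parcel IDs, and the `count_features_by_parcel` helper are hypothetical and chosen only for illustration; this is not the pipeline used in the study.

```python
from collections import Counter


def count_features_by_parcel(parcel_ids, sources):
    """Build one row of count features per parcel.

    parcel_ids: iterable of parcel IDs (one per residential property).
    sources: dict mapping a source name to a list of parcel IDs,
             one entry per record in that source (e.g., one per fine).
    """
    counts = {name: Counter(records) for name, records in sources.items()}
    return {
        pid: {f"{name}_count": counts[name][pid] for name in sources}
        for pid in parcel_ids
    }


# Hypothetical toy data: every ID and source name here is made up.
parcels = ["P-001", "P-002", "P-003"]
sources = {
    "fines": ["P-001", "P-001", "P-003"],
    "permits": ["P-002"],
}
features = count_features_by_parcel(parcels, sources)
```

Parcels with no records in a source receive a count of zero, so every property ends up with the same set of columns.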

Chelsea is a small, densely populated, demographically diverse city located just outside Boston. Half of homes are two- to four-family units and almost 70 percent of residents are renters. Motivated by the poor quality housing stock, low landlord compliance, and limited housing complaints, Chelsea began a proactive housing inspection program in 2015. Every month, inspectors meet to identify properties or blocks for inspection and to track progress. Inspectors prioritize properties and sections of neighborhoods for inspection based on perceptions of the risks to residents, informed primarily by exterior conditions of homes and on-the-job knowledge.

With research assistant Nicolas Diaz Amigo, we used data from the 1,611 proactively inspected properties to build and test our machine learning model, withholding 20 percent to validate how well the model performed with data not used to build it. We optimized the model to balance sensitivity (the proportion of properties that truly have a code violation that are predicted to have one) with positive predictive value (the proportion of properties predicted to have a code violation that actually do). Our GitHub site contains the code, and a detailed description of our methods and model optimization can be found in our manuscript, “Using Integrated City Data and Machine Learning to Identify and Intervene Early on Housing-related Public Health Problems.”
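To make the two metrics concrete, here is a minimal sketch of a 20 percent hold-out split and of sensitivity and positive predictive value computed from inspection outcomes. The function names, the random split, the seed, and the toy labels are assumptions for illustration; the study's actual procedure is described in the manuscript and code.

```python
import random


def holdout_split(items, test_fraction=0.2, seed=0):
    """Shuffle items and set aside a fraction for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def sensitivity_and_ppv(actual, predicted):
    """Return (sensitivity, ppv) for equal-length sequences of booleans.

    sensitivity = true positives / all properties that truly have a violation
    ppv         = true positives / all properties predicted to have one
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# 1,611 inspected properties, as in the article; the IDs are placeholders.
train, test = holdout_split(range(1611))

# Made-up outcomes for six held-out properties.
actual = [True, True, True, False, False, True]
predicted = [True, True, False, True, False, True]
sens, ppv = sensitivity_and_ppv(actual, predicted)
```

In the toy labels, one true violation is missed and one clean property is flagged, which lowers sensitivity and positive predictive value respectively. Balancing the two means accepting some of each kind of error rather than eliminating one at the expense of the other.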


Risk-Based Inspection Could Increase the Identification of Housing Code Violations:

Using the current system of inspector-led prioritization, 45 percent of housing inspection visits identify at least one housing code violation. If the City inspected the 600 properties (approximate yearly capacity) with the highest probabilities of a violation based on the machine learning algorithm, we would expect 81 percent to have a violation. Compared to current practices, risk-based inspection using the machine learning model would represent a 1.8-fold increase in the number of inspections that identify code violations.
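The arithmetic behind the 1.8-fold figure can be sketched as follows, using only the rates and capacity reported above. The `top_n_by_risk` helper and the example scores are hypothetical, and applying the 45 percent visit-level rate to 600 properties is a simplifying assumption.

```python
def top_n_by_risk(scores, n):
    """Return the n property IDs with the highest predicted probability."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


CAPACITY = 600        # approximate yearly inspection capacity
BASELINE_RATE = 0.45  # share of current inspections that find a violation
MODEL_RATE = 0.81     # expected share among the model's top-ranked properties

baseline_hits = round(BASELINE_RATE * CAPACITY)
model_hits = round(MODEL_RATE * CAPACITY)
fold_increase = model_hits / baseline_hits

# Hypothetical predicted probabilities for four made-up properties.
example_scores = {"A": 0.91, "B": 0.12, "C": 0.67, "D": 0.88}
shortlist = top_n_by_risk(example_scores, 2)
```

Under these assumptions, the same 600 inspections would be expected to find violations at roughly 486 properties rather than 270.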

In interviews, housing inspectors explained that while the exterior conditions of properties are helpful in prioritization, there is often no clear indication that a violation will be present until an internal inspection is done. As one inspector said, “You can have two obviously run-down properties on a street of nice homes, but you have to inspect the whole street. You’d be surprised how many of those nice homes have violations.”

In comparing characteristics of properties predicted to have code violations with those not predicted to have violations, some characteristics were potentially observable from the outside. For example, properties predicted to have code violations were older (on average by 23 years), had larger building-size to land-size ratios (1.5 vs. 0.6), an indicator of high-density housing, and had more municipal fines, such as for overgrown vegetation or failure to remove snow (mean 9 vs. 4 fines) (p < 0.001 for all). However, most characteristics were not observable from the outside. For example, properties predicted to have code violations were significantly more likely to have lower property value, not be occupied by the owner, have fewer building permits, and have fewer home sales associated with the property. The machine learning model incorporates more factors into the risk calculation, allowing for detection of patterns that are not otherwise observable.


Implications for Policy and Practice:

Risk-based inspection using machine learning can allow for intervention at a greater number of properties without the need for additional inspection resources. When inspectors visit homes with no violations, no improvement to the housing stock or mitigation of public health risk is made. Prioritizing inspection of the highest risk properties may be more successful and cost-effective than waiting for problems to escalate to a housing complaint or call for emergency services. For example, smoke detector installation can dramatically reduce loss of life and livelihood from fire. Extermination of insects can reduce asthma triggers and asthma-related emergency room visits.

Despite its potential to improve public health, code enforcement can be overly punitive and lead to tenant displacement. Enforcement should be coupled with service provision, where appropriate, to address root causes of housing code violations. Beginning in 2019, Chelsea formally integrated social service provision within code enforcement through a partnership with a local social service agency. The program was the result of a collaboration between City Hall and the Innovation Field Lab. As a result, inspectors have tools beyond citations to resolve underlying causes of housing and health problems for landlords and/or tenants, such as mental illness or poverty, that make compliance with the housing code difficult. Integrated city data and machine learning can also be used to estimate the prevalence and spatial distribution of housing-related health threats, and to inform strategic action by cities, health systems, and community organizations to address housing-related health problems.

Prior to the COVID-19 pandemic, Chelsea had planned to trial risk-based inspections using results from the models; however, routine inspections were suspended in March 2020 and had not resumed at the time this article was published. Because the models were not trialed in real life and their impact evaluated with stakeholders, we can only make inferences based on the test data.

The time, cost, and expertise needed to develop risk-based inspection models will differ widely by city, based largely on the extent of data integration and the culture around using data for decision-making. Data integration is a significant investment, but its benefits extend far beyond single initiatives or departments. Some cities have in-house capacity to develop machine learning models; many do not. If there is city-led demand, and data are integrated, cities can supplement their data analytic capacity through collaboration with local universities, as was the case in this study. Models do not need to be developed from scratch. Projects applying machine learning in local government often publish their code online (ours is on our GitHub site), which can be adapted for use by others.

The results of this study demonstrate the potential for increasing the public health impact of housing code enforcement through a novel application of city data. Identifying and responding to housing code violations is a critical public health intervention. Doing so in a more effective, efficient, and equitable manner – without the need for additional inspection resources – advances cities’ bottom line and quality of life for residents.