By Civic Analytics Network • June 13, 2018

In 2017, the Civic Analytics Network (CAN) published “An Open Letter to the Open Data Community,” offering eight guidelines for open data. Today, CAN released an updated version of the letter, along with an abridged version of the eight guidelines, found below.

1. IMPROVE ACCESSIBILITY AND USABILITY OF DATA TO ENGAGE A WIDER AUDIENCE

 

Portals should offer simple, fast data downloads coupled with more user-friendly design for understanding data. Accessibility and usability changes should be data-driven, reflecting which datasets and features are used most. Portals should include more space for conversations about the use of datasets, and for resources such as user guides, dashboards, and social media communication. Portals should be exemplars of good web design; many, for example, are not mobile friendly. User research from providers (who is using my data? where are the gaps?), including strategies for marketing open data to populations that might not be using it, would be helpful. This might include additional applications or view options for those communities. Portals should include more intuitive ways of visualizing and exploring data. It’s time for open data to move beyond spreadsheets on the web. In addition to publishing total views of datasets, cities’ data portals should also publish information on which datasets are being downloaded.

 

2. MOVE AWAY FROM A SINGLE DATASET CENTRIC VIEW

 

Tying together related datasets is a key priority for our network. Most portals make it difficult to tie together related datasets, either during the discovery process or via APIs. Since data portals generally support only flat data, information from relational databases needs to be broken into smaller datasets and later recombined by a potential user of that data. Making it easier for users to tie together and compare related datasets is important with regard to content, too; for example, current open data features do not make it easy for users to compare budgets or other city information. This relates to a greater need in general for data portals to do a better job of keeping pace with the types of data that are most useful to, and most needed by, the public.
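To illustrate the recombination work that currently falls on data users, here is a minimal sketch in Python. It assumes two hypothetical flat exports (a department list and a budget table) that share a `dept_id` column; the dataset contents, column names, and the `join_on` helper are invented for illustration and do not come from any particular portal.

```python
import csv
import io

# Hypothetical flat exports, as a portal might publish them separately.
DEPARTMENTS_CSV = """dept_id,dept_name
10,Parks
20,Transit
"""

BUDGET_CSV = """dept_id,fiscal_year,amount
10,2017,1500
10,2018,1800
20,2018,4200
"""


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on a shared key column."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left in left_rows:
        for right in index.get(left[key], []):
            joined.append({**left, **right})
    return joined


# Recombine the budget lines with the department names they refer to.
budget_by_department = join_on(
    read_rows(BUDGET_CSV), read_rows(DEPARTMENTS_CSV), "dept_id"
)

for row in budget_by_department:
    print(row["dept_name"], row["fiscal_year"], row["amount"])
```

A portal that understood the relationship between the two tables could offer this combined view directly, rather than leaving each user to rebuild it.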

 

3. TREAT GEOSPATIAL DATA AS A FIRST-CLASS DATA TYPE

 

On most data portals, geospatial data is an underdeveloped and undervalued asset; going forward, it should be treated as an integral part of any open data program. For the past several years, the general status quo for cities has been to post spreadsheets of data on the web for the public to consume. In this iteration, geospatial data largely serves as a secondary feature. Having better and more easily navigable geospatial data is a priority among open data consumers, and current data portals don’t meet this need, either in the user interface or via the API.

 

4. IMPROVE MANAGEMENT AND USABILITY OF DATA

 

One driver of cost for government is providing quality metadata. In general, open data portals do not support metadata management tools, and metadata is difficult to discover or customize. As a result, published data is less well understood and less widely used. Some cities take on metadata costs and balance the quality of metadata against their rate of publishing.

 

5. DECREASE THE COST AND WORK REQUIRED TO PUBLISH DATA

 

Portals’ automation services are generally minimal, putting the onus on localities to manage automation on their own. The (non-monetary) cost of getting datasets published is also generally too high for local governments. The amount of time and staff effort it takes for cities to upload new datasets prevents them from making larger volumes of wanted data available to the public. Automation services from open data portals need to keep up with demand. While many cities have been able to manage automation on their own, the capacity to do so may vary considerably from city to city, especially as smaller and/or resource-constrained cities seek to launch open data programs of their own.

 

6. INTRODUCE REVISION HISTORY

 

Data quality is improved with revision history. To have a quality data portal, governments must be able to provide data that is timely and accurate. This means governments should (and do) update their datasets frequently. These updates are sometimes driven by users from the general public as well, who can note errors to be corrected. However, there is currently no easy way to track a dataset’s revision history over time. This is something that GitHub, for instance, does well, which is immensely helpful for users. Many cities have devised “workaround” solutions to this, such as posting snapshots of previous data versions onto their portal; however, in the long term, this is not a sustainable solution.

We also believe steps can be taken to improve the context for data and data quality. For example, if versions of the same dataset for different timespans are provided, context can also be provided to clarify the reasoning for different versions. Such changes that incorporate context need to be less about including a note that says “x revision occurred on x date,” and more about explaining the reasoning.
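As a rough sketch of what a revision record could capture, the Python example below compares two snapshots of a dataset and attaches a human-written reason to the change. The permit records, the `permit_id` key, the `diff_snapshots` helper, and the wording of the reason are all hypothetical; this is one possible approach under those assumptions, not a description of how any portal or GitHub works.

```python
def diff_snapshots(old_rows, new_rows, key):
    """Summarize which records were added, removed, or changed between snapshots."""
    old = {row[key]: row for row in old_rows}
    new = {row[key]: row for row in new_rows}
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in new if k in old and new[k] != old[k]),
    }


# Two hypothetical snapshots of the same dataset.
snapshot_v1 = [
    {"permit_id": "A1", "status": "open"},
    {"permit_id": "A2", "status": "open"},
]
snapshot_v2 = [
    {"permit_id": "A1", "status": "closed"},
    {"permit_id": "A3", "status": "open"},
]

# A revision entry pairs the mechanical diff with the reasoning behind it.
revision = {
    "reason": "Hypothetical note: duplicate permit A2 withdrawn after a user report",
    "changes": diff_snapshots(snapshot_v1, snapshot_v2, "permit_id"),
}

print(revision["reason"])
print(revision["changes"])
```

Storing the reason alongside the diff is the point of the sketch: the list of changed records says what happened, while the note explains why.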

 

7. IMPROVE MANAGEMENT OF LARGE DATASETS

 

Some data portal providers are having trouble keeping up with larger datasets. This issue is becoming more and more salient for larger cities: updates can be slow, datasets are prone to errors, and any changes to the dataset only appear online long after the work is finished. Vendors should make sure their systems can manage large amounts of data, in both size and velocity.

 

8. SET CLEAR, TRANSPARENT PRICING BASED ON MEMORY, NOT NUMBER OF DATASETS

 

Data portal providers price their services in different ways and with varying levels of transparency. This makes it challenging for open data providers to budget and plan for their work. While we understand that portal providers need to position their services competitively, we strongly discourage pricing strategies that are misaligned with the goals of open data. In particular, pricing models based on the number of datasets published (versus memory or volume) can disincentivize the publication of open data and undermine the spirit and goals of the movement.