This is an open letter directed towards those in the open data community – including vendors, startups, civic groups, and engaged individuals.
We, the Civic Analytics Network, are a consortium of Chief Data Officers and analytics principals in large cities and counties throughout the United States. As a group, we work together to advance how local governments use data to be more efficient, innovative, and in particular, transparent.
Open data is one of the core missions of Chief Data Officers. In the past five years, the number of cities with an open data portal has grown significantly. While cities have already released terabytes of open data, CAN aims to set higher goals for open data to make it more accessible and usable. Our cities’ open data portals must continue to evolve to meet the public’s growing and changing needs.
While specific open data portal features may vary from city to city, there are universal requirements that all local governments need for an effective open data program. As a collective, CAN’s purpose in this space is to consolidate and communicate these common needs across the largest, most influential data portal cities in the United States. By doing so, we can work and communicate more openly and effectively with IT providers so that they may be better able to understand and meet these needs.
As such, the Civic Analytics Network offers the following eight guidelines that, if followed, would advance the capabilities of government data portals across the board and help deliver upon the promise of a transparent government.
1. Improve accessibility and usability to engage a wider audience. Historically, the features and services on open data portals targeted more technical users, in particular developers, as early adopters. Across our cities, we see the need to broaden the number and types of users engaging open data. As a result, the features and UI design of data portals need to evolve to align with the needs and abilities of the broader public. Key features and recommendations that can guide this principle include the following:
- Portals should have simple and fast data downloads coupled with more user-friendly design for understanding data.
- Accessibility and usability changes should be data-driven; that is, reflective of what datasets and features are being used most.
- Portals should include more space for conversations on the use of, and resources for, datasets including user guides, dashboards, and social media communication.
- Portals should be exemplars of good web design; many portals today, for example, are not mobile-friendly.
- User research from providers would be helpful (who is using my data, and where are the gaps?), including strategies for marketing open data to populations that might not be using it. This might include additional applications or view options for those communities.
- Portals should include more intuitive ways of visualizing and exploring data. It’s time for open data to move beyond spreadsheets on the web.
- In addition to view totals for datasets, cities’ data portals should also publish information on which datasets are being downloaded.
2. Move away from a single-dataset-centric view. Tying together related datasets is a key priority for our network. Most portals make it difficult to tie together related datasets either during the discovery process or via APIs. Since data portals generally only support flat data, information from relational databases needs to be broken into smaller datasets and later recombined by a potential user of that data. A better ability to tie together and compare related datasets is important with regard to content, too; for example, current open data features do not make it easy for users to compare budgets or other city information. This relates to a greater need in general: data portals need to do a better job of keeping pace with the data types that are most useful to, or most needed by, the public.
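To illustrate the recombination burden described above, here is a minimal sketch in Python of what a data user currently has to do by hand. The two datasets, their column names (`permit_id`, `inspection_id`, and so on), and the helper `join_on_key` are hypothetical and chosen only for illustration; they do not reflect any particular city's portal or any vendor's API.

```python
from collections import defaultdict

# Hypothetical flat exports of one relational source: permits and their inspections.
permits = [
    {"permit_id": "P-1", "address": "100 Main St"},
    {"permit_id": "P-2", "address": "200 Oak Ave"},
]
inspections = [
    {"inspection_id": "I-1", "permit_id": "P-1", "result": "pass"},
    {"inspection_id": "I-2", "permit_id": "P-1", "result": "fail"},
    {"inspection_id": "I-3", "permit_id": "P-2", "result": "pass"},
]


def join_on_key(parents, children, key):
    """Attach each child row to its parent row via a shared key column."""
    by_key = defaultdict(list)
    for child in children:
        by_key[child[key]].append(child)
    # Parents with no matching children get an empty list.
    return [{**parent, "children": by_key.get(parent[key], [])} for parent in parents]


combined = join_on_key(permits, inspections, "permit_id")
for row in combined:
    print(row["permit_id"], row["address"], [c["result"] for c in row["children"]])
```

A portal that exposed the relationship between the two datasets directly, in discovery and in its API, would spare every user from rediscovering the shared key and rewriting this step.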
3. Treat geospatial data as a first-class data type. On most data portals, geospatial data is an underdeveloped and undervalued asset; going forward, it needs to be an integral part of any open data program. For the past several years, the general status quo for cities has been to post spreadsheets of data on the web for the public to consume. In this iteration, geospatial data largely serves as a secondary feature.
Better and more easily navigable geospatial data is a priority among open data consumers, and current data portals meet it neither in the user interface nor via the API. It’s why cities such as Chicago and Los Angeles have sought their own solutions to the issue, with Chicago creating OpenGrid, an open-source public geospatial awareness tool, and Los Angeles commissioning GeoHub, an online forum for the public to discover, explore, and download geospatial data.
4. Improve management and usability of metadata. One driver of cost for governments is providing quality metadata. In general, open data portals do not support metadata management tools, and metadata is difficult to discover or customize. The result is decreased understanding and use of published data. Some cities take on metadata costs and balance quality of metadata with their rate of publishing. For example, Philadelphia’s metadata catalog provides an alternative tool to browse and explore metadata, and San Francisco’s data dictionary tool exposes field-level definitions. Other cities instead put out more data with lower usability (because metadata isn’t available, or is of poor quality). Both cases undermine cities’ ability to deliver on open data’s transparency-driven mission. Portals can improve this by supporting custom metadata schemas, API methods to define and update the schema and content, and user interfaces that surface the metadata and support its use by end users.
5. Decrease the cost and work required to publish data. Portals’ automation services are generally minimal, putting the onus on localities to manage this on their own. And the (non-monetary) cost of getting datasets up is generally too high for local governments. The amount of time and manpower it takes for cities to upload new datasets prohibits them from having larger volumes of wanted data available to the public. Automation services from open data portals need to keep up with demand. While many cities have been able to manage automation on their own, the capacity to do so may vary considerably from city to city, especially as smaller and/or resource-constrained cities seek to launch open data programs of their own.
6. Introduce revision history. Data quality is improved with revision history. To have a quality data portal, governments must be able to provide data that is timely and accurate. This means governments should (and do) update their datasets frequently. These updates are sometimes driven by users from the general public as well, who can note errors to be corrected. However, there is currently no easy way to track a dataset’s revision history over time. This is something that GitHub, for instance, does well, which is immensely useful for users. Many cities have devised “workaround” solutions to this, such as posting snapshots of previous data versions onto their portal; however, in the long term, this is not an adequate solution.
We also believe steps can be taken to improve the context for data and data quality. For example, if versions of the same dataset for different time spans are provided, context can also be provided behind the reasoning for different versions. Such changes that incorporate context need to be less about including a note that says “x revision occurred on x date,” and more about explaining the reasoning.
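As a rough sketch of what a revision record with context might look like, the Python below compares two snapshots of a dataset and stores the reason for the change alongside what changed. The `Revision` class, the `diff_snapshots` helper, the field names, and the sample rows are all hypothetical, offered only to make the idea concrete; this is not a description of any existing portal feature.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    """One entry in a dataset's revision history (hypothetical structure)."""
    version: int
    published: str  # ISO date of publication
    reason: str     # why the data changed, not only that it changed
    added: list = field(default_factory=list)
    removed: list = field(default_factory=list)
    changed: list = field(default_factory=list)


def diff_snapshots(old, new, key, version, published, reason):
    """Compare two snapshots row by row, matching rows on a key column."""
    old_by = {row[key]: row for row in old}
    new_by = {row[key]: row for row in new}
    return Revision(
        version=version,
        published=published,
        reason=reason,
        added=sorted(k for k in new_by if k not in old_by),
        removed=sorted(k for k in old_by if k not in new_by),
        changed=sorted(k for k in new_by if k in old_by and new_by[k] != old_by[k]),
    )


# Hypothetical snapshots of a small dataset.
v1 = [{"id": "A", "status": "open"}, {"id": "B", "status": "open"}]
v2 = [{"id": "A", "status": "closed"}, {"id": "C", "status": "open"}]

rev = diff_snapshots(
    v1, v2, key="id", version=2, published="2017-06-01",
    reason="Resident reported that record A had been resolved; B was a duplicate.",
)
print(rev.added, rev.removed, rev.changed)
print(rev.reason)
```

The point of the `reason` field is the one made above: a history that records why a version exists is more useful than a note saying only that a revision occurred on a given date.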
7. Improve management of large datasets. Some data portal providers are having trouble keeping up with larger datasets. This issue is becoming more and more salient for larger cities: updates can be slow, datasets are prone to errors, and any changes to the dataset only appear online long after the work is finished. Vendors should make sure their systems can manage large amounts of data, in both size and velocity.
8. Set clear, transparent pricing based on memory, not number of datasets. Data portal providers vary in how they price their services, with varying levels of transparency. This makes it challenging for open data providers to budget and plan for their work. While we understand that portal providers need to position their services competitively, we strongly discourage pricing strategies that are misaligned with the goals for open data. In particular, pricing models based on the number of datasets published (rather than on memory or volume) disincentivize publishing open data and undermine the spirit and goals of the movement.