How do cities create value from their open data – and how can they evidence it?
The cities in the Smart City Innovation Framework Implementation (SCIFI) have identified six types of value they can create by opening their data. How can this be effected, and how can it be measured? In this short series of three articles we look at examples from the SCIFI cities and recommend some measurement methods, starting with Better Data Quality and Improved Findability.
Better Quality Data
Better quality data is more valuable because it reduces the potential for error when it is utilised. Poor quality data is a big risk for cities – poor datasets may lead to incorrect conclusions being drawn, or even create political risks and reputation damage. However, releasing data helps identify the ways in which the quality of the data is poor, so that it can be improved.
The other reason that opening data helps to improve its quality is that ‘quality’ is not a fixed attribute – it often depends on what the data is being used for. It’s easy to agree that data with many duplicates, or with incomplete or non-conforming cells, is ‘low quality’. In other cases, however, quality issues arise because the data is not fit for a particular purpose, owing to a lack of timeliness, integrity or accuracy, so they only come to light when new users put the data to new uses.
The SCIFI Experience
The SCIFI project has shown that pilots built on opened data can indeed be useful for improving low-quality data. In the Delft de-icing pilot, partner Quantillion discovered missing cells in one of the geographical datasets that had been opened up. Delft went back through the whole process, from gathering to publication, to establish why the cells were missing and how the data could be improved. Also in Delft, waste pilot start-up Sis.Ter was able to establish that, in an otherwise well-managed dataset, a crucial piece of data – the size of the waste bins installed in public locations around the city – was not held.
How to Measure Data Quality
A good place to start is with the five dimensions of the Quality Assurance Framework of the European Statistical System. Assessing datasets against these five criteria can help public authorities think about how they collect, store and use their data, as well as how they value it.
Greater automation in the collection of data (often through sensors and other IoT devices) tends to mean higher quality data for several reasons. One is that there is less potential for error to be introduced. Another is that automatically collected data is more likely to be an exact fit with the data that is required, unlike ‘exhaust’ datasets (produced as an output of another activity), which often require a ‘best fit’ approach. However, this means thinking carefully about exactly what is captured in the first place.
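The basic problems mentioned earlier (duplicates, incomplete cells and non-conforming cells) can be counted with a few lines of code, which gives a simple baseline to track over time. The sketch below is illustrative only: the sample records, the column names and the `quality_summary` function are hypothetical, and it treats a cell as non-conforming only when a column that should be numeric contains something else.

```python
import csv
import io
import re

# Hypothetical sample data, invented for illustration only.
SAMPLE = """bin_id,location,capacity_litres
B001,Site A,240
B002,Site B,
B001,Site A,240
B003,Site C,large
"""


def quality_summary(csv_text, numeric_columns):
    """Count duplicate rows, missing cells and non-conforming cells."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    seen = set()
    duplicates = missing = non_conforming = 0
    for row in rows:
        key = tuple(row.items())
        if key in seen:
            duplicates += 1  # exact repeat of an earlier row
        seen.add(key)
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing += 1  # empty or absent cell
            elif column in numeric_columns and not re.fullmatch(
                r"\d+(\.\d+)?", value.strip()
            ):
                non_conforming += 1  # expected a number, found something else
    total_cells = len(rows) * len(rows[0]) if rows else 0
    return {
        "rows": len(rows),
        "total_cells": total_cells,
        "duplicate_rows": duplicates,
        "missing_cells": missing,
        "non_conforming_cells": non_conforming,
    }


if __name__ == "__main__":
    print(quality_summary(SAMPLE, {"capacity_litres"}))
```

Counts like these only cover the easily agreed forms of low quality. Whether a dataset is timely or accurate enough for a given purpose still has to be judged against that purpose.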
Improved Findability
Even within publishing organisations, finding the right data is often a challenge. If cities strive to publish open data, they first need to be able to identify the datasets that might be published. Despite the advent of Google Search, a central data platform or portal is still key to increasing dataset discoverability.
Traditionally, portals have been seen as an externally facing tool that focuses on publishing. More recently, however, the value proposition has shifted towards portals that enable a community of data users, including public authorities, citizens, developers and other technology businesses and experts. Data request systems also enable cities to understand what data is required but not published (or sometimes, simply not findable!).
The SCIFI Experience
In the de-icing pilot, Delft was able to establish that key data was not being stored in the most reliable and accessible manner – in other words, it was available to only a small number of people in a single department. St Quentin found that although the data might be theoretically available, finding where it was located, or the correct version, was challenging. With data sets being newly published to a variety of standards, they also experienced problems harmonising their data. Once these issues were highlighted, they could be addressed – and systems put in place to ensure other instances of the same problems could be prevented or managed.
Delft found itself at the other end of the availability issue when the city and start-up teams identified a potentially useful dataset from the Dutch passenger railway operator, but were unable to obtain it as it had yet to be opened.
How to Measure Findability
Internally, value can be measured by an increase in the number of departments accessing data through the central platform instead of other channels. Another metric is the number of departments accessing data that is not created or managed by their own department. However, this requires that the central data platform is well promoted throughout the organisation.
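Both internal metrics can be derived from a simple record of who accessed which data and how. The sketch below assumes a hypothetical access log of (accessing department, owning department, channel) entries; the department names, channel labels and the `findability_metrics` function are invented for illustration, and a real platform would need its own way of capturing this information.

```python
# Hypothetical access log: (accessing_department, owning_department, channel).
ACCESS_LOG = [
    ("Roads", "Roads", "portal"),
    ("Roads", "Waste", "portal"),
    ("Waste", "Waste", "email"),
    ("Waste", "Roads", "email"),
    ("Planning", "Roads", "portal"),
    ("Planning", "Planning", "shared_drive"),
]


def findability_metrics(log, platform_channel="portal"):
    """Count departments using the platform and departments using others' data."""
    platform_departments = set()
    cross_departments = set()
    for accessor, owner, channel in log:
        if channel == platform_channel:
            platform_departments.add(accessor)  # came through the central platform
        if accessor != owner:
            cross_departments.add(accessor)  # used data owned by another department
    return {
        "departments_using_platform": len(platform_departments),
        "departments_using_other_departments_data": len(cross_departments),
    }


if __name__ == "__main__":
    print(findability_metrics(ACCESS_LOG))
```

Running the same calculation at regular intervals shows whether use of the platform is growing, which is the change the metrics are meant to capture.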
For an overview of the future direction of portals, including how they can satisfy information needs beyond “find a specific dataset” and how they can co-locate useful capabilities and resources, take a look at this report on the European Data Portal.
In the next article in this series, we’ll look at creating value through increased transparency and citizen participation, before finishing the series with an overview of creating and measuring service improvement and economic benefit through opening data.
If you’re interested in what we have to say, and learning more about our approach to smart cities, we’d like to hear from you. Contact us at firstname.lastname@example.org
by Johanna Walker, researcher at the Web and Internet Science department at the University of Southampton