Barriers and Solutions to Open Data

This appendix was created from material generated from the SharePSI workshop of May 2010. The material is incorporated into the Open Data Manual with permission from Ton Zijlstra.

Introduction

This format does not take the narrative approach of the rest of the manual. Instead, bullet points taken mostly verbatim from the original meeting have been collected into themes. The themes include:

  • Licensing
  • Privacy

The responses in italics have been created by the Open Data Manual’s authors.

Licensing

General sentiment

Licensing is a pain
The frustrations over legalese can be minimised. There are now several publications which detail an appropriate license, including the Open Definition. Additionally, many governments have gone through the process of determining appropriate licenses. They will be able to share their experience.

Poor initial conditions

Absence of legal framework
If a country lacks a sufficient legal framework, it can fall back on several non-legislative policies which have been developed. For Europeans, a large corpus of texts has been gathered by the ePSIplatform.
Unclear licensing
This is less of an issue than it may have been during the meeting. The Open Definition provides an easily comprehensible guide to choosing an appropriate license.
Ignorance of licensing
Ignorance of intellectual property, often compounded by an unwillingness to learn more, is a difficult problem to solve. Most officials and politicians understand the need to be clear about what is acceptable, to protect all parties. You can use this general line of reasoning to talk about the specific case of data licensing.
Concerns around intellectual property
Intellectual property rights can be unwieldy. However, they are designed to give the owner of the rights control. That control can therefore be used to enforce open access to data that the government controls.
Licensing terms attached to datasets
Complex licenses increase barriers to entry. Governments should seek to minimise the complexity of the contract terms that they impose. Preferably, those contracts should be open and easily understood.
Proliferation of semi-custom terms in licenses
Open data is a relatively new field. This means that there are few established norms. As time progresses, we can expect to see a maturation in licensing schemes. Schemes will consolidate, reducing cost and complexity.
Incompatible open licenses
See previous.
Share-alike licenses create silos
Emerging community standards and peer pressure will reduce this problem over time.

Larger scale needed

Crossborder licensing needed for legal interoperability
Streamlining legislation and policy between countries takes a long time. Progress is being made. However, even in cases where the legislation is not uniform, great value can be derived from doing something locally. Local businesses and community groups will be more than happy to use local data of their own area, irrespective of what is happening in other countries.

Concern to keep own national perspective on licensing is often bigger than actual international differences

This is an important consideration. It can sometimes be important to create a consistent local understanding of the issues before embarking on a process of international collaboration.
Most licensing initiatives are single jurisdiction/sector
Intellectual property laws are at least partially to blame for this. They are all national in scope. Moreover, open government data are produced by governments. Governments are also national in scope. With those two considerations in mind, it is perfectly natural that licensing is national. As time progresses, those policies will naturally streamline through a process of collaboration.

Lack of harmonization between various EU level initiatives and projects, which risks fragmentation

While EU policy is never in perfect harmony, the trend is towards openness.

Privacy

Use of privacy concerns to prevent all discussion
Privacy concerns can stifle brash action. This can be positive, as it can lead to greater consideration of the risks and benefits. With that in mind, all parties should reference their country’s legislation to get a full picture of what is and is not restricted.
Privacy legislation
Privacy legislation only covers private information. Much of government data is not private. Therefore, arguments should be made for releasing that data from government first, before moving to contentious issues.
Governments are concerned for people’s privacy
See above.
Concern around personal data
See above.

Access

Digital divide
Data require special analysis and interpretation before they are able to inform discussion. Only a small number of technical specialists can gain access initially. Notwithstanding that, those specialists are often in a position to make the data much more accessible than they currently are.
Multiple languages required
In regions where multiple and minority languages are spoken, efforts should be made to include the entire population. There is indeed a risk of segmenting the benefits to populations along linguistic boundaries. However, technology also presents opportunities. Websites are much more easily translated by machines than paper documents.

From access to new business

Increasing access to data

Access to data is political, not technical
The political climate is changing. Many governments around the world are moving towards openness.
Getting data
There are many more data catalogues available than was the case in 2010. Many of those catalogues provide easy access to the raw data in a direct manner. This shows that as the environment matures, the tools become more accessible to everyone.
Lack of standard open data policies
This is changing, with governments using each other’s policies to converge on common standards. For example, New Zealand created its open access policy NZGOAL, which was then followed by Australia’s AUSGOAL.
Too many data sources are not exposed yet
As governments become more experienced with releasing data, they will be in a better position to release data that is more difficult to access. Lots of data are locked up in legacy systems. As those systems are replaced, local advocates are well positioned to make the case for open data to be considered.
Access to data is still largest issue
Hopefully things have come some way since 2010. The quantity and quality of data releases have substantially increased in recent times.

Publishing data

Inconsistent and diverse formats
Data formats are created to solve particular problems. Those problems are often sector-specific. Therefore, multiple formats are not a bad thing per se. Notwithstanding this, it is important to prevent formats from proliferating if at all possible. Data transferred over the web are moving towards the JSON data format. Where required, such as in the geospatial area, KML, an XML language, is very popular. The most important factor is to provide the ability for third parties to easily read the data, which necessitates text rather than binary formats.
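
As a rough illustration of publishing in text formats, the sketch below serialises the same records as JSON and as CSV using only Python's standard library. The records, field names and function names are hypothetical and chosen purely for illustration. They do not come from the workshop material.

```python
import csv
import io
import json

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"site": "River A", "date": "2012-01-15", "ph": 7.2},
    {"site": "River B", "date": "2012-01-15", "ph": 6.8},
]


def to_json_text(rows):
    """Serialise rows as human-readable JSON text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


def to_csv_text(rows):
    """Serialise rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json_text(records)
csv_text = to_csv_text(records)
```

Both outputs can be opened in any text editor, which is the property that matters most for third-party reuse.
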
Transformation of data for publishing, ensuring correct transformation
When a government department needs to transform data in order for it to be used by the public, there is always a risk of introducing corruption. Where possible, government departments should seek to provide access to the raw data. When that is not possible, automated processes should be created and followed for undertaking the transformation.
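
One possible shape for such an automated process is sketched below: a repeatable transformation step followed by a check that no records were lost or added along the way. The raw export, column names and date format are assumptions made for the example. A real pipeline would need checks suited to its own data.

```python
import csv
import io

# Hypothetical raw export; column names and values are illustrative only.
RAW_CSV = (
    "Site Name,Reading Date,pH\n"
    "River A,15/01/2012,7.2\n"
    "River B,16/01/2012,6.8\n"
)


def transform(raw_text):
    """Rename columns and convert dates from DD/MM/YYYY to ISO 8601."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        day, month, year = row["Reading Date"].split("/")
        rows.append({
            "site": row["Site Name"].strip(),
            "date": f"{year}-{month}-{day}",
            "ph": float(row["pH"]),
        })
    return rows


def check(raw_text, rows):
    """Fail loudly if the transformation dropped or added records."""
    raw_count = sum(1 for _ in csv.DictReader(io.StringIO(raw_text)))
    if raw_count != len(rows):
        raise ValueError(f"expected {raw_count} rows, got {len(rows)}")


published = transform(RAW_CSV)
check(RAW_CSV, published)
```

Because the same script runs for every release, any corruption it introduces is at least consistent and can be traced and corrected in one place.
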
Lack of standards, or ad hoc standards
Standards are emerging within communities of interest. For example, within the Linked Open Data cloud, there are requirements to create a full record at thedatahub.org with explicit and complete metadata.
Storing big data
Many private sector providers have taken it upon themselves to solve this problem for governments. They often provide bandwidth and storage for public data at no cost to the data owner.

Finding and combining data

Lots of fragmented sources
Public data are now being indexed by specialised search engines. This removes a large degree of the previous problems.
Lack of interoperability
Interoperability concerns are particularly difficult when encountering non-open systems. Where possible, governments should seek to move to vendor-neutral, patent-free and licence-free data formats.
Lack of info on what reusable data is there
thedatahub.org is one of many services that provide information on public data. Its focus is on outlining exactly which data sets are released under open licenses.
Disparity of data sets
The disparity of data reflects the disparities of the world. Some areas simply do not collect data that others do.
Unclear what data is there
This uncertainty will hopefully reduce over time. There are now large volumes of open data available in several fields.
Limited quality of data
Data can be of poor quality. Wherever possible, governments should seek to provide raw data. They can then work with third parties to build cleaner, more usable datasets for everyone.
Lack of findability of data
See “Lots of fragmented sources”.
No unified data structures in Europe
Data standards are often formed along sectoral lines, rather than national borders. If there is no prospect of consistency within Europe, try to build consistency within industries or disciplines.
Lack of metadata
Many sectors are increasingly creating their own catalogues for their data. These catalogues often include excellent metadata. Where this has not yet happened, services such as thedatahub.org provide some ability to relieve these problems.

Reuse

Concern about usefulness of data
Not all data are highly valuable. Yet, this fact should not be a general barrier to the distribution of open data.
Unclear conditions for reuse
Efforts such as the Open Definition provide a measure of clarity within the fog. Unfortunately, there is a large proliferation of licenses used in the open data world. If you are considering ignoring a dataset because of its licensing terms, make sure that you inform the owner of that. The owner may be in a position to amend their terms.
Limited user friendliness / information overload
Data analysis is a technical skill. Yet, the skilled analysts are exactly the people who will be able to make data more user friendly and reduce the overload caused by floods of information. They are able to work with designers to generate lovely infographics. They are able to work with writers to concisely explain the trends and implications of data that are otherwise indecipherable.
Unclear data provenance
Where the origin of data is genuinely unclear, provenance can be a significant concern. We need to know where data came from in order to be able to trust it. Without that trust, it is impossible to rely on it for analysis. However, there may be other uses which do not have such stringent requirements. Students could be given that dataset to practice their skills. The origins of the actual data are in this case irrelevant. All that matters here is that the data are in a format that can be easily read.

Finding viable business models

Working with data is not easy
This difficulty could be exactly where the business opportunities lie.
Starting local/small is not always possible, e.g. have to take MS at once for tenders
Smaller businesses are also better placed to be nimble enough to take on less visible, riskier opportunities.
Scalability issues
Concerns of scalability in open data business models are likely to be no worse than similar concerns in other fields.
Data users still reluctant, mostly early innovators
This is natural. As the open data movement matures, it will become more accessible to a wider audience.
Lack of business models
The current lack of business models is not in itself a reason to hold back on open data. Open government data can be used to lessen the costs of undertaking current business models, even without reference to any future effort that is as yet unthought of.

Disruption of existing business

Current changing

Some public sector bodies have no choice but to charge for data
Many agencies were created under a model of generating a revenue stream from the data that they collect. That fact in itself does not limit the applicability of the general argument surrounding pricing at marginal cost. Where the marginal cost of distributing data is negligible, the price should be zero.
Concern about reduced income to public sector bodies
Income is likely to reduce if the current policy is to charge for access. Expenses may also decrease, as productivity gains from operating in an open manner are revealed.
Existing charging models
See above.

What charging hinders

Unclear where decision on charging lies
Responsibility for this depends on local circumstances.
Pricing models block market development by introducing arbitrary threshold for market entry
One thing to note is that removing pricing may only lead to a small increase in activity in the near term. There remain very significant business risks for creating products from an open data market. Public sector agencies need to provide certainty that their open data stance will be long-lived.
Lower end of reuse market cannot exist for now
The lower end of the market will take a fairly long time to develop, even when open data is widespread. The market participants at this segment have less capability and are unlikely to be able to execute new, profitable ideas.

Different perspectives

Some have stake in non-open data
Conflicting interests are not unique to this area. Where interests do conflict, policy should seek to minimise any negative impact caused by this situation.
Media and journalism like to have exclusive access
Providing open access to data does not provide open access to stories that emerge from that data. Data mining is complex and expensive. Data can be thought of as raw materials. Media outlets are positioned differently to refine these raw materials.
You cannot compete against free
Yes, you can. Many businesses are built on providing a more convenient or more tailored service than a free alternative. Consider the case of bottled water.

Where not charging disrupts

Public sector bodies in direct competition with market with services based on their data which they also sell

Resellers will be nudged towards the value-added market segments. However, they also provide a convenient level of service and are able to market their services effectively. Therefore, the disruption to the current data market may be radical, but is unlikely to be terminal.

Current markets seeing disruption (e.g. publishers) because of governments’ publishing data sets with added value

See above.

Linked and Federated Data

Linked Data

  • storage concerns
  • search/browsing/exploration challenges
  • manual revision challenges
  • classification challenges
  • extraction challenges
  • interlinking of data challenges
  • quality analysis challenges
  • evolution/repair challenges

Jurisdictional

National authorities are neither financed nor mandated to create international interoperability

That may appear to be the case on paper. In practice, however, there are very strong incentives to undertake practices which lead to international interoperability. There is more collaboration between academics of the same discipline across continents than there is between academics of differing disciplines at the same university. Industries are also highly globalised. There are often international standards, codes of practice and norms which lend themselves to consistency between countries. Lastly, national authorities are much more likely to adopt international best practice than take on the cost of developing their own standards.

General

No unified data structures

Linked Data is an exciting prospect. There is likely to be a large degree of reliance on this technology to be able to bridge current concerns.

Needed level/scale may supersede current stakeholders

Do not underestimate the need to be able to meet the needs of local stakeholders. Grand, beautifully designed policy frameworks are wonderful. Yet, to a family interested in the water quality of the river, a spreadsheet is much more practical.

Transition Process for Government

Lack of knowledge and awareness

General resistance to overcome

Cynicism is often accompanied by cost concerns and worries that job scope will increase without any recognition. Once the concerns are allayed, managed or resolved, the resistance will be overcome.

Drivers are often external

Pressure from the outside can sometimes move governments the fastest. There must be external demand to justify government supply.

Closed government culture

Government is heterogeneous. While some aspects of government are very closed, others are not. Start by talking to the receptive listeners.

Barriers are often internal

The fact that a barrier stands in the way is not by itself a sound rationale for inaction. Instead, the costs and benefits of overcoming that hurdle should be weighed against the relative costs and benefits of other options. The relative position of different options should determine what action is undertaken, rather than the absolute value of any obstacles.

Lack of knowledge (data holders and users)

Local open data communities hold a wealth of knowledge. Officials should be able to trust their users. They should create relationships, just as commercial suppliers create relationships with their customers.

Lack of awareness

Awareness is quickly increasing. The open data movement is no longer new. This means that there is far more information to bring people up to speed than there once was.

Change is hard

Risk of overcomplicating issues

There is not a single type of open government data system. As the body of the Open Data Manual explains, it is possible to have a perfectly functioning open data system that fits in well with many budgets, cultures and technical infrastructures.

Government is concerned by complexity

Governments only need to absorb as much complexity as they think is practical. If a huge range of policies need to be adapted to fit into an open data framework, then start with data sets which are simpler. If a legacy system would be too expensive to move into an open data environment, make that fact public.

Tensions exist between those sticking to old roles and those trying to adapt to new ones

Tensions between old and new are perennial. This fact in itself should not prevent any change from occurring.

Losing control, feeling disrupted

Government is concerned by losing control

Government also has a mandate to act in the best interests of its citizens and residents. The fear of losing control stems from a lack of experience with this particular area. As more and more examples emerge of useful things being created with open data, that fear of losing control will ease.

No inexpensive conflict resolution

The data owner retains ownership. The public sector therefore holds significant power if any conflict arises.

Data is power

Data is often unrealised power. Data is collected by governments for specific purposes. They do not have the flexibility to experiment with using that data in ways which were not anticipated.

Data catalogues perceived as centralisation (loss of power and control)

Data catalogues are created simply to make things easier for consumers of the data. While this may be perceived as the loss of power or control, it is also the adoption of responsibility for support and upkeep by a central agency.

Government is concerned by a lack of security

See the security section, below.

Language

Different stakeholders speak different languages, e.g. legal vs technical
This is likely to be a transient issue. All parties will increasingly be able to communicate with each other as the open data movement matures.
Lack of common vocabulary
See previous.

Seeking viable ways forward

Public service does not see own need for open data

Officials talk. As open government data spreads, news of its positive effects inside government will spread too.

Government’s concern about open data’s long term sustainability

The ability of the private sector to consume, process and analyse large volumes of data will not decrease. Nor will its demand. The sustainability of individual open data initiatives is less certain. Public sector managers should seek to develop programmes which will be financially viable across changes of government.

Uncertain economic impact

The economic uncertainties of open data are real. However, the justification for open government data is not purely financial.

Little empirical evidence

Empirical evidence is growing.

Security

Security threats

If data owners are concerned about security threats from distributing data openly, then they should use third-party services.

Fear of data manipulation

Once data have been modified, it would be a misrepresentation for the data’s modifier to claim that it is the original data. Therefore, if some harm is caused on the basis of that modification and/or misrepresentation, it’s likely that the data owner would have some form of legal recourse to be able to insulate themselves.

Selective use of the data

Effective communication is key. Data owners should be up-front about their data’s limitations. This information can be included as a separate file along with the source data or be displayed alongside a download link or similar.
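
A minimal sketch of generating such a file is shown below. The dataset name, the listed limitations and the function name are all hypothetical, and the layout of the notes is only one of many reasonable choices.

```python
def build_notes(dataset_name, limitations):
    """Return the text of a plain-text notes file listing known limitations."""
    lines = [f"Known limitations of {dataset_name}", ""]
    lines += [f"- {item}" for item in limitations]
    return "\n".join(lines) + "\n"


# Hypothetical dataset name and limitations, for illustration only.
notes_text = build_notes(
    "River water quality",
    [
        "Readings before 2011 were taken monthly, not weekly.",
        "Values have not been independently verified.",
    ],
)
```

The resulting text could then be saved next to the data file, or shown on the download page, so that the caveats travel with the data.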

Where does responsibility lie?

Responsibility for security threats lies where it currently does.

Costs of transition

It’s not as cheap as you may claim

There is more than one way to undertake an open data program. As we discuss within the manual, there are many alternatives to building a full service API. Many of those will be close to no cost.

Government procedures take a long time to change

They do. However, they are changing. Open data is no longer new.

No funds for transition

Start with changes which are likely to save money and increase efficiency. Are there data sets which different departments, or branches within departments, need to go through a complicated process to access? If not, consider the difficulties that are currently involved in accessing data from other levels of government. Each of these transaction costs imposes a burden on officials.

The cost of transition falls with data owner, but revenue is gathered centrally by another agency

This is where a whole of government approach is required. There are circumstances where it is appropriate to look at a systems level to see the impact of current policy.

It’s not as expensive as you fear

We hope the Open Data Manual can go some way to minimising any costs which are incurred.