
Human says no, AI says more data please

Consultation proposes amendments to UK copyright and database right law to balance the interests of the AI and creative industries.

On 17 December, the UK Government published its long-awaited open consultation on copyright and artificial intelligence (AI). The Government last consulted on this issue in 2021 – as part of a broader consultation on intellectual property (IP) in the context of AI – and in June 2022 announced plans to introduce a new, broad exception to copyright and database right for text and data mining (TDM). This proposal sparked an immediate backlash from the creative industries. The UK Government has, nevertheless, pressed ahead.

The Government’s objectives (paraphrased):

  1. Give rights holders more control over the use of their content to train AI and improve their ability to be remunerated if their content is so used
  2. Give AI developers greater clarity about how they can lawfully access a wide range of high-quality materials to train their AI models in the UK
  3. Build trust between AI developers, the creative industries and the public

Why might it be necessary to change the law to achieve these objectives?

Modern AI models (including those characterised as GenAI and large language models (LLMs)) depend on significant volumes of input data to be “trained” (that is, refined and made fit for purpose). There are few publicly available databases of the volume and quality required for training. It's now widely known that those building and training AI models are therefore using IP-protected works to train those models.

Uncertainty over the extent to which existing laws restrict the use of copyright works and databases for AI training has prompted a series of legal disputes in the UK and further afield (most notably, the dispute between Getty Images and Stability AI, currently going through the US and English courts). Noting that it will likely take several years for these issues to be definitively resolved, the Government says it's considering “a direct intervention through legislation to clarify the rules in this area and establish a fair balance.”  

Rights holders are concerned that they have no way of knowing when their works are being used to train AI. The lack of transparency makes it difficult for rights holders to generate revenue by licensing the use of their content to train AI (or prevent their content from being used in this way in the first place).  

AI developers, meanwhile, don’t know what content they can lawfully use to train their AI models and, as a result, may choose not to develop their AI models in the UK. The consultation notes that this “stunts AI innovation in the UK and holds back AI adoption.” 

What changes to the law is the Government proposing? 

The Government is proposing “a data mining exception [to copyright] which allows right holders to reserve their rights, underpinned by supporting measures on transparency.”

To illustrate how this proposal might apply in practice, imagine two parallel conversations: one between an AI company and its legal advisor, and another between a copyright holder and its legal advisor.

This isn’t a complete analysis of all the relevant issues, but it draws out some of the key points. 

The AI company’s perspective

AI company: "How do I know if I'm allowed to use online content to train my AI model in the UK?" 

Legal advisor: "The first thing is to ask yourself whether you have lawful access. If the content is behind a paywall, you’ll have to pay a subscription fee and check the terms and conditions before you can consider using it to train your AI. The second thing you’ll have to ask yourself is whether the rights holder has ‘reserved their rights’ in their content or not. If they have, this means that you can’t use their content to train your AI without their permission. If content is made publicly available online – and it doesn’t include a rights reservation – then the proposed data mining exception would allow you to use that content to train your AI."

AI company: "Ok, great. But how will I know if a rights holder has reserved their rights or not? In an ideal world, I’d like to delegate the task of checking for rights reservations to my AI model. Is there any prospect of that?"

Legal advisor: "The EU version of the data mining copyright exception in the Digital Single Market Copyright Directive does envisage that possibility for publicly available online content. It says that for content of this nature, rights reservations should be machine readable. Over half of news publishers already block the main generative AI web crawlers using something called the robots.txt standard. This standard isn’t perfect, and more sophisticated mechanisms will need to be developed to help you identify whether a content creator has reserved their rights or not, but it’s a good starting point to narrow things down."
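To make the robots.txt point more concrete, here is a minimal sketch of how a developer might check whether a site blocks a given crawler, using Python’s standard library urllib.robotparser. The site address and the crawler user agent ("ExampleAIBot") are hypothetical and purely for illustration; real AI crawlers publish their own user agent names, and robots.txt is only one possible form a machine-readable rights reservation could take.

```python
# Minimal sketch: check whether a site's robots.txt blocks a given crawler.
# "ExampleAIBot" and example.com are placeholders, not real identifiers.
from urllib import robotparser

AI_CRAWLER_USER_AGENT = "ExampleAIBot"          # hypothetical crawler name
ROBOTS_URL = "https://example.com/robots.txt"   # hypothetical site

parser = robotparser.RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetch and parse the site's robots.txt

page = "https://example.com/articles/some-article"
if parser.can_fetch(AI_CRAWLER_USER_AGENT, page):
    print("robots.txt does not block this crawler for that page.")
else:
    print("robots.txt blocks this crawler - a possible machine-readable rights reservation.")
```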

The rights holder’s perspective

Rights holder: "I’d consider allowing my content to be used to train AI, but I’d want to know when it’s happening and understand roughly how it works."

Legal advisor: "The Government’s consultation is sympathetic to your perspective. It says that “increased transparency by AI developers will be crucial to ensuring copyright law is complied with and can be enforced. Transparency over AI model data is also crucial for consumers to understand the provenance of the content they are accessing. Regulation may be needed to ensure that this happens and the government will consider the case for it.”"

Rights holder: "The additional transparency sounds like a good idea to me. My content is valuable – and if it’s going to be used to train AI, I’d like a share of the proceeds. I think I’ll reserve my rights and wait for AI developers to come to me and ask for a licence."

Do the Government’s proposals achieve its objectives?

In our view, the two central proposals in the Government’s consultation make good sense and – at least on paper – strike a reasonable balance between the interests of AI companies and rights holders.

There's an argument that a TDM exception along the lines proposed is essential for the UK’s competitiveness. The UK is, however, in an interesting position because its technology and creative sectors, which are those most affected by these proposals, are both important to the UK economy. 

Some rights holders will be unhappy at the prospect of their works being freely available to train AI under the data mining exception. If so, they would have two options open to them:

  1. Place their content behind a paywall and make it clear in the terms and conditions that this content is not to be used to train AI
  2. Mark their content with a rights reservation notice, and either refuse permission for the content to be used to train AI, or grant permission and earn a licensing fee

Early signs are that rights holders, particularly those in the creative sector, are unsatisfied with these safeguards. Their fears seem, broadly, to fall into two categories. First, a concern that these safeguards will not overcome market dynamics – given the economic and, increasingly, political power of the large technology firms that are leading AI development – and, therefore, that the remuneration available to rights holders will be “unfair”. Second, the feeling that use of creative works to train AI will directly and/or indirectly challenge the nature of artistic creativity, at the potential cost of livelihoods and subtler forms of non-economic value.

Some AI developers will be unhappy at the prospect of having to navigate rights permissions rather than simply hoovering up whatever content they can get their hands on. But the more responsible companies will recognise that if they are going to generate huge revenues from products trained on material that belongs to someone else, it's fair to ask permission and pay a licence fee. 

Final thoughts

We're optimistic that increased transparency over the content used to train AI would build trust between AI companies, rights holders and the public, on the basis that we're often less suspicious of things when we know how they work. Mills & Reeve are following developments in this area closely, and if the Government does decide to change the law, we're ready to help rights holders and AI companies alike decide how best to adapt. Whatever your perspective, we encourage interested parties to have their say before the consultation closes on 25 February 2025.

