Artificial intelligence systems feed on information from many sources: books, encyclopedias, treatises, research papers and, especially when it comes to current affairs, newspaper articles. Online newspapers are among the main sources of information for systems such as ChatGPT, Claude or Gemini, but this creates a problem for the papers themselves: users increasingly turn to artificial intelligence programs that, in a sense, exploit the work of journalists without letting them share in the benefits. Newspapers lose subscriptions and clicks as a result, since visiting their sites is increasingly seen as unnecessary.
Artificial intelligence is a new and continually developing field, and the legislation governing it is correspondingly incomplete. This is why the European Parliament has called for urgent measures to support newspapers and, more generally, all producers of copyrighted content in this emerging digital environment.
The resolution
To do this, the Strasbourg chamber approved by a large majority (460 votes in favour, 71 against and 88 abstentions) a resolution asking the European Commission to intervene with new rules and concrete mechanisms to protect authors, publishers and publications. The text indicates a precise political direction and the tools to be adopted: mandatory transparency on the use of protected works, the right to exclude one's own content from the training of AI systems, and fair remuneration that also extends to past uses.
“We need clear rules on the use of copyrighted content for AI training. Legal certainty would allow developers to know which content can be used and how to obtain licenses,” said the text’s rapporteur, German politician Axel Voss. “At the same time, rights holders would be protected from unauthorized use of their content and would receive remuneration,” he added.
“Innovation must go hand in hand with respect for the rights of those who create content,” urged Mario Furore, MEP of the 5 Star Movement, according to whom “rules that protect creative and journalistic work” are needed.
The context
Generative artificial intelligence is the category of computer systems capable of autonomously producing text, images, video and audio, based on the enormous quantities of data on which they have been “trained”. To build these models, technology companies have collected vast amounts of content online, including newspaper articles, books, photographs and musical works, often without asking permission and without paying any compensation to the rights holders.
The existing European regulatory framework, in particular the 2019 Directive on Copyright in the Digital Single Market, already provides for some exceptions that allow the automated extraction of text and data (so-called “text and data mining”) for research purposes. However, it also provides the possibility for rights holders to oppose the commercial use of their content. The problem is that this exclusion clause has been found to be largely unenforceable: the ways to exercise it are not standardized, AI providers often ignore guidance published by publishers, and there is a lack of any independent verification mechanism.
“There is evidence of widespread infringement of copyright law by providers of generative artificial intelligence, including the unauthorized collection of works from the internet, failure to respect the reserved rights of the owners and the use of pirated sources,” the resolution states.
Transparency
The first and most urgent request concerns transparency. Anyone who puts a generative artificial intelligence system on the European market (whether the manufacturer of the model or the company or professional who integrates it into their services) must provide a detailed list of all copyrighted works used for training. It is not enough to generically declare the “categories” of data used: the individual contents must be identified.
The resolution goes beyond initial training and also covers later uses such as “inference” (the process by which the model processes user requests in real time) and so-called “retrieval-augmented generation”, a technique by which the system draws on external sources every time it answers a question. In practice, when an AI-based search engine responds to a query by summarizing newspaper articles, that operation should also be documented and declared.
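In very simplified terms, retrieval-augmented generation can be sketched as a two-step process: first retrieve the relevant documents, then answer using them, while keeping a record of every source consulted, which is the kind of documentation the resolution would require. The following Python sketch is purely illustrative; the scoring, the corpus and the function names are assumptions, not any provider's real API.

```python
# Toy retrieval-augmented generation (RAG) flow: retrieve, answer, and log
# every source used. All names and data here are illustrative assumptions.

def score(query, doc):
    """Naive relevance score: number of lowercase words shared with the query."""
    return len(set(query.lower().split()) & set(doc["text"].lower().split()))

def retrieve(query, corpus, k=1):
    """Return the k most relevant documents for the query."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def answer_with_sources(query, corpus, usage_log):
    """Answer from retrieved snippets and record the sources consulted."""
    hits = retrieve(query, corpus)
    for doc in hits:
        usage_log.append(doc["source"])  # the disclosure the resolution asks for
    summary = " ".join(doc["text"] for doc in hits)
    return summary, [doc["source"] for doc in hits]

corpus = [
    {"source": "Daily Gazette, 2024-05-01",
     "text": "Parliament approved a resolution on AI and copyright."},
    {"source": "Tech Weekly, 2024-04-30",
     "text": "New smartphone models were announced yesterday."},
]
usage_log = []
text, sources = answer_with_sources("What did Parliament approve?", corpus, usage_log)
```

Under the resolution's logic, the contents of `usage_log` are exactly what an AI-based search engine would have to document and declare each time it summarizes newspaper articles.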
To make this obligation effective, MEPs propose a presumption mechanism: if a provider does not comply with transparency obligations, it is automatically presumed to have used protected works without authorization. And if a court finds in favor of the rights holder, all reasonable court costs will be borne by the AI provider. It is a reversal of the burden of proof with potentially disruptive effects: it will no longer be the damaged publisher who will have to prove the violation, but the AI provider who will have to prove their compliance.
The exclusion mechanism
Alongside transparency, MEPs want rights holders (including press publishers, authors, photographers and publishing houses) to be able to effectively exclude their content from the training of AI systems. This right already exists in current legislation, but it works poorly. “Opt-out” notices published by newspapers are often ignored, and there is no centralized register to ensure compliance.
The proposal is to rely on the European Union Intellectual Property Office (EUIPO), which manages trademarks and designs registered in the EU. The EUIPO would be responsible for maintaining an official register of exclusions, in standardized formats automatically readable by computer systems, so that AI providers can consult it before collecting data.
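The actual format of such a register has not been defined; as a purely hypothetical sketch, one could imagine a machine-readable list of domains with an opt-out flag, which a crawler checks before collecting a page for training. The JSON layout and function names below are assumptions for illustration only.

```python
# Hypothetical machine-readable opt-out register, checked before data
# collection. The real EUIPO register format is not yet defined; this
# JSON layout is an illustrative assumption.
import json

REGISTER_JSON = """
[
  {"domain": "example-news.eu", "training_opt_out": true},
  {"domain": "open-archive.eu", "training_opt_out": false}
]
"""

def load_register(raw):
    """Parse the register into a lookup table keyed by domain."""
    return {e["domain"]: e["training_opt_out"] for e in json.loads(raw)}

def may_collect(url, register):
    """An AI crawler consults the register before collecting a page."""
    domain = url.split("/")[2]              # crude host extraction, sketch only
    return not register.get(domain, False)  # unlisted domains default to allowed

register = load_register(REGISTER_JSON)
```

The design choice that matters here is the standardized, automatically readable format: unlike opt-out notices buried in a website's terms, a single official register gives providers no plausible excuse for not checking it.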
The issue of remuneration
The European cultural and creative sector, which includes cinema, music, publishing and journalism, is worth around 6.9 percent of the EU’s gross domestic product and employs around 8 million people. Allowing its content to be used for free to train competing systems amounts, according to MEPs, to indirectly subsidizing big tech companies at the expense of creators.
This is why Parliament calls for remuneration to be “fair and proportionate”, determined through good faith negotiations between rights holders and AI providers, and explicitly rejects the idea of a “blanket licence” that would allow providers to pay a single lump sum to train their models with any content.
A particularly relevant aspect concerns past uses. Many of the AI models available today were trained years ago, when the rules were more ambiguous or simply ignored. Parliament asks the Commission to evaluate compensation mechanisms also for these past uses, recognizing that waiting for the establishment of a licensing market would leave those who have already been harmed without protection.
The press as a special case
The text pays particular attention to the press and information sector, considered strategic not only economically but also democratically. The concern is not only that newspapers are stripped of their contents without compensation but also that AI systems tend to select sources in a non-neutral way, favoring some newspapers over others or favoring information services produced by the technology companies themselves.
