The Hamburg Regional Court (‘Landgericht Hamburg’) is the first court in Germany to rule on whether creating datasets for AI training may infringe German copyright law (judgment of 27 September 2024, file no. 310 O 227/23).

Background

The plaintiff is a photographer who made one of his photos freely available to the public via a photo agency’s website, subject to the following restrictions:

“ […] you may not: […]

  1. Use automated programs, applets, bots or the like to access the…com website or any content thereon for any purpose, including, by way of example only, downloading Content, indexing, scraping or caching any content on the website.”

The defendant is a non-profit organization dedicated to promoting AI research. It creates and provides open datasets of text-image pairs that can be used to train generative AI. By matching a large number of texts and images, an AI system learns what people, animals or objects look like. On that basis it can distinguish between people, animals or objects and generate artificial images of them.

To build its dataset, the defendant downloaded and stored copies of a large number of pictures from publicly available sources. It then ran software over these copies to analyse whether the content of each photo matched its accompanying description. In case of a mismatch, the software removed the picture from the dataset, because the dataset needs to be reliable and consistent to ensure proper AI training. The defendant’s dataset comprises nearly six billion text-image pairs, including one photo created by the plaintiff.
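
The filtering step described above can be pictured roughly as follows. This is only an illustrative sketch, not the defendant’s actual software: the `Pair` type, the `match_score` function (a toy word-overlap measure over hypothetical image labels) and the threshold of 0.5 are all assumptions made for the example. A real pipeline would compare the image itself with its description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One text-image pair; `image_labels` stands in for analysed image content."""
    url: str
    description: str
    image_labels: frozenset[str]  # hypothetical output of an image analysis step


def match_score(pair: Pair) -> float:
    """Toy measure: share of description words that also appear as image labels."""
    words = set(pair.description.lower().split())
    if not words:
        return 0.0
    return len(words & pair.image_labels) / len(words)


def filter_pairs(pairs: list[Pair], threshold: float = 0.5) -> list[Pair]:
    """Keep only pairs whose description plausibly matches the image."""
    return [p for p in pairs if match_score(p) >= threshold]


# Hypothetical sample data: the second description does not match its image.
pairs = [
    Pair("https://example.com/a.jpg", "brown dog", frozenset({"dog", "brown", "grass"})),
    Pair("https://example.com/b.jpg", "red car", frozenset({"cat", "sofa"})),
]
kept = filter_pairs(pairs)
```

The point relevant to the legal analysis is that every picture, including those later discarded, has to be copied and stored before it can be analysed at all.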

The plaintiff demanded that the defendant cease any reproduction of the photo in question.

The court’s decision

At a glance

The court dismissed the claim, essentially for the following reasons:

  • Under Section 15 para. 1 no. 1 of the German Copyright Act (“Urheberrechtsgesetz”, UrhG), the plaintiff, as the author of the photo in question, holds the exclusive right of reproduction. ‘Right of reproduction’ means the right to produce copies of the photo, whether temporary or permanent (Section 16 para. 1 UrhG). This right can only be overridden (i) by the plaintiff’s consent or (ii) by a statutory exception. In the absence of the plaintiff’s explicit consent, the court had to decide whether Section 44b para. 2 sent. 1 UrhG and/or Section 60d para. 1 UrhG, the latter being a statutory exception for text and data mining for scientific purposes, applies for the benefit of the defendant. The court answered this question in the affirmative.
  • Section 60d para. 1 UrhG applies to ‘text and data mining’ within the meaning of Section 44b para. 1 UrhG. Section 44b para. 1 UrhG defines ‘text and data mining’ as the automated analysis of digital or digitized works for the purpose of gathering information, in particular about patterns, trends and correlations. In the court’s opinion, this definition also encompasses text and data mining carried out for the purpose of AI training.
  • Section 60d para. 2 sent. 1 and 2 UrhG provides that, inter alia, research organizations are entitled to make reproductions for text and data mining for the purpose of scientific research. Research organizations comprise universities and research institutes, but also other institutions active in scientific research if they (i) pursue non-commercial purposes, (ii) reinvest all their profits in scientific research or (iii) act in the public interest based on a state-approved mandate. The court decided that the defendant can invoke this statutory exception because, according to the court, the defendant pursues a scientific purpose within the meaning of Section 60d para. 2 sent. 1 and 2 UrhG. Furthermore, the court assumed a non-commercial purpose because the plaintiff was unable to demonstrate that a commercial third party had an influence on the defendant that would prevent it from invoking the statutory exception (Section 60d para. 2 sent. 3 UrhG).
  • Owners of copyright-protected works may, to a certain extent, restrict use for text and data mining by way of a declaration that meets the conditions set out in Section 44b para. 3 UrhG. According to the court, and contrary to views expressed in legal literature, a simple note in plain language is sufficient for this purpose.

Further background and detail

AI developers have a particular interest in collecting as much material suitable for training AI as possible. Large databases of text-image pairs offer extensive opportunities for effective AI training and promise better and more reliable outputs. Web scraping appears to be a pragmatic way to gather such huge amounts of training data. It means that software automatically scans publicly accessible websites, reproduces protected material such as photos, videos, sounds or the like, and compiles it into a dataset for AI training purposes. These activities risk conflicting with existing copyright law because, according to the basic principle set out in Section 15 para. 1 no. 1 UrhG, the author has the exclusive right to reproduce his or her work.

If someone reproduces a copyright-protected work without the author’s consent and without a statutory exception, the rights holder may, inter alia, put a stop to such reproduction, e.g. by way of injunctive relief, and/or claim damages. Sections 44b and 60d UrhG constitute statutory exceptions for so-called ‘text and data mining’. Section 44b UrhG takes a broad and general approach, permitting such activities subject to the author’s option to opt out (Section 44b para. 3 sent. 1 UrhG). Section 60d UrhG is narrower: it only permits reproductions linked to a scientific purpose, but gives authors no opportunity to opt out.

The court had to decide whether the defendant can successfully invoke Section 44b para. 2 and/or Section 60d para. 1 UrhG as a statutory exception for systematic reproduction in the context of creating datasets for AI training. The court concluded that this is the case:

  • Neither German copyright law, by its wording or structure, nor Art. 4 of Directive (EU) 2019/790 on Copyright and Related Rights in the Digital Single Market (DSM Directive) indicates that Section 44b or Section 60d UrhG must be interpreted so restrictively that they do not apply to reproductions made when preparing datasets for AI training.
  • According to Section 44b para. 1 UrhG, text and data mining is defined as an automated analysis of single or multiple digital or digitized works in order to extract information from those, in particular about patterns, trends and correlations.
  • Some German scholars dispute that this definition was meant to include reproductions for AI training, arguing that (i) AI was not in the focus of the EU legislator when it created these statutory exceptions and (ii) AI deployers could invoke the exception to justify web scraping in order to create competing products with similar content free of charge (e.g. Schack, Neue Juristische Wochenschrift 2024, 113 (114 f.)). The court dismissed both arguments (para. 75 ff.). According to the court, Art. 53 para. 1 lit. c) of the AI Act, and thus the most recent AI legislation, confirms the legislator’s view that text and data mining in the AI context does not require further, and in particular no more restrictive, adjustments. This is because, by way of Art. 53 para. 1 lit. c) AI Act, the legislator introduced an explicit statutory requirement that a clear strategy to comply with text and data mining restrictions in accordance with the DSM Directive is mandatory (and sufficient) for so-called General Purpose AI (GPAI). Further, the court said that a clear distinction must be drawn between AI training based on datasets on the one hand and the subsequent use of AI to create new content on the other. While competing products might be created in the latter case, the preparation of datasets for AI training, as in the case at hand, does not, according to the court, create a similar risk.
  • In the court’s opinion, ‘scientific research’ within the meaning of Section 60d UrhG means the methodical and systematic pursuit of new knowledge in general; it is not required that new knowledge be produced directly.
  • The court also assumed a non-commercial purpose because the plaintiff was unable to demonstrate that a commercial third party had an influence on the defendant that would prevent it from invoking Section 60d para. 1 UrhG, as
    • under Section 60d para. 2 sent. 3 UrhG, the plaintiff bears the burden of proof for a commercial influence that prevents the defendant from invoking the statutory exception for scientific research; and
    • the plaintiff was unable to properly substantiate and provide evidence of such a commercial influence.
  • As the court based its decision on Section 60d UrhG, Section 44b para. 3 UrhG was ultimately not relevant. Nevertheless, the court stated in an obiter dictum that a simple note in plain language should be sufficient for rights holders to opt out effectively under Section 44b para. 3 UrhG. Contrary to opposing views in legal literature, the court does not consider a machine-readable opt-out, e.g. by way of robots.txt files, to be required. The court argues, inter alia, that today’s standard technologies (e.g. AI) should be able to extract and interpret plain website language as well, so that the statutory requirement of ‘machine readability’ set out in Section 44b para. 3 sent. 2 UrhG should be interpreted in light of such new technical standards.
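
For readers less familiar with the technical side, a machine-readable reservation of the kind discussed in legal literature could be checked by a crawler along the following lines. This is a minimal sketch using Python’s standard `urllib.robotparser` module. The robots.txt content, the user agent name `ExampleDatasetBot` and the URL are hypothetical, and the sketch takes no position on whether such an entry is legally required or sufficient under Section 44b para. 3 UrhG.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a photo website: one named crawler is excluded
# from the whole site, all other user agents are allowed.
ROBOTS_TXT = """\
User-agent: ExampleDatasetBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from text, no network access
    return parser.can_fetch(user_agent, url)


photo_url = "https://example.com/photos/1.jpg"
bot_allowed = may_fetch(ROBOTS_TXT, "ExampleDatasetBot", photo_url)
other_allowed = may_fetch(ROBOTS_TXT, "SomeBrowser", photo_url)
```

A plain-language note in a website’s terms of use, such as the one quoted in the background section, cannot be evaluated by a fixed-format check like this. That is why the court’s reference to technologies capable of understanding natural language matters for the interpretation of ‘machine readability’.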

Key Take-aways

  • The case does not concern AI training as such, but the preparation of datasets for AI training.
  • In the court’s opinion, “scientific research” within the meaning of Section 60d para. 1 UrhG means the methodical and systematic pursuit of new knowledge in general; it is not required that new knowledge be produced directly.
  • Under Section 60d para. 2 sent. 3 UrhG, the rights holder bears the burden of proof for a commercial influence that prevents a party from invoking the statutory exception for scientific research.
  • Owners of copyright-protected works may, to a certain extent, restrict use for text and data mining by way of a declaration that meets the conditions set out in Section 44b para. 3 UrhG. According to the court, a simple note in plain language should be sufficient for rights holders to opt out effectively under Section 44b para. 3 sent. 2 UrhG. However, this is still open to debate, as legal experts argue, contrary to the court’s obiter dictum, that a machine-readable opt-out, e.g. by way of robots.txt files, is required.
  • The court’s decision can be appealed to the Hanseatic Higher Regional Court of Hamburg (‘Oberlandesgericht Hamburg’). Since Section 60d UrhG is based on Art. 3 and Art. 2 no. 1 of the DSM Directive, this case, or at least similar disputes, may also be referred to the European Court of Justice for a preliminary ruling. In the meantime, the first serve in the match between copyright and AI has been made, and rights holders should definitely pay attention. With the good intention of fostering innovation, EU copyright law provides exceptions and limitations for text and data mining. These exceptions can allow protected material to be reproduced free of charge in datasets that are subsequently used for AI training. This is, of course, only one of many scenarios concerning the use of copyright-protected material by intelligent systems.