Can AI Companies Rely On Section 52 Of Copyright Act To Train Large Language Models?
The Next Big Copyright Question in India
Generative AI and copyright law have reached a crossroads. Large Language Models (LLMs) are trained on vast volumes of text, images and other creative works, many of which are protected by copyright. Authors, publishers and technology companies are now litigating over whether this training process requires permission from copyright owners or falls within statutory exceptions. In India, the answer to this question remains unsettled.
The instinctive response is to look to Section 52 of the Copyright Act, 1957, which sets out India's fair dealing exceptions. Can AI developers argue that training an LLM on copyrighted material amounts to “research” or another protected activity under Section 52? A closer reading suggests that Section 52, in its current form, offers no clear safe harbour for commercial AI training and must not be stretched beyond its meaning.
Section 52: Limited Exception?
Unlike the United States, with its flexible fair use doctrine, India follows a statutory fair dealing model. Section 52 defines the circumstances in which copyrighted material may be used without infringement. Its fair dealing clause is limited to private or personal use (including research), criticism or review, and the reporting of current events. This distinction matters: Parliament chose to enact specific exceptions to copyright rather than a general balancing test. Hence, courts cannot hold that a technological advance deserves an exception merely because it promotes innovation. They must ask whether the use in question falls within the text and rationale of the provisions enacted by Parliament.
This approach is visible in University of Oxford v. Rameshwari Photocopy Services, where the Delhi High Court construed Section 52 in light of its statutory and educational purpose but did not depart from the text as enacted by Parliament. Nor did the court suggest that courts have unlimited authority to develop new categories of permissible use as technology evolves.
Can AI be Categorized as Research?
Supporters of broad AI exceptions claim that training models is a research exercise because LLMs do not reproduce entire novels for consumers. Rather, they find and analyse statistical patterns and relationships across millions of works. However, this argument ignores an important difference of principle. Humans conducting research typically read or use copyrighted works; they do not generally copy them. AI developers, by contrast, generally copy copyrighted works on a large scale when training their models, and the raw text they copy may itself include infringing portions of copyrighted works.
Moreover, many LLMs are not research projects. They power subscription services, enterprise software and other commercial products that generate significant revenue. Given its scale, automation and commercial context, such training cannot be equated with the private study or research protected under Section 52. The reasoning of the Kerala High Court in Civic Chandran v. Ammini Amma, which emphasized examining the purpose and character of the impugned use when assessing fair dealing, reinforces this concern. Even if AI development contains research components, it does not follow that industrial-scale commercial training automatically qualifies for statutory protection.
Why the Answer Matters
The debate over AI training involves parties well beyond AI developers. It includes every creator of original works that could be included in training datasets without consent or compensation, such as authors, journalists, publishers, musicians and researchers. The interpretation of Section 52 will be critical not only to copyright disputes but also to the future of India's AI and innovation ecosystem.
A broad interpretation of Section 52 may allow AI companies to continue using copyright-protected works and may encourage domestic innovation among Indian startups that lack the financial resources to license such works in large quantities. However, if commercially valuable AI applications can be developed by reproducing large amounts of copyright-protected works with neither authorization nor the benefit of a statutory exception, the balance that copyright law tries to strike between innovation and incentives to create may erode.
This lack of clarity creates costs of its own. AI companies cannot establish a firm baseline for compliance, and copyright owners cannot know whether existing copyright law will protect them from having their works included in large datasets. In other words, clarifying the interpretation of Section 52 is a matter of public interest that transcends the dispute between technology companies and copyright owners, with far-reaching implications for India's wider innovation and creative economy.
Who Should Decide?
With such conflicting interests at stake, courts should be cautious about interpreting Section 52 of the Copyright Act as a general exception for AI training. The section was enacted before any of today's generative AI systems existed and contains no direct reference to text and data mining or to large-scale computational analysis.
Parliament may be the more appropriate venue for a workable solution. Instead of stretching the current fair dealing provisions, which were not designed with these technologies in mind, Parliament could create a targeted legislative framework that specifically regulates the training of AI. Such a reform would maintain a viable environment for innovation while protecting copyright owners and giving developers greater certainty.
Until then, expanding Section 52 through case law risks creating exemptions that Parliament itself has never considered.
Section 52 was enacted with a focus on the use of content by individuals, educational institutions and traditional forms of research. Extending that framework to commercial AI training, in which LLMs are built at scale, is not merely a logical progression driven by changing technology; it is a major change in interpretation with consequences across India's innovation system and creative industries.
Accordingly, courts should be cautious about treating AI training as fair dealing without a clear legislative statement to that effect. Until Parliament or the courts provide more guidance, Section 52 should not be assumed to provide a safe harbour for commercial AI training. Any such claim should be examined by reference to the text, purpose and limits of the Copyright Act rather than solely the transformative nature of AI.
Views are personal.