
FAIR USE DOCTRINE AND AI TRAINING: LEGAL LOOPHOLES OR NECESSARY FLEXIBILITY?

Manu Jalaj[1] & Jahan Goyal[2]

 

I.       ABSTRACT

The rapid growth of generative AI has revived debate around the Fair Use doctrine, particularly in relation to the use of copyrighted works during model training. AI systems depend on large datasets that are often protected by copyright, yet the legal status of such use remains unclear. Courts have increasingly focused on transformative use, creating grey areas that developers can exploit. Existing frameworks, including Indian fair dealing provisions, lack clear rules for AI, making liability and compensation for creators challenging. This paper explores the tension between innovation and copyright, highlights legal loopholes, and proposes a balanced regulatory model for sustainable AI development.

II.     INTRODUCTION

“It is imperative that we stop this theft in its tracks or we will destroy our incredible literary culture, which feeds many other creative industries in the U.S.,” said Mary Rasenberger, CEO, Authors Guild[3]. Her statement comes in the wake of a class-action suit filed by the Guild against OpenAI. The suit alleges that OpenAI used the authors’ copyrighted works to train its large language models (hereinafter referred to as LLMs) without authorisation.[4]

This case, along with recent developments in India,[5] raises a fundamental and pressing question: where do we draw the line between innovation and infringement, and can such use of copyrighted materials be justified under the Fair Use doctrine?

It is crucial to recognise that for copyright holders, their work often represents their primary source of income and creative identity. Failing to provide suitable compensation for the use of their intellectual property not only undermines their livelihood but also devalues the foundational principles of copyright protection. This article examines the emerging tension surrounding the doctrine of fair use in the context of AI development, highlights the interpretative gaps and legal loopholes that developers may exploit, and proposes a sustainable and equitable copyright framework.

Keywords: Fair Use; AI; Copyright.

 

III.   FAIR USE AND AI: HISTORICAL ROOTS AND MODERN TENSIONS

Copyright is a “property right” that grants the author undivided authority over the use and reproduction of their original work.[6] Its origin can be traced back to the 1710 “Statute of Anne”, the first legislation to formally recognise such a right.[7] However, copyright laws are not limitless, as there exist certain exceptions to these rights. These exceptions include the use of copyrighted work for the purpose of teaching, criticism, scholarship, research, etc.[8]

Such exceptions are based on the doctrine of “Fair Use”, which is evaluated using the traditional “four-factor test”. While determining infringement, the court considers the purpose and character of the use, the nature of the copyrighted work, the amount of the work that was copied, and the effect of the use on the market for or value of the work.[9] The fair use doctrine also traces its roots back to the “Statute of Anne”. Though the Statute was enacted to prevent unauthorised reprinting of books, courts soon recognised that some uses, such as fair abridgements reflecting originality and learning and serving the public interest, did not necessarily harm the author’s rights[10].

The 18th- and 19th-century common-law rulings continued to refine the boundaries of ‘fair abridgement’. Rulings such as Dodsley v. Kinnersley[11], Strahan v. Newbery[12], and Wilkins v. Aikin[13] reaffirmed that a fair and bona fide abridgement, reflecting the abridger’s own intellectual labour and judgement, could be non-infringing.

In the US, these English precedents provided the foundation for the landmark case of Folsom v. Marsh[14], which is considered the beginning of the “American Fair Use Doctrine”. Justice Story relied on English authorities and introduced a systematic approach. After Justice Story’s decision, US courts continued to refine Fair Use through common-law adjudication until the doctrine was codified in § 107 of the US Copyright Act[15], which embedded a balanced four-factor test into law.[16] After the 1976 Act, the US Supreme Court (hereinafter referred to as the US SC) expanded the doctrine in various landmark rulings. With the growth of software and digital platforms, Fair Use faced new challenges concerning the reuse of content, giving rise to ‘transformative use’[17]. First articulated in Campbell v. Acuff-Rose[18] in the context of parody, the concept was later extended to permit the copying of certain Java API declaring code for the development of applications as a transformative use, as illustrated in Google LLC v. Oracle America Inc.[19] The court in Campbell explained that a transformative use is one that “alters the original work with new expression, meaning, or message”[20].

These evolving interpretations of Fair Use have become increasingly significant in the digital age, particularly in relation to AI. The term ‘AI’ was coined in 1956 by John McCarthy. In 1959, Arthur Samuel proposed the concept of machine learning, which enables machines to learn from huge amounts of data and make predictions without explicit programming[21].

An AI model fundamentally consists of algorithms combined with a large dataset for training. These algorithms operate much like equations with unknown variables. Training an AI involves exposing the algorithms to a large dataset so that they can identify the most suitable outcomes, refining the model until optimal performance is achieved. While various AI training methods exist, one commonly used approach is the generative model, which uses “large data sets to create a prompted output”, as with ChatGPT.[22]
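
To make the idea of “equations with unknown variables” concrete, the following is a minimal, illustrative sketch in Python, not drawn from any actual AI system: a toy model with two unknown parameters is repeatedly exposed to a small synthetic dataset and nudged until its predictions fit the data. The dataset, learning rate, and number of passes are invented for demonstration only.

```python
# Illustrative sketch only: a toy "training loop" showing how a model's unknown
# parameters (here, w and b) are refined against a dataset until the error shrinks.
import random

# A tiny synthetic "dataset": pairs (x, y) roughly following y = 3x + 1
data = [(x, 3 * x + 1 + random.uniform(-0.1, 0.1)) for x in range(20)]

w, b = 0.0, 0.0          # the unknown variables the algorithm must learn
learning_rate = 0.001

for epoch in range(500):                 # repeated exposure to the dataset
    for x, y in data:
        prediction = w * x + b
        error = prediction - y
        # Adjust the parameters slightly to reduce the error (gradient descent)
        w -= learning_rate * error * x
        b -= learning_rate * error

print(f"learned w={w:.2f}, b={b:.2f}")   # should approach the underlying pattern (3, 1)
```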

The development of generative AI typically involves three stages: curation of the dataset, model training, and deployment[23]. In the initial stage, the data is curated from a vast array of online sources; the AI model then autonomously learns statistical patterns from it, without relying on human annotation or pre-labelled training inputs[24]. The second stage involves training the model, wherein the AI is pre-trained to identify underlying patterns, linguistic structures, and semantic information embedded within the corpus[25]. It is then fine-tuned using ‘Reinforcement Learning from Human Feedback’, a process in which human evaluators provide feedback on the model’s output[26]. The last stage is public release or deployment, after which users’ inputs can be further leveraged to fine-tune the AI[27]. Generative AI systems are capable of producing creative outputs, including text, images, and music. However, during the initial stage, the data used for training the generative AI is predominantly sourced from publicly accessible internet content, much of which is protected by copyright[28]. This practice raises significant legal and ethical concerns regarding potential infringements of IP rights[29].
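
In highly simplified, hypothetical form, the three stages described above can be pictured as the Python outline below. Every function name, the toy corpus, and the word-count “model” are invented placeholders; real systems involve web-scale crawling, neural pre-training, and structured human-feedback loops, but the curation, training and fine-tuning, and deployment sequence is the point being illustrated.

```python
# Hypothetical, highly simplified outline of the three stages; none of these
# functions reflect a real system, they only mirror the structure described above.

def curate_dataset(sources):
    """Stage 1: gather and clean text from (often copyrighted) online sources."""
    return [doc.lower() for doc in sources]          # trivial stand-in for cleaning

def pretrain(corpus):
    """Stage 2a: learn statistical patterns from the corpus (here, mere word counts)."""
    model = {}
    for doc in corpus:
        for word in doc.split():
            model[word] = model.get(word, 0) + 1
    return model

def fine_tune_with_human_feedback(model, feedback):
    """Stage 2b: adjust the model using human evaluators' ratings (an RLHF stand-in)."""
    for word, score in feedback.items():
        model[word] = model.get(word, 0) + score
    return model

def deploy(model, prompt):
    """Stage 3: serve user prompts; user inputs may later feed further fine-tuning."""
    return max(prompt.split(), key=lambda w: model.get(w, 0))

corpus = curate_dataset(["The quick brown fox", "The lazy dog"])
model = pretrain(corpus)
model = fine_tune_with_human_feedback(model, {"fox": 2})
print(deploy(model, "does the model favour fox or dog"))   # prints "fox"
```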

 

IV.   BEYOND FAIR USE: THE LEGAL GREY ZONE OF AI AND COPYRIGHT

Copyright concerns related to AI extend beyond authors to encompass celebrities, artists,[30] musicians, and other copyright holders across creative industries. While copyright laws are in place to provide safeguards against such infringement, developers frequently invoke exceptions such as ‘fair use’ or ‘fair dealing’ to justify the use of protected content for training purposes.

While fair use traditionally involves a four-factor test, courts have progressively emphasised the transformative nature of the use[31], often departing from the standard criteria[32]. The US SC clarified the concept of “transformativeness” in Warhol v. Goldsmith[33], stating that the new use must have a purpose substantially different from the original and must add a new message or meaning that justifies the copying[34]. Going by such a definition, an AI operator can easily establish that the copyrighted material was merely utilised in the training of an LLM, a purpose substantially different from the original. The fair use exception thus allows AI operators to escape liability.

Apart from the aforementioned, there are several interpretative loopholes that AI developers can utilise, such as § 52 of the Indian Copyright Act (hereinafter referred to as the ICA)[35], which exempts certain acts from copyright infringement. These include, inter alia, the storage of a work in electronic format for the purpose of “criticism or review”[36], the reproduction of an unpublished work for the “purpose of research”[37], and the production of an artistic work from an already copyrighted work, provided that the reproduced work does not wholly “imitate the main design of the work”[38]. Such provisions, while intended to safeguard legitimate use, may inadvertently provide scope for misuse in the context of AI training and development.

It is imperative to note that Indian courts have consistently held that where an individual takes two distinct copyrighted works and brings “the two together” to create “something different”[39], the resulting work may not constitute copyright infringement, as illustrated in Associated Publishers (Madras) Ltd. v. K. Bashyam[40], since copyright protects the ‘expression of the idea’ and not the underlying idea itself[41]. Consequently, when a machine processes an extensive dataset to produce new content, it is unlikely to be held legally responsible, as the generated material differs from the original author’s work. Additionally, developers may argue that the machine replicated only the concept, not the unique manner in which it was expressed.

In addition to the aforementioned limitations, it is imperative to highlight that merely prohibiting AI from utilising copyrighted work for training does not offer a comprehensive solution. Such restrictions could significantly hinder the advancement of AI, as companies would either be compelled to pay hefty compensation to copyright holders or rely on a limited pool of non-copyrighted materials, thereby constraining the quality and scope of training[42].

In light of these structural gaps and limitations, the next challenge concerns the attribution of liability within AI ecosystems. Who should be held accountable for copyright infringement committed by an AI? While it may appear straightforward to place the blame on the developers, the issue is much more complex. It is crucial to recognise that most AI systems, especially generative AI models, operate with minimal to no human intervention[43]. These systems simply gather data from sources on the web and then generate outputs through self-supervised learning. Thus, while courts may hold AI developers liable in cases where the developer is directly involved in the infringement, as demonstrated in “Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc.”[44], they may not implicate a developer for the action of a fully automated AI system initiated by a user.

So, can the user who induces a generative AI to produce a work be deemed the infringer? Relying upon such an assumption could create widespread liability by associating multiple users with a single act of infringement[45]. Moreover, identifying the AI itself as the infringer presents its own challenges, as AI systems are not recognised as legal persons under the current legal framework[46].

 

 

V.     BRIDGING THE LEGAL GAPS: TOWARDS A SUSTAINABLE COPYRIGHT FRAMEWORK FOR AI

In the backdrop of copyright and AI, using copyrighted material to train AI models and generate content poses serious threats. When a model is fed a vast, diverse collection of copyrighted images in order to identify patterns and produce completely new images, the use is generally considered transformative. However, problems arise when a model is fine-tuned to closely replicate the distinctive style of an artist, making the result more a copy than a new creation and raising stronger infringement concerns[47].

A striking example of the copyright infringement risks arising from AI training is deepfake technology, which can alter a person’s face, body, or voice in a video or image. It relies on an adversarial training process consisting of two neural networks: a generator and a discriminator. The generator produces fake images, videos, or audio using patterns learned from real content, while the discriminator attempts to distinguish the fakes from the original content.[48] Since there are currently no regulations that directly govern deepfakes, their outputs must be evaluated against the four factors of the Fair Use doctrine, as many deepfakes have the potential to harm the market and economic interests of the original author[49].
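
For illustration, the adversarial set-up described above can be sketched as follows. This assumes the PyTorch library is available; the networks, the one-dimensional “real” data, and the hyperparameters are toy placeholders rather than any actual deepfake system, but the generator and discriminator interact on the same principle.

```python
# Illustrative GAN sketch (assumes the third-party library PyTorch is installed).
# A generator learns to mimic a simple "real" distribution while a discriminator
# learns to tell real samples from generated ones: the adversarial structure
# underlying deepfakes, reduced to one-dimensional toy data.
import torch
import torch.nn as nn

def real_data(n):
    # Stand-in for "real content": numbers clustered around 3.0
    return torch.randn(n, 1) * 0.5 + 3.0

generator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(2000):
    # Train the discriminator: label real samples 1 and generated samples 0
    real = real_data(32)
    fake = generator(torch.randn(32, 1)).detach()
    d_loss = (loss_fn(discriminator(real), torch.ones(32, 1))
              + loss_fn(discriminator(fake), torch.zeros(32, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Train the generator: try to make the discriminator label its fakes as real
    fake = generator(torch.randn(32, 1))
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

# The generated samples should drift toward the real data's mean of roughly 3.0
print("mean of generated samples:", generator(torch.randn(1000, 1)).mean().item())
```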

India currently lacks a comprehensive framework governing the use of copyrighted works for AI training. Some may argue that, in order to keep up with rapid technological change, India should implement a more expansive exception similar to the US Fair Use doctrine. However, fair use itself lacks the predictability required for AI training that depends on expressive copyrighted works, and it may jeopardise the economic interests of copyright holders.

The most appropriate approach may be the enactment of a clear statutory provision that combines a narrowly designed Text and Data Mining (hereinafter referred to as TDM) exception with broadly applicable public interest clauses protecting freedom of expression and the right to information. TDM is a computational process involving the collection and analysis of large datasets to identify patterns and extract useful information. Although TDM remains unaddressed in § 52 of the ICA, judicial interpretations such as Alkesh Gupta[50] and Tips Industries[51] have allowed a limited TDM exception for non-commercial academic use[52].
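
As a rough illustration of what TDM involves at a computational level, the short Python sketch below scans a handful of invented sample sentences and surfaces the terms that recur across them; real TDM applies the same kind of pattern extraction to large corpora of works, which is precisely where the copyright questions arise.

```python
# Minimal TDM-style sketch: scan a small, invented collection of texts and
# extract a simple pattern (the most frequent terms). Real TDM operates over
# large corpora of (often copyrighted) works.
from collections import Counter
import re

documents = [
    "Copyright grants authors control over reproduction of their works.",
    "Fair dealing permits limited reproduction for research and review.",
    "AI training involves large-scale reproduction of existing works.",
]

counts = Counter()
for doc in documents:
    counts.update(re.findall(r"[a-z]+", doc.lower()))

print(counts.most_common(5))   # recurring terms such as "reproduction" and "works"
```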

India could adopt a framework similar to Australia’s, where copyright law permits copyright holders to issue takedown or violation notices when AI training uses unlawfully obtained works and the outputs produced can be linked back to the original sources. Additionally, Australia’s ‘AI Ethics Principles’[53], which form part of the country’s AI Ethics Framework, advocate for transparency regarding the use of AI in the creative process. Although these principles are non-binding, they provide essential best-practice guidelines for AI development[54].

Another recommendation is that companies should incorporate “Responsible R&D” into their copyright compliance policies. As per Microsoft, responsible AI means creating and implementing AI systems in a safe, ethical, and reliable manner. Leading firms such as Meta and Microsoft, and trade associations such as the IT Industry Council and the US Chamber of Commerce, include “Responsible R&D” among the core principles of private governance. Accordingly, when such entities pre-train generative AI on copyrighted works, they must adhere both to applicable copyright laws and to responsible AI principles. They must obtain consent from the original owners and fairly compensate them for using copyrighted content in model training[55].

Several big market players have also adopted preventive measures to mitigate potential copyright infringements from AI training. In September 2022, ‘Getty Images’ banned the uploading and selling of AI-generated images. ‘Shutterstock’ created a Contributor Fund to license and compensate artists whose works are used in training generative AI models. ‘Valve’ refused to release a game on Steam that used AI-generated assets. In June 2023, ‘Adobe’ agreed to indemnify enterprise customers against copyright claims for content created by its Firefly tool[56].      

 

 

 

VI.   CONCLUSION

The fair use doctrine, while vital for advancement, must not be used as a backdoor to circumvent copyright laws in the age of AI. AI’s unprecedented ability to devour and reshape creative works on a large scale demands closer scrutiny and a more responsible approach.

Regions like the European Union[57] and Australia[58] have taken proactive steps to modernise copyright laws in response to AI-driven challenges. India, too, is making headway, with the 161st Report of the “Department-Related Parliamentary Standing Committee on Commerce” recognising the impact of AI on copyright laws[59].

It is not only the legislature that is adapting; the judiciary, too, is picking up pace. In a 2023 decision[60], the Delhi HC ruled in favour of actor Anil Kapoor in a case involving the unauthorised replication of his voice using AI technology. The judgment not only affirmed the right to protect a celebrity’s persona but also marked a significant moment in the evolution of copyright jurisprudence in the context of AI.

As both legislative and judicial bodies begin to grapple with these transformative shifts, the global legal landscape is approaching a defining moment in the protection of copyright in the age of AI, with stakeholders standing both in support of and against it.



 


[1] Manu Jalaj is a student at National Law University Odisha (NLUO), Cuttack.

[2] Jahan Goyal is a student at National Law University Odisha (NLUO), Cuttack.

[3]“The Authors Guild, John Grisham, Jodi Picoult, David Baldacci, George R.R. Martin, and 13 Other Authors File Class-Action Suit Against OpenAI”, The Authors Guild, (Sept. 20, 2023) https://authorsguild.org/news/ag-and-authors-file-class-action-suit-against-openai/ (last visited Sept. 12, 2025).

[4] Celeste Shen, “Fair Use, Licensing, and Authors’ Rights in the Age of Generative AI”, 22 Nw. J. Tech. & Intell. Prop. 157, (2024) https://scholarlycommons.law.northwestern.edu/njtip/vol22/iss1/4 (last visited Sept. 14, 2025).

[5] Online Bureau, “OpenAI faces landmark copyright infringement case in India; Legal experts weighs in”, ET Legal World, (Jan.31, 2025) https://legal.economictimess.indiatimes.com/news/litigation/openai-faces-landmark-copyright-infringement-case-in-india-legal-experts-weighs-in/117782660 (last visited Sept. 12, 2025).

[6] Lior Zemer, “What Copyright Is: Time to Remember the Basics”, 4 Buff. Intell. Prop. L.J. 54, (2006) https://heinonline.org/HOL/LandingPage?handle=hein.journals/biplj4&div=7&id=&page= (last visited Sept. 13, 2025).

[7] Intell. Prop. Rts. Office, Copyright History, https://intellectualpropertyrightsoffice.org/copyright_history/ (last visited Sept. 14, 2025).

[8] Pankhuri Agarwal, “Appropriation Art: Copyright Infringement or Fair Use?”, 8 Indian J. Intell. Prop. L. 61, (2017) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3374205 (last visited Sept. 15, 2025).

[9] Dan L. Burk, “Algorithmic Fair Use”, 86 U. Chi. L. Rev. 283, (2019) https://heinonline.org/HOL/P?h=hein.journals/uclr86&i=295 (last visited Sept. 14, 2025).

[10] Matthew Sag, “The Prehistory of Fair Use”, 76 Brook. L. Rev. 1371, (2010-11) https://heinonline.org/HOL/P?h=hein.journals/brklr76&i=1379 (last visited Sept. 16, 2025).

[11] Dodsley v. Kinnersley [1761] 27 E.R. 270 (Ch.).

[12] Strahan v. Newbery [1774] 98 E.R. 919 (K.B.).

[13] Wilkins v. Aikin [1810] 34 E.R. 163 (Ch.).

[14] Folsom v. Marsh, 9 F. Cas. 342 (1841).

[15] Copyright Act, 17 U.S.C. § 107 (2024).

[16] Sunimal Mendis, “The US Approach to Resolving the Tension: The Fair Use Exception”, Copyright, the Freedom of Expression and the Right to Information: Exploring a Potential Public Interest Exception to Copyright in Europe 32 (1st ed., Nomos Verlagsgesellschaft mbH 2011) https://www.jstor.org/stable/j.ctv941qss.6 (last visited Sept. 17, 2025).

[17] Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, (1994).

[18] Id.

[19] Google LLC v. Oracle America Inc., 141 S. Ct. 1183, (2021).

[20] Campbell, 510 U.S. at 569.

[21] Ravindra Kumar & Pankaj Kumar, “Training AI and Copyright Infringement: Where does the law stand?”, 2 Indian J. Integrated Rsch. L. 1, (2022) https://heinonline.org/HOL/P?h=hein.journals/injloitd2&i=1300 (last visited Sept. 17, 2025).

[22] Michael Chen, “What Is AI Model Training & Why Is It Important?”, Oracle, (Dec. 6, 2023) https://www.oracle.com/artificial-intelligence/ai-model-training/ (last visited Sept. 15, 2025).

[23] Yotam Kaplan, “Generative AI Training as Unjust Enrichment”, 86 Ohio State L.J. 1, (2024) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4775342 (last visited Sept. 15, 2025).

[24] Katherine Lee, “AI and Law: The Next Generation”, SSRN, (2023) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4580739 (last visited Sept. 19, 2025).

[25] Bijit Ghosh, “Empowering Language Models: Pre-Training, Fine-Tuning, and In-Context Learning”, Medium, (June 12, 2023) https://medium.com/@bijit211987/the-evolution-of-language-models-pre-training-fine-tuning-and-in-context-learning-b63d4c161e49 (last visited Sept. 18, 2025).

[26] Kaplan, supra note 23.

[27] Id.

[28] Id.

[29] Tim W. Dornis, “The Training of Generative AI is Not Text and Data Mining”, 2 European Intell. Prop. Rev., (2024) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4993782 (last visited Sept. 1, 2025).

[30] J.V. Abhay, “Stealing Styles — Artistic Styles and AI-Generated Art”, SCC Online Blog, (April 10, 2025) https://www.scconline.com/blog/post/2025/04/10/stealing-artistic-styles-ai-generated-art/ (last visited Sept. 20, 2025).

[31] The Chancellor of Masters and Scholars of the University of Oxford v. Narendra Publishing House, 2008 SCC Online Del 1058.

[32] Swetha Meenal & Sayantan Chanda, “Keeping Up with the Machines: Can Copyright Accommodate Transformative Use in the Age of Artificial Intelligence?”, 11 Indian J. Intell. Prop. L. 242, (2020) http://scc-nluo.refread.com/DocumentLink/QG826ak3 (last visited Sept. 19, 2025).

[33] Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 143 S. Ct. 1258, (2023).

[34] US Copyright Office, Copyright and Artificial Intelligence Part 3: Generative AI Training, Pre-Publication Version, (2025) https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf (last visited Sept. 19, 2025).

[35] The Copyright Act, 1957, § 52, No. 14, Acts of Parliament, 1957 (India).

[36] Id., § 52 (a).

[37] Id., § 52 (p).

[38] Id., § 52 (v).

[39]  Associated Publishers (Madras) Ltd. v. K. Bashyam Alias 'Arya' And Anr., AIR 1961 Mad 114.

[40] Id.

[41] R.G. Anand v. Delux Films & Ors., (1978) 4 SCC 118.

[42] Kaplan, supra note 23, 25, 41.

[43] Kaplan, supra note 23, 25.

[44] Thomson Reuters Enterprise Centre GmbH and Ors. v. Ross Intelligence Inc., 694 F. Supp. 3d 467, (2023).

[45] Michael P. Goodyear, “Who is Responsible for AI Copyright Infringement?”, Issues in Science and Technology, (2024) https://issues.org/wp-content/uploads/2024/10/31-33-Goodyear-Who-Is-Responsible-for-AI-Copyright-Infringement-Fall-2024.pdf (last visited Sept. 19, 2025).

[46] Stuti Puri & Dr. Seema Gupta, “AI and Copyright: Navigating Legal Frontiers in the Age of Artificial Intelligence”, 4 Jus Corpus L.J. 26, (2023) http://scc-nluo.refread.com/DocumentLink/5aV0Wd77 (last visited Sept. 20, 2025).

[47] Vaishnavi Chandrakar, “From Canvas to Code: Analysing the Generative AI and Fair Use through the Lens of the Andy Warhol Verdict”, 8 J. Intell. Prop. Stud. 47, (2024) https://heinonline.org/HOL/P?h=hein.journals/jnloitl8&i=50 (last visited Sept. 20, 2025).

[48] Gaurav Singh Bisht, “Deepfake Detection Using GAN Discriminators: Implementation and Result Analysis”, 12 IJIRT 1069 (2025) https://ijirt.org/publishedpaper/IJIRT180296_PAPER.pdf (last visited Oct. 27, 2025).

[49] Kumar, supra note 21.

[50] Saregama India Ltd. & Ors. v. Alkesh Gupta & Ors., 2013 SCC OnLine Cal 118.

[51] Tips Industries Ltd. v. Wynk Music Ltd., 2019 SCC OnLine Bom 13087.  

[52] Kumar, supra note 21, 47.

[53] Department of Industry, Science and Resources, Govt of Aus., Australia’s AI Ethics Principles, (2019) https://www.industry.gov.au/publications/australias-artificial-intelligence-ethics-principles/ (last visited Sept. 20, 2025).

[54] Puri, supra note 46.

[55] Thomas A. Hemphill, “Copyright protection, artistic imagery, and the adoption of responsible artificial intelligence principles”, 4 J. Ethics in Entrepreneurship & Technology 2, (2024) https://www.researchgate.net/publication/386281493_Editorial_Copyright_protection_artistic_imagery_and_the_adoption_of_responsible_artificial_intelligence_principles (last visited Sept. 21, 2025).

[56] Patrick K. Lin, “Retrofitting Fair Use: Art & Generative AI after Warhol”, 64 Santa Clara L. Rev. 467, (2024) https://heinonline.org/HOL/P?h=hein.journals/saclr64&i=493 (last visited Sept. 21, 2025).

[57] European Union, E.U. Directive 2019/790, Copyright and Related Rights in the Digital Single Market, (2019) https://eur-lex.europa.eu/eli/dir/2019/790/oj/eng (last visited Sept. 21, 2025).

[58] Department of Industry, Science and Resources, supra note 53.

[59] Puri, supra note 46, 52.

[60] Anil Kapoor v. Simply Life India & Ors., (2023) SCC OnLine Del 6914.

 
 
 
