The Flaws In India's Approach To Copyright Reform For AI
Gauri Bagali & Reedhav Gulati
31 Jan 2026 10:00 AM IST

Recently, the Government of India announced that it had constituted an eight-member committee to reassess the Copyright Act, 1957 in light of developments in the fields of Generative Artificial Intelligence (“GenAI”) and Artificial General Intelligence (“AGI”). Meanwhile, the Department for Promotion of Industry and Internal Trade (“DPIIT”) also released a detailed working paper addressing two major policy gaps: a licensing framework for the use of copyrighted material to train AI (also known as text and data mining) and the copyrightability of content generated by artificial intelligence.
The working paper proposes a “hybrid” model: a blanket licence for companies to train AI, with no option for copyright holders to opt out, and the establishment of a centralised royalty collection body, the Copyright Royalties Collective for AI Training (“CRCAT”). This body is proposed to be formed by rightsholders but would be composed exclusively of organisations. It would be responsible for collecting royalties at rates fixed by a government-appointed committee and for disbursing them to its member organisations. The rate-setting committee would consist mostly of government officials and technical experts, with one member from CRCAT and one member from the AI industry, as chosen by the Central Government.
Although the paper attempts to remedy significant legislative gaps and articulates an indigenous policy response, the framework it proposes is fundamentally deficient and fails to provide adequate safeguards for the interests of copyright holders and citizens alike.
The Blanket Licence and Personal Data
The proposed model does not allow copyright holders to opt out of the blanket licence, the stated rationale being to provide AI models with enough material to prevent hallucinations. Practically, this means access to vast quantities of lawfully accessible information available on the internet. However, this corpus also includes personal information that finds its way onto the internet, whether through accidental leaks or publication for other reasons, and which may be “lawfully” accessed simply because it is openly available to the public.
Interestingly, the only reference to safeguards for personal data appears in a footnote describing NASSCOM's submission that the proposed exception for text and data mining would apply “without prejudice to applicable laws that protect specific categories of data, including personal data and confidential data”. The paper accepts this proposal without any further deliberation, raising multiple concerns for the right to privacy.
In fact, for the purposes of research, we tinkered with a few AI engines to retrieve some personal information. While some refused certain kinds of requests, at least one engine was able to give us home addresses, caste categories, Aadhaar card numbers in certain instances (accompanied by internal caveats acknowledging the highly sensitive nature of the information) and phone numbers. Although such disclosures did not occur in response to every query, they did occur wherever the relevant information was available online, even where such information may not have been uploaded for the purpose of granting public access (e.g., in old PDFs of rank lists of some university entrance exams).
Paywalled Content and Technological Protection Measures
AI is capable of reproducing content, including paywalled content, both verbatim and as summaries, for free. While the paper clarifies that the proposed blanket licence would apply only to lawfully accessed data, this qualification operates primarily at the level of the AI developer and does little to address downstream consequences.
Bypassing a paywall is a punishable offence under Section 65A of the Copyright Act, 1957. While some exceptions have been carved out of this provision, specifically for broadcasting, even there the exception does not apply if “that broadcast or performance is an infringement of the copyright in any work”.
In the recent case of Elsevier Ltd. v. Alexandra Elbakyan (popularly known as the Sci-Hub case), the Court treated the continued availability of paywalled works through mirrors and redirections as a substantive infringement, emphasising functional access over formal circumvention.
In the case of AI, an end user can effectively circumvent a paywall: if the company itself lawfully accessed the paywalled content, the end user can obtain that information from the AI without any independent authorisation or payment.
The framework proposed in the paper does not engage with this downstream access problem, nor does it clarify whether such end users could be held liable under existing copyright or anti-circumvention provisions. This ambiguity is significant, as it effectively shifts the burden of enforcement and loss onto publishers and rightsholders, while leaving users insulated by the opacity of AI-mediated access.
The paper further proposes a grossly insufficient compensation model (analysed later in this piece) under which AI developers would recompense copyright holders for access to copyrighted works used in AI training. However, this compensation framework appears misplaced in the context of paywalled content, as it addresses only the act of access by the AI developer and not the subsequent, potentially unlimited dissemination of such content through AI outputs.
The Royalty Rate Setting and Disbursement Mechanism
In the context of royalties, the paper proposes establishing two principal bodies: CRCAT, responsible for royalty management and disbursement, and a government-appointed committee to fix royalty rates.
According to the proposal, CRCAT would include only organisations formed by individual creators, in the form of Collective Management Organisations (“CMOs”), limited to just one CMO per class of work. Notably, the Copyright Act, 1957 does not contain any details on what a CMO is; instead, it refers to “copyright societies”. CRCAT would collect royalties and distribute them, including to non-members of CRCAT.
While the paper's proposal on the collection function mostly aligns with pre-existing law, its treatment of the distribution function is problematic. CRCAT is allowed autonomy over the way it wishes to distribute royalties; however, the paper does not prescribe a method of calculation. It is unclear whether valuation would be based on contribution, usage intensity, revenue attribution or any other economic model. Although the paper outlines certain illustrative approaches, the ultimate decision is left to CRCAT and is to be taken by a simple majority of its member organisations.
As a result, non-members applying to receive royalties (to be made eligible according to the proposal) are completely excluded from the process, having no say over whether AI companies may access their works or over how much compensation they will receive in return. This is particularly concerning for small unregistered creators, whom the proposal supposedly aims to safeguard.
For rate-setting, the paper proposes a government-appointed committee dominated by government officials, with only one member from CRCAT and one from the industry. Independent creators, including journalists, researchers, and artists, who constitute a disaggregated class responsible for the creation of cultural value, are referenced in the framework largely at a conceptual level and left out of the decision-making process. In contrast, the proposed structure accords a central role to organised intermediaries and government officials themselves.
The Illusion of Commercialisation
With respect to AI companies, the paper proposes that royalties be collected through a revenue-sharing model. Payment would not be collected upfront as remuneration at the training stage; instead, payment obligations would arise only upon generation of revenue, i.e., commercialisation of the AI system. This definition subtly but significantly conflates revenue generation with commercialisation, a step that is neither economically accurate nor legally sound. Contemporary AI markets themselves expose this flaw: companies such as OpenAI report enormous revenues while remaining unprofitable, sustained by venture capital rather than profit-driven commercial exploitation. The Supreme Court in CIT v. Surat Art Silk Cloth Manufacturers' Assn. hinted that mere generation of surplus or revenue does not amount to commercialisation unless profit-making is the predominant object. By deferring remuneration until a vaguely defined future moment of “commercialisation”, the framework effectively legitimises uncompensated extraction while offering creators only a contingent promise of payment that may never materialise.
In this context, the proposal's failure to address valuation becomes glaring. If AI-generated outputs are not substitutable for the underlying works, the basis for assigning value remains unclear. Likewise, where revenue accrues at the platform level, the proposal does not explain how such revenue would be traced to individual creative contributions. In the absence of these clarifications, “revenue sharing” risks operating more as a legitimising label than as a meaningful redistributive mechanism.
Broadcasting Analogy
The working paper repeatedly draws an analogy between AI training and broadcasting, where collective licensing has long facilitated mass use of copyrighted content. At a surface level, both involve large-scale use of copyrighted works and benefit from collective licensing. However, broadcasting follows a one-to-many model: a fixed signal is sent to passive recipients. The copyright exceptions and licensing models in broadcasting have evolved to meet the challenges of limited frequencies, obligations to the public, and the costs of concurrent transmission. AI training, by contrast, involves the ingestion, transformation, and abstraction of massive corpora of data to produce general-purpose models capable of indefinite downstream uses. By importing broadcasting's institutional solutions into AI regulation, the framework risks regulating processes as though they were performances. Collective licensing works best where use is identifiable, repetitive and expressive; AI training, however, is opaque and largely non-consumptive. Treating the two as functionally equivalent obscures fundamental differences in how value is generated.
The government's approach, while novel, does not account for multiple stakeholders, and the proposal ultimately serves neither the industry nor creators and citizens. The proposal would benefit from reconsideration by the eight-member committee and must be amended to include safeguards against the use of personal data and to ensure that copyright holders' right to consent is not taken away.
India's AI mission should not come at the cost of its own citizens' rights. It is important that we take a recalibrated approach, one grounded in transparency, consent, proportionality, and creator agency.
The authors are Law Students. Views Are Personal
