The New Steel: Distillation as AI's Defining IP Contest

The master resource in artificial intelligence is not compute, not data, not even talent—it is the model itself, the productive asset into which all three inputs have been poured. And the technique now at the center of every serious intellectual property dispute in this industry is distillation: a method for extracting, compressing, and replicating the capabilities of those assets at a fraction of the cost. What the Bessemer process was to steel—a way to produce more, faster, and cheaper—distillation is to AI. And like every transformative process that reshapes the cost curve of a strategic industry, it has ignited a contest over who owns what, who can copy whom, and where the durable advantages will ultimately reside.

Alphabet sits at the epicenter of this contest. It trains models on proprietary hardware, distributes them through a cloud platform, and simultaneously releases open-weight alternatives that can be—and, by credible allegation, already are being—distilled by competitors. The tensions are not theoretical. They are playing out across White House policy papers, antitrust investigations, copyright litigation, and the architecture decisions that will determine whether proprietary AI models remain a defensible business or become a commodity swept away by open replication.

2. Distillation as an Industrial Weapon

Distillation is, at its core, a legitimate machine-learning technique: a high-performing teacher model transfers knowledge to a smaller student model, compressing not only outputs but aspects of reasoning itself ¹⁵. It reduces computational requirements, slashes training time, and enables domain-specific optimization ¹⁵. Every major AI lab uses it internally ^15,63. There is nothing illicit about the technique in isolation.
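
As a rough illustration of the teacher-to-student transfer described above, the sketch below computes a soft-label distillation loss in plain Python. It assumes the classic formulation, in which the student is trained to match the teacher's temperature-softened output distribution; the source does not say which variant any lab uses. The logit values, the temperature of 2.0, and the function names are illustrative assumptions. The sketch also assumes direct access to teacher logits, whereas distillation through a public API may see only generated text.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T squared."""
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's current predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over three classes (assumed values, not from the source).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # ranks the classes in the opposite order

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is why minimizing it pulls the smaller model toward the larger model's behavior.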

The dispute arises when distillation crosses organizational and national boundaries. The White House, through a memo authored by Michael Kratsios, characterized cross-border distillation as "industrial-scale" exfiltration and a direct threat to AI model intellectual property, principally involving China-based entities ⁶². Anthropic and OpenAI have publicly accused DeepSeek, Moonshot, and MiniMax of attempting to copy their models via distillation ^59,62. Distillation is precisely the technique allegedly used to replicate U.S. AI systems ⁵¹.

The economics are disruptive by design. Training a large language model can cost $2 billion; distillation, by one credible estimate, costs $200 million ⁵, a 10x differential. Distilled 7-billion-parameter models are 10x cheaper than 70-billion-parameter equivalents ²⁹, enabling capital-efficient strategies for achieving competitive capabilities ¹⁵. This is not a marginal cost improvement; it is a structural rewriting of who can afford to compete.
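
The arithmetic behind these figures can be checked with a short sketch. The dollar amounts and parameter counts are the ones cited above; the helper name `cost_ratio` is an illustrative assumption. The code does not assume that cost scales linearly with parameter count. It only shows that the cited 10x cost gap happens to match the 10x gap in model size.

```python
def cost_ratio(baseline, alternative):
    """How many times larger the baseline is than the alternative."""
    return baseline / alternative


# Figures quoted in the text (estimates, per the cited sources).
full_training_cost = 2_000_000_000  # USD, training a large model
distillation_cost = 200_000_000     # USD, one estimate for distillation

large_params = 70_000_000_000       # 70-billion-parameter model
small_params = 7_000_000_000        # distilled 7-billion-parameter model

training_ratio = cost_ratio(full_training_cost, distillation_cost)
size_ratio = cost_ratio(large_params, small_params)
savings_share = 1 - distillation_cost / full_training_cost

print(f"training vs distillation: {training_ratio:.0f}x")
print(f"parameter count ratio:    {size_ratio:.0f}x")
print(f"share of cost avoided:    {savings_share:.0%}")
```

On these numbers, a distiller avoids about nine-tenths of the outlay that the original developer had to commit.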

Distillation attacks can extract the capabilities of models built at great expense ¹⁵, enabling cheaper replication that disrupts normal competitive dynamics ⁶². The technique goes beyond simple copying: it compresses and transfers the teacher model's reasoning process itself ¹⁵. Leading U.S. labs are intensifying efforts to detect and block distillation attempts ⁵⁶. Anthropic, Google, and OpenAI have characterized unauthorized distillation as intellectual property infringement and unauthorized use of proprietary technology ¹⁵.

If the accusations against DeepSeek are substantiated, the company could face lawsuits, injunctions, or forced model takedowns ⁵⁹. The admission of distillation use has already intensified the debate over AI model IP and licensing compliance ⁵², and the adversarial tone signals potential future litigation on a material scale ⁵².

There is a paradox here that should concern any strategist. U.S. export controls, designed to constrain China's AI capabilities, may have actually accelerated engineering adaptations, including adoption of mixture-of-experts architectures, model distillation, and other optimization techniques ¹⁴. Containment bred innovation. Historically, competitors' distilled models trailed U.S. labs' releases by 4–6 months; that window appears to be narrowing ²⁴.

3. The Open-Source Frontier and the Commoditization Threat

The distillation dispute cannot be understood in isolation from the broader open-source movement in AI. The debate over open versus closed models has persisted for most of the past five years ³, and Google competes on both sides. Its open models have surpassed 500 million total downloads ^28,33, its Flash models are recognized for price-performance and cost-efficiency versus open-source alternatives ¹⁷, and it has released models like Trinity under permissive Apache 2.0 licenses ^1,2.

But open-weight models reaching approximately 80–90% of frontier capability would shrink the total addressable market for paid API access to proprietary models ⁴⁷. Some enterprises already use open-weight models for primary inference and reserve proprietary models only for high-stakes tasks ⁴⁶. Open-source downloadable weights apply pressure on proprietary pricing ⁴⁸ and enable lower-cost deployments that impose pricing discipline on API services ⁴⁹.
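
The tiered pattern described above, with open-weight models for routine inference and proprietary models for high-stakes tasks, can be sketched as a blended-cost calculation. Everything numeric here is a placeholder: the per-request prices and the 20% high-stakes share are hypothetical values chosen for illustration, not figures from the source.

```python
def blended_cost(high_stakes_share, proprietary_price, open_price):
    """Average cost per request when only high-stakes traffic goes proprietary."""
    if not 0.0 <= high_stakes_share <= 1.0:
        raise ValueError("high_stakes_share must be between 0 and 1")
    routine_share = 1.0 - high_stakes_share
    return high_stakes_share * proprietary_price + routine_share * open_price


# Hypothetical prices in arbitrary units per request (assumptions only).
PROPRIETARY_PRICE = 10.0
OPEN_WEIGHT_PRICE = 1.0

all_proprietary = blended_cost(1.0, PROPRIETARY_PRICE, OPEN_WEIGHT_PRICE)
tiered = blended_cost(0.2, PROPRIETARY_PRICE, OPEN_WEIGHT_PRICE)

print(f"all proprietary: {all_proprietary:.2f} per request")
print(f"tiered routing:  {tiered:.2f} per request")
```

Under these assumed inputs, the proprietary vendor keeps only the high-stakes slice of traffic, which is the mechanism by which open-weight alternatives shrink the paid API market even without matching frontier quality.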

Meta's release of open-source Llama weights has driven developer adoption and created competitive pressure on Google's and Anthropic's proprietary API pricing ^48,49. When Meta shifted from open-source Llama to proprietary Muse Spark, Andrew Ng noted that developers who had built on the open-weights models were left exposed ¹⁶. The episode is a reminder that open-weight strategies create ecosystem dependencies that can become liabilities when reversed.

China's open-source strategy compounds the threat. Chinese AI labs including DeepSeek, Baidu Ernie, Zhipu GLM, and MiniMax have released multiple open-weight models in sequence ⁴⁶, leveraging approaches that reduce licensing fees and dependency on a small group of global providers ⁴⁴. China's open-source ecosystem involves universities, startups, and enterprises building on shared foundations and contributing improvements back, creating a virtuous cycle that strengthens model quality ⁴⁴. Chinese companies have a cultural tendency to publish models and papers to enable rapid iteration ¹⁸, and some sources claim China is the largest contributor to open-source AI models globally ^42,43.

Over 70% of global foundational AI models originate in the United States ⁶⁰, but the gap is narrowing. Open models are no longer dependent solely on distillation; they have achieved state-of-the-art parity with leading proprietary models ²⁴. More than 200 models are available through Google Cloud's Model Garden alone ²⁰.

For Alphabet, the strategic question is whether its proprietary models maintain a sufficient performance gap to justify premium API pricing, or whether the market shifts decisively toward low-cost and open alternatives. This is not a question for next quarter; it is the defining structural question for the next five years.

4. Copyright, Content, and the Cost of Training Data

A global consensus is consolidating around a simple proposition: AI companies should pay for the content used to train their models ⁵⁴. The White House document "Respecting Intellectual Property Rights and Supporting Creators" states plainly that U.S. copyright laws remain in force and that unlicensed use of content for AI training is illegal ^34,35,36,37,38,39,40,41,54.

Multiple court cases are underway to determine whether training on copyrighted content without permission constitutes infringement ^13,26, and the outcomes could establish precedent for AI training-data practices for the next decade ¹³. France has already acted: it fined Google €250 million in 2024 for using copyrighted content without permission in training its Bard AI ⁹.

The New York Times Company's litigation against AI companies alleges copyright infringement, unauthorized use of content, and building of competing products using NYT content without permission or fair value exchange ⁵⁵. Copyright disputes may affect content supply and increase costs for generative AI providers ³⁸.

The Brazilian antitrust authority CADE has investigated whether Google used publishers' material in AI-generated answers and summaries without proper compensation ²³, with CADE councilor Thomson de Andrade authoring a 185-page dissenting vote—the first instance of civil society participation in preparing a dissenting vote in CADE's history ⁶. The theory of harm involves exploitative abuse of dominance via forced free-riding, where journalistic content feeds search quality, advertising inventory, and AI training without compensation ⁶. Google described the decision as "a misunderstanding of how its products work" ²⁵.

The direction of travel is unmistakable. Proposed solutions include licensing content for training, creating royalty-distribution models, and developing provenance technologies ³¹. Reddit has already licensed content data to Google and OpenAI ²¹, and copyright holders can negotiate licensing deals under existing laws ⁵⁴. Watermarking techniques and blockchain-based provenance systems are in development ³¹, though provenance verification tools such as SynthID face technical fragility and ongoing debates about effectiveness ⁵⁸.

For Alphabet, the cost question is material. If training data transitions from a freely available resource to a licensed input, operating costs rise. But barriers to entry rise as well—and well-capitalized incumbents may find the net effect advantageous. The decisive question is whether Google can convert its balance-sheet strength into durable licensing relationships that competitors cannot match.

5. The Broader IP Litigation Landscape

The distillation and copyright disputes sit within a wider litigation environment that bears directly on Alphabet's risk profile. Google and Character.ai settled lawsuits in 2025–2026 alleging their AI chatbots harmed minors, including claims that they contributed to a Florida teenager's suicide ⁹. Google faces lawsuits in Delaware, Washington, D.C., Minnesota, and Brazil alleging harms from AI-generated falsehoods ⁸. A new privacy lawsuit alleges issues with Google's search and AI-powered information tools ⁵³.

The company is also defending against claims that it diverted revenue from websites and publishers through anti-competitive conduct in its core online advertising business ⁴. The draft NBI law explicitly names Google and Meta as targets ¹², and the DOJ's possible remedies include the forced divestiture of Chrome or Android ⁵⁰. Classic structural antitrust remedies presuppose physically separable organizational layers, a presupposition that contrasts sharply with Google's integrated engineering architecture ⁶.

Intellectual property disputes extend beyond Google itself. Getty Images has litigated against Stability AI over use of its imagery in generative model training ^31,61. GitHub Copilot has faced copyright lawsuits alleging unlicensed use of copyrighted code ³¹. Databricks faces a copyright lawsuit that could bring massive damages and, if precedent-setting, threaten AI training practices industry-wide ¹⁹.

Enterprise AI coding tools process proprietary source code, raising data privacy and IP protection concerns ^45,57. Tech companies are scraping internal employee communications, including old startup Slack archives and Jira tickets, to collect training data ²⁷. Open-source software communities are moving toward private development environments due to concerns about AI scraping of public code ³².

The pattern is clear: the IP regime that has governed software for decades is being stress-tested by AI, and the outcomes of these cases will determine who bears the costs of model training and who controls the distribution of value.

6. Strategic Implications for Alphabet

First, the distillation threat is real and accelerating. The combination of a 10x cost advantage for replicating capabilities ^5,29, open models achieving frontier parity ²⁴, and a narrowing distillation window ²⁴ means that proprietary AI models cannot rely on secrecy or technical complexity alone for protection. Google must decide whether its open-weight releases strengthen its ecosystem position or accelerate the commoditization of its own proprietary assets. The answer may depend on whether Google's proprietary models maintain a meaningful and durable performance gap—and whether the market values that gap enough to pay for it.

Second, the copyright and content-compensation regime is shifting from voluntary to compulsory. The convergence of White House policy ^35,36,37,39,41, French penalties ⁹, publisher litigation ⁵⁵, and international antitrust investigations ²³ points toward a world where AI companies pay for training data. For Google, this increases costs but also raises barriers to entry. The Reddit licensing deal ²¹ may be a template. The company that builds the most comprehensive network of content licenses will have both a cost advantage in training and a legal moat against smaller competitors.

Third, the integrated architecture that makes Google formidable also makes it fragile in litigation. The inability to cleanly separate Chrome, Android, Search, and AI into distinct legal entities ⁶ is a defensive strength in operations but a vulnerability when antitrust remedies are on the table. The concentration risk from AI-generated code propagating through 75% of new code simultaneously ¹⁰ compounds this fragility. If a model flaw, a copyright violation, or a distillation challenge hits one part of the stack, the entire edifice may feel the tremor.

Fourth, the talent dimension cannot be ignored. Google's removal of internal red lines prohibiting AI weapons development ³⁰, the employee protests over Project Maven ¹¹, and the concerns about Israeli government contracts ⁷ create governance and reputational risk that bears directly on the company's ability to attract and retain the people who build its models. The worst-case scenario (accelerated talent flight leading to reduced innovation capacity, reputational damage, and further talent loss) is a self-reinforcing downward spiral that no amount of infrastructure spending can arrest ²².

7. Conclusion

The contest over AI intellectual property will define the industry's structure for the next decade. Distillation is the technique at the center of that contest, as transformative to AI economics as the Bessemer process was to steel. The question is not whether distillation will be used. It will be. The question is who controls the terms on which it is deployed, who bears the costs when it crosses boundaries, and whether the legal and regulatory response strengthens or weakens the position of the firms that have invested most heavily in the underlying models.

For Alphabet, the path forward requires clarity on three fronts. First, a coherent position on open versus closed models that acknowledges the tradeoffs rather than trying to have both without cost. Second, a licensing strategy that converts the coming content-compensation regime from a liability into an asset. Third, a governance framework that reassures regulators, publishers, and its own engineering talent that the company's immense power over information and computation will be exercised with discipline.

The industrial history is instructive. The steel barons who prevailed were not necessarily those who built the biggest mills. They were the ones who integrated the right assets, secured the right supply lines, and understood that control of a critical production process is worth more than ownership of a commodity output. The AI models themselves may ultimately be commoditized by distillation and open replication. The durable advantage will belong to whoever controls the inputs, the infrastructure, and the distribution channels through which AI capabilities reach the market. For Alphabet, that means the contest is not over whether its models can be copied. They can be. The contest is over whether the rest of the stack is strong enough to make copying beside the point.

KAPUALabs
