The Great AI Book Heist – How Big Tech is Plundering Our Cultural Heritage
- David Salariya
- 2 days ago
Updated: 2 days ago
How big tech is using pirated books to train AI – the digital theft of our culture, creativity, and intellectual property.
Once Upon a Time…
Once upon a time, you had to storm a castle to steal a library. Now all you need is a server farm, a shady dataset, and a Silicon Valley postcode.
Welcome to the latest chapter in the annals of cultural pillaging: The Great AI Book Heist.
Only this time, it’s not medieval raiders burning scrolls or colonial looters boxing up bronzes. It’s billion-dollar tech companies, quietly ingesting millions of books — our books — to feed their algorithms. And they’re doing it without asking, without paying, and with a shrug that says,
“Who’s going to stop us?”

Locked Doors, Stolen Stories, Book Heist
On 3rd April 2025, authors and illustrators gathered outside Meta's London HQ at 11-22 Canal Reach, armed not with pitchforks but with placards. Among them: "Get the Zuck off our books" and "I’d write a better sign but you’d just steal it." They came to deliver a letter of protest, alleging that Meta trained its AI using 7.5 million pirated books, including works lifted from the pirate site LibGen. The doors, unsurprisingly, were locked. Meta staff refused to accept the letter. The symbolism was perfect: shut out by the very company that’s rummaging through their intellectual drawers.
This isn’t just a copyright skirmish. It’s a heist of human imagination, the wholesale harvesting of culture created over centuries, all to fatten the datasets of LLMs (Large Language Models) that can then churn out derivative text at scale. As author KJ Charles put it bluntly: “I’d like some money.”
Let’s not mince words. This is digital looting.
The Myth of 'Fair Use' – and the UK’s Legal Line in the Sand
In the US, tech giants like Meta and Anthropic wave the flag of "fair use", claiming their ingestion of books is transformative. But in the UK, we don’t buy that. Our legal system deals in “fair dealing”, a stricter standard that clearly doesn’t cover the commercial use of entire books to build AI tools.
Under UK law, scraping copyrighted works for commercial use without permission or a licence is unlawful. Not a grey area. Not up for debate. Unlawful.
So why is it still happening? Because these companies believe their scale shields them. They’ve made themselves too big to question, too vast to sue, and too slippery to pin down. It’s the logic of the old colonial empires — plunder first, litigate later.
Cultural Extraction in the Age of Algorithms
What’s particularly galling is the hypocrisy. Meta won’t accept a protest letter, but it’ll ingest entire literary legacies, page by page, through the back door. This isn’t some fringe concern. If you're a novelist, a poet, a historian or a children's author, your work might already be inside the machine.
And what’s more, your style — your voice — may be mimicked by AI systems trained on your work. Want a bedtime rhyme in the style of Dahl? A war poem in the voice of Sassoon? A cosy mystery like Agatha Christie? There’s a bot for that now — because someone fed it the real thing.
As Baroness Beeban Kidron said, if fragments were all they needed, a dictionary would do. What these machines want is meaning. And meaning comes from human minds, honed over time, in the form of books.
What Happens When Culture Becomes “Data”
Here’s the pulp horror twist: this isn’t just about lost royalties. It’s about who controls the future of knowledge.
If tech companies are allowed to hoover up books to train AI, then everything that makes our culture rich — dissent, style, eccentricity, surprise — risks being flattened into predictable, machine-learned pap. And that’s not just bad for authors. That’s bad for readers. For democracy. For history.
Books aren’t just data. They’re memories, mysteries, and voices. They are how one generation talks to the next. The theft of books is the theft of a civilisation’s ability to remember itself.
What Should Be Done about the AI Book Heist?
Transparency – We must know what data was used. Scraped books? Name them. Cited authors? Show them.
Legislation – The UK government must tighten protections, support a licensing framework, and outlaw mass scraping without consent.
Compensation – Authors should be paid. Not just as a nice gesture, but as a legal and moral imperative.
Public outrage – This cannot be a niche debate. It’s not about luddism. It’s about labour. Ownership. Cultural dignity.
Final Word: This Is Our Past – And Our Future
In the Pulp History of the 21st century, this moment may be remembered as the great cultural mugging. Books — the most democratic tool of education and empowerment — are being sucked into corporate black boxes. If we don't stop it now, we risk letting the future be written by the very systems that stole our stories to begin with.
So, next time you hear someone ask: What’s the big deal about AI reading books? — you can answer:
It’s not reading. It’s robbing.