Pay for the data you use

There’s something of a sea change under way in the global AI debate, and it’s happening in the UK of all places. And not in a subtle way, by any means. Lawmakers there have pushed back against one of the tech industry’s favorite practices: training AI models on vast amounts of online content without regard for who actually owns it.
Their solution is simple, almost obvious: if an AI model is trained on someone’s content, it should pay for it.
A UK parliamentary committee is calling on the government to implement what it calls a “licensing-first model”. Companies would need permission before they could use copyrighted works to train AI models. That covers everything from books and journalism to music, art and photography, basically all the raw materials that make up the web.
It’s not hard to understand why.
If you’ve followed the rise of AI at all, you’ve likely come across the term “text and data mining.” It sounds vague, maybe innocuous. But it means what it says on the tin: algorithms sift through vast amounts of web content to learn patterns. That’s how AI learns to generate text, images and conversation.
It’s clever stuff, really.
But there’s a part of the equation that some in the tech industry are reluctant to discuss. Much of that content is owned by people: writers, artists, photographers and journalists, who often spend decades producing it.
And, understandably, none of them are too happy to work as unpaid teaching assistants in an AI classroom.
“The harm that could be caused to creators by the widespread use of AI produced without proper copyright permissions or the payment of fair remuneration is clear and real,” the House of Lords Communications and Digital Committee warned in a report to the UK government. “If this were to happen, the creative industries which play such a vital role in the success of the UK economy could be seriously damaged.”
You can feel the creators’ frustration in those lines.
Imagine spending years writing a book, recording an album, or building a photo portfolio, only to find that an AI has somehow absorbed your style. It’s not plagiarism in the old sense, perhaps, but it’s close enough to raise eyebrows. And here’s the kicker: the artist may never even know.
That’s why some policymakers believe the default should be flipped. The burden should fall on the AI provider to demonstrate that it has a license for the material it used. Where did the data come from? How was it obtained? Make it clear.
It sounds straightforward. It’s actually tricky.
But it’s an idea that’s gaining momentum, and the UK is not the only country grappling with it. Many governments are trying to find a way to govern AI without stifling its development.
It’s a delicate dance.
The European Union, for example, has put forward the EU Artificial Intelligence Act, which aims to increase the accountability and transparency of AI systems. It’s far from a cure-all, but it shows that governments are getting serious about AI governance.
But here’s the thing.
When one jurisdiction gets serious, others tend to follow. Technology companies are global and don’t respect borders, so a decision made in London or Brussels can affect how AI is developed in California, Toronto or Singapore.
So while this may look like a UK problem, it is actually part of a wider global tug of war.
If the UK eventually decides to require licenses, AI developers may have to completely rethink how they get their training data. That could create whole new industries: data licensing companies, publishers and news organizations partnering with AI providers, entire businesses springing up just to supply AI systems with material to learn from.
Data conflict can be a business opportunity.
Unsurprisingly, the tech community is wary of this prospect. It argues that requiring licenses for everything an AI system learns from could stifle innovation or make it prohibitively expensive. Training large AI models is already enormously expensive, running into the millions, sometimes billions, of dollars.
Add licensing fees on top of that, and the economics get dicey.
But the Wild West approach, scraping as much data as possible now and worrying about the legal issues later, may be coming to an end.
Whether you’re an AI enthusiast, a techie, or just a curious person who’s ever wondered why chatbots are so good at imitating human writing, the training data debate is shaping up to be one of the defining questions of the AI age.
And if the UK’s proposal is any indication, the debate has only just begun.



