Reddit signs over its user data to train unnamed AI model

As user data becomes an increasingly hot commodity, an unnamed AI company is reportedly plotting a $60 million swoop to scrape Reddit for AI training purposes. Should we be concerned?

Decades of Reddit ramblings could become fuel for the next generation of AI models.

The self-proclaimed ‘front page of the internet’ has reportedly negotiated a content licensing deal with an unnamed AI company. For the cool price of $60 million, this business, in theory, now has the right to train its AI model using anything and everything posted on Reddit.

Millions of ceaseless posts from the most popular subreddits, the lurkers, and the dregs of questionable topics will become a rolling annual commodity for this undisclosed ‘big player’ of Silicon Valley.

This surprising deal arrives months after Reddit threatened to cut off Google and Bing’s search crawlers if an official deal couldn’t be struck to trade in its data. One source told the Washington Post at the time that the platform ‘can survive’ without search. Perhaps this current AI deal was in the offing back then?

Though Reddit’s yearly revenue was up by 20% in 2023, it was roughly $200 million shy of its $1 billion target set two years prior. The impending AI pact, paired with an opening for public investment next month, however, will likely send Reddit’s readies way north of this figure.

After a tumultuous few years, this data trade-off makes perfect sense for Reddit, financially speaking. Exactly what it means for consumers and for the ever-murky ethics of AI, though, remains up for debate.

It’s one of the worst-kept secrets that our user data is anything but private in the modern world. Remember that recent watchdog study into Meta that showed 48,000 companies had sent the platform data on a single user without consent?

Until recently, most AI companies used the open web to train their models without any sort of verified green light, but a string of high-profile cases in 2023 appeared to change the landscape.

OpenAI’s quibble with the New York Times and Apple’s negotiations for major news partners suggested that AI companies were beginning to establish a strong legal framework for data scraping in line with copyright laws in other sectors.

Similarly, Reddit’s deal shows that host companies are beginning to demand compensation for data, but the key difference here is that its 812 million monthly users have not explicitly given their consent to become part of the AI machine.

Whether or not Reddit’s terms of service are updated in the future for transparency about where our data goes, we know that all digital information prior to this deal is also fair game. In layman’s terms, you can’t ask for permission after the fact, can you?

Perhaps this is the key difference between scraping news outlets and social media platforms for AI learning. The latter are almost entirely populated by user-generated content, the use of which seems to be entirely at the discretion of each platform’s management.

In the push to create AGI (Artificial General Intelligence) platforms with a more distinctly human quality, this type of deal will likely become more common in the years to come.

While this may sound positive, just think of the endless reams of misinformation and nonsense this unidentified learning machine will be subjected to if all goes to plan.

It’s Reddit for Pete’s sake. Those AI safeguards will need reinforcing for sure.