Reddit signs over its user data to train unnamed AI model

As user data becomes an increasingly hot commodity, an unnamed AI company is reportedly plotting a $60 million swoop to scrape Reddit for AI training purposes. Should we be concerned?

Decades of Reddit ramblings could become fuel for the next generation of AI models.

The self-proclaimed ‘front page of the internet’ has reportedly negotiated a content licensing deal with an unnamed AI company. For the cool price of $60 million a year, this business will, in theory, have the right to train its AI model on anything and everything posted on Reddit.

Millions of ceaseless posts, from the most popular subreddits and lurker haunts to the dregs of questionable topics, will become a rolling annual commodity for this undisclosed ‘big player’ in Silicon Valley.

This surprising deal arrives months after Reddit threatened to cut off Google and Bing’s search crawlers if an official deal couldn’t be struck to trade in its data. One source told the Washington Post at the time that the platform ‘can survive’ without search. Perhaps this current AI deal was in the offing back then?

If you use Reddit, they just sold you out to AI. https://t.co/0vjrl6Oyhs

— Reid Southen (@Rahll) February 20, 2024

Though Reddit’s yearly revenue rose by 20% in 2023, it still fell roughly $200 million short of the $1 billion target set two years earlier. The impending AI pact, however, paired with an opening to public investment next month, will likely send Reddit’s readies well north of that figure.

After a tumultuous few years, this data trade-off makes perfect financial sense for Reddit. Exactly what it means for consumers, and for the ever-murky ethics of AI, remains up for debate.

It’s one of the worst-kept secrets of the modern world that our user data is anything but private. Remember that recent watchdog study into Meta that showed 48,000 companies had sent the platform data on a single user without consent?

Until recently, most AI companies used the open web to train their models without any sort of verified green light, but a combination of high-profile cases in 2023 appeared to be changing the landscape.

OpenAI’s legal quibble with the New York Times and Apple’s negotiations with major news publishers suggested that a stronger legal framework for data scraping, in line with copyright law in other sectors, was beginning to take shape.

The AI being thrust back to the technological stone age after 2 reddit posts: https://t.co/mg0DYwwrc2 pic.twitter.com/o9X9CiwUm3

— squeeb 🍉 (@Squeebus1) February 20, 2024

Similarly, Reddit’s deal shows that host companies are beginning to demand compensation for data, but the key difference here is that its 812 million monthly users have not explicitly given their consent to become part of the AI machine.

Whether or not Reddit’s terms of service are updated in the future to be transparent about where our data goes, we know that everything posted before this deal is also fair game. In layman’s terms, you can’t ask for permission after the fact, can you?

Perhaps this is the key difference between scraping news outlets and scraping social media platforms for AI training. The latter are almost entirely populated by user-generated content, the use of which seems to be left completely to each platform’s discretion.

In the race to create AGI (Artificial General Intelligence) platforms with a more distinctly human quality, this type of deal will likely become more common in the years to come.

While this may sound positive, just think of the endless droves of misinformation and nonsense this unidentified learning machine will be subject to if all goes to plan.

It’s Reddit for Pete’s sake. Those AI safeguards will need reinforcing for sure.
