Reddit signs over its user data to train unnamed AI model

As user data becomes an increasingly hot commodity, an unnamed AI company is reportedly plotting a $60 million swoop to scrape Reddit for AI training purposes. Should we be concerned?

Decades of Reddit ramblings could become fuel for the next generation of AI models.

The self-proclaimed ‘front page of the internet’ has reportedly negotiated a content licensing deal with an unnamed AI company. For the cool price of $60 million, this business, in theory, now has the right to train its AI model using anything and everything posted on Reddit.

Millions of ceaseless posts from the most popular subreddits, the lurkers, and the dregs of questionable topics will become a rolling annual commodity for this undisclosed ‘big player’ of Silicon Valley.

This surprising deal arrives months after Reddit threatened to cut off Google and Bing’s search crawlers if an official deal couldn’t be struck to trade in its data. One source told the Washington Post at the time that the platform ‘can survive’ without search. Perhaps this current AI deal was in the offing back then?

If you use Reddit, they just sold you out to AI. https://t.co/0vjrl6Oyhs

— Reid Southen (@Rahll) February 20, 2024

Though Reddit’s yearly revenue was up by 20% in 2023, it was roughly $200 million shy of its $1 billion target set two years prior. The impending AI pact, paired with an opening for public investment next month, however, will likely send Reddit’s readies way north of this figure.

After a tumultuous few years, this data trade-off makes perfect sense for Reddit, financially speaking. Exactly what it means for consumers and the ever-murky ethics of AI, though, remains up for debate.

It’s one of the worst-kept secrets that our user data is anything but private in the modern world. Remember that recent watchdog study into Meta that showed 48,000 companies had sent the platform data on a single user without consent?

Until recently, most AI companies used the open web to train their models without any sort of verified green light, but a combination of high-profile cases in 2023 appeared to be changing the landscape.

OpenAI’s quibble with the New York Times and Apple’s negotiations for major news partners suggested that AI companies were beginning to establish a strong legal framework for data scraping in line with copyright laws in other sectors.

The AI being thrust back to the technological stone age after 2 reddit posts: https://t.co/mg0DYwwrc2 pic.twitter.com/o9X9CiwUm3

— squeeb 🍉 (@Squeebus1) February 20, 2024

Similarly, Reddit’s deal shows that host companies are beginning to demand compensation for data, but the key difference here is that its 812 million monthly users have not explicitly given their consent to become part of the AI machine.

Whether or not Reddit’s terms of service are updated in the future for transparency about where our data goes, we know that all digital information prior to this deal is also fair game. In layman’s terms, you can’t ask for permission after the fact, can you?

Perhaps this is the key difference between scraping news outlets and social media platforms for AI learning. The latter are almost entirely populated by user-generated content, the use of which seems to be completely down to their respective management’s discretion.

In the push to create AGI (Artificial General Intelligence) platforms with a more distinctly human quality, this type of deal will likely become more common in the years to come.

While this may sound positive, just think of the endless droves of misinformation and nonsense this unidentified learning machine will be subject to if all goes to plan.

It’s Reddit for Pete’s sake. Those AI safeguards will need reinforcing for sure.
