
OpenAI alleges the New York Times tricked ChatGPT into plagiarism

In what is shaping up to be a definitive case for the future of generative AI, the New York Times is suing OpenAI for training ChatGPT on its articles without permission. OpenAI, in turn, alleges that the media outlet deliberately tricked its AI model into producing answers verbatim.

The landscape of generative AI may not look so lawless in 2024, if the New York Times can win its landmark case against OpenAI and its biggest backer, Microsoft. Big if.

In what is due to be a pivotal juncture for generative AI platforms and their innate processes, the media outlet is suing ChatGPT’s creator for training its language models using NYT content without permission.

While the very nature of a deep learning model is to ingest and generalise from as much data as possible to generate valuable responses, the NYT alleges that ChatGPT has recited its content verbatim on several occasions.

A spokesperson said this ‘undermines and damages’ the company’s reputation while simultaneously depriving it of ‘subscription, licensing, advertising, and affiliate revenue.’ The Times updated its terms of service in August 2023 to prohibit the scraping of its articles and images for AI training.

In layman’s terms, the NYT now views ChatGPT as direct competition in the news business and isn’t keen on sharing its intellectual property without compensation.

In a juicy turn of events, however, OpenAI claims that employees at the NYT deliberately tricked the generative AI tool into replicating excerpts from its articles. While dismissing the case as being ‘without merit,’ OpenAI still hopes to partner with the media outlet – as it has with The Associated Press, among others.

Of the apparent examples of plagiarism, which the public is not privy to, OpenAI claims that the NYT either explicitly instructed the model to regurgitate its articles or cherry-picked rare examples from many attempts.

The selected quotes ‘appear to be from year-old articles that have proliferated on multiple third-party websites,’ a company spokesperson said. OpenAI previously axed a ChatGPT feature called Browse upon discovering it unintentionally reproduced content, but senior figures deny that its generative AI has the same problem now.

On utilising the NYT’s content for system training, OpenAI argues that its practices fall under fair use rules that allow for repurposing copyrighted works. OpenAI allows companies to keep its web crawler from scraping their sites by blocking its IP addresses, but the NYT feels it already took the initiative by introducing its blanket policy changes last summer.

Expressing a similar stance to the UK House of Lords, the ChatGPT proprietor argued that copyrighted works must be incorporated to ‘represent the full diversity and breadth of human intelligence and experience.’ This is hardly surprising, given the alternative represents death to the very concept of generative AI.

On the other hand, you can understand why staple institutions of the publishing world aren’t chuffed with the notion of new, ambiguous tech ventures muscling in on their revenue streams. The ethics of AI are still contentious at best, and the green shoots of regulations aren’t keeping up with the technology’s ceaseless commercial growth.

The onus, arguably, should be on generative AI companies to forge alliances by making it worth content creators’ while. Otherwise, legal action such as this will always be a possibility.

In this instance, however, it doesn’t appear that a compromise is forthcoming from either party. Interest now shifts to the potential ramifications of this lawsuit, and just how big they could be for the future of generative AI.
