OpenAI claims it’s ‘impossible’ to develop ChatGPT without copyrighted content — that’s an inadequate excuse.

A few weeks after the New York Times filed a lawsuit alleging the unauthorized use of “millions” of copyrighted articles to train its large language models, OpenAI defended its practices before the UK’s House of Lords Communications and Digital Select Committee. The company argued that it has no choice but to use copyrighted materials for its systems to function properly. Large language models, like those underpinning OpenAI’s ChatGPT chatbot, ingest extensive data from online sources to “learn” how to operate, which is what raises the copyright concerns.

The New York Times’ lawsuit claims that OpenAI and Microsoft seek to benefit from the Times’ investment in journalism without permission or compensation. OpenAI faces similar accusations from a group of 17 authors, including John Grisham and George R.R. Martin, who accused it of “systematic theft on a mass scale” in 2023.

In its submission to the House of Lords, OpenAI acknowledged using copyrighted materials but argued that this use qualifies as fair use, emphasizing that it is practically impossible to train advanced AI models without such materials given how broadly copyright applies today. The company contended that limiting training data to public domain content from over a century ago would not produce AI systems that meet current needs.

Although OpenAI asserts that its use is fair, some find the argument unconvincing; critics liken it to justifying illegal activity by claiming necessity. Fair use is nonetheless central to OpenAI’s defense: the company maintains that it complies with applicable laws, including copyright law, and emphasizes that training AI models on publicly available internet materials is a widely accepted practice, supported by various stakeholders, including academics, businesses, and civil society groups.

OpenAI also pushed back against the New York Times’ lawsuit, accusing the Times of ambushing it during partnership negotiations and of manipulating prompts to extract content from its models. OpenAI pledged to develop mechanisms for rightsholders to opt out of training and to pursue additional partnerships. Skeptics, however, view this as a “forgiveness instead of permission” approach, suggesting that OpenAI will scrape content regardless and simply prefers voluntary agreements to legal constraints.
