It is still too early to tell if this is going to be a habit in the tech industry, but there is a growing pattern of what appears to be the scraping of troves of copyrighted content for AI training.
On 5 August, 404 Media’s Samantha Cole reported that the US$2.4 trillion company NVIDIA asked workers to download videos from YouTube, Netflix and other datasets to develop commercial AI projects. The graphics card maker is among the tech companies that appear to have adopted a "move fast and break things" ethos as they race to establish dominance in this feverish, too-often-shameful AI gold rush.
The training was reportedly meant to develop models for products like its Omniverse 3D world generator, self-driving car systems and "digital human" efforts.
NVIDIA defended its practice in an email to Engadget. A company spokesperson said its research is "in full compliance with the letter and the spirit of copyright law" while claiming IP laws protect specific expressions "but not facts, ideas, data, or information." The company equated the practice to a person’s right to "learn facts, ideas, data, or information from another source and use it to make their own expression." Human, computer… what’s the difference?
YouTube doesn’t appear to agree. Spokesperson Jack Malon pointed to a Bloomberg story from April, which quoted CEO Neal Mohan as saying that using YouTube to train AI models would be a "clear violation" of its terms. "Our previous comment still stands," the YouTube policy communications manager wrote to Engadget.
NVIDIA employees who raised ethical and legal concerns about the practice were reportedly told by their managers that it had already been green-lit by the company's highest levels. "This is an executive decision," Ming-Yu Liu, vice president of research at NVIDIA, replied. "We have an umbrella approval for all of the data." Others at the company allegedly described its scraping as an "open legal issue" they’d tackle down the road.
In addition to the YouTube and Netflix videos, NVIDIA reportedly instructed workers to train on the movie trailer database MovieNet, internal libraries of video game footage, and the GitHub video datasets WebVid (now taken down after a cease-and-desist) and InternVid-10M. The latter is a dataset containing 10 million YouTube video IDs.