Downes.ca ~ Stephen's Web ~ NVIDIA: Copyrighted Books Are Just Statistical Correlations to Our AI Models

NVIDIA: Copyrighted Books Are Just Statistical Correlations to Our AI Models

Ernesto Van der Sar, TorrentFreak, Aug 19, 2024
Commentary by Stephen Downes

This is a fairly in-depth look at the details of a case between content authors and Nvidia, a manufacturer of AI chips. Some parts aren't in contention, for example, "the company's use of the 'Books3' dataset, which was scraped from the library of 'pirate' site Bibliotik." But others are contested. There are two major questions here. First, is copying a book for the purpose of analyzing it fair use? Second, and more significantly, is extracting certain 'facts' from the book fair use? For example, suppose I learn from a book that "the Battle of Hastings was in 1066" (which, in fact, I did). If I restate it, is that fair use? It seems so. But what about things like grammatical principles and word order? We start a sentence 'The Battle of Hastings...' but never 'Battle the of Hastings'. This, Nvidia argues, is what it extracts from books, not the content.
