Downes.ca ~ PD12M

PD12M

Source.Plus, Dec 06, 2024
Commentary by Stephen Downes

From Alan Levine comes this link: "At 12.4 million image-caption pairs, PD12M is the largest public domain image-text dataset to date, with sufficient size to train foundation models while minimizing copyright concerns. Through the Source.Plus platform, we also introduce novel, community-driven dataset governance mechanisms that reduce harm and support reproducibility over time." Search could be better, but the images are great.
