PD12M
Source.Plus, Dec 06, 2024
From Alan Levine comes this link: "At 12.4 million image-caption pairs, PD12M is the largest public domain image-text dataset to date, with sufficient size to train foundation models while minimizing copyright concerns. Through the Source.Plus platform, we also introduce novel, community-driven dataset governance mechanisms that reduce harm and support reproducibility over time." Search could be better, but the images are great.