Goods: organizing Google’s datasets
Adrian Colyer,
The Morning Paper,
Jul 13, 2016
This is, in my mind, the correct way to manage data. Rather than define your data models ahead of time (and then require every system and every person to comply with them), you simply let people define and store data however they want, then collect and organize it after the fact. That is, after all, what Google does with the World Wide Web.

This article summarizes a paper describing a system that does exactly that. Goods is a system that organizes the datasets used inside Google: "Goods crawls datasets from all over Google, extracts as much metadata as possible from them, joins this with metadata inferred from other sources (e.g. logs, source code and so on) and makes this catalog available to all of Google's engineers."

Did it work? "Goods quickly became indispensable." Yeah, it would. Tell me again why you have to design your models ahead of time?

The approach does have a cost: "Because Goods explicitly identifies and analyzes datasets in a post-hoc and non-invasive manner, it is often impossible to determine all types of metadata with complete certainty."
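To make the crawl-extract-join pipeline concrete, here is a minimal Python sketch of post-hoc cataloging. It assumes a crawler that returns one record per dataset and access logs of (job, action, path) tuples. Everything in it (CatalogEntry, build_catalog, the record fields, the log format) is hypothetical and illustrative, not the paper's implementation. It shows two ideas only: datasets are catalogued after they already exist, with no upfront schema, and metadata extracted from the dataset itself is joined with metadata inferred from a second source.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One catalog record per dataset path (hypothetical structure)."""
    path: str
    metadata: dict = field(default_factory=dict)  # extracted from the dataset itself
    readers: set = field(default_factory=set)     # inferred from access logs (best effort)
    writers: set = field(default_factory=set)     # inferred from access logs (best effort)


def extract_metadata(record: dict) -> dict:
    # Keep whatever fields the crawled record happens to expose; no schema is imposed.
    return {k: v for k, v in record.items() if k != "path"}


def build_catalog(crawled: list[dict],
                  log_lines: list[tuple[str, str, str]]) -> dict[str, CatalogEntry]:
    # Step 1: crawl + extract. Every dataset found gets an entry, after the fact.
    catalog = {r["path"]: CatalogEntry(r["path"], extract_metadata(r)) for r in crawled}
    # Step 2: join with metadata inferred from another source (here, access logs).
    for job, action, path in log_lines:
        entry = catalog.get(path)
        if entry is None:
            continue  # the log mentions a path the crawler never saw; skip it
        if action == "read":
            entry.readers.add(job)
        elif action == "write":
            entry.writers.add(job)
    return catalog


# Usage with made-up example data.
crawled = [
    {"path": "/data/clicks/2016-07-12", "format": "csv", "size_bytes": 1024},
    {"path": "/data/users/snapshot", "format": "json"},
]
logs = [
    ("daily_report", "read", "/data/clicks/2016-07-12"),
    ("click_logger", "write", "/data/clicks/2016-07-12"),
    ("unknown_job", "read", "/data/missing"),
]
catalog = build_catalog(crawled, logs)
print(sorted(catalog["/data/clicks/2016-07-12"].readers))  # ['daily_report']
```

The log-derived readers and writers are inferences, not ground truth: a path the crawler never saw is silently dropped, and a job whose access was never logged simply won't appear. That mirrors the caveat quoted above, that post-hoc metadata can't always be determined with complete certainty.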