Downes.ca ~ Stephen's Web ~ Incremental Jobs and Data Quality Are On a Collision Course - Part 1 - The Problem

Stephen Downes

Knowledge, Learning, Community

The exciting bit comes in the first paragraph, where Jack Vanlightly references "the rise of DuckDB and its message that big data is dead." This leads to a world of incremental data processing jobs using data sets that are "inherently small, corresponding to things like people, products, marketing campaigns, sales funnel, win/loss rates, etc." The problem, though, is what happens to data quality. "Bad things happen when uncontrolled changes collide with incremental jobs that feed their output back into other software systems or pollute other derived data sets.... The ingest-raw-data->stage->clean->transform approach has a huge amount of inertia and a lot of tooling, but it is becoming less and less suitable as time passes."
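To make the collision concrete, here is a minimal sketch of what an "uncontrolled change" might do to an incremental job. It is not taken from Vanlightly's article; the function names, field names, and data are all hypothetical, and the job is reduced to a running per-customer total held in a plain dictionary.

```python
# Hypothetical illustration: every name and value here is invented for the sketch.

def incremental_revenue(totals, batch):
    """Fold a new batch of order rows into running per-customer totals."""
    for row in batch:
        # .get() with a default hides a missing field instead of failing.
        amount = row.get("amount", 0)
        totals[row["customer"]] = totals.get(row["customer"], 0) + amount
    return totals


def checked_incremental_revenue(totals, batch, required=("customer", "amount")):
    """Same job, but reject the whole batch if any row lacks an expected field."""
    for row in batch:
        missing = [field for field in required if field not in row]
        if missing:
            raise ValueError(f"batch rejected, missing fields: {missing}")
    return incremental_revenue(totals, batch)


day1 = [{"customer": "a", "amount": 10}, {"customer": "b", "amount": 5}]
# Assumed upstream change: "amount" is renamed to "amount_usd" without notice.
day2 = [{"customer": "a", "amount_usd": 20}]

totals = incremental_revenue({}, day1)
totals = incremental_revenue(totals, day2)
print(totals)  # {'a': 10, 'b': 5}
```

Customer "a" should total 30 but stays at 10, and nothing fails, so the undercount flows into whatever consumes these totals. Passing the same second batch to checked_incremental_revenue raises a ValueError before any totals are touched, which is one simple way (among many) to stop the bad batch at the boundary.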


Stephen Downes, Casselman, Canada
stephen@downes.ca

Copyright 2024
Last Updated: Nov 14, 2024 11:21 p.m.

Creative Commons License.