We are processing petabytes of data every day; only a few kilobytes are relevant. Storage is cheap, but processing is still expensive, especially when the users need to wait for it.
Back in the monolithic/RDBMS platform era, we took care not to store too much data and optimized our queries as much as we could. Now, with our denormalized, microservices-based data pipelines, we store everything we can, just in case.
As we design our next-gen data platforms, we need to think deeply about the consumption layer and have well-defined maps (aka APIs) where users can quickly discover the data treasures. Data and product thinking need to converge.