We are processing petabytes of data every day; only a few kilobytes are relevant. Storage is cheap, but processing is still expensive, especially when the users need to wait for it.
Back in the monolithic/RDBMS platform era, we cared about not to store too much data and optimizing the queries as much as we could. Now with our de-normalized, microservices-based data pipelines, we are storing everything we can just in case.
As we design our next-gen data platforms, we need to think deeply about the consumption layer and have well-defined maps (aka APIs) where users can quickly discover the data treasures. Data and product thinking need to converge.