Data lakes power UIs, not engineers

“Data (…) is typically stored in a distributed file store that can hold high volumes of large files in various formats. This kind of store is often called a data lake.”
Big data architectures

Big data architectures are often designed backward: engineers start with the data sources (the inputs). The bigger the data and the faster it needs to be processed (real time), the more exciting it is for engineers to design the pipelines.

The right approach is the opposite: start with the outputs. Are you powering a dashboard? About what? For which period? At what level of granularity? How often will users check the dashboard?

Perhaps you are combining and de-duplicating data to create a source of truth for different entities across your data lake.

Or you might want to predict something. Which metrics do you need to predict? How often? With what accuracy?

As is true of most product design work, you need to start with the end user: what problem are you trying to solve, and how are you going to present the solution to the user? The final UI will determine how you need to orchestrate your big data architecture.
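
To make the idea concrete, here is a minimal sketch of what "starting with the outputs" could look like in code. Everything in it is hypothetical: the `OutputSpec` fields, the `plan_pipeline` helper, and the one-minute threshold are assumptions made for illustration, not a recommendation or an existing library. The answers to the dashboard questions go in first, and the coarse pipeline requirements fall out of them.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class OutputSpec:
    """Hypothetical description of what the end user will see."""
    name: str
    granularity: timedelta     # smallest time bucket shown in the UI
    lookback: timedelta        # period the dashboard covers
    check_interval: timedelta  # how often users look at the dashboard


def plan_pipeline(spec: OutputSpec) -> dict:
    """Derive coarse pipeline requirements from the output spec.

    The one-minute threshold is an illustrative assumption.
    """
    # Data does not need to be fresher than the rate at which users check it.
    refresh_every = spec.check_interval
    # Only sub-minute freshness is treated as a reason to stream.
    mode = "streaming" if refresh_every < timedelta(minutes=1) else "batch"
    # Number of pre-aggregated time buckets the UI has to display.
    buckets = spec.lookback // spec.granularity
    return {"mode": mode, "refresh_every": refresh_every, "buckets": buckets}


# A dashboard with daily granularity over 90 days, checked once a week.
spec = OutputSpec(
    name="weekly sales review",
    granularity=timedelta(days=1),
    lookback=timedelta(days=90),
    check_interval=timedelta(days=7),
)
plan = plan_pipeline(spec)
print(plan["mode"], plan["buckets"])
```

In this sketch, a dashboard that users open once a week resolves to a batch job producing 90 daily buckets, whatever the size or velocity of the raw inputs.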

Author

Leo Celis

Founder & CEO at InTheValley

I help startups fix engineering teams that should be moving faster. If you're scaling a startup, you've probably felt the pain: great people on paper, but execution feels slow. I've been building remote teams for startups since 2005 — engineers you can trust who actually deliver and know how to leverage AI to ship faster.
