Data Pipeline Strategy

If you are running campaigns across multiple channels, like Facebook or Google, you need to pull data from different Ads APIs to get a clear picture of how you are spending your ad dollars.

Usually, the story goes: we are spending the majority of our budget on Facebook Ads, so let’s integrate with their Marketing API. Then, we want to explore Twitter and see if it works better than Facebook, so let’s integrate with their Ads API too.

And that new integration ends up being a new, isolated monolithic app that stores data in a different database, causing both tech debt and an expensive data warehouse.

This is when a Data Pipeline Strategy comes in handy:

Each channel tends to change its ad metadata or stats format over time, so make sure to store the data in a denormalized way. Elasticsearch, Cassandra, MongoDB, or any other NoSQL database will do here.
You will be pulling a lot of data, potentially every hour. Watch out for each API’s rate limits and plan accordingly. A microservice architecture can help you distribute the load across multiple services.
The channels will tell you that they eventually correct the data they have already reported. You need to create jobs on different schedules to do backfills.
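
To make the denormalized storage and backfill points concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the in-memory `store` dict stands in for whatever NoSQL database you pick, `fetch` is a hypothetical function that wraps one channel's API, and the `ad_id` field name is made up. It does not handle rate limits.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for a NoSQL store, keyed by (channel, ad_id, day).
Store = Dict[Tuple[str, str, str], dict]


def backfill_days(today: date, lookback_days: int) -> List[date]:
    """Return the days to re-pull, oldest first, ending with yesterday."""
    return [today - timedelta(days=n) for n in range(lookback_days, 0, -1)]


def upsert_stats(store: Store, channel: str, day: date, rows: Iterable[dict]) -> None:
    """Save each row as one denormalized document; a re-pull overwrites it."""
    for row in rows:
        key = (channel, str(row["ad_id"]), day.isoformat())
        # Keep the raw payload as-is, so a format change on the channel side
        # does not force a schema migration on our side.
        store[key] = {"channel": channel, "day": day.isoformat(), "raw": dict(row)}


def run_backfill(
    store: Store,
    channel: str,
    fetch: Callable[[date], Iterable[dict]],  # hypothetical API wrapper
    today: date,
    lookback_days: int,
) -> None:
    """Re-pull the last `lookback_days` days and overwrite the stored stats."""
    for day in backfill_days(today, lookback_days):
        upsert_stats(store, channel, day, fetch(day))
```

Because the key is the same on every run, you can schedule this job as often as you like, with a short lookback hourly and a longer one daily, without creating duplicates.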

If you are running on AWS infrastructure (and most people are, or are migrating to it), my advice is to take a look at their Data Pipeline service. After you have pulled gigabytes of data, you can use Data Pipeline to move the data around, aggregate it, and make it available to your team.
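
As a rough idea of what an aggregation step computes, here is a small Python sketch. It is not the AWS Data Pipeline API, just plain Python. It assumes each stored document has a `channel` field and a `raw` payload, and the per-channel spend field names in `SPEND_FIELD` are placeholders, not the channels' real field names.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Placeholder mapping: which raw field holds the spend for each channel.
SPEND_FIELD = {"facebook": "spend", "google": "cost"}


def spend_by_channel(docs: Iterable[dict]) -> Dict[str, float]:
    """Sum ad spend per channel from denormalized documents."""
    totals: Dict[str, float] = defaultdict(float)
    for doc in docs:
        field = SPEND_FIELD[doc["channel"]]
        totals[doc["channel"]] += float(doc["raw"][field])
    return dict(totals)
```

The per-channel mapping is the price of keeping raw payloads: the differences between channels get resolved once, at aggregation time, instead of in every integration.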

Author

Leo Celis

Founder & CEO at InTheValley

I help startups fix engineering teams that should be moving faster. If you're scaling a startup, you've probably felt the pain: great people on paper, but execution feels slow. I've been building remote teams for startups since 2005 — engineers you can trust who actually deliver and know how to leverage AI to ship faster.
