If you are running campaigns across multiple channels, like Facebook or Google, you need to pull data from several different Ads APIs to get a clear picture of how you are spending your ad dollars.

Usually, the story goes: we are spending the majority of our budget on Facebook Ads, so let’s integrate with their Marketing API. Then, we want to explore Twitter and see if it works better than Facebook, so let’s integrate with their Ads API too.

And that new integration ends up being yet another isolated, monolithic app storing data in its own database, which adds tech debt and drives up the cost of your data warehouse.

This is when a Data Pipeline Strategy comes in handy:

  1. Each channel will change its ads metadata or stats format over time. Make sure to store the data in a de-normalized way; Elasticsearch, Cassandra, MongoDB, or any other NoSQL database will do here (see the first sketch after this list).
  2. You will be pulling a lot of data, potentially every hour. Watch out for each API’s rate limits and plan accordingly; a microservice architecture can help you distribute the load across multiple services (see the second sketch below).
  3. The channels will tell you that they eventually correct the stats they have already reported. You need to create jobs on different schedules to backfill that data (see the third sketch below).

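To make point 1 concrete, here is a minimal sketch of de-normalized storage using pymongo; the database name, collection name, channel labels, and document shape are assumptions, not a prescribed schema:

```python
from datetime import datetime, timezone

from pymongo import MongoClient

# Hypothetical connection and collection names.
client = MongoClient("mongodb://localhost:27017")
raw_stats = client["ads_warehouse"]["raw_daily_stats"]


def store_raw_stats(channel: str, day: str, payload: dict) -> None:
    """Upsert the raw API response as-is, keyed by channel + date.

    The payload is stored de-normalized: whatever fields the channel
    returns today are kept verbatim, so a format change on their side
    does not break the pipeline.
    """
    raw_stats.update_one(
        {"channel": channel, "date": day},
        {
            "$set": {
                "payload": payload,
                "pulled_at": datetime.now(timezone.utc),
            }
        },
        upsert=True,
    )


# Example: store whatever the Facebook Marketing API returned for one day.
store_raw_stats("facebook", "2022-03-01", {"impressions": 1200, "spend": "34.50"})
```

Upserting by channel + date keeps re-pulls idempotent, which matters once the backfills in point 3 start rewriting history.
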
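For point 2, here is a sketch of a pull that backs off when it gets throttled. Every channel exposes rate limits differently, so the endpoint, the 429 status, and the Retry-After header below are stand-ins for whatever the specific Ads API documents:

```python
import time

import requests

# Placeholder endpoint; real channels have their own URLs and auth schemes.
STATS_URL = "https://api.example-ads-channel.com/v1/stats"


def pull_daily_stats(account_id: str, day: str, api_token: str) -> dict:
    """Pull one account's stats for one day, waiting out any throttling."""
    while True:
        resp = requests.get(
            STATS_URL,
            params={"account_id": account_id, "date": day},
            headers={"Authorization": f"Bearer {api_token}"},
            timeout=30,
        )
        if resp.status_code == 429:
            # Too Many Requests: sleep for the advertised interval (assumed header).
            wait = int(resp.headers.get("Retry-After", 60))
            time.sleep(wait)
            continue
        resp.raise_for_status()
        return resp.json()
```

Running one such puller per channel as its own small service is what lets you tune schedules and rate limits independently instead of throttling everything to the slowest API.
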
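For point 3, a backfill is just the same pull re-run over a lookback window on a slower schedule. A sketch, assuming you already have a pull-and-store helper like the ones above:

```python
from datetime import date, timedelta
from typing import Callable


def backfill(channel: str, lookback_days: int,
             pull_and_store: Callable[[str, str], None]) -> None:
    """Re-pull and re-store the last N days for one channel.

    `pull_and_store(channel, day)` is whatever function you already use
    for the hourly pull (e.g. the pull + upsert helpers sketched above).
    """
    today = date.today()
    for offset in range(1, lookback_days + 1):
        day = (today - timedelta(days=offset)).isoformat()
        pull_and_store(channel, day)


# Wire these to whatever scheduler you already run (cron, Celery beat, etc.):
#   hourly  -> backfill("facebook", 1, pull_and_store)   # same-day corrections
#   daily   -> backfill("facebook", 7, pull_and_store)   # recent restatements
#   weekly  -> backfill("facebook", 28, pull_and_store)  # long-tail corrections
```
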
If you are running on AWS infrastructure (and most people are, or are migrating to it), my advice is to take a look at their Data Pipeline service. After you have pulled gigabytes of raw stats, you can use Data Pipeline to move the data around, aggregate it, and make it available to your team.
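
For example, you can create and activate a pipeline programmatically with boto3. This is only a sketch: the pipeline name, IAM roles, S3 log path, worker group, and aggregation command are placeholders, and a real definition would typically also declare data nodes and compute resources:

```python
import boto3

client = boto3.client("datapipeline", region_name="us-east-1")

# Create an empty pipeline (names and uniqueId are placeholders).
pipeline_id = client.create_pipeline(
    name="ads-stats-aggregation", uniqueId="ads-stats-aggregation-v1"
)["pipelineId"]

# Minimal definition: a daily schedule plus one shell activity that runs
# a hypothetical aggregation script on an existing worker group.
client.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        {
            "id": "Default",
            "name": "Default",
            "fields": [
                {"key": "scheduleType", "stringValue": "cron"},
                {"key": "schedule", "refValue": "DailySchedule"},
                {"key": "role", "stringValue": "DataPipelineDefaultRole"},
                {"key": "resourceRole", "stringValue": "DataPipelineDefaultResourceRole"},
                {"key": "pipelineLogUri", "stringValue": "s3://my-ads-bucket/logs/"},
            ],
        },
        {
            "id": "DailySchedule",
            "name": "DailySchedule",
            "fields": [
                {"key": "type", "stringValue": "Schedule"},
                {"key": "period", "stringValue": "1 day"},
                {"key": "startAt", "stringValue": "FIRST_ACTIVATION_DATE_TIME"},
            ],
        },
        {
            "id": "AggregateStats",
            "name": "AggregateStats",
            "fields": [
                {"key": "type", "stringValue": "ShellCommandActivity"},
                {"key": "workerGroup", "stringValue": "ads-aggregation-workers"},
                {"key": "command", "stringValue": "python aggregate_ads_stats.py"},
            ],
        },
    ],
)

client.activate_pipeline(pipelineId=pipeline_id)
```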

Leo Celis