CDP as in "Pipeline", not "Platform"

Why we stopped relying on traditional CDPs years ago, choosing instead to create Customer Data Pipelines on top of Data Warehouses.

Whether we’re working with a fast growing startup like GOAT or a Fortune 500 like Nike, GM or Walmart, a Customer Data Platform is usually a key part of our digital strategy. This is actually more important to us than a CRM even.

And yet, most CDPs our clients have deployed fall short of their promises.

This doesn’t surprise us. We’ve been building Customer Data Platforms since our days working with Starbucks in 2015. And yet, we stopped years ago.

Instead, we’re focusing on Customer Data Pipelines.

Let me explain.

What does a CDP do?

Let’s step back a bit and look at how a CDP works.

Essentially, you have a few steps:

  1. Ingest data coming from multiple sources: 1st party apps (mobile, Web, WeChat, backend…) or 3rd party apps (Analytics, Advertising, CRM, Customer service, e-commerce platforms, databases, …).
  2. Process the data. This usually means running through logging, validation and normalization.
  3. Identify the data. We maintain an identity graph that ties together all the customer identifiers across devices and touchpoints. We can use deterministic (e.g. matching phone numbers) or probabilistic tactics.
  4. Store the data.
  5. Act on the data, which usually comes down to activation (e.g. personalized ads, targeted messaging…).

Overview of a Customer Data Platform

This concept of identity resolution is where the magic happens. It’s not particularly hard to understand, but quite difficult to do well, especially at scale and in real-time.

Almost everything you would use a CDP for relies on the identity graph: product and content recommendation, segmentation, personalization, …

The problem with “platforms”

All that sounds great, but as I mentioned, most CDPs fall short of their promises:

They lock you out. Ultimately, you want complete control over your customer data, particularly in the context of data regulations like PIPL or GDPR. Using a CDP, you’re delegating a large part of your data privacy and data governance away.

They end up being another data silo. Most CDPs are optimized for a few use cases, often times around marketing. But all teams in your organization need access to this data: executive teams, legal, customer support, IT… It’s usually difficult, and costly, to get data to flow through the entire organization.

They’re limited. There’s only so much you can do through simple user interfaces, especially considering where the industry is going with advanced analytics (think AI and ML). At some point you’ll need to use expert tools, as close to the underlying tech and data as necessary.

You need a Pipeline, not a Platform

At the end of the day, a lot of what CDPs are trying to solve isn’t new: most mature organizations have already invested in their data infrastructure and capabilities.

More specifically, they usually already have a data warehouse. And modern data warehouses are pretty powerful, offering a long tail of features, some of which go well beyond what would be possible with a CDP out of the box.

So, why not build a Customer Data Pipeline on top of your data warehouse?

Enable the collection, processing and identity resolution steps and keep your entire data stack on-premise.

And consider the job done right after the identity resolution step. The CDP is here to collect the data, build the identity graph and store all the customer data along with it in your data warehouse.

What happens downstream (in-app personalization, targeted campaigns, segmentation, real-time reporting and analytics…) is best addressed on a case-by-case basis.

Your CDP can push the data to other platforms (e.g. Salesforce Marketing Cloud), some teams may directly fetch data from the warehouse with a BI tool, or you may choose to invest in building a custom piece of software to enrich user profiles in your first-party apps.

This approach has many advantages:

  • We get a fully open and customizable stack (e.g. custom identity resolution).
  • We retain full control of the data (great for things like PIPL).
  • We can safely integrate sensitive data sets (e.g. transactional data).
  • We can leverage data warehouse technology for things like advanced analytics (e.g. predictive analytics).
  • We make data easy to access and integrate out of the data warehouse for all teams.
  • We use best-of-breed solutions for downstream problems.
  • We reduce cost. No need to purchase an expensive CDP solution, no need to maintain multiple copies of the customer data or maintain multiple architectures.

Our usual stack

We usually recommend Rudderstack, but other solutions exist: Snowplow, Airbyte, …

Rudderstack comes out with over 150 integrations (Adobe Analytics, Salesforce, Marketo, GA, Tiktok Ads, …) and allows us to roll out our own (we created our own WeChat mini-program SDK for example).

Combine it with things like dbt, to create materialized views for other teams to consume, and Metabase to create interactive dashboards and you already have a pretty compelling data stack.

Get started sooner rather than later…

This may be a nuanced point, but choosing to look at a CDP as just a data pipeline problem has greatly improved the quality of our digital execution. It made it a lot easier to compartmentalize roles and responsibilities between CDP, digital products, martech, reporting, …

We’re a lot more likely to support other teams across the entire organization: building a referral program for the sales team, creating real-time dashboards for the executive team, pushing customer segments and scoring to the marketing team’s Social CRM, helping IT and legal comply with PIPL, personalize product recommendations in WeChat mini-programs…

Now, you may think you don’t need a CDP just yet. But keep in mind that you’ll need to collect data for months before you can do anything useful, so better standing up your data infrastructure as early as possible.

Moreover, in China, PIPL has made it a must for international organizations. You simply can’t have your Chinese customers’ data stored across the globe.

Finally, keep in mind that this is just a small part of an effective omnichannel strategy. We’ve started covering some of the other technical bricks (like our middleware strategy), but getting your organization to shift to a data-driven culture is the real challenge. We’ll hopefully get to cover some of that in our upcoming “Omnichannel Blueprint” ebook.

Ronan Berder
Posted on October 13, 2022 in Technology

Share this post

Scan to open in WeChat

Stay tuned

Follow our newsletter or our WeChat account; every other week you'll receive news about what we're up to, our events and our insights on digital products, omnichannel, cross-border ecommerce or whatever the team fancies.

  • Add us on WeChat

  • Subscribe to our newsletter