There’s never been a better time to be a data engineer, in large part due to the rapid innovation and rising popularity of modern warehouses and data processing tools. When working with customer data, though, collecting all of your relevant information in one place and making it usable around the organization is a non-trivial challenge.

Customer Data Platforms (CDPs) have tried to solve for data collection and activation, but unfortunately most of them make the problem worse by creating additional data silos and integration gaps. …


Overview

This post looks at Mattermost's customer data stack, which lets them use real-time data from multiple sources to drive a range of analytics use cases. We also look at how this data stack aligns with their open-source values and complies with their strict data privacy and security requirements.

Who is Mattermost?

Mattermost is an open-source messaging and collaboration platform and a popular alternative to enterprise business communication tools like Slack. It is built for high-trust environments, is fully self-hosted, and brings all enterprise-wide communication into one place.

As you’d expect from an open-source tool, they offer hundreds…


Over the last five years, cloud SaaS tools have made the jobs of developers and data engineering teams much easier in many ways. One of the most profound improvements has been the ability for teams to ‘outsource’ the building and infrastructure management of core functionality.

It’s a good time to be building software when Stripe manages payments infrastructure, Okta takes care of SSO, Algolia provides robust search, and so on.

When it comes to customer data, though, cloud SaaS tooling often tells a different story.

There’s no shortage of powerful software for creating audiences, running user analytics and other use…


In our previous post, we discussed why Apache Kafka wasn’t the right solution for RudderStack’s core streaming/queueing engine. Instead, we built our own streaming engine on top of PostgreSQL. This article discusses the internals of our queueing system implementation in more detail.

Queueing Systems: An Introduction

The core concept behind any queueing system is trivial. A CS101 implementation involves a linked list of items. A queueing system adds elements (or, in our case, events) to one end while consuming them from the other, as shown in the figure below. Once the system consumes an event, one can remove it from the list.
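The linked-list idea described above can be sketched in a few lines of Python. This is only an illustrative in-memory toy, not RudderStack's PostgreSQL-backed implementation; the names `EventQueue`, `enqueue`, and `dequeue` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class _Node:
    """One linked-list cell holding a single event."""
    event: Any
    next: Optional["_Node"] = None


class EventQueue:
    """A CS101-style FIFO queue backed by a singly linked list (hypothetical name)."""

    def __init__(self) -> None:
        self._head: Optional[_Node] = None  # consumers read from this end
        self._tail: Optional[_Node] = None  # producers append to this end
        self._size = 0

    def enqueue(self, event: Any) -> None:
        """Add an event to the tail of the list."""
        node = _Node(event)
        if self._tail is None:
            self._head = self._tail = node
        else:
            self._tail.next = node
            self._tail = node
        self._size += 1

    def dequeue(self) -> Any:
        """Consume the oldest event and remove it from the list."""
        if self._head is None:
            raise IndexError("dequeue from an empty queue")
        node = self._head
        self._head = node.next
        if self._head is None:
            self._tail = None
        self._size -= 1
        return node.event

    def __len__(self) -> int:
        return self._size
```

Events come out in the order they went in: after enqueueing `"a"`, `"b"`, and `"c"`, the first `dequeue()` call returns `"a"` and two events remain.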


Overview

In this post, we will see how to access and query your Amazon Redshift data using Python. We follow two steps in this process:

  • Connecting to the Redshift warehouse instance and loading the data using Python
  • Querying the data and storing the results for analysis

Because Redshift is compatible with PostgreSQL, we use the Python psycopg library to access and query the data from Redshift. We then store the query results as a dataframe in pandas using the SQLAlchemy library.
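As a rough sketch of those two steps, assuming the `psycopg2`, `SQLAlchemy`, and `pandas` packages are installed: the helper names `build_redshift_url` and `query_to_dataframe`, the host, the credentials, and the table name are all placeholders invented for this example, not values from the original post. Port 5439 is Redshift's default.

```python
from urllib.parse import quote_plus


def build_redshift_url(user, password, host, database, port=5439):
    """Build a SQLAlchemy URL that talks to Redshift through the psycopg2 driver.

    Hypothetical helper: credentials are URL-escaped so special characters
    in a password do not break the connection string.
    """
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def query_to_dataframe(url, sql):
    """Run a query and return the result as a pandas DataFrame.

    Imports are done lazily so the URL helper above works even when
    pandas or SQLAlchemy are not installed.
    """
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    with engine.connect() as conn:
        return pd.read_sql(text(sql), conn)


if __name__ == "__main__":
    # Placeholder connection details; replace them with your own cluster's values.
    url = build_redshift_url(
        user="analyst",
        password="s3cret!",
        host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
        database="dev",
    )
    df = query_to_dataframe(url, "SELECT * FROM events LIMIT 10")
    print(df.head())
```

Keeping the URL construction separate from the query call makes it easy to swap in credentials from environment variables or a secrets manager later.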

The purpose of this exercise is to leverage the statistical techniques available in Python to…


Overview

In this post, we see how to load Google BigQuery data using Python and R, and then query the data to get useful insights. We use the Google Cloud BigQuery library to connect to BigQuery from Python, and the bigrquery library to do the same from R.

We also look into the two steps of manipulating the BigQuery data using Python/R:

  • Connecting to Google BigQuery and accessing the data
  • Querying the data using Python/R

In this post, we assume that you have all your customer data stored in Google BigQuery.

If you are interested in learning more about how to…


In this post, we cover two key algorithms for mining clickstream data: Markov chains and the cSPADE algorithm. These techniques allow you to leverage the clickstream data to get a 360-degree view of your customers and personalize their overall product experience.

We also focus on the two key problems that these data mining techniques solve:

  • Predicting customer clicks to create data-driven customer personas, based on their behavior
  • Segmenting clickstream data based on user profiles and the actions performed by these users
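To make the first problem concrete, here is a minimal sketch of click prediction with a first-order Markov chain, assuming each session is an ordered list of page names. The function names `fit_transitions` and `predict_next` and the sample sessions are made up for illustration; the original post may structure its model differently.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Estimate first-order transition probabilities P(next page | current page).

    `sessions` is a list of click sequences, e.g. [["home", "pricing"], ...].
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Count every consecutive (current, next) pair in the session.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1

    transitions = {}
    for page, followers in counts.items():
        total = sum(followers.values())
        transitions[page] = {nxt: n / total for nxt, n in followers.items()}
    return transitions


def predict_next(transitions, page):
    """Return the most likely next page, or None if the page was never left."""
    followers = transitions.get(page)
    if not followers:
        return None
    return max(followers, key=followers.get)


if __name__ == "__main__":
    # Made-up sample sessions for illustration only.
    sessions = [
        ["home", "pricing", "signup"],
        ["home", "pricing", "docs"],
        ["home", "docs"],
        ["home", "pricing", "signup"],
    ]
    model = fit_transitions(sessions)
    print(model["home"])
    print(predict_next(model, "pricing"))
```

A first-order chain only looks at the current page. Higher-order chains, or sequence mining with cSPADE, capture longer patterns at the cost of needing more data.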

Note: For this post, we assume that you have your clickstream data already collected and stored…


We recently came across this question on Quora:
What are the benefits of a data warehouse for a web startup over third-party analytics tools like Google Analytics and Mixpanel?

It’s a good question, and the answer isn’t necessarily simple, in large part due to the problem of scale. Your analytics needs in the early stages of your business are very different from your needs once the business is much larger and processes far more data.

Popular analytics tools like Google Analytics, Mixpanel, and Amplitude are excellent products. In fact, we use many of them ourselves here at RudderStack. But that doesn’t mean…


RudderStack is an open-source Customer Data Infrastructure for collecting and routing your customer data for analytics. With a special focus on data privacy, security, and reliability, RudderStack is enterprise-ready and gives you the flexibility to transform your event data to suit your business requirements.

In this interview with Software Engineering Daily, Soumyadeb Mitra, the founder and CEO of RudderStack, talks about the Customer Data Infrastructure space and RudderStack.

You can listen to the entire interview below:

https://softwareengineeringdaily.com/?powerpress_pinw=9383-podcast

Customer Data Infrastructure with Soumyadeb Mitra

Here is the transcript of the interview:

JM: Soumyadeb Mitra, welcome to the show.

Soumyadeb

RudderStack

The Customer Data Platform for Developers, Written in Go and React: https://rudderstack.com
