Producing and Analyzing Rich Data with PostgreSQL

Last month, I presented at the Rich Data Summit, a conference here in San Francisco focused on turning big data into rich, meaningful data. We were really excited to see a conference focused less on fancy new machine learning algorithms and more on the cleansing and processing techniques and technologies required to support sound analysis. It’s an area that deserves more attention, as it unfortunately makes up the majority of a data team’s work these days.

As a data engineer at Chartio, a large part of my work has involved helping data teams get the most out of their data pipelines and warehouses, so the topic of data cleansing and processing is near and dear to me. Over the past five years or so, I’ve noticed a perception that relational databases are only good at descriptive statistics (count, sum, avg, etc.) on medium-sized structured data sets. In other words, SQL just doesn’t work for inferential, predictive, or causal analysis on larger or unstructured data sets. Although this may have been true five years ago, it’s a lot less true today.
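As one illustration of how modern PostgreSQL goes beyond count/sum/avg, it ships with built-in statistical aggregates like `corr`, `regr_slope`, and `regr_intercept` that fit a simple linear regression directly in SQL. The sketch below assumes a hypothetical `sales` table with `ad_spend` and `revenue` columns:

```sql
-- Ordinary least-squares fit of revenue against ad_spend,
-- computed entirely inside the database.
-- ("sales" and its columns are hypothetical, for illustration.)
SELECT
    corr(revenue, ad_spend)           AS correlation,
    regr_slope(revenue, ad_spend)     AS slope,      -- revenue per unit of ad_spend
    regr_intercept(revenue, ad_spend) AS intercept   -- predicted revenue at zero spend
FROM sales;
```

These aggregates compose with `GROUP BY` and window functions, so per-segment model fits are one query away rather than a round trip to an external tool.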

I took the opportunity at the Rich Data Summit to talk a little about how far relational databases have come and how they compare to non-relational systems. In particular, I spoke about the PostgreSQL extension system and many of the great extensions that can be used for producing and analyzing rich data. Below is a video of the talk I gave, along with my slides.
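For readers who haven’t used it, the extension system the talk covers is exposed through a couple of simple commands. A minimal sketch (extension availability depends on what packages are installed on your server, and `CREATE EXTENSION` typically requires superuser or owner privileges):

```sql
-- See which extensions this server can install.
SELECT name, default_version, comment
FROM pg_available_extensions
ORDER BY name;

-- Install one; tablefunc, for example, adds crosstab()
-- for pivot-table-style reshaping of query results.
CREATE EXTENSION IF NOT EXISTS tablefunc;
```

Once created, an extension’s functions, types, and operators are available to queries like any built-in feature, which is what makes PostgreSQL such a flexible base for analysis work.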

If you’re interested in learning more, I’ll be giving a much more in-depth talk on this subject with an example analysis at PGConf SV later this month.