Data Stack

Table of Contents

  1. What is a software stack?
  2. What is a data stack?
  3. Thinking in “Full Stack Data” terms
  4. The SHPPD Stack
  5. The MASTR Stack

What is a software stack?

The term “software stack” is commonly used to indicate a specific set of technologies that together compose the front end and back end of an application. The most commonly cited example is the LAMP stack (Linux, Apache, MySQL, PHP/Python/Perl); a personal connection of mine is an early evangelist of the AND (Angular, Node, Docker) stack.

What is a data stack?

A data stack is a similar infrastructure “stack”, but composed around specific data storage and consumption services. I propose that, as frameworks for reproducible visualization and reproducible pipelining become more common, data work gains direct counterparts to the traditional front-end / back-end engineering roles.

Holdover titles such as BI designer mostly focus on using visualization platforms rather than writing code for visualization, and so don’t meet that part of the criteria for “front-end” data engineering.

Similarly, building and maintaining Airflow instances, ETL as code, various adapters and protocols, and more is under-represented by the title of “data engineer”. “Back-end data engineering” names the domain focused on these implementations, and correctly marks it as having only the softest focus on data analysis, interpretation, and/or visualization.
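As a rough illustration of what “ETL as code” and an orchestrator's dependency handling involve, here is a minimal Python sketch. It is not Airflow's API: the task names, the sample rows, and the `run_pipeline` helper are all hypothetical, and the ordering comes from the standard library's `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task once, after all of the tasks it depends on.

    tasks:        maps a task name to a function taking the results so far.
    dependencies: maps a task name to the set of task names it depends on.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        results[name] = tasks[name](results)
    return order, results


# Hypothetical tasks and made-up rows, purely for illustration.
tasks = {
    # Pretend these rows came from an upstream API or file.
    "extract": lambda results: [
        {"id": 1, "amount": "10.5"},
        {"id": 2, "amount": "4.5"},
    ],
    # Cast the string amounts to floats.
    "transform": lambda results: [
        {**row, "amount": float(row["amount"])} for row in results["extract"]
    ],
    # Stand-in for a database write: report how many rows would be loaded.
    "load": lambda results: len(results["transform"]),
}

dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

order, results = run_pipeline(tasks, dependencies)
```

A real orchestrator adds scheduling, retries, logging, and state on top of this, which is a large part of the back-end work described above.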

A significant inspiration for me in conceptualizing these ideas is GitLab's “Meltano” project (https://meltano.com/docs/).

When given this framework of:

  • Model
  • Extract
  • Load
  • Transform
  • Analyze
  • Notebook
  • Orchestrate

I believe it becomes very clear that the title of “full stack” data practitioner is truly aspirational: very few people gain enough practical experience across all of these stages to actively hold it.
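To make the breadth of those stages concrete, here is a small Python sketch that touches several of them with an in-memory SQLite database. The mapping of stage names to functions is my own loose reading, not Meltano's definitions, and the table names and sample rows are made up. The Notebook stage is not represented.

```python
import sqlite3

# Made-up source rows standing in for an upstream system.
RAW_ROWS = [("north", 120.0), ("south", 80.0), ("north", 30.0)]


def extract():
    """Extract: pull rows from the (hypothetical) source."""
    return list(RAW_ROWS)


def load(conn, rows):
    """Model + Load: declare the raw table's schema and insert rows unchanged."""
    conn.execute("CREATE TABLE raw_sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)


def transform(conn):
    """Transform: derive an aggregated table from the raw one."""
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM raw_sales GROUP BY region"
    )


def analyze(conn):
    """Analyze: query the transformed table for reporting."""
    return conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()


def orchestrate():
    """Orchestrate: run the stages in order against one connection."""
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, extract())
        transform(conn)
        return analyze(conn)
    finally:
        conn.close()


report = orchestrate()
```

Even in this toy form, each function corresponds to a different skill set in practice, which is why covering all of them is rare.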

Thinking in “Full Stack Data” Terms

When I think about developing a project now, I make the effort

The SHPPD Stack

Components:

  • Salesforce
  • Heroku
  • Postgres
  • PowerBI
  • Datalinks / DBT

The MASTR Stack

Components:

  • MS SQL Server
  • Azure
  • Tableau
  • R (Microsoft R Open)