Is it possible to modernize the data pipeline without hiring a new data team?

In the last 20 years, data processing capabilities have increased by at least 1000x thanks to cheaper storage, cheaper bandwidth, increased compute, and most importantly, advancements in distributed data processing techniques (MPP, Hadoop, Spark). Processing a terabyte’s worth of data used to cost millions of dollars in CapEx and OpEx; it now costs hundreds of dollars, if that. Despite these advances, with data flowing in real time from all aspects of our analog and digital lives, the volumes of data collected and processed by enterprises have grown even faster, and the old terabyte problem is now a petabyte problem for many enterprises.

In the past generation, the OLTP and OLAP worlds were bridged by ETL and ESB. Multibillion-dollar companies such as Oracle, Teradata, Informatica, Tibco, and MuleSoft emerged as leaders in these fields. Most of their technology innovations occurred during the “terabyte-scale” era. Since then, a new generation of cloud-enabled services has begun to disrupt these incumbents. Companies such as Cloudera, MongoDB, Snowflake, Databricks, and Confluent began to promote new technologies and best practices to ensure companies could handle “petabyte scale.”

Many enterprises rushed ahead to build pristine “data lakes,” often only to discover that they had “data swamps.” Data lakes were focused more on storing vast amounts of data indefinitely than on ensuring that the quality and format were correct. As a result, a new class of tools emerged to solve legacy problems such as data preparation, data cataloging, and ETL. Many of these products were based on open source technologies contributed by hyperscale internet companies. They were complex, unstable, and had steep learning curves without adequate documentation or tutorials. Furthermore, they forced enterprises to invest in new talent who could do platform engineering work in order to get data to a useful state. These people were data engineers. They had skills in Java, Scala, distributed clustering, message queues, systems management, and provisioning, in addition to data analysis. Data engineers were difficult to source, and the very best worked at the largest successful internet startups (which paid them far better than most enterprises could afford).

Recently, however, we’ve seen the data world start to fully embrace cloud computing. As a result, there is a lot of growth and momentum in storing, processing, and analyzing data entirely in the cloud. In such a world, why would I want to spend time cobbling together my own tooling and building a data engineering competency?

This is when we came across our latest investment, Upsolver. Upsolver was founded by two data practitioners frustrated by the experience of having to manage a data lake in order to scale their ad-optimization startup. They toiled away at solving systems management problems (scaling, reliability, performance) instead of working with datasets. As a result, they pivoted their company to focus on the data pipeline (ETL) problem and built a sophisticated product that traditional DBAs or SQL analysts can use to handle modern data pipelines in the cloud.

Upsolver found early traction and was recently featured as an AWS Advanced Technology Partner. We loved that they were able to easily handle production “petabyte-scale” real-time workloads for their customers, but we were more impressed upon learning that these customers were able to realize the full value by upskilling their existing staff and resources instead of having to hire a new team or outsource implementation to a consulting firm.

One of our investment theses is that there is a large opportunity in creating heroes out of an underserved class of stakeholders. That’s how companies like VMware and Splunk got built. In Upsolver, we found a similar pattern in the forgotten DBA who had been left out of the journey to data lakes and the cloud. We are excited to announce that we are leading Upsolver’s $13M Series A financing along with our friends at Wing Venture Capital and our close collaborators Jeff Rothschild and Sohaib Abbasi.
