Airbyte's Bold Ambition To Build A Platform
Michel Tricot on how the new open-source standard for data integrations came to be—and what’s top of mind as it continues to grow.
As a founder, I saw that the problem many businesses face isn’t just gathering data; it’s integrating it and putting it to use quickly. Michel Tricot, cofounder of Airbyte, saw this dynamic too and set out to create an open-source data integration platform that can unify and connect different types of data.
In early 2022, the company reported that in just a year and a half, more than 20,000 companies had used Airbyte to sync data from sources such as PostgreSQL, MySQL, Facebook Ads, Salesforce, and Stripe to destinations that include Redshift, Snowflake, Databricks, and BigQuery. The team also saw a 6x increase in deployments around this time, from 400 per month in Q1 to 2,500 by the end of the year.
So as I spend time learning from developer tool leaders and reporting findings back to the community, Michel was high on my list. In this interview, he shares how the company evolved, his definition of product-led growth, and where he sees Airbyte, today and in the future, on the pyramid of data needs. (P.S. Follow along to stay up to date with each installment of the series.)
Can you explain Airbyte and the inspiration that got you there?
I've been working in the data space since 2007. I started in the medical space, then finance, then AdTech and marketing, and then Airbyte. I had a little bit more experience on the business side, like mapping technology, but I have always been in the data space. I was in different roles over the years as an engineer, manager, and director, and I was able to see the scale of, "What is the cost today of being a company that can leverage its data?"
Especially when you look back to around 2015, that's really when BigQuery and Snowflake started to get a lot of traction. So data has become way easier to process and extract information from. And it's great if the data is there, but every single company in the world is struggling with the same problem: How do you get data to a place where you can extract value from it?
When I think of Airbyte, I think of Fivetran, open source, and the advantage it brings by empowering a community to maintain the long tail of data sources an organization wants to use. Do Airbyte users use both Airbyte and Fivetran, or once they're using you, are they using only you?
In a sense, you can say that "Yes, Airbyte is an open-source version of Fivetran," but the way we're building Airbyte is different from Fivetran. Fivetran is a SaaS product. They have a set of connectors. They might add a few more, but that's pretty much it. What we're building is a platform: "We're going to provide you everything that you need to integrate data." It will have off-the-shelf connectors, which is how it is similar to Fivetran, but the end state will be one platform to unlock data movement from any source to any destination.
Okay, crystal clear. So you guys have really taken off. Congratulations. What did you get right?
I think what we got right was the fact that we were born just before COVID, and we got hit very hard by COVID. We had a product before we started Airbyte.
You had a product before you started Airbyte?
Yeah. We went through YC in early 2020, and we started with a different idea. We always had a focus on data integration and getting more people more access to more data. And the product we had when we were at YC was very geared toward marketing teams: getting more data on people visiting and interacting with websites, et cetera. Things were going well. We had a ton of very good conversations. People even started to pay us and integrate our product. And then COVID hit, and nobody wanted to talk to us anymore because marketing budgets were being cut across the board. For John and me, when that happened, we paused and thought, "What are we building? What do we want to achieve with the company we're starting?" Because we're putting our whole life, all our energy, into it. From there, we shut it down completely and went back to the drawing board.
Today's signals are pretty positive, but I think the future is going to prove it more. We just focused on, "Let's dig deeper than this surface problem that we're seeing. Let's go a bit deeper, and let's not be afraid of just shutting down the next idea if we're not satisfied with it."
One thing that I find fascinating in terms of the startup journey is trying to figure out who the ICP, the ideal customer profile, is. Was there a very specific ICP for you, or was it just all the people on the data team?
They all have different needs. When we talk about Airbyte, we like to think of Airbyte not as a marketplace but as this interface between users of data and producers of data. For producers, it means they're responsible for the data pipes, and making sure that the data lands in the right place. When you talk to data analysts, you talk to analytics engineers, and they tell you, "Why do I have to wait one month to get my HubSpot data into my warehouse?" And then on the other side, you have, "I'm tired of data analysts asking me to build all these data connectors."
Yeah, I have lived that problem. How much do your users use dbt?
Quite a bit. When we released Airbyte, the first thing people asked us was, "Can I run my dbt jobs in Airbyte? Can you run dbt transformations?" So we got that a lot. That was the top feature that people were asking for.
And I'm just curious, what are you working on right now? From both a product standpoint and also from an organizational standpoint?
That's a good question. The first year and a half of Airbyte's life was around bootstrapping the community, bootstrapping open source, and bootstrapping a bottom-up adoption of open source.
What we very quickly saw is, of the people that are adopting open source in their infrastructure, there is a gap between the ones who activate once, so they've been successful once with Airbyte, and the ones who are using Airbyte every single day. And we try to talk to as many people as we can about why they are not using Airbyte every single day because we know that they have a problem. And they say, "Open source is great, but I need something. I don't have the team, or I don't have the expertise to maintain it and to operate it alone." For us, this feedback was the catalyst for building our cloud product. What we have on the product roadmap now is continuing to invest in our cloud offering on top of open source.
What I hear you describing is product-led growth, so I would be curious to hear what product-led growth means to you.
That’s a good question because everybody has a different definition. For me, it's how do you get the product to sell for you? And in that case, how do you minimize friction to get the product in the hands of people? How do you get them to very quickly project themselves into what the product can do for them?
What do you feel like you need to do to improve the organization? You must find yourself in a very different organization today than you were in 12 months ago, and 12 months from now, it's going to be completely different again.
We recently hired our VP of engineering for Airbyte. That's been a big win for the team.
Now in terms of how we think about building the organization, it's important to realize which parts of the organization are more mature than others. If you look at engineering, it's probably more mature than the rest of the organization because we started there and the execution plan is there. We know what kind of features we need. I would say the ones that are still a work in progress are on the go-to-market side and making sure that this team becomes more mature.
When you tell the grand vision story for Airbyte, when Airbyte is taking over the world, what are you doing that you're not yet doing today?
I don't know if you're familiar with what we call the pyramid of data needs. At the bottom, it’s your processing engine. It's your Snowflake, BigQuery, et cetera. Then above that, you have the data management layer. Airbyte is in this layer, meaning how do we bring data into your core infrastructure. Today, we focus on this level of the pyramid. But the thing is, as you go higher and higher, then you start discovering what you need to do also at the foundation, to propagate that information up to the top of the pyramid. We're the first touch of data. We're the first software that sees data before it lands into your data infrastructure.
Today, we're not leveraging that so much. Can we do cataloging? Can we do data quality while it's in transit? There are a lot of things that you need to build that will inform the top of the pyramid. And at the top of the pyramid, you also have what we call "reverse ETL," which is enriched data you've joined into your warehouse and are pushing into the operating system. That's not something we do today, but something we want to get into.
One thing that I'm cognizant of is just how crowded a space that pyramid is starting to become, and that there are a lot of really well-funded businesses executing somewhere in that pyramid, each with some particular angle. Does that stress you out?
It doesn't stress me out, because when you have a lot of fragmentation, you always have a phase when consolidation happens. The thing is, as far as fundamentals for data go, first you need to be able to access it. Second, you need to be able to process it. Third, you need to be able to activate it. These three categories will always exist. Today we fit in the first one. We might fit, at some point, in the last one. But I think within each of these steps, there will be some consolidation happening.