Investigating Infrastructure, One Data Center at a Time: Experts Discuss Cutting Infrastructure Costs the Right Way
Five experts across industries sat down with Primary’s Brian Schechter to share why and how companies are cutting infrastructure to save costs—and the lessons they’ve learned along the way.
Data centers meant to store, share, and exchange information around the world are often left unused—some companies see upwards of 85% of their centers sitting idle. In 2023, minimizing this waste has become a priority as companies reduce their energy use, pursue sustainability goals, and cut costs, all while retaining their competitive edge.
Primary Partner Brian Schechter sat down with five leaders in infrastructure cost savings to discuss how unused data builds up, the decisions they’ve made to cut costs, and the key lessons learned along the way. The panel featured Datadog Product Manager Kayla Taylor; Dataminr Director of Engineering Nitin Pillai; MotherDuck Cofounder and CEO Jordan Tigani; Equinix Head of Edge Infrastructure Zachary Smith; and Plural Cofounder and CEO Sam Weaver.
Below, we’ve distilled four key findings from the live discussion, from moving data centers in-house to emerging tools that offer companies the right amount of infrastructure at the right price.
Current infrastructure companies are geared toward huge solutions—and huge costs.
“You see a fair amount of wasted spend in public clouds because pre-purchase discount models such as the EDP allow you to buy a mass amount of credits up front. Of course, if you have them, you're incentivized to use them, so you don't actually have to turn off the instance or be conscious about it. So long as you don’t use more than you pre-purchased, you’re happy to keep deploying inefficiently.”
- Sam Weaver
“One of the biggest problems we see with Equinix’s customers is they're usually an IT department or an infrastructure department for a company which has a hundred different divisions. They’re not in the business of metering and chargeback. They rely on a purchase order that people effectively put in once every three years to grab as much infrastructure as they can—you might as well buy as much crap as you can and have it sit there, because it's so much harder to go back for more. It's actually a disincentive to buy less.
“That’s why we need to transform the chargeback model. For many enterprises, leveraging the cloud is not really about OPEX or CAPEX. It's often as simple as ‘I need to have a basic understanding of what is being used,’ and be able to charge it back to different departments or projects across the organization.”
- Zachary Smith
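Smith’s chargeback model comes down to metering what each department actually uses and billing it back internally. A minimal sketch of that aggregation step, with illustrative resource names and costs (nothing here comes from Equinix’s tooling), might look like:

```python
# Minimal chargeback sketch: aggregate metered cost by the
# department tag on each resource. All data is illustrative.
resources = [
    {"id": "vm-1", "department": "marketing", "monthly_cost": 420.0},
    {"id": "vm-2", "department": "data-eng", "monthly_cost": 1310.0},
    {"id": "db-1", "department": "data-eng", "monthly_cost": 880.0},
]

chargeback = {}
for r in resources:
    dept = r["department"]
    chargeback[dept] = chargeback.get(dept, 0.0) + r["monthly_cost"]

print(chargeback)  # {'marketing': 420.0, 'data-eng': 2190.0}
```

Even this basic level of visibility—cost per department—is the “basic understanding of what is being used” Smith describes, and it is a prerequisite for any internal billing scheme.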
Bigger infrastructure isn’t always better.
“That unused data is a lot like the shirt you've had in the back of your closet for six years. If you haven't worn it, you're probably not going to. People should be a lot more active about saying, ‘Hey, I can probably delete this.’
“There are also legal implications to keeping data around. That's why most companies have email retention policies these days. If you have the logs that say, ‘Such-and-such customer visited something,’ you may not want to keep that around forever.”
- Jordan Tigani
“It's not just about infrastructure savings. If you are using a managed solution, you’re probably going to get faster development velocity. Developers are also not going to be spending their valuable time managing infrastructure. They're busy writing the code that makes your product and earns you money.”
- Nitin Pillai
“One of the reasons investors like Snowflake is that its net revenue retention is 170%, which means your Snowflake bill is going to go up by 70% every year. But any exponential progression, especially a cost progression, is eventually going to kill you.”
- Jordan Tigani
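Tigani’s arithmetic is worth making concrete: 170% net revenue retention means the average existing customer’s bill grows roughly 70% per year, and compounding that turns a modest bill into a large one quickly. A quick sketch, starting from a hypothetical $100k annual bill:

```python
# 170% net revenue retention ≈ each year's bill is 1.7x the last.
# Compounding from a hypothetical $100k starting bill:
bill = 100_000
for year in range(1, 6):
    bill *= 1.7
    print(f"Year {year}: ${bill:,.0f}")
```

Within five years the bill has grown more than fourteenfold, crossing the million-dollar mark—the “exponential cost progression” Tigani warns about.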
Optimizing prices, resources, internal tools, and data locations cuts significant costs.
“When we help our customers identify cost-savings opportunities at Datadog, there are two dimensions we think about. One is price optimization: negotiating enterprise private discounts and taking advantage of savings plans, spot instances, and so on. The other is resource optimization, like right-sizing your compute. We have a product that profiles your code and makes your applications and systems more performant so you can downgrade that instance and reduce its cost to run.
“Of course, we also have a bunch of monitoring data systems. Most of what we're doing is telling our customers, ‘Hey, let's surface orphaned resources in a central experience where we can see particular EBS volumes that have been unattached for seven days,’ and we then address or delete them as needed.”
- Kayla Taylor
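The orphaned-resource check Taylor describes—surfacing EBS volumes that have sat unattached for days—can be approximated with the AWS API. One caveat: AWS does not directly expose when a volume was detached, so the sketch below uses creation time as a stand-in, with the filtering written as a plain function over `describe_volumes`-shaped dicts. This is an illustrative sketch, not Datadog’s product logic:

```python
from datetime import datetime, timedelta, timezone

def orphaned_volumes(volumes, max_age_days=7, now=None):
    """Return unattached EBS volumes older than max_age_days.

    `volumes` is a list of dicts shaped like the items in boto3's
    ec2.describe_volumes() response.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        v for v in volumes
        if v["State"] == "available"   # not attached to any instance
        and v["CreateTime"] < cutoff   # proxy: AWS doesn't expose detach time
    ]

# With boto3 (not run here), the input would come from:
#   ec2 = boto3.client("ec2")
#   volumes = ec2.describe_volumes(
#       Filters=[{"Name": "status", "Values": ["available"]}])["Volumes"]
```

A real implementation would track detach events (e.g. via CloudTrail) rather than creation time, but even this crude filter surfaces the kind of orphaned resources Taylor’s team deletes.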
“I’m on the hook to save a few million dollars from Dataminr’s infrastructure. The easiest first step is to address the low-hanging fruit: take an inventory of your assets and of every single system you're actually using, figure out the utilization of each of your services and serverless architectures, and just go and turn off what isn't used. We saved hundreds of thousands of dollars doing that alone.
“We also noticed how impactful pure user management can be. Because we don't have the right controls built into the engineering systems, people can just go and provision whatever infra they want. After they're done, they don't go back and deprovision it. It's just sitting there: good for AWS, but costing us a ton of money. We had to figure out which of those services were orphaned and delete them.
“The second approach was asking whether we're using the right technology for the job. We use Elasticsearch for a lot of pure database queries other than search. That's not the best use of the technology, and we're spending around $8–10k a day on it. Instead, we could use a pure relational database system that gives us ACID compliance and simple SQL query access patterns, and move many of the workloads Elasticsearch handles today off to a document database like Aerospike, MongoDB, or DynamoDB, or even any kind of SQL database.”
- Nitin Pillai
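Pillai’s point is that many Elasticsearch queries are really just keyed lookups that any indexed relational table handles cheaply. A sketch of that migration using Python’s built-in `sqlite3` (the schema and data are illustrative, not Dataminr’s):

```python
import sqlite3

# An Elasticsearch term query like {"term": {"customer_id": "c42"}}
# used as a pure key lookup maps directly onto indexed SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (customer_id TEXT, action TEXT, ts TEXT)")
conn.execute("CREATE INDEX idx_customer ON events (customer_id)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("c42", "login", "2023-05-01"),
    ("c42", "purchase", "2023-05-02"),
    ("c7", "login", "2023-05-01"),
])

rows = conn.execute(
    "SELECT action, ts FROM events WHERE customer_id = ? ORDER BY ts",
    ("c42",),
).fetchall()
print(rows)
```

The same lookup, with ACID guarantees and no search cluster to operate—full-text relevance scoring is what Elasticsearch is for; exact-match retrieval is not.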
“There are plenty of optimizations you can get by just using the right database for the right job. Everybody's got a MacBook with an M2 chip nowadays. Those have 16 CPUs and incredibly fast memory and disk subsystems. They’re basically supercomputers sitting idle while your really expensive cloud hardware runs. That seems like a mismatch of resources to me. Why not bring it in locally? You've already paid for that laptop, and you can do things even faster that way.
“For example, if you do a bit of investigation and figure out what's actually going on, you may say, ‘Oh, I don't need these 147 joins. I can just keep a materialized view here as a single scan,’ and then, ‘Oh, I can use DuckDB to do this.’ DuckDB is the SQLite for analytics: an in-process database, or library, that you link into your work. It runs in memory quite fast through a vectorized engine. That way, you don't have to marshal data in, do a bunch of ETL, and marshal data out. You just do things where the data actually sits. It even speeds up development time.”
- Jordan Tigani
“More and more people are switching to open-source alternatives to expensive proprietary stacks. You can easily deploy a solution that's as good as your million-dollar data stack, all on open source, running your infrastructure on Kubernetes at a fraction of the cost.
“With Plural, the platform is built to perform the job of a highly effective SRE. It generates all the infrastructure-as-code and stands up the service in your cloud, built to best practices for scale, security, and availability. You can get into the nuts and bolts if you need to, but it's going to effectively operate and scale the service and provide authentication, monitoring, and logging. It feels like a managed service, but it gives you all the control of running the service yourself.”
- Sam Weaver
“People are realizing they don’t need that much data. They're spending a million dollars a year on Snowflake. That’s why MotherDuck wants to offer them a better way: $10 a month for basic storage—probably not the equivalent of a million dollars’ worth of Snowflake, but this at least gets them started, and offloads the more expensive cloud services.”
- Jordan Tigani
Have a network team dedicated to scaling back or rearranging your infrastructure.
“Most companies trying to repatriate workloads out of the public cloud have already gone into multiple clouds and normalized on open-source tools. However, even some of the most sophisticated multi-cloud companies often lack a network team, and that's a major gap; or they lack the expertise to repatriate workloads at global scale. Customers often run headfirst into the ‘giant barbed wire fence’ of the physical reality of infrastructure: things like logistics, customs, importing hardware, and fixing broken machines in the middle of the night. When you go into the data center to fix something simple like a failed hard drive, something goes wrong, and you don't come out for 12 hours, you're left wondering what time it is and whether this is really what you want to be doing.”
- Zachary Smith