373 / June 9, 2026

94% CAGR: What the Inference Boom means for your AI costs | Vamshi Ambati

52 Minutes

About the Episode

Vamshi Ambati has spent more than two decades in AI, through the symbolic era, statistical era, and the neural wave we’re experiencing today. A CMU PhD, founder of LatentStructure and Predera (which was acquired), now an investor at Virama Ventures, he’s one of the sharper voices on what’s actually happening under the hood of the AI boom.

We discuss a simple question: Who wins when models become cheaper and more abundant?

We try to answer this by looking at how inference spend vs. compute spend is shifting, and why inference may become the biggest infrastructure opportunity of the next decade. Vamshi explains what actually goes into the cost of a token, why AI is simultaneously getting cheaper and more expensive, and why the inference market alone could reach $1.3 trillion by 2030.

If you’re building in AI or want a clear mental model of where this industry is headed, this conversation is for you.

Watch all other episodes on The Neon Podcast – Neon

Or view it on our YouTube Channel at The Neon Show – YouTube

Transcript

Siddhartha Ahluwalia 0:45
Hi, this is Siddhartha Ahluwalia, welcome to The Neon Show. Today, I have with me Vamshi Ambati. He has been a founder, researcher and investor. It’s a very unique journey where an individual started in research 20 years ago in AI space and now it’s all paying off. Welcome Vamshi to The Neon Show Podcast.

Vamshi Ambati 1:05
Thanks for having me Sid. Finally, it’s happening.

Siddhartha Ahluwalia 1:08
Yes, yes. Finally, we are able to make it happen. So glad we are sitting across and discussing things of both of our interests. So Vamshi, if you have to summarize the current juncture that we are today, what’s happening in the macro, in the AI world with Anthropic, Claude, OpenAI, how would you summarize it to a layman and which way do you think the industry is going in the next couple of years?

Vamshi Ambati 1:36
Got it, yeah. So, I mean, to start off, it’s the best time to be an AI engineer, right? So you have an enormous amount of compute at your disposal. You have amazing capable models that are actually accessible to most people today. And you have, you know, the level of sort of access and distribution that is unprecedented. You know, you build a small app, you know, you think of Clawd Bot or all these other apps that people are building today that are going from, you know, zero stars to like maybe 100K stars in less than a month.

So all these three things together at a point in time where, you know, we have this sort of trifecta of things coming together, you know, good compute, good data, and good distribution. It’s great to be an engineer, you know, building things, experimenting quickly. And having said that, I think it’s also very difficult to get things right, because the same three things are accessible to everyone.

So really to stand out and then build something amazing in this day and age is going to be, you know, exciting nevertheless. And when I look at Anthropic, OpenAI, and others, you know, what has happened is, you know, let’s say taking my journey as an example, I’ve seen or I’ve been in AI for about 20 years now, right? So a little more than two decades, right?

My undergrad, and if I have to think of it as, you know, there were about three waves in AI, and I have probably been part of all three of them, right? And I’ll explain what I mean by that. So typically, the whole NLP and machine learning space, this was more symbolic in nature.

So think of the first wave as the symbolic AI wave, right? Where you’re building rules, you’re thinking about how to, you know, bring a flowchart or a logic to how you want the computer to think, right? So bringing more structure to the computer thinking.

And then you have the second phase where, you know, you went from symbolic AI to statistical AI, because you didn’t want to teach the computer through a logic diagram, but you wanted to give it a lot of examples and data and say, hey, go figure this out, right? And then the third phase, where we are right now, is neural AI, where you’re not necessarily, you know, changing a lot of things, but the scale that you’re providing to these algorithms is different. You know, the scaling laws and everything coming together, the compute and the data at this scale given to neural machine learning algorithms, is the third wave that you’re seeing today, right? And so I’ve been through all three, and then I look at, you know, what’s happening today.

The pace of innovation is very different, right? So the first rule engines, you know, you build these rule engines, you build these expert systems, if you will. And then modifying these was so difficult that it took a long time to change things, right?

In the statistical phase, collecting data was difficult, you know, data was just not existing everywhere, you know, for people to build, right? So the, you know, I mean, you, a lot of people have access to it, like you’re building vision models and so on, right? So you have these data sets, which are like a few hundred images, a few thousand images.

So if you have to change and iterate and move things, you have to bring more data into the picture. And that was getting, you know, very expensive, as well as time taking. So you wouldn’t find the pace at which these models were rolled out. But what you’re seeing today is a whole different beast. Every month, if you will, there is a new model, right? That’s beating the old model, and visibly adopted already, and people are moving to the new models.

And the data is just flowing, like you’re generating data, you’re sort of getting access to new data created by old models, and sort of improving on top of it. So it’s an amazing time to be an AI researcher, as well as for someone to, you know, someone who has seen all these three waves, it’s like the pace is what is baffling me this time.

Siddhartha Ahluwalia 5:50
And where do you think the industry is going in terms of enterprise adoption? Will Claude eat up all software, as I said, because currently the SaaS prices are down?

Vamshi Ambati 6:02
I know, I know, that’s definitely on everyone’s mind, right? So I think enterprises is a whole different beast, right? So again, this is my take, and having been, you know, the second phase of my life where, you know, I’ve been an enterprise AI entrepreneur, I look at it slightly differently, right?

So, you know, getting enterprise right is about a couple of things, I would say, enterprises don’t optimize for accuracy of a model, right? So they’re more so optimizing for reliability, and trust. So it’s very, it’s very, I think, in today’s age, with this new level of capability that the models have, getting the accuracy right is available to everyone.

But whoever gets the trust and reliability right in enterprise AI is probably going to win, you know, these deals and the customers in enterprise, right? So I think the next two years are going to be about a shift that is happening. If you zoom back a bit, you know, enterprises moved through all these waves: you have the on-prem to cloud shift, the mobile shift, then the big data shift, and then the predictive AI shift, right? So now with generative AI, what has happened is that the whole fabric that the enterprise was built on has been shaken, because enterprises, at least in the SaaS world, were about capturing workflows, right?

So you had a workflow, and then you’re capturing it in some sense, to achieve a task, right? So now, generative AI is coming in and saying, I can actually generate that workflow for you. So then the basis on which the SaaS companies stand has been shaken a bit, right?

Now, we have to look at SaaS differently from enterprise, because at the end of the day, whatever you build, whether it’s a Clawd Bot, or whether it’s, you know, generative AI apps, you’re still selling them back to enterprises that are going to be using them, right? So getting that right is needed today for all the startups that are coming out. But SaaS is a whole different conversation.

I mean, it’s just a transformation that we’re seeing, going from SaaS to maybe call it an intelligent SaaS, or whatever, but it will transform, it will transform into a space where generative AI is going to be very integral to, you know, the workflows and intelligent workflows that are going to be defined.

Siddhartha Ahluwalia 8:33
So in the previous eras, when cloud came, for example, from on-prem, on-prem didn’t die, else companies like Nutanix wouldn’t have been born. And it’s said that globally, cloud is only 10% of all on-prem that is there, right? So in terms of the new, you know, the current models, the acceleration that we are seeing, would it be similar that, you know, it will be like 10% of all software will belong to the model layer?

Vamshi Ambati 9:05
That’s a tough one to put a number on. But your intuition there is right, that, you know, what’s happening is, you know, going back to the two pillars on which enterprise sales is dependent on, right? You know, trust and reliability.

Getting these two right takes time, right? So which is why what you have seen in the cloud space, you know, the adoption curve for these enterprises is going to be longer than what people would imagine, right? Because getting trust and reliability right takes time.

It takes time for someone to say, I’m okay moving all my data into the cloud. I’m okay, you know, moving all my compute needs to the cloud and not have and own my own data center. And then that could happen 100% in some verticals, but it may not happen in some regulated verticals. So I think the time from a new innovation hitting to enterprise adoption is going to be longer. I don’t think that will change. And really, that is the opportunity where, you know, people are playing, right?

Now, what percentage will be owned by the model, we are starting to see more evidence that models are going to own more than that 10%. You know, whether it’s the new plugins that, you know, Claude has released that shook up the whole SecOps market, or, you know, the more recent sort of Clawd Bot that has come in, that is starting to question productivity for everyone and to think of organizations as very flat. So all these are good innovations.

But for that to really mature into reliable and trustworthy technology is really where, you know, the current startups will have to think. It’s tough, but the advantage is towards startups who get this right, because I can’t imagine an existing behemoth going back and saying, okay, now I already have your trust in, you know, reliability and everything. I can take care of that.

But I’m going to reinvent and now be an AI company. That’s going to be a tough sell. But a good innovative startup trying to address the trust and, you know, the reliability angle will probably be on the winning side.

Siddhartha Ahluwalia 11:18
When building today has become cheap, what has become valuable now?

Vamshi Ambati 11:23
Yeah. So when you say building has become cheap, I’m assuming that, yeah, it’s the access to these models that, you know, prompt is creating an app, prompt can create a workflow. So I mean, but I want to question that because, you know, today, I think, yes, there is access to models.

And then these models require, you know, a humongous amount of compute behind the scenes to really, you know, take that prompt and convert it into a full-fledged running app, right? And let’s just talk about that part. Imagine, you know, if token prices go up by 2x, 3x, will all of us still be building those same, you know, throwaway apps or one-time-use apps or, you know, an app that we already have been using for a while, but we want to rethink?

I don’t think, you know, the access to compute today is gated by very few providers. And, you know, the models are definitely in the hands of like a couple of them, if you will, right? So I think it has become easy in the sense that directionally, people know that this can be done.

But I would still question that the access is not fully available to everyone. You know, I for one, in the last couple of months, when the models have gotten their reasoning capabilities, you know, much to a different notch, what we have seen is they’re consuming more tokens. And then you run out of these token limits, and then you sort of hit these limits, and you can’t build your models anymore.

You can’t build your apps anymore. So I think they’re going to get more expensive as the models get better. The compute needs are going to increase, and then you’re going to get better.

So it’s not for everyone to build these applications. Right now, what we’re seeing is sort of a teaser of, you know, what’s possible. But to truly make it accessible to everyone, I think the compute has to be solved.

And then we need to make sure that compute and models are actually available to everyone, right?

Siddhartha Ahluwalia 13:35
So is there any effort going on to make compute, let’s say, 100x cheaper or the price of models 100x cheaper, which is very counterintuitive to what you said, that the price of models is going to become 2x or 3x?

Vamshi Ambati 13:47
Yeah. So, you know, think of the price of a token, right? If you go into the anatomy of a token, and then, you know, what constitutes the cost, right?

So there’s the model training cost that has gone into, you know, the years of training or the number of GPUs that went in, and then the scaling laws that they had to hit in order to, you know, get a better model out. So that’s all baked into it. The second is, you know, what sort of chips do you have access to while running the inference layer?

Because you built your model, but now you’re getting into the inference phase. And then what level of accuracy do you want out of these models? You want to run this in a lower quantization where it’s cheaper for you to get these tokens out, but then you’re losing some accuracy.

So there’s the cost angle, there is the sunk cost of training, then there is the hardware sort of limits of, you know, what hardware you’re running on. Primarily, it’s been NVIDIA so far, but then we’ll see, you know, more coming up, right? And so, you know, when you start bringing the, and then how much throughput, how many users you want to serve and everything else.

So when you start bringing all of these together, you realize that, you know, you have to sort of give and take somewhere, right? So the trade-offs are across this four- or five-dimensional problem where you have to pick, for your use case, what works best and where you sort of draw the line. So when it comes to reasoning models, everyone wants the best model.

And so you’re looking at, you know, the cost going up, right? But there’s definitely innovation on the software front of making like all the innovation that we’ve seen with the memory bottlenecks, the solving the KV cache, the disaggregated serving where you’re, you know, considering multiple kinds of hardware, one for pre-fill and one for decode. So there are lots of these techniques.

And again, a plug is we’ve written up some of this as a book called Peak Inference. It’s on Amazon for anyone who wants to download, a friend of mine, Rajan and I wrote this book. So there’s all this innovation on the software side, which is trying to drive the overall cost down. And then there’s all the innovation on the modeling side where the model’s cost is going up. So you have to sort of bake that in. And then the usage is going through the roof.

So the more users using it for more things is obviously going to drive sort of a shortage of the tokens, right? So yes and no in different places. So it’s going to get expensive in some places, the software is going to make it cheaper, but then the need is going to drive the demand higher. So there is a time in the next year or two where we’ll start to see these. Directionally, yes, all the token prices are going down. But as the models are maturing, we’re starting to see that, you know, the opposite effect as well, where the cost is going up for the latest models.
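The cost components described above (sunk training cost, hardware price, throughput, and the quantization trade-off) can be put into a rough back-of-the-envelope formula. The sketch below is illustrative only: the class name `ServingSetup`, the function `cost_per_million_tokens`, and every number in it are hypothetical assumptions, not figures from the conversation or from any provider.

```python
from dataclasses import dataclass, replace


@dataclass
class ServingSetup:
    """Hypothetical inputs for a per-token cost estimate (all values assumed)."""
    gpu_cost_per_hour: float   # USD per hour for the serving hardware
    tokens_per_second: float   # aggregate throughput across all users
    training_cost: float       # sunk training cost in USD
    lifetime_tokens: float     # tokens expected to be served over the model's life


def cost_per_million_tokens(s: ServingSetup) -> float:
    """Serving cost plus amortized training cost, per one million tokens."""
    serving = s.gpu_cost_per_hour / (s.tokens_per_second * 3600) * 1e6
    training = s.training_cost / s.lifetime_tokens * 1e6
    return serving + training


# Made-up baseline: a $4/hour accelerator serving 2,000 tokens per second,
# with a $100M training run amortized over 100 trillion served tokens.
baseline = ServingSetup(
    gpu_cost_per_hour=4.0,
    tokens_per_second=2000.0,
    training_cost=100e6,
    lifetime_tokens=1e14,
)

# Assumption: lower-precision quantization doubles throughput on the same
# hardware. The accuracy lost in exchange is not modeled here.
quantized = replace(baseline, tokens_per_second=4000.0)

print(round(cost_per_million_tokens(baseline), 4))
print(round(cost_per_million_tokens(quantized), 4))
```

In this toy setup, quantization only lowers the serving term; the amortized training term stays fixed, which is one way to see why software optimizations can cut token prices while newer, more expensive models push them back up.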

Siddhartha Ahluwalia 16:38
So right now, globally, let’s say AWS revenue, mostly from compute layer is $140 billion. That’s the AWS revenue that’s come out recently. Assuming the total cloud revenue of the entire globe is $250 billion, seeing AWS as the leader in that, right? Do you think there’s any point in time where inference revenue of the globe, of the inference spend of the globe will become more than compute?

Vamshi Ambati 17:12
Interesting. So I’ve been looking at inference for more than four years now, right? So right before, we were building Predera, where the LLMOps layer that we built was primarily focused on how do we optimize for inference. And that was the company that we exited to a data center company, where we are also now looking at these similar sort of problems and spaces. So the inference market is one of the fastest growing by CAGR. So what we’re seeing is about 94% CAGR.

And then today, I think around 2026, we are looking at an $80 billion sort of market. But soon, by 2030, it’s going to be a $1.3 to $1.4 trillion market. So that market is a huge one. I mean, this decade is the diffusion decade for AI, right? So you’re going from models that people are happy with to figuring out the end use cases. And we are already starting to see two to three really good use cases, with coding being number one. You have the voice models in customer-facing use cases, sort of customer support and other CS use cases that we’re looking at, right?

So we are seeing this decade where a lot more such use cases will be using AI, right? So this is the diffusion decade for AI. So that means inference is at the center of the conversation, right? And that market is growing hugely. So the whole big bet of OpenAI, trying to build Stargate and other big data centers, is not that they just want to train on this, because that cost is a one-time cost. Once you train a model, you sort of run inference for a lifetime. So the entire infrastructure cost of AI today is largely going into inference, because the diffusion has already started. And it’s only getting accelerated, I would say.
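The market figures quoted above can be sanity-checked with compound growth arithmetic. This is a minimal sketch, assuming the $80 billion figure refers to 2026 and that growth compounds annually over the four years to 2030; the function names `project` and `implied_cagr` are illustrative.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Value after compounding at `cagr` (0.94 means 94%) for `years` years."""
    return value * (1 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate needed to go from `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# $80B in 2026 compounding at 94% a year until 2030 (values in USD billions).
size_2030 = project(80, 0.94, 4)

# Growth rate needed to reach the quoted $1.3 trillion from $80B in four years.
needed = implied_cagr(80, 1300, 4)

print(f"2030 market at 94% CAGR: ${size_2030:,.0f}B")
print(f"CAGR implied by $80B -> $1.3T: {needed:.1%}")
```

Under these assumptions, 94% a year takes $80 billion to a little over $1.1 trillion, and reaching $1.3 trillion would need growth of roughly 100% a year, so the quoted numbers are in the same range but not exactly the same projection.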

Siddhartha Ahluwalia 19:29
Got it. Let’s say for our listeners who don’t come from an AI background, how would you explain inference to them?

Vamshi Ambati 19:38
Got it, yeah. So let’s take an analogy of our brain. Human brain has a very big neural network that we’ve trained over a decade, add the DNA memory as well to it. But let’s say the human brain is now taking a lot of inputs and then reacting to that could be through one of the senses, right? So you’re speaking or you’re taking some action or you’re sort of sensing or feeling a thing through the skin. So a lot of these inputs are processed in your brain. So essentially, think of the whole decades long of learning that you’ve done as a kid growing up, exposed to language, exposed to visual cues, all the rules and the societal contracts that you’ve been through, all of that is the training phase. And now you have a well-developed mind. And that is now starting to make decisions, making decisions, that’s taking actions, that’s speaking words and everything else.

So to me, that’s inference, right? Again, people say most of the learning happens in the first few years, your character forms in the first four years and so on. But let’s say, for argument’s sake, you learn till 25 years. So your model is already built. But then think of how much decision making and activity you do with your brain after that. All of that is inference, right?

So inference is nothing but how you take all this input, process it through all your neurons and then get that final outcome, right? In the large language model space, the model training activity is the training phase, where you’re running through thousands or hundreds of thousands of GPUs, training all the parameters of your neural net, right? And then during inference, you essentially do this in a forward pass, where you’re going through all the parameters, computing the math to say, okay, if I see this word and, next to it, there’s another word, how do I then predict the next word that is going to come out, right?

So that prediction of the next word is what we have simplified it to in the case of LLMs. And that’s inference today. And so the bigger the model size, the more expensive the inference is, because it’s going to be running through that entire list of parameters to compute, you know, what is that word coming next, right? It’s the next word prediction problem, if you will. But things get complicated when you have other modalities like video and voice coming in. But by and large, think of it as, you know, when you have a context, predicting what comes next is the inference problem.
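The next-word prediction loop described above can be illustrated with a toy example. This is a deliberately tiny sketch, not how a real LLM works internally: the vocabulary and the `WEIGHTS` table are made-up stand-ins for trained parameters, and the "forward pass" here looks only at the last word, whereas a real model conditions on the whole context and runs through billions of parameters for every token.

```python
import math

VOCAB = ["the", "cat", "sat", "down"]

# Hypothetical "trained" parameters: row i holds a score (logit) for each
# possible next word, given that the current word is VOCAB[i].
WEIGHTS = [
    [0.1, 2.0, 0.3, 0.2],  # after "the", "cat" scores highest
    [0.2, 0.1, 2.5, 0.4],  # after "cat", "sat" scores highest
    [0.3, 0.2, 0.1, 1.8],  # after "sat", "down" scores highest
    [1.5, 0.2, 0.3, 0.1],  # after "down", "the" scores highest
]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_next(word):
    """One toy forward pass: look up scores, normalize, pick the likeliest word."""
    logits = WEIGHTS[VOCAB.index(word)]
    probs = softmax(logits)
    best = max(range(len(VOCAB)), key=probs.__getitem__)
    return VOCAB[best], probs[best]


def generate(start, steps):
    """Repeat the forward pass, feeding each predicted word back in as input."""
    words = [start]
    for _ in range(steps):
        next_word, _ = predict_next(words[-1])
        words.append(next_word)
    return words


print(generate("the", 3))  # ['the', 'cat', 'sat', 'down']
```

Each generated word costs one pass through the parameters, which is why inference cost grows with both model size and the number of tokens produced.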

Siddhartha Ahluwalia 22:20
You mentioned that the biggest applications the world has seen till now for inference are, one, coding, and the other, customer support. Why is that, right?

Vamshi Ambati 22:33
Yeah. So both come from, it’s a very good question and even something that I think about, like, why is it that coding has taken off in such a big way? Anyone would imagine that software engineers would be the last thing to get automated. But you know, AI went straight for it, right? So you have software. So think of, you know, just the evolution of programming, right? So you had, you know, it’s very deterministic in nature, meaning you’re going from, you know, your grammars that are sort of generating and then your programs are essentially code that adheres to a certain, you know, context-free grammar, right? So in your programming language. So, you know, when I look at the coding, there are two things that are happening.

One is the output is so deterministic that you have a static piece of code that you can actually take and then run it anywhere and validate it, right? So when something can be brought down to that, where whatever is the level of abstraction that you’re dealing with, which is English language, you know, prompts and everything, but then that gets distilled down to an artifact that is static that you can, you know, deploy and pretty much validate, it becomes much easier to get that feedback loop right, right? So to me, that is one example of why the coding has taken off.

And then the second reason I think the coding has taken off is, you know, when you look at the level, the amount of code that people write today, there’s a lot of boilerplate code, right? There’s a lot of repeatability. There’s a lot of reuse. The whole evolution of programming languages happened around the concept of reuse, right? You had these monolithic C programs that then had to be more reused. So you sort of brought in abstractions that became your C++ and then so on and so forth, right? That’s how things evolved. So there, I feel like the two things working towards why, you know, something like coding, I can understand that part. What I don’t understand is, you know, the human angle part of it, right?

So take places with mundane work, which is calling people, whether it’s sales calls, whether it’s, you know, customer support calls or, you know, debt collection calls. In fact, lately, I’m seeing a lot of banks reaching out with automated bots and saying, we have these offers and so on, right? So I think at this point in time, it’s just felt like, you know, a very mundane and repeatable action that humans don’t want to do. They’re starting to offload that to the machine. But I don’t fully agree with that, because anything that has a human-to-human touchpoint should be the last thing that gets automated. Today, we’re just probably seeing that first sort of spike, but then that should probably, you know, die out soon, I feel. But these are two extremes.

Siddhartha Ahluwalia 25:35
But customer support has companies like Sierra, Decagon, which are already almost $10 billion companies.

Vamshi Ambati 25:43
Yes. Yes. So, I mean, the way to, I guess, it’s just the current spike, if you think of it. I mean, the models have to get extremely good, so that they sort of have that empathy, they emote correctly with humans. It’s very easy, like when you get a call today from one of these robotic or agentic models, don’t you sense it within like the first, it becomes so, so after that, what happens to the brand association? The next time you get a call from them, you’re probably going to, even if it comes from a human, you’re not going to pick up, right?

So I feel like, pretty soon, before the spike dies out, we really need to see better models that sort of emote well, have an empathy angle and everything, and at the same time, have the right interventions with humans, right? So if we have both, and I’m betting that companies like Sierra and others will figure this out soon. But if they don’t, then I think the spike has to come down.

Siddhartha Ahluwalia 26:44
So in the current cycle, what do you think is overvalued and undervalued?

Vamshi Ambati 26:51
In the current cycle, I think in terms of, I’ll speak in terms of tech more than the startups per se. So I think the agents are overhyped in the short term, but I feel maybe they will be like underhyped in the five-year term. So in the five-year term, I can clearly see a lot of my day-to-day work being offloaded to agents. But the current hype around agents is such that a Clawbot releases or a Hermes agent releases or a Nemo Claw releases, and there’s so much buzz around, okay, this is going to change everything. Every company will be run by agents. You don’t need a CEO anymore.

You don’t need the organizational hierarchy anymore. So that’s the hype part that I don’t like. But I can see these sorts of things happening, probably in the mid to long term, but not necessarily in the short term. Undervalued, I think there was a time where we looked at how difficult voice was to get right. I mean, even though I complain about voice not having the emotions and everything, voice to me is still going to be the frontier through which people will interact with AI. So, I was working in TTS, like text-to-speech systems, maybe 20 years ago as part of my course project, and it was so difficult to get these things right. But now, I think Google Gemini is probably in like 140 languages or some such, and it’s so much easier to do the voice modality. So, I think that is one that has matured, but it’s not yet fully diffused, and it’s ready for diffusion, but voice is a bit underhyped, I feel.

Siddhartha Ahluwalia 29:00
They seem undervalued also then.

Vamshi Ambati 29:02
Undervalued also. I think ElevenLabs is doing okay. But imagine a country like India, how do you think AI will get diffused? You have so many languages, so many dialects, so much variation in how people even say the same thing. So, forget about the vocabulary, even in the same language, you have so much variation. And then there’s the level of literacy that is required. What used to be called digital literacy, you’re now thinking about as AI literacy, so you have to go from, how do I type things into WhatsApp, to how do I use AI, how do I use this ChatGPT. Everyone wants to know how to use this thing.

And voice is a very good modality to sort of interact with AI. So I think the use cases are plenty for voice, it’s just that I feel it’s a little undervalued today.

Siddhartha Ahluwalia 29:55
You started your entrepreneurial journey with services and then pivoted into MLOps, but not many entrepreneurs who start in services are able to do that, or into product also. What led you to having a successful transition?

Vamshi Ambati 30:16
My journey as an entrepreneur started in a more accidental way. So, I’m an accidental entrepreneur, I didn’t really set out to build a big company. I was basically always thinking that I need to test my potential a bit more. So, when working at companies like, you know, Identity Research Labs, or PayPal, I felt like, you know, there’s enough learning there, but then I had to push my potential a bit more. So I went to another startup that later got acquired by Zendesk. And that’s where I felt like, okay, I can understand, I understand what it takes to sort of go from, you know, a series C company to get an exit to a company like Zendesk, right? Because I was leading the data science side of things, which was the big thing in 2012, 13 timeframe.

But when I was really looking at, you know, I wanted to do something different and something where I can challenge myself. I think it was more of a personal thing where my dad had some health issues. And so I was like, okay, I understand data science and I want to apply this to a place where it’s more impactful and things that I could, you know, bring back to my personal life. So that’s how I wanted to just, you know, start to do something in healthcare. And honestly, without understanding anything about healthcare, but I had the passion to learn and do something in that space. So I went to a hospital in LA and said, I’m going to just work with you for the next three months, teach me everything about healthcare and sort of, you know, let me help you because I’m a data science expert. So that’s how I just like went, stayed with them for about three months as an in-house.

Siddhartha Ahluwalia 31:56
Quit your job?

Vamshi Ambati 31:56
Yeah, quit my job and then started off. It’s pretty stupid to do that. Or when I look back.

Siddhartha Ahluwalia 32:02
This is 2016?

Vamshi Ambati 32:03
This was 2016. And I just had my kid at the same time, and then I quit my work. But it’s just that passion or madness, whatever you want to call it, I think that drove me to saying, okay, I have to do this at this point. So it started as a good learning exchange between me and the hospital, and these guys were no small shop, right? They were the fifth largest provider in healthcare. It’s called Prime Health, in the US. They own about 48 or 50 plus hospitals. So it was a very good learning experience for me.

And then in return, I could, from close quarters, see what does it take to work with an enterprise? You know, why do they not trust? It’s not like they, you know, like people say that healthcare is slow to adopt, right?

So when it comes to tech, I could see from close quarters why they were resistant to change. Like, why did they not move to Azure? Why were they not trusting the predictive models?

Siddhartha Ahluwalia 33:01
And what are the reasons for that? Why were they slow?

Vamshi Ambati 33:04
It’s a very different space, right? So talking about healthcare per se, you know, you need to first understand how the roles lay out in the healthcare space. So you have a hospital, like in the whole, they call it the four P’s, right? So you have the provider, patient, pharma, and then the payer, which is the insurance company, right? So the four P’s. In that, the equation of how the providers, which is the hospitals, what are they for? You know, they’re looking at, you know, sort of maximizing the overall occupancy rates, you know, how quickly they can move people. And then the legal framework around, you know, what or how, you know, the American hospital system is set up, where, you know, what are the laws that they need to abide by?

And then, you know, sorts of metrics that they get penalized on and so on. And so there are different roles for each of these, right? So there was this new role called the CAO role, which I never thought existed. It was the chief administration officer who actually oversees everything, all the metrics around the doctors, right? But when you build something, you typically build with the CTO in mind, thinking I’m gonna build this and I’m gonna go and sell to the CTO. The CTO is where you sell, but then you actually get a check from CFO, but it’s used by the CAO.

So all of that mapping, like no one, unless you’re inside, you don’t really understand some of these things. So I think that exposure led me to, you know, coming back to your question on the services side. Yeah, they were like, you know, they’re starting to trust me, but then they were like, and I didn’t have a clear view of a product. And they were like, okay, I want you to take on more work and then, you know, help us with these.

Siddhartha Ahluwalia 34:40
Just give you more work for IT.

Vamshi Ambati 34:41
Exactly. So then I was like, okay, so let’s start a company around it. And then this is an area-

Siddhartha Ahluwalia 34:46
You started the company later.

Vamshi Ambati 34:47
Exactly, yeah. I started out learning healthcare and then started the company three months later. And that was LatentStructure, which is the services company.

Siddhartha Ahluwalia 34:56
And how big did that customer become?

Vamshi Ambati 34:58
So that customer, you know, funny, you need to think of your first customer, not in terms of revenue, but in terms of credibility, right? So they’ve added so much credibility to my tag that I could actually, I didn’t even have a startup at that point, but whatever I started afterwards with this knowledge that I could gain from watching from close quarters, I could go to a company like a GSK and that became my third customer. I went to another, Parkland Health out of Dallas, and they became my customer. I had like two more hospitals, one in Seattle and others.

Siddhartha Ahluwalia 35:34
So in the entire journey at Predera, right? What percentage of your customers were in healthcare?

Vamshi Ambati 35:41
Yeah, so this is where, you know, like my interest in healthcare for the first one and a half year and like trying to chase and land these customers, being a first time founder, not understanding healthcare, sort of hit limitations, right? So the sales cycles are really long, trying to gain trust. And once you are in, you need to know how to sort of leverage that and then go to the next level of revenue unlock, right?

So if you don’t do that, then all the hard work that goes into cracking that enterprise account really doesn’t matter much, right? So all of these are learnings and I can look backwards and say, this is how it worked. But coming back to the hard part of running services and product, when you’re running services, you understand all this, but you should also invest in how to capitalize on what you’ve built so far. Whereas what I was capitalizing for is the product that I could see, right? When I was working with multiple hospitals, I saw the same problems. Then because of the long sales cycles, we went to FinTech as a vertical.

So we worked with MasterCard afterwards. Then we went to Pharma as a vertical, GSK. Then we went to retail as a vertical, Walmart. So working with different companies, I was optimizing for how do I now build a product out of it, right? So you can go double down into the same space and unlock more services revenue, or you can see the common pattern across and then unlock the product value. So I went the other route, trying to build.

And that’s when Predera was started in 2019 as a product-only company, right? So we saw all of this and then built out Predera with a view of, okay, this is the MLOps problem that we are solving, and let’s give it a tag. And surprisingly enough, 2018, 19, when we started out with this product idea, I was testing the waters with, okay, let’s go knock on the doors of all the VCs on Sand Hill Road and see what they think about it, right? We were welcomed in every single one, like all the big folks who were looking at AI, like Andreessen Horowitz and General Catalyst and Foundation Capital, one of your folks on the podcast. All of them looked at us. They were all pretty interested that we were looking at MLOps as a problem because MLOps as a vertical started much later, maybe 2020, I think, right around that time when we saw five or six companies come up.

But we were the first wave of MLOps companies that wanted to productize this. So I wasn’t fully successful. Coming to the limitations of running services and product, it’s very hard to sort of make the transition into a VC fundable product space. But then I realized that I couldn’t make the transition, but I want to continue this services sector that actually feeds the real use cases to me, opens the doors to the real customers. And so I wanted to continue it as a bootstrapped product. So that takes a toll on you personally, but I think that transition is hopefully smoother than the VC funded transition.

Siddhartha Ahluwalia 38:58
Understood. And all of this could happen because you, back then the term was not coined, but you were a forward deployed founder in a hospital.

Vamshi Ambati 39:08
Yeah, absolutely. I think it started in a very accidental manner, turned into a services company and a product company.

Siddhartha Ahluwalia 39:16
And when you exited, how much of the revenues were coming percentage? No need to answer absolute, but percentage were coming from services versus products?

Vamshi Ambati 39:27
So product, we exited very early, like I was telling you, right? So we had to make pivots, even in the product, things change, right? So we started out with this MLOps product, but we were innovating on the monitoring aspect of AI, because AI reliability and trust, as I keep going back to that, I think that seemed like the pillars around which enterprises are harping on. So we said, okay, let’s build monitoring dashboards for AI. Let’s make sure that whatever decisions are run through AI, we know exactly why they were happening that way. And so the MLOps product was starting to take shape. And funny enough that the first customer for our MLOps product was Walmart, the largest giant in retail. And they bought it not for themselves, but they wanted to open it up for all their suppliers. So it was a very big thing that could have just turned into gold for us.

It did work for a good year and a half before COVID hit. And then a lot of those initiatives at Walmart sort of took a backseat. But from every crisis comes an opportunity. So we were watching very closely what was happening throughout this. And we really caught the wave of LLMs. So in 2022, when the LLMs came about, we were in the right mind space as a company, as well as founders, where we thought about it as this could be it, because I could clearly see, just from my journey itself of seeing that symbolic AI transitioning to statistical AI. And that really helped my PhD shape out as well, where I could turn, my PhD was in statistical AI, even though I started my sort of grounding in symbolic AI.

So when this happened, I could clearly sense that this is a sort of a shift that doesn’t come across too often. So 2022, we said, the boldest thing that we did was we did not renew the contract with Walmart, because they said, continue the MLOps product. And we said, we’ll be an LLMOps company. And Walmart was not ready for LLMOps at that point. But we said, okay, fine, we’ll stop this. And all in with LLMs, 2023, I think like beginning of Jan timeframe, and then the open source models started to come out at that point in a small way, that really helped our journey as well.

So we pivoted to this LLMOps product. And I think that shaped out. So we didn’t collect too much revenue, but we had enough users on the LLMOps site that was interesting for.

Siddhartha Ahluwalia 42:03
So most of the revenue was coming from services side?

Vamshi Ambati 42:05
Yeah, so I would say the split was about 15% or 20% tops for the product versus the services side. And both companies, we sort of exited in two different transactions.

Siddhartha Ahluwalia 42:22
But to the same

Vamshi Ambati 42:23
No, different. So we sold the services separately. And then because the product company, the value of the LLMOps platform to the buyer was a lot more. So they didn’t really care about the rest of the services.

Siddhartha Ahluwalia 42:37
Would you advise founders to start with services if no product or problem is visible?

Vamshi Ambati 42:44
Interesting. So, I mean, I’m a bit biased, like from what I can see, but just stepping back and thinking about what’s happening with AI today, to your earlier question on, if it’s become so easy to build things, where does the founder focus on, right? And I think like today’s AI, the problem of AI is not intelligence, but it’s an integration problem, right? So you have to figure out how to integrate this into the right workflows, gain the right end users and then sort of work with. And this is a classic enterprise sales problem, right? So which can only be solved by forward deployed engineers like the Palantir model, or figuring out the right recipes, or let the use cases sort of come to you where you’re working on these use cases and building solutions that you can take to the customer.

And so all of these are kind of quote unquote solutions and services or non-recurring engineering.

Siddhartha Ahluwalia 43:45
So would you call Palantir a services company or a product company?

Vamshi Ambati 43:53
It’s a great product disguised as services. So it’s, I’m sure there’s merit to the product, but the launching vehicle is services, right? Because these are hard to understand products that you have to take to the customer. There is a long lead time of getting trust in the product. All of this can only happen when you’re embedded closer to the customer.

Siddhartha Ahluwalia 44:14
When you are forward deployed with the customer like you, how do you ensure that you are not building a, say just for this customer, but you are building a real company? Else you can turn out to be a one-man shop for this.

Vamshi Ambati 43:48
Absolutely, absolutely. Yeah, yeah, that’s the biggest concern that any person subscribing to this direction of, let’s go build this FDE approach of learning the problems, right? So I think that the two things that we followed, which kept us a bit more honest to this, were building the product ambition out, was one, we said, until we get repeatability with five customers, let’s not think about a product, right? So that’s one rule of thumb we set up. Just build it out, build it out for five customers. Then the second time you work with someone in the same vertical, it should not be this, it should not be as difficult as the first time around, right?

So if we start seeing that, then you know that that metric is off, then you clearly don’t have a product. So we were very closely guarding those two numbers, right? So we wanted to get applicability in a broader range across five customers from different verticals. And so the MLOps product really helped us get that, right? So we were working with the top three in every vertical, right? So whether it’s a GSK in Pharma or Walmart or MasterCard and so on, we had those bases covered. You pick the verticals we want to play in, we want to be working with one of the top three. And then within that, once you land them, also the GTM becomes easier, right? When you sell to the best guy in that vertical, the second best guy or the third best guy is more open to talking to you, right?

And then when we can do that much quicker, then you have a much better leverage as thinking about a product. Until then, I wouldn’t say, don’t even pretend that you’re building a product because that’ll hurt you more than it’ll help you because you’re basically thinking that you’re building product, the cycles go there, but then they’ll really just start to look like multiple products and not just one single product.

Siddhartha Ahluwalia 46:23
Any other learnings, Vamshi that you have from your journey that you could share with founders?

Vamshi Ambati 46:29
On the enterprise AI side, I think the biggest learning is, so I’ll speak about it from the startup angle because it’s often people, the question that I get asked very often is, how is it that you landed some of these big names, right? I didn’t have a sales team. I didn’t have a marketing team. It was purely with the products that we were building and then the ability to deliver, right? So I think the two simple things I can say from my journey of distilling that sort of know-how is one, enterprises are actually looking for solutions to their problems. And quite often what happens is the big companies that they’re working with, the managers, the VP and below levels, like the directors and managers, they actually don’t approve of the, call it the big five or the different firms that come and pitch large solutions, big PowerPoint decks, they don’t get their trust, right?

And they look at that and say, okay, if you’re forced to take this company and work with it, we’ll do it. But is there someone else that is solving this, right? So if you can position yourself as someone who goes in with that level of authenticity, goes in and honestly says, I’m good at this particular piece, I don’t know the other pieces, but can you position yourself to solve their problem? They will trust you. They will trust you and then they will come back and tell you, hey, can you customize it this way? Can you do it that way? Because then they know that they can work with you, they can talk to you, right? So you need to be open to that, find those opportunities where they’re unhappy with the big pitches, right? So that is one that I’ve learned from my journey, if I can distill it.

First build that trust and then you yourselves can be that sort of beacon of trust where you’re going in and then sort of helping, right? The second thing is, and don’t be shy going to the biggest names in the bucket, right? It took me nine months to land Walmart, right?

But I’ve known people who have spent years and still couldn’t get into Walmart, right? But we had the right product. We could approach them in a point of trust where we could work with them and sort of were going in with something that we were really good with and then sort of open to making changes in the way that we wanted, right? So whatever it takes, like how much effort is required, just go for the first, the top three at least, right? But then good things will follow after.

Siddhartha Ahluwalia 49:15
And how did you transition from a technical and research founder to leading sales in your organization?

Vamshi Ambati 49:23
With a lot of scars on the back. It’s, yeah, so I mean, it’s good and bad, right? So the, like when I walk into a room with a customer, they don’t look at me as someone coming to sell them, but someone coming in to solve their problem, right? So that trust comes from my technical background, right? So that definitely helps sort of position myself in that. But then there comes a point where the sale doesn’t move forward. There comes a point when you need to step up that game of I’m gonna follow up, I’m gonna understand, I’m gonna sort of read the room correctly. I’m gonna sort of take what it takes to, like play what it takes to sort of close the deal, right? So those really, the first few, I mean, I traveled the first one and a half years of running the services firm.

I’ve been all across the US, but I drove literally 50,000 miles in my car, just going to people and talking to people and trying to understand, and then being very critical about myself and where my shortcomings are, right? So then it was clear that, okay, sometimes in some deals you need to go as two people, not one, and sometimes you need to follow up at the right points in time, right? So sometimes you’re like, you shouldn’t speak, like you’re just, you need to be quiet and listen to their pain points. So all these took a good year, year and a half of talking to a lot of, I think the failed deals taught me a lot more than the deals that I won. So that’s just being self-critical at different points.

Siddhartha Ahluwalia 51:05
Well, thank you, Vamshi. It’s been phenomenal to record this podcast with you. Thank you for sharing your journey super candidly.

Vamshi Ambati 51:13
Thanks, yeah. I think if at all my journey helps another entrepreneur in the same space, always happy to sort of help and talk to them. And as I said, I have this thing called Virama Ventures, through which I support founders who are sort of in similar spaces where I sort of identify and I’m passionate about. So always happy to talk. And thanks for the opportunity to talk to you guys.

Siddhartha Ahluwalia 51:36
Thank you so much.