Ascent of Open AI Models in Production

By Rohit Seth (rohit@cloudnatix.com) Jan-28-2025

The DeepSeek announcement… Is it a Sputnik moment for the US, or for AI in general? Maybe…

  • Sovereign AI is another story. There must be at least 30 countries now thinking that national LLMs are not only plausible but readily achievable. The details will matter.

  • Has it already started discussions around (reducing the) cost of AI initiatives in enterprises? Most likely, yes! DeepSeek or not, enterprises always end up paying attention to the cost of doing business. They have to!

At the very least, DeepSeek has proven again that constraints, however painful, are often good for computing… they typically lead to faster innovation and quicker adoption of new technologies.

Ever since the release of ChatGPT, the training of foundation LLMs has consumed most of the cost. We are still at the leading edge of a new technology, and that training has been extremely expensive to get right. DeepSeek seems to have lowered the bar there dramatically. The devil is in the details, but it is a HUGE win for open source for sure!

This march to reduce the cost of AI operations has only barely begun for the broader ecosystem. There are a lot of opportunities ahead. Different parts of the software and hardware stack will work together to achieve another few orders of magnitude in efficiency gains.

While LLM training has been a large part of the recent enterprise focus, the inference market has started to grow, and I believe that cheaper training will lead to more companies actively deploying inference solutions. The flexibility and control of self-hosted open models will likely play a bigger role for enterprises. And while lower training cost does reduce the total cost of ownership for inference, it may not dramatically lower the operating cost of inference itself. Don't expect an HGX H200 at AWS to drop from its current price of $45 per hour to under $10 per hour any time soon.

For pricing of the growing inference footprint, see the Statista report here:

DeepSeek is about 50% cheaper than Gemini for inference. But look at what is cheapest: the Llama model. Again, an open model. These open models give enterprises flexibility for their use cases: they can host the models themselves, or use third-party service providers that host them and charge based on operating cost.

The CloudNatix AI platform is built on our own open source LLMariner. True to our values, we make sure the inference stack is easy to set up (less than a couple of hours), easy to manage (day-2 operations), and extremely cost efficient (based on our sophisticated Kubernetes optimizations).
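
To make this concrete, here is a minimal sketch of what consuming a self-hosted open model looks like once such a stack is running. It assumes an OpenAI-compatible endpoint, the style of API that stacks like LLMariner expose; the endpoint URL, API key, and model name below are illustrative placeholders, not CloudNatix-specific values.

    # A minimal sketch, assuming an OpenAI-compatible self-hosted endpoint.
    # The base_url, api_key, and model name are placeholders for illustration.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://llm.internal.example.com/v1",  # your self-hosted endpoint
        api_key="dummy-key",  # self-hosted stacks often accept any token
    )

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # whichever open model you deployed
        messages=[{"role": "user", "content": "Summarize our Q4 GPU spend."}],
    )
    print(response.choices[0].message.content)

Because the API surface is the same as a hosted service, the same application code works whether the model runs in your cluster or at a third-party provider.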

The most important change in the AI ecosystem is likely that we are entering a new AI operations dynamic. Enterprises are going to have a rapidly growing LLM inference footprint, training use cases themselves may now grow because training is cheaper, and then there are other critical GPU use cases in any enterprise, like non-LLM vision, deep learning, etc.

More workloads, wider scope, and much higher scale: this sounds very familiar to us! This is typically where resource sharing and platform optimizations have a huge impact.

GPU optimizations themselves (like virtualization or time sharing) are still in their infancy. And the acute demand for Nvidia GPUs makes it harder to optimize when you barely have enough capacity and cycles to just run the workloads.

This is where cloud optimizations give us a head start.

Some of the existing technologies that have helped traditional CPU workloads achieve lower cost and better performance in the cloud may come in handy immediately. Something as simple, but still very effective, as identifying the right kind of GPU VM: it does make a difference whether you use an H100 ($20/hr) or an A100 ($4/hr). As another example, spot CPU instances have typically saved 50-60% in production environments with almost zero downtime through proper software management of resources. And even though spot GPU instances may be even more scarce, they could be a great opportunity. Pseudo-spot instances backed by cheaper clouds (tier 3 and 4) could be used to similar effect. Look at the price difference between an H100 SXM on AWS (~$5/hr) and on Lambda Labs ($3.20/hr). A back-of-the-envelope calculation using these rates is sketched below.
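
As a rough illustration, using the example hourly rates cited above (real prices vary by region and commitment, and change often), a few lines of arithmetic show how much instance choice and spot-style discounts move the monthly bill:

    # A back-of-the-envelope cost sketch using the illustrative rates from this post.
    # Real prices vary by region, commitment, and time; treat these as placeholders.
    HOURS_PER_MONTH = 730

    hourly_rates = {
        "H100 on-demand":         20.00,
        "A100 on-demand":          4.00,
        "H100 SXM (AWS)":          5.00,
        "H100 SXM (Lambda Labs)":  3.20,
    }

    SPOT_DISCOUNT = 0.55  # spot instances have typically saved 50-60%

    for name, hourly in hourly_rates.items():
        on_demand = hourly * HOURS_PER_MONTH
        spot_like = on_demand * (1 - SPOT_DISCOUNT)
        print(f"{name:24s} on-demand ${on_demand:>8,.0f}/mo, spot-like ${spot_like:>8,.0f}/mo")

Even this crude model shows gaps of 5-10x per GPU-month between the most and least expensive options, before any workload-level optimization.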

The CloudNatix AI platform has advanced features like GPU federation, which lets enterprises seamlessly view GPU capacity and cost across multiple clouds and run some of their workloads at a dramatically lower cost.
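
Conceptually, federation boils down to a placement decision like the sketch below. This is not CloudNatix's actual API; the providers, prices, and capacity figures are hypothetical, just to show the shape of the decision.

    # A conceptual sketch of a federation-style placement decision.
    # Not CloudNatix's actual API; providers, prices, and capacities are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class GpuPool:
        provider: str
        gpu_type: str
        hourly_usd: float
        free_gpus: int

    pools = [
        GpuPool("aws",     "H100", 5.00, 2),
        GpuPool("lambda",  "H100", 3.20, 8),
        GpuPool("on-prem", "A100", 1.10, 4),  # amortized internal rate
    ]

    def place(gpus_needed: int, gpu_type: str) -> GpuPool | None:
        """Pick the cheapest pool of the right GPU type with enough free capacity."""
        candidates = [p for p in pools
                      if p.gpu_type == gpu_type and p.free_gpus >= gpus_needed]
        return min(candidates, key=lambda p: p.hourly_usd, default=None)

    print(place(gpus_needed=4, gpu_type="H100"))
    # -> GpuPool(provider='lambda', gpu_type='H100', hourly_usd=3.2, free_gpus=8)

A real federation layer layers quota, data locality, and network constraints on top, but the cost-aware selection at its core looks much like this.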

Overall, the DeepSeek announcement is a significant development that is likely to have a ripple effect across the AI ecosystem. It underscores the importance of open models, the need for cost-efficient AI solutions, and the ongoing evolution of the AI landscape.

For any inquiries, please contact:

Email: contact@cloudnatix.com

Website: https://www.cloudnatix.com/

Follow us on LinkedIn: https://www.linkedin.com/company/cloudnatix-inc 



