You’ve proudly labeled your services as “AI-powered” by integrating LLMs. Your website’s home page flaunts their revolutionary impact through interactive demos and case studies, marking your company’s debut on the global generative AI landscape. Your small but loyal user base is enjoying an enhanced customer experience, and you can see the potential for growth on the horizon.

Just a week ago, you were talking to customers and assessing product-market fit (PMF). Now, thousands of users have flocked to your website (anything can go viral on social media these days) and crashed your AI-powered services. As a result, your once-reliable service is not only frustrating existing users but also turning away new ones. A quick and obvious fix is to revive the services immediately by raising the usage limit. But this temporary solution comes with a sense of unease: you feel locked into a dependency on a single provider, with limited control over your own AI and the associated costs.

Fortunately, you know that open-source large language models (LLMs) are a reality. Thousands of such models are available for instant use on platforms like Hugging Face, which opens up the possibility of self-hosting. However, the most powerful LLMs you’ve come across have billions of parameters, weigh in at hundreds of gigabytes, and take considerable effort to scale. In a real-time system demanding low latency, you can’t just plug them into your application the way you would a traditional model. While you have full confidence in your team’s ability to build the necessary infrastructure, the real concern lies in the cost implications of such a transition.
Full story: OpenAI or DIY? Unveiling the true cost of self-hosting LLMs.