Inferless May 2024 Newsletter
Learn how to build real-time streaming apps using open source, check out our latest features including Runtime Versioning and AutoFix, and see community achievements and more!
Hello Inferless Community! 🎉
We're excited to bring you a host of new resources and platform updates. For those new here, at Inferless we specialize in deploying custom machine learning models in a serverless environment. Our approach ensures minimal cold starts, making GPU inference workloads both speedy and cost-effective.
🚀 Inferless Platform: New Features & Enhancements
Custom Runtime Versioning: You can now track changes across custom runtimes with ease. View different versions and manage your deployments without affecting older versions. Updates will only apply when you deploy the updated version in the model settings, providing greater control and stability. Check the documentation here.
AutoFix Suggestions: Introducing AutoFix Suggestions for model imports. Leveraging our AI-powered RAG application, we analyze error logs to provide tailored suggestions for fixing issues. This feature helps streamline the troubleshooting process, saving you time and effort.
Concurrent Build Workers: To speed up the model import process, we've introduced concurrent build workers. This update minimizes wait times in the queue by allowing multiple builds to proceed simultaneously, significantly reducing overall build duration.
We also shipped the latest load balancer version and enhanced error handling for builds, improving platform performance and infrastructure stability. You can check the May Changelog here.
🌟 From Inferless Blog:
Dive into our latest blog post, where we explore integrating Server-Sent Events (SSE) with NVIDIA Triton Inference Server to build real-time AI applications. Using a Python backend and the Zephyr model, it provides a comprehensive guide to setting up an effective real-time data streaming and inference system. You can read it here.
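For a feel of what the post covers, here is a minimal sketch of the SSE wire format that streaming token responses rely on. The names `format_sse` and `stream_tokens` are illustrative, not part of the Triton or Inferless APIs; the real guide streams tokens from the Zephyr model rather than a hardcoded list.

```python
from typing import Iterable, Iterator, Optional

def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame a payload as one SSE message: optional `event:` line,
    a `data:` line, and a blank line terminating the message."""
    msg = ""
    if event is not None:
        msg += f"event: {event}\n"
    msg += f"data: {data}\n\n"
    return msg

def stream_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each model token as an SSE frame, then a closing event
    so the client knows the stream is complete."""
    for tok in tokens:
        yield format_sse(tok)
    yield format_sse("[DONE]", event="end")

if __name__ == "__main__":
    # Simulated token stream; a real app would yield these from the
    # model server over an HTTP response with Content-Type: text/event-stream.
    for frame in stream_tokens(["Hello", " world"]):
        print(repr(frame))
```

In practice these frames are written to a chunked HTTP response with `Content-Type: text/event-stream`, and the browser's `EventSource` API reassembles them on the client side.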
Discover how Cleanlab slashed their GPU costs by 90% with Inferless Serverless Inference. In our latest success story, we explore Cleanlab's journey of switching from traditional GPU clusters to Inferless. This change not only resulted in significant cost savings but also improved scalability and reduced cold start times, boosting performance and freeing up resources for their technical team. Read more about their experience here.
Looking to deploy your own customer service bot? Check out our latest step-by-step guide to deploying a serverless customer service bot using Llama Index, Pinecone, and Inferless. Transform text into speech efficiently using Piper, all while maintaining cost-effectiveness: approximately $4/day for nearly 1,000 requests. Check out the detailed tutorial below or visit the documentation here.
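At the heart of a bot like this is retrieval: embed the user's question, find the closest support documents, and feed them to the model. In the guide, Llama Index and Pinecone handle embedding and vector search; the sketch below stands in for both with plain Python and toy two-dimensional vectors, so every name and vector here is illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, docs, top_k=1):
    """Return the top_k (score, text) pairs most similar to the query.
    `docs` is a list of (embedding, text) tuples."""
    scored = [(cosine(query_vec, vec), text) for vec, text in docs]
    return sorted(scored, reverse=True)[:top_k]

# Toy knowledge base: in a real deployment these embeddings come from
# an embedding model and live in a vector database like Pinecone.
docs = [
    ([1.0, 0.0], "Refunds are processed within 5 business days."),
    ([0.0, 1.0], "Standard shipping takes 3-7 business days."),
]

if __name__ == "__main__":
    # A query embedding close to the "refunds" document.
    print(retrieve([0.9, 0.1], docs))
```

The retrieved passages are then inserted into the model prompt, which is what lets the bot answer from your own support content instead of its training data.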
💚 From Inferless Community:
We are thrilled to congratulate Simplified on being recognized as the number one tool in their category by G2, standing alongside tools like Grammarly and Notion. We are proud to support trailblazers like Simplified who are reshaping the landscape of creative AI marketing.
Check out how Unstudio is stepping up their game with "Real Time Photoshoots", a new feature designed to optimize speed and control in AI imagery. Leveraging a rich library of digital props and advanced AI models, this feature promises to transform the way product shots are created.
Catch our next Tech Breakfast in San Francisco on June 13th! Join us for engaging discussions on AI in production, where we'll tackle challenges and explore solutions. RSVP here.
That's it for this month! If you have questions, ideas, or just want to say hi, feel free to reach out. We love hearing from you. Until our next exciting update! 🌐💡👩💻