Inferless March 2024 Newsletter
Exploring LLM Tokens/Second Benchmark Insights, Streamlining Model Imports with New Features, Enhanced Error Resolution, Detailed Tutorials for Developing a Logo Generator and Voice Chatbot & more!
Hello Inferless Community! 🎉
We're excited to bring you a host of new resources and platform updates. For those new here, at Inferless we specialize in deploying custom machine learning models in a serverless environment. Our approach ensures minimal cold starts, making GPU inference workloads both speedy and cost-effective.
💡 Introducing LLM Speed Benchmarking: An Independent Analysis
Our latest analysis is a detailed benchmark of tokens/second throughput for three advanced 7-billion-parameter language models (LLMs): Llama 2, Mistral, and Gemma. Our test suite included six unique prompts with input lengths from 20 to 5,000 tokens and output lengths of 100, 200, and 500 tokens, assessing how six different libraries (Text Generation Inference, vLLM, DeepSpeed MII, CTranslate2, Triton with vLLM backend, and TensorRT-LLM) adapt to varied task complexities.
It presents an independent and fair analysis aimed at developers, researchers, and AI enthusiasts, offering insights to help choose the right model for their needs. Read the full blog here.
We're eager to incorporate your feedback into our research. If there's a specific model or speed tests with new libraries and varying prompt sizes you're interested in, don't hesitate to respond.
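For readers who want to try a quick measurement of their own, here is a rough sketch of how tokens/second can be computed for a single generation. This is illustrative only, not the benchmark's actual harness; `generate` is a hypothetical stand-in for whatever inference call your library exposes (vLLM, TGI, etc.):

```python
import time

def tokens_per_second(generate, prompt, max_new_tokens):
    """Time one generation call and return throughput in tokens/second.

    `generate` is a placeholder for any inference function that
    returns the list of generated token IDs.
    """
    start = time.perf_counter()
    output_tokens = generate(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return len(output_tokens) / elapsed

# Dummy generator standing in for a real model call, so the
# sketch runs end to end without a GPU.
def dummy_generate(prompt, max_new_tokens):
    time.sleep(0.01)  # simulate inference latency
    return list(range(max_new_tokens))

tps = tokens_per_second(dummy_generate, "Hello", 100)
print(f"{tps:.0f} tokens/s")
```

Real benchmarks average over many runs and warm the model first; a single cold call, as here, can understate steady-state throughput.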
🚀 Inferless Platform: New Features & Enhancements
Inferless Run: Inferless run lets you test the container locally before pushing it to us, so you can resolve build and runtime errors faster. Just three steps to get started:
pip install inferless-cli
inferless init
inferless run
Here is a tutorial and documentation.
GPU Utilization Insights: A new feature that displays GPU utilization for API requests is now available, enabling users to monitor and optimize their models' performance more effectively.
Optimized S3 Uploads: We have significantly boosted the speed of S3 uploads via the CLI. This improvement facilitates faster data transfers, enhancing productivity.
Apart from these enhancements, the Inferless community can now track monthly average storage usage with alerts, gain insights into the last 20 API calls with our new Recent Runs feature, and benefit from better exception handling for models. For more details, check out our Changelog here.
🌟 Community Highlights at Inferless
Dive into our newest Inferless cookbooks, showcasing Serverless GPU inference to maximize cost efficiency and optimize performance. Start with the "Serverless Logo Generator," which merges creativity and technology using a logo-tuned LoRA and Stable Diffusion XL model, optimized with Diffusers and deployed via Inferless.
Then, explore the "Serverless Voice Conversational Chatbot," merging speech technology with intelligence for fluid voice interactions, using an advanced stack including Bark, Faster-Whisper, Transformers, and Inferless.
These tutorials offer detailed instructions and deployment tips, highlighting potential cost savings of up to 90% compared to AWS, perfect for AI and ML enthusiasts aiming to bring their visions to reality.
Our "Breakfast with Inferless - For Developers" series has ignited inspiration every Thursday morning as we dive into the intricacies of model deployment, explore performance measurement for open-source models, and discuss the latest frameworks, tools, and technology advances.
As we prepare for an India edition during our team's visit in the coming weeks, don't miss our next meetup in Bengaluru on April 4th. It's an excellent chance for networking and tech discussions in a casual atmosphere. RSVP to make your Thursdays more insightful and enjoyable.
That's it for this month! If you have questions, ideas, or just want to say hi, feel free to reach out. We love hearing from you. Until our next exciting update! 🌐💡👩💻