Inferless July 2024 Newsletter
Fresh off the press: new AI chatbot, 30% faster builds, OOM detection, and guides for cutting-edge models like Llama-3.1 and Qwen2
Hey community,
Bea here, bringing you the latest cool stuff we’ve shipped this month. We've been hard at work enhancing Inferless to better serve your ML deployment needs. For those new here: at Inferless we specialize in deploying custom machine learning models in a serverless environment. Our approach ensures minimal cold starts, making GPU inference workloads both speedy and cost-effective.
Here's what's new:
AI chatbot
We've launched an AI-powered chatbot to provide instant support and guidance. It's trained on our documentation and common deployment scenarios, helping you troubleshoot issues and optimize your workflows around the clock.
Faster builds
Our engineering team has significantly optimized the build process. You'll now see up to 30% faster build times, allowing you to iterate and deploy your models more rapidly.
OOM detection
New Out-Of-Memory (OOM) detection alerts you when models exceed GPU memory, suggesting GPU upgrades or switching to dedicated environments for better performance.
Automatic TOML detection
The CLI now auto-detects TOML files and creates runtime.yaml, simplifying project setup.
AWS PrivateLink 🤝 Inferless
AWS PrivateLink now enables secure, private connections between Inferless and your AWS services, VPC, and on-premises apps, bypassing public internet.
New in the docs
Deploy Llama-3.1-8B-Instruct using Inferless
Get a private endpoint with one click using our how-to guide, and experience cold starts as fast as 15.44 seconds for Meta's 8B multilingual model, fine-tuned for accuracy, helpfulness, and safety using SFT and RLHF.
Deploy Qwen2-72B-Instruct using Inferless
Deploy the powerful Qwen2-72B-Instruct model and take advantage of its impressive 128K-token context length. Experience state-of-the-art performance in coding, mathematical tasks, and multilingual support across 29 languages.
How to Stream Speech with Parler-TTS using Inferless
Implement real-time text-to-speech streaming using the parler_tts_mini model with the Parler-TTS library.
Community buzz
Breakfast with Inferless
Join us in SF on August 29th, 2024
Don't miss this chance to connect with the Inferless team and other ML devs in person!
Guardrails AI's new feature
Guardrails AI has announced a preview of Guardrails Server, a new feature offering hosted models for model-based guardrails. Currently in testing phase, this feature aims to eliminate the need for developers to host their own models when implementing guardrails, potentially simplifying AI safety integration in applications.
Meet Ushur at the Ai4 2024 conference
If you are in Las Vegas, meet Ushur at Ai4 - Artificial Intelligence Conferences 2024! Join them to discover industry trends, network with AI pioneers, and learn how Ushur is transforming customer experiences.
That's it for this month.
If you have any questions or want to share ideas, reach out to us on X.
See you 👋