Inferless June 2024 Newsletter
Learn how to use streaming APIs, try our open-source Nvidia Triton copilot, read our latest case study on scaling AI inference with Dynamic Batching, and catch up on community achievements and more!
Hello Inferless Community!
We're excited to bring you a host of new resources and platform updates. For those new here, at Inferless we specialize in deploying custom machine learning models in a serverless environment. Our approach ensures minimal cold starts, making GPU inference workloads both speedy and cost-effective.
Inferless Platform: New Features & Enhancements:
Streaming APIs: We now support streaming APIs with SSE (Server-Sent Events), ideal for creating a server-to-client communication channel. This is particularly useful for real-time chat, live updates, and streaming data such as audio and video frames. You can send multiple outputs for the same input, enhancing the versatility of your applications. Learn more in our documentation here.
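As a rough illustration of the client side, here is a minimal sketch of consuming an SSE stream in plain Python. The endpoint URL, bearer-token auth, and response details below are placeholders, not the actual Inferless API; see the linked documentation for the real interface.

```python
# Minimal SSE client sketch. The URL and auth scheme are placeholder
# assumptions for illustration -- not the real Inferless endpoint.
import urllib.request

def parse_sse_line(raw: bytes):
    """Return the payload of a `data:` line, or None for other SSE lines."""
    line = raw.decode("utf-8").rstrip("\r\n")
    if line.startswith("data:"):
        return line[len("data:"):].strip()
    return None  # comments, event names, blank keep-alive lines, etc.

def stream_events(url: str, token: str):
    """Yield each `data:` payload from an SSE endpoint as it arrives."""
    req = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",  # placeholder auth header
        "Accept": "text/event-stream",
    })
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # HTTP responses are iterable line by line
            payload = parse_sse_line(raw)
            if payload is not None:
                yield payload
```

Each yielded payload is one server-pushed chunk, which is how a single request can receive multiple streamed outputs over time.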
Enhanced CLI Commands and Model Management APIs
Runtime Patch Command: Update packages in your runtime easily now. Simply use inferless runtime patch -p path/to/file to apply updates efficiently.
Machine Settings Configuration: You can now programmatically configure machine settings for your models using our API. Documentation here.
Fetch Model Logs: Retrieve logs for any model programmatically with our new API. Documentation here.
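To give a feel for programmatic access, here is a hedged sketch of what fetching model logs over REST might look like. The base URL, the `/models/<id>/logs` path, and the bearer-token header are hypothetical placeholders; the actual endpoints and parameters are in the linked API documentation.

```python
# Hypothetical sketch of a model-logs REST call. The path and auth
# scheme are assumptions for illustration only -- consult the Inferless
# API docs for the real endpoint and parameters.
import json
import urllib.request

def build_logs_url(base_url: str, model_id: str) -> str:
    """Join the base URL with a placeholder logs path."""
    return f"{base_url.rstrip('/')}/models/{model_id}/logs"

def fetch_model_logs(base_url: str, model_id: str, token: str):
    """GET the logs resource and decode the JSON body."""
    req = urllib.request.Request(
        build_logs_url(base_url, model_id),
        headers={"Authorization": f"Bearer {token}"},  # placeholder auth
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```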
Introducing Open-Source Nvidia Triton Copilot: Over the past year, we noticed many developers using Nvidia's Triton Inference Server struggle with writing glue code.
To help, we created a tool that generates Docker containers from simple Python code in seconds, using models like ChatGPT or Claude. This enables rapid iteration and experimentation with a Python-first developer experience. Fork the GitHub repo here to start using it. We are sharing this tool for the first time and welcome your feedback.
We also shipped Flexible Logging Options, along with internal enhancements that improve platform performance. You can check the June changelog here.
From Inferless Blog:
Dive into our latest blog to understand the difference between HTTPS and WebSocket for real-time model inference. It includes two experiments simulating a video-processing use case, in which video is processed on the client side and frames are sent to the server for further processing. You can read it here.
Check out our latest case study with SpoofSense, pioneers of a facial AI spoofing-detection product, on how they scaled their AI inference with Inferless Dynamic Batching and Autoscaling. Before discovering Inferless, they tackled the challenge by deploying on on-demand GPU clusters with the Nvidia Triton Inference Server, but couldn't scale across multiple machines or meet their autoscaling demand. That's where Inferless came in. Read the blog here.
Deploying Phi-3-mini-128k-instruct on Inferless: Microsoft has introduced Phi-3-mini-128k-instruct, a compact yet powerful model designed for instruction-following tasks. This model is part of the Phi-3 family, known for its efficiency and high performance. Check out the deployment guide and experience an inference time of 18.42 seconds and a cold start of 7.8 seconds. You can read it here.
From Inferless Community:
Please join us in congratulating Omi on their Series A round as they continue to deliver the magic of 3D and AI through simple, high-performance marketing solutions. Omi's mission is to showcase your products with ultra-realistic and creative visuals. Check out the announcement here.
If you are in San Francisco, don't miss Cleanlab's Sake Social event on July 17th with the Open Data Science Conference (ODSC). They are inviting AI enthusiasts and data science professionals to join them for a night of networking, learning, and curated sake. RSVP here.
Catch the next Inferless Tech Breakfast in San Francisco on July 11th! Join us for engaging discussions on AI in production, where we'll tackle challenges and explore solutions. RSVP here.
That's it for this month! If you have questions, ideas, or just want to say hi, feel free to reach out. We love hearing from you. Until our next exciting update!