Inferless April 2024 Newsletter
Exploring our Llama 3 fine-tuning & deployment tutorial, improved runtime configurations and new CLI capabilities, plus community achievements and more!
Hello Inferless Community! 🎉
We're excited to bring you a host of new resources and platform updates. For those new here, at Inferless we specialize in deploying custom machine learning models in a serverless environment. Our approach ensures minimal cold starts, making GPU inference workloads both speedy and cost-effective.
🚀 Inferless Platform: New Features & Enhancements
Enhanced Runtime Configuration: You can now include shell commands in your Runtime.yaml that execute sequentially during the runtime build. This feature is particularly useful for packages that require step-by-step installation procedures. For more details on bringing custom packages, please visit our docs here and check out the example video below.
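As a rough illustration, a runtime file with sequential build commands might look like the sketch below. The key names here are assumptions based on the description above; consult the docs for the actual schema.

```yaml
# Hypothetical Runtime.yaml sketch -- key names are illustrative,
# not the confirmed Inferless schema.
build:
  python_packages:
    - torch
    - transformers
  run:
    # Shell commands executed one after another during the runtime build,
    # handy for packages that need step-by-step installation.
    - apt-get update && apt-get install -y ffmpeg
    - pip install flash-attn --no-build-isolation
```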
Inferless CLI volume cp: Transfer data directly from local or cloud environments to remote storage without the need for an initialization function. With the Inferless CLI, you can effortlessly upload code and model weights straight to Inferless Volumes with just a simple command.
Check the documentation here and an example video below.

Python Client: Easily interact with the API using the Inferless Python client. This client seamlessly handles conversion from Python objects to Inference Protocol Version 2, allowing you to send Python objects directly. It supports both synchronous and asynchronous API calls. Check out the docs here.
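To give a feel for what that conversion involves, here is a minimal, stand-alone sketch of wrapping plain Python values in an Inference Protocol V2 (KServe V2-style) request body. This is our own illustrative helper, not the Inferless client's actual internals.

```python
import json

def to_v2_payload(name, values):
    """Wrap a flat list of Python values in a V2 'inputs' tensor."""
    # Check bool before int: in Python, bool is a subclass of int.
    if all(isinstance(v, bool) for v in values):
        datatype = "BOOL"
    elif all(isinstance(v, int) for v in values):
        datatype = "INT64"
    elif all(isinstance(v, float) for v in values):
        datatype = "FP64"
    else:
        datatype = "BYTES"  # V2 represents strings as BYTES
        values = [str(v) for v in values]
    return {
        "inputs": [
            {
                "name": name,
                "shape": [len(values)],
                "datatype": datatype,
                "data": values,
            }
        ]
    }

payload = to_v2_payload("prompt", ["Tell me a joke"])
print(json.dumps(payload))
```

A client like this lets you pass ordinary lists and strings while the library takes care of the `name`/`shape`/`datatype` bookkeeping the protocol requires.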
We also shipped internal enhancements that improve platform performance, build times, and autoscaling. You can check the April Changelog here.
🌟 From Inferless Blog:
LLM Speed Benchmarking Part 2: An Independent Analysis. We're excited to share the latest study in our series on LLM performance benchmarks. This report assesses five sophisticated LLMs (Llama 2, Solar-10, Qwen 1.5, MPT, and Yi), ranging from 10B to 34B parameters, using six inference libraries on an Azure A100 GPU.
Our benchmarks cover key metrics such as Time to First Token, Tokens per Second, and total inference time, giving developers, researchers, and AI enthusiasts the insights they need to choose the right model.

Discover our comprehensive step-by-step guide for fine-tuning (Colab Notebook) and deploying Meta's Llama 3 8B model on Inferless. Get your own Llama 3 8B model API endpoint with just one click in your Inferless Console.
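For readers new to the metrics used in the benchmark report above, Time to First Token and Tokens per Second can be derived from the wall-clock timestamps at which a streaming model emits each token. A minimal sketch (the helper name is our own, not from the report):

```python
def streaming_metrics(request_start, token_timestamps):
    """Compute TTFT, total time, and tokens/s from per-token arrival times.

    token_timestamps: arrival time (seconds) of each generated token,
    measured on the same clock as request_start.
    """
    if not token_timestamps:
        raise ValueError("no tokens generated")
    ttft = token_timestamps[0] - request_start      # Time to First Token
    total_time = token_timestamps[-1] - request_start
    tokens_per_s = len(token_timestamps) / total_time
    return {"ttft_s": ttft, "total_s": total_time, "tokens_per_s": tokens_per_s}

# Example: 4 tokens, first arriving 0.5 s and last 2.0 s after the request.
m = streaming_metrics(0.0, [0.5, 1.0, 1.5, 2.0])
print(m)  # TTFT of 0.5 s, 2.0 tokens/s overall
```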
Looking to deploy your own text-to-audio model? Check out our latest step-by-step guide to deploying an AI music generator app on Inferless, with an average cold start time of 16.17 seconds and an average inference time of 13.78 seconds for an 8-second music sample.
💚 From Inferless Community
Discover the new Trustworthy Language Model (TLM) by Cleanlab AI, which provides a trustworthiness score for each LLM response to highlight reliable outputs. Try the TLM for yourself at tlm.cleanlab.ai and learn more about their groundbreaking research here. We're also excited to celebrate their recent recognition as one of Forbes' top 50 AI companies.
Excited to share that SpoofSense, which uses computer vision to help companies detect identity fraud, has been recognized as one of the top 20 startups by Google Accelerator.
Join us in congratulating Tenyx on their partnership with Deepgram! This collaboration marks a significant advancement in voice AI, promising reduced customer wait times and more efficient interactions. Dive into the details here.
💡 What’s coming up in May?
Announcing the return of our Breakfast catchups in San Francisco! Join us for engaging discussions on AI in production, where we'll tackle challenges and explore solutions. RSVP here.
We're in the process of revamping the Inferless documentation based on feedback from our users. Get ready for significant improvements, including CLI import enhancements and platform UI updates. We're excited to introduce you to the latest and even faster version of our true Serverless product, and you'll soon receive communication from us regarding the migration to this upgraded version.
That's it for this month! If you have questions, ideas, or just want to say hi, feel free to reach out. We love hearing from you. Until our next exciting update! 🌐💡👩💻