Inferless February Newsletter: Achieving SOC2, One-click Model Deployment, PDF Q&A App Cookbook, Breakfast Series with Devs and more!
Greetings, Inferless Community! 🌟
We're thrilled to present a plethora of fresh resources and updates on our platform this February. For those new here, at Inferless we specialize in deploying custom machine learning models in a serverless environment. Our approach ensures minimal cold starts, making GPU inference workloads both speedy and cost-effective.
📘 Launching Inferless Cookbook: Create a PDF Upload Application with Q&A in Under 10 Minutes & Save 84% on Costs vs. AWS
Dive into our latest cookbook, which breaks down the process of building a PDF Q&A app using cutting-edge technologies like LangChain, Pinecone, and, of course, Inferless.
This guide is meticulously sectioned to walk you through each step of the development, complete with comprehensive setup instructions and a cost analysis comparing AWS and Inferless deployments.
A must-read for Machine Learning Engineers, AI Engineers and anyone looking to build and deploy AI in production.
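The core pattern behind such an app is retrieval-augmented Q&A: split the PDF text into chunks, embed them, and retrieve the most relevant chunk for each question. Here is a minimal, dependency-free sketch of that flow; the toy bag-of-words "embedding" stands in for the real embedding model the cookbook serves via Inferless, and the `retrieve` function stands in for a vector database like Pinecone.

```python
import math
import re
from collections import Counter

def chunk_text(text, size=8):
    """Split document text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy bag-of-words vector; the cookbook uses real embeddings served via Inferless."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, chunks, top_k=1):
    """Rank chunks by similarity to the question, as a vector DB like Pinecone does at scale."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

pdf_text = ("Inferless deploys custom ML models serverlessly. "
            "Cold starts are minimal, keeping GPU inference fast and cost-effective.")
chunks = chunk_text(pdf_text)
print(retrieve("How fast is GPU inference?", chunks))
```

The retrieved chunk is then stuffed into the LLM prompt as context; the cookbook wires these stages together with LangChain rather than by hand.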
🚀 What's New on the Inferless Platform: Updates & Improvements
Triple Compliance Achievement: SOC 2, ISO 27001, and GDPR
We're proud to announce that Inferless has hit a major milestone in our commitment to data security and privacy. Attaining SOC 2, ISO 27001, and GDPR compliance underscores our dedication to safeguarding your data and building trust as a core value. You can read the announcement here and request access.
Explore Models: Our new feature allows you to discover and deploy popular models into production with a single click, eliminating the need for integration efforts on your part.
Docker Import Enhancements: We've enhanced the flexibility in port configurations for the inference server, allowing dynamic port specification upon import. You can now also set custom endpoints for health checks and inferences, offering more adaptability in model integration.
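To illustrate what configurable ports and endpoints look like in practice, here is a minimal inference-server sketch using only the Python standard library. The environment-variable names (`SERVER_PORT`, `HEALTH_ENDPOINT`, `INFER_ENDPOINT`) are hypothetical placeholders for illustration; consult the Inferless Docker import docs for the actual configuration keys.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical env vars for illustration; the actual Inferless config names may differ.
PORT = int(os.environ.get("SERVER_PORT", "8080"))
HEALTH_PATH = os.environ.get("HEALTH_ENDPOINT", "/health")
INFER_PATH = os.environ.get("INFER_ENDPOINT", "/infer")

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Health checks hit the configurable health endpoint.
        if self.path == HEALTH_PATH:
            self._reply(200, {"status": "ok"})
        else:
            self._reply(404, {"error": "not found"})

    def do_POST(self):
        # Inference requests hit the configurable inference endpoint.
        if self.path == INFER_PATH:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            # Echo the prompt back; a real container would run the model here.
            self._reply(200, {"result": payload.get("prompt", "")})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, body):
        data = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(data)

def run(port=PORT):
    """Serve on the port chosen at import time."""
    HTTPServer(("", port), Handler).serve_forever()
```

Because the port and paths are read from the environment, the same image can be imported with different port and endpoint settings without rebuilding.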
Plus, with our latest update, you can review your last 20 API calls with our Recent Runs feature and benefit from improved exception handling for models. For more information, visit our Changelog.
🌟 Community Highlights at Inferless:
Deploying Code Llama 70B on Inferless: Experience unmatched efficiency with an inference time of 6.67s and a throughput of 33.18 tokens/sec, showcasing the power of Inferless in action. Detailed How-to Guide here.
from vllm import LLM, SamplingParams

class InferlessPythonModel:
    def initialize(self):
        # Runs once per container cold start: load the GPTQ-quantized model.
        self.sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=256)
        self.llm = LLM(
            model="TheBloke/CodeLlama-70B-Python-GPTQ",
            quantization="gptq",
            dtype="float16",
            gpu_memory_utilization=0.5,
        )

    def infer(self, inputs):
        # Runs on every request: generate a completion for the incoming prompt.
        prompts = inputs["prompt"]
        result = self.llm.generate(prompts, self.sampling_params)
        result_output = [output.outputs[0].text for output in result]
        return {"result": result_output[0]}

    def finalize(self):
        pass
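For readers without a GPU handy, here is a sketch of the contract the Inferless runtime exercises on that class: initialize() once at cold start, then infer() per request. The vLLM model is swapped for a stub (StubLLM and friends are illustrative placeholders, not part of the Inferless SDK) so the request flow can be traced anywhere.

```python
# Sketch of the Inferless model contract with vLLM stubbed out, so the
# request flow runs without a GPU. StubLLM/StubOutput/StubCompletion are
# illustrative placeholders, not part of the Inferless SDK.

class StubCompletion:
    def __init__(self, text):
        self.text = text

class StubOutput:
    """Mimics the shape of a vLLM request output: .outputs[0].text."""
    def __init__(self, text):
        self.outputs = [StubCompletion(text)]

class StubLLM:
    def generate(self, prompts, sampling_params):
        # A real LLM would decode tokens here; we just echo the prompt back.
        return [StubOutput(f"# completion for: {p}") for p in prompts]

class DemoModel:
    def initialize(self):
        # Runs once per container cold start.
        self.llm = StubLLM()
        self.sampling_params = None

    def infer(self, inputs):
        # Same shape as the real class above: dict in, dict out.
        prompts = inputs["prompt"]
        result = self.llm.generate(prompts, self.sampling_params)
        return {"result": [o.outputs[0].text for o in result][0]}

model = DemoModel()
model.initialize()
print(model.infer({"prompt": ["reverse a string in Python"]}))
```

The real class behaves the same way once deployed, with the stub replaced by the actual Code Llama 70B model.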
Spotlight on Simplified: Celebrating our friends at Simplified's recognition in G2’s 2024 Best Software Awards for the Design Products category. Congratulations to the team for their innovation and excellence in user experience.
Introducing Breakfast with Inferless - San Francisco Edition:
Join us for "Thursday Breakfasts with Developers," our new series for those passionate about AI in production. Enjoy enlightening conversations and delicious breakfasts at a cozy San Francisco cafe. A fantastic opportunity to network and discuss tech in a relaxed setting. RSVP now and make your Thursdays more insightful and enjoyable!
That wraps up our February newsletter. Should you have any questions, suggestions, or just wish to say hello, don't hesitate to get in touch. We always enjoy hearing from our community. Stay tuned for more exciting updates coming your way! 🌐💡👩💻