04-28, 15:00–15:45 (America/Los_Angeles), St. Helens
Are you interested in learning about the emerging open source stack for Large Language Models (LLMs)?
LLMs have gained immense popularity in recent months and require scalable solutions to the challenges they present in data ingestion, training, fine-tuning, batch (offline) inference, and online serving. At the same time, LLM workloads share many of these challenges with other types of large-scale ML use cases.
Let’s explore the current state of Generative AI and LLMs and take a closer look at the emerging (yet still early) open source tech stack for this workload. Then we will evaluate how Ray AI Runtime provides a scalable compute substrate that addresses orchestration and scalability problems.
Finally, we will demonstrate how you can implement distributed fine-tuning and batch (offline) inference with HuggingFace and Ray AI Runtime, using Google’s recent Flan-T5 model and the Alpaca dataset.
You will walk away with three key takeaways:
1. An understanding of the challenges presented by LLM workloads, including training, fine-tuning, batch (offline) inference, and serving.
2. An overview of the emerging open source tech stack for LLM workloads, including Ray AI Runtime for scalability and orchestration.
3. A practical overview of implementing fine-tuning and batch (offline) inference with HuggingFace and Ray AI Runtime, using the Google Flan-T5 model and the Alpaca dataset.
No previous knowledge expected.
Kamil is a technical training lead at Anyscale, where he builds technical training and educational resources for the broader Ray and AI community. Prior to joining Anyscale, he co-founded Neptune.ai and worked on AI models and MLOps processes at an AI consultancy. Kamil holds an M.Sc. in Cognitive Science and a B.Sc. in Computer Science.