Sharding Large Language Models: Achieving Efficient Distributed Inference
Techniques to load LLMs on smaller GPUs and enable parallel inference using Hugging Face Accelerate