Topic

Quantization

Learn Quantization as a connected topic across chapters, concepts, simulations, and interview reasoning.

10 Concepts, 6 Articles, 1h 43m

How this topic helps

LLM
AI
Deep Learning
Inference

Articles

6 matched articles

Article 1: Types of LLM Quantization: By Timing, Scope, and Mapping (17 min)
TLDR: There is no single "best" LLM quantization. You classify and choose quantization along three axes: when you quantize (timing), what you quantize (scope), and how values are encoded (mapping). In…

Article 2: Practical LLM Quantization in Colab: A Hugging Face Walkthrough (15 min)
TLDR: This is a practical, notebook-style quantization guide for Google Colab and Hugging Face. You will quantize real models, run inference, compare memory/latency, and learn when to use 4-bit NF4 vs…

Article 3: GPTQ vs AWQ vs NF4: Choosing the Right LLM Quantization Pipeline (15 min)
TLDR: GPTQ, AWQ, and NF4 all shrink LLMs, but they optimize different constraints. GPTQ focuses on post-training reconstruction error, AWQ protects salient weights for better quality at low bits, and…

Article 4: LLM Model Quantization: Why, When, and How to Deploy Smaller, Faster Models (13 min)
TLDR: Quantization converts high-precision model weights and activations (FP16/FP32) into lower-precision formats (INT8 or INT4) so LLMs run with less memory, lower latency, and lower cost. The key is…

Article 5: Fine-Tuning LLMs with LoRA and QLoRA: A Practical Deep-Dive (31 min)
TLDR: LoRA freezes the base model and trains two tiny matrices per layer: 0.1% of parameters, 70% less GPU memory, near-identical quality. QLoRA adds 4-bit NF4 quantization of the frozen base, enab…

Article 6: LLM Model Naming Conventions: How to Read Names and Why They Matter (12 min)
TLDR: LLM names encode practical decisions: model family, size, training stage, context window, format, and quantization level. If you can decode naming conventions, you can avoid costly deployment mi…
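
As a quick illustration of the idea in Article 4's summary (mapping high-precision values to a lower-precision integer format), here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is not taken from any of the articles; the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and real pipelines such as GPTQ, AWQ, or NF4 use more elaborate schemes.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Illustrative only: one scale for the whole list, round-to-nearest.
    """
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; guard against all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


# Hypothetical sample weights, chosen only for demonstration.
weights = [0.42, -1.27, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Round-to-nearest keeps each value within half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as one signed byte plus a single shared scale, which is where the memory saving comes from; the cost is a rounding error of at most half a step per value.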