Abstract AlgorithmsAbstract Algorithms

  • Home
  • All Posts
  • All Series
  • About

Category

quantization

3 articles in this category

Practical LLM Quantization in Colab: A Hugging Face Walkthrough

TLDR: This is a practical, notebook-style quantization guide for Google Colab and Hugging Face. You will quantize real models, run inference, compare memory/latency, and learn when to use 4-bit NF4 vs safer INT8 paths. šŸ“– What You Will Build in Thi...

Mar 12, 2026•10 min read

GPTQ vs AWQ vs NF4: Choosing the Right LLM Quantization Pipeline

TLDR: GPTQ, AWQ, and NF4 all shrink LLMs, but they optimize different constraints. GPTQ focuses on post-training reconstruction error, AWQ protects salient weights for better quality at low bits, and NF4 offers practical 4-bit compression through bit...

Mar 12, 2026•11 min read
LLM Model Quantization: Why, When, and How to Deploy Smaller, Faster Models

LLM Model Quantization: Why, When, and How to Deploy Smaller, Faster Models

TLDR: Quantization converts high-precision model weights and activations (FP16/FP32) into lower-precision formats (INT8 or INT4) so LLMs run with less memory, lower latency, and lower cost. The key is choosing the right quantization method for your a...

Mar 8, 2026•11 min read

Abstract Algorithms

Exploring the fascinating world of algorithms, data structures, and software engineering through clear explanations and practical examples.

Navigation

  • Home
  • All Posts
  • All Series
  • About

Series

  • Machine Learning Mastery

Popular Topics

  • System Design
  • interview-prep
  • llm
  • distributed systems
  • Reliability
  • Databases

Author

Abstract Algorithms

Abstract Algorithms

@abstractalgorithms

Ā© 2026 Abstract Algorithms. All rights reserved.

Powered by Hashnode