Abstract Algorithms

Category

inference

3 articles in this category

Types of LLM Quantization: By Timing, Scope, and Mapping

TLDR: There is no single "best" LLM quantization. You classify and choose quantization along three axes: when you quantize (timing), what you quantize (scope), and how values are encoded (mapping). In practice, most teams start with weight quantizati...

Mar 14, 2026 • 11 min read

GPTQ vs AWQ vs NF4: Choosing the Right LLM Quantization Pipeline

TLDR: GPTQ, AWQ, and NF4 all shrink LLMs, but they optimize different constraints. GPTQ focuses on post-training reconstruction error, AWQ protects salient weights for better quality at low bits, and NF4 offers practical 4-bit compression through bit...

Mar 12, 2026 • 11 min read

LLM Model Quantization: Why, When, and How to Deploy Smaller, Faster Models

TLDR: Quantization converts high-precision model weights and activations (FP16/FP32) into lower-precision formats (INT8 or INT4) so LLMs run with less memory, lower latency, and lower cost. The key is choosing the right quantization method for your a...

Mar 8, 2026 • 11 min read

Exploring the fascinating world of algorithms, data structures, and software engineering through clear explanations and practical examples.

Author

Abstract Algorithms

@abstractalgorithms

© 2026 Abstract Algorithms. All rights reserved.
