Topic
Inference
Learn Inference as a connected topic across chapters, concepts, simulations, and interview reasoning.
10 Concepts · 8 Articles · 2h 9m
Overview
How this topic helps
AI
LLM
Deep Learning
Quantization
Articles
8 matched articles
Article 1: Managed API LLMs vs Self-Hosted Models: When to Switch and When Not To (17 min)
TLDR: Most teams should start with managed LLM APIs because they buy speed, reliability, model quality, and low operational burden. Move to self-hosted or open-weight models only when you have stable…
Article 2: Types of LLM Quantization: By Timing, Scope, and Mapping (17 min)
TLDR: There is no single "best" LLM quantization. You classify and choose quantization along three axes: when you quantize (timing), what you quantize (scope), and how values are encoded (mapping). In…
Article 4: LLM Model Quantization: Why, When, and How to Deploy Smaller, Faster Models (13 min)
TLDR: Quantization converts high-precision model weights and activations (FP16/FP32) into lower-precision formats (INT8 or INT4) so LLMs run with less memory, lower latency, and lower cost. The key is…
Article 5: Sparse Mixture of Experts: How MoE LLMs Do More With Less Compute (27 min)
TLDR: Mixture of Experts (MoE) replaces the single dense Feed-Forward Network (FFN) layer in each Transformer block with N independent expert FFNs plus a learned router. Only the top-K experts activat…