Abstract Algorithms
Learning Graphs

Start here

AI Systems

Understand model behavior, inference, retrieval, vector spaces, evaluation, and production guardrails together.

Model Behavior · Vector Space · Inference · Retrieval · Evaluation · Practical LLM Quantization in Colab

Begin with

Practical LLM Quantization in Colab gives you the cleanest entry point before branching into constraints, failures, and related systems.

12 Articles · 10 Concepts

Start With Practical LLM Quantization in Colab

Grounding: Build the mental model.
Shape: See how the pieces depend on each other.
Consequence: Compare what improves and what breaks.
Stress: Change constraints and watch behavior.
Next: Move to the next useful edge.

Relationships

Follow the shape of the system

Move through prerequisites, dependencies, tradeoffs, and adjacent concepts without losing the thread.

System behavior

CAP Under Network Partition

Systems choose between consistency and availability during partitions.

[Diagram, step 1 of 2: normal flow. Nodes: Client (Actor), API Gateway (Boundary), Core Service (Coordinator), State Store (Durability), Event Stream (Stream), Consumer (Worker). Edge labels: request, command, persist, publish, consume.]

Read in sequence

1. Practical LLM Quantization in Colab: A Hugging Face Walkthrough (15 min)
   TLDR: This is a practical, notebook-style quantization guide for Google Colab and Hugging Face. You will quantize real models, run inference, compare memory/latency, and learn when to use 4-bit NF4 vs...

2. A Beginner's Guide to Vector Database Principles (14 min)
   TLDR: A vector database stores meaning as numbers so you can search by intent, not exact keywords. That is why "reset my password" can find "account recovery steps" even if the words are different.

3. LLM Model Quantization: Why, When, and How to Deploy Smaller, Faster Models (13 min)
   TLDR: Quantization converts high-precision model weights and activations (FP16/FP32) into lower-precision formats (INT8 or INT4) so LLMs run with less memory, lower latency, and lower cost. The key is...

4. Dot Product in Machine Learning: The Engine Behind Similarity, Attention, and Neural Networks (22 min)
   TLDR: The dot product multiplies corresponding elements of two vectors and sums the results. In machine learning it does three critical jobs: it scores semantic similarity between embeddings, computes...

5. Sparse Mixture of Experts: How MoE LLMs Do More With Less Compute (27 min)
   TLDR: Mixture of Experts (MoE) replaces the single dense Feed-Forward Network (FFN) layer in each Transformer block with N independent expert FFNs plus a learned router. Only the top-K experts activat...

6. Softmax Function Explained: From Raw Scores to Probabilities (23 min)
   TLDR: Softmax converts a vector of raw scores (logits) into a valid probability distribution by exponentiating each value and dividing by the total. Subtracting the max before exponentiating prevents ...

7. Dense LLM Architecture: How Every Parameter Works on Every Token (24 min)
   TLDR: In a dense LLM every single parameter is active for every token in every forward pass: no routing, no selection. A transformer block runs multi-head self-attention (Q, K, V) followed by a feed-...

8. Managed API LLMs vs Self-Hosted Models: When to Switch and When Not To (17 min)
   TLDR: Most teams should start with managed LLM APIs because they buy speed, reliability, model quality, and low operational burden. Move to self-hosted or open-weight models only when you have stable ...

9. Types of LLM Quantization: By Timing, Scope, and Mapping (17 min)
   TLDR: There is no single "best" LLM quantization. You classify and choose quantization along three axes: when you quantize (timing), what you quantize (scope), and how values are encoded (mapping). In...

10. GPTQ vs AWQ vs NF4: Choosing the Right LLM Quantization Pipeline (15 min)
    TLDR: GPTQ, AWQ, and NF4 all shrink LLMs, but they optimize different constraints. GPTQ focuses on post-training reconstruction error, AWQ protects salient weights for better quality at low bits, and ...

11. Why Embeddings Matter: Solving Key Issues in Data Representation (14 min)
    TLDR: Embeddings convert words (and images, users, products) into dense numerical vectors in a geometric space where semantic similarity = geometric proximity. "King - Man + Woman ≈ Queen" is not magi...

12. How Transformer Architecture Works: A Deep Dive (18 min)
    TLDR: The Transformer is the architecture behind every major LLM (GPT, BERT, Claude, Gemini). Its core innovation is Self-Attention, a mechanism that lets the model weigh relationships between all to...
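Two of the summaries in the reading list describe their computation fully enough to sketch: the dot product (multiply corresponding elements, then sum) and softmax (exponentiate, divide by the total, subtract the max first). The Python below is a minimal illustration under stated assumptions, not code from the linked articles. The function names `dot` and `softmax` and the toy vectors are hypothetical. The softmax summary is cut off in the listing, so the comment on the max subtraction gives the standard reason (avoiding overflow in `exp`) rather than the article's own wording.

```python
import math


def dot(a, b):
    """Multiply corresponding elements of two equal-length vectors and sum the results."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the max so exp() never sees a large positive input
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy data: one query vector scored against three candidate vectors.
query = [1.0, 0.0, 1.0]
candidates = [
    [1.0, 0.0, 1.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.5, 0.5, 0.5],  # partial overlap
]

scores = [dot(query, c) for c in candidates]  # raw similarity scores
probs = softmax(scores)                       # same ranking, expressed as probabilities
print(scores, probs)
```

Scoring with a dot product and then normalizing with softmax is the same pairing the reading list attributes to similarity search and attention. Here it runs on plain lists so each step stays visible.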

Abstract Algorithms · © 2026 · Engineering learning lab