Topic
deep learning
20 articles across 6 sub-topics
Sub-topic
11 articles

Types of LLM Quantization: By Timing, Scope, and Mapping
TLDR: There is no single "best" LLM quantization. You classify and choose quantization along three axes: when you quantize (timing), what you quantize (scope), and how values are encoded (mapping). In practice, most teams start with weight quantizati...
Why Embeddings Matter: Solving Key Issues in Data Representation
TLDR: Embeddings convert words (and images, users, products) into dense numerical vectors in a geometric space where semantic similarity = geometric proximity. "King - Man + Woman ≈ Queen" is not magic — it is the arithmetic property of well-trained ...
What are Logits in Machine Learning and Why They Matter
TLDR: Logits are the raw, unnormalized scores produced by the final layer of a neural network — before any probability transformation. Softmax converts them to probabilities. Temperature scales them before Softmax to control output randomness. 📖 T...
Unlocking the Power of ML, DL, and LLM Through Real-World Use Cases
TLDR: ML, Deep Learning, and LLMs are not competing technologies — they are a nested hierarchy. LLMs are a type of Deep Learning. Deep Learning is a subset of ML. Choosing the right layer depends on your data type, problem complexity, and available t...

LoRA Explained: How to Fine-Tune LLMs on a Budget
TLDR: Fine-tuning a 7B-parameter LLM updates billions of weights and requires expensive GPUs. LoRA (Low-Rank Adaptation) freezes the original weights and trains only tiny adapter matrices that are added on top. 90%+ memory reduction; zero inference l...

Diffusion Models: How AI Creates Art from Noise
TLDR: Diffusion models work by first learning to add noise to an image, then learning to undo that noise. At inference time you start from pure static and iteratively denoise into a meaningful image. They power DALL-E, Midjourney, and Stable Diffusio...
Sub-topic
3 articles

Sparse Mixture of Experts: How MoE LLMs Do More With Less Compute
TLDR: Mixture of Experts (MoE) replaces the single dense Feed-Forward Network (FFN) layer in each Transformer block with N independent expert FFNs plus a learned router. Only the top-K experts activate per token — so total parameters far exceed activ...

Dense LLM Architecture: How Every Parameter Works on Every Token
TLDR: In a dense LLM every single parameter is active for every token in every forward pass — no routing, no selection. A transformer block runs multi-head self-attention (Q, K, V) followed by a feed-forward network (FFN) with roughly 4× the hidden d...
How Transformer Architecture Works: A Deep Dive
TLDR: The Transformer is the architecture behind every major LLM (GPT, BERT, Claude, Gemini). Its core innovation is Self-Attention — a mechanism that lets the model weigh relationships between all tokens in a sequence simultaneously, regardless of d...
Sub-topic
2 articles
Dot Product in Machine Learning: The Engine Behind Similarity, Attention, and Neural Networks
TLDR: The dot product multiplies corresponding elements of two vectors and sums the results. In machine learning it does three critical jobs: it scores semantic similarity between embeddings, computes every activation in a fully connected layer, and ...

Attention Mechanism Explained: How Transformers Learn to Focus
TLDR: Attention lets every token in a sequence ask "what else is relevant to me?" — dynamically weighting relationships across all positions simultaneously. It replaced the fixed-size hidden-state bottleneck of RNNs and is the engine behind every GPT...
Sub-topic
2 articles
Fine-Tuning LLMs with LoRA and QLoRA: A Practical Deep-Dive
TLDR: LoRA freezes the base model and trains two tiny matrices per layer — 0.1 % of parameters, 70 % less GPU memory, near-identical quality. QLoRA adds 4-bit NF4 quantization of the frozen base, enabling 70B fine-tuning on 2× A100 80 GB instead of 8...

Transfer Learning Explained: Standing on the Shoulders of Pretrained Models
TLDR: You don't need millions of labeled images or months of GPU time to build a great model. Transfer learning lets you borrow a pretrained network's hard-won feature detectors, plug in a new output head, and fine-tune on your small dataset — often ...
Sub-topic
1 article
Softmax Function Explained: From Raw Scores to Probabilities
TLDR: Softmax converts a vector of raw scores (logits) into a valid probability distribution by exponentiating each value and dividing by the total. Subtracting the max before exponentiating prevents floating-point overflow. Temperature scaling contr...
Sub-topic
1 article
How GPT (LLM) Works: The Next Word Predictor
TLDR: At its core, GPT asks one question, repeated: "Given everything so far, what is the most likely next token?" Tokens are not words — they're subword units. The Transformer architecture uses self-attention to weigh how much each token should infl...
