
Transformers

Learn Transformers as a connected topic across chapters, concepts, simulations, and interview reasoning.

10 Concepts · 9 Articles · 2h 58m

How this topic helps

Deep Learning
Machine Learning
LLM
AI

Learning Path in this Topic

Series that contain articles from Transformers.

Articles

9 matched articles

Article 1 · 25 min
Attention Mechanism Explained: How Transformers Learn to Focus
TLDR: Attention lets every token in a sequence ask "what else is relevant to me?", dynamically weighting relationships across all positions simultaneously. It replaced the fixed-size hidden-state…

Article 2 · 23 min
Softmax Function Explained: From Raw Scores to Probabilities
TLDR: Softmax converts a vector of raw scores (logits) into a valid probability distribution by exponentiating each value and dividing by the total. Subtracting the max before exponentiating prevents…

Article 3 · 22 min
Dot Product in Machine Learning: The Engine Behind Similarity, Attention, and Neural Networks
TLDR: The dot product multiplies corresponding elements of two vectors and sums the results. In machine learning it does three critical jobs: it scores semantic similarity between embeddings, computes…

Article 4 · 27 min
Sparse Mixture of Experts: How MoE LLMs Do More With Less Compute
TLDR: Mixture of Experts (MoE) replaces the single dense Feed-Forward Network (FFN) layer in each Transformer block with N independent expert FFNs plus a learned router. Only the top-K experts…

Article 5 · 24 min
Dense LLM Architecture: How Every Parameter Works on Every Token
TLDR: In a dense LLM every single parameter is active for every token in every forward pass: no routing, no selection. A transformer block runs multi-head self-attention (Q, K, V) followed by a feed-forward…

Article 6 · 15 min
Practical LLM Quantization in Colab: A Hugging Face Walkthrough
TLDR: This is a practical, notebook-style quantization guide for Google Colab and Hugging Face. You will quantize real models, run inference, compare memory/latency, and learn when to use 4-bit NF4 vs…
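The first three articles describe pieces that work together inside a Transformer: the dot product scores how well a query matches each key, softmax turns those scores into weights, and attention uses the weights to blend value vectors. As a rough illustration only, here is a minimal pure-Python sketch of single-head scaled dot-product attention. It assumes plain lists of floats instead of tensors, omits masking, multiple heads, and the learned Q/K/V projections, and applies the standard 1/sqrt(d_k) scaling. The function names stable_softmax, dot, and attention are hypothetical and not taken from the articles.

```python
import math


def stable_softmax(logits):
    """Turn raw scores into probabilities that sum to 1.

    Subtracting the max first keeps every exponent <= 0, so math.exp
    never overflows; the result is mathematically unchanged.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Multiply corresponding elements of two vectors and sum the results."""
    return sum(a * b for a, b in zip(u, v))


def attention(queries, keys, values):
    """Single-head scaled dot-product attention (illustrative sketch).

    Assumptions: inputs are lists of equal-length float lists, keys and
    values have the same number of positions, and there is no masking
    or learned projection.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Score this query against every key position, scaled by sqrt(d_k).
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        # Convert scores into attention weights over positions.
        weights = stable_softmax(scores)
        # Output is the weight-averaged value vector.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        outputs.append(out)
    return outputs
```

Calling stable_softmax([1000.0, 1000.0]) returns [0.5, 0.5], whereas evaluating math.exp(1000.0) directly raises OverflowError in Python; that overflow is what subtracting the max avoids.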
