Transformer

Learn Transformer as a connected topic across chapters, concepts, simulations, and interview reasoning.

10 Concepts · 11 Articles · 3h 30m

How this topic helps

Deep Learning
Machine Learning
Transformers
LLM

Learning Path in this Topic

Series that contain articles from the Transformer topic.

Articles

11 matched articles

Article 1: How Transformer Architecture Works: A Deep Dive (18 min)
TLDR: The Transformer is the architecture behind every major LLM (GPT, BERT, Claude, Gemini). Its core innovation is Self-Attention, a mechanism that lets the model weigh relationships between all to…

Article 2: Attention Mechanism Explained: How Transformers Learn to Focus (25 min)
TLDR: Attention lets every token in a sequence ask "what else is relevant to me?", dynamically weighting relationships across all positions simultaneously. It replaced the fixed-size hidden-state bot…

Article 3: Deep Learning Architectures: CNNs, RNNs, and Transformers (13 min)
TLDR: CNNs, RNNs, and Transformers solve different kinds of pattern problems. CNNs are great for spatial data like images, RNNs handle ordered sequences, and Transformers shine when long-range context…

Article 4: Softmax Function Explained: From Raw Scores to Probabilities (23 min)
TLDR: Softmax converts a vector of raw scores (logits) into a valid probability distribution by exponentiating each value and dividing by the total. Subtracting the max before exponentiating prevents…

Article 5: Dot Product in Machine Learning: The Engine Behind Similarity, Attention, and Neural Networks (22 min)
TLDR: The dot product multiplies corresponding elements of two vectors and sums the results. In machine learning it does three critical jobs: it scores semantic similarity between embeddings, computes…

Article 6: Sparse Mixture of Experts: How MoE LLMs Do More With Less Compute (27 min)
TLDR: Mixture of Experts (MoE) replaces the single dense Feed-Forward Network (FFN) layer in each Transformer block with N independent expert FFNs plus a learned router. Only the top-K experts activat…
