Category
attention mechanism
2 articles across 2 sub-topics
Architecture (1)
How Transformer Architecture Works: A Deep Dive
TLDR: The Transformer is the architecture behind most major LLMs (GPT, BERT, Claude, Gemini). Its core innovation is self-attention, a mechanism that lets the model weigh relationships between all tokens in a sequence simultaneously, regardless of d...
• 17 min read

