Abstract Algorithms: An AI Powered Learning Platform

Topic

Deep Learning

20 articles across 6 sub-topics

Sub-topic

11 articles

Types of LLM Quantization: By Timing, Scope, and Mapping

TLDR: There is no single "best" LLM quantization. You classify and choose quantization along three axes: when you quantize (timing), what you quantize (scope), and how values are encoded (mapping). In practice, most teams start with weight quantizati...

Mar 14, 2026•16 min read

Why Embeddings Matter: Solving Key Issues in Data Representation

TLDR: Embeddings convert words (and images, users, products) into dense numerical vectors in a geometric space where semantic similarity = geometric proximity. "King - Man + Woman ≈ Queen" is not magic — it is the arithmetic property of well-trained ...

Mar 9, 2026•13 min read

What are Logits in Machine Learning and Why They Matter

TLDR: Logits are the raw, unnormalized scores produced by the final layer of a neural network — before any probability transformation. Softmax converts them to probabilities. Temperature scales them before Softmax to control output randomness. 📖 T...

Mar 9, 2026•11 min read

Unlocking the Power of ML, DL, and LLM Through Real-World Use Cases

TLDR: ML, Deep Learning, and LLMs are not competing technologies — they are a nested hierarchy. LLMs are a type of Deep Learning. Deep Learning is a subset of ML. Choosing the right layer depends on your data type, problem complexity, and available t...

Mar 9, 2026•14 min read

LoRA Explained: How to Fine-Tune LLMs on a Budget

TLDR: Fine-tuning a 7B-parameter LLM updates billions of weights and requires expensive GPUs. LoRA (Low-Rank Adaptation) freezes the original weights and trains only tiny adapter matrices that are added on top. 90%+ memory reduction; zero inference l...

Mar 9, 2026•13 min read

Diffusion Models: How AI Creates Art from Noise

TLDR: Diffusion models work by gradually adding noise to training images, then learning to undo that noise. At inference time you start from pure static and iteratively denoise into a meaningful image. They power DALL-E, Midjourney, and Stable Diffusio...

Mar 9, 2026•11 min read

Sub-topic

Architecture

3 articles

Sparse Mixture of Experts: How MoE LLMs Do More With Less Compute

TLDR: Mixture of Experts (MoE) replaces the single dense Feed-Forward Network (FFN) layer in each Transformer block with N independent expert FFNs plus a learned router. Only the top-K experts activate per token — so total parameters far exceed activ...

Apr 17, 2026•26 min read
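The top-K routing idea in the teaser above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the article's implementation: the names `top_k_route` and `moe_forward` are hypothetical, experts are modeled as simple callables, and renormalizing the gate weights with a softmax over just the selected experts is one common convention assumed here.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Assumption: gate weights are renormalized over the selected experts only.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    m = max(router_logits[i] for i in top)  # subtract max for stability
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_forward(x, experts, router_logits, k=2):
    """Combine the outputs of the selected experts, weighted by their gates.

    Only k experts are evaluated; the rest are skipped entirely, which is
    why active compute is much smaller than the total parameter count.
    """
    gates = top_k_route(router_logits, k)
    out = [0.0] * len(x)
    for i, gate in gates.items():
        y = experts[i](x)  # hypothetical expert: list of floats in, list out
        out = [o + gate * v for o, v in zip(out, y)]
    return out
```

In a real model the router logits would come from a learned linear layer applied to the token's hidden state; here they are passed in directly to keep the sketch self-contained.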

Dense LLM Architecture: How Every Parameter Works on Every Token

TLDR: In a dense LLM every single parameter is active for every token in every forward pass — no routing, no selection. A transformer block runs multi-head self-attention (Q, K, V) followed by a feed-forward network (FFN) with roughly 4× the hidden d...

Apr 17, 2026•22 min read

How Transformer Architecture Works: A Deep Dive

TLDR: The Transformer is the architecture behind every major LLM (GPT, BERT, Claude, Gemini). Its core innovation is Self-Attention — a mechanism that lets the model weigh relationships between all tokens in a sequence simultaneously, regardless of d...

Mar 9, 2026•17 min read

Sub-topic

Attention Mechanism

2 articles

Dot Product in Machine Learning: The Engine Behind Similarity, Attention, and Neural Networks

TLDR: The dot product multiplies corresponding elements of two vectors and sums the results. In machine learning it does three critical jobs: it scores semantic similarity between embeddings, computes every activation in a fully connected layer, and ...

May 3, 2026•21 min read
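As a rough illustration of the first two jobs named in the teaser above, the sketch below computes a dot product and then reuses it for a fully connected layer. The function names `dot` and `dense_layer` are hypothetical, and the layer returns pre-activation values only (no nonlinearity is applied).

```python
def dot(a, b):
    """Multiply corresponding elements of two vectors and sum the results."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def dense_layer(weights, bias, x):
    """Pre-activation output of a fully connected layer.

    Each output unit is one dot product between a weight row and the
    input vector, plus that unit's bias.
    """
    return [dot(row, x) + b for row, b in zip(weights, bias)]

# Similarity score between two small example embeddings.
score = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A larger dot product between two embeddings indicates that they point in more similar directions, which is the basis for using it as a similarity score.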

Attention Mechanism Explained: How Transformers Learn to Focus

TLDR: Attention lets every token in a sequence ask "what else is relevant to me?" — dynamically weighting relationships across all positions simultaneously. It replaced the fixed-size hidden-state bottleneck of RNNs and is the engine behind every GPT...

Apr 18, 2026•24 min read

Sub-topic

Fine Tuning

2 articles

Fine-Tuning LLMs with LoRA and QLoRA: A Practical Deep-Dive

TLDR: LoRA freezes the base model and trains two tiny matrices per layer: 0.1% of parameters, 70% less GPU memory, near-identical quality. QLoRA adds 4-bit NF4 quantization of the frozen base, enabling 70B fine-tuning on 2× A100 80 GB instead of 8...

Apr 19, 2026•29 min read

Transfer Learning Explained: Standing on the Shoulders of Pretrained Models

TLDR: You don't need millions of labeled images or months of GPU time to build a great model. Transfer learning lets you borrow a pretrained network's hard-won feature detectors, plug in a new output head, and fine-tune on your small dataset — often ...

Apr 18, 2026•26 min read

Sub-topic

Machine Learning

1 article

Softmax Function Explained: From Raw Scores to Probabilities

TLDR: Softmax converts a vector of raw scores (logits) into a valid probability distribution by exponentiating each value and dividing by the total. Subtracting the max before exponentiating prevents floating-point overflow. Temperature scaling contr...

May 3, 2026•21 min read
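The steps named in the teaser above (exponentiate, divide by the total, subtract the max first) can be sketched as follows. This is a minimal sketch, not the article's code: the function name `softmax` and the `temperature` parameter are illustrative, and dividing the logits by the temperature before exponentiating is the usual convention assumed here.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of raw scores."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
large = softmax([1000.0, 1000.0])  # would overflow without the max shift
flat = softmax([1.0, 2.0, 3.0], temperature=2.0)
```

Subtracting the max does not change the result, because the common factor cancels between numerator and denominator. Under the assumed convention, a temperature above 1 flattens the distribution and a temperature below 1 sharpens it.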

Sub-topic

Generative AI

1 article

How GPT (LLM) Works: The Next Word Predictor

TLDR: At its core, GPT asks one question, repeated: "Given everything so far, what is the most likely next token?" Tokens are not words — they're subword units. The Transformer architecture uses self-attention to weigh how much each token should infl...

Mar 9, 2026•14 min read