Supervised Learning Algorithms: A Deep Dive into Regression and Classification
TL;DR
Supervised Learning is the workhorse of modern AI. In this guide, we peel back the layers of its most popular algorithms.

Introduction: The "Teacher" Paradigm
Supervised learning = teaching the computer with a teacher. You give it labeled data (inputs + correct answers) and say: "Learn to predict the correct answer for new similar inputs."
It's like showing a child 100 photos: "This is a cat (label), this is a dog (label)." After seeing many, the child can look at a new photo and say "dog" correctly.
During training, the computer learns by adjusting its internal numbers (parameters) to minimize mistakes on the labeled examples.
Key difference from unsupervised:
- Unsupervised: No labels, no right/wrong — computer finds patterns alone (e.g., groups photos without knowing "cat" or "dog").
- Supervised: Always labels — computer gets feedback like "wrong, that's a cat not a dog" and improves.
The Two Main Families in Supervised Learning
Supervised algorithms split into:
- Regression: Predict a continuous number (e.g., house price, temperature).
- Classification: Predict a category/class (e.g., spam/not spam, cat/dog/bird).
Both use the same idea: Train on labeled data → minimize prediction error → use on new data.
| Aspect | Regression | Classification |
| --- | --- | --- |
| Output type | Number (continuous, like 85.5) | Category (discrete, like "positive" or "negative") |
| Common goal | Predict how much / how many | Predict which one / yes-no |
| Error measure | Difference in numbers (e.g., RMSE: sqrt(average (predicted - actual)^2)) | Wrong category count (e.g., accuracy: correct predictions / total) |
| Analogy | Guess the height of a person from their weight | Guess if a fruit is apple/orange/banana from color/shape |
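To make the two error measures in the table concrete, here's a tiny NumPy sketch (the numbers are made up purely for illustration):

```python
import numpy as np

# Regression error: RMSE = sqrt(average (predicted - actual)^2)
actual = np.array([50.0, 75.0, 60.0])
predicted = np.array([52.0, 70.0, 63.0])
rmse = np.sqrt(np.mean((predicted - actual) ** 2))
print(f"RMSE: {rmse:.2f}")          # ~3.56

# Classification error: accuracy = correct predictions / total
labels = np.array([0, 1, 0, 1, 0])
preds = np.array([0, 1, 0, 0, 0])
accuracy = np.mean(preds == labels)
print(f"Accuracy: {accuracy:.2f}")  # 0.80
```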
How Supervised Learning Works (Training Phase)
- Give labeled data: Inputs (features) + correct outputs (labels/targets).
- The computer makes a guess using current parameters.
- Calculate error: How wrong was the guess? (using a loss function).
- Adjust parameters: Use math (gradient descent) to tweak numbers so error drops next time.
- Repeat: Over many examples/iterations until error is low.
- After training: Model is "learned" — parameters are fixed. Use for new data (inference).
Learning = minimizing the loss (error score) on labeled data.
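Here's what that loop looks like as a minimal sketch: one weight, one bias, and hand-rolled gradient descent on MSE. The data is made up, and real libraries automate the gradient math, but the steps are exactly the ones listed above:

```python
# Made-up toy data that follows y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0            # start with arbitrary parameters
lr = 0.01                  # learning rate (how big each tweak is)

for step in range(2000):   # repeat over many iterations
    # Guess with current parameters, then compute the MSE gradients
    grad_w = sum(2 * ((w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * ((w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    # Adjust parameters so the error drops next time
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned: w = {w:.2f}, b = {b:.2f}")  # converges near w = 2, b = 1
```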
Deep Dive into Regression Algorithms
Regression predicts numbers. Example scenario: Predict house prices in Bengaluru based on size (sq ft), bedrooms, location score (0-10).
Toy data (5 houses):
| House | Size (sq ft) | Bedrooms | Location Score | Price (₹ lakhs) — Label |
| --- | --- | --- | --- | --- |
| 1 | 1000 | 2 | 7 | 50 |
| 2 | 1500 | 3 | 8 | 75 |
| 3 | 1200 | 2 | 6 | 60 |
| 4 | 2000 | 4 | 9 | 100 |
| 5 | 1800 | 3 | 7 | 85 |
Popular Algorithm: Linear Regression
- Idea: Fit a straight line (or plane in higher dimensions) that best predicts price from features.
- What it learns: Coefficients (weights) for each feature + intercept (bias).
- Math: Prediction $= w_1 \times \text{size} + w_2 \times \text{bedrooms} + w_3 \times \text{location} + b$
- Loss: Mean Squared Error (MSE): average (predicted - actual)²
- Learning: Adjust $w_1, w_2, w_3, b$ to minimize MSE.
After training on the toy data above, typical learned values might be:
- $w_1 \approx 0.04$ (lakhs per sq ft)
- $w_2 \approx 2$ (lakhs per bedroom)
- $w_3 \approx 1$ (lakh per location point)
- $b \approx 2$ (base price in lakhs)
For a new house (1400 sq ft, 3 bedrooms, location score 8), these weights predict $\approx$ ₹70–75 lakhs, as the sketch below shows.
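Here's how the same model looks with scikit-learn (assuming it's installed). One caveat: with only five examples the model can fit the data exactly, so the coefficients it recovers won't match the intuitive values above and shouldn't be over-interpreted:

```python
from sklearn.linear_model import LinearRegression

# Features: [size_sqft, bedrooms, location_score]; labels: price in ₹ lakhs
X = [[1000, 2, 7], [1500, 3, 8], [1200, 2, 6], [2000, 4, 9], [1800, 3, 7]]
y = [50, 75, 60, 100, 85]

model = LinearRegression().fit(X, y)   # training: minimize MSE
print("weights:", model.coef_, "bias:", model.intercept_)

new_house = [[1400, 3, 8]]             # inference on an unseen input
print("predicted price (lakhs):", model.predict(new_house)[0])  # ~75
```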
Other Regression Algorithms:
- Decision Tree Regression: Learns if-then rules (e.g., "If size > 1500 and bedrooms ≥ 3, price > 70"). What it learns: Tree splits. Good for non-linear relationships.
- Random Forest Regression: Many decision trees averaged together. Learns: Ensemble of trees. More accurate, less overfitting.
- Gradient Boosting (XGBoost, LightGBM, CatBoost): Builds trees sequentially, each correcting previous errors. Very powerful in practice.
- Neural Networks (Deep Learning Regression): Layers of weights learn complex patterns (e.g., "big size + good location = extra premium").
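As a quick sketch of how two of these alternatives look in code on the same toy data (the class names are real scikit-learn APIs; the hyperparameters are arbitrary picks for illustration):

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

X = [[1000, 2, 7], [1500, 3, 8], [1200, 2, 6], [2000, 4, 9], [1800, 3, 7]]
y = [50, 75, 60, 100, 85]

# A single tree learns if-then splits on the features
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)

# A forest averages many randomized trees to reduce overfitting
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

new_house = [[1400, 3, 8]]
print("tree:", tree.predict(new_house)[0])
print("forest:", forest.predict(new_house)[0])
```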
Deep Dive into Classification Algorithms
Classification predicts categories. Example: Email spam detection. Features: word count, has links (yes/no), sender score (0-10). Label: "spam" (1) or "not spam" (0).
Toy data (5 emails):
| Email | Word Count | Has Links (0/1) | Sender Score (0-10) | Spam (Label) |
| --- | --- | --- | --- | --- |
| 1 | 50 | 0 | 8 | 0 |
| 2 | 200 | 1 | 2 | 1 |
| 3 | 100 | 0 | 7 | 0 |
| 4 | 150 | 1 | 3 | 1 |
| 5 | 80 | 0 | 9 | 0 |
Popular Algorithm: Logistic Regression
- Idea: Predict probability (0–1) of "spam" using a sigmoid (S-shaped) curve.
- What it learns: Weights that turn features into a probability.
- Math: Logit = $w_1 \times \text{words} + w_2 \times \text{links} + w_3 \times \text{sender} + b$
- Probability: $1 / (1 + \exp(-\text{logit}))$
- Loss: Binary Cross-Entropy: measures how far predicted probabilities are from actual 0/1 labels
- Learning: Adjust weights to minimize loss (make probs close to actual labels).
After training, typical learned values might be:
- $w_1 \approx 0.01$ (slight positive for long emails)
- $w_2 \approx 2$ (strong push toward "spam" when links are present)
- $w_3 \approx -0.5$ (good sender scores pull toward "not spam")
- $b \approx -1$
For a new email (120 words, has link, sender score 4), it predicts probability $\approx$ 0.55 → classify as spam (with a 0.5 threshold), as the sketch below walks through.
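To watch the numbers flow end to end, here's a minimal sketch that plugs the illustrative weights above into the sigmoid by hand (pure Python; these weights are made up for this section, not actually trained):

```python
import math

def sigmoid(logit):
    return 1 / (1 + math.exp(-logit))

# Illustrative weights from above (not trained values)
w1, w2, w3, b = 0.01, 2.0, -0.5, -1.0

# New email: 120 words, has a link, sender score 4
words, links, sender = 120, 1, 4
logit = w1 * words + w2 * links + w3 * sender + b  # = 0.2
prob = sigmoid(logit)                              # ~0.55

print(f"P(spam) = {prob:.2f} ->", "spam" if prob >= 0.5 else "not spam")
```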
Other Classification Algorithms:
- Decision Tree Classifier: Learns rules (e.g., "If has_links = 1 and sender_score < 5 → spam"). What it learns: Tree structure.
- Random Forest Classifier: Many trees voting together. Learns: Ensemble of trees. Robust and accurate.
- Support Vector Machine (SVM): Finds the best boundary (hyperplane) that separates classes with maximum margin. Learns: Support vectors (key points near the boundary).
- Neural Networks (Deep Learning Classification): Layers learn complex decision boundaries (e.g., CNN for images, transformers for text).
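And here's a sketch of three of these classifiers side by side on the toy email data (real scikit-learn classes; with a dataset this tiny the outputs are illustrative, not meaningful):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# [word_count, has_links, sender_score] -> spam (1) / not spam (0)
X = [[50, 0, 8], [200, 1, 2], [100, 0, 7], [150, 1, 3], [80, 0, 9]]
y = [0, 1, 0, 1, 0]

for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(n_estimators=50, random_state=0),
              SVC(kernel="linear")):
    model.fit(X, y)
    pred = model.predict([[120, 1, 4]])[0]
    print(type(model).__name__, "->", "spam" if pred == 1 else "not spam")
```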
Summary — What It Learns, How to Define Learning, How to Use It
- What it learns: Parameters (weights, biases, tree splits, support vectors, etc.) that map inputs to outputs with low error.
- How to define learning: Successfully minimized the loss function on labeled data (e.g., low MSE for regression, high accuracy / low cross-entropy for classification).
- How to use it: Feed new inputs → get prediction (price / class / probability) instantly.
- Example applications: house price estimator apps, spam filters, medical diagnosis (tumor yes/no), credit approval (approve/decline), customer churn prediction (will leave / will stay).
Supervised learning powers most production ML systems today because clear labels give strong, reliable feedback — though collecting good labels can be expensive and time-consuming.
Practice Quiz: Test Your Knowledge!
Scenario: You want to predict the exact temperature (in degrees) for tomorrow. Is this Regression or Classification?
- A) Regression
- B) Classification
Scenario: You want to predict if it will rain tomorrow (Yes/No). Is this Regression or Classification?
- A) Regression
- B) Classification
Scenario: In Linear Regression, what does the "Slope" ($m$) represent?
- A) The starting value when input is 0.
- B) How much the target changes when the input increases by 1.
- C) The error rate of the model.
(Answers: 1-A, 2-B, 3-B)
What's Next?
We've mastered the teacher-led world of Supervised Learning. In the next post, we'll venture into the wild west of Unsupervised Learning, where the computer has to figure things out all on its own!
Did you find this deep dive helpful? Subscribe to the series to keep learning!

