
Supervised Learning Algorithms: A Deep Dive into Regression and Classification

Abstract Algorithms · 6 min read

TL;DR

Supervised Learning is the workhorse of modern AI. In this guide, we peel back the layers of its most popular algorithms.


Introduction: The "Teacher" Paradigm

Supervised learning = teaching the computer with a teacher. You give it labeled data (inputs + correct answers) and say: "Learn to predict the correct answer for new similar inputs."

It's like showing a child 100 photos: "This is a cat (label), this is a dog (label)." After seeing many, the child can look at a new photo and say "dog" correctly.

During training, the computer learns by adjusting its internal numbers (parameters) to minimize mistakes on the labeled examples.

Key difference from unsupervised:

  • Unsupervised: No labels, no right/wrong — computer finds patterns alone (e.g., groups photos without knowing "cat" or "dog").
  • Supervised: Always labels — computer gets feedback like "wrong, that's a cat not a dog" and improves.

The Two Main Families in Supervised Learning

Supervised algorithms split into:

  1. Regression: Predict a continuous number (e.g., house price, temperature).
  2. Classification: Predict a category/class (e.g., spam/not spam, cat/dog/bird).

Both use the same idea: Train on labeled data → minimize prediction error → use on new data.

| Aspect | Regression | Classification |
| --- | --- | --- |
| Output type | Number (continuous, like 85.5) | Category (discrete, like "positive" or "negative") |
| Common goal | Predict how much / how many | Predict which one / yes-no |
| Error measure | Difference in numbers (e.g., RMSE: sqrt(average (predicted − actual)²)) | Wrong-category count (e.g., accuracy: correct predictions / total) |
| Analogy | Guess the height of a person from their weight | Guess if a fruit is apple/orange/banana from color/shape |
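
To make those two error measures concrete, here is a minimal sketch (the predictions and labels are made-up numbers, not the output of a trained model):

```python
import numpy as np

# --- Regression error: RMSE (root mean squared error) ---
actual = np.array([50.0, 75.0, 60.0])      # true values
predicted = np.array([52.0, 70.0, 63.0])   # model guesses (hypothetical)
rmse = np.sqrt(np.mean((predicted - actual) ** 2))
print(f"RMSE: {rmse:.2f}")                 # average size of the numeric miss

# --- Classification error: accuracy ---
labels = np.array([1, 0, 1, 1, 0])         # true classes (1 = spam)
guesses = np.array([1, 0, 0, 1, 0])        # model guesses (hypothetical)
accuracy = np.mean(guesses == labels)
print(f"Accuracy: {accuracy:.0%}")         # fraction guessed correctly
```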

How Supervised Learning Works (Training Phase)

  1. Give labeled data: Inputs (features) + correct outputs (labels/targets).
  2. The computer makes a guess using current parameters.
  3. Calculate error: How wrong was the guess? (using a loss function).
  4. Adjust parameters: Use math (gradient descent) to tweak numbers so error drops next time.
  5. Repeat: Over many examples/iterations until error is low.
  6. After training: Model is "learned" — parameters are fixed. Use for new data (inference).

Learning = minimizing the loss (error score) on labeled data.
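
As a concrete sketch of steps 1–5, here is gradient descent fitting a one-feature linear model. The data, learning rate, and iteration count are made up for illustration:

```python
import numpy as np

# Step 1: labeled data (feature x -> target y, roughly y = 2x + 1)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 9.0])

w, b = 0.0, 0.0           # parameters start at arbitrary values
lr = 0.01                 # learning rate (step size for each adjustment)

for step in range(1000):                    # Step 5: repeat many iterations
    y_hat = w * x + b                       # Step 2: guess with current parameters
    loss = np.mean((y_hat - y) ** 2)        # Step 3: measure error with MSE loss
    grad_w = np.mean(2 * (y_hat - y) * x)   # Step 4: gradients say which way...
    grad_b = np.mean(2 * (y_hat - y))       # ...to nudge each parameter
    w -= lr * grad_w
    b -= lr * grad_b

print(f"learned w={w:.2f}, b={b:.2f}, final MSE={loss:.4f}")
# Step 6: after training w and b are fixed; inference is just w * new_x + b
```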


Deep Dive into Regression Algorithms

Regression predicts numbers. Example scenario: Predict house prices in Bengaluru based on size (sq ft), bedrooms, location score (0-10).

Toy data (5 houses):

| House | Size (sq ft) | Bedrooms | Location Score | Price (₹ lakhs) — Label |
| --- | --- | --- | --- | --- |
| 1 | 1000 | 2 | 7 | 50 |
| 2 | 1500 | 3 | 8 | 75 |
| 3 | 1200 | 2 | 6 | 60 |
| 4 | 2000 | 4 | 9 | 100 |
| 5 | 1800 | 3 | 7 | 85 |

Linear Regression

  • Idea: Fit a straight line (or plane in higher dimensions) that best predicts price from features.
  • What it learns: Coefficients (weights) for each feature + intercept (bias).
  • Math: Prediction = $w_1 \times \text{size} + w_2 \times \text{bedrooms} + w_3 \times \text{location} + b$
    • Loss: Mean Squared Error (MSE): average (predicted - actual)²
    • Learning: Adjust $w_1, w_2, w_3, b$ to minimize MSE.

After training, typical learned values might be:

  • $w_1 \approx 0.04$ (lakhs per sq ft)
  • $w_2 \approx 2$ (lakhs per bedroom)
  • $w_3 \approx 0.5$ (lakhs per location point)
  • $b \approx 2$

For a new house (1400 sq ft, 3 bedrooms, location score 8), it predicts $0.04 \times 1400 + 2 \times 3 + 0.5 \times 8 + 2 \approx 68$, i.e., about ₹68 lakhs.
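
The same fit can be reproduced with scikit-learn on the five toy houses above. With only five examples the fitted coefficients will differ from the illustrative values, so treat this as a sketch of the interface rather than a definitive model:

```python
from sklearn.linear_model import LinearRegression

# Features: [size_sqft, bedrooms, location_score]; target: price in ₹ lakhs
X = [[1000, 2, 7], [1500, 3, 8], [1200, 2, 6], [2000, 4, 9], [1800, 3, 7]]
y = [50, 75, 60, 100, 85]

model = LinearRegression().fit(X, y)   # learns w1, w2, w3 and the intercept b
print("weights:", model.coef_)
print("intercept:", model.intercept_)

# Inference on a new house: 1400 sq ft, 3 bedrooms, location score 8
print("predicted price (lakhs):", model.predict([[1400, 3, 8]]))
```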

Other Regression Algorithms:

  • Decision Tree Regression: Learns if-then rules (e.g., "If size > 1500 and bedrooms ≥ 3, price > 70"). What it learns: Tree splits. Good for non-linear relationships.
  • Random Forest Regression: Many decision trees averaged together. Learns: Ensemble of trees. More accurate, less overfitting.
  • Gradient Boosting (XGBoost, LightGBM, CatBoost): Builds trees sequentially, each correcting previous errors. Very powerful in practice.
  • Neural Networks (Deep Learning Regression): Layers of weights learn complex patterns (e.g., "big size + good location = extra premium").
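
These tree-based models share the same fit/predict interface, so swapping them in is one line each. A sketch on the same five toy houses (far too little data for ensembles to shine, so the outputs are illustrative only):

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

X = [[1000, 2, 7], [1500, 3, 8], [1200, 2, 6], [2000, 4, 9], [1800, 3, 7]]
y = [50, 75, 60, 100, 85]

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)         # learns if-then splits
forest = RandomForestRegressor(n_estimators=100).fit(X, y)  # averages 100 trees

new_house = [[1400, 3, 8]]
print("tree prediction:", tree.predict(new_house))
print("forest prediction:", forest.predict(new_house))
```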

Deep Dive into Classification Algorithms

Classification predicts categories. Example: Email spam detection. Features: word count, has links (yes/no), sender score (0-10). Label: "spam" (1) or "not spam" (0).

Toy data (5 emails):

| Email | Word Count | Has Links (0/1) | Sender Score (0-10) | Spam (Label) |
| --- | --- | --- | --- | --- |
| 1 | 50 | 0 | 8 | 0 |
| 2 | 200 | 1 | 2 | 1 |
| 3 | 100 | 0 | 7 | 0 |
| 4 | 150 | 1 | 3 | 1 |
| 5 | 80 | 0 | 9 | 0 |

Logistic Regression

  • Idea: Predict probability (0–1) of "spam" using a sigmoid (S-shaped) curve.
  • What it learns: Weights that turn features into a probability.
  • Math: Logit = $w_1 \times \text{words} + w_2 \times \text{links} + w_3 \times \text{sender} + b$
    • Probability: $1 / (1 + \exp(-\text{logit}))$
    • Loss: Binary Cross-Entropy: measures how far predicted probabilities are from actual 0/1 labels
    • Learning: Adjust weights to minimize loss (make probs close to actual labels).

After training, typical learned values might be:

  • $w_1 \approx 0.01$ (longer emails slightly raise the spam probability)
  • $w_2 \approx 2$ (having links pushes strongly toward spam)
  • $w_3 \approx -0.5$ (a trusted sender lowers the spam probability)
  • $b \approx -1$

For a new email (120 words, has a link, sender score 4), the logit is $0.01 \times 120 + 2 \times 1 - 0.5 \times 4 - 1 = 0.2$, so it predicts probability $\approx 0.55$ → classify as spam (with a 0.5 threshold).
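
Working that prediction out by hand with the illustrative weights above (hypothetical values, not the result of a real training run):

```python
import math

# Illustrative weights from above (hypothetical, not from a real training run)
w1, w2, w3, b = 0.01, 2.0, -0.5, -1.0

# New email: 120 words, has a link, sender score 4
logit = w1 * 120 + w2 * 1 + w3 * 4 + b   # = 1.2 + 2.0 - 2.0 - 1.0 = 0.2
prob_spam = 1 / (1 + math.exp(-logit))   # sigmoid squashes the logit into (0, 1)
print(f"P(spam) = {prob_spam:.2f}")      # ≈ 0.55
print("spam" if prob_spam >= 0.5 else "not spam")
```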

Other Classification Algorithms:

  • Decision Tree Classifier: Learns rules (e.g., "If has_links = 1 and sender_score < 5 → spam"). What it learns: Tree structure.
  • Random Forest Classifier: Many trees voting together. Learns: Ensemble of trees. Robust and accurate.
  • Support Vector Machine (SVM): Finds the best boundary (hyperplane) that separates classes with maximum margin. Learns: Support vectors (key points near the boundary).
  • Neural Networks (Deep Learning Classification): Layers learn complex decision boundaries (e.g., CNN for images, transformers for text).
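
As with regression, these classifiers sit behind the same fit/predict interface. A sketch on the five toy emails (far too few examples for trustworthy results; the outputs are illustrative only):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Features: [word_count, has_links, sender_score]; label: 1 = spam
X = [[50, 0, 8], [200, 1, 2], [100, 0, 7], [150, 1, 3], [80, 0, 9]]
y = [0, 1, 0, 1, 0]

logreg = LogisticRegression().fit(X, y)
svm = SVC(kernel="linear").fit(X, y)      # maximum-margin linear boundary

new_email = [[120, 1, 4]]
print("P(not spam), P(spam):", logreg.predict_proba(new_email))
print("SVM class:", svm.predict(new_email))
```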

Summary — What It Learns, How to Define Learning, How to Use It

  • What it learns: Parameters (weights, biases, tree splits, support vectors, etc.) that map inputs to outputs with low error.
  • How to define learning: the model has successfully minimized the loss function on the labeled data (e.g., low MSE for regression, low cross-entropy / high accuracy for classification).
  • How to use it: Feed new inputs → get prediction (price / class / probability) instantly.
    • Example applications: house price estimator apps, spam filters, medical diagnosis (tumor yes/no), credit approval (approve/decline), customer churn prediction (will leave / will stay).

Supervised learning powers most production ML systems today because clear labels give strong, reliable feedback — though collecting good labels can be expensive and time-consuming.


Practice Quiz: Test Your Knowledge!

  1. Scenario: You want to predict the exact temperature (in degrees) for tomorrow. Is this Regression or Classification?

    • A) Regression
    • B) Classification
  2. Scenario: You want to predict if it will rain tomorrow (Yes/No). Is this Regression or Classification?

    • A) Regression
    • B) Classification
  3. Scenario: In Linear Regression, what does the "Slope" ($m$) represent?

    • A) The starting value when input is 0.
    • B) How much the target changes when the input increases by 1.
    • C) The error rate of the model.

(Answers: 1-A, 2-B, 3-B)


What's Next?

We've mastered the teacher-led world of Supervised Learning. In the next post, we'll venture into the wild west of Unsupervised Learning, where the computer has to figure things out all on its own!

Did you find this deep dive helpful? Subscribe to the series to keep learning!

Written by Abstract Algorithms (@abstractalgorithms)