Multi-class vs Multi-label vs Multi-output Classification
✅ Key Distinction:
- Multi-class → 1 categorical output.
- Multi-label → multiple binary outputs.
- Multi-output → multiple outputs, each possibly multi-class.
Classification
│
├── Single-output
│   ├── Binary classification
│   └── Multi-class classification
│
└── Multi-output
    ├── Multi-label (all outputs are binary)
    └── General multi-output (outputs can be multi-class)
Classification Types (with MNIST examples)
Type | Definition | Output Format | MNIST Example | Other Examples |
---|---|---|---|---|
Binary Classification | Each instance belongs to exactly one of two classes. | Single binary label (0/1). | Predict “Even vs Odd” digit. | Spam detection (spam / not spam). |
Multi-class Classification | Each instance belongs to exactly one class among 3 or more. | Single categorical label (e.g., one-hot encoded). | Predict digit ∈ {0–9}. | Fruit classification (apple, banana, orange). |
Multi-label Classification | Each instance can belong to multiple binary classes at once. | Vector of binary labels. | Predict [Odd?, ≥7?] → e.g., “9” → [True, True]. | Face recognition with multiple people per photo. |
Multi-output (Binary variant = Multi-label) | Predicts multiple targets, each of which is binary. (This is exactly multi-label.) | Multiple binary outputs. | Same as multi-label: [Odd?, ≥7?]. | Medical tests (Positive/Negative for multiple conditions). |
Multi-output (General) | Predicts multiple targets, each target may be binary or multi-class. | Multiple categorical outputs. | Predict: Digit ∈ {0–9}, Stroke ∈ {Thin/Medium/Thick}, Angle ∈ {Left/Upright/Right}. | Medical diagnosis: Disease type ∈ {flu, cold, pneumonia}, Severity ∈ {mild, moderate, severe}. |
1. Multi-class classification
- Each instance (MNIST image) belongs to exactly one among k classes (digits 0–9 → 10 classes). Output is one label from a finite set.
- The output is a probability distribution across all classes (the probabilities sum to 1).
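The "sums to 1" property comes from the softmax function that is typically applied to a multi-class model's raw scores. Here is a minimal sketch in plain Python; the logits are made-up numbers for illustration, not the output of any trained MNIST model.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the ten digit classes (index = digit).
logits = [0.1, -1.2, 0.3, 0.0, 2.5, -0.7, 0.4, 1.1, -0.3, 0.2]
probs = softmax(logits)

# The predicted class is the single most probable digit.
predicted_digit = max(range(len(probs)), key=probs.__getitem__)
```

Because the probabilities compete for a fixed total of 1, raising the score of one digit necessarily lowers the probability of every other digit.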
2. Multi-label Classification
- Each instance can belong to multiple classes at the same time. Output is typically a vector of independent binary indicators.
- The MNIST example (“≥7?” and “Odd?”) fits here because each image maps to two separate yes/no outputs.
- The output is a set of independent probabilities, one per label (each between 0 and 1); they do not need to sum to 1.
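As a rough illustration of the multi-label setup, the sketch below builds the [Odd?, ≥7?] target vector for a digit and turns two raw scores into independent probabilities with a sigmoid. The helper names (`multilabel_targets`, `predict_labels`), the 0.5 threshold, and the logits are assumptions made for this example.

```python
import math

def multilabel_targets(digit):
    """Return the two binary labels [is_odd, is_at_least_7] for a digit 0-9."""
    return [digit % 2 == 1, digit >= 7]

def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(logits, threshold=0.5):
    """Apply a sigmoid to each score separately, then threshold each one."""
    probs = [sigmoid(z) for z in logits]
    return probs, [p >= threshold for p in probs]

targets = multilabel_targets(9)               # the digit 9 is odd and >= 7
probs, labels = predict_labels([2.0, 1.5])    # hypothetical model scores
```

Each label gets its own probability, so both can be high at once; unlike a softmax output, the values are not forced to share a total of 1.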
3. Multi-output Classification
This is the subtle one. Multi-output (a.k.a. multi-target) classification generalizes multi-label:
- Multi-label assumes each target is binary (yes/no).
- Multi-output allows each target to itself be multi-class. Think of it like predicting several categorical variables simultaneously. [2]
Examples:
- Digit task (MNIST): Instead of only predicting the digit, suppose you also want to predict its stroke thickness (thin/medium/thick) and rotation angle category (left, upright, right). That’s three outputs: digit ∈ {0–9}, thickness ∈ {3 classes}, rotation ∈ {3 classes}. Each output is a multi-class prediction.
- Medical diagnosis: Predict both the disease category (e.g., flu, cold, pneumonia) and severity level (mild, moderate, severe).
- Image denoising: Each pixel’s value is a multi-class output (0–255). The system produces thousands of such outputs at once. That’s multi-output classification, because every pixel prediction is a categorical label with more than two possible values.
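One way to picture the MNIST example above is as three separate classification "heads", each choosing one class from its own label set. The sketch below assumes the model has already produced one list of scores per target; the `HEADS` mapping, the `decode` helper, and the scores are hypothetical.

```python
# Hypothetical class names for each of the three targets.
HEADS = {
    "digit": [str(d) for d in range(10)],
    "thickness": ["thin", "medium", "thick"],
    "rotation": ["left", "upright", "right"],
}

def decode(head_logits):
    """Pick the highest-scoring class independently for each target."""
    prediction = {}
    for name, classes in HEADS.items():
        scores = head_logits[name]
        if len(scores) != len(classes):
            raise ValueError(f"{name}: expected {len(classes)} scores")
        best = max(range(len(scores)), key=scores.__getitem__)
        prediction[name] = classes[best]
    return prediction

# Made-up scores: the digit head favours "7", and so on.
digit_scores = [0.0] * 10
digit_scores[7] = 3.0
example = decode({
    "digit": digit_scores,
    "thickness": [0.2, 0.1, 1.4],
    "rotation": [0.3, 2.0, -1.0],
})
```

Each head behaves like its own multi-class problem; if every head had only two classes, this would reduce to the multi-label case.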
Appendix
Python Code
from graphviz import Digraph
# Create a hierarchy diagram for classification types
dot = Digraph(comment="Classification Hierarchy")
# Root node
dot.node("C", "Classification")
# Single-output branch
dot.node("SO", "Single-output")
dot.edge("C", "SO")
dot.node("Bin", "Binary Classification\n(Yes/No)")
dot.node("MC", "Multi-class Classification\n(One out of many classes)")
dot.edges([("SO", "Bin"), ("SO", "MC")])
# Multi-output branch
dot.node("MO", "Multi-output")
dot.edge("C", "MO")
dot.node("ML", "Multi-label\n(Special case: all outputs binary)")
dot.node("GenMO", "General Multi-output\n(Outputs can be multi-class)")
dot.edges([("MO", "ML"), ("MO", "GenMO")])
# Render the diagram to /content/classification_hierarchy.png
# (a Colab-style path; adjust it for your environment)
dot.render("/content/classification_hierarchy", format="png", cleanup=True)
Multi-class vs. Multi-label Classification
Aspect | Multi-Class | Multi-Label |
---|---|---|
Output activation | Softmax | Sigmoid |
Loss function | CrossEntropyLoss | BCEWithLogitsLoss |
Constraint | Exactly one class per sample | Any number of classes per sample |
Prediction interpretation | "Which one animal is in the image (dog, cat, horse, or mouse)?" | "Which of these animals are in the image (dog, cat, horse, mouse)?" |
Output sum | Probabilities sum up to 1 | Each probability is independent |
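The two loss names in the table are PyTorch classes. To show what they compute for a single sample, here is a plain-Python sketch of the underlying formulas: softmax cross-entropy for the multi-class case, and sigmoid binary cross-entropy averaged over labels for the multi-label case. The function names are made up for this example, and the code is a simplified illustration rather than the library's implementation.

```python
import math

def cross_entropy(logits, target_index):
    """Multi-class loss: -log(softmax(logits)[target_index])."""
    shift = max(logits)  # stabilise the log-sum-exp
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[target_index]

def bce_with_logits(logits, targets):
    """Multi-label loss: mean binary cross-entropy, one term per label.

    `targets` holds 0/1 values; each label is scored independently.
    """
    losses = []
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z)
        losses.append(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z))))
    return sum(losses) / len(losses)

# Multi-class: four animal classes, the true class is index 0 (say, "dog").
mc_loss = cross_entropy([5.0, 0.0, 0.0, 0.0], 0)

# Multi-label: dog and horse present, cat and mouse absent.
ml_loss = bce_with_logits([3.0, -2.0, 1.0, -4.0], [1, 0, 1, 0])
```

The multi-class loss takes a single target index because exactly one class is correct, while the multi-label loss takes a 0/1 vector with one entry per class.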
Footnotes
1. https://stats.stackexchange.com/questions/11859/what-is-the-difference-between-a-multiclass-and-a-multilabel-problem
2. https://www.linkedin.com/pulse/multi-class-classification-vs-multi-label-aya-hesham/
3. https://www.geeksforgeeks.org/machine-learning/multiclass-classification-vs-multi-label-classification/