Multi-class vs Multi-label vs Multi-output Classification

✅ Key Distinction:

Multi-class → 1 categorical output.

Multi-label → multiple binary outputs.

Multi-output → multiple outputs, each possibly multi-class.

Classification
│
├── Single-output
│   ├── Binary classification
│   └── Multi-class classification
│
└── Multi-output
    ├── Multi-label (all outputs are binary)
    └── General multi-output (outputs can be multi-class)

Classification Types (with MNIST examples)

Type	Definition	Output Format	MNIST Example	Other Examples
Binary Classification	Each instance belongs to exactly one of two classes.	Single binary label (0/1).	Predict “Even vs Odd” digit.	Spam detection (spam / not spam).
Multi-class Classification	Each instance belongs to exactly one class among 3 or more.	Single categorical label (e.g., one-hot encoded).	Predict digit ∈ {0–9}.	Fruit classification (apple, banana, orange).
Multi-label Classification	Each instance can carry several labels at once; each label is a separate yes/no decision.	Vector of binary labels.	Predict [Odd?, ≥7?] → e.g., “9” → [True, True].	Recognizing several people in one photo (one yes/no output per known person).
Multi-output (Binary variant = Multi-label)	Predicts multiple targets, each of which is binary. (This is exactly multi-label.)	Multiple binary outputs.	Same as multi-label: [Odd?, ≥7?].	Medical tests (Positive/Negative for multiple conditions).
Multi-output (General)	Predicts multiple targets, each target may be binary or multi-class.	Multiple categorical outputs.	Predict: Digit ∈ {0–9}, Stroke ∈ {Thin/Medium/Thick}, Angle ∈ {Left/Upright/Right}.	Medical diagnosis: Disease type ∈ {flu, cold, pneumonia}, Severity ∈ {mild, moderate, severe}.
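To make the output formats in the table concrete, here is a minimal sketch that builds the target for a single MNIST digit under the first three settings. The helper names (`binary_target`, `one_hot_target`, `multilabel_target`) are hypothetical and chosen only for illustration.

```python
def binary_target(digit):
    """Binary: 1 if the digit is odd, else 0 (the "Even vs Odd" task)."""
    return digit % 2


def one_hot_target(digit, num_classes=10):
    """Multi-class: exactly one position is 1, all others are 0."""
    return [1 if i == digit else 0 for i in range(num_classes)]


def multilabel_target(digit):
    """Multi-label: [Odd?, >=7?], each answered as its own yes/no question."""
    return [digit % 2 == 1, digit >= 7]


digit = 9
print(binary_target(digit))      # 1
print(one_hot_target(digit))     # a single 1 at index 9
print(multilabel_target(digit))  # [True, True], matching the table row
```

Note that the one-hot vector always contains exactly one 1, while the multi-label vector may contain zero, one, or several True entries.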

1. Multi-class classification

Each instance (MNIST image) belongs to exactly one of k classes (digits 0–9 → 10 classes). The target is a single label from a finite set.

The model's output is a probability distribution over all classes (the probabilities sum to 1), and the predicted class is the one with the highest probability.
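The "sums to 1" property typically comes from a softmax over the raw class scores. The following is a minimal pure-Python sketch of that step; the logit values are made up for illustration and do not come from a trained model.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the 10 digit classes 0..9
logits = [0.1, -1.2, 0.3, 2.5, 0.0, -0.7, 0.4, 1.1, -0.3, 0.2]
probs = softmax(logits)

# The prediction is the single class with the highest probability
predicted_digit = max(range(len(probs)), key=probs.__getitem__)

print(round(sum(probs), 6))  # 1.0
print(predicted_digit)       # 3
```

Because the probabilities compete for a fixed total of 1, raising one class's score necessarily lowers the others, which is what enforces the one-class-per-instance constraint.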

2. Multi-label Classification

Each instance can belong to multiple classes at the same time. Output is typically a vector of independent binary indicators.
MNIST example (“≥7?” and “Odd?”) fits here because each picture maps to two independent yes/no outputs.

The output is a set of independent probabilities, one per label (each between 0 and 1). Unlike the multi-class case, they do not need to sum to 1.
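A common way to get one probability per label is to apply a sigmoid to each raw score separately and threshold the result. This is a minimal sketch under that assumption; the logits and the 0.5 threshold are illustrative choices, not values from the original notes.

```python
import math


def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical raw scores for the two labels [Odd?, >=7?]
logits = [2.0, -0.5]

# Each label gets its own probability; nothing forces them to sum to 1
probs = [sigmoid(z) for z in logits]

# Threshold each probability on its own to get the yes/no answers
labels = [p >= 0.5 for p in probs]

print([round(p, 3) for p in probs])
print(labels)  # [True, False]
```

Here the model would answer "odd: yes, at least 7: no", which is a valid combination (for example, the digit 5).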

3. Multi-output Classification

This is the subtle one. Multi-output (a.k.a. multi-target) classification generalizes multi-label:

Multi-label assumes each target is binary (yes/no).
Multi-output allows each target to itself be multi-class. Think of it like predicting several categorical variables simultaneously.²

Examples:

Digit task (MNIST): Instead of only predicting the digit, suppose you also want to predict its stroke thickness (thin/medium/thick) and rotation angle category (left, upright, right). That’s three outputs: digit ∈ {0–9}, thickness ∈ {3 classes}, rotation ∈ {3 classes}. Each output is a multi-class prediction.
Medical diagnosis: Predict both the disease category (e.g., flu, cold, pneumonia) and severity level (mild, moderate, severe).
Image denoising: Each pixel’s value is a multi-class output (0–255). The system produces thousands of such outputs at once. That’s multi-output classification, because every pixel prediction is a categorical label with more than two possible values.
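One way to picture general multi-output classification is as several multi-class predictions made side by side, each with its own softmax. The sketch below follows the digit/thickness/rotation example; the `CLASSES` table, the `predict` helper, and the logit values are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one output head."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Class names per output; thickness and rotation labels follow the text above
CLASSES = {
    "digit": [str(d) for d in range(10)],
    "thickness": ["thin", "medium", "thick"],
    "rotation": ["left", "upright", "right"],
}


def predict(head_logits):
    """Pick the most probable class separately for each output head."""
    prediction = {}
    for name, logits in head_logits.items():
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        prediction[name] = CLASSES[name][best]
    return prediction


# Hypothetical raw scores, one list per head
head_logits = {
    "digit": [0.0, 0.1, 0.2, 0.1, 0.0, 0.3, 0.2, 3.0, 0.1, 0.4],
    "thickness": [0.2, 1.5, 0.1],
    "rotation": [0.3, 0.4, 2.0],
}

result = predict(head_logits)
print(result)  # {'digit': '7', 'thickness': 'medium', 'rotation': 'right'}
```

If every head had only two classes, this would reduce to the multi-label case.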

Appendix

Python Code

from graphviz import Digraph
 
# Create a hierarchy diagram for classification types
dot = Digraph(comment="Classification Hierarchy")
 
# Root node
dot.node("C", "Classification")
 
# Single-output branch
dot.node("SO", "Single-output")
dot.edge("C", "SO")
dot.node("Bin", "Binary Classification\n(Yes/No)")
dot.node("MC", "Multi-class Classification\n(One out of many classes)")
dot.edges([("SO", "Bin"), ("SO", "MC")])
 
# Multi-output branch
dot.node("MO", "Multi-output")
dot.edge("C", "MO")
dot.node("ML", "Multi-label\n(Special case: all outputs binary)")
dot.node("GenMO", "General Multi-output\n(Outputs can be multi-class)")
dot.edges([("MO", "ML"), ("MO", "GenMO")])
 
# Render the diagram to /content/classification_hierarchy.png
# (the /content directory is Colab-specific; adjust the path when running elsewhere)
dot.render("/content/classification_hierarchy", format="png", cleanup=True)

Multi-class vs. Multi-label Classification

Aspect	Multi-Class	Multi-Label
Output activation	Softmax	Sigmoid
Loss function	CrossEntropyLoss	BCEWithLogitsLoss
Constraint	Exactly one class per sample	Any number of classes per sample
Prediction interpretation	"Which one animal is in the image (dog, cat, horse, or mouse)?"	"Which of these animals are in the image (dog, cat, horse, mouse)?"
Output sum	Probabilities sum up to 1	Each probability is independent
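The two loss names in the table are PyTorch classes. To show what each one computes for a single sample, here is a pure-Python sketch of the underlying formulas, assuming the default mean reduction for the binary case. It is an illustration of the math, not the library implementation; the function names and logits are made up.

```python
import math


def cross_entropy(logits, target_index):
    """Multi-class loss: negative log of the softmax probability of the true class."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]


def bce_with_logits(logits, targets):
    """Multi-label loss: sigmoid + binary cross-entropy per label, then averaged."""
    losses = [
        # Stable form of -(y*log(s) + (1-y)*log(1-s)) with s = sigmoid(z)
        max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
        for z, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


# Multi-class: four animal scores, true class is index 0 (dog)
mc_loss = cross_entropy([2.0, 0.5, -1.0, 0.0], target_index=0)

# Multi-label: same scores, but dog and horse are both present
ml_loss = bce_with_logits([2.0, 0.5, -1.0, 0.0], targets=[1.0, 0.0, 1.0, 0.0])

print(round(mc_loss, 4))
print(round(ml_loss, 4))
```

The multi-class loss takes one class index as its target, while the multi-label loss takes a full vector of 0/1 targets, mirroring the "exactly one" versus "any number" constraint in the table.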

Thangavel Prasanth
