Interpretations of ROC-AUC & PR-AUC metrics
1. Help me understand how to interpret ROC-AUC score. E.g. what does a 90% ROC-AUC score mean? How is it different from PR-AUC score?
These metrics get thrown around a lot but are often not well understood. Let’s break them down cleanly, with intuition, toy examples, and guidance on when to use each.
Part 1: ROC-AUC: What it really measures
ROC curve = Receiver Operating Characteristic curve.
- It plots True Positive Rate (Recall) vs. False Positive Rate (FPR = FP / (FP+TN)) at all classification thresholds.
- The AUC (Area Under Curve) is the probability that a randomly chosen positive example is ranked above a randomly chosen negative example.
Interpretation of ROC-AUC = 0.90
- If you pick one positive sample and one negative sample, your model will score the positive higher 90% of the time.
- Equivalently: the model is very good at separating the classes overall.
Extreme cases:
- 0.5 → random guessing (diagonal line).
- 1.0 → perfect separation.
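To make the pairwise-ranking interpretation concrete, here is a minimal sketch (toy labels and scores made up for illustration, scikit-learn assumed to be available) that computes ROC-AUC directly from positive/negative pairs and checks it against roc_auc_score:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([1, 1, 1, 0, 0, 0, 0])                  # toy labels
y_score = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1])   # toy model scores

# Pairwise definition: fraction of (positive, negative) pairs where the
# positive gets the higher score (ties count as half a win).
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
pairwise_auc = np.mean([(p > n) + 0.5 * (p == n) for p in pos for n in neg])

print(pairwise_auc)                     # ~0.917 for this toy example
print(roc_auc_score(y_true, y_score))   # same value from scikit-learn
```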
Part 2: PR-AUC: What it really measures
Precision-Recall curve plots Precision vs. Recall at all thresholds.
- The PR-AUC is the average precision across recall levels.
- It tells you: When I want high recall, how much precision do I have to give up?
Interpretation of PR-AUC = 0.60
- Across all recall levels, the model has about 60% average precision.
- If the dataset is very imbalanced (say only 1% positives), 60% PR-AUC is actually quite good, because a random classifier’s PR-AUC would be ~0.01 (the base positive rate).
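As a hedged sketch (same kind of toy labels and scores, scikit-learn assumed), here is how the PR curve and its area are typically computed; scikit-learn’s average_precision_score is the step-wise “average precision” summary of that curve:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

y_true = np.array([1, 1, 1, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(list(zip(recall, precision)))              # points on the PR curve
print(average_precision_score(y_true, y_score))  # PR-AUC as average precision
```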
Part 3: ROC-AUC vs PR-AUC - when to use
ROC-AUC is good when:
- Classes are relatively balanced.
- You care about overall ranking ability.
- Example: disease diagnosis where both false positives and false negatives matter.
PR-AUC is better when:
- Classes are highly imbalanced (rare positives).
- You really care about “when the model predicts positive, how often is it correct?”
- Example: fraud detection, cancer detection, anomaly detection.
ROC-AUC can be misleading in imbalanced cases.
- With 1% positives, a model can get ROC-AUC = 0.90 but PR-AUC = 0.10. That means: yes, it ranks positives higher on average, but most of the predicted positives are still false alarms, as the simulation below illustrates.
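A hedged simulation of this effect, using synthetic Gaussian scores and a 1% positive rate (numbers chosen only for illustration): ROC-AUC comes out around 0.90 while PR-AUC stays far lower.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(42)
n_neg, n_pos = 99_000, 1_000                       # 1% positive prevalence

y_true = np.concatenate([np.zeros(n_neg), np.ones(n_pos)])
y_score = np.concatenate([
    rng.normal(0.0, 1.0, n_neg),                   # negative scores
    rng.normal(1.8, 1.0, n_pos),                   # positive scores, shifted up
])

print("ROC-AUC:", roc_auc_score(y_true, y_score))            # ~0.90
print("PR-AUC :", average_precision_score(y_true, y_score))  # much lower
```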
Part 4: Comparison to Accuracy / Precision / Recall
- Accuracy: fraction of correct predictions.
- Misleading on imbalanced data (with 1% positives, you get 99% accuracy just by always predicting “negative”).
- Precision: of the predicted positives, how many were correct.
- Recall: of all actual positives, how many you caught.
- These all depend on a single threshold (say, a 0.5 cutoff).
ROC-AUC and PR-AUC don’t depend on a fixed threshold.
- They summarize performance across all possible thresholds.
- That makes them more robust for model evaluation and threshold tuning.
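To see the threshold dependence in code, here is a minimal sketch (toy data again) contrasting accuracy/precision/recall at a fixed 0.5 cutoff with the threshold-free ROC-AUC and PR-AUC:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score, average_precision_score)

y_true = np.array([1, 1, 1, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1])
y_pred = (y_score >= 0.5).astype(int)    # hard labels from a fixed 0.5 cutoff

print("accuracy :", accuracy_score(y_true, y_pred))   # depends on the cutoff
print("precision:", precision_score(y_true, y_pred))  # depends on the cutoff
print("recall   :", recall_score(y_true, y_pred))     # depends on the cutoff
print("ROC-AUC  :", roc_auc_score(y_true, y_score))             # uses raw scores
print("PR-AUC   :", average_precision_score(y_true, y_score))   # uses raw scores
```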
Part 5: Rule of Thumb
- Use ROC-AUC: when classes are balanced and you care about ranking quality.
- Use PR-AUC: when positives are rare and you care about actual “positive predictions” being useful.
- Use Accuracy/Precision/Recall: when the threshold is fixed and well-defined (e.g. regulatory or business rule).
So, 90% ROC-AUC means: your model ranks positives above negatives 9 times out of 10 - strong discrimination. But whether that translates to useful predictions depends on class balance. That’s why PR-AUC is often the more telling metric in imbalanced settings.
2. Explain what’s the interpretation of PR-AUC = 0.60 and why so? Also explain why a random classifier’s PR-AUC would be ~0.01 if we’ve 1% positive and 99% negative data?
Let’s drill into PR-AUC = 0.60 and why the random baseline is the positive prevalence (e.g. ~0.01 if only 1% positives).
Part 1: What does PR-AUC = 0.60 mean?
- PR curve = Precision vs. Recall plotted at every threshold.
- PR-AUC = area under that curve = “average precision across recall levels.”
If PR-AUC = 0.60, that means:
- Averaged across recall levels (i.e. across thresholds), about 60% of the model’s predicted positives are actually correct.
- Another way to see it: if you rank the data by model score and “walk down the list,” the top portions of the ranking are strongly enriched for true positives: roughly 60% precision on average, versus the baseline prevalence you would get from a random ordering (see the sketch below).
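As a rough illustration of that “walk down the list” view (synthetic 1%-prevalence data, with an arbitrary score bump for positives, so the exact numbers are not meaningful), precision among the top-ranked samples ends up far above the base rate:

```python
import numpy as np

rng = np.random.default_rng(1)
n, prevalence = 10_000, 0.01
y_true = (rng.random(n) < prevalence).astype(int)
y_score = rng.random(n) + 0.8 * y_true        # informative but imperfect scores

order = np.argsort(-y_score)                  # rank all samples by score, descending
top_k = 100
print("precision in top", top_k, ":", y_true[order[:top_k]].mean())  # well above 0.01
print("baseline prevalence:", y_true.mean())                         # ~0.01
```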
Part 2: Why is the random classifier’s PR-AUC equal to the prevalence?
Suppose you have a dataset of 10,000 samples:
- 100 are positive (1%).
- 9,900 are negative.
Now imagine a random classifier that assigns random scores. If you sweep thresholds:
- At any threshold, the set of predicted positives is essentially a random subset of the population.
- So the fraction of true positives inside that set = prevalence = 100 / 10,000 = 0.01.
So:
- Precision is flat at 0.01 (in expectation), regardless of recall.
- PR curve is a horizontal line at 0.01.
- Area under that curve = 0.01.
That’s why:
- PR-AUC baseline = positive rate.
- With PR-AUC = 0.60, your model is massively better than random if prevalence is low (say 1%).
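A hedged sanity check of the baseline argument: assign purely random scores to 10,000 samples with 100 positives, and the metrics land roughly where the reasoning above predicts (ROC-AUC near 0.5, PR-AUC near the 0.01 prevalence).

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n, n_pos = 10_000, 100
y_true = np.zeros(n, dtype=int)
y_true[:n_pos] = 1                  # 100 positives out of 10,000 = 1% prevalence
y_score = rng.random(n)             # random scores: the model knows nothing

print("ROC-AUC:", roc_auc_score(y_true, y_score))            # ~0.5
print("PR-AUC :", average_precision_score(y_true, y_score))  # ~0.01 = prevalence
```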
Part 3: Why does this matter?
- If your dataset has 1% positives:
- ROC-AUC = 0.90 might sound “great” … but you could still end up with very few correct positives in the top predictions.
- PR-AUC = 0.60 tells you that your model’s predicted positive set is, on average, 60x more enriched in true positives than random guessing. That’s far more useful.
Summary
- PR-AUC = 0.60 → on average, 60% of predicted positives are true positives across thresholds.
- Random classifier’s PR-AUC = prevalence because predictions are random subsets, so expected precision equals the base rate (e.g. 0.01 if 1% positives).
- That’s why PR-AUC is the metric of choice for imbalanced classification: it reflects how much better you are than just knowing the base rate.