Math foundations
Reading Distributions: Percentiles and Quartiles from Scratch
The idea
A distribution is the full set of values, not one summary number. The average alone hides the shape: a few very large values can pull the mean up while most observations sit lower. Percentiles and quartiles read the shape directly from sorted data.
P90 is the cutoff where about 90% of observations fall at or below it. Quartiles are just special percentiles: Q1 is P25, Q2 is P50 (the median), Q3 is P75. They split sorted data into four equal-count regions.
Sort first, then read the cutoff. The histogram and box plot show where values pile up.
Example: read percentiles, quartiles, and the full shape
Sort the data first. Quartiles split it into four equal-count chunks. Any percentile is a cutoff on that sorted list. The histogram and cumulative curve show the shape, not just one number.
Response times in milliseconds for one endpoint.
Five-number summary (rounded to the nearest millisecond):
Min: 12 ms
Q1 (P25): 29 ms
Q2 (P50): 44 ms
Q3 (P75): 62 ms
Max: 110 ms
[Figure: box plot (quartiles + whiskers)]
[Figure: histogram of how often values fall in each range]
[Figure: sorted values (blue = at or below P90) and cumulative curve, with the P90 cutoff at 86 ms]
Mean vs median
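Mean 48.6 ms, median 43.5 ms. They are fairly close here, but the mean sits about 5 ms above the median because the two largest values (95 and 110) pull it up. That is a mild right skew, not a perfectly symmetric distribution.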
Sorted data (18 points)
12, 18, 21, 24, 28, 31, 35, 38, 42, 45, 49, 53, 58, 63, 71, 82, 95, 110
P90 is about 86 ms. Roughly 90% of observations fall at or below that cutoff, and 10% sit above it.
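The numbers above can be reproduced with a few lines of code. This is a minimal sketch, assuming the linear-interpolation convention (which matches the rounded values shown here); the function name `percentile` is illustrative, and other conventions such as nearest-rank give slightly different cutoffs on small samples.

```python
def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) by linear interpolation."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    data = sorted(values)                  # always sort first
    rank = (p / 100) * (len(data) - 1)     # fractional 0-based position
    lower = int(rank)                      # index at or just below the position
    upper = min(lower + 1, len(data) - 1)  # neighbour above, clamped at the end
    fraction = rank - lower                # how far to move toward the neighbour
    return data[lower] + fraction * (data[upper] - data[lower])


response_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42,
               45, 49, 53, 58, 63, 71, 82, 95, 110]

for p in (25, 50, 75, 90):
    print(f"P{p}: {percentile(response_ms, p):.2f} ms")
# P25: 28.75 ms
# P50: 43.50 ms
# P75: 61.75 ms
# P90: 85.90 ms
```

Rounded to whole milliseconds, these are the 29, 44, 62, and 86 ms shown in the example.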
How to read the charts
Histogram. Counts how many values fall in each range. Tall bars mean common values. A long tail on one side means skew.
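A histogram is just a count per range. The sketch below bins the 18 response times by hand. The 20 ms bin width is an assumption for illustration; the bins used by the chart are not stated.

```python
from collections import Counter

response_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42,
               45, 49, 53, 58, 63, 71, 82, 95, 110]
BIN_WIDTH = 20  # assumed bin width in ms, chosen for illustration

# Map each value to the lower edge of its bin, then count values per bin.
counts = Counter((value // BIN_WIDTH) * BIN_WIDTH for value in response_ms)

# Print one text bar per bin, including any empty bins.
for start in range(0, max(response_ms) + 1, BIN_WIDTH):
    bar = "#" * counts[start]
    print(f"{start}-{start + BIN_WIDTH - 1} ms | {bar} ({counts[start]})")
```

With these bins the counts are 2, 6, 5, 2, 2, 1. The tallest bar is 20-39 ms, and the bars thin out toward 110 ms, which is the modest right tail.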
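Box plot. The box spans Q1 to Q3. The line inside is the median. Whiskers extend toward min and max; in some conventions they stop at the most extreme points within 1.5 × IQR of the box, and anything beyond is drawn as an individual outlier. It is a compact view of spread and symmetry.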
Cumulative curve. As you move right through sorted values, the curve climbs toward 100%. A percentile cutoff is the x-value where the curve crosses your chosen percentage.
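The cumulative curve can be built directly from the sorted list. This is a sketch of the empirical version, where each observation lifts the curve by one step of 1/n; the helper name `share_at_or_below` is illustrative.

```python
from bisect import bisect_right

response_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42,
               45, 49, 53, 58, 63, 71, 82, 95, 110]
data = sorted(response_ms)


def share_at_or_below(sorted_values, cutoff):
    """Fraction of observations <= cutoff (the curve's height at that x-value)."""
    return bisect_right(sorted_values, cutoff) / len(sorted_values)


# Each sorted value lifts the curve by one step of 1/n.
curve = [(value, (i + 1) / len(data)) for i, value in enumerate(data)]
for value, share in curve[-4:]:
    print(f"{value:>3} ms -> {share:.1%}")
#  71 ms -> 83.3%
#  82 ms -> 88.9%
#  95 ms -> 94.4%
# 110 ms -> 100.0%

print(f"at or below 86 ms: {share_at_or_below(data, 86):.1%}")
# at or below 86 ms: 88.9%
```

With 18 points the curve moves in steps of about 5.6 percentage points, so 88.9% is as close to 90% as a cutoff between 82 and 95 ms can get. That is why percentile statements on small samples say "about".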
The math
Percentile
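Sort the data ascending. The p-th percentile is the cutoff with about p% of points at or below it. P50 is the middle value, or the average of the two middle values when the count is even: in the example, (42 + 45) / 2 = 43.5. P90 is high enough that only about 10% of points sit above it.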
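One common way to turn that into a formula is linear interpolation between neighbouring sorted values. This is one convention among several, written here with 0-based indexing as an assumption; it reproduces the example's P90.

```latex
% Sorted data, 0-based: x_0 \le x_1 \le \dots \le x_{n-1}, and 0 \le p < 100
r = \frac{p}{100}\,(n - 1), \qquad
P_p = x_{\lfloor r \rfloor}
      + \bigl(r - \lfloor r \rfloor\bigr)\,
        \bigl(x_{\lfloor r \rfloor + 1} - x_{\lfloor r \rfloor}\bigr)

% Worked for the example: n = 18, p = 90
r = 0.9 \times 17 = 15.3, \qquad
P_{90} = x_{15} + 0.3\,(x_{16} - x_{15}) = 82 + 0.3\,(95 - 82) = 85.9
```

For p = 100 the position lands exactly on the last value, so P100 is simply the maximum.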
Quartiles
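Quartiles split sorted data into four equal-count regions: Q1 = P25, Q2 = P50 (the median), Q3 = P75. When the count is not divisible by four, the regions are only approximately equal.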
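A quick check of the quartile split on the 18 response times from the example. The sketch uses Python's standard `statistics.quantiles` with `method="inclusive"`, which interpolates between neighbours; that choice of convention is an assumption, picked because it matches the rounded values shown earlier.

```python
import statistics

response_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42,
               45, 49, 53, 58, 63, 71, 82, 95, 110]

# n=4 asks for the three cut points that divide the data into quarters.
q1, q2, q3 = statistics.quantiles(response_ms, n=4, method="inclusive")
print(q1, q2, q3)  # 28.75 43.5 61.75

# Count how many points land in each of the four regions.
regions = [
    sum(v <= q1 for v in response_ms),
    sum(q1 < v <= q2 for v in response_ms),
    sum(q2 < v <= q3 for v in response_ms),
    sum(v > q3 for v in response_ms),
]
print(regions)  # [5, 4, 4, 5]
```

Eighteen points cannot split into four identical groups, so the regions come out as 5, 4, 4, 5. The counts get closer to exactly equal as the sample grows.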
Interquartile range
IQR = Q3 - Q1. It captures the spread of the middle 50% of the data; for the response times above that is about 62 - 29 = 33 ms. It is less sensitive to extreme tail values than the full range from min to max.
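The robustness claim is easy to test. In this sketch the slowest request is replaced with a hypothetical 1100 ms value (an assumption for illustration, not real data), and the IQR is compared against the full range. The helper names `iqr` and `full_range` are illustrative.

```python
import statistics


def iqr(values):
    """Interquartile range: Q3 minus Q1 (inclusive interpolation convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def full_range(values):
    """Distance from the smallest to the largest value."""
    return max(values) - min(values)


response_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42,
               45, 49, 53, 58, 63, 71, 82, 95, 110]

# Hypothetical: the slowest request takes 1100 ms instead of 110 ms.
with_outlier = response_ms[:-1] + [1100]

print(iqr(response_ms), full_range(response_ms))    # 33.0 98
print(iqr(with_outlier), full_range(with_outlier))  # 33.0 1088
```

The range grows more than tenfold while the IQR does not move, because Q1 and Q3 depend only on values near the middle of the sorted list.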
Mean vs median in one glance
When mean and median diverge, the distribution is skewed. Revenue per customer, order sizes, and incident severity often skew right: a few large values pull the mean up while the median stays closer to typical experience. Percentiles keep the tail visible.
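To see the divergence in numbers, the sketch below adds one hypothetical 900 ms request to the response times from the example and compares mean and median before and after. The extra value is an assumption chosen to exaggerate the tail.

```python
import statistics

response_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42,
               45, 49, 53, 58, 63, 71, 82, 95, 110]

# Hypothetical: one extra request that took 900 ms.
skewed = response_ms + [900]

for label, values in (("original", response_ms),
                      ("with 900 ms outlier", skewed)):
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{label}: mean {mean:.1f} ms, median {median:.1f} ms, "
          f"gap {mean - median:.1f} ms")
# original: mean 48.6 ms, median 43.5 ms, gap 5.1 ms
# with 900 ms outlier: mean 93.4 ms, median 45.0 ms, gap 48.4 ms
```

One slow request nearly doubles the mean but shifts the median by 1.5 ms. The size of the gap between the two is a quick signal of how heavy the tail is.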
A simple application
This is the core read for latency SLAs, support wait times, and any metric where the tail matters more than the average. P99 latency is the cutoff that 99% of requests stay at or below, so it describes the experience of the slowest 1% of traffic. P90 support wait is the wait that 90% of customers stay at or under, a figure the average can hide when many fast cases pull it down.
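A percentile target can be checked in a few lines. This is a sketch only: the 100 ms target and the helper names `percentile_cutoff` and `meets_target` are hypothetical, and the data is the small 18-point sample from the example. Real P99 checks need far more observations, since with 18 points P99 is decided almost entirely by the two largest values.

```python
import statistics


def percentile_cutoff(values, p):
    """Return the p-th percentile for an integer p from 1 to 99."""
    if not 1 <= p <= 99:
        raise ValueError("p must be an integer from 1 to 99")
    # n=100 yields 99 cut points; the p-th percentile is at index p - 1.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[p - 1]


def meets_target(values, p, target):
    """True when the p-th percentile is at or below the target."""
    return percentile_cutoff(values, p) <= target


response_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42,
               45, 49, 53, 58, 63, 71, 82, 95, 110]
TARGET_MS = 100  # hypothetical latency target

for p in (90, 99):
    cutoff = percentile_cutoff(response_ms, p)
    status = "ok" if meets_target(response_ms, p, TARGET_MS) else "breach"
    print(f"P{p} = {cutoff:.2f} ms -> {status}")
# P90 = 85.90 ms -> ok
# P99 = 107.45 ms -> breach
```

The same data passes at P90 and fails at P99, which is why the choice of percentile belongs in the target itself.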
The applied posts on percentiles, variance, and mean vs median take these same reads into real dashboards and decision thresholds.