Math foundations

Reading Distributions: Percentiles and Quartiles from Scratch

Percentile cutoff on a sorted distribution

The idea

A distribution is the full set of values, not one summary number. The average alone hides the shape: a few very large values can pull the mean up while most observations sit lower. Percentiles and quartiles read the shape directly from the sorted data.

P90 is the cutoff value at or below which about 90% of observations fall. Quartiles are special percentiles: Q1 is P25, Q2 is P50 (the median), and Q3 is P75. Together they split sorted data into four regions with roughly equal counts.

Sort first, then read the cutoff. The histogram and box plot show where values pile up.

Example: read percentiles, quartiles, and the full shape

Sort the data first. Quartiles split it into four equal-count chunks. Any percentile is a cutoff on that sorted list. The histogram and cumulative curve show the shape, not just one number.

Response times in milliseconds for one endpoint.

[Charts: box plot (quartiles + whiskers); histogram of how often values fall in each range; sorted values (highlighted at or below P90) with the P90 cutoff at 86 ms and the cumulative curve.]

Mean vs median

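Mean 48.6 ms, median 43.5 ms. The mean sits about 5 ms above the median, pulled up by the slower values in the upper tail (82, 95, 110 ms). The gap is small, so the distribution is only mildly right-skewed.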

Sorted data (18 points)

12, 18, 21, 24, 28, 31, 35, 38, 42, 45, 49, 53, 58, 63, 71, 82, 95, 110

P90 is about 86 ms. Roughly 90% of observations fall at or below that cutoff, and 10% sit above it.
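A minimal Python sketch that reproduces these numbers with the standard-library `statistics` module (`quantiles` needs Python 3.8+). It assumes the 86 ms figure comes from linear interpolation (`method="inclusive"`); that assumption fits, since it gives 85.9 ms. With 16 of the 18 values at or below the cutoff, the exact share is about 89%.

```python
import statistics

# Response times (ms) for one endpoint, from the example above.
latencies_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42, 45,
                49, 53, 58, 63, 71, 82, 95, 110]

mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)  # average of 42 and 45 (even count)

# quantiles(n=10) returns the 9 decile cut points; index 8 is P90.
# method="inclusive" interpolates linearly between neighbouring sorted values.
deciles = statistics.quantiles(latencies_ms, n=10, method="inclusive")
p90_ms = deciles[8]

# Fraction of observations at or below the P90 cutoff.
share_at_or_below = sum(x <= p90_ms for x in latencies_ms) / len(latencies_ms)

print(f"mean={mean_ms:.1f} median={median_ms} p90={p90_ms:.1f}")
print(f"share at or below P90: {share_at_or_below:.1%}")
```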

How to read the charts

Histogram. Counts how many values fall in each range. Tall bars mean common values. A long tail on one side means skew.
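A text histogram of the same data, as a rough sketch. The 20 ms bin width is an assumption (the original chart's bins are not stated); a different width changes the bar heights but not the overall shape.

```python
from collections import Counter

latencies_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42, 45,
                49, 53, 58, 63, 71, 82, 95, 110]

BIN_WIDTH_MS = 20  # assumed bin width, for illustration

# Map each value to the lower edge of its bin, e.g. 49 -> 40 (the 40-59 ms bin).
counts = Counter((x // BIN_WIDTH_MS) * BIN_WIDTH_MS for x in latencies_ms)

# One row per bin, including empty ones, from 0 ms up to the max value.
for lower in range(0, max(latencies_ms) + 1, BIN_WIDTH_MS):
    upper = lower + BIN_WIDTH_MS
    n = counts.get(lower, 0)
    print(f"{lower:>3}-{upper - 1:<3} ms | {'#' * n} ({n})")
```

Most values pile up between 20 and 59 ms, and the thinner bars from 60 ms upward form the right tail.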

Box plot. The box spans Q1 to Q3. The line inside is the median. Whiskers reach toward min and max. It is a compact view of spread and symmetry.
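The numbers behind a box plot can be computed directly. This sketch assumes Tukey's common convention for whiskers (they stop at the most extreme data points within 1.5 × IQR of the box, and anything beyond is drawn as an outlier); plotting tools differ, so treat the whisker rule as an assumption.

```python
import statistics

latencies_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42, 45,
                49, 53, 58, 63, 71, 82, 95, 110]

# Box edges and median line, using linear interpolation for the quartiles.
q1, q2, q3 = statistics.quantiles(latencies_ms, n=4, method="inclusive")
iqr = q3 - q1

# Tukey fences (assumed convention): 1.5 * IQR beyond each edge of the box.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
inside = [x for x in latencies_ms if low_fence <= x <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [x for x in latencies_ms if x < low_fence or x > high_fence]

print(f"box: {q1} .. {q3}, median line at {q2}")
print(f"whiskers: {whisker_low} .. {whisker_high}, outliers: {outliers}")
```

For this data no point falls outside the fences, so the whiskers reach all the way to the min (12 ms) and max (110 ms).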

Cumulative curve. As you move right through sorted values, the curve climbs toward 100%. A percentile cutoff is the x-value where the curve crosses your chosen percentage.
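The cumulative curve is the empirical cumulative distribution: for any x, the fraction of observations at or below x. A small sketch using binary search on the sorted list; the helper name `ecdf` is hypothetical.

```python
from bisect import bisect_right

# bisect requires sorted input.
latencies_ms = sorted([12, 18, 21, 24, 28, 31, 35, 38, 42, 45,
                       49, 53, 58, 63, 71, 82, 95, 110])

def ecdf(sorted_values, x):
    """Fraction of observations at or below x (the empirical cumulative curve)."""
    # bisect_right counts how many sorted values are <= x.
    return bisect_right(sorted_values, x) / len(sorted_values)

for x in (20, 43.5, 86, 110):
    print(f"F({x}) = {ecdf(latencies_ms, x):.3f}")
```

With 18 points the curve is a step function: it jumps from 16/18 (about 89%) at 82 ms to 17/18 (about 94%) at 95 ms, so no data point sits exactly at 90%. Percentile methods differ in how they place a cutoff inside that gap.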

The math

Percentile

P_k = value where k% of data is ≤ that value

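Sort the data ascending. P50 is the middle value; with an even count it is the average of the two middle values (here (42 + 45) / 2 = 43.5). P90 is high enough that only about 10% of points sit above it. When a cutoff falls between two data points, methods differ; the 86 ms above comes from linear interpolation between the 16th and 17th sorted values (82 and 95).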
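A from-scratch sketch of two common percentile definitions. The function names are hypothetical. `percentile_linear` uses the position h = (n − 1) · k / 100, the same convention as NumPy's default and `statistics.quantiles(method="inclusive")`; `percentile_nearest_rank` always returns an actual data point.

```python
import math

def percentile_linear(values, k):
    """P_k by linear interpolation between neighbouring sorted values."""
    xs = sorted(values)
    h = (len(xs) - 1) * k / 100          # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)        # stay in bounds when k == 100
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def percentile_nearest_rank(values, k):
    """P_k as the smallest value with at least k% of the data at or below it."""
    xs = sorted(values)
    rank = max(1, math.ceil(k * len(xs) / 100))  # 1-based rank
    return xs[rank - 1]

latencies_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42, 45,
                49, 53, 58, 63, 71, 82, 95, 110]

print(f"{percentile_linear(latencies_ms, 90):.1f}")   # 85.9
print(percentile_nearest_rank(latencies_ms, 90))      # 95
```

Both are legitimate definitions; on small samples they can disagree noticeably (85.9 vs 95 ms for the same P90), so it helps to know which one a dashboard uses.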

Quartiles

Q1 = P25, Q2 = P50, Q3 = P75

Quartiles split sorted data into four equal-count regions.
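To check how "equal-count" the regions are, this sketch counts the values that land in each region, assuming linear-interpolation quartiles and assigning a value equal to a cutoff to the lower region.

```python
import statistics

latencies_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42, 45,
                49, 53, 58, 63, 71, 82, 95, 110]

q1, q2, q3 = statistics.quantiles(latencies_ms, n=4, method="inclusive")

# Regions: at or below Q1, (Q1, Q2], (Q2, Q3], above Q3.
regions = [0, 0, 0, 0]
for x in latencies_ms:
    if x <= q1:
        regions[0] += 1
    elif x <= q2:
        regions[1] += 1
    elif x <= q3:
        regions[2] += 1
    else:
        regions[3] += 1

print(q1, q2, q3)
print(regions)
```

With 18 points the split comes out as 5, 4, 4, 5: equal counts are only approximate when the count is not a multiple of four.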

Interquartile range

IQR = Q3 − Q1

IQR measures the spread of the middle 50% of the data. It is less sensitive to extreme tail values than the full range from min to max.
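A quick sketch of that robustness claim. The spike scenario (the slowest request taking 1100 ms instead of 110 ms) is hypothetical, chosen only to stress the tail.

```python
import statistics

def iqr(values):
    """Q3 - Q1, with linear-interpolation quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def full_range(values):
    return max(values) - min(values)

latencies_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42, 45,
                49, 53, 58, 63, 71, 82, 95, 110]
# Hypothetical: the slowest request becomes ten times slower.
with_spike = latencies_ms[:-1] + [1100]

print(iqr(latencies_ms), full_range(latencies_ms))
print(iqr(with_spike), full_range(with_spike))
```

The IQR stays at 33 ms in both cases, while the range jumps from 98 ms to 1088 ms.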

Mean vs median in one glance

When mean and median diverge, the distribution is skewed. Revenue per customer, order sizes, and incident severity often skew right: a few large values pull the mean up while the median stays closer to typical experience. Percentiles keep the tail visible.
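A sketch of how one large value pulls the mean but barely moves the median. The extra 500 ms observation is hypothetical.

```python
import statistics

def summarize(data):
    """Return (mean, median) for a list of numbers."""
    return statistics.mean(data), statistics.median(data)

latencies_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42, 45,
                49, 53, 58, 63, 71, 82, 95, 110]
# Hypothetical: one very slow request joins the sample.
skewed = latencies_ms + [500]

for label, data in (("original", latencies_ms), ("with 500 ms", skewed)):
    mean, median = summarize(data)
    print(f"{label:>12}: mean={mean:.1f} median={median} gap={mean - median:.1f}")
```

The single slow request moves the mean by roughly 24 ms (48.6 to 72.4) but the median by only 1.5 ms (43.5 to 45).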

A simple application

This is the core read for latency SLAs, support wait times, and any metric where the tail matters more than the average. P99 latency describes what the slowest 1% of requests experience, close to the worst case. P90 support wait is the wait that 90% of customers stay within, a bound the average does not give you when a few very long waits skew the data.
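A sketch of an SLA-style check against tail percentiles. The 100 ms target and the helper name `percentile` are hypothetical, and percentiles use linear interpolation.

```python
import statistics

def percentile(values, k):
    """P_k for integer k in 1..99, via linear interpolation."""
    # n=100 yields the 99 cut points P1..P99; index k - 1 is P_k.
    return statistics.quantiles(values, n=100, method="inclusive")[k - 1]

latencies_ms = [12, 18, 21, 24, 28, 31, 35, 38, 42, 45,
                49, 53, 58, 63, 71, 82, 95, 110]
SLA_MS = 100  # hypothetical target, for illustration only

for k in (50, 90, 99):
    p = percentile(latencies_ms, k)
    status = "within" if p <= SLA_MS else "over"
    print(f"P{k} = {p:.1f} ms ({status} the {SLA_MS} ms target)")
```

Here P50 and P90 sit within the target while P99 (about 107 ms) does not. With only 18 samples, P99 is interpolated between the two slowest requests, so treat it as a rough read; tail percentiles need much larger samples to be stable.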

The applied posts on percentiles, variance, and mean vs median take these same reads into real dashboards and decision thresholds.