What you'll learn
Cumulative frequency is a statistical method for organising and analysing grouped data, tested extensively in CIE IGCSE Mathematics Paper 2 and Paper 4. This topic requires you to construct tables, draw smooth curves, and extract key statistical measures including the median, quartiles, and interquartile range. Understanding these techniques enables you to interpret large data sets and answer multi-step questions worth 6-10 marks in examinations.
Key terms and definitions
Cumulative frequency — the running total of frequencies up to and including each class interval, showing how many data values fall below the upper class boundary.
Upper class boundary — the highest value in each class interval, used as the x-coordinate when plotting cumulative frequency curves.
Median — the middle value of a data set, found at the 50th percentile or n/2 position on the cumulative frequency curve.
Lower quartile (Q₁) — the value at the 25th percentile, found at the n/4 position, below which 25% of data values lie.
Upper quartile (Q₃) — the value at the 75th percentile, found at the 3n/4 position, below which 75% of data values lie.
Interquartile range (IQR) — the measure of spread calculated as Q₃ - Q₁, representing the range of the middle 50% of data values.
Cumulative frequency curve (ogive) — a smooth S-shaped curve plotted with upper class boundaries on the x-axis and cumulative frequencies on the y-axis.
Class interval — a range of values used to group continuous data, written in the form a < x ≤ b or a ≤ x < b.
Core concepts
Understanding cumulative frequency tables
A cumulative frequency table transforms grouped frequency data into running totals. Each cumulative frequency value represents the total number of observations up to and including that class interval.
To construct a cumulative frequency table:
- Identify the class intervals and their frequencies from the raw data
- Add a new column headed "Cumulative Frequency"
- Write the first frequency as the first cumulative frequency
- Add each subsequent frequency to the previous cumulative frequency total
- The final cumulative frequency must equal the total number of data values (n)
For example, if class intervals show frequencies of 3, 7, 12, 15, 8, 5, the cumulative frequencies become 3, 10, 22, 37, 45, 50. Always verify that your final value equals the sum of all frequencies.
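The running-total steps above can be checked with a short Python sketch. This is only an illustration using the frequencies quoted in the example; the variable names are assumptions made for this sketch.

```python
from itertools import accumulate

# Frequencies from the example in the text, one per class interval.
frequencies = [3, 7, 12, 15, 8, 5]

# accumulate() yields the running total after each class interval.
cumulative = list(accumulate(frequencies))

# The final cumulative frequency should equal the total number of values, n.
n = sum(frequencies)
assert cumulative[-1] == n

print(cumulative)  # [3, 10, 22, 37, 45, 50]
```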
Plotting cumulative frequency curves
The cumulative frequency curve provides a visual representation allowing you to read statistical measures directly. CIE IGCSE Mathematics examiners expect precise plotting technique.
Essential plotting rules:
- Plot cumulative frequency against the upper class boundary of each interval (not the midpoint)
- Use the x-axis for the variable being measured (e.g., height, time, mass)
- Use the y-axis for cumulative frequency
- Mark each point with a small, neat cross (×)
- Draw a smooth S-shaped curve through all points — never use straight lines between points
- Plot the starting point (lower boundary of the first class, 0), because no data values lie below that boundary
- Label both axes clearly with titles and appropriate scales
The curve should be smooth because the variable being measured is continuous, so the cumulative total rises gradually rather than in jumps. Sharp corners or ruled lines between points lose marks in examinations.
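As a rough sketch of the plotting rules, the snippet below builds the list of points to plot from the test-time table used in Example 1 of this guide. The function name `plot_points` and the tuple layout are assumptions made for illustration; drawing the smooth curve through the points is still done by hand.

```python
from itertools import accumulate

# Class intervals (lower, upper) and frequencies from Example 1 in this guide.
intervals = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
frequencies = [4, 9, 18, 16, 10, 3]

def plot_points(intervals, frequencies):
    """Return (x, y) points: upper class boundary against cumulative frequency."""
    # Starting point: no values lie below the lower boundary of the first class.
    start = (intervals[0][0], 0)
    uppers = [upper for _, upper in intervals]
    return [start] + list(zip(uppers, accumulate(frequencies)))

points = plot_points(intervals, frequencies)
print(points)
# [(10, 0), (20, 4), (30, 13), (40, 31), (50, 47), (60, 57), (70, 60)]
```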
Finding the median from cumulative frequency
The median divides the data set into two equal halves. On a cumulative frequency curve, it represents the 50th percentile position.
Method for finding the median:
- Calculate n/2, where n is the total frequency (the highest cumulative frequency value)
- Draw a horizontal line from n/2 on the y-axis until it meets the curve
- Draw a vertical line down from this intersection point to the x-axis
- Read the median value from the x-axis
For example, if the total frequency is 80, locate 40 on the y-axis (since 80 ÷ 2 = 40). The x-coordinate where the horizontal line meets the curve gives the median. Always show construction lines clearly in examinations — they demonstrate your method and can earn method marks even if your final answer contains minor reading errors.
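The "across then down" reading can be approximated numerically. The sketch below assumes straight lines between plotted points, so it only estimates what you would read from a hand-drawn smooth curve, and it is not a substitute for showing construction lines. The function name `read_across` and the plot points are made up for illustration; they are chosen so that n = 80.

```python
def read_across(points, target):
    """Estimate the x-value where the cumulative frequency reaches target.

    points: (upper boundary, cumulative frequency) pairs in increasing order.
    Uses linear interpolation between neighbouring points (an approximation).
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1:
            if y1 == y0:  # flat segment: no values in this class
                return x0
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target lies outside the plotted cumulative frequencies")

# Hypothetical plot points with a total frequency of 80.
points = [(0, 0), (10, 8), (20, 30), (30, 62), (40, 80)]
n = points[-1][1]

median = read_across(points, n / 2)  # read across from 40 on the y-axis
print(median)  # 23.125
```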
Calculating quartiles
Quartiles divide the data into four equal parts. The lower quartile (Q₁) marks the 25th percentile, while the upper quartile (Q₃) marks the 75th percentile.
Method for finding Q₁:
- Calculate n/4
- Draw a horizontal line from n/4 on the y-axis to the curve
- Draw a vertical line down to read Q₁ from the x-axis
Method for finding Q₃:
- Calculate 3n/4
- Draw a horizontal line from 3n/4 on the y-axis to the curve
- Draw a vertical line down to read Q₃ from the x-axis
Using the previous example where n = 80:
- Q₁ position = 80 ÷ 4 = 20
- Q₃ position = 3 × 80 ÷ 4 = 60
These quartile values allow you to understand the spread and distribution of data. In symmetrical distributions, the median lies exactly halfway between Q₁ and Q₃; in skewed distributions, the median is closer to one quartile than the other.
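The three positions on the cumulative frequency axis follow directly from n. A minimal sketch, where the helper name `quartile_positions` is an assumption for illustration:

```python
def quartile_positions(n):
    """Return the cumulative frequency positions for Q1, the median and Q3."""
    return {"Q1": n / 4, "median": n / 2, "Q3": 3 * n / 4}

positions = quartile_positions(80)
print(positions)  # {'Q1': 20.0, 'median': 40.0, 'Q3': 60.0}
```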
Computing the interquartile range
The interquartile range (IQR) measures statistical spread while excluding extreme values (outliers). It represents the range containing the middle 50% of data.
Formula: IQR = Q₃ - Q₁
The IQR is more reliable than the range (highest - lowest) because it is not affected by extreme outliers. In CIE IGCSE Mathematics examinations, questions often ask you to:
- State which measure of spread is more appropriate for skewed data (answer: IQR)
- Compare IQRs from two different data sets
- Identify outliers with the IQR: a value is an outlier if it lies below Q₁ - 1.5 × IQR or above Q₃ + 1.5 × IQR
A smaller IQR indicates data clustered closely around the median; a larger IQR shows greater variability in the middle 50% of values.
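The IQR formula and the 1.5 × IQR outlier criterion can be sketched as follows, using the quartile readings Q₁ = 31 and Q₃ = 49 from Example 2 in this guide. The function name `iqr_and_fences` is a hypothetical helper, not a standard library call.

```python
def iqr_and_fences(q1, q3):
    """Return the IQR and the lower and upper outlier boundaries."""
    iqr = q3 - q1                  # upper quartile minus lower quartile
    lower_fence = q1 - 1.5 * iqr   # values below this count as outliers
    upper_fence = q3 + 1.5 * iqr   # values above this count as outliers
    return iqr, lower_fence, upper_fence

iqr, low, high = iqr_and_fences(31, 49)
print(iqr, low, high)  # 18 4.0 76.0
```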
Reading and interpreting cumulative frequency curves
Beyond calculating specific statistics, you must interpret what cumulative frequency curves reveal about data distribution:
Steep sections indicate many data values concentrated in that interval (high frequency density).
Gentle slopes show fewer data values spread across that range (low frequency density).
Inflection points (where the curve stops getting steeper and begins to flatten) mark the region of highest frequency density; in symmetrical distributions this occurs near the median.
When comparing two cumulative frequency curves on the same axes, the curve further to the right represents data with generally higher values. If two curves have similar medians but different IQRs, the data set with the larger IQR shows greater variability.
Worked examples
Example 1: Constructing a cumulative frequency table and finding the median
The table shows the time, in minutes, taken by 60 students to complete a test.
| Time (t minutes) | Frequency |
|---|---|
| 10 < t ≤ 20 | 4 |
| 20 < t ≤ 30 | 9 |
| 30 < t ≤ 40 | 18 |
| 40 < t ≤ 50 | 16 |
| 50 < t ≤ 60 | 10 |
| 60 < t ≤ 70 | 3 |
(a) Complete a cumulative frequency table. [2 marks]
(b) Draw a cumulative frequency curve. [2 marks]
(c) Use your curve to find the median. [2 marks]
Solution:
(a) Cumulative frequency table:
| Time (t minutes) | Frequency | Cumulative Frequency |
|---|---|---|
| 10 < t ≤ 20 | 4 | 4 |
| 20 < t ≤ 30 | 9 | 13 |
| 30 < t ≤ 40 | 18 | 31 |
| 40 < t ≤ 50 | 16 | 47 |
| 50 < t ≤ 60 | 10 | 57 |
| 60 < t ≤ 70 | 3 | 60 |
✓ All cumulative frequencies correct [1 mark] ✓ Final value = 60 [1 mark]
(b) Plot points at (10, 0), (20, 4), (30, 13), (40, 31), (50, 47), (60, 57), (70, 60). Draw a smooth curve through all points. [2 marks for correctly plotted and smooth curve]
(c) n/2 = 60/2 = 30
Draw a horizontal line from 30 on the cumulative frequency axis to meet the curve. Draw a vertical line down to the time axis.
Reading from the curve: Median = 39 minutes (accept 38-40) [2 marks]
Example 2: Finding quartiles and interquartile range
Using the cumulative frequency curve from Example 1:
(a) Find the lower quartile. [1 mark]
(b) Find the upper quartile. [1 mark]
(c) Calculate the interquartile range. [2 marks]
Solution:
(a) Q₁ position = n/4 = 60/4 = 15
Draw a horizontal line from 15 to meet the curve, then vertically down.
Q₁ = 31 minutes (accept 30-32) [1 mark]
(b) Q₃ position = 3n/4 = 3 × 60/4 = 45
Draw a horizontal line from 45 to meet the curve, then vertically down.
Q₃ = 49 minutes (accept 48-50) [1 mark]
(c) IQR = Q₃ - Q₁ = 49 - 31 = 18 minutes [1 mark for method, 1 mark for answer]
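As a sanity check on the readings in Examples 1 and 2, the sketch below estimates the median and quartiles by linear interpolation between the plotted points. This assumes straight lines between points, so the results (roughly 31.1, 39.4 and 48.75) differ slightly from readings off a smooth curve, but they should fall inside the accepted ranges. The function name `estimate` is an assumption for illustration.

```python
# Plot points for the test-time data: (upper boundary, cumulative frequency).
points = [(10, 0), (20, 4), (30, 13), (40, 31), (50, 47), (60, 57), (70, 60)]

def estimate(points, target):
    """Linearly interpolate the x-value at a given cumulative frequency."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1:
            if y1 == y0:  # flat segment: no values in this class
                return x0
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target lies outside the plotted cumulative frequencies")

n = points[-1][1]                 # total frequency, 60
q1 = estimate(points, n / 4)      # position 15
median = estimate(points, n / 2)  # position 30
q3 = estimate(points, 3 * n / 4)  # position 45
iqr = q3 - q1

# Each estimate should sit inside the range the mark scheme accepts.
assert 30 <= q1 <= 32
assert 38 <= median <= 40
assert 48 <= q3 <= 50

print(q1, median, q3, iqr)
```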
Example 3: Comparing data sets
Two groups of students sit the same mathematics test. Their results are shown using cumulative frequency curves. Group A has a median of 58 marks and IQR of 22 marks. Group B has a median of 62 marks and IQR of 14 marks.
(a) Which group performed better overall? Give a reason. [2 marks]
(b) Which group's results were more consistent? Give a reason. [2 marks]
Solution:
(a) Group B performed better overall because the median is higher (62 compared to 58). This means the middle student in Group B scored more marks than the middle student in Group A. [1 mark for answer, 1 mark for reason]
(b) Group B's results were more consistent because the IQR is smaller (14 compared to 22). A smaller interquartile range indicates less spread in the middle 50% of data values, meaning scores were clustered more closely together. [1 mark for answer, 1 mark for reason]
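The comparison logic in this example can be sketched in a few lines: the higher median indicates the better overall performance, and the smaller IQR indicates the more consistent results. The `Summary` class and `compare` function are hypothetical names for this sketch, and ties are not handled specially.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    name: str
    median: float
    iqr: float

def compare(a, b):
    """Return (name with higher median, name with smaller IQR)."""
    better = a if a.median > b.median else b
    more_consistent = a if a.iqr < b.iqr else b
    return better.name, more_consistent.name

group_a = Summary("Group A", median=58, iqr=22)
group_b = Summary("Group B", median=62, iqr=14)

result = compare(group_a, group_b)
print(result)  # ('Group B', 'Group B')
```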
Common mistakes and how to avoid them
Mistake: Plotting cumulative frequency against the midpoint of class intervals instead of the upper class boundary. Correction: Always use the upper boundary value. For the interval 30 < t ≤ 40, plot at 40 (not 35). The cumulative frequency represents all values up to and including this boundary.
Mistake: Joining points with straight ruled lines instead of drawing a smooth curve. Correction: Cumulative frequency curves must be smooth and S-shaped. Use a single flowing line through all points without sharp corners. Examiners specifically deduct marks for ruled straight-line segments.
Mistake: Forgetting to plot the starting point (lowest boundary, 0) when appropriate. Correction: If your first interval is 10 < t ≤ 20, plot the point (10, 0) before plotting (20, 4). This ensures your curve starts correctly from zero cumulative frequency.
Mistake: Calculating n/4 for the upper quartile instead of 3n/4. Correction: Q₁ is at position n/4, Q₃ is at position 3n/4. Write these formulas clearly in your working to avoid confusion. Many students incorrectly use n/4 for both quartiles.
Mistake: Reading values directly from the table instead of using the curve for median and quartiles. Correction: Even when a cumulative frequency table is provided, you must read statistical measures from the curve you have drawn. Direct table readings do not account for the continuous nature of grouped data and will lose marks.
Mistake: Subtracting quartiles in the wrong order when calculating IQR. Correction: IQR = Q₃ - Q₁ (upper minus lower), not Q₁ - Q₃. The interquartile range must always be positive. If your answer is negative, you have subtracted in the wrong order.
Exam technique for cumulative frequency questions
Command word recognition: "Complete the table" requires accurate cumulative totals with the final value equalling n. "Draw a cumulative frequency curve" requires accurately plotted points joined by a smooth curve. "Use your curve to find" means you must show horizontal and vertical construction lines on your diagram. Reading values off by eye without these lines loses method marks.
Show all construction lines: Draw horizontal lines from calculated positions on the y-axis to the curve, then vertical lines down to the x-axis. Use a ruler and make lines visible but not so heavy they obscure the curve. These construction lines earn method marks even if your final reading is slightly inaccurate.
State units in final answers: If the question asks for median height in centimetres, write "165 cm", not just "165". One-mark questions often test whether you include appropriate units.
Comparison questions require both values and explanations: When asked which data set has greater spread or higher average, state which set (with supporting numerical values) and explain why using statistical terminology. A two-mark question needs both components: "Group A because IQR = 18 which is larger than Group B's IQR = 12, showing greater variability."
Quick revision summary
Cumulative frequency tables show running totals of frequencies. Plot cumulative frequency against upper class boundaries, drawing a smooth curve through all points. Find the median at n/2, lower quartile Q₁ at n/4, and upper quartile Q₃ at 3n/4 by reading horizontally from the y-axis to the curve, then vertically down to the x-axis. Calculate interquartile range as IQR = Q₃ - Q₁ to measure spread. Always show construction lines clearly and include units in final answers.