What you'll learn
Statistics forms a substantial component of CIE IGCSE Mathematics, typically accounting for 15-20% of exam marks across both paper types. This topic encompasses collecting, organising, representing and interpreting data through measures of central tendency, measures of spread, and cumulative frequency techniques. Examiners consistently test your ability to calculate from raw data, extract information from statistical diagrams, and make reasoned comparisons between datasets.
Key terms and definitions
Mean — the sum of all data values divided by the number of values; also called the arithmetic average.
Median — the middle value when data is arranged in ascending order; for an even number of values, it is the mean of the two middle values.
Mode — the value that occurs most frequently in a dataset; a dataset can be bimodal or have no mode.
Range — the difference between the largest and smallest values in a dataset; a simple measure of spread.
Interquartile range (IQR) — the difference between the upper quartile (Q₃) and lower quartile (Q₁); represents the spread of the middle 50% of the data.
Cumulative frequency — a running total of frequencies, used to find medians and quartiles from grouped data.
Outlier — a data value that lies outside the normal pattern; formally defined as a value more than 1.5 × IQR above Q₃ or below Q₁.
Standard deviation — a measure of spread that quantifies how dispersed data values are from the mean (Extended syllabus only).
Core concepts
Measures of central tendency
The three averages tested in CIE IGCSE Mathematics each serve different purposes and appear in distinct question contexts.
Calculating the mean:
- For discrete data: add all values and divide by the number of values
- For frequency tables: multiply each value by its frequency, sum these products, then divide by the total frequency
- Formula: mean = Σfx ÷ Σf where f represents frequency and x represents data values
- The mean uses all data values but is affected by extreme values (outliers)
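The frequency-table method above can be sketched in Python. This is only an illustration: the function name `mean_from_frequency_table` and the sibling data are made up for the example.

```python
def mean_from_frequency_table(values, frequencies):
    """Return the mean Σfx ÷ Σf for a discrete frequency table."""
    if len(values) != len(frequencies):
        raise ValueError("values and frequencies must have the same length")
    total_f = sum(frequencies)                                  # Σf
    if total_f <= 0:
        raise ValueError("total frequency must be positive")
    total_fx = sum(x * f for x, f in zip(values, frequencies))  # Σfx
    return total_fx / total_f


# Hypothetical data: number of siblings reported by 10 students.
siblings = [0, 1, 2, 3]
counts = [2, 4, 3, 1]
print(mean_from_frequency_table(siblings, counts))  # Σfx = 13, Σf = 10, so 1.3
```

Keeping Σfx and Σf as separate named totals mirrors the working examiners expect to see on paper.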
Finding the median:
- Arrange data in ascending order
- For n values, the median position is (n+1)÷2
- For odd n: take the middle value directly
- For even n: calculate the mean of the two central values
- The median is the 50th percentile and is barely affected by outliers, because it depends on position in the ordered data rather than on the size of extreme values
- From cumulative frequency graphs, read across from n÷2 on the vertical axis
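The steps for raw data can be sketched as a short Python function. The name `median` and the sample lists are illustrative assumptions, not part of the syllabus.

```python
def median(data):
    """Return the median of a list of numbers."""
    ordered = sorted(data)  # step 1: arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: position (n+1)÷2 is a whole number (0-based index `middle`).
        return ordered[middle]
    # Even n: (n+1)÷2 ends in .5, so take the mean of the two central values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 3, 9, 5, 11]))   # ordered: 3 5 7 9 11, median 7
print(median([7, 3, 9, 5]))       # ordered: 3 5 7 9, median (5 + 7) ÷ 2 = 6.0
print(median([7, 3, 9, 5, 110]))  # swapping 11 for 110 leaves the median at 7
```

The last call illustrates why the median is preferred when a dataset contains an extreme value.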
Identifying the mode:
- Count frequency of each value and identify the highest
- Modal class: the class interval with the highest frequency in grouped data
- The mode is the only average that can be used for non-numerical data
- Bimodal datasets have two modes with equal highest frequency
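A minimal Python sketch of the counting approach follows. The function name `modes` is hypothetical, and the rule for "no mode" (every distinct value equally frequent) is an assumed convention; textbooks differ on this point.

```python
from collections import Counter


def modes(data):
    """Return a sorted list of modes; an empty list means 'no mode'."""
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if every distinct value is equally frequent
    # (and there is more than one distinct value), report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([2, 3, 3, 5, 7]))                          # [3]
print(modes(["red", "blue", "red", "green", "blue"]))  # ['blue', 'red'] (bimodal)
print(modes([1, 2, 3, 4]))                             # [] (no mode)
```

The second call shows the mode working on non-numerical data, where a mean or median would be meaningless.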
Measures of spread
Examiners expect you to calculate and interpret spread alongside averages, particularly when comparing datasets.
Range calculation:
- Range = highest value - lowest value
- Simple to calculate but heavily influenced by outliers
- Provides a quick indication of variability but lacks sophistication
- Always state the range as a single value, not as "from x to y"
Quartiles and interquartile range:
- Lower quartile (Q₁): the 25th percentile, found at position (n+1)÷4 in the ordered data
- Upper quartile (Q₃): the 75th percentile, found at position 3(n+1)÷4 in the ordered data
- Interquartile range: IQR = Q₃ - Q₁
- IQR represents the spread of the middle 50% of data, unaffected by extreme values
- Box-and-whisker plots display the minimum, Q₁, median, Q₃ and maximum, so both the range and the IQR can be read off
- From cumulative frequency curves: read Q₁ at n÷4 and Q₃ at 3n÷4 on the vertical axis
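The position rules for raw data can be sketched in Python as follows. The helper names are hypothetical, and resolving a fractional position by linear interpolation between neighbouring values is an assumption; other conventions exist and may give slightly different quartiles.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in ordered data.

    Assumption: a fractional position is resolved by linear interpolation
    between the two neighbouring values.
    """
    n = len(ordered)
    position = max(1, min(position, n))  # keep the position inside the data
    lower = int(position)                # whole-number part (1-based)
    fraction = position - lower
    upper = min(lower + 1, n)
    return ordered[lower - 1] + fraction * (ordered[upper - 1] - ordered[lower - 1])


def quartiles(data):
    """Return (Q1, Q3, IQR) using positions (n+1)÷4 and 3(n+1)÷4."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("quartiles of an empty dataset are undefined")
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, q3 - q1


# Hypothetical data with n = 7, so the quartile positions are 2 and 6.
print(quartiles([13, 2, 8, 4, 10, 5, 7]))  # (4.0, 10.0, 6.0)
```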
Interpreting spread:
- Smaller spread indicates more consistent or clustered data
- Larger spread indicates more variable or dispersed data
- When comparing datasets, comment on both average AND spread
- Example comparison: "Dataset A has a higher mean (showing higher typical values) but also a larger IQR (showing less consistency)"
Frequency tables and grouped data
CIE IGCSE Mathematics papers frequently present data in tabular form requiring calculations from frequency distributions.
Working with frequency tables:
- Always create additional columns for fx (value × frequency) when calculating means
- Sum the frequency column to find n (total number of data items)
- The modal value is the x-value with the highest frequency
- For median from frequency tables, use cumulative frequency to locate the middle position
Grouped data calculations:
- Use midpoint of each class interval for calculations: midpoint = (lower bound + upper bound) ÷ 2
- Create columns for midpoint, frequency, and midpoint × frequency
- Estimated mean = Σ(midpoint × frequency) ÷ Σfrequency
- Modal class is the interval with highest frequency (cannot find exact mode)
- Calculations from grouped data produce estimates, not exact values
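The midpoint method can be sketched in Python like this. The function name `estimated_mean`, the `(lower, upper, frequency)` layout and the height data are all assumptions made for illustration.

```python
def estimated_mean(classes):
    """Estimate the mean of grouped data.

    `classes` is a list of (lower bound, upper bound, frequency) tuples.
    """
    total_f = sum(f for _, _, f in classes)  # Σ frequency
    if total_f <= 0:
        raise ValueError("total frequency must be positive")
    # Σ(midpoint × frequency), with midpoint = (lower + upper) ÷ 2
    total_mf = sum((lower + upper) / 2 * f for lower, upper, f in classes)
    return total_mf / total_f


# Hypothetical heights in cm: 140-150 (4), 150-160 (10), 160-170 (6).
heights = [(140, 150, 4), (150, 160, 10), (160, 170, 6)]
print(estimated_mean(heights))  # (580 + 1550 + 990) ÷ 20 = 156.0
```

The result is an estimate because every value in a class is treated as if it sat at the midpoint.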
Class intervals and boundaries:
- Discrete data: 1-10, 11-20 (gaps between classes)
- Continuous data: 0 < x ≤ 10, 10 < x ≤ 20 (no gaps; a value exactly on a boundary belongs to the class it closes, so x = 10 falls in 0 < x ≤ 10)
- Class width = upper boundary - lower boundary
- Frequency density = frequency ÷ class width (used for histograms with unequal class widths)
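As a small illustration of the frequency density rule, here is a Python sketch. The function name and the class data are hypothetical.

```python
def frequency_densities(classes):
    """Return frequency ÷ class width for each (lower, upper, frequency) class."""
    densities = []
    for lower, upper, frequency in classes:
        width = upper - lower  # class width = upper boundary - lower boundary
        if width <= 0:
            raise ValueError("class width must be positive")
        densities.append(frequency / width)
    return densities


# Hypothetical classes with unequal widths of 10, 20 and 30.
classes = [(0, 10, 5), (10, 30, 12), (30, 60, 9)]
print(frequency_densities(classes))  # [0.5, 0.6, 0.3]
```

Although the middle class has the highest density, the tallest bar does not have to be the widest one: the frequency is represented by the bar's area.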
Cumulative frequency
Cumulative frequency techniques allow calculation of medians and quartiles from large grouped datasets.
Constructing cumulative frequency tables:
- Add a cumulative frequency column to the grouped frequency table
- Each cumulative frequency is the sum of all frequencies up to and including that class
- The final cumulative frequency equals the total number of data items (n)
- Use upper class boundaries as the x-coordinates for plotting
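Building the running total and the points to plot can be sketched in Python. The function name `cumulative_points` and the class data are assumptions for illustration; starting the curve at (lower boundary of first class, 0) is a common choice rather than a fixed rule.

```python
from itertools import accumulate


def cumulative_points(classes):
    """Return the (x, cumulative frequency) points for a cumulative frequency curve.

    `classes` is a list of (lower bound, upper bound, frequency) tuples
    in ascending order.
    """
    upper_bounds = [upper for _, upper, _ in classes]
    running_totals = list(accumulate(f for _, _, f in classes))
    start = (classes[0][0], 0)  # nothing lies below the first lower boundary
    return [start] + list(zip(upper_bounds, running_totals))


# Hypothetical grouped data with n = 20.
classes = [(0, 10, 3), (10, 20, 7), (20, 30, 6), (30, 40, 4)]
print(cumulative_points(classes))
# [(0, 0), (10, 3), (20, 10), (30, 16), (40, 20)]
```

The last cumulative frequency, 20, matches the total number of data items, which is a quick check that no class was missed.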
Drawing cumulative frequency curves:
- Plot points at (upper boundary, cumulative frequency) for each class
- The first point is often at (lower boundary of first class, 0)
- Join points with a smooth curve (not straight lines)
- Label axes clearly: x-axis shows the variable, y-axis shows cumulative frequency
- The curve should increase from left to right and never decrease
Reading values from cumulative frequency graphs:
- Median: read across from n÷2 on the y-axis, down to the x-axis
- Lower quartile (Q₁): read across from n÷4
- Upper quartile (Q₃): read across from 3n÷4
- To find how many values exceed a certain amount: read up from that amount on the x-axis to the curve, across to the y-axis to get the cumulative frequency, then subtract it from n
- Interquartile range: calculate Q₃ - Q₁ using read values
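When only a cumulative frequency table is available, the same readings can be estimated by linear interpolation. The symbols below are assumptions introduced for this sketch: l and u are the boundaries of the class containing the target, F_l and F_u are the cumulative frequencies at those boundaries, and c is the target cumulative frequency (n÷2, n÷4 or 3n÷4). Interpolation assumes values are spread evenly within the class, so it matches a straight-line graph rather than a smooth curve.

```latex
% Data value x at a target cumulative frequency c
x \approx l + \frac{c - F_l}{F_u - F_l}\,(u - l)

% Cumulative frequency F(x) at a given data value x,
% and the number of values exceeding x
F(x) \approx F_l + \frac{x - l}{u - l}\,(F_u - F_l),
\qquad \text{number above } x \approx n - F(x)
```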
Box-and-whisker plots
These diagrams provide visual comparison of datasets using five-number summaries.
Components:
- Minimum value (left whisker end)
- Lower quartile Q₁ (left box edge)
- Median (line inside box)
- Upper quartile Q₃ (right box edge)
- Maximum value (right whisker end)
Construction requirements:
- Draw to scale on a number line
- Box spans from Q₁ to Q₃
- Whiskers extend to minimum and maximum unless outliers present
- Mark outliers separately with crosses beyond 1.5 × IQR from the box
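The 1.5 × IQR rule for marking outliers can be sketched in Python. The function names and the data are hypothetical, and Q₁ and Q₃ are passed in as already-known values so that no particular quartile convention is implied.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) at 1.5 × IQR beyond the box."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Return the values lying outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Hypothetical data with Q1 = 4 and Q3 = 10, so IQR = 6.
data = [2, 4, 5, 7, 8, 10, 25]
print(outlier_fences(4, 10))       # (-5.0, 19.0)
print(find_outliers(data, 4, 10))  # [25]
```

In this case the right whisker would stop at 10, the largest value inside the fences, and 25 would be marked with a cross.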
Interpretation:
- Box length represents IQR (spread of middle 50%)
- Whisker lengths show spread of outer quarters
- Position shows central tendency (location of median)
- Skewness visible if median is not central in the box
Standard deviation (Extended)
Standard deviation measures how far data values typically lie from the mean: it is the square root of the mean squared deviation, not a simple average of the distances.
Formula:
- σ = √[Σ(x - x̄)² ÷ n] for a population
- s = √[Σ(x - x̄)² ÷ (n-1)] for a sample
- CIE papers typically use the population formula
- Alternative formula: σ = √[(Σx² ÷ n) - x̄²]
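The formulas above can be compared in a short Python sketch. The function names and the dataset are illustrative assumptions; `statistics.pstdev` and `statistics.stdev` from the standard library are used only as a cross-check.

```python
import math
import statistics


def population_sd(data):
    """σ = √[Σ(x - x̄)² ÷ n]"""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / n)


def population_sd_alternative(data):
    """σ = √[(Σx² ÷ n) - x̄²]"""
    n = len(data)
    mean = sum(data) / n
    variance = sum(x * x for x in data) / n - mean ** 2
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding errors


def sample_sd(data):
    """s = √[Σ(x - x̄)² ÷ (n-1)]"""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, Σ(x - x̄)² = 32
print(population_sd(data))              # √(32 ÷ 8) = 2.0
print(population_sd_alternative(data))  # √(232 ÷ 8 - 25) = 2.0
print(sample_sd(data))                  # √(32 ÷ 7), slightly larger than 2
```

The two population formulas agree; the alternative form is usually quicker by hand because it avoids subtracting the mean from every value.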
Interpretation:
- Larger standard deviation indicates greater spread from the mean
- Approximately 68% of values lie within one standard deviation of the mean in normal distributions
- More reliable than range as it uses all data values
- Always given in the same units as the original data
Worked examples
Example 1: Calculating mean and range from a frequency table
The table shows the number of goals scored by a hockey team in 20 matches.
| Goals (x) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Frequency (f) | 3 | 5 | 7 | 4 | 1 |
(a) Calculate the mean number of goals. (b) Find the range. (c) State the mode.
Solution:
(a) Create fx column:
- 0×3 = 0
- 1×5 = 5
- 2×7 = 14
- 3×4 = 12
- 4×1 = 4
- Σfx = 35
- Σf = 20
- Mean = 35 ÷ 20 = 1.75 goals ✓
(b) Range = maximum - minimum = 4 - 0 = 4 goals ✓
(c) Highest frequency is 7, which corresponds to 2 goals
- Mode = 2 goals ✓
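As an optional cross-check of this example, the frequency table can be expanded back into the 20 individual results and passed to Python's standard `statistics` module. The variable names are illustrative; this is a verification sketch, not a method expected in the exam.

```python
import statistics

goals = [0, 1, 2, 3, 4]
frequencies = [3, 5, 7, 4, 1]

# Expand the table into one entry per match: three 0s, five 1s, and so on.
results = [x for x, f in zip(goals, frequencies) for _ in range(f)]

print(len(results))                 # 20 matches
print(statistics.mean(results))     # 1.75
print(max(results) - min(results))  # range: 4
print(statistics.mode(results))     # 2
```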
Example 2: Median and IQR from grouped data
The cumulative frequency table shows the masses of 80 apples.
| Mass (m grams) | m ≤ 100 | m ≤ 120 | m ≤ 140 | m ≤ 160 | m ≤ 180 |
|---|---|---|---|---|---|
| Cumulative frequency | 8 | 24 | 52 | 72 | 80 |
(a) Find the median mass. (b) Calculate the interquartile range. (c) Estimate how many apples have mass greater than 155g.
Solution:
(a) n = 80, so median is at position 80 ÷ 2 = 40
- From the table, 24 apples weigh ≤120g and 52 apples weigh ≤140g
- The 40th apple falls in the 120 < m ≤ 140 class
- Using linear interpolation: median ≈ 120 + [(40-24)/(52-24)] × 20 = 120 + 11.4 = 131.4g ✓
(b) Q₁ position = 80 ÷ 4 = 20
- The 20th value falls in the 100 < m ≤ 120 class
- Q₁ ≈ 100 + [(20-8)/(24-8)] × 20 = 100 + 15 = 115g
- Q₃ position = 3 × 80 ÷ 4 = 60
- The 60th value falls in the 140 < m ≤ 160 class
- Q₃ ≈ 140 + [(60-52)/(72-52)] × 20 = 140 + 8 = 148g
- IQR = 148 - 115 = 33g ✓
(c) At m = 155g, the cumulative frequency ≈ 52 + [(155-140)/(160-140)] × (72-52) = 52 + 15 = 67
- Number with mass > 155g = 80 - 67 = 13 apples ✓
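The interpolation steps in this solution can be checked with a Python sketch. The function names `value_at_cf` and `cf_at_value` are hypothetical, and both assume straight lines between the tabulated points, which is the same assumption the hand calculation makes.

```python
def value_at_cf(points, target):
    """Estimate the data value at a given cumulative frequency.

    `points` is a list of (upper boundary, cumulative frequency) pairs
    in ascending order.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the tabulated range")


def cf_at_value(points, x):
    """Estimate the cumulative frequency at a given data value."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return c0 + (x - x0) / (x1 - x0) * (c1 - c0)
    raise ValueError("value lies outside the tabulated range")


# Apple data from the table in this example.
apples = [(100, 8), (120, 24), (140, 52), (160, 72), (180, 80)]
n = 80

median_mass = value_at_cf(apples, n / 2)
q1 = value_at_cf(apples, n / 4)
q3 = value_at_cf(apples, 3 * n / 4)
above_155 = n - cf_at_value(apples, 155)

print(round(median_mass, 1))  # 131.4
print(q1, q3, q3 - q1)        # 115.0 148.0 33.0
print(above_155)              # 13.0
```

A reading taken from a hand-drawn smooth curve may differ slightly from these values, which is why mark schemes usually accept a small range of answers.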
Example 3: Comparing datasets
Two classes took the same test. The results are summarised:
- Class A: mean = 64 marks, standard deviation = 12 marks
- Class B: mean = 68 marks, standard deviation = 8 marks
Compare the performance of the two classes.
Solution:
Class B has a higher mean (68 compared to 64), indicating that on average, Class B performed better on the test. ✓
Class B has a smaller standard deviation (8 compared to 12), indicating that the marks in Class B were more consistent and clustered closer to the mean, whereas Class A had more variation in performance. ✓
Common mistakes and how to avoid them
Confusing mean, median and mode — Students often mix up definitions or apply the wrong average. Remember: mean uses all values (sum ÷ count), median is the middle value when ordered, mode is the most frequent. Check which the question asks for.
Forgetting to order data for median — Finding the median from unordered data gives wrong answers. Always arrange values in ascending order first, then locate the middle position using (n+1)÷2.
Stating range as two numbers — Writing "range is 5 to 12" loses marks. Range is a single value: maximum - minimum = 7. It measures the spread, not the boundary values.
Using class boundaries instead of midpoints — When calculating mean from grouped data, use the midpoint of each interval, not the lower or upper boundary. Midpoint = (lower + upper) ÷ 2.
Plotting cumulative frequency at midpoints — Cumulative frequency curves require points at upper class boundaries, not midpoints. The cumulative frequency represents all values up to and including that boundary.
Misinterpreting IQR — The IQR is Q₃ - Q₁, not Q₃ ÷ Q₁ or the difference between Q₁ and the median. It represents the spread of the middle 50% of the data, always given as a single value.
Exam technique for Statistics
Command word "calculate" — Show clear working with numbers substituted into formulas. For mean from frequency tables, examiners expect to see an fx column or clear evidence of Σfx and Σf. Marks are awarded for method even if the final answer is incorrect.
Command word "estimate" — Used with grouped data to signal that exact values cannot be found. Use midpoints for calculations and state your answer as an estimate. Interpolation methods for median/quartiles from cumulative frequency earn method marks even if reading accuracy varies.
Comparison questions — Marks are awarded separately for commenting on the average (central tendency) AND the spread. Structure answers as two statements: one about means/medians, one about ranges/IQRs. Use comparative language ("higher", "more consistent", "greater spread").
Drawing statistical diagrams — Use a ruler and draw to scale accurately. Label axes with variable names and units. For cumulative frequency curves, use a smooth curve through plotted points. Box plots require a number line with marked scale. Diagrams typically carry 2-3 marks with accuracy marks for correct scale and plotting.
Quick revision summary
Statistics in CIE IGCSE Mathematics centres on three averages (mean, median, mode) and measures of spread (range, IQR, standard deviation). Calculate mean from frequency tables using Σfx÷Σf; find median by ordering data and locating the middle position. For grouped data, use midpoints for mean calculations and cumulative frequency curves for median and quartiles. Interquartile range (Q₃-Q₁) measures spread of middle 50%. When comparing datasets, always comment on both central tendency and spread. Box plots display five-number summaries visually. Show full working for method marks and use correct terminology in written responses.