What you'll learn
Statistics and Probability forms a critical component of the CXC CSEC Additional Mathematics syllabus, accounting for approximately 15% of examination marks. This section covers descriptive statistics including measures of central tendency and dispersion, probability theory including conditional probability and tree diagrams, and the application of statistical methods to real-world Caribbean contexts. Mastery of these concepts is essential for success in Paper 2 structured questions.
Key terms and definitions
Mean (μ or x̄) — the arithmetic average of a data set, calculated by summing all values and dividing by the number of observations.
Standard deviation (σ) — a measure of spread that quantifies how dispersed data values are from the mean, calculated as the square root of the variance.
Probability — a numerical measure between 0 and 1 representing the likelihood of an event occurring, where 0 indicates impossibility and 1 indicates certainty.
Conditional probability P(A|B) — the probability of event A occurring given that event B has already occurred, calculated as P(A∩B)/P(B).
Mutually exclusive events — events that cannot occur simultaneously; if A and B are mutually exclusive, then P(A∩B) = 0.
Independent events — events where the occurrence of one does not affect the probability of the other; if A and B are independent, then P(A∩B) = P(A) × P(B).
Quartiles — values that divide an ordered data set into four equal parts: Q₁ (lower quartile), Q₂ (median), and Q₃ (upper quartile).
Interquartile range (IQR) — a measure of spread calculated as Q₃ - Q₁, representing the range of the middle 50% of the data.
Core concepts
Measures of central tendency
The mean is the most commonly used measure of central tendency. For ungrouped data with n values:
x̄ = (Σx)/n
For grouped data with class frequencies:
x̄ = (Σfx)/Σf
where x represents the class midpoint and f represents the frequency.
The median is the middle value when data is arranged in order. For n values:
- If n is odd: median is the ((n+1)/2)th value
- If n is even: median is the average of the (n/2)th and (n/2 + 1)th values
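The mean and median rules above can be sketched in Python as a quick self-check. The function names and the sample values are illustrative assumptions, not syllabus material.

```python
def mean(values):
    """Ungrouped mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


def grouped_mean(midpoints, frequencies):
    """Grouped estimate of the mean: sum of f*x divided by the sum of f."""
    total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    return total_fx / sum(frequencies)


def median(values):
    """Middle value of the ordered data, using the odd/even position rules."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # ((n+1)/2)th value; subtract 1 because Python lists are 0-indexed
        return ordered[(n + 1) // 2 - 1]
    # average of the (n/2)th and (n/2 + 1)th values
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


# Hypothetical data, chosen only to exercise the functions
scores = [4, 7, 7, 9, 10, 12]
print(mean(scores))                           # 49/6, roughly 8.17
print(median(scores))                         # 8.0
print(grouped_mean([5, 15, 25], [2, 5, 3]))   # 160/10 = 16.0
```

Sorting inside `median` means the data can be passed in any order, which mirrors the "arrange in order first" step expected in written working.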
For grouped data, use the formula:
Median = L + ((n/2 - F)/f) × c
where L is the lower boundary of the median class, F is the cumulative frequency before the median class, f is the frequency of the median class, and c is the class width.
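A minimal Python sketch of this interpolation formula follows, assuming equal class widths and a list of lower class boundaries supplied by the user. The function name and the example table are hypothetical.

```python
def grouped_median(lower_boundaries, frequencies, class_width):
    """Estimate the median with Median = L + ((n/2 - F)/f) * c."""
    n = sum(frequencies)
    target = n / 2
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative + f >= target:
            # This is the median class, so interpolate inside it
            return lower + (target - cumulative) / f * class_width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


# Hypothetical table: classes starting at 0, 10, 20 with frequencies 4, 10, 6
estimate = grouped_median([0, 10, 20], [4, 10, 6], 10)
print(round(estimate, 2))  # 10 + (10 - 4)/10 * 10 = 16.0
```

Note that `cumulative` is only updated after the check, so it always holds the cumulative frequency before the class being tested, which is the value F in the formula.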
The mode is the most frequently occurring value. In grouped data, the modal class has the highest frequency.
Measures of dispersion
Range is the simplest measure: Range = highest value - lowest value
Variance (σ²) measures average squared deviation from the mean:
σ² = (Σ(x - x̄)²)/n or the computational formula: σ² = (Σx²)/n - x̄²
For grouped data: σ² = (Σfx²)/(Σf) - x̄²
Standard deviation is the square root of variance:
σ = √variance
A larger standard deviation indicates greater spread in the data.
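The two variance formulas give the same result, which the following Python sketch illustrates. The function names and the data set are assumptions made for the example.

```python
import math


def variance_definition(values):
    """Variance as the average squared deviation from the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n


def variance_computational(values):
    """Variance as (sum of x squared)/n minus the square of the mean."""
    n = len(values)
    m = sum(values) / n
    return sum(x * x for x in values) / n - m ** 2


def std_dev(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance_definition(values))


# Hypothetical data with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance_definition(data))      # 32/8 = 4.0
print(variance_computational(data))   # 232/8 - 25 = 4.0
print(std_dev(data))                  # 2.0
```

The computational form is usually quicker by hand because it avoids subtracting the mean from every value.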
The interquartile range (IQR) is resistant to outliers:
IQR = Q₃ - Q₁
To find quartiles in ungrouped data, first arrange the values in ascending order, then locate these positions:
- Q₁ position: (n+1)/4
- Q₃ position: 3(n+1)/4
For grouped data, interpolate as for the median, replacing n/2 with n/4 for Q₁ and with 3n/4 for Q₃.
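The position rules for ungrouped data can be sketched in Python as follows. When a position is not a whole number, this sketch interpolates linearly between the two neighbouring ordered values; that choice is an assumption, and other conventions exist. The helper names and data are hypothetical.

```python
def value_at_position(ordered, position):
    """Value at a 1-indexed position, interpolating if the position is fractional."""
    whole = int(position)          # whole-number part of the position
    fraction = position - whole
    if whole < 1:
        return ordered[0]
    if whole >= len(ordered):
        return ordered[-1]
    lower = ordered[whole - 1]     # value at the whole-number position
    upper = ordered[whole]         # next value up
    return lower + fraction * (upper - lower)


def quartiles(values):
    """Return (Q1, Q2, Q3) using positions (n+1)/4, (n+1)/2 and 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q2 = value_at_position(ordered, (n + 1) / 2)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q2, q3


# Hypothetical data: 11 values, so the positions are 3, 6 and 9
values = [3, 5, 7, 8, 9, 11, 13, 15, 17, 20, 22]
q1, q2, q3 = quartiles(values)
print(q1, q2, q3)   # 3rd, 6th and 9th values: 7, 11, 17
print(q3 - q1)      # interquartile range: 10
```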
Probability fundamentals
Basic probability rules:
Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B)
General addition rule: P(A∪B) = P(A) + P(B) - P(A∩B)
Multiplication rule for independent events: P(A and B) = P(A) × P(B)
Complement rule: P(A') = 1 - P(A)
where A' represents "not A".
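These rules can be checked on a small sample space. The sketch below uses a fair six-sided die, which is an illustrative assumption rather than an example from the syllabus, and exact fractions to avoid rounding.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # outcomes of one roll of a fair die


def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


even = {2, 4, 6}
greater_than_4 = {5, 6}
one = {1}

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
general = prob(even) + prob(greater_than_4) - prob(even & greater_than_4)
print(prob(even | greater_than_4), general)      # 2/3 2/3

# Mutually exclusive events: the intersection is empty, so probabilities add
print(prob(even | one), prob(even) + prob(one))  # 2/3 2/3

# Complement rule: P(not A) = 1 - P(A)
print(prob(sample_space - even), 1 - prob(even))  # 1/2 1/2

# Independence check: P(A and B) equals P(A) * P(B) for these two events
print(prob(even & greater_than_4) == prob(even) * prob(greater_than_4))  # True
```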
For Caribbean contexts, probability questions often involve:
- Tourist arrival statistics for Caribbean islands
- Agricultural crop yields in different weather conditions
- Manufacturing defect rates in regional industries
- Sports team performance in CPL cricket or Caribbean football
Conditional probability
Conditional probability is the probability of an event A given that another event B has occurred. It is defined only when P(B) > 0:
P(A|B) = P(A∩B)/P(B)
Rearranging gives: P(A∩B) = P(A|B) × P(B)
Tree diagrams effectively represent conditional probability problems with sequential events. Each branch represents a possible outcome, with probabilities marked on branches. Multiply along branches for combined probabilities; add across branches for alternative outcomes.
For independent events: P(A|B) = P(A), meaning B's occurrence doesn't affect A's probability.
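A short Python sketch of the conditional probability formula and the independence test follows. The visitor counts are invented for illustration, and the function name is an assumption.

```python
from fractions import Fraction


def conditional(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B), defined only when P(B) > 0."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b


# Hypothetical survey of 200 visitors
# B: arrived by air (120 visitors); A: stayed more than a week (50 visitors)
# A and B: arrived by air and stayed more than a week (30 visitors)
total = 200
p_b = Fraction(120, total)
p_a = Fraction(50, total)
p_a_and_b = Fraction(30, total)

p_a_given_b = conditional(p_a_and_b, p_b)
print(p_a_given_b)                      # 1/4
print(p_a_given_b * p_b == p_a_and_b)   # rearranged form holds: True
print(p_a_given_b == p_a)               # P(A|B) = P(A), so A and B are independent: True
```

Changing the overlap count from 30 to any other value makes the last check fail, which is a quick way to see what dependence looks like numerically.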
Probability distributions
The expected value (E(X) or μ) represents the mean outcome for a probability distribution:
E(X) = Σ[x × P(X = x)]
Here X is a discrete random variable with outcomes x₁, x₂, ..., xₙ and corresponding probabilities p₁, p₂, ..., pₙ, which must sum to 1.
Variance of a probability distribution:
Var(X) = E(X²) - [E(X)]²
where E(X²) = Σ[x² × P(X = x)]
These concepts apply to scenarios such as:
- Number of hurricanes affecting a Caribbean island per season
- Daily sales volume at a Kingston market stall
- Number of successful fishing trips per week
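The expected value and variance formulas can be sketched in Python as below. The distribution for the number of hurricanes per season is entirely hypothetical and is used only to show the arithmetic.

```python
from fractions import Fraction

# Hypothetical distribution: outcome x mapped to P(X = x)
distribution = {
    0: Fraction(2, 5),
    1: Fraction(3, 10),
    2: Fraction(1, 5),
    3: Fraction(1, 10),
}


def expected_value(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())


def expected_square(dist):
    """E(X^2) = sum of x^2 * P(X = x)."""
    return sum(x * x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    return expected_square(dist) - expected_value(dist) ** 2


print(sum(distribution.values()))     # probabilities sum to 1
print(expected_value(distribution))   # 3/10 + 2/5 + 3/10 = 1
print(expected_square(distribution))  # 3/10 + 4/5 + 9/10 = 2
print(variance(distribution))         # 2 - 1 = 1
```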
Permutations and combinations
Permutations (arrangements where order matters):
ⁿPᵣ = n!/(n-r)!
where n! = n × (n-1) × (n-2) × ... × 2 × 1
Combinations (selections where order doesn't matter):
ⁿCᵣ = n!/(r!(n-r)!)
Common applications include:
- Selecting cricket teams from available players
- Arranging students for a school photograph
- Creating committees from staff members
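The two counting formulas can be sketched directly from their factorial definitions. The function names and the example numbers are assumptions; Python 3.8 and later also provide `math.perm` and `math.comb`, which are used here only as a cross-check.

```python
import math


def permutations(n, r):
    """Number of ordered arrangements of r items from n: n!/(n-r)!."""
    return math.factorial(n) // math.factorial(n - r)


def combinations(n, r):
    """Number of unordered selections of r items from n: n!/(r!(n-r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


# Hypothetical questions
print(permutations(5, 3))     # arrange 3 of 5 students in a row: 60
print(combinations(15, 11))   # select 11 players from a squad of 15: 1365

# Cross-check against the standard library
print(permutations(5, 3) == math.perm(5, 3))      # True
print(combinations(15, 11) == math.comb(15, 11))  # True
```

The ratio ⁿPᵣ / ⁿCᵣ is always r!, the number of ways to order each selection, which is a useful check when deciding which formula a question needs.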
Worked examples
Example 1: Calculating mean and standard deviation
The daily rainfall (in mm) recorded at a Barbados weather station over 10 days was: 12, 8, 15, 0, 23, 10, 5, 18, 12, 7
(a) Calculate the mean rainfall. (b) Calculate the standard deviation, correct to 2 decimal places.
Solution:
(a) Mean = Σx/n = (12+8+15+0+23+10+5+18+12+7)/10 = 110/10 = 11 mm
(b) First calculate Σx²: 12² + 8² + 15² + 0² + 23² + 10² + 5² + 18² + 12² + 7² = 144 + 64 + 225 + 0 + 529 + 100 + 25 + 324 + 144 + 49 = 1604
Variance = Σx²/n - (x̄)² = 1604/10 - (11)² = 160.4 - 121 = 39.4
Standard deviation = √39.4 = 6.28 mm (to 2 d.p.)
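The answers to Example 1 can be checked with Python's standard `statistics` module. This sketch uses `pvariance` and `pstdev`, which divide by n as in the formulas in this section; `variance` and `stdev` divide by n - 1 and would give a different result.

```python
import statistics

# Rainfall data from Example 1 (mm)
rainfall_mm = [12, 8, 15, 0, 23, 10, 5, 18, 12, 7]

mean_rainfall = statistics.mean(rainfall_mm)
variance = statistics.pvariance(rainfall_mm)   # population variance (divides by n)
std_dev = statistics.pstdev(rainfall_mm)       # square root of the population variance

print(mean_rainfall)         # 11
print(round(variance, 1))    # 39.4
print(round(std_dev, 2))     # 6.28
```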
Example 2: Conditional probability with tree diagram
At a Port of Spain manufacturing plant, Machine A produces 60% of items and Machine B produces 40%. Machine A produces 5% defective items, while Machine B produces 8% defective items.
(a) Draw a tree diagram to represent this situation. (b) Find the probability that a randomly selected item is defective. (c) Given that an item is defective, find the probability it came from Machine A.
Solution:
(a) Tree diagram structure:
- Machine A (0.6)
  - Defective (0.05)
  - Non-defective (0.95)
- Machine B (0.4)
  - Defective (0.08)
  - Non-defective (0.92)
(b) P(Defective) = P(A and D) + P(B and D) = (0.6 × 0.05) + (0.4 × 0.08) = 0.03 + 0.032 = 0.062 or 6.2%
(c) P(A|D) = P(A∩D)/P(D) = 0.03/0.062 = 0.484 or 48.4% (to 3 s.f.)
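The tree diagram calculation in Example 2 can be reproduced in Python with exact fractions. The variable names are assumptions; the probabilities are those given in the question.

```python
from fractions import Fraction

# First-stage branches: which machine produced the item
p_machine = {"A": Fraction(60, 100), "B": Fraction(40, 100)}
# Second-stage branches: P(defective | machine)
p_defective_given = {"A": Fraction(5, 100), "B": Fraction(8, 100)}

# Multiply along each branch to get P(machine and defective)
p_joint = {m: p_machine[m] * p_defective_given[m] for m in p_machine}

# Add across branches to get P(defective)
p_defective = sum(p_joint.values())

# Conditional probability: P(A | defective) = P(A and defective) / P(defective)
p_a_given_defective = p_joint["A"] / p_defective

print(float(p_defective))                    # 0.062
print(p_a_given_defective)                   # 15/31
print(round(float(p_a_given_defective), 3))  # 0.484
```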
Example 3: Grouped data statistics
The table shows the ages of 50 passengers on a Caribbean Airlines flight:
| Age (years) | 0-9 | 10-19 | 20-29 | 30-39 | 40-49 |
|---|---|---|---|---|---|
| Frequency | 6 | 12 | 18 | 10 | 4 |
(a) Calculate an estimate of the mean age. (b) Identify the modal class. (c) Calculate an estimate of the median age.
Solution:
(a) Create calculation table:
| Age | Midpoint (x) | Frequency (f) | fx | Cumulative f |
|---|---|---|---|---|
| 0-9 | 4.5 | 6 | 27 | 6 |
| 10-19 | 14.5 | 12 | 174 | 18 |
| 20-29 | 24.5 | 18 | 441 | 36 |
| 30-39 | 34.5 | 10 | 345 | 46 |
| 40-49 | 44.5 | 4 | 178 | 50 |
Mean = Σfx/Σf = 1165/50 = 23.3 years
(b) Modal class is 20-29 years (highest frequency = 18)
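(c) n/2 = 50/2 = 25, so the median is the 25th value. This falls in the 20-29 class (cumulative frequency rises from 18 to 36).
Median = L + ((n/2 - F)/f) × c = 20 + ((25 - 18)/18) × 10 = 20 + (7/18) × 10 = 20 + 3.89 = 23.9 years (to 3 s.f.), taking the lower boundary as L = 20 and the class width as c = 10. If class boundaries of 19.5 and 29.5 are used instead, L = 19.5 and the estimate becomes 23.4 years.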
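The table-based working for Example 3 can be reproduced with the Python sketch below. It uses the same midpoints as the calculation table and the lower boundaries 0, 10, 20, 30, 40 with class width 10, as in the worked solution. The variable names are assumptions.

```python
classes = ["0-9", "10-19", "20-29", "30-39", "40-49"]
midpoints = [4.5, 14.5, 24.5, 34.5, 44.5]
lower_boundaries = [0, 10, 20, 30, 40]   # boundaries as used in the worked solution
frequencies = [6, 12, 18, 10, 4]
class_width = 10

n = sum(frequencies)

# (a) Estimate of the mean: sum of f*x divided by sum of f
total_fx = sum(f * x for x, f in zip(midpoints, frequencies))
mean_age = total_fx / n

# (b) Modal class: the class with the highest frequency
modal_class = classes[frequencies.index(max(frequencies))]

# (c) Estimate of the median: build cumulative frequencies, then interpolate
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(running)

median_index = next(i for i, cf in enumerate(cumulative) if cf >= n / 2)
F = cumulative[median_index - 1] if median_index > 0 else 0  # cumulative f before the class
median_age = (lower_boundaries[median_index]
              + (n / 2 - F) / frequencies[median_index] * class_width)

print(total_fx, mean_age)      # 1165.0 23.3
print(modal_class)             # 20-29
print(round(median_age, 1))    # 23.9
```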
Common mistakes and how to avoid them
Confusing mean and median formulas for grouped data — always identify whether you're working with raw data or grouped frequency distributions. For grouped data, you must use class midpoints and the appropriate interpolation formulas.
Forgetting to square root when calculating standard deviation — variance and standard deviation are related but different. Remember: σ = √(variance). Write both steps clearly in examinations.
Misapplying probability rules to dependent events — do not use P(A∩B) = P(A) × P(B) unless events are explicitly independent. Use conditional probability P(A|B) when one event affects another.
Arithmetic errors with tree diagrams — always multiply along branches for "and" situations, and add across different paths for "or" situations. Label all branches clearly and verify probabilities sum to 1 at each branch point.
Incorrect cumulative frequency for median calculations — ensure you correctly identify which class contains the n/2 position and use the cumulative frequency before that class (F) in the formula, not the cumulative frequency of the class itself.
Mixing up permutations and combinations — if order matters (arrangements, races, passwords), use ⁿPᵣ. If order doesn't matter (selections, committees, groups), use ⁿCᵣ. Read questions carefully for keywords like "arrange" versus "select."
Exam technique for Statistics and Probability
Command word precision — "Calculate" requires showing working and exact numerical answers. "Estimate" (for grouped data) acknowledges you're using class midpoints. "Hence" means use your previous answer directly; marks are lost if you don't reference it.
Show all intermediate steps — CXC awards method marks even if final answers are incorrect. For standard deviation, write the variance calculation before taking the square root. For probability, show the multiplication/addition of individual probabilities before the final answer.
Unit consistency — always include appropriate units (mm, kg, years) with measures of central tendency and dispersion. Probability has no units but should be expressed as decimals or simplified fractions unless percentages are specifically requested.
Table organization for grouped data — examiners expect systematic working. Create clear columns for x (midpoint), f (frequency), fx, x², fx², and cumulative frequency. This structured approach minimizes errors and maximizes method marks.
Quick revision summary
Statistics involves calculating measures of central tendency (mean, median, mode) and dispersion (range, standard deviation, IQR) for both ungrouped and grouped data. For grouped data, use class midpoints and interpolation formulas. Probability quantifies likelihood from 0 to 1, with fundamental rules for mutually exclusive and independent events. Conditional probability P(A|B) represents situations where one event affects another; tree diagrams organize sequential probability problems effectively. Permutations count arrangements where order matters; combinations count selections where order doesn't matter. Always show complete working, include units, and apply formulas systematically for maximum marks.