What you'll learn
This guide covers scatter graphs: how to plot paired data, how to recognise the type and strength of correlation, how to draw and use a line of best fit, and what interpolation, extrapolation and outliers mean. These are core data-handling skills.
Key terms and definitions
Scatter graph: a graph plotting paired values of two variables as points.
Correlation: the relationship between the two variables, described by its type (positive, negative or none) and its strength.
Line of best fit: a straight line through the trend of the points.
Interpolation: predicting within the range of the data.
Extrapolation: predicting beyond the range of the data.
Outlier: a point that does not fit the general pattern.
Core concepts
Plotting and reading correlation
A scatter graph plots pairs of values as points. The pattern shows the correlation: positive (both increase together, points rising), negative (one increases as the other decreases, points falling) or none (no clear pattern). Describe both the type and strength (strong or weak).
Strength of correlation
The closer the points lie to a straight line, the stronger the correlation. Widely scattered points show weak correlation. Always describe strength in words alongside the direction.
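One way to put a number on direction and strength is the Pearson correlation coefficient r, which is not required by this guide and is used here only as an illustration. In the sketch below the function names, the sample data and the cut-offs of 0.3 and 0.7 are all assumptions, not exam rules; in an exam you judge direction and strength by eye.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values (between -1 and 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe_correlation(xs, ys, none_below=0.3, strong_from=0.7):
    """Describe type and strength in words. The cut-offs are assumed, not standard."""
    r = pearson_r(xs, ys)
    if abs(r) < none_below:
        return "no correlation"
    strength = "strong" if abs(r) >= strong_from else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

# Hypothetical data: temperature (°C) and ice-cream sales
temps = [14, 16, 18, 20, 22, 24]
sales = [30, 38, 44, 50, 57, 63]
print(describe_correlation(temps, sales))  # prints: strong positive correlation
```

The sign of r gives the direction (rising or falling points) and its size gives the strength: values near 1 or -1 mean the points lie close to a straight line.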
Drawing a line of best fit
A line of best fit is a single straight line that follows the trend of the points, with roughly equal numbers of points on each side. It should pass through the bulk of the data (often through or near the mean point, whose coordinates are the mean of the x-values and the mean of the y-values) and need not pass through the origin. Ignore outliers when drawing it.
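In an exam the line is drawn by eye. A calculated counterpart is the least-squares regression line, a different technique from the one this guide teaches, shown here only to illustrate the properties described above. The sketch assumes hypothetical temperature and sales data, and the function name is made up for the example.

```python
def least_squares_line(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # Choosing the intercept this way makes the line pass through the mean point.
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

# Hypothetical data: temperature (°C) and ice-cream sales
temps = [14, 16, 18, 20, 22, 24]
sales = [30, 38, 44, 50, 57, 63]

gradient, intercept = least_squares_line(temps, sales)
mean_t = sum(temps) / len(temps)
mean_s = sum(sales) / len(sales)

# Count the points above and below the line to check the balance.
above = sum(1 for t, s in zip(temps, sales) if s > gradient * t + intercept)
below = sum(1 for t, s in zip(temps, sales) if s < gradient * t + intercept)

print(f"sales = {gradient:.2f} * temperature + {intercept:.2f}")
print("passes through mean point:", abs(gradient * mean_t + intercept - mean_s) < 1e-9)
print("points above:", above, "points below:", below)
```

For this data the intercept is not zero, so the line does not go through the origin, and the points split three above and three below the line.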
Using the line to predict
To predict a value, start at the known value on one axis, go straight to the line, then read across to the other axis. Interpolation (predicting within the data range) is usually reliable; extrapolation (predicting beyond the range) is unreliable because the trend may not continue.
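The reading-off step and the interpolation check can be sketched in a few lines. Everything specific here is an assumption made for illustration: the line sales = 3 × temperature − 10, the recorded temperature range of 14°C to 24°C, and the function name.

```python
def predict(x, gradient, intercept, x_min, x_max):
    """Read a value off the line and say whether it is interpolation or extrapolation."""
    y = gradient * x + intercept
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Hypothetical line of best fit and data range (assumed for this example)
gradient, intercept = 3.0, -10.0   # sales = 3 * temperature - 10
x_min, x_max = 14, 24              # lowest and highest recorded temperatures

for temperature in (20, 45):
    sales, kind = predict(temperature, gradient, intercept, x_min, x_max)
    print(f"{temperature}°C -> {sales:.0f} sales ({kind})")
# prints:
# 20°C -> 50 sales (interpolation)
# 45°C -> 125 sales (extrapolation)
```

The code will happily return a number for 45°C, which is exactly the danger: the label is a reminder that nothing in the data supports the trend that far out.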
Correlation and causation, and outliers
Correlation does not mean causation: a relationship between two variables does not prove that one causes the other, because both may depend on a third factor. Outliers are points that lie well away from the trend; identify them, exclude them when drawing the line of best fit, and still comment on them in your answer.
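As a rough illustration of spotting an outlier, the sketch below measures how far each point sits vertically from a line of best fit and flags any point further away than a chosen tolerance. The line, the data and the tolerance of 10 sales are all hypothetical; there is no fixed rule for the tolerance, and on paper you would judge this by eye.

```python
def find_outliers(points, gradient, intercept, tolerance):
    """Return the points whose vertical distance from the line exceeds the tolerance."""
    outliers = []
    for x, y in points:
        predicted = gradient * x + intercept
        if abs(y - predicted) > tolerance:
            outliers.append((x, y))
    return outliers

# Hypothetical (temperature, sales) pairs; the line sales = 3 * temperature - 10 is assumed
points = [(14, 32), (16, 38), (18, 45), (20, 50), (22, 20), (24, 62)]
print(find_outliers(points, gradient=3.0, intercept=-10.0, tolerance=10))
# prints: [(22, 20)]
```

The flagged point would be left out when drawing the line but still mentioned in a written answer, for example by suggesting a reason such as a rainy day.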
Worked examples
Example 1: Type of correlation
As temperature rises, ice-cream sales rise. What correlation is this?
Positive correlation.
Example 2: Prediction
A line of best fit gives sales of 50 at temperature 20°C. How is this read?
Go up from 20°C to the line, then across to 50 sales (interpolation).
Example 3: Reliability
Why might predicting sales at 45°C, well above the recorded temperatures, be unreliable?
It is extrapolation: 45°C lies beyond the range of the data, where the trend may not hold.
Common mistakes and how to avoid them
Forcing the line through the origin. It should follow the data, not the origin.
Including outliers in the line. Ignore them when drawing the best fit.
Extrapolating confidently. Predictions beyond the data are unreliable.
Assuming causation. Correlation alone does not prove cause.
Vague descriptions. State both direction and strength.
Exam technique for scatter graphs
Plot points accurately and describe the correlation's type and strength.
Draw a line of best fit with points balanced each side, ignoring outliers.
Read predictions off the line, distinguishing interpolation from extrapolation.
Note that correlation is not causation.
Identify and comment on outliers.
Quick revision summary
A scatter graph plots pairs of values to show correlation: positive (rising together), negative (one up, one down) or none. State both type and strength: the closer the points lie to a straight line, the stronger the correlation. A line of best fit is a straight line following the trend with roughly equal numbers of points on each side, passing through the bulk of the data (not forced through the origin) and ignoring outliers. Use it to predict, distinguishing interpolation (within the data, reliable) from extrapolation (beyond the data, unreliable). Remember that correlation does not prove causation, and that outliers should be excluded from the line but noted. The common errors are forcing the line through the origin, including outliers, extrapolating with confidence, and assuming causation. In the exam, plot accurately, describe direction and strength, draw a balanced line of best fit, predict by interpolation where possible, and stay cautious about causation and extrapolation.