Data representation: data storage units and file sizes — CIE IGCSE Computer Science Revision Notes

What you'll learn

This revision guide covers how computers store data and the units used to measure storage capacity and file sizes. You'll learn to convert between different storage units, calculate file sizes for various media types, and understand why file compression matters. These topics appear regularly in Paper 1 and Paper 2 of the CIE IGCSE Computer Science examination.

Key terms and definitions

Bit — the smallest unit of data in a computer, representing a single binary digit (0 or 1)

Byte — a group of 8 bits, the standard unit for measuring file sizes and storage capacity

Nibble — a group of 4 bits, equivalent to one hexadecimal digit

Kilobyte (KB) — 1,000 bytes in decimal notation; used to measure small file sizes

Megabyte (MB) — 1,000 kilobytes or 1,000,000 bytes; typical size for images and documents

Gigabyte (GB) — 1,000 megabytes or 1,000,000,000 bytes; typical size for videos and storage devices

Terabyte (TB) — 1,000 gigabytes or 1,000,000,000,000 bytes; used for large storage systems and data centres

Binary notation — a system of storage units based on powers of 2, in which each unit is 1,024 (2¹⁰) times the previous one; for example, 1 KiB = 1,024 bytes

Core concepts

Bits and bytes: the foundation of data storage

All data stored in computers exists as binary digits. A bit can hold one of two values: 0 or 1. This binary system forms the foundation of all digital computing.

A byte consists of 8 bits and serves as the standard unit for measuring data:

1 byte can represent 256 different values (2⁸)
1 byte can store one character of text in ASCII encoding
File sizes are normally measured in bytes or multiples of bytes

A nibble contains 4 bits and has specific uses:

Represents one hexadecimal digit (0-9, A-F)
Used in low-level programming and data encoding
2 nibbles = 1 byte
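The relationships between bits, nibbles, and bytes can be checked with a short Python sketch. It is illustrative only: the variable names and the example byte value are assumptions chosen for this demonstration.

```python
# One byte is 8 bits, so it can hold 2**8 distinct values.
BITS_PER_BYTE = 8
values_per_byte = 2 ** BITS_PER_BYTE

# In ASCII, each character is stored in one byte.
ascii_bytes = "IGCSE".encode("ascii")

# A byte splits into two nibbles; each nibble is one hexadecimal digit.
byte_value = 0b10100111          # arbitrary example byte (A7 in hexadecimal)
high_nibble = byte_value >> 4    # top 4 bits
low_nibble = byte_value & 0x0F   # bottom 4 bits

print(values_per_byte)                                     # 256
print(len(ascii_bytes))                                    # 5
print(format(high_nibble, "X"), format(low_nibble, "X"))   # A 7
```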

Storage unit conversions: decimal vs binary

The CIE IGCSE specification requires understanding of two systems for measuring storage:

Decimal notation (base-10):

Uses multiples of 1,000
1 kilobyte (KB) = 1,000 bytes
1 megabyte (MB) = 1,000 KB = 1,000,000 bytes
1 gigabyte (GB) = 1,000 MB = 1,000,000,000 bytes
1 terabyte (TB) = 1,000 GB = 1,000,000,000,000 bytes

This system is commonly used by manufacturers for hard drives and storage devices. A "500 GB" external drive contains approximately 500 billion bytes.

Binary notation (base-2):

Uses powers of 2 (multiples of 1,024)
1 kibibyte (KiB) = 1,024 bytes
1 mebibyte (MiB) = 1,024 KiB = 1,048,576 bytes
1 gibibyte (GiB) = 1,024 MiB = 1,073,741,824 bytes
1 tebibyte (TiB) = 1,024 GiB = 1,099,511,627,776 bytes

Operating systems typically use binary notation when displaying file sizes. This creates a discrepancy: a "500 GB" hard drive holds about 465.7 GiB, which Windows reports as roughly 465 GB because it labels binary units with decimal names.
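The size of that discrepancy can be worked out directly. The sketch below assumes the drive holds exactly 500 billion bytes; the constant names are chosen for illustration.

```python
GB = 1000 ** 3    # decimal gigabyte, in bytes
GIB = 1024 ** 3   # binary gibibyte, in bytes

drive_bytes = 500 * GB          # capacity as advertised by the manufacturer
drive_gib = drive_bytes / GIB   # the same capacity expressed in binary units

print(round(drive_gib, 2))  # 465.66
```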

For IGCSE examinations:

Questions will specify which system to use
Decimal (1,000) is more common in recent papers
Always check the question carefully
Show your working to demonstrate understanding

Common conversion calculations

Converting larger units to smaller units:

Multiply by the conversion factor for each step:

To convert GB to MB: multiply by 1,000
To convert MB to KB: multiply by 1,000
To convert KB to bytes: multiply by 1,000

Example: 2.5 GB to bytes

2.5 × 1,000 = 2,500 MB
2,500 × 1,000 = 2,500,000 KB
2,500,000 × 1,000 = 2,500,000,000 bytes

Converting smaller units to larger units:

Divide by the conversion factor for each step:

To convert bytes to KB: divide by 1,000
To convert KB to MB: divide by 1,000
To convert MB to GB: divide by 1,000

Example: 4,500,000 bytes to MB

4,500,000 ÷ 1,000 = 4,500 KB
4,500 ÷ 1,000 = 4.5 MB
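Both conversion directions can be captured in one small helper. This is a minimal sketch, not an official method: the function name `convert` and the `UNITS` list are assumptions made for the example, and it uses the decimal factor of 1,000 unless told otherwise.

```python
# Decimal units in ascending order; each step up is a factor of 1,000.
UNITS = ["bytes", "KB", "MB", "GB", "TB"]

def convert(value, from_unit, to_unit, factor=1000):
    """Convert a storage value between units (factor=1024 for binary units)."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    if steps >= 0:
        # Larger unit to smaller unit: multiply once per step.
        return value * factor ** steps
    # Smaller unit to larger unit: divide once per step.
    return value / factor ** (-steps)

print(convert(2.5, "GB", "bytes"))        # 2500000000.0
print(convert(4_500_000, "bytes", "MB"))  # 4.5
```

Counting the number of steps between the two units mirrors the "multiply or divide by 1,000 for each step" rule used in the hand calculations.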

Calculating file sizes for different media types

Understanding how to calculate file sizes is essential for Paper 1 theory questions and Paper 2 problem-solving tasks.

Text files:

File size (bytes) = Number of characters × Bytes per character

For standard ASCII text:

1 character = 1 byte
A document with 5,000 characters = 5,000 bytes = 5 KB

For Unicode text (UTF-8):

1 character = 1 to 4 bytes
ASCII characters (basic Latin letters, digits, punctuation) = 1 byte
Accented letters, non-Latin scripts, and emoji require 2 to 4 bytes
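The effect of the encoding on text file size can be seen by counting encoded bytes. In this sketch the helper name `text_size_bytes` is an assumption, and the sample characters are written as escape codes so the result does not depend on how the source file itself is saved.

```python
def text_size_bytes(text, encoding="utf-8"):
    """Return the number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))

# 5,000 ASCII characters take 5,000 bytes.
print(text_size_bytes("A" * 5000, "ascii"))  # 5000

# In UTF-8 the byte count depends on the character.
print(text_size_bytes("A"))            # 1  (ASCII letter)
print(text_size_bytes("\u00e9"))       # 2  (e with acute accent)
print(text_size_bytes("\u20ac"))       # 3  (euro sign)
print(text_size_bytes("\U0001F600"))   # 4  (emoji)
```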

Image files (bitmap/uncompressed):

File size (bytes) = Width (pixels) × Height (pixels) × Colour depth (bits) ÷ 8

The colour depth determines how many bits represent each pixel:

1-bit colour = 2 colours (black and white)
8-bit colour = 256 colours
24-bit colour = 16,777,216 colours (true colour)
32-bit colour = 24-bit plus 8-bit transparency (alpha channel)

Example: A 1920 × 1080 pixel image with 24-bit colour depth

1920 × 1080 × 24 ÷ 8 = 6,220,800 bytes ≈ 6.22 MB
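The image formula translates directly into code. The function name `image_size_bytes` is an assumption for this sketch, which rounds up to a whole byte in case the total number of bits is not a multiple of 8.

```python
def image_size_bytes(width, height, colour_depth_bits):
    """Uncompressed bitmap size in bytes: pixels x bits per pixel, then bits to bytes."""
    total_bits = width * height * colour_depth_bits
    return (total_bits + 7) // 8  # 8 bits per byte, rounded up to a whole byte

size = image_size_bytes(1920, 1080, 24)
print(size)               # 6220800
print(size / 1_000_000)   # 6.2208 (MB, decimal)

# Number of colours available at a given colour depth is 2 to the power of the depth.
print(2 ** 24)            # 16777216
```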

Sound files (uncompressed):

File size (bytes) = Sample rate (Hz) × Duration (seconds) × Bit depth (bits) ÷ 8 × Number of channels

Key factors:

Sample rate: how many samples per second (typically 44,100 Hz for CD quality)
Bit depth: bits per sample (typically 16-bit)
Channels: 1 for mono, 2 for stereo

Example: A 3-minute stereo recording at CD quality

44,100 × 180 × 16 ÷ 8 × 2 = 31,752,000 bytes ≈ 31.75 MB
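The same calculation can be sketched in Python. The function and parameter names are assumptions; multiplying all the factors before dividing by 8 avoids any doubt about the order of operations.

```python
def sound_size_bytes(sample_rate_hz, duration_s, bit_depth, channels):
    """Uncompressed audio size in bytes."""
    total_bits = sample_rate_hz * duration_s * bit_depth * channels
    return total_bits // 8  # convert bits to bytes

# 3 minutes of stereo audio at CD quality (44,100 Hz, 16-bit).
size = sound_size_bytes(44_100, 3 * 60, 16, 2)
print(size)                          # 31752000
print(round(size / 1_000_000, 2))    # 31.75 (MB, decimal)
```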

Video files (uncompressed):

File size (bytes) = Image file size × Frame rate (fps) × Duration (seconds)

This calculation combines image and time dimensions:

Frame rate: typically 24, 30, or 60 frames per second
Each frame is effectively a still image

Example: 10 seconds of 1920 × 1080, 24-bit colour video at 30 fps

Per frame: 1920 × 1080 × 24 ÷ 8 = 6,220,800 bytes
Total: 6,220,800 × 30 × 10 = 1,866,240,000 bytes ≈ 1.87 GB
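As a sketch, the video calculation is the per-frame image size multiplied by the total number of frames. The function name `video_size_bytes` is an assumption, and audio is ignored, as in the worked example.

```python
def video_size_bytes(width, height, colour_depth_bits, fps, duration_s):
    """Uncompressed video size in bytes (picture data only, no audio)."""
    frame_bytes = width * height * colour_depth_bits // 8  # one still frame
    total_frames = fps * duration_s
    return frame_bytes * total_frames

size = video_size_bytes(1920, 1080, 24, 30, 10)
print(size)                              # 1866240000
print(round(size / 1_000_000_000, 2))    # 1.87 (GB, decimal)
```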

File compression and storage efficiency

Compression reduces file size to save storage space and reduce transmission time. The CIE specification requires understanding of two types:

Lossy compression:

Permanently removes data that humans are less likely to notice
Cannot restore the original file exactly
Achieves high compression ratios
Used for JPEG images, MP3 audio, MP4 video
Acceptable for multimedia where perfect accuracy isn't essential

Lossless compression:

Reduces file size without losing any data
Original file can be perfectly restored
Lower compression ratios than lossy methods
Used for ZIP archives, PNG images, FLAC audio
Essential for text documents, programs, and data files
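Lossless compression can be demonstrated with Python's standard `zlib` module, which implements DEFLATE, the method commonly used inside ZIP archives and PNG images. The sample data below is an assumption chosen because it is highly repetitive; real files will compress by different amounts.

```python
import zlib

original = b"AAAAABBBBBCCCCC" * 200    # 3,000 bytes of repetitive data
compressed = zlib.compress(original)    # smaller representation
restored = zlib.decompress(compressed)  # rebuild the original bytes

print(len(original))                     # 3000
print(len(compressed) < len(original))   # True
print(restored == original)              # True: nothing was lost
```

The final comparison is the defining property of lossless compression: the restored data is identical, byte for byte, to the original.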

Factors affecting file size:

For images:

Resolution (pixel dimensions)
Colour depth
Complexity of the image content (affects compressed files only)
Compression algorithm and quality settings

For audio:

Sample rate
Bit depth
Number of channels
Duration
Compression codec

For video:

All image factors above
Frame rate
Duration
Compression codec and bitrate

Storage capacity and practical applications

Understanding storage units helps in real-world decision-making:

Typical file sizes:

Plain text email: 2-10 KB
Word document with images: 100 KB - 5 MB
Digital photo from smartphone: 2-5 MB
MP3 song: 3-10 MB
HD movie (compressed): 4-8 GB
4K movie (compressed): 15-25 GB

Storage device capacities:

USB flash drive: 8-256 GB
Smartphone: 64-512 GB
Laptop hard drive: 256 GB - 2 TB
External hard drive: 1-5 TB
Cloud storage services: 5 GB - unlimited

Calculating storage requirements:

Estimate how many files fit on a storage device: Number of files = Storage capacity ÷ File size

Example: How many 4 MB photos fit on a 64 GB memory card?

64 GB = 64,000 MB
64,000 ÷ 4 = 16,000 photos
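A short sketch of the same estimate, using decimal units and integer division so that only complete files are counted. The function name `files_that_fit` is an assumption for the example.

```python
def files_that_fit(capacity_bytes, file_size_bytes):
    """Number of complete files that fit in the given capacity."""
    return capacity_bytes // file_size_bytes  # integer division rounds down

card = 64 * 1000 ** 3    # 64 GB in bytes
photo = 4 * 1000 ** 2    # 4 MB in bytes
print(files_that_fit(card, photo))  # 16000
```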

Worked examples

Example 1: Unit conversion (2 marks)

Question: Convert 3.2 GB to KB, showing your working.

Solution:

3.2 GB = 3.2 × 1,000 = 3,200 MB [1 mark]
3,200 MB = 3,200 × 1,000 = 3,200,000 KB [1 mark]

Examiner notes: Award 1 mark for correct intermediate conversion to MB, 1 mark for final answer with correct unit. Accept alternative single-step calculation: 3.2 × 1,000,000 = 3,200,000 KB.

Example 2: Image file size calculation (4 marks)

Question: A digital camera takes photographs with dimensions 4000 × 3000 pixels. Each pixel is stored using 24 bits. Calculate the file size of one uncompressed photograph in megabytes.

Solution:

Number of pixels = 4000 × 3000 = 12,000,000 pixels [1 mark]
Total bits = 12,000,000 × 24 = 288,000,000 bits [1 mark]
Total bytes = 288,000,000 ÷ 8 = 36,000,000 bytes [1 mark]
File size = 36,000,000 ÷ 1,000,000 = 36 MB [1 mark]

Examiner notes: Award marks for each correct calculation step. Common error: forgetting to divide by 8 to convert bits to bytes. Final answer must include correct unit (MB).

Example 3: Sound file calculation and storage capacity (5 marks)

Question: A music streaming service stores songs as uncompressed audio files. Each song is recorded in stereo, with a sample rate of 48,000 Hz and a bit depth of 16 bits.

(a) Calculate the file size in megabytes for a 4-minute song. [3]
(b) How many complete songs can be stored on a 500 GB server? [2]

Solution:

(a) Duration = 4 × 60 = 240 seconds [1 mark]
File size = 48,000 × 240 × 16 ÷ 8 × 2 = 46,080,000 bytes [1 mark]
= 46.08 MB [1 mark]

(b) 500 GB = 500,000 MB [1 mark]
Number of songs = 500,000 ÷ 46.08 ≈ 10,850.7, so 10,850 complete songs [1 mark]

Examiner notes: For part (a), accept answers between 46.07 and 46.09 MB allowing for rounding. For part (b), the answer must be a whole number, rounded down, because the question asks for "complete songs".

Common mistakes and how to avoid them

Forgetting to convert bits to bytes: When calculating image or sound file sizes, the colour depth and bit depth are given in bits. Always divide by 8 to convert to bytes before converting to larger units.

Using the wrong conversion factor: Check whether the question specifies decimal (1,000) or binary (1,024) notation. Most recent CIE papers use decimal notation, but always verify from the question context.

Incorrect unit in final answer: Questions often ask for a specific unit (e.g., "give your answer in MB"). Ensure you convert to the requested unit and include it in your answer. Writing "36" instead of "36 MB" loses marks.

Multiplying when you should divide: When converting from smaller to larger units (e.g., bytes to KB), divide. When converting from larger to smaller units (e.g., GB to MB), multiply. A common error is doing the opposite.

Forgetting stereo channels: Sound file calculations require multiplying by 2 for stereo recordings. Questions may state "stereo" or "two channels" – both mean the same thing.

Rounding too early: Perform all calculations first, then round the final answer to an appropriate number of decimal places (usually 2 decimal places unless specified otherwise). Rounding intermediate values introduces errors.
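The effect of early rounding can be shown with a small sketch. The scenario is an assumption built from figures used earlier in these notes: uncompressed 1920 × 1080, 24-bit images stored on a 64 GB card, in decimal units.

```python
capacity_bytes = 64 * 1000 ** 3          # 64 GB card
image_bytes = 1920 * 1080 * 24 // 8      # 6,220,800 bytes per image

# Exact working: keep full precision until the final step.
exact_count = capacity_bytes // image_bytes

# Early rounding: image size rounded to 6.22 MB before dividing.
early_count = int(64_000 / 6.22)

print(exact_count)  # 10288
print(early_count)  # 10289 (one image too many)
```

Rounding 6.2208 MB down to 6.22 MB looks harmless, but it changes the final whole-number answer.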

Exam technique for "Data representation: data storage units and file sizes"

Show detailed working for calculation questions: Even if you make an arithmetic error, you can earn method marks by demonstrating the correct approach. Write out each step clearly on separate lines.

Identify command words carefully: "Calculate" requires numerical working and a final answer with units. "State" needs a brief answer without explanation. "Describe" requires characteristics or features with some detail.

For multi-mark calculation questions: Typically 1 mark per calculation step plus 1 mark for the final answer with correct units. A 4-mark question usually requires 3-4 distinct calculation steps.

Check your calculator: Ensure you're comfortable with your calculator's operation, particularly the order of operations. Brackets are essential in complex calculations – use them to ensure correct sequencing.

Quick revision summary

Data storage begins with bits (0 or 1) and bytes (8 bits). Storage units increase by factors of 1,000 (decimal) or 1,024 (binary): KB, MB, GB, TB. Calculate file sizes by multiplying relevant factors – pixels × colour depth for images, sample rate × duration × bit depth × channels for sound, adding frame rate × duration for video. Always divide by 8 to convert bits to bytes. Lossy compression removes data permanently; lossless compression preserves all original data. Show all working in calculations and include correct units in answers.