Programming concepts: string handling operations (length, substring, concatenation, case conversion, character conversion) — CIE IGCSE Computer Science Revision Notes

What you'll learn

String handling operations are fundamental programming techniques that allow you to manipulate text data in your programs. This revision guide covers the five essential string operations tested in CIE IGCSE Computer Science: finding string length, extracting substrings, concatenating strings, converting case, and converting between characters and their numeric codes. You'll learn how to apply these operations in pseudocode and Python, with authentic exam-style examples.

Key terms and definitions

String — a data type that stores a sequence of characters, enclosed in quotation marks (e.g., "Hello" or "Kingston")

Concatenation — the operation of joining two or more strings together to form a single string

Substring — a portion or segment of a string, extracted from a specific position for a specified length

Length — the number of characters contained within a string, including spaces and punctuation

Case conversion — the process of changing all characters in a string to either uppercase or lowercase

Character code — the numeric value that represents a character in a character encoding system (e.g., ASCII or Unicode)

Index — the position of a character within a string; Python and most programming languages start counting from 0, while CIE pseudocode starts from 1

Immutable — a property of strings in some languages where the original string cannot be changed; operations create new strings instead

Core concepts

String length operations

The length operation determines how many characters exist in a string. This is essential for validation, loop control, and data processing tasks.

In pseudocode (CIE format):

MyString <- "Nassau"
StringLength <- LENGTH(MyString)

The variable StringLength would store the value 6.

In Python:

my_string = "Nassau"
string_length = len(my_string)

Key points about length operations:

Spaces count as characters (e.g., "New York" has length 8)
Empty strings have length 0
Special characters and punctuation count towards length
Length is always a non-negative integer
Useful for input validation (e.g., checking password requirements)

Practical applications:

Validating that a username meets minimum length requirements
Checking if user input is within acceptable limits
Creating loops that process each character in a string
Determining if a text field has been left empty
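The applications above can be sketched in Python as small functions. This is a minimal illustration only: the minimum username length of 5 characters is a hypothetical rule, and the function names are invented for practice rather than taken from the syllabus.

```python
# Hypothetical rule for illustration: usernames need at least 5 characters
MIN_USERNAME_LENGTH = 5


def is_valid_username(username):
    """Return True if the username meets the minimum length requirement."""
    return len(username) >= MIN_USERNAME_LENGTH


def is_empty(text):
    """Return True if a text field was left empty (its length is 0)."""
    return len(text) == 0


def count_spaces(text):
    """Use len() to control a loop that visits every character by index."""
    spaces = 0
    for index in range(len(text)):
        if text[index] == " ":
            spaces = spaces + 1
    return spaces


print(is_valid_username("kim"))       # False (3 characters)
print(is_valid_username("kingston"))  # True (8 characters)
print(is_empty(""))                   # True
print(count_spaces("Port of Spain"))  # 2
```

Note that a single space is not an empty string: len(" ") is 1, so is_empty(" ") returns False.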

Substring extraction

Substring operations extract a portion of a string based on starting position and length. The CIE specification requires understanding of substring extraction using position and length parameters.

In pseudocode (CIE format):

MyString <- "Barbados"
Extract <- SUBSTRING(MyString, 4, 3)

The variable Extract would contain "bad" (starting at position 4, taking 3 characters).

Important indexing rules:

CIE pseudocode uses 1-based indexing (first character is at position 1)
Python uses 0-based indexing (first character is at position 0)
The length parameter specifies how many characters to extract
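Attempting to extract beyond the end of a string behaves differently between languages: in Python, a slice such as my_string[5:20] simply returns the characters that are available ("dos" for "Barbados"), whereas indexing a single position that does not exist, such as my_string[20], raises an IndexError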

In Python (0-based indexing):

my_string = "Barbados"
extract = my_string[3:6]  # Characters from index 3 up to (but not including) 6
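To translate a CIE-style SUBSTRING(String, Start, Length) call into a Python slice, subtract 1 from the start position to get the first index, then add the length to get the (exclusive) end index. The helper below is a hypothetical practice function; Python itself has no built-in called SUBSTRING.

```python
def substring(text, start, length):
    """Mimic CIE pseudocode SUBSTRING(text, start, length) with 1-based positions.

    Position 1 in pseudocode is index 0 in Python, so the slice begins at
    start - 1 and stops just before start - 1 + length.
    """
    begin = start - 1
    return text[begin:begin + length]


print(substring("Barbados", 4, 3))  # bad
print(substring("Barbados", 1, 3))  # Bar
print("Barbados"[3:6])              # bad (the equivalent Python slice)
```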

Common uses:

Extracting area codes from phone numbers
Separating day, month, and year from date strings
Processing fixed-width data files
Removing file extensions from filenames
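Two of these uses can be sketched with slices at fixed or searched positions. The sketch assumes dates are always stored as DD/MM/YYYY (ten characters) and that a filename contains exactly one dot; both are illustrative assumptions, not general rules.

```python
date_string = "25/12/2024"  # assumed fixed-width DD/MM/YYYY format

day = date_string[0:2]     # pseudocode positions 1-2
month = date_string[3:5]   # positions 4-5 (position 3 is the "/")
year = date_string[6:10]   # positions 7-10

print(day, month, year)    # 25 12 2024

filename = "report.txt"          # assumed to contain exactly one "."
dot_index = filename.find(".")   # 0-based index of the dot (-1 if there is none)
name_only = filename[0:dot_index]
print(name_only)                 # report
```

If the filename had no dot, find() would return -1 and the slice would silently drop the last character, which is why the single-dot assumption matters here.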

String concatenation

Concatenation joins multiple strings into a single string. This operation is fundamental for building output messages, combining user input, and constructing complex strings.

In pseudocode (CIE format):

FirstName <- "Marcus"
LastName <- "Garvey"
FullName <- FirstName & " " & LastName

The & operator performs concatenation. FullName becomes "Marcus Garvey".

In Python:

first_name = "Marcus"
last_name = "Garvey"
full_name = first_name + " " + last_name

Key principles:

Concatenation does not automatically add spaces; you must include them explicitly
You can concatenate any number of strings in a single operation
Strings remain unchanged (immutable); concatenation creates a new string
Strings cannot be concatenated directly with numbers; the number must be converted to a string first

Concatenating with non-string data:

When combining strings with numbers, conversion is necessary:

Pseudocode:

Age <- 16
Message <- "You are " & STRING(Age) & " years old"

Python:

age = 16
message = "You are " + str(age) + " years old"
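To see why the conversion matters, the sketch below first tries the join without str(). Python refuses to add a string to an integer and raises a TypeError; the try/except is there only so the error can be displayed without stopping the program.

```python
age = 16

# Joining a string and an integer directly is not allowed in Python
try:
    message = "You are " + age + " years old"
except TypeError as error:
    print("Without conversion:", error)

# Converting the number with str() first makes the join work
message = "You are " + str(age) + " years old"
print(message)  # You are 16 years old
```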

Practical examples:

Creating personalised welcome messages
Building file paths by combining directory and filename
Constructing SQL queries or URLs
Formatting output displays

Case conversion operations

Case conversion changes all alphabetic characters in a string to either uppercase or lowercase. Non-alphabetic characters remain unchanged.

In pseudocode (CIE format):

CityName <- "Port of Spain"
UpperCase <- UCASE(CityName)
LowerCase <- LCASE(CityName)

UpperCase becomes "PORT OF SPAIN" and LowerCase becomes "port of spain".

In Python:

city_name = "Port of Spain"
upper_case = city_name.upper()
lower_case = city_name.lower()

Applications in programming:

Case-insensitive comparisons: Converting both strings to the same case before comparing prevents "London" and "london" being treated as different
Data standardisation: Ensuring consistent storage format in databases
Input validation: Accepting commands regardless of how users type them
Login systems: Some systems convert usernames to lowercase so that "Kim" and "kim" cannot be registered as two separate accounts

Example of case-insensitive comparison:

Pseudocode:

OUTPUT "Enter your city: "
INPUT UserInput
IF UCASE(UserInput) = "LONDON" THEN
    OUTPUT "Welcome to London!"
ENDIF

This accepts "London", "london", "LONDON", or any other case variation.
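The same case-insensitive check in Python, written as a small function so several inputs can be tried without typing them in. The function name is_london is purely illustrative.

```python
def is_london(user_input):
    """Compare case-insensitively by converting the input to uppercase first."""
    return user_input.upper() == "LONDON"


for attempt in ["London", "london", "LONDON", "LoNdOn", "Londres"]:
    if is_london(attempt):
        print(attempt, "-> Welcome to London!")
    else:
        print(attempt, "-> not recognised")
```

Only the input is converted; the literal "LONDON" is already written in uppercase, so both sides of the comparison are in the same case.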

Character-to-ASCII conversion

Character code conversion translates between individual characters and their numeric representations in the ASCII or Unicode system. This allows programmers to perform numeric operations on text data.

The two key functions:

ASC() or ORD() — converts a character to its numeric code
CHR() — converts a numeric code to its character

In pseudocode (CIE format):

Letter <- "A"
Code <- ASC(Letter)
// Code = 65

NewLetter <- CHR(66)
// NewLetter = "B"

In Python:

letter = "A"
code = ord(letter)
# code = 65

new_letter = chr(66)
# new_letter = "B"

Important ASCII values to remember:

Uppercase letters: A=65, B=66, C=67, ..., Z=90
Lowercase letters: a=97, b=98, c=99, ..., z=122
Digits: 0=48, 1=49, 2=50, ..., 9=57
Space character = 32

The relationship between cases:

In ASCII, the difference between any uppercase letter and its lowercase equivalent is always 32:

'A' (65) and 'a' (97): difference of 32
'M' (77) and 'm' (109): difference of 32

Practical applications:

Creating simple encryption algorithms (Caesar cipher)
Validating that input contains only letters or only digits
Sorting characters in a specific order
Converting between uppercase and lowercase manually
Checking if a character is alphabetic or numeric
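Two of these applications can be sketched in Python using only ord() and chr(): converting to uppercase by hand (subtracting 32 from the code of each lowercase letter) and classifying a character by its code range. These are practice versions under the assumption that the text is plain ASCII; in real programs, str.upper(), str.isalpha() and str.isdigit() do similar jobs and also handle non-ASCII text.

```python
def to_upper_manual(text):
    """Convert lowercase letters to uppercase by subtracting 32 from their codes."""
    result = ""
    for char in text:
        code = ord(char)
        if 97 <= code <= 122:              # 'a' to 'z'
            result = result + chr(code - 32)
        else:
            result = result + char         # digits, spaces and symbols unchanged
    return result


def char_type(char):
    """Classify a single character using its ASCII code range."""
    code = ord(char)
    if 65 <= code <= 90 or 97 <= code <= 122:   # 'A'-'Z' or 'a'-'z'
        return "letter"
    if 48 <= code <= 57:                        # '0'-'9'
        return "digit"
    return "other"


print(to_upper_manual("abc123!@"))                      # ABC123!@
print(char_type("Q"), char_type("7"), char_type(" "))   # letter digit other
```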

Example: Caesar cipher encryption:

Pseudocode:

PlainChar <- "D"
Shift <- 3
Code <- ASC(PlainChar)
EncryptedCode <- Code + Shift
EncryptedChar <- CHR(EncryptedCode)
// EncryptedChar = "G"
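The pseudocode above handles one character and does not wrap around: ASC("Y") + 3 is 92, which is the code for "\", not a letter. A fuller sketch in Python, assuming only the uppercase letters A to Z are shifted and all other characters are copied unchanged, uses the remainder after dividing by 26 (% in Python, MOD in pseudocode) so that shifting past Z continues from A.

```python
def caesar_encrypt(plain_text, shift):
    """Encrypt uppercase letters with a Caesar shift; other characters are kept."""
    cipher_text = ""
    for char in plain_text:
        code = ord(char)
        if 65 <= code <= 90:                      # only 'A' to 'Z' are shifted
            offset = (code - 65 + shift) % 26     # alphabet position 0-25, wrapped
            cipher_text = cipher_text + chr(65 + offset)
        else:
            cipher_text = cipher_text + char      # spaces, digits etc. unchanged
    return cipher_text


print(caesar_encrypt("D", 3))            # G
print(caesar_encrypt("XYZ", 3))          # ABC
print(caesar_encrypt("HELLO WORLD", 3))  # KHOOR ZRUOG
```

Because Python's % gives a result from 0 to 25 here even for negative numbers, calling caesar_encrypt("KHOOR", -3) reverses the shift and returns "HELLO".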

Worked examples

Example 1: Email validation (6 marks)

Question: A program needs to validate email addresses. Write an algorithm in pseudocode that:

Takes an email address as input
Checks that the email contains an "@" symbol
Extracts and displays the domain name (everything after the "@")
Converts the domain to uppercase before displaying

Answer:

OUTPUT "Enter email address: "
INPUT Email
AtPosition <- 0
Found <- FALSE

// Find position of @ symbol
FOR Counter <- 1 TO LENGTH(Email)
    IF SUBSTRING(Email, Counter, 1) = "@" THEN
        AtPosition <- Counter
        Found <- TRUE
    ENDIF
NEXT Counter

IF Found = TRUE THEN
    DomainLength <- LENGTH(Email) - AtPosition
    Domain <- SUBSTRING(Email, AtPosition + 1, DomainLength)
    Domain <- UCASE(Domain)
    OUTPUT "Domain: ", Domain
ELSE
    OUTPUT "Invalid email - no @ symbol found"
ENDIF

Mark scheme notes:

1 mark: Correct use of LENGTH() function
1 mark: Loop to find "@" position
1 mark: Correct IF condition to identify "@"
1 mark: Correct calculation of domain length
1 mark: Correct SUBSTRING extraction (position and length)
1 mark: UCASE() function applied to domain
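For comparison, the same algorithm can be sketched in Python. Positions shift by one because Python counts from 0; the function name extract_domain is hypothetical, and returning None when there is no "@" is a design choice for this sketch rather than part of the question. Like the pseudocode, the loop keeps overwriting the stored position, so it remembers the last "@" it finds.

```python
def extract_domain(email):
    """Return the uppercase domain after the last '@', or None if there is no '@'."""
    at_index = -1
    for index in range(len(email)):   # scan every character, as in the pseudocode
        if email[index] == "@":
            at_index = index
    if at_index == -1:
        return None
    domain = email[at_index + 1:]     # everything after the '@'
    return domain.upper()


for address in ["student@school.edu", "no-at-symbol.com"]:
    domain = extract_domain(address)
    if domain is not None:
        print("Domain:", domain)
    else:
        print("Invalid email - no @ symbol found")
```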

Example 2: Password strength checker (5 marks)

Question: Write a Python program that checks if a password meets these criteria:

At least 8 characters long
Contains at least one digit
Display appropriate messages for pass/fail

Answer:

password = input("Enter password: ")
valid = True

# Check length
if len(password) < 8:
    print("Password too short - minimum 8 characters")
    valid = False

# Check for digit
has_digit = False
for char in password:
    if ord(char) >= 48 and ord(char) <= 57:
        has_digit = True

if not has_digit:
    print("Password must contain at least one digit")
    valid = False

if valid:
    print("Password accepted")

Mark scheme notes:

1 mark: Correct use of len() function with comparison
1 mark: Loop through each character
1 mark: Correct use of ord() with appropriate numeric range (48-57)
1 mark: Boolean flag to track digit presence
1 mark: Appropriate output messages based on validation
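Wrapping the same checks in a function makes it easy to try several passwords without retyping them. This is a restructured sketch of the answer above, not a replacement for it; the name check_password and the choice to return a list of messages are illustrative. It keeps the ord() range so that it still matches the mark scheme.

```python
def check_password(password):
    """Return a list of problems with the password (an empty list means accepted)."""
    problems = []
    if len(password) < 8:
        problems.append("Password too short - minimum 8 characters")
    has_digit = False
    for char in password:
        if 48 <= ord(char) <= 57:     # ASCII codes for '0' to '9'
            has_digit = True
    if not has_digit:
        problems.append("Password must contain at least one digit")
    return problems


for attempt in ["abc", "abcdefgh", "abcdefg1"]:
    problems = check_password(attempt)
    if problems:
        for problem in problems:
            print(attempt, "->", problem)
    else:
        print(attempt, "-> Password accepted")
```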

Example 3: Name formatting (4 marks)

Question: A database stores names in the format "SURNAME, FirstName". Write pseudocode to:

Take this format as input
Output the name as "FirstName Surname" (first name in lowercase, surname in uppercase)

Answer:

OUTPUT "Enter name (SURNAME, FirstName): "
INPUT FullName
// e.g. FullName = "JOHNSON, Robert"
CommaPosition <- 0

// Find comma position
FOR Position <- 1 TO LENGTH(FullName)
    IF SUBSTRING(FullName, Position, 1) = "," THEN
        CommaPosition <- Position
    ENDIF
NEXT Position

Surname <- SUBSTRING(FullName, 1, CommaPosition - 1)
FirstNameLength <- LENGTH(FullName) - CommaPosition - 1
FirstName <- SUBSTRING(FullName, CommaPosition + 2, FirstNameLength)

FirstName <- LCASE(FirstName)
Surname <- UCASE(Surname)
OUTPUT FirstName & " " & Surname

Mark scheme notes:

1 mark: Finding comma position correctly
1 mark: Extracting surname with correct parameters
1 mark: Extracting first name (accounting for space after comma)
1 mark: Case conversion and concatenation with space
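A Python version of the same task can be sketched with str.find(","), which returns the 0-based index of the first comma (or -1 if there is none). This sketch assumes, as the question states, that the input always contains a comma followed by a single space; the function name format_name is hypothetical.

```python
def format_name(full_name):
    """Turn "SURNAME, FirstName" into "firstname SURNAME"."""
    comma_index = full_name.find(",")        # 0-based index of the comma
    surname = full_name[:comma_index]        # everything before the comma
    first_name = full_name[comma_index + 2:] # skip the comma and the space
    return first_name.lower() + " " + surname.upper()


print(format_name("JOHNSON, Robert"))  # robert JOHNSON
```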

Common mistakes and how to avoid them

Confusing 0-based and 1-based indexing: CIE pseudocode uses 1-based indexing (first character is position 1), while Python uses 0-based (first character is position 0). Always check which system you're using and adjust accordingly.
Forgetting to include spaces in concatenation: When joining strings like first name and surname, students often write FirstName & LastName which produces "JohnSmith" instead of "John Smith". Always include & " " & to add the space explicitly.
Attempting to concatenate strings and numbers directly: You cannot use "Age: " & 16 in most systems. Convert the number first: "Age: " & STRING(16) in pseudocode or "Age: " + str(16) in Python.
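Extracting substrings beyond string boundaries: If a string has 8 characters, SUBSTRING(MyString, 6, 5) asks for positions 6 to 10, but positions 9 and 10 do not exist, so the call may cause an error. Always check that the starting position plus the length, minus 1, does not exceed the string length; from position 6 of an 8-character string, the longest valid extraction is SUBSTRING(MyString, 6, 3).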
Assuming case conversion affects numbers or symbols: UCASE("abc123!@") produces "ABC123!@" — only letters change case. Numbers and special characters remain unchanged.
Using the wrong function names: CIE pseudocode uses specific function names: LENGTH(), SUBSTRING(), UCASE(), LCASE(), ASC(), CHR(). Python equivalents are different: len(), slice notation, .upper(), .lower(), ord(), chr(). Learn both systems separately.
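The boundary rule for 1-based extraction (start at least 1, and start + length - 1 no greater than the string length) can be turned into a check before extracting. A minimal sketch, using a hypothetical safe_substring function that returns None instead of a shorter string when the request does not fit:

```python
def safe_substring(text, start, length):
    """Extract like CIE SUBSTRING (1-based), or return None if out of bounds."""
    last_position = start + length - 1       # final pseudocode position needed
    if start < 1 or length < 0 or last_position > len(text):
        return None
    return text[start - 1:last_position]     # convert to a 0-based slice


print(safe_substring("Barbados", 6, 3))  # dos
print(safe_substring("Barbados", 6, 5))  # None (positions 9 and 10 do not exist)
```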

Exam technique for string handling operations

For "Write" or "Complete" questions: Examiners expect precise function names from the specification. In pseudocode, use LENGTH(), SUBSTRING(), UCASE(), LCASE(), ASC(), CHR() — not variations or Python syntax. Marks are often awarded for correct function selection (1 mark) and correct parameter usage (1 mark).
Show your working with substring operations: When extracting substrings, examiners want to see you've correctly identified both the starting position AND the length parameter. If a question asks you to explain, write "Starting at position X, extracting Y characters" to demonstrate understanding.
For trace table questions: When string operations appear in trace tables, carefully track the contents of string variables after each operation. Write out the full string value, not just "changed" or "updated". Each correct value typically earns 1 mark.
Algorithm questions requiring validation: When string operations are part of validation algorithms, structure your answer with clear conditional statements showing what happens for both valid and invalid inputs. Use appropriate error messages that specify the problem (e.g., "Password must be at least 8 characters" rather than just "Invalid").

Quick revision summary

String handling operations manipulate text data in five key ways: LENGTH() counts characters; SUBSTRING() extracts portions using position and length; concatenation (&) joins strings; UCASE()/LCASE() convert case; ASC()/CHR() convert between characters and numeric codes. Remember CIE pseudocode uses 1-based indexing and specific function names. Always include spaces explicitly in concatenation, convert numbers before joining them to strings, and ensure substring extractions stay within string boundaries. These operations are essential for validation, data processing, and formatting output in programming tasks.
