What you'll learn
String handling operations are fundamental programming techniques that allow you to manipulate text data in your programs. This revision guide covers the five essential string operations tested in CIE IGCSE Computer Science: finding string length, extracting substrings, concatenating strings, converting case, and converting between characters and their numeric codes. You'll learn how to apply these operations in pseudocode and Python, with authentic exam-style examples.
Key terms and definitions
String — a data type that stores a sequence of characters, enclosed in quotation marks (e.g., "Hello" or "Kingston")
Concatenation — the operation of joining two or more strings together to form a single string
Substring — a portion or segment of a string, extracted from a specific position for a specified length
Length — the number of characters contained within a string, including spaces and punctuation
Case conversion — the process of changing all characters in a string to either uppercase or lowercase
Character code — the numeric value that represents a character in a character encoding system (e.g., ASCII or Unicode)
Index — the position of a character within a string; Python (like most programming languages) counts from 0, whereas CIE pseudocode counts from 1
Immutable — a property of strings in some languages where the original string cannot be changed; operations create new strings instead
Core concepts
String length operations
The length operation determines how many characters exist in a string. This is essential for validation, loop control, and data processing tasks.
In pseudocode (CIE format):
MyString <- "Nassau"
StringLength <- LENGTH(MyString)
The variable StringLength would store the value 6.
In Python:
my_string = "Nassau"
string_length = len(my_string)
Key points about length operations:
- Spaces count as characters (e.g., "New York" has length 8)
- Empty strings have length 0
- Special characters and punctuation count towards length
- Length is always a non-negative integer
- Useful for input validation (e.g., checking password requirements)
Practical applications:
- Validating that a username meets minimum length requirements
- Checking if user input is within acceptable limits
- Creating loops that process each character in a string
- Determining if a text field has been left empty
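As a sketch of the validation uses listed above, the Python function below checks that a username is not empty and meets a minimum length. The function name `check_username` and the minimum of 5 characters are assumptions chosen for illustration, not part of the specification.

```python
def check_username(username, min_length=5):
    """Return a message describing whether the username is acceptable.

    min_length=5 is an illustrative assumption, not a CIE requirement.
    """
    if len(username) == 0:
        return "Username cannot be empty"
    if len(username) < min_length:
        return "Username must be at least " + str(min_length) + " characters"
    return "Username accepted"


# len() counts every character, including spaces and punctuation
print(len("New York"))           # 8
print(check_username(""))        # Username cannot be empty
print(check_username("Sam"))     # Username must be at least 5 characters
print(check_username("Nassau"))  # Username accepted
```

Checking for the empty string first gives the user a more specific message than the general length error.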
Substring extraction
Substring operations extract a portion of a string based on starting position and length. The CIE specification requires understanding of substring extraction using position and length parameters.
In pseudocode (CIE format):
MyString <- "Barbados"
Extract <- SUBSTRING(MyString, 4, 3)
The variable Extract would contain "bad" (starting at position 4, taking 3 characters).
Important indexing rules:
- CIE pseudocode uses 1-based indexing (first character is at position 1)
- Python uses 0-based indexing (first character is at position 0)
- The length parameter specifies how many characters to extract
- Extracting beyond the end of a string is unsafe: some languages raise an error, while Python slicing simply returns whatever characters are available
In Python (0-based indexing):
my_string = "Barbados"
extract = my_string[3:6] # Characters from index 3 up to (but not including) 6
Common uses:
- Extracting area codes from phone numbers
- Separating day, month, and year from date strings
- Processing fixed-width data files
- Removing file extensions from filenames
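Because CIE pseudocode counts from 1 and Python counts from 0, it can help to write a small Python function that behaves like SUBSTRING(). The helper `substring` below is hypothetical (Python has no built-in with this name); it converts the 1-based start position to a 0-based slice. The second half applies it to a date string, assuming the fixed format DD/MM/YYYY.

```python
def substring(text, start, length):
    """Mimic CIE SUBSTRING(text, start, length) using 1-based positions.

    Hypothetical helper for illustration; Python itself uses slicing.
    """
    first = start - 1  # convert 1-based position to 0-based index
    return text[first:first + length]


print(substring("Barbados", 4, 3))  # bad

# Separating day, month and year from a fixed-width date (assumed DD/MM/YYYY)
date = "25/12/2024"
day = substring(date, 1, 2)
month = substring(date, 4, 2)
year = substring(date, 7, 4)
print(day, month, year)  # 25 12 2024
```

Note that this helper inherits Python's slicing behaviour at the end of a string: substring("Barbados", 6, 5) returns "dos" instead of raising an error.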
String concatenation
Concatenation joins multiple strings into a single string. This operation is fundamental for building output messages, combining user input, and constructing complex strings.
In pseudocode (CIE format):
FirstName <- "Marcus"
LastName <- "Garvey"
FullName <- FirstName & " " & LastName
The & operator performs concatenation. FullName becomes "Marcus Garvey".
In Python:
first_name = "Marcus"
last_name = "Garvey"
full_name = first_name + " " + last_name
Key principles:
- Concatenation does not automatically add spaces; you must include them explicitly
- You can concatenate any number of strings in a single operation
- Strings remain unchanged (immutable); concatenation creates a new string
- Cannot directly concatenate strings with numbers without conversion
Concatenating with non-string data:
When combining strings with numbers, conversion is necessary:
Pseudocode:
Age <- 16
Message <- "You are " & STRING(Age) & " years old"
Python:
age = 16
message = "You are " + str(age) + " years old"
Practical examples:
- Creating personalised welcome messages
- Building file paths by combining directory and filename
- Constructing SQL queries or URLs
- Formatting output displays
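The Python sketch below shows why the conversion step matters: joining a string and an integer with + raises a TypeError, while converting with str() first works. The folder and filename values in the last part are illustrative assumptions, using a forward slash as the separator.

```python
age = 16

# Joining str and int directly fails in Python
try:
    message = "You are " + age + " years old"
except TypeError as error:
    print("Error:", error)

# Convert the number first, then concatenate
message = "You are " + str(age) + " years old"
print(message)  # You are 16 years old

# Building a path from a folder and a filename (illustrative values)
folder = "reports"
filename = "term1.txt"
path = folder + "/" + filename
print(path)  # reports/term1.txt
```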
Case conversion operations
Case conversion changes all alphabetic characters in a string to either uppercase or lowercase. Non-alphabetic characters remain unchanged.
In pseudocode (CIE format):
CityName <- "Port of Spain"
UpperCase <- UCASE(CityName)
LowerCase <- LCASE(CityName)
UpperCase becomes "PORT OF SPAIN" and LowerCase becomes "port of spain".
In Python:
city_name = "Port of Spain"
upper_case = city_name.upper()
lower_case = city_name.lower()
Applications in programming:
- Case-insensitive comparisons: Converting both strings to the same case before comparing prevents "London" and "london" being treated as different
- Data standardisation: Ensuring consistent storage format in databases
- Input validation: Accepting commands regardless of how users type them
- Account systems: Some systems convert usernames to lowercase so that "Marcus" and "marcus" cannot be registered as separate accounts
Example of case-insensitive comparison:
Pseudocode:
OUTPUT "Enter your city: "
INPUT UserInput
IF UCASE(UserInput) = "LONDON" THEN
    OUTPUT "Welcome to London!"
ENDIF
This accepts "London", "london", "LONDON", or any other case variation.
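A Python equivalent of the pseudocode above, written as a function so it can be tested without keyboard input. The function name `is_london` is hypothetical.

```python
def is_london(user_input):
    """Case-insensitive check: convert to uppercase before comparing."""
    return user_input.upper() == "LONDON"


for city in ["London", "london", "LONDON", "LoNdOn", "Londres"]:
    if is_london(city):
        print(city, "-> Welcome to London!")
    else:
        print(city, "-> not recognised")
```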
Character-to-ASCII conversion
Character code conversion translates between individual characters and their numeric representations in the ASCII or Unicode system. This allows programmers to perform numeric operations on text data.
The two key functions:
- ASC() in pseudocode (ord() in Python): converts a character to its numeric code
- CHR() in pseudocode (chr() in Python): converts a numeric code to its character
In pseudocode (CIE format):
Letter <- "A"
Code <- ASC(Letter)
// Code = 65
NewLetter <- CHR(66)
// NewLetter = "B"
In Python:
letter = "A"
code = ord(letter)
# code = 65
new_letter = chr(66)
# new_letter = "B"
Important ASCII values to remember:
- Uppercase letters: A=65, B=66, C=67, ..., Z=90
- Lowercase letters: a=97, b=98, c=99, ..., z=122
- Digits: 0=48, 1=49, 2=50, ..., 9=57
- Space character = 32
The relationship between cases:
The difference between any uppercase letter and its lowercase equivalent is always 32:
- 'A' (65) and 'a' (97): difference of 32
- 'M' (77) and 'm' (109): difference of 32
Practical applications:
- Creating simple encryption algorithms (Caesar cipher)
- Validating that input contains only letters or only digits
- Sorting characters in a specific order
- Converting between uppercase and lowercase manually
- Checking if a character is alphabetic or numeric
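To illustrate the gap of 32 between cases and the digit range 48 to 57, the sketch below converts a character to uppercase manually and checks whether a character is a digit, using only ord() and chr(). The function names are hypothetical; in everyday Python code the built-in string methods .upper() and .isdigit() do these jobs.

```python
def to_upper_char(char):
    """Convert one lowercase letter to uppercase by subtracting 32."""
    code = ord(char)
    if ord("a") <= code <= ord("z"):  # 97 to 122
        return chr(code - 32)
    return char  # digits, symbols and capitals are unchanged


def is_digit_char(char):
    """True if char is one of the ASCII digits '0' to '9' (codes 48 to 57)."""
    return 48 <= ord(char) <= 57


word = "m3x!"
print("".join(to_upper_char(c) for c in word))  # M3X!
print(is_digit_char("7"), is_digit_char("a"))    # True False
```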
Example: Caesar cipher encryption:
Pseudocode:
PlainChar <- "D"
Shift <- 3
Code <- ASC(PlainChar)
EncryptedCode <- Code + Shift
EncryptedChar <- CHR(EncryptedCode)
// EncryptedChar = "G"
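The pseudocode above works for "D", but it does not wrap around the end of the alphabet: shifting "X" by 3 gives code 91, which is "[" rather than "A". A common fix, sketched below in Python, is to work with the letter's position from 0 to 25 and use the modulus operator. This version assumes uppercase letters only; any other character is passed through unchanged.

```python
def caesar_encrypt(text, shift):
    """Encrypt uppercase letters with a Caesar shift, wrapping Z back to A.

    Non-uppercase characters are left unchanged (a simplifying assumption).
    """
    result = ""
    for char in text:
        code = ord(char)
        if 65 <= code <= 90:  # 'A' to 'Z'
            position = code - 65                 # 0 for A, 25 for Z
            new_position = (position + shift) % 26
            result = result + chr(new_position + 65)
        else:
            result = result + char
    return result


print(caesar_encrypt("D", 3))            # G
print(caesar_encrypt("XYZ", 3))          # ABC
print(caesar_encrypt("HELLO WORLD", 3))  # KHOOR ZRUOG
```

Decrypting uses the same function with a negative shift, because Python's % operator returns a non-negative result when the divisor is positive: caesar_encrypt("KHOOR", -3) gives "HELLO".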
Worked examples
Example 1: Email validation (6 marks)
Question: A program needs to validate email addresses. Write an algorithm in pseudocode that:
- Takes an email address as input
- Checks that the email contains an "@" symbol
- Extracts and displays the domain name (everything after the "@")
- Converts the domain to uppercase before displaying
Answer:
OUTPUT "Enter email address: "
INPUT Email
AtPosition <- 0
Found <- FALSE
// Find position of @ symbol
FOR Counter <- 1 TO LENGTH(Email)
    IF SUBSTRING(Email, Counter, 1) = "@" THEN
        AtPosition <- Counter
        Found <- TRUE
    ENDIF
NEXT Counter
IF Found = TRUE THEN
    DomainLength <- LENGTH(Email) - AtPosition
    Domain <- SUBSTRING(Email, AtPosition + 1, DomainLength)
    Domain <- UCASE(Domain)
    OUTPUT "Domain: ", Domain
ELSE
    OUTPUT "Invalid email - no @ symbol found"
ENDIF
Mark scheme notes:
- 1 mark: Correct use of LENGTH() function
- 1 mark: Loop to find "@" position
- 1 mark: Correct IF condition to identify "@"
- 1 mark: Correct calculation of domain length
- 1 mark: Correct SUBSTRING extraction (position and length)
- 1 mark: UCASE() function applied to domain
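For comparison, here is one possible Python version of the same algorithm, written as a function so it can be tested without keyboard input. It keeps the same approach (a loop that records the position of the "@", then a substring and a case conversion) rather than using shortcuts such as str.find(); the function name `extract_domain` is hypothetical.

```python
def extract_domain(email):
    """Return the uppercase domain after '@', or None if there is no '@'."""
    at_position = -1  # -1 means "not found" (0 is a valid Python index)
    for index in range(len(email)):
        if email[index] == "@":
            at_position = index
    if at_position == -1:
        return None
    domain = email[at_position + 1:]  # everything after the '@'
    return domain.upper()


for address in ["student@school.edu", "no-at-symbol.com"]:
    result = extract_domain(address)
    if result is not None:
        print("Domain:", result)
    else:
        print("Invalid email - no @ symbol found")
```

Like the pseudocode, the loop keeps overwriting the position, so if an address contained more than one "@" the last one would be used.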
Example 2: Password strength checker (5 marks)
Question: Write a Python program that checks if a password meets these criteria:
- At least 8 characters long
- Contains at least one digit
- Display appropriate messages for pass/fail
Answer:
password = input("Enter password: ")
valid = True
# Check length
if len(password) < 8:
    print("Password too short - minimum 8 characters")
    valid = False
# Check for digit (ASCII codes 48 to 57 are '0' to '9')
has_digit = False
for char in password:
    if ord(char) >= 48 and ord(char) <= 57:
        has_digit = True
if not has_digit:
    print("Password must contain at least one digit")
    valid = False
if valid:
    print("Password accepted")
Mark scheme notes:
- 1 mark: Correct use of len() function with comparison
- 1 mark: Loop through each character
- 1 mark: Correct use of ord() with appropriate numeric range (48-57)
- 1 mark: Boolean flag to track digit presence
- 1 mark: Appropriate output messages based on validation
Example 3: Name formatting (4 marks)
Question: A database stores names in the format "SURNAME, FirstName". Write pseudocode to:
- Take this format as input
- Output the name as "FirstName Surname" (first name in lowercase, surname in uppercase)
Answer:
INPUT FullName
// e.g. FullName = "JOHNSON, Robert"
CommaPosition <- 0
// Find comma position
FOR Position <- 1 TO LENGTH(FullName)
    IF SUBSTRING(FullName, Position, 1) = "," THEN
        CommaPosition <- Position
    ENDIF
NEXT Position
Surname <- SUBSTRING(FullName, 1, CommaPosition - 1)
Surname <- UCASE(Surname)
// Subtract 1 more for the space after the comma
FirstNameLength <- LENGTH(FullName) - CommaPosition - 1
FirstName <- SUBSTRING(FullName, CommaPosition + 2, FirstNameLength)
FirstName <- LCASE(FirstName)
OUTPUT FirstName & " " & Surname
Mark scheme notes:
- 1 mark: Finding comma position correctly
- 1 mark: Extracting surname with correct parameters
- 1 mark: Extracting first name (accounting for space after comma)
- 1 mark: Case conversion and concatenation with space
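A Python sketch of the same steps, assuming the input always contains ", " (a comma followed by a single space). The positions differ by one from the pseudocode because Python indexes from 0. The function name `format_name` is hypothetical.

```python
def format_name(full_name):
    """Convert 'SURNAME, FirstName' to 'firstname SURNAME'.

    Assumes exactly one comma, followed by a single space.
    """
    comma_position = -1
    for index in range(len(full_name)):
        if full_name[index] == ",":
            comma_position = index
    surname = full_name[:comma_position]         # characters before the comma
    first_name = full_name[comma_position + 2:]  # skip the comma and the space
    return first_name.lower() + " " + surname.upper()


print(format_name("JOHNSON, Robert"))  # robert JOHNSON
```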
Common mistakes and how to avoid them
Confusing 0-based and 1-based indexing: CIE pseudocode uses 1-based indexing (first character is position 1), while Python uses 0-based (first character is position 0). Always check which system you're using and adjust accordingly.
Forgetting to include spaces in concatenation: When joining strings such as a first name and a surname, students often write FirstName & LastName, which produces "JohnSmith" instead of "John Smith". Always include & " " & to add the space explicitly.
Attempting to concatenate strings and numbers directly: You cannot use "Age: " & 16 in most systems. Convert the number first: "Age: " & STRING(16) in pseudocode or "Age: " + str(16) in Python.
Extracting substrings beyond string boundaries: If a string has 8 characters, SUBSTRING(MyString, 6, 5) asks for positions 6 to 10 and may cause an error. Always check that the starting position plus the length, minus 1, does not exceed LENGTH(MyString).
Assuming case conversion affects numbers or symbols: UCASE("abc123!@") produces "ABC123!@"; only letters change case. Numbers and special characters remain unchanged.
Using the wrong function names: CIE pseudocode uses specific function names: LENGTH(), SUBSTRING(), UCASE(), LCASE(), ASC(), CHR(). The Python equivalents are different: len(), slice notation, .upper(), .lower(), ord(), chr(). Learn both sets separately.
Exam technique for string handling operations
For "Write" or "Complete" questions: Examiners expect precise function names from the specification. In pseudocode, use LENGTH(), SUBSTRING(), UCASE(), LCASE(), ASC(), CHR(), not variations or Python syntax. Marks are often awarded for correct function selection (1 mark) and correct parameter usage (1 mark).
Show your working with substring operations: When extracting substrings, examiners want to see that you have correctly identified both the starting position AND the length parameter. If a question asks you to explain, write "Starting at position X, extracting Y characters" to demonstrate understanding.
For trace table questions: When string operations appear in trace tables, carefully track the contents of string variables after each operation. Write out the full string value, not just "changed" or "updated". Each correct value typically earns 1 mark.
Algorithm questions requiring validation: When string operations are part of validation algorithms, structure your answer with clear conditional statements showing what happens for both valid and invalid inputs. Use appropriate error messages that specify the problem (e.g., "Password must be at least 8 characters" rather than just "Invalid").
Quick revision summary
String handling operations manipulate text data in five key ways: LENGTH() counts characters; SUBSTRING() extracts portions using position and length; concatenation (&) joins strings; UCASE()/LCASE() convert case; ASC()/CHR() convert between characters and numeric codes. Remember CIE pseudocode uses 1-based indexing and specific function names. Always include spaces explicitly in concatenation, convert numbers before joining them to strings, and ensure substring extractions stay within string boundaries. These operations are essential for validation, data processing, and formatting output in programming tasks.