Regular Expressions Demystified with the re Module
Regular expressions (regex) are a powerful tool for searching, matching, and manipulating text. Python’s re module makes it easy to use regex in your scripts. This tutorial walks you through the basics and some practical applications.
What Are Regular Expressions?
A regular expression is a sequence of characters that defines a search pattern. It can be used for string matching, finding patterns, and even text replacements.
Basics of the re Module
Common Functions in the re Module
- Searching for Patterns:
import re

pattern = r"\d+"  # Matches one or more digits
match = re.search(pattern, "The order number is 12345")
if match:
    print("Found match:", match.group())
- Finding All Matches:
result = re.findall(r"\w+", "Hello, world! Python is great.")
print("All words:", result)
- Replacing Substrings:
modified_text = re.sub(r"Python", "JavaScript", "I love Python programming.")
print(modified_text)  # Output: I love JavaScript programming.
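To see how the three functions above differ in what they return, here is a minimal sketch that runs all of them on one input. The log_line text is a made-up sample, not something from a real system.

```python
import re

# Hypothetical sample text, used only for illustration.
log_line = "Order 12345 shipped to Alice on day 7"

# re.search returns a match object for the first match, or None if there is none.
first_number = re.search(r"\d+", log_line)
no_match = re.search(r"\d+", "no digits here")

# re.findall returns every non-overlapping match as a list of strings.
all_numbers = re.findall(r"\d+", log_line)

# re.sub returns a new string; the original string is left unchanged.
masked = re.sub(r"\d+", "#", log_line)

print(first_number.group())  # 12345
print(no_match)              # None
print(all_numbers)           # ['12345', '7']
print(masked)                # Order # shipped to Alice on day #
```

Because re.search can return None, it is safest to check the result before calling group() on it, as the first example in this section does.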
Using Special Characters and Patterns
- Anchors: ^ (start of string), $ (end of string)
- Quantifiers: * (0 or more), + (1 or more), ? (0 or 1), {n} (exactly n times)
- Character Classes: [a-z] (any lowercase ASCII letter), \d (digit), \w (word character)
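The symbols listed above are easier to remember once you see them in action. The following sketch tries each group on short, made-up strings; the sample inputs are assumptions chosen only to make the differences visible.

```python
import re

greeting = "Hello, world"

# Anchors: ^ ties the pattern to the start of the string, $ to the end.
starts_with_hello = bool(re.search(r"^Hello", greeting))  # True
ends_with_world = bool(re.search(r"world$", greeting))    # True
starts_with_world = bool(re.search(r"^world", greeting))  # False

# Quantifiers: how many times the preceding item may repeat.
star = re.findall(r"ab*", "a ab abbb")          # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")          # ['ab', 'abbb']
optional = re.findall(r"ab?", "a ab abbb")      # ['a', 'ab', 'ab']
exactly_three = re.findall(r"\d{3}", "12 345 6789")  # ['345', '678']

# Character classes: sets of characters that may appear at one position.
lowercase_runs = re.findall(r"[a-z]+", "abc DEF ghi 123")  # ['abc', 'ghi']
digits = re.findall(r"\d", "a1b22")                        # ['1', '2', '2']
words = re.findall(r"\w+", "snake_case, 42!")              # ['snake_case', '42']

print(starts_with_hello, ends_with_world, starts_with_world)
print(star, plus, optional, exactly_three)
print(lowercase_runs, digits, words)
```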
Example: Validating Email Addresses
email_pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
email = "[email protected]"
if re.match(email_pattern, email):
    print("Valid email")
else:
    print("Invalid email")
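One way to reuse the check above is to wrap it in a small function and run it over several inputs. This is only a sketch: is_valid_email is a hypothetical helper name, the sample addresses are invented, and the pattern is the same simple one from the example, which does not cover every address the email standards allow.

```python
import re

# Same pattern as in the example above, compiled once for reuse.
EMAIL_PATTERN = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")


def is_valid_email(address: str) -> bool:
    """Hypothetical helper: True if the address fits the simple pattern."""
    return EMAIL_PATTERN.match(address) is not None


# Made-up sample addresses (assumptions for illustration only).
samples = [
    "alice@example.com",
    "bob.smith+news@mail.example.org",
    "no-at-sign.example.com",
    "two@@example.com",
]

results = {address: is_valid_email(address) for address in samples}
for address, ok in results.items():
    print(address, "->", "Valid email" if ok else "Invalid email")
```

Keep in mind that $ also matches just before a trailing newline, so a string such as "alice@example.com\n" passes this check. If that matters for your input, re.fullmatch or the \Z anchor is stricter.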
Advanced Techniques
Compiling Patterns
If you use the same pattern many times, compile it once with re.compile and reuse the resulting pattern object:
compiled_pattern = re.compile(r"\d+")
matches = compiled_pattern.findall("Call me at 123-456-7890 or 987-654-3210")
print(matches)
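A compiled pattern object offers the same operations as the module-level functions (findall, search, sub, and so on), so one object can serve several tasks. The sketch below assumes a stricter phone pattern than the \d+ used above and a few invented input lines.

```python
import re

# Compile once; the pattern here is an assumed NNN-NNN-NNNN phone format.
phone_pattern = re.compile(r"\d{3}-\d{3}-\d{4}")

# Hypothetical input lines for illustration.
lines = [
    "Call me at 123-456-7890 or 987-654-3210",
    "No phone number on this line",
    "Office: 555-010-9999",
]

# Reuse the same object for every line instead of passing the pattern string each time.
found = [phone_pattern.findall(line) for line in lines]
print(found)

# The compiled object also has search and sub methods.
first = phone_pattern.search(lines[0])
print(first.group(), first.span())

redacted = phone_pattern.sub("XXX-XXX-XXXX", lines[2])
print(redacted)
```

The re module also keeps an internal cache of recently used patterns, so the speedup from compiling is often modest. The clearer benefit is giving the pattern a name and defining it in one place.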
Common Pitfalls
- Greedy vs. Non-Greedy Matching:
text = "<tag>content</tag>"
# Greedy match: .* grabs as much as possible
print(re.findall(r"<.*>", text))  # Output: ['<tag>content</tag>']
# Non-greedy match: .*? stops at the first possible >
print(re.findall(r"<.*?>", text))  # Output: ['<tag>', '</tag>']
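The difference becomes more visible when a string contains several tags. The sketch below uses an invented HTML-like string and also shows a negated character class, [^>]*, which is a common alternative to the non-greedy quantifier for this kind of input.

```python
import re

# Hypothetical sample markup for illustration.
markup = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last > in the string, giving a single long match.
greedy = re.findall(r"<.*>", markup)

# Non-greedy: .*? stops at the first > it can, so each tag is matched separately.
lazy = re.findall(r"<.*?>", markup)

# Negated class: [^>]* cannot cross a >, so it also matches tag by tag.
negated = re.findall(r"<[^>]*>", markup)

# With a capturing group, findall returns only the captured text.
bold_text = re.findall(r"<b>(.*?)</b>", markup)

print(greedy)     # ['<b>bold</b> and <i>italic</i>']
print(lazy)       # ['<b>', '</b>', '<i>', '</i>']
print(negated)    # ['<b>', '</b>', '<i>', '</i>']
print(bold_text)  # ['bold']
```

Regex works for small, predictable snippets like this one. For real HTML or XML, a dedicated parser is usually the more reliable choice.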
Best Practices
- Use raw strings (r"pattern") to avoid issues with escape characters.
- Break down complex patterns and test them incrementally.
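Both practices can be illustrated in a few lines. The first part of this sketch shows what goes wrong without a raw string, using \b as the example. The second part builds a date pattern from small named pieces; the year, month, and day names and the YYYY-MM-DD format are assumptions chosen for illustration.

```python
import re

# Without the r prefix, "\b" is a single backspace character, not a word boundary.
plain_length = len("\b")  # 1
raw_length = len(r"\b")   # 2 (a backslash followed by the letter b)

sample = "cat concatenate cat."
without_raw = re.findall("\bcat\b", sample)  # looks for backspace + "cat" + backspace
with_raw = re.findall(r"\bcat\b", sample)    # looks for "cat" as a whole word

print(plain_length, raw_length)
print(without_raw)  # []
print(with_raw)     # ['cat', 'cat']

# Build a larger pattern from small pieces and check each piece on its own first.
year = r"\d{4}"
month = r"\d{2}"
day = r"\d{2}"
year_ok = re.fullmatch(year, "2024") is not None
month_ok = re.fullmatch(month, "03") is not None

# Combine the pieces only after each one behaves as expected.
date_pattern = re.compile(rf"{year}-{month}-{day}")
dates = date_pattern.findall("Released 2024-03-15, patched 2024-04-02")
print(dates)  # ['2024-03-15', '2024-04-02']
```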