Regular Expressions Demystified with the `re` Module

Regular expressions (regex) are a powerful tool for searching, matching, and manipulating text. Python's built-in `re` module lets you use them directly in your scripts. This tutorial walks through the basics and a few practical applications.

What Are Regular Expressions?

A regular expression is a sequence of characters that defines a search pattern. It can be used for string matching, finding patterns, and even text replacements.

Basics of the `re` Module

Common Functions in the `re` Module

Searching for Patterns:

import re

pattern = r"\d+"  # Matches one or more digits
match = re.search(pattern, "The order number is 12345")
if match:
    print("Found match:", match.group())
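
The object returned by re.search carries more than the matched text. The short sketch below reuses the pattern from the example above to show the match position, and what happens when nothing matches. The second input string is made up for illustration.

```python
import re

pattern = r"\d+"
text = "The order number is 12345"

match = re.search(pattern, text)
if match:
    # span() gives the (start, end) indices of the match within the text
    print(match.group(), match.span())  # 12345 (20, 25)

# When nothing matches, re.search returns None instead of raising an error,
# which is why the result is checked with "if match:" before use
no_match = re.search(pattern, "No digits here")
print(no_match)  # None
```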

Finding All Matches:

result = re.findall(r"\w+", "Hello, world! Python is great.")
print("All words:", result)  # Output: All words: ['Hello', 'world', 'Python', 'is', 'great']
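
Two related behaviours are worth knowing when collecting matches. The sketch below uses re.finditer to get positions along with the text, and shows that re.findall returns the captured groups when the pattern contains them. The "width=80 height=24" string is an invented sample.

```python
import re

text = "Hello, world! Python is great."

# finditer yields match objects, so each word's start index is available too
positions = [(m.group(), m.start()) for m in re.finditer(r"\w+", text)]
print(positions[:2])  # [('Hello', 0), ('world', 7)]

# With capture groups, findall returns the groups instead of the whole match
pairs = re.findall(r"(\w+)=(\d+)", "width=80 height=24")
print(pairs)  # [('width', '80'), ('height', '24')]
```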

Replacing Substrings:

modified_text = re.sub(r"Python", "JavaScript", "I love Python programming.")
print(modified_text)  # Output: I love JavaScript programming.
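
re.sub can do more than swap fixed strings. As a small sketch, the replacement below refers back to captured groups, and the optional count argument limits how many replacements are made. The sample strings are illustrative only.

```python
import re

# Swap "last, first" into "first last" using numbered group references
name = re.sub(r"(\w+), (\w+)", r"\2 \1", "Lovelace, Ada")
print(name)  # Ada Lovelace

# count=1 limits the replacement to the first occurrence
once = re.sub(r"a", "A", "banana", count=1)
print(once)  # bAnana
```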

Using Special Characters and Patterns

Anchors: ^ (start of string), $ (end of string)
Quantifiers: * (0 or more), + (1 or more), ? (0 or 1), {n} (exactly n times)
Character Classes: [a-z], \d (digit), \w (word character)
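
The following sketch exercises each of the three groups listed above on small made-up inputs, so the effect of every symbol can be checked by running it.

```python
import re

# Anchors: ^ and $ pin the pattern to the start and end of the string
exact = bool(re.search(r"^\d{4}$", "2024"))
embedded = bool(re.search(r"^\d{4}$", "year 2024"))
print(exact, embedded)  # True False

# Quantifiers: ? makes the preceding "u" optional
spellings = re.findall(r"colou?r", "color and colour")
print(spellings)  # ['color', 'colour']

# Character classes: [a-z]+ matches runs of lowercase letters only
letters = re.findall(r"[a-z]+", "abc123def")
print(letters)  # ['abc', 'def']
```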

Example: Validating Email Addresses

email_pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
email = "user@example.com"  # example address
if re.match(email_pattern, email):
    print("Valid email")
else:
    print("Invalid email")
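
One way to package this check is a small helper built on re.fullmatch, which requires the whole string to match and so makes the ^ and $ anchors unnecessary. The sketch below is an illustration, not a complete validator: the name is_valid_email and the test addresses are hypothetical, and the pattern is a simplification that will accept some malformed addresses and reject some legitimate ones.

```python
import re

# Same character sets as the pattern above, without the ^ and $ anchors.
# The hyphen sits at the end of the last class, which matches the same characters.
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")

def is_valid_email(address):
    """Return True if the whole string fits the simplified pattern (hypothetical helper)."""
    return EMAIL_RE.fullmatch(address) is not None

for candidate in ["user@example.com", "no-at-sign.example.com", "user@localhost"]:
    print(candidate, is_valid_email(candidate))
```

Only the first address passes: the second has no @ and the third has no dot after the domain name.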

Advanced Techniques

Compiling Patterns

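When the same pattern is used many times, compile it once with re.compile and reuse the resulting pattern object. This avoids looking the pattern up on every call and gives the pattern a descriptive name: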

compiled_pattern = re.compile(r"\d+")
matches = compiled_pattern.findall("Call me at 123-456-7890 or 987-654-3210")
print(matches)  # Output: ['123', '456', '7890', '987', '654', '3210']
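
A compiled pattern pays off most inside a loop. The sketch below compiles a stricter phone-number pattern once and applies it to several lines. The name phone_re, the pattern, and the sample lines are assumptions made for illustration.

```python
import re

# Compile once, then reuse the pattern object for every line
phone_re = re.compile(r"\d{3}-\d{3}-\d{4}")

lines = [
    "Call me at 123-456-7890",
    "No number on this line",
    "Office: 987-654-3210, fax: 555-000-1111",
]

numbers = []
for line in lines:
    # Pattern objects expose the same methods as the module-level functions
    numbers.extend(phone_re.findall(line))

print(numbers)  # ['123-456-7890', '987-654-3210', '555-000-1111']
```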

Common Pitfalls

Greedy vs. Non-Greedy Matching:

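text = "<tag>content</tag>"

# Greedy match: .* consumes as much as possible, up to the last ">"
print(re.findall(r"<.*>", text))  # Output: ['<tag>content</tag>']

# Non-greedy match: .*? stops at the first ">", so each tag is matched separately
print(re.findall(r"<.*?>", text))  # Output: ['<tag>', '</tag>']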
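
A negated character class is a common alternative to the lazy quantifier, since it states explicitly where the match must stop. The sketch below shows that approach on the same text, plus a capture group that pulls out only the content between the tags.

```python
import re

text = "<tag>content</tag>"

# [^>]* cannot run past a ">", so each tag is matched on its own
tags = re.findall(r"<[^>]*>", text)
print(tags)  # ['<tag>', '</tag>']

# A non-greedy capture group returns just the text between the tags
inner = re.findall(r"<tag>(.*?)</tag>", text)
print(inner)  # ['content']
```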

Best Practices

Use raw strings (r"pattern") to avoid issues with escape characters.
Break down complex patterns and test them incrementally.
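
Both practices can be illustrated briefly. In the sketch below, the re.VERBOSE flag lets a pattern be split across lines with comments, which makes it easier to build and test piece by piece, and the last lines show why raw strings matter. The date pattern and the sample sentence are made up for the example.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows inline comments
date_re = re.compile(
    r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
    """,
    re.VERBOSE,
)

m = date_re.search("Released on 2024-03-15.")
print(m.groupdict())  # {'year': '2024', 'month': '03', 'day': '15'}

# Raw strings keep the backslash: "\b" is a single backspace character,
# while r"\b" is the two-character regex token for a word boundary
print(len("\b"), len(r"\b"))  # 1 2
```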
