Regular Expressions Demystified with the re Module

Regular expressions (regex) are a compact way to describe text patterns, which makes them useful for searching, matching, and manipulating text. Python’s built-in re module lets you use them in your own scripts. This tutorial walks through the basics and some practical applications.

What Are Regular Expressions?

A regular expression is a sequence of characters that defines a search pattern. You can use one to check whether a string matches the pattern, to find every place where the pattern occurs, or to replace the matched text with something else.
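To make "a sequence of characters that defines a search pattern" concrete, here is a small sketch that builds one pattern from ordinary and special characters and applies it in the three ways just described. The pattern and the sample text are made up for illustration. The r prefix creates a raw string, so backslashes such as \d reach the regex engine unchanged.

```python
import re

# Literal characters match themselves; special sequences describe classes of text:
#   \d   any digit
#   {3}  exactly three of the preceding item
#   -    a literal hyphen (outside a character class)
phone_like = r"\d{3}-\d{4}"  # illustrative pattern, not a real phone-number rule

text = "Office: 555-0142, fax: 555-0199, room 12-3"

# Matching: the first place the pattern occurs
print(re.search(phone_like, text).group())   # 555-0142

# Finding: every place the pattern occurs ("12-3" has too few digits)
print(re.findall(phone_like, text))          # ['555-0142', '555-0199']

# Replacing: substitute every match
print(re.sub(phone_like, "XXX-XXXX", text))  # Office: XXX-XXXX, fax: XXX-XXXX, room 12-3
```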

Basics of the re Module

Common Functions in the re Module

  1. Searching for Patterns:
    import re
    
    pattern = r"\d+"  # Matches one or more digits
    match = re.search(pattern, "The order number is 12345")
    if match:
        print("Found match:", match.group())  # Output: Found match: 12345
    
  2. Finding All Matches:
    result = re.findall(r"\w+", "Hello, world! Python is great.")
    print("All words:", result)  # Output: All words: ['Hello', 'world', 'Python', 'is', 'great']
    
  3. Replacing Substrings:
    modified_text = re.sub(r"Python", "JavaScript", "I love Python programming.")
    print(modified_text)  # Output: I love JavaScript programming.
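The functions above cover most everyday use. The sketch below adds two closely related module functions, re.match and re.fullmatch, and shows what the match object returned by re.search carries besides group(). The sample string reuses the order-number example; the second pattern, with its parenthesized groups, is an illustrative assumption.

```python
import re

text = "The order number is 12345"

# re.search scans the whole string for the first match
m = re.search(r"\d+", text)
print(m.group(), m.start(), m.end(), m.span())  # 12345 20 25 (20, 25)

# re.match only tries at the very beginning of the string
print(re.match(r"\d+", text))          # None, because the string starts with "The"

# re.fullmatch requires the pattern to cover the entire string
print(re.fullmatch(r"\d+", "12345"))   # a match object
print(re.fullmatch(r"\d+", "12345a"))  # None

# Parentheses create groups that can be read back individually
m = re.search(r"(\w+) number is (\d+)", text)
print(m.group(1), m.group(2))          # order 12345
```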
    

Using Special Characters and Patterns

Example: Validating Email Addresses

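import re

# A simple shape check, not a full RFC 5322 email validator
email_pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"
email = "user@example.com"  # sample address
# fullmatch requires the whole string to match, so no ^ or $ anchors are needed
if re.fullmatch(email_pattern, email):
    print("Valid email")
else:
    print("Invalid email")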
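The email pattern packs several special characters into a single line. One way to make it easier to read is the re.VERBOSE flag, which ignores unescaped whitespace in the pattern and allows # comments. The sketch below is an annotated version of the same pattern, with the hyphen placed last in each character class so it is unambiguously a literal hyphen. The helper name is_valid_email and the sample addresses are hypothetical, and like the original this is a rough shape check rather than a complete validator.

```python
import re

EMAIL_RE = re.compile(
    r"""
    [a-zA-Z0-9_.+-]+   # local part: letters, digits, and _ . + -
    @                  # a literal @
    [a-zA-Z0-9-]+      # first domain label
    \.                 # an escaped (literal) dot
    [a-zA-Z0-9.-]+     # rest of the domain, e.g. 'com' or 'co.uk'
    """,
    re.VERBOSE,
)

def is_valid_email(address: str) -> bool:
    """Hypothetical helper: True if the whole string has the basic email shape."""
    # fullmatch anchors at both ends of the string
    return EMAIL_RE.fullmatch(address) is not None

for candidate in ["user@example.com", "first.last+tag@mail.example.co.uk",
                  "no-at-sign.example.com", "user@localhost"]:
    print(candidate, is_valid_email(candidate))
# True, True, False (no @), False (no dot after the first domain label)
```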

Advanced Techniques

Compiling Patterns

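When you use the same pattern many times, you can compile it once with re.compile() and call methods on the resulting pattern object. The re module already caches recently used patterns, so the speed gain is often modest; the main benefits are reuse and readability: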

compiled_pattern = re.compile(r"\d+")
matches = compiled_pattern.findall("Call me at 123-456-7890 or 987-654-3210")
print(matches)  # Output: ['123', '456', '7890', '987', '654', '3210']
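A compiled pattern object offers the same methods as the module-level functions (search, match, fullmatch, findall, finditer, sub, and others). Because \d+ splits each phone number at the hyphens, a more specific pattern is usually what you want here. The sketch below assumes US-style numbers in the 123-456-7890 format; the name PHONE_RE is hypothetical.

```python
import re

# Compile once and reuse. Groups capture the three parts of each number.
PHONE_RE = re.compile(r"(\d{3})-(\d{3})-(\d{4})")

text = "Call me at 123-456-7890 or 987-654-3210"

# With groups, findall returns one tuple of groups per match
print(PHONE_RE.findall(text))  # [('123', '456', '7890'), ('987', '654', '3210')]

# finditer yields match objects, which also carry positions
for m in PHONE_RE.finditer(text):
    print(m.group(0), "area code:", m.group(1), "at", m.span())

# The same object can rewrite text; \1, \2, \3 refer back to the groups
print(PHONE_RE.sub(r"(\1) \2-\3", text))
# Call me at (123) 456-7890 or (987) 654-3210
```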

Common Pitfalls

Best Practices