Pattern Matching with Regular Expressions
(\s*(ext|x|ext\.)\s*\d{2,5})? # extension
)''', re.VERBOSE)
Note how the previous example uses the triple-quote syntax (''') to create a multiline string so that you can spread the regular expression definition over many lines, making it much more legible.
The comment rules inside the regular expression string are the same as those for regular Python code: the # symbol (outside a character class) and everything after it to the end of the line are ignored. Also, whitespace inside the multiline string (spaces, tabs, and newlines) is not considered part of the text pattern to be matched, unless it is escaped with a backslash or placed inside a character class. This lets you organize the regular expression so it's easier to read.
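As a small illustration of these rules, the following sketch compiles a simplified phone-number pattern in verbose mode. The names phone_regex and spaced are illustrative, and the pattern is a reduced stand-in rather than the full one from the previous example. The last two lines show a consequence of the whitespace rule: an unescaped space is dropped from the pattern, so a literal space has to be escaped.

```python
import re

# A simplified phone-number pattern written in verbose mode. Whitespace and
# '#' comments outside character classes are ignored when the pattern is compiled.
phone_regex = re.compile(r'''
    (\d{3})      # area code
    [-.\s]?      # optional separator: dash, dot, or whitespace
    (\d{3})      # first three digits
    [-.\s]?      # optional separator
    (\d{4})      # last four digits
''', re.VERBOSE)

mo = phone_regex.search('Call 415-555-1234 today.')
print(mo.group())   # 415-555-1234
print(mo.groups())  # ('415', '555', '1234')

# In verbose mode an unescaped space is ignored, so a literal space
# must be escaped (or written inside a character class such as [ ]).
spaced = re.compile(r'hello\ world', re.VERBOSE)
print(spaced.search('hello world') is not None)  # True
print(re.compile(r'hello world', re.VERBOSE).search('hello world'))  # None
```

The unescaped version compiles to the pattern helloworld, which is why it fails to find 'hello world'.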
Combining re.IGNORECASE, re.DOTALL, and re.VERBOSE
What if you want to use re.VERBOSE to write comments in your regular expression but also want to use re.IGNORECASE to ignore capitalization?
Unfortunately, the re.compile() function takes only a single value as its second argument. You can get around this limitation by combining the re.IGNORECASE, re.DOTALL, and re.VERBOSE flags using the pipe character (|), which in this context is known as the bitwise or operator. The result is still a single value, but it carries all of the combined options.
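A minimal sketch of what the bitwise or produces: the combined value is a single integer-like flag, and a bitwise and (&) can confirm which options it contains. Here re.MULTILINE appears only as an example of a flag that was left out, and the names combined and pattern are illustrative.

```python
import re

# OR-ing the flags yields one combined value, which re.compile() accepts
# as its single second argument.
combined = re.IGNORECASE | re.DOTALL | re.VERBOSE
print(int(combined))  # 82  (2 + 16 + 64)

# A bitwise AND tests whether a particular flag is part of the combination.
print(bool(combined & re.IGNORECASE))  # True
print(bool(combined & re.MULTILINE))   # False: not included above

# The compiled pattern records its flags; for str patterns Python also
# adds re.UNICODE by default, so test with & rather than ==.
pattern = re.compile('foo', combined)
print(bool(pattern.flags & re.DOTALL))  # True
```

Each flag occupies its own bit, which is why or-ing them never loses information and why & can pull an individual option back out.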
So if you want a regular expression that's case-insensitive and whose dot character also matches newlines, you would form your re.compile() call like this:
>>> someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL)
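To see what these two flags change, here is a sketch that swaps 'foo' for the illustrative pattern 'foo.bar', so that the dot has something to do. The sample text is likewise made up for the demonstration.

```python
import re

# 'foo.bar' is an illustrative pattern: the dot gives re.DOTALL something to do.
someRegexValue = re.compile('foo.bar', re.IGNORECASE | re.DOTALL)

text = 'FOO\nBAR'
mo = someRegexValue.search(text)
print(mo.span())  # (0, 7): matched despite the capitals and the newline

# Without re.DOTALL, the dot will not match '\n', so there is no match.
print(re.compile('foo.bar', re.IGNORECASE).search(text))  # None
```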
Including all three options in the second argument looks like this:
>>> someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)
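With re.VERBOSE added, the same kind of pattern can carry comments while staying case-insensitive and letting the dot match newlines. The sketch below again uses the illustrative pattern foo.bar and a few made-up sample strings.

```python
import re

# All three flags combined: a commented, case-insensitive pattern whose
# dot also matches newlines.
someRegexValue = re.compile(r'''
    foo    # "foo" in any capitalization (re.IGNORECASE)
    .      # any one character, newline included (re.DOTALL)
    bar    # "bar" in any capitalization
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)

samples = ['foo-bar', 'FOO\nBAR', 'Foo Bar', 'foobar']
results = {s: someRegexValue.search(s) is not None for s in samples}
print(results)
```

Only 'foobar' fails, because the pattern requires exactly one character between foo and bar. 'Foo Bar' matches because its space is in the searched text, where the dot can match it; it is only spaces inside the pattern that verbose mode ignores.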
This syntax is a little old-fashioned and originates from early versions of Python. The details of the bitwise operators are beyond the scope of this book, but check out the resources at https://nostarch.com/automatestuff2/ for more information. You can also pass other options for the second argument; they're uncommon, but you can read more about them in the resources, too.