Pattern Matching with Regular Expressions
>>> mo1.group()
'Batman'
>>> mo2 = batRegex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'
The (wo)? part of the regular expression means that the pattern wo is an optional group. The regex will match text that has zero instances or one instance of wo in it. This is why the regex matches both 'Batwoman' and 'Batman'.
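To make "zero or one" concrete, here is a minimal sketch that runs the same pattern over three titles. The helper findBat() is hypothetical and exists only for this illustration. Because (wo)? allows at most one wo, a title containing two repetitions finds no match at all, and search() returns None.

```python
import re

# Same pattern as in the shell session: 'wo' may appear zero times or once.
batRegex = re.compile(r'Bat(wo)?man')

def findBat(text):
    """Hypothetical helper: return the matched text, or None if not found."""
    mo = batRegex.search(text)
    return mo.group() if mo is not None else None

print(findBat('The Adventures of Batman'))      # Batman
print(findBat('The Adventures of Batwoman'))    # Batwoman
print(findBat('The Adventures of Batwowoman'))  # None: two 'wo's is too many
```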
Using the earlier phone number example, you can make the regex look for phone numbers that do or do not have an area code. Enter the following into the interactive shell:
>>> phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
>>> mo1 = phoneRegex.search('My number is 415-555-4242')
>>> mo1.group()
'415-555-4242'
>>> mo2 = phoneRegex.search('My number is 555-4242')
>>> mo2.group()
'555-4242'
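When an optional group does not take part in a match, the match object still has that group, but its value is None. The sketch below assumes a hypothetical helper, areaCode(), that returns group 1 so you can tell the two phone numbers apart.

```python
import re

# Same pattern as above: the area code group (\d\d\d-) is optional.
phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

def areaCode(text):
    """Hypothetical helper: return the area code group (e.g. '415-').

    Returns None if the area code is absent or no phone number is found.
    """
    mo = phoneRegex.search(text)
    if mo is None:
        return None
    # group(1) is None when the optional group did not participate in the match.
    return mo.group(1)

print(areaCode('My number is 415-555-4242'))  # 415-
print(areaCode('My number is 555-4242'))      # None
```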
You can think of the ? as saying, “Match zero or one of the group preceding this question mark.” If you need to match an actual question mark character, escape it with \?.
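As a quick check of the difference, the sketch below compares an unescaped and an escaped question mark. Without parentheses, ? applies only to the single character before it, so r'Batman?' makes just the final n optional. The variable names optionalN and literalQ are illustrative, not from the text.

```python
import re

# Unescaped: ? makes the preceding character 'n' optional.
optionalN = re.compile(r'Batman?')
# Escaped: \? matches a literal question mark character.
literalQ = re.compile(r'Batman\?')

print(optionalN.search('Who is Batma').group())   # Batma
print(literalQ.search('Who is Batman?').group())  # Batman?
print(literalQ.search('Who is Batman'))           # None: no '?' in the text
```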
Matching Zero or More with the Star
The * (called the star or asterisk) means “match zero or more”: the group that precedes the star can occur any number of times in the text. It can be completely absent or repeated over and over again. Let’s look at the Batman example again.
>>> batRegex = re.compile(r'Bat(wo)*man')
>>> mo1 = batRegex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'
>>> mo2 = batRegex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'
>>> mo3 = batRegex.search('The Adventures of Batwowowowoman')
>>> mo3.group()
'Batwowowowoman'
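For 'Batman', the (wo)* part of the regex matches zero instances of wo in the string; for 'Batwoman', the (wo)* matches one instance of wo; and for 'Batwowowowoman', (wo)* matches four instances of wo.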
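The sketch below repeats the star examples as a script and also prints group(1). It shows a detail that is easy to miss: a repeated group keeps only its last repetition ('wo'), and it is None when the group matched zero times. The helper countWo() is hypothetical; it infers the number of repetitions from the length of the whole match rather than from the group.

```python
import re

batRegex = re.compile(r'Bat(wo)*man')

def countWo(text):
    """Hypothetical helper: how many times (wo) repeated, or None if no match."""
    mo = batRegex.search(text)
    if mo is None:
        return None
    # The match is 'Bat' + 'wo' * n + 'man', so n follows from its length.
    return (len(mo.group()) - len('Batman')) // len('wo')

for text in ['The Adventures of Batman',
             'The Adventures of Batwoman',
             'The Adventures of Batwowowowoman']:
    mo = batRegex.search(text)
    # group(1) holds only the LAST repetition of (wo), or None if there were none.
    print(mo.group(), countWo(text), mo.group(1))
# Batman 0 None
# Batwoman 1 wo
# Batwowowowoman 4 wo
```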