Enter the following into the interactive shell:
>>> haRegex = re.compile(r'(Ha){3}')
>>> mo1 = haRegex.search('HaHaHa')
>>> mo1.group()
'HaHaHa'

>>> mo2 = haRegex.search('Ha')
>>> mo2 == None
True
Here, (Ha){3} matches 'HaHaHa' but not 'Ha'. Since it doesn’t match 'Ha', search() returns None.
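Because search() returns None when nothing matches, calling group() directly on that result would raise an AttributeError. The following is a minimal sketch of checking for None first; the helper name first_match is hypothetical and not part of the re module.

```python
import re

ha_regex = re.compile(r'(Ha){3}')

def first_match(pattern, text):
    """Return the matched text, or None if the pattern is not found.

    first_match is an illustrative helper, not a function from re.
    """
    mo = pattern.search(text)
    if mo is None:
        # No match: there is no Match object to call group() on.
        return None
    return mo.group()

print(first_match(ha_regex, 'HaHaHa'))  # the three repetitions match
print(first_match(ha_regex, 'Ha'))      # too few repetitions, so None
```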
Greedy and Non-greedy Matching

Since (Ha){3,5} can match three, four, or five instances of Ha in the string 'HaHaHaHaHa', you may wonder why the Match object’s call to group() in the previous brace example returns 'HaHaHaHaHa' instead of the shorter possibilities. After all, 'HaHaHa' and 'HaHaHaHa' are also valid matches of the regular expression (Ha){3,5}.
Python’s regular expressions are greedy by default, which means that in ambiguous situations they will match the longest string possible. The non-greedy (also called lazy) version of the braces, which matches the shortest string possible, has the closing brace followed by a question mark.
Enter the following into the interactive shell, and notice the difference between the greedy and non-greedy forms of the braces searching the same string:
>>> greedyHaRegex = re.compile(r'(Ha){3,5}')
>>> mo1 = greedyHaRegex.search('HaHaHaHaHa')
>>> mo1.group()
'HaHaHaHaHa'

>>> nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
>>> mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
>>> mo2.group()
'HaHaHa'
Note that the question mark can have two meanings in regular expressions: declaring a non-greedy match or flagging an optional group. These meanings are entirely unrelated.
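The two roles of the question mark can be seen side by side in a short script. This is only a sketch: the Bat(wo)?man pattern and the variable names are chosen for illustration.

```python
import re

# Directly after a group, ? marks the group as optional
# (it may appear zero times or one time).
optional_regex = re.compile(r'Bat(wo)?man')
without_group = optional_regex.search('Batman').group()
with_group = optional_regex.search('Batwoman').group()
print(without_group)  # 'Batman'
print(with_group)     # 'Batwoman'

# Directly after a quantifier such as {3,5}, ? makes that
# quantifier non-greedy, so it stops at the shortest valid match.
lazy_regex = re.compile(r'(Ha){3,5}?')
lazy_match = lazy_regex.search('HaHaHaHaHa').group()
print(lazy_match)     # 'HaHaHa'
```

Which meaning applies depends only on what the question mark follows: a group or character gives the optional meaning, while a quantifier gives the non-greedy meaning.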