7 Pattern Matching with Regular Expressions


Matching Newlines with the Dot Character



The dot-star will match everything except a newline. By passing re.DOTALL as the second argument to re.compile(), you can make the dot character match all characters, including the newline character.
Enter the following into the interactive shell:
>>> import re
>>> noNewlineRegex = re.compile('.*')
>>> noNewlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
'Serve the public trust.'
>>> newlineRegex = re.compile('.*', re.DOTALL)
>>> newlineRegex.search('Serve the public trust.\nProtect the innocent.\nUphold the law.').group()
'Serve the public trust.\nProtect the innocent.\nUphold the law.'
The regex noNewlineRegex, which did not have re.DOTALL passed to the re.compile() call that created it, matches everything only up to the first newline character, whereas newlineRegex, which did have re.DOTALL passed to re.compile(), matches everything. This is why the newlineRegex.search() call matches the full string, including its newline characters.
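The same flag is handy when a match has to span several lines but should still stop at the first closing marker. The following is a minimal sketch, not an example from this chapter: the BEGIN and END markers, the sample text, and the variable names are all made up for illustration. It pairs re.DOTALL with the non-greedy .*? so that each marked block is captured on its own.

```python
import re

# Illustrative text (made up): two blocks delimited by BEGIN and END lines.
text = 'intro\nBEGIN\nline one\nline two\nEND\nmiddle\nBEGIN\nline three\nEND\n'

# Without re.DOTALL, '.' cannot match '\n', so only a block whose body
# fits on a single line can be found.
singleLineRegex = re.compile(r'BEGIN\n(.*?)\nEND')
print(singleLineRegex.findall(text))  # ['line three']

# With re.DOTALL, '.' also matches '\n'; the non-greedy '.*?' stops at the
# first END it reaches, so each block is captured separately.
blockRegex = re.compile(r'BEGIN\n(.*?)\nEND', re.DOTALL)
print(blockRegex.findall(text))  # ['line one\nline two', 'line three']

# A greedy '.*' with re.DOTALL runs on to the last END in the string.
greedyRegex = re.compile(r'BEGIN\n(.*)\nEND', re.DOTALL)
print(greedyRegex.findall(text))  # ['line one\nline two\nEND\nmiddle\nBEGIN\nline three']
```

Notice that without re.DOTALL only the one-line block is found, and that the greedy version swallows everything between the first BEGIN and the last END.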


Review of Regex Symbols
This chapter covered a lot of notation, so here’s a quick review of what you 
learned about basic regular expression syntax:
• The ? matches zero or one of the preceding group.
• The * matches zero or more of the preceding group.
• The + matches one or more of the preceding group.
• The {n} matches exactly n of the preceding group.
• The {n,} matches n or more of the preceding group.
• The {,m} matches 0 to m of the preceding group.
• The {n,m} matches at least n and at most m of the preceding group.
• {n,m}? or *? or +? performs a non-greedy match of the preceding group.
• ^spam means the string must begin with spam.
• spam$ means the string must end with spam.
• The . matches any character, except newline characters (unless re.DOTALL is passed to re.compile()).
• \d, \w, and \s match a digit, word, or space character, respectively.
• \D, \W, and \S match anything except a digit, word, or space character, respectively.
• [abc] matches any character between the brackets (such as a, b, or c).
• [^abc] matches any character that isn’t between the brackets.
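To see the quantifiers from this list side by side, you can fully match each pattern against the string 'Ha' repeated 0 to 6 times and record which repetition counts succeed. This is a sketch for experimenting, not code from the chapter; matches_n_times() is a hypothetical helper written for this example, built on re.fullmatch(), which succeeds only when the whole string matches.

```python
import re

def matches_n_times(pattern, n):
    """Hypothetical helper: True if pattern matches exactly 'Ha' * n, nothing more."""
    return re.fullmatch(pattern, 'Ha' * n) is not None

# Each quantifier applies to the preceding group (Ha).
for pattern in ['(Ha)?', '(Ha)*', '(Ha)+', '(Ha){3}', '(Ha){3,}', '(Ha){,3}', '(Ha){3,5}']:
    counts = [n for n in range(7) if matches_n_times(pattern, n)]
    print(pattern, counts)

# Greedy vs. non-greedy: both patterns allow 3 to 5 repetitions,
# but the trailing ? makes the second one stop as early as it can.
greedyHaRegex = re.compile(r'(Ha){3,5}')
nongreedyHaRegex = re.compile(r'(Ha){3,5}?')
print(greedyHaRegex.search('HaHaHaHaHa').group())     # HaHaHaHaHa
print(nongreedyHaRegex.search('HaHaHaHaHa').group())  # HaHaHa
```

The loop prints [0, 1] for (Ha)?, [3] for (Ha){3}, [0, 1, 2, 3] for (Ha){,3}, and [3, 4, 5] for (Ha){3,5}; (Ha)* and (Ha)+ differ only in whether 0 is included.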
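The anchors and character classes in this list can be checked the same way. This minimal sketch uses made-up sample strings; re.findall() returns every non-overlapping match as a list of strings when the pattern has no groups.

```python
import re

# ^ ties a match to the start of the string, $ to the end.
beginsWithSpam = re.compile(r'^spam')
endsWithSpam = re.compile(r'spam$')
print(beginsWithSpam.search('spam and eggs') is not None)  # True
print(beginsWithSpam.search('eggs and spam') is not None)  # False
print(endsWithSpam.search('eggs and spam') is not None)    # True

# Shorthand character classes and their uppercase (negated) forms.
sample = 'R2-D2 has 3 legs'
print(re.findall(r'\d', sample))           # ['2', '2', '3']
print(''.join(re.findall(r'\D', sample)))  # R-D has  legs
print(re.findall(r'\w+', sample))          # ['R2', 'D2', 'has', '3', 'legs']
print(re.findall(r'\W', sample))           # ['-', ' ', ' ', ' ']

# [abc] matches one of the listed characters; [^abc] matches any other.
print(re.findall(r'[abc]', 'a cab for bob'))  # ['a', 'c', 'a', 'b', 'b', 'b']
print(re.findall(r'[^abc]', 'cab123'))        # ['1', '2', '3']
```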
