As you know, regular expressions are very important in text processing. I had never paid attention to line breaks in a regular expression until I matched a multi-line pattern against a string and found that the line breaks in both the pattern and the subject string affect the match result. The details are below.
Take PCRE (Perl-Compatible Regular Expressions) as an example, and look closely at two pattern modifiers. Below is the explanation from the PHP manual for the PCRE_MULTILINE and PCRE_DOTALL modifiers:
- m (PCRE_MULTILINE)
- By default, PCRE treats the subject string as consisting of a single “line” of characters (even if it actually contains several newlines). The “start of line” metacharacter (^) matches only at the start of the string, while the “end of line” metacharacter ($) matches only at the end of the string, or before a terminating newline (unless D modifier is set). This is the same as Perl. When this modifier is set, the “start of line” and “end of line” constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl’s /m modifier. If there are no “\n” characters in a subject string, or no occurrences of ^ or $ in a pattern, setting this modifier has no effect.
- s (PCRE_DOTALL)
- If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl’s /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
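To make the two modifiers concrete, here is a minimal sketch. The subject string and variable names are made up for illustration; only preg_match and preg_match_all come from PHP itself.

```php
<?php
// Illustrative two-line subject (not from the original data).
$subject = "first line\nsecond line";

// Without m, ^ matches only at the very start of the subject.
$countNoM = preg_match_all('/^\w+/', $subject, $withoutM);

// With m, ^ also matches immediately after each newline.
$countWithM = preg_match_all('/^\w+/m', $subject, $withM);

// Without s, the dot does not match a newline, so .* cannot cross lines.
$dotNoS = preg_match('/first.*second/', $subject);

// With s, the dot matches newlines too, so the pattern spans both lines.
$dotWithS = preg_match('/first.*second/s', $subject);

var_dump($countNoM, $countWithM, $dotNoS, $dotWithS);
```

With this subject, the m modifier changes how many times ^ can match, and the s modifier decides whether .* can run past the end of the first line.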
So a regular expression can match a multi-line string, and the match is affected by the m and s pattern modifiers. We do not need to spread the regular expression itself over multiple lines like this, although it is syntactically valid:
$pattern = "/\| Domain: .*$domainstr.*
\| Expired domain: .*
\| Ip: \d+\.\d+\.\d+\.\d+
\| UserName: .*
\| PassWord: .*
/";
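However, the line breaks inside that pattern are literal characters, so whether they are \n or \r\n depends on how the source file was saved. We can instead write the pattern on a single line and spell out the line breaks explicitly:
$pattern = "/\| Domain: .*$domainstr.*(\r)?\n(\| Expired domain: .*?(\r)?\n)?\| Ip: \d+\.\d+\.\d+\.\d+(\s\(n\))?(\r)?\n(\| HasCgi: y(\r)?\n)?\| UserName: .*(\r)?\n\| PassWord: .*/i";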
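The effect of the optional \r can be checked with a cut-down version of the pattern. This is only a sketch: the two sample records and the names $strict and $tolerant are hypothetical, and the pattern keeps just the Ip and UserName lines.

```php
<?php
// Hypothetical records, identical except for the line ending.
$lf   = "| Ip: 10.0.0.1\n| UserName: alice";
$crlf = "| Ip: 10.0.0.1\r\n| UserName: alice";

// Equivalent to a pattern written across two lines in a file saved with LF endings.
$strict = '/\| Ip: \d+\.\d+\.\d+\.\d+\n\| UserName: .*/';

// Same pattern with an optional carriage return before the newline.
$tolerant = '/\| Ip: \d+\.\d+\.\d+\.\d+\r?\n\| UserName: .*/';

$strictLf     = preg_match($strict, $lf);
$strictCrlf   = preg_match($strict, $crlf);   // \d+ cannot consume the \r, so this fails
$tolerantLf   = preg_match($tolerant, $lf);
$tolerantCrlf = preg_match($tolerant, $crlf);

var_dump($strictLf, $strictCrlf, $tolerantLf, $tolerantCrlf);
```

Single quotes are used here so that PHP passes \r and \n through to PCRE unchanged. The original pattern needs double quotes because it interpolates $domainstr.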
The line break sequence depends on the system (\n on Unix-like systems, \r\n on Windows), so it affects the match result. We need to pay attention to it when defining a regular expression.
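One alternative to writing (\r)?\n everywhere, which the text above does not use, is PCRE's \R escape. It matches any newline sequence, including \r\n as a single unit. The sketch below assumes the same hypothetical record layout as before. It also shows normalizing the subject's line endings first, which is another common option.

```php
<?php
// \R stands for any newline sequence: \n, \r\n, \r, and a few others.
$pattern = '/\| Ip: \d+\.\d+\.\d+\.\d+\R\| UserName: .*/';

// Hypothetical records with Unix, Windows, and old Mac line endings.
$matchLf   = preg_match($pattern, "| Ip: 10.0.0.1\n| UserName: alice");
$matchCrlf = preg_match($pattern, "| Ip: 10.0.0.1\r\n| UserName: alice");
$matchCr   = preg_match($pattern, "| Ip: 10.0.0.1\r| UserName: alice");

// Alternatively, convert every newline sequence in the subject to \n up front.
$normalized = preg_replace('/\R/', "\n", "a\r\nb\rc\nd");

var_dump($matchLf, $matchCrlf, $matchCr, $normalized);
```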