71F00
## REGULAR EXPRESSIONS
In the 1940s, Warren McCulloch and Walter Pitts modelled neurons as finite automata to describe the nervous system. In 1956, Steve Kleene invented a mathematical abstraction called regular sets to describe these models.
A regular expression is notation for specifying a set of strings. Since the set might contain infinitely many members, we can't simply enumerate them.
There are four operations for creating regular expressions
1. Kleene closure (replication) *
2. Alternation |
3. Concatenation none
4. Grouping ()
The table below illustrates them by example.
Operation Regular Expression Yes No
..................................................................
Replication ab*a aa ab
(Kleene closure) aba
abba
Logical OR aa | baab aa every other string
(Alternation) baab
Concatenation aabaab aabaab every other string
Grouping a(a|b)aab aaaab every other string
abaab
The * operator has the highest precedence, then | , then concatenation. If we want to specify the set of strings a, aba, ababa, abababa, and so forth, we must write (ab)*a to indicate that the ab pattern must be replicated together.
Robert Sedgewick, Kevin Wayne, Princeton University, 2004
-----------------------------------------------------------------------
Useful extensions:
. matches any single character
$ matches the end of the line
^ matches the beginning of the line
[Ci-Cj] matches character between Ci and Cj
? matches the preceding character 0 or 1 times only
* matches the preceding character 0 or more times
+ matches the previous character 1 or more times
{n} matches the preceding character n times exactly
{n,m} matches the preceding character at least n times
but not more than m times
\ quotation switches between ordinary and extended alphabet
(metasymbols). for example \$ means dollar but not the end
of the line.