07-21-2016, 07:21 PM

Dean Roddey Wrote (http://www.charmedquark.com/vb_forum/sho...ostcount=4): In a nutshell, it comes down to a sequence of characters that describes a pattern to match. It's composed of one or more of a set of components:

1. Characters. Just regular characters to be matched

ABC means match A, then B, then C

2. Any single character is represented by a period, so:

A.C will match A1C, A2C, A4C, AxC and so forth

3. One of a set of things, divided by a | character, so:

A|B will match a single A or single B.

(A|B)(C|D) will match AC, AD, BC, or BD

4. Repetition indicators. ?=One or zero, *=Zero or more, +=One or more

AB?C will match ABC or AC, since one or zero Bs can be between them

AB+C will match ABC, ABBC, ABBBBBBC, and so on, because one or more Bs are legal

AB*C will match AC, ABC, ABBBBC, and so on, because zero or more Bs are legal.

5. Range indicators, using []

[A-Z] will match any letter from capital A to capital Z

[0-9] will match any digit

[123] will match a one, two or three (so it's kind of like (1|2|3) really, but just demonstrating that it's not necessarily a range like the first two.)
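If you want to try the five building blocks above yourself, here is a minimal sketch in Python. Python's re module is my assumption for illustration only; it is not the engine this post was written about, though these basic constructs behave the same way there. The matches helper and the checks list are made-up names. I use re.fullmatch so that "match" means the whole string has to fit the pattern.

```python
import re

def matches(pattern, text):
    # True only when the whole string fits the pattern, not just part of it.
    return re.fullmatch(pattern, text) is not None

# (pattern, text) pairs, one or two per component from the list above.
checks = [
    ("ABC", "ABC"),         # 1. plain characters
    ("A.C", "AxC"),         # 2. period stands for any single character
    ("A.C", "A2"),          # 2. too short: nothing left to match the C
    ("A|B", "B"),           # 3. either alternative
    ("(A|B)(C|D)", "BD"),   # 3. one from each group
    ("AB?C", "AC"),         # 4. ? allows zero or one B
    ("AB+C", "AC"),         # 4. + needs at least one B
    ("AB*C", "ABBBBC"),     # 4. * allows zero or more Bs
    ("[A-Z]", "Q"),         # 5. a range
    ("[123]", "4"),         # 5. 4 is not in the set
]

for pattern, text in checks:
    print(f"{pattern!r:14} vs {text!r:10} -> {matches(pattern, text)}")
```

The pairs commented as too short, needing at least one B, or not in the set are the ones that should come back False.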

You can mix these things together in many ways, particularly by using parentheses, because a repetition applies to the whole previous parenthesized section if there is one, or to the previous range.

(A|B)+CD will match ACD, BCD, ABCD, AAAABBBBCD and so forth, because it's one or more As or Bs followed by CD.

One|Two will match either One or Two.

(Two|Four) (Rabbits|Foxes) will match "Two Rabbits", "Four Rabbits", "Two Foxes" or "Four Foxes".

[0-9]* will match any number of digits (including zero digits)

[0-9]+Db will match 0Db, 0132Db and so forth, i.e. one or more digits followed by Db.

(1|2)[5-9][A-F] would match 15A, 29C, and so forth. It's 1 or 2, followed by one digit from 5 to 9, followed by one character from A to F.

[^ACD]+ means one or more characters, none of which can be A, C, or D.

A.*B would mean A followed by zero or more of any character, followed by a B.
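The combined patterns above can be checked the same way. This is a rough sketch, again assuming Python's re module rather than the engine being discussed. The samples table and the matching helper are hypothetical names, and the non-matching strings are ones I added to show where each pattern stops.

```python
import re

# Each pattern from the examples above, with a few strings to try against it.
samples = {
    "(A|B)+CD": ["ACD", "ABCD", "AAAABBBBCD", "CD"],
    "(Two|Four) (Rabbits|Foxes)": ["Two Rabbits", "Four Foxes", "Two Four"],
    "[0-9]*": ["", "2016", "20x16"],
    "[0-9]+Db": ["0Db", "0132Db", "Db"],
    "(1|2)[5-9][A-F]": ["15A", "29C", "34A"],
    "[^ACD]+": ["BEF", "BAF"],
    "A.*B": ["AB", "A123B", "A123"],
}

def matching(pattern, texts):
    # Keep only the texts that the pattern matches from start to end.
    return [t for t in texts if re.fullmatch(pattern, t)]

for pattern, texts in samples.items():
    print(pattern, "->", matching(pattern, texts))
```

In each list the last string is the one meant to fail: "CD" has no leading A or B, "Db" has no digit, "BAF" contains an A, and so on.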

Of course you have to escape the magic characters to get them to be treated as regular characters. So:

\[\(\* would match [(* because they are all escaped. \\ represents a single \ character, since you have to escape the escape character as well.
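A small sketch of the escaping rule, assuming Python's re module (my choice for illustration, not the engine from this thread). One Python-specific wrinkle: backslashes also mean something inside ordinary Python strings, so the patterns are written as raw strings (r"...") to keep them exactly as typed.

```python
import re

# Raw string, so Python passes the backslashes through to the regex untouched.
pattern = r"\[\(\*"
print(re.fullmatch(pattern, "[(*") is not None)

# A regex of two backslashes matches one literal backslash.
backslash = r"\\"
one_backslash = "\\"   # in Python source this is a single character
print(len(one_backslash), re.fullmatch(backslash, one_backslash) is not None)

# re.escape adds the backslashes for you when the text comes from elsewhere.
escaped = re.escape("[(*")
print(escaped)
```

re.escape is handy when the text to match comes from user input and you don't want to escape the magic characters by hand.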

That's it in a nutshell. There are a few more aspects to it, but that's the most important stuff. You can write quite elaborate ones, but they can be hard to prove correct just by looking at them.

And you cannot use regular expressions to count things. So there's no regular expression that says "four A or B characters". You would have to just do:

(A|B)(A|B)(A|B)(A|B)

And explicitly indicate all four options.
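If typing the same group four times gets tedious, the pattern text can be generated instead. A minimal sketch, assuming you are building the pattern in Python before handing it to whatever engine you use. The names one, four and matches are made up for the example, and Python's re module is only there to check the result.

```python
import re

one = "(A|B)"
four = one * 4   # string repetition: "(A|B)(A|B)(A|B)(A|B)"

def matches(pattern, text):
    # True only when the whole string fits the pattern.
    return re.fullmatch(pattern, text) is not None

print(four)
print(matches(four, "ABBA"))    # exactly four characters, each A or B
print(matches(four, "ABB"))     # only three
print(matches(four, "ABBAB"))   # five
```

The resulting pattern is exactly the spelled-out one above, so only the four-character string should match.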

Mykel Koblenz

Illawarra Smart Home
