Regular expressions – practical examples to get you started
There are many posts already out there about Regular Expressions, but I’ve done a few blog posts which use regular expressions and thought it wouldn’t hurt to do yet another post on regular expressions for CF developers who haven’t yet tried them.
Let’s start off with which functions we have available to use in CFML:
– ReFind
– ReFindNoCase
– ReReplace
– ReReplaceNoCase
– ReMatch
– ReMatchNoCase
Essentially, there are find, replace and match functions, each with and without case sensitivity. For this post I’m going to be using ReReplace.
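Before focusing on ReReplace, here is a quick sketch of what the find and match variants return, using plain literal patterns only. The input string is made up for illustration, and the snippet assumes it runs inside a script context in the same way as the figures in this post.

```cfml
s = "Example 1, Example 2";

// reFind returns the position of the first match, or 0 when nothing matches
writeOutput(reFind("Example", s)); // 1
writeOutput(reFind("example", s)); // 0 (case-sensitive)

// the NoCase variant ignores letter case
writeOutput(reFindNoCase("example", s)); // 1

// reMatch returns an array of every match it finds
matches = reMatch("Example", s);
writeOutput(arrayLen(matches)); // 2
```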
Let’s start off with a simple replace. Given the input string "Example 1", I want to replace the "1" with a "!". This can be done using a straightforward replace function.
    function figure1(s) {
        // replace '1' with '!'
        return s.replace("1", "!");
    }

    result = figure1("Example 1");
    writeoutput(result); // Example !
This works great, but it’ll only work with "Example 1". If we had "Example 2" we’d have to write another replace statement. This is where regular expressions come in handy.
With a regular expression you can choose to match based on a pattern. For example, I could match the "1" or "2" by using a bracket expression as shown below.
    function figure2(s) {
        // replace '1' or '2' with '!'
        return s.rereplace("[12]", "!");
    }

    result = figure2("Example 1");
    writeoutput(result); // Example !

    result = figure2("Example 2");
    writeoutput(result); // Example !
In figure2 above I’ve switched to reReplace so that we can use a regular expression. The first parameter, [12], is a bracket expression: the special characters [ and ] define a set, and the expression matches any single character contained within the square brackets. In this example we have 12 inside the square brackets, so it’ll match either "1" or "2".
This is great, but what if we have strings ending in 1,2,3,4,5,6,7,8 and 9? We could simply add all those characters to the bracket expression.
    function figure3(s) {
        // replace any character in the bracket expression with '!'
        return s.rereplace("[123456789]", "!");
    }

    result = figure3("Example 1");
    writeoutput(result); // Example !

    result = figure3("Example 2");
    writeoutput(result); // Example !

    result = figure3("Example 5");
    writeoutput(result); // Example !

    result = figure3("Example 9");
    writeoutput(result); // Example !
That does the job, but we can actually define a range of characters instead of having to explicitly type each one.
    function figure4(s) {
        // replace any character from 1-9 with '!'
        return s.rereplace("[1-9]", "!");
    }

    result = figure4("Example 1");
    writeoutput(result); // Example !

    result = figure4("Example 2");
    writeoutput(result); // Example !

    result = figure4("Example 5");
    writeoutput(result); // Example !

    result = figure4("Example 9");
    writeoutput(result); // Example !
As shown in figure4, instead of writing out each character we want to replace as [123456789], I’ve switched to using a range, [1-9], which does exactly the same thing. It’s less typing and more readable.
So far so good, but we’re soon going to get to "Example 10", so we need to include 0 in the bracket expression range.
    function figure5(s) {
        // replace any character from 0-9 with '!'
        return s.rereplace("[0-9]", "!");
    }

    result = figure5("Example 1");
    writeoutput(result); // Example !

    result = figure5("Example 10");
    writeoutput(result); // Example !0
We have a problem. The input string "Example 10" results in "Example !0", because [0-9] matches a single character and reReplace only replaces the first match by default. What we want is for it to replace the whole "10" with a "!" without breaking our 1 to 9 versions. In other words, we want to replace one or more numeric characters with a "!". To solve this we can leverage another special character in regular expressions: +.
    function figure6(s) {
        // replace one or more characters from 0-9 with '!'
        return s.rereplace("[0-9]+", "!");
    }

    result = figure6("Example 1");
    writeoutput(result); // Example !

    result = figure6("Example 10");
    writeoutput(result); // Example !
In regular expressions, a + character means match one or more of the preceding element. As such, we are saying match one or more of any character in the range 0 to 9 ([0-9]). This also means that our regular expression [0-9]+ will work with 100 or 12345.
    function figure7(s) {
        // replace one or more characters from 0-9 with '!'
        return s.rereplace("[0-9]+", "!");
    }

    result = figure7("Example 1");
    writeoutput(result); // Example !

    result = figure7("Example 10");
    writeoutput(result); // Example !

    result = figure7("Example 9876543210");
    writeoutput(result); // Example !
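One detail the figures so far rely on: reReplace replaces only the first match unless told otherwise. The following is a minimal sketch of the optional scope argument, assuming an input string with two separate runs of digits (the string itself is made up for illustration).

```cfml
s = "Example 10 of 20";

// default scope ("one"): only the first run of digits is replaced
writeOutput(s.reReplace("[0-9]+", "!")); // Example ! of 20

// scope "all": every run of digits is replaced
writeOutput(s.reReplace("[0-9]+", "!", "all")); // Example ! of !
```

If you want every match replaced, pass "all" as the final argument; leaving it off gives the first-match behaviour the figures here depend on.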
Bracket ranges don’t just work with digits; you can use them with letters as well.
    function figure8(s) {
        // replace one or more characters in the bracket expression with '!'
        return s.rereplace("[a-z]+", "!");
    }

    result = figure8("Example 1");
    writeoutput(result); // E! 1

    result = figure8("Example 10");
    writeoutput(result); // E! 10

    result = figure8("Example 9876543210");
    writeoutput(result); // E! 9876543210
This time we are replacing one or more characters in the range [a-z]. Note that it is case-sensitive, so it doesn’t match the "E".
We can match uppercase and lowercase characters by using reReplaceNoCase, or we can add another range to the bracket expression to match those uppercase characters.
    function figure9(s) {
        // replace one or more characters in the bracket expression with '!'
        return s.rereplace("[a-zA-Z]+", "!");
    }

    result = figure9("Example 1");
    writeoutput(result); // ! 1

    result = figure9("Example 10");
    writeoutput(result); // ! 10

    result = figure9("Example 9876543210");
    writeoutput(result); // ! 9876543210
Now we are matching one or more of either lowercase characters a-z or uppercase characters A-Z without having to use ReReplaceNoCase.
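For comparison, the ReReplaceNoCase route could look something like the sketch below. The name figure9b is hypothetical and not one of the original figures.

```cfml
function figure9b(s) {
    // [a-z] also matches uppercase letters here, because the NoCase variant ignores case
    return s.reReplaceNoCase("[a-z]+", "!");
}

result = figure9b("Example 1");
writeOutput(result); // ! 1

result = figure9b("EXAMPLE 10");
writeOutput(result); // ! 10
```

Which one to use is mostly a matter of taste: the explicit [a-zA-Z] range keeps the case handling visible in the pattern, while the NoCase function keeps the pattern shorter.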
I think that’s enough for one blog post, but before I end I want to mention that when I first started using regular expressions I could never remember which special character meant what, so I’d often write quantifiers using an alternative syntax instead. That syntax uses curly braces to define how many matches we want, like so: {mincount,maxcount}.
As we want to match one or more, we can write that as {1,}. We don’t specify the max count (the second value) as we don’t have a maximum limit; all we care about is matching at least 1.
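To show what the max count does when it is supplied, here is a small sketch using the same digit range as earlier. The patterns are illustrative only and are not taken from the original figures.

```cfml
// {1,3} matches at least 1 and at most 3 digits, so only "987" is replaced
result = reReplace("Example 9876543210", "[0-9]{1,3}", "!");
writeOutput(result); // Example !6543210

// {2} with no comma matches exactly 2 digits
result = reReplace("Example 10", "[0-9]{2}", "!");
writeOutput(result); // Example !
```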
In the following figure10 I’ve substituted the special character + with {1,}. They both mean the same thing, namely match one or more.
    function figure10(s) {
        // replace one or more characters in the bracket expression with '!'
        return s.rereplace("[a-zA-Z]{1,}", "!");
    }

    result = figure10("Example 1");
    writeoutput(result); // ! 1

    result = figure10("Example 10");
    writeoutput(result); // ! 10

    result = figure10("Example 9876543210");
    writeoutput(result); // ! 9876543210
As I mentioned above, when I was getting started with regular expressions I found this syntax much easier to read and write than trying to remember that + meant match one or more, as there are several other special characters for quantifiers in regular expressions. These days I opt for the + syntax, but I thought I’d share the alternative in case it helps you get started.
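As a rough illustration of the other quantifier characters mentioned above, the sketch below pairs ? and * with their curly-brace equivalents. The input strings are made up for this example.

```cfml
// "?" means zero or one of the preceding element, the same as {0,1}
writeOutput(reReplace("Example 10", "Examples?", "!"));      // ! 10
writeOutput(reReplace("Examples 10", "Examples{0,1}", "!")); // ! 10

// "*" means zero or more of the preceding element, the same as {0,}
writeOutput(reReplace("Example 10", "Example [0-9]*", "!"));   // !
writeOutput(reReplace("Example ", "Example [0-9]{0,}", "!"));  // !
```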
Hopefully this post has been useful!
Runnable examples here:
Hi James,
I’d not seen those tickets before. I tried them on ACF2016 / 2018 and they seem to be OK (although you mentioned that it’s intermittent, so maybe I got lucky).
I have dropped down to using Java regular expressions before, as you can cache the patterns, which can improve performance. There is also some extra functionality you get by using Java, so I’d agree that it’s worth a look (thanks for adding the links).
This post was aimed at people who had never used regular expressions before, so I tried to keep it clear and concise.
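For anyone curious, the Java approach discussed in these comments might look roughly like the sketch below. It is an illustration under assumptions, not the code either commenter actually uses: the names digitsPattern and replaceDigits are hypothetical, and where you cache the compiled pattern (application scope, a component property, and so on) is up to you.

```cfml
// compile the pattern once and reuse the resulting object
digitsPattern = createObject("java", "java.util.regex.Pattern").compile("[0-9]+");

function replaceDigits(required any s, required any regexPattern) {
    // javaCast ensures a real java.lang.String is handed to Java
    var input = javaCast("string", arguments.s);
    // note: replaceAll replaces every match, unlike reReplace's default
    return arguments.regexPattern.matcher(input).replaceAll("!");
}

writeOutput(replaceDigits("Example 10", digitsPattern));       // Example !
writeOutput(replaceDigits("Example 10 of 20", digitsPattern)); // Example ! of !
```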
We’ve encountered some unpreventable ColdFusion hard errors (that can’t be prevented using try/catch) when using ReReplaceNoCase() in our application framework. Here are some reported bugs that are similar:
https://tracker.adobe.com/#/view/CF-3928688
https://tracker.adobe.com/#/view/CF-4165797
To work around this intermittently occurring bug (which Adobe classifies as “UserError” or “AsDesigned”), we started casting the initial value as a string (required for Java; not all CF-typeless variables contain pure “string” values) and using Java’s replaceAll(). Our functions now seem faster, but more importantly there’s no occasional CFError occurring in any of our apps.
When using regular expressions with ColdFusion, the ReEscape() function (available since CF10) simplifies escaping characters that match regular expression control characters.
https://cfdocs.org/reescape
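A small sketch of why ReEscape() matters, assuming a search string that happens to contain the + quantifier character. The strings are made up for illustration.

```cfml
s = "11 and 1+1";

// unescaped, "1+1" means "one or more 1s followed by a 1", so it matches "11"
writeOutput(reReplace(s, "1+1", "!")); // ! and 1+1

// escaped, the + is treated as a literal character, so it matches "1+1"
writeOutput(reReplace(s, reEscape("1+1"), "!")); // 11 and !
```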
If interested in using more regex matching functions in ColdFusion, check out Ben Nadel’s JRegEx:
https://www.bennadel.com/blog/3322-jregex—a-coldfusion-wrapper-around-java-s-regular-expression-patterns.htm