Regular Expressions in JavaScript don't support the single-line mode
Today, while working on a gadget involving some client-side development, I discovered that JavaScript doesn’t support the single-line mode for Regular Expressions (Regex). Now why is that bad?
What is the Regular Expression Single-line mode?
As you know, Regular Expressions work with patterns. Using various metacharacters you can match pieces of a text string. One such metacharacter is the . (dot), which by default matches any character except newline characters (\n on *nix and \r\n on Windows).
Now imagine that you have a multiline string like:
<div id="wrapper">
<h2>Title</h2>
<div id="content">
Lorem ipsum
</div>
</div>
and you want to grab the contents of the content div. Normally you would use the following Regex:
<div\s*id="content">.*</div>
In this case, however, the expression would not match at all, because the . (dot) cannot cross the newline characters that sit between the opening and the closing tag. In many programming languages the Regular Expression engine allows you to evaluate the expression in Single-line mode, where the . (dot) matches every character, including newline characters.
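The behaviour described above can be checked directly in JavaScript. This is a minimal sketch, assuming the sample markup is joined with \n line breaks; the variable names are mine and only serve the illustration.

```javascript
// The multiline sample string from above, joined with \n line breaks.
const html = [
  '<div id="wrapper">',
  '<h2>Title</h2>',
  '<div id="content">',
  'Lorem ipsum',
  '</div>',
  '</div>'
].join('\n');

// The "normal" pattern: the dot has to cross two newlines to reach </div>.
const dotPattern = /<div\s*id="content">.*<\/div>/;

const dotMatches = dotPattern.test(html); // false: the dot stops at the first \n
const dotOnNewline = /./.test('\n');      // false: a lone newline is not matched
const dotOnLetter = /./.test('a');        // true: any other character is fine

console.log(dotMatches, dotOnNewline, dotOnLetter);
```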
XRegExp
Searching the Internet for how to activate the Single-line mode in JavaScript will lead you to the Overview Of JavaScript Regex page at Fire Studios. According to the information on that page, you can use the /s switch to activate the single-line mode in JavaScript Regex. The problem is that /s doesn’t work: as soon as you use it, you get a JavaScript parsing error. I have tested this in both Internet Explorer 7 and Firefox 3, and both browsers behave the same way. So is it even possible to run JavaScript Regular Expressions in the Single-line mode?
Looking for a solution, I stumbled upon XRegExp: An Extended JavaScript Regex Constructor by Steven Levithan. Steven noticed the same problem and decided to fill the gap with some extra functionality, including support for the /s switch.
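Before pulling in a library, it can be useful to check from script whether the engine at hand accepts a given switch. Writing an unsupported switch in a regex literal breaks parsing of the whole script, but passing it to the RegExp constructor only throws a SyntaxError at run time, which can be caught. The sketch below relies on that; supportsFlag is a hypothetical helper name of my own, and the result for the s switch depends entirely on the engine running the code.

```javascript
// Hypothetical helper: reports whether this engine accepts a regex switch.
function supportsFlag(flag) {
  try {
    // The constructor validates the flags string when it is called.
    new RegExp('.', flag);
    return true;
  } catch (e) {
    // An unknown switch surfaces here as a SyntaxError.
    return false;
  }
}

const hasGlobal = supportsFlag('g'); // true: g is a standard switch
const hasBogus = supportsFlag('q');  // false: q is not a regex switch
const hasSingleLine = supportsFlag('s'); // engine-dependent, so no value is claimed here

console.log(hasGlobal, hasBogus, hasSingleLine);
```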
Alternatives
Depending on what you want to do, you can download XRegExp and use it in your project. Personally, all I wanted was the ability to run an expression in the Single-line mode, so I decided to modify the regex I was using instead.
Looking at the Regex metacharacters you will find, among others, \s (whitespace) and \S (non-whitespace), the latter being equivalent to [^\s]. Combining these two into [\s\S] allows you to match any character, including the newline character. The Regex should then look like:
<div\s*id="content">[\s\S]*</div>
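Here is how the expression above behaves on the sample string. One caveat that is easy to miss: the * quantifier is greedy, so the match runs up to the last </div> in the input, which in this sample is the wrapper's closing tag. The second pattern in this sketch is my own variation, not part of the original expression: it adds a capturing group and the lazy *? quantifier so that only the contents of the content div are returned.

```javascript
// The multiline sample string from above, joined with \n line breaks.
const html = [
  '<div id="wrapper">',
  '<h2>Title</h2>',
  '<div id="content">',
  'Lorem ipsum',
  '</div>',
  '</div>'
].join('\n');

// The expression from the text: [\s\S] matches any character, newlines included.
const greedyPattern = /<div\s*id="content">[\s\S]*<\/div>/;
const greedyMatch = html.match(greedyPattern);
// greedyMatch[0] ends with both closing tags, because * is greedy.

// Variation (my assumption): capture the body and stop at the first </div>.
const lazyPattern = /<div\s*id="content">([\s\S]*?)<\/div>/;
const lazyMatch = html.match(lazyPattern);
const content = lazyMatch[1].trim(); // 'Lorem ipsum'

console.log(JSON.stringify(greedyMatch[0]));
console.log(content);
```

The lazy variation is enough for flat markup like this sample; it would still stop too early if the content div itself contained nested div elements.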
Summary
JavaScript supports neither the Single-line mode for Regular Expressions nor some other very useful Regex concepts (named matches, for example). While there are many libraries available on the Internet, you can often achieve similar results by being creative and by knowing how the engine works and which functionality is there to help you.