July 27 2008

Matching HTML tags with Regular Expressions / Regex

I <3 regular expressions. To me they are pure programming joy: creating an equation that finds matches within highly varying strings.

I USED to know how to do this, but I had to look it up again, which is why it’s here.

I needed to parse some HTML comments out of code. Because regular expression quantifiers are greedy by default, if you have 2 HTML comments in the code, a normal expression will start at the beginning of the first and finish at the end of the second.

Too Simple

<!--.*-->
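
To see the problem in action, here is a minimal Python sketch. The sample string and variable names are made up for illustration; only the pattern comes from the post.

```python
import re

# Hypothetical sample input: two separate comments on one line.
html = "<p>one</p><!-- first --><p>two</p><!-- second --><p>three</p>"

# Greedy: .* grabs as much as it can, so the match runs from the
# start of the first comment to the end of the last one.
greedy_matches = re.findall(r"<!--.*-->", html)

print(greedy_matches)
# ['<!-- first --><p>two</p><!-- second -->']
```

One match instead of two, and it swallows the paragraph sitting between the comments.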

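So, to make the expression less greedy, we put a question mark (?), normally the 0-1 quantifier, directly after a 0-infinity (*) or 1-infinity (+) quantifier. In that position it makes the preceding quantifier lazy, so it matches as few characters as possible.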

Not Greedy

<!--.*?-->
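
A quick Python check of the lazy version, again as a sketch with a made-up sample string:

```python
import re

# Hypothetical sample input: two separate comments on one line.
html = "<p>one</p><!-- first --><p>two</p><!-- second --><p>three</p>"

# Lazy: .*? stops at the first --> it can reach, so each comment
# is matched on its own.
lazy_matches = re.findall(r"<!--.*?-->", html)

print(lazy_matches)
# ['<!-- first -->', '<!-- second -->']
```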

The one last flaw is that this will not handle comments that flow over a CR/LF, because in most regex flavors the dot (.) does not match a line break by default. So we need an “all characters” pattern. Whitespace/Not Whitespace ([\s\S]) is a pretty big catch-all.

Just Right

<!--[\s\S]*?-->
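
Here is a small Python sketch comparing the two lazy patterns on a comment that spans a line break. The sample string is invented for the example. The last call uses Python's re.DOTALL flag, which is a Python-specific alternative and not something the pattern above depends on.

```python
import re

# Hypothetical sample input: one comment spanning two lines, one inline.
html = "<div>\n<!-- line one\nline two -->\n</div><!-- inline -->"

# The dot does not match "\n" by default, so the multi-line comment is missed.
dot_matches = re.findall(r"<!--.*?-->", html)

# [\s\S] matches any character, including line breaks.
any_char_matches = re.findall(r"<!--[\s\S]*?-->", html)

# Python alternative: re.DOTALL lets the dot match line breaks too.
dotall_matches = re.findall(r"<!--.*?-->", html, flags=re.DOTALL)

print(dot_matches)       # ['<!-- inline -->']
print(any_char_matches)  # ['<!-- line one\nline two -->', '<!-- inline -->']
```

The [\s\S] form has the advantage of working without any flags, which is handy in tools where you can only type a pattern.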
