Matching HTML comments with Regular Expressions / Regex
I <3 regular expressions. To me they are pure programming joy: crafting a pattern that finds matches within highly varying strings.
I USED to know how to do this, but I had to look it up again, which is why it’s here.
I needed to strip some HTML comments out of code. Regex quantifiers are greedy by default, so if the code contains two HTML comments, a naive pattern will start matching at the beginning of the first comment and stop at the end of the second, swallowing everything in between.
Too Simple
<!--.*-->
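Here is a quick Python sketch of the problem, using the standard `re` module. The sample HTML string is made up for illustration.

```python
import re

# Two HTML comments separated by markup we want to keep (made-up sample).
html = "<p>a</p><!-- first --><p>keep me</p><!-- second -->"

# Greedy: .* grabs as much as it can, so the single match runs from the
# first "<!--" all the way to the LAST "-->" in the string.
greedy = re.findall(r"<!--.*-->", html)
print(greedy)  # ['<!-- first --><p>keep me</p><!-- second -->']
```

The `<p>keep me</p>` in the middle gets caught up in the match, which is exactly what you don't want when removing comments.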
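So, to make the expression non-greedy (lazy), we put a ? directly after the * (0 or more) or + (1 or more) quantifier. On its own, ? means "0 or 1", but placed right after another quantifier it switches that quantifier to lazy mode, so it matches as few characters as possible and stops at the first --> it reaches.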
Not Greedy
<!--.*?-->
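The same made-up sample run through the lazy pattern in Python now yields each comment as its own match:

```python
import re

html = "<p>a</p><!-- first --><p>keep me</p><!-- second -->"

# Lazy: .*? expands one character at a time and stops at the first "-->",
# so each comment is matched separately and the markup between them survives.
lazy = re.findall(r"<!--.*?-->", html)
print(lazy)  # ['<!-- first -->', '<!-- second -->']
```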
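One last flaw: by default . does not match line breaks (CR/LF), so this pattern will not handle comments that span multiple lines. We need an "any character" pattern instead. Whitespace plus not-whitespace, [\s\S], is about as big a catch-all as you can get: every character is one or the other, newlines included.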
Just Right
<!--[\s\S]*?-->
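A Python sketch comparing the two lazy patterns on a comment that spans two lines, then using the final pattern with `re.sub` to strip the comments out. The sample input is made up; the `re.DOTALL` line is an alternative I'm adding for comparison, not something from the original approach.

```python
import re

# Made-up sample: one multi-line comment and one single-line comment.
html = (
    "<p>keep</p>\n"
    "<!-- a comment\n"
    "spanning two lines -->\n"
    "<p>also keep</p><!-- short -->"
)

# "." stops at "\n", so only the single-line comment is found.
dot_only = re.findall(r"<!--.*?-->", html)
print(dot_only)  # ['<!-- short -->']

# [\s\S] matches any character, including newlines, so both comments match.
comments = re.findall(r"<!--[\s\S]*?-->", html)
print(comments)

# Alternative: keep "." but turn on DOTALL so it also matches newlines.
dotall = re.findall(r"<!--.*?-->", html, flags=re.DOTALL)
print(dotall == comments)  # True

# Strip every comment, leaving the surrounding markup (and newlines) intact.
stripped = re.sub(r"<!--[\s\S]*?-->", "", html)
print(repr(stripped))  # '<p>keep</p>\n\n<p>also keep</p>'
```

[\s\S] has the advantage of working the same way in regex flavors that lack a dot-matches-newline flag, while re.DOTALL keeps the pattern a little more readable in Python.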