First: I've read the general "don't use RegEx on XHTML" arguments, like this one: "RegEx match open tags except XHTML self-contained tags", and I do understand how RegEx will fail on nested XHTML or XML nodes.
What I don't see is why manipulating only the attributes of an XML document with RegEx should break, so there seem to be exceptions to the general rule. Attributes are always contained in a single tag, starting with a < and ending with a >; any other < or > in between would break the XML, so that can't occur.
Now I'd like to clean an XHTML string of any microdata it might contain, that is, any itemscope, itemtype, itemprop, itemid and itemref attributes, in markup like this:
...
<body itemscope="itemscope" itemtype="http://schema.org/WebPage">
<div itemprop="maincontent">content</div>
...
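The attributes-only regex idea can be sketched in PHP roughly as follows. This is a minimal sketch under the assumption described above (attributes only occur between a tag's < and >), and stripMicrodataRegex is a hypothetical helper name. It would still misfire on a > inside an attribute value (which XML does permit), on attribute-like text inside comments or CDATA sections, and on attribute values that themselves contain something like ' itemprop=':

```php
<?php
// Hypothetical helper: strips microdata attributes with regular expressions.
// Relies on the assumption that attributes only appear between a tag's
// "<" and ">"; a ">" inside an attribute value breaks it.
function stripMicrodataRegex(string $xhtml): string
{
    // Whitespace, one microdata attribute name, then an optional quoted value.
    // The lookahead keeps it from matching longer names such as "itempropx".
    $attrPattern = '/\s+item(?:scope|type|prop|id|ref)(?=[\s=\/>])(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'))?/';

    // Apply $attrPattern only inside each <...> span, not to text between tags.
    return preg_replace_callback(
        '/<[^>]*>/',
        fn (array $m): string => preg_replace($attrPattern, '', $m[0]),
        $xhtml
    );
}

echo stripMicrodataRegex(
    '<body itemscope="itemscope" itemtype="http://schema.org/WebPage">'
    . '<div itemprop="maincontent">content</div></body>'
), "\n";
// prints <body><div>content</div></body>
```

Restricting the attribute pattern to the inside of each tag is what keeps text such as itemprop="x" between tags from being rewritten.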
What's the best way to do this in PHP?
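One commonly suggested alternative is to let a real parser locate the attributes. Below is a minimal sketch using PHP's DOM extension (DOMDocument and DOMXPath). It assumes the input is a well-formed XHTML fragment with a single root element and no entities that need a DTD (such as &nbsp;). stripMicrodata is a hypothetical helper name, and the arrow function needs PHP 7.4 or later:

```php
<?php
// Hypothetical helper: removes microdata attributes using PHP's DOM extension.
// Assumes $xhtml is well-formed XML with a single root element.
function stripMicrodata(string $xhtml): string
{
    // The microdata attributes listed in the question.
    $names = ['itemscope', 'itemtype', 'itemprop', 'itemid', 'itemref'];

    $doc = new DOMDocument();
    // loadXML() returns false (and emits warnings) on malformed input.
    if (!$doc->loadXML($xhtml)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }

    // Build //@*[local-name() = 'itemscope' or local-name() = 'itemtype' or ...]
    $conditions = array_map(fn (string $n): string => "local-name() = '$n'", $names);
    $query = '//@*[' . implode(' or ', $conditions) . ']';

    // Copy the matches into an array before removing anything.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query($query)) as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Serialize only the root element, which omits the XML declaration.
    return $doc->saveXML($doc->documentElement);
}

echo stripMicrodata(
    '<body itemscope="itemscope" itemtype="http://schema.org/WebPage">'
    . '<div itemprop="maincontent">content</div></body>'
), "\n";
// prints <body><div>content</div></body>
```

Because the XPath query selects attribute nodes directly, text content, comments and attribute values that merely look like microdata attributes are never touched; the trade-off is that the input has to parse as XML at all.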