Parsing Response HTML
If you are parsing XML, using the DOMParser is easy and straightforward.
GM_xmlhttpRequest({
    method: 'GET',
    url: url,
    onload: function(responseDetails) {
        var dp = new XPCNativeWrapper(window, "DOMParser()");
        var parser = new dp.DOMParser();
        // The response text lives on the callback argument
        responseXML = parser.parseFromString(responseDetails.responseText, 'text/xml');
    }
});
However, this code only works because XML is strictly formatted. If you want to parse an HTML page so that you can run XPath queries against it, this code will likely fail with a "not well-formed" XML error. Luckily, I have found a way around this error: create a new document and fill it with the response HTML:
function getDOC(url, callback) {
    GM_xmlhttpRequest({
        method: 'GET',
        url: url,
        onload: function (responseDetails) {
            var text = responseDetails.responseText,
                doc = document.implementation.createDocument('', '', null),
                html = document.createElement('html'),
                head = document.createElement('head'),
                body = document.createElement('body'),
                // exec() returns null when the tag is missing, so keep the match objects
                headMatch = /<\s*head[^>]*>([\s\S]+?)<\s*\/head\s*>/i.exec(text),
                bodyMatch = /<\s*body[^>]*>([\s\S]+?)<\s*\/body\s*>/i.exec(text);
            if (headMatch) head.innerHTML = headMatch[1];
            if (bodyMatch) body.innerHTML = bodyMatch[1];
            doc.appendChild(html);
            html.appendChild(head);
            html.appendChild(body);
            callback(doc);
        }
    });
}
getDOC('http://example.com/', function(doc) { alert(doc.documentElement.innerHTML) });
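The head and body extraction is the fragile part of getDOC, because exec returns null when a page has no matching tag. A minimal sketch of that step as a standalone helper follows; extractSection is a hypothetical name, not part of Greasemonkey, and it assumes the page contains at most one head and one body element.

```javascript
// Hypothetical helper: returns the markup between <tag ...> and </tag>,
// or an empty string when the tag is not present in the text.
function extractSection(htmlText, tagName) {
    // [\s\S] matches any character, including newlines.
    // \b keeps "head" from matching a tag such as <header>.
    var pattern = new RegExp(
        '<\\s*' + tagName + '\\b[^>]*>([\\s\\S]*?)<\\s*\\/' + tagName + '\\s*>',
        'i'
    );
    var match = pattern.exec(htmlText);
    return match ? match[1] : '';
}
```

Inside the onload handler, head.innerHTML and body.innerHTML could then be set from extractSection(responseDetails.responseText, 'head') and extractSection(responseDetails.responseText, 'body') without a null check.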
Once you have doc, you can use evaluate and getElementsByTagName on it:
getDOC('http://example.com/', function(doc) {
    alert(doc.evaluate('count(.//a)', doc, null, XPathResult.NUMBER_TYPE, null).numberValue);
    alert(doc.getElementsByTagName('a').length);
});
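The evaluate call above only returns a count. If you want the matching nodes themselves, one option is to request a snapshot result and copy it into an array. This is a rough sketch rather than anything from the original script; xpathAll is a hypothetical name, and it assumes doc exposes the standard DOM evaluate method.

```javascript
// Hypothetical helper: run an XPath expression and return the matching nodes as an array.
function xpathAll(doc, expression, contextNode) {
    // 7 is the numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
    var snapshot = doc.evaluate(expression, contextNode || doc, null, 7, null);
    var nodes = [];
    for (var i = 0; i < snapshot.snapshotLength; i++) {
        nodes.push(snapshot.snapshotItem(i));
    }
    return nodes;
}
```

Inside the getDOC callback, xpathAll(doc, './/a') would then give an array of the link elements, which can be looped over like any other array.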

Excuse me, but what's the point of using DOMParser at all? Your HTML is parsed as soon as you set it as an element's innerHTML.
GM_xmlhttpRequest({
    method: 'GET',
    url: 'http://userscripts.org',
    onload: function(responseDetails) {
        var holder = document.createElement('div');
        holder.innerHTML = responseDetails.responseText.split(/<body[^>]*>((?:.|\n)*)<\/body>/i)[1];
        alert(document.evaluate('count(.//a)', holder, null, 1, null).numberValue);
        alert(holder.getElementsByTagName('a').length);
    }
});
You seem to do three times more work than necessary.
You're absolutely right about the DOMParser being useless for parsing HTML. My old method was only temporary until I found something better, which I did just recently.