Only useful for script developers -- doesn't do anything by itself. Many XPath examples for deleting, moving, and changing objects.
How to use these recipes
There are 4 tools I use in combination:
- Adblock Plus: First I try just blocking style sheets and JavaScript to clean up the page -- it's a good first step. I simply press CTRL-SHIFT-V, find stuff that looks like junk, and block it.
- Next I use AutoPager to identify XPaths. Right-click on the AutoPager icon and select 'Test XPath'. Select the "Content XPath" tab, click the "C" with a crosshair button, hover the mouse over something you want to block, and press "w" until it highlights the broadest area you can block without blocking real content. It often gives you multiple XPaths; I find the shortest one that doesn't use hard-coded ordinal position indicators (e.g. "div[4]"), double-click on it, verify in the bottom-left window that it only selects the thing I want to block, then copy that XPath into Notepad.
- Next I right-click on Greasemonkey, choose "Manage User Scripts", select one of my existing scripts, and copy the contents as an example (all my scripts have the same functional contents, just different XPaths). Then I close the example, right-click on Greasemonkey again and choose "New User Script", answer the questions, paste the contents of my other script, delete the specific XPaths, and replace them with the XPath created in step 2. I keep the Greasemonkey script editing window open the entire time -- you can just save the file and reload the webpage to observe the effects of your changes.
- Next I press CTRL-SHIFT-I to start the DOM Inspector. Using the top-left mouse-pointer button I select the element of the page that contains the real content. Because it displays things in hierarchical form, it's easy to see how the particular object you select relates to others. Commonly I find that I need to look for a DIV that's higher up than the one I actually selected in order to encompass the entire "real" content. It's very common to find websites that contain an annoying nesting of DIVs, with the DIV you want nested several levels down and the upstream DIVs containing junk you want to get rid of. In addition to finding the DIV with the stuff I want to save, I find the top-most DIV in the tree.
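The selection rule from the AutoPager step (prefer the shortest XPath that does not depend on a hard-coded position) can be sketched as a small filter. This is only an illustration: the function names `usesOrdinalPosition` and `pickXPath` and the candidate list are made up here, and AutoPager itself is not involved.

```javascript
// Hypothetical helper: true when an XPath pins a step to a numeric
// position, such as div[4] or [position()=4].
function usesOrdinalPosition(xpath) {
  return /\[\s*\d+\s*\]/.test(xpath) || /position\(\)\s*=\s*\d+/.test(xpath);
}

// Hypothetical helper: pick the shortest candidate that avoids ordinal
// positions, or null when every candidate relies on one.
function pickXPath(candidates) {
  var usable = candidates.filter(function (xpath) {
    return !usesOrdinalPosition(xpath);
  });
  usable.sort(function (a, b) { return a.length - b.length; });
  return usable.length > 0 ? usable[0] : null;
}

// Made-up candidates of the kind a tool might suggest for one element.
var candidates = [
  "/html/body/div[4]/div[2]",
  "//div[@id='cnnContainer']/div[@class='cnnStoryTools']",
  "//div[@class='cnnStoryTools']"
];
var chosen = pickXPath(candidates);
```

Position-based paths tend to break as soon as the site adds or removes a sibling element, which is why the id- and class-based forms are preferred.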
At this point I have:
- Greasemonkey script editing window still open
- AutoPager Test XPath sidebar still open (note: you have to occasionally close this and re-open it after you reload the webpage to test Greasemonkey changes)
- DOM Inspector still open (note: you have to close this window and restart if you want to use the selector tool after you've reloaded the page)
- A note telling myself which DIV contains the real content I want to save, and which DIV is the top-most one
Now take a look at the example Greasemonkey script below. Only three things need to be changed: the XPath that identifies the top-level DIV (item_to_replace), the XPath that identifies the DIV containing the content you want to display (replace_with), and the list of XPaths that you want to remove (stuff_to_remove).
Check out this CNN article, which is the basis for this particular script. The DIV "cnnLeftCol" is the text of the article, and the DIV "cnnContainer" is a high-level container object that includes most of the page. cnnLeftCol is inside of cnnContainer, so I can't simply delete cnnContainer. However, since cnnContainer contains most of the page (including a bunch of junk I want to get rid of), I simply replace cnnContainer with cnnLeftCol, which cleans up most of the page all at once. The additional XPaths are little things that were not inside of cnnContainer that I wanted to get rid of.
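The swap described above relies on the content DIV being nested inside the container. A minimal sketch of that check, assuming nothing beyond the standard `parentNode` and `replaceChild` members of DOM nodes, might look like the following; `isInside` and `promoteContent` are hypothetical names, not part of the script.

```javascript
// Hypothetical helper: walk up the parentNode chain from `node` and report
// whether `ancestor` is reached along the way.
function isInside(node, ancestor) {
  var current = node ? node.parentNode : null;
  while (current) {
    if (current === ancestor) return true;
    current = current.parentNode;
  }
  return false;
}

// Hypothetical helper: replace `container` with `content`, but only when
// both exist and `content` really is nested inside `container`.
function promoteContent(container, content) {
  if (!container || !container.parentNode || !isInside(content, container)) {
    return false;
  }
  container.parentNode.replaceChild(content, container);
  return true;
}

// In a user script this might be called as:
// promoteContent(document.getElementById("cnnContainer"),
//                document.getElementById("cnnLeftCol"));
```

Returning false instead of throwing means the rest of a script can keep running on pages where one of the two DIVs is missing.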
Once I'm satisfied with my selection of the real content and top-level XPath (based on putting them into the Greasemonkey script and trying them out), I close the DOM Inspector and focus on step #2 -- I find an XPath I want to remove, add it to the Greasemonkey script, save, reload the page to make sure it looks right, and repeat until the page looks nice.
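While repeating that add-save-reload loop, it can help to see which XPaths match nothing on the current page. The sketch below is one possible way to do that; `countMatches` and `unmatched` are hypothetical names, and `find` stands for any function that turns an XPath string into an array of nodes, such as the script's `$x`.

```javascript
// Hypothetical helper: count how many nodes each XPath matches.
// `find` maps an XPath string to an array of matching nodes.
function countMatches(xpaths, find) {
  return xpaths.map(function (xpath) {
    return { xpath: xpath, count: find(xpath).length };
  });
}

// Hypothetical helper: list the XPaths that matched nothing, which usually
// points to a typo or to a page laid out differently than expected.
function unmatched(xpaths, find) {
  return countMatches(xpaths, find)
    .filter(function (entry) { return entry.count === 0; })
    .map(function (entry) { return entry.xpath; });
}
```

In the script, something like `console.log(unmatched(stuff_to_remove, $x))` would need to run before the removal loop, since removed nodes no longer match afterwards.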
You can see before and after screenshots below, and you can find more with some of my other scripts. For this particular page, "before" takes more than 3.2x as much space to convey the same information -- the clean version uses nearly 70% less space, which works out to more than 3x the information density. Similar gains can easily be had on most websites.
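As a quick check of the arithmetic behind the 3.2x figure, the sketch below works out how much space the clean page saves and what that means for density. The only input is the 3.2 ratio quoted above; the variable names are illustrative.

```javascript
// Ratio quoted in the text: the original page needs 3.2 times the space.
var spaceRatio = 3.2;

// Fraction of the original space that the cleaned page occupies (about 0.31).
var remainingFraction = 1 / spaceRatio;

// Fraction of the original space that is saved (about 0.69, i.e. nearly 70%).
var savedFraction = 1 - remainingFraction;

// Same information in 1/3.2 of the space: density is 3.2 times the
// original, which is an increase of about 220%.
var densityIncreasePercent = (spaceRatio - 1) * 100;
```

So the "nearly 70%" figure corresponds to the space saved, while the density itself roughly triples.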
// XPath of the high-level container to replace, and of the content DIV that takes its place.
var item_to_replace = $x("//div[@id='cnnContainer']")[0];
var replace_with = $x("//div[@id='cnnLeftCol']")[0];

// XPaths of leftover junk that sits outside the container.
var stuff_to_remove = [
    "//div[@id='cnnHeader']",
    "//div[@id='cnnSCFontButtons']",
    "//div[@id='cnnFooter']",
    "//div[@class='cnnTopNewsModule']",
    "//div[@class='cnnStoryTools']",
    "//p[@class='cnnTopics']",
    "//p[@class='cnnAttribution']",
    "//div[@class='cnnStoryElementBox']",
    "//span[@class='cnnEmbeddedMosLnk']",
    "//div[@class='cnnWsnr']",
    "//div[@class='cnnStoryToolsFooter']",
    "//div[@class='cnnMosaicContentCol']/div[@class='cnnUGCBox']"
];

// Evaluate an XPath and return the matching nodes as an array.
function $x(p, context) {
    if (!context) context = document;
    // `item` is declared here so it does not leak as an implicit global.
    var i, item, arr = [], xpr = document.evaluate(p, context, null, XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE, null);
    for (i = 0; (item = xpr.snapshotItem(i)); i++) arr.push(item);
    return arr;
}

// Swap the container for the content DIV, but only if both were found on this page.
if (item_to_replace && replace_with) {
    item_to_replace.parentNode.replaceChild(replace_with, item_to_replace);
}

// Remove every node matched by each of the remaining XPaths.
stuff_to_remove.forEach(function (xpath) {
    $x(xpath).forEach(function (item) {
        item.parentNode.removeChild(item);
    });
});







