CleanItUp
Last update on Jun 24, 2007
Lets you clean up a page by putting instructions in the URL (CSS, XPath selection, regexp and more)
A multitool for cleaning up any web page using instructions passed in the URL. The tool is embedded in LookItUp2, so you don't have to install it in order to use LookItUp2. I released it as a separate script for two reasons: 1. to document the features on a separate page, and 2. the tool could be useful to embed in other scripts, so publishing it separately makes it easier for others to extract.
Cleanup features
xcrop: Selects nodes to keep, using XPath (this is what xStripper does)
xremove: Selects nodes to destroy, using XPath
css: Applies CSS to the document
eval: Runs any script
replace: Replaces content using a regexp
crop: Selects content between two strings in the source code (in the innerHTML)
The actions can be combined. They are performed in the order in which they appear in the URL. It's possible to use the same type of action multiple times.
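A minimal sketch of how combined actions could be run in URL order, assuming the URL has already been split into an ordered list of { name, value } pairs. The function runActions and the handler map are hypothetical illustrations, not code from CleanItUp.

```javascript
// Hypothetical sketch: run CleanItUp-style actions strictly in the order given.
// `actions` is an ordered array like [{ name: "css", value: "..." }, ...];
// `handlers` maps an action name to a function that receives the action's value.
function runActions(actions, handlers) {
  const performed = [];
  for (const { name, value } of actions) {
    // Assumption: names without a handler (e.g. a site's own parameters) are skipped.
    if (!Object.prototype.hasOwnProperty.call(handlers, name)) continue;
    handlers[name](value); // the same action type may appear any number of times
    performed.push(name);
  }
  return performed;
}

// Usage: two replace actions around one css action, executed in the order given.
const log = [];
const done = runActions(
  [
    { name: "replace", value: "superman|Bill Gates" },
    { name: "css", value: "body{background-color:yellow!important}" },
    { name: "replace", value: "From.*pedia|" },
  ],
  {
    replace: (v) => log.push("replace:" + v),
    css: (v) => log.push("css:" + v),
  }
);
console.log(done); // [ 'replace', 'css', 'replace' ]
```

Keeping a plain ordered array (rather than an object keyed by action name) is what allows the same action type to appear more than once.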
Notes:
- The actions are only triggered if the URL contains "cleanitup". Reason: the script will likely run on every page (*), so it should be able to decide quickly whether to spend time examining the parameters; also, some sites might use the same parameter names, which would have unintended effects.
- Some sites respond badly to extra parameters (e.g. The Free Dictionary tries to look up every word passed as a parameter). To handle this, I expanded the syntax: the URL is scanned for "cleanitup", and the character that follows this string is chosen as the command separator. Usually the separator is "&", but you have the option to redefine it. This syntax can also come in handy if you need to use "&" in the command values. Example:
http://www.thefreedictionary.com/dict.asp?Word=test#cleanitup!xcrop=#MainTxt!eval=alert('hi')
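The separator rule above can be sketched as follows. This is not the script's own parser: parseCleanItUp is a hypothetical name, and three details are assumptions: only the six known action names are kept (so a site parameter such as "query=hello" is ignored), each command is split on its first "=" only, and values are passed through decodeURIComponent (suggested by the percent-encoded crop example further down).

```javascript
// Hypothetical sketch of the "cleanitup<separator>" syntax; not CleanItUp's actual code.
// Returns an ordered array of { name, value } pairs, or [] when the marker is absent.
function parseCleanItUp(url) {
  const marker = "cleanitup";
  const start = url.indexOf(marker);
  if (start === -1) return []; // quick exit on ordinary pages
  const rest = url.slice(start + marker.length);
  if (rest === "") return [];
  const sep = rest[0]; // the character right after "cleanitup" becomes the separator
  const known = new Set(["xcrop", "xremove", "css", "eval", "replace", "crop"]);
  const actions = [];
  for (const part of rest.slice(1).split(sep)) {
    const eq = part.indexOf("="); // split on the first "=" only; values may contain "="
    if (eq === -1) continue;
    const name = part.slice(0, eq);
    if (!known.has(name)) continue; // assumption: unknown parameters are ignored
    let value = part.slice(eq + 1);
    try {
      value = decodeURIComponent(value); // assumption: %3C -> "<", %20 -> " ", ...
    } catch (e) {
      // malformed escape such as a lone "%": keep the raw text
    }
    actions.push({ name, value });
  }
  return actions;
}

// The example URL above: "!" is the separator, giving an xcrop action followed by an eval action.
const parsed = parseCleanItUp(
  "http://www.thefreedictionary.com/dict.asp?Word=test#cleanitup!xcrop=#MainTxt!eval=alert('hi')"
);
console.log(parsed);
```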
xcrop
Selects the specified nodes and deletes the rest. The code is taken from xStripper. The only difference is that I don't wrap each node in a div.
Note: I can highly recommend the XPath Checker plugin for quickly testing XPath expressions on a page.
Examples
- Selecting images and inputs:
http://www.google.com/webhp?cleanitup&xcrop=//img|//input
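A minimal sketch of the xcrop idea, assuming a standard DOM document.evaluate. It is not xStripper's code, and the name xcrop(doc, xpath) is hypothetical. The matches are taken as an ordered snapshot first, so they can still be re-inserted after the body has been emptied; each node is appended directly, without a wrapping div.

```javascript
// Hypothetical xcrop sketch: keep only the nodes matched by `xpath`, drop the rest of <body>.
// `doc` is a DOM Document (in a userscript, simply `document`).
function xcrop(doc, xpath) {
  // A snapshot keeps references to the matches even after they are detached below.
  const snap = doc.evaluate(xpath, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  const keep = [];
  for (let i = 0; i < snap.snapshotLength; i++) keep.push(snap.snapshotItem(i));
  const body = doc.body;
  while (body.firstChild) body.removeChild(body.firstChild); // empty the page
  for (const node of keep) body.appendChild(node); // re-insert the matches, unwrapped
  return keep.length;
}

// In a userscript: xcrop(document, "//img|//input");
```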
xremove
Removes all nodes that are specified by the XPath.
Examples
- Removing the 4th button on Google (I never feel lucky):
http://www.google.com/webhp?cleanitup&xremove=//input[4]
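A minimal sketch of what xremove has to do, assuming a standard DOM document.evaluate; xremove(doc, xpath) is a hypothetical name, not the script's code. It asks for an ordered snapshot rather than an iterator, because an iterator result becomes invalid once the document is modified.

```javascript
// Hypothetical xremove sketch: delete every node matched by `xpath`.
function xremove(doc, xpath) {
  // Snapshot first: the node list stays usable while we delete nodes.
  const snap = doc.evaluate(xpath, doc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
  let removed = 0;
  for (let i = 0; i < snap.snapshotLength; i++) {
    const node = snap.snapshotItem(i);
    // Attribute and document nodes have no parentNode, so they are skipped.
    if (node.parentNode) {
      node.parentNode.removeChild(node);
      removed++;
    }
  }
  return removed;
}

// In a userscript: xremove(document, "//input[4]");
```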
css
Applies CSS to the document
Examples
- Setting the background color of google:
http://www.google.com/webhp?cleanitup&css=body{background-color:yellow!important}
I recommend Instant CSS for quickly testing new css
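A minimal sketch of applying the css value, assuming it is injected as a style element; applyCss(doc, css) is a hypothetical name. Inside Greasemonkey, the built-in GM_addStyle(css) does the same job. The Google example uses !important so the rule wins over the page's own background rules.

```javascript
// Hypothetical css sketch: append the given rules to the page in a <style> element.
function applyCss(doc, css) {
  const style = doc.createElement("style");
  style.textContent = css;
  // Older browsers lack document.head; fall back to a lookup, then to the root element.
  const target = doc.head || doc.getElementsByTagName("head")[0] || doc.documentElement;
  target.appendChild(style);
  return style;
}

// In a userscript: applyCss(document, "body{background-color:yellow!important}");
```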
eval
Allows you to run any script you want. Note: you have access to Greasemonkey functions here!
replace
Replaces text using a regexp. It takes three arguments: search string, modifiers and replace string, separated by the "|" character. If you skip the modifiers, they default to "gi".
Examples
- Giving a new meaning to an article:
http://en.wikipedia.org/wiki/Superman?cleanitup&replace=superman|Bill Gates
- Removing the "From Wikipedia, the free encyclopedia" tagline from Wikipedia articles:
http://en.wikipedia.org/wiki/Tagline?cleanitup&replace=From.*pedia|
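A sketch of how the replace arguments could be turned into a regexp. The names buildReplace and applyReplace are hypothetical, and the parsing rules are assumptions: two parts mean "search|replacement", three or more mean "search|modifiers|replacement", and a missing replacement deletes the match. Since "|" is the separator, this sketch cannot express regexp alternation inside the search string. Whether the script applies the result to document.body.innerHTML or to text nodes is not stated; on innerHTML, a regexp can also match inside tags and attributes.

```javascript
// Hypothetical sketch of the replace action's argument handling (not CleanItUp's code).
// Accepts "search|replacement" or "search|modifiers|replacement"; modifiers default to "gi".
function buildReplace(value) {
  const parts = value.split("|");
  let search, flags, replacement;
  if (parts.length >= 3) {
    [search, flags] = parts;
    replacement = parts.slice(2).join("|"); // keep any further "|" in the replacement
  } else {
    search = parts[0];
    replacement = parts.length === 2 ? parts[1] : "";
  }
  return { regexp: new RegExp(search, flags || "gi"), replacement };
}

function applyReplace(html, value) {
  const { regexp, replacement } = buildReplace(value);
  return html.replace(regexp, replacement);
}

// In a page this might be: document.body.innerHTML = applyReplace(document.body.innerHTML, value);
console.log(applyReplace("Superman meets SUPERMAN", "superman|Bill Gates"));
// prints "Bill Gates meets Bill Gates"
```

The default "i" modifier is why "superman" also matches "Superman" and "SUPERMAN"; the greedy ".*" in "From.*pedia" runs up to the last "pedia" on the line, i.e. through "encyclopedia".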
crop
Crops the document. The action takes two parameters: text that identifies where the crop should begin, and text that identifies where it should end. The first is included, the second is not. It also takes two optional arguments that allow you to adjust the positions. This "crop" feature is admittedly a little dodgy. For example, the text is taken from document.body.innerHTML, but this is not the same as the document source, and I think it is browser-dependent. Note: in most cases you can use xcrop instead, by using the "following-sibling" and/or "preceding-sibling" axis (example: here)
Examples
- Getting the content between the two horizontal rules in Chambers:
http://www.chambersharrap.co.uk/chambers/features/chref/chref.py/main?cleanitup&query=hello&crop=%3Cdiv%20class=%22hr%22%3E|%3Cdiv%20class=%22hr%22%3E
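A sketch of the crop rule on a plain HTML string; cropHtml is a hypothetical name. The begin and end texts are assumed to be separated by "|" (as in the example), and the two optional arguments are assumed to be character offsets added to the begin and end positions. Because the Chambers example uses the same marker for both ends, the end text is searched for only after the begin text.

```javascript
// Hypothetical crop sketch. value = "begin|end" or "begin|end|startOffset|endOffset".
// Assumption: the offsets are character counts added to the two positions.
function cropHtml(html, value) {
  const [begin, end, startAdj = "0", endAdj = "0"] = value.split("|");
  if (end === undefined) return html; // no end text given: leave the page alone
  const start = html.indexOf(begin);
  if (start === -1) return html; // begin text not found (assumption: do nothing)
  // Look for the end text after the begin text, so identical markers work.
  const stop = html.indexOf(end, start + begin.length);
  if (stop === -1) return html;
  // The begin text is included, the end text is not.
  return html.slice(start + Number(startAdj), stop + Number(endAdj));
}

const page = '<p>top</p><div class="hr"></div><p>def</p><div class="hr"></div><p>bottom</p>';
console.log(cropHtml(page, '<div class="hr">|<div class="hr">'));
// prints <div class="hr"></div><p>def</p>
```

In a browser the input would be document.body.innerHTML, which the browser re-serializes from the DOM, so quoting or whitespace may not match the original source, which is the weakness mentioned above.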
Credits
Based on the xStripper script by alien_scum
Version history
24 JUN 2007 - v1.1 Fixed autoupdate
02 JUN 2007 - v1.0 Created
Comments
It's Really cool!!
I'm having some difficulty with the syntax. Per recommendation I installed XPath Checker which tended to suggest using IDs and classes (e.g. "id('content')" ). However as far as I can tell these do not work in cleanitup?