Scroogle - linkable, pretty, cache++

By khopesh Last update Oct 6, 2010 — Installed 6,272 times.


Archived Comments (locked)




Jesse Andrews Admin

The following is an archive of comments made before threaded discussions were implemented (November 16th, 2008).

 
khopesh Script's Author

Apparently, the referer is only stripped when linking from an SSL site to a non-SSL site. You're linking to an SSL site, so the referer is passed along. I had not tested this case on an SSL server. Very interesting. I updated the script to take care of this issue while still allowing bookmarkable URLs.
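To make the rule concrete, here is a minimal sketch of the referer behaviour described above: the Referer header is dropped only on an HTTPS-to-HTTP navigation (the policy usually called "no-referrer-when-downgrade"). The function name `wouldSendReferer` is hypothetical, and the rule is an assumption about browsers of that era; current browsers default to a stricter policy that sends only the origin on cross-site requests.

```javascript
// Hypothetical helper modelling the older default referrer rule:
// the Referer header is withheld only when leaving an HTTPS page
// for a non-HTTPS destination. HTTPS-to-HTTPS keeps the full referer.
function wouldSendReferer(fromUrl, toUrl) {
  const from = new URL(fromUrl);
  const to = new URL(toUrl);

  // Downgrade (secure source, insecure target): no Referer is sent.
  if (from.protocol === "https:" && to.protocol !== "https:") {
    return false;
  }
  // Every other combination sends the referer under this rule.
  return true;
}
```

For example, `wouldSendReferer("https://ssl.scroogle.org/cgi-bin/nbbw.cgi", "http://example.com/")` is `false`, while the same call with an `https://` target is `true`, which matches the leak icepick observed (`example.com` is only a placeholder).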

The best workaround I could find (using href="javascript:location.href='https://...'") required its own workaround so that the link is still colored once you've visited it, and that was very hard to come by. This might be the first time it has been used like this (yay me!), so I feel obliged to mention that it exposes an interesting way to invade users' privacy. With this method, the script effectively learns whether or not you have visited a link. A malicious coder could then submit this data to a third party. (The more obvious methods of gaining this data are protected, but this one is not.)
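A minimal sketch of building a `javascript:` href of the kind quoted above, which the author reports keeps the link bookmarkable while avoiding the direct link. The helper name `toJavascriptHref` is hypothetical, and two details are assumptions rather than the script's actual code: the `void(...)` wrapper (a common precaution so the expression's string result is not rendered as a new page) and percent-encoding the string literal, because browsers percent-decode a `javascript:` URL before running it.

```javascript
// Hypothetical helper: wrap a target URL in a javascript: href that
// navigates to it when clicked.
function toJavascriptHref(targetUrl) {
  // JSON.stringify yields a safely quoted JS string literal.
  // encodeURIComponent protects it from the browser's percent-decoding
  // of javascript: URLs (so "%20" in the target survives as "%20").
  const literal = encodeURIComponent(JSON.stringify(targetUrl));
  // void(...) discards the assignment's value so nothing is rendered.
  return "javascript:void(location.href=" + literal + ")";
}
```

Assigning the result to a link's `href` property (rather than splicing it into HTML source) avoids any attribute-quoting issues.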

Another method of gaining this knowledge comes in the form of CSS, using a unique background image for each a:visited link (which is tedious) and logging the http requests for those background images on a server. This is a known "exploit" and I believe it is being worked on specifically, whereas I believe mine is both more elegant and (more importantly) not yet known.

Update: Of course it's known. Far too simple. This is Mozilla bug 147777. The patch blocking it landed yesterday(!), so I'll have to find another workaround[1] once that reaches a release (which won't be for a while). I'll wrap my code in a try/catch so that this part can fail silently if denied access to that attribute (though I expect it will merely get the unvisited color instead, which is effectively the same result).

[1]: Probably these four steps: duplicate the link, alter the duplication's href, place it exactly on top of the original, and set its opacity to 1.00. Very messy. Maybe there will be an "official" way to do it, or maybe I'll figure out how to do it with onclick instead (my initial attempts there failed).

 
icepick Scriptwright

RE: Referer & SSL:

For example, visit this URL: https://ssl.scroogle.org/cgi-bin/nbbw.cgi?Gw=ph...
Look for a phpinfo() page (#4 is one at the time of writing) and grep for HTTP_REFERER.

I get HTTP_REFERER https://ssl.scroogle.org/cgi-bin/nbbw.cgi?Gw=ph...
which is the source of my claim.

 
icepick Scriptwright

Items cached by PdfMeNot are indeed stored indefinitely; however, used in conjunction with Scroogle, I see little reason for concern. One assumes that all the results displayed are public and already cached by multiple entities, so one more seems insignificant. The local upload feature, on the other hand, is of far more concern, I believe.

As always, cheers for the update for this useful script. Using your latest version, I will now apply my (increasingly limited) tweaks to update my "fork".

 
khopesh Script's Author

PdfMeNot is an interesting Flash-based PDF viewer. They have no privacy policy, and it doesn't say what they do with their cached copies, so I'm a bit wary (after all, this is why we use Scroogle over Google!), but I've added it regardless. I also updated the link text for Google's cache on PDF items. See the Update section above for more details.

To-do: PDF is just one of the formats Google parses and caches as HTML. I'll have to come back and do PPT, RTF, and DOC (any others?) later. I can't keep adding icons, as they would take up too much space, so I may revert to using [PDF] the way Google does.

Update: PPT, RTF, DOC, and PDF all now have that prefix. PDF also gets an icon because it's so common. That might change in the future.
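A minimal sketch of the prefixing described above. The helper name `fileTypePrefix` is hypothetical, and detecting the type from the URL's file extension is an assumption; the script may decide file types some other way (for example from Google's own result markup).

```javascript
// File types that get a "[TYPE] " prefix, per the update above.
const PREFIXED_TYPES = new Set(["pdf", "ppt", "rtf", "doc"]);

// Hypothetical helper: return the label to put in front of a result,
// or "" when the result is not one of the prefixed formats.
function fileTypePrefix(resultUrl) {
  // Look only at the path, so query strings like "?x=1" are ignored.
  const path = new URL(resultUrl).pathname;
  const match = /\.([a-z0-9]+)$/i.exec(path);
  if (!match) return "";

  const ext = match[1].toLowerCase();
  return PREFIXED_TYPES.has(ext) ? "[" + ext.toUpperCase() + "] " : "";
}
```

For instance, a result at `http://example.com/slides.ppt?x=1` would get `"[PPT] "`, while an ordinary `.html` page gets no prefix; the extra PDF icon would be a separate check on the same extension.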

 
icepick Scriptwright

Cheers for the feedback!

Your update is going to drive me out of the forking business. :-P
Before I synchronise with your code, I figured I'd offer my next idea to you as a suggestion first. Quite simply: when a returned result is a PDF, add a link to view it with PdfMeNot as well as Google's 'View as HTML' link, with both links made more visible in some way.
Note that the Google Cache and 'View as HTML' URLs are identical, if you're into saving space.

Thanks, and I look forward to whatever else you might come up with in the future.
