Regex kicks my ass
|
|
This is what I want to do 1
2
3
I must have tried 15+ html-to-rss sites (dapper.com etc etc) and most of them are little more than toys. Dapper is the best of the lot - but it either works or it doesn't. And if it doesn't work - that's that. I found this RSS GENERATOR [rssgenerator.plech.net]
You give it a site, you set up a regex and it gives you a... um... an rss proxy. But the example at the site doesn't work, and I've had no luck with it. Maybe it is working - or maybe it's broken - I can't tell. Another problem is that Yahoo isn't going to shut down tomorrow - but plech.net might.
The only solution I -almost- have success with is Yahoo Pipes. Pipes has a "fetch page" module that helps to scrape pages. Sheer stubbornness will probably help me figure out what I need to do at Yahoo Pipes - except for the regex. For what I want to do it's regex or nothing. I'm very stubborn - but nobody pushes regex around. Regex keeps kicking my ass.
Reality check - it's impossible to make a gm script that makes google reader
I looked at 60+ pipes that involve rss. Hardly any of them involve scraping. They manipulate the rss feed. Finally I found this excellent pipe that's a great example of what I want to be able to do... Faba2Feed – EN
... but - try as I might - I just can't "get" the "simple" regex code. I can't copy and paste into this thread since pipes doesn't let you copy and paste - everything is all tied up with the pictures. I copied out the source (code) and had some success with my own pipe... reuters 01b
... but the key problem is that since I don't understand how the regex is working - copy and paste only works for very similar situations. I don't know how to get my rss links to work. The yahoo pipe forum has hardly any activity - and I already post there often. And the official yahoo regex pages have no info and just point to pages with material like this: Regular-Expressions.info
I've spent hours reading and trying out the examples but I'm getting nowhere.
|
|
I couldn't figure out Yahoo Pipes either. Two other automatic scraping services that I've tried are http://openkapow.com/ and http://www.dapper.net/ . Dapper is the easier of the two. But it would probably be easier to just use a GM_xmlhttpRequest to scrape the page. http://diveintogreasemonkey.org/api/gm_xmlhttpr...
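// Fetch a page from any domain (a plain XMLHttpRequest is limited to the same origin).
GM_xmlhttpRequest({
    method: 'GET',
    url: 'http://foo.com',
    headers: {
        'User-agent': 'Mozilla/4.0 (compatible) Greasemonkey'
    },
    onload: function (rDetails) {
        var rText = rDetails.responseText;
        // do stuff with the responseText
    },
    onerror: function (rDetails) {
        // responseText is often empty on failure, so report the status too
        alert('error ' + rDetails.status + ' ' + rDetails.statusText);
    }
});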
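Since regex is the sticking point in this thread, here is a rough sketch of what "do stuff with the responseText" could look like: pulling every link and its text out of the raw HTML with one commented regex. The function name extractLinks is made up for illustration, and the pattern assumes simple anchors with double-quoted href attributes; a real page may well need a different pattern.

```javascript
// Hypothetical helper (not part of Greasemonkey): pull every link out of raw HTML.
// Assumes anchors shaped like <a ... href="..." ...>text</a> with double quotes.
function extractLinks(html) {
  // <a\b          start of an anchor tag (the \b stops it matching <abbr> etc.)
  // [^>]*?        any other attributes that come before href
  // href="([^"]*)"  capture group 1: everything between the quotes
  // [^>]*>        the rest of the opening tag
  // ([\s\S]*?)    capture group 2: the link text, as short as possible
  // <\/a>         the closing tag
  var pattern = /<a\b[^>]*?href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
  var links = [];
  var match;
  // With the g flag, each exec() call continues where the previous match ended.
  while ((match = pattern.exec(html)) !== null) {
    links.push({
      url: match[1],
      // drop nested tags such as <b>, then tidy the whitespace
      title: match[2].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim()
    });
  }
  return links;
}
```

Inside the onload handler this would be called as extractLinks(rDetails.responseText). The parts in parentheses are the capture groups, and match[1] and match[2] are how you get at them afterwards; that is usually the bit of a copied regex worth understanding first.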
|
|
|
I downloaded the openkapow installer and just after I clicked 'install' I got a message that Vista isn't (fully) supported.
I started to wonder how much longer openkapow will be in existence. I didn't install it.
===
If (granted a big "if") the ideal situation worked I'd go to a page like "browse movies" at thepiratebay.org (the first page).
I'd grab the first five pages. I'd take the list of 150 torrents (30 to a page) and "roll my own" feed. The end result would be a list of probably 90-120 torrents (I'd filter a lot out), sorted alphabetically. I'd fix the title names too and remove underscores and dots (title_with_underscores) (title.with.dots). A mess of data would become useful for me. Very useful.
===
xmlhttpRequest and GM_xmlhttpRequest are definitely a challenge for me. I'll see how far I can get with them. That would be another way to get the info - like at The Pirate Bay - take the tables from the first five pages - mash them together on the first page - filter, alphabetize, etc.
===
Thanks for the regex link. I need to keep trying. The key to Yahoo Pipes is regex - without regex you can't do much.
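The title cleanup, filtering and alphabetizing described above could be sketched roughly like this in plain JavaScript. The names cleanTitle and buildList and the blockedWords parameter are invented for the example, and it assumes the titles have already been scraped into an array of strings.

```javascript
// Replace runs of underscores and dots with a space, then tidy the whitespace.
function cleanTitle(raw) {
  return raw.replace(/[_.]+/g, ' ').replace(/\s+/g, ' ').trim();
}

// Hypothetical pipeline: clean the titles, drop unwanted ones, sort A-Z.
// titles:       array of raw title strings
// blockedWords: array of lowercase strings; a title containing any of them is dropped
function buildList(titles, blockedWords) {
  return titles
    .map(cleanTitle)
    .filter(function (title) {
      var lower = title.toLowerCase();
      return !blockedWords.some(function (word) {
        return lower.indexOf(word) !== -1;
      });
    })
    .sort(function (a, b) {
      // case-insensitive alphabetical order
      var x = a.toLowerCase();
      var y = b.toLowerCase();
      return x < y ? -1 : (x > y ? 1 : 0);
    });
}
```

Two caveats with this sketch: the filter is a plain substring match, so a blocked word like "cam" would also drop a title containing "camera", and the cleanup turns every dot into a space, including dots in version numbers.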
|
|
I have a new idea. 1= put 100 items from a feed onto a "page"
>>> Why do I want to do that?
>>> Can feedreaders or firefox feedreaders be used?
>>> Can the google mobile link be used?
===
I'm stuck at number two. I can't figure out how to have Firefox use a built-in CSS layout. If there's no 'easy' solution - say to make Firefox format it automatically - I'll probably have to scrap my idea.
Example of the "ordinary" feed of (a photo stream)
It's formatted with CSS. Firefox does so automatically for feeds. Well, not entirely automatically.
The photo stream feed "piped" through the google api to grab 100 items
It's not formatted. And I don't think I can figure out how to style it with HTML and CSS. The link uses the google api link plus ?n=100 at the end
https://www.google.com/reader/atom/feed/...?n=100
===
>>> Why feedreaders can't work.
My favorite way to use feeds is to quickly browse loads of them with as many items as possible. Google Reader slows to a crawl if I start sticking lots of feeds with 100 items at a time into it. It becomes unusable. If I had hundreds of feeds it might even cause Firefox to crash. I've tried out standalone newsreaders - but this example is typical - FeedDemon crashed when I had one single feed delivering 100 items and displaying 100 items. I'm pretty sure that any newsreader or Firefox newsreader will choke and/or crash due to the number of feeds I want to "have" and the number of items I want to grab.
>>> Why the google mobile link can't work.
Here's the google mobile link
The photo stream feed "piped" through the google mobile link
In settings you can have up to 20 items. But the summary is about 10 words and there are no photos. The programming challenge of sticking the photos into the summaries, grabbing longer description summaries and showing 100 items is way way beyond me.
===
My solution is far from perfect but I'm comfortable with the limitations, and if I can jump the hurdle of number two I'm ready to see if I can succeed with Greasemonkey.
===
The technique uses the Google Reader api, which is officially undocumented and is about 3 years late. I found unofficial documentation here GoogleReaderAPI - pyrfeed - Google Code
That's where I learned how to grab 100 items.
===
And btw - if anybody has actually read this far - google api links and google mobile links can't be sent to Yahoo Pipes. The input isn't accepted.
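On the hurdle of styling the unformatted 100-item feed: one possible route, sketched below, is to skip the browser's built-in feed layout and build a small HTML page with its own CSS from the items. The function names and the item shape { title, link, summary } are assumptions made for this example; getting the items out of the Atom XML in the first place is a separate step (in a browser script, DOMParser is one way to do that).

```javascript
// Escape text so it can be dropped safely into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical renderer: turn feed items into one styled HTML page.
// Each item is assumed to look like { title: ..., link: ..., summary: ... }.
function renderPage(items) {
  var css = '.item{margin:0 0 1em;padding:.5em;border-bottom:1px solid #ccc}' +
            '.item h3{margin:0;font:bold 1em sans-serif}';
  var body = items.map(function (item) {
    return '<div class="item"><h3><a href="' + escapeHtml(item.link) + '">' +
           escapeHtml(item.title) + '</a></h3><p>' +
           escapeHtml(item.summary) + '</p></div>';
  }).join('\n');
  return '<html><head><style>' + css + '</style></head><body>\n' +
         body + '\n</body></html>';
}
```

Note that this sketch treats summaries as plain text and escapes them, so photos embedded as HTML in a summary would show up as markup rather than images; showing them would mean inserting the summary HTML unescaped, which is only reasonable for a feed you trust.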
