Comments by Ted Kandell on scripts

6 comments

NCBI changed the URLs for papers to
http://www.ncbi.nlm.nih.gov/sites/entrez? ...
from
http://www.ncbi.nlm.nih.gov/entrez? ...

I've updated the script to handle the new URLs as well as the old ones.
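
For anyone curious what handling both URL forms might look like, here is a minimal sketch in Python. It is not the script's actual code. The pattern name `ENTREZ_URL` and the helper `is_entrez_url` are made up for illustration, and since the query string is elided above, it is matched loosely.

```python
import re

# Assumption: only the path prefix changed ("/entrez?" became "/sites/entrez?").
# The query string is not shown in the comment above, so anything may follow "?".
ENTREZ_URL = re.compile(r"^http://www\.ncbi\.nlm\.nih\.gov/(?:sites/)?entrez\?")


def is_entrez_url(url: str) -> bool:
    """Return True for both the old and the new NCBI Entrez URL forms."""
    return ENTREZ_URL.match(url) is not None
```

The optional group `(?:sites/)?` is what lets one pattern cover the old and new forms, so there is no need to maintain two separate checks.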

If anyone has any problems with this script, or any questions, please email me at:
ted underscore kandell at yahoo dot com
(replace "underscore", "at", and "dot" with the appropriate punctuation).

Please report any bugs or feature requests to ted underscore kandell at yahoo dot com.

Thanks! Basically, the idea I have is exactly what you're referring to: list the sites that have the free full text first, in preference to the others, and also try to favor the sites that CiteULike supports. I think there are some sites that generally fit both criteria, like PubMed Central, which has full-text articles. The question is how to know which sites have the free full text of an article available.

So, any ideas on an ordering scheme to try to find the site(s) with the free full text? Or a way of figuring this out, perhaps with an onClick event, that would only do the search once the button was clicked and just for that article?

See, at least for life sciences papers, if the full text is available, PubMed shows the link for it, or multiple links. There should be something analogous for all the other academic fields too, and then it would just be a matter of accessing those few sites to check if the full text is available, in a certain order, and stopping the search on the first hit.

If you think that would actually work, then I can write that onClick callback to first access the Google Scholar "all X versions" page, read in all the sites, and then access them in a certain order to see if the full text is out there. If anyone can come up with a list of sites guaranteed to show whether the full text is available or not, then I'll implement that.
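
To make the "check in a certain order, stop on the first hit" idea concrete, here is a rough sketch in Python. Nothing here is implemented in the script. The function name, the `site_order` list, and the `has_full_text` callback (which would do the actual page fetch) are all hypothetical placeholders.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence
from urllib.parse import urlparse


def first_full_text_link(
    links: Iterable[str],
    site_order: Sequence[str],
    has_full_text: Callable[[str], bool],
) -> Optional[str]:
    """Check links site by site, in preference order, and stop at the first hit.

    has_full_text is a hypothetical callback standing in for whatever
    per-site check turns out to be reliable.
    """
    # Group the links by host so each preferred site can be looked up directly.
    by_host: Dict[Optional[str], List[str]] = {}
    for url in links:
        by_host.setdefault(urlparse(url).hostname, []).append(url)

    for host in site_order:
        for url in by_host.get(host, []):
            if has_full_text(url):
                return url  # first hit: no further sites are contacted
    return None
```

Links on hosts that are not in `site_order` are never checked, which keeps the number of requests down to the few sites on the agreed list.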

It works, more or less, for all results on a Google Scholar results page. There are some issues with DOI redirects, because CiteULike currently doesn't follow them. Also, CiteULike doesn't yet support OCLC WorldCat, which gives bibliographic citations for books, but as soon as that plugin is implemented, I can then parse the Google Scholar Book results too, and create links for those.

Sometimes, it seems, some papers don't have links in Google Scholar for the larger citation sites, like Entrez PubMed and IngentaConnect, even though these papers appear on those sites. Finding those links would make even more papers automatically post to CiteULike.

Google Scholar lists at most three links for each paper, and has an "all X versions" link to a page that lists the rest. I've tried to sort the links so that the sites that usually have the full text available for free come first. JSTOR is a special case: even where the full text is available there, access is restricted for the general public unless the IP address is owned by one of the subscribing institutions.
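
As an illustration of that kind of sorting, here is a small Python sketch. It is not taken from the script, and the function name and the `preferred_hosts` list are assumptions made up for the example.

```python
from typing import List, Sequence
from urllib.parse import urlparse


def sort_links_by_preference(links: Sequence[str], preferred_hosts: Sequence[str]) -> List[str]:
    """Order links so that hosts earlier in preferred_hosts come first.

    Links on hosts that are not listed go last. Python's sort is stable,
    so they keep the order in which Google Scholar listed them.
    """
    rank = {host: i for i, host in enumerate(preferred_hosts)}
    unlisted = len(preferred_hosts)  # rank shared by every host not on the list
    return sorted(links, key=lambda url: rank.get(urlparse(url).hostname, unlisted))
```

A site with access restrictions, like JSTOR in the case described above, could simply be left off the list or placed near the end of it.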

There is a way around these problems: I could write something for the onClick event to fetch the "all X versions" page, and then find the appropriate links from there if possible, if one wasn't found already (or just do that by default). If no link to a CiteULike-supported site was found, I could do a search on the larger citation sites to see if the paper is really there too.
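
Roughly, the fallback described above could be sketched like this in Python. This is only an outline of the idea, not working script code. `fetch_all_versions` and `is_supported` are hypothetical callbacks standing in for the page fetch and for the check against CiteULike's supported sites.

```python
from typing import Callable, Iterable, Optional


def pick_supported_link(
    visible_links: Iterable[str],
    fetch_all_versions: Callable[[], Iterable[str]],
    is_supported: Callable[[str], bool],
) -> Optional[str]:
    """Prefer a supported link already on the results page.

    The "all X versions" page is fetched only when none of the visible
    links qualifies, so the extra request happens on demand.
    """
    for url in visible_links:
        if is_supported(url):
            return url
    # Nothing suitable among the visible links: fall back to the full list.
    for url in fetch_all_versions():
        if is_supported(url):
            return url
    return None
```

Passing the fetch in as a callback means it costs nothing for results that already show a supported link. A `None` result is the point where the secondary search on the larger citation sites would start.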

I think that while it may be feasible to retrieve all the links from the "all X versions" page automatically on click, I don't think I should implement following DOI redirect links. That should be in CiteULike itself, and Google Scholar should fix their spidering of links and find all the citations that are actually out there. If that doesn't happen, there is always the option that CiteULike could do the secondary searching.

Now, since Google Scholar already has the full citation for a search result, perhaps that could be made available via a (possibly hidden) link for each one? That way, there would be no real need to parse *any* journal or citation page ... just pass the link on to CiteULike.

I'd like to hear what people think of these options.

(People might know this already, but NPG's Connotea doesn't retrieve the full citation. Users are pasting the abstract into the description field, for example, to get around this. CiteULike already has fields for every possible piece of bibliographic information in its database structure, and so is much more useful than Connotea for academic papers, IMHO.)

This script is tested and should now be working properly. Please report any bugs or feature requests to ted underscore kandell at yahoo dot com.