[hcs-d] Wget from PIN-authenticated page
gdb at hcs.harvard.edu
Tue Dec 7 15:11:22 EST 2010
The necessity of solutions like this always reminds me how much I hate
2010/12/7 Peter Bailis <pbailis at fas.harvard.edu>:
> Great--this is pretty slick! Thanks, Ivan.
> On Tue, Dec 7, 2010 at 12:09 PM, Ivan Krstić
> <krstic at solarsail.hcs.harvard.edu> wrote:
>> Easiest thing you can do here is to script the browser itself with
>> Chickenfoot, http://groups.csail.mit.edu/uid/chickenfoot/. It'll take you 10
>> minutes to get this working.
>> Ivan (via mobile)
>> On Dec 7, 2010, at 8:07 AM, Peter Bailis <pbailis at fas.harvard.edu> wrote:
>> Hey HCS Hackers,
>> I'm helping a thesis-writing friend with some automated database lookups
>> on a pharmacology database. We've gotten their permission to automate the
>> scraping, though they don't have an API, so I'm going to do some scraping
>> using wget (I know there are libraries, though I just want a quick and dirty
>> script). This is paywalled through the Harvard PIN API, though, and, as my
>> initial attempts haven't been too successful, I thought I'd see if anyone
>> else has experience getting through authentication using wget/knows what I'm
>> doing wrong/has any other ideas.
>> My problem is that I'm not sure how to perform the initial authentication
>> on the PIN login page using wget so I can store the cookie for later
>> accesses. I've tried opening the page in Firefox, getting the cookie, then
>> converting the sqlite (sqlite3 -separator ' ' cookies.sqlite 'select * from
>> moz_cookies' > cookies.txt), but I still get the PIN page. Any thoughts?
>> The site is http://nrs.harvard.edu/urn-3:hul.eresource:clinphar
>> hcs-discuss mailing list
>> hcs-discuss at lists.hcs.harvard.edu
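A note on the cookie conversion tried in the original message: wget's --load-cookies option reads the Netscape cookies.txt format, which is seven tab-separated fields per line (domain, include-subdomains flag, path, secure flag, expiry, name, value). A plain 'select * from moz_cookies' dumps the columns in table order, which does not match that layout, so wget may silently ignore the file. Below is a minimal Python sketch of the conversion. It assumes the moz_cookies table has the columns host, path, isSecure, expiry, name and value (check with '.schema moz_cookies', since the layout varies between Firefox versions). The function name and the host_filter parameter are made up for illustration. Note also that session-only cookies are generally not written to cookies.sqlite, so if the PIN login sets a session cookie this approach may still land on the login page.

```python
import sqlite3


def firefox_cookies_to_netscape(sqlite_path, out_path, host_filter=None):
    """Write cookies from a Firefox cookies.sqlite to a Netscape cookies.txt.

    Assumes moz_cookies has host, path, isSecure, expiry, name, value columns.
    host_filter (hypothetical helper argument) keeps only hosts ending in it.
    Returns the number of cookies written.
    """
    conn = sqlite3.connect(sqlite_path)
    try:
        rows = conn.execute(
            "SELECT host, path, isSecure, expiry, name, value FROM moz_cookies"
        ).fetchall()
    finally:
        conn.close()

    written = 0
    with open(out_path, "w") as out:
        # Header line conventionally found at the top of cookies.txt files.
        out.write("# Netscape HTTP Cookie File\n")
        for host, path, is_secure, expiry, name, value in rows:
            if host_filter and not host.endswith(host_filter):
                continue
            # A leading dot on the host means the cookie applies to subdomains.
            include_subdomains = "TRUE" if host.startswith(".") else "FALSE"
            secure = "TRUE" if is_secure else "FALSE"
            fields = [host, include_subdomains, path, secure,
                      str(expiry), name, value]
            out.write("\t".join(fields) + "\n")
            written += 1
    return written
```

Usage would look something like firefox_cookies_to_netscape("cookies.sqlite", "cookies.txt", host_filter="harvard.edu"), followed by wget --load-cookies cookies.txt with the target URL. Firefox usually keeps cookies.sqlite locked while it is running, so working on a copy of the file is the safer route.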