[hcs-d] Wget from PIN-authenticated page

Greg Brockman gdb at hcs.harvard.edu
Tue Dec 7 15:11:22 EST 2010


The necessity of solutions like this always reminds me how much I hate
computers...

Greg



2010/12/7 Peter Bailis <pbailis at fas.harvard.edu>:
> Great--this is pretty slick!  Thanks, Ivan.
>
> On Tue, Dec 7, 2010 at 12:09 PM, Ivan Krstić
> <krstic at solarsail.hcs.harvard.edu> wrote:
>>
>> Easiest thing you can do here is to script the browser
>> itself, http://groups.csail.mit.edu/uid/chickenfoot/. It'll take you 10
>> minutes to get this working.
>>
>> Cheers,
>> Ivan (via mobile)
>> On Dec 7, 2010, at 8:07 AM, Peter Bailis <pbailis at fas.harvard.edu> wrote:
>>
>> Hey HCS Hackers,
>> I'm helping a thesis-writing friend with some automated database lookups
>> on a pharmacology database.  We've gotten their permission to automate the
>> lookups, but they don't have an API, so I'm going to scrape the site
>> using wget (I know there are libraries, but I just want a quick-and-dirty
>> script).  This is paywalled through the Harvard PIN API, though, and, as my
>> initial attempts haven't been too successful, I thought I'd see if anyone
>> else has experience getting through authentication using wget/knows what I'm
>> doing wrong/has any other ideas.
>> My problem is that I'm not sure how to perform the initial authentication
>> on the PIN login page using wget so I can store the cookie for later
>> accesses.  I've tried opening the page in Firefox, getting the cookie, then
>> converting the SQLite cookie database (sqlite3 -separator ' ' cookies.sqlite
>> 'select * from moz_cookies' > cookies.txt), but I still get the PIN page.
>> Any thoughts?
>> The site is http://nrs.harvard.edu/urn-3:hul.eresource:clinphar
>> Thanks,
>> Peter
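One plausible reason the sqlite3 export above fails: wget's --load-cookies expects the Netscape cookies.txt layout (seven tab-separated fields: domain, include-subdomains flag, path, secure flag, expiry, name, value), whereas 'select * from moz_cookies' dumps every column, space-separated, in the table's own order. Here is a minimal Python sketch of the conversion, assuming the moz_cookies table has host, path, isSecure, expiry, name and value columns (as in Firefox versions of this era) with expiry in Unix seconds; the script name moz2netscape.py is hypothetical. Note that session cookies, which login systems often rely on, may never be written to cookies.sqlite at all, so this conversion alone may not be enough.

```python
import contextlib
import sqlite3
import sys


def moz_cookies_to_netscape(conn: sqlite3.Connection) -> str:
    """Render Firefox moz_cookies rows as a Netscape-format cookies.txt.

    Assumption: the table has host, path, isSecure, expiry, name and
    value columns, and expiry is stored in Unix seconds.
    """
    lines = ["# Netscape HTTP Cookie File"]
    rows = conn.execute(
        "SELECT host, path, isSecure, expiry, name, value "
        "FROM moz_cookies ORDER BY host, path, name"
    )
    for host, path, is_secure, expiry, name, value in rows:
        # A leading dot on the host means the cookie also applies to subdomains.
        include_subdomains = "TRUE" if host.startswith(".") else "FALSE"
        secure = "TRUE" if is_secure else "FALSE"
        # Field order required by the Netscape format, tab-separated.
        lines.append("\t".join(
            [host, include_subdomains, path, secure, str(expiry), name, value]
        ))
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python moz2netscape.py cookies.sqlite > cookies.txt
    with contextlib.closing(sqlite3.connect(sys.argv[1])) as conn:
        sys.stdout.write(moz_cookies_to_netscape(conn))
```

To use it, copy cookies.sqlite out of the Firefox profile first (Firefox may hold a lock on it while running), run `python moz2netscape.py cookies.sqlite > cookies.txt`, and then fetch pages with `wget --load-cookies cookies.txt <url>`.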
>>
>> _______________________________________________
>> hcs-discuss mailing list
>> hcs-discuss at lists.hcs.harvard.edu
>> https://lists.hcs.harvard.edu/mailman/listinfo/hcs-discuss
>
>

