July 12, 2005

Help with the Connection HTTP header in python?

Okay, so I've been banging away on this little one for three or four hours now, to no avail. I'm hoping maybe a python expert sees this and can help. Here's the story:

I'm working on a little python script that will grab search results from one of our vendors to include in a federated search system. I've had pretty good results with most of the vendors we work with, and all in all, my little program is pretty slick.

But, I'm stuck with one of them. You see, in order for their authentication system to work when I'm doing the search, I need to first get a cookie with a session variable from their server. So, it needs to look like this: hit homepage, get cookie, send second request, get search results. Pretty easy, huh?
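That homepage-then-search dance can be sketched roughly as below. It is only a sketch, and it assumes a current Python 3 standard library (http.client and http.cookies) rather than the 2.2 box described in this post. The names cookie_header and fetch_results are made up for illustration, and the host and paths you pass in are placeholders, not the vendor's real ones.

```python
import http.client
from http.cookies import SimpleCookie


def cookie_header(set_cookie_values):
    """Build one Cookie header value from a list of Set-Cookie values."""
    pairs = []
    for raw in set_cookie_values:
        jar = SimpleCookie()
        jar.load(raw)
        for name, morsel in jar.items():
            pairs.append("%s=%s" % (name, morsel.value))
    return "; ".join(pairs)


def fetch_results(host, home_path, search_path):
    """Hit the homepage, keep its cookie, then search on the same socket."""
    # http.client speaks HTTP/1.1 and does not add "Connection: close" itself.
    conn = http.client.HTTPConnection(host, timeout=30)
    try:
        conn.request("GET", home_path)
        home = conn.getresponse()
        home.read()  # drain the body so the connection can be reused
        cookies = cookie_header(home.headers.get_all("Set-Cookie") or [])

        headers = {"Cookie": cookies} if cookies else {}
        conn.request("GET", search_path, headers=headers)
        return conn.getresponse().read()
    finally:
        conn.close()
```

Something like fetch_results("vendor.example.com", "/", "/search?q=test") would be the call, with those three arguments swapped for whatever the vendor actually uses.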

Turns out, no, it isn't that easy. The root of the issue is that my script is including the "Connection: close" HTTP header. And when the server (IIS) sees this, it flushes the session variable it set. So, on the initial request, it gives me a session variable, then immediately flushes it. Not so useful.
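One way to check which client is actually putting that header on the wire is to point it at a throwaway local server that records what it receives. The sketch below assumes a current Python 3, where urllib.request stands in for urllib2 and http.client for a lower-level client. Recorder and compare_connection_headers are illustrative names, not part of any library.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

seen = []  # Connection header value of each request the server receives


class Recorder(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # allow keep-alive on the server side

    def do_GET(self):
        seen.append(self.headers.get("Connection"))
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


def compare_connection_headers():
    """Return (header sent by urllib.request, header sent by http.client)."""
    server = HTTPServer(("127.0.0.1", 0), Recorder)
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        # An empty ProxyHandler keeps any environment proxy out of the way.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        opener.open("http://127.0.0.1:%d/" % port).read()

        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=10)
        conn.request("GET", "/")
        conn.getresponse().read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return seen[-2], seen[-1]
```

On CPython 3 the urllib.request side reports "close" and the bare http.client side reports None, meaning no Connection header was sent at all.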

So, I need to not send the "Connection: close" header, so that IIS will keep the session alive. Fair enough. Let's look at what I'm using to grab the pages. Plain old urllib and urllib2 won't cut it, as they don't have cookie support (well, urllib2 got it as of python 2.4, but our box is running 2.2, and I'm pretty sure that wouldn't matter anyway).

So, I'm using ClientCookie (which eventually merged into the standard library as cookielib). ClientCookie is pretty darn slick and very easy to use. But, you guessed it, no persistence: it sets the same old "Connection: close" header as urllib2 does.

So, I then turned to urlgrabber's keepalive module. This works like a charm: easy persistent connections. But, uh, no cookie support. Both modules let you set headers (to change the User-Agent and such), but my attempts to change the "Connection" header go nowhere.

I have two different problems, each with its own solution. They just won't work together. I've spent a few hours trying to combine them without luck. The next thing I can think of is to basically rewrite the functionality of these two modules together as one, but that sounds really, really ugly for this occasional programmer.
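For what it's worth, on a current Python 3 the two halves can be glued together without rewriting either one. The sketch below is not ClientCookie or urlgrabber code. It assumes the modern standard library, where http.cookiejar (the descendant of cookielib) does the cookie bookkeeping and a single http.client connection stays open between requests. KeepAliveCookieSession is a made-up name, and the sketch only handles plain http:// URLs with no redirects.

```python
import http.client
import http.cookiejar
import urllib.request
from urllib.parse import urlsplit


class KeepAliveCookieSession:
    """One persistent HTTP connection plus a cookie jar (illustrative only)."""

    def __init__(self, base_url, timeout=30):
        parts = urlsplit(base_url)
        self.base_url = base_url.rstrip("/")
        self.jar = http.cookiejar.CookieJar()
        self.conn = http.client.HTTPConnection(
            parts.hostname, parts.port, timeout=timeout)

    def get(self, path):
        # The Request object is never opened by urllib; the jar only uses it
        # to work out which cookies apply and where new ones came from.
        req = urllib.request.Request(self.base_url + path)
        self.jar.add_cookie_header(req)
        self.conn.request("GET", path, headers=dict(req.header_items()))
        resp = self.conn.getresponse()
        body = resp.read()  # drain so the socket can be reused
        self.jar.extract_cookies(resp, req)
        return resp.status, body

    def close(self):
        self.conn.close()
```

Usage would be along the lines of creating the session with the vendor's base URL, calling get("/") once to pick up the session cookie, then calling get() again with the search path before close().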

Anyone out there a python guru with a good answer to this?? Write me!!