Saturday, December 16. 2006
Well, after a week of coding, I decided to "release" ajp-wsgi (i.e., make its existence known beyond my web sites and blog). It's a rather niche project, but if even one other person finds it useful, then hey, cool. Releasing it costs me nothing.
I found it highly educational to create; it illuminated the mysteries of the Python C API. Plus, I'm using it everywhere now. And now that I've closed ticket #4, whose fix seems to have cured all the outstanding issues, it's pretty much as complete as I envisioned it.
Now maybe I can get back to Flannel...
And I have thought about writing an SCGI version as well... but it's not something I would use, so what's the point? I don't really like supporting something I don't use regularly, though since it's free software, I'm not really obligated to provide any support. (But I still do, because I'm a nice guy.) Maybe if Apache HTTPD eventually adds a mod_proxy_scgi, though...
Ah, no Google hits for mod_proxy_scgi. Oh well. At least this entry may eventually show up.
Thursday, December 14. 2006
Apparently, I read neither the AJP13 spec nor my original code very closely:
"Note: The content-length header is extremely important. If it is present and non-zero, the container assumes that the request has a body (a POST request, for example), and immediately reads a separate packet off the input stream to get that body."
Here I was explicitly requesting the first block instead of just reading it. Anyway, it was a quick and easy fix.
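To make the fix concrete, here is a minimal sketch in Python (the real fix was in C; the function names and the MAX_CHUNK cap are hypothetical). It assumes the AJP13 framing: packets from the web server start with 0x12 0x34 and a 2-byte length, body packets carry a 2-byte data length before the data, and the container's GET_BODY_CHUNK request (prefix code 6) goes out in an "AB"-prefixed packet. The point is that the first chunk is simply read, and only later chunks are requested.

```python
import io
import struct

GET_BODY_CHUNK = 6  # AJP13 prefix code (container -> server)
MAX_CHUNK = 8186    # hypothetical cap on bytes requested per chunk


def read_body_chunk(stream):
    """Read one server->container body packet and return its data bytes."""
    magic, length = struct.unpack(">HH", stream.read(4))
    if magic != 0x1234:
        raise ValueError("bad AJP packet magic")
    payload = stream.read(length)
    if len(payload) < 2:
        return b""  # empty packet: no more body data
    (size,) = struct.unpack(">H", payload[:2])
    return payload[2:2 + size]


def get_body_chunk_packet(size):
    """Build a container->server GET_BODY_CHUNK request for `size` bytes."""
    payload = struct.pack(">BH", GET_BODY_CHUNK, size)
    return b"AB" + struct.pack(">H", len(payload)) + payload


def read_request_body(stream, send, content_length):
    """Read the whole body; the first chunk arrives without being requested."""
    chunks, remaining, first = [], content_length, True
    while remaining > 0:
        if not first:
            # Only chunks after the first must be explicitly requested.
            send(get_body_chunk_packet(min(remaining, MAX_CHUNK)))
        first = False
        data = read_body_chunk(stream)
        if not data:
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


def server_body_packet(data):
    """Test helper: encode `data` the way the web server would send it."""
    payload = struct.pack(">H", len(data)) + data
    return struct.pack(">HH", 0x1234, len(payload)) + payload


stream = io.BytesIO(server_body_packet(b"hello ") + server_body_packet(b"world"))
sent = []
body = read_request_body(stream, sent.append, 11)
```

The `send` callable stands in for whatever writes to the socket; passing a list's `append` makes it easy to see that only one GET_BODY_CHUNK request goes out for a two-chunk body.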
ajp-wsgi is moving along... and development has actually slowed down immensely (a sign of stability?). I've converted all the Python WSGI applications on my server to use it. It may just be psychological, but I do notice application responses being a bit snappier. But who knows.
Trac was the least trivial to convert. It doesn't provide a ready-made application factory to create the WSGI application object, so I basically had to mimic (using my own config options) the operations its main() method performs. The other applications (my own blog and shorten projects, MoinMoin) had readily available application objects. And I'm also glad to say that Paste Deploy-based apps are easily deployed with ajp-wsgi as well.
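For an application without a ready-made WSGI object, one approach is a small module-level factory that wraps the app and fills in the configuration it would otherwise get from its own main(). The sketch below is generic and hypothetical: `make_app`, `inner_app`, and the `example.config_path` key are placeholders, not Trac's actual setup, which depends on the Trac version.

```python
def make_app(inner_app, **defaults):
    """Wrap a WSGI app so that missing environ keys get configured defaults."""
    def application(environ, start_response):
        for key, value in defaults.items():
            environ.setdefault(key, value)  # per-request values still win
        return inner_app(environ, start_response)
    return application


# Hypothetical app that expects its configuration to be present in environ.
def inner_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["example.config_path"].encode("utf-8")]


# A server would then be pointed at this module-level `application` object.
application = make_app(inner_app, **{"example.config_path": "/srv/example"})
```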
Anyhow, today I went ahead and decoupled the C WSGI code from the AJP code. The next time I'm bored, I think I'll write drivers for both ends of an SCGI connection. It would be interesting if I could write that FastCGI->SCGI adapter wholly in C (using the standard C FastCGI development kit). Actually, I guess I should first check whether a C SCGI implementation already exists...
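For reference, the request side of SCGI is small: the headers arrive as a netstring (`<length>:<NUL-separated name/value pairs>,`) with CONTENT_LENGTH first and an `SCGI` header set to `1`, and the body follows immediately. A minimal Python sketch of that parsing step (the function name is hypothetical, Latin-1 decoding is an assumption, and the response side is not shown):

```python
import io


def read_scgi_headers(stream):
    """Parse an SCGI header netstring from a binary stream into a dict."""
    digits = b""
    while True:
        c = stream.read(1)
        if c == b":":
            break
        if not c.isdigit():
            raise ValueError("malformed netstring length")
        digits += c
    raw = stream.read(int(digits))
    if stream.read(1) != b",":
        raise ValueError("netstring not terminated by ','")
    # The header block is name\0value\0name\0value\0 ...
    items = raw.split(b"\0")[:-1]
    if len(items) % 2:
        raise ValueError("odd number of header fields")
    headers = {items[i].decode("latin-1"): items[i + 1].decode("latin-1")
               for i in range(0, len(items), 2)}
    if headers.get("SCGI") != "1":
        raise ValueError("missing 'SCGI: 1' header")
    return headers  # the stream is now positioned at the request body
```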
Tuesday, December 12. 2006
I moved all the WSGI stuff out of my AJP C library project into its own project: ajp-wsgi. I polished it up a bit, gave it a command-line interface, wrote a better build system, and even wrote a simple README for it. You can find it here.
Note that this is not a Python extension. Rather, it is a 100% C WSGI implementation... that executes the application in an embedded Python interpreter.
It's moved beyond a proof-of-concept and is quickly becoming more and more practical. (At this moment, my personal wiki is running atop it. Maybe I'll switch my Trac sites and shorten over to it as well.) But make no mistake, it is very much alpha-quality and untested.
Future directions:
- Decouple the WSGI code from AJP, opening it up to an SCGI implementation. I have no interest in making a FastCGI version, so don't ask.
- If not thread pooling, then maybe a thread limit at least?
- Multi-app support. Maybe read configuration from an INI-style file (each section header would define the module:object for the app).
- Figure out/understand the oddity with mod_proxy_ajp. It seems to stream the request body whether solicited or not. It also sends an unsolicited EOF packet. If this is the case, then mod_jk seems like a saner implementation... but maybe this is the future direction of the protocol?
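The multi-app idea could look roughly like this in Python: each INI section header names a `module:object` to import, and its options configure how that app is mounted. The `mount` option and the overall layout are hypothetical; nothing like this exists in ajp-wsgi yet.

```python
import configparser
import importlib


def load_object(spec):
    """Resolve a 'package.module:attr' string to the named object."""
    module_name, _, attr = spec.partition(":")
    if not attr:
        raise ValueError("expected 'module:object', got %r" % spec)
    obj = importlib.import_module(module_name)
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj


def load_apps(ini_text):
    """Map each section's (hypothetical) 'mount' path to its WSGI app."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    apps = {}
    for section in parser.sections():
        mount = parser.get(section, "mount", fallback="/")
        apps[mount] = load_object(section)
    return apps
```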
Monday, December 11. 2006
A continuation of the weekend's entry... I actually finished the WSGI implementation in C and glued it to the AJP library I wrote. It was an interesting endeavor: programming for Python in C.
I was going to cop out and just implement wsgi.input in Python, but I went all the way and wrote that in C as well. I'm glad I did, because it's far more efficient: data copies are greatly minimized, and data is streamed from the server. Assuming the application reads wsgi.input in decent-sized chunks, memory usage always stays manageable. (For example, I uploaded a 600+ MB file and hashed it; the server never used more than 2-3 MB of memory.)
And from my braindead (i.e., "Hello World!") benchmarks, the server can handle about 900 requests per second, a 10X improvement over the pure-Python server serving the same application. Not bad at all.
I'm glad to say that, as far as I can tell, the server is close to 100% WSGI-compliant. At least, wsgiref.validate doesn't complain. One thing I haven't tested is how it handles erroneous WSGI applications. But I'm not sure I want to do much validation (e.g., are the header names/values strings? Is the status, code plus reason phrase, a string?). My goal is to make this a production-grade server, so it will assume that your application is itself WSGI-compliant.
Anyhow, there's still a bit of work to do. It would be nice if it were configurable somehow. I should also investigate whether I can just turn it into a simple Python extension module (vs. a C server that embeds a Python interpreter). I haven't looked at how the hybrid FastCGI servers are packaged, but I'm sure it's something much saner than the route I took.
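On the application side, "decent-sized chunks" means something like the following Python sketch: a WSGI app that hashes an upload by reading wsgi.input in fixed-size blocks, so memory use is bounded by the block size rather than the upload size. The app name, SHA-1, and the 64 KB block size are arbitrary choices, not what was used in the test above.

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # arbitrary; bounds the memory used per read


def hash_upload_app(environ, start_response):
    """Hash the request body without buffering it all in memory."""
    try:
        remaining = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        remaining = 0
    stream = environ["wsgi.input"]
    digest = hashlib.sha1()
    while remaining > 0:
        block = stream.read(min(BLOCK_SIZE, remaining))
        if not block:
            break  # client sent less than CONTENT_LENGTH
        digest.update(block)
        remaining -= len(block)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [digest.hexdigest().encode("ascii")]
```

Never reading past CONTENT_LENGTH matters here, since WSGI servers are not required to signal end-of-input any other way.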
Sunday, December 10. 2006
There don't seem to be many AJP C libraries. In fact, there don't seem to be any (according to Google, at least). There's at least one FastCGI C library, which is unsurprising given the ubiquity of FastCGI. So yesterday afternoon, I decided to "read spec, write code" yet again and began a C implementation of the "container" (app server) side of AJP. After not having touched C for 2-3 years, it was a good feeling to muck around with C and BSD sockets again. (Procedural programming, how I missed you!)
I finished it up in a few hours and it is now fairly complete. It's actually a pretty simple protocol, I've realized. All the complexity comes from the way requests/responses are encoded and decoded. (Otherwise, it's a fairly straightforward 1:1 mapping.)
Of course, there were the 3 undocumented spec additions: the first two I had to figure out through experimentation long ago, and the last was conveyed to me by someone who had actually looked at the mod_jk/mod_proxy_ajp source. (As much as I believe in the whole "the best documentation is the source" thing, I don't really like looking at similar/related source code when implementing something.)
Anyhow, I don't think those 3 undocumented additions are documented anywhere (hah!) besides my source (ajp_base.py and ajp.c). So:
- When decoding strings, a string with the length of 0xffff is the same as the empty string. However, its trailing NUL is not in the stream.
- When sending SEND_BODY_CHUNK packets, the packets must be NUL terminated. However, this NUL must not be included in the SEND_BODY_CHUNK's length (but must be included in the packet's length).
- The value following an SSL_KEY_SIZE attribute is not an encoded string, but rather an AJP integer.
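The first two quirks are easiest to see in code. A minimal Python sketch, assuming AJP integers are big-endian 16-bit values, container packets start with "AB", and SEND_BODY_CHUNK is prefix code 3; the function names and the Latin-1 decoding are hypothetical choices. `decode_int` is also what you would use for the value following SSL_KEY_SIZE (the third quirk).

```python
import struct

SEND_BODY_CHUNK = 3  # AJP13 prefix code (container -> server)


def decode_int(buf, pos):
    """Decode a big-endian 16-bit AJP integer; return (value, new_pos)."""
    (value,) = struct.unpack_from(">H", buf, pos)
    return value, pos + 2


def decode_string(buf, pos):
    """Decode an AJP string; a length of 0xFFFF means empty, with no NUL."""
    length, pos = decode_int(buf, pos)
    if length == 0xFFFF:
        return "", pos  # quirk 1: there is no trailing NUL to skip
    data = buf[pos:pos + length]
    return data.decode("latin-1"), pos + length + 1  # skip the trailing NUL


def send_body_chunk_packet(data):
    """Encode a SEND_BODY_CHUNK packet, including the trailing NUL."""
    payload = struct.pack(">BH", SEND_BODY_CHUNK, len(data)) + data + b"\0"
    # quirk 2: the NUL counts toward the packet length, not the chunk length
    return b"AB" + struct.pack(">H", len(payload)) + payload
```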
Anyway, the ultimate goal of this project is to create a Python WSGI AJP server that is implemented in C as much as possible. At this point, I have a simple proof-of-concept working that makes calls into an embedded Python interpreter. It doesn't implement WSGI at all though.
As far as request/response throughput is concerned, it looks promising. While the threaded pure-Python AJP server could only handle ~86 requests per second (with 100 parallel clients), the threaded C/Python hybrid version was pushing nearly 1000 requests per second. Of course, that doesn't include WSGI overhead yet. But we shall see.
All this effort is, of course, inspired by the WSGI servers built upon the FastCGI C library: fcgiapp and python-fastcgi. If I actually used FastCGI, I'd probably be using one of those servers rather than flup's.
Wednesday, December 6. 2006
While thinking about how I'm going to implement security in flannel, I thought: it would be nice to be able to reuse all the WSGI auth middleware out there. (Really, coming up with my own auth scheme to fit within flannel's framework did not seem very appealing.)
But it seems you either protect the entire application or you don't. Nothing I've seen so far lets you protect only certain URL patterns. So, generalizing from this, it would be nice to have:
A conditional filter-type middleware that accepts a set of URL patterns and a middleware instance. If PATH_INFO matches any of the URL patterns, the middleware is invoked; otherwise, the request is passed directly to the application. More complex conditional behavior can be built by composing different instances of this filter middleware. And it would be nice if the pattern language were something simple, like Ant-style path pattern matching. Other nice-to-have features would be pluggable pattern matchers (maybe people would rather use regexes... who knows?) and a caching decorator for pattern matchers.
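A rough Python sketch of that filter, under a few assumptions: `ConditionalMiddleware`, `ant_matcher`, and the argument order are hypothetical, `protected` is the application already wrapped in the auth middleware, and the Ant-style dialect is a simplified one (`**` spans path segments, `*` stays within one segment, `?` is one non-`/` character). Any callable returning an object with `fullmatch` can be plugged in as the matcher, so `re.compile` works for regex fans, and `functools.lru_cache` serves as the caching decorator.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)  # caches compiled patterns, keyed by pattern string
def ant_matcher(pattern):
    """Translate a simplified Ant-style path pattern into a compiled regex."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]*/)*")  # zero or more whole segments
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")           # anything, across segments
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")        # anything within one segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


class ConditionalMiddleware:
    """Send requests whose PATH_INFO matches a pattern to `protected`."""

    def __init__(self, app, protected, patterns, matcher=ant_matcher):
        self.app = app
        self.protected = protected
        self.regexes = [matcher(p) for p in patterns]

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        if any(rx.fullmatch(path) for rx in self.regexes):
            return self.protected(environ, start_response)
        return self.app(environ, start_response)


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"open"]


def deny(environ, start_response):  # stand-in for real auth middleware
    start_response("401 Unauthorized", [("Content-Type", "text/plain")])
    return [b"denied"]


guarded = ConditionalMiddleware(app, deny, ["/admin/**", "/**/*.secret"])
```

Composing several instances (each one's `app` being another `ConditionalMiddleware`) gives the more complex conditional behavior described above.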
Another wish-list item I had, which is somewhat also security-related:
"Remember Me" middleware: This would be similar to paste.auth.cookie, but the cookies would be explicitly persistent. Perhaps it could sign the username plus some random junk and stuff this (along with the signature) into a persistent cookie. A nice-to-have feature would be including the cookie's creation date in the signed data. This would allow some sort of "expire all persistent login data" feature.
Anyway, stuff to work on if I have the time and it interests me enough. I haven't been doing much lately because I haven't been feeling well. It's a shame, though, that a little illness would utterly stop all my Python/flannel momentum.
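The token part of that idea might look like this Python sketch. HMAC-SHA256 is a swapped-in choice of signing scheme, and `SECRET`, the `username|created|nonce|signature` layout, and both function names are hypothetical. Rejecting tokens created before a stored `not_before` timestamp is what would make "expire all persistent logins" a one-line change; setting the actual persistent cookie is not shown.

```python
import hashlib
import hmac
import os
import time

SECRET = b"change-me"  # hypothetical server-side secret key


def _sign(data):
    return hmac.new(SECRET, data.encode("utf-8"), hashlib.sha256).hexdigest()


def make_token(username, now=None):
    """Build 'username|created|nonce|signature' for a persistent cookie."""
    created = str(int(now if now is not None else time.time()))
    nonce = os.urandom(8).hex()  # the "random junk"
    data = "|".join([username, created, nonce])
    return data + "|" + _sign(data)


def check_token(token, not_before=0):
    """Return the username if the token is authentic and was created at or
    after `not_before`; otherwise return None."""
    try:
        username, created, nonce, sig = token.rsplit("|", 3)
    except ValueError:
        return None
    data = "|".join([username, created, nonce])
    if not hmac.compare_digest(sig, _sign(data)):
        return None
    if int(created) < not_before:
        return None  # issued before the last "expire all logins"
    return username
```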
Tuesday, December 5. 2006
To quell a warning from wsgiref's validator, QUERY_STRING will now default to an empty string if it doesn't exist in the environ. Although the WSGI spec doesn't require it to always be present, the cgi module will assume sys.argv[1] is the query string if QUERY_STRING isn't in the environ.
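The warning in question is easy to reproduce in Python with the standard library alone: run a trivial app through `wsgiref.validate.validator` with and without QUERY_STRING in the environ and collect the `WSGIWarning`s. The app and helper names below are hypothetical; `setup_testing_defaults` fills in the required keys but leaves QUERY_STRING out.

```python
import warnings
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import WSGIWarning, validator


def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def wsgi_warnings(environ):
    """Run `app` once under wsgiref's validator; return WSGIWarnings raised."""
    setup_testing_defaults(environ)  # required keys, but no QUERY_STRING
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = validator(app)(environ, lambda status, headers, exc_info=None: None)
        for _ in result:
            pass
        result.close()  # the validator expects close() to be called
    return [w for w in caught if issubclass(w.category, WSGIWarning)]


missing = wsgi_warnings({})

environ = {}
environ.setdefault("QUERY_STRING", "")  # the server-side default described above
defaulted = wsgi_warnings(environ)
```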
Also, I changed the keyword parameters in GzipMiddleware a bit:
- mimeTypes -> mime_types
- compresslevel -> compress_level
Lastly, unless I hear otherwise from people, I will be removing flup.publisher and flup.resolver from flup.
And as far as flup.middleware goes, ErrorMiddleware will be disappearing as well. There are better options in Paste, especially paste.evalexception for development. GzipMiddleware and SessionMiddleware are currently looking for a new home, as I would like to eventually remove them from flup too.