I just recently discovered that Apache HTTPD now ships with mod_proxy_scgi. (Probably old news to some, since that was 3 patch versions ago.) Couple that with the fact that I stumbled upon PyCaduceus again recently (which proves to me that the C WSGI code written for ajp-wsgi was relatively transport-independent)... So I decided to spend a few hours today writing an SCGI driver in C.
Well, it was actually pretty easy considering the SCGI spec is only a page long. DJB's netstrings description was just as short.
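For reference, a netstring is just the payload's length in ASCII digits, a colon, the bytes themselves, and a trailing comma. A minimal sketch in Python (the helper names are mine, not from any library):

```python
# Minimal netstring helpers, per DJB's description:
# a netstring is "<len>:<bytes>," -- e.g. b"5:hello,".

def netstring_encode(data: bytes) -> bytes:
    """Wrap raw bytes in a netstring."""
    return str(len(data)).encode("ascii") + b":" + data + b","

def netstring_decode(buf: bytes) -> bytes:
    """Unwrap a single netstring; raises ValueError on malformed input."""
    length, sep, rest = buf.partition(b":")
    if not sep:
        raise ValueError("missing ':' separator")
    n = int(length)
    if rest[n:n + 1] != b",":
        raise ValueError("missing ',' terminator")
    return rest[:n]
```

So `netstring_encode(b"hello")` yields `b"5:hello,"`, and decoding reverses it.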
Replacing AJP in ajp-wsgi was straightforward as well. In fact, building the WSGI environment is a lot simpler with SCGI since the "headers" from the web server don't need to be re-interpreted/converted. (And I've yet to test this, but I don't think specifying the scriptName is necessary anymore.)
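Concretely, the SCGI header block is just the payload of one leading netstring: alternating NUL-terminated names and values, with CONTENT_LENGTH required first and SCGI set to "1". Building the environ is roughly this (a Python sketch of the idea; the function name is mine, not scgi-wsgi's actual code):

```python
def scgi_headers_to_environ(header_block: bytes) -> dict:
    """Turn an SCGI header block (the payload of the leading
    netstring) into a WSGI-style environ dict."""
    parts = header_block.split(b"\x00")
    if parts and parts[-1] == b"":
        parts = parts[:-1]  # every field is NUL-terminated, so drop the trailing empty
    it = iter(parts)
    environ = {}
    for name, value in zip(it, it):
        environ[name.decode("latin-1")] = value.decode("latin-1")
    # Per the SCGI spec, the SCGI header must be present and set to "1".
    if environ.get("SCGI") != "1":
        raise ValueError("not an SCGI request")
    return environ
```

No AJP-style re-interpretation needed -- the names arrive already in CGI form.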
Once scgi-wsgi was in a working state, I was curious how it compared to ajp-wsgi. The results were surprising at first, but later made sense once I figured out what was happening. The tests ran on my server using an extremely compute-bound WSGI application.
So... yeah. scgi-wsgi is slightly slower when threading and significantly slower when forking. Despite SCGI's simpler protocol, it lost because AJP uses persistent "backend" connections while SCGI opens a new connection per request. In other words, the SCGI version was heavily penalized because there was no thread pooling or process pooling.
So where do I go from here? I'll probably just continue on with ajp-wsgi 1.1 as planned and leave scgi-wsgi in limbo. (I was entertaining the idea of merging the two.) SCGI needs thread/process pools to be competitive. Unless I find C implementations with a friendly-enough license (i.e. not copyleft), it's probably not worth writing my own pool implementations just for scgi-wsgi. The forking reference implementation (now found in Paste, I believe) is still probably the best.
(As an aside, since the SCGI protocol is so simple, moving the implementation to C to avoid the GIL probably doesn't improve things much over a pure-Python implementation.)
Addendum: Apparently, the C implementations shine over their pure-Python counterparts when uploading files (i.e. large request bodies). When uploading a ~100MB file, ajp-wsgi and scgi-wsgi took ~6 seconds and ~4 seconds respectively (simplicity wins out!). When using flup ajp/scgi and Paste's scgiserver, I couldn't be bothered to wait for the upload to complete. (It was well over a few minutes for each before I canceled.)
Regarding AJP vs SCGI performance.
One of the big wins with SCGI over AJP, despite AJP having persistent connections, seems to be that with large bodies SCGI is obscenely fast.
For example, using flup (a very recent version as of this writing) and flup's AJP support, a 10MB upload took about a minute. With flup's SCGI support, the same upload took fractions of a second. I did not test ajp-wsgi (the C code) as it doesn't hook into paste that I'm aware of.
Might I suggest modifying ajp-wsgi in such a way that it could be included with flup as an optional performance enhancement with paste factory support?
I noticed that last night as well. My test program was a bit off-spec -- turns out, you're not supposed to read beyond CONTENT_LENGTH. After I fixed that, I saw that flup AJP (and I suspect FastCGI as well, since it's the same code) is horribly slow due to all the buffering/copying, while flup SCGI simply wraps the socket with a file object.
Though ajp-wsgi is actually on-par, as far as request body uploads are concerned. So it's a problem with the Python implementation.
I did originally think about including the ajp-wsgi stuff in flup, but the architecture is inverted: flup is a Python library, while ajp-wsgi is a C program that makes calls into a Python WSGI app. Though I am curious whether flup would benefit from having the transport/threadpool/prefork stuff in C.
As for getting ajp-wsgi (and scgi-wsgi now) working with Paste or any Paste-based framework, it's a simple matter of writing a two-line Python script:
from paste.deploy import loadapp
app = loadapp('config:production.ini', relative_to='.')
Save it into foo.py, then invoke ajp-wsgi like so: ajp-wsgi foo app
Regarding ajp-wsgi and paste frameworks -- what I want is the opposite, I think. I want to edit an .ini file and have paster make use of ajp-wsgi as a 'server'.
I don't think the threadpool/prefork stuff needs to be in C, but perhaps having part of the protocol in C wouldn't hurt. Sort of like a protocol accelerator the way SQLAlchemy can make (optional) use of C extensions for performance.
I suspect the SCGI protocol is also much more efficient for large payloads than AJP, but I can't be sure since I'm not that well-versed in AJP.
What could be done to the Python ajp protocol layer to speed it up?
I couldn't find any obvious ways to run an external server from "paster serve." The main blocker is that Paste's server_runner/server_factory expect an app object to be passed in. (No way to really pass that on to an external program.)
SCGI definitely is more efficient than AJP, mainly because AJP is packet-based. So that means for AJP to do 'wsgi.input' stuff, there needs to be a lot of copying/slicing going on, which is where I believe the slowness is coming from.
Even in ajp-wsgi it's unavoidable -- though I did manage to bring it down from copying 'wsgi.input' data twice to just copying it once.
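To illustrate the difference (a deliberately simplified sketch, not the real AJP framing -- I've reduced each packet to a hypothetical 2-byte big-endian length prefix):

```python
import io

def ajp_style_read(packets):
    """AJP-style: the body arrives as framed chunks, so each payload
    has to be sliced out of its packet and copied into a buffer."""
    buf = bytearray()
    for pkt in packets:
        n = int.from_bytes(pkt[:2], "big")  # chunk payload length
        buf += pkt[2:2 + n]                 # slice + append: copying
    return bytes(buf)

def scgi_style_read(sock_file, content_length):
    """SCGI-style: the body follows the headers on the same socket,
    so wsgi.input can read it straight through with no reframing."""
    return sock_file.read(content_length)
```

Both produce the same bytes; the AJP path just touches the data more times along the way, and with a 100MB body that adds up.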
However, I'm not sure whether that alone accounts for the massive slowness in flup's AJP. I'm mildly curious and will probably look into it sometime, though the pure-C WSGI servers have my attention right now.