Wednesday, June 29. 2011
It seems like I've been working on so many random things the past few weeks. And all on different platforms: C (UNIX/POSIX), Python, Cocoa, iOS.
Anyhow, the only things that are almost ready for public consumption are the security-related libraries/modules. (Each succeeding project builds on the previous.) All are BSD licensed.
Everything SHA, from SHA-1 to SHA-2 (SHA-224, SHA-256, SHA-384, SHA-512) to the newest truncated editions (SHA-512/224, SHA-512/256). Also includes HMAC wrappers for each. This is a cleaned up and refactored version of my previous command-line sha project. Still WIP though.
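The HMAC wrappers can be sanity-checked against published test vectors; here's a quick sketch using Python's standard hmac/hashlib modules as the reference (this is not the library's own API, just a cross-check any conforming implementation should pass):

```python
import hashlib
import hmac

# RFC 4231 test case 1: any HMAC-SHA-256 implementation
# (including the wrappers mentioned above) should reproduce this.
key = b"\x0b" * 20
msg = b"Hi There"
digest = hmac.new(key, msg, hashlib.sha256).hexdigest()
print(digest)
# b0344c61d8db38535ca8afceaf0bf12b881dc200c9833da726e9376c2e32cff7
```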
Implementations of PBKDF2 in C and Python. The C version depends on sha-asaddi, though I suppose it can easily be swapped to use any other SHA/HMAC implementations with similar signatures. The Python version has no dependencies outside of the standard library.
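PBKDF2 also has published test vectors (RFC 6070) that any implementation, C or Python, should reproduce exactly. A sketch using the standard library's hashlib.pbkdf2_hmac as the reference (not the module described above):

```python
import hashlib

# RFC 6070 test vector #1 for PBKDF2-HMAC-SHA1:
# P="password", S="salt", c=1, dkLen=20
dk = hashlib.pbkdf2_hmac("sha1", b"password", b"salt", 1, dklen=20)
print(dk.hex())
# 0c60c80f961f0e71f3a9b524af6012062fe037a6
```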
Implementations of Colin Percival's scrypt in C and pure Python. This was more of an academic exercise since Mr. Percival's version is far more optimized and is already available under a BSD license. (At least, I'm assuming it's more optimized — I haven't actually looked at his code yet. But he did seem to have assembly/SSE optimized versions.) The Python version is probably too slow to be practical. (A Python wrapper for the original scrypt implementation is already available.) Will gladly take optimization suggestions for the Python version!
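For cross-checking, the OpenSSL-backed hashlib.scrypt is one convenient reference (again, not the implementation described above; parameter values here are arbitrary examples):

```python
import hashlib

# scrypt's work factors: n (CPU/memory cost), r (block size), p (parallelism).
dk = hashlib.scrypt(b"password", salt=b"NaCl", n=1024, r=8, p=1, dklen=64)
assert len(dk) == 64

# scrypt is deterministic: identical parameters always yield the same key.
assert dk == hashlib.scrypt(b"password", salt=b"NaCl", n=1024, r=8, p=1, dklen=64)
```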
Oh, and these three projects represent a general shift in my packaging/building ideals. For now, I've decided to ditch GNU's autotools in favor of CMake for my projects. CMake just seems like a more coherent tool for configuring/building in a platform-independent manner. But that's just, like, my opinion, man...
Saturday, April 9. 2011
For anyone not familiar with it, CAS is an open source single-sign on service. It's implemented in Java, but I have stumbled across a CAS server implemented in Python. (Which I haven't tried yet.)
Anyway, I've been running my own CAS server for a while now (which provides SSO service for all the webapps on my server). I actually have my own Python CAS client and WSGI middleware that I've been using for years. In addition to a number of my Python webapps, I've successfully used it to CASsify MoinMoin. I'm sure I could get it to work with Trac too, but that would probably annoy my Trac users. (It works with any app that simply relies on REMOTE_USER.)
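The REMOTE_USER convention is what makes this compose so well: the middleware authenticates, and the wrapped app just sees a normal WSGI environ. A minimal sketch of the pattern (hypothetical names, not the actual casmiddleware API, which would redirect to the CAS server rather than return 401):

```python
def require_user(app, authenticate):
    """Sketch: authenticate(environ) returns a username or None."""
    def wrapped(environ, start_response):
        user = authenticate(environ)
        if user is None:
            start_response("401 Unauthorized",
                           [("Content-Type", "text/plain")])
            return [b"authentication required"]
        environ["REMOTE_USER"] = user  # all the wrapped app relies on
        return app(environ, start_response)
    return wrapped

# Tiny demo app: echoes whatever REMOTE_USER the middleware set.
def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [environ["REMOTE_USER"].encode()]

wrapped = require_user(app, lambda environ: "alice")  # stub authenticator
body = wrapped({}, lambda status, headers: None)
print(body)  # [b'alice']
```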
casclient and casmiddleware were part of my 'py-lib' project (if anyone remembers that). Though 'py-lib' was never really a project. It was more like a directory of Python modules I had written for myself: a place to incubate early Python projects (flup's fcgi.py started there, along with the other pre-1.0 flup modules).
But when I moved beyond Subversion, py-lib was abandoned.
All the CAS work I've been doing at my job lately reminded me that my Python CAS stuff was never really released. JA-SIG has their own Python CAS code, but it seems to be more geared toward CGI. I'm sure a few of the various Python authentication frameworks out there have a CAS module. But anyway, I just wanted to get the ball rolling and get my own CASMiddleware 'out there', in case anyone else finds it useful.
It's pretty small and only depends on Beaker.
For now, it's only available at my Hg repo. I'll send it over to PyPI when I get the chance. (Hah! I should have searched PyPI first, so many hits for 'CAS'. Oh well...)
Wednesday, March 16. 2011
I merged all the Python 3.x-specific commits in ajp-wsgi-py3.0 into scgi-wsgi and surprisingly, it was painless and (seemingly) functional on the first try.
scgi-wsgi for Python 3.2 (Hg repo)
(Don't forget to clone scgi, threadpool and preforkserver as well.)
Really cool stuff. Though admittedly, most of my amazement comes from how easy it was to cherry-pick and merge specific revisions thanks to Mercurial.
Addendum: Oh yeah, I fixed ajp-wsgi-py3.0 so it compiles under Python 3.2. Even though the PyCapsule stuff was backported to Python 2.7, I won't be merging that change over. I still want the Python 2.x version to work with Python as old as 2.4.
It was brought to my attention that my trac sites were no longer authenticating the generic user I set up. Apparently, something I missed in my switch to SCGI...
But it's not scgi-wsgi's fault, really. I narrowed it down to mod_proxy_scgi (obviously, by the title of this entry!). mod_scgi works fine. mod_proxy_scgi does not seem to pass the HTTP Authorization header along. I've been looking for flags/options to remedy this, with no luck so far. A quick Google search revealed something similar, but not quite. Though that may address the SCRIPT_NAME/PATH_INFO issues I've seen... in a future version of Apache HTTPD.
I wish I had more time to figure out what was really going on. In the meantime, I will probably switch away from mod_proxy_scgi (to mod_scgi) where possible...
Monday, February 7. 2011
I haven't run into any problems with scgi-wsgi in the few weeks that I've been using it. My issues with SCGI stem from mod_scgi and mod_proxy_scgi.
Namely, I think mod_scgi's handling of environment variables is perfect (see my previous rants about mod_proxy_scgi). It's even slightly faster than mod_proxy_scgi. But its configuration leaves a bit to be desired. Say I want to mount an application at the root but I want the path "/static" to be handled by the web server. How does one do this? I'm guessing it has something to do with the SCGIHandler on/off directive (in fact, it probably is that simple), but that method is way too verbose, especially if you have multiple exclusions.
I like the way the mod_proxy_* stuff is configured (namely with ProxyPass). It's easy to exclude directories. And the directives are parsed in order, meaning configuration is sane and intuitive (at least to me). You specify the most specific entries first.
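For comparison, the kind of exclusion I have in mind takes one line per path with mod_proxy (the paths and port here are just example values):

```apache
# mod_proxy: most-specific entries first; "!" excludes a path from proxying
ProxyPass /static !
ProxyPass / scgi://localhost:4000/
```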
But again, see my rant about mod_proxy_scgi's environment handling.
Anyway, scgi-wsgi incorporates both a thread pool and a process pool. Since handling is done on a per-request basis, I thought it would be poor form if scgi-wsgi simply closed the connection from the web server if the pool was full — so instead, requests received are queued when the pool is over capacity. (This is in contrast to ajp-wsgi which does drop connections, but again, there's no 1:1 mapping between AJP connections and requests.) So you can have your app running on a single thread/process and still dozens or hundreds of concurrent requests. Those requests won't be served in a timely manner, but at least they will be served.
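The queue-instead-of-drop behavior is easy to sketch with Python's standard library (this shows the general idea, not scgi-wsgi's actual C code): an unbounded queue in front of a fixed set of workers means a saturated pool delays requests rather than refusing them.

```python
import queue
import threading

def run_pool(handlers):
    """Fixed worker pool fed by an unbounded queue: when every handler
    is busy, new requests wait their turn instead of being dropped."""
    requests = queue.Queue()          # unbounded: never refuses work

    def worker():
        while True:
            job = requests.get()
            if job is None:           # shutdown sentinel
                return
            job()                     # "handle" the request
            requests.task_done()

    threads = [threading.Thread(target=worker) for _ in range(handlers)]
    for t in threads:
        t.start()
    return requests, threads

# 50 "requests" into a single-handler pool: slow, but none are dropped.
served = []
requests, threads = run_pool(1)
for i in range(50):
    requests.put(lambda i=i: served.append(i))
requests.join()                       # wait until every request is handled
requests.put(None)
threads[0].join()
print(len(served))  # 50
```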
On my server, scgi-wsgi is one of the fastest WSGI servers I've tried (that's out of flup, ajp-wsgi, Paste's scgiserver, and CherryPy's web server). I have no idea what the state of the art in the WSGI world is, but it was certainly impressive enough to get me to write the thread/process pool code and start switching over my applications. And unlike CherryPy, it has the disadvantage (like the others) of needing to sit behind Apache httpd. (Though the fact that CherryPy is quite fast and speaks HTTP natively will probably stave off any urge to glue together an embeddable C web server and my C WSGI code... for a while.)
So anyway: scgi-wsgi. I've put it in its own space for now. Additional, more interesting links below.
Wednesday, February 2. 2011
Well, it's been a few weeks since I've touched (or had the need to touch) ajp-wsgi. I've been running it a few places — sometimes forked if the app allowed it.
Anyway, the biggest new feature is, of course, forking support. I think now is as good a time as any to "release" it.
The 1.1 release will probably be the last for ajp-wsgi (barring any bugfix releases). I imagine I'll be folding AJP support into scgi-wsgi in the future (and renaming it all too). At this point, scgi-wsgi is a bit more advanced than ajp-wsgi with its thread pooling and pre-fork multiprocessing. Though neither of those features are really required (or make sense) with AJP's persistent connections.
scgi-wsgi 1.1 will follow in a few days.
Sunday, January 23. 2011
I went ahead and installed the original mod_scgi and it opened my eyes to how utterly different (or maybe broken) mod_proxy_scgi's handling of SCRIPT_NAME/PATH_INFO was.
I've been poring over the source for both modules, and though it's been years since I've coded Apache modules, I couldn't find any glaring differences. mod_scgi set the variables itself using some particular logic, while mod_proxy_scgi called ap_add_cgi_vars() to set the variables. That function appears to use the same logic.
So in spite of all that, I just cannot explain why Apache's mod_proxy_scgi is so broken. To give you an idea of how broken, given the request URI of "/foo/bar%20garply", it would set SCRIPT_NAME/PATH_INFO as follows:
The sane thing to do (which mod_scgi does) is:
So not only is SCRIPT_NAME utterly wrong for something mounted at the root, but PATH_INFO remains quoted.
My solution involves re-deriving SCRIPT_NAME/PATH_INFO from REQUEST_URI. Of course, mod_scgi doesn't need this, and in fact, shouldn't use it at all since it sets SCRIPT_NAME correctly for non-root-mounted applications (removing the need to manually specify it).
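The re-derivation itself is simple. In Python terms it amounts to something like this (a sketch of the idea, with the mount point supplied by whoever starts the server; scgi-wsgi itself does this in C):

```python
from urllib.parse import unquote

def fix_paths(environ, mount=""):
    """Re-derive SCRIPT_NAME/PATH_INFO from the still-quoted REQUEST_URI,
    given the configured mount point. Sketch only."""
    # Drop any query string, then unquote the path portion.
    path = unquote(environ["REQUEST_URI"].split("?", 1)[0])
    assert path.startswith(mount)
    environ["SCRIPT_NAME"] = mount
    environ["PATH_INFO"] = path[len(mount):]
    return environ

# An app mounted at the root, given the request URI from the example above:
env = fix_paths({"REQUEST_URI": "/foo/bar%20garply"})
print(env["SCRIPT_NAME"], env["PATH_INFO"])
```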
Since I wouldn't trust auto-detection of the SCGI module, I decided to leave it up to the invoker of scgi-wsgi to decide what's right. Which led to this lovely feature being created (from scgi-wsgi's README):
Due to the inconsistency between the various SCGI connectors, you may need to specify an environment profile using the -E option. The default profile is pass-through. The profiles are described below:

So... yeah. I'm not too happy about it, and it diminishes my motivation to switch over to SCGI. I like the mod_proxy* stuff because it lets me use load balancing and/or set backend connection limits. But between the two, mod_scgi seems to be the saner implementation. (As mentioned, I've yet to try nginx and lighttpd.)
If I can gather code/test cases into a coherent bug report, I'll be posting one with Apache. Because really, the presence of quoted characters in the URL should not throw everything off.
Friday, January 21. 2011
Shortly after my last entry, I switched from a CPU-bound WSGI app to something comparatively more I/O heavy: serving a static 10K page. I then tried it out with the non-threadpool version of scgi-wsgi. The results surprised me. scgi-wsgi was doing well over 700 (nearly 800 at times) requests per second while ajp-wsgi only mustered 300 requests/sec.
Though of course, like before, when I switched to the non-preforking scgi-wsgi, its throughput dropped to a little over 50 requests/sec. ajp-wsgi maintained 300 requests/sec even while forking.
Given the non-threadpool performance of scgi-wsgi, I was spurred to write my own preforking server code (again), this time in C. The result can be found here. Unlike flup's preforking server, this one is based on descriptor passing. (And since I couldn't find my copy of UNIX Network Programming, I have to thank Google for having it browsable online. )
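Descriptor passing is the classic UNIX Network Programming trick: the parent accepts the connection and hands the file descriptor to an idle child over a UNIX-domain socket via SCM_RIGHTS. My preforkserver does this in C, but Python 3.9+ wraps the sendmsg/recvmsg plumbing, which makes the mechanism easy to demonstrate:

```python
import os
import socket

# A socketpair stands in for the parent<->child control channel
# (in a real prefork server these would be separate processes).
parent, child = socket.socketpair()

# Pretend this pipe's read end is an accepted connection to hand off.
r, w = os.pipe()
socket.send_fds(parent, [b"go"], [r])       # SCM_RIGHTS under the hood

msg, fds, flags, addr = socket.recv_fds(child, 16, 1)
os.write(w, b"hello")
data = os.read(fds[0], 5)  # the received descriptor reads the same pipe
print(data)  # b'hello'
```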
Hooking up the prefork server into scgi-wsgi, I now get similar performance to the threaded version: 700+ requests/sec. With that, scgi-wsgi graduated from 'limbo' to 'alpha.' And since mod_proxy_scgi is now included with vanilla Apache HTTPD, I will probably be transitioning my stuff from ajp-wsgi to scgi-wsgi. (Eat your own dog food and all that.)
As for the future, I would like to merge the two. I'm still entertaining the idea of hooking up my C WSGI code to an embedded HTTP server, similar to what PyCaduceus did (but remaining a top-level program rather than a Python module). I haven't really found an embeddable HTTP server with an interface that I like, though mongoose looks promising.
In the meantime, I'll go ahead and release ajp-wsgi 1.1 and scgi-wsgi 1.1 (eventually).
scgi-wsgi Hg repository (don't forget to clone scgi and preforkserver as well)
Monday, January 17. 2011
I just recently discovered that Apache HTTPD now ships with mod_proxy_scgi. (Probably old news to some; that was 3 patch versions ago.) Couple that with the fact that I stumbled upon PyCaduceus again recently (which proves to me that the C WSGI code written for ajp-wsgi was relatively transport-independent)... So I decided to spend a few hours today writing an SCGI driver in C.
Well, it was actually pretty easy considering the SCGI spec is only a page long. DJB's netstrings description was just as short.
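The whole wire format really does fit in a few lines: SCGI headers are NUL-separated name/value pairs wrapped in a netstring ("<length>:<payload>,"), with CONTENT_LENGTH required first. A sketch in Python (my driver is C, but the encoding is identical):

```python
def scgi_request(headers, body=b""):
    """Encode an SCGI request: NUL-separated header pairs inside a
    netstring, CONTENT_LENGTH first, raw body appended after."""
    items = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    items += [(k, v) for k, v in headers.items()
              if k not in ("CONTENT_LENGTH", "SCGI")]
    payload = b"".join(k.encode() + b"\x00" + v.encode() + b"\x00"
                       for k, v in items)
    return str(len(payload)).encode() + b":" + payload + b"," + body

print(scgi_request({"REQUEST_METHOD": "GET"}))
```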
Replacing AJP in ajp-wsgi was straightforward as well. In fact, building the WSGI environment is a lot simpler with SCGI since the "headers" from the web server don't need to be re-interpreted/converted. (And I've yet to test this, but I don't think specifying the scriptName is necessary anymore.)
Once scgi-wsgi was in a working state, I was curious how it compared to ajp-wsgi. The results were surprising at first, but later made sense once I figured out what was happening. These were on my server using an extremely compute-bound WSGI application:
So... yeah. scgi-wsgi is slightly slower when threading and significantly slower when forking. Despite SCGI's simplicity, the explanation for this is that AJP uses persistent "backend" connections while SCGI uses one connection per request. In other words, the SCGI version was heavily penalized because there was no thread pooling or process pooling.
So where do I go from here? I'll probably just continue on with ajp-wsgi 1.1 as planned and leave scgi-wsgi in limbo. (I was entertaining the idea of merging the two.) SCGI needs thread/process pools to be competitive. Unless I find C implementations with a friendly-enough license (i.e. not copyleft), it's probably not worth writing my own pool implementations just for scgi-wsgi. The forking reference implementation (now found in Paste, I believe) is still probably the best.
(As an aside, since the SCGI protocol is so simple, moving the implementation to C to avoid the GIL probably doesn't improve things much over a pure-Python implementation.)
Addendum: Apparently, the C implementations shine over their pure-Python counterparts when uploading files (i.e. large request bodies). When uploading a ~100MB file, ajp-wsgi and scgi-wsgi took ~6 seconds and ~4 seconds respectively (simplicity wins out!). When using flup ajp/scgi and Paste's scgiserver, I couldn't be bothered to wait for the upload to complete. (It was well over a few minutes for each before I canceled.)
Thursday, January 13. 2011
I was curious about what it would take to add forking/multi-process support to ajp-wsgi and apparently the answer is: not much!
Well, it becomes trivial when you don't bother with preforking or process pools. I don't think an AJP backend really needs to worry much about saving fork() time since connections (and therefore processes) are more or less persistent.
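Fork-per-connection really is the whole trick: accept, fork, let the child own the connection. A Python rendering of the shape (ajp-wsgi itself is C, and a real server would loop and handle SIGCHLD instead of blocking in waitpid):

```python
import os
import socket

def serve_once(listener):
    """Accept one connection and handle it in a freshly forked child.
    No preforking, no pool: fine when connections are persistent."""
    conn, _ = listener.accept()
    pid = os.fork()
    if pid == 0:            # child: owns the connection now
        conn.sendall(b"ok")
        conn.close()
        os._exit(0)
    conn.close()            # parent: drop its copy of the socket
    os.waitpid(pid, 0)      # reap the child
```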
I've done some basic stress testing using ApacheBench (ab): 100 concurrent requests, 500 concurrent requests. It's looking promising (no errors!). I haven't done any extensive testing between threaded vs. forking, but the results seem to be the same (using a freshly created Pylons app). Obviously the GIL isn't really being exercised when serving a static ~5K page.
And I did learn one oddity about mod_proxy/mod_proxy_ajp — apparently, when you specify max connections, you're actually specifying max connections per httpd process. On my server, where httpd uses the worker MPM (hybrid thread/process), the hard process limit is apparently 16, so even specifying max=1 means ajp-wsgi can expect up to 16 connections. (Conveniently for me, ajp-wsgi's default process limit is 16.) Something to keep in mind when tweaking ajp-wsgi's maxConnections parameter.
(Also note that if mod_proxy_ajp ever goes over ajp-wsgi's maxConnections limit, users will see HTTP 503 errors.)
Anyway, I think I'll stick to threading for now. But 1.1 alpha is out there for anyone interested...
Sunday, December 12. 2010
I just discovered a rather fatal bug introduced in 1.0.3 which leads to high CPU utilization when the web server closes transport connections. I only just discovered the issue tonight, when all of my ajp-wsgi instances went full bore and brought my server's load average to over 15. (Apparently, Apache decided it was a good time to reap stale AJP connections.)
Anyway, I'm kicking myself, since this problem was also seen in flup but was patched some time ago. Shows me not to make post-midnight releases, especially after such short testing cycles. Oh wait...
Wednesday, December 8. 2010
Thursday, June 17. 2010
Released version 1.0.2 of ajp-wsgi (there was no 1.0.1 release, apparently).
It's made up mainly of Mac OS X build fixes, though there were 2 general bugs:
Wednesday, October 21. 2009
I just released a new snapshot, it includes the following changes:
Here are links to the Hg repository and the PyPI page.
If there are any issues, please open a ticket at the flup Trac and/or submit a patch.
Monday, August 3. 2009
I didn't realize this, but all my SHA implementations were still living in my Subversion repository. Well, after some meticulous converting from both my archived CVS and SVN repositories, I've managed to finally move them over to Mercurial.
Anyway, I'm not sure what else in Subversion I'd like to move over. Certainly, there's nothing left in there that's all that interesting to the public. So I will take down the public-facing SVN repo... someday.
Searching Google, I see nothing links directly to svn.saddi.com. However, there's quite a lot of documentation that references flup and fcgi.py within it. Well, if anyone's been checking out that flup, it's 2 years out of date. And the standalone fcgi.py module... I'd rather just forget about that since it's hopelessly out of sync with the flup version. (It's basically the flup threaded fcgi server without thread pools.)