Wednesday, June 29. 2011
Latest Projects
It seems like I've been working on so many random things the past few weeks. And all on different platforms: C (UNIX/POSIX), Python, Cocoa, iOS.
Anyhow, the only things that are almost ready for public consumption are the security-related libraries/modules. (Each succeeding project builds on the previous.) All are BSD licensed.
sha-asaddi
Everything SHA, from SHA-1 to SHA-2 (SHA-224, SHA-256, SHA-384, SHA-512) to the newest truncated editions (SHA-512/224, SHA-512/256). Also includes HMAC wrappers for each. This is a cleaned up and refactored version of my previous command-line sha project. Still WIP though.
pbkdf2
Implementations of PBKDF2 in C and Python. The C version depends on sha-asaddi, though I suppose it can easily be swapped to use any other SHA/HMAC implementations with similar signatures. The Python version has no dependencies outside of the standard library.
scrypt
Implementations of Colin Percival's scrypt in C and pure Python. This was more of an academic exercise since Mr. Percival's version is far more optimized and is already available under a BSD license. (At least, I'm assuming it's more optimized; I haven't actually looked at his code yet. But he did seem to have assembly/SSE optimized versions.) The Python version is probably too slow to be practical. (A Python wrapper for the original scrypt implementation is already available.) Will gladly take optimization suggestions for the Python version!
Oh, and these three projects represent a general shift in my packaging/building ideals. For now, I've decided to ditch GNU's autotools in favor of CMake for my projects. CMake just seems like a more coherent tool for configuring/building in a platform-independent manner. But, that's just, like, my opinion, man...
Sunday, April 10. 2011
sha updated
It's strange, but my most ubiquitous work is probably sha (or rather, its individual hash implementations). I lost track of the number of random products I've run across that have a copyright/license attribution to me. SHA implementations are a dime a dozen though, and I'm sure the versions in OpenSSL (libcrypto) draw the most users.
But I have to wonder, what draws developers to mine? The random portability/optimization knobs? The BSD license?
Anyway, I reworked my SHA implementations today. Brought the API to my naming/style standards for this decade. Added SHA-512/224 and SHA-512/256. And, from the work on pam_totp, added HMAC wrappers. Of course, with the addition of HMAC, I had to work it into sha (the utility) as well, which I did.
So after a little over 7 years, here's a new sha release...
sha 1.1 tarball
Saturday, April 9. 2011
CASMiddleware
For anyone not familiar with it, CAS is an open source single sign-on service. It's implemented in Java, but I have stumbled across a CAS server implemented in Python. (Which I haven't tried yet.)
Anyway, I've been running my own CAS server for a while now (which provides SSO service for all the webapps on my server). I actually have my own Python CAS client and WSGI middleware that I've been using for years. In addition to a number of my Python webapps, I've successfully used it to CASsify MoinMoin. I'm sure I could get it to work with Trac too, but that would probably annoy my Trac users.
casclient and casmiddleware were part of my 'py-lib' project (if anyone remembers that). Though 'py-lib' was never really a project. It was more like a directory of Python modules I had written for myself, a place to incubate early Python projects (flup's fcgi.py started there, along with the other pre-1.0 flup modules). But when I moved beyond Subversion, py-lib was abandoned.
All the work-related CAS work I've been doing lately reminded me that my Python CAS stuff was never really released. JA-SIG has their own Python CAS code, but it seems to be more geared toward CGI. I'm sure a few of the various Python authentication frameworks out there have a CAS module.
But anyway, I just wanted to get the ball rolling and get my own CASMiddleware 'out there', in case anyone else finds it useful. It's pretty small and only depends on Beaker. For now, it's only available at my Hg repo. I'll send it over to PyPI when I get the chance. (Hah! I should have searched PyPI first, so many hits for 'CAS'. Oh well...)
casmiddleware
Wednesday, March 16. 2011
scgi-wsgi for Python 3.2
I merged all the Python 3.x-specific commits in ajp-wsgi-py3.0 into scgi-wsgi and surprisingly, it was painless and (seemingly) functional on the first try.
scgi-wsgi for Python 3.2 (Hg repo) (Don't forget to clone scgi, threadpool and preforkserver as well.)
Really cool stuff. Though admittedly, most of my amazement comes from how easy it was to cherry-pick and merge specific revisions thanks to Mercurial.
Addendum: Oh yeah, I fixed ajp-wsgi-py3.0 so it compiles under Python 3.2. Even though the PyCapsule stuff was backported to Python 2.7, I won't be merging that change over. I still want the Python 2.x version to work with Python as old as 2.4.
While shaking fist at mod_proxy_scgi...
It was brought to my attention that my trac sites were no longer authenticating the generic user I set up. Apparently, something I missed in my switch to SCGI...
But it's not scgi-wsgi's fault, really. I wish I had more time to figure out what was really going on. In the meantime, I will probably switch away from mod_proxy_scgi (to mod_scgi) where possible...
Monday, February 7. 2011
scgi-wsgi 1.1 released
I haven't run into any problems with scgi-wsgi in the few weeks that I've been using it. My issues with SCGI stem from mod_scgi and mod_proxy_scgi.
Namely, I think mod_scgi's handling of environment variables is perfect (see my previous rants about mod_proxy_scgi). It's even slightly faster than mod_proxy_scgi. But configuration leaves a bit to be desired. Say I want to mount an application at the root but I want the path "/static" to be handled by the web server. How does one do this? I'm guessing it has something to do with the SCGIHandler on/off directive (in fact, it probably is that simple), but that method is a bit too verbose, especially if you have multiple exclusions.
I like the way the mod_proxy_* stuff is configured (namely with ProxyPass). It's easy to exclude directories. And the directives are parsed in order, meaning configuration is sane and intuitive (at least to me). You specify the most specific entries first. But again, see my rant about mod_proxy_scgi's environment handling.
Anyway, scgi-wsgi incorporates both a thread pool and a process pool. Since handling is done on a per-request basis, I thought it would be poor form if scgi-wsgi simply closed the connection from the web server if the pool was full, so instead, requests received are queued when the pool is over capacity. (This is in contrast to ajp-wsgi which does drop connections, but again, there's no 1:1 mapping between AJP connections and requests.) So you can have your app running on a single thread/process and still accept dozens or hundreds of concurrent requests. Those requests won't be served in a timely manner, but at least they will be served.
On my server, scgi-wsgi is one of the fastest WSGI servers I've tried (that's out of flup, ajp-wsgi, Paste's scgiserver, and CherryPy's web server). I have no idea what the state of the art in the WSGI world is, but it was certainly impressive enough to get me to write the thread/process pool code and start switching over my applications. And unlike CherryPy, it had the disadvantage (like the others) of sitting behind Apache httpd.
(Though the fact that CherryPy is quite fast and speaks HTTP natively will probably stave off any urge to glue together an embeddable C web server and my C WSGI code... for a while.)
So anyway: scgi-wsgi. I've put it in its own space for now. Additional, more interesting links below.
ChangeLog Tarball Hg Repository
Wednesday, February 2. 2011
ajp-wsgi 1.1 released
Well, it's been a few weeks since I've touched (or had the need to touch) ajp-wsgi. I've been running it a few places — sometimes forked if the app allowed it.
Anyway, the biggest new feature is, of course, forking support. I think now is as good a time as any to "release" it.
The 1.1 release will probably be the last for ajp-wsgi (barring any bugfix releases). I imagine I'll be folding AJP support into scgi-wsgi in the future (and renaming it all too). At this point, scgi-wsgi is a bit more advanced than ajp-wsgi with its thread pooling and pre-fork multiprocessing. Though neither of those features is really required (or makes sense) with AJP's persistent connections.
scgi-wsgi 1.1 will follow in a few days.
ChangeLog Tarball Hg Repository
Sunday, January 23. 2011
mod_proxy_scgi, why?!?!
I went ahead and installed the original mod_scgi and it opened my eyes to how utterly different (or maybe broken) mod_proxy_scgi's handling of SCRIPT_NAME/PATH_INFO was.
I've been poring over the source for both modules, and though it's been years since I've coded Apache modules, I couldn't find any glaring differences. mod_scgi set the variables itself using some particular logic, while mod_proxy_scgi called ap_add_cgi_vars() to set the variables. That function appears to use the same logic. So in spite of all that, I just cannot explain why Apache's mod_proxy_scgi is so broken.
To give you an idea of how broken, given the request URI of "/foo/bar%20garply", it would set SCRIPT_NAME/PATH_INFO as follows:
SCRIPT_NAME: /foo/bar garply
PATH_INFO: /foo/bar%20garply
The sane thing to do (which mod_scgi does) is:
SCRIPT_NAME: (empty)
PATH_INFO: /foo/bar garply
So not only is SCRIPT_NAME utterly wrong for something mounted at the root, but PATH_INFO remains quoted.
My solution involves re-deriving SCRIPT_NAME/PATH_INFO from REQUEST_URI. Of course, mod_scgi doesn't need this, and in fact, shouldn't use it at all since it sets SCRIPT_NAME correctly for non-root-mounted applications (removing the need to manually specify it). Since I wouldn't trust auto-detection of the SCGI module, I decided to leave it up to the invoker of scgi-wsgi to decide what's right. Which led to this lovely feature being created (from scgi-wsgi's README):
Due to the inconsistency between the various SCGI connectors, you may need to specify an environment profile using the -E option. The default profile is pass-through. The profiles are described below:
So... yeah. I'm not too happy about it, and it diminishes my motivation to switch over to SCGI. I like the mod_proxy* stuff because it lets me use load balancing and/or set backend connection limits. But between the two, mod_scgi seems to be the saner implementation. (As mentioned, I've yet to try nginx and lighttpd.)
If I can gather code/test cases into a coherent bug report, I'll be posting one with Apache. Because really, the presence of quoted characters in the URL should not throw everything off.
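The re-derivation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the C code in scgi-wsgi: the function name and the `mount_point` parameter are made up for the example, and it assumes percent-escapes decode as UTF-8 (the default for `urllib.parse.unquote`).

```python
from urllib.parse import unquote


def derive_script_and_path(request_uri, mount_point=""):
    """Return (SCRIPT_NAME, PATH_INFO) re-derived from the raw REQUEST_URI.

    mount_point is a hypothetical parameter: the unquoted URL prefix the
    application is mounted at, or "" when it is mounted at the root.
    """
    # REQUEST_URI still carries the query string and percent-escapes,
    # so strip the former and decode the latter.
    path = unquote(request_uri.split("?", 1)[0])
    script_name = mount_point.rstrip("/")
    # The request must fall under the mount point, on a path boundary.
    if script_name and path != script_name and not path.startswith(script_name + "/"):
        raise ValueError("request path is outside the mount point")
    # Whatever follows the mount point is PATH_INFO.
    return script_name, path[len(script_name):]
```

For the request above, `derive_script_and_path("/foo/bar%20garply")` gives an empty SCRIPT_NAME and a PATH_INFO of "/foo/bar garply", which matches what mod_scgi reports for a root-mounted application.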
Friday, January 21. 2011
scgi-wsgi!
Shortly after my last entry, I switched from a CPU-bound WSGI app to something comparatively more I/O heavy: serving a static 10K page. I then tried it out with the non-threadpool version of scgi-wsgi. The results surprised me. scgi-wsgi was doing well over 700 (nearly 800 at times) requests per second while ajp-wsgi only mustered 300 requests/sec.
Though of course, like before, when I switched to the non-preforking scgi-wsgi, its throughput dropped to a little over 50 requests/sec. ajp-wsgi maintained 300 requests/sec even while forking.
Given the non-threadpool performance of scgi-wsgi, I was spurred to write my own preforking server code (again), this time in C. The result can be found here. Unlike flup's preforking server, this one is based on descriptor passing. (And since I couldn't find my copy of UNIX Network Programming, I have to thank Google for having it browsable online.)
Hooking up the prefork server into scgi-wsgi, I now get similar performance to the threaded version: 700+ requests/sec. With that, scgi-wsgi graduated from 'limbo' to 'alpha.'
As for the future, I would like to merge the two. I'm still entertaining the idea of hooking up my C WSGI code to an embedded HTTP server, similar to what PyCaduceus did (but remaining a top-level program rather than a Python module). I haven't really found an embeddable HTTP server with an interface that I like, though mongoose looks promising.
In the meantime, I'll go ahead and release ajp-wsgi 1.1 and scgi-wsgi 1.1 (eventually).
scgi-wsgi Hg repository (don't forget to clone scgi and preforkserver as well)
scgi-wsgi tarball
Monday, January 17. 2011
scgi-wsgi?
I just recently discovered that Apache HTTPD now ships with mod_proxy_scgi. (Probably old news to some, that was 3 patch versions ago.) Couple that with the fact that I stumbled upon PyCaduceus again recently (which proves to me that the C WSGI code written for ajp-wsgi was relatively transport-independent)... So I decided to spend a few hours today writing an SCGI driver in C.
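For context, here is roughly what such a driver has to parse. An SCGI request is a netstring (`<length>:<payload>,`) whose payload is a run of NUL-terminated header names and values, followed by the request body. The sketch below is in Python rather than C and is not taken from scgi-wsgi; the function names are made up for the example, and decoding headers as Latin-1 is an assumption.

```python
def parse_netstring(data):
    """Split b"<len>:<payload>,<rest>" into (payload, rest)."""
    length, sep, remainder = data.partition(b":")
    if not sep or not length.isdigit():
        raise ValueError("malformed netstring length")
    size = int(length)
    # The payload must be followed by exactly one comma.
    if remainder[size:size + 1] != b",":
        raise ValueError("netstring is truncated or missing its trailing comma")
    return remainder[:size], remainder[size + 1:]


def parse_scgi_request(data):
    """Return (environ, body) for one complete SCGI request held in data."""
    headers_blob, rest = parse_netstring(data)
    # Every name and value ends with NUL, so the final split element is empty.
    fields = headers_blob.split(b"\0")
    if fields.pop() != b"" or len(fields) % 2:
        raise ValueError("malformed SCGI headers")
    environ = {
        fields[i].decode("latin-1"): fields[i + 1].decode("latin-1")
        for i in range(0, len(fields), 2)
    }
    if "CONTENT_LENGTH" not in environ:
        raise ValueError("SCGI requires a CONTENT_LENGTH header")
    # The body is whatever follows the netstring, CONTENT_LENGTH bytes long.
    body = rest[:int(environ["CONTENT_LENGTH"])]
    return environ, body
```

The headers arrive already named the way a CGI/WSGI environment expects them, so they can be dropped into the environ dict as-is.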
Well, it was actually pretty easy considering the SCGI spec is only a page long. DJB's netstrings description was just as short. Replacing AJP in ajp-wsgi was straightforward as well. In fact, building the WSGI environment is a lot simpler with SCGI since the "headers" from the web server don't need to be re-interpreted/converted. (And I've yet to test this, but I don't think specifying the scriptName is necessary anymore.) Once scgi-wsgi was in a working state, I was curious how it compared to ajp-wsgi. The results were surprising at first, but later made sense once I figured out what was happening. These were on my server using an extremely compute-bound WSGI application:
So... yeah. scgi-wsgi is slightly slower when threading and significantly slower when forking. Despite SCGI's simplicity, the explanation for this is that AJP uses persistent "backend" connections while SCGI uses one connection per request. In other words, the SCGI version was heavily penalized because there was no thread pooling or process pooling.
So where do I go from here? I'll probably just continue on with ajp-wsgi 1.1 as planned and leave scgi-wsgi in limbo. (I was entertaining the idea of merging the two.) SCGI needs thread/process pools to be competitive. Unless I find C implementations with a friendly-enough license (i.e. not copyleft), it's probably not worth writing my own pool implementations just for scgi-wsgi. The forking reference implementation (now found in Paste, I believe) is still probably the best. (As an aside, since the SCGI protocol is so simple, moving the implementation to C to avoid the GIL probably doesn't improve things much over a pure-Python implementation.)
Hg Repository
Addendum: Apparently, the C implementations shine over their pure-Python counterparts when uploading files (i.e. large request body). When uploading a ~100MB file, ajp-wsgi and scgi-wsgi took ~6 seconds and ~4 seconds respectively (simplicity wins out!) When using flup ajp/scgi and Paste's scgiserver, I couldn't be bothered to wait for the upload to complete. (It was well over a few minutes for each before I canceled.)
Thursday, January 13. 2011
ajp-wsgi 1.1 alpha
I was curious about what it would take to add forking/multi-process support to ajp-wsgi and apparently the answer is: not much!
Well, it becomes trivial when you don't bother with preforking or process pools. I don't think an AJP backend really needs to worry much about saving fork() time since connections (and therefore processes) are more or less persistent.
I've done some basic stress testing using ApacheBench (ab): 100 concurrent requests, 500 concurrent requests. It's looking promising (no errors!). I haven't done any extensive testing between threaded vs. forking, but the results seem to be the same (using a freshly created Pylons app). Obviously the GIL isn't really being exercised when serving a static ~5K page.
And I did learn one oddity about mod_proxy/mod_proxy_ajp: apparently, when you specify max connections, you're actually specifying max connections per httpd process. On my server, where httpd uses the worker MPM (hybrid thread/process), the hard process limit is apparently 16, so even specifying max=1 means ajp-wsgi can expect up to 16 connections. (Conveniently for me, ajp-wsgi's default process limit is 16.) Something to keep in mind when tweaking ajp-wsgi's maxConnections parameter. (Also note that if mod_proxy_ajp ever goes over ajp-wsgi's maxConnections limit, users will see HTTP 503 errors.)
Anyway, I think I'll stick to threading for now. But 1.1 alpha is out there for anyone interested...
ChangeLog Tarball Hg repository
Sunday, December 12. 2010
ajp-wsgi 1.0.3 pulled, 1.0.4 released
I just discovered a rather fatal bug introduced in 1.0.3 which leads to high CPU utilization when the web server closes transport connections. I only just discovered the issue tonight, when all of my ajp-wsgi instances went full bore and brought my server's load average to over 15. (Apparently, Apache decided it was a good time to reap stale AJP connections.)
Anyway, I'm kicking myself, since this problem was also seen in flup but was patched some time ago. Shows me not to make post-midnight releases, especially after such short testing cycles. Oh wait...
Tarball
Saturday, December 11. 2010
pam_oath, where art thou?
I've been wanting to install some sort of two-factor authentication scheme on my server for a while now. There's Google Authenticator, but unfortunately, it appears to be written for Linux-PAM and is rife with Linuxisms. All was not lost, however, as it led me to OATH and its related specs, HOTP and TOTP authentication.
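As a rough illustration of what those two specs define, here is a minimal Python sketch of HOTP as specified in RFC 4226, with TOTP layered on top by using the current time step as the counter. It is not the pam_totp code (which is C); the function names are made up for the example, and the 30-second step and 6-digit length are assumed defaults.

```python
import hashlib
import hmac
import struct
import time


def hotp(key, counter, digits=6):
    """HOTP per RFC 4226: HMAC-SHA1 of the counter, dynamically truncated."""
    # The counter is hashed as an 8-byte big-endian integer.
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low nibble of the last byte picks an offset.
    offset = mac[-1] & 0x0F
    # Take 4 bytes at that offset and clear the top bit.
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key, now=None, step=30, digits=6):
    """TOTP: HOTP with the counter set to the number of elapsed time steps."""
    if now is None:
        now = time.time()
    return hotp(key, int(now) // step, digits)
```

With the RFC 4226 test key `b"12345678901234567890"`, `hotp(key, 0)` returns "755224" and `hotp(key, 1)` returns "287082", matching the test vectors in the RFC's appendix. A real verifier would also need the replay protection and clock drift handling that this sketch leaves out.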
It turns out that HOTP/TOTP is relatively simple, being based solely on HMAC-SHA1. Great, I thought. I just needed an HMAC implementation... and I also needed to learn how to write a PAM module (specifically, an OpenPAM module, which is what FreeBSD uses). And yes, I know that "PAM module" is technically redundant, no one needs to point that out.
So I studied RFC 4226, RFC 2104 and this useful article about OpenPAM. I've been doing that in my spare time for a few weeks now.
It wasn't until this morning that I decided to start writing some code. And in a few hours, I had HMAC-SHA1 (built on top of my SHA1 implementation... I wanted to avoid libcrypto to keep things lightweight), HOTP, and finally a working pam_totp. I went with TOTP-only for now as that's what I wanted and I didn't really fancy keeping state for each user (aside from their keys). (But as an aside, it looks like I'll need to keep state anyway if I want to avoid replay attacks and have some clock drift tracking.)
Anyway, what I have is in an extreme alpha state, but needless to say, I've already installed it into my sshd PAM auth chain. I won't bother releasing anything, as the intended audience is rather small (FreeBSD admins who want TOTP auth). Maybe I'll work on it more someday, add event-based HOTP support, develop it into a true pam_oath (which I couldn't find anywhere, strangely). But at least that itch has been scratched...
Wednesday, December 8. 2010
ajp-wsgi 1.0.3 released
New version of ajp-wsgi released, with the following change:
Tarball HG repository (also here for ajp)
Thursday, June 17. 2010
ajp-wsgi 1.0.2 released
Released version 1.0.2 of ajp-wsgi (there was no 1.0.1 release, apparently).
It's made up mainly of Mac OS X build fixes, though there were 2 general bugs:
Tarball HG Repository
