Saturday, May 26. 2012
TomEE 1.0 was released recently, and because of that, I decided to revisit the land of EJBs once again. (Especially what is known as "EJB 3.1 Lite.") I've been a heavy Spring user both with work projects and personal projects, so EJBs have never really been on my radar. And for whatever reason, I always gravitate toward the transaction manager/database side whenever I try out a new container.
Anyway, setting up TomEE/OpenEJB with local transactions is pretty straightforward. What's problematic (and doesn't seem to be well documented at all) is setting up XA datasources.
There are two major problems:
Fairly easy. And to set more properties in Definition, simply use semi-colons to delimit multiple properties.
Now, to use PostgreSQL with TomEE/OpenEJB you have to first create a dummy class that extends PGXADataSource and implements javax.sql.DataSource. If you're using JDK 6/JDBC4, you will most likely have to implement a few more methods as well. For now, it seems ok if the methods simply throw UnsupportedOperationExceptions or similar, since the PostgreSQL JDBC4 classes do the same.
To solve problem #2, you can either use the PostgreSQL 8.4 driver (which doesn't have the problematic interaction with DBCP), or patch up DBCP 1.4 with the patch I attached to DBCP-356. Once that's done, the PostgreSQL XA datasource definition is pretty simple:
The only caveat is that PGXADataSource does not take a URL property, so you will have to pass the database name, server name, port, etc. as separate properties in Definition.
Wednesday, June 29. 2011
It seems like I've been working on so many random things the past few weeks. And all on different platforms: C (UNIX/POSIX), Python, Cocoa, iOS.
Anyhow, the only things that are almost ready for public consumption are the security-related libraries/modules. (Each succeeding project builds on the previous.) All are BSD licensed.
Everything SHA, from SHA-1 to SHA-2 (SHA-224, SHA-256, SHA-384, SHA-512) to the newest truncated editions (SHA-512/224, SHA-512/256). Also includes HMAC wrappers for each. This is a cleaned up and refactored version of my previous command-line sha project. Still WIP though.
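The HMAC wrappers compute the standard RFC 2104 construction, so their output can be sanity-checked against Python's stdlib (this is just the underlying math, not the C library's API):

```python
import hashlib
import hmac

# HMAC(key, message) per RFC 2104, here with SHA-256 as the underlying hash.
key = b"key"
msg = b"The quick brown fox jumps over the lazy dog"
digest = hmac.new(key, msg, hashlib.sha256).hexdigest()
print(digest)
# f7bc83f430538424b13298e6aa6fb143ef4d59a14946175997479dbc2d1a3cd8
```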
Implementations of PBKDF2 in C and Python. The C version depends on sha-asaddi, though I suppose it can easily be swapped to use any other SHA/HMAC implementations with similar signatures. The Python version has no dependencies outside of the standard library.
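As a sanity check on what PBKDF2 actually computes (independent of either implementation above), the derivation can be exercised with Python's `hashlib.pbkdf2_hmac` against the RFC 6070 test vectors:

```python
import hashlib

# PBKDF2-HMAC-SHA1, RFC 6070 test vector #1:
# P = "password", S = "salt", c = 1, dkLen = 20
dk = hashlib.pbkdf2_hmac("sha1", b"password", b"salt", 1, dklen=20)
print(dk.hex())
# 0c60c80f961f0e71f3a9b524af6012062fe037a6
```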
Implementations of Colin Percival's scrypt in C and pure Python. This was more of an academic exercise since Mr. Percival's version is far more optimized and is already available under a BSD license. (At least, I'm assuming it's more optimized — I haven't actually looked at his code yet. But he did seem to have assembly/SSE optimized versions.) The Python version is probably too slow to be practical. (A Python wrapper for the original scrypt implementation is already available.) Will gladly take optimization suggestions for the Python version!
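For reference, the same derivation is exposed by Python's OpenSSL-backed `hashlib.scrypt` binding; the parameters below are purely illustrative, not a tuning recommendation:

```python
import hashlib

# scrypt(password, salt, N, r, p) -> derived key.
# N (CPU/memory cost) must be a power of two; memory use is ~128 * r * N bytes.
key = hashlib.scrypt(b"correct horse", salt=b"battery staple",
                     n=2**14, r=8, p=1, dklen=32)
print(key.hex())
```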
Oh, and these three projects represent a general shift in my packaging/building ideals. For now, I've decided to ditch GNU's autotools in favor of CMake for my projects. CMake just seems like a more coherent tool for configuring/building in a platform-independent manner. But that's just, like, my opinion, man...
Sunday, April 10. 2011
It's strange, but my most ubiquitous work is probably sha (or rather, its individual hash implementations). I lost track of the number of random products I've run across that have a copyright/license attribution to me. SHA implementations are a dime a dozen though, and I'm sure the versions in OpenSSL (libcrypto) draw the most users.
But I have to wonder, what draws developers to mine? The random portability/optimization knobs? The BSD license?
Anyway, I reworked my SHA implementations today. Brought the API to my naming/style standards for this decade. Added SHA-512/224 and SHA-512/256. And, from the work on pam_totp, added HMAC wrappers.
Of course, with the addition of HMAC, I had to work it into sha (the utility) as well, which I did.
So after a little over 7 years, here's a new sha release...
sha 1.1 tarball
Saturday, April 9. 2011
For anyone not familiar with it, CAS is an open source single-sign on service. It's implemented in Java, but I have stumbled across a CAS server implemented in Python. (Which I haven't tried yet.)
Anyway, I've been running my own CAS server for a while now (which provides SSO service for all the webapps on my server). I actually have my own Python CAS client and WSGI middleware that I've been using for years. In addition to a number of my Python webapps, I've successfully used it to CASsify MoinMoin. I'm sure I could get it to work with Trac too, but that would probably annoy my Trac users. (It works with any app that simply relies on REMOTE_USER.)
casclient and casmiddleware were part of my 'py-lib' project (if anyone remembers that). Though 'py-lib' was never really a project. It was more like a directory of Python modules I had written for myself — a place to incubate early Python projects (flup's fcgi.py started there, along with the other pre-1.0 flup modules).
But when I moved beyond Subversion, py-lib was abandoned.
All the work-related CAS work I've been doing lately reminded me that my Python CAS stuff was never really released. JA-SIG has their own Python CAS code, but it seems to be more geared toward CGI. I'm sure a few of the various Python authentication frameworks out there have a CAS module. But anyway, I just wanted to get the ball rolling and get my own CASMiddleware 'out there' — in case anyone else finds it useful.
It's pretty small and only depends on Beaker.
For now, it's only available at my Hg repo. I'll send it over to PyPI when I get the chance. (Hah! I should have searched PyPI first, so many hits for 'CAS'. Oh well...)
Wednesday, March 16. 2011
I merged all the Python 3.x-specific commits in ajp-wsgi-py3.0 into scgi-wsgi and surprisingly, it was painless and (seemingly) functional on the first try.
scgi-wsgi for Python 3.2 (Hg repo)
(Don't forget to clone scgi, threadpool and preforkserver as well.)
Really cool stuff. Though admittedly, most of my amazement comes from how easy it was to cherry-pick and merge specific revisions thanks to Mercurial.
Addendum: Oh yeah, I fixed ajp-wsgi-py3.0 so it compiles under Python 3.2. Even though the PyCapsule stuff was backported to Python 2.7, I won't be merging that change over. I still want the Python 2.x version to work with Python as old as 2.4.
It was brought to my attention that my trac sites were no longer authenticating the generic user I set up. Apparently, something I missed in my switch to SCGI...
But it's not scgi-wsgi's fault, really. I narrowed it down to mod_proxy_scgi (obviously, by the title of this entry!). mod_scgi works fine. mod_proxy_scgi does not seem to pass the HTTP Authorization header along. I'm currently looking around for flags/options to remedy this with no luck. A quick Google search revealed something similar, but not quite. Though that may address the SCRIPT_NAME/PATH_INFO issues I've seen... in a future version of Apache HTTPD.
I wish I had more time to figure out what was really going on. In the mean time, I will probably switch away from mod_proxy_scgi (to mod_scgi) where possible...
Monday, February 7. 2011
I haven't run into any problems with scgi-wsgi in the few weeks that I've been using it. My issues with SCGI stem from mod_scgi and mod_proxy_scgi.
Namely, I think mod_scgi's handling of environment variables is perfect (see my previous rants about mod_proxy_scgi). It's even slightly faster than mod_proxy_scgi. But its configuration leaves a bit to be desired. Say I want to mount an application at the root but I want the path "/static" to be handled by the web server. How does one do this? I'm guessing it has something to do with the SCGIHandler on/off directive (in fact, it probably is that simple) — but that method is way too verbose, especially if you have multiple exclusions.
I like the way the mod_proxy_* stuff is configured (namely with ProxyPass). It's easy to exclude directories. And the directives are parsed in order, meaning configuration is sane and intuitive (at least to me). You specify the most specific entries first.
But again, see my rant about mod_proxy_scgi's environment handling.
Anyway, scgi-wsgi incorporates both a thread pool and a process pool. Since handling is done on a per-request basis, I thought it would be poor form if scgi-wsgi simply closed the connection from the web server if the pool was full — so instead, requests received are queued when the pool is over capacity. (This is in contrast to ajp-wsgi which does drop connections, but again, there's no 1:1 mapping between AJP connections and requests.) So you can have your app running on a single thread/process and still dozens or hundreds of concurrent requests. Those requests won't be served in a timely manner, but at least they will be served.
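The queue-instead-of-drop behavior can be sketched with the stdlib (a toy model of the idea, not scgi-wsgi's actual code): an unbounded `queue.Queue` in front of a fixed set of workers means a full pool delays requests rather than refusing them.

```python
import queue
import threading

jobs = queue.Queue()          # unbounded: excess requests wait here
results = []
lock = threading.Lock()

def worker():
    while True:
        req = jobs.get()
        if req is None:       # sentinel: shut this worker down
            break
        with lock:
            results.append(f"handled {req}")
        jobs.task_done()

# A "pool" of just 2 workers still eventually serves all 10 requests.
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for i in range(10):
    jobs.put(i)
jobs.join()                   # all requests served, none dropped
for _ in threads:
    jobs.put(None)
for t in threads:
    t.join()
print(len(results))           # 10
```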
On my server, scgi-wsgi is one of the fastest WSGI servers I've tried (that's out of flup, ajp-wsgi, Paste's scgiserver, and CherryPy's web server). I have no idea what the state of the art in the WSGI world is, but it was certainly impressive enough to get me to write the thread/process pool code and start switching over my applications. Unlike CherryPy, though, it has the disadvantage (like the others) of needing to sit behind Apache httpd. (Though the fact that CherryPy is quite fast and speaks HTTP natively will probably stave off any urge to glue together an embeddable C web server and my C WSGI code... for a while.)
So anyway: scgi-wsgi. I've put it in its own space for now. Additional, more interesting links below.
Wednesday, February 2. 2011
Well, it's been a few weeks since I've touched (or had the need to touch) ajp-wsgi. I've been running it a few places — sometimes forked if the app allowed it.
Anyway, the biggest new feature is, of course, forking support. I think now is as good a time as any to "release" it.
The 1.1 release will probably be the last for ajp-wsgi (barring any bugfix releases). I imagine I'll be folding AJP support into scgi-wsgi in the future (and renaming it all too). At this point, scgi-wsgi is a bit more advanced than ajp-wsgi with its thread pooling and pre-fork multiprocessing. Though neither of those features are really required (or make sense) with AJP's persistent connections.
scgi-wsgi 1.1 will follow in a few days.
Sunday, January 23. 2011
I went ahead and installed the original mod_scgi and it opened my eyes to how utterly different (or maybe broken) mod_proxy_scgi's handling of SCRIPT_NAME/PATH_INFO was.
I've been poring over the source for both modules, and though it's been years since I've coded Apache modules, I couldn't find any glaring differences. mod_scgi set the variables itself using some particular logic, while mod_proxy_scgi called ap_add_cgi_vars() to set the variables. That function appears to use the same logic.
So in spite of all that, I just cannot explain why Apache's mod_proxy_scgi is so broken. To give you an idea of how broken, given the request URI of "/foo/bar%20garply", it would set SCRIPT_NAME/PATH_INFO as follows:
The sane thing to do (which mod_scgi does) is:
So not only is SCRIPT_NAME utterly wrong for something mounted at the root, but PATH_INFO remains quoted.
My solution involves re-deriving SCRIPT_NAME/PATH_INFO from REQUEST_URI. Of course, mod_scgi doesn't need this, and in fact, shouldn't use it at all, since it sets SCRIPT_NAME correctly for non-root-mounted applications (removing the need to manually specify it).
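The re-derivation itself is simple: strip the query string, unquote, and split at the known mount point. A sketch of the idea (hypothetical helper, not scgi-wsgi's exact code):

```python
from urllib.parse import unquote

def fix_paths(environ, mount=""):
    """Recompute SCRIPT_NAME/PATH_INFO from REQUEST_URI.

    `mount` is the path the application is mounted at ("" for root);
    since the SCGI connector can't be trusted here, the caller supplies it.
    """
    uri = environ["REQUEST_URI"].split("?", 1)[0]   # drop the query string
    path = unquote(uri)                             # undo %-quoting
    if not path.startswith(mount):
        raise ValueError("request outside mount point")
    environ["SCRIPT_NAME"] = mount
    environ["PATH_INFO"] = path[len(mount):]
    return environ

env = fix_paths({"REQUEST_URI": "/foo/bar%20garply"}, mount="")
# SCRIPT_NAME == ''  PATH_INFO == '/foo/bar garply'
```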
Since I wouldn't trust auto-detection of the SCGI module, I decided to leave it up to the invoker of scgi-wsgi to decide what's right. Which led to this lovely feature being created (from scgi-wsgi's README):
Due to the inconsistency between the various SCGI connectors, you may need to specify an environment profile using the -E option. The default profile is pass-through. The profiles are described below:

So... yeah. I'm not too happy about it, and it diminishes my motivation to switch over to SCGI. I like the mod_proxy* stuff because it lets me use load balancing and/or set backend connection limits. But between the two, only mod_scgi seems to be the saner implementation. (As mentioned, I've yet to try nginx and lighttpd.)
If I can gather code/test cases into a coherent bug report, I'll be posting one with Apache. Because really, the presence of quoted characters in the URL should not throw everything off.
Friday, January 21. 2011
Shortly after my last entry, I switched from a CPU-bound WSGI app to something comparatively more I/O heavy: serving a static 10K page. I then tried it out with the non-threadpool version of scgi-wsgi. The results surprised me. scgi-wsgi was doing well over 700 (nearly 800 at times) requests per second while ajp-wsgi only mustered 300 requests/sec.
Though of course, like before, when I switched to the non-preforking scgi-wsgi, its throughput dropped to a little over 50 requests/sec. ajp-wsgi maintained 300 requests/sec even while forking.
Given the non-threadpool performance of scgi-wsgi, I was spurred to write my own preforking server code (again), this time in C. The result can be found here. Unlike flup's preforking server, this one is based on descriptor passing. (And since I couldn't find my copy of UNIX Network Programming, I have to thank Google for having it browsable online. )
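Descriptor passing means the listener accepts the connection and hands the raw socket fd to an already-running child over a Unix socket (SCM_RIGHTS), avoiding a fork() per request. Modern Python exposes this directly via `socket.send_fds`/`socket.recv_fds` (3.9+, Unix only), which makes the mechanism easy to demonstrate in a single process:

```python
import os
import socket

# A socketpair stands in for the parent<->worker control channel.
parent_end, worker_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# A pipe stands in for an accepted client connection.
read_fd, write_fd = os.pipe()

# "Parent": hand the descriptor to the worker via SCM_RIGHTS.
socket.send_fds(parent_end, [b"fd"], [read_fd])

# "Worker": receive a duplicate of the descriptor and use it directly.
msg, fds, flags, addr = socket.recv_fds(worker_end, 16, 1)
os.write(write_fd, b"hello")
data = os.read(fds[0], 5)
print(data)  # b'hello'
```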
Hooking up the prefork server into scgi-wsgi, I now get similar performance to the threaded version: 700+ requests/sec. With that, scgi-wsgi graduated from 'limbo' to 'alpha.' And since mod_proxy_scgi is now included with vanilla Apache HTTPD, I will probably be transitioning my stuff from ajp-wsgi to scgi-wsgi. (Eat your own dog food and all that.)
As for the future, I would like to merge the two. I'm still entertaining the idea of hooking up my C WSGI code to an embedded HTTP server, similar to what PyCaduceus did (but remaining a top-level program rather than a Python module). I haven't really found an embeddable HTTP server with an interface that I like, though mongoose looks promising.
In the meantime, I'll go ahead and release ajp-wsgi 1.1 and scgi-wsgi 1.1 (eventually).
scgi-wsgi Hg repository (don't forget to clone scgi and preforkserver as well)
Monday, January 17. 2011
I just recently discovered that Apache HTTPD now ships with mod_proxy_scgi. (Probably old news to some; that was 3 patch versions ago.) Couple that with the fact that I stumbled upon PyCaduceus again recently (which proves to me that the C WSGI code written for ajp-wsgi was relatively transport-independent)... So I decided to spend a few hours today writing an SCGI driver in C.
Well, it was actually pretty easy considering the SCGI spec is only a page long. DJB's netstrings description was just as short.
Replacing AJP in ajp-wsgi was straightforward as well. In fact, building the WSGI environment is a lot simpler with SCGI since the "headers" from the web server don't need to be re-interpreted/converted. (And I've yet to test this, but I don't think specifying the scriptName is necessary anymore.)
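The wire format really is that small: the headers are a NUL-delimited key/value list wrapped in one netstring, with CONTENT_LENGTH required to come first. A sketch of both directions in Python (hypothetical helpers, not the actual C driver):

```python
def encode_scgi_request(environ, body=b""):
    # SCGI: "key\0value\0" pairs inside a single netstring,
    # with CONTENT_LENGTH first and SCGI=1 required.
    headers = {"CONTENT_LENGTH": str(len(body)), "SCGI": "1", **environ}
    blob = b"".join(k.encode() + b"\0" + v.encode() + b"\0"
                    for k, v in headers.items())
    return str(len(blob)).encode() + b":" + blob + b"," + body

def decode_scgi_headers(data):
    # Netstring framing: "<len>:<payload>,"
    length, _, rest = data.partition(b":")
    blob = rest[:int(length)]
    parts = blob.split(b"\0")[:-1]   # trailing NUL leaves an empty tail
    return dict(zip((p.decode() for p in parts[0::2]),
                    (p.decode() for p in parts[1::2])))

req = encode_scgi_request({"REQUEST_METHOD": "GET", "PATH_INFO": "/"})
print(decode_scgi_headers(req)["REQUEST_METHOD"])  # GET
```

Note how the decoded dict can be dropped nearly verbatim into a WSGI environ — this is the "no re-interpretation needed" property mentioned above.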
Once scgi-wsgi was in a working state, I was curious how it compared to ajp-wsgi. The results were surprising at first, but later made sense once I figured out what was happening. These were on my server using an extremely compute-bound WSGI application:
So... yeah. scgi-wsgi is slightly slower when threading and significantly slower when forking. Despite SCGI's simplicity, the explanation for this is that AJP uses persistent "backend" connections while SCGI uses one connection per request. In other words, the SCGI version was heavily penalized because there was no thread pooling or process pooling.
So where do I go from here? I'll probably just continue on with ajp-wsgi 1.1 as planned and leave scgi-wsgi in limbo. (I was entertaining the idea of merging the two.) SCGI needs thread/process pools to be competitive. Unless I find C implementations with a friendly-enough license (i.e. not copyleft), it's probably not worth writing my own pool implementations just for scgi-wsgi. The forking reference implementation (now found in Paste, I believe) is still probably the best.
(As an aside, since the SCGI protocol is so simple, moving the implementation to C to avoid the GIL probably doesn't improve things much over a pure-Python implementation.)
Addendum: Apparently, the C implementations shine over their pure-Python counterparts when uploading files (i.e. large request body). When uploading a ~100MB file, ajp-wsgi and scgi-wsgi took ~6 seconds and ~4 seconds respectively (simplicity wins out!). When using flup ajp/scgi and Paste's scgiserver, I couldn't be bothered to wait for the upload to complete. (It was well over a few minutes for each before I canceled.)
Thursday, January 13. 2011
I was curious about what it would take to add forking/multi-process support to ajp-wsgi and apparently the answer is: not much!
Well, it becomes trivial when you don't bother with preforking or process pools. I don't think an AJP backend really needs to worry much about saving fork() time since connections (and therefore processes) are more or less persistent.
I've done some basic stress testing using ApacheBench (ab): 100 concurrent requests, 500 concurrent requests. It's looking promising (no errors!). I haven't done any extensive testing between threaded vs. forking, but the results seem to be the same (using a freshly created Pylons app). Obviously the GIL isn't really being exercised when serving a static ~5K page.
And I did learn one oddity about mod_proxy/mod_proxy_ajp — apparently, when you specify max connections, you're actually specifying max connections per httpd process. On my server, where httpd uses the worker MPM (hybrid thread/process), the hard process limit is apparently 16, so even specifying max=1 means ajp-wsgi can expect up to 16 connections. (Conveniently for me, ajp-wsgi's default process limit is 16.) Something to keep in mind when tweaking ajp-wsgi's maxConnections parameter.
(Also note that if mod_proxy_ajp ever goes over ajp-wsgi's maxConnections limit, users will see HTTP 503 errors.)
Anyway, I think I'll stick to threading for now. But 1.1 alpha is out there for anyone interested...
Sunday, December 12. 2010
I just discovered a rather fatal bug introduced in 1.0.3 which leads to high CPU utilization when the web server closes transport connections. I only just discovered the issue tonight, when all of my ajp-wsgi instances went full bore and brought my server's load average to over 15. (Apparently, Apache decided it was a good time to reap stale AJP connections.)
Anyway, I'm kicking myself, since this problem was also seen in flup but was patched some time ago. That'll teach me to make post-midnight releases, especially after such short testing cycles. Oh wait...
Saturday, December 11. 2010
I've been wanting to install some sort of two-factor authentication scheme on my server for a while now. There's Google Authenticator, but unfortunately, it appears to be written for Linux-PAM and is rife with Linuxisms. All was not lost, however, as it led me to OATH and its related specs, HOTP and TOTP authentication.
It turns out that HOTP/TOTP is relatively simple — solely based on HMAC-SHA1. Great, I thought. I just needed an HMAC implementation... and I also needed to learn how to write a PAM module (specifically, an OpenPAM module, which is what FreeBSD uses). And yes, I know that "PAM module" is technically redundant, no one needs to point that out.
So I studied RFC 4226, RFC 2104 and this useful article about OpenPAM. I've been doing that in my spare time for a few weeks now. It wasn't until this morning that I decided to start writing some code.
And in a few hours, I had HMAC-SHA1 (built on top of my SHA1 implementation... I wanted to avoid libcrypto to keep things lightweight), HOTP, and finally a working pam_totp. I went with TOTP-only for now as that's what I wanted and I didn't really fancy keeping state for each user (aside from their keys). (But as an aside, it looks like I'll need to keep state anyway if I want to avoid replay attacks and have some clock drift tracking.)
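The whole scheme really does fit in a few lines. Here's a sketch of RFC 4226 HOTP and its TOTP variant (RFC 6238) using Python's stdlib — not the C pam_totp code, just the same algorithm — checked against the RFC 4226 test vectors:

```python
import hashlib
import hmac
import struct
import time

def hotp(key, counter, digits=6):
    # RFC 4226: HMAC-SHA1 over the big-endian 64-bit counter,
    # then "dynamic truncation" down to a short decimal code.
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return "%0*d" % (digits, code % 10 ** digits)

def totp(key, t=None, step=30, digits=6):
    # RFC 6238: HOTP with the counter derived from Unix time.
    t = time.time() if t is None else t
    return hotp(key, int(t) // step, digits)

secret = b"12345678901234567890"   # RFC 4226 test secret
print(hotp(secret, 0))  # 755224
print(hotp(secret, 1))  # 287082
```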
Anyway, what I have is in an extreme alpha state, but needless to say, I've already installed it into my sshd PAM auth chain. As for a token generator, I use a nice, free iOS app called OATH Token.
I won't bother releasing anything, as the intended audience is rather small (FreeBSD admins who want TOTP auth). Maybe I'll work on it more someday, add event-based HOTP support, develop it into a true pam_oath (which I couldn't find anywhere, strangely). But at least that itch has been scratched...
Wednesday, December 8. 2010