As I've been searching for a solution to my big-data analysis problems, I've been very impressed by MongoDB's features, but even more by their astonishing level of support. After mentioning I was having trouble running Mongo in my benchmark, Kristina from 10Gen not only fixed the bug in my code, she then emailed me an optimization (using the built-in _id in my array), and after that 10gen's Mathias Stearn let me know the latest build contained some more optimizations for that path. After burning days dealing with obscure Tokyo problems I know how much time responsive support can save, so it makes me really want to use Mongo for my work.
The only fly in the ointment was the lack of Unix domain socket support. I'm running my analysis jobs on the same machine as the database, and as you can see from my benchmark results, using a file socket rather than TCP on local host speeds up my runs significantly on the other stores that support it. I already added support to Redis, so I decided to dive into the Mongo codebase and see what I could manage.
Here's a patch implementing domain sockets, including a diff and complete copies of the files I changed. Running the same benchmarks gives me a time of 35.8s, vs 43.9s over TCP, and 28.9s with the RAM cache vs 31.1s on TCP. These figures are only representative for my case, large values on a single machine, but generally they demonstrate the overhead of TCP sockets even if you're using localhost. To use it yourself, specify a port number of zero, and put the socket location (eg /tmp/mongo.sock) instead of the host name. I've patched the server, the command-line shell, and the PHP driver to all support file sockets this way.
I don't know what Mongo's policy is on community contributions, I primarily wrote this patch to scratch my own itch, but I hope something like this will make it into the main branch. Writing the code is the easy bit of course, the real challenge is testing it across all the platforms and configurations!
Comments