Thought I’d share a recent setup I’ve implemented in a Python webapp: offloading heavier tasks and alerting the user on completion in real-time. To do this you’ll need a job queue and a way to send messages back to the client. The reason for the job queue should be obvious; if you’re doing something that requires background processing in the first place, chances are you don’t want to open the possibility of hundreds or thousands of instances of this happening at the same time. Being able to send messages back to the client in real-time will allow you to let them know when a job is complete without requiring them to refresh the page.
Setting up beanstalkd
beanstalkd is a job queue created by the people behind the Causes app on Facebook. You’ll need to download and install it, then run it (it listens on port 11300 by default). There are plenty of client libraries, but since I’m using Python I went with beanstalkc. Jobs in beanstalkd are stored in ‘tubes’. For this example we'll just use the tube ‘jabroni’, because what an awesome word.
Setting up Nginx HTTP Push Module
Using the Nginx HTTP Push Module (NHPM) for notifications is nice if you’re already using Nginx anyways. The only downside so far is that you have to recompile Nginx to use it. To configure with a fresh Nginx install:
mkdir ~/sources && cd ~/sources
wget http://pushmodule.slact.net/downloads/nginx_http_push_module-0.692.tar.gz
wget http://nginx.org/download/nginx-1.0.11.tar.gz
tar -xvzf nginx_http_push_module-0.692.tar.gz
tar -xvzf nginx-1.0.11.tar.gz
cd nginx-1.0.11
./configure --add-module=$HOME/sources/nginx_http_push_module-0.692
make
sudo make install
NHPM uses ‘channels’, which are published to and subscribed to. We're going to set the URLs to /pub and /sub, with the channel picked by the id query argument. In your site’s Nginx server block:
location /pub {
    set $push_channel_id $arg_id;
    push_publisher;
    push_store_messages off;
    push_max_message_buffer_length 10;
}
location /sub {
    push_subscriber;
    push_subscriber_concurrency broadcast;
    set $push_channel_id $arg_id;
    default_type text/plain;
}
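If you want to poke at the config before any worker exists, here's a rough Python sketch that publishes to a channel by hand. It assumes the /pub location above is reachable at whatever base URL you pass in; publish_url and publish are made-up helper names for this example, not anything NHPM ships.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def publish_url(base, channel_id):
    """Build the publisher URL; the channel is selected by the ?id= argument."""
    return base.rstrip('/') + '/pub?' + urlencode({'id': channel_id})


def publish(base, channel_id, body):
    """POST a message body to the channel and return the HTTP status code."""
    req = Request(publish_url(base, channel_id),
                  data=body.encode('utf-8'),
                  headers={'Content-Type': 'text/plain'},
                  method='POST')
    with urlopen(req) as resp:
        return resp.status


if __name__ == '__main__':
    # Assumed host; replace with wherever your Nginx is listening.
    print(publish('http://localhost', 'user_alerts_123', 'hello'))
```

Anything subscribed to /sub?id=user_alerts_123 at that moment should receive the body.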
The Worker
With jobs you need workers. Your worker will watch a tube, wait for a job until it gets one, take it out of the queue, deal with it, notify the user, then delete it from the queue (unless of course something goes wrong). Now is a good time to design your workload format/schema (I prefer JSON). Here is an example worker in Python with beanstalkc and requests to simplify the post to the pub address:
import json
import time

import beanstalkc
import requests

nhpm_pub_addr = 'http://your-url.net/pub'

beanstalk = beanstalkc.Connection(host='localhost', port=11300)
beanstalk.watch('jabroni')

def wait_for_work():
    while True:
        # Block and wait for a job
        job = beanstalk.reserve()
        # Load JSON from job body
        workload = json.loads(job.body)
        # Do work! Or sleep, to pretend you're doing work.
        time.sleep(10)
        # Let the user know the job is complete, sending back whatever.
        # The /pub location reads the channel from ?id=, so pass it along.
        message = {'event': 'job-complete'}
        requests.post(nhpm_pub_addr,
                      params={'id': workload['channel_id']},
                      data=message)
        # Delete the job and wait for another
        job.delete()

if __name__ == '__main__':
    wait_for_work()
You should use something like supervisord to keep this running, and implement some sort of logging as well.
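For the "unless something goes wrong" part, here's one way the logging and failure handling could look. This is a sketch, not what I'm running: handle_job and the process callback are names I made up, and burying failed jobs (rather than releasing them for a retry) is just one choice. It only relies on the job object having body, delete() and bury(), which beanstalkc jobs do.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('worker')


def handle_job(job, process):
    """Run process(workload) for one reserved job; return True on success."""
    try:
        workload = json.loads(job.body)
        process(workload)
    except Exception:
        # Bury the job so it leaves the ready queue but can be inspected later.
        log.exception('job failed, burying it')
        job.bury()
        return False
    job.delete()
    log.info('job done')
    return True
```

Inside the worker loop this would be called as handle_job(beanstalk.reserve(), do_work), where do_work is whatever function does the real processing and the notification.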
Adding a Job
Now instead of making your users feel like you’re running your site off of a cellphone in Russia, you just add a job to the queue. You need to make sure to include a unique channel identifier for the user.
import json
import beanstalkc

# Channel ID - passed to view and put in job.
# str() guards against a numeric uid, which join() would reject.
channel_id = '_'.join(('user', 'alerts', str(User.get('uid'))))
# Your workload, include whatever you need to process the job
workload = {'action': 'some_action', 'channel_id': channel_id}
beanstalk = beanstalkc.Connection(host='localhost', port=11300)
beanstalk.use('jabroni')
beanstalk.put(json.dumps(workload))
# Finish your request business, but don't forget to pass channel_id to your view
Client-Side
Finally, your users need a way to receive the notifications you’re sending them. For this I’ve been using jQuery with jquery.comet.js. Basic JavaScript showing how to subscribe to NHPM with jquery.comet and handle messages:
<script type="text/javascript">
// Channel id you passed, would normally come from a template variable
var channel_id = 'user_alerts_123';
var nhpm_sub_addr = '/sub';
// Listen after the document is loaded to prevent a forever 'page loading' indicator
$(window).load(function() {
  return setTimeout((function() {
    $.comet.connect(nhpm_sub_addr + '?id=' + channel_id);
    $.comet.bind(read_nhpm_events);
  }));
});
function read_nhpm_events(data) {
  // Display however you want! I prefer the console. alert() is for chumps.
  console.log(data);
}
</script>
Conclusion
There are a few different approaches to do stuff like this, some offering more flexibility than others. I’m looking forward to checking other available options for real-time events when I dive a little deeper into JavaScript, like now.js, Juggernaut, and Socket.IO. Maybe even some kind of terrible multiplayer game with sprite.js, we’ll see. If you want an example app that uses NHPM with Flask, I made a basic anonymous twitter chat app with the source on github.
This summer has been good. Damn good. I went to New York City for a weekend, Washington State for a week, ran the Tough Mudder in Wisconsin, and hung out with friends at Lollapalooza.
Tomorrow I'm flying to Iceland for a week. No days are fully packed, and a couple are reserved for driving wherever and camping. I'm going to check out the Golden Circle (Þingvellir national park, Gullfoss waterfall, geysers at Geysir geothermal area), spend a lot of time in Reykjavik, drink some viking beer, check out icebergs at a glacial lagoon (Jökulsárlón, I can't pronounce half of this stuff), hike Mt. Esja, spend a few hours at Blue Lagoon, and anything else I end up doing along the way. Who knows, I might even go whale watching, just because I hear there's a slim chance you can see a narwhal, and that's kind of a big deal to me. I'm pretty excited for this trip. Some people call me crazy for going alone, and I call these people unadventurous.
Now that I have a passport I feel the need to go places. I don't know what's next, but I'm open to suggestions.
I finally entered the world of DSLR. I debated between a smaller camera or a higher quality DSLR for a while, but could afford neither. Then I sold stuff I don't need, and suddenly, Oprah rich. Ok not really, but I had enough to realistically look at cameras, and after a trip to Best Buy I came home with a Canon Rebel T3i.
Chewie says hi.
There's definitely a learning curve, but I've got enough lined up in the next few months that I should get the hang of it before too long. I went to Starved Rock the other weekend and took a few pictures there of typical nature junk. I'm going to New York City this weekend. I'll be in Washington State at the end of the month. I'm in Iceland for a week in August. I'm hoping to go on a hike and possibly camp (possibly Michigan) the weekend following that. Then I'm open to whatever I can afford.
I decided to clean up my big, sloppy, one-file-driven blog by splitting models & views into packages. A problem that created was that my static files were no longer available when I ran the standalone server for testing changes.
The best solution I found for this involves using werkzeug.SharedDataMiddleware to make Flask serve your static files when you're in debug mode (in this example, checked when the request_started signal is sent). Assuming we're working with someapp and you keep your static files in a directory called static, someapp/__init__.py should look similar to this:
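import os

from flask import Flask, request_started
# Newer Werkzeug: from werkzeug.middleware.shared_data import SharedDataMiddleware
from werkzeug import SharedDataMiddleware

app = Flask(__name__)

def serve_static(sender, **extra):
    # Wrap the WSGI app only once, otherwise every request adds another layer
    if app.config['DEBUG'] and not getattr(app, '_static_wrapped', False):
        app.wsgi_app = SharedDataMiddleware(app.wsgi_app, {
            '/': os.path.join(os.path.dirname(__file__), 'static')
        })
        app._static_wrapped = True

request_started.connect(serve_static, app)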
There are a few ways to trigger debug mode, but the easiest is probably through your script that runs the local built-in server. Say we have /runserver.py:
from someapp import app
app.run(debug=True)
And now your static files are available in both environments. Ok fine, I mostly just posted this to play with syntax highlighting on this site, but if you stumble upon it and it helps, cool. Helpful input is nice too.
I downloaded MarsEdit and noticed it didn't have "site you threw together aimlessly" in the pulldown of blog types, so I made mine compatible with MetaWeblog by using Flask-XML-RPC. Took a bit of trial and error but was overall pretty simple, and now I can add/edit posts through a desktop app and whatever else supports that API. So far the things I'm missing that I'll add in later are adding new tags, uploading media and managing draft status. It doesn't manage the post date either, but that's ok with me. Next up: switching from a single module to a package, this thing's getting messy.
Edit: I stand corrected, post date is managed in the 'Post' pulldown, which wasn't as obvious to me as it should have been. Was expecting it as a date/time field in the main edit window.
Looks like this site has taken a back seat to real life once again. Not the end of the world. My head-on approach to getting my miles up in April has turned out to be a great success, even though it seemed like I'd never get used to running more than 3 miles. I'm comfortable going over 7 now, if only because I get myself lost, or compete in intense running challenges with my brother. I ended up signing up for the Tough Mudder, because hey, life's not worth living if you don't have a few goals, right? Right. So I'll be getting dirty, bruised, shocked, fatigued, and if that's not enough they give out crappy beer when you're done. I might do the Warrior Dash in both Illinois and Wisconsin as well, but that's getting expensive so we'll see. I'll be happy if I just finish the Tough Mudder at the very least.
My precious blog project isn't quite dead. I'm still writing things down. See? This is another post. And now posts can have comments. Comments that I can read, analyze, think about, delete, laugh at, etc. Look at your web server. Now back at me. Now back at your face. I'm on a gunicorn. Very easy addition, though it would have been easier if I just used Disqus, but I wanted to do it myself as an exercise since I don't do very much Python web development lately. Closer and closer to shifting gears on the work server though.
Update: What a wildly popular post! With spammers. I like the consistency of them though, should make testing protection easy.
Super Updated Mega Update: Problem solved.
I made it this far, but I'm calling it quits on running every day. I'm still doing something, maybe I'll replace it with riding my bike, but from everything I've read I'd be stupid to keep going with my lower back hurting as much as it does. Apparently it was too much impact. I got 17 days in a row, that's not terrible. I'm going to take a few days before trying to run again. I did get used to getting out the door every day though. So the new goal isn't running every day, but somehow breaking a sweat and getting my heart rate up without messing up my back. I'm surprised it's my back and not my knees.

My body needs a break. I've got 7 days down and 23 to go. If all I were doing were running I have a feeling I'd be just fine (for now, at least). Feels good to push it though, as long as I don't hurt myself. My weight is already back to fluctuating instead of slowly rising (somewhere around 167 now). Ok, I wouldn't mind being over 170 again, I like to crush things. Getting up early is working most of the time, except a few days that I've crawled out of bed at 11am, probably from caffeine too late in the day leading to being unable to fall asleep. Eating well has been easy mostly, with a few exceptions. The least healthy items being the all you can eat sushi last weekend and the porterhouse in the middle of the week. Resisting GrubHub has been the hardest thing ever, but being broke from buying tickets for Eddie Vedder and Lollapalooza definitely helps. Priorities. Now time to go do my taxes - should make for a good anger-run.
I learned a little something about working with dates and times in python today; it sucks. I figured I could easily duplicate most of the twitter archiving functionality to archive my pinboard.in account as well. Simple enough: get last link from database, retrieve/parse JSON from pinboard, loop links and save anything newer than the last link in the database. Turns out saving a datetime.datetime object to mongo doesn't retain the timezone information, and it also turns out you can't compare one datetime that has timezone information to another that doesn't. I went to the googles and discovered helpful options python-dateutil and pytz, which beats creating your own timezone object. I'm sure there's a reason the datetime objects don't just assume local system timezone when the tzinfo isn't present, probably something to do with explicit is better than implicit. As always though, learning by mistakes. Good stuff. Unrelated to that, I'm attempting to score Lollapalooza 'Secret Sale' tickets by scraping the site every 30 seconds with BeautifulSoup and checking if their coming soon message has changed, in which case Growl will yell at me, and hopefully wake me up if I'm sleeping, so I can score tickets for $60 instead of $200+. Assuming it works. Carpe noctem.
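Here's the naive-vs-aware problem boiled down, for future me. To keep it self-contained this sketch uses the standard library's datetime.timezone (Python 3) rather than python-dateutil or pytz, and it assumes the value read back from the database is UTC with the tzinfo stripped. as_utc is a made-up helper name and the dates are invented.

```python
from datetime import datetime, timezone


def as_utc(dt):
    """Return dt as an aware UTC datetime; naive values are assumed to be UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


stored = datetime(2011, 4, 10, 12, 0, 0)                        # naive, as read back
fetched = datetime(2011, 4, 10, 13, 0, 0, tzinfo=timezone.utc)  # aware, as parsed

try:
    fetched > stored
except TypeError:
    print('naive and aware datetimes cannot be ordered')

# Normalise both sides first, then the comparison is fine.
print(as_utc(fetched) > as_utc(stored))
```

Normalising both sides to aware UTC before comparing means the "is this link newer than the last one saved" check no longer cares how each value arrived.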
I've been wanting to save my tweets for a while now, so now I am. After taking a spin through the API I downloaded python-twitter and went to town. They're stored in mongo from now on (didn't bother getting past tweets, it's not that important), and at least temporarily displayed here on the side, though I'm only displaying my aimless rambles and skipping @-replies, which took me a second to figure out because of this little bit about PyMongo and using regular expressions for '$not'. I know you can use one of the 'in_reply_to' fields to detect as well, but that doesn't catch manually entered usernames. Pretty good weekend of messing with this. Finally upgraded Debian on my slicehost as well, which means python 2.6.6 without building from source. Too bad it's already Monday. Back to work.
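For reference, the '$not' trick looks roughly like this. It's a sketch: the text field name and the db.tweets collection are assumptions for the example, and as far as I understand it, the point is that '$not' wants a compiled pattern from PyMongo instead of a {'$regex': ...} document (older MongoDB versions reject the latter).

```python
import re

# Match documents whose text does NOT start with an @-mention.
not_a_reply = {'text': {'$not': re.compile(r'^@')}}


def is_ramble(text):
    """Local equivalent of the filter, handy for checking the pattern."""
    return not_a_reply['text']['$not'].search(text) is None


# With PyMongo this would be used as: db.tweets.find(not_a_reply)
```

Because the pattern looks at the text itself, it also catches replies where the username was typed by hand and the 'in_reply_to' fields are empty.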
Stayed up way too late tonight playing video games and messing with this again. All I really wanted to do was be able to mark posts as drafts / unpublished in my admin, which is as simple as it sounds, but before I bothered with that I wanted to tackle an annoyance I created by having a dev version and a live version of this trainwreck. I'm a fan of nginx, so I decided to try out the FastCGI deployment method for Flask. Works like a charm, but it's kind of a pain to have to restart it every time I make a code change. I was running python in a screen session for each version of the site, so after each change I'd need to kill and restart the session, classic rookie mistake. Anyways, now I'm using a generic bash script with start-stop-daemon to manage that, which may or may not be the 'correct' way to go about it but it's definitely better. Then I set it up to automatically check out files and restart the FastCGI server in the post-receive git hook of the version of the site you're looking at, which I think is the most I can do to minimize how much I'm manually restarting the server while I'm messing around breaking stuff. Then I went to town and got my drafts working and decided to get tags working too. At first my tags were all wonky with spaces not being stripped correctly. But then I was all like, I'll use a list comprehension. Bitches love list comprehensions. Not sure what I'll throw up next, probably comments though. And then I'll comment on my own stuff. It'll be amazing. Though I might try making pretty syntax highlighted code snippets first, just so I can never post any code, but you know, have the ability to. Really though, it would be very cool to post bits of code and configuration and get some quality reviews and corrections where needed. Some day.
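The tag fix was along these lines. This is a sketch that assumes tags arrive as one comma-separated string from the admin form; parse_tags is a name I'm making up here.

```python
def parse_tags(raw):
    """Split a comma-separated tag string, trimming whitespace and dropping blanks."""
    return [tag.strip() for tag in raw.split(',') if tag.strip()]


if __name__ == '__main__':
    print(parse_tags(' python, flask ,, nginx '))
```

The if clause in the comprehension is what keeps stray commas from turning into empty tags.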
And that doesn't stand for public display of floatation like you were thinking either. My weekend project finally took off tonight, and after taking care of that and messing with this junk I am yet again up entirely too late. The goal sounded simple enough. For one of my work projects I had dynamically generated PDFs using FPDF (a PDF library for PHP). It basically slaps information from a database with related photos onto a PDF and returns the file. Well the next step was to include an external website, as close as possible to what it looks like on your screen. FPDF doesn't support that, but I had read earlier that TCPDF was basically the exact same as FPDF, same function names, and could handle HTML+CSS. Well first off, it's not 100% the same functionally as FPDF. The line-height and margins work a little differently, which required a few adjustments. Then, the site I included looked like absolute garbage. I ran some regex to fix broken references in the HTML that was downloaded, but nothing really helped. Then I found the solution to that problem with wkhtmltopdf, which uses WebKit to render a site into a PDF. Except now I have 2 PDFs, and TCPDF doesn't let you put them together, or even include one in another. So now I've got to use FPDI, which is meant to be a wrapper for FPDF that lets you use templates, but in the most recent version has added support for TCPDF. Everything works now, even though it takes forever because creating a PDF from a remote webpage isn't a speedy process. But function first, performance last.