blogmachine

On my todo list I’ve got “Automate updates” as an item. I have an ulterior motive: to experiment with Netty, Clojure, PEG, and markdown.

  1. Netty - an evented, NIO server framework from JBoss, written by one of the Apache MINA people

I’ve used Apache MINA in a project with 250 c1.xlarge EC2 nodes, and it was excellent.

Then there’s the PEG paper (Parsing Expression Grammars) as well.

And all that is fantastically interesting, but completely beside the point. So let’s ignore all that and actually solve my immediate problem, which is what, exactly?

My pain points are:

  1. Preview time.
  2. Publishing.
    1. rel="canonical"
    2. sitemap
    3. upload to S3
    4. hit CloudFront with list of invalidations

Preview Time

So I downloaded SimpleHTTPServer.py and hacked a call to multimarkdown into its lookup method. Took about 10 minutes. So now I have a Ctrl-S, Cmd-Tab, Cmd-R edit/preview cycle.
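The hack itself isn’t in this post, but the idea is roughly the sketch below. I’m assuming Python 2’s SimpleHTTPServer and a multimarkdown binary on the PATH; the handler name and the choice to hook send_head are mine, not necessarily what the original script does.

```python
# Minimal sketch: serve .md files as HTML by shelling out to multimarkdown.
# Assumes Python 2 and that "multimarkdown" is on the PATH.
import BaseHTTPServer
import SimpleHTTPServer
import StringIO
import subprocess


class MarkdownHandler(SimpleHTTPServer.SimpleHTTPRequestHandler):
    def send_head(self):
        path = self.translate_path(self.path)
        if not path.endswith(".md"):
            # Anything that isn't markdown gets the normal static-file behaviour.
            return SimpleHTTPServer.SimpleHTTPRequestHandler.send_head(self)
        # Convert the markdown file and serve the resulting HTML.
        html = subprocess.check_output(["multimarkdown", path])
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(html)))
        self.end_headers()
        return StringIO.StringIO(html)


if __name__ == "__main__":
    BaseHTTPServer.HTTPServer(("", 8000), MarkdownHandler).serve_forever()
```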

Canonical

Next step was to automatically stuff link rel="canonical" into the headers. This is needed to tell Google that the canonical home of each page is the www.jamiebriant.com domain rather than the ldskjflsdjf.cloudfront.net domain - and of course the link has to point at the correct page. I ended up using BeautifulSoup, and it was pretty straightforward: build the markdown to a temp file, read the resulting html with BS, and add a link tag to the head tag. While I was at it I discovered that all the power of PyCharm is available in IntelliJ as a plugin.
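The script isn’t shown here, so this is just a sketch of that BeautifulSoup step. I’m assuming BeautifulSoup 4; the base URL constant and the function name are placeholders of my own, not the real script’s.

```python
# Rough sketch of the canonical-link step, assuming BeautifulSoup 4.
# CANONICAL_BASE and add_canonical() are placeholder names.
from bs4 import BeautifulSoup

CANONICAL_BASE = "http://www.jamiebriant.com"


def add_canonical(html_path, relative_url):
    # Parse the HTML that multimarkdown produced.
    with open(html_path, "rb") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    # Add <link rel="canonical" href="..."> to the <head>.
    link = soup.new_tag("link", rel="canonical",
                        href=CANONICAL_BASE + relative_url)
    soup.head.append(link)
    with open(html_path, "wb") as f:
        f.write(soup.encode("utf-8"))
```

Called as something like add_canonical("out/index.html", "/index.html") after the markdown-to-HTML step.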

Unlike all those wonderful Netty/PEG pie-in-the-sky dreams, I got this handled in about two hours. Most of that was mucking about with MacPorts and Python. I had already downloaded and hacked SimpleHTTPServer.py, but when I wanted to use BeautifulSoup I got the impression that I should update to python 3. Of course, python 3 turns out to be completely incompatible with python 2, and after wasting time with 2to3 I gave up and redid my hack-and-paste job.

It must be said that an IDE that knows the language is a fantastic help when trying to learn it. Having it underline bad code with red “you’re doing it wrong” squiggles is much faster than finding out when I run it, and being able to click on things to navigate to the source is immensely helpful in understanding it.

Sitemap and uploading

So, awesome. BeautifulSoup 3 “is no longer being developed”. So I installed BS 4 and python 3 and converted my script to python 3. Well, lo and behold, the S3 library - the official one, mind you - is python 2.7 only. No python 3. WTF is python 3 for, then?

Can you imagine if Java 6 were incompatible with Java 5? Well, I want S3, so I guess I’m rolling back to python 2.7.

Ok, so that happened…

Now I’m back on python 2.7, and for my efforts I have the AWS python libraries, called “boto”. As of now, I can sync this website with S3. S3 nicely stores md5 hashes for uploaded files[1], so I can just request the bucket contents and compare them with the local versions. This is what my python script can do right now (a rough sketch follows the list):

  1. Find all md files and convert them to html.
  2. Get the bucket contents via the S3 API.
  3. Compare local vs bucket contents using md5.
  4. Upload new or changed files.
  5. Tell CloudFront which paths have changed and need invalidating.
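Steps 2 through 5 look roughly like the sketch below with boto 2. The bucket name, distribution id, and helper names are placeholders; the real script certainly differs in the details.

```python
# Simplified sketch of the S3 sync + CloudFront invalidation, using boto 2.
# BUCKET, DISTRIBUTION_ID, and the function names are placeholders.
import hashlib
import os

import boto

BUCKET = "www.jamiebriant.com"
DISTRIBUTION_ID = "EXXXXXXXXXXXXX"


def local_md5(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()


def sync(root):
    s3 = boto.connect_s3()
    bucket = s3.get_bucket(BUCKET)
    # S3 ETags come back wrapped in quotes; strip them before comparing.
    remote = dict((k.name, k.etag.strip('"')) for k in bucket.list())

    changed = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith(".html"):
                continue
            path = os.path.join(dirpath, name)
            key_name = os.path.relpath(path, root)
            # Upload only files that are new or whose md5 differs from the ETag.
            if remote.get(key_name) != local_md5(path):
                bucket.new_key(key_name).set_contents_from_filename(path)
                changed.append("/" + key_name)

    if changed:
        # Ask CloudFront to invalidate just the paths that actually changed.
        cf = boto.connect_cloudfront()
        cf.create_invalidation_request(DISTRIBUTION_ID, changed)
```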

  1. The ETags are md5 hashes if the file is less than 2GB and uploaded in one go. Otherwise it gets more complicated.  ↩