Saturday, 14 December 2013

Python PDB Sublime keyboard shortcut

Very quick tip for anyone who uses the Sublime Text editor. Below is a key-binding entry that sets ctrl+p to insert "import pdb; pdb.set_trace()" and a new line at the current cursor position. For those unfamiliar with pdb, it is the Python debugger. This line first imports it and then sets the trace point, meaning that the next time the script is run, pdb will be invoked at that point.
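A minimal sketch of such an entry, written as a few lines of Python that print the JSON to paste. It assumes Sublime's built-in insert_snippet command does the inserting, which is one way to do it and not necessarily the only one. Bear in mind that ctrl+p is already "Goto Anything" on Windows and Linux, so a user binding on it takes over that shortcut.

```python
import json

# Assumption: the entry uses Sublime's built-in "insert_snippet" command.
binding = {
    "keys": ["ctrl+p"],
    "command": "insert_snippet",
    # The trailing "\n" leaves the cursor on a fresh line after the insert.
    "args": {"contents": "import pdb; pdb.set_trace()\n"},
}

# A key bindings file is a JSON list of entries like this one.
print(json.dumps([binding], indent=4))
```

If your user key bindings file already has entries, add just the inner object to the existing list.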


You will need to paste this into "Preferences -> Key Bindings - User"

For more on pdb, I strongly suggest watching this video: http://pyvideo.org/video/644/introduction-to-pdb and starting to incorporate pdb into your development toolkit.

Wednesday, 13 November 2013

Retry python decorator

A common need is to defend code against connection timeouts, which is usually done by applying a retry strategy to the vulnerable code. With Python you would typically run the vulnerable code in a try block and catch the exception. If the number of retries hasn't been reached yet you retry the code, otherwise you re-raise the exception.

I found myself repeating this pattern a lot, and I also found lots of places where I had forgotten to add it, largely because it takes a few lines of code each time. So I wanted a simple decorator, so that going forward I could just decorate the method and be done with it - perfect.

I searched around and found on the Python Wiki there was an example of a retry decorator that would retry based on the return value being False. I simply edited this code to instead retry based on an exception being raised.

Simply stick @retry(10) above a method to make it retry itself 10 times if an exception is raised.

I have included the @functools.wraps decorator so that my docstrings are retained, as I use them to auto-generate Sphinx documentation.

Here is an example usage when dealing with boto and SQS...beautiful if I do say so myself.
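A minimal sketch of the decorator and its usage, under a few assumptions: `tries` counts total attempts, and the `delay` and `exceptions` parameters are my own additions for illustration. So that the snippet runs without AWS credentials, `FlakyQueue` is a hypothetical stand-in for a boto SQS queue and only mimics its `get_messages()` call.

```python
import functools
import time


def retry(tries, delay=0.0, exceptions=(Exception,)):
    """Call the wrapped function up to `tries` times before giving up."""
    def decorator(func):
        @functools.wraps(func)  # keep the name and docstring for Sphinx
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: re-raise the last error
                    time.sleep(delay)
        return wrapper
    return decorator


class FlakyQueue:
    """Hypothetical stand-in for an SQS queue: the first two reads time out."""
    def __init__(self):
        self.calls = 0

    def get_messages(self):
        self.calls += 1
        if self.calls < 3:
            raise IOError("connection timed out")
        return ["message body"]


@retry(10)
def read_messages(queue):
    """Fetch messages from the queue, retrying on failure."""
    return queue.get_messages()


queue = FlakyQueue()
messages = read_messages(queue)
print(messages, queue.calls)  # ['message body'] 3
```

With a real queue you would pass the boto queue object to `read_messages` instead of `FlakyQueue()`. Narrowing `exceptions` to the errors you actually expect stops the decorator from hiding genuine bugs behind retries.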

Saturday, 19 October 2013

Detect whether a video is truly interlaced

Have you ever come across a video file that on the tin states that it is interlaced but when you view it, it isn't? If you then take that file and apply a de-interlacing filter to it, the filter may have a negative impact on quality for absolutely no reason.

Now, with a small pipeline where you are able to view each video yourself, this can be caught. In a pipeline handling hundreds, thousands or millions of videos, it can't.

FFmpeg includes a nifty video filter called 'idet' that scans the video image for combing effects and reports how many frames it judged to be interlaced and how many non-interlaced. If it finds an interlaced frame, it also tells you the field order. Here's an example command line:
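A sketch of the call, wrapped in Python so it can drop straight into a pipeline. The helper names `idet_command` and `run_idet` and the file name are hypothetical, and it assumes ffmpeg is on the PATH. The equivalent shell command is `ffmpeg -i input.mp4 -t 60 -vf idet -an -f null -`.

```python
import subprocess


def idet_command(path, seconds=60):
    """Build an ffmpeg call that runs idet over the first `seconds` of `path`."""
    return [
        "ffmpeg",
        "-i", path,
        "-t", str(seconds),  # stop after this many seconds
        "-vf", "idet",       # the interlace detection filter
        "-an",               # ignore the audio stream
        "-f", "null", "-",   # decode only, write no output file
    ]


def run_idet(path, seconds=60):
    """Run ffmpeg and return its log text, which is where idet reports."""
    result = subprocess.run(
        idet_command(path, seconds),
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stderr


print(" ".join(idet_command("input.mp4")))
```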


This filter requires FFmpeg to decode the video, so it may take a while. When I implement this in a pipeline, I would only process as much of the video as I believe will give a good reading. In this example I have set that to just the first minute of the video.

The output looks like this:


These numbers could be gathered with a regex and compared with a simple max() call like so:
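A rough sketch of that idea. The sample log line is hand-written in the shape I expect idet to print, with made-up numbers, so treat the exact format as an assumption and check it against your own FFmpeg build. The function name `classify` is hypothetical.

```python
import re

# Matches pairs such as "TFF:12" or "Progressive:  1480".
COUNT = re.compile(r"(TFF|BFF|Progressive|Undetermined):\s*(\d+)")


def classify(log_text):
    """Return the label with the highest frame count, or None if not found."""
    lines = [line for line in log_text.splitlines()
             if "Multi frame detection" in line]
    if not lines:
        return None
    # Use the last report, in case the filter logged more than one.
    counts = {name: int(value) for name, value in COUNT.findall(lines[-1])}
    if not counts:
        return None
    return max(counts, key=counts.get)


# Illustrative input only: the format is assumed and the numbers are made up.
sample = ("[Parsed_idet_0 @ 0x0] Multi frame detection: "
          "TFF:12 BFF:0 Progressive:1480 Undetermined:8")
print(classify(sample))  # Progressive
```

A result of TFF or BFF means the file really is interlaced, with top or bottom field first. Progressive means the de-interlacing filter can be skipped.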

Message management

Recently I have been investigating possible services to use for message management in a content pipeline system. Messages will need to be queued, processed and reported on throughout the pipeline, covering delivery of assets, encoding, publishing and distributing files between storage services worldwide. These messages will be sent to workers around the world, and some of those workers may die and need their messages re-added to the queue. The system also needs good monitoring throughout the pipeline and reporting on progress and completion.

From my initial investigation, the simple and hard truth is that there is no perfect fit when it comes to message brokers, but here are the three I found, and I reckon one of them will be my final choice...

RabbitMQ
Pros: very durable, FIFO (first in, first out), publish/subscribe, enables a lot of cool shit when used in conjunction with Celery.
Cons: no visibility timeout, no good hosted solutions (we would have to manage it ourselves), no built-in priority.

SQS: AWS Simple Queue Service
Pros: simple, lightweight, I adore Python's boto library, visibility timeout, hosted with extremely high reliability.
Cons: messages may be delivered more than once, Celery limitations, not FIFO.

Redis
Pros: FIFO, visibility timeout, fully compatible with Celery, priority functionality.
Cons: I have never worked with it, hosted solutions seem fairly expensive, not very durable out of the box.