New Toy: Meet the Wobbleometer(TM)

I’ve had my eye on the Raspberry Shake seismometer for a couple of years now. Last week, I finally succumbed and bought both a new Raspberry Pi (just a 4B 2GB RAM – cheap yet more than adequate) and a Shake 1-D.

The 1-D is a simple seismometer, responding only in the up/down direction. Other products are available…

Mine arrived as a kit that even I was able to stick together in under half an hour (thanks to a YouTube video showing what most of the screws and things were for).

I installed and levelled it on the downstairs windowsill and plugged it directly into the main ADSL router – when you’re uploading a hundred samples per second, minimizing latency is essential.

Obviously, being based in mainland Scotland, I don’t expect to see that many significant earthquakes. We get a handful of ~magnitude 2.5 around the country every couple of years if we’re lucky. However, recently YouTube has sent me down a rabbit-hole of geological analyses of goings-on in Iceland’s Reykjanes peninsula – the recent sequence of volcanic eruptions alternating with tiny earthquakes as the magma chamber refills.

The very first night, I spotted a magnitude 3.6 quake in Iceland.

Since then, the live data stream has sadly been unavailable for about 5 days, but when it works, the ability to select a quake event and then click on a station and see the station’s raw data really rocks.

There was another magnitude 3.7 earthquake along the Mid-Atlantic Ridge offshore a few miles south-west of Grindavik yesterday. The live-data view is currently working in the mobile app, leading to this analysis – showing the P and S waves propagating from the event.

As monitoring equipment goes, I’m impressed. It’s a wonderful device, very sensitive yet easy to set up and works well. I’ll continue to watch for all quakes nearby and larger ones further afield, if only to be able to say “I saw it” 🙂

A Geek in Lockdown

I’m comparatively fortunate that the coronavirus and covid-19 have not affected me or anyone I know directly. Having spent the last 15 years working from home, life has not changed as significantly as it has for others.

So, apart from staying home and doing nothing much, what’s a geek to do to contribute back?

First, contribute data. As often as I remember, I update the KCL/Zoe app to say I’m still alive.

Second, grid computing projects.

There’s a significant amount of computer power available and machines would otherwise spend their time idle. A couple of months before the coronavirus became known, I had already installed BOINC, perhaps the oldest and best-known grid-computing system. As the scale of the Covid-19 problem became more apparent, I discovered the Rosetta@Home project which has been working to predict structure of proteins involved in the disease.

For mobile devices, there is Vodafone’s DreamLab project which similarly uses one’s iPhone/iPad/tablet’s downtime to perform computations, hopefully to identify drugs to fight Covid-19.

Third, art.

This took a bit of thought, but recently RedBubble who I use for selling my photography added the option of selling masks alongside the usual prints, mugs, etc.

I wasn’t at all sure what to make of it. The idea of profiting off others’ health and necessity jarred with the idea of art being a luxury item. However, a friend pointed out that if face-masks are to become normalized in society, having interesting art designs on could make them more approachable. There’s also the bit where Redbubble match each mask bought with a donation to an appropriate charity. So, a net good thing then.

Inspiration struck and I spent much of the weekend designing an image to represent the coronavirus (using a fragment of its gene sequence as a background, naturally) and even rendering a little video from it, both using Povray, my favoured ray-tracing software from the late 1990s(!). Naturally I made the scene description and other sources available as open-source: the-lurgy (on github).

The “lurgy” design and a selection of other landscape images are available on RedBubble as face-masks with a profit margin set to 0.

And of course work continues, producing more photos to go on my website, ShinyPhoto

ShinyPhoto: New Website

The old ShinyPhoto website was getting a bit long in the tooth. It saw several versions of Python come and go and increasingly suffered from bitrot. (Notably, a mutual incompatibility in the CGI module between python versions; it ran for so long the backend storage engine I used became deprecated with no easy way out but to revert to one I wrote myself – not a good reason to rely on third-party libraries!)

So, for the past couple of months I’ve been learning my way around Javascript and node.js and have replaced the site with a new gallery to show-off my photos.

Being me, it’s a bit geeky. With web-design there are so many angles to consider, but here are a few aspects that stick in the mind:

Technical: no XSLT; this is the first time in nearly 20 years where I’ve used a different templating language – in this case, Mustache since it does need to be able to produce non-HTML data as well.

Learning: there’s a whole ecosystem of node.js packages (NPMs) that have come in handy, from the Express webserver to image-resizing utilities (some of which are faster than others!).

Data: in my more professional work capacity I deal with data-storage on a daily basis, so it has some passing interest. One of the problems with the old site was its inability to extract metadata from images; because this instance’s primary focus is the organization and display of photos, I decided that the JPEG should contain all the data displayed – title, description, geotagging, keywords all extracted from one upload and the less manual editing effort required, the better. Essentially, digiKam is both organizer and implicit website editor on my desktop.

Database: with the unit of data being the JPEG, presented as a page per photo, that maps well into a document-oriented model such as one of the NoSQL JSON-based databases. Unfortunately MongoDB disgraced themselves by choosing a non-open-source licence recently, so I was pleased to discover CouchDB – a modular system sharing protocols (JSON-over-HTTP(S)) and query language (Mango) across different storage backends with the advantage that I can start from the PouchDB pure node.js implementation but switch to an external version of the same with a quick data-replication later if need be. So far, it’s coping fine with 1.1GB of JPEG data (stored internally as attachments) and 70MB of log data.

Configurability: several aspects of the site are configurable, from the internal and external navigation menus to the cache-validity/timeout on images.

Scalable: my initial thought was to keep image-resizing pure-javascript and rely on nginx caching for speed; however, that would lose the ability to count JPEG impressions (especially thumbnails), so I switched to a mixed JS/C NPM and now resizing is sufficiently fast to run live. The actual node.js code itself also runs cleanly – feels very snappy in the browser after the old python implementation.

Metadata/SEO: the change of templating engine has meant specific templates can be applied to specific kinds of pages, rather than imposing one structure across the whole site; different OpenGraph and Twitter-Card metadata applies on the homepage, gallery and individual photo pages.

Statistics: lots of statistics. There are at least three aspects where statistics rule:

  • the usual analytics; it’s always handy to keep an eye on the most-popular images, external referrers, etc. The site uses its own application-level logging to register events on the page-impression scale, so the log data is queryable without having to dig through CLF webserver logs.
  • how should a photo gallery be sorted? By popularity, by date? Do thumbnails count? What about click-through rate? The new site combines all three metrics to devise its own score-function which is recalculated across all images nightly and forms the basis of a display order. (It surprises me that there are photo-galleries that expect people to choose the sort order by hand, or even present no obvious order at all.)
  • how should a photo-gallery be organized? My work is very varied, from bright colour to black and white, from sky to tree to mountain and water, from fast to long exposure, from one corner of the country to another, as the landscape leads; I did not want to impose a structure only to have to maintain it down the line. Accordingly, the new ShinyPhoto is self-organizing: within any slice through the gallery, a set of navigation tags is chosen that splits the images closest to half. Relatedly, the images on the homepage used to be a static selection, manually edited; now they are chosen dynamically by aspect-ratio and score.
  • Marketing: some aspects of the layout now enjoy a/b testing – no cookies required, but another hash function determines the site appearance and I can check which work best over time.
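
The self-organizing navigation described above could be sketched roughly as follows, in Python rather than the site's own node.js. This is not ShinyPhoto's code: the function name pick_navigation_tags, the data shape (one set of tags per image) and the alphabetical tie-break are all assumptions made for illustration.

```python
from collections import Counter


def pick_navigation_tags(image_tags, how_many=5):
    """Choose the tags whose image count is closest to half of the slice.

    image_tags: list of sets of tag strings, one set per image (assumed shape).
    """
    total = len(image_tags)
    counts = Counter(tag for tags in image_tags for tag in tags)
    # A tag carried by every image cannot narrow the slice, so skip it.
    candidates = [tag for tag, n in counts.items() if n < total]
    # Rank by distance from an even split; break ties alphabetically.
    candidates.sort(key=lambda tag: (abs(counts[tag] - total / 2), tag))
    return candidates[:how_many]


# Hypothetical slice of four photos.
slice_of_gallery = [
    {"sky", "mono"},
    {"sky"},
    {"sky", "water"},
    {"tree", "water"},
]
print(pick_navigation_tags(slice_of_gallery, how_many=2))
```

Here "water" appears on exactly half the images so it ranks first; the remaining tags are all one image away from an even split.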

So far, it’s proving pleasantly easy to live with; apart from the continual debugging and adding of new features – fortunately now slowing down – I’m adding photos at a rate of a handful a day both to the site and to a new RedBubble account in case anyone wants to buy them, one way or another.

So apparently I now like the whole node.js ecosystem. It’s blown away the cobwebs of running – or more accurately not-running – a legacy website, whilst retaining full control of the appearance and structure of the site not handing that over to some third-party site designer.

A good way to start a new year, methinks.

Frame interpolation for timelapse, using Julia

A long time ago I wrote a python utility to interpolate frames for use in timelapse. This project was timelapse.py.

Back in 2014 I ported the idea to the very-alpha-level language Julia.

In recent weeks Julia released version v1.0.0, followed shortly by compatibility fixes in the Images.jl library.

And so I’m pleased to announce that the julia implementation of my project, timelapse.jl (working simply off file mtimes without reference to exif) has also been updated to work with julia v1.0.0 and the new Images.jl API.

Usage:

zsh/scr, photos 11:32AM sunset/ % ls *
images-in/:
med-00001.png med-00022.png med-00065.png med-00085.png
med-00009.png med-00044.png med-00074.png

images-out/:

zsh/scr, photos 11:32AM sunset/ % ~/j/timelapse/timelapse.jl 50 images-in images-out
[1.536147186333474e9] - Starting
[1.536147186333673e9] - Loading modules
[1.536147201591988e9] - Sorting parameters
[1.536147201648837e9] - Reading images from directory [images-in]
[1.536147202022173e9] - Interpolating 50 frames
[1.53614720592181e9] - frame 1 / 50 left=1, right=2, prop=0.11999988555908203
[1.536147217019145e9] - saving images-out/image-00001.jpg
[1.536147218068828e9] - frame 2 / 50 left=1, right=2, prop=0.24000000953674316
[1.536147222013697e9] - saving images-out/image-00002.jpg
[1.536147222819911e9] - frame 3 / 50 left=1, right=2, prop=0.3599998950958252
[1.536147226688287e9] - saving images-out/image-00003.jpg

...

[1.536147597050891e9] - saving images-out/image-00048.jpg
[1.53614761140285e9] - frame 49 / 50 left=6, right=7, prop=0.880000114440918
[1.536147615090572e9] - saving images-out/image-00049.jpg
[1.536147615649168e9] - frame 50 / 50 left=6, right=7, prop=1.0
[1.536147619363807e9] - saving images-out/image-00050.jpg
[1.536147619960565e9] - All done
zsh/scr, photos 11:40AM sunset/ %

zsh/scr, photos 11:51AM sunset/ % ffmpeg -i images-out/image-%05d.jpg -qscale 0 -r 50 sunset-timelapse.mp4
ffmpeg version 3.4.2-2+b1 Copyright (c) 2000-2018 the FFmpeg developers

...

zsh/scr, photos 11:51AM sunset/ % ll -h sunset-timelapse.mp4
-rw------- 1 tim tim 4.9M Sep 5 11:46 sunset-timelapse.mp4
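
The left/right/prop values in the log above are consistent with plain linear interpolation along the timeline of file mtimes. The following is a minimal Python sketch of that bookkeeping, not a port of timelapse.jl: the function names, the 1-based indices and the per-pixel blend are assumptions inferred from the log output.

```python
from bisect import bisect_right


def interpolation_plan(mtimes, n_frames):
    """Return (left, right, prop) per output frame, with 1-based image indices.

    Output frame k of n_frames sits at fraction k/n_frames of the time span
    between the first and last source image.
    """
    times = sorted(mtimes)
    if len(times) < 2:
        raise ValueError("need at least two source images")
    start, end = times[0], times[-1]
    plan = []
    for k in range(1, n_frames + 1):
        t = start + (end - start) * k / n_frames
        # Index of the source image at or before t, clamped so that a
        # right-hand neighbour always exists.
        i = min(bisect_right(times, t) - 1, len(times) - 2)
        span = times[i + 1] - times[i]
        prop = (t - times[i]) / span if span else 0.0
        plan.append((i + 1, i + 2, prop))
    return plan


def blend(left_pixels, right_pixels, prop):
    """Weighted average of two equally sized pixel sequences (assumed blend)."""
    return [(1 - prop) * a + prop * b for a, b in zip(left_pixels, right_pixels)]


# Seven evenly spaced source images and 50 output frames, as in the run above.
plan = interpolation_plan([0, 10, 20, 30, 40, 50, 60], 50)
print(plan[0], plan[-1])
```

With evenly spaced sources the first frame lands 12% of the way from image 1 to image 2, and the last frame coincides with image 7, matching the log.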

How Many Megapixels?

There are several cliches in the field of megapixel-count and resolution required for acceptable photographic prints.

In no particular order:

  • 300dpi is “fine art”
  • you don’t need as many dpi for larger prints because typically they’re viewed further away
  • my printer claims to do 360dpi or 1440dpi or …
  • 24 megapixels is more than enough for anything
  • “for a 5″ print you need 300-800pixels, for medium to large calendars 800-1600 pixels, for A4 900-1600px, for an A3 poster 1200 to 2000px, for an A2 poster 1500 to 2400px, …” (taken from a well-known photo-prints website guidelines)
  • it’s not about the megapixels it’s about the dynamic range

There are probably more set arguments in the field, but all are vague, arising from idle pontificating and anecdote over the last couple of centuries.

Here’s a key question: in a list of required image resolutions by print size, why does the number of dpi required drop-off with print size? What is the driving factor and might there be an upper bound on the number of megapixels required to make a print of any size?

We can flip this around and say that if prints are expected to be viewed at a distance related to their size, then it is no longer a matter of absolute measurements in various dimensions but rather about how much field of view they cover. This makes it not about the print at all, but about the human eye, its field of view and angular acuity – and the numbers are remarkably simple.

From Wikipedia, the human eye’s field of view is 180-200 degrees horizontally by 135 degrees vertically (ie somewhere between 4:3 and 3:2 aspect-ratios). Its angular acuity is between 0.02 and 0.03 deg.

If we simply divide these numbers we end up with a number of pixels that the eye could resolve.

At one end of the range,

 180*135 /0.03 / 0.03 / 1024 / 1024 = 25.75 (6000 x 4500)

and at the other:

200*135 / 0.02 / 0.02 / 1024 / 1024 = 64.37 (10,000 x 6750)

In the middle,

180*135 / 0.025 / 0.025 / 1024 / 1024 = 37.1 (7200 x  5400)
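
For anyone wanting to try other field-of-view or acuity figures, the three sums above can be reproduced with a few lines of Python. The function name is made up for illustration; "mega" here means 1024 x 1024, as in the figures above.

```python
def eye_limited_pixels(h_fov_deg, v_fov_deg, acuity_deg):
    """Pixels needed to fill a field of view at a given angular acuity.

    Returns (width_px, height_px, megapixels), with mega = 1024 * 1024.
    """
    width = round(h_fov_deg / acuity_deg)
    height = round(v_fov_deg / acuity_deg)
    return width, height, width * height / 1024**2


for h_fov, acuity in [(180, 0.03), (200, 0.02), (180, 0.025)]:
    width, height, mpix = eye_limited_pixels(h_fov, 135, acuity)
    print(f"{h_fov} x 135 deg at {acuity} deg: {width} x {height} = {mpix:.2f} MP")
```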

Significant points:

  • this is no longer about absolute print sizes; it’s simply about being able to view a print without one’s eye perceiving pixellation
  • the numbers correlate reassuringly with numbers of megapixels seen in real-world dSLR sensors today
  • you can reasonably say that if you have 64 Megapixels then you can make a print of any size from it
  • you can make an image with more than 64 megapixels if you want to, but the reasons for doing so are not directly to do with resolution – they might be
    in order that you can crop it – either to physically crop the image in post-processing or to view it from a closer distance than that required merely to fill your eyes’ field of view
    or maybe for pixel-binning to reduce noise, give smoother tonality, etc
  • 24 megapixels is not enough for much; rather it is a turning-point: the bare minimum number of pixels for a person of limited acuity to resolve assuming they slightly less than fill their field of view with the print. 36MPel is more usable and 64 will keep you in business selling fine quality wall art.

Now we know how many megapixels are required for various real-world purposes, all that matters is making them good megapixels. Physics to the rescue.

 

Pentax K-1: an open-source photo-processing workflow

There is a trope that photography involves taking a single RAW image, hunching over the desktop poking sliders in Lightroom, and publishing one JPEG; if you want less noise you buy noise-reduction software; if you want larger images, you buy upscaling software. It’s not the way I work.

I prefer to make the most of the scene, capturing lots of real-world photons as a form of future-proofing. Hence I was pleased to be able to fulfil a print order last year that involved making a 34″-wide print from an image captured on an ancient Lumix GH2 many years ago. Accordingly, I’ve been blending multiple source images per output, simply varying one or two dimensions: simple stacking, stacking with sub-pixel super-resolution, HDR, panoramas and occasionally focus-stacking as the situation demands.

I do have a favoured approach, which is to compose the scene as closely as possible to the desired image, then shoot hand-held with HDR bracketing; this combines greater dynamic range, some noise-reduction and scope for super-resolution (upscaling).

I have also almost perfected a purely open-source workflow on Linux with scope for lots of automation – the only areas of manual intervention were setting the initial RAW conversion profile in RawTherapee and the collation of images into groups in order to run blending in batch.

After a while, inevitably, it was simply becoming too computationally intensive to be upscaling and blending images in post, so I bought an Olympus Pen-F with a view to using its high-resolution mode, pushing the sub-pixel realignment into hardware. That worked, and I could enjoy two custom setting presets (one for HDR and allowing walk-around shooting with upscaling, one for hi-res mode on a tripod), albeit with some limitations – no more than 8s base exposure (hence exposure times being quoted as “8x8s”), no smaller than f/8, no greater than ISO 1600. For landscape, this is not always ideal – what if 64s long exposure doesn’t give adequate cloud blur, or falls between one’s ND64 little- and ND1000 big-stopper filters? What if the focal length and subject distance require f/10 for DoF?

All that changed when I swapped all the Olympus gear for a Pentax K-1 a couple of weekends ago. Full-frame with beautiful tonality – smooth gradation and no noise. A quick test in the shop and I could enable both HDR and pixel-shift mode and save RAW files (.PEF or .DNG) and in the case of pixel-shift mode, was limited to 30s rather than 8s – no worse than regular manual mode before switching to bulb timing. And 36 megapixels for both single and multi-shot modes. Done deal.

One problem: I spent the first evening collecting data, er, taking photos at a well-known landscape scene, came home with a mixture of RAW files, some of which were 40-odd MB, some 130-odd MB; so obviously multiple frames’ data was being stored. However, using RawTherapee to open the images – either PEF or DNG – it didn’t seem like the exposures were as long as I expected from the JPEGs.

A lot of reviews of the K-1 concentrate on pixel-shift mode, saying how it has options to correct subject-motion or not, etc, and agonizing over how each commercial RAW-converter handles the motion. What they do not make clear is that the K-1 only performs any blend when outputting JPEGs, which is also used as the preview image embedded in the RAW file; the DNG or PEF files are simply concatenations of sub-frames with no processing applied in-camera.

On a simple test using pixel-shift mode with the camera pointing at the floor for the first two frames and to the ceiling for the latter two, it quickly becomes apparent that RawTherapee is only reading the first frame within a PEF or DNG file and ignoring the rest.

Disaster? End of the world? I think not.

If you use dcraw to probe the source files, you see things like:

zsh, rhyolite 12:43AM 20170204/ % dcraw -i -v IMGP0020.PEF

Filename: IMGP0020.PEF
Timestamp: Sat Feb  4 12:32:52 2017
Camera: Pentax K-1
ISO speed: 100
Shutter: 30.0 sec
Aperture: f/7.1
Focal length: 31.0 mm
Embedded ICC profile: no
Number of raw images: 4
Thumb size:  7360 x 4912
Full size:   7392 x 4950
Image size:  7392 x 4950
Output size: 7392 x 4950
Raw colors: 3
Filter pattern: RG/GB
Daylight multipliers: 1.000000 1.000000 1.000000
Camera multipliers: 18368.000000 8192.000000 12512.000000 8192.000000

On further inspection, both PEF and DNG formats are capable of storing multiple sub-frames.
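
The "Number of raw images" line is a convenient way to sort single-frame files from multi-frame ones before converting. A small Python sketch of that check follows; it only parses the text shown above, and the helper names are hypothetical. Treating a missing line as a single frame is an assumption rather than documented dcraw behaviour.

```python
import re
import subprocess


def count_raw_frames(dcraw_info):
    """Extract the sub-frame count from `dcraw -i -v` output text.

    Assumption: if the line is absent, the file holds a single frame.
    """
    match = re.search(r"^Number of raw images:\s*(\d+)", dcraw_info, re.MULTILINE)
    return int(match.group(1)) if match else 1


def probe(path):
    """Run `dcraw -i -v` on one file and return its sub-frame count."""
    result = subprocess.run(
        ["dcraw", "-i", "-v", path], capture_output=True, text=True, check=True
    )
    return count_raw_frames(result.stdout)


# Excerpt of the output shown above.
sample = "Camera: Pentax K-1\nShutter: 30.0 sec\nNumber of raw images: 4\n"
print(count_raw_frames(sample))
```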

After a bit of investigation, I came up with an optimal set of parameters to dcraw with which to extract all four images with predictable filenames, making the most of the image quality available:

dcraw -w +M -H 0 -o /usr/share/argyllcms/ref/ProPhotoLin.icm -p "/usr/share/rawtherapee/iccprofiles/input/Pentax K200D.icc" -j -W -s all -6 -T -q 2 -4 "$filename"

Explanation:

  • -w = use camera white-balance
  • +M = use the embedded colour matrix if possible
  • -H 0 = leave the highlights clipped, no rebuilding or blending
    (if I want to handle highlights, I’ll shoot HDR at the scene)
  • -o = use ProPhotoRGB-linear output profile
  • -p = use RawTherapee’s nearest input profile for the sensor (in this case, the K200D)
  • -j = don’t stretch or rotate pixels
  • -W = don’t automatically brighten the image
  • -s all = output all sub-frames
  • -6 = 16-bit output
  • -T = TIFF instead of PPM
  • -q 2 = use the PPG demosaicing algorithm
    (I compared all 4 options and this gave the biggest JPEG = hardest to compress = most image data)
  • -4 = Linear 16-bit

At this point, I could hook in to the workflow I was using previously, but instead of worrying how to regroup multiple RAWs into one output, the camera has done that already and all we need do is retain the base filename whilst blending.

After a few hours’ hacking, I came up with this little zsh shell function that completely automates the RAW conversion process:

pic.2.raw () {
        for i in *.PEF *.DNG
        do
                echo "Converting $i"
                base="$i:r" 
                dcraw -w +M -H 0 -o /usr/share/argyllcms/ref/ProPhotoLin.icm -p "/usr/share/rawtherapee/iccprofiles/input/Pentax K200D.icc" -j -W -s all -6 -T -q 2 -4 "$i"
                mkdir -p converted
                exiftool -overwrite_original_in_place -tagsfromfile "$i" ${base}.tiff
                exiftool -overwrite_original_in_place -tagsfromfile "$i" ${base}_0.tiff
                mv ${base}.tiff converted 2> /dev/null
                mkdir -p coll-$base coll-$base-large
                echo "Upscaling"
                for f in ${base}_*.tiff
                do
                        convert -scale "133%" -sharpen 1.25x0.75 $f coll-${base}-large/${f:r}-large.tiff
                        exiftool -overwrite_original_in_place -tagsfromfile "$i" coll-${base}-large/${f:r}-large.tiff
                done
                mv ${base}_*tiff coll-$base 2> /dev/null
        done
        echo "Blending each directory"
        for i in coll-*
        do
          (cd $i && align_image_stack -a "temp_${i}_" *.tif? && enfuse -o "fused_$i.tiff" temp_${i}_*.tif \
           -d 16 \
           --saturation-weight=0.1 --entropy-weight=1 \
           --contrast-weight=0.1 --exposure-weight=1)
        done
        echo "Preparing processed versions"
        mkdir -p processed
        (
                cd processed && ln -s ../coll*/f*f . && ln -s ../converted/*f .
        )
        echo "All done"
}

Here’s how the results are organized:

  • we start from a directory with source PEF and/or DNG RAW files in it
  • for each RAW file found, we take the filename stem and call it $base
  • each RAW is converted into two directories, coll-$base/ consisting of the TIFF files and fused_$base.tiff, the results of aligning and enfuse-ing
  • for each coll-$base there is a corresponding coll-$base-large/ with all the TIFF images upscaled 1.33 (linear) times before align+enfusing
    This gives the perfect blend of super-resolution and HDR when shooting hand-held
    The sharpening coefficients given to ImageMagick’s convert(1) command have been chosen from a grid comparison; again the JPEG conversion is one of the largest showing greatest image detail.
  • In the case of RAW files only containing one frame, it is moved into converted/ instead for identification
  • All the processed outputs (single and fused) are collated into a ./processed/ subdirectory
  • EXIF data is explicitly maintained at all times.

The result is a directory of processed results with all the RAW conversion performed using consistent parameters (in particular, white-balance and exposure come entirely from the camera only) so, apart from correcting for lens aberrations, anything else is an artistic decision not a technical one. Point darktable at the processed/ directory and off you go.

All worries about how well “the camera” or “the RAW converter” handle motion in the subject in pixel-shift mode are irrelevant when you take explicit control over it yourself using enfuse.

Happy Conclusion: whether I’m shooting single frames in a social setting, or walking around doing hand-held HDR, or taking my time to use a tripod out in the landscape (each situation being a user preset on the camera), the same one command suffices to produce optimal RAW conversions.

 

digiKam showing a directory of K-1 RAW files ready to be converted

One of the intermediate directories, showing 1.33x upscaling and HDR blending

Results of converting all RAW files – now just add art!

The results of running darktable on all the processed TIFFs – custom profiles applied, some images converted to black+white, etc

One image was particularly outstanding so has been processed through LuminanceHDR and the Gimp as well

Meall Odhar from the Brackland Glen, Callander

A bit of a hardware upgrade

Four and a half years ago I bought basalt, a Lenovo Thinkpad W520 notebook.
Named for being black and geometrically regular, he was not cheap, but then the hardware was chosen for longevity. He did everything – mostly sitting on the desk being a workstation, for both photo-processing and work, with added portability. And he was fast… very fast.

He’s done sterling service, but the time has come for a bit of an upgrade. Since I haven’t done so for a few years now, I built the replacement workstation by hand.

So, after much shopping (mostly on Amazon with a couple of trips to the local Maplin’s), meet rhyolite – named because the red fan LEDs in the case remind me of the pink granite rock in Glen Etive.

                                                                     basalt                rhyolite
RAM                                                                  16G                   64G
CPU                                                                  i7-2720QM @ 2.20GHz   i7-6800K @ 3.40GHz
HD                                                                   1TB 2.5″              2*2T SATA, 64GB SSD, 240GB SSD
Display                                                              15″ 1080p             24″ 4K
Time to process one Olympus Pen-F hi-res ORF file from RAW to TIFF   5min50s               1min6s
Time to rebuild Apache Spark from git source                         28min15s              6min28s

It’s funny how oblivious we can be of machines slowing down and software bloating over time, rather like the proverbial boiling lobster; then I look up, and things can be done 4-5x as fast.
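
For the record, the speed-up can be worked out from the two timing rows in the table. This is just a quick Python sketch of the arithmetic; the helper name and the dictionary labels are mine.

```python
import re


def to_seconds(duration):
    """Convert a duration written like '5min50s' or '1min6s' into seconds."""
    match = re.fullmatch(r"(?:(\d+)min)?(?:(\d+)s)?", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + seconds


# (basalt, rhyolite) timings copied from the table above.
timings = {
    "Pen-F hi-res ORF, RAW to TIFF": ("5min50s", "1min6s"),
    "Apache Spark rebuild": ("28min15s", "6min28s"),
}
speedups = {
    task: to_seconds(old) / to_seconds(new) for task, (old, new) in timings.items()
}
for task, factor in speedups.items():
    print(f"{task}: {factor:.1f}x faster")
```

That works out at roughly 5.3x for the RAW conversion and 4.4x for the Spark build.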

It’s really funny when you copy your entire home directory across wholesale and see what used to be a full-screen maximized Firefox window now occupying barely a quarter of the new display.

How many photo thumbnails can you fit in a 4K display?

There’s only one downside: sometimes I miss having a trackpad in the middle of the keyboard area…

Now with added solar panels

The first thing I did on buying a new house in November was to have solar panels installed. Like, within 24hrs of having keys in hand I’d had two companies round to try and sell me their wares and made a decision.

The installation was not the easiest, as December 4 coincided with Storm Desmond and the house gets fairly battered by the wind at the best of times. That, and it’s taken another week to get everything wired-up and configured.

It’s also not been the best of weather – cloud, foggy, rainy and windy – for the past week as well. So today has been the first real day of generating electricity – with actual data-logging of some sort.

I’m well happy to have seen a clear couple of hours around lunchtime in which generation peaked at just over 200W.

Solar Panel output, first day

Bring it on! 🙂

Too-Big Data? That don’t impress me much

On a whim, I spent the evening in Edinburgh at a meetup presentation/meeting concerning Big Data, the talk given by a “Big Data hero” (IBM’s term), employed by a huge multinational corporation with a lot of fingers in a lot of pies (including the UK Welfare system).

I think I was supposed to be awed by the scale of data under discussion, but mostly what I heard was all immodest massive business-speak and buzzwords and acronyms. A few scattered examples to claim “we did that”, “look at the size of our supercomputer”, but the only technical word he uttered all evening was “Hadoop”.

In the absence of a clear directed message, I’ve come away with my own thoughts instead.

So the idea of Big Data is altogether a source of disappointment and concern.

There seems to be a discrepancy: on the one hand, one’s fitbit and phone are rich sources of data; the thought of analyzing it all thoroughly sets my data-geek senses twitching in excitement. However, the Internet of Things experience relies on huge companies doing the analysis – outsourced to the cloud – which forms a disjoint as they proceed to do inter-company business based on one’s personal data (read: sell it, however aggregated it might be – the presenter this evening scoffed at the idea of “anonymized”), above one’s head and outwith one’s control. The power flows upwards.

To people such as this evening’s speaker, privacy and ethics are just more buzzwords to bolt on to a “data value pipeline” to tout the profit optimizations of “data-driven companies”. So are the terms data, information, knowledge and even wisdom.

But I think he’s lost direction in the process. We’ve come a long way from sitting on the sofa making choices how to spend the evening pushing buttons on the mobile.

And that is where I break contact with The Matrix.

I believe in appreciating the value of little things. In people, humanity and compassion more than companies. In substance. In the genuine kind of Quality sought by Pirsig, not as “defined” by ISO code 9000. Value may arise from people taking care in their craft: one might put a price on a carved wooden bowl in order to sell it, but the brain that contains the skill required to make it is precious beyond the scope of the dollar.

Data is data and insights are a way to lead to knowledge, but real wisdom is not just knowing how to guide analysis – it’s understanding that human intervention is sometimes required, and knowing when to deploy it, awareness, critical thinking to see and choose.

The story goes that a salesman once approached a pianist, offering a new keyboard “with eight nuances”. The response came back: “but my playing requires nine”.

Discovering SmartOS

Revolutionize the datacenter: ZFS, DTrace, Zones, KVM

What 22TB looks like.

It has been a long and interesting weekend of fixing computers.
Adopt the pose: sit cross-legged on the floor surrounded by 9 hard-drives – wait, I need another one, make it 10 hard-drives – and the attendant spaghetti of SATA cables and plastic housings and fragments of case.
Funnily enough the need for screwdrivers has reduced over the years, albeit more than compensated by the cost of a case alone. I’m sure it never used to make for such a sore back either…
Anyway. Amidst the turmoil of fixing my main archive/work/backup server, I discovered a new OS.
For a few years now, I’ve been fond of ZFS – reliable as a brick, convenient as anything to use; I choose my OSes based on their ability to support ZFS, amongst other things. Just a quick
zpool create data /dev/ada1 /dev/ada2
zfs create data/Pictures
and that’s it, a new pool and filesystem created, another 1-liner command to add NFS sharing… Not a format or a mount in sight.
Of course, Linux has not been able to include ZFS in the kernel due to licensing considerations, so the various implementations (custom kernel; user-space FUSE module) have been less than desirable. So I’ve been using FreeBSD as server operating-system of choice. The most convenient way to control a plethora of virtual machines on a FreeBSD host seems to be to use VirtualBox – rather large and clunky nowadays.
However, a couple of weeks ago I stumbled across SmartOS, a new-to-me OS combining ZFS, DTrace and a Solaris/Illumos kernel, with both its own native Zones and Linux’s KVM virtualization.
There have been a few steps in this direction previously – most memorably was Nexenta, an opensolaris/illumos kernel with Debian packaging and GNU toolchain. That was a nice idea, but it lacked virtualization.
So, this weekend, with a storage server box rebuilt (staying with FreeBSD) and a whole new machine on which to experiment, I installed SmartOS.
Overall, it’s the perfect feature blend for running one’s own little cloud server. ZFS remains the filesystem of choice, DTrace has yet to be experimented with, and KVM is a breeze, mostly since Joyent have provided their own OS semi-installed images to work from (think: Docker, but without the Linux-specificity). The vmadm command shares a high-level succinctness with the zfs tools. Just import an image, make a JSON config file describing the guest VM and create an instance and it’s away and running with a VNC interface before you know it.
There’s one quirk that deserves special note so far. If you wish to use a guest VM as a gateway, e.g. via VPN to another network, you have to enable spoofing of IPs and IP forwarding on the private netblocks, in the VM config file.
      "allow_dhcp_spoofing": "true",
      "allow_ip_spoofing": "true",
      "allowed_ips": [ "192.168.99.0/24" ]
[root@78-24-af-39-19-7a ~]# imgadm avail | grep centos-7 
5e164fac-286d-11e4-9cf7-b3f73eefcd01 centos-7 20140820 linux 2014-08-20T13:24:52Z 
553da8ba-499e-11e4-8bee-5f8dadc234ce centos-7 20141001 linux 2014-10-01T19:08:31Z 
1f061f26-6aa9-11e4-941b-ff1a9c437feb centos-7 20141112 linux 2014-11-12T20:18:53Z 
b1df4936-7a5c-11e4-98ed-dfe1fa3a813a centos-7 20141202 linux 2014-12-02T19:52:06Z 
02dbab66-a70a-11e4-819b-b3dc41b361d6 centos-7 20150128 linux 2015-01-28T16:23:36Z 
3269b9fa-d22e-11e4-afcc-2b4d49a11805 centos-7 20150324 linux 2015-03-24T14:00:58Z 
c41bf236-dc75-11e4-88e5-038814c07c11 centos-7 20150406 linux 2015-04-06T15:58:28Z 
d8e65ea2-1f3e-11e5-8557-6b43e0a88b38 centos-7 20150630 linux 2015-06-30T15:44:09Z 

[root@78-24-af-39-19-7a ~]# imgadm import d8e65ea2-1f3e-11e5-8557-6b43e0a88b38
Importing d8e65ea2-1f3e-11e5-8557-6b43e0a88b38 (centos-7@20150630) from "https://images.joyent.com"
Gather image d8e65ea2-1f3e-11e5-8557-6b43e0a88b38 ancestry
Must download and install 1 image (514.3 MiB)
Download 1 image [=====================================================>] 100% 514.39MB 564.58KB/s 15m32s
Downloaded image d8e65ea2-1f3e-11e5-8557-6b43e0a88b38 (514.3 MiB)
...1f3e-11e5-8557-6b43e0a88b38 [=====================================================>] 100% 514.39MB 38.13MB/s 13s
Imported image d8e65ea2-1f3e-11e5-8557-6b43e0a88b38 (centos-7@20150630) 
[root@78-24-af-39-19-7a ~]# 

[root@78-24-af-39-19-7a ~]# cat newbox.config 
{
  "brand": "kvm",
  "resolvers": [
    "8.8.8.8",
    "8.8.4.4"
  ],
  "ram": "256",
  "vcpus": "2",
  "nics": [
    {
      "nic_tag": "admin",
      "ip": "192.168.5.48",
      "netmask": "255.255.255.0",
      "gateway": "192.168.5.1",
      "model": "virtio",
      "primary": true,
      "allow_dhcp_spoofing": "true",
      "allow_ip_spoofing": "true",
      "allowed_ips": [ "192.168.99.0/24" ]
    }
  ],
  "disks": [
    {
      "image_uuid": "d8e65ea2-1f3e-11e5-8557-6b43e0a88b38",
      "boot": true,
      "model": "virtio"
    }
  ],
  "customer_metadata": {
    "root_authorized_keys": "ssh-rsa AAAAB3NzaC1y[...]"
  }
}
[root@78-24-af-39-19-7a ~]# vmadm create -f newbox.config 
Successfully created VM d7b00fa6-8aa5-466b-aba4-664913e80a2e 
[root@78-24-af-39-19-7a ~]# ping -s 192.168.5.48 
PING 192.168.5.48: 56 data bytes 
64 bytes from 192.168.5.48: icmp_seq=0. time=0.377 ms 
64 bytes from 192.168.5.48: icmp_seq=1. time=0.519 ms 
64 bytes from 192.168.5.48: icmp_seq=2. time=0.525 ms ... 

zsh, basalt% ssh root@192.168.5.48 
Warning: Permanently added '192.168.5.48' (ECDSA) to the list of known hosts.
Last login: Mon Aug  3 16:49:24 2015 from 192.168.5.47
   __        .                   .
 _|  |_      | .-. .  . .-. :--. |-
|_    _|     ;|   ||  |(.-' |  | |
  |__|   `--'  `-' `;-| `-' '  ' `-'
                   /  ;  Instance (CentOS 7.1 (1503) 20150630)
                   `-'   https://docs.joyent.com/images/linux/centos

[root@d7b00fa6-8aa5-466b-aba4-664913e80a2e ~]# 

And there we have a new guest VM up and running in less than a minute’s effort.
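The config file itself could be generated rather than typed by hand. Here is a minimal sketch in Python, assuming the same field names and values as the newbox.config shown above; the function name make_kvm_config and its parameters are my own invention for illustration, and the SSH key stays elided as in the original.

```python
import json

def make_kvm_config(image_uuid, ip, gateway, routed_net, ssh_key):
    """Build a dict mirroring the layout of the newbox.config shown above."""
    return {
        "brand": "kvm",
        "resolvers": ["8.8.8.8", "8.8.4.4"],
        "ram": "256",
        "vcpus": "2",
        "nics": [{
            "nic_tag": "admin",
            "ip": ip,
            "netmask": "255.255.255.0",
            "gateway": gateway,
            "model": "virtio",
            "primary": True,
            # The three settings needed when the guest acts as a gateway;
            # the values are kept as strings, exactly as in the original file.
            "allow_dhcp_spoofing": "true",
            "allow_ip_spoofing": "true",
            "allowed_ips": [routed_net],
        }],
        "disks": [{"image_uuid": image_uuid, "boot": True, "model": "virtio"}],
        "customer_metadata": {"root_authorized_keys": ssh_key},
    }

config = make_kvm_config(
    image_uuid="d8e65ea2-1f3e-11e5-8557-6b43e0a88b38",
    ip="192.168.5.48",
    gateway="192.168.5.1",
    routed_net="192.168.99.0/24",
    ssh_key="ssh-rsa AAAAB3NzaC1y[...]",
)
config_json = json.dumps(config, indent=2)
print(config_json)
```

Redirect the output to newbox.config and hand it to vmadm create -f as in the transcript.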

Infrastructure and development environments recreated from scratch (partly thanks to storing my ~/etc/ in git) in under an hour.

I’m still looking for the perfect distributed filesystem, however…

Announcing Timelapse.jl – frame interpolation for timelapse photography

A few years ago I wrote a python utility for interpolating a greater number of frames from a smaller set, for use in timelapse video/photography.

Recently, I’ve been trying to learn Julia. As a small challenge – to fulfil a potential forthcoming need – and because it claimed to be quite a fast language – I’ve rewritten the core of the script.

It’s quite refreshing – none of the gratuitous OO classes I had to create with Python, just a couple of functions to do the essential interpolation. And watching the output frames flash by… it’s pretty nippy, too.

It doesn’t yet support EXIF directly, nor can it modify images directly on the fly, but if you want to have a play then go download it from Github: https://github.com/spodzone/Timelapse.jl

Transitions

Panasonic Lumix GH2

Nearly 3 years ago I spent the best part of 2 hours one afternoon in PCWorld, looking to kick the Canon habit and vacillating between a Nikon (as I recall, the D3100) and the Panasonic Lumix GH2.

On the one hand, the Nikon had excellent image-quality, but its usability was let-down drastically by the lack of a dedicated ISO button (you could pretend, by reassigning the one “custom” button – so why bother with either?) and it had what I felt was a patronizing user-interface, showing a graphic of a closing lens iris every time one changed the aperture (as though I hadn’t learned anything in the last 10 years spent working on my photography).

On the other hand, the Lumix GH2 had less than stellar image-quality, but the user-interface won me over.

Over the last 2 years, the ergonomics fitted my style like a glove: coming from film, including medium-format with waist-level finders, I find it most natural to operate looking down at the articulated LCD panel in live-view mode. I had sufficient custom presets for two aperture-priority-mode variations of black and white (one square, one 3:2, both with exposure-bracketing set to 3 frames +/-1EV) and a third, a particular colour “film emulation” and manual mode at ISO160 for long exposures, with bracketing set to 5 frames for more extreme HDR. With those 3 modes, I could cover 95% of my subject-matter from woodland closeup to long-exposure seascape and back, at the flip of a dial.

I learned to appreciate its choice of exposure parameters (normally well-considered), and to overcome the sensor’s foibles – it made an excellent test-case for understanding both high-ISO and long-exposure sensor noise and its limited dynamic range increased my familiarity with HDR, panoramas and other multi-exposure-blending techniques (all hail enfuse!). Coupled with the Pentacon 50mm f/1.8 lens, it made for some excellent closeup photos. As a measure of how workable the kit is, I once took every camera I then possessed – including medium- and large-format film – to Arran for a photo-holiday, and never used anything apart from the GH2 for the whole week.

If this all sounds like it’s leading up to something, it is. There is a long-established idea in photographer circles that gear-acquisition-syndrome (GAS), or the buying of new equipment for the sake of it or in order that it might somehow help one take better photos, is delusional. To some extent that’s right, but the flip-side is that any one camera will impose limitations on the shots that can be achieved. So I’ve established the principle that, if one can explain 3 things a camera can allow you to do better, the acquisition is justifiable.

And so I’ve switched. The new camera is a Sony NEX-7; even though the model is barely younger than the GH2, it still has a vastly superior sensor that will give me larger images, better dynamic range and narrower depth-of-field. Indeed, at two years old, it’s still punching above its weight despite the pressure from some of the larger dSLRs to have come out since.

Wee Waterfall (2)

One of the things I learned from the GH2 is that it always pays to understand one’s equipment. For this reason, the first 100 frames shot on the NEX-7 fell into 4 kinds:

  1. studying noise with varying ISO in a comparatively low-light real-world scene (stuff on bookshelves in the study – good to know how noise and sharpness interplay in both the darkest shadows and midtones)
  2. building a library of dark-frame images at various ISO and shutter-speed combinations (taken with the lens-cap on, for a theoretically black shot – any non-zero pixels are sensor noise)
  3. building a library of lens correction profiles – taking images of a uniform out-of-focus plain wall to compare vignetting at various apertures and focal-lengths on both kit lenses
  4. studying kit-lens sharpness as a function of aperture – discussed previously.

Impressively, I could just load all these images into RawTherapee and easily move them into relevant directories in a couple of right-clicks, and from there I spent the rest of the evening deriving profiles for ISO noise and sharpness with automatic dark-frame-reduction and actually measured vignetting correction – because I know very well how much time it will save me in the future.

Despite having played with film cameras, I’m quite acutely aware of the change in sensor format this time: in moving from prolonged use of micro-4/3rds to APS-C, I can no longer assume that setting the lens to f/8 will give me everything in focus at the lens’s sweetspot, but have to stop-down to f/11 or even further. The tripod has already come into its own…
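The f/8 to f/11 shift falls out of the usual crop-factor rule of thumb. A small sketch in Python, assuming the commonly quoted crop factors of 2.0 for micro-4/3rds and roughly 1.5 for Sony APS-C, and assuming the same framing on both cameras; the function name is made up for illustration.

```python
# Commonly quoted crop factors relative to 35mm full-frame (approximations).
CROP_MICRO_FOUR_THIRDS = 2.0
CROP_APS_C = 1.5

def equivalent_f_number(f_number, crop_from, crop_to):
    """F-number on the target format that gives roughly the same depth of
    field as f_number on the source format, for the same framing."""
    return f_number * crop_from / crop_to

aps_c_equiv = equivalent_f_number(8.0, CROP_MICRO_FOUR_THIRDS, CROP_APS_C)
print(f"f/8 on micro-4/3rds is roughly f/{aps_c_equiv:.1f} on APS-C")
```

That comes out a shade under f/11, which matches what I’m seeing in practice.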

So there we go.

Oh, and the complete GH2 kit is for sale on ebay, if anyone wants to buy it!
Update 2014-02-02: the complete kit sold on eBay for a very reasonable sum!

 

Determining the best ZFS compression algorithm for email

I’m in the process of setting up a FreeBSD jail in which to run a local mail-server, mostly for work. As the main purpose will be simply archiving mails for posterity (does anyone ever actually delete emails these days?), I thought I’d investigate which of ZFS’s compression algorithms offers the best trade-off between speed and compression-ratio achieved.

The Dataset

The email corpus comprises 273,273 files totalling 2.14GB; individually the mean size is 8KB, the median is 1.7KB and the vast majority are around 2.5KB.

The Test

The test is simple: the algorithms consist of 9 levels of gzip compression plus a new method, lzjb, which is noted for being fast, if not compressing particularly effectively.

A test run consists of two parts: copying the entire email corpus from the regular directory to a new temporary zfs filesystem, first using a single thread and then using two parallel threads. The old but efficient   find . | cpio -pdu   construct allows spawning two background jobs that copy the files sorted into ascending and descending order: two writers, working in opposite directions. Because the server was running with a live load at the time, each test was run 5 times per algorithm – a total of 13 hours.

The test script is as follows:

#!/bin/zsh

# Work from the mail corpus; give up if it is missing.
cd /data/mail || exit -1

# Clear out any filesystem left over from a previous run.
zfs destroy data/temp

foreach i ( gzip-1 gzip-2 gzip-3 gzip-4 gzip-5 gzip-6 \
	gzip-7 gzip-8 gzip-9 lzjb ) {
  # Pass 1: a single writer.
  echo "DEBUG: Doing $i"
  zfs create -ocompression=$i data/temp
  echo "DEBUG: Partition created"
  t1=$(date +%s)
  find . | cpio -pdu /data/temp 2>/dev/null
  t2=$(date +%s)
  size=$(zfs list -H data/temp)
  compr=$(zfs get -H compressratio data/temp)
  # One CSV-ish line per run: algorithm, size, ratio, start, end, writers.
  echo "$i,$size,$compr,$t1,$t2,1"
  zfs destroy data/temp

  # Let pending writes settle before the next pass.
  sync
  sleep 5
  sync

  # Pass 2: two writers walking the file list in opposite directions.
  echo "DEBUG: Doing $i - parallel"
  zfs create -ocompression=$i data/temp
  echo "DEBUG: Partition created"
  t1=$(date +%s)
  find . | sort | cpio -pdu /data/temp 2>/dev/null &
  find . | sort -r | cpio -pdu /data/temp 2>/dev/null &
  wait
  t2=$(date +%s)
  size=$(zfs list -H data/temp)
  compr=$(zfs get -H compressratio data/temp)
  echo "$i,$size,$compr,$t1,$t2,2"
  zfs destroy data/temp
}

zfs destroy data/temp

echo "DONE"

Results

The script’s output was massaged with a bit of commandline awk and sed and vi to make a CSV file, which was loaded into R.

The runs were aggregated according to algorithm and whether one or two threads were used, by taking the mean with 10% of outliers trimmed.

Since it is desirable for an algorithm both to compress well and not take much time to do it, it was decided to define efficiency = compressratio / timetaken.

The aggregated data looks like this:

   algorithm nowriters         eff timetaken compressratio
1     gzip-1         1 0.011760128     260.0         2.583
2     gzip-2         1 0.011800408     286.2         2.613
3     gzip-3         1 0.013763665     196.4         2.639
4     gzip-4         1 0.013632926     205.0         2.697
5     gzip-5         1 0.015003015     183.4         2.723
6     gzip-6         1 0.013774746     201.4         2.743
7     gzip-7         1 0.012994211     214.6         2.747
8     gzip-8         1 0.013645055     203.6         2.757
9     gzip-9         1 0.012950727     215.2         2.755
10      lzjb         1 0.009921776     181.6         1.669
11    gzip-1         2 0.004261760     677.6         2.577
12    gzip-2         2 0.003167507    1178.4         2.601
13    gzip-3         2 0.004932052     539.4         2.625
14    gzip-4         2 0.005056057     539.6         2.691
15    gzip-5         2 0.005248420     528.6         2.721
16    gzip-6         2 0.004156005     709.8         2.731
17    gzip-7         2 0.004446555     644.8         2.739
18    gzip-8         2 0.004949638     566.0         2.741
19    gzip-9         2 0.004044351     727.6         2.747
20      lzjb         2 0.002705393     900.8         1.657

A plot of efficiency against algorithm shows two clear bands, for the number of jobs writing simultaneously.
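As a rough cross-check, the efficiency can be recomputed straight from the aggregated timetaken and compressratio columns. The sketch below is Python rather than the R I actually used, and the function names are made up; the numbers it produces differ slightly from the eff column, presumably because that column was averaged per run rather than computed from the averages.

```python
# (algorithm, writers, timetaken in seconds, compressratio), from the table above.
ROWS = [
    ("gzip-1", 1, 260.0, 2.583), ("gzip-2", 1, 286.2, 2.613),
    ("gzip-3", 1, 196.4, 2.639), ("gzip-4", 1, 205.0, 2.697),
    ("gzip-5", 1, 183.4, 2.723), ("gzip-6", 1, 201.4, 2.743),
    ("gzip-7", 1, 214.6, 2.747), ("gzip-8", 1, 203.6, 2.757),
    ("gzip-9", 1, 215.2, 2.755), ("lzjb", 1, 181.6, 1.669),
    ("gzip-1", 2, 677.6, 2.577), ("gzip-2", 2, 1178.4, 2.601),
    ("gzip-3", 2, 539.4, 2.625), ("gzip-4", 2, 539.6, 2.691),
    ("gzip-5", 2, 528.6, 2.721), ("gzip-6", 2, 709.8, 2.731),
    ("gzip-7", 2, 644.8, 2.739), ("gzip-8", 2, 566.0, 2.741),
    ("gzip-9", 2, 727.6, 2.747), ("lzjb", 2, 900.8, 1.657),
]

def efficiency(compressratio, timetaken):
    """efficiency = compressratio / timetaken, as defined above."""
    return compressratio / timetaken

def best_by_writers(rows):
    """Return {writers: (algorithm, efficiency)} for the most efficient row."""
    best = {}
    for algorithm, writers, timetaken, ratio in rows:
        eff = efficiency(ratio, timetaken)
        if writers not in best or eff > best[writers][1]:
            best[writers] = (algorithm, eff)
    return best

best = best_by_writers(ROWS)
for writers in sorted(best):
    algorithm, eff = best[writers]
    print(f"{writers} writer(s): {algorithm} (efficiency {eff:.6f})")
```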

Analysis

In both cases, the lzjb algorithm’s apparent speed is more than offset by its limited compression ratio.

The consequences of using two writer processes are two-fold: first, the overall efficiency is not merely halved but is nearer to a third of that of the single writer – there could be environmental factors at play such as caching and disk i/o bandwidth. Second, the relative spread of efficiencies (standard deviation as a percentage of the trimmed mean) has increased by about 8 percentage points:

> aggregate(eff ~ nowriters, data, FUN=function(x) { sd(x)/mean(x, trim=0.1)*100.} )
  nowriters      eff
1         1 21.56343
2         2 29.74183

so choosing the right algorithm has become more significant – and it remains gzip-5 with levels 4, 3 and 8 becoming closer contenders but gzip-2 and -9 are much worse choices.
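For anyone repeating this outside R, the same spread measure is easy to sketch in Python. This assumes R’s behaviour for mean(x, trim=), which as I understand it drops floor(n × trim) values from each end; the function names and the sample values are hypothetical.

```python
import math
import statistics

def trimmed_mean(values, trim=0.1):
    """Mean after dropping floor(n * trim) values from each end of the sorted data."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5) for this sketch")
    ordered = sorted(values)
    k = math.floor(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)

def relative_spread(values, trim=0.1):
    """Sample standard deviation as a percentage of the trimmed mean."""
    return statistics.stdev(values) / trimmed_mean(values, trim) * 100.0

# Hypothetical efficiency values, purely for illustration.
sample = [0.0118, 0.0137, 0.0136, 0.0150, 0.0138, 0.0130, 0.0136, 0.0129, 0.0099, 0.0117]
print(f"relative spread: {relative_spread(sample):.2f}%")
```

One thing worth knowing: with only five values and trim=0.1, floor(5 × 0.1) is 0, so nothing is actually dropped.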

Of course, your mileage may vary; feel free to perform similar tests on your own setup, but I know which method I’ll be using on my new mail server.

Definitions of Geek Antiquity

I can see already that this will probably be a series of more than one post…

You know you’ve been a geek a while when you find files like these lying around:

 -rw-r--r-- 1 tim users 16650 Apr 24 2001 .pinerc
...
D drwxr-xr-x 3 tim users   76 Aug 18 2003 .sawfish
  -rw------- 1 tim users  324 Jan 14 2003 .sawfishrc
D drwxr-xr-x 3 tim users   95 Apr 24 2001 .sawmill
  -rw-r--r-- 1 tim users 4419 Apr 24 2001 .sawmillrc

Just in case there’s any doubt: I used Pine from about 1995 to 2000, before I bypassed mutt in favour of Emacs and Gnus. Sawmill was the original name before Sawfish renamed itself; either way, I no longer use just a simple window-manager but the whole KDE environment.

A slight tidy-up is called for…

Comparing MP3, Ogg and AAC audio encoding formats

Over the past couple of months, I’ve been both learning the R language and boosting my understanding of statistics, the one feeding off the other. To aid the learning process, I set myself a project, to answer the age-old question: which lossy audio format is best for encoding my music collection? Does it matter if I use Ogg (oggenc), MP3 (lame) or AAC (faac), and if so, what bitrate should I use? Is there any significance in the common formats such as mp3 at 128 or 192kbit/s?

First, a disclaimer: when it comes to assessing the accuracy of encoding formats, there are three ways to do it: you can listen to the differences, you can see them, or you can count them. This experiment aims to be objective through a brute-force test, given the constraints of my CD library. Therefore headphones, ears, speakers and fallible human senses do not feature in this – we’re into counting statistical differences, regardless of the nature of the effect they have on the listening experience. It is assumed that the uncompressed audio waveform as ripped from CD expresses the authoritative intent of its artists and sound engineers – and their equipment is more expensive than mine. For the purposes of this test, what matters is to analyze how closely the results of encoding and decoding compare with the original uncompressed wave.

I chose a simple metric to measure this: the root-mean-square (RMS) difference between corresponding samples as a measure of the distance between the original and processed waves. Small values are better – the processed wave is a more faithful reproduction the closer the RMSD is to zero. The primary aim is to get a feel for how the RMSD varies with bitrate and encoding format.
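To pin the metric down, here is a minimal sketch of the RMSD in Python (the real analysis was done in R). Refusing waves of unequal length is my assumption for the sketch, not necessarily what the original scripts do.

```python
import math

def rmsd(original, processed):
    """Root-mean-square difference between corresponding samples."""
    if len(original) != len(processed):
        raise ValueError("waves must have the same number of samples")
    if not original:
        raise ValueError("waves must not be empty")
    total = sum((a - b) ** 2 for a, b in zip(original, processed))
    return math.sqrt(total / len(original))

# A wave compared with itself is a perfect reproduction.
wave = [0.0, 0.5, -0.5, 0.25]
print(rmsd(wave, wave))
```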

For a representative sample, I started with the Soundcheck CD – 92 tracks, consisting of a mixture of precisely controlled audio tests and some sound-effects and samples, augmented further with a mixture of a small number of CDs of my own – some rock (Queen, Runrig) and classical (Dvorak, Bach and Beethoven).

On the way, I encountered a few surprises, starting with this:

Errors using avconv to decode mp3

Initially I was using Arch Linux, where sox is compiled with the ability to decode mp3 straight to wav. The problem being: it prepends approximately 1300 zero samples at the start of the output (and slightly distorts the start). Using avconv instead had no such problem. I then switched to Ubuntu Linux, where the opposite problem occurs: sox is compiled without support for mp3, and instead it’s avconv that inserts the zero samples as the above screenshot of Audacity shows. Therefore, I settled on using lame both to encode and decode.
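I sidestepped the padding by switching decoders, but it is simple enough to detect. A sketch in Python, assuming samples are already loaded as a list of floats; the function names and the toy data are hypothetical.

```python
def count_leading_zeros(samples, tolerance=0.0):
    """Number of samples at the start whose magnitude is within tolerance of zero."""
    count = 0
    for sample in samples:
        if abs(sample) > tolerance:
            break
        count += 1
    return count

def strip_leading_padding(decoded, original):
    """Drop leading zero samples that the decoded wave has beyond the original's own."""
    extra = count_leading_zeros(decoded) - count_leading_zeros(original)
    return decoded[extra:] if extra > 0 else decoded

# Toy example: a decoder that prepends 1300 zero samples.
original = [0.1, 0.2, -0.1]
decoded = [0.0] * 1300 + original
print(count_leading_zeros(decoded))
```

Note this only removes the padding; it does nothing about the slight distortion at the start.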

One pleasant surprise, however: loading wav files into R just requires two commands:

install.packages("audio")
library("audio")

and then the function load.wave(filename) will load a file straight into a (long) vector, normalized into the range [-1,1]. What could be easier?
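For comparison, a rough Python equivalent using only the standard library. This is a sketch that assumes 16-bit PCM files and returns interleaved samples for stereo; the name load_wave is mine, and it is not a port of the R package.

```python
import struct
import wave

def load_wave(filename):
    """Load a 16-bit PCM wav file into a list of floats in the range [-1, 1)."""
    with wave.open(filename, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wav.readframes(wav.getnframes())
    n_values = len(raw) // 2  # two bytes per sample value
    samples = struct.unpack("<%dh" % n_values, raw[:n_values * 2])
    return [s / 32768.0 for s in samples]
```

Call it as load_wave("track.wav") on any 16-bit file; the filename here is just a placeholder.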

At this point, interested readers might wish to see the sources used: R audio encoding comparison. I used a zsh script to iterate over bitrates, varying the commands used for each format; a further script iterates over files in other subdirectories as well. R scripts were used for calculating the RMSD between two wavs and for performing statistical analysis.

To complicate matters further, lame, the mp3 encoder, offers several algorithms to choose between: straightforward variable-bitrate encoding, optionally with a “-h” switch to enable higher quality, optionally with a lower bound “-b”, or alternatively, constant bitrate. Further, faac, the AAC encoder, offers a quality control (“-q”).

The formats and commands used to generate them are as follows:

AAC-100   faac -q 100 -b $i -o foo.aac "$src" 2
AAC-80    faac -q 80 -b $i -o foo.aac "$src" 2
mp3hu     lame --quiet --abr $i -b 100 -h "$src" foo.mp3
mp3h      lame --quiet --abr $i -h "$src" foo.mp3
mp3       lame --quiet --abr $i "$src" foo.mp3
mp3-cbr   lame --quiet --cbr -b $i "$src" foo.mp3
ogg       oggenc --quiet -b $i "$src" -o foo.ogg
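The table above can be turned into a command generator. The sketch below is Python rather than the zsh I actually used; it reads the long options as double-hyphen flags, leaves off the trailing "2" on the faac lines, and only builds the argument lists without running anything. The function name is hypothetical.

```python
import shlex

def build_commands(src, bitrate):
    """Return {format label: argv list} for one source file at one bitrate."""
    b = str(bitrate)
    return {
        "AAC-100": ["faac", "-q", "100", "-b", b, "-o", "foo.aac", src],
        "AAC-80": ["faac", "-q", "80", "-b", b, "-o", "foo.aac", src],
        "mp3hu": ["lame", "--quiet", "--abr", b, "-b", "100", "-h", src, "foo.mp3"],
        "mp3h": ["lame", "--quiet", "--abr", b, "-h", src, "foo.mp3"],
        "mp3": ["lame", "--quiet", "--abr", b, src, "foo.mp3"],
        "mp3-cbr": ["lame", "--quiet", "--cbr", "-b", b, src, "foo.mp3"],
        "ogg": ["oggenc", "--quiet", "-b", b, src, "-o", "foo.ogg"],
    }

commands = build_commands("track 01.wav", 128)
for label, argv in commands.items():
    print(label, shlex.join(argv))
```

Each argument list could then be passed to subprocess.run, followed by a decode step and the RMSD calculation.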

As a measure of the size of the experiment: 150 tracks were encoded to Ogg, all four variants of MP3 and AAC at bitrates varying from 32 to 320kbit/s in steps of 5kbps.

The distribution of genres reflected in the sample size is as follows:

Genre                                Number of tracks
Noise (pink noise with variations)   5
Technical                            63
Instrumental                         20
Vocal                                2
Classical                            39
Rock                                 21

Final totals: 30450 encode+decode operations taking approximately 40 hours to run.

The Analysis

Reassuringly, a quick regression test shows that the RMSD (“deltas”) column is significantly dependent on bitrate and on the mp3 and ogg formats:

> fit<-lm(deltas ~ bitrate+format, d)
> summary(fit)

Call:
lm(formula = deltas ~ bitrate + format, data = d)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.06063 -0.02531 -0.00685  0.01225  0.49041 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    6.348e-02  8.663e-04  73.274  < 2e-16 ***
bitrate       -3.048e-04  3.091e-06 -98.601  < 2e-16 ***
formatAAC-80  -9.696e-16  9.674e-04   0.000 1.000000    
formatmp3      2.331e-02  9.674e-04  24.091  < 2e-16 ***
formatmp3-cbr  2.325e-02  9.674e-04  24.034  < 2e-16 ***
formatmp3h     2.353e-02  9.674e-04  24.324  < 2e-16 ***
formatmp3hu    2.353e-02  9.674e-04  24.325  < 2e-16 ***
formatogg      3.690e-03  9.676e-04   3.814 0.000137 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.04512 on 30438 degrees of freedom
  (4 observations deleted due to missingness)
Multiple R-squared: 0.275,      Adjusted R-squared: 0.2748 
F-statistic:  1649 on 7 and 30438 DF,  p-value: < 2.2e-16

Note that, for the most part, the low p-values indicate a strong dependence of RMSD (“deltas”) on bitrate and format; we’ll address the AAC rows in a minute. Comparing correlations (with the cor function) shows:

Contributing factors - deltas:
    trackno     bitrate      format       genre   complexity
0.123461675 0.481264066 0.078584139 0.056769403  0.007699271

The genre and complexity columns are added by hand by lookup against a hand-coded CSV spreadsheet; the genre is useful but complexity was mostly a guess and will not be pursued further.

In the subsequent graphs, the lines are coloured by format as follows:

  • green – ogg
  • black – AAC
  • red – mp3 (ordinary VBR)
  • purple – mp3 (VBR with -h)
  • orange – mp3 (VBR with -b 100 lower bound)
  • blue – mp3 (CBR)

A complete overview:

Encoding errors, overview

Already we can see three surprises:

  • some drastic behaviour at low bitrates: up to 50kbit/s, ogg is hopelessly inaccurate; mp3 persists, actually getting worse, until 100kbit/s.
  • There is no purple line; the mp3 encoded with lame’s -h switch is indistinguishable from vanilla mp3.
  • There is only one black line, showing the faac AAC encoder ignores the quality setting (80 or 100% – both are superimposed) and appears to constrain the bitrate to a maximum of 150kbps, outputting identical files regardless of the rate requested above that limit.

On the strength of these observations, further analysis concentrates on higher bitrates above 100kbit/s only, the AAC format is dropped entirely and mp3h is ignored.

This changes the statistics significantly. Aggregating the average RMSD (deltas) by genre for all bitrates and formats, with the entire data-set we see:

Aggregates by genre:
           genre       deltas
1        0-Noise  0.016521056
2    1-Technical  0.012121707
3 2-Instrumental  0.009528249
4        3-Voice  0.009226851
5    4-Classical  0.011838209
6         5-Rock  0.017322585

When only higher bitrates are considered, the same average deltas are drastically reduced:

Aggregates by genre - high-bitrate:
           genre       deltas
1        0-Noise  0.006603641
2    1-Technical  0.003645403
3 2-Instrumental  0.003438490
4        3-Voice  0.005419467
5    4-Classical  0.004444285
6         5-Rock  0.007251133
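The two aggregations differ only in a bitrate filter. A sketch of the idea in Python (the real work was done in R), assuming each row carries genre, bitrate and deltas fields; the rows below are invented for illustration and are not taken from the results.

```python
from collections import defaultdict

def mean_delta_by_genre(rows, min_bitrate=None):
    """Mean RMSD per genre, optionally keeping only rows above min_bitrate."""
    totals = defaultdict(lambda: [0.0, 0])  # genre -> [sum of deltas, count]
    for row in rows:
        if min_bitrate is not None and row["bitrate"] <= min_bitrate:
            continue
        acc = totals[row["genre"]]
        acc[0] += row["deltas"]
        acc[1] += 1
    return {genre: total / count for genre, (total, count) in sorted(totals.items())}

# Invented rows, purely to show the shape of the data.
rows = [
    {"genre": "5-Rock", "bitrate": 64, "deltas": 0.03},
    {"genre": "5-Rock", "bitrate": 128, "deltas": 0.01},
    {"genre": "5-Rock", "bitrate": 192, "deltas": 0.005},
    {"genre": "0-Noise", "bitrate": 128, "deltas": 0.006},
]
print(mean_delta_by_genre(rows))
print(mean_delta_by_genre(rows, min_bitrate=100))
```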

The high-bitrate end of the graph, coupled with the blend of genres, represents the likeliest real-world uses. Zooming in:

Encoding errors, high bitrate only

Reassuringly, for each format the general trend is definitely downwards. However, a few further observations await:

  • Ogg is the smoothest curve, but above about 180kbit/s, is left behind by all the variants of mp3.
  • The orange line (mp3, VBR with a lower bound of 100kbit/s) was introduced because of the performance at the low-bitrate end of the graph. Unexpectedly, this consistently performs worse than straightforward mp3 with VBR including lower bitrates.
  • All the mp3 lines exhibit steps where a few points have a shallow gradient followed by a larger jump to the next. We would need to examine the mp3 algorithm itself to try to understand why.

Deductions

Two frequently encountered bitrates in the wild are 128 and 192kbit/s. Comparing the mean RMSDs at these rates and for all bitrates above 100kbit/s, we see:

   format   deltas(128)  deltas(192)  deltas(hbr)
1 AAC-100  0.009242665   0.008223191  0.006214152
2  AAC-80  0.009242665   0.008223191  0.006214152
3     mp3  0.007813217   0.004489050  0.003357357
4 mp3-cbr  0.007791626   0.004417602  0.003281961
5    mp3h  0.008454780   0.004778378  0.003643497
6   mp3hu  0.008454780   0.004778408  0.003643697
7     ogg  0.007822895   0.004919718  0.003648990

And the conclusion is: at 128 and 192kbps and on average across all higher bitrates, MP3 with constant bitrate encoding is the most accurate.
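The conclusion can be checked mechanically from the table above. A short Python sketch that picks the format with the lowest mean RMSD in each column; the variable names are my own.

```python
# Mean RMSD per format at 128kbit/s, 192kbit/s and across all high bitrates,
# copied from the table above.
TABLE = {
    "AAC-100": (0.009242665, 0.008223191, 0.006214152),
    "AAC-80": (0.009242665, 0.008223191, 0.006214152),
    "mp3": (0.007813217, 0.004489050, 0.003357357),
    "mp3-cbr": (0.007791626, 0.004417602, 0.003281961),
    "mp3h": (0.008454780, 0.004778378, 0.003643497),
    "mp3hu": (0.008454780, 0.004778408, 0.003643697),
    "ogg": (0.007822895, 0.004919718, 0.003648990),
}
COLUMNS = ("128", "192", "hbr")

# For each column, find the format with the smallest value.
winners = {
    column: min(TABLE, key=lambda fmt: TABLE[fmt][index])
    for index, column in enumerate(COLUMNS)
}
for column in COLUMNS:
    print(f"deltas({column}): {winners[column]}")
```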

Should I use it?

The above test of accuracy is only one consideration in one’s choice of data format. Other considerations include the openness of the Ogg format versus patent-encumbrance of mp3 and the degree to which your devices support each format. Note also that the accuracy of mp3-cbr at about 200kbps can be equalled with ogg at about 250kbps if you wish.

Correlations between all combinations of factors

Returning to the disclaimer above: we have reduced the concept of a difference between two waveforms to a single number, the RMSD. To put this number in context, this plot of the waveforms illustrates the scope of errors encountered:

visualizing encoding differences

(Reference wave on top; AAC in the middle showing huge distortion and MP3 on the bottom, showing slight differences.)

This demonstrates why tests based on listening are fallible: the errors will definitely manifest themselves as a different profile in a frequency histogram. When listening to the AAC version, the sound heard is modulated by the product of both residual errors from encoding and the frequency response curve of the headphones or speakers used. Coupled with this, the human ear is not listening for accuracy to a waveform so much as for an enjoyable experience overall – so some kinds of errors might be regarded as pleasurable, mistakenly.

There are other metrics than RMSD by which the string-distance between source and processed waveforms can be measured. Changing the measuring algorithm is left as an exercise for the reader.

Feel free to download the full results for yourself – beware, the CSV spreadsheet is over 30k rows long.