Let’s start with something trivial: sorting numbers. Regardless of the algorithm you use, real numbers are naturally ordered. Mathematically speaking, they have a total order, in the sense that you can always decide whether one number is greater than another. There is no ambiguity in this, meaning you can actually sort them, and (excluding duplicates) the sorted order is unique. Other domains are not that lucky: colours, for instance, are very unlucky. Supposing you’re representing colours with their RGB values, there is no standard way to order triples along a line, since they are not naturally arranged in a linear fashion. The problem is even more complicated because colours have a meaning in the real world. How can we sort colours so that they look as continuous as possible? Which parameters affect the sorting order? Is azure closer to blue (similar hue) or to cyan (similar luminosity)? I can stop you all here and say that there is no single solution to this problem. You can sort colours, but the overall result depends on what you are trying to achieve. This post will explore how colours can be sorted, and how this can lead to very different results.
Part 1: Colour sorting
Let’s start by creating a bunch of random colours, sampling them from the RGB space.
import random

colours_length = 1000
colours = []
# range(colours_length) yields exactly colours_length colours
for i in range(colours_length):
    colours.append([random.random(), random.random(), random.random()])
All the examples in this post will refer to this very list of random colours.
RGB sorting
The most trivial way in which we can sort colours is by directly sorting their RGB values. Python has a very naive way of doing this: it compares the first component, then the second and finally the third one. If two colours have the same quantity of red, the green channel will be used to determine which one is “bigger”.
colours.sort()
The result looks indeed very poor.
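To see why the result is so poor, here is a minimal sketch of how Python compares lists component by component. The four colours are made up for illustration and are not taken from the random list above.

```python
# Python compares lists (and tuples) lexicographically:
# the first component that differs decides the order.
sample = [
    [0.5, 0.9, 0.1],
    [0.2, 0.1, 0.8],
    [0.5, 0.1, 0.9],
    [0.2, 0.1, 0.3],
]

# Red is compared first; green and blue are only used to break ties.
sorted_sample = sorted(sample)

for colour in sorted_sample:
    print(colour)
```

The two colours with red equal to 0.5 end up next to each other, even though one is mostly green and the other mostly blue. With random floats, exact ties on red almost never happen, so the sort is effectively driven by the red channel alone.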
HSV sorting
The RGB colour space works very well for monitors, but it does not represent how similar colours are. The HSV space attempts to overcome this problem by introducing a parameter called hue. The hue is the base colour, made more intense or washed out by its saturation. The third component, simply called value, determines how “dark” the colour is. Since the hue has been arranged, by definition, in a rainbow fashion, sorting directly on the HSV values is likely to produce somewhat visually pleasant results.
import colorsys

colours.sort(key=lambda rgb: colorsys.rgb_to_hsv(*rgb))
Sorting using the HLS space, which organises colours in a similar fashion, produces seemingly indistinguishable results.
colours.sort(key=lambda rgb: colorsys.rgb_to_hls(*rgb) )
Both these solutions, however, look very noisy.
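A small sketch can show where the noise comes from. The three colours below are hypothetical examples chosen for illustration: two shades of pure blue, and a bright blue whose hue is only slightly different.

```python
import colorsys

bright_blue = (0.0, 0.0, 1.0)
dark_blue = (0.0, 0.0, 0.2)
near_blue = (0.0, 0.1, 1.0)  # bright, with a hue just below pure blue

sample = [bright_blue, dark_blue, near_blue]

# The key is the (hue, saturation, value) tuple, so hue is compared first
# and value only matters when hue and saturation are identical.
by_hsv = sorted(sample, key=lambda rgb: colorsys.rgb_to_hsv(*rgb))

for rgb in by_hsv:
    print(rgb, colorsys.rgb_to_hsv(*rgb))
```

The dark blue lands between the two bright colours: a tiny difference in hue outweighs a large difference in brightness, which is what shows up as noise in the sorted strip.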
Luminosity sorting
There is a single reason why sorting in the HSV and HLS colour spaces produces noisy results: both treat hue as more important than luminosity. Two visually different shades of blue end up closer together than two different colours with a similar intensity. One attempt to compensate for this is to sort directly by the perceived luminosity of a colour.
import math

def lum(r, g, b):
    # Weighted sum of the channels: green counts the most, blue the least
    return math.sqrt(.241 * r + .691 * g + .068 * b)

colours.sort(key=lambda rgb: lum(*rgb))
But this, unfortunately, still yields a very poor result.
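The sketch below, which reuses the same lum function on a few hand-picked colours (the colours are my own examples), hints at why. Luminosity throws the hue away entirely, so unrelated colours can share almost the same key.

```python
import math

def lum(r, g, b):
    return math.sqrt(.241 * r + .691 * g + .068 * b)

primaries = {
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}

# Rank the primaries from darkest to brightest according to lum
ranked = sorted(primaries, key=lambda name: lum(*primaries[name]))
print(ranked)

# A dark grey chosen so that its luminosity matches pure red
grey = (0.241, 0.241, 0.241)
print(lum(*grey), lum(*primaries["red"]))
```

Pure red and a dark grey receive practically the same luminosity, so they sit side by side after sorting even though they look nothing alike.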
Step sorting
If we want to sort colours in a visually pleasant way, we need to do something more complicated. We can, for instance, merge hue and luminosity information to obtain a smoother result. Another problem we encounter, however, comes from how tuples are sorted in Python. To dampen the impact of sorting on the first component, we can reduce each component from a float value between 0 and 1 to an integer from 0 to 7. By doing this, much of the noise is removed.
import math
import colorsys

def step(r, g, b, repetitions=1):
    lum = math.sqrt(.241 * r + .691 * g + .068 * b)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Quantise hue and value into `repetitions` buckets
    h2 = int(h * repetitions)
    lum2 = int(lum * repetitions)  # computed but not used in the key
    v2 = int(v * repetitions)
    return (h2, lum, v2)

colours.sort(key=lambda rgb: step(*rgb, repetitions=8))
Most of the noise is now removed, but the segments don’t look continuous any more. To fix this, we can invert the luminosity of every other segment.
import math
import colorsys

def step(r, g, b, repetitions=1):
    lum = math.sqrt(.241 * r + .691 * g + .068 * b)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h2 = int(h * repetitions)
    lum2 = int(lum * repetitions)  # computed but not used in the key
    v2 = int(v * repetitions)
    # Reverse the order inside every other hue segment
    if h2 % 2 == 1:
        v2 = repetitions - v2
        lum = repetitions - lum
    return (h2, lum, v2)

colours.sort(key=lambda rgb: step(*rgb, repetitions=8))
Colours now look organised in a neater way. They are mostly continuous, with very little noise compared to the native HSV sorting. This, however, came at the expense of a monotonic luminosity. When it comes to colour sorting, we can’t have it all.
Hilbert sorting
There is another way to sort colours which looks rather interesting. It is based on the concept of the Hilbert curve. You can imagine a Hilbert curve as a way of visiting every point in a 2D space by using a 1D curve.
This is possible because the Hilbert curve is a fractal space-filling object. We can extend the same concept of space-filling to our three-dimensional colour space. What if we use a Hilbert curve to connect all the colours in their RGB space? For this example I am using Steve Witham’s implementation of a Hilbert walk, as suggested by Jan Pöschko.
import hilbert

# Scale each channel to 0..255 and use the position along the Hilbert walk as the key
colours.sort(key=lambda rgb: hilbert.Hilbert_to_int([int(c * 255) for c in rgb]))
The result is indeed intriguing and, despite not following any intuitive colour distribution, it looks very homogeneous. It’s interesting to notice that while all the above-mentioned techniques would sort greyscale colours correctly, Hilbert sorting rearranges them in a very different way.
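To get a feeling for what a Hilbert index does, here is a minimal 2D sketch. It is not Steve Witham’s module: it is the classic bit-twiddling conversion from grid coordinates to a distance along the curve, and the names xy2d and rot are my own choice. The 3D version used for colours follows the same idea with one more axis.

```python
def rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n x n grid.

    n must be a power of two.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = rot(n, x, y, rx, ry)
        s //= 2
    return d

n = 8
cells = [(x, y) for x in range(n) for y in range(n)]

# Sorting the cells by their Hilbert index walks the whole grid
walk = sorted(cells, key=lambda c: xy2d(n, c[0], c[1]))
print(walk[:8])
```

Every cell gets a unique index, and two cells that are consecutive in the sorted list are always direct neighbours in the grid. This locality is what makes the sorted colours look homogeneous, even though the walk itself twists in a way that has nothing to do with hue or luminosity.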
Travelling Salesman sorting
The travelling salesman problem refers to a very practical issue: visiting a certain number of cities while minimising the overall distance and visiting each city only once. This sounds like exactly what we want: visiting each colour only once, minimising the overall distance. Despite being such a critical issue, the travelling salesman problem is NP-hard, which is a fancy way of saying that it is too computationally expensive to solve exactly for thousands of colours. The algorithm I’ll be using instead is a suboptimal approach, called nearest neighbour. Without going too much into its details, it finds routes which are, on average, about 25% longer than the best possible one. Which is not that bad, after all. I have used this version of the algorithm.
import numpy as np
from scipy.spatial import distance

# Distance matrix
n = len(colours)
A = np.zeros([n, n])
for x in range(n):
    for y in range(n):
        A[x, y] = distance.euclidean(colours[x], colours[y])

# Nearest neighbour algorithm (NN is the external implementation mentioned above)
path, _ = NN(A, 0)

# Final array
colours_nn = []
for i in path:
    colours_nn.append(colours[i])
Its result is very smooth, even though the colours are all over the place. Of all the approaches seen so far, however, this should be the one that comes closest to minimising the distances between them.
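For readers who want to see what the nearest neighbour heuristic does, here is a minimal sketch. It is not the NN function used above: nearest_neighbour_path is a hypothetical helper that works directly on a list of points instead of a distance matrix, and it assumes Python 3.8 or later for math.dist.

```python
import math

def nearest_neighbour_path(points, start=0):
    """Greedy tour: from the current point, always jump to the closest
    point that has not been visited yet.

    Returns the visiting order (as indices) and the total length.
    """
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    path = [start]
    total = 0.0
    while unvisited:
        last = points[path[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, points[i]))
        total += math.dist(last, points[nxt])
        path.append(nxt)
        unvisited.remove(nxt)
    return path, total

# Four greys, deliberately listed out of order
greys = [
    (0.0, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (0.1, 0.1, 0.1),
    (0.9, 0.9, 0.9),
]

path, total = nearest_neighbour_path(greys, start=0)
print(path, total)
```

Starting from black, the walk visits the greys from darkest to brightest. Each step scans all the remaining points, so the whole walk is quadratic in the number of colours, which is still perfectly manageable for a thousand of them.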
We can also use other colour spaces to calculate distance, such as the HSV (top) and the Lab (bottom) ones, although they all yield similar results:
Part 2: Colour distance
Colour sorting is deeply connected to another problem. Given two different colours, how distant are they? The concept of distance strongly depends on the space in which we are analysing them. Just to give you an indication, here are some charts which will help you understand how distance is perceived in several colour spaces.
In the following diagrams, every (x,y) pixel indicates how distant the respective colours on the X and Y axes are. The X and Y axes arrange colours according to their hue value. White pixels indicate a perfect match: the colours have zero distance, meaning they are identical. Dark pixels, instead, indicate a high distance.
The next diagrams replace the rainbow colours on the horizontal axis with a greyscale gradient. There won’t be any white pixels since no two colours are the same now.
Colour spaces shown in the charts: HSV & HSL, RGB, YIQ, LAB.
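The snippet below is a rough sketch of how such distances could be computed with Python’s colorsys module. The function names are hypothetical, and wrapping the hue around the colour wheel is my own assumption: it is one possible choice, not necessarily the one used to draw the charts.

```python
import colorsys
import math

def rgb_distance(c1, c2):
    """Plain Euclidean distance between two RGB triples."""
    return math.dist(c1, c2)

def hsv_distance(c1, c2, wrap_hue=True):
    """Euclidean distance in HSV. Hue is an angle, so with wrap_hue=True
    the difference is measured the short way around the colour wheel."""
    h1, s1, v1 = colorsys.rgb_to_hsv(*c1)
    h2, s2, v2 = colorsys.rgb_to_hsv(*c2)
    dh = abs(h1 - h2)
    if wrap_hue:
        dh = min(dh, 1.0 - dh)
    return math.sqrt(dh ** 2 + (s1 - s2) ** 2 + (v1 - v2) ** 2)

def yiq_distance(c1, c2):
    """Euclidean distance between the YIQ representations."""
    return math.dist(colorsys.rgb_to_yiq(*c1), colorsys.rgb_to_yiq(*c2))

red = (1.0, 0.0, 0.0)
pinkish_red = (1.0, 0.0, 0.1)  # almost red, just across the hue seam

print(rgb_distance(red, pinkish_red))
print(hsv_distance(red, pinkish_red, wrap_hue=False))
print(hsv_distance(red, pinkish_red, wrap_hue=True))
print(yiq_distance(red, pinkish_red))
```

The two reds are very close in RGB, but without the wrap-around they look almost maximally distant in HSV, because their hues sit at opposite ends of the 0 to 1 range. The same pair of colours can be near or far depending on the space and on how the distance is defined.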
Conclusion
Sorting by colour is a very common practice, especially in advertising and other forms of media which need to be visually pleasant. In a previous post I explained how I collected all the screenshots from #ScreenshotSaturday and sorted them, showing some interesting results about the predominant hues found in indie games. These mosaics are made from over 30,000 images, weighing almost 9 GB. You can download the full-size mosaics (16 MB, 71 MB, 40 MB, 13 MB) here on Patreon. Screen-size mosaics are also available in the tweet below.
Sorting colours is a pain. There isn’t a magic function which will order them nicely, simply because the way we perceive them is based on three different components. Any attempt to flatten them onto one single dimension will inevitably collapse some of the complexity. When it comes to sorting colours, you should understand which features you want to highlight. Is it the hue? Is it the luminosity? Start from there, and create your own function.
Other resources
- Sorting colours with Hilbert curves: the post which inspired the Hilbert sorting;
- Sorting colours in Mathematica: an interesting discussion on how colours can be sorted in Mathematica;
- A Visit to Disney’s Magic Kingdom: how colour clustering and sorting can be used to identify emotions in Disney’s movies;
- Using PCA to sort high dimensional objects: how to use Principal Component Analysis to sort colours;
- TSPPixelSort: how genetic algorithms can be used to sort colours;
- Rainbow Smoke: all the RGB colours sorted in a single image. It’s just beautiful.
- Part 1: How to retrieve all the images from a website
- Part 2: How to find the main colours in an image
- Part 3: The incredibly challenging task of sorting colours