Resize and crop
Camera phones (and actual digital cameras) usually produce images at resolutions that are too big to use directly in web or mobile applications. They need to be resized to a smaller resolution, and usually cropped to a number of formats with different aspect ratios.
With video applications, the desired ratio is usually 16:9 to match modern TVs, while apps like Instagram prefer a 1:1 ratio, so that part of the screen stays reserved for image metadata, like its author, the image comment and other widgets.
- A 3840 x 2160 pixel resolution is used for 4K displays,
- A 1920 x 1080 pixel resolution is used for Full HD (a quarter of a 4K display),
- A 1080 x 1080 pixel resolution is the default for Instagram posts.
The Instagram image width is tailored towards iPhone 6-8 Plus screen sizes, which have a width of 1080 pixels. They also have a height of 1920 pixels which, in portrait mode, amounts to a Full HD display size.
Interpolation functions for resizing
For an image of arbitrary width/height, resizing down to a 100x100 image could be done with pseudo code similar to this (assume that division returns a floating point value):
w, h := 1920, 1080
dw, dh := 100, 100

for y := 0; y < dh; y++ {
	for x := 0; x < dw; x++ {
		// calculate source pixel position
		//
		// x / dw = position of the target pixel as a % of the width,
		// multiplied by w = source pixel x position

		sx = round((x / dw) * w)
		sy = round((y / dh) * h)

		dest.Set(x, y, source.NRGBAAt(sx, sy))
	}
}
The way the final color is calculated is called an interpolation function. Interpolation is a process where an algorithm calculates the color in some particular way, based on the source pixel position.
For our code sample, the algorithm is called nearest neighbor. It’s called that because every chosen color already exists in the source image; we only calculate the most appropriate source x/y values.
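As a reference, here is the same loop as compilable Go rather than pseudo code. It is only a minimal sketch; the nearestNeighbor name and the choice of NRGBA images are illustrative, not part of the book’s code:

package main

import (
	"fmt"
	"image"
	"math"
)

// nearestNeighbor mirrors the pseudo code above using the standard image package.
func nearestNeighbor(src *image.NRGBA, dw, dh int) *image.NRGBA {
	b := src.Bounds()
	w, h := b.Dx(), b.Dy()
	dst := image.NewNRGBA(image.Rect(0, 0, dw, dh))
	for y := 0; y < dh; y++ {
		for x := 0; x < dw; x++ {
			// map the destination pixel back to the closest source pixel
			sx := int(math.Round(float64(x) / float64(dw) * float64(w)))
			sy := int(math.Round(float64(y) / float64(dh) * float64(h)))
			// clamp, since rounding can land one past the last source pixel
			if sx > w-1 {
				sx = w - 1
			}
			if sy > h-1 {
				sy = h - 1
			}
			dst.SetNRGBA(x, y, src.NRGBAAt(b.Min.X+sx, b.Min.Y+sy))
		}
	}
	return dst
}

func main() {
	src := image.NewNRGBA(image.Rect(0, 0, 1920, 1080))
	small := nearestNeighbor(src, 100, 100)
	fmt.Println(small.Bounds().Dx(), small.Bounds().Dy()) // 100 100
}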
The pitfall of this approach is that it produces jagged destination images, where a significant part of the image information is lost. To demonstrate, let’s see how a 5x1 pixel image would be resized:
source image: [255, 0, 255, 0, 255]
Resizing the image from 5 to 4 pixels wide:
dest[0] = src[round((0 / dw) * w)] /* index = round(0.00 * 5) = 0, color = 255 */
dest[1] = src[round((1 / dw) * w)] /* index = round(0.25 * 5) = round(1.25) = 1, color = 0 */
dest[2] = src[round((2 / dw) * w)] /* index = round(0.50 * 5) = round(2.5) = 3, color = 0 */
dest[3] = src[round((3 / dw) * w)] /* index = round(0.75 * 5) = round(3.75) = 4, color = 255 */
So, the destination image would be [255, 0, 0, 255]. When upscaling the
image to a larger resolution, we would get what’s generally referred to
as a “pixelated” image. In popular media, Minecraft is the most common
example where pixelated images are used extensively.
When resizing images to a smaller size using the nearest neighbor method, there is obvious data loss compared to the source image. In terms of “optical” difference, alternating white/black values appear grey when viewed from a distance. A better interpolation function can calculate a closer approximation of the resulting colors, producing a more optically pleasing image.
A simple method to implement is bilinear interpolation. It’s called bilinear because we interpolate along two axes: the color along the X axis, and the color along the Y axis. For any given (X, Y) input, the fractional parts (subpixels) are used to calculate the final color.
// non-rounded sx/sy
sx = (x / dw) * w
sy = (y / dh) * h

// subpixel positions:
subx = sx % 1.0
suby = sy % 1.0

// source pixel positions:
top = floor(sy)
left = floor(sx)
right = min(left+1, w-1)   // clamp to image size
bottom = min(top+1, h-1)   // clamp to image size

// 4 colors, one for each hard x/y pair
colors := [4]color.NRGBA{
	source.NRGBAAt(left, top),
	source.NRGBAAt(right, top),
	source.NRGBAAt(left, bottom),
	source.NRGBAAt(right, bottom),
}
We have calculated two subpixel fractions, subx and suby. A subx
fraction of 0.5 would mean that the source pixel color should be
calculated as the average between left/right pixel colors.
Now, the simplest way to produce the final color is to build a 2D multiplication matrix, combining the X and Y weights for each of the four neighboring pixels:
// a suby of 0.25 weights the color towards the top pixels
matrix_top = 1.0 - suby

// a subx of 0.25 weights the color towards the left pixels
matrix_left = 1.0 - subx

// the remaining weights
matrix_right, matrix_bottom = subx, suby

// one weight per entry in colors, multiplying the X and Y weights
matrix := [4]float{
	matrix_top * matrix_left,
	matrix_top * matrix_right,
	matrix_bottom * matrix_left,
	matrix_bottom * matrix_right,
}
For bilinear interpolation we multiply each value in colors with the corresponding index in matrix. Adding all the values together produces the final interpolated color. Since the weights along each axis add up to one, the four entries of matrix also sum to one, so the result needs no further scaling.
With the nearest neighbor method from the previous example, matrix effectively becomes [1, 0, 0, 0], picking a single literal color from the source (the top left one).
Given a source position of 1500.5 and 950.5, both subpixel fractions are 0.5, the matrix would be [0.25, 0.25, 0.25, 0.25], and the produced color would be the average of all four pixels in the colors[] slice.
// pseudo code, but imagine colors[x] to be uint8
color := (colors[0] * matrix[0]) +
	(colors[1] * matrix[1]) +
	(colors[2] * matrix[2]) +
	(colors[3] * matrix[3])
The destination image now becomes [255, 64, 127, 191] (the middle value may come out as 128, depending on how the exact 0.5 fraction is rounded). Overall, the image is resized more faithfully. When using bilinear interpolation for image upscaling, the produced image will seem blurred but will still retain hard pixel edges.
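To double-check those numbers, here is a small self-contained sketch; it is not part of the book’s code, and bilinear1D is a hypothetical helper that applies the same weights to the 5-pixel row from before:

package main

import (
	"fmt"
	"math"
)

// bilinear1D resizes a single row of grey values using the subpixel weights
// described above.
func bilinear1D(src []float64, dw int) []uint8 {
	w := len(src)
	dst := make([]uint8, dw)
	for x := 0; x < dw; x++ {
		sx := float64(x) / float64(dw) * float64(w)
		left := int(sx)
		right := left + 1
		if right > w-1 {
			right = w - 1 // clamp to image size
		}
		subx := sx - float64(left)
		// weight the two neighbors by how close sx is to each of them
		dst[x] = uint8(math.Round(src[left]*(1-subx) + src[right]*subx))
	}
	return dst
}

func main() {
	fmt.Println(bilinear1D([]float64{255, 0, 255, 0, 255}, 4))
	// prints [255 64 128 191]; the 128 comes from rounding 127.5 up
}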
Additional notable interpolation methods include:
- Bicubic interpolation,
- Mitchell-Netravali interpolation,
- Lanczos interpolation.
For upscaling, Lanczos interpolation produces the most natural-looking result: the image looks as if a larger original had been blurred, while highlights stay more color-accurate. It also works well for resizing images down to smaller sizes, at the cost of being more CPU expensive.
In comparison, bicubic interpolation loses those highlights, and often, even compared with bilinear interpolation, the upscaled image looks as if it was created from a pixelated source that was then extensively blurred.
While bilinear interpolation produces a color based on only 2 pixels along each axis, the other interpolation algorithms above generally use 4 or more pixels per axis (Lanczos can use 6 or more) to produce a more accurate color. Generally, the more pixels are used, the better the image looks when it is upscaled.
Resizing images
We will resort to using an existing package to provide our resizing functionality. The package nfnt/resize implements the methods described above, and we will default to the Lanczos3 interpolation method when resizing our images.
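For reference, the package exposes each interpolation method as a constant, and picking one is just the last argument to resize.Resize. A minimal sketch (the blank source image is only there to make the example self-contained):

package main

import (
	"fmt"
	"image"

	"github.com/nfnt/resize"
)

func main() {
	// a blank 1920x1080 source image, just to demonstrate the call
	src := image.NewNRGBA(image.Rect(0, 0, 1920, 1080))

	// the last argument picks the interpolation method; NearestNeighbor,
	// Bilinear, Bicubic, MitchellNetravali and Lanczos2 are also available
	out := resize.Resize(640, 360, src, resize.Lanczos3)

	fmt.Println(out.Bounds().Dx(), out.Bounds().Dy()) // 640 360
}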
When resizing images, we usually resort to one of the following modes:
- Letterboxing the destination image,
- Producing a covering image.
The aspect ratio is the ratio between the width and the height of the image. So, a 1920x1080 image has an aspect ratio of 16:9, while a 1000x1000 image has a ratio of 1:1. These are common ratios:
- 21:9 - Ultra wide content (used to save vertical space),
- 16:9 - HD video, widescreen monitors/TVs,
- 4:3 - Cameras, classic monitors - landscape,
- 3:4 - Same but in portrait orientation,
- 1:1 - Instagram
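These ratio names are simply the width and height divided by their greatest common divisor; a tiny sketch (illustrative only, not part of the book’s code) for reducing a pixel size to its ratio:

package main

import "fmt"

// gcd returns the greatest common divisor of two positive integers.
func gcd(a, b int) int {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

func main() {
	w, h := 1920, 1080
	d := gcd(w, h)                  // 120
	fmt.Printf("%d:%d\n", w/d, h/d) // prints 16:9
}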
While these ratios are pretty standard, the original image sources usually come in 16:9 or 4:3/3:4, depending on whether the source was video or a camera, or hopefully something close to those. When producing website content, different parts of the website often need content in different ratios.
When resizing images, the aspect ratio of the source material must be preserved, so the image doesn’t look stretched. To achieve that, the following two strategies are usually used:
Letterboxing the image
The term “letterbox” comes from television. When people started shooting film in widescreen formats, black bars were added above and below the picture for playback on 4:3 TVs, in order to keep the original aspect ratio.
In our case, if we want to produce a 640 x 480 pixel image from a 1920 x 1080 pixel image, we need to calculate the resized dimensions of the original image that will fit into the smaller image.
srcWidth, srcHeight := 1920, 1080
dstWidth, dstHeight := 640, 480

// ratioWidth := 0.33333...
ratioWidth := dstWidth / srcWidth

// ratioHeight := 0.44444...
ratioHeight := dstHeight / srcHeight
By calculating the ratios for both width and height, we have actually calculated two sizing factors. By using the ratioWidth sizing factor, the image is resized to fit the destination width:
// newWidth := 1920 * 0.33333 = 640
newWidth := srcWidth * ratioWidth
// newHeight := 1080 * 0.33333 = 360
newHeight := srcHeight * ratioWidth
By using the ratioHeight sizing factor, the image would be resized to fit the destination height.
// newWidth := 1920 * 0.44444 = 853
newWidth := srcWidth * ratioHeight
// newHeight := 1080 * 0.44444 = 480
newHeight := srcHeight * ratioHeight
Only the first of these two results (640 x 360) actually fits inside the 640 x 480 target; the second (853 x 480) overflows the width. For letterboxing we therefore pick the smaller of the two sizing factors. It’s now time to move beyond pseudo code and implement our resizer.
Start by creating a cmd/resize folder, and a main.go file:
package main

import (
	"flag"
	"log"

	_ "image/gif"
	_ "image/jpeg"
	_ "image/png"
)
We need to create our main() function, this time using the standard library flag package for configuration. We will take four possible flags: the input and output filenames (-in and -out respectively), and the width/height of the destination image.
func main() {
	var (
		input  string
		output string
		width  uint
		height uint
	)
	flag.StringVar(&input, "in", "", "Input filename")
	flag.StringVar(&output, "out", "", "Output filename")
	flag.UintVar(&width, "width", 640, "Output width")
	flag.UintVar(&height, "height", 480, "Output height")
	flag.Parse()

	if err := loadAndResize(output, input, width, height); err != nil {
		log.Fatalln(err)
	}
}
Let’s now create the loadAndResize function in resize.go:
package main

import (
	"errors"
)

func loadAndResize(output string, input string, width uint, height uint) error {
	if input == "" {
		return errors.New("missing argument: input filename")
	}
	if output == "" {
		return errors.New("missing argument: output filename")
	}
	if width == 0 || height == 0 {
		return errors.New("invalid argument: width/height")
	}

	img, err := load(input)
	if err != nil {
		return err
	}

	out := resizer(img, width, height)

	return save(output, out)
}
The function validates our inputs, loads the image, resizes it, and saves the result.
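The load and save helpers are assumed to come from the previous chapter. In case you need them here, a minimal sketch could look like the following (the book’s actual code may differ, for example in how the output format is chosen; this version always encodes PNG):

package main

import (
	"image"
	"image/png"
	"os"
)

// load decodes any image format registered via the blank imports in main.go.
func load(filename string) (image.Image, error) {
	f, err := os.Open(filename)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	img, _, err := image.Decode(f)
	return img, err
}

// save writes the image out as PNG, regardless of the output extension.
func save(filename string, img image.Image) error {
	f, err := os.Create(filename)
	if err != nil {
		return err
	}
	defer f.Close()

	return png.Encode(f, img)
}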
And what remains is implementing our resizer() function (resizer.go):
package main

import (
	"image"
	"math"

	"github.com/nfnt/resize"
)

func resizer(img image.Image, width uint, height uint) image.Image {
	b := img.Bounds()
	srcWidth, srcHeight := float64(b.Dx()), float64(b.Dy())

	ratioWidth := float64(width) / srcWidth
	ratioHeight := float64(height) / srcHeight

	dim := func(width, height, ratio float64) (uint, uint) {
		var (
			newWidth  = uint(math.Round(width * ratio))
			newHeight = uint(math.Round(height * ratio))
		)
		return newWidth, newHeight
	}

	var resized image.Image
	w, h := dim(srcWidth, srcHeight, ratioWidth)
	if h > height {
		w, h = dim(srcWidth, srcHeight, ratioHeight)
	}
	resized = resize.Resize(w, h, img, resize.Lanczos3)

	return resized
}
Notably, the resizer function only resizes the image so that it keeps its aspect ratio and fits inside the target dimensions. We still need some handling code that adds the letterbox around the image to fill out the requested size.
Let’s create a letterbox.go file:
package main

import (
	"image"
	"image/color"
	"image/draw"
)

func letterbox(img image.Image, uwidth uint, uheight uint) image.Image {
	resized := resizer(img, uwidth, uheight)
	b := resized.Bounds()
	width, height := int(uwidth), int(uheight)
	w, h := b.Dx(), b.Dy()

	if w < width || h < height {
		offsetX, offsetY := (width-w)/2, (height-h)/2
		dest := image.NewRGBA(image.Rect(0, 0, width, height))

		// fill image with a solid color (blue)
		fillColor := color.NRGBA{0, 0, 255, 255}
		draw.Draw(dest, dest.Bounds(), &image.Uniform{fillColor}, image.Point{}, draw.Src)

		// draw resized image with offset
		destStart := image.Pt(offsetX, offsetY)
		destRect := image.Rectangle{destStart, destStart.Add(resized.Bounds().Size())}
		draw.Draw(dest, destRect, resized, image.Point{}, draw.Src)

		return dest
	}

	return resized
}
There are two things to note here:
We fill the initial letterbox image using image.Uniform{}. The struct represents an infinitely sized image of a uniform color.
We draw the resized image with an offsetX or offsetY offset, producing either a vertical or horizontal letterbox, depending on the source image ratio.
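One wiring detail: loadAndResize above still calls resizer() directly, so to actually get the letterboxed output, that call needs to point at the new function (assuming the swap isn’t made elsewhere in the book):

out := letterbox(img, width, height)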
Since we already implemented getting the average color in the previous chapter, let’s use the image’s average color to fill out the letterbox, instead of an aggressive blue color. Change fillColor to take the average color value from the image:
fillColor := getAverageColor(img)
In the case of nearly solid-color images, the output is indistinguishable from having a larger source image. For other images, it’s reasonable to assume that the average color will mimic the general tone of the image, so pictures of the sky will end up with an off-white blue tone, while pictures from nature might end up with pastel/earth tones. Depending on the source image’s color vibrancy, better strategies can be used.
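For completeness, getAverageColor is the helper from the previous chapter. If you don’t have that code at hand, one plausible sketch (an assumption, not necessarily the book’s implementation) simply averages the RGB channels of every pixel:

package main

import (
	"image"
	"image/color"
)

// getAverageColor averages the RGB channels of every pixel; alpha is kept opaque.
func getAverageColor(img image.Image) color.NRGBA {
	b := img.Bounds()
	var r, g, bl, count uint64
	for y := b.Min.Y; y < b.Max.Y; y++ {
		for x := b.Min.X; x < b.Max.X; x++ {
			// RGBA() returns 16-bit channels; shift them down to 8 bits
			pr, pg, pb, _ := img.At(x, y).RGBA()
			r += uint64(pr >> 8)
			g += uint64(pg >> 8)
			bl += uint64(pb >> 8)
			count++
		}
	}
	if count == 0 {
		return color.NRGBA{A: 255}
	}
	return color.NRGBA{
		R: uint8(r / count),
		G: uint8(g / count),
		B: uint8(bl / count),
		A: 255,
	}
}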