Lucas Klassmann | Blog

Handling Binary Files in Go

How to Extract Data from Binary Formats

Updated on 2020-05-05 · First published on 2018-07-21 · By Lucas Klassmann

Introduction

Go supports reading many popular binary formats, such as PNG, JPEG, GIF, PE, and ELF, through its standard library (BMP is covered by the golang.org/x/image module).

However, when there is a special binary format you would like to handle, it is important to know which tools Go offers.

It is easy to find third-party modules that implement APIs for reading common binary formats. But when you have a binary format and cannot find a module for it, you need to handle it yourself, and here you are going to learn how to deal with binary files in Go.

The knowledge needed

The first step is to grasp the file structure. Whether you are "hacking" the format or have the formal specification, you need to understand the file format by going through its sections, fields, field sizes, offsets, and byte order.

This knowledge will help us to transform simple bytes into structured data.

What we need to do after

When we parse a binary file, we usually start by opening the file for reading and then keep fetching bytes while putting them into a structure.

In Go, a naive approach is to open the file, which gives you a value that implements the io.Reader interface, and call its Read method with a byte slice to fill: Read(p []byte) returns the number of bytes read and an error. But as you will see, the encoding/binary package offers a better approach.
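To make the difference concrete, here is a minimal sketch that decodes the same 4-byte big-endian integer both ways. The helper names (readUint32Manual, readUint32Binary, sample) and the sample bytes are made up for this illustration; they are not part of the main example.

```go
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "io"
)

// readUint32Manual fills a 4-byte buffer and decodes it by hand.
func readUint32Manual(r io.Reader) (uint32, error) {
    buf := make([]byte, 4)
    // io.ReadFull keeps calling r.Read until buf is full or an error occurs.
    if _, err := io.ReadFull(r, buf); err != nil {
        return 0, err
    }
    return binary.BigEndian.Uint32(buf), nil
}

// readUint32Binary lets binary.Read do the buffering and the decoding.
func readUint32Binary(r io.Reader) (uint32, error) {
    var v uint32
    err := binary.Read(r, binary.BigEndian, &v)
    return v, err
}

// sample returns a reader over four made-up bytes: 0x00 0x00 0x01 0x00.
func sample() io.Reader {
    return bytes.NewReader([]byte{0x00, 0x00, 0x01, 0x00})
}

func main() {
    a, _ := readUint32Manual(sample())
    b, _ := readUint32Binary(sample())
    fmt.Println(a, b) // 256 256
}
```

Both functions return the same value; the second one simply leaves the buffer handling to the package.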

Before starting, we need to find a simple format to explore, and here comes the Apple Icon Image file.

Binary Format - Apple Icon Image Overview

Let's try to read the ICNS file, a file format for storing multiple images of different resolutions.

The format is used for scalable icons on macOS, much like ICO on Windows. Here we are going to use the icon from OpenEmu, a popular emulator for macOS. I got this icon by opening the app's directory on macOS. You can use any ICNS file you want.

The file stores one or more images with different sizes, resolutions, and compression algorithms. Applications on macOS usually use it to show the best image for each screen resolution, and it is also used in the Finder and the Dock, where the icon is displayed at different sizes.

OpenEmu Icon

You can find more information here.

Since it has a very simple structure, it lets us focus on decoding binary files instead of on understanding a more complex file format.

Before starting to code, let's take a look at the file format and understand its internal structure. Here I tried to list all the important points, like fields, field sizes in bytes, and offsets.

The byte order is Big Endian. More information about Byte Order here.
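Why the byte order matters can be seen in a small sketch: the same four bytes give different integers depending on the order used to decode them. The function name decodeBoth and the sample bytes are assumptions made for this illustration only.

```go
package main

import (
    "encoding/binary"
    "fmt"
)

// decodeBoth interprets the same 4 bytes in both byte orders.
func decodeBoth(b []byte) (big, little uint32) {
    return binary.BigEndian.Uint32(b), binary.LittleEndian.Uint32(b)
}

func main() {
    raw := []byte{0x00, 0x00, 0x04, 0x00}
    big, little := decodeBoth(raw)
    fmt.Println(big)    // 1024   (0x00000400)
    fmt.Println(little) // 262144 (0x00040000)
}
```

Decoding an ICNS size field with the wrong order would give a length that has nothing to do with the real one.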

ICNS Internal Structure

ICNS files are made of fixed-size fields of 4 bytes for storing general data and variable-length fields for storing the image data. This structure is really simple and fits our example perfectly.

We start with the Header section, which is composed of two fields of 4 bytes each.

The first field is called the Magic field. It is a character field of 4 bytes and holds the constant value "icns" (lowercase ASCII), which helps us identify whether the file is actually an ICNS file. This pattern is really common across binary formats.

The second field is the File Size, it's an integer field with 4 bytes. It contains the size of the entire file.

Icon Data

After the Header section, all the following sections are Icon Data sections.

The first field of this section is the Type. It's a character field of 4 bytes and specifies the format of the image inside the Data field. There is a predefined table that you can use to know which size (in pixels) and format (JPEG or PNG) is inside the Data field. Click here for more information.

The Size is an integer field of 4 bytes and it contains the total size of the section, including the space used by the Type and Size fields. We use this field to calculate the length of the Data field, which comes last and has a variable size. The size can be calculated with this formula:

Data Size = Size Value - (size of field Type + size of field Size)
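As a worked example of the formula, a section whose Size value is 1024 carries 1024 - (4 + 4) = 1016 bytes of data. The sketch below wraps that arithmetic in a hypothetical helper, dataSize, which also rejects a Size smaller than 8, since an unsigned subtraction would otherwise wrap around to a huge number.

```go
package main

import (
    "errors"
    "fmt"
)

// iconHeaderSize is 4 bytes of Type plus 4 bytes of Size.
const iconHeaderSize = 8

// dataSize applies: Data Size = Size Value - (size of Type + size of Size).
func dataSize(sectionSize uint32) (uint32, error) {
    if sectionSize < iconHeaderSize {
        return 0, errors.New("section size smaller than its own header")
    }
    return sectionSize - iconHeaderSize, nil
}

func main() {
    n, err := dataSize(1024)
    fmt.Println(n, err) // 1016 <nil>

    _, err = dataSize(4)
    fmt.Println(err) // section size smaller than its own header
}
```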

Coding the example

Let's code our example.

Defining the File Structures

Now that we understand the ICNS internal structure better, we can start defining the structures in the code.

The first structure is the Header; it has only two fields of 4 bytes each.

  • Magic, constant value "icns".
  • Length, file length.

For the Magic field, we just use an array of 4 bytes. The Length field uses an unsigned 32-bit integer (4 bytes); we use unsigned because a file size is never negative.

type Header struct {
    Magic  [4]byte
    Length uint32
}

The second structure is for an image stored inside the file.

  • Type, image format.
  • Length, section size.
  • Data, the image data.

type IconData struct {
    Type   [4]byte
    Length uint32
    Data   []byte
}

The last structure composes the other types. It is the complete file structure, with the Header and a list of IconData.

type AppleIcon struct {
    Header
    Icons []IconData // All Icons
}
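One consequence of these definitions is worth a quick check. The sketch below uses binary.Size from the standard library's encoding/binary package to ask how many bytes each structure occupies when encoded. Header is fixed-size, so it can be decoded in a single call; IconData contains a slice, so it has no fixed size and has to be decoded field by field. The helper name sizes is made up for this illustration.

```go
package main

import (
    "encoding/binary"
    "fmt"
)

type Header struct {
    Magic  [4]byte
    Length uint32
}

type IconData struct {
    Type   [4]byte
    Length uint32
    Data   []byte
}

// sizes reports the encoded size of both structures.
// binary.Size returns -1 for values that are not fixed-size.
func sizes() (header, iconData int) {
    return binary.Size(Header{}), binary.Size(IconData{})
}

func main() {
    h, d := sizes()
    fmt.Println(h) // 8
    fmt.Println(d) // -1
}
```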

The encoding/binary package

To help us read the bytes from the file, we use the package encoding/binary from the standard library.

The function binary.Read makes it easier to read bytes from an io.Reader with a particular byte order, in this case, Big Endian. The data argument must be a pointer to a fixed-size value or a slice of fixed-size values.

You can read more in the documentation in the link at the end of this post.

// Function signature in encoding/binary
func Read(r io.Reader, order ByteOrder, data interface{}) error

Here are some examples of how to use the function binary.Read.

Note that these are only examples: the code that creates the reader is omitted, and Big Endian is used throughout.

    // Some usage examples; they are not related to our main example
    var singleByte byte
    err := binary.Read(reader, binary.BigEndian, &singleByte)

    // A fixed-size array is filled in a single call
    var sixteenBytes [16]byte
    err = binary.Read(reader, binary.BigEndian, &sixteenBytes)

    var unsignedInteger uint32
    err = binary.Read(reader, binary.BigEndian, &unsignedInteger)
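For readers who want to run something, here is a self-contained variant of the same idea with a reader built over a few made-up bytes. The names decodeSample and sampleBytes are assumptions for this sketch. It also shows what happens when the input ends in the middle of a value.

```go
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "io"
)

// decodeSample reads a byte, a 2-byte array and a uint32, in that order.
func decodeSample(raw []byte) (tag byte, name [2]byte, count uint32, err error) {
    reader := bytes.NewReader(raw)
    if err = binary.Read(reader, binary.BigEndian, &tag); err != nil {
        return
    }
    if err = binary.Read(reader, binary.BigEndian, &name); err != nil {
        return
    }
    err = binary.Read(reader, binary.BigEndian, &count)
    return
}

// sampleBytes returns seven made-up bytes: 0x01, "Go", and 42 as a big-endian uint32.
func sampleBytes() []byte {
    return []byte{0x01, 'G', 'o', 0x00, 0x00, 0x00, 0x2A}
}

func main() {
    tag, name, count, err := decodeSample(sampleBytes())
    fmt.Println(tag, string(name[:]), count, err) // 1 Go 42 <nil>

    // Only two bytes are left for the uint32, so binary.Read
    // reports io.ErrUnexpectedEOF instead of io.EOF.
    _, _, _, err = decodeSample(sampleBytes()[:5])
    fmt.Println(err == io.ErrUnexpectedEOF) // true
}
```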

Reading the File

Going back to our main example.

Now we have to open the file and read its contents. Since this is a basic example, we are going to read the file completely into memory.

// This icon is from OpenEmu app, you can get it inside the example repository
data, _ := ioutil.ReadFile("OpenEmu.icns") // NOTE 1.0

Note that loading the entire file into memory is not recommended in general. Streaming the data is much better for large files. But for us, it will not be an issue.

I don't want to be verbose here, so better error handling is only in the complete example inside the repository.

Now we need to create a new Reader. We call the function bytes.NewReader, which returns an instance that satisfies the io.Reader interface.

io.Reader is an interface that hides from the caller how the bytes are read from a source, which can be the network, a file, or another source.
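A short sketch of what that abstraction buys us: a function that only asks for an io.Reader works unchanged with an in-memory byte slice, a string, or a buffered stream (a file opened with os.Open satisfies the same interface). The names readMagic and sources, and the sample bytes, are made up for this illustration.

```go
package main

import (
    "bufio"
    "bytes"
    "encoding/binary"
    "fmt"
    "io"
    "strings"
)

// readMagic only asks for an io.Reader, so it does not care where the bytes come from.
func readMagic(r io.Reader) (string, error) {
    var magic [4]byte
    if err := binary.Read(r, binary.BigEndian, &magic); err != nil {
        return "", err
    }
    return string(magic[:]), nil
}

// sources returns three different io.Reader implementations over the same made-up bytes.
func sources() []io.Reader {
    const sample = "icns\x00\x00\x00\x08"
    return []io.Reader{
        bytes.NewReader([]byte(sample)),
        strings.NewReader(sample),
        bufio.NewReader(strings.NewReader(sample)),
    }
}

func main() {
    for _, src := range sources() {
        magic, err := readMagic(src)
        // Prints the concrete reader type, the magic string and the error.
        fmt.Printf("%T %q %v\n", src, magic, err)
    }
}
```

In the main example the function takes a *bytes.Reader; accepting io.Reader instead would be one way to move to streaming later without touching the decoding code.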

reader := bytes.NewReader(data)

Decoding the File

To decode our file, we start by writing a function that holds all the necessary code. This function receives a Reader and returns two things: the AppleIcon structure filled with the data, and an error if there is one, otherwise nil.

func ReadAppleIcon(r *bytes.Reader) (*AppleIcon, error) {
    // Code Here...
}

Inside the function, we start by defining a variable of our AppleIcon type, which stores all the decoded data.

var icns AppleIcon

We must start by reading the Header data. We need to read linearly because the Reader keeps track of the position of the last byte read.

Note how important offsets are: when we read bytes this way, each new read starts where the previous one stopped.

binary.Read(r, binary.BigEndian, &icns.Header)

After reading the Header, the Reader has moved 8 bytes (the size of our Header), so now we can start reading each icon.
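The 8-byte advance can be checked with a small sketch. It reads the Header from a made-up 18-byte sample, validates the magic (in real files the magic bytes are the lowercase ASCII string "icns"), and uses the Len method of bytes.Reader, which returns the number of unread bytes. The helpers readHeader, headerOf and sampleFile, and the "TEST" section type, are assumptions for this illustration.

```go
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
)

type Header struct {
    Magic  [4]byte
    Length uint32
}

// readHeader decodes the 8-byte header and checks the magic value.
func readHeader(r *bytes.Reader) (Header, error) {
    var h Header
    if err := binary.Read(r, binary.BigEndian, &h); err != nil {
        return h, fmt.Errorf("reading header: %w", err)
    }
    if string(h.Magic[:]) != "icns" {
        return h, fmt.Errorf("unexpected magic %q", h.Magic[:])
    }
    return h, nil
}

// sampleFile is a made-up 18-byte file: the header plus one 10-byte section.
func sampleFile() []byte {
    return []byte{
        'i', 'c', 'n', 's', 0, 0, 0, 18, // Magic, File Size
        'T', 'E', 'S', 'T', 0, 0, 0, 10, // Type, Size
        0xAA, 0xBB, // Data (10 - 8 = 2 bytes)
    }
}

// headerOf returns the header and how many bytes are still unread afterwards.
func headerOf(raw []byte) (Header, int, error) {
    r := bytes.NewReader(raw)
    h, err := readHeader(r)
    return h, r.Len(), err
}

func main() {
    h, rest, err := headerOf(sampleFile())
    fmt.Println(h.Length, rest, err) // 18 10 <nil>
}
```

Of the 18 bytes, 10 are still unread after the Header, which is exactly the first icon section.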

Reading the icons is easy: until we get an EOF (End Of File) error, we keep reading the two fixed fields (Type and Length) in a loop, and after them we read the variable-length data based on the Length field.

We store each icon inside an IconData variable and append it to AppleIcon.Icons.

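for {
    var icon IconData
    // Type is a byte array, so the byte order does not change the result;
    // Big Endian is used for consistency with the rest of the format.
    err := binary.Read(r, binary.BigEndian, &icon.Type)

    if err != nil {
        if err == io.EOF {
            break // no more icons
        }
        return nil, fmt.Errorf("error reading icons: %w", err)
    }

    if err := binary.Read(r, binary.BigEndian, &icon.Length); err != nil {
        return nil, fmt.Errorf("error reading icon length: %w", err)
    }
    // Length counts the 8 bytes of Type and Length, so it can never be smaller
    if icon.Length < 8 {
        return nil, fmt.Errorf("invalid icon length: %d", icon.Length)
    }

    icon.Data = make([]byte, icon.Length-8) // NOTE 1.1
    if _, err := io.ReadFull(r, icon.Data); err != nil {
        return nil, fmt.Errorf("error reading icon data: %w", err)
    }
    icns.Icons = append(icns.Icons, icon)
}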

Remember! Each icon has a length field, which defines the total size of its section. The length is the sum of the sizes of the type, length, and data fields. Since data is variable in size, we get its size by subtracting 8 bytes (the combined size of the type and length fields) from length.
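If you do not have an ICNS file at hand, the length rule can be tried on synthetic data. The sketch below builds a tiny file in memory with binary.Write and then walks its sections. Everything in it is made up for the illustration: the section type, the helpers buildSample and parseSections, and the "AAAA"/"BBBB" types with their payloads are not real ICNS icon types.

```go
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "io"
)

type section struct {
    Type [4]byte
    Data []byte
}

// buildSample writes a header followed by two sections with made-up types.
func buildSample() []byte {
    var body bytes.Buffer
    for _, s := range []section{
        {Type: [4]byte{'A', 'A', 'A', 'A'}, Data: []byte{1, 2, 3}},
        {Type: [4]byte{'B', 'B', 'B', 'B'}, Data: []byte{4, 5}},
    } {
        body.Write(s.Type[:])
        // The stored length includes the 8 bytes of Type and Length.
        binary.Write(&body, binary.BigEndian, uint32(8+len(s.Data)))
        body.Write(s.Data)
    }
    var file bytes.Buffer
    file.WriteString("icns")
    binary.Write(&file, binary.BigEndian, uint32(8+body.Len()))
    file.Write(body.Bytes())
    return file.Bytes()
}

// parseSections skips the 8-byte header and walks the sections.
func parseSections(raw []byte) ([]section, error) {
    if len(raw) < 8 {
        return nil, io.ErrUnexpectedEOF
    }
    r := bytes.NewReader(raw[8:])
    var out []section
    for {
        var s section
        var length uint32
        err := binary.Read(r, binary.BigEndian, &s.Type)
        if err == io.EOF {
            return out, nil // clean end: no more sections
        }
        if err != nil {
            return nil, err
        }
        if err := binary.Read(r, binary.BigEndian, &length); err != nil {
            return nil, err
        }
        if length < 8 {
            return nil, fmt.Errorf("invalid section length %d", length)
        }
        s.Data = make([]byte, length-8) // data size = length - 8
        if _, err := io.ReadFull(r, s.Data); err != nil {
            return nil, err
        }
        out = append(out, s)
    }
}

func main() {
    sections, err := parseSections(buildSample())
    if err != nil {
        panic(err)
    }
    for _, s := range sections {
        fmt.Printf("%s %d\n", s.Type[:], len(s.Data))
    }
    // AAAA 3
    // BBBB 2
}
```

For files you do not control, it would also be reasonable to compare each length against the number of bytes left before allocating the slice.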

The complete code

Here is the code from our explanation. The repository linked from this post has the complete example.

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "io"
    "io/ioutil"
)

type Header struct {
    Magic  [4]byte
    Length uint32
}

type IconData struct {
    Type   [4]byte
    Length uint32
    Data   []byte
}

type AppleIcon struct {
    Header
    Icons []IconData
}

// Print is a utility method that dumps the information about the file
func (a *AppleIcon) Print() {
    fmt.Printf("Header Magic: %s\n", a.Header.Magic)
    fmt.Printf("Header Length: %d\n", a.Header.Length)
    fmt.Println("[Icons]")
    for i, icon := range a.Icons {
        fmt.Printf("%d - %s - Len: %d\n", i, icon.Type, icon.Length)
    }
}

// ReadAppleIcon uses the reader to read bytes into the AppleIcon structure
func ReadAppleIcon(r *bytes.Reader) (*AppleIcon, error) {
    var icns AppleIcon

    if err := binary.Read(r, binary.BigEndian, &icns.Header); err != nil {
        return nil, fmt.Errorf("error reading header: %w", err)
    }

    // We have to iterate until the end of the file
    for {
        var icon IconData
        err := binary.Read(r, binary.BigEndian, &icon.Type)

        // Check if there is another icon to read
        if err != nil {
            if err == io.EOF {
                break
            }
            // If it is an unexpected error, return it to the caller
            return nil, fmt.Errorf("error reading icons: %w", err)
        }

        if err := binary.Read(r, binary.BigEndian, &icon.Length); err != nil {
            return nil, fmt.Errorf("error reading icon length: %w", err)
        }
        // Length counts the 8 bytes of Type and Length, so it can never be smaller
        if icon.Length < 8 {
            return nil, fmt.Errorf("invalid icon length: %d", icon.Length)
        }

        icon.Data = make([]byte, icon.Length-8) // NOTE 1.1
        if _, err := io.ReadFull(r, icon.Data); err != nil {
            return nil, fmt.Errorf("error reading icon data: %w", err)
        }
        icns.Icons = append(icns.Icons, icon)
    }

    return &icns, nil
}

func main() {

    // This icon is from OpenEmu app, you can get it inside the example repository
    data, err := ioutil.ReadFile("OpenEmu.icns")

    if err != nil {
        panic(err)
    }

    reader := bytes.NewReader(data)
    icon, err := ReadAppleIcon(reader)
    if err != nil {
        panic(err)
    }
    icon.Print()    // Dump the information
}

Extra Resources

The End

I hope this example is clear enough and useful for you to start using encoding/binary in Go. There is a lot more to it, and you will find more information in the links below.

Any suggestions and issues are welcome; please open a ticket on GitHub in the Example Repository.

Thank you!