Lucas Klassmann | Blog

Handling Binary Files in Go

How to Extract Data from Binary Formats

2018-07-21 · By Lucas Klassmann

Introduction

Go supports popular file formats like BMP, PNG, JPEG, GIF, PE, ELF and many others. However, when you need to work with a special format that is not available in the standard library and has no third-party package, you will need to handle it yourself.

In this case, you need to parse bytes from a file and put this data into a custom structure.

An efficient way to handle binary files is the encoding/binary package. It is part of the standard library and simple to use.

Before starting with an example, we need to find something practical to explore, and here comes the Apple Icon Image file.

Case Study - Apple Icon Image

Let's try to decode the ICNS file, a format that stores multiple images of different resolutions and is intended for icons on macOS. It has a simple structure, which lets us focus on decoding binary data instead of trying to understand a complex file format.

Before starting to code, let's take a look at the file format and understand its internal structure.

You can find more information here.

An ICNS file is made of fixed-size fields of 4 bytes and variable-length fields that store the image data.

The file can contain one or more images with different sizes, resolutions and compression algorithms. Applications on macOS use it to show the best image for each screen resolution, and the Finder and the Dock use it to display the icon at different sizes.

The File Structure

Let's define some structures that the ICNS file needs.

The first structure is the Header; it has only two fields of 4 bytes each.

  • Magic, identifies the file format, the fixed string "icns" (lowercase in real files).
  • Length, is the total size of the file in bytes.

For the Magic field, we just use an array of 4 bytes. The Length field uses an unsigned 32-bit integer (4 bytes); it is unsigned because a file size can never be negative.

type Header struct {
    Magic  [4]byte
    Length uint32
}
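
As a small illustration of how the Magic field can be used, the sketch below checks a header against the expected signature. The helper name isAppleIcon and the sample headers are assumptions made for this example; the signature bytes are "icns" in lowercase.

```go
package main

import "fmt"

// Header mirrors the 8-byte ICNS file header described above.
type Header struct {
	Magic  [4]byte
	Length uint32
}

// icnsMagic is the signature expected in the first four bytes of the file.
var icnsMagic = [4]byte{'i', 'c', 'n', 's'}

// isAppleIcon is a hypothetical helper: it reports whether the header
// carries the expected signature. Arrays are comparable in Go, so == works.
func isAppleIcon(h Header) bool {
	return h.Magic == icnsMagic
}

func main() {
	good := Header{Magic: [4]byte{'i', 'c', 'n', 's'}, Length: 8}
	bad := Header{Magic: [4]byte{'R', 'I', 'F', 'F'}, Length: 8}
	fmt.Println(isAppleIcon(good), isAppleIcon(bad)) // true false
}
```

Using a fixed-size array instead of a string keeps the field at exactly 4 bytes, which matters once the struct is decoded straight from the file.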

The second structure represents an image stored inside the file.

  • Type is fixed-size 4 bytes, and it identifies the format of the image.
  • Length is fixed-size 4 bytes, and its value is the size of IconData structure in bytes.
  • Data is a variable-length field that contains the bytes of the image. Its size is the value of Length minus 8 bytes, which is the combined size of the Type and Length fields.

type IconData struct {
    Type   [4]byte
    Length uint32
    Data   []byte
}
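
One consequence of the variable-length Data field is worth showing: binary.Read from encoding/binary only accepts fixed-size values, so it cannot fill an IconData in a single call. The sketch below assumes a hypothetical helper type, iconHeader, that holds only the two fixed fields; the sample bytes are made up.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// IconData is the structure from the post; Data makes it variable-length.
type IconData struct {
	Type   [4]byte
	Length uint32
	Data   []byte
}

// iconHeader is a hypothetical type with only the fixed-size part of an icon.
type iconHeader struct {
	Type   [4]byte
	Length uint32
}

// readFixedPart decodes the 8 fixed bytes of an icon entry in one call.
func readFixedPart(raw []byte) (iconHeader, error) {
	var h iconHeader
	err := binary.Read(bytes.NewReader(raw), binary.BigEndian, &h)
	return h, err
}

// readWhole tries to decode the full IconData directly. binary.Read rejects
// it because a struct with a slice field has no fixed size.
func readWhole(raw []byte) error {
	var icon IconData
	return binary.Read(bytes.NewReader(raw), binary.BigEndian, &icon)
}

func main() {
	raw := []byte{'i', 'c', 'p', '4', 0, 0, 0, 10, 0xAA, 0xBB}

	h, err := readFixedPart(raw)
	fmt.Println(string(h.Type[:]), h.Length, err)

	fmt.Println("whole struct fails:", readWhole(raw) != nil)
}
```

This is why the fixed fields and the image bytes have to be read in separate steps.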

The last structure is a composition of the other types; it is the complete file structure, it has the Header and a list of IconData.

type AppleIcon struct {
    Header
    Icons []IconData // All Icons
}

Using encoding/binary

To help us read the bytes from the file, we use the encoding/binary package from the standard library.

The function binary.Read makes it easier to read bytes from an io.Reader with a particular byte order, in this case Big Endian. The data argument must be a pointer to a fixed-size value or a slice of fixed-size values.

You can read more in the documentation in the link at the end of this post.

// Function signature in encoding/binary
func Read(r io.Reader, order ByteOrder, data interface{}) error
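
To see why the byte order argument matters, here is a minimal sketch that decodes the same four bytes both ways. The helper name decodeBoth and the sample bytes are assumptions for illustration; binary.BigEndian.Uint32 and binary.LittleEndian.Uint32 are part of encoding/binary.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeBoth interprets the first four bytes of b as a uint32,
// once as Big Endian and once as Little Endian.
func decodeBoth(b []byte) (big, little uint32) {
	return binary.BigEndian.Uint32(b), binary.LittleEndian.Uint32(b)
}

func main() {
	raw := []byte{0x00, 0x00, 0x01, 0x00}
	big, little := decodeBoth(raw)
	fmt.Println(big, little) // 256 65536
}
```

Picking the wrong order for a Length field would give a wildly wrong size, so it has to match what the file format defines.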

Here are some examples of how to use the function binary.Read.

Note that these are only examples: the code that creates the reader is omitted, and Big Endian is used throughout.

    // Some usage examples, they are not related with our main example
    var singleByte byte
    err := binary.Read(reader, binary.BigEndian, &singleByte)

    var sixteenBytes [16]byte
    err = binary.Read(reader, binary.BigEndian, &sixteenBytes)

    var unsignedInteger uint32
    err = binary.Read(reader, binary.BigEndian, &unsignedInteger)
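
The snippets above can be put together into a small runnable program. This is only a sketch: the input bytes and the helper name readSamples are made up for illustration, and a 2-byte array stands in for the 16-byte one to keep the input short.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// readSamples reads a byte, a 2-byte array and a uint32, in that order,
// from the same reader. Each call consumes exactly the bytes it needs.
func readSamples(raw []byte) (single byte, pair [2]byte, number uint32, err error) {
	reader := bytes.NewReader(raw)

	if err = binary.Read(reader, binary.BigEndian, &single); err != nil {
		return
	}
	if err = binary.Read(reader, binary.BigEndian, &pair); err != nil {
		return
	}
	err = binary.Read(reader, binary.BigEndian, &number)
	return
}

func main() {
	raw := []byte{0x7F, 'G', 'o', 0x00, 0x00, 0x02, 0x00}

	single, pair, number, err := readSamples(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(single, string(pair[:]), number) // 127 Go 512
}
```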

Reading the File

Before decoding the bytes, we have to open the file and read its contents. In this example, I read the whole file at once.

// This icon is from OpenEmu app, you can get it inside the example repository
data, _ := ioutil.ReadFile("OpenEmu.icns") // NOTE 1.0

Note 1.0 Reading an entire file into memory is generally not recommended, but it has no real impact here because we don't intend to read large files.

To be brief, I am not checking errors here, but the complete example has some error checks.

Now we need to create a new Reader. We call the function bytes.NewReader, which returns a *bytes.Reader, a type that implements io.Reader.

io.Reader is an interface that hides from the caller how the bytes are read from a source, which can be the network, a file or something else.

reader := bytes.NewReader(data)

Decoding the File

To decode our file, we start by writing a function that holds all the necessary code. This function receives a Reader and returns two things: the AppleIcon structure filled with the data, and an error if there is one, otherwise nil.

func ReadAppleIcon(r *bytes.Reader) (*AppleIcon, error) {
    // Code Here...
}

Inside the function, we start by declaring a variable of our AppleIcon type that stores all the decoded data.

var icns AppleIcon

We must start by reading the Header data. We need to read linearly because the Reader keeps track of the position of the last byte read.

binary.Read(r, binary.BigEndian, &icns.Header)

After reading the Header, the Reader has advanced 8 bytes, which is the size of our Header. Now we can start reading each icon.
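
The way the Reader advances can be observed with the Len method of bytes.Reader, which returns the number of unread bytes. This is a small sketch with made-up input; remainingAfterHeader is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Header mirrors the 8-byte ICNS file header.
type Header struct {
	Magic  [4]byte
	Length uint32
}

// remainingAfterHeader reports how many unread bytes the reader holds
// before and after the header is decoded.
func remainingAfterHeader(raw []byte) (before, after int, err error) {
	r := bytes.NewReader(raw)
	before = r.Len()

	var h Header
	err = binary.Read(r, binary.BigEndian, &h)

	after = r.Len()
	return before, after, err
}

func main() {
	// 8 header bytes followed by 4 bytes of made-up payload.
	raw := []byte{'i', 'c', 'n', 's', 0, 0, 0, 12, 1, 2, 3, 4}

	before, after, err := remainingAfterHeader(raw)
	fmt.Println(before, after, err) // 12 4 <nil>
}
```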

Reading the icons is easy: as long as we don't get an EOF error, we keep reading in a loop the two fixed fields (Type and Length), and after that we read the variable-length data based on the Length field.

We store each icon inside an IconData variable and append it to AppleIcon.Icons.

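for {
    var icon IconData

    // A clean EOF at this point means there are no more icons to read
    err := binary.Read(r, binary.BigEndian, &icon.Type)
    if err != nil {
        if err == io.EOF {
            break
        }
        return nil, fmt.Errorf("error reading icons: %w", err)
    }

    if err := binary.Read(r, binary.BigEndian, &icon.Length); err != nil {
        return nil, fmt.Errorf("error reading icon length: %w", err)
    }

    // Length is unsigned, so a value below 8 would wrap around in the subtraction
    if icon.Length < 8 {
        return nil, fmt.Errorf("invalid icon length: %d", icon.Length)
    }

    icon.Data = make([]byte, icon.Length-8) // NOTE 1.1
    if err := binary.Read(r, binary.BigEndian, icon.Data); err != nil {
        return nil, fmt.Errorf("error reading icon data: %w", err)
    }

    icns.Icons = append(icns.Icons, icon)
}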

Note 1.1 Each icon has a Length, but this is the total length of the icon structure inside the file (the Type and Length fields plus the Data field). We need to subtract 8 bytes to get the size of Data.

Here is the complete code for this explanation. The example repository contains the complete example.
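
Because Length is a uint32, the subtraction deserves some care: a corrupted value below 8 does not become negative, it wraps around to a huge number. The sketch below shows the wrap-around and one possible guard. The helper dataLength, its remaining parameter and the two error values are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// iconHeaderSize is the combined size of the Type and Length fields.
const iconHeaderSize = 8

var (
	errBadLength = errors.New("icon length smaller than its own header")
	errTruncated = errors.New("icon length exceeds the remaining bytes")
)

// dataLength is a hypothetical helper: it validates the total length read
// from the file and returns the size of the Data field. remaining is the
// number of unread bytes, for example the result of (*bytes.Reader).Len.
func dataLength(total uint32, remaining int) (int, error) {
	if total < iconHeaderSize {
		return 0, errBadLength
	}
	n := total - iconHeaderSize
	if uint64(n) > uint64(remaining) {
		return 0, errTruncated
	}
	return int(n), nil
}

func main() {
	// Without a guard, the unsigned subtraction wraps around.
	var total uint32 = 4
	fmt.Println(total - 8) // 4294967292

	_, err := dataLength(4, 100)
	fmt.Println(err)

	n, err := dataLength(20, 100)
	fmt.Println(n, err) // 12 <nil>
}
```

Checking against the remaining bytes also avoids allocating a very large slice for a file that claims more data than it contains.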

package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "io"
    "io/ioutil"
)

type Header struct {
    Magic  [4]byte
    Length uint32
}

type IconData struct {
    Type   [4]byte
    Length uint32
    Data   []byte
}

type AppleIcon struct {
    Header
    Icons []IconData
}

// Print is a utility method that dumps information about the file
func (a *AppleIcon) Print() {
    fmt.Printf("Header Magic: %s\n", a.Header.Magic)
    fmt.Printf("Header Length: %d\n", a.Header.Length)
    fmt.Println("[Icons]")
    for i, icon := range a.Icons {
        fmt.Printf("%d - %s - Len: %d\n", i, icon.Type, icon.Length)
    }
}

// ReadAppleIcon uses the reader to read bytes into the AppleIcon structure
func ReadAppleIcon(r *bytes.Reader) (*AppleIcon, error) {
    var icns AppleIcon

    if err := binary.Read(r, binary.BigEndian, &icns.Header); err != nil {
        return nil, fmt.Errorf("error reading header: %w", err)
    }

    // We have to iterate until the end of the file
    for {
        var icon IconData
        err := binary.Read(r, binary.BigEndian, &icon.Type)

        // Check if there is another icon to read
        if err != nil {
            if err == io.EOF {
                break
            }
            // If it is an unexpected error, return it to the caller
            return nil, fmt.Errorf("error reading icons: %w", err)
        }

        if err := binary.Read(r, binary.BigEndian, &icon.Length); err != nil {
            return nil, fmt.Errorf("error reading icon length: %w", err)
        }

        // Length is unsigned, so a value below 8 would wrap around in the subtraction
        if icon.Length < 8 {
            return nil, fmt.Errorf("invalid icon length: %d", icon.Length)
        }

        // Read the image bytes directly into the icon
        icon.Data = make([]byte, icon.Length-8) // NOTE 1.1
        if err := binary.Read(r, binary.BigEndian, icon.Data); err != nil {
            return nil, fmt.Errorf("error reading icon data: %w", err)
        }

        icns.Icons = append(icns.Icons, icon)
    }

    return &icns, nil
}

func main() {

    // This icon is from OpenEmu app, you can get it inside the example repository
    data, err := ioutil.ReadFile("OpenEmu.icns")

    if err != nil {
        panic(err)
    }

    reader := bytes.NewReader(data)
    icon, err := ReadAppleIcon(reader)
    if err != nil {
        panic(err)
    }
    icon.Print()    // Dump the information
}
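
If you do not have an ICNS file at hand, a tiny one can be built in memory with binary.Write, the counterpart of binary.Read. The sketch below is an assumption for testing purposes only: the entry type and buildICNS are hypothetical, and the payload is three arbitrary bytes rather than a real image.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// entry is a hypothetical in-memory icon: a 4-byte type and its payload.
type entry struct {
	Type [4]byte
	Data []byte
}

// buildICNS serialises the entries using the layout described in this post:
// an 8-byte header followed by Type, Length and Data for each icon.
func buildICNS(entries []entry) ([]byte, error) {
	var body bytes.Buffer
	for _, e := range entries {
		body.Write(e.Type[:])
		// Length covers the Type and Length fields plus the data.
		if err := binary.Write(&body, binary.BigEndian, uint32(8+len(e.Data))); err != nil {
			return nil, err
		}
		body.Write(e.Data)
	}

	var out bytes.Buffer
	out.WriteString("icns")
	// The header Length is the size of the whole file.
	if err := binary.Write(&out, binary.BigEndian, uint32(8+body.Len())); err != nil {
		return nil, err
	}
	out.Write(body.Bytes())
	return out.Bytes(), nil
}

func main() {
	raw, err := buildICNS([]entry{
		{Type: [4]byte{'i', 'c', 'p', '4'}, Data: []byte{1, 2, 3}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", raw)
}
```

The resulting bytes can be wrapped in bytes.NewReader and passed to ReadAppleIcon in place of the contents of OpenEmu.icns.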

Some Resources

The End

I tried to show a basic example of how to use the package encoding/binary and the function Read. It is pretty simple, but I think that it is useful as a starting point. If you find something wrong, you can open an issue and comment in the example repository.

Thank you!

Note: English is not my main language, and I write to improve my English. If you find something wrong you can open an issue or send me a message on Twitter.