[Caffe] HDF5 Layer

I have struggled with HDF5 data layer when I wanted to have a vector label for each of my data. Below I will share some of my experience with this data layer that is very data-flexible but less straightforward to use.

Note that HDF5 data layer doesn’t support data transformation. This means that you have to either pre-process your data in the desired way before feeding them in, or add some additional processing layer such element-wise multiplication layer for data scaling.

Overall, HDF5 data layer requires a .h5 file and a .txt file. The .h5 file contains your data and label, while the .txt file specifies the path(s) to the .h5 file(s).

Following is an example of creating the .h5 file and its corresponding .txt file using python:

import h5py
import os
from __future__ import print_function

DIR = "/PATH TO xxx.h5/"
h5_fn = os.path.join(DIR, 'xxx.h5')

with h5py.File(h5_fn, 'w') as f:
   f['data'] = X

   f['label1'] = Y1

   f['label2'] = Y2

text_fn = os.path.join(DIR, 'xxx.txt')
with open(text_fn, 'w') as f:
   print(h5_fn, file = f)

Now you should have a .txt file and a .h5 file in your specified path.

The keys ‘data’, ‘label1’, ‘label2’ are keywords you defined for your data. You can have an arbitrary number of keywords, as long as you write the same keywords when you feed in your data to the hdf5 data layer. An example hdf5 data layer is like this:

layer {
   name: "example"
   type: "HDF5Data"
   top: "data"
   top: "label1"
   top: "label2"

   hdf5_data_param {
     source: "/PATH TO .txt file/"
     batch_size: 100

Notice that the top blobs have the same keywords as when I created the .h5 file.

That’s it! Now you can use hdf5 data layer 🙂


[CAFFE] Resume training from saved solverstate

It’s always good to record training states so that you can always get back to the state to resume training or to use the weights from that state’s caffe model. Caffe allows you to do that by specifying some parameters in the solver prototxt file:

# The maximum number of iterations
max_iter: 6000
# snapshot intermediate results
snapshot: 2000
snapshot_prefix: "/PATH to snapshot files/"

This will save a caffemodel file and a solverstate file per snapshot. The caffemodel file contains all the trained weights of your network architecture, while the solverstate file contains the information to be used for resuming training.

If you want to resume training from a state, write a bash file like this:

#!/usr/bin/env sh
$TOOLS/caffe train \
--solver=/PATH to solver.prototxt/ \
--snapshot=/PATH to .solverstate file/

Don’t forget to change the access permission to make the bash file executable:

chmod u+x {your bash file}

[CAFFE] Data Layer

Caffe has multiple input data types, and here I will address the use of .txt file as ‘image data layer’ and lmdb as ‘data layer’. The difference between reading from .txt file that includes image paths, and reading from lmdb file format is that the former reads directly from disk while the latter allocates memory on GPU and reads from there, thus it’s pretty obvious that lmdb allows for faster training. However lmdb requires a big chuck of GPU memory allocation and is not practical when the data is huge. So there is a tradeoff.

1. txt file

Notice that .txt file corresponds to a ‘ImageData’ layer, and the keyword ‘image_data_param’. Be careful when switching between using lmdb and .txt file for inputs, you have to change these keywords correspondingly.
Here is an example of what an ‘ImageData’ layer looks like in a training prototxt file.

  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  transform_param {
  image_data_param {
    source: "/SOURCE PATH/train.txt"
    root_folder: "/ROOT PATH/"
    shuffle: true

and an example of what a train.txt file looks like:

train/00000001.jpg 0
train/00000002.jpg 0
train/00000003.jpg 0

Several important points to notice:
1) Usually the path specified in the .txt file is a relative image path to the .txt file. If that’s the case, then you have to specify the root_folder field inside ‘image_data_param’ to be the root path relative to the caffe root folder. Most of the time SOURCE PATH is the same as ROOT PATH if you put your training image folder and the train.txt file in the same folder.
2) Default parameter says shuffle to be false, so you have to explicitly make it true if you want to shuffle your data.

2. lmdb file

lmdb stands for “Lightning Memory-Mapped Database”, a key/value storage engine.

Caffe offers tool to convert to lmdb file format from multiple data formats including .bin, np arrays and .txt file. The advantage of using lmdb is the speed of training, but the downside is the memory requirements for your GPU, and if it complains about ‘out of memory’ you may have to reduce your batch size.

Different from .txt file format, lmdb format corresponds to ‘Data’ layer, and the keyword ‘data_param’. Here is an example of what a ‘Data’ layer looks like in a training prototxt file.

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  transform_param {
  data_param {
    source: "PATH TO/lmdb"
    backend: LMDB

I’m not familiar with hdf5 data format, but it seems helpful if you are doing regression and have multiple labels as input.

[CAFFE] What files do you need to train your own network

The following list of files serves as an example to do your own training in Caffe.

  • train.sh

If you are using bash, you will be running this script to train your network. This tells where to look for the solver prototxt, and whether to restart training from an existing ‘.solverstate’ file. Note that ‘–snapshot’ is optional; it’s used when you want to restart training from an existing state.

TOOLS=./build/tools        // assume you are in the caffe directory
$TOOLS/caffe train \       // the verb "train" starts the training process
--solver=<PATH TO solver prototxt>
--snapshot=<PATH TO solver .solverstate file>
  • train.prototxt

This is the network architecture you define for your training.  You define all the layers you need, including data layer, convolutional layer, pooling, ReLU, etc. More examples of the prototxt file could be found in the caffe model zoo. (where trained network architecture is hosted)

The tricker component of train.prototxt is the data layer. I will have another post particularly talking about data layer later.

  • deploy.prototxt

The deploy prototxt is basically a duplicate of the train prototxt. This makes sense since you want your test data to be forwarded to the same network architecture. The only difference is that you have to replace the data layer in train.prototxt with a specification of the input data dimension.

Let’s say you had this data layer in your train.prototxt

layer {
  name: "..."
  type: "Data"
  top: "data"
  top: "label"
  include {
  transform_param {
  data_param {

You would want to replace the above layer with the following in your deploy.prototxt:

input: "data"
input_shape {
  dim: 1
  dim: 3 (If it's RGB color image)
  dim: Height
  dim: Width
  • data

Caffe supports different data types to be used for training. The simplest but slowest is to use a txt file with actual image path and label written in each line. But this has a latency for data fetching from the memory, and could significantly slow down your training process.

I’m more used to using lmdb files as data source. Caffe will allocate memory onto GPU and fetch data from there, thus a big speedup for training. But it’s less straightforward creating lmdb format data file, I may have a separate post on creating lmdb format data.

After you have your data ready, you specify its path in your train.prototxt inside the data layer.

  • solver.prototxt

This contains all the hyper-parameters you have for your training. An example is shown below

# The train/test net protocol buffer definition
net: "
# test_iter specifies how many forward passes the test should carry out.
# total_test_number = test_iter * batch_size
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 4000
# snapshot intermediate results
snapshot: 4000
snapshot_format: HDF5
snapshot_prefix: "PATH TO PREFIX LOCATION"
# solver mode: CPU or GPU
solver_mode: GPU
  • [optinal] solver2.prototxt

If you want to decay your learning rate after a certain amount of training iteration, you would specify another solver prototxt here with reduced learning rate. This file is optional and is only needed when you want to restart your training with a different set of hyper-parameters.


Basically, this is all you need to train your network with caffe. Happy brewing!