[CAFFE] Data Layer

Caffe has multiple input data types, and here I will address the use of a .txt file as an ‘ImageData’ layer and lmdb as a ‘Data’ layer. Both ultimately read from disk, but the difference is that an ImageData layer loads and decodes each listed image on the fly, while lmdb is a memory-mapped database that can store images in already-decoded form, so batches can be read sequentially with no per-image decoding; this is why lmdb usually allows for faster training. However, storing uncompressed images means the lmdb file takes a big chunk of disk space, which is not practical when the data is huge. So there is a tradeoff.

1. txt file

Notice that a .txt file source corresponds to an ‘ImageData’ layer and the keyword ‘image_data_param’. Be careful when switching between lmdb and .txt file inputs: you have to change these keywords correspondingly.
Here is an example of what an ‘ImageData’ layer looks like in a training prototxt file.

  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    ...
  }
  image_data_param {
    source: "/SOURCE PATH/train.txt"
    root_folder: "/ROOT PATH/"
    shuffle: true
  }
}

and an example of what a train.txt file looks like:

train/00000001.jpg 0
train/00000002.jpg 0
train/00000003.jpg 0
...

Several important points to notice:
1) Usually each path in the .txt file is an image path relative to some root directory. In that case you have to set the root_folder field inside ‘image_data_param’ to that directory, since Caffe prepends root_folder to every listed path (both source and root_folder are resolved relative to where you run caffe, typically the caffe root folder). Most of the time SOURCE PATH is the same as ROOT PATH, if you put your training image folder and the train.txt file in the same folder. A small sketch for generating such a file follows below.
2) The shuffle parameter defaults to false, so you have to explicitly set it to true if you want to shuffle your data.
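
For completeness, here is a minimal Python sketch that generates a train.txt in this format. The folder layout (one subfolder per class under train/) and the label assignment by folder index are assumptions for illustration, not anything Caffe requires.

import os

root = "train"                      # assumed layout: train/<class_name>/<image>.jpg
classes = sorted(os.listdir(root))  # one subfolder per class; label = folder index

with open("train.txt", "w") as f:
    for label, cls in enumerate(classes):
        for name in sorted(os.listdir(os.path.join(root, cls))):
            # one "relative_path label" pair per line, relative to root_folder
            f.write("%s/%s/%s %d\n" % (root, cls, name, label))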

2. lmdb file

lmdb stands for “Lightning Memory-Mapped Database”, a key/value storage engine.

Caffe offers tools to convert multiple data formats, including .bin files, NumPy arrays, and .txt image lists, into the lmdb format. The advantage of using lmdb is the speed of training; the downside is the disk space an uncompressed database takes. Note that if Caffe complains about ‘out of memory’ on the GPU, that comes from the batch size rather than the input format, and you may have to reduce your batch size.
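
Besides the convert_imageset binary that ships with Caffe, a common Python route is to serialize each sample as a Caffe Datum and write it with the lmdb package. Here is a minimal sketch with made-up dummy data; the database name and sizes are assumptions.

import lmdb
import numpy as np
import caffe

# dummy data for illustration: 10 random 3x64x64 uint8 images with labels 0/1
images = np.random.randint(0, 256, (10, 3, 64, 64), dtype=np.uint8)
labels = [i % 2 for i in range(10)]

env = lmdb.open("train_lmdb", map_size=1 << 40)  # map_size is an upper bound, not a real allocation
with env.begin(write=True) as txn:
    for i, (img, label) in enumerate(zip(images, labels)):
        # each sample is stored as a serialized Datum: CxHxW bytes plus an integer label
        datum = caffe.io.array_to_datum(img, int(label))
        txn.put("{:08d}".format(i).encode("ascii"), datum.SerializeToString())
env.close()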

Different from the .txt file format, the lmdb format corresponds to a ‘Data’ layer and the keyword ‘data_param’. Here is an example of what a ‘Data’ layer looks like in a training prototxt file.

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    ...
  }
  data_param {
    source: "PATH TO/lmdb"
    batch_size: 32   # number of samples per batch
    backend: LMDB
  }
}
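
To sanity-check an lmdb before training, you can read an entry back with the same libraries; again a sketch, assuming the database was written with the key format used above.

import lmdb
import caffe

env = lmdb.open("train_lmdb", readonly=True)
with env.begin() as txn:
    raw = txn.get(b"00000000")            # key format assumed from the writer sketch above
    datum = caffe.proto.caffe_pb2.Datum()
    datum.ParseFromString(raw)
    img = caffe.io.datum_to_array(datum)  # back to a CxHxW numpy array
    print(img.shape, datum.label)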

I’m not familiar with the HDF5 data format, but it seems helpful if you are doing regression and have multiple labels per input.
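
For reference, here is a minimal h5py sketch of such a file. The ‘HDF5Data’ layer looks up datasets by the names of its top blobs (here ‘data’ and ‘label’), and its source is a .txt file listing .h5 paths; the shapes and file names below are assumptions.

import h5py
import numpy as np

# dummy regression data: 100 samples, 3x32x32 inputs, 4 float targets each
data = np.random.rand(100, 3, 32, 32).astype(np.float32)
label = np.random.rand(100, 4).astype(np.float32)

with h5py.File("train.h5", "w") as f:
    f["data"] = data    # dataset names must match the layer's top blob names
    f["label"] = label

# the HDF5Data layer's source is a text file with one .h5 path per line
with open("train_h5_list.txt", "w") as f:
    f.write("train.h5\n")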

One thought on “[CAFFE] Data Layer”

  1. Hi Cecilia,
    I just wanted to comment on your first paragraph.
    Both lmdb and the ImageData layer read from disk and transfer the batch data to GPU memory. The difference is that lmdb does not require decoding of images (through OpenCV, for Caffe), since it can be made to store images in uncompressed format. When you are in a system where the CPU is the bottleneck, this is useful. However, if you are training relatively shallow networks on a very powerful GPU, the bottleneck can become the reading part itself: for example, if you can process 10 batches per second with an lmdb batch size of 50 MB, you would need an HDD read speed of 500 MB/s to keep up. On the other hand, if you compressed those images with JPEG, you could potentially get away with far less. There is one more detail: on non-SSD drives, IOPS are very limited. LMDB allows cleaner sequential reading of data, whereas reading hundreds of JPEGs in parallel may easily hit the IOPS limit.

    Correct me if I am wrong here and thank you for your post!
