I struggled with the HDF5 data layer when I wanted a vector label for each of my data samples. Below I will share some of my experience with this data layer, which is very flexible about the data it accepts but less straightforward to use.
Note that the HDF5 data layer doesn’t support data transformation. This means that you have to either pre-process your data in the desired way before feeding it in, or add an extra processing layer, such as an element-wise multiplication layer for data scaling.
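For the first option, here is a minimal sketch of pre-processing in Python before the data is written out. It assumes the inputs are uint8 images in an (N, C, H, W) layout and that scaling to [0, 1] plus a mean shift is what you want; the function name `preprocess` and the mean value 0.5 are made up for illustration.

```python
import numpy as np

def preprocess(images, mean=None):
    """Scale uint8 pixel values to [0, 1] and optionally subtract a mean."""
    x = images.astype(np.float32) / 255.0
    if mean is not None:
        x = x - np.float32(mean)
    return x.astype(np.float32)

# Hypothetical batch: 4 RGB images of 8x8 pixels, laid out as (N, C, H, W).
raw = np.random.randint(0, 256, size=(4, 3, 8, 8), dtype=np.uint8)
X = preprocess(raw, mean=0.5)
print(X.shape, X.dtype, X.min(), X.max())
```

The resulting float32 array is what you would then store under a key such as 'data'.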
Overall, the HDF5 data layer requires a .h5 file and a .txt file. The .h5 file contains your data and labels, while the .txt file lists the path(s) to the .h5 file(s).
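If you have several .h5 files, the list file can be sketched as below. This assumes the .txt file is plain text with one .h5 path per line; the helper name `write_h5_list` and the file names are hypothetical.

```python
import os
import tempfile

def write_h5_list(list_path, h5_paths):
    """Write one .h5 path per line to the text file at list_path."""
    with open(list_path, "w") as f:
        for path in h5_paths:
            f.write(path + "\n")

# Hypothetical example: two .h5 files listed in one .txt file.
work_dir = tempfile.mkdtemp()
h5_files = [os.path.join(work_dir, name) for name in ("part1.h5", "part2.h5")]
list_file = os.path.join(work_dir, "train.txt")
write_h5_list(list_file, h5_files)

# Read the list back to confirm what was written.
with open(list_file) as f:
    listed = f.read().splitlines()
print(listed)
```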
The following is an example of creating the .h5 file and its corresponding .txt file using Python:
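from __future__ import print_function  # must come before any other import

import os

import h5py

# X, Y1 and Y2 are your own arrays (the data and two sets of labels).
DIR = "/PATH TO xxx.h5/"
h5_fn = os.path.join(DIR, 'xxx.h5')

# Store each array under its own key in the .h5 file.
with h5py.File(h5_fn, 'w') as f:
    f['data'] = X
    f['label1'] = Y1
    f['label2'] = Y2

# Write the path of the .h5 file into the .txt file.
text_fn = os.path.join(DIR, 'xxx.txt')
with open(text_fn, 'w') as f:
    print(h5_fn, file=f)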
Now you should have a .txt file and a .h5 file in your specified path.
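Before training, it can help to open the .h5 file again and look at what is inside. The sketch below uses small random arrays as stand-ins for X, Y1 and Y2 (their shapes are made up), writes them to a temporary file, and prints the shape of each dataset. The final check reflects my understanding that every dataset should hold the same number of samples along its first dimension.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical toy data: 10 samples, a 5-dim feature and two vector labels.
X = np.random.rand(10, 5).astype(np.float32)
Y1 = np.random.rand(10, 3).astype(np.float32)
Y2 = np.random.rand(10, 2).astype(np.float32)

h5_fn = os.path.join(tempfile.mkdtemp(), "check.h5")
with h5py.File(h5_fn, "w") as f:
    f["data"] = X
    f["label1"] = Y1
    f["label2"] = Y2

# Read the file back and collect the shape of every dataset.
with h5py.File(h5_fn, "r") as f:
    shapes = {key: f[key].shape for key in f.keys()}

for key, shape in sorted(shapes.items()):
    print(key, shape)

# Every dataset should hold the same number of samples (first dimension).
num_samples = {shape[0] for shape in shapes.values()}
assert len(num_samples) == 1, "datasets disagree on the number of samples"
```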
The keys ‘data’, ‘label1’, and ‘label2’ are names you choose for your datasets. You can use any number of keys, as long as you use the same names for the top blobs when you feed your data to the HDF5 data layer. An example HDF5 data layer looks like this:
layer {
  name: "example"
  type: "HDF5Data"
  top: "data"
  top: "label1"
  top: "label2"
  hdf5_data_param {
    source: "/PATH TO .txt file/"
    batch_size: 100
  }
}
Notice that the top blob names match the keys I used when creating the .h5 file.
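A mismatch between these names is easy to miss, so here is a rough sketch of a check. It pulls the top names out of the layer text with a regular expression and compares them with the keys written to the .h5 file. The helper `top_names` is hypothetical, and the key list is typed by hand here; with h5py it could be read from the file instead.

```python
import re

def top_names(prototxt_text):
    """Return the top blob names found in a layer definition."""
    return re.findall(r'top:\s*"([^"]+)"', prototxt_text)

layer_text = '''
layer {
  name: "example"
  type: "HDF5Data"
  top: "data"
  top: "label1"
  top: "label2"
  hdf5_data_param {
    source: "/PATH TO .txt file/"
    batch_size: 100
  }
}
'''

# Keys used when writing the .h5 file (typed by hand for this sketch).
h5_keys = ["data", "label1", "label2"]

# Any top blob without a matching key would be reported here.
missing = [name for name in top_names(layer_text) if name not in h5_keys]
print("tops missing from the .h5 file:", missing)
```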
That’s it! Now you can use the HDF5 data layer 🙂