Dataset

The dataset is the first component you need to build a deep learning model. I recommend exploring your input data and targets first to get an idea of which model architecture would work best.

Datasets are built in a streaming, functional manner. Data needs to be in an iterable format so that the optimizer can draw minibatches from it. The opendeep.data.stream package provides simple functional streaming utilities for modifying data in real time. This functional approach is good for quick experimentation, but preprocessing your whole dataset ahead of time is faster because it avoids repeating these calculations on the fly.
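To illustrate the streaming idea, here is a minimal sketch in plain Python. It does not use the opendeep.data.stream API; the helper names map_stream and minibatches are hypothetical and only show how a transformation can be applied lazily to an iterable and how the result can be grouped into minibatches.

```python
from itertools import islice


def map_stream(func, stream):
    """Lazily apply func to each item of an iterable (hypothetical helper)."""
    for item in stream:
        yield func(item)


def minibatches(stream, batch_size):
    """Group an iterable into lists of at most batch_size items (hypothetical helper)."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for a real data source: ten integer "examples".
raw = range(10)

# The scaling runs only when items are pulled from the stream.
scaled = map_stream(lambda x: x / 10.0, raw)

# Collect the minibatches; the last one may be smaller than batch_size.
batches = list(minibatches(scaled, batch_size=4))
```

Because the transformation is recomputed every time the stream is consumed, each pass over the data pays for it again, which is why preprocessing once up front is faster for long training runs.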

opendeep.data

In this package you will find the classes to hold your datasets:

Dataset

The Dataset class is the superclass of all other built-in dataset types. You can use it to wrap your own iterable streams of data into something the optimizer can use. See the dataset module documentation for the initialization parameters and attributes.

In essence, the Dataset object has six attributes: train inputs, train targets, valid inputs, valid targets, test inputs, and test targets. Inputs are fed into the model, while targets (if applicable) are passed to the loss function. Train, valid, and test correspond to the training, validation, and test splits of the data.
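The six-attribute layout can be pictured with a small stand-in container. This is a sketch under assumptions, not OpenDeep's Dataset class: the SplitData name, its field names, and the iterate_split helper are hypothetical, chosen only to mirror the attributes listed above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SplitData:
    """Hypothetical container with six attributes; not the OpenDeep Dataset class."""
    train_inputs: Iterable
    train_targets: Optional[Iterable] = None
    valid_inputs: Optional[Iterable] = None
    valid_targets: Optional[Iterable] = None
    test_inputs: Optional[Iterable] = None
    test_targets: Optional[Iterable] = None


def iterate_split(data, split):
    """Yield (input, target) pairs for one split; target is None when absent."""
    inputs = getattr(data, split + "_inputs")
    targets = getattr(data, split + "_targets")
    if inputs is None:
        return
    if targets is None:
        for x in inputs:
            yield x, None
    else:
        yield from zip(inputs, targets)


# Supervised training split plus an unlabeled test split; no validation split.
data = SplitData(
    train_inputs=[[0, 1], [1, 0]],
    train_targets=[1, 0],
    test_inputs=[[1, 1]],
)

train_pairs = list(iterate_split(data, "train"))
valid_pairs = list(iterate_split(data, "valid"))
test_pairs = list(iterate_split(data, "test"))
```

In this sketch only the train inputs are required; the other five attributes default to None, which covers unsupervised data (no targets) and datasets without validation or test splits.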

Other wrappers

See the subclass implementations of the Dataset object to wrap text, images, or in-memory arrays. They are covered in the opendeep.data package documentation.