Provide a Dataset

This section describes how to provide a dataset to the dAIEdge-VLab Python API.

Format the dataset

The dataset must be formatted as follows in order to be accepted by the dAIEdge-VLab:

  • The data must be pre-processed, i.e. formatted exactly as they are expected to be presented to the model.
  • The data must be converted to bytes and concatenated into a single buffer.
  • The final file size is the model input size in bytes multiplied by the number of samples; the number of samples determines the number of inferences performed (see the sanity check below).
  • The resulting file is a .bin file.
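
As a sanity check, you can verify that the size of your .bin file is an exact multiple of the model input size. The following sketch assumes a hypothetical model input of 28 × 28 float32 values and a file named dataset.bin; adapt both to your own model:

import os
import numpy as np

# Hypothetical model input: 28 x 28 float32 values (4 bytes each)
input_size_bytes = 28 * 28 * np.dtype(np.float32).itemsize

# The file size must be a whole multiple of the input size
file_size = os.path.getsize("dataset.bin")
assert file_size % input_size_bytes == 0, "file size does not match the model input size"

print(f"{file_size // input_size_bytes} inferences will be performed")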

Simple example with MNIST

The following script converts a subset of the MNIST dataset into a pre-processed binary file:

import torchvision
import numpy as np

# Dataset name
dataset_name = "dataset_MNIST.bin"
# Select the number of images we want in the final dataset
nb_images = 5000

# Load MNIST dataset
mnist_dataset = torchvision.datasets.MNIST(
    root="./data", 
    train=True, 
    download=True, 
    transform=None
)

# Get images and process the data as needed 
images = mnist_dataset.data[:nb_images].numpy().astype(np.float32)  # Shape: (nb_images, 28, 28)
images /= 255.0

# Save the pre-processed dataset
images.tofile(dataset_name)

print(f"Saved {images.shape[0]} images in {dataset_name}")

This dataset is therefore suited to a model with input shape (28, 28):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    ...
])
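
To double-check that the binary layout matches the model, you can read one sample back from the file and run a single inference. This is a minimal sketch, assuming the model above has been completed with its remaining layers and that the file was generated by the MNIST script:

import numpy as np

# Read the first sample (28 * 28 float32 values) back from the file
sample = np.fromfile("dataset_MNIST.bin", dtype=np.float32, count=28 * 28)
sample = sample.reshape(1, 28, 28)  # Batch of one image

# Run one inference to validate shape and dtype
prediction = model(sample)
print(prediction)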

Use the dataset

There are two main ways to use a dataset for a benchmark with the dAIEdge-VLab:

  1. Upload your dataset to the dAIEdge-VLab
  2. Refer to a previously uploaded dataset

Uploading your dataset to the dAIEdge-VLab once and then referring to it by name gives faster benchmark responses and reduces traffic. However, this mechanism is not meant to keep your dataset for long periods of time.

⚠️
Warning: This is not a safe place to store your dataset. Datasets are automatically removed after a given period of time.

Upload & refer to your dataset

You can upload a dataset to the dAIEdge-VLab with the uploadDataset method. The dataset_name it returns is simply the file name you provided (here: dataset.bin). You can then use this dataset_name to refer to the dataset in a benchmark. The following example illustrates this:

from dAIEdgeVLabAPI import dAIEdgeVLabAPI

# Authenticate to the dAIEdge-VLab
api = dAIEdgeVLabAPI("./path/to/setup.yaml")

dataset_name = api.uploadDataset("./path/to/dataset.bin")

# Start a benchmark for a given target, runtime, model and a dataset
benchmark_id = api.startBenchmark(
    target = "rpi5", 
    runtime = "tflite", 
    model_path = "./path/to/my_model.tflite",
    dataset = dataset_name
    )

# Blocking method - wait for the results
result = api.waitBenchmarkResult(benchmark_id)

# Use the result
print(result["report"])
print(result["user_log"])
print(result["error_log"])
print(result["raw_output"])

In the next example, we open dataset.bin and pass the file object directly to the startBenchmark method, which automatically uploads the dataset to the dAIEdge-VLab and uses it for the benchmark.

from dAIEdgeVLabAPI import dAIEdgeVLabAPI

# Authenticate to the dAIEdge-VLab
api = dAIEdgeVLabAPI("./path/to/setup.yaml")

# Open the dataset and start a benchmark for a given target, runtime and model
with open("./path/to/dataset.bin", "rb") as dataset:
    benchmark_id = api.startBenchmark(
        target = "rpi5", 
        runtime = "tflite", 
        model_path = "./path/to/my_model.tflite",
        dataset = dataset
    )

# Blocking method - wait for the results
result = api.waitBenchmarkResult(benchmark_id)

# Use the result
print(result["report"])
print(result["user_log"])
print(result["error_log"])
print(result["raw_output"])

ℹ️
Info: If you upload the dataset via the startBenchmark method, it can still be referred to later using its name.

⚠️
Warning: If you upload two datasets with the same name, the previous one is overwritten by the new one.
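
For example, the dataset uploaded in the example above can be referenced by name in a later benchmark without uploading it again. A minimal sketch, reusing the api object from above and assuming the uploaded file was named dataset.bin:

# Start a new benchmark that refers to the previously uploaded dataset by name
benchmark_id = api.startBenchmark(
    target = "rpi5", 
    runtime = "tflite", 
    model_path = "./path/to/my_model.tflite",
    dataset = "dataset.bin"
    )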

Manage your datasets

Since you can upload datasets to the dAIEdge-VLab and reference them in your benchmarks, they are stored in the dAIEdge-VLab facilities. You can list the datasets you have uploaded that are still available. The dAIEdge-VLab manages the datasets and deletes them automatically after a given period of time.

List the available datasets

You can get the list of available datasets associated with your account with the getDatasets method:

from dAIEdgeVLabAPI import dAIEdgeVLabAPI

# Authenticate to the dAIEdge-VLab
api = dAIEdgeVLabAPI("./path/to/setup.yaml")

# Get the currently available datasets
datasets = api.getDatasets()
# Use the available dataset list
for dataset in datasets: 
    print("dataset_name", dataset)

Delete a Dataset

It is appreciated if you delete your datasets once you no longer need them. This reduces the load on the dAIEdge-VLab facilities.

from dAIEdgeVLabAPI import dAIEdgeVLabAPI

# Authenticate to the dAIEdge-VLab
api = dAIEdgeVLabAPI("./path/to/setup.yaml")
# Dataset reference to delete
dataset_name = "dataset.bin"
# Delete the dataset
api.deleteDataset(dataset_name)
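
If you want to clean up everything at once, you can combine getDatasets and deleteDataset. A minimal sketch:

from dAIEdgeVLabAPI import dAIEdgeVLabAPI

# Authenticate to the dAIEdge-VLab
api = dAIEdgeVLabAPI("./path/to/setup.yaml")

# Delete every dataset currently associated with the account
for dataset_name in api.getDatasets():
    print("Deleting", dataset_name)
    api.deleteDataset(dataset_name)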