Provide a Dataset
This section describes how to provide a dataset to the dAIEdgeVLab Python API.
Format the dataset
The dataset must be formatted as follows to be accepted by the dAIEdgeVLab:
- The data must be pre-processed, that is, formatted exactly as they will be presented to the model.
- The data must be converted to bytes and concatenated.
- The final file contains (model input size in bytes) * (number of samples) bytes. The number of samples determines the number of inferences performed.
- The resulting file is a .bin file.
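The layout described above can be sketched with the standard library alone. The example below is a minimal illustration, assuming a model that takes float32 inputs; the helper name write_dataset and the sample values are hypothetical and not part of the dAIEdgeVLab API.

```python
import os
import tempfile
from array import array


def write_dataset(samples, path):
    """Concatenate pre-processed samples into one raw binary file.

    Each sample is a flat list of floats, stored as 32-bit floats with no
    header or separator between samples (assumed float32 model input).
    """
    with open(path, "wb") as f:
        for sample in samples:
            array("f", sample).tofile(f)


# Three hypothetical samples for a model with 4 input values each
samples = [[0.0, 0.25, 0.5, 1.0]] * 3
path = os.path.join(tempfile.mkdtemp(), "dataset.bin")
write_dataset(samples, path)

# File size = model input size in bytes * number of samples
input_size_bytes = 4 * array("f").itemsize
file_size = os.path.getsize(path)
nb_samples = file_size // input_size_bytes  # number of inferences
print(file_size, nb_samples)
```

The file carries no metadata, so the number of inferences can only be recovered by dividing the file size by the model input size.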
Simple example with MNIST
The following script converts a subset of the MNIST dataset into a pre-processed binary file:
import numpy as np
import torchvision
# Dataset name
dataset_name = "dataset_MNIST.bin"
# Select the number of images we want in the final dataset
nb_images = 5000
# Load MNIST dataset
mnist_dataset = torchvision.datasets.MNIST(
    root="./data",
    train=True,
    download=True,
    transform=None
)
# Get images and process the data as needed
images = mnist_dataset.data[:nb_images].numpy().astype(np.float32) # Shape: (nb_images, 28, 28)
images /= 255.0
# Save the pre-processed dataset
images.tofile(dataset_name)
print(f"Saved {images.shape[0]} images in {dataset_name}")
This dataset is therefore suited to a model with input shape (28, 28):
import tensorflow as tf
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    ...
])
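Before uploading, it can be useful to check that the file size matches the model input. The sketch below assumes float32 inputs (4 bytes per value); count_samples is a hypothetical helper, not part of the dAIEdgeVLab API.

```python
import math
import os


def count_samples(path, input_shape, bytes_per_value=4):
    """Return how many samples the file holds for the given input shape.

    Raises ValueError if the file size is not a whole multiple of the
    model input size in bytes.
    """
    input_size_bytes = math.prod(input_shape) * bytes_per_value
    file_size = os.path.getsize(path)
    if file_size % input_size_bytes != 0:
        raise ValueError(
            f"{path}: {file_size} bytes is not a multiple of "
            f"{input_size_bytes} bytes per sample"
        )
    return file_size // input_size_bytes
```

For the MNIST file produced by the script above, count_samples("dataset_MNIST.bin", (28, 28)) should return 5000, since the file holds 5000 * 28 * 28 * 4 = 15,680,000 bytes.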
Use the dataset
There are two main ways to use a dataset for a benchmark with the dAIEdgeVLab:
- Upload your dataset to the dAIEdgeVLab
- Refer to a previously uploaded dataset
Uploading your dataset to the dAIEdge-VLab and referring to it by name gives faster benchmark responses and reduces the amount of traffic. This storage is not meant to keep your dataset for a long period of time.
Upload & refer your dataset
You can upload a dataset to the dAIEdge-VLab with the method uploadDataset. The dataset_name is simply the file name you provided (here: dataset.bin). You can then use this dataset_name to refer to the dataset in a benchmark. The following example illustrates this:
from dAIEdgeVLabAPI import dAIEdgeVLabAPI
# Authenticate to the dAIEdge-VLab
api = dAIEdgeVLabAPI("./path/to/setup.yaml")
dataset_name = api.uploadDataset("./path/to/dataset.bin")
# Start a benchmark for a given target, runtime, model and a dataset
benchmark_id = api.startBenchmark(
    target = "rpi5",
    runtime = "tflite",
    model_path = "./path/to/my_model.tflite",
    dataset = dataset_name
)
# Blocking method - wait for the results
result = api.waitBenchmarkResult(benchmark_id)
# Use the result
print(result["report"])
print(result["user_log"])
print(result["error_log"])
print(result["raw_output"])
In the next example, we open the dataset.bin file and provide it to the startBenchmark method. The method will automatically upload the dataset to the dAIEdge-VLab and use it to start the benchmark.
from dAIEdgeVLabAPI import dAIEdgeVLabAPI
# Authenticate to the dAIEdge-VLab
api = dAIEdgeVLabAPI("./path/to/setup.yaml")
dataset = open("./path/to/dataset.bin", "rb")
# Start a benchmark for a given target, runtime, model and a dataset
benchmark_id = api.startBenchmark(
    target = "rpi5",
    runtime = "tflite",
    model_path = "./path/to/my_model.tflite",
    dataset = dataset
)
# Blocking method - wait for the results
result = api.waitBenchmarkResult(benchmark_id)
# Use the result
print(result["report"])
print(result["user_log"])
print(result["error_log"])
print(result["raw_output"])
When a dataset is uploaded through the startBenchmark method, the dataset can still be referred to later using its name.
Manage your datasets
Because you can upload datasets to the dAIEdge-VLab and reference them to run your benchmarks, they are stored in the dAIEdge-VLab facilities. You can list the datasets that you uploaded. The dAIEdge-VLab manages the datasets and deletes them automatically after a given period of time.
List the available datasets
You can get the list of available datasets associated with your account with the method getDatasets:
from dAIEdgeVLabAPI import dAIEdgeVLabAPI
# Authenticate to the dAIEdge-VLab
api = dAIEdgeVLabAPI("./path/to/setup.yaml")
# Get the currently available datasets
datasets = api.getDatasets()
# Use the available dataset list
for dataset in datasets:
    print("dataset_name", dataset)
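To avoid uploading the same file twice, the listing can be checked first. The sketch below is a hypothetical helper built only on getDatasets and uploadDataset; it assumes that getDatasets returns an iterable of dataset names and that an uploaded dataset is named after its file, as described on this page.

```python
import os


def ensure_dataset(api, path):
    """Upload the dataset at `path` only if its name is not already listed.

    `api` is an authenticated dAIEdgeVLabAPI instance. Assumes
    getDatasets() yields dataset names and that a dataset is named
    after its file (both are assumptions drawn from this page).
    """
    dataset_name = os.path.basename(path)
    if dataset_name in list(api.getDatasets()):
        return dataset_name  # already available, no upload needed
    return api.uploadDataset(path)
```

The returned name can then be passed as the dataset argument of startBenchmark.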
Delete a Dataset
Please delete your datasets once you no longer need them. This reduces the load on the dAIEdgeVLab facilities.
from dAIEdgeVLabAPI import dAIEdgeVLabAPI
# Authenticate to the dAIEdge-VLab
api = dAIEdgeVLabAPI("./path/to/setup.yaml")
# Dataset reference to delete
dataset_name = "dataset.bin"
# Delete the dataset
api.deleteDataset(dataset_name)
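Deletion can also be tied to the benchmark itself so that a dataset never outlives its use. The following is a minimal sketch, assuming the methods behave as shown in the examples on this page; run_and_cleanup is a hypothetical helper name.

```python
def run_and_cleanup(api, target, runtime, model_path, dataset_path):
    """Upload a dataset, run one benchmark, then delete the dataset.

    `api` is an authenticated dAIEdgeVLabAPI instance. The dataset is
    deleted even if the benchmark raises an exception.
    """
    dataset_name = api.uploadDataset(dataset_path)
    try:
        benchmark_id = api.startBenchmark(
            target=target,
            runtime=runtime,
            model_path=model_path,
            dataset=dataset_name,
        )
        # Blocking call - wait for the results
        return api.waitBenchmarkResult(benchmark_id)
    finally:
        api.deleteDataset(dataset_name)
```

This pattern suits one-off benchmarks; when several benchmarks share a dataset, deleting it once after the last run avoids repeated uploads.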