This video presents ongoing work under IoBT Task 3.2 on methods for optimally partitioning deep neural network models for execution over heterogeneous IoBT networks, together with a demonstration platform that implements these methods on low-power IoT hardware.
Our approach partitions the network into a prefix, executed on the edge device where data are collected, and a suffix, executed on a server. We optimize the partition point to trade off latency, accuracy, and energy use while respecting the storage, computation, and communication-bandwidth constraints of the edge device.
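The prefix/suffix split described above can be illustrated with a minimal sketch. It is not the optimization method used in this work: it assumes a purely sequential network, considers latency only (accuracy and energy are left out), enforces only a storage limit on the edge device, and searches every split point exhaustively. The `Layer` fields, the function name `best_split`, and all numbers in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Layer:
    """Hypothetical per-layer profile; all fields are illustrative assumptions."""
    name: str
    edge_ms: float     # time to run this layer on the edge device
    server_ms: float   # time to run this layer on the server
    param_bytes: int   # storage needed for this layer's weights
    out_bytes: int     # size of this layer's output activation


def best_split(
    layers: List[Layer],
    input_bytes: int,
    bandwidth_bytes_per_s: float,
    edge_storage_bytes: int,
) -> Optional[Tuple[int, float]]:
    """Return (k, latency_ms) where layers[:k] run on the edge and layers[k:]
    on the server, minimizing end-to-end latency under the storage limit.
    Returns None if no split is feasible."""
    best: Optional[Tuple[int, float]] = None
    for k in range(len(layers) + 1):
        prefix, suffix = layers[:k], layers[k:]
        # Storage constraint: the prefix weights must fit on the edge device.
        if sum(layer.param_bytes for layer in prefix) > edge_storage_bytes:
            continue
        # What crosses the link: the raw input if k == 0, otherwise the
        # activation produced by the last edge-side layer.
        tx_bytes = input_bytes if k == 0 else layers[k - 1].out_bytes
        latency_ms = (
            sum(layer.edge_ms for layer in prefix)
            + 1000.0 * tx_bytes / bandwidth_bytes_per_s
            + sum(layer.server_ms for layer in suffix)
        )
        if best is None or latency_ms < best[1]:
            best = (k, latency_ms)
    return best


# Made-up profile for a four-layer network fed by a 320x240 8-bit image.
demo_layers = [
    Layer("conv1", edge_ms=50.0, server_ms=1.0, param_bytes=10_000, out_bytes=40_000),
    Layer("conv2", edge_ms=80.0, server_ms=2.0, param_bytes=50_000, out_bytes=10_000),
    Layer("conv3", edge_ms=200.0, server_ms=3.0, param_bytes=200_000, out_bytes=2_000),
    Layer("head", edge_ms=300.0, server_ms=4.0, param_bytes=400_000, out_bytes=100),
]
demo_result = best_split(
    demo_layers,
    input_bytes=320 * 240,
    bandwidth_bytes_per_s=100_000.0,
    edge_storage_bytes=300_000,
)
```

With these made-up numbers the search places the first two layers on the edge device: running further layers locally costs more compute time than it saves in transmission, and the full network does not fit in the assumed edge storage.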
Our demonstration system implements our partitioning approach on an IoT node that combines a low-power grayscale QVGA imager with the GAP8, a commercially available low-power neural network accelerator platform.
We train the demonstration system to perform a 20-class object detection task and achieve a throughput of 1 FPS while drawing 70 mA on the IoT node.
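To put the reported figures in perspective, the short calculation below converts them into charge drawn per frame and an estimated runtime. It assumes the 70 mA figure is the node's average current draw; the 1000 mAh battery capacity is a hypothetical value chosen for illustration, and supply voltage and conversion losses are ignored.

```python
def charge_per_frame_mc(avg_current_ma: float, fps: float) -> float:
    """Charge drawn per processed frame in millicoulombs (mA * s)."""
    return avg_current_ma / fps


def runtime_hours(battery_mah: float, avg_current_ma: float) -> float:
    """Idealized runtime on a battery of the given capacity."""
    return battery_mah / avg_current_ma


# Figures from the text: 70 mA at 1 FPS.
per_frame_mc = charge_per_frame_mc(avg_current_ma=70.0, fps=1.0)

# Hypothetical 1000 mAh battery (an assumption, not from the source).
hours = runtime_hours(battery_mah=1000.0, avg_current_ma=70.0)
```

Under these assumptions each frame costs 70 mC of charge, and the hypothetical battery would last a little over 14 hours of continuous operation.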