Convolutional neural networks (CNNs) have achieved great success in challenging tasks such as image classification and object detection. The main obstacles to deploying CNNs on different devices are their large memory footprint, power consumption, and computational complexity. In these networks, hundreds of filters and channels must be processed in high-dimensional convolutions, and these computations cause a significant amount of data movement. Prior efforts have therefore aimed to design accelerators and to find dataflows that support parallel processing with minimal data movement cost, in order to achieve fast and energy-efficient CNN implementations.
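To give a rough sense of the scale involved, the following minimal sketch counts the multiply-accumulate (MAC) operations in a single convolutional layer and compares the number of distinct data values with the number of operand accesses a layer would need if nothing were reused. The `ConvLayer` class, its field names, the example layer shape, and the "three reads and one write per MAC" cost model are illustrative assumptions, not figures taken from any particular network or accelerator.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Shape of one convolutional layer (hypothetical helper, no padding)."""
    in_h: int       # input feature map height
    in_w: int       # input feature map width
    channels: int   # number of input channels
    filters: int    # number of filters (output channels)
    k_h: int        # filter height
    k_w: int        # filter width
    stride: int = 1

    @property
    def out_h(self) -> int:
        return (self.in_h - self.k_h) // self.stride + 1

    @property
    def out_w(self) -> int:
        return (self.in_w - self.k_w) // self.stride + 1


def count_macs(layer: ConvLayer) -> int:
    """One MAC per (output pixel, filter, channel, filter tap)."""
    return (layer.out_h * layer.out_w * layer.filters
            * layer.channels * layer.k_h * layer.k_w)


def unique_values(layer: ConvLayer) -> int:
    """Distinct inputs + weights + outputs that exist for the layer."""
    inputs = layer.in_h * layer.in_w * layer.channels
    weights = layer.filters * layer.channels * layer.k_h * layer.k_w
    outputs = layer.out_h * layer.out_w * layer.filters
    return inputs + weights + outputs


def accesses_without_reuse(layer: ConvLayer) -> int:
    """Assumed cost model: each MAC reads an input, a weight and a
    partial sum, and writes the updated partial sum back."""
    return 4 * count_macs(layer)


if __name__ == "__main__":
    # Illustrative layer shape, chosen only for the example.
    layer = ConvLayer(in_h=56, in_w=56, channels=128, filters=256, k_h=3, k_w=3)
    print("MACs:", count_macs(layer))
    print("unique values:", unique_values(layer))
    print("accesses without reuse:", accesses_without_reuse(layer))
```

The gap between the last two numbers is the reuse opportunity that a dataflow tries to exploit by keeping data close to the compute units.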
The CNN hardware accelerator that we consider in this project consists of several processing elements (PEs), and each proposed dataflow organizes data across these PEs in a different way. In this project, we want to visualize the dataflows that CNN hardware accelerators use. The visualization will show which part of the data each PE is processing and in what sequence the computation proceeds.
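As a sketch of what such a visualization could record, the example below traces a 1-D convolution on a small row of PEs and prints, for every cycle, which input and weight each PE is using. The mapping is an assumption made for illustration: an output-stationary-style schedule in which each PE accumulates one output while a single weight is broadcast per cycle. The function names `output_stationary_trace` and `render_trace` are hypothetical and do not correspond to any specific accelerator or to the dataflows studied in the project.

```python
def output_stationary_trace(inputs, weights, num_pes):
    """Run a 1-D convolution on `num_pes` PEs and log every MAC.

    Assumed mapping: PE p in a pass owns output y[base + p]; in each
    cycle all active PEs use the same weight w[k] with different inputs.
    Returns (outputs, trace), where each trace entry is
    (cycle, pe, input_index, weight_index, output_index).
    """
    n_out = len(inputs) - len(weights) + 1
    outputs = [0] * n_out
    trace = []
    cycle = 0
    for base in range(0, n_out, num_pes):       # one tile of outputs per pass
        for k, w in enumerate(weights):         # one weight broadcast per cycle
            for pe in range(num_pes):
                out_idx = base + pe
                if out_idx >= n_out:
                    continue                    # this PE is idle in the last tile
                in_idx = out_idx + k
                outputs[out_idx] += inputs[in_idx] * w
                trace.append((cycle, pe, in_idx, k, out_idx))
            cycle += 1
    return outputs, trace


def render_trace(trace, num_pes):
    """Format the trace as one text line per cycle, one column per PE."""
    by_cycle = {}
    for cycle, pe, in_idx, k, out_idx in trace:
        by_cycle.setdefault(cycle, {})[pe] = f"x[{in_idx}]*w[{k}]->y[{out_idx}]"
    lines = []
    for cycle in sorted(by_cycle):
        cells = [by_cycle[cycle].get(pe, "idle") for pe in range(num_pes)]
        lines.append(f"cycle {cycle}: " + " | ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    outputs, trace = output_stationary_trace([1, 2, 3, 4, 5], [1, 0, -1], num_pes=2)
    print(render_trace(trace, num_pes=2))
    print("outputs:", outputs)
```

A different dataflow would change only the loop order and the PE-to-data assignment inside `output_stationary_trace`, so the same trace format could be reused to compare schedules side by side.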