Image classification with artificial neural networks has gained considerable attention in recent years. Participants in classification competitions such as ImageNet rely almost exclusively on this technology, as recent winners like AlexNet and ResNet demonstrate. While the classification accuracy of these networks can even surpass human performance, the implementations have grown into very deep networks, optimized mainly for classification accuracy rather than for processing speed or for settings with limited computational resources.
As artificial neural networks become integrated into modern life, for example in smart assistants, smartphones, and self-driving cars, the need to reduce the processing power required to compute these networks is growing.
One technique to reduce a network's computational complexity is the binarization of weights and activations, as shown by Courbariaux et al. With weights and activations restricted to the values -1 and 1, applying weights to neuron activations can be simplified by replacing floating-point multiplications with the binary XNOR operation.
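The multiplication-to-XNOR correspondence can be sketched as follows. This is an illustrative example, not code from this work: values in {-1, +1} are encoded as bits {0, 1}, so the product of two values is +1 exactly when their bit encodings agree, which is the XNOR of the bits; the sum over a vector then reduces to a population count. The helper names (`binarize`, `xnor_dot`) are hypothetical.

```python
def binarize(values):
    """Encode a list of -1/+1 values as an integer bit mask (+1 -> bit 1)."""
    mask = 0
    for i, v in enumerate(values):
        if v == 1:
            mask |= 1 << i
    return mask

def xnor_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1,+1} vectors given as bit masks."""
    # XNOR: bits agree -> 1 (product +1); bits differ -> 0 (product -1).
    agree = ~(a_bits ^ b_bits) & ((1 << n) - 1)
    pop = bin(agree).count("1")          # number of +1 products
    return 2 * pop - n                   # (+1 count) minus (-1 count)

a = [1, -1, 1, 1]
w = [1, 1, -1, 1]
# Floating-point reference: 1*1 + (-1)*1 + 1*(-1) + 1*1 = 0
print(xnor_dot(binarize(a), binarize(w), 4))  # -> 0
```

On hardware, the `bin(...).count("1")` step corresponds to a single popcount instruction (or a compact logic tree on an FPGA), which is where the speed and resource advantage over floating-point multiply-accumulate comes from.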
This work refines the theory of binarized neural networks by providing a distributed implementation on an ARM CPU and an FPGA using the OpenCL framework. While the FPGA implementation can directly benefit from the binary logic of this new network type, the ARM core provides easy access to the connected peripherals that supply the image sources.
New concepts for training binarized networks are developed within this thesis, leading to an enormous reduction in training time and, therefore, in real-world production costs.
This implementation executes faster than conventional methods on the same hardware, while using fewer resources and without sacrificing classification accuracy.
Keywords (English):
Neural Network, Deep Learning, Embedded Systems, OpenCL, Image Classification