Accurate Deep Neural Networks for Processing Compressed Images in Smart Cities
In smart cities, cameras are placed throughout the urban area to capture footage from many locations. These images and videos are transmitted to servers, where AI models (usually deep neural networks) process them to provide analytics that assist in various operations, e.g., counting people, detecting anomalous events, and many other automated tasks. Because modern cameras capture visual streams at very high resolutions, the captured footage is massive. Transmitting raw footage would take a long time and could cause congestion in the data communication networks, so the captured images and videos are typically compressed before transmission. However, just as human eyes may have difficulty detecting features in lower-quality images, neural networks have a harder time producing accurate results when processing compressed images. Researchers at Aarhus University found that, for the crowd counting task, among the compression techniques readily available on cameras, the JPEG compression algorithm provides the best trade-off between size reduction and neural network accuracy [1].
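To give a rough sense of why raw footage is impractical to transmit, the short sketch below computes the uncompressed bitrate of a single camera stream. The resolution, frame rate, and pixel format are illustrative assumptions, not figures from the studies cited here.

```python
def raw_bitrate_bits_per_second(width, height, fps, bytes_per_pixel=3):
    """Bits per second needed to send an uncompressed video stream."""
    return width * height * bytes_per_pixel * 8 * fps


# Hypothetical camera: 4K UHD (3840x2160), 30 frames per second, 8-bit RGB.
bps = raw_bitrate_bits_per_second(3840, 2160, 30)
print(f"{bps / 1e9:.2f} Gbit/s uncompressed")
```

Under these assumptions a single camera would need close to 6 Gbit/s, which is why compression before transmission is the norm.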
To further improve the accuracy of neural networks operating on compressed images, Aarhus University researchers developed a novel training approach called Curriculum Pre-Training [2]. The method is inspired by how students learn new subjects in classrooms: they are first introduced to the easiest concepts and gradually move on to more difficult and complex ones. In curriculum pre-training, the neural network first learns to count people on uncompressed raw images, and over the course of training the images are compressed more and more. The intuition is that the more compressed an image is, the harder it is to count the people in it. Details in the original image (e.g., Fig. 1, left) may not be adequately preserved in the compressed image (Fig. 1, right), and the higher the compression rate, the more details are discarded, which makes it difficult to tell whether a location in the compressed image corresponds to a human or to another object with a similar appearance. Using the method introduced by the AU researchers, one can improve the accuracy of deep neural networks that perform crowd counting on compressed images by up to 20%.
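The idea of moving from easy to hard inputs can be sketched as a schedule that maps each training epoch to a compression level. This is only an illustration under stated assumptions: the function name, the list of levels, and the even split of epochs across levels are hypothetical and are not taken from [2], which may use a different schedule.

```python
def compression_level_for_epoch(epoch, total_epochs, levels):
    """Pick the compression level to train on at a given epoch.

    `levels` is ordered from easiest to hardest, and the epochs are
    divided evenly among the levels (an assumed, simple schedule).
    """
    if not levels or total_epochs < 1:
        raise ValueError("need at least one level and one epoch")
    # Clamp so out-of-range epochs fall on the first or last level.
    epoch = min(max(epoch, 0), total_epochs - 1)
    index = epoch * len(levels) // total_epochs
    return levels[index]


# Hypothetical curriculum: None means raw (uncompressed) images; the numbers
# are JPEG quality settings, where lower quality means heavier compression.
LEVELS = [None, 90, 70, 50, 30, 10]

schedule = [compression_level_for_epoch(e, 12, LEVELS) for e in range(12)]
print(schedule)
```

In a real training loop, each image in an epoch would be re-encoded at the selected level before being passed to the network, for example with Pillow's `Image.save(..., format="JPEG", quality=q)`, so that the model sees progressively harder inputs as training advances.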
References
[1] Bakhtiarnia, Arian, et al., “Analysis of the Effect of Low-Overhead Lossy Image Compression on the Performance of Visual Crowd Counting for Smart City Applications,” IEEE International Smart Cities Conference (ISC2), 2022.
[2] Bakhtiarnia, Arian, Qi Zhang, and Alexandros Iosifidis, “Crowd Counting on Heavily Compressed Images with Curriculum Pre-Training,” arXiv preprint arXiv:2208.07075, 2022.
[3] Zhang, Yingying, et al., “Single-image crowd counting via multi-column convolutional neural network,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
Blog signed by: AU team
Funding
This project has received funding from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No 957337. The website reflects only the view of the author(s) and the Commission is not responsible for any use that may be made of the information it contains.