Abstract
Targeting multimedia systems under high throughput, resource and power constraints, this book discusses efficient software-/application-level techniques and hardware-/architectural-level designs for multimedia (specifically video) systems. The main aim of the techniques discussed in this book is to maximize the system’s throughput-per-watt while accounting for modern design challenges and methodologies. The challenges addressed include parallelization of multimedia applications on possibly heterogeneous systems, load balancing on many-core and customized nodes, resource (number of cores and power) budgeting, and efficient design of the multimedia system’s memory architecture. From a broader perspective, these problems can collectively be seen as representing the power wall or dark silicon challenge for next-generation video processing systems.
The authors would like to point out that this work was carried out while all the authors were with the Department of Computer Science, Karlsruhe Institute of Technology, Karlsruhe, Germany.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Khan, M.U.K., Shafique, M., Henkel, J. (2018). Conclusion and Future Outlook. In: Energy Efficient Embedded Video Processing Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-61455-7_7
DOI: https://doi.org/10.1007/978-3-319-61455-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61454-0
Online ISBN: 978-3-319-61455-7
eBook Packages: Engineering, Engineering (R0)