Abstract
Assistive technology (AT) is invaluable to people with special educational needs and disabilities, enabling them to interact with computers more efficiently and improving human–computer interactivity for learning purposes. However, AT users still face a lack of accessibility and interactivity with touchscreen devices and visual content. Addressing these issues is critical to providing much-needed assistance and enhancing the daily lives of disabled individuals. We therefore propose an AT framework based on neural networks running on embedded systems to improve accessibility and interactivity for AT users. The proposed framework runs on the Jetson Nano and primarily uses a speech-to-intent neural network to process speech and move cursors. Augmented with cursor object detection, the framework can locate cursors on external displays and move the cursors of other devices. Since cursor datasets are very limited and few detection models are suited to the task, we investigated the Slicing Aided Hyper Inference (SAHI) pipeline together with two fine-tuned models, Fully Convolutional One-Stage object detection (FCOS) and Task-aligned One-stage Object Detection (TOOD), to identify the minimum data required for these models to work optimally. With fewer than 120 annotated images and data multipliers of 5 and 30, the two models achieved approximately 52 and 60 mAP, respectively, results comparable to performance on other small object detection datasets. In addition, we present a working proof-of-concept of the proposed embedded assistive technology framework.
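The slicing step behind SAHI can be illustrated with a minimal, self-contained sketch: tile the frame into overlapping slices, run a detector on each slice, and shift every detection back into full-image coordinates. The `offsets` and `sliced_detect` helpers and the `detect` callback below are hypothetical illustrations, not the authors' implementation; the actual pipeline uses the SAHI library with the fine-tuned FCOS and TOOD models and additionally merges overlapping boxes (e.g. with NMS), which this sketch omits.

```python
# Minimal sketch of slicing-aided inference (the idea behind SAHI):
# tile the image into overlapping slices, detect per slice, then map
# each detection back into full-image coordinates. `detect` is a
# hypothetical stand-in for a detection model, not the paper's code.

from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def offsets(total: int, slice_size: int, step: int) -> List[int]:
    """Slice origins along one axis, clamped so the last slice ends at `total`."""
    offs = list(range(0, max(total - slice_size, 0) + 1, step))
    last = max(total - slice_size, 0)
    if offs[-1] != last:  # make sure the image edge is covered
        offs.append(last)
    return offs


def sliced_detect(width: int, height: int,
                  detect: Callable[[int, int], List[Box]],
                  slice_size: int = 512, overlap: float = 0.2) -> List[Box]:
    """Run `detect(x0, y0)` for each slice origin; the callback returns boxes
    in slice-local coordinates, which are shifted back to image coordinates."""
    step = int(slice_size * (1.0 - overlap))
    merged: List[Box] = []
    for y0 in offsets(height, slice_size, step):
        for x0 in offsets(width, slice_size, step):
            for (x1, y1, x2, y2) in detect(x0, y0):
                merged.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0))
    return merged
```

Because small objects such as cursors occupy only a few pixels of a full frame, each slice presents the detector with a relatively larger target, which is why slicing helps on small object detection benchmarks.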
Data availability
The datasets generated and analyzed during the current study are not publicly available because they contain images from the authors' personal files and raise privacy and confidentiality concerns.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The work was done at the National Taiwan University of Science and Technology while the author was a student at the Indonesia International Institute for Life Sciences.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Darmawan, J.T., Sigalingging, X.K., Faisal, M. et al. Neural network-based small cursor detection for embedded assistive technology. Vis Comput (2024). https://doi.org/10.1007/s00371-023-03246-6