
Neural network-based small cursor detection for embedded assistive technology

  • Original article
  • Published in: The Visual Computer

Abstract

Assistive technology (AT) is invaluable to people with special educational needs and disabilities, enabling them to interact with computers more efficiently and improving human–computer interaction for learning purposes. However, AT users still face limited accessibility and interactivity with touchscreen devices and visual content. Addressing these issues is critical to providing much-needed assistance and enhancing the daily lives of disabled individuals. We therefore propose a neural network-based AT framework on embedded systems that improves accessibility and interactivity for AT users. The framework runs on the Jetson Nano and primarily uses a speech-to-intent neural network to process speech and move cursors. Extended with cursor object detection, the framework can locate cursors on external displays and move the cursors of other devices. Because cursor datasets are scarce and few detection models handle such small objects well, we investigated the Slicing Aided Hyper Inference (SAHI) pipeline together with two fine-tuned models, Fully Convolutional One-Stage object detection (FCOS) and Task-aligned One-stage Object Detection (TOOD), to identify the minimum data these models require to work optimally. With fewer than 120 annotated images and data multipliers of 5 and 30, the two models achieved approximately 52 and 60 mAP, respectively, comparable to performance reported on other small object detection datasets. In addition, we present a working proof of concept of the proposed embedded assistive technology framework.
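The SAHI pipeline mentioned in the abstract works by cutting the full frame into overlapping tiles, running the detector on each tile (where a tiny cursor occupies a much larger fraction of the input), and merging the per-tile detections back into full-frame coordinates. A minimal sketch of the tile-coordinate step is shown below; the function name, slice sizes, and overlap ratio are illustrative defaults, not the paper's actual settings:

```python
def slice_boxes(img_w, img_h, slice_w=512, slice_h=512, overlap=0.2):
    """Compute overlapping tile coordinates for SAHI-style sliced inference.

    Returns a list of (x0, y0, x1, y1) tiles that jointly cover the
    whole image; edge tiles are shifted inward so every tile keeps the
    full slice size when the image is large enough.
    """
    step_x = int(slice_w * (1 - overlap))  # horizontal stride between tiles
    step_y = int(slice_h * (1 - overlap))  # vertical stride between tiles
    boxes = []
    y = 0
    while True:
        y1 = min(y + slice_h, img_h)
        x = 0
        while True:
            x1 = min(x + slice_w, img_w)
            # Clamp the tile's origin so it stays inside the image.
            boxes.append((max(0, x1 - slice_w), max(0, y1 - slice_h), x1, y1))
            if x1 >= img_w:
                break
            x += step_x
        if y1 >= img_h:
            break
        y += step_y
    return boxes
```

Each tile would then be passed to the fine-tuned FCOS or TOOD detector, and detections offset by the tile's (x0, y0) before non-maximum suppression merges duplicates from the overlap regions.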


Data availability

The datasets generated and analyzed during the current study are not publicly available because they contain images from the authors' personal files, raising privacy and confidentiality concerns.


Author information

Corresponding authors

Correspondence to Jeremie Theddy Darmawan, Jenq-Shiou Leu or Nanda Rizqia Pradana Ratnasari.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work was done at the National Taiwan University of Science and Technology, while the author was also a student at the Indonesia International Institute for Life Sciences.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Darmawan, J.T., Sigalingging, X.K., Faisal, M. et al. Neural network-based small cursor detection for embedded assistive technology. Vis Comput (2024). https://doi.org/10.1007/s00371-023-03246-6

