Abstract
Assistive technology (AT) is invaluable to people with special educational needs and disabilities, enabling them to interact with computers more efficiently and improving human–computer interactivity for learning purposes. However, AT users still face a lack of accessibility and interactivity with touchscreen devices and visual content. Addressing these issues is critical to providing much-needed assistance and enhancing the daily lives of disabled individuals. We therefore propose an AT framework based on neural networks running on embedded systems to improve accessibility and interactivity for AT users. The proposed framework runs on the Jetson Nano and primarily uses a speech-to-intent neural network to process speech and move cursors. Augmented with cursor object detection, the framework can locate cursors on external displays and move the cursors of other devices. Since cursor datasets are very limited and few detection models are suited to the task, we investigated the Slicing Aided Hyper Inference (SAHI) pipeline together with two fine-tuned models, Fully Convolutional One-Stage object detection (FCOS) and Task-aligned One-stage Object Detection (TOOD), to identify the minimum data required for these models to work optimally. With fewer than 120 annotated images and data multipliers of 5 and 30, the two models achieved approximately 52 and 60 mAP, respectively, results comparable to performance on other small object detection datasets. In addition, we present a working proof-of-concept of the proposed embedded assistive technology framework.
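The slicing step behind SAHI can be illustrated with a minimal, self-contained sketch: tile the frame into overlapping slices, run a detector on each slice, and shift every detection back into full-image coordinates. The `offsets` and `sliced_detect` helpers and the `detect` callback below are hypothetical illustrations, not the authors' implementation; the actual pipeline uses the SAHI library with the fine-tuned FCOS and TOOD models and additionally merges overlapping boxes (e.g. with NMS), which this sketch omits.

```python
# Minimal sketch of slicing-aided inference (the idea behind SAHI):
# tile the image into overlapping slices, detect per slice, then map
# each detection back into full-image coordinates. `detect` is a
# hypothetical stand-in for a detection model, not the paper's code.

from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def offsets(total: int, slice_size: int, step: int) -> List[int]:
    """Slice origins along one axis, clamped so the last slice ends at `total`."""
    offs = list(range(0, max(total - slice_size, 0) + 1, step))
    last = max(total - slice_size, 0)
    if offs[-1] != last:  # make sure the image edge is covered
        offs.append(last)
    return offs


def sliced_detect(width: int, height: int,
                  detect: Callable[[int, int], List[Box]],
                  slice_size: int = 512, overlap: float = 0.2) -> List[Box]:
    """Run `detect(x0, y0)` for each slice origin; the callback returns boxes
    in slice-local coordinates, which are shifted back to image coordinates."""
    step = int(slice_size * (1.0 - overlap))
    merged: List[Box] = []
    for y0 in offsets(height, slice_size, step):
        for x0 in offsets(width, slice_size, step):
            for (x1, y1, x2, y2) in detect(x0, y0):
                merged.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0))
    return merged
```

Because small objects such as cursors occupy only a few pixels of a full frame, each slice presents the detector with a relatively larger target, which is why slicing helps on small object detection benchmarks.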
Data availability
The datasets generated and analyzed during the current study are not publicly available because they contain images from the authors' personal files and raise privacy and confidentiality concerns.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The work was done at the National Taiwan University of Science and Technology while the author was a student at the Indonesia International Institute for Life Sciences.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Darmawan, J.T., Sigalingging, X.K., Faisal, M. et al. Neural network-based small cursor detection for embedded assistive technology. Vis Comput (2024). https://doi.org/10.1007/s00371-023-03246-6