Abstract
Pre-trained large language models (LLMs) adapt remarkably well to varied tasks, notably data analysis, when supplied with relevant contextual cues. Supplying that context without compromising data privacy, however, can be complicated and time-consuming, and can degrade the model's output quality. To address this, we devised a system that discerns context from the multi-application desktop environments common among office workers. Our approach monitors applications in real time, giving precedence to those engaged recently and for sustained periods. From this activity, the system identifies the dominant data analysis tool based on user engagement, and it aligns concise user queries with the data's inherent structure to determine the most appropriate tool. This carefully sourced context, combined with well-chosen prefabricated prompts, enables LLMs to generate code that reflects user intent. In an evaluation with 18 participants, each using three popular data analysis tools in real-world office and R&D scenarios, we benchmarked our approach against a conventional baseline. Our system achieved a 93.0% success rate across seven distinct data-focused tasks. In conclusion, our method significantly improves user accessibility, satisfaction, and comprehension in data analytics.
Code availability
The prompts supporting the conclusions of this article are provided in the appendix. The complete source code is available from the corresponding author upon reasonable request.
Acknowledgements
This work is supported by the Natural Science Foundation of China under Grant No. 62132010; by the Beijing Key Lab of Networked Multimedia; by the Institute for Guo Qiang, Tsinghua University; by the Institute for Artificial Intelligence, Tsinghua University (THUAI); and by the 2025 Key Technological Innovation Program of Ningbo City under Grant No. 2022Z080. Additionally, we acknowledge the support provided by the Beijing Municipal Science & Technology Commission and the Administrative Commission of Zhongguancun Science Park under Grant No. Z221100006722018.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Appendices
Appendix 1 Source Code of Background Service
Appendix 2 API Prompts
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jadoon, A.K., Yu, C. & Shi, Y. ContextMate: a context-aware smart agent for efficient data analysis. CCF Trans. Pervasive Comp. Interact. (2024). https://doi.org/10.1007/s42486-023-00144-7
DOI: https://doi.org/10.1007/s42486-023-00144-7