1 Introduction

Artificial intelligence (AI) mimics human behaviors and processes through machine learning (Booth et al., 2021; Wartman & Combs, 2018). AI has come a long way since the term was first introduced in 1955 (McCarthy et al., 2006). ELIZA, the earliest example of a natural language processing program, was created in the 1960s (Weizenbaum, 1966). In the same years, the United States Department of Defense began training computers to mimic basic human skills. Deep learning, in which a machine draws on previously stored information to handle new experiences, was introduced in the 1980s and helped AI gain significant traction. Artificial neural networks were constructed in the 1990s, and Deep Blue, an intelligent chess-playing system, defeated the world chess champion in 1997. A humanoid robot was developed by MIT in 2001 (Breazeal, 2004). Furthermore, DeepMind’s AlphaGo artificial intelligence system defeated Go champion Lee Se-dol 4–1 in 2016 (Koch, 2016). The rise of smart assistants such as Siri highlights the power of AI to define, evaluate, and act logically, and these capabilities allow AI systems to interact with the natural world in increasingly sophisticated ways (Luckin et al., 2016). Consequently, many industries incorporate AI into their products, including electronics, mechanics, and informatics. Today, AI can be found in most sectors, from smartphones to marketing, healthcare, and finance (Mou, 2019). All of these applications place AI-driven systems at critical points in our lives, and they are expected to advance further in the near future (Shabbir & Anwer, 2015).

AI systems are permeating every industry and forcing drastic adjustments. In this context, AI is expected to take a leading role in education. Education is set to undergo a profound transition due to the rise of AI, posing one of the most significant challenges and implications among the disciplines touched by this technology (Cooper, 2023; Forbes, 2020). Looking back over the last 50 years, we can observe that many technologies and tools, such as augmented reality (AR) and virtual reality (VR), have yet to be fully integrated into educational settings (Lawrie, 2023). According to the Education Intelligence Unit (2019), AI will be fully embraced in education, and its applications will grow significantly over the next five years. Education has been considered among the most crucial factors when nations develop national AI strategies, as seen in Australia, China, Estonia, France, Singapore, South Korea, and the United States (UNESCO, 2019). Recognizing the considerable potential of AI, countries such as China and the United States are investing in educational AI applications (Chiu & Chai, 2020). Furthermore, several private companies have created AI applications for teaching and learning. Carnegie Learning and Fuel Education are two companies that disseminate AI in K-12 education, and ALEKS, an AI tool developed by McGraw Hill, is a learning and assessment system for grades K-12. AI is also commonly used in learning foreign languages, and many language learning programs include integrated AI. One is Duolingo, a language learning application; 34 hours on the Duolingo app have been estimated to be equivalent to a whole university semester of language education (Vesselinov & Grego, 2012). Century, founded by Priya Lakhani in London in 2013, is another software tool that helps teachers build tailored learning modules while reducing their workload (Lakhani & Halfon, 2020).

ChatGPT, an AI program that emerged in 2022, has several notable characteristics. OpenAI created ChatGPT (Chat Generative Pre-trained Transformer) and released it to users in 2022. ChatGPT is a machine learning-powered chatbot that can deliver detailed responses to inquiries (Flanagin et al., 2023). It could serve a purpose in real-world applications such as digital marketing, online content production, resolving customer service questions, or, as some users have discovered, even debugging code (Siddharth, 2022). Because of this potential, ChatGPT rapidly piqued people’s curiosity (Mollick, 2022). Along with the exhilaration, however, it brought anxiety about providing inaccurate information (Davis, 2023), and it is anticipated to raise ethical concerns. It remains unclear how its use in homework or original article writing should be handled, and experts debate whether ChatGPT should be included as a co-author (Stokel-Walker, 2023).

ChatGPT can also provide answers to scientific queries, an issue that must be investigated to determine whether it is a reliable tool for learning new content. Teaching new knowledge and making it persist in memory is essential for learning; if new knowledge is delivered incorrectly or incompletely, students may develop misconceptions or require clarification. Misconceptions are defined as individuals’ unscientific ideas (Lamichhane et al., 2018) and are mostly counted as barriers to meaningful learning because they persistently sustain incomplete content knowledge (Nussbaum & Novick, 1982; Queloz et al., 2018). Misconceptions are also challenging to overcome (Duit & Treagust, 2003). Moreover, many students may already hold misconceptions related to introductory chemistry and biology (Atchia et al., 2022; Brandriet & Bretz, 2014; Machová & Ehler, 2021; Naah & Sanger, 2012; Rosenthal & Sanger, 2012; Smith & Villarreal, 2015). Families, teachers, and technological experiences may form and spread misconceptions in various ways (Karpudewan et al., 2017).

Television, websites, and social media platforms are frequently used as reference sources for obtaining and sharing knowledge (Elmas et al., 2013). Students who come across incorrect or incomplete information may turn it into a misconception. For example, human hair is often referred to as ‘feather’ in the media; as a result, many students come to believe that humans, who are mammals, may possess a characteristic of birds (Adıgüzel & Yılmaz, 2020). Another study investigating misconceptions about the Zika vaccine traced them to social media as well; specifically, misinformation circulated suggesting that the Zika virus itself does not cause microcephaly in babies (Dredze et al., 2016).

Today, practically everyone uses technological tools that are widely accessible, respond quickly, and draw on extensive information networks (Elmas & Geban, 2012). Given the latest technological breakthroughs in education and the increasing acceptance of AI technologies, the issues surrounding their implementation in education require identification and analysis (Owoc et al., 2021; Jia et al., 2023). The objective of this study is to assess the accuracy of ChatGPT’s responses to scientific questions formulated explicitly within the context of biochemical processes.

2 Methodology

A document analysis was conducted on the responses ChatGPT produced to five questions, examining their scientific validity. In the context of this study, ‘document’ refers to the responses ChatGPT generated to the questions the researchers asked. The five questions were posed to ChatGPT in written form, and the generated answers were saved and analyzed for their scientific validity. The unit of analysis was the sentence. The data generation steps, which follow Merriam (2009), are shown in Fig. 1.

Fig. 1 The steps of data generation

In this context, a three-tier misconception test was applied to ChatGPT (Table 1). The three-tier misconception test is frequently used to identify misconceptions (Andariana et al., 2020; Elmas & Pamuk, 2021; Hasyim et al., 2018; Irwansyah et al., 2018; Suriani et al., 2022; Prodjosantoso & Hertina, 2019). In the three-tier misconception test with AI, the first tier involves asking ChatGPT an open-ended question, and the second tier asks it to explain why it answered that way. What distinguishes the three-tier test from the two-tier test is the addition of the options “I am sure” and “I am not sure” to determine whether students are confident in their answers; in our case, we directly asked ChatGPT whether it was sure of its answer. ChatGPT 3.5 was used in this research. The five questions, which target common student misconceptions about cellular energy metabolism, are presented in Table 1; a sketch of how such a three-tier exchange could be scripted is given after the table.

Table 1 Questions in the form of a three-tier test
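For illustration only, the following minimal sketch shows how such a three-tier exchange could be scripted against the OpenAI chat API. It is not the procedure used in this study, which relied on the ChatGPT web interface; the model name, the phrasing of the second-tier prompt, and the helper function are assumptions made for the example.

```python
# Illustrative sketch only: the study itself used the ChatGPT web interface.
# Assumes the openai Python package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

# The three tiers for the first question; the remaining questions follow the same pattern.
TIERS = [
    "Is fermentation an anaerobic respiration?",   # Tier 1: the open-ended question
    "Why did you answer that way?",                # Tier 2: reasoning behind the answer
    "Are you sure of your answer?",                # Tier 3: confidence check
]

def three_tier_dialogue(questions, model="gpt-3.5-turbo"):
    """Ask the tiers in one continuous conversation and return the model's replies."""
    messages, replies = [], []
    for question in questions:
        messages.append({"role": "user", "content": question})
        response = client.chat.completions.create(model=model, messages=messages)
        answer = response.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})  # keep the dialogue history
        replies.append(answer)
    return replies

if __name__ == "__main__":
    for label, reply in zip(("Answer", "Reasoning", "Certainty"), three_tier_dialogue(TIERS)):
        print(f"--- {label} ---\n{reply}\n")
```

Saving each reply verbatim would correspond to the ‘documents’ analyzed in this study.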

2.1 Data Collection Process

First, the question ‘Is fermentation an anaerobic respiration?’ was asked. Figure 2 depicts ChatGPT’s response (https://chat.openai.com/share/f974e796-9d0c-4578-9a4c-10ba0d913f95).

Fig. 2 Is fermentation an anaerobic respiration?

When Fig. 2 is examined, it is seen that fermentation is explicitly described as anaerobic respiration. Next, the question ‘Why is fermentation anaerobic respiration?’ was asked. Figure 3 depicts ChatGPT’s response.

Fig. 3 Why is fermentation anaerobic respiration?

When Fig. 3 is examined, it is seen that fermentation is described as anaerobic respiration because it occurs in an oxygen-depleted environment. The question ‘Are you sure of your answer?’ was then asked. Figure 4 depicts ChatGPT’s response.

Fig. 4 Are you sure of your answer?

When Fig. 4 is examined, it is observed that ChatGPT seemed unsure of its response and revised it. Second, the question ‘What are the final electron acceptor molecules in fermentation?’ was asked. Figure 5 depicts ChatGPT’s response (https://chat.openai.com/share/507f4ff2-d3fd-4841-944c-bd97946e0c09).

Fig. 5 What are the final electron acceptor molecules in fermentation?

When Fig. 5 is examined, it is seen that ethanol and lactic acid are presented as the final electron acceptors in the fermentation pathway. Next, the question ‘Why are these molecules the final electron acceptor molecules in fermentation?’ was asked. Figure 6 depicts ChatGPT’s response.

Fig. 6 Why are these molecules the final electron acceptor molecules in fermentation?

When Fig. 6 is examined, it is observed that ChatGPT insists on its previous answer. The question ‘Are you sure of your answer?’ was then asked. Figure 7 depicts ChatGPT’s response.

Fig. 7 Are you sure of your answer?

When Fig. 7 is examined, it is observed that ChatGPT seemed unsure of its response and revised it. Third, the question ‘Does oxidative phosphorylation occur in anaerobic respiration?’ was asked. Figure 8 depicts ChatGPT’s response (https://chat.openai.com/share/2ec056e6-fefa-4eb8-bb70-7509116f23e5).

Fig. 8 Does oxidative phosphorylation occur in anaerobic respiration?

When Fig. 8 is examined, it is noted that, according to ChatGPT, oxidative phosphorylation does not occur in anaerobic respiration. The question ‘Why does oxidative phosphorylation not occur in anaerobic respiration?’ was then asked. Figure 9 depicts ChatGPT’s response.

Fig. 9 Why does oxidative phosphorylation not occur in anaerobic respiration?

When Fig. 9 is examined, it is observed that ChatGPT insists on its previous answer, stating that oxygen is required for oxidative phosphorylation. The question ‘Are you sure of your answer?’ was then asked. Figure 10 depicts ChatGPT’s response.

Fig. 10 Are you sure of your answer?

When Fig. 10 is examined, it is observed that ChatGPT seemed unsure of its response and revised it, noting that oxidative phosphorylation can occur during anaerobic respiration, although to a limited extent. Fourth, the question ‘Does oxidative phosphorylation occur in chemosynthesis?’ was asked. Figure 11 depicts ChatGPT’s response (https://chat.openai.com/share/7db89d8c-87e0-4f4a-83c1-94feba1de477).

Fig. 11 Does oxidative phosphorylation occur in chemosynthesis?

When Fig. 11 is examined, it is noted that, according to ChatGPT, oxidative phosphorylation does not occur in chemosynthesis. Next, the question ‘Why does oxidative phosphorylation not occur in chemosynthesis?’ was asked. Figure 12 depicts ChatGPT’s response.

Fig. 12 Why does oxidative phosphorylation not occur in chemosynthesis?

When Fig. 12 is examined, it is observed that ChatGPT insists on its previous answer. The question ‘Are you sure of your answer?’ was then asked. Figure 13 depicts ChatGPT’s response.

Fig. 13 Are you sure of your answer?

When Fig. 13 is examined, it is observed that ChatGPT still insists on its answer. Fifth, the question ‘In which metabolic processes does chemophosphorylation occur?’ was asked. Figure 14 depicts ChatGPT’s response (https://chat.openai.com/share/579df106-e2b6-47a2-a0e6-5f54fdf004ce).

Fig. 14 In which metabolic processes does chemophosphorylation occur?

When Fig. 14 is examined, ChatGPT states that chemophosphorylation occurs in two main metabolic processes: cellular respiration and photosynthesis. Next, the question ‘Why does chemophosphorylation occur in these metabolic processes?’ was asked. Figure 15 depicts ChatGPT’s response.

Fig. 15 Why does chemophosphorylation occur in these metabolic processes?

When Fig. 15 is examined, it is observed that ChatGPT insists on its previous answer. The question ‘Are you sure of your answer?’ was then asked. Figure 16 depicts ChatGPT’s response.

Fig. 16 Are you sure of your answer?

When Fig. 16 is examined, it is observed that ChatGPT still insists on its answer.

2.2 Data Analysis

The data were scrutinized by a team of three faculty members, one from chemistry and two from biology. Additionally, two other faculty members, one from chemistry and the other from biology, were assigned to oversee and verify the scientific accuracy of the findings and their explanations. Current university-level chemistry and biology textbooks that are widely used internationally were used as references to assess the answers provided by ChatGPT. The scientific sources used are:

  • Biochemistry (Berg et al., 2002).

  • General, Organic, and Biochemistry (Denniston et al., 2004).

  • Lehninger Principles of Biochemistry (Nelson & Cox, 2008).

  • Biological Science (Freeman et al., 2014).

  • Life: The Science of Biology (Sadava et al., 2014).

  • Campbell Essential Biology (Simon et al., 2017).

  • Biology of Microorganisms (Madigan et al., 2019).

  • Life: The Science of Biology (Hillis et al., 2020).

  • Campbell Biology (Urry et al., 2021).

3 Findings

The responses of ChatGPT to common misconceptions about energy metabolism in the cell were examined, and the findings are given in Table 2.

Table 2 Statements in ChatGPT’s responses and scientific explanations

When Table 2 is examined, it is seen that ChatGPT provided scientifically incorrect answers to all five questions asked. Moreover, when asked to explain the reasons for its answers, ChatGPT insisted on the incorrect responses. When prompted about its certainty, however, its performance changed: it provided scientifically correct answers to the first two questions, a partially correct answer to the third, and continued to offer invalid answers to the remaining two questions.

4 Discussion and conclusions

It has become apparent that the responses generated by ChatGPT contain inaccuracies and omissions, underscoring the importance of scrutinizing such AI-generated content, especially within a specific discipline (Dahlkemper et al., 2023; Fergus et al., 2023). This study sets the stage for a broader discussion on the reliability of AI in producing correct, factual, and comprehensive information (Bitzenbauer, 2023). Duda et al. (2020) found that students taking a biotechnology course may develop misconceptions about fermentation. Adıgüzel and Yılmaz (2020) determined that undergraduate students have misconceptions about fermentation, anaerobic respiration, and oxidative phosphorylation. Cakır et al. (2002) emphasized that students have misconceptions about cellular respiration concepts. Scott (2005) mentioned common misconceptions regarding aerobic and anaerobic energy expenditure. Jena (2015) detected misconceptions concerning mitochondrial oxidative phosphorylation.

These misconceptions can stem from a variety of sources, such as teachers (Moodley & Gaigher, 2019), textbooks (Novitasari et al., 2019; Gündüz et al., 2019), and technological tools (Acar-Sesen & Ince, 2010). Yılmaz et al. (2019) found that social media led to misconceptions among science teachers. Meel and Vishwakarma (2020) raised concerns about information inflation on social media and the internet, including invalid scientific knowledge (Dev & Bhatnagar, 2020). Considering that people spend a considerable amount of time on the Internet in their daily lives and that Internet use has increased since the pandemic (Lin, 2020), misinformation is likely to spread and may solidify into misconceptions.

There are studies on the potential and positive aspects of ChatGPT in the literature (Biswas, 2023; Baidoo-Anu & Owusu Ansah, 2023; Frieder et al., 2023; Surameery & Shakor, 2023; Zhu et al., 2023). Education can benefit from the responsible and ethical application of artificial intelligence technologies such as ChatGPT (Kasneci et al., 2023). Typically, ChatGPT produces responses that are both clear and prompt. Particularly noteworthy is the format of its answers to binary questions starting with ‘Do/Does’ or ‘Is/Are’: ChatGPT consistently begins its responses with either ‘Yes’ or ‘No,’ so its answers read as definitive. Hill-Yardin et al. (2023) mentioned that ChatGPT is fast and straightforward, generating 500 words in less than two minutes. Manohar and Prasad (2023) used ChatGPT to write an academic article and concluded that it can produce clear and understandable text.

Aside from this potential, ChatGPT may also present us with invalid information from its underlying data. The problem is caused by information inflation in the data used for training, not directly by the ChatGPT software. ChatGPT is trained on an unlabeled text dataset drawn from sources such as Wikipedia and many other websites (Floridi & Chiriatti, 2020). A limitation of ChatGPT is that it can only generate text based on the input it receives and has no access to additional data or the ability to browse the Internet (Deng & Lin, 2022). Consequently, scientifically incorrect and incomplete statements can be found in ChatGPT’s responses. Mogali (2023) found that ChatGPT gave inconsistent and invalid information in response to some questions about anatomy. Some research has determined that ChatGPT generated fake citations for bibliographies (Baidoo-Anu & Owusu Ansah, 2023; King, 2023; Szabo, 2023). Wittmann (2023) emphasized that when questioned for details and references, ChatGPT struggled and made various inconsistent statements.

Improving ChatGPT’s accuracy in answering scientific questions, especially domain-specific ones often met with conceptual misunderstandings, involves several strategies. Possible directions include domain-specific fine-tuning, enhanced training data, incorporation of expert feedback, an enriched contextual database, and collaboration with the scientific community (Rice et al., 2024). By implementing these strategies, AI developers can enhance ChatGPT’s capability to handle domain-specific scientific questions more accurately, making it a more reliable tool for educational and research purposes. Improving the correctness of AI answers to scientific questions also hinges significantly on the art of prompt engineering, that is, meticulously crafting questions to guide the AI toward a domain-specific response (Korzynski et al., 2023). A question’s clarity, specificity, and relevance directly influence the AI’s ability to process it, making it essential to articulate inquiries with care (Okonkwo & Ade-Ibijola, 2021). By being domain-specific and using relevant scientific concepts, questioners can sharply define the scope and context of their question, enabling the AI to generate ideas and concepts that closely align with the intended search. Incorporating precise scientific and domain-specific terminology and providing context help mitigate misconceptions and incorrect answers (Bozkurt & Sharma, 2023). It is also important to ask one question at a time and to clarify the desired depth of explanation, whether a brief overview or a detailed analysis of the scientific problem is sought. This level of specificity in questioning improves the correctness and relevance of responses, and follow-up questions can refine and clarify initial answers. The continuous development of scientific knowledge, and in many cases the possibility of multiple valid perspectives on an answer, further enriches the engagement with AI. Prompt engineering thus emerges as a critical skill in the effective use of AI for scientific questioning, emphasizing the need for clear, contextually rich, and well-structured questions to extract the most accurate and insightful answers.
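As an illustration of this point, the sketch below contrasts a bare question with a domain-scoped prompt that fixes the terminology, the deciding criterion, and the desired depth of answer. It is a hypothetical example using the OpenAI chat API; the prompts, model name, and helper function are assumptions made for illustration and are not part of this study’s procedure.

```python
# Hypothetical illustration of prompt engineering for a domain-specific question.
# Assumes the openai Python package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

BARE_PROMPT = "Is fermentation an anaerobic respiration?"

SCOPED_PROMPT = (
    "You are answering a university-level biochemistry question. "
    "Distinguish fermentation from anaerobic respiration using the criterion of the "
    "final electron acceptor and the presence or absence of an electron transport chain. "
    "Answer in no more than five sentences.\n\n"
    "Question: Is fermentation a form of anaerobic respiration?"
)

def ask(prompt, model="gpt-3.5-turbo"):
    """Send a single prompt and return the model's reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print("Bare prompt:\n", ask(BARE_PROMPT), "\n")
    print("Domain-scoped prompt:\n", ask(SCOPED_PROMPT))
```

In line with the paragraph above, the scoped prompt restricts the context, names the deciding scientific criterion, and asks a single question at a fixed depth.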

Ultimately, ChatGPT’s capabilities for providing scientifically rigorous responses are limited. To obtain accurate and appropriate answers, it is imperative to pose comprehensive and detailed inquiries that facilitate a more precise and informed response. Scholars and researchers must acknowledge that ChatGPT harbors certain misconceptions and is therefore only a partially dependable resource rather than a scientifically validated one.