Launched in late 2022, ChatGPT, a computer program for generating human-like text, took the world by storm and added fuel to debates about the future of AI technologies and content-based platforms, like Google. ChatGPT and its analogs aim to understand a user’s intent, reply to a user’s question or request, and maintain a conversation in natural language. Its creators train the underlying language models on a large corpus of text data and update them continually, so the software is constantly evolving. Together with the record spike in signed-up users, fears about workforce disruptions, misinformation, and uncontrollable unethical applications are growing. Every discipline and every sector of human activity will likely need to examine its strategies to prepare for the disruptions to come, both useful and damaging. This is especially important for public health, where communication and miscommunication about public health problems and potential solutions could mean lives saved and lives lost.

The call to respond to large-scale disruptions is not new for health professionals. With any new technological advancement, from X-ray scanning to mass vaccination, comes the need for monitoring and controlling adverse effects. Yet the advances in text processing and generation will require a higher level of checks and balances, touching on responsibility for a computer-generated product, its authenticity, and its credibility. In public health, this process could shape decisions on policies and regulations, the assessment of the quality and validity of research findings, and pedagogic strategies for critical thinking and the understanding of truth.

Filling-the-blanks

In my elementary school days, I loved exercises that called for finding the right words to complete a sentence: fill in the blanks in “After a heavy ___, the roads are ___” with the given choices: rain, snow, wet, dry. I especially liked playing a word game that allowed us to introduce our own selections, to make a sentence witty or silly, to amuse ourselves with a fault in logic.

When I began teaching biostatistics in graduate programs of public health, I pondered the elegance of this exercise. I appreciated its flexibility in offering infinite possibilities to introduce components of critical thinking: inferences, assumptions, and logical constructs. In a quest for cause and effect, armed with the simple statement that “roads are wet after the rain” and some creativity, I could cover modern philosophical paradigms: positivism (stating that one truth exists, and it is observable), post-positivism (stating that one could never truly observe truth), critical theory (stating that multiple truths exist and are influenced by people and power relationships), and constructivism (in which multiple truths are constructed by and between people). By exploring what is assumed and accepted as truth, we could reflect on our own and each other’s perceptions and experiences.

To train students to correctly apply statistical tests in public health, I developed a series of ‘mistake-find’ exercises in which I carefully planted flaws and asked students to detect them. I would take a paragraph from a published research paper and simply reverse-engineer a filling-the-blanks. The planted mistakes contained inconsistencies, misconceptions, and errors. I asked students to work in small groups to weed out each planted mistake and explain why and how each should be corrected. I asked students to pretend they were statistical editors for the JPHP (a role I performed for nearly 15 years) and that the only document before them was a manuscript under review. The most exciting part of this exercise was the discussion of challenging subtleties and differences of opinion, with a celebration of the winning team that detected all the impurities, sometimes even more than I had included in my answer key.

The mistake-find exercise became my pedagogical tool and a part of an educational research project funded by the United States National Science Foundation to improve graduate training. This project aimed to examine novel ways to teach data-intensive disciplines in public health, environmental health, and nutrition using an approach called SOLSTICE (an abbreviation of solution-oriented, student-led, computationally enriched). As part of the approach, I expanded the utility of the mistake-find exercise. Together with teaching assistants, I created a system in which detected mistakes could signify something about students’ knowledge, skills, and attitudes, such as attention to detail and stylistic preferences. The system also helped us explore students’ familiarity with scientific methods, subject-specific terms, and concepts. We classified implanted errors according to the level of comprehension and critical thinking required – from light to deep, from simple ‘typos’ and incomplete information to logical inconsistencies and contradictory statements. Some exercises called for students to sort through common confusions in using statistical tests. Some focused on omissions in describing study designs. I tried to make the implants subtle, so that at first glance the nuances were not easy to detect. Finding some mistakes required a second reading. In my mind, this process mimics iterative editorial work, in which each iteration results in a better product.

The editorial work convinced me that it is possible to develop creative classroom experiences that help students appreciate many aspects of critical thinking. When reviewers of journal submissions provide authors with suggestions to improve clarity and comprehension, they help provide the audience with high-quality reading. As authors, reviewers, and editors, we often go through a mental mistake-find exercise; we check for logical flaws; we ensure clarity of terms and lines of argument. As mentors and educators, we should focus increasingly on helping students develop the skills to detect, define, and discuss truth in its many forms. Success in learning this way is the ability to recognize many truths as they appear to different people and as they serve varied agendas.

Teaching critical thinking with information innovations

The mistake-find exercise is far from perfect and has its own limits. Even so, it helps me engage students with the information innovations of the last decade. Searching for information has been simplified enormously by the refinement of search engines, the size and depth of information repositories, and the speed and computational capacities of hand-held devices (phones and tablets). Information about almost everything is at the fingertips of almost anyone. Yet the information a search engine delivers is not necessarily the truth someone is seeking. It may well be useful, but not necessarily valid, accurate, or reliable. And when we employ these criteria to judge usefulness, we can improve data, information, and models in terms of their validity, accuracy, and reliability. This offers immense promise for public health.

I recall debates in our teaching community, at the inception of Wikipedia, about banning or allowing the use of such tools in graduate training. For a teacher, the fear of being unable to detect cheating always exists, especially if authenticity is one of the assessment criteria. So too, we fear the spread of misinformation and of concepts proven to be false. Early Wikipedia pages contained many mistakes in content-specific domains. Even so, the creators’ efforts to build a community-based approach to content co-creation clearly demonstrated the power of community ownership and its capacity to verify and improve content. In classroom settings, I encourage students to perform internet searches for definitions of terms and concepts. This initial search creates the starting point for a discussion of differences and commonalities, of clarity and comprehension.

With text proofing, spell-checking, information repositories, glossaries, and language-translation tools, I also see substantial improvements in drafting and presenting technical texts, a big step forward for students and for authors communicating in and across multiple languages and multiple disciplines. Public health practitioners are routinely expanding their information toolboxes to offer recommendations and best practices for nutrition, hygiene, travel, early education, and injury prevention. Scientists are using web-based tools to survey populations, track health outcomes through data dashboards, promote behavior change, and test which health policies work best.

As we learn about the novelties of the 2000s, including Wikipedia and Grammarly, and now ChatGPT (Chat Generative Pre-trained Transformer) and its alternatives, we recognize their powers and limits. As we shift attention from finding antidotes to their flaws to building transparent community input, we could do better by finding ways to use such advances to our advantage in seeking solutions to public health issues. We could and should focus more on the validity and reliability of research findings, on improving the clarity with which findings are presented to audiences with different cultural and professional backgrounds, and on reducing the cognitive load of communicating complex ideas. We could and should better define and understand the terms we use routinely yet struggle to specify and explain. We appeal to data, models, information, and knowledge, but cannot agree on the meanings of those terms nor avoid misstating them. Instead, we create new jargon and pepper our texts with super-technical terms; we are simply buying time until we are able or willing to explain their true meaning.

The reasons for such avoidances and disagreements vary, yet they often lie at the core of the challenges in our search for truth. In public health research and practice, we try to tackle the big questions of which health policies are best (inclusive, transparent, fair), how they affect current health status, and, even more importantly, how they may affect the future health of populations. Many papers in this journal ask: how have we set up our research frameworks to be useful and effective for proposing and evaluating health policies? How have we set up a vision of the future? How might new technologies affect future health and wellbeing?

The litmus test for truth

In my days as a student, my research advisor, Professor Vasilii Vasilievich Gubarev, a brilliant scientist and information theorist, devised an elegant way of viewing a model as an essential part of human goal-oriented intellectual activity. In his view, the goal is a model of the desired future, and the model is a goal-oriented reflection of reality. A model cannot exist without a human acting as its creator, investigator, or user. Nor can it exist outside of the environment that determines the objectives, quality indicators, conditions of use, methods, and technologies on whose basis the model is created, operates, and is applied. Because of the interconnectedness of the ‘object–model–subject–environment’ relationship, any model can simultaneously be a true, a correct, and an incorrect reflection of an object of interest. In this context, models, data, and knowledge are approximations of truth so long as they are obtained through correct logical conclusions and proofs, confirmed by correct physical experiments in a reproducible manner, and verified by fundamentally different approaches. This vision sets a high standard for research and practice because it demands a consistent and systematic reexamination of our goals and of the ways to reach them. From this vantage point, our classroom mistake-find exercise represents a means of honing our skills to formulate goals, create our images of reality, and test our reflections against other perspectives.

Validity and reliability are the cornerstones of the research endeavor. In the life sciences, we operate with concepts, models, and systems, and with ways to define them, theorize about them, and test them in experimental settings. Reliability and validity comprise our litmus test of closeness to what we perceive to be the truth. We recognize that any experiment influences reality and reflects it with some degree of distortion. Using the concept of validity, we aim to assess how much distortion may occur. We could be wrong in our choice of a goal, a method, or a perspective, just as we could err in the mistake-find exercise. In public health studies, the combination of objective instrumental measurements, self-assessments, and documented perceptions, all constantly evolving, may bring any expert or bystander to the numbing conclusion that it is impossible to detect, define, and describe an elusive, ever-changing truth. Yet shared experiences and experiential learning offer ways to reconcile and agree on some principles of accumulating and synthesizing knowledge.

In conducting epidemiological or public health studies, scientists agree on examining internal and external validity. Internal validity interrogates the study approach (design, conduct, and analysis) as a tool to answer a research question with minimal bias. In the broad sense, bias is a departure from what a researcher considers a correct logical progression, reproducible methods, and testable assumptions. Each study design brings its own nuances and requires examination of the biases specific to its design and conditions. External validity examines whether the study findings are generalizable to a broader context. For example, for a study conducted under controlled experimental conditions, we would be interested in ecological validity, a specific way to examine whether the study findings can be generalized to real-life settings.

With the expansion of data collected through routine monitoring, including surveillance systems, social media tracking, and satellite and drone imagery, we must revise our criteria for quality and correctness. The digital description of reality asks us to decipher what we can learn with the new tools about health, populations, diseases, risks, and hazards. Are we improving our understanding of the world, or just muddying the water with new gadgets? Are we pursuing the truth, or drowning in a deep dive without any clear goals and metrics for success? Will digital reality serve select groups of people, or will it create a path toward democratization and equality?

So far, AI and ChatGPT produce human-like generated text that, at least on the surface, may look well constructed and original. Yet, by the nature of generation and synthesis, the process operates by templates that may lack the needed flexibility or pointers to verifiable sources. These technological advancements could provide useful roadmaps for well-standardized workflows if proper safeguards are developed in time. Virtual reality is a model, a reflection of someone’s vision, along with the distortions, biases, and agendas of the model’s creators and users. This reality imitates a mistake-find exercise in which we must learn to detect glitches and distortions. Imagine now a transparent AI in which implanted mistakes are highlighted, as I did in the answer key that I revealed at the end of a class. Imagine that we train professionals attuned to identifying distortions and biases and to finding ways to fix them, or better yet, to prevent them. The true value of the technological advancements of the twenty-first century should be measured by their ability to offer real and lasting solutions to public health problems.

Elena N. Naumova,

Editor-in-Chief