Why AI has a “proving the obvious” problem, and what we can do about it

Guest post by Kai Dröge, Institute for Social Research Frankfurt, Germany, and University of Applied Sciences, Lucerne, Switzerland. Together with Colin Curtain, he develops QualCoder, an open-source QDA software with built-in AI support.


Guest posts express the views of guest contributors and do not necessarily represent the views of the CAQDAS Networking Project or constitute an endorsement of any product, method or opinion.


In February this year, Thomas Wolf, co-founder of Hugging Face, an important platform for AI research and development, talked at an event in Paris about the potential impact of AI on the advancement of science. But instead of diving into technical details, he used a personal story to illustrate his perspective on the topic. The story goes back to his time at the Massachusetts Institute of Technology (MIT). Although he had previously graduated from top French universities with excellent results, he struggled with his new role at MIT: “I was a pretty average, underwhelming, mediocre researcher” (Wolf, 2025). Moreover, the very skill that had made him a good student – the ability to quickly grasp and reproduce complex textbook knowledge – now became a major obstacle for him as a researcher: “I found it very hard to challenge the status-quo, to question what I had learned” (Wolf, 2025).

This, he concludes, is precisely the situation of today’s AI models, especially Large Language Models (LLMs). They are “obedient students” (Wolf, 2025) in the sense that they can learn massive amounts of training data and reproduce that knowledge at the level of some of the most challenging university exams. But they are very bad at coming up with new questions and ideas that would truly “challenge the status-quo,” expand our understanding, and push our scientific knowledge into uncharted territory. This can be a problem if we want to use AI to accelerate scientific progress.

AI as a “common sense machine”

If we look at the way today’s Large Language Models are trained, it is easy to see why they have so much trouble generating new ideas or insights. In its most basic form, this training process involves presenting the model with chunks of text (training data) and asking it to predict how the text will continue. Correct predictions earn positive feedback, while incorrect ones earn negative feedback. Repeating this millions of times sharpens the model’s ability to anticipate not only the most likely next word, but also whole sentences and answers to complex questions.
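To make this more concrete, here is a minimal sketch of the next-token prediction objective, assuming the freely available Hugging Face transformers and torch Python packages and the small GPT-2 model (chosen purely for illustration, not one of the large commercial models discussed in this post):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Small, freely available model used only to illustrate the principle.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    text = "Impulse buying is often driven by"
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        # Passing the input ids as labels makes the model return its average
        # next-token prediction loss for this text; training pushes this loss down.
        outputs = model(**inputs, labels=inputs["input_ids"])

    print(f"Prediction loss: {outputs.loss.item():.2f}")

    # The single most likely continuation, according to the model:
    next_id = int(outputs.logits[0, -1].argmax())
    print("Most likely next token:", tokenizer.decode(next_id))

The model is rewarded for producing the statistically most likely continuation of a text, which is precisely the “obedient student” behaviour described above.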

At the same time, however, this training process also forms the kind of “obedient student” that Thomas Wolf talks about. Just as such a student would try to get good grades by learning to predict which answer a professor wants to hear, the LLM also learns to always respond with the most common, obvious, and expected answer, based on the distribution of similar questions or text fragments in its training data. The result is what I would call a “common sense machine”, an AI that is very good at reproducing already known answers to common problems, but very bad at thinking outside the box, coming up with new ideas, or approaching a problem in an unorthodox way (for a similar critique, see Chomsky, Roberts and Watumull, 2023; Dröge, 2023a).

What does this mean for Qualitative Data Analysis (QDA)?

At first glance, this struggle to come up with new ideas and concepts would seem to disqualify LLMs from most serious use cases in scientific research. But is this also true for their application to qualitative data analysis? Here, our top priority is for the language model to stay true to our data and provide a correct, well-grounded interpretation. In this respect, the lack of creativity and ingenuity in today’s AI models may actually work to our advantage, preventing hallucinations and speculative interpretations that are not supported by our empirical data.

But qualitative research also seeks to go beyond common sense, to gain new insights from the data, and to advance our understanding of the social world. To do this, we need to approach our data “possessed with a devouring curiosity to know more about the world” (Robert E. Park, cited in Matthews, 1977, p. 7) – an open-mindedness that is willing to discard preconceptions in order to learn from new perspectives or surprising observations. But are today’s AI models capable of helping us approach our empirical data in such a way? This is where things get complicated.

The “proving the obvious” problem

Let’s look at an example. In a recent post on this blog, Susanne Friese introduced her concept of “Moving Beyond Coding in Qualitative Research” (Friese, 2025). In this demonstration, she included an AI-generated analysis based on 57 qualitative interviews from a study on “impulse buying” that she conducted as part of her doctoral dissertation. As a first approach, she asked the AI, “How did respondents describe their impulse buying experiences?” (There are also more complex examples in her post, but let’s stick with this simple one for now.) The answer she got:

Impulse buying is often characterized by a lack of concern for budget and financial constraints, driven by emotions such as excitement, boredom, or a need for a mood boost. Many find themselves purchasing items spontaneously, whether it’s clothes, items for children, or even larger purchases like vehicles, without prior planning or deliberation. Sales and special offers frequently trigger these unplanned purchases, leading to a mix of regret and satisfaction. While some feel a high from spending, others experience guilt or a sense of achievement, despite sometimes buying items they already own. The impulse to buy can be strong, often resulting in purchases that are later hidden or rationalized as fulfilling a need or saving money.

This sounds very convincing, and using appropriate QDA software tools, we would probably also be able to trace this analysis back to the underlying sources in the interview data. So far, so good.

But let’s try an experiment. Without giving it any empirical data, I asked GPT-4o, OpenAI’s current default language model, a very similar question: “How do people describe their impulse buying experiences?” The answer (slightly shortened to be about the same length):

Emotional triggers such as excitement, stress, boredom, or a desire for instant gratification often prompt quick, unplanned purchases. Initial excitement can lead to guilt or regret, especially if the purchase was unnecessary or strained finances. Still, some feel satisfied if the item is useful or brings joy. Many justify their purchases afterward by highlighting good deals or unforeseen needs. Impulse buying can also be a financial learning experience, encouraging more careful future spending. The act itself can be thrilling, and social influences like sales or trends on social media often contribute. Overall, it reflects a mix of emotions, decision-making, and social pressures.

If you compare this answer with the original one quoted above, you will see that the arguments are in a slightly different order, and some minor aspects are mentioned in one answer and not in the other (e.g. the “financial learning experience” that GPT-4o brings up). But the main arguments are basically identical – even though the first answer is based on 57 in-depth qualitative interviews on the topic and the other one was generated in seconds by the LLM itself. How is it that we gain so little additional insight by feeding in so much empirical data?
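This baseline experiment is easy to reproduce. Here is a minimal sketch, assuming the official openai Python client (version 1.x); only the model name and the question are taken from the setup above, everything else is generic boilerplate:

    from openai import OpenAI

    client = OpenAI()  # expects an OPENAI_API_KEY environment variable

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user",
             "content": "How do people describe their impulse buying experiences?"},
        ],
    )
    print(response.choices[0].message.content)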

To understand this, we need to consider what has been said above about the training of these models. LLMs were trained to focus on what is the most likely, the most expected outcome for a given question. I would argue that this also guides their analytical perspective on empirical data. LLMs are very good at “finding” in the data what they already know about a phenomenon, what is common sense – as evidenced by the similarity of the data-based and non-data-based responses cited above.

This tendency to rediscover what we already know is what I call the “proving the obvious” problem of today’s AI models. It is a problem because it tends to make LLMs more or less blind to new and unexpected results. This has some practical advantages, as it protects us from a lot of pointless speculation, hallucinations, and weird ideas from the LLM. But it also gives the answers a significant bias toward the already established, preconceived knowledge about a phenomenon, possibly hindering the advancement of our understanding of the social world.

So, what to do in this situation? Some practical advice.

If we agree that AI acts like an “obedient student” and has a “proving the obvious” problem, does that disqualify it from use in qualitative data analysis? I don’t think so. But we need to be aware of these issues and use AI tools appropriately:

  • First, we must understand what this means for our own role as humans in the research process. Curiosity, new ideas, unconventional interpretations, out-of-the-box and abductive thinking – these are not things we can outsource to an AI. By asking the right questions, we can force the AI to step out of its ‘comfort zone’, go beyond common sense knowledge and help us discover new and interesting insights. But the impulse to do so must come from us as human analysts. If we fail to provide such impulses, AI-assisted QDA will only produce more trivial “proving the obvious” research, of which we already have enough. That would be a shame and a wasted opportunity.
  • Second, I would argue that even in the age of AI-assisted research, there is no way around immersing ourselves in the original empirical data. I am somewhat concerned that many AI-assisted QDA approaches seem to rely heavily on AI-generated summaries of empirical data. As we saw above, these summaries may have already eliminated most of the interesting, new, and unexpected results. If we base our analysis solely on them, we will fall into the “proving the obvious” trap. We need to go beyond that and look directly at the original data to find new insights. Then we can use AI to help us pursue a particular question or verify an interpretation in a larger corpus (see, for example, the “horizontal coding” technique demonstrated in Dröge, 2023b).
  • Third, we need to understand that LLMs have an extraordinary “mimetic” capability, meaning that they can adapt to very different social, cultural, and communicative contexts. We can use this to steer the LLM away from trivial common sense knowledge by giving it as much contextual information as possible – about our research, the nature of the data we have collected, the people we have spoken to, their social, cultural, and personal backgrounds, and so on. This enables the LLM to develop context awareness in its interpretations, which is, as we know, crucial for any kind of qualitative data analysis.
  • Finally, we can use certain prompting techniques to force the AI language model to step out of its comfort zone of commonly expected facts and interpretations. Consider an example from my app QualCoder. Here, I implemented a prompt inspired by the strategy of contrasting comparisons known from grounded theory and many other QDA methods:
    Prompt: Take the given topic and description and briefly explain what empirical results would be commonly expected based on your own knowledge about the phenomenon in question. Then look at the actual empirical data presented to you and pick out relevant aspects which are most surprising and unexpected given your previously outlined expectations.
    This two-step strategy uses the vast common-sense knowledge embedded in the language model to our advantage, creating a contrasting backdrop against which new and unexpected results can stand out and surface. In my experience, this is not a panacea for the “proving the obvious” problem of today’s AI models. But I have found it very helpful in many situations where I wanted to shift attention away from the most obvious, boring, and expected observations that LLMs often like to focus on in their analysis of empirical data.
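    As a rough illustration, here is a minimal sketch of how such a two-step, context-aware prompt could be sent to a language model. This is not QualCoder’s actual implementation: the openai client, the model name, the system message, and the placeholder data are assumptions made purely for illustration; only the analysis prompt follows the strategy quoted above.

        from openai import OpenAI

        client = OpenAI()  # expects an OPENAI_API_KEY environment variable

        # Placeholder: in practice, this would be the relevant passages from your
        # own interview transcripts.
        interview_excerpt = "..."

        # Contextual information about the study (see the third point above) goes
        # into the system message so the model can adapt to this specific setting.
        system_message = (
            "You are assisting with a qualitative study on impulse buying, based on "
            "in-depth interviews with adult consumers."
        )

        # Two-step contrasting prompt: first spell out the commonly expected
        # results, then look for what is surprising in the actual data.
        analysis_prompt = (
            "Take the given topic and description and briefly explain what empirical "
            "results would be commonly expected based on your own knowledge about the "
            "phenomenon in question. Then look at the actual empirical data presented "
            "to you and pick out relevant aspects which are most surprising and "
            "unexpected given your previously outlined expectations.\n\n"
            "Topic: impulse buying experiences\n\n"
            "Empirical data:\n" + interview_excerpt
        )

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system_message},
                {"role": "user", "content": analysis_prompt},
            ],
        )
        print(response.choices[0].message.content)

    The same pattern works with any other chat-based LLM interface; the crucial part is the two-step structure of the prompt, not the specific API.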

Overall, I hope that, with such strategies and considerations, the introduction of AI into qualitative data analysis will do more than just accelerate individual paper output. Instead, it should also help us to push our research into yet uncharted territory and gain a deeper and more nuanced understanding of the social world we live in.


AI use: AI curated the music I listened to while working on this text; DeepL ruined my unique German-English writing style; Google Gemini’s “deep research” found the great quote from Robert E. Park for me (besides compiling a five-page report with first-year knowledge about qualitative research that nobody had asked for).


References

Chomsky, N., Roberts, I. & Watumull, J. (March 8th, 2023). The False Promise of ChatGPT. New York Times. https://www.nytimes.com/2023/03/08/opinion/noam-chomsky-chatgpt-ai.html

Dröge, K. (June 23rd, 2023a). Hermeneutische Maschinen? Überlegungen zum Verhältnis von qualitativer Sozialforschung und künstlicher Intelligenz [Hermeneutic machines? Reflections on the relationship between qualitative social research and artificial intelligence]. Conference presentation, Jubiläumstagung der DGS-Sektion „Methoden der qualitativen Sozialforschung“, Mainz. https://drive.switch.ch/index.php/s/oGx4qMh1YaQgtjy

Dröge, K. (2023b). Horizontal Coding: AI-Assisted Qualitative Data Analysis in QualCoder, Free & Open Source. https://youtu.be/FrQyTOTJhCc

Friese, S. (2025). Embracing the Paradigm Shift: Moving Beyond Coding in Qualitative Research. Computer-Assisted Qualitative Data Analysis – A University of Surrey blog. https://blogs.surrey.ac.uk/caqdas/2025/03/03/embracing-the-paradigm-shift-moving-beyond-coding-in-qualitative-research/

Matthews, F. H. (1977). Quest for an American Sociology: Robert E. Park and the Chicago School. McGill-Queen’s University Press.

Wolf, T. (2025). The Einstein AI model. https://thomwolf.io/blog/scientific-ai.html