Guest post by Nick Woolf, Emeritus Advisor, Analyst and Author.
A recent essay by the science fiction writer Ted Chiang muses about whether Gen-AI could ever improve to the point that it produces fiction as good as a human's. He says not. Much current effort to incorporate AI into qualitative data analysis (QDA) involves Gen-AI, so his explanation provides a useful analogy for whether Gen-AI can ever meaningfully assist in interpretive QDA.
There are two separate issues: for what aspects of QDA is Gen-AI an appropriate tool, now or in the future; and how does Gen-AI currently perform? Many people are writing about the second issue, testing out current Gen-AI with qualitative data and seeing what they get. I am more interested in the first question – when is Gen-AI appropriate in QDA? First I’ll give you Chiang’s principles, then propose that Harry Wolcott’s principles of QDA are a good analogy that leads to a similar conclusion.
Chiang’s principles of why Gen-AI cannot produce human-level literature
Chiang characterizes writing fiction – and creating art in general – as involving a long sequence of choices:
“When you are writing fiction, you are—consciously or unconsciously—making a choice about almost every word you type; to oversimplify, we can imagine that a ten-thousand-word short story requires something on the order of ten thousand choices. When you give a generative-A.I. program a prompt, you are making very few choices; if you supply a hundred-word prompt, you have made on the order of a hundred choices.
If an A.I. generates a ten-thousand-word story based on your prompt, it has to fill in for all of the choices that you are not making. There are various ways it can do this. One is to take an average of the choices that other writers have made, as represented by text found on the Internet; that average is equivalent to the least interesting choices possible, which is why A.I.-generated text is often really bland. Another is to instruct the program to engage in style mimicry, emulating the choices made by a specific writer, which produces a highly derivative story. In neither case is it creating interesting art.”
He then goes on to describe what it would take for a Gen-AI model to actually produce a “good novel” at the artistic level of a good human novelist:
“This hypothetical writing program might require you to enter a hundred thousand words of prompts in order for it to generate an entirely different hundred thousand words that make up the novel you’re envisioning … [However] the selling point of generative A.I. is that these programs generate vastly more than you put into them, and that is precisely what prevents them from being effective tools for artists … art requires making choices at every scale; the countless small-scale choices made during implementation are just as important to the final product as the few large-scale choices made during the conception. It is a mistake to equate “large-scale” with “important” when it comes to the choices made when creating art; the interrelationship between the large scale and the small scale is where the artistry lies.”
This certainly rings true to me as analogous to the iterative back-and-forth between my global and often evolving research questions and the local interim findings discovered in the details of the data in a qualitative analysis. The interactions between the global and the local are one way of describing the emergent process of QDA. And as Chiang writes, that “is where the artistry lies”.
But this may not be every qualitative researcher’s experience of their data analysis process. That is where Harry Wolcott’s principles come in.
Wolcott’s principles of QDA
I propose that Harry Wolcott’s framework of description-analysis-interpretation is going to be helpful in deciding when and how to engage Gen-AI. Wolcott proposes these as fairly distinct ways of “transforming data” into findings at progressively further “distances” from the data. He defines description as “what is going on here”, analysis as “how things work”, and interpretation as “what does it mean”.
He says that in any study description is always present, and in some studies description leads on to either analysis or interpretation, or analysis leads on to interpretation. Some of our methodologies would seem to be primarily one of the three (for example, qualitative content analysis is primarily description, grounded theory is primarily analysis, interpretative phenomenological analysis is primarily interpretation), others may be a hybrid mix.
But I try to be conscious, at any step of transforming data and regardless of methodology, of whether I am currently describing, analyzing, or interpreting. As I reflected back on past projects after reading Chiang’s essay, I realized that when describing there are relatively few choices to be made; it is a question of clearly and accurately describing what was said in terms of the study’s purpose. When analyzing, however, there are many more choices at every step, within the bounds of the theoretical framework or lens and many other constraints. And when interpreting, the process is a continuous series of choices, an ever-expanding decision tree, and each researcher’s choices – based on innumerable factors – will necessarily be very different from any other’s.
The implication of all this is that the appropriateness of Gen-AI for any QDA task should be considered differently depending on the nature of the analytic task. First let’s briefly review where we currently stand with Gen-AI offerings, and then think about the appropriateness of these tools depending on whether you are primarily describing, analyzing, or interpreting.
Three genres of Gen-AI for assisting QDA
As described on the Qual-AI pages of the CAQDAS Networking Project, right now there are three genres of Gen-AI tools in relation to QDA:
- General Purpose Chatbots (e.g., ChatGPT, Claude, MS Copilot, Perplexity, NotebookLM)
- Integrations of Gen-AI into established CAQDAS packages (currently ATLAS.ti, MAXQDA, QualCoder and NVivo)
- New Apps focused entirely on harnessing Gen-AI capabilities (e.g., AILYZE, CoLoop, Reveal, QInsight and dozens and dozens more)
Here are some examples of what can be done with current tools (a brief code sketch of the first two capabilities follows the list):
- SUMMARIZE at different levels (e.g., whole documents, selected text segments, and already coded text segments)
- EXPLAIN TERMS, i.e., if there is a technical term or theory in the literature you’re unsure about, you can have it give you an explanation – or if a participant in a transcript uses a colloquialism you’re not familiar with – but double-check to make sure it’s not hallucinating
- SUGGEST CODES from a selected text segment, or sub-codes for a code
- CODING in three ways: (a) fully automated; (b) based on human-specified “intentions”, such as analytic questions, project context, etc.; and (c) ‘directed AI-coding’, which is based on a defined code – the AI then goes looking for text that fits, which in some cases it does surprisingly well
- RESPOND TO QUERIES, such as via the sequential conversation tools in ATLAS.ti and MAXQDA, which use a chatbot interface like ChatGPT’s, or via the comparative grid displays in the exploding number of new apps
- AI-GENERATED “THEMES”, offered by some of the new apps but not currently by established CAQDAS packages
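To make the first two capabilities concrete, here is a minimal sketch of how a tool might call a general-purpose model to summarize a text segment and to suggest candidate codes. It uses the OpenAI Python client; the model name, prompt wording, and function names are illustrative assumptions, not how any CAQDAS package actually implements these features.

```python
# Minimal sketch of the SUMMARIZE and SUGGEST CODES tasks above, using the
# OpenAI Python client. Model name and prompts are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_segment(segment: str) -> str:
    """Ask the model for a short, descriptive summary of one text segment."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Summarize this interview segment in 2-3 sentences. "
                        "Stay close to what was actually said; do not interpret."},
            {"role": "user", "content": segment},
        ],
    )
    return response.choices[0].message.content

def suggest_codes(segment: str) -> str:
    """Ask the model for candidate codes for one segment, for the human to vet."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Suggest 3-5 short candidate codes (2-4 words each) "
                        "for this qualitative data segment, one per line."},
            {"role": "user", "content": segment},
        ],
    )
    return response.choices[0].message.content
```

Keeping these as two separate, tightly scoped prompts is deliberate: each output stays close to the data, and the human still decides which summaries and codes to keep.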
Now we need to consider: How do these tools fit into Wolcott’s scheme of describing, analyzing and interpreting?
AI and describing
Wolcott’s “descriptions” are closest to the data, e.g. letting informants tell their own stories and treating “descriptive data as fact”. It seems to me that Gen-AI is best suited for this kind or level of QDA. For example, having an AI-generated summary of some or all of an interview transcript or a set of field notes could serve Wolcott’s descriptive purposes quite well. You might also get additional ideas at the descriptive level, beyond those you may have identified yourself in describing “what is going on” in the material.
The danger we are now constantly reading about is the AI getting it wrong, making stuff up (“hallucinating”, to give it the technical term), or giving the wrong answer. My first thought is always: there is no right answer in a QDA. But in a purely descriptive study in Wolcott’s sense, maybe there is something close to a right answer. Because the AI focuses on the explicit content, generating summaries or prompting it to question the data at the descriptive level may really be of help. I am comfortable using Gen-AI when a study consists of describing “data as fact” in Wolcott’s terms.
AI and analyzing
Wolcott’s “analysis” means extending a description by identifying and relating “key factors and essential features” into some new conceptual framework – again, regardless of methodology. These “key factors” arise from the wider context of the study and the research questions, not to mention the accumulated research experience of the researcher. They are not objectively embedded in the data. How would Gen-AI decide what is a key factor, and how the factors should be related? A question then arises: can Gen-AI usefully even contribute to this task, let alone conduct it alone?
Susanne Friese, on the QUAL-SOFTWARE list, suggests it might, when she writes:
“You can learn from your data while working hand-in-hand with a Gen-AI assistant. There is no need to code data first … we can now directly interact with our data through chat – and retrieve information. My suggestion is to do it in an iterative process, learning step by step more about our data.”
Susanne’s proposal for this way of using Gen-AI is an example of a centaur, a “collaborative system where humans and Gen-AI work together, combining human intuition and expertise with Gen-AI’s computational power and data processing capabilities” (see https://www.envisioning.io/vocab/centaur for more on the history of centaurs).
I take this to mean that, through a series of prompts via a chatbot interface, you can direct the Gen-AI to retrieve information about your qualitative data, relevant to the research questions and the prior descriptive phases, but then you alone must identify and relate “key factors and essential features”. As Susanne says, “turning it all over to a Gen-AI is very questionable. The way I see it is that the magic (and the benefit) is in the interaction.”
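As a concrete illustration of this prompt-by-prompt interaction, here is a minimal sketch of a centaur-style chat loop over a single transcript, again using the OpenAI Python client. The retrieval-only system prompt, the file name, and the model name are all illustrative assumptions – a sketch of the kind of workflow Susanne describes, not her actual setup or any product’s implementation.

```python
# Minimal sketch of an iterative, centaur-style chat over one transcript.
# The human supplies each retrieval prompt in turn; the running message
# history keeps later prompts grounded in earlier exchanges.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

transcript = open("interview_01.txt").read()  # hypothetical data file

# Seed the conversation with the data and a restriction to retrieval only.
messages = [
    {"role": "system",
     "content": "Answer only by retrieving and quoting what this transcript "
                "explicitly says. Do not analyze or interpret.\n\n" + transcript},
]

while True:
    # Each loop iteration is one human choice: a new retrieval prompt
    # informed by the previous answer. Relating "key factors and essential
    # features" remains the human's job.
    prompt = input("Your retrieval prompt (blank to stop): ")
    if not prompt:
        break
    messages.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model="gpt-4o-mini", messages=messages)
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(answer)
```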
If we take Wolcott’s framework seriously, which I certainly do, we need to consider whether the AI in this centaur process is still operating at the descriptive level with its retrieval in response to our prompts, providing us with new raw material that we then work on ourselves, non-centaur-like; or whether it is actually contributing to “identifying and relating” those “key factors and essential features” in some unknowable way. We read about the extraordinary capabilities of some Gen-AIs that at this time can’t be explained by humans. Are we happy to include these AI conclusions as valid analytic-level contributions to our qualitative studies? A judgement on this critical question affects how you might choose to work with an AI. We look forward to hearing more on this from those who are deeply experimenting with how Gen-AI currently performs.
If it transpires that the AI is operating exclusively at the descriptive level, then, pedagogically, our task is to ensure that students restrict Gen-AI to descriptive tasks – very close to the data, treating the data as fact. The skill here would be in writing chat prompts that restrict the Gen-AI in this way, and the discipline not to go further with Gen-AI in order to save even more time by asking it to try to answer analytic questions – a big ask for many, if not most, students. As Susanne says: “If I let a Gen-AI do all of the analysis, I have no clue and cannot follow. We know that the outcomes are always plausible, but not reliable. Therefore turning it all over to a Gen-AI is very questionable.”
AI and interpreting
However, my real concern with Gen-AI comes with interpretation. In Wolcott’s terms, this is “reaching out to make sense, understand or explain beyond the degree of certainty associated with analysis”. It is far from the data, so the links and relevance to the data that produced it have to be made explicit. Intuitively this feels like a wholly human activity for which Gen-AI assistance would be a hindrance, but I have not yet been able to say exactly why.
One way forward is to use Chiang’s explanation of human creative activity as a continuous series of choices. Analogously, interpretation in qualitative research involves a continuous series of choices from multiple alternative interpretations at every small step forward, a decision tree of enormous complexity that must be made as explicit as possible for the interpretation to be accepted as meaningful and useful.
Each choice is based on the outcome of the previous choice, and all of them draw on the accumulated knowledge we bring to an interpretation: knowledge from previous research projects relevant to the study, from the relevant literature, and from our accumulated experiences as human beings as well as researchers.
For Gen-AI to make a contribution at Wolcott’s interpretation level requires it to make choices and inferences not provided by a human prompt. Those choices will necessarily be based on the AI’s training data, which, if the AI was trained on the general internet, means texts mostly unrelated to research studies and, critically, of unknown quality.
How would that process compare to what human researchers do when engaging in interpretational tasks? At first blush it does not seem like Gen-AI as currently understood is up to this task.
Conclusion
AI is clearly going to have a role in how we conduct QDA, but for best outcomes that role will differ depending on whether we are describing, analyzing, or interpreting. Gen-AI will likely have a helpful, direct, and time-saving role in descriptive studies, and, when doing analysis, a centaur role in a disciplined iterative process as described by Susanne Friese. More work is needed to figure out if and how Gen-AI could ever be helpful in interpretive studies.
1. Chiang, T. (2024, August 31). Why A.I. isn’t going to make art. The New Yorker, Weekend Essay. https://www.newyorker.com/culture/the-weekend-essay/why-ai-isnt-going-to-make-art
2. Wolcott, H. F. (1994). Transforming qualitative data: Description, analysis, and interpretation. Sage.