The growth of original AI-generated content demands that we reconsider methods of student assessment. Clare Williams explores the impact of ChatGPT on pedagogy, identifying three current weaknesses that instructors might bear in mind when setting student assessment questions.
Recently, an instructor from a higher education institution in the United States posted this on Reddit:
Figure 1: Screengrab from Reddit, post by u/mobilepassenger12345 posted to the sub r/Professors
Even before u/mobilepassenger12345 updated their post to confirm that the student’s essay had indeed been written using artificial intelligence (AI), replies were quick to ask whether an algorithm, specifically OpenAI’s latest chat tool, ChatGPT, had been the culprit.
Posts on this blog have explored how AI is reshaping many areas of everyday life, including education, notably by challenging established pedagogies through the creation of original student essays. ChatGPT is the latest iteration of natural language processing (NLP) AI, capable not only of holding a realistic conversation with a human, but also of generating credible creative content. It has received significant coverage for its ability to pen original stories and poetry, write and debug original computer code, and even feign sentience.
And yet, for each claim that ChatGPT can produce profoundly original and human-like content, there is a similar number of posts and articles claiming that the content is flawed, untrustworthy, or simply plain wrong. One conclusion we can take from the current debates is that, despite not being perfect, tools like ChatGPT are improving and are here to stay. Moreover, this is only a small step further than the AI we already expect students to use in their essays, such as spelling and grammar checkers in Microsoft Word, or apps like Grammarly.
What does this mean for pedagogy?
The implications for how we assess student knowledge and understanding are nonetheless profound: if we ask students to submit a piece of original work, how can we be sure that it is their own work, and not a (more or less) doctored response scripted by AI?
There are three potential areas that an instructor might keep in mind when designing assessment questions to ensure that the student must answer the question without relying too heavily on AI: current affairs, referencing, and critical analysis. To illustrate, I used a typical short essay prompt, as can be seen in Figure 2 below. The essay prompt is displayed next to the yellow box, with ChatGPT’s response displayed next to the green box, reproduced in full:
Figure 2: Essay prompt and ChatGPT response from December 2022
Critiquing ChatGPT: what to look out for
As we can see from Figure 2, AI tools like ChatGPT still have limitations that instructors can be aware of when setting student essay questions. As ChatGPT is trained on data up to mid- to late-2021 and is not connected to the internet, it currently has no knowledge of the world after that date. In other words, ChatGPT is not performing a quick Google search in the background as it formulates a response, and this gives instructors the possibility of assessing the application of knowledge and understanding to more recent events that lie beyond the capabilities of ChatGPT. The technology also seemingly struggles with in-depth critical analysis demanded by prompts. While it can retrieve, compare, and contrast viewpoints of different authors, it cannot engage deeply with the arguments, again offering instructors a point of entry for assessing their students’ understanding. In the example given above, we might amend the assessment question to ask students to demonstrate their knowledge and understanding of changes in regulatory governance by applying it to ongoing events or current affairs.
Another sure sign that all is not well in a student’s essay is their referencing. While some instructors have found that ChatGPT can offer reliable references for some material, the majority, like u/mobilepassenger12345 posting on Reddit above, seem to find that the algorithm simply invents sources that look convincing to the untrained, non-expert eye, but which look suspicious to anyone with in-depth knowledge of the subject. Additionally, any requirement that references be presented in a particular format according to disciplinary norms, for example OSCOLA in law, falls apart, as all ChatGPT referencing collapses into a haphazard Harvard style. At present, this can be one of the first places to look to check whether AI has been a core element in the production of an essay.
Do we need to worry?
Current affairs, referencing, and critical analysis seem to offer three key weaknesses of current AI models like ChatGPT that instructors can take advantage of to ensure that the student essay remains a robust and reliable means of assessment. However, this approach is unlikely to be foolproof, and the emergent qualities of foundation models and their downstream applications mean that such weaknesses might no longer be as reliable next year. GPT-4 is just around the corner, and likely to be trained on data from interactions with the current iteration. While detectors of AI-generated text are emerging in response, they lag behind NLP AI and have, so far, not achieved a sufficiently high degree of accuracy to be reliable. Does this mean that the demise of the student essay as a method of assessment is nigh?
The growth of AI-generated original content could mean that the student essay will not be a reliable way of assessing learning for much longer. The impact of this will vary from field to field, but might have particular relevance for degrees that include professional qualifications, like the qualifying law degree (QLD). Professional and regulatory bodies might do well to work with higher education institutions (HEIs) to (re)consider what it means to learn and understand in a world where AI can generate the content for us, and for our students. HEIs might also want to urgently consider the boundaries and borderlands of academic misconduct in an age of AI. If a student uses AI to generate an (original) essay outline and then fleshes out their answer, does this constitute academic misconduct? How much AI is too much?
How can we respond? Instead of banning the use of AI-generated content in education assessment, we might do well to consider harnessing tools like ChatGPT as a starting point for student essays. Students are likely to be graduating into a world that is increasingly reliant on AI. Law graduates, for example, will be competing with tools like DoNotPay, “The World’s First Robot Lawyer”. As tools like ChatGPT become embedded in the professional spheres of competencies in which students will work, it seems pointless and even counterintuitive to expect students to avoid AI altogether in their studies. Why not offer the opportunity to work with the technology, learning how to use it safely and constructively? So, for example, a research essay might expect students to use ChatGPT to create an initial essay outline that they then critique and correct, update and extend. One thing is certain: ChatGPT should prompt instructors to revisit how students are assessed if rigorous, robust assessment of knowledge and understanding is to be continued in the era of AI.
The content generated on this blog is for information purposes only. This Article gives the views and opinions of the authors and does not reflect the views and opinions of the Impact of Social Science blog (the blog), nor of the London School of Economics and Political Science. Please review our comments policy if you have any concerns on posting a comment below.
Image Credit: LSE Impact Blog via Canva.
If we value the ability of the essay to assess students’ critical conceptual thinking, then a return to the invigilated seen essay may be required.
Going through the example provided, however, it’s interesting to note that the ChatGPT product does not attempt to provide a definition of ‘regulatory governance’, which is the focus of the essay question. It simply describes aspects of ‘trends and change’, with the technology seemingly unable to discern that ‘regulatory governance’ is not a unitary object. The expectation for students, by contrast, would be that they begin by defining the subject matter of their analysis, which is not ‘trends and changes’, but regulatory governance in its multiple forms. Is this a flaw of the technology that we can expect to be remedied in future?
That’s a really fair point Iain, and we should bear in mind that AI tools like ChatGPT do not “understand” anything, but merely string together words based on likely probabilities of what comes next.
Invigilated essays (exams?) are one way of ensuring that students turn in their own work, and this could be a useful short-term stopgap solution, but it does not address the point that our students are going to be coming face-to-face with the technology once they graduate and will need to be familiar with navigating its pros and cons in the real world.
You raise a really good point about what we’re after though, and the value and purpose of an essay. Is the essay the end in itself, or a means of testing students’ knowledge and understanding of the core issues?
Thanks for reading!
“Instead of banning the use of AI-generated content in education assessment, we might do well to consider harnessing tools like ChatGPT as a starting point for student essays. … Why not offer the opportunity to work with the technology, learning how to use it safely and constructively? So, for example, a research essay might expect students to use ChatGPT to create an initial essay outline that they then critique and correct, update and extend.”
This sounds good at first, but we have to remember that our ability to “critique, correct, update, and extend” cannot be detached from our writing skills. You can’t be a good editor if you don’t know how to write. And we know that most students need to become much better writers than they are when they show up for their first day of classes at university. So we must require them to work on their (own) sentences and paragraphs for the most part. We can’t move their starting point to artificially coherent prose drafts.
I agree wholeheartedly about working *with* the technology and reconsidering assessment. But I doubt the weaknesses you outline will survive until next year. There are already AI tools which link to the Internet and can access current information/sources, e.g. https://openai.com/blog/webgpt/. As you say, this is something HEIs should urgently consider.
Great point Carol!
There are also Q&A models that mix a retriever model with a generator model like GPT-X, to retrieve and generate answers from datasets stored in document stores.
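As a rough illustration of that retriever-plus-generator idea, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three-sentence DOCUMENT_STORE, the word-overlap retriever, and the generate callable (a stand-in for a language model) are hypothetical and do not describe any particular product or library.

```python
import re
from collections import Counter

# Hypothetical document store. A real system would use a search index
# or a vector database rather than a short in-memory list.
DOCUMENT_STORE = [
    "OSCOLA is a citation style widely used in legal scholarship.",
    "Harvard referencing uses author-date citations in the text.",
    "Regulatory governance describes how rules are made and enforced.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, documents, top_k=1):
    """Toy retriever: rank documents by the number of words shared with the question."""
    question_words = Counter(tokenize(question))
    scored = []
    for doc in documents:
        doc_words = Counter(tokenize(doc))
        overlap = sum(min(count, doc_words[word]) for word, count in question_words.items())
        scored.append((overlap, doc))
    # Highest overlap first; documents with no shared words are dropped.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(question, passages):
    """Combine the retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question, generate):
    """Retrieve supporting passages, then hand the prompt to the generator.

    `generate` is any callable that maps a prompt string to text; it is a
    placeholder for a language model, not a real API.
    """
    passages = retrieve(question, DOCUMENT_STORE)
    return generate(build_prompt(question, passages))
```

Calling answer(question, generate) with a real text generator in place of generate would ground the response in whatever the retriever found, which is why a training cut-off date matters less for this kind of system.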
This is a good point, and some of the more recent AI posts on this blog have picked up on this. I’m waiting to see what Microsoft manages to do when they integrate ChatGPT into their suite of apps (browser, Word, Excel, etc.). It could be a game changer and mean that current weaknesses outlined here really don’t apply for long. Then again, given this will depend on the integration, I’ll reserve judgment.
ChatGPT 4 though is, apparently, just around the corner, and I would agree that the three points listed here might no longer be relevant once we all have access to that.
You raise some good points and confirm a fair amount of what I suspected. Most NLP tools “waffle” rather than support carefully woven arguments. It was true of Meta’s spectacular failure at a science-writing NLP tool and it is true of ChatGPT as well. They do not support their arguments and do not reference properly (or have been known to make up references).
Our own coursework in my MSc class this term is a take-home research project with lit review/citations. Consequently we feel safe (for now). In a recent opinion piece on AI, one prof wrote that if he suspected an AI-generated essay, he would give a spot oral exam. This seems fair. We work in a world where we should engage new technologies and see their shortcomings. Perhaps there is nothing wrong with NLP-generated homework since, at this stage, it still requires such a great deal of curating and correction that the student really does have to know the subject matter; the NLP tool merely helps.