Limitations of ChatGPT
Why ChatGPT falls short when answering items on a logical test, and why reducing access to tools such as ChatGPT actually helps test-takers responding to a cognitive test.
Users and test-takers of Master tests have naturally been curious about how ChatGPT might assist test-takers when responding to tests in general, and to cognitive tests in particular. Over the last few months, we have received various questions about using ChatGPT to answer items on our logical tests, such as ACE. At Master International A/S, we also felt the need to understand this trend, so we set out to investigate the current state of AI in relation to our tests, mainly by comparing the widely accessible tool ChatGPT with ACE. The intention of our investigation, and thus of this paper, is to shed light on the limitations and possibilities of ChatGPT in test development, to address the curiosity surrounding the tool, and hopefully to answer some of the questions that test-takers and users of our solutions might have about ChatGPT and ACE.
In recent years, artificial intelligence (AI) has made remarkable strides in natural language processing, enabling AI models like ChatGPT to engage in human-like conversations. While ChatGPT possesses an impressive ability to generate coherent responses, it is important to recognize that there are inherent limitations to its logical reasoning capabilities. This article delves into why ChatGPT may struggle to answer logical items on a test, despite its remarkable language proficiency.
A linguistic model
The first thing to understand is that ChatGPT, like all Large Language Models (LLMs), is a linguistic model. This means that it is a statistical model trained on vast amounts of written text, and from that text it has learned how to construct sentences. It does so by repeatedly predicting the most likely next word. While it excels at understanding and generating human-like language, it does not possess true comprehension or the ability to reason deeply. The model lacks real-world experiences, common sense, and contextual understanding, which are crucial for comprehending complex logical scenarios. As a result, ChatGPT may struggle with nuanced logical questions that require abstract reasoning and critical thinking.
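As an illustration only, the idea of predicting the most likely next word can be sketched with a toy model that simply counts which word follows which in a small sample text. This is far simpler than ChatGPT, which uses a large neural network rather than raw counts; the function names `trainBigrams` and `predictNext` and the sample sentence are hypothetical and chosen only for this example.

```typescript
// Toy "bigram" model: for every word, count which words follow it,
// then predict the follower that was seen most often.
// Illustrative assumption only; ChatGPT does not work from raw counts.
type FollowerCounts = Map<string, Map<string, number>>;

function trainBigrams(corpus: string): FollowerCounts {
  const words = corpus.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  const counts: FollowerCounts = new Map();
  for (let i = 0; i < words.length - 1; i++) {
    const current = words[i];
    const next = words[i + 1];
    const followers = counts.get(current) ?? new Map<string, number>();
    followers.set(next, (followers.get(next) ?? 0) + 1);
    counts.set(current, followers);
  }
  return counts;
}

function predictNext(counts: FollowerCounts, word: string): string | undefined {
  const followers = counts.get(word.toLowerCase());
  if (followers === undefined) {
    return undefined; // The word never appeared with a follower in the sample text.
  }
  let best: string | undefined = undefined;
  let bestCount = 0;
  followers.forEach((count, candidate) => {
    if (count > bestCount) {
      best = candidate;
      bestCount = count;
    }
  });
  return best;
}

const model = trainBigrams("the cat sat on the mat the cat ran");
console.log(predictNext(model, "the")); // "cat" follows "the" twice, "mat" only once
console.log(predictNext(model, "dog")); // undefined, because "dog" was never seen
```

The sketch picks a word because it was frequent in the sample text, not because it understands the sentence, which is the point the paragraph above makes about linguistic models.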
Lack of inductive reasoning
Inductive reasoning, the ability to generalize from specific examples to broader principles, is a fundamental aspect of logical thinking. While ChatGPT can generate responses based on existing patterns in the data, it does not possess the ability to induce general principles or infer solutions based on limited information. This limitation prevents ChatGPT from tackling complex logical questions that require inductive reasoning, thereby limiting its performance on such test items.
Sensitivity to input phrasing
ChatGPT is highly sensitive to the phrasing and structure of input questions. Even a slight rephrasing of the same item can yield different responses, which highlights the linguistic model's lack of robustness in capturing the underlying logic. Unlike human test-takers, who can decipher the intent behind a question, ChatGPT relies on patterns and statistical associations in the question it is presented with. Consequently, ChatGPT may struggle to generalize logical concepts across various phrasings, leading to inconsistent or incorrect answers.
Initiatives implemented by Master
Limiting Microsoft Visual Search
When using Microsoft Edge as their browser, test-takers will under normal circumstances see a small icon on images on any webpage. If they press the icon, Edge searches the web for related images. This means that test-takers using Edge to complete ACE and/or CORE could potentially search the web for similar images. Furthermore, they could be distracted by the icon, which in some cases can influence their responses and therefore the result of the test.
Our investigation has shown that there has been a lot of discussion on the web regarding this topic, starting shortly after Microsoft released the feature. We have concluded that the impact of Microsoft Visual Search on test-takers, as the feature works now, is limited, and that there is no immediate threat, since the pictures returned by the search are, for now, similar in appearance but unrelated to ACE or CORE. The concern is therefore more that the test-taker is distracted while completing the test. For that reason, we have added code to our test pages that prevents the Microsoft Visual Search icon from showing.
Limiting right click
Removing the possibility of right-clicking while responding to a test mainly affects two actions that we have identified as possible disturbances for the test-taker.
First, images are less likely to be copied and pasted into, for example, a Google search. This means that the test-taker does not spend valuable time going through possibly similar images in search of help on the test. Limiting this kind of wasted time benefits the test-taker.
Second, limiting the copying and pasting of text makes it more difficult for a test-taker to sit with two screens, copy the text from ACE, and paste it into ChatGPT. This does not completely remove the risk of test-takers using ChatGPT, but it hopefully makes it difficult enough that the test-taker refrains from doing so, which in the end is also to their own benefit.
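For readers curious about how such a restriction can work in general, the following is a minimal sketch and not a description of the code used on Master's test pages. It assumes that the page cancels the standard browser events `contextmenu` (right click) and `copy`; the helper name `blockContextMenuAndCopy` is hypothetical.

```typescript
// Hypothetical helper: cancels right-click menus and copy actions on a target.
// A sketch under stated assumptions, not Master's actual implementation.
function blockContextMenuAndCopy(target: EventTarget): () => void {
  const cancel = (event: Event): void => {
    // Stops the browser's default action (showing the menu, copying the text).
    event.preventDefault();
  };
  const blockedEvents = ["contextmenu", "copy"];
  blockedEvents.forEach((name) => target.addEventListener(name, cancel));

  // Return a function that removes the restriction again.
  return () => {
    blockedEvents.forEach((name) => target.removeEventListener(name, cancel));
  };
}
```

In a browser, the helper would typically be called once for the whole page, for example `blockContextMenuAndCopy(document)`. As noted above, this only raises the effort required: a determined test-taker can still retype the text or photograph the screen.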
Conclusion
In conclusion, while ChatGPT exhibits impressive language generation capabilities, it faces significant limitations when it comes to answering logical items on a test. As a linguistic model, it lacks true comprehension, reasoning abilities, and contextual understanding, which are vital for accurately responding to complex logical scenarios. The model's vulnerability to ambiguity and its inability to seek clarifications or ask follow-up questions further hinder its performance on nuanced logical questions. Additionally, ChatGPT lacks inductive reasoning skills, making it challenging for the model to generalize logical concepts or infer solutions based on limited information.
Moreover, ChatGPT is highly sensitive to the phrasing and structure of input questions, leading to inconsistent or incorrect answers even with slight rephrasing. This sensitivity to input phrasing highlights the model's lack of robustness in capturing the underlying logic of the questions. Furthermore, the model's inability to process visual information prevents it from effectively answering spatial items or any questions that require visual understanding.
Master International A/S acknowledges these limitations and has taken initiatives to minimize the motivation for test-takers to rely on ChatGPT during their tests. Measures such as limiting the ability to copy-paste text and removing the Microsoft Visual Search icon have been implemented to make it more difficult for test-takers to access external resources and potentially impact their test result.
It is worth mentioning that the developers of future GPT versions (and of other similar models) are working towards different solutions, such as plugins and more specific training for various niches, to improve the models' mathematical and logical performance. At the same time, the tendency of the models to make up facts (or "hallucinate", as some call it) remains an issue, and AI researchers do not yet fully understand why it happens. Master International A/S follows this development closely.
Ultimately, it is important to recognize that human test-takers still possess an edge in logical reasoning and critical thinking. While ChatGPT can be a valuable tool for various tasks, it falls short when faced with the complexities of logical tests. Understanding the limitations of ChatGPT is crucial for both test users and test-takers, ensuring fair and accurate assessments of logical abilities.
We have published a full White Paper on the matter, in which we go into more depth on the limitations of ChatGPT and on how unrestricted access to ChatGPT can negatively influence test-takers.
Click here to download: White Paper