AI chatbot ChatGPT is still no match for humans when it comes to accounting. While the technology is a game changer in several fields, researchers say it still has work to do in that domain.
Microsoft-backed OpenAI has launched its newest AI chatbot product, GPT-4, which uses machine learning to generate natural-language text. GPT-4 passed the bar exam with a score in the 90th percentile, passed 13 of 15 Advanced Placement (AP) exams and got a nearly perfect score on the GRE Verbal test.
“It’s not perfect; you’re not going to be using it for everything,” said Jessica Wood, currently a freshman at Brigham Young University (BYU) in the US. “Trying to learn solely by using ChatGPT is a fool’s errand.”
Researchers at BYU and 186 other universities wanted to know how OpenAI’s tech would fare on accounting exams. They put the original version, ChatGPT, to the test.
“We’re trying to focus on what we can do with this technology now that we couldn’t do before to improve the teaching process for faculty and the learning process for students. Testing it out was eye-opening,” said lead study author David Wood, a BYU professor of accounting.
Although ChatGPT’s performance was impressive, the students performed better.
Students scored an overall average of 76.7 percent, compared to ChatGPT’s score of 47.4 percent.
On 11.3 percent of questions, ChatGPT scored higher than the student average, doing particularly well on AIS and auditing.
But the AI bot did worse on tax, financial, and managerial assessments, possibly because it struggled with the mathematical processes those subjects require, said the study published in the journal Issues in Accounting Education.
When it came to question type, ChatGPT did better on true/false questions and multiple-choice questions but struggled with short-answer questions.
In general, higher-order questions were harder for ChatGPT to answer.
“ChatGPT doesn’t always recognize when it is doing math and makes nonsensical errors such as adding two numbers in a subtraction problem or dividing numbers incorrectly,” the study found.
ChatGPT often provides explanations for its answers, even if they are incorrect. Other times, ChatGPT’s descriptions are accurate, but it will then proceed to select the wrong multiple-choice answer.
“ChatGPT sometimes makes up facts. For example, when providing a reference, it generates a real-looking reference that is completely fabricated. The work and sometimes the authors do not even exist,” the findings showed.
That said, the authors fully expect GPT-4 to perform substantially better on the accounting questions posed in their study.