Evaluation of ChatGPT: Uncovering Visuospatial Learning Capabilities and Advancements from Version 3.5 to 4.0
Contact Information
Stiles-Nicholson Brain Institute
777 Glades Road
Boca Raton, FL 33431-0991
S.E. Wimberly Library
Sandbox
Project Leader
Belle Krubitski
Collaborators
Supervisors
Project Description
In recent years, large language models (LLMs) such as ChatGPT have gained significant attention for their diverse capabilities, particularly in commonsense reasoning. This paper engages in dialogue with different versions of ChatGPT, specifically 3.5, 4.0, and 4.0 with advanced data analysis, in the context of visuospatial learning. Because these models consistently perform well on established commonsense reasoning benchmarks, there is growing interest in unconventional approaches to evaluating them. This alternative evaluation method aims to uncover weaknesses and clarify the system's constraints. Engaging the system in constructive dialogue offers an opportunity to assess its coherence and to reach a deeper understanding of its limitations. In our research, we conduct qualitative investigations using this dialectical evaluation approach, with a specific emphasis on spatial reasoning, a fundamental aspect of commonsense reasoning. Recognizing ChatGPT's existing constraints is essential to understanding its prospects for improvement. In summary, this paper probes the visuospatial capabilities of the earlier and later versions of ChatGPT, documents the improvements achieved in the roughly five months between the releases of 3.5 and 4.0, and offers recommendations for future efforts to enhance the capabilities of language models.