Research Colloquium
Cecily Heiner
School of Computing
University of Utah
Friday, October 10, 2008
3147 MEB
Lecture 3:40 p.m.
Title: Automatically Classifying the Questions that Students Ask
Abstract
Introductory computer science students ask many questions during open lab hours, and answering them is expensive and time-consuming. A system that provides automated answers to even some of the questions could reduce the cost of instruction in these classes. However, published information about the distribution of student questions across topics and other dimensions is too scarce to guide the design of such a system. I will present a preliminary analysis of student questions from an introductory computer science course, logged automatically when students requested help during open lab consulting hours. The data set consists of more than one hundred questions that introductory computer science students asked while working on four weekly programming assignments. The initial data suggest that student questions can be repetitive in nature and that different students ask different kinds of questions.
The goal of this stage of analysis is to automatically classify questions, so as to identify similar previous questions whose answers could be recycled. The most obvious approach to this problem is to remove standard stop words, reduce each question to vector form, and apply a standard similarity measure such as cosine similarity. Although this approach has been used somewhat successfully in previous work on a tutor-initiated dialogue system, it was insufficient for classifying the questions that students ask in open labs. The divergence of results suggests that more research is needed to better understand the fundamental differences between tutor-initiated and student-initiated dialogue and to design an algorithm to automatically classify student questions.
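For readers unfamiliar with the baseline described above, the following is a minimal Python sketch of stop-word removal, bag-of-words vectors, and cosine similarity. It is an illustration only, not the speaker's implementation: the stop-word list, the tokenizer, and the function names (to_vector, cosine_similarity, most_similar) are assumptions made for this example, and the sample questions are invented.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption); the abstract does not say
# which "standard" list was used.
STOP_WORDS = {
    "a", "an", "the", "is", "my", "i", "do", "does", "how", "why",
    "what", "to", "in", "of", "it", "this",
}


def to_vector(question):
    """Lowercase, tokenize, drop stop words, and count the remaining terms."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(u, v):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(new_question, previous_questions):
    """Return (score, question) for the closest previously asked question."""
    new_vec = to_vector(new_question)
    scored = [(cosine_similarity(new_vec, to_vector(q)), q) for q in previous_questions]
    return max(scored, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Invented examples, not questions from the data set.
    previous = ["Why does my loop never stop?", "How do I read a file in Java?"]
    print(most_similar("my loop does not stop", previous))
```

A sketch like this matches questions only on shared surface words, so two students describing the same problem in different terms would score low. That is one plausible reason such a baseline could fall short on student-initiated questions, though the abstract itself does not state the cause.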