Machine learning for feedback in massive open online courses
AffiliationComputing and Information Systems
Document TypePhD thesis
Access StatusOpen Access
© 2016 Dr. Jiazhen He
Massive Open Online Courses (MOOCs) have received widespread attention for their potential to scale higher education, with multiple platforms such as Coursera, edX and Udacity recently appearing. Online courses from elite universities around the world are offered for free, so that anyone with internet access can learn anywhere. Enormous enrolments and diversity of students have been widely observed in MOOCs. Despite their popularity, MOOCs are limited in reaching their full potential by a number of issues. One of the major problems is the notoriously low completion rates. A number of studies have focused on identifying the factors leading to this problem. One of the factors is the lack of interactivity and support. There is broad agreement in the literature that interaction and communication play an important role in improving student learning. It has been indicated that interaction in MOOCs helps students ease their feelings of isolation and frustration, develop their own knowledge, and improve learning experience. A natural way of improving interactivity is providing feedback to students on their progress and problems. MOOCs give rise to vast amounts of student engagement data, bringing opportunities to gain insights into student learning and provide feedback. This thesis focuses on applying and designing new machine learning algorithms to assist instructors in providing student feedback. In particular, we investigate three main themes: i) identifying at-risk students not completing courses as a step towards timely intervention; ii) exploring the suitability of using automatically discovered forum topics as instruments for modelling students' ability; iii) similarity search in heterogeneous information networks. The first theme can be helpful for assisting instructors to design interventions for at-risk students to improve retention. The second theme is inspired by recent research on measurement of student learning in education research communities. Educators explore the suitability of using latent complex patterns of engagement instead of traditional visible assessment tools (e.g. quizzes and assignments), to measure a hypothesised distinctive and complex learning skill of promoting learning in MOOCs. This process is often human-intensive and time-consuming. Inspired by this research, together with the importance of MOOC discussion forums for understanding student learning and providing feedback, we investigate whether students' participation across forum discussion topics can indicate their academic ability. The third theme is a generic study of utilising the rich semantic information in heterogeneous information networks to help find similar objects. MOOCs contain diverse and complex student engagement data, which is a typical example of heterogeneous information networks, and so could benefit from this study. We make the following contributions for solving the above problems. Firstly, we propose transfer learning algorithms based on regularised logistic regression, to identify students who are at risk of not completing courses weekly. Predicted probabilities with well-calibrated and smoothed properties can not only be used for the identification of at-risk students but also for subsequent interventions. We envision an intervention that presents probability of success/failure to borderline students with the hypothesis that they can be motivated by being classified as "nearly there". Secondly, we combine topic models with measurement models to discover topics from students' online forum postings. The topics are enforced to fit measurement models as statistical evidence of instruments for measuring student ability. In particular, we focus on two measurement models, the Guttman scale and the Rasch model. To the best our knowledge, this is the first study to explore the suitability of using discovered topics from MOOC forum content as instruments for measuring student ability, by combining topic models with psychometric measurement models in this way. Furthermore, these scaled topics imply a range of difficulty levels, which can be useful for monitoring the health of a course and refining curricula, student assessment, and providing personalised feedback based on student ability levels and topic difficulty levels. Thirdly, we extend an existing meta path-based similarity measure by incorporating transitive similarity and temporal dynamics in heterogeneous information networks, evaluated using the DBLP bibliographic network. The proposed similarity measure might apply to MOOC settings to find similar students or threads, or thread recommendation in MOOC forums, by modelling student interactions in MOOC forums as a heterogeneous information network.
Keywordsmachine learning; topic modelling; matrix factorisation; logistic regression; MOOCs; psychometrics; item response theory; rasch model; similarity search; heterogeneous information network
- Click on "Export Reference in RIS Format" and choose "open with... Endnote".
- Click on "Export Reference in RIS Format". Login to Refworks, go to References => Import References