Mining Theory-Based Patterns from Big Data: Identifying SRL in MOOCs


Today has been a happy day, because one of the most interesting papers I have ever collaborated on has just been published :). This is the work of my PhD student Jorge Maldonado, who has been working together with my MSc student Nicolás Morales and with two other great researchers and good colleagues, René Kizilcec and Jorge Munoz-Gama. I would also like to thank the reviewers of the journal Computers in Human Behavior for an excellent review process.

In this work we investigated how self-regulated learning (SRL) occurs in MOOCs. In particular, we used process mining techniques to mine behavioural patterns from learners’ trace data and related these patterns to current theories of SRL. From my point of view, the most interesting aspect is our effort to bring together theory-based models of SRL and the actual learning processes observed in the data. In the words of one of the reviewers:

This work is ‘very close to the ideal’ of synthesizing research culture and methodology between empirical educational/psychological research and computer science.

Here you have the abstract and the citation, so that you can enjoy the reading! Please let us know your comments; we are really interested in understanding how we can extend this idea.

Abstract. Big data in education offers unprecedented opportunities to support learners and advance research in the learning sciences. Analysis of observed behaviour using computational methods can uncover patterns that reflect theoretically established processes, such as those involved in self-regulated learning (SRL). This research addresses the question of how to integrate this bottom-up approach of mining behavioural patterns with the traditional top-down approach of using validated self-reporting instruments. Using process mining, we extracted interaction sequences from fine-grained behavioural traces for 3458 learners across three Massive Open Online Courses. We identified six distinct interaction sequence patterns. We matched each interaction sequence pattern with one or more theory-based SRL strategies and identified three clusters of learners. First, Comprehensive Learners, who follow the sequential structure of the course materials, which sets them up for gaining a deeper understanding of the content. Second, Targeting Learners, who strategically engage with specific course content that will help them pass the assessments. Third, Sampling Learners, who exhibit more erratic and less goal-oriented behaviour, report lower SRL, and underperform relative to both Comprehensive and Targeting Learners. Challenges that arise in the process of extracting theory-based patterns from observed behaviour are discussed, including analytic issues and limitations of available trace data from learning platforms.


Maldonado, J. J., Pérez-Sanagustín, M., Kizilcec, R., Morales, N., Munoz-Gama, J. Mining Theory-Based Patterns from Big Data: Identifying Self-Regulated Learning Strategies in Massive Open Online Courses. Computers in Human Behavior.
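For readers new to process mining, the step the abstract describes as extracting interaction sequences from fine-grained behavioural traces can be illustrated with a minimal Python sketch. This is not the pipeline used in the paper: the event format `(learner_id, timestamp, activity)`, the activity names `"video"` and `"assessment"`, the merging of consecutive repeated activities, and the helper names are all hypothetical assumptions. The sketch only shows the general idea of turning clickstream events into per-learner sequences, counting which activity directly follows which, and counting how often each distinct sequence (variant) occurs.

```python
from collections import Counter, defaultdict
from itertools import groupby

# Hypothetical toy event log: (learner_id, timestamp, activity).
# Real MOOC trace data and the paper's activity categories are richer.
example_events = [
    ("s1", 1, "video"), ("s1", 2, "video"), ("s1", 3, "assessment"),
    ("s2", 1, "assessment"), ("s2", 2, "assessment"), ("s2", 3, "video"),
    ("s3", 2, "assessment"), ("s3", 1, "video"),
]


def build_traces(events):
    """Group events by learner and order each learner's activities by timestamp."""
    by_learner = defaultdict(list)
    for learner, ts, activity in events:
        by_learner[learner].append((ts, activity))
    return {learner: [act for _, act in sorted(evs)] for learner, evs in by_learner.items()}


def collapse_repeats(trace):
    """Assumed simplification: merge consecutive identical activities into one step."""
    return [act for act, _ in groupby(trace)]


def directly_follows(traces):
    """Count how often activity a is immediately followed by activity b across all traces."""
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


def variants(traces):
    """Count how many learners share each distinct activity sequence."""
    return Counter(tuple(trace) for trace in traces.values())


if __name__ == "__main__":
    traces = {learner: collapse_repeats(t) for learner, t in build_traces(example_events).items()}
    print(directly_follows(traces))
    print(variants(traces).most_common())
```

In an actual study, the resulting sequences or transition counts would then feed further analysis, such as matching patterns to SRL strategies and clustering learners; those steps are not reproduced here.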