Assessment for Learning MOOC’s Updates
DATA MINING IN EDUCATION SYSTEMS
Educational Data Mining (EDM) is a sub-discipline of data science that applies computational algorithms to the large datasets generated within educational environments. As a source of evidence, EDM allows researchers and practitioners to move beyond simple descriptive statistics, offering predictive capabilities and uncovering complex, non-obvious relationships within the learning process (Romero & Ventura, 2010). However, while its capacity to process large volumes of data and discover patterns is considerable, EDM has distinct limitations concerning causal inference and the capture of crucial non-digital factors.
Evidence from Educational Data Mining
A salient area where EDM provides compelling evidence is in predictive modeling for student success and dropout mitigation in online and blended learning contexts.
A systematic mapping study by Silva et al. (2021) synthesizes the application of EDM, confirming its growing use in analyzing data from Virtual Learning Environments (VLEs) to identify and mitigate student abandonment. The core evidence relies on classification and regression algorithms. These models are fed transactional data (digital traces such as login frequency, time spent viewing resources, the number of course clicks, and initial assessment scores) collected within the first few weeks of a course. EDM algorithms analyze these behavioral variables to establish a correlation with final outcomes, generating a probabilistic score that predicts whether a student is at high risk of failing or withdrawing (Gasevic et al., 2015).
This evidence is instrumental because it is timely and scalable. Rather than waiting for midterm failures, EDM provides early warnings, enabling institutions to implement targeted, proactive interventions, such as personalized academic counseling or automated alerts, thereby improving retention rates. The evidence here is not merely descriptive ("what happened"), but predictive ("what is likely to happen"), transforming the traditional reactive approach to student support into a proactive one.
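To make the kind of early-warning pipeline described above concrete, the following is a minimal sketch in Python using scikit-learn. The file name, the feature columns (login_count, resource_minutes, course_clicks, quiz1_score), the withdrew label, and the 0.7 risk threshold are illustrative assumptions rather than details from the studies cited above, and the mapped studies employ a range of classification and regression algorithms rather than this exact pipeline.

```python
# Minimal sketch of an early-warning classifier trained on early-course VLE traces.
# File name, feature columns, and the risk threshold are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical export of each student's first-four-weeks activity in the VLE.
df = pd.read_csv("vle_week1_4_traces.csv")
features = ["login_count", "resource_minutes", "course_clicks", "quiz1_score"]
X, y = df[features], df["withdrew"]  # withdrew: 1 = abandoned the course, 0 = completed

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Probabilistic risk score per student: institutions can flag anyone whose
# predicted dropout probability exceeds a chosen threshold for early outreach.
risk = model.predict_proba(X_test)[:, 1]
print("Hold-out AUC:", roc_auc_score(y_test, risk))
flagged = X_test.index[risk > 0.7]  # 0.7 is illustrative, not prescriptive
print(f"{len(flagged)} students flagged for proactive support")
```

In practice, the alert threshold and the trade-off between false alarms and missed at-risk students would be set in consultation with the student-support staff who act on the flags.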
What EDM Can Tell Us (The Strengths)
The strengths of EDM lie primarily in its ability to handle immense data volumes and reveal systemic patterns:
Prediction and Early Warning: EDM excels at quantifying the likelihood of future academic events (e.g., passing a course, dropping out). This is its most recognized and impactful capability, providing the empirical foundation for early intervention systems (Romero & Ventura, 2010).
Discovery of Student Patterns: Using techniques like clustering, EDM can categorize students into groups based on their learning behaviors (e.g., "procrastinators," "active but disorganized learners," "early completers") without predefined criteria. This can uncover distinct, often hidden, learning strategies or engagement styles within a single cohort (Romero & Ventura, 2010); a brief clustering sketch follows this list.
Relationship Mining: EDM can identify complex correlations between disparate variables, such as finding that student interaction with a specific type of optional resource (e.g., supplementary video tutorials) is a stronger predictor of success than simple time spent on required readings. This evidence is crucial for optimizing instructional design (Tempelaar et al., 2015); a brief correlation sketch follows this list.
Model Assessment and Refinement: It provides evidence on the efficacy of educational software or curriculum features, showing which parts of an educational platform are most utilized by successful learners and which features are ignored, guiding continuous system improvement.
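The clustering capability noted above can be illustrated with a short k-means sketch. The behavioral columns, the trace file, and the choice of three clusters are illustrative assumptions; labels such as "procrastinators" are assigned by researchers only after inspecting the resulting group profiles.

```python
# Minimal sketch of behavior-based clustering; feature names, file name, and the
# number of clusters are illustrative assumptions.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("vle_week1_4_traces.csv")  # same hypothetical trace export as above
behavior_cols = ["login_count", "resource_minutes", "course_clicks",
                 "days_before_deadline_submitted"]

# Standardize so no single behavioral variable dominates the distance metric.
X = StandardScaler().fit_transform(df[behavior_cols])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df["cluster"] = kmeans.fit_predict(X)

# Per-cluster means are what a researcher inspects before attaching labels such
# as "procrastinators" or "early completers" to the discovered groups.
print(df.groupby("cluster")[behavior_cols].mean())
```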
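Similarly, the relationship-mining bullet can be illustrated with a simple pairwise correlation check. The column names (optional_video_minutes, required_reading_minutes, final_grade) are hypothetical, and published studies typically rely on richer regression or association-rule techniques than this.

```python
# Minimal sketch of simple relationship mining: comparing how strongly two
# engagement variables relate to the final grade. Column names are hypothetical.
import pandas as pd

df = pd.read_csv("vle_week1_4_traces.csv")

for col in ["optional_video_minutes", "required_reading_minutes"]:
    r = df[col].corr(df["final_grade"])  # Pearson correlation by default
    print(f"{col}: r = {r:.2f}")
```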
What EDM Cannot Tell Us (The Constraints)
Despite its power, EDM is inherently limited by the nature of its data and its statistical focus, meaning it often fails to provide the full context of the learning process:
Lack of Causal Explanation: EDM establishes correlation, not causation (Gasevic et al., 2015). For instance, a model may predict failure based on a lack of activity, but it cannot explain why the student is inactive (e.g., lack of motivation, sudden illness, technological barriers, or emotional stress). The "why"—the underlying cognitive or affective reason—requires qualitative data and psychological theory that EDM alone cannot access.
Blindness to Non-Digital Context: EDM is restricted to the digital traces it can log (Zhao & Luan, 2006). It cannot capture the impact of crucial non-digital factors, such as the quality of face-to-face classroom interaction, the support of a family environment, a student’s mental health status, or the efficacy of a teacher’s non-verbal communication. These real-world context variables significantly influence learning outcomes but are invisible to the algorithms.
Ethical and Algorithmic Bias: EDM cannot resolve the inherent ethical challenges of using personal data. The findings can be misleading or harmful if the data used to train the model is biased (e.g., if it disproportionately represents students with high socioeconomic status), leading to models that unfairly discriminate or misclassify certain student groups, thereby perpetuating existing inequities (Grover & Mehra, 2008).
Weak Theoretical Grounding (The Black Box Problem): While highly accurate, the predictive models are often "black boxes": complex algorithms in which the relationship between input and output is opaque. This makes it difficult to use the findings for theory construction or confirmation in the learning sciences, as the evidence often lacks direct explanatory power compared to traditional, theory-driven research (Grover & Mehra, 2008).
References
Gasevic, D., Dawson, S., & Siemens, G. (2015). Let’s not forget about learning: Studying and improving learning as a result of learning analytics. Educational Technology Research and Development, 63(5), 641–653.
Grover, A. C., & Mehra, P. (2008). Overview of data mining’s potential benefits and limitations in education research. Practical Assessment, Research & Evaluation, 13(2), 1–10.
Romero, C., & Ventura, S. (2010). Educational data mining: A review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(6), 601–618.
Silva, J. M. B., Santos, F. R., Alves, L. A., & Rillo, M. C. (2021). Active methodology, educational data mining and learning analytics: A systematic mapping study. Education and Information Technologies, 26(6), 7233–7258.
Tempelaar, D., Rienties, B., & Giesbers, B. (2015). In search of the most informative learning analytics features for timely prediction of student success. Learning and Individual Differences, 38, 64–72.
Zhao, Y., & Luan, J. (2006). Data mining in educational research. Practical Assessment, Research & Evaluation, 11(13), 1–9.

