A comparison of the utility of data mining algorithms in an open distance learning context


The use of data mining in higher education has been gaining traction. A parallel examination of the accuracy, robustness and utility of the algorithms applied in data mining is argued to be a necessary step toward entrenching the use of educational data mining (EDM). This paper provides a comparative analysis of several classification algorithms at an Open Distance Learning institution in South Africa. The study compares the performance of ZeroR, OneR, Naïve Bayes, IBk, Simple Logistic Regression and J48 in classifying students within a cohort over an eight-year timespan. The initial results appear to show that, given the data quality and structure of the institution under study, J48 most consistently achieved the highest accuracy.
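To make the idea of comparing classifiers concrete, the following is a minimal sketch of the two simplest algorithms named in the abstract, ZeroR (always predict the majority class) and OneR (a one-attribute rule), scored by accuracy on a held-out set. It is an illustration only, not the study's implementation: the attribute names, the toy records and the helper functions `zero_r`, `one_r` and `accuracy` are all hypothetical, and no data from the institution under study is used.

```python
from collections import Counter, defaultdict


def zero_r(train_labels):
    """ZeroR: always predict the most frequent class in the training labels."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda row: majority


def one_r(train_rows, train_labels):
    """OneR: keep the single attribute whose value -> majority-class rule
    makes the fewest errors on the training data."""
    default = Counter(train_labels).most_common(1)[0][0]
    best = None  # (training errors, attribute index, rule)
    for attr in range(len(train_rows[0])):
        # Count class frequencies for every value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(train_rows, train_labels):
            counts[row[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(
            rule[row[attr]] != label
            for row, label in zip(train_rows, train_labels)
        )
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    _, attr, rule = best
    # Unseen attribute values fall back to the overall majority class.
    return lambda row: rule.get(row[attr], default)


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted class matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


# Hypothetical toy records: (employment status, prior qualification) -> outcome.
train_rows = [
    ("employed", "degree"), ("employed", "school"),
    ("unemployed", "degree"), ("unemployed", "school"),
    ("employed", "degree"), ("unemployed", "school"),
    ("employed", "school"), ("unemployed", "degree"),
]
train_labels = ["pass", "fail", "pass", "fail", "pass", "fail", "pass", "pass"]

test_rows = [
    ("employed", "degree"), ("unemployed", "school"),
    ("employed", "school"), ("unemployed", "degree"),
]
test_labels = ["pass", "fail", "fail", "pass"]

zero_r_acc = accuracy(zero_r(train_labels), test_rows, test_labels)
one_r_acc = accuracy(one_r(train_rows, train_labels), test_rows, test_labels)
print(f"ZeroR accuracy: {zero_r_acc:.2f}")
print(f"OneR accuracy:  {one_r_acc:.2f}")
```

ZeroR is useful mainly as a floor: a more elaborate classifier that cannot beat the majority-class baseline on held-out data adds little. The same accuracy function could be applied to any other classifier to place all of them on a common scale.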

Author Biographies

A. Fynn, University of South Africa
Department of Psychology
J. Adamiak, University of South Africa
Student Success Unit


How to Cite
Fynn, A., and J. Adamiak. 2020. “A Comparison of the Utility of Data Mining Algorithms in an Open Distance Learning Context”. South African Journal of Higher Education 32 (4), 81-95. https://doi.org/10.20853/32-4-2473.
General Articles