[1] Zhao Yuben, Wang Xingning, Li Chong. Research on Undergraduate Academic Prediction Based on K-XGBoost Fusion Model[J]. Journal of Nanjing Normal University (Natural Science Edition), 2023, 46(03): 89-97. [doi:10.3969/j.issn.1001-4616.2023.03.012]

Research on Undergraduate Academic Prediction Based on K-XGBoost Fusion Model

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
46
Issue:
2023, No. 03
Pages:
89-97
Section:
Computer Science and Technology
Publication date:
2023-09-15

Article Info

Title:
Research on Undergraduate Academic Prediction Based on K-XGBoost Fusion Model
Article ID:
1001-4616(2023)03-0089-09
Author(s):
Zhao Yuben¹, Wang Xingning², Li Chong¹
(1. College of Engineering, Ocean University of China, Qingdao 266100, Shandong, China)
(2. Teaching Center of Fundamental Courses, Ocean University of China, Qingdao 266100, Shandong, China)
Keywords:
K-XGBoost; academic performance prediction; data mining; machine learning; ensemble learning
CLC number:
TP181
DOI:
10.3969/j.issn.1001-4616.2023.03.012
Document code:
A
Abstract:
High-precision prediction of students' academic performance is an important technical means of improving university teaching and promoting teaching reform. Current academic prediction suffers from problems such as single-dimensional data and imbalanced data structure, which reduce the accuracy and generalization ability of prediction models. To address this, this paper proposes K-XGBoost, a fusion model for academic performance prediction. First, through precise feature extraction and reconstruction, the model builds a multi-dimensional academic feature set from the database of the university's Academic Affairs Office. Second, a clustering algorithm based on the minimum 2-norm is designed to establish an unsupervised data-balancing mechanism. Finally, a prediction module is built on the XGBoost ensemble learning method with an optimized loss function, yielding the K-XGBoost fusion algorithm with high accuracy and strong generalization ability. Experimental results show that the predictions of the multiple K-XGBoost sub-models closely approximate the true values and that, compared with the baseline XGBoost model, K-XGBoost reduces the mean absolute error (MAE) and root-mean-square error (RMSE) of grade prediction by 76.19% and 85.33%, respectively, significantly improving the accuracy and generalization ability of academic performance prediction.
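The abstract names the stages of K-XGBoost but gives no implementation details. The Python sketch below illustrates one plausible reading of the cluster-then-predict idea, under stated assumptions: samples are grouped by k-means, whose assignment step picks the centroid at minimum 2-norm distance; one regressor is trained per cluster; and a new sample is routed to the sub-model of its nearest centroid. The names `kmeans`, `ClusterThenRegress`, and `fit_mean` are hypothetical, the deterministic farthest-first seeding is a simplification rather than the authors' choice, and `fit_mean` (predict the cluster mean) is only a stand-in for the paper's loss-optimized XGBoost sub-models. The `mae`, `rmse`, and `relative_reduction` helpers follow the standard definitions behind the percentages reported in the abstract.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared 2-norm of a - b; minimizing it also minimizes the 2-norm."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid at minimum 2-norm distance from x."""
    return min(range(len(centroids)), key=lambda j: sq_dist(x, centroids[j]))


def kmeans(X: List[Vector], k: int, iters: int = 100) -> List[List[float]]:
    """Plain k-means with deterministic farthest-first seeding (an assumption)."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        # next seed: the point whose distance to its closest seed is largest
        far = max(X, key=lambda x: min(sq_dist(x, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [nearest(x, centroids) for x in X]
        updated = []
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            # new centroid = mean of members; keep the old one if the cluster is empty
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:  # centroids did not move: converged
            break
        centroids = updated
    return centroids


def fit_mean(Xs: List[Vector], ys: List[float]) -> Model:
    """Hypothetical stand-in for a per-cluster XGBoost regressor."""
    m = sum(ys) / len(ys)
    return lambda x: m


class ClusterThenRegress:
    """Hypothetical fusion: one sub-model per cluster, routed by nearest centroid."""

    def __init__(self, k: int,
                 fit: Callable[[List[Vector], List[float]], Model] = fit_mean):
        self.k, self.fit_fn = k, fit

    def fit(self, X: List[Vector], y: List[float]) -> "ClusterThenRegress":
        self.centroids = kmeans(X, self.k)
        labels = [nearest(x, self.centroids) for x in X]
        self.models = []
        for j in range(self.k):
            Xs = [x for x, lab in zip(X, labels) if lab == j]
            ys = [t for t, lab in zip(y, labels) if lab == j]
            # fall back to the full data set if a cluster ended up empty
            self.models.append(self.fit_fn(Xs, ys) if ys else self.fit_fn(X, y))
        return self

    def predict(self, X: List[Vector]) -> List[float]:
        return [self.models[nearest(x, self.centroids)](x) for x in X]


def mae(y: List[float], p: List[float]) -> float:
    """Mean absolute error: (1/n) * sum |y_i - p_i|."""
    return sum(abs(a - b) for a, b in zip(y, p)) / len(y)


def rmse(y: List[float], p: List[float]) -> float:
    """Root-mean-square error: sqrt((1/n) * sum (y_i - p_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, p)) / len(y))


def relative_reduction(baseline: float, new: float) -> float:
    """Percentage by which the `new` error is lower than the `baseline` error."""
    return (baseline - new) / baseline * 100.0
```

With `k=1` the sketch collapses to a single global model, which gives a simple baseline to compare against. If the `xgboost` package is available, `fit` could wrap `xgboost.XGBRegressor` (train on `Xs`, `ys` and predict on `[x]`) instead of `fit_mean`; that substitution and the choice of `k` are assumptions, since the abstract does not state them.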

Memo
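Received: 2022-07-29.
Foundation items: Fundamental Research Funds for the Central Universities (202213016); Natural Science Foundation of Shandong Province (ZR201910230031); 2022 Qingdao Social Science Planning Research Project (QDSKL2201014).
Corresponding author: Wang Xingning, PhD, lecturer; research interests: intelligent information processing, big data analysis and visualization. E-mail: wangxinning@ouc.edu.cn
Last Update: 2023-09-15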