[1] Yuan Xingmei, Yang Ming. A Kind of StASVM Ensemble Algorithm for Unbalanced Data Sets [J]. Journal of Nanjing Normal University (Natural Science Edition), 2010, 33(04): 123-127.

A Kind of StASVM Ensemble Algorithm for Unbalanced Data Sets

Journal of Nanjing Normal University (Natural Science Edition) [ISSN: 1001-4616 / CN: 32-1239/N]

Volume:
33
Issue:
2010, No. 04
Pages:
123-127
Column:
Computer Science
Publication date:
2010-12-20

Article Info

Title:
A Kind of StASVM Ensemble Algorithm for Unbalanced Data Sets
Author(s):
Yuan Xingmei; Yang Ming
School of Computer Science and Technology, Nanjing Normal University, Nanjing 210046, Jiangsu, China; Jiangsu Research Center of Information Security and Confidential Engineering, Nanjing 210046, Jiangsu, China
Keywords:
imbalanced data; structure; SVM (support vector machine); ensemble learning
CLC number:
TP181
Abstract:
Imbalanced data sets arise pervasively in practical applications, and how to handle them has become a new research focus. In view of the superiority of the maximum-margin principle in many classification problems, we apply it to the classification of imbalanced data; using support vector machines (SVM), good classification performance can be obtained. Building on StASVM, a structured SVM that uses not only between-class distribution information but also within-class structural information, we propose EStASVM, an ensemble that integrates the subclassifiers induced by StASVM as base classifiers. Experimental results show that this ensemble model can classify imbalanced data effectively.
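
The ensemble idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' EStASVM: the abstract does not specify how StASVM exploits within-class structure, nor how the subclassifiers are generated and combined. The sketch therefore makes three assumptions: a plain linear SVM (stochastic gradient descent on the regularized hinge loss) stands in for StASVM, each ensemble member is trained on a balanced subset obtained by undersampling the majority class, and members are combined by majority vote. All function names, parameter values, and the toy data are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, eta=0.05, epochs=20, seed=0):
    """Stand-in base learner (assumption, not StASVM): linear SVM fitted by
    SGD on the L2-regularized hinge loss. Labels must be -1 or +1. The bias
    is folded into the weights as a constant feature, so it is regularized too."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)              # last entry is the bias
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink (regularizer)
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w

def predict_one(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

def train_ensemble(X, y, n_members=5, seed=0):
    """Train each member on every minority (+1) sample plus an equally sized
    random draw from the majority (-1) class (assumed sampling scheme).
    Requires at least as many majority samples as minority samples."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == -1]
    members = []
    for m in range(n_members):
        subset = minority + rng.sample(majority, len(minority))
        members.append(train_linear_svm([X[i] for i in subset],
                                        [y[i] for i in subset],
                                        seed=seed + m))
    return members

def predict_ensemble(members, x):
    """Majority vote over the members (an odd member count avoids ties)."""
    votes = sum(predict_one(w, x) for w in members)
    return 1 if votes >= 0 else -1

def make_toy_data(n_major=200, n_minor=20, seed=0):
    """Hypothetical 10:1 imbalanced, linearly separable 2-D data."""
    rng = random.Random(seed)
    X = [(rng.uniform(-3, -1), rng.uniform(-3, -1)) for _ in range(n_major)]
    X += [(rng.uniform(1, 3), rng.uniform(1, 3)) for _ in range(n_minor)]
    y = [-1] * n_major + [1] * n_minor
    return X, y

X, y = make_toy_data()
ensemble = train_ensemble(X, y)
print(predict_ensemble(ensemble, (2.0, 2.0)), predict_ensemble(ensemble, (-2.0, -2.0)))
```

Training every member on a balanced subset keeps the majority class from dominating the hinge loss, while the differing random draws give the members some diversity. A structured base learner such as StASVM would replace `train_linear_svm` in this outline.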

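Memo

Funding: National Natural Science Foundation of China (60873176); Natural Science Foundation of Jiangsu Province (BK2008430). Corresponding author: Yang Ming, Ph.D., professor, doctoral supervisor; research interests: data mining, machine learning, rough set theory and applications. E-mail: yxmnjnu@126.com
Last Update: 2013-04-08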