AI & Data Mining Teaching Slides (lect-3-12.ppt)

Chapter 3  Basic Data Mining Techniques
3.1 Decision Trees (For classification)

Introduction: Classification, A Two-Step Process

1. Model construction: build a model that can describe a set of predetermined classes
- Preparation: each tuple/sample is assumed to belong to a predefined class, as given by the output attribute (class label attribute)
- The set of examples used for model construction is the training set
- The model can be represented as classification rules, decision trees, or mathematical formulae
- Estimating the accuracy of the model: the known label of each test sample is compared with the model's prediction; the accuracy rate is the percentage of test-set samples that are correctly classified by the model
- Note: the test set must be independent of the training set, otherwise over-fitting will occur
2. Model usage: use the model to classify future or unknown objects

Classification Process (1): Model Construction

Training Data -> Classification Algorithm -> Classifier (Model)
A rule learned from the training data: IF rank = "professor" OR years > 6 THEN tenured = "yes"

Classification Process (2): Use the Model in Prediction

Testing Data -> Classifier (to estimate accuracy); Unseen Data -> Classifier (to predict)
Example: the unseen tuple (Jeff, Professor, 4) is given to the classifier to answer the question "Tenured?"
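As a concrete illustration of the two steps (not part of the original slides), here is a minimal scikit-learn sketch; the tiny "tenured" table is hypothetical, invented only so the example runs.

```python
# A minimal sketch of the two-step classification process.
# The tiny "tenured" dataset below is hypothetical, for illustration only.
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 1 -- model construction from a labeled training set.
# Features: [rank_is_professor (0/1), years]; class label: tenured (0/1).
X_train = [[1, 2], [1, 7], [0, 3], [0, 8], [0, 2], [1, 5]]
y_train = [1, 1, 0, 1, 0, 1]
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Accuracy is estimated on a test set kept independent of the training set,
# otherwise over-fitting goes undetected.
X_test, y_test = [[1, 8], [0, 1]], [1, 0]
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Step 2 -- model usage: classify an unseen tuple, e.g. (Jeff, Professor, 4).
print("Tenured?", model.predict([[1, 4]])[0])
```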

1 Example (1): Training Dataset

An example from Quinlan's ID3 (1986): the "buys_computer" training data.

1 Example (2): Output: A Decision Tree for "buys_computer"

The root tests age?, with three branches:
- age <= 30: test student? (no -> no; yes -> yes)
- age 31..40: yes
- age > 40: test credit_rating? (excellent -> yes; fair -> no)
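Read as a nested if-else, the tree above can be transcribed directly; a small sketch (the function name is ours, not from the slides):

```python
# The slide's buys_computer tree transcribed as a plain Python function.
def buys_computer(age: int, student: bool, credit_rating: str) -> str:
    if age <= 30:
        return "yes" if student else "no"   # decided by the student? test
    elif age <= 40:                         # the 31..40 branch is always "yes"
        return "yes"
    else:                                   # age > 40: decided by credit_rating
        return "yes" if credit_rating == "excellent" else "no"

print(buys_computer(25, True, "fair"))      # -> yes
```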

2 Algorithm for Decision Tree Building

Basic algorithm (a greedy algorithm):
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all the training examples are at the root
- Attributes are categorical (continuous-valued attributes are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

Conditions for stopping partitioning (see the sketch after this list):
- All samples at a given node belong to the same class
- There are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf)
- There are no samples left
- The pre-set accuracy is reached
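A skeletal sketch of this greedy top-down loop, assuming examples are (feature-dict, label) pairs; choose_attribute is a placeholder for the information-gain selection defined on the next slides.

```python
# Skeleton of the greedy, top-down, divide-and-conquer build described above.
from collections import Counter

def choose_attribute(examples, attributes):
    return attributes[0]            # placeholder: pick by information gain here

def build_tree(examples, attributes):
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:       # stop: all samples belong to the same class
        return labels[0]
    if not attributes:              # stop: no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]
    best = choose_attribute(examples, attributes)
    tree = {best: {}}
    for value in {f[best] for f, _ in examples}:   # partition on best attribute
        subset = [(f, l) for f, l in examples if f[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining)
    return tree
```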

Information Gain (ID3/C4.5)

- Select the attribute with the highest information gain
- Assume there are two classes, P and N
- Let the set of examples S contain p elements of class P and n elements of class N
- The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

  I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}

Information Gain in Decision Tree Building

- Assume that using attribute A, a set S will be partitioned into sets S1, S2, ..., Sv
- If Si contains pi examples of P and ni examples of N, the entropy, i.e. the expected information needed to classify objects in all subsets Si, is

  E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n} I(p_i, n_i)

- The encoding information that would be gained by branching on A is

  Gain(A) = I(p, n) - E(A)
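Transcribing the three formulas above into code, a sketch (the function names are ours):

```python
# Two-class information measures, directly following the formulas above.
from math import log2

def info(p: int, n: int) -> float:
    """I(p, n): bits needed to decide P vs N for an arbitrary example."""
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c > 0)

def expected_info(partition) -> float:
    """E(A) for a partition [(p1, n1), ..., (pv, nv)] induced by attribute A."""
    total = sum(p + n for p, n in partition)
    return sum((p + n) / total * info(p, n) for p, n in partition)

def gain(partition) -> float:
    """Gain(A) = I(p, n) - E(A)."""
    p = sum(pi for pi, _ in partition)
    n = sum(ni for _, ni in partition)
    return info(p, n) - expected_info(partition)
```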

Attribute Selection by Information Gain Computation

- Class P: buys_computer = "yes" (9 examples); class N: buys_computer = "no" (5 examples)
- I(p, n) = I(9, 5) = 0.940
- Compute the entropy for age (class counts per group: age <= 30 has 2 P / 3 N, age 31..40 has 4 P / 0 N, age > 40 has 3 P / 2 N):

  E(age) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.69

- Hence Gain(age) = I(9, 5) - E(age) = 0.940 - 0.69 = 0.25
- Similarly, the gains of income, student, and credit_rating are computed; age has the highest gain and is selected as the root test
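A quick, self-contained check of the slide's numbers, using the per-group class counts listed above:

```python
# Verifying I(9, 5) = 0.940, E(age) = 0.69, and Gain(age) = 0.25.
from math import log2

def info(p, n):
    return -sum(c / (p + n) * log2(c / (p + n)) for c in (p, n) if c > 0)

groups = [(2, 3), (4, 0), (3, 2)]        # (P, N) counts per age branch
e_age = sum((p + n) / 14 * info(p, n) for p, n in groups)
print(round(info(9, 5), 3))              # 0.94
print(round(e_age, 2))                   # 0.69
print(round(info(9, 5) - e_age, 2))      # 0.25
```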

3. Decision Tree Rules

- Automate rule creation
- Rule simplification and elimination
- A default rule is chosen

3.1 Extracting Classification Rules from Trees

- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf (sketched in code below)
- Rules are easier for humans to understand
- Example:
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
  IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "no"
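The path-to-rule extraction can be sketched as a walk over a nested-dict tree like the one returned by the build_tree sketch earlier; the representation and toy tree below are ours, mirroring the buys_computer tree.

```python
# One rule per root-to-leaf path: walk a nested-dict tree, emit IF-THEN strings.
def extract_rules(tree, conditions=()):
    if not isinstance(tree, dict):                 # leaf -> one finished rule
        lhs = " AND ".join(conditions) or "TRUE"
        return [f"IF {lhs} THEN class = {tree!r}"]
    (attribute, branches), = tree.items()
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + (f'{attribute} = "{value}"',))
    return rules

toy = {"age": {"<=30": {"student": {"no": "no", "yes": "yes"}},
               "31..40": "yes",
               ">40": {"credit_rating": {"excellent": "yes", "fair": "no"}}}}
for rule in extract_rules(toy):
    print(rule)
```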

3.2 Rule Simplification and Elimination

A rule for the tree in Figure 3.4:
  IF Age <= 43 & Sex = Male & Credit Card Insurance = No
  THEN Life Insurance Promotion = No (accuracy = 75%, Figure 3.4)

A simplified rule obtained by removing attribute Age:
  IF Sex = Male & Credit Card Insurance = No
  THEN Life Insurance Promotion = No (accuracy = 83.3% (5/6), Figure 3.5)

Figure 3.4: A three-node decision tree for the credit card database
Figure 3.5: A two-node decision tree for the credit card database
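Rule accuracy here means the correctly classified fraction of the instances a rule covers. A sketch with hypothetical tuples, chosen only to reproduce the 5/6 figure (the actual credit card table is not shown on these slides):

```python
# Rule accuracy = hits / covered instances. Tuples are (sex, cc_insurance,
# life_insurance_promotion); the rows are invented for illustration.
data = [("Male", "No", "No"), ("Male", "No", "No"), ("Male", "No", "Yes"),
        ("Male", "No", "No"), ("Male", "No", "No"), ("Male", "No", "No")]

def rule_accuracy(rows, covers, predicted):
    covered = [r for r in rows if covers(r)]
    hits = sum(1 for r in covered if r[-1] == predicted)
    return hits, len(covered)

hits, n = rule_accuracy(data, lambda r: r[0] == "Male" and r[1] == "No", "No")
print(f"accuracy = {hits / n:.1%} ({hits}/{n})")   # -> accuracy = 83.3% (5/6)
```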

4. Further Discussion

- Attributes with more values: they tend to look better on accuracy but produce more splits; the correction is GainRatio(A) = Gain(A) / SplitInfo(A) (a sketch follows this list)
- Numerical attributes: binary split; stopping condition; splits with more than 2 values
- Other methods for building decision trees: ID3, C4.5, CART, CHAID
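A sketch of the gain-ratio correction: SplitInfo penalizes attributes that fragment the data into many small subsets. The subset sizes and gain reuse the age example above.

```python
# GainRatio(A) = Gain(A) / SplitInfo(A), per the slide above.
from math import log2

def split_info(subset_sizes):
    """SplitInfo(A) over the sizes of the subsets S1..Sv induced by A."""
    total = sum(subset_sizes)
    return -sum(s / total * log2(s / total) for s in subset_sizes if s > 0)

def gain_ratio(gain_a, subset_sizes):
    return gain_a / split_info(subset_sizes)

# Age splits the 14 examples into subsets of sizes 5, 4, 5; Gain(age) = 0.25.
print(round(gain_ratio(0.25, [5, 4, 5]), 3))
```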

5. General Considerations

Advantages of Decision Trees:
- Easy to understand; map nicely to a set of production rules
- Have been applied to real problems
- Make no prior assumptions about the data
- Able to process both numerical and categorical data

Disadvantages of Decision Trees:
- The output attribute must be categorical
- Limited to one output attribute
- Decision tree algorithms are unstable
- Trees created from numeric datasets can be complex

Decision Tree Attribute Selection: Appendix C

Appendix C: Decision Tree Attribute Selection

Equation C.1, Computing Gain Ratio:

  GainRatio(A) = \frac{Gain(A)}{SplitInfo(A)}

Equation C.2, Computing Gain(A):

  Gain(A) = Info(I) - Info(I, A)

Equation C.3, Computing Info(I), where p_i is the fraction of instances in table I belonging to class i:

  Info(I) = -\sum_{i=1}^{k} p_i \log_2 p_i

Equation C.4, Computing Info(I, A), where attribute A partitions I into subsets I_1, ..., I_v:

  Info(I, A) = \sum_{j=1}^{v} \frac{|I_j|}{|I|} \, Info(I_j)

Equation C.5, Computing Split Info(A):

  SplitInfo(A) = -\sum_{j=1}^{v} \frac{|I_j|}{|I|} \log_2 \frac{|I_j|}{|I|}

Figure C.1: A partial decision tree with root node = income range
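Pulling Equations C.1 through C.5 together for the general multi-class case, a sketch (helper names are ours; earlier sketches covered only two classes):

```python
# Appendix C quantities for the multi-class case.
from collections import Counter
from math import log2

def info(labels):                          # Equation C.3
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def info_partition(subsets):               # Equation C.4: subsets = [labels of I_j]
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * info(s) for s in subsets)

def split_info(subsets):                   # Equation C.5
    total = sum(len(s) for s in subsets)
    return -sum(len(s) / total * log2(len(s) / total) for s in subsets if s)

def gain_ratio(all_labels, subsets):       # Equations C.1 and C.2 combined
    return (info(all_labels) - info_partition(subsets)) / split_info(subsets)
```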
