Chapter 3 Basic Data Mining Techniques

3.1 Decision Trees (For classification)

Introduction: Classification, a Two-Step Process

1. Model construction: build a model that describes a set of predetermined classes.

Preparation: each tuple/sample is assumed to belong to a predefined class, as indicated by the output attribute (the class label attribute).

The set of examples used for model construction is the training set.

The model can be represented as classification rules, decision trees, or mathematical formulae.

2. Model usage: use the model to classify future or unknown objects.

Classification Process (1): Model Construction

[Figure: Training Data -> Classification Algorithms -> Classifier (Model). Example of a learned rule: IF rank = professor OR years > 6 THEN tenured = yes]

Classification Process (2): Use the Model in Prediction

[Figure: the Classifier is applied to Testing Data and then to Unseen Data, e.g. (Jeff, Professor, 4): Tenured?]
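The model-usage step can be sketched in a few lines of Python. This is a minimal illustration, assuming the learned rule reads "rank = professor OR years > 6" (the comparison operator is an assumption) and assuming that any tuple the rule does not cover defaults to "no". The function name `classify` is hypothetical.

```python
def classify(rank, years):
    """Apply the rule: IF rank = professor OR years > 6 THEN tenured = yes.

    Assumption: tuples not covered by the rule default to "no".
    """
    if rank.lower() == "professor" or years > 6:
        return "yes"
    return "no"


# The unseen tuple from the slide: (Jeff, Professor, 4)
name, rank, years = ("Jeff", "Professor", 4)
print(name, "tenured?", classify(rank, years))
```

Jeff is classified as tenured because the rank condition holds, even though years = 4 does not satisfy the second condition.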

1. Example (1): Training Dataset

An example from Quinlan's ID3 (1986)

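1. Example (2): Output: A Decision Tree for "buys_computer"

[Figure: decision tree with root node age? and branches <=30, 30..40, and >40. The <=30 branch leads to the test student? (no / yes), the 30..40 branch leads directly to a yes leaf, and the >40 branch leads to the test credit rating? (fair / excellent). Each leaf is labeled no or yes.]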

At the start, all the training examples are at the root.

Attributes are categorical (continuous-valued attributes are discretized in advance).

Examples are partitioned recursively based on selected attributes.

Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).

Conditions for stopping partitioning:

1. All samples for a given node belong to the same class.

2. There are no remaining attributes for further partitioning; majority voting is employed to classify the leaf.

3. There are no samples left.

4. The pre-set accuracy is reached.
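The recursive partitioning and its stopping conditions can be sketched as follows. This is a minimal ID3-style illustration and makes no claim to be a reference implementation. The function names, the nested-dict tree layout, and the six toy rows are all hypothetical, and the "pre-set accuracy" stopping condition is left out for brevity.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Information (in bits) needed to identify the class of one sample."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, attributes, default=None):
    if not rows:                       # no samples left
        return default
    if len(set(labels)) == 1:          # all samples belong to the same class
        return labels[0]
    if not attributes:                 # no attributes left: majority vote
        return majority(labels)
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted(set(row[best] for row in rows)):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs], remaining, majority(labels)
        )
    return {"attribute": best, "branches": branches}


# Hypothetical toy data, invented for this sketch only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "play", "stay", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
```

Because this sketch only branches on attribute values that actually occur in the current subset, the "no samples left" case is never reached here. It would matter if branches were created for every value in the attribute's domain.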

Information Gain (ID3/C4.5)

Select the attribute with the highest information gain.

Assume there are two classes, P and N.

Let the set of examples S contain p elements of class P and n elements of class N.

The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

$$I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$$

Information Gain in Decision Tree Building

Assume that using attribute A, the set S will be partitioned into sets S_1, S_2, ..., S_v.

If S_i contains p_i examples of P and n_i examples of N, the entropy, or the expected information needed to classify objects in all subsets S_i, is

$$E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)$$

The encoding information that would be gained by branching on A is

$$Gain(A) = I(p, n) - E(A)$$

Attribute Selection by Information Gain Computation

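Class P: buys_computer = "yes"

Class N: buys_computer = "no"

I(p, n) = I(9, 5) = 0.940

Compute the entropy for age: E(age) = 0.69

Hence Gain(age) = I(p, n) - E(age) = 0.940 - 0.69 = 0.25

The gains for the remaining attributes are computed similarly.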
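The arithmetic above can be checked with a short Python sketch. The class totals (9, 5) come from the text. The per-branch counts for age did not survive in this text, so the values below are an assumption: they are the counts usually reproduced with this data set, and they do yield E(age) = 0.69. The function names are hypothetical.

```python
from math import log2


def info(p, n):
    """I(p, n): information needed to decide between classes P and N."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # a zero count contributes nothing (0 * log 0 is taken as 0)
            frac = count / total
            result -= frac * log2(frac)
    return result


def expected_info(partitions):
    """E(A): weighted average of I(p_i, n_i) over the subsets S_i."""
    total = sum(p + n for p, n in partitions)
    return sum((p + n) / total * info(p, n) for p, n in partitions)


def gain(p, n, partitions):
    """Gain(A) = I(p, n) - E(A)."""
    return info(p, n) - expected_info(partitions)


# ASSUMED (p_i, n_i) counts for age <=30, 30..40, >40; not taken from this text.
age_partitions = [(2, 3), (4, 0), (3, 2)]

print(f"I(9, 5)   = {info(9, 5):.3f}")                     # 0.940
print(f"E(age)    = {expected_info(age_partitions):.3f}")  # 0.694
print(f"Gain(age) = {gain(9, 5, age_partitions):.3f}")     # 0.247
```

The unrounded gain is about 0.247. The value 0.25 quoted above results from subtracting the already rounded figures 0.940 and 0.69.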

3. Decision Tree Rules

Automate rule creation

Rules simplification and elimination

A default rule is chosen

3.1 Extracting Classification Rules from Trees

Represent the knowledge in the form of IF-THEN rules.

One rule is created for each path from the root to a leaf.

Rules are easier for humans to understand.

Example:

IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"

IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "no"
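The "one rule per root-to-leaf path" idea can be sketched as a recursive walk over the tree. The nested-dict encoding and the function names are hypothetical. The credit_rating branch mirrors the two example rules, reading the garbled age value as ">40", and the remaining branches are assumed for illustration only.

```python
def extract_rules(tree, conditions=()):
    """Yield (conditions, label) for every path from the root to a leaf."""
    if not isinstance(tree, dict):          # a leaf: emit the finished rule
        yield conditions, tree
        return
    for value, subtree in tree["branches"].items():
        yield from extract_rules(subtree, conditions + ((tree["attribute"], value),))


def format_rule(conditions, label, target="buys_computer"):
    body = " AND ".join(f'{attr} = "{value}"' for attr, value in conditions)
    return f'IF {body} THEN {target} = "{label}"'


# Assumed encoding of a buys_computer tree (illustrative, not authoritative).
tree = {
    "attribute": "age",
    "branches": {
        "<=30": {"attribute": "student", "branches": {"no": "no", "yes": "yes"}},
        "30..40": "yes",
        ">40": {
            "attribute": "credit_rating",
            "branches": {"excellent": "yes", "fair": "no"},
        },
    },
}

rules = [format_rule(conds, label) for conds, label in extract_rules(tree)]
for rule in rules:
    print(rule)
```

A tree with five leaves yields five rules, and each rule's antecedent is the conjunction of the tests along its path.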

A Rule for the Tree in Figure 3.4

IF Age <= 43 & Sex = Male & Credit Card Insurance = No THEN Life Insurance Promotion = No (accuracy = 75%, Figure 3.4)

A Simplified Rule Obtained by Removing Attribute Age

IF Sex = Male & Credit Card Insurance = No THEN Life Insurance Promotion = No (accuracy = 83.3% (5/6), Figure 3.5)
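Rule accuracy here is the fraction of covered instances that the rule classifies correctly, so dropping a condition changes both the numerator and the denominator. The sketch below illustrates this, reading the first rule's age test as Age <= 43 (the operator is an assumption). The records are hypothetical toy data, constructed only so that the counts match the quoted 75% and 5/6. They are not the credit card database, and the field and function names are invented.

```python
# Hypothetical toy records: NOT the credit card database from the figures.
records = [
    {"age": 35, "sex": "Male", "cc_ins": "No", "life_promo": "No"},
    {"age": 40, "sex": "Male", "cc_ins": "No", "life_promo": "No"},
    {"age": 29, "sex": "Male", "cc_ins": "No", "life_promo": "No"},
    {"age": 42, "sex": "Male", "cc_ins": "No", "life_promo": "Yes"},
    {"age": 55, "sex": "Male", "cc_ins": "No", "life_promo": "No"},
    {"age": 48, "sex": "Male", "cc_ins": "No", "life_promo": "No"},
    {"age": 38, "sex": "Female", "cc_ins": "No", "life_promo": "Yes"},
    {"age": 45, "sex": "Male", "cc_ins": "Yes", "life_promo": "Yes"},
]


def rule_accuracy(conditions, conclusion, data):
    """Return (correct, covered) for a rule given as a list of predicates."""
    covered = [r for r in data if all(test(r) for test in conditions)]
    correct = [r for r in covered if r["life_promo"] == conclusion]
    return len(correct), len(covered)


full_rule = [
    lambda r: r["age"] <= 43,
    lambda r: r["sex"] == "Male",
    lambda r: r["cc_ins"] == "No",
]
simplified_rule = full_rule[1:]  # the same rule with the age test removed

for name, rule in (("full", full_rule), ("simplified", simplified_rule)):
    correct, covered = rule_accuracy(rule, "No", records)
    print(f"{name}: {correct}/{covered} = {correct / covered:.1%}")
```

On this toy data the simplified rule covers more instances and is also more accurate, which is the situation in which removing a condition is worthwhile.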

3.2 Rules simplification and elimination

Figure 3.4 A three-node decision tree for the credit card database

Figure 3.5 A two-node decision tree for the credit card database


4. Further discussion

Attributes with more values: accuracy / splits; GainRatio(A) = Gain(A) / SplitInfo(A)

Numerical attributes: binary split

Stopping condition

More than 2 values

Other methods for building decision trees: ID3, C4.5, CART, CHAID
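The gain ratio corrects the bias of information gain toward attributes with many values by dividing by the split information. The sketch below assumes the usual C4.5 definition of SplitInfo, namely the entropy of the partition sizes. The subset sizes [5, 4, 5] are an assumed example of a three-way split of 14 instances, and the function names are hypothetical.

```python
from math import log2


def split_info(sizes):
    """SplitInfo(A): entropy of the subset sizes produced by splitting on A."""
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes if s)


def gain_ratio(gain, sizes):
    """GainRatio(A) = Gain(A) / SplitInfo(A); 0 if the split is trivial."""
    si = split_info(sizes)
    return gain / si if si else 0.0


# Assumed example: a gain of 0.25 from a split into subsets of size 5, 4, 5.
print(f"SplitInfo = {split_info([5, 4, 5]):.3f}")
print(f"GainRatio = {gain_ratio(0.25, [5, 4, 5]):.3f}")

# An attribute with a distinct value per instance has a much larger SplitInfo,
# so its gain ratio is penalized accordingly.
print(f"SplitInfo (14 singleton subsets) = {split_info([1] * 14):.3f}")
```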

5. General consideration: Advantages of Decision Trees

Easy to understand.

Map nicely to a set of production rules.

Have been applied to real problems.

Make no prior assumptions about the data.

Able to process both numerical and categorical data.

Disadvantages of Decision Trees

Output attribute must be categorical.

Limited to one output attribute.

Decision tree algorithms are unstable.

Trees created from numeric datasets can be complex.

Decision Tree Attribute Selection

Appendix C

Equation C.1

Computing Gain Ratio

Equation C.2

Computing Gain(A)

Equation C.3

Computing Info(I)

Equation C.4

Computing Info(I,A)

Equation C.5

Computing Split Info(A)
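The bodies of Equations C.1 to C.5 did not survive in this text. Under the usual C4.5 definitions, which the captions appear to follow, they would read roughly as below. The notation is an assumption: I is the set of instances, C_1, ..., C_n are its classes, and attribute A partitions I into subsets I_1, ..., I_k.

```latex
\begin{align*}
\text{GainRatio}(A) &= \frac{\text{Gain}(A)}{\text{SplitInfo}(A)} \\
\text{Gain}(A) &= \text{Info}(I) - \text{Info}(I, A) \\
\text{Info}(I) &= -\sum_{i=1}^{n} \frac{|C_i|}{|I|} \log_2 \frac{|C_i|}{|I|} \\
\text{Info}(I, A) &= \sum_{j=1}^{k} \frac{|I_j|}{|I|}\, \text{Info}(I_j) \\
\text{SplitInfo}(A) &= -\sum_{j=1}^{k} \frac{|I_j|}{|I|} \log_2 \frac{|I_j|}{|I|}
\end{align*}
```

With two classes, Info(I) reduces to I(p, n) and Info(I, A) to E(A) as used in the information gain computation earlier in the chapter.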

Figure C.1 A partial decision tree with root node = income range



