R Naive Bayes Classification Lab Guide.doc


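Naive Bayes Classification Lab Guide

I. Objectives
1. Learn to input and output matrix data, convert between matrices and data frames, and get acquainted with list data.
2. Understand and master the principle of naive Bayes classification.
3. Be able to use the NaiveBayes function in the klaR package to implement the Bayes classification algorithm.

II. Content
This lab uses naive Bayes classification to build a model on the playtennis data set from page 144 of the textbook and to make predictions with it.

III. Steps
1. Enter the playtennis data from page 144 of the textbook as a matrix. Take care to understand the code below (shown in red in the original handout).

    data <- matrix(c("sunny", "hot", "high", "weak", "no",
                     "sunny", "hot", "high", "strong", "no",
                     "overcast", "hot", "high", "weak", "yes",
                     "rain", "mild", "high", "weak", "yes",
                     "rain", "cool", "normal", "weak", "yes",
                     "rain", "cool", "normal", "strong", "no",
                     "overcast", "cool", "normal", "strong", "yes",
                     "sunny", "mild", "high", "weak", "no",
                     "sunny", "cool", "normal", "weak", "yes",
                     "rain", "mild", "normal", "weak", "yes",
                     "sunny", "mild", "normal", "strong", "yes",
                     "overcast", "mild", "high", "strong", "yes",
                     "overcast", "hot", "normal", "weak", "yes",
                     "rain", "mild", "high", "strong", "no"),
                   byrow = TRUE,
                   dimnames = list(day = c(),
                                   condition = c("outlook", "temperature", "humidity", "wind", "playtennis")),
                   nrow = 14, ncol = 5)
    # Look up the usage of dimnames online
    # Print the data matrix
    data
    # Convert the matrix to a data frame
    data1 <- as.data.frame(data)
    # Write data1 to a text file
    write.table(data1, file = "playtennis.txt", sep = " ")
    # The saved txt file can be read back in
    data2 <- read.table("playtennis.txt", header = TRUE)

2. Understand and master the principle of naive Bayes classification, and read the code below.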
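
For reference while reading the code in this step, the naive Bayes rule can be written as follows. The symbols are notation introduced here rather than the handout's own: $c$ is the class (playtennis = yes or no), $x_1,\dots,x_4$ are the four condition values, $n$ is the number of days, $n_c$ the number of days in class $c$, and $n_{c,x_j}$ the number of those days on which attribute $j$ takes the value $x_j$.

```latex
\[
P(c \mid x_1,\dots,x_4) \;\propto\; P(c)\prod_{j=1}^{4} P(x_j \mid c),
\qquad
\hat{P}(c) = \frac{n_c}{n},
\qquad
\hat{P}(x_j \mid c) = \frac{n_{c,x_j}}{n_c}
\]
\[
\hat{c} \;=\; \arg\max_{c \in \{\text{yes},\,\text{no}\}} \hat{P}(c)\prod_{j=1}^{4} \hat{P}(x_j \mid c)
\]
```

Note that the right-hand side is only proportional to the posterior: the two scores for yes and no do not sum to 1 unless they are divided by their total.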

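    # Prior probabilities of playing and not playing
    prior.yes <- sum(data2[, 5] == "yes") / length(data2[, 5])
    prior.no <- sum(data2[, 5] == "no") / length(data2[, 5])

    # Naive Bayes classification function
    bayespre <- function(condition) {
      post.yes <- sum((data2[, 1] == condition[1]) & (data2[, 5] == "yes")) / sum(data2[, 5] == "yes") *
        sum((data2[, 2] == condition[2]) & (data2[, 5] == "yes")) / sum(data2[, 5] == "yes") *
        sum((data2[, 3] == condition[3]) & (data2[, 5] == "yes")) / sum(data2[, 5] == "yes") *
        sum((data2[, 4] == condition[4]) & (data2[, 5] == "yes")) / sum(data2[, 5] == "yes") *
        prior.yes
      post.no <- sum((data2[, 1] == condition[1]) & (data2[, 5] == "no")) / sum(data2[, 5] == "no") *
        sum((data2[, 2] == condition[2]) & (data2[, 5] == "no")) / sum(data2[, 5] == "no") *
        sum((data2[, 3] == condition[3]) & (data2[, 5] == "no")) / sum(data2[, 5] == "no") *
        sum((data2[, 4] == condition[4]) & (data2[, 5] == "no")) / sum(data2[, 5] == "no") *
        prior.no
      # Return both scores and the class with the larger score
      return(list(prob.yes = post.yes, prob.no = post.no,
                  prediction = ifelse(post.yes >= post.no, "yes", "no")))
    }

    # Predict with the naive Bayes function just defined
    bayespre(c("rain", "hot", "high", "strong"))
    $prob.yes
    [1] 0.005291005
    $prob.no
    [1] 0.02742857
    $prediction
    [1] "no"

What are the results for the cases below? Compute them with bayespre(). Can you work them out by hand? Write out your calculation.

    bayespre(c("sunny", "mild", "normal", "weak"))
    bayespre(c("overcast", "mild", "normal", "weak"))
    bayespre(c("sunny", "cool", "high", "strong"))

3. Use the NaiveBayes() function in the klaR package to implement the Bayes classification algorithm. The syntax and parameters of NaiveBayes() are as follows:

    NaiveBayes(formula, data, ..., subset, na.action = na.pass)
    NaiveBayes(x, grouping, prior, usekernel = FALSE, fL = 0, ...)

formula: specifies the variables that take part in the model, given as a formula such as y ~ x1 + x2 + x3.
data: specifies the data object to be analysed.
na.action: specifies how missing values are handled. By default missing values are left out of the model computation and no error is raised; when set to na.omit, samples that contain missing values are deleted.
x: specifies the data to be processed, either as a data frame or as a matrix.
grouping: specifies the class of each observation.
prior: specifies prior probabilities for the classes. By default the class proportions in the sample are used as the priors.
usekernel: specifies the density estimation method. By default (FALSE) a normal density is estimated; when set to TRUE, kernel density estimation is used, which is appropriate when the distribution of the data cannot be judged.
fL: specifies whether Laplace correction is applied. By default (0) the data are not corrected; when the data set is small, the parameter can be set to 1 to apply Laplace correction.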
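
To make the effect of fL concrete, the sketch below applies the correction by hand to the outlook column of the playtennis data, using base R only. The formula (count + fL) / (class size + fL × number of levels) is the usual Laplace correction; it is assumed here, not verified, to match what NaiveBayes() computes internally. The names outlook, play, counts and cond_prob are illustrative and do not come from the handout.

```r
# Outlook and class columns of the 14-row playtennis data, in day order
outlook <- c("sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
             "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain")
play <- c("no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no")

# Contingency table: rows are classes, columns are outlook levels
counts <- table(play, outlook)

# Estimate P(outlook | class) with Laplace correction factor fL
# (hypothetical helper, not part of klaR)
cond_prob <- function(counts, fL = 0) {
  (counts + fL) / (rowSums(counts) + fL * ncol(counts))
}

raw <- cond_prob(counts, fL = 0)     # uncorrected relative frequencies
smooth <- cond_prob(counts, fL = 1)  # add-one corrected estimates

print(raw)
print(smooth)
```

Without correction the estimate of P(overcast | no) is 0/5, so every overcast day gets a score of zero for "no" whatever the other attributes are; with fL = 1 the estimate becomes (0 + 1) / (5 + 3).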

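    # Build training and test samples by random sampling
    index <- sample(2, size = nrow(iris), replace = TRUE, prob = c(0.75, 0.25))
    train <- iris[index == 1, ]
    test <- iris[index == 2, ]

    # Load the packages and run the naive Bayes algorithm
    library(MASS)
    library(klaR)

    # Convert the class column to a factor
    train$Species <- as.factor(train$Species)
    res2 <- NaiveBayes(Species ~ ., data = train)
    pre <- predict(res2, newdata = test[, 1:4])

    # Cross-tabulate actual against predicted classes and compute the accuracy
    table(test$Species, pre$class)
         1  2  3
      1 10  0  0
      2  0  9  2
      3  0  0 11
    sum(diag(table(test$Species, pre$class))) / sum(table(test$Species, pre$class))
    [1] 0.9375

Read and understand the example above. Following it, use the NaiveBayes() function in the klaR package to build a Bayes classification model on playtennis.txt, and predict the same cases as before:

    c("rain", "hot", "high", "strong")
    c("sunny", "mild", "normal", "weak")
    c("overcast", "mild", "normal", "weak")
    c("sunny", "cool", "high", "strong")

    # Convert every column of data2 to a factor, as in the iris example
    data2[] <- lapply(data2, as.factor)
    # Build a data frame from the cases above to use as the test set
    test <- as.data.frame(rbind(c("rain", "hot", "high", "strong"),
                                c("sunny", "mild", "normal", "weak"),
                                c("overcast", "mild", "normal", "weak"),
                                c("sunny", "cool", "high", "strong")))
    names(test) <- names(data2)[1:4]
    # Give the test columns the same factor levels as the training columns
    for (v in names(test)) test[[v]] <- factor(test[[v]], levels = levels(data2[[v]]))
    res2 <- NaiveBayes(playtennis ~ ., data = data2)
    pre <- predict(res2, newdata = test)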
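
As a cross-check on the prediction for the first case, the handout's result for c(rain, hot, high, strong) can be reproduced from plain counts. The sketch below uses base R only. The counts are read off the playtennis table by hand, which is an assumption worth re-checking against the textbook, and the variable names are illustrative.

```r
# Class sizes in the 14-row playtennis data
n <- 14
n_yes <- 9
n_no <- 5

# Days with outlook = rain, temperature = hot, humidity = high, wind = strong,
# counted separately among the "yes" days and among the "no" days
cnt_yes <- c(rain = 3, hot = 2, high = 3, strong = 3)
cnt_no <- c(rain = 2, hot = 2, high = 4, strong = 3)

# Score = prior times the product of the conditional relative frequencies
score_yes <- (n_yes / n) * prod(cnt_yes / n_yes)
score_no <- (n_no / n) * prod(cnt_no / n_no)
prediction <- if (score_yes >= score_no) "yes" else "no"

# Dividing by the total turns the two scores into posterior probabilities
posterior <- c(yes = score_yes, no = score_no) / (score_yes + score_no)

print(c(score_yes = score_yes, score_no = score_no))
print(prediction)
print(posterior)
```

The two scores agree with the $prob.yes and $prob.no values printed by bayespre() in step 2. The normalised values are the ones to compare with pre$posterior, since predict() reports posterior probabilities rather than raw scores.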
