Paper (Design) - A Recursive Packet Classification Algorithm Based on Decision Trees 36267.doc

Uploaded by: 椰子壳  Document no.: 3967726  Upload date: 2019-10-11  Format: DOC  Pages: 6  Size: 253.02 KB

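A Recursive Packet Classification Algorithm Based on Decision Trees

ZHANG Yan-jun1,2, CHEN You1,2, GUO Li1, CHENG Xue-qi1
(1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China; 2. Graduate University of Chinese Academy of Sciences, Beijing 100039, China)

Abstract: Computational complexity is not the only challenging aspect of the packet classification problem. Traffic in large ISP networks and the Internet backbone increasingly travels over links with transmission rates above one billion bits per second. As a result, classification speed has become a bottleneck for network transmission. This paper introduces a new packet classification algorithm, SRC (Sensitive Recursive Classification), which is based on a decision tree. Experiments on rule sets generated from FW and ACL seeds show two results:
- SRC uses 3 to 10 times less memory than HiCuts, and its worst-case search time is up to 5 times smaller.
- Compared with EGT-PC, SRC uses 2 to 8 times less memory, and its worst-case search time is up to 4 times smaller.

Key words: packet classification; decision tree; mapping
Document code: A

About the authors:
- ZHANG Yan-jun (1975-), male, master's student. Research interest: network and information security.
- CHEN You (1981-), male, master's student. Research interest: network and information security.
- GUO Li (1969-), female, senior engineer. Research interests: network and information security; large-scale string matching.
- CHENG Xue-qi (1971-), male, research professor. Research interests: network information security; large-scale information retrieval; information mining.

1 Introduction

As network speeds rise and the range of service requirements grows, more and more network services depend on packet classification. Packet classification is one of the key technologies behind next-generation Internet equipment, such as MPLS routers, firewalls, VPN gateways and VoIP gateways. It is equally important for new network services, such as differentiated services, secure packet filtering, traffic accounting and rate limiting.

Growing link speeds place ever higher demands on classification performance. At 10 Gb/s, a classifier must process about 31 million 40-byte packets per second. Efficient packet classification algorithms and implementations that work within a given memory budget are therefore an active research topic in networking.

In packet classification, a classifier examines the header of each packet crossing the network and finds the rule the packet matches. This determines the flow to which the packet belongs. In the layered Internet model, data is wrapped in turn by the header of each protocol layer. Each header consists of several fields that carry that layer's protocol information. A classification rule may involve any field from the data link layer up to the application layer. It specifies the fields involved and their values or value ranges. A set of rules over the same fields forms a packet classifier.

2 The SRC Algorithm

2.1 Idea of the SRC Algorithm

SRC is built on a decision tree. Its branching strategy follows HiCuts [1]: each node is divided according to the intervals in which the rules' field values lie. At the leaf nodes, SRC uses a method similar to RFC [2].

HiCuts is designed to operate within a given amount of memory, so the number of rules in a leaf is limited to fewer than a fixed threshold. To obtain good performance this threshold is usually small. The tree that HiCuts builds therefore has many internal and leaf nodes, consumes considerable memory, and is deep. The later HyperCuts [3] algorithm cuts several fields at once. This reduces tree depth but increases computational complexity and space redundancy.

SRC raises HiCuts' limit on the number of rules per leaf and also cuts several fields simultaneously. This reduces the depth of the tree and the numbers of internal and leaf nodes, and so improves performance. Once the leaf threshold is raised, the key problem becomes how to handle the rules inside each leaf. Geometrically, a leaf takes one of three shapes:
- a line segment, if it was produced by cutting one field;
- a rectangle, if it was produced by cutting two fields simultaneously;
- a box, if it was produced by cutting three fields simultaneously.

Applying the RFC idea at the leaves greatly improves performance. RFC is very fast but uses space inefficiently. A major reason is that in every phase it builds its index over the entire field range. For the 16-bit transport-layer port number, for example, RFC indexes the full range [0, 65535]. After HiCuts-style branching produces a leaf L, the ranges of the rules' field values in L are much smaller than before branching. Within these reduced ranges, many rules share the same field values. The RFC index built at a leaf therefore occupies little space, and the depth of the tree stays under control.

2.2 SRC Data Structures

SRC has two core problems: building the index structure and designing the classification procedure. Building the index determines the data structures the algorithm uses.

The algorithm first cuts the root node rootNode, which holds all rules of the classifier. It usually cuts two fields simultaneously, or a single field. Once every leaf of the resulting tree holds few enough rules to meet the design requirement, a new mapping is built at each leaf.

Suppose the classifier contains the 12 rules shown in Figure 1. To build the decision tree, the fields of these rules are first converted into ranges, giving Figure 2 (which lists only the first three rules). The region covered by each rule can be represented as a five-tuple of ranges: [IPmin, IPmax], [IPDmin, IPDmax], [PSmin, PSmax], [PDmin, PDmax], [Portmin, Portmax]. For the rules in Figure 1, the root node covers 0-15, 0-15, 0-3, 0-3, 0-1; this range is recorded as currRange.

Cutting the root node at minimum cost yields a set of child nodes. Their number is recorded as numCuts, and the indices of the rules in each child are stored in the integer array RuleList.

Rule  Field1  Field2  Field3  Field4  Field5  Action
R0    000*    111*    10      *       UDP     act0
R1    000*    111*    01      10      UDP     act0
R2    000*    10*     *       10      TCP     act1
R3    000*    10*     *       01      TCP     act2
R4    000*    10*     10      11      TCP     act1
R5    0*      111*    10      01      UDP     act0
R6    0*      111*    10      10      UDP     act0
R7    0*      1*      *       *       TCP     act2
R8    *       01*     *       *       TCP     act2
R9    *       0*      *       01      UDP     act0
R10   *       *       *       *       UDP     act3
R11   *       *       *       *       TCP     act4
Figure 1  An example rule set with 12 rules, each with five fields, Field1 to Field5

Rule  Field1  Field2  Field3  Field4  Field5  Action
R0    0-1     14-15   2       0-3     0       act0
R1    0-1     14-15   1       2       0       act0
R2    0-1     8-11    0-3     2       1       act1
Figure 2  The five fields of the first three rules of Figure 1, converted into ranges

Figure 3  Part of the decision tree formed by cutting the rule set of Figure 1 into 4 parts on Field 2 and 2 parts on Field 4. The root covers Field2 0-15 and Field4 0-3. The leaves shown are:
- Field4 0-1, Field2 12-15: R0 R5 R7 R10 R11
- Field4 2-3, Field2 0-3: R10 R11
- Field4 2-3, Field2 4-7: R8 R10 R11

Cutting the root of Figure 1 on two fields gives the result in Figure 3; for lack of space, only part of the tree is shown. The figure shows that the range of a cut field at each leaf is much smaller than at the root. Denote this reduced range by [Lmin, Lmax].

Every node of the decision tree has a variable nodeinfo: 0 marks an internal node, and a value greater than 0 marks a leaf. The reduced interval [Lmin, Lmax] of a leaf is mapped in phases, which finally map each value between Lmin and Lmax to an eqID. The mapping works as follows:
- For each field range of the leaf, a short-integer array Cell of size Lmax - Lmin + 1 is built.
- For every rule in the leaf whose value in a field is value, the corresponding entry Cell[value - Lmin] of that field is filled in. Each distinct value stored in Cell is denoted by an eqID.
- For each distinct eqID a CES is constructed. A CES consists of the integer mapping value eqID and the set cbm of the indices of the rules associated with it. cbm is a bit array in which each bit corresponds to one rule of the leaf.

Each phase has an associated mapping node PNode, which consists of two parts:
(1) the short-integer array Cell;
(2) the linked list ListEqs, which contains the CESs and nCES, the number of eqIDs.
The detailed structures of CES, ListEqs and PNode are shown in Figures 4, 5 and 6.

Figure 4  Structure of a CES: the integer eqID, the bit array cbm, and the pointer CES* next
Figure 5  The list ListEqs: CESs linked from Head to Rear (CES* Head, CES* Rear), plus nCES, the number of linked CESs
Figure 6  The mapping node PNode: the short-integer array Cell with entries for Lmin, Lmin+1, ..., Lmax, and the pointer ListEqs*

As an example, suppose the transport-layer port field of the rules is 16 bits wide, covering [0, 65535]. It takes four kinds of values: *, eq www, range 20-21 and gt 1023. Because the port range is [0, 65535], Cell needs 65536 entries to hold the mapped values of the rules' port fields over that range.

There are only four kinds of values, so after mapping a 2-bit value suffices. The whole interval [0, 65535] is represented by just four binary mapping values: 00b, 01b, 10b and 11b. These 2-bit values are the eqIDs. With four distinct eqIDs, nCES of ListEqs is 4.

If cutting reduces the port range to [Lmin, Lmax], the length of Cell becomes Lmax - Lmin + 1. Within [Lmin, Lmax] more rules share the same mapping value, so nCES is also smaller. This both improves space utilization and reduces preprocessing complexity. Because a leaf produced by cutting contains few rules, the rule-index list RuleList and the rule-index sets cbm are also small.

Processing each field as described above yields the phase-0 nodes PNode_phase0. The classifier's fields are then combined according to some strategy. For example, a classifier with 6 fields can have every 3 fields mapped into a new field:
- the three original fields of the first new field are denoted dot0, dot1 and dot2;
- those of the second new field are denoted dot3, dot4 and dot5.
Finally, the two new fields are mapped into one, forming a single final node PNode. The cell part of this node yields an eqID, the eqID yields the associated cbm, and the cbm gives the ID of each matching rule.

2.3 The SRC Classification Algorithm

SRC classification has two phases:
- The first phase finds the leaf in which the packet to be classified falls.
- The second phase maps the packet to an eqID within the leaf's range and obtains the associated cbm from that eqID. The cbm gives the rules that match the packet, and the packet is then assigned the corresponding flow identifier (FlowID).

The leaf containing the packet is found from the root's cutting information numCuts and its range currRange. The search recursively visits the children of rootNode, depth first. Once the leaf is known, so is its range. The Cell part of the mapping node PNode_phase0 then quickly gives the mapped value index_zero for each field of the packet. If the packet has 6 fields, there are 6 index_zero values. Combining these values according to the mapping information yields the packet's final node PNode and the corresponding mapped value index. From index and cell, the classification rule that matches the packet is found, and that rule assigns the packet its FlowID.

3 Experimental Results

3.1 Evaluation Criteria

Packet classification algorithms can be evaluated from several angles. We assess performance on four:
- Classification speed: the time needed to classify one packet, measured in memory accesses.
- Space complexity: the storage the algorithm needs at run time.
- Adaptability to the rules: how the algorithm copes with the number of rules, their distribution in the space, the number of fields, and the way rules are defined.
- Update speed: how quickly the data structures can be updated when the rule set changes incrementally.

3.2 Comparison of Results

We tested several representative algorithms on simulated data. The simulated data and the test environment come from the packet classification benchmark [4]. The results are shown in Tables 1, 2 and 3.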
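The following minimal Python sketch illustrates the tree-building and search steps of Sections 2.2 and 2.3 on the 12-rule example of Figure 1. It does three things:
- converts each rule to per-field ranges, as in Figure 2;
- cuts the root into 4 parts on Field 2 and 2 parts on Field 4, as in Figure 3;
- walks down to the leaf that contains a packet.

Several assumptions apply:
- All names (Node, cut, find_leaf, classify) are hypothetical.
- The paper does not specify its minimum-cost cut selection, so the cut is passed in explicitly.
- List order is assumed to be rule priority.
- The leaf is searched with a plain linear scan that only stands in for SRC's leaf mapping.

```python
from dataclasses import dataclass, field
from itertools import product

WIDTHS = (4, 4, 2, 2)           # bit widths of Field1..Field4 in Figure 1
PROTO = {"UDP": 0, "TCP": 1}    # Field5 encoding used in Figure 2

# Figure 1 rule set: Field1..Field5 and action; list order is assumed to be priority.
RULES = [
    ("000*", "111*", "10", "*", "UDP", "act0"),   # R0
    ("000*", "111*", "01", "10", "UDP", "act0"),  # R1
    ("000*", "10*", "*", "10", "TCP", "act1"),    # R2
    ("000*", "10*", "*", "01", "TCP", "act2"),    # R3 (Field3/Field4 split assumed, like R2)
    ("000*", "10*", "10", "11", "TCP", "act1"),   # R4
    ("0*", "111*", "10", "01", "UDP", "act0"),    # R5
    ("0*", "111*", "10", "10", "UDP", "act0"),    # R6
    ("0*", "1*", "*", "*", "TCP", "act2"),        # R7
    ("*", "01*", "*", "*", "TCP", "act2"),        # R8
    ("*", "0*", "*", "01", "UDP", "act0"),        # R9
    ("*", "*", "*", "*", "UDP", "act3"),          # R10
    ("*", "*", "*", "*", "TCP", "act4"),          # R11
]


def prefix_range(p, width):
    """'10*' -> inclusive range of all width-bit values starting with 10; '01' -> exact value."""
    bits = p.rstrip("*")
    free = width - len(bits)
    base = (int(bits, 2) if bits else 0) << free
    return base, base + (1 << free) - 1


# Figure 2 style: every rule as one (lo, hi) range per field.
RANGES = [[prefix_range(v, w) for v, w in zip(r[:4], WIDTHS)] + [(PROTO[r[4]],) * 2]
          for r in RULES]


@dataclass
class Node:
    curr_range: list                              # currRange: (lo, hi) per field
    rule_list: list                               # RuleList: indices of overlapping rules
    cuts: dict = field(default_factory=dict)      # cut field index -> number of parts
    children: list = field(default_factory=list)
    nodeinfo: int = 1                             # 0 = internal node, > 0 = leaf


def cut(node, spec):
    """Cut node on all fields in spec at once into equal parts (range sizes must divide evenly)."""
    node.cuts, node.nodeinfo = dict(spec), 0
    pieces = []
    for f, n in node.cuts.items():
        lo, hi = node.curr_range[f]
        w = (hi - lo + 1) // n
        pieces.append([(f, (lo + k * w, lo + (k + 1) * w - 1)) for k in range(n)])
    for combo in product(*pieces):                # first cut field varies slowest
        rng = list(node.curr_range)
        for f, r in combo:
            rng[f] = r
        rules = [i for i in node.rule_list
                 if all(RANGES[i][d][0] <= rng[d][1] and rng[d][0] <= RANGES[i][d][1]
                        for d in range(5))]
        node.children.append(Node(rng, rules))
    return node


def find_leaf(node, pkt):
    """Walk from the root to the leaf whose region contains pkt (one value per field)."""
    while node.nodeinfo == 0:
        idx = 0
        for f, n in node.cuts.items():
            lo, hi = node.curr_range[f]
            idx = idx * n + (pkt[f] - lo) // ((hi - lo + 1) // n)
        node = node.children[idx]
    return node


def classify(root, pkt):
    leaf = find_leaf(root, pkt)
    # Stand-in for SRC's leaf mapping: scan the leaf's RuleList in priority order.
    for i in leaf.rule_list:
        if all(lo <= v <= hi for v, (lo, hi) in zip(pkt, RANGES[i])):
            return RULES[i][5]
    return None


root = Node([(0, 15), (0, 15), (0, 3), (0, 3), (0, 1)], list(range(len(RULES))))
cut(root, {1: 4, 3: 2})                           # Figure 3: Field2 into 4 parts, Field4 into 2
print(root.children[6].rule_list)                 # [0, 5, 7, 10, 11]
print(classify(root, (0, 14, 2, 1, 0)))           # act0
```

The paper describes the search as a recursive depth-first descent. Each internal node sends a packet to exactly one child, so the loop in find_leaf visits the same nodes. root.children[6], [1] and [3] hold {R0, R5, R7, R10, R11}, {R10, R11} and {R8, R10, R11}, which are the leaves drawn in Figure 3.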

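The leaf-level mapping of Section 2.2 can be sketched in the same spirit. It covers the Cell arrays, eqIDs and cbm bit arrays, and the RFC-style combination of fields into new fields. The example below builds phase-0 Cells for the Figure 3 leaf with Field2 in 12-15 and Field4 in 0-1, which holds rules R0, R5, R7, R10 and R11 written as ranges. It combines Fields 1-3 and Fields 4-5 into two new fields, mirroring the dot0-dot2 / dot3-dot5 grouping, and then combines those into one. The matching rule is read off the final cbm.

Several assumptions apply:
- Function names are hypothetical.
- A PNode is modelled as a (table, cbm list) pair instead of the linked ListEqs of Figures 5 and 6.
- The lowest set bit of a cbm is taken as the highest-priority match, which assumes rule order is priority order.

```python
# A minimal sketch of an RFC-style mapping inside one SRC leaf (hypothetical names).
# A PNode is modelled as (table, cbms): table maps an index to an eqID, and
# cbms[eqID] is the bit array of leaf rules sharing that eqID (the list of CESs).

def phase0(rules, f, lmin, lmax):
    """Phase 0 for field f: the Cell over the leaf's reduced range [lmin, lmax]."""
    eqs, cell = {}, []                      # eqs: cbm -> eqID, in order of creation
    for v in range(lmin, lmax + 1):
        cbm = 0
        for i, r in enumerate(rules):      # bit i stands for the i-th rule of the leaf
            if r[f][0] <= v <= r[f][1]:
                cbm |= 1 << i
        cell.append(eqs.setdefault(cbm, len(eqs)))
    return cell, list(eqs)                  # nCES == number of distinct cbms


def combine(a, b):
    """Map two fields into one new field: entry eq_a * len(b_cbms) + eq_b holds the new eqID."""
    eqs, table = {}, []
    for ca in a[1]:
        for cb in b[1]:
            table.append(eqs.setdefault(ca & cb, len(eqs)))
    return table, list(eqs)


def fold(nodes):
    """Combine a group of PNodes left to right, keeping every intermediate PNode for lookups."""
    chain = [nodes[0]]
    for n in nodes[1:]:
        chain.append(combine(chain[-1], n))
    return chain


def follow(chain, parts, eq_ids):
    """Replay a fold at lookup time, turning the parts' eqIDs into the chain's final eqID."""
    eq = eq_ids[0]
    for step, part, e in zip(chain[1:], parts[1:], eq_ids[1:]):
        eq = step[0][eq * len(part[1]) + e]
    return eq


def build_leaf(rules, leaf_range, groups):
    p0 = [phase0(rules, f, lo, hi) for f, (lo, hi) in enumerate(leaf_range)]
    chains = [fold([p0[f] for f in g]) for g in groups]
    return p0, chains, fold([c[-1] for c in chains])


def classify_leaf(index, leaf_range, groups, pkt):
    p0, chains, final = index
    index_zero = [p0[f][0][pkt[f] - lo] for f, (lo, _) in enumerate(leaf_range)]
    group_eqs = [follow(c, [p0[f] for f in g], [index_zero[f] for f in g])
                 for c, g in zip(chains, groups)]
    cbm = final[-1][1][follow(final, [c[-1] for c in chains], group_eqs)]
    return (cbm & -cbm).bit_length() - 1 if cbm else None  # lowest set bit = first match


# Leaf of Figure 3 with Field2 in 12-15 and Field4 in 0-1: rules R0, R5, R7, R10, R11 as ranges.
LEAF_RANGE = [(0, 15), (12, 15), (0, 3), (0, 1), (0, 1)]
LEAF_RULES = [
    [(0, 1), (14, 15), (2, 2), (0, 3), (0, 0)],   # R0  act0
    [(0, 7), (14, 15), (2, 2), (1, 1), (0, 0)],   # R5  act0
    [(0, 7), (8, 15), (0, 3), (0, 3), (1, 1)],    # R7  act2
    [(0, 15), (0, 15), (0, 3), (0, 3), (0, 0)],   # R10 act3
    [(0, 15), (0, 15), (0, 3), (0, 3), (1, 1)],   # R11 act4
]
NAMES = ["R0", "R5", "R7", "R10", "R11"]
GROUPS = [(0, 1, 2), (3, 4)]                      # like dot0-dot2 and dot3-dot5 in the text

index = build_leaf(LEAF_RULES, LEAF_RANGE, GROUPS)
print(NAMES[classify_leaf(index, LEAF_RANGE, GROUPS, (0, 14, 2, 1, 0))])  # R0
```

Each phase only ANDs bit arrays, so the final cbm is exactly the set of leaf rules that match all five fields. The cross-product tables stay small here because the leaf's reduced ranges produce only two or three distinct eqIDs per field. For instance, Field2 over 12-15 yields just two. This is the effect Section 2.1 relies on.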
Table 1  Memory used by EGT-PC, HiCuts and SRC on four FW and four ACL databases (bytes)

Database  No. of rules  EGT-PC   HiCuts-4  SRC
FW1       270           28,688   180,763   18,048
FW2       180           12,352    84,439   10,932
FW3       150            9,642    43,369    7,279
FW4       275           20,380    37,230   15,340
ACL1      270           15,685   140,063   13,448
ACL2      180           19,836    96,972   15,932
ACL3      150           28,642   108,830   20,569
ACL4      275           80,324   230,663   40,221

Table 2  Worst-case memory accesses per lookup for EGT-PC, HiCuts and SRC on four FW and four ACL databases (one access reads one 32-bit word)

Database  No. of rules  EGT-PC  HiCuts-4  SRC
FW1       270           60      42        20
FW2       180           56      78        16
FW3       150           57      78        14
FW4       275           28      46        14
ACL1      270           36      28        12
ACL2      180           45      68        20
ACL3      150           63      86        36
ACL4      275           80      108       49

Table 3  Memory used by SRC and worst-case memory accesses per lookup on the FW and ACL databases

Database  No. of rules  Memory space (bytes)  Search (memory accesses)
FW1        2,500          141,728              6
FW2        5,000          210,369             10
FW3       10,000          963,205             11
FW4       15,000        2,367,924             14
ACL1       2,500          439,210              5
ACL2       5,000          809,768              6
ACL3      10,000        1,050,943              9
ACL4      15,000        2,108,472             13

Table 1 measures memory use and Table 2 measures worst-case lookup time; together they compare HiCuts, EGT-PC and SRC. The results show that SRC uses 3 to 8 times less memory than HiCuts and EGT-PC, while its worst-case lookup is up to 4 times faster.

Table 3 tests SRC alone. As the rule set grows from 2,500 to 15,000 rules, SRC's memory use grows slowly, from a few hundred KB to a few MB. Its worst-case lookup stays within 15 memory accesses.

4 Conclusion

SRC combines multi-field cutting with mapping over small ranges, which saves a great deal of memory. Both techniques also reduce tree depth and mapping complexity, which improves lookup speed. We tested SRC by simulation on synthetic rule sets:
- SRC uses 3 to 10 times less memory than HiCuts, with up to 5 times faster worst-case lookup.
- SRC uses 2 to 8 times less memory than EGT-PC, with up to 4 times faster worst-case lookup.

SRC is stable. Growth of the rule set does not greatly change the leaf ranges, so memory consumption is stable. Its data structures also keep lookup performance from degrading as the rule set grows: experiments show that the number of memory accesses changes little as the rule set grows from 2,500 to 15,000 rules.

SRC's performance depends strongly on the number of rules per leaf and on the number of fields cut. How to control these two parameters requires further research and experimental validation.

References:
[1] P. Gupta and N. McKeown, "Packet Classification using Hierarchical Intelligent Cuttings," in Hot Interconnects VII, August 1999.
[2] P. Gupta and N. McKeown, "Packet Classification on Multiple Fields," in ACM SIGCOMM, August 1999.
[3] S. Singh, F. Baboescu, G. Varghese, and J. Wang, "Packet Classification Using Multidimensional Cutting," in Proceedings of ACM SIGCOMM '03, Karlsruhe, Germany, August 2003.
[4] D. E. Taylor and J. S. Turner, "ClassBench: A Packet Classification Benchmark," Tech. Rep. WUCSE-2004-28, Department of Computer Science & Engineering, Washington University in Saint Louis, May 2004.

