
Research on Fundamental Prediction Problems in Software Maintenance
Doctoral Dissertation, Chongqing University
A Thesis Submitted to Chongqing University in Partial Fulfillment of the Requirements for the Doctor's Degree of Engineering
By Yang Mengning
Supervised by Prof. Yang Dan
Specialty: Computer Science and Technology
Discipline: Engineering
College of Computer Science, Chongqing University, Chongqing, China
November 2016

ABSTRACT

Prediction problems are one of the hot topics in software engineering research and have attracted wide attention from software engineering researchers and practitioners in recent years. The basic idea is to use historical experience and knowledge from the software development and evolution process to predict the likely future state of the software, and thereby effectively support decision-making in software engineering activities. Typical applications include software project planning, software testing, software quality assurance, process improvement and management, and software maintenance and evolution. Maintenance is the longest phase of the software life cycle. Moreover, as development technologies and development models keep changing and software requirements and business complexity keep growing, maintenance is becoming increasingly important in software development and evolution. In view of this, this thesis addresses three fundamental prediction problems in software maintenance: software maintenance effort prediction, software change prediction, and software defect prediction. The main work and contributions are as follows.

(1) Design of a software maintenance effort prediction algorithm that combines static and dynamic methods. Existing approaches to maintenance effort prediction have two problems. Static prediction methods are not suited to projects under maintenance: during maintenance, all the basic constituent elements of the software change, and current static methods cannot fully capture how these changes affect maintenance effort. Dynamic prediction methods depend on software evolution data, which is difficult to collect for some projects. This part designs a prediction method that combines the two, the RPBSC model, which draws on the strengths of static and dynamic models to compensate for their respective weaknesses in practice.

(2) Empirical study of software maintenance effort prediction. This part presents an empirical study of the algorithm from (1) in three parts: first, the dataset, consisting of three Apache open-source projects, shindig, Lucene, and openwebbeans; second, the experimental design and evaluation criteria; and finally, the experimental results and analysis. The results show that the proposed method predicts software maintenance effort fairly accurately.

(3) Design of a self-learning software change prediction algorithm. Existing change prediction methods rely on labeled historical datasets and cannot make predictions for datasets without historical labels. To address this, this part designs a self-learning change prediction approach. Starting from the relationship between metric values and software change, it follows the principle that the larger a class's metric values, the more complex the class and the more likely it is to change. On this basis it labels a subset of samples to build a training set, from which it constructs a self-learning change prediction algorithm.

(4) Empirical study of self-learning software change prediction. This part presents an empirical study of the self-learning algorithm from (3). It first describes the datasets and the experimental environment, then the evaluation metrics and the comparison methods, and finally the experimental results and analysis. The results show that, averaged over 14 open-source projects, the self-learning method outperforms four existing prediction methods.

(5) Design of an LDA-based defect prediction algorithm. Most existing defect prediction methods build their feature space from software structure and design metrics, while the semantic features of source code have rarely been exploited. This thesis therefore proposes a new defect prediction metric based on the LDA topic model, the topic defect density, which links the semantic information in source code to defect topics. The topic defect density of the previous software version is used to predict that of the current version, which in turn predicts the defect information of the current version. Building on this idea, this part shows in detail how to construct a defect prediction model from the semantic information in source code.

(6) Empirical study of LDA-based defect prediction. This part presents an empirical study of the algorithm from (5). It first describes the experimental environment and the datasets, then the overall experimental process and the evaluation metrics, and finally the experimental results and analysis. The results show that predicting defects from semantic features, using the correlation between topics of different versions to quantify the topic defect density from one version to the next, is feasible. Experiments on three open-source projects show good agreement between the predictions of the proposed method and the actual defects.

Starting from the limitations of existing methods, this thesis studies fundamental prediction problems in software maintenance, improves existing prediction methods, and proposes new prediction models. These improve prediction accuracy and provide more accurate decision support for software maintenance and evolution.

Keywords: software prediction model, software maintenance, change prediction, defect prediction, topic model

Contents
Chinese Abstract
English Abstract
1 Introduction
1.1 Research Background and Significance
1.2 Progress in Related Research
1.2.1 Software Maintenance Effort Prediction
1.2.2 Software Change Prediction
1.2.3 Software Defect Prediction
1.3 Research Motivation
1.4 Main Research Content of This Thesis
1.5 Organization of the Thesis
2 Related Theory and Techniques
2.1 Mining Software Repositories
2.2 Software Maintenance Effort Prediction
2.3 Change-Prone Prediction and Defect-Prone Prediction
2.3.1 Static Prediction Techniques
2.3.2 Dynamic Software Defect Prediction Techniques
2.4 Self-Learning
2.5 Topic Models
2.5.1 Overview of Topic Models
2.5.2 The LDA Model
2.5.3 The Dirichlet Distribution
2.5.4 Gibbs Sampling
2.6 Chapter Summary
3 Software Maintenance Effort Prediction Model
3.1 Introduction
3.2 Regression Models
3.2.1 Linear Regression
3.2.2 Multiple Linear Regression
3.3 The RPBSC Model
3.3.1 Selection of Software Maintenance Effort Metrics
3.3.2 Sufficiency and Necessity of the Number of Source Code Versions
3.3.3 Construction of the RPBSC Model
3.4 Experimental Design and Result Analysis
3.4.1 Experimental Environment
3.4.2 Data Sources
3.4.3 Metrics and Data
3.4.4 Experimental Results and Analysis
3.5 Chapter Summary
4 Self-Learning-Based Software Change Prediction Model
4.1 Introduction
4.2 Construction of the Self-Learning-Based Software Change Prediction Model
4.2.1 Clustering and Labeling
4.2.2 Metric Selection
4.2.3 Instance Selection
4.2.4 Learning and Prediction
4.3 Experimental Design
4.3.1 Research Questions
4.3.2 Experimental Environment
4.3.3 Datasets
4.3.4 Experimental Comparison
4.3.5 Evaluation Metrics
4.4 Experimental Results and Analysis
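Contribution (1) describes the RPBSC maintenance effort model only at a high level, and the contents show that Chapter 3 builds on linear and multiple linear regression. As a minimal sketch, assuming a plain multiple linear regression fitted by ordinary least squares (the actual RPBSC metrics and fitting procedure are not given in this excerpt), the code below fits effort = b0 + b1*x1 + ... + bp*xp from per-version metric vectors. The function names `fit_ols` and `predict_effort` and the example metrics are hypothetical.

```python
from typing import List, Sequence

def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: metrics are collinear or samples are too few")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Fit y ~ b0 + b1*x1 + ... + bp*xp via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return _solve(xtx, xty)

def predict_effort(coef: Sequence[float], x: Sequence[float]) -> float:
    """Predicted maintenance effort for one metric vector x."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))
```

Each row of `X` could hold, for example, hypothetical static metrics of a release together with dynamic metrics derived from its change history; which metrics RPBSC actually uses is described in Section 3.3.1 of the thesis, not here. Solving the normal equations directly is adequate for a handful of metrics; with many correlated metrics a QR- or SVD-based least-squares solver is numerically safer.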
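Contribution (3) labels part of an unlabeled dataset using the principle that larger metric values indicate a more complex class that is more likely to change. The sketch below is one hypothetical way to realize that idea, assuming: the per-metric median as the threshold for a "large" value, the count K of metrics above the median as the complexity score, the top and bottom 25% of classes by K as the self-labeled training set, and a nearest-centroid classifier as the learner. The thesis's own clustering, metric selection, and instance selection steps (Sections 4.2.1 to 4.2.3) may differ, and all function names are illustrative.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]

def complexity_scores(X: Matrix) -> List[int]:
    """K_i = number of metrics on which class i exceeds that metric's median."""
    medians = [median(col) for col in zip(*X)]
    return [sum(v > m for v, m in zip(row, medians)) for row in X]

def self_label(X: Matrix, frac: float = 0.25) -> List[Tuple[int, int]]:
    """Label the top `frac` of classes by K as change-prone (1) and the bottom `frac` as not (0)."""
    if len(X) < 2 or not 0 < frac <= 0.5:
        raise ValueError("need at least two classes and 0 < frac <= 0.5")
    k = complexity_scores(X)
    order = sorted(range(len(X)), key=lambda i: k[i])  # ascending complexity
    n = max(1, int(len(X) * frac))
    return [(i, 1) for i in order[-n:]] + [(i, 0) for i in order[:n]]

def min_max(X: Matrix) -> List[List[float]]:
    """Rescale every metric to [0, 1] so no single metric dominates the distances."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)] for row in X]

def self_learning_predict(X: Matrix, frac: float = 0.25) -> List[int]:
    """Self-label a training subset, fit nearest centroids, and predict every class."""
    Z = min_max(X)
    labeled = self_label(X, frac)
    centroids: Dict[int, List[float]] = {}
    for lab in (0, 1):
        rows = [Z[i] for i, l in labeled if l == lab]
        centroids[lab] = [sum(c) / len(rows) for c in zip(*rows)]

    def nearest(z: Sequence[float]) -> int:
        # Squared Euclidean distance to each class centroid; the closer centroid wins.
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])))

    return [nearest(z) for z in Z]
```

With hypothetical columns such as LOC, WMC, and CBO, `self_learning_predict(X)` returns a 0/1 change-proneness label for every row of `X` without using any historical change labels. The nearest-centroid learner is a deliberately simple stand-in; any classifier trained on the self-labeled subset fits the same pattern.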

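Contribution (5) predicts the topic defect density of the current version from that of the previous version, using the correlation between topics of the two versions. The exact definitions are not given in this excerpt, so the following is a sketch under assumed formulas: TDD_k = sum_d theta[d][k] * defects[d] / sum_d theta[d][k] over the previous version's files; each current topic inherits a cosine-similarity-weighted average of the previous topics' densities; and a current file's defect score is sum_k theta[d][k] * TDD_k. It takes already-trained LDA outputs (document-topic matrix theta and topic-word matrix phi, for example obtained by Gibbs sampling) as input and does not train LDA itself. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]

def topic_defect_density(theta: Matrix, defects: Sequence[float]) -> List[float]:
    """Assumed TDD_k: defect counts of the previous version's files, weighted by topic share."""
    out = []
    for k in range(len(theta[0])):
        mass = sum(row[k] for row in theta)
        weighted = sum(row[k] * d for row, d in zip(theta, defects))
        out.append(weighted / mass if mass > 0 else 0.0)
    return out

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def transfer_density(phi_cur: Matrix, phi_prev: Matrix, tdd_prev: Sequence[float]) -> List[float]:
    """Density of each current topic = similarity-weighted mean of previous-version densities."""
    out = []
    for topic in phi_cur:
        sims = [cosine(topic, old) for old in phi_prev]
        total = sum(sims)
        out.append(sum(s * d for s, d in zip(sims, tdd_prev)) / total if total > 0 else 0.0)
    return out

def defect_scores(theta_cur: Matrix, tdd_cur: Sequence[float]) -> List[float]:
    """Predicted defect-proneness of each current file; higher means more likely defective."""
    return [sum(t * d for t, d in zip(row, tdd_cur)) for row in theta_cur]
```

Because topics are matched by the similarity of their word distributions rather than by index, the sketch still works when LDA, trained separately on each version, returns the same topics in a different order. Ranking current files by `defect_scores` then gives a list of files to inspect first.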