Abstract | An analysis of widely used effort estimation techniques for software development projects shows that existing techniques are primarily intended for the development of new solutions. Applying these techniques to projects that reuse previously developed artifacts from projects with a similar scope of work leads to large discrepancies between estimated and actual effort. This doctoral research proposes a new effort estimation model for software development based on the reuse of use cases and on project team characteristics, called the UCR model (Use Case Reusability model). The UCR model was created by modifying Karner's existing UCP (Use Case Points) effort estimation model and adding elements that describe the reusability aspect. As part of the validation process, the UCR model was applied to initial and subsequent projects from three different, unrelated programs from industry and academia. The results of the analysis show that the UCR model can be applied in different project environments. According to the observed mean magnitude of relative error, the UCR model gives improved effort estimates compared with the UCP model for the projects observed in the case study. The process of designing an effort estimation model based on machine learning methods is also described. The UCR_ML model was built by applying regression models (linear regression, artificial neural networks and decision trees). In the evaluation of results, for the given historical data the most accurate effort estimates were produced by the radial neural network method. A limitation in designing the model is the relatively small set of historical data, which results in small training and test sets. 
Based on the validation of the UCR_ML estimation model, that is, a comparison with the estimates of the UCR model, it can be concluded that algorithmic models are more appropriate for organizations with a limited number of projects of similar scope. Organizations can easily apply the described process of developing the UCR and UCR_ML models and, where needed, adapt them by modifying the input factors and weights or by calibrating the UCR size expressed in man-hours. |
Abstract (english) | Effort estimation for a software development project is extremely important for the success of the overall solution delivery. Despite this, studies show that no significant progress in improving the performance of estimation techniques has been reported, which represents one of the major challenges within the software industry. Incorrect effort estimation often causes budget overruns, delays in delivery and failure to fulfil contractual obligations, and indirectly affects the quality of the product itself. It is therefore not surprising that a very common cause of failure of software development projects in the field of information technology (IT) is incorrect effort estimation. The estimated amount of work has a direct impact on several aspects of the software development life cycle: it may challenge or support a decision to develop a software product depending on the investment justification, it is used as an input parameter for determining the project budget and the market price, and it affects the project plans, the delivery schedule of project artifacts, etc. Present trends impose goals such as a shorter software development cycle and lower product prices. These goals directly target lower project effort, while there is a parallel demand to maintain the agreed quality of the product. To meet these requirements, software development models introduce the practice of defining and implementing software modules. These modules serve as the core solution, with the possibility of reusing certain modules in other (separate) software solutions. Besides program code, the core solution contains a vast number of development process artifacts, such as requirement descriptions, architecture, use cases, test specifications, etc. Reusing software artifacts improves the productivity and quality of new software products, while reducing the resources, cost and time of future software development projects. 
The conducted studies include an analysis of the most commonly used effort estimation techniques, which can be categorized into two groups: algorithmic models based on parameters (Constructive Cost Model (COCOMO), Lines of Code (LOC), Function Points (FP), Use Case Points, etc.) and heuristic approaches (expert estimation, neural networks, rule-of-thumb techniques, Delphi, etc.). Analysis of the widely used effort estimation techniques for software development projects shows that these techniques were primarily intended for the development of new software solutions. When these techniques are applied to projects that reuse artifacts developed in past projects with a similar scope of work, there is a great discrepancy between the estimated and actual project effort. Such results clearly show a strong need for a new effort estimation model that would serve projects which reuse artifacts developed in previous projects within the same program. The term “initial project” refers to the project where all project artifacts are developed from scratch. This thesis describes a new effort estimation model based on use case reuse and project team characteristics, called the Use Case Reusability (UCR) model. The UCR model is a modification of the existing Use Case Points (UCP) effort estimation model developed by Karner, extended with elements that describe the reusability aspect. The input factors of the UCR model pertain to three main aspects: functional scope, technical complexity and environmental factors. The UCR model is distinctive in that it adds a classification of use cases for subsequent projects based on their reusable elements (new, similar or identical use case). In a subsequent project, artifacts developed in previous projects are reused. In the initial project, where all project artifacts are built from scratch, all use cases are defined as new use cases. 
Each subsequent project has a dedicated functional scope, and every new requirement must be analyzed to determine whether it fits the elements of an already existing use case. Each technical and environmental factor has been assigned a descriptive scale of attributes (low, medium, high), with a set of criteria for each attribute. Because clear criteria are set for each possible attribute of an individual factor, the input parameters of the estimation model are objective and transparent. Historical project data from three different programs were collected to calibrate the size of UCR expressed in man-hours for initial and subsequent projects. This thesis also presents a study which validates the usage of the UCR model. Validation of the UCR model is performed as an experiment that evaluates the accuracy of estimated project effort when different estimation models are used. The study is conducted within industry and academic environments, using industry project teams and postgraduate students as subjects. The analysis results show that the UCR model can be applied in different project environments and that, according to the observed mean magnitude of relative error, it produced very promising effort estimates. The projects whose historical data were used for model calibration and validation are predominantly small in terms of project scope (a small or moderate number of deliverables to be produced) and team size (up to five team members), so certain deviations in effort estimation might be expected for medium or large projects. As part of future work, the scale of factor attributes in the UCR model could be enlarged for medium or large projects. Project size and project complexity increase from small to large projects: medium and large projects have a moderate to high number of deliverables, which are usually technically more complex, the number of team members rises and the timeframe for delivery expands. 
Additional granularity of factor attributes would allow more precise differentiation between projects in terms of the technical complexity of the solution and project team characteristics. Apart from the parametric UCR model, this thesis also presents an effort estimation model based on machine learning techniques. The UCR_ML model was formed using regression techniques: linear regression, artificial neural networks and decision trees. Evaluation of all three techniques showed that the most precise estimate of project effort was produced by the radial neural network. The design constraint of the UCR_ML model is a relatively small set of historical data for learning and testing. Based on the validation of the UCR_ML estimation model, i.e. a comparison with the results of the UCR model, it can be concluded that algorithmic effort estimation models are more appropriate for organizations with a limited number of projects of similar scope. However, as experts in effort estimation advise, applying several different estimation models contributes to a more accurate estimation of effort. Consequently, organizations with a limited amount of historical data should not rule out the use of a model based on machine learning methods, but after designing such a model they should keep extending it with new historical data. Organizations can easily apply both described models, UCR and UCR_ML, and, if necessary, tailor them by modifying the input factors and weights or by calibrating the UCR size expressed in man-hours. The sections of this thesis are organized as follows. The introduction gives an overview of current research along with an analysis of effort estimation models, with a focus on expert estimation and use case points. This section describes the mathematical model of UCP and the algorithm for estimating project effort. The second chapter describes the experience and challenges of applying the UCP model in projects that reuse previously developed artifacts. 
The great discrepancy between estimated and actual project effort clearly shows a strong need for a new effort estimation model that would serve projects which reuse artifacts developed in previous projects with a similar scope. The third section describes software reusability elements along with the process of comparing a new software requirement with the existing use cases. In addition to the existing classification of use cases by complexity, a new classification of use cases is established based on their reusable elements. The fourth section proposes the definition of a new effort estimation model based on use case reuse (UCR). The UCR model introduces a new classification of use cases based on their reusability, and it includes only those technical and environmental factors of the UCP model that, according to effort estimation experts, have a significant impact on effort for the target projects. This section also describes the mathematical model along with all algorithms for producing effort estimates, followed by model parameterization. The fifth section presents the validation process of the UCR model. Validation is performed as an experiment that evaluates the accuracy of estimated project effort when different estimation models are used. The factor in the experiment is the estimation model, and the treatments are the UCP and UCR models. The subjects are different project teams from industry and the academic community (Ph.D. students). The effort estimation model based on machine learning techniques, UCR_ML, is defined in the sixth section. A case study describes the development of the UCR_ML model using regression techniques: linear regression, neural networks and decision trees. The effort estimation results of UCR_ML were validated against the results of the UCR model. Within the case study, the machine-learning-based UCR_ML model proves slightly inferior to the algorithmic UCR model. 
As a threat to the validity of the results, a relatively small set of historical data was noted, which leads to small data sets for learning and testing. It can be concluded that algorithmic models are more appropriate for organizations that have a limited number of projects with a similar scope of work and want to develop a suitable effort estimation model. However, organizations with a larger amount of historical data can easily apply the described estimation model and, if necessary, adjust the input factors. The last chapter outlines the conclusions of the dissertation research and suggests directions for further research in this area. |
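
The abstract compares estimation models by their mean magnitude of relative error. A minimal sketch of that metric follows, using the common definition MRE = |actual - estimated| / actual, averaged over projects. The function names and all effort figures are hypothetical, invented purely for illustration; they are not data or code from the thesis.

```python
from statistics import mean


def mre(actual, estimated):
    """Magnitude of relative error for one project: |actual - estimated| / actual."""
    if actual <= 0:
        raise ValueError("actual effort must be positive")
    return abs(actual - estimated) / actual


def mmre(actuals, estimates):
    """Mean magnitude of relative error over a set of projects."""
    if not actuals or len(actuals) != len(estimates):
        raise ValueError("need two non-empty sequences of equal length")
    return mean([mre(a, e) for a, e in zip(actuals, estimates)])


# Hypothetical effort figures in man-hours (illustrative only, not thesis data).
actual_effort = [400.0, 250.0, 500.0]
model_a_estimates = [500.0, 300.0, 400.0]  # stand-in for a baseline model
model_b_estimates = [420.0, 240.0, 510.0]  # stand-in for a calibrated model

mmre_a = mmre(actual_effort, model_a_estimates)
mmre_b = mmre(actual_effort, model_b_estimates)

# A lower MMRE means the estimates were, on average, closer to actual effort.
print(f"model A MMRE: {mmre_a:.3f}")
print(f"model B MMRE: {mmre_b:.3f}")
```

Because each error is divided by the actual effort, small and large projects contribute on the same relative scale. With only a handful of projects, a single outlier can dominate the mean.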