PSP stands for Personal Software Process, one of the most important elements of software engineering. PSP covers the major phases involved in the development of medium- and large-scale software systems, chiefly Requirements Analysis, Design, Coding, and Testing.
The requirements analysis phase involves understanding and planning out the software system's specifications, requirements, and constraints. To an extent, it also incorporates decisions about the development and execution environments, and analyzes the use cases (simulating system interaction). The design phase outlines specific implementation details, such as the objects (classes and their functionality) and how they interact with each other. The coding phase involves the actual implementation of all this functionality. The testing phase checks the implementation at both the "white-box" level (exercising all branches and paths of execution) and the "black-box" level (making sure the system does not crash when run).
Accurate estimates of how long each development phase will take, and of how these times relate to one another, are extremely important when trying to produce quality software on time. In recent years, industry, academic, and research groups have paid a great deal of attention to applying software engineering practices, and it has paid off. Unfortunately, tracking this data by hand is a tedious and unintuitive task, presenting an obvious need for an automated tool to do PSP predictions and logging. Although there have been several (not very successful) attempts to create a logging tool, the actual prediction of the times is left entirely up to the programmers, which has been a major problem for people who lack experience making such predictions. The goal of this project was to create a neural network system, trained on actual PSP data, that can predict the time required for each development phase: requirements analysis, design, coding, and testing.
In order to build a PSP prediction system, I had to adopt the following assumption about the nature of the problem: it is possible to estimate the ratios among the times needed for each development phase, given the following parameters: lines of code (LOC), familiarity with the environment, project complexity, and level of programming background.
These input parameters were carefully selected from many other candidates, based on my experience, as the factors most relevant to the time each development phase takes. It should be noted, however, that there may be other important metrics that could have improved the performance of the system if taken into consideration. Nonetheless, as will be shown later, the given selection proved adequate and produced reasonably accurate ratios among the development phase times.
Following this assumption, it is evident that if one of the development phase times can be estimated statically, neural networks can be applied to predict the times needed for the other three phases. Since coding time is mostly consistent with LOC, I chose it to be evaluated statically. Thus, three separate neural networks were trained to predict the (Analysis/Coding), (Design/Coding), and (Testing/Coding) time ratios.
Since the available training data consisted entirely of student project records, the size of the projects never exceeded 2500 LOC, which puts an upper limit on the range of projects the system can predict. I also decided to limit the familiarity, complexity, and level input parameters to a scale from 0 to 7.
Since it was not obvious how important the parameters were relative to each other, I decided to allocate an equal number of input units of the neural network to each of them. To remedy the fact that LOC is an unscaled number, I split the range of all possible LOC values into eight subranges: {(<400), (400..700), (700..1000), ..., (1900..2200), (2200..2500)}. This division allows LOC to be represented in the same manner as the other three input parameters.
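As a sketch, the bucketing can be written as follows (the function name and the handling of the boundary values are my assumptions; the report does not state which subrange a boundary LOC count falls into):

```python
# Map a raw LOC count onto one of the eight subranges described above,
# so that LOC can be encoded with the same 8-value scheme as the other inputs.
def loc_subrange(loc):
    """Return a subrange index 0..7 for a LOC count up to 2500."""
    if loc < 400:
        return 0
    if loc >= 2200:
        return 7
    # Subranges 1..6 are 300 LOC wide: 400..700, 700..1000, ..., 1900..2200.
    return 1 + (loc - 400) // 300

print(loc_subrange(150))   # -> 0  (<400)
print(loc_subrange(850))   # -> 2  (700..1000)
print(loc_subrange(2400))  # -> 7  (2200..2500)
```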
The next question to answer is exactly how the input parameters should be represented. Naturally, to keep the network manageable, one would like to limit the total number of input units. Since there are eight possible values for each input, the minimum number of binary nodes required to represent this range is three, using some kind of encoding scheme.
To decide on the best encoding scheme, I constructed a simple neural network that attempts to learn ranges given encoded binary inputs. The scheme that performed best is shown in the rightmost column of Table 1 (below). Intuitively, its success can be explained by the fact that the number of 1's in the encoded sequence never decreases as the number itself grows.
Table 1: Binary number encoding:
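Table 1 itself is not reproduced here, but one 3-bit encoding with the stated property is shown below. The exact bit patterns are my reconstruction, not necessarily the ones in the table; they simply satisfy the monotone-ones property described above.

```python
# One possible 3-bit encoding of the values 0..7 in which the count of 1's
# never decreases as the encoded value grows (my reconstruction).
ENCODING = [
    (0, 0, 0),  # 0 -> zero 1's
    (0, 0, 1),  # 1 \
    (0, 1, 0),  # 2  > one 1
    (1, 0, 0),  # 3 /
    (0, 1, 1),  # 4 \
    (1, 0, 1),  # 5  > two 1's
    (1, 1, 0),  # 6 /
    (1, 1, 1),  # 7 -> three 1's
]

# The count of 1's is monotone non-decreasing over the eight values.
ones = [sum(bits) for bits in ENCODING]
assert ones == sorted(ones)
print(ones)  # -> [0, 1, 1, 1, 2, 2, 2, 3]
```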
Outputs were also chosen to be represented as ranges of ratios on a scale of [1..10], using 10 binary digits (e.g., {0 0 0 1 0 0 0 0 0 0} represents the fourth ratio range). This gives us the flexibility of assigning different ratio ranges to each of the three neural networks to be trained, based on the statistical information available for the training sets.
Another possibility was to use a similar encoding scheme for the outputs as well. However, the random success probability in that case could be as high as 12.5% (0.5^3), introducing a lot of instability into the predictions. The random success rate of the chosen format is only (0.5)^10 ≈ 0.1%, allowing for a more accurate evaluation of system performance. This format also allows for an alternative, more accurate method of output evaluation, described below.
Intuitively, we would evaluate an output sequence by matching its binary terms against the ten legal possibilities (a '1' in one of the ten places and '0's everywhere else). If an error occurs, we are most likely to receive an invalid output (all zeros, or more than one 1), since there are only 10 legal outputs out of 1024 possible ones. In that case we can either report the error to the user, or apply the following formula to calculate an alternative answer:
R = Sum(Zi * Ri) / Sum(Zi), for i = 1..10,
where R is the estimated ratio (PhaseTime/CodeTime, in our case), Zi (i = 1..10) are the output activations produced by the neural network, and Ri (i = 1..10) are representative values (e.g., midpoints) of their respective ratio ranges.
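A minimal sketch of this fallback formula, with hypothetical activations and assumed midpoints for the Ri values (the report does not specify which representative value of each range is used):

```python
# Weighted-average decoding of a soft (possibly "invalid") output vector:
# R = sum(z_i * r_i) / sum(z_i) over the ten output units.
def weighted_ratio(z, r):
    """z: output activations; r: representative values of the ten ranges."""
    assert len(z) == len(r) == 10
    return sum(zi * ri for zi, ri in zip(z, r)) / sum(z)

# Hypothetical example: two strong activations straddling adjacent ranges
# yield a ratio between their midpoints.
z = [0.0, 0.1, 0.8, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
r = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
print(weighted_ratio(z, r))  # ≈ 0.2875
```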
Had an encoding scheme been used for the output, almost any combination of output unit values would be valid, and we would have no way of knowing that an error had occurred.
When applying the momentum term, I used the following formula, which differs slightly from the traditional one but seemed to work much better for this project:
w(t+1) = ALPHA*DELTA*z + NU*[w(t) - w(t-1)] + w(t)
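A sketch of this update rule for a single weight (variable names are mine; ALPHA is the learning rate, DELTA the backpropagated error term, z the input activation, and NU the momentum rate):

```python
# w(t+1) = ALPHA*DELTA*z + NU*[w(t) - w(t-1)] + w(t)
ALPHA, NU = 0.2, 0.9  # rates used later in the report

def update_weight(w_t, w_prev, delta, z, alpha=ALPHA, nu=NU):
    """One momentum-based update step; the momentum term reuses the
    previous step's displacement w(t) - w(t-1)."""
    return alpha * delta * z + nu * (w_t - w_prev) + w_t

w_prev, w_t = 0.50, 0.52
w_next = update_weight(w_t, w_prev, delta=0.1, z=1.0)
print(w_next)  # 0.02 + 0.9*0.02 + 0.52 ≈ 0.558
```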
Momentum was applied only to the regular (non-bias) weights; bias and non-bias weights have different metrics, and applying the same momentum rate to both kills the learning.
By varying these parameters, I learned quite a bit about the architecture of backpropagation neural networks and how each parameter affects performance. I will describe the experiments I conducted to test the performance of the network in later sections.
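For reference, here is a minimal sketch (not the original program) of the kind of network described: 12 binary inputs (four parameters, three bits each), one hidden layer of 25 sigmoid units with bias nodes, 10 output units, and backpropagation with the momentum term. For brevity this sketch applies momentum to all weights, unlike the final version described above.

```python
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_HID, N_OUT = 12, 25, 10   # 4 parameters x 3 bits in; 10 ratio ranges out
LR, MOM = 0.2, 0.9                # learning and momentum rates used in the report

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Each weight matrix carries an extra bias column; inputs get a constant 1.
W1 = rng.uniform(-0.5, 0.5, (N_HID, N_IN + 1))
W2 = rng.uniform(-0.5, 0.5, (N_OUT, N_HID + 1))
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)

def train_step(x, target):
    """One forward/backward pass on a single pattern; returns squared error."""
    global W1, W2, dW1, dW2
    x1 = np.append(x, 1.0)                      # bias input
    h = sigmoid(W1 @ x1)
    h1 = np.append(h, 1.0)                      # bias for the hidden layer
    y = sigmoid(W2 @ h1)

    d_out = (target - y) * y * (1 - y)          # output-layer error terms
    d_hid = (W2[:, :-1].T @ d_out) * h * (1 - h)

    dW2 = LR * np.outer(d_out, h1) + MOM * dW2  # gradient step + momentum
    dW1 = LR * np.outer(d_hid, x1) + MOM * dW1
    W2 += dW2
    W1 += dW1
    return float(np.sum((target - y) ** 2))

# Toy check on one pattern: the error should fall as training proceeds.
x = rng.integers(0, 2, N_IN).astype(float)
t = np.zeros(N_OUT); t[3] = 1.0                 # one-hot target (fourth range)
errors = [train_step(x, t) for _ in range(200)]
print(errors[0], min(errors))
```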
Initially, I used a data set kindly provided by Laurie Williams, a graduate Computer Science student here at the University of Utah. The set contains PSP data collected from CS202 students during the Winter quarter of 1997. It came in the following format:
Table 2, Original data sample
| ID | Analys | Design | Code | Review | Comp | Test | Total | LOC | Grade |
Many of the entries were missing. To process this data into a neural-network-friendly format, I wrote a program that retrieves the applicable entries for each network, calculates statistical information about each selected subset, assigns ranges to the LOC and output values, and converts the data to the following training/testing file format:
Table 3, sample neural network input
After processing this data set, I obtained the following statistics:
654 records total, representing 109 students in CS202 after completing six projects.
564 records represented completed projects (Grade > 60). However, since many entries are missing, the actual data sample differs for each phase.
Table 4, Distribution of times for phases for data set #1
Ratio of CodeTime/LOC ranges: (0.046 - 8.808), average: 1.080 ± 1.133 (1 std. dev.)
Table 5, Distribution of times for phases for data set #1
The data are obviously very inconsistent. All the metrics have standard deviations greater than the magnitude of the average. Even for the ratios, which are supposed to be more consistent than the raw times (because projects differ in complexity), we see huge variations.
Training and Testing the network on data set #1:
The LOC parameter could be taken directly from the data. I assigned Familiarity with the environment on a per-project basis (for each of the six projects): {1, 2, 3, 4, 5, 6}, and Complexity per project, as suggested by the instructor: {2, 3, 4, 5, 7, 4}. The Level of programming background was assigned based on Grade, representing one of eight ranges: {[<66], [66..70], [71..75], ..., [91..95], [>96]}.
Since the ratios between each development phase time and the coding time mostly fell between zero and two, I assigned the following output ranges: {[<0.2], [0.2..0.4], [0.4..0.6], ..., [1.4..1.6], [1.6..1.8], [>1.8]}.
After generating the network input files, I ran the data set using the following neural network characteristics: 1 hidden layer, 25 hidden units, momentum rate = 0.9, learning rate = 0.2, with bias nodes, and got the following results:
Table 6, results of training and testing on data set #1
As we can see, the results are simply terrible. The fact that the network never reached 30% accuracy even on the training set shows how inconsistent the data were. It is little surprise, then, how low the testing percentages were. We definitely see some learning (a purely random predictor would have produced about 0.1% accuracy), but the system is far from usable for making any kind of reasonable estimate.
Here are some possible reasons for such terrible performance:
Keeping all this in mind, I decided to use a different data set to train the neural networks. Since students in CS453 (Senior Software Lab) had been tracking their PSP for almost a year by that point, I decided to use their data instead. I expected it to be more accurate, since I could assign familiarity, complexity, and programming level better: I was personally involved with the projects and know all the people who did them (including myself).
In the short period of time available, I was able to collect information on 72 projects done by CS453 students. Most of the data came from myself and the members of my team in that class. All records represented completed projects (75 < Grade < 100). The following are the statistics on the times for the development phases, and their ratios to the respective times spent on coding.
Table 7, Distribution of times for phases for data set #2
Ratio of CodeTime/LOC ranges: (1.987 - 3.015), average 2.417 ± 0.496
Table 8, Distribution of times for phases for data set #2
If you compare the data in Tables 7 and 8 to those in Tables 4 and 5, you will notice a huge difference in the variance of both the actual times and the ratios between sets 1 and 2. This can be explained by the CS453 students being more careful and having more experience in tracking and logging their PSP data. Also, the second set represents a much smaller number of individuals with very similar skill levels, which also reduces the variation.
Before doing the actual training and testing, I assigned the Familiarity and Complexity parameters on a per-project basis. This was possible because I was personally involved in all these projects, so I could be consistent in my estimates. I assigned the Level parameter based on Grade: {[<79], [79..82], [83..85], ..., [95..97], [>98]}. The following are the ranges I chose for the outputs of each phase network (based on the statistics for each phase):
Analysis/Code range: {[<0.1], [0.1..0.2], [0.2..0.3], ..., [0.8..0.9], [>0.9]}
Design/Code range: {[<0.1], [0.1..0.2], [0.2..0.3], ..., [0.8..0.9], [>0.9]}
Test/Code range: {[<0.05], [0.05..0.1], [0.1..0.15], ..., [0.40..0.45], [>0.45]}
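Assuming the elided bins are uniform 0.1-wide ranges (which yields exactly ten bins, matching the ten output units), target construction for the Analysis/Code network can be sketched as:

```python
# Convert an Analysis/Code ratio into a 10-element one-hot training target.
# The uniform 0.1-wide interior bins are my assumption about the elided ranges.
def ratio_to_target(ratio):
    if ratio < 0.1:
        idx = 0                         # [<0.1]
    elif ratio >= 0.9:
        idx = 9                         # [>0.9]
    else:
        idx = 1 + int((ratio - 0.1) / 0.1)  # [0.1..0.2] .. [0.8..0.9]
    return [1 if i == idx else 0 for i in range(10)]

print(ratio_to_target(0.34))  # -> [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```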
I trained the networks for 100 epochs, with both one and two hidden layers, 25 hidden units in each layer, a momentum rate of 0.9, and a learning rate of 0.2 for one layer and 0.02 for two layers. The following are the results of the experiments:
Table 9, results of training and testing on data set #2
The performance improvement over data set #1 was very significant! Especially striking is the difference when going from one to two layers of hidden units. Perhaps the relationships between the phases are better described by non-linear functions that can only be approximated with two hidden layers.
In fact, the performance of the networks was good enough to include them in a real application. For the final project in Senior Software, our team developed a PSP logging tool that allows PSP tracking over the network (using a web browser) through a centralized PSP server. The pre-trained neural networks were included in the server to estimate the times required for each development phase and to evaluate final results.
To estimate the coding time from the number of lines of code, I used an empirically constructed function (for LOC in [100, 2500]):
CodeTime = LOC * (1.85 + 0.0004*LOC + 0.04*Familiarity + 0.11*Complexity - 0.02*ProgrammingLevel)
For the other development phases (requirements analysis, design, and testing), the system first finds the corresponding ratios and then multiplies them by the statically evaluated coding time.
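A sketch of the static estimate and its use (function and variable names are mine; the constants come from the formula above, and the example ratio is hypothetical):

```python
# Empirical coding-time estimate for LOC in [100, 2500]; familiarity,
# complexity, and level are on the 0..7 scale used throughout.
def code_time(loc, familiarity, complexity, level):
    return loc * (1.85 + 0.0004 * loc
                  + 0.04 * familiarity
                  + 0.11 * complexity
                  - 0.02 * level)

ct = code_time(1000, familiarity=4, complexity=3, level=5)
print(ct)  # 1000 * (1.85 + 0.4 + 0.16 + 0.33 - 0.10) ≈ 2640

# A phase time is the network-predicted ratio times this estimate,
# e.g. for a hypothetical Analysis/Code ratio of 0.3:
analysis_time = 0.3 * ct
```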
Overall, the program seems to give reasonable time estimates for each development phase, although it is tailored toward the given data set and academic-style projects of limited size. It is currently being considered for student PSP logging and estimation for next year.
In order to find the optimum configuration for the neural network, I experimented with the number of hidden units in hidden layers, momentum rate, learning rate, and presence of biased nodes.
Momentum rate:
This parameter proved to be crucial for the learning speed of the network. Consider the following graph (the top curve represents the 1-layer network, the bottom one the 2-layer network):
The data were collected after training the networks with 25 hidden units in each layer, a 0.1 learning rate, and bias nodes present. The X-axis shows the momentum rate, ranging from 0 to 0.9. The Y-axis shows the number of epochs needed to reach 96% accuracy. As the graph makes obvious, high momentum rates reduce learning time significantly.
Learning rate:
This parameter is also very important for learning speed. However, I noticed that with very high learning rates (above 0.05 for two hidden layers, and 0.2 for one), the network becomes very unstable and takes much longer to converge than with lower learning rates. Sometimes it gets into a permanent oscillating state and never converges.
Consider another chart (the top curve represents the 1-layer network, the bottom one the 2-layer network):
The data were collected after training the networks with 25 hidden units in each layer, a 0.8 momentum rate, and bias nodes present. The X-axis shows the learning rates, specifically {0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5}. The Y-axis shows the number of epochs needed to reach 96% accuracy. As Table 9 and the charts above show, increasing the learning rate reduces training time significantly, up to the stability limits noted earlier.
Hidden Units and Layers:
I experimented with various numbers of hidden units and layers, and these also affected training speed significantly. With unit counts ranging from 5 to 50, performance peaked at 80% for 2 layers with 25 units in each layer.
Biased Nodes: Adding bias nodes seems to give roughly a 75% improvement in learning speed.
Concluding Remarks:
Neural network-based PSP prediction systems can be created, but they are heavily biased toward a particular type of user and toward the type, complexity, and size of the projects being logged. However, when such a user can be identified, neural networks can perform reasonably well in predicting the development phase times.
In the future I would like to learn more about different types of neural networks and their potential applications. In particular, I am interested in medical applications of neural nets; it would be interesting to see how well they perform compared to other machine learning methods used in medical expert systems. In general, I think this project gave me very valuable experience for more advanced research in the field of artificial intelligence.