Object-Oriented Metrics in Practice
Using Software Metrics to Characterize, Evaluate, and Improve the Design of Object-Oriented Systems
Foreword by Stéphane Ducasse
We would like to thank Ralf Gerstner, our editor, and his team at
Springer for following this book project with dedication and patience.
Our special thanks go to Prof. Gerhard Goos for his strong encouragement to start writing this book, and for reminding us that “every man must plant a tree, build a house and write a book”.
Finally we would like to thank our wives, Marisa and Cristina, for
being so understanding — patience is truly a female trait!
April, 2006
Foreword
Some Context
It is a great pleasure and difficult task to write the foreword of this book. I
would like to start by setting out some context.
Everything started back in 1996 in the context of the IST project FAMOOS
(Framework-Based Approach for Mastering Object-Oriented Software Evo-
lution). At that time we started to think about patterns to help approach
and maintain large and complex industrial applications. Some years later, in
2002, after a lot of rewriting, these patterns ended up in our book “Object-Oriented Reengineering Patterns”. Back in 1999, Radu Marinescu was a
young researcher on object-oriented metrics and Michele Lanza was starting
to work on program visualization. At that time, object-oriented reengineering
was still quite a new field that we explored with imagination and fun. While writing the “Object-Oriented Reengineering Patterns” book, we (Oscar Nierstrasz, Serge Demeyer and I) felt the need for some metric-based patterns that would help us apply metrics to understand or spot problems in large applications, but we could not find the right form for doing it, so we dropped this important topic from our book.
A few years later, in the context of RELEASE Network, a European Science
Foundation network, I remember talking with Radu, who was working on
detection strategies, about a book that would have pattern metrics at its
center. Such a book was then still missing. Now you can read about years of
concrete experience in this book.
Being French, I often use the metaphor of teaching cooking: besides the technical aspects of slicing and cooking the ingredients, creativity comes into play because the cook knows tastes and spices and how they interact. To learn, we should get in touch with a variety of spices, aromas and textures: we do not teach cooks by feeding them only fast food, but by exposing them to variety and subtle flavors. I still remember, as a kid, the first time I slept over at a friend’s place. There things were the same but
also different. I realized that we understand the world also by stressing and
tasting differences. After being exposed to change, we can decide to explore
or not, but at least this helps us to understand our own world. This is why I
expose students to the beauty of Smalltalk. My goal is to destabilize them, so
that they realize that “0.7 sin” (i.e., sin is just a message sent to a number) can be more natural than “Math.sin(0.7)”, or that late binding is a big case
statement at the virtual machine level. A nice example is to understand how
Boolean behavior (NOT, AND, OR) is defined when we have only objects and
not primitive types.
Recently I have been more and more involved in the maintenance and
evolution of Squeak, this great open-source multimedia Smalltalk. I decided
that I should help make this gem shine. And this has been rewarding since I
have learned a lot. Squeak has given me many ideas about my own practices
and has sharpened my taste and views about design, and often even changed
my mind. Here are some of the thoughts I want to share with you:
(1) Reducing coupling is difficult. Often we would like to be able to load one package independently of others. But there is this one reference to that class that makes it impossible. Easy, you think: just move the class to another package. But then you simply move the dependency around! If you are lucky
you have dead code. If you can attach the changes as a class extension to
another package you can fix it, but in Java and C++ you do not have that
possibility, while the next version of C# is taking a step in that direction. In
all the other non-trivial cases you have to understand the context and see if
a registration mechanism or any other design change can solve the problem.
(2) It is really fun to see that the old procedural way of thinking is still with
us. People still believe that a package should be cohesive and that it should
be loosely coupled to the rest of the system. Of course strong coupling is a
problem. But what is cohesion in the presence of late binding and frame-
works? Maybe the packages I’m writing are transitively cohesive because the
classes they contained extend framework classes defined in cohesive pack-
ages? Therefore naive assessments may be wrong.
(3) Evolution in general is difficult. Not really because of the technical difficulty of the changes, but because of the users. The most difficult thing I learned with Squeak is that, on the one hand, the whole system and the world urge you to fix that specific behavior: it is easy to fix, and the system and your ego will feel better afterwards. But the key questions are: How are the clients impacted? Is the change worth it? Maybe the design is good enough after all?
But what is “good enough”? On the other hand, not changing is not the solution either: maybe with a slightly different vocabulary our problem would be much simpler to express. In addition, a used system must change. The next challenge is therefore how to escape code sclerosis: how can we create a context in which changes are acceptable and possible, and not a huge pain? The only way is to build a change-friendly context. One path to follow is investing in automated tests.
¹ In Smalltalk or Objective-C, a method does not have to be in the file or package of the class to which the method is attached. A package can define a method that will extend a class defined in another package.
I’m a bit biased when I talk about polymetric views since I love them. Poly-
metric views display structural entities and their relationships using some
trivial algorithms. Then the entities are enriched with metrics. Once again,
the metrics are put into a context. And from this perspective new knowl-
edge emerges. It is worth mentioning that one of the powers of polymetric
views is their simplicity. Indeed, researchers tend to focus on solving difficult
problems, and some people confuse the complexity of problems with that of
the solutions. I have always favored simple solutions when possible since
they may have a chance to get through. Polymetric views have been designed
to be simple so that engineers using different environments can implement
them in one or two days. As an anecdote, an Apple engineer to whom we
showed the polymetric views one evening showed us the next morning that
he had introduced some of them in his environment. This was delightful.
I hope that in the future metrics tools will introduce the overview pyramid
and that reengineers will use the power of polymetric views.
This book goes a step further: It also introduces a systematic way of
detecting bad smells by defining detection strategies. Basically a detection
strategy is a query on code entities that identifies potential bad smells and
structural design problems. Now there are two dangers: first, there is the danger of thinking that because your code does not exhibit some of these bad smells you are safe; and second, there is the danger of thinking the inverse. Indeed, the authors measure and reveal structural aspects of the program, not its Design². While it may be true that if the structure of an application is bad its design can have problems, there is no systematic way of measuring the design of an application. Of course, in trivial cases (i.e., when
a system is distorted according to bad practices) structural measurements
will reveal flaws; but in the case of well-designed systems that have evolved
over time, this is another story.
Therefore it is important to see the suggested refactorings as a preliminary step toward further and more consequential analysis and action. But it is an important step. It is like removing the furniture from a room before renovating it: once you have removed it, you can see the walls that you should fix. Thus, even if the suggested refactorings have been applied and the proposed detection strategies no longer detect anything, the problem may still be there; but you are in a much better position to move forward.
So, for all the reasons I’ve mentioned, I’m convinced — and I guess that
you see that I’m not an easy guy to convince — that this book will really help
you to deal with your large applications.
² You remember, with a capital “D”.
Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
5 Identity Disharmonies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.1 Rules of Identity Harmony . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 Overview of Identity Disharmonies . . . . . . . . . . . . . . . . . . . . 78
5.3 God Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.4 Feature Envy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.5 Data Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.6 Brain Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.7 Brain Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
5.8 Significant Duplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.9 Recovering from Identity Disharmonies . . . . . . . . . . . . . . . 109
B iPlasma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .175
B.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
B.2 iPlasma at Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
B.3 Industrial Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
B.4 Tool Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
C CodeCrawler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .181
C.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
C.2 CodeCrawler at Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
C.3 Industrial Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
C.4 Tool Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .195
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .201
1 Introduction
This book is not about metrics per se. It is about the way metrics can be used in practice to aid us in characterizing software systems, to evaluate their design and, when we detect design problems, to provide the appropriate refactorings.
The goal of this book is to help you characterize, evaluate and
improve the design of the large applications that you have to main-
tain and enhance, by using metrics and visualization techniques to
localize potential structural design problems and identify context-
dependent recovery means.
Why are these relevant problems? Well, if a straightforward and
simple software engineering solution to build perfect and extensible
applications existed, any software engineer would know it and you
would not be reading this book. Designing large applications is diffi-
cult because of the intrinsic complexity of the modelled domains. To
this intrinsic complexity, another incidental factor is added, which
comes from business processes, organizational issues, human and
other external factors.
Based on centuries of development, engineers have built – with
success – extremely complex artifacts, such as bridges, buildings,
satellites, space shuttles, etc. In software engineering, many devel-
opment methodologies and design heuristics [Rie96] have been pro-
posed in recent decades to help software engineers to produce robust
and extensible software. Some methodologies promoted up-front design, such as the now obsolete waterfall model. The spiral model acknowledged the limits of pure up-front design and proposed a more flexible model of development adapted to changes [Boe88]. More recently, agile methodologies
acknowledged the fact that there is no way to predict future require-
ments and that the only way to survive is to embrace change as a fact
of the software industry [Bec00, Kru04].
Over the years a common understanding of basic software en-
gineering principles emerged. In the case of object-oriented pro-
gramming and design good examples are the Open-Closed Princi-
ple [Mey88a], the Law of Demeter [Lie96], the Substitution Principle
[MGM02, LW93b], Responsibility-Driven Design [WBW89], etc. More-
over, nowadays most software engineers understand and use design
patterns [GHJV95, ABW98], write unit tests and use refactorings
[FBB+ 99].
There are several factors that make designing and developing large software systems a difficult task.
We wrote this book for all the software engineers concretely facing
legacy systems in their daily work. The ideal reader we have in mind
is a fluent programmer who has to maintain and evolve code he did
not write, or a consultant who has to assess large applications.
While many books exist to help in designing applications, there is no book that supports the understanding of existing applications and the identification of potential design problems in a scalable manner.
Several good books have emerged over recent years, and next we present a short, non-exhaustive list. While none of them covers the objective of this book, they are related to its exact purpose, i.e., characterizing, evaluating and improving existing code design.
Software metrics books. There are also many good books on soft-
ware metrics. Lorenz and Kidd [LK94] present a first attempt to
use simple metrics to qualify the design of applications. How-
ever, the result is fairly outdated and overly simplistic. There
Tool Support
Our approaches are powerful, but in order to make use of their full potential, adequate tool support is needed. From this point of view there are three possibilities:
Enter metrics...
2 Facts on Measurements and Visualization
In this chapter we briefly introduce you to the good, the bad and the ugly of software metrics. In this context, we also take a short look at why and how visualization can be used in conjunction with metrics to counterbalance several drawbacks of using metrics. By doing this we aim to lay the basis for our approach of employing metrics to characterize, evaluate and improve the design of software systems.
What is a metric? It is the mapping of a particular characteristic of
a measured entity to a numerical value. An entity can be anything, in-
cluding yourself; the characteristic can be anything, e.g., your height.
The height metric in your case, for example, would be 180 cm. It could also have been 1.8 m. This seemingly trivial issue actually opens up a space where decisions have to be taken: what is the unit we are using? Is it important? Yes, otherwise you could end up
being a giant of 180 meters! Moreover, why do we care at all about
your height? Maybe we just wanted to measure your weight – and
this leads us to the next issue: we can measure almost everything,
but if we do not have a clear goal in mind of what we are actually
trying to achieve with these measurements we are wasting our time.
Since this is a book about object-oriented construction and design,
we are quantifying and qualifying those aspects.
Why is it useful to measure? Engineering artifacts are made ac-
cording to precise guidelines, i.e., the size, weight, material, etc. of
screws, construction elements, etc. must be defined upfront and be
respected by those actually creating the artifacts. Metrics in this case
are a way to control quality. Losing control in such a case may have
implications on security and potentially endanger people. In software
engineering it is important and useful to measure systems, otherwise
we risk losing control because of their complexity. Losing control in
such a case could make us ignore the fact that certain parts of the system grow abnormally or have bad quality, e.g., cryptic and uncommented code, badly structured code, etc.
In practice there are different types of metrics that quantify vari-
ous aspects of software development ranging from human resources
to bugs and documentation. Consequently, metrics are used by de-
velopers, team leaders, and project managers, for many specific pur-
poses, like quantifying and qualifying the code that has been writ-
ten, or predicting future development efforts that must be invested
into a project. Software metrics can be divided into two major groups
[LK94]:
1. Project metrics. They deal with the dynamics of a project, with what
it takes to get to a certain point in the development life cycle and
how to know you are there. They can be used in a predictive man-
ner, e.g., to estimate staffing requirements. Being at a higher level
of abstraction, they are less prescriptive, but are more important
from an overall project perspective.
2. Design metrics. These metrics are used to assess the size and, in some cases, the quality and complexity of software. They look at the quality of the project’s design at a particular point in the
development cycle. Design metrics tend to be more locally focused
and more specific, thereby allowing them to be used effectively to
directly examine and improve the quality of the product’s compo-
nents.
1. List the major Goals for which metrics are going to be employed.
2. From each goal derive the Questions that must be answered to
determine if the goals are met.
3. Decide what Metrics must be collected to answer the questions.
The goal indicates the purpose of collecting the data. The questions
tell us how to use the data and they help in generating only those
measures that are related to the goal. In many cases, several measurements are necessary to answer a single question; likewise, a single measurement may apply to more than one question. In the rest of this book, we implicitly use GQM to efficiently employ metrics for characterizing and evaluating the design structure of object-oriented systems.
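The GQM chain can be sketched as a tiny data structure. The goal, questions, and metric names below are hypothetical examples for illustration, not taken from the text:

```python
# A minimal sketch of a Goal-Question-Metric (GQM) chain.
# The goal, questions and metric names are invented examples.

gqm = {
    "goal": "Assess whether classes carry a balanced amount of behavior",
    "questions": {
        "Are there overly large classes?": ["LOC/Class", "NOM/Class"],
        "Are there overly complex methods?": ["CYCLO", "LOC/Method"],
    },
}

# Only metrics that help answer some question are worth collecting:
metrics_to_collect = sorted({m for ms in gqm["questions"].values() for m in ms})
```

The point of the structure is the direction of derivation: goals justify questions, questions justify metrics, and any metric not reachable from a goal is not collected.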
Statistics-Based Thresholds
• the Typical values, i.e., the range of values that includes the data from most projects;
• the Lower and, respectively, the Higher margins of this interval;
• the Extreme high values, i.e., a threshold beyond which a value can be considered an outlier.
We use two statistical means to find what the typical high and low
values are:
1. Average (AVG), to determine the most typical value of the data set
(i.e., the central tendency).
2. Standard deviation (STDEV), to get a measure of how much the values in the data set are spread¹.
Knowing the AVG and STDEV values and assuming a normal distri-
bution for the collected data (i.e., that most values are concentrated
in the middle rather than at the margins of the data set), we also know the two margins of the typical values interval for a metric² and the threshold for very high values. These are:
¹ The standard deviation is defined as the square root of the variance. This means it is the root mean square (RMS) deviation from the average. It is defined this way in order to give us a measure of dispersion that is (1) a non-negative number, and (2) has the same units as the data. For example, if the data are distance measurements in meters, the standard deviation will also be measured in meters.
² If the distribution of the data set is normal, around 70% of the values will be in this interval.
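The derivation is easy to sketch. The rules below (margins at AVG ± STDEV, and 1.5 times the higher margin for the very-high threshold) are our reading, reverse-engineered from the threshold tables in this chapter rather than quoted from a formal definition:

```python
# Deriving thresholds from the average (AVG) and standard deviation
# (STDEV) of a metric over many systems. The exact rules used here are
# an assumption inferred from the threshold tables in this chapter.

def thresholds(avg, stdev):
    low = avg - stdev        # lower margin of the "typical" interval
    high = avg + stdev       # higher margin of the "typical" interval
    very_high = 1.5 * high   # beyond this, a value counts as an outlier
    return low, high, very_high

# With AVG = 10 and STDEV = 3 for LOC/Method in Java, this reproduces
# the 7 / 13 / 19.5 thresholds listed for that metric in Table 2.1.
low, high, very_high = thresholds(10, 3)
```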
Based on the information from Table 2.1 we can state that a Java method is very long if it has at least 20 LOC, or that a C++ class has few methods if it has between 4 and 9 methods.
Table 2.1. Threshold values for the three base metrics.

                     |         Java                 |         C++
Metric               | Low   Avg.  High  Very High  | Low   Avg.  High  Very High
CYCLO/Line of Code   | 0.16  0.20  0.24  0.36       | 0.20  0.25  0.30  0.45
LOC/Method           | 7     10    13    19.5       | 5     10    16    24
NOM/Class            | 4     7     10    15         | 4     9     15    22.5
The threshold values presented above are relevant for more than the three metrics themselves: they can be used to derive thresholds for any metric that can be expressed in terms of these three metrics.
Example. We want to know what a high WMC (Weighted Method
Count) value is for a class written in Java. We use the following def-
inition of WMC [CK94]: the sum of the CYCLO metric [McC76] over
all methods of a class. Thus, WMC can be expressed in terms of the
three metrics as follows:
WMC = (CYCLO/LOC) · (LOC/Method) · (NOM/Class)
To compute a threshold for high WMC means selecting from Ta-
ble 2.1 the high statistical values for the three primary terms from
the formula above and multiplying them. In a similar fashion we can
compute the low, average, high, and very high thresholds for two
other size and complexity metrics used in this book, i.e., LOC/Class
and AMW (Average Method Weight) a.k.a. CYCLO/Method (see Ta-
ble 2.2).
Table 2.2. Derived threshold values.

                     |         Java                 |         C++
Metric               | Low   Avg.  High  Very High  | Low   Avg.  High  Very High
WMC                  | 5     14    31    47         | 4     23    72    108
AMW                  | 1.1   2.0   3.1   4.7        | 1.0   2.5   4.8   7.0
LOC/Class            | 28    70    130   195        | 20    90    240   360
NOM/Class            | 4     7     10    15         | 4     9     15    23
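As a sanity check, the multiplication can be carried out in a few lines; using the Java column values for the three base ratios reproduces, after rounding, the WMC thresholds above:

```python
# Deriving WMC thresholds by multiplying the three base ratios
# (CYCLO/LOC * LOC/Method * NOM/Class), using the Java column values
# from Table 2.1.

def derived_threshold(cyclo_per_loc, loc_per_method, nom_per_class):
    return cyclo_per_loc * loc_per_method * nom_per_class

avg_wmc = derived_threshold(0.20, 10, 7)    # the "average" row values
high_wmc = derived_threshold(0.24, 13, 10)  # the "high" row values

# Rounded, these match the Java WMC thresholds: average 14, high 31;
# scaling the high value by 1.5 gives the very-high threshold of 47.
```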
Meaningful Thresholds
Statistics-based thresholds are useful for most metrics, but for some others thresholds are implicitly given by observation. In that sense they are also based on statistics, but their values have become part of our culture. Therefore we do not need to determine them statistically; we can infer them from common knowledge.
Example. If we think about the maximum nesting level of state-
ments in a method it is clear that 0 denotes a method without any
conditional statements and 1, 2 or 3 would mean that there is some
nesting but it is quite shallow; but if the maximum nesting level gets
higher than that we know that the method has a deep nesting level
and following the control flow is harder.
We identified two cases of thresholds based on meanings that are
generally accepted and easy to understand: (1) commonly-used frac-
tion thresholds and (2) thresholds with generally-accepted meaning.
Table 2.3. Threshold values for normalized metrics and their semantic la-
bels.
TCC (Tight Class Cohesion) values lie between 0 and 1. The lower the TCC value, the less cohesive the class: if we want to find non-cohesive classes we pick a value lower than one half, i.e., either one third or one quarter.
Note that these thresholds are used throughout this book, but this
does not make them generally applicable to other contexts.
Example. We use such thresholds for the ATFD (Access To Foreign
Data) metric, which counts how many attributes from other classes
are accessed directly from a measured class. If we wanted to judge “by the book”, then every class with ATFD > NONE is problematic. But if we wanted to admit that accidentally accessing a couple of attributes of other classes is not that much of a problem, we can classify as critical only those classes that have ATFD > FEW.
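Such a classification is a one-line predicate. NONE is 0 by definition; the numeric value we give FEW below (3) is our own placeholder, and the class names and ATFD values are invented:

```python
# Classifying classes by their ATFD value against semantic thresholds.
# NONE = 0 is given by the text; the numeric value of FEW is a
# placeholder assumption, and the classes and values are invented.

NONE, FEW = 0, 3

atfd = {"Customer": 0, "ReportWriter": 2, "DataMangler": 9}

by_the_book = [c for c, v in atfd.items() if v > NONE]  # any access flagged
critical    = [c for c, v in atfd.items() if v > FEW]   # only heavy access
```

With the stricter "by the book" reading two of the three classes are flagged; with the more tolerant FEW threshold only the heavy accessor remains.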
Quiz. Which of the cylinders in Fig. 2.1 has which letter associated with it?
It is easy to assess that the cylinder on the right has the largest
diameter, the one in the middle has the greatest height, while the one
on the left has the smallest diameter. Why is that? Human perception allows us to perform such non-trivial analyses as an ingrained mechanism, despite the fact that we have no numbers at hand. However, when provided with a table containing metric information (height, diameter, weight) for the cylinders, we have no problem assigning those numbers. Do we? There is a problem with the weight metric, which confuses us. Why? It does not respect the so-called representation condition.
In measurement theory, the procedure of rendering metrics on
visual characteristics of representations is called measurement map-
ping, and must fulfill the representation condition, which asserts that
“a measurement mapping M must map entities into numbers and
[Figure: classes A, B and C plotted along the NOM and LOC axes.]
³ ArgoUML is open source; see http://argouml.tigris.org for more information.
Not really. But what if the same system has 1 million lines of code?
Then it becomes quite striking that the 500 classes are overloaded
with functionality, since each class has an average of 2,000 lines!
Or imagine that we additionally knew the number of methods in
the system. What if we find out that there are only 1,000 methods
in the system? It would be striking again because having only two
methods per class looks suspiciously low.
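The arithmetic behind this reasoning fits in a few lines; the system-level numbers are the hypothetical ones from the text:

```python
# The back-of-the-envelope arithmetic from the text: system-level size
# metrics only become telling when correlated with each other.

classes = 500
lines_of_code = 1_000_000
methods = 1_000

loc_per_class = lines_of_code / classes   # 2,000 -> suspiciously large classes
methods_per_class = methods / classes     # 2     -> suspiciously few methods
```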
• Missing reference points. Without reference values, any comparison or qualitative assessment is impossible. A line of C++ code is not the same as a line of Java, Smalltalk or Ada code, because each language has its own syntax and semantics; and, even more importantly, each has its own style and best practices.
Providing an overall characterization of a system is a tough job. Consequently, it is misleading to believe that several “classic” system-level metrics (usually size metrics such as the ones mentioned previously)
can characterize a whole system. But characterizing a system is pos-
sible if we use the proper measurement means, i.e., if we extend our
system-level measurements to other aspects than size and correlate
these results in a proper manner.
In the remainder of the chapter we present two techniques to
characterize object-oriented systems in terms of size and complex-
ity which solve the above issues:
1. The Overview Pyramid is a metrics-based means to both describe
and characterize the structure of an object-oriented system by
quantifying its complexity, coupling and usage of inheritance.
2. The Polymetric Views are a visualization of software entities and
their relationships enriched with metrics.
Fig. 3.1. The three major structural aspects of a system quantified by the Overview Pyramid.
The three previously mentioned aspects are closely related and mu-
tually influence each other (Fig. 3.1). While Size & Complexity and
Coupling characterize every software system, the Inheritance aspect
The left side of the Overview Pyramid (Fig. 3.2) provides information
characterizing the size and complexity of the system.
Size and complexity: direct metrics. We need a set of direct
metrics (i.e., metrics computed directly from the source code) to de-
scribe a system in simple, absolute terms. The metrics describing the
size and complexity are probably some of the simplest and widely
used metrics. They count the most significant modularity units of
an object-oriented system, from the highest level (i.e., packages or
namespaces), down to the lowest level units (i.e., code lines and in-
dependent functionality blocks). For each unit there is one metric in
the Overview Pyramid that measures it. The metrics are placed one
per line in a top-down manner, from a measure for the highest level
unit (i.e., Number of Packages (NOP )) down to a complexity measure
counting the number of independent paths in an operation (i.e., the
cyclomatic complexity (CYCLO)). We use the following metrics for the
size and complexity side of the Overview Pyramid:
The top part of the Overview Pyramid is not a ladder as in the pre-
vious cases; it is composed of two metrics that provide an overall
characterization of inheritance usage. These proportion metrics re-
veal how much inheritance is used in the system, as a first sign of
how much “object-orientedness” (i.e., usage of class hierarchies and
polymorphism) to expect in the system.
Table 3.1. Threshold values for the proportions of the Overview Pyramid.

                     |      Java            |      C++
Metric               | Low   Average  High  | Low   Average  High
CYCLO/Line of code   | 0.16  0.20     0.24  | 0.20  0.25     0.30
LOC/Operation        | 7     10       13    | 5     10       16
NOM/Class            | 4     7        10    | 4     9        15
NOC/Package          | 6     17       26    | 3     19       35
CALLS/Operation      | 2.01  2.62     3.2   | 1.17  1.58     2
FANOUT/Call          | 0.56  0.62     0.68  | 0.20  0.34     0.48
ANDC                 | 0.25  0.41     0.57  | 0.19  0.28     0.37
AHH                  | 0.09  0.21     0.32  | 0.05  0.13     0.21
² We did not include a very-high threshold because we consider that the three low, average and high thresholds are enough for the interpretation of the Overview Pyramid.
³ As already mentioned in the previous chapter, these metrics are collected from a statistical base of 45 Java projects and 37 C++ projects. The projects have various sizes (from 20,000 up to 2,000,000 lines), they come from various application domains, and we included both open-source and industrial (commercial) systems.
operations per class, and 20.21 classes per package, the system has rather large classes and packages.
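The proportions read off the left side of the pyramid are simple quotients of adjacent direct metrics. The sketch below uses invented counts (chosen so that the ratios come out near the ones discussed here), not the actual system's numbers:

```python
# Computing the size & complexity proportions of the Overview Pyramid
# from the direct counts. The counts below are invented for illustration.

direct = {"NOP": 10, "NOC": 202, "NOM": 1414, "LOC": 14140, "CYCLO": 2828}

proportions = {
    "NOC/NOP": direct["NOC"] / direct["NOP"],      # classes per package
    "NOM/NOC": direct["NOM"] / direct["NOC"],      # operations per class
    "LOC/NOM": direct["LOC"] / direct["NOM"],      # lines per operation
    "CYCLO/LOC": direct["CYCLO"] / direct["LOC"],  # decision density
}
# Each proportion would then be compared against the Java or C++
# thresholds of Table 3.1 and labeled low / average / high.
```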
On the System Coupling side we learn the following: the system is
intensively coupled in terms of operation calls, but these calls tend to
be rather localized, i.e., functions tend to call many operations from
few classes.
In the Class Hierarchies part we read the following: the class hierarchies are narrow (low ANDC value) and very shallow (low AHH value).
To facilitate the visual interpretation of the Overview Pyramid
we associate the computed proportions with colors that map those
numbers to their semantics in terms of the three types of statisti-
cal thresholds (i.e., low, average, high) presented in Table 3.1. Thus,
we place a computed proportion in a blue rectangle to show that
the value is closest to the low threshold. Similarly, if a value is closest to the average threshold it will be placed in a green rectangle; finally, if the computed value is closest to the high threshold, the number will be placed in a red rectangle.
Fig. 3.6. Using colors to interpret the Overview Pyramid. BLUE means a low
value; GREEN means an average value; RED stands for a high value.
[Figure: in a polymetric view, nodes represent entities and edges represent relationships; node size, node color, and edge width & color render metrics.]
• Node size. The width and height of a node can render two measurements. We follow the convention that the wider and higher the node, the bigger the measurements its size reflects.
• Node color. The color interval between white and black can display a measurement. Here the convention is that the higher the measured value, the darker the node.
⁴ The underlying model is thus that a software system can be modelled as a graph where the vertices represent entities, i.e., source code artifacts or abstractions of them, and the arcs represent relationships.
Fig. 3.8. A System Complexity view. This view uses the following metrics:
width metric = number of attributes, height metric = number of methods,
color metric = number of lines of code.
to each other and compare them not only within a global, but espe-
cially within a local context. As another example: in our experience if
sibling classes look very similar, i.e., they have a similar number of
attributes and methods, this may oftentimes point to duplication.
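The measurement mapping of such a view can be sketched in a few lines. Width, height, and gray level follow the System Complexity convention (attributes, methods, lines of code); the class names and metric values are invented:

```python
# Mapping class metrics onto node dimensions, System Complexity style:
# width = number of attributes, height = number of methods, and the
# lines of code rendered as a gray level (0 = white, 255 = black).
# The classes and their metric values are invented for illustration.

classes = {
    "Parser":   {"NOA": 3,  "NOM": 25, "LOC": 400},
    "Node":     {"NOA": 2,  "NOM": 5,  "LOC": 40},
    "Compiler": {"NOA": 12, "NOM": 60, "LOC": 1500},
}

max_loc = max(m["LOC"] for m in classes.values())

def node(m):
    gray = round(255 * m["LOC"] / max_loc)  # darker node = more code
    return (m["NOA"], m["NOM"], gray)       # (width, height, color)

nodes = {name: node(m) for name, m in classes.items()}
```

Because the mapping is monotonic, outliers stay outliers in the picture: the class with the most code is also the darkest node, which is exactly the representation condition discussed in the previous chapter.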
The research we have performed on the polymetric views is ex-
tensive and can be found in various publications [Lan03a, Lan03b,
LD03, LD05, DL05, LDGP05, GLD05], but since the goal of this book
is not to explain everything about the views in detail, we direct the
interested reader to those publications. Note also that the views can
be easily composed and tweaked, especially regarding the metrics,
i.e., the number of potential views is very high, but in the context
of this book we limit ourselves to using two simple views, which are
presented next.
Fig. 3.9. A System Hotspots view of Duploc. The nodes represent all the
classes, while the size of each node represents the number of methods the
class defines. The gray nodes represent metaclasses.
This simple view helps to identify large and small classes and
scales up to very large systems. It relates the number of methods to
the number of attributes of a class. The nodes are sorted according to
the number of methods, which makes the identification of behavioral
outliers easy (note that a class that has many methods also tends to
consist of many lines of code).
This view gives a general impression of the system in terms of
overall size (how many classes are there?) and in terms of size of the
classes (are there any really large classes and how many of these
giants are there?).
Large nodes represent voluminous classes that define many meth-
ods and should be further investigated. Small nodes represent either
structs or very small classes. Classes with NOM = 0 should be investigated to see whether they are dead code. Further evidence can be gained from the color, which can be used to reflect the number of lines of code of a class. Should a tall class have a light color, it means that the class contains mostly short methods.
Example - System Hotspots. In Fig. 3.9 we see a System Hotspots
view of all the classes of Duploc. The classes in the bottom row con-
tain more than 100 methods and should be further investigated.
They are DuplocPresentationModelController (107 methods), RawMatrix (107), DuplocSmalltalkRepository (116) and DuplocApplication (117 methods). This view shows that Duploc is a system of more than 300
classes, where the largest classes contain more than 100 methods. It
also shows an impressive number of very small classes implementing
few methods.
Fig. 3.10. A System Complexity view on Duploc. The nodes represent the
classes, while the edges represent inheritance relationships. As metrics we
use the number of attributes (NOA) for the width, the number of methods
(NOM) for the height and the number of lines of code (LOC) for the color.
a hierarchy. It also answers the question about the size of the subject
system. Moreover, it helps to detect exceptional classes in terms of
number of methods (tall nodes) or number of attributes (wide nodes).
Tall, narrow nodes represent classes with few attributes and many
methods. Deep or large hierarchies are definitely subsets of the system to which more specific views should be applied to refine their understanding. Large, standalone nodes represent classes with many
attributes and methods without subclasses. It may be worth looking
at the internal structure of the class to learn if the class is well struc-
tured or if it could be decomposed or reorganized.
Example - System Complexity. In Fig. 3.10 we present a System
Complexity view of all the classes of Duploc. We see that Duploc is
composed of many classes not organized in inheritance hierarchies.
Indeed, there are some very large classes which do not have sub-
classes. The largest inheritance hierarchies are five and six levels
deep. Noteworthy hierarchies seem to be the ones with the following root classes: AbstractPresentationModelControllerState, AbstractPresentationModelViewState and DuplocSourceLocation. By manually inspecting the first one, whose root class AbstractPresentationModelControllerState has 31 descendants, we infer that it is an application of the State design pattern [GHJV95, ABW98] for the controller part of a Model-View-Controller architecture. Such a complex hierarchy within Duploc is necessary.
To get a first visual idea of the raw size of the system, we display in
Fig. 3.12 a System Hotspots view. What we see in the figure are all
1,393 model classes of ArgoUML.
the number of methods (NOM), while the color represents the lines of
code (LOC) of each class. The class ModelFacade is striking because
of its size (453 methods, 3,507 lines) compared to the other classes
in the system. The next three largest classes are CoreFactory (116
NOM, 1,100 LOC), GeneratorCpp (97 NOM, 2,259 LOC) and Project
(85 NOM, 690 LOC). Another class which has many lines compared to
its number of methods is ParserDisplay (“only” 53 methods but 2517
lines). Moreover, we thickened the border of the abstract classes in
the system, and perceive that ArgoUML contains many of them,
the two largest being FigNodeModelElement and FigEdgeModelElement,
which have 81 and 55 methods, respectively.
In Fig. 3.13 we see a System Complexity visualization of the com-
plete system in terms of the inheritance hierarchies. The reader
should keep in mind that what is actually displayed in one single fig-
ure here resides in more than 1,200 source files distributed in dozens
of directories: We are talking about a fairly complex system, although
certainly not a very complex system. However, we see that ArgoUML
has been implemented using some complex and deep inheritance hi-
erarchies (some of them have more than seven levels).
Design Harmony
do not talk to nobody, etc.), and finally in harmony with its ancestors
and descendants. Every artifact must have its appropriate place, size,
and complexity to fit the system context.
A metric alone cannot help to answer all the questions about a sys-
tem and therefore metrics must be used in combination to provide
relevant information. Why?
Using a medical metaphor, we might say that the interpretation of
abnormal measurements can offer an understanding of symptoms,
but the measurements alone cannot provide an understanding of the
disease that caused those symptoms. The bottom-up approach, i.e.,
going from abnormal numbers to the recognition of design diseases,
is impractical, because the symptoms captured by single metrics, even
if perfectly interpreted, may occur in several diseases: the interpretation
of individual metrics is too fine-grained to indicate the disease.
This leaves us with a major gap between the things that we mea-
sure and the things that are in fact important at the design level with
respect to a particular investigation goal.
How, then, should we combine metrics in order to make them serve
our purposes? The main goal of the mechanism presented below is
to provide engineers with a means to work with metrics at a more
abstract level. The mechanism defined for this purpose is called a
detection strategy, defined as follows:
Filtering
The key issue in filtering is to reduce the initial data set so that only
those values that present a special characteristic are retained. A data
filter is a boolean condition by which a subset of data is retained from
an initial set of measurement results, based on the particular focus
of the measurement.
The purpose of filtering is to keep only those design fragments
that have special properties captured by the metric. To define a data
filter we must define the values for the lower and upper limits of
the filtered subset. Depending on how we specify the limit(s) of the
resulting data set, filters can be statistical, based on absolute
thresholds, or based on relative thresholds.
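The two kinds of limits can be sketched as follows; only the filtering idea comes from the text, while the helper names and the example values are illustrative:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of a data filter: a boolean condition applied to measurement
// results so that only values with a special characteristic are kept.
class DataFilters {
    // Absolute threshold: retain every value above a fixed limit.
    static List<Integer> higherThan(List<Integer> values, int limit) {
        return values.stream().filter(v -> v > limit).collect(Collectors.toList());
    }

    // Relative threshold: retain the n highest values, regardless of
    // their absolute magnitude.
    static List<Integer> topValues(List<Integer> values, int n) {
        return values.stream()
                     .sorted((a, b) -> b - a)   // descending
                     .limit(n)
                     .collect(Collectors.toList());
    }
}
```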
Statistical Filters
Threshold-Based Filters
Composition
[Figure: several input terms (metric filters) are combined by a logical operator, e.g., OR, into a single output term.]
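The composition step can be sketched as follows: individual metric filters (input terms) are combined with logical operators into one output term. The metrics chosen here and the threshold values are illustrative, not a strategy from the book:

```java
// Sketch: composing boolean metric filters into a detection strategy.
class StrategyComposition {
    // Input terms: each one is a boolean metric filter.
    static boolean longCode(int loc)   { return loc > 100; }  // illustrative limit
    static boolean branched(int cyclo) { return cyclo > 10; } // illustrative limit

    // Output term: the input terms composed with AND / OR.
    static boolean suspect(int loc, int cyclo, int nom) {
        return (longCode(loc) && branched(cyclo)) || nom > 20;
    }
}
```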
The second step is to select the metrics that best quantify each of
the identified properties. In this context the crucial question is:
where should the proper metrics come from? There are two alternatives:
Notice that while the first two metrics (i.e., WMC and TCC) are
defined in the literature, the last one (ATFD) was defined by us in
order to capture a very specific aspect, i.e., the extent to which a
class uses attributes of other classes.
The next step is to define for each metric the filter that best captures
the symptom that the metric is intended to quantify. As mentioned
earlier, this implies (1) picking a comparator and (2) setting an
adequate threshold.
1 For a precise description of all the metrics used in the book, including the metrics below, please refer to Appendix A.
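As an illustration of composing filters over the three metrics just mentioned (WMC, TCC, ATFD), here is a sketch. The comparator directions follow the symptoms described in the text (high complexity, low cohesion, access to foreign data); the concrete threshold values are placeholders, not the book's calibrated thresholds:

```java
// Sketch: a filter composition over WMC, TCC and ATFD.
class GodClassFilter {
    static final int FEW = 3;            // illustrative threshold
    static final int WMC_VERY_HIGH = 47; // illustrative threshold
    static final double ONE_THIRD = 1.0 / 3.0;

    static boolean matches(int atfd, int wmc, double tcc) {
        return atfd > FEW            // uses attributes of other classes
            && wmc >= WMC_VERY_HIGH  // functionally very complex
            && tcc < ONE_THIRD;      // low cohesion
    }
}
```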
A class blueprint decomposes a class into layers and assigns its at-
tributes and methods to each layer based on the heuristics described
below (see Fig. 4.6). The layers support a call-graph notion: a method
node connected to a node to its right either invokes that node, if it
represents a method, or accesses it, if it represents an attribute.
The layers have been chosen according to a notion of time-flow and
encapsulation. Encapsulation is visualized by separating state (to the
right) from behavior (to the left), and by distinguishing the public part
(to the left) from the private part (to the right) of the class behavior.
In addition, only actual source code elements are visualized, i.e., we
do not represent artificial elements.
3 The colors used in our visualizations follow visual guidelines suggested by Bertin [Ber74], Tufte [Tuf90], Ware [War00], and Pinker [Pin97]; e.g., we take into account that the human brain is not capable of simultaneously processing more than a dozen distinct colors.
[Fig. 4.7 legend: a method node is scaled by its number of invocations (width) and its number of lines of code (height); an attribute node by its number of internal accesses. Distinguished node types: abstract method, overriding method, extending method, constant method, read accessor method, write accessor method; edges denote direct accesses and method invocations.]
Fig. 4.7. In a class blueprint the metrics are mapped on the width and the
height of a node. The methods and attributes are positioned according to the
layer they have been assigned to.
An attribute node uses the number of accesses from methods of its own
class for the width and the number of direct accesses from methods
defined in other classes for the height. This allows one to identify how
attributes are accessed.
Description – Color
Attribute – blue node
Abstract method – cyan node
Extending method (performs a super invocation) – orange node
Overriding method (a method redefinition without invocation of the hidden method) – brown node
Delegating method (forwards the method call to another object) – yellow node
Constant method (returns a constant value) – grey node
Interface and Implementation layer method – white node
Accessor layer method (getter) – red node
Accessor layer method (setter) – orange node
Invocation of a method – blue edge
Invocation of an accessor (semantically equivalent to a direct access) – blue edge
Access to an attribute – cyan edge
The left part of Fig. 4.8 shows the blueprint of a Smalltalk class
named JunOpenGL3dGraphAbstract, which we describe hereafter. As
the named blueprint on the right in Fig. 4.8 shows, this kind of
representation does not scale well in practice; additionally, metric
information is not reflected in a named blueprint (i.e., the width and
height of the nodes are not correlated with metric values). Therefore it
is not used in this book.
The code shown is Smalltalk code; however, fluency in Smalltalk is
not required to follow it, as we are only concerned with method
invocations and attribute accesses.
The class blueprint shown in Fig. 4.8 has the following structure:
the initialize method extends the superclass method with the same
name, hence the node color is orange. It directly accesses two
attributes, as the cyan edges show. The code of the method initialize
is as follows:
initialize
    super initialize.
    displayObject := nil.
    displayColor := nil

asPointArray
    ^ self displayObject asPointArray
The five grey nodes in the interface layer are methods returning
constant values, as illustrated by the following method isArc. This
method illustrates a typical practice for sharing default behavior
across a hierarchy of classes.

isArc
    ^ false
displayObject
    displayObject isNil ifTrue:
        [ displayObject := self createDisplayObject ].
    ^ displayObject

createDisplayObject
    ^ self subclassResponsibility
Fig. 4.9. A blueprint of the class JunSVD. This class blueprint shows patterns
of the type Single Entry, Structured Flow and All State.
Example 2: An Algorithm
The class blueprint presented in Fig. 4.9 displays the class JunSVD
implementing the algorithm of the same name. Looking at the blueprint
we get the following information.
compute
| superDiag bidiagNorm eps |
m := matrix rowSize.
n := matrix columnSize.
u := (matrix species unit: m) asDouble.
v := (matrix species unit: n) asDouble.
sig := Array new: n.
superDiag := Array new: n.
bidiagNorm := self bidiagonalize: superDiag.
eps := self epsilon * bidiagNorm.
self diagonalize: superDiag with: eps.
color
    ^ ColorValue hue: self hue
        saturation: self saturation
        brightness: self brightness
We see that the methods xy: (B) and xy (C) play a central role in the
design of the class, as they are both called by several of the methods
of each subclass, as confirmed by the following method of the class
JunColorChoiceSBH:
JunColorChoiceSBH>>brightness: value
((value isKindOf: Number) and:
[0.0 <= value and: [value <= 1.0]])
ifTrue: [self xy: self xy x @ 1 - value]
By building the class blueprint for this class (see Fig. 4.11) we
can immediately see that Modeller is not a class with an excessive
number of methods, but it has a number of considerably large and
complex methods: three methods are longer than 100 lines of code;
the longest one, addDocumentationTag (annotated as 1a in the figure),
is 150 lines of code and is invoked by three other methods, two of which
are the second and third longest methods in this class: addOperation
(1b, 116 LOC) and addAttribute (1c, 108 LOC). The class blueprint
reveals other disharmonies in this class: there are 12 attributes,
all of them private (which is good), but only four accessor methods.
Moreover, the attributes are accessed both directly and indirectly
(using the accessors), denoting a certain inconsistency or lack of an
access policy.
Identity Disharmonies
Identity disharmonies are design flaws that affect single entities such
as classes and methods. The particularity of these disharmonies is
that their negative effect on the quality of design elements can be
noticed by considering these design elements in isolation.
Each class should present its identity (i.e., its interface) through a
set of services, each of which has a single responsibility and provides
a unique behavior.
Proportion Rule
Rationale
Presentation Rule
Rationale
Practical Consequences
Implementation Rule
Rationale
Practical Consequences
• Keep data close to operations – Data and the operations that use
it most should be placed as close as possible to one another. In other
words, data (e.g., attributes, local variables, etc.) should stay in the
class or method where they are used the most.
As an immediate consequence, the methods of (God) classes that use
the foreign data smell of Feature Envy(84) [FBB+99], being more
interested in the attributes of other classes than in those of their
own class.
Applies To Classes.
Example The general design of ArgoUML is good enough so that we could not
identify a pure God Class, i.e., a class controlling the flow of the
application and concentrating all the crucial behavior, which would
indicate a clear lack of object-oriented design. However, certain classes
in ArgoUML act as black holes attracting orphan functionalities. Such
classes are also detected by the metrics presented above and are still
a design problem. A class of ArgoUML which clearly stands out is the
huge class ModelFacade (see Fig. 3.12). This class implements 453
methods, defines 114 attributes, and is more than 3,500 lines long.
Moreover, all methods and all attributes are static. Its name hints
at being an implementation of the Facade Design Pattern [GHJV95],
but it has become a sort of black hole of functionality. In Fig. 5.3 we
see its Class Blueprint with a modified layout for the methods and
attributes to make this Class Blueprint fit on one screen. Looking at
the Class Blueprint for this class it seems that the developers use it
for everything that does not fit into other classes, but the downside
is that this class is like a tumor within this system.
Description Objects are a mechanism for keeping together data and the opera-
tions that process that data. The Feature Envy design disharmony
[FBB+ 99] refers to methods that seem more interested in the data
of other classes than that of their own class. These methods access
directly or via accessor methods a lot of data of other classes. This
might be a sign that the method was misplaced and that it should be
moved to another class.
Applies To Methods.
Impact Data and the operations that modify and use it should stay as close
together as possible. This data-operation proximity can help minimize
ripple effects (a change in a method triggers changes in other methods
and so on; the same applies for bugs, i.e., in case of a poor data-
operation proximity bugs will also be propagated) and help maximize
cohesion (see Implementation Rule).
Detection The detection is based on counting the number of data members that
are accessed (directly or via accessor methods) by a method from
outside the class where the method under investigation is defined.
Feature Envy happens when the envied data comes from a very few
classes or only one class. The detection strategy (Fig. 5.4) in detail is:
1
In defining the God Class(80) detection strategy we also used a metric
called ATFD, which counts how many distinct attributes from other classes
are accessed by the measured design entity. The only difference is that in
the God Class(80) the metric is defined for a class entity, while here it is
defined for a method entity.
[Fig. 5.4 term: FDP ≤ FEW, i.e., the envied data comes from very few classes.]
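The rule just described — many foreign data members accessed, but coming from very few provider classes — can be sketched as follows. The FEW threshold value is an illustrative placeholder:

```java
// Sketch: a Feature Envy filter over ATFD (accesses to foreign data)
// and FDP (number of foreign data providers).
class FeatureEnvyFilter {
    static final int FEW = 3;   // illustrative threshold

    static boolean matches(int atfd, int fdp) {
        return atfd > FEW      // accesses many foreign data members...
            && fdp <= FEW;     // ...concentrated in very few classes
    }
}
```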
Refactoring This problem can be solved by moving the method into the class to
which it is coupled the most. If only a part of the method suffers from
Feature Envy, it might be necessary to extract that part into a new
method and then move the newly created method into the envied
class. If the method envies two different classes, move it to the one
that it uses most.
Oftentimes, the class that a method affected by Feature Envy depends
on is a class with little functionality, sometimes even a Data Class(88).
If this is the case, then moving the Feature Envy method to that class
is an even more desirable refactoring, as it re-balances the distribution
of functionality among classes and improves the data-behavior locality.
The concrete refactoring technique for Feature Envy is based on
the Move Method and Extract Method refactorings [FBB+ 99]. Fur-
thermore, the Move Behavior Close to the Data reengineering pattern
[DDN02] discusses the steps to follow to move behavior close to the
data it uses and the potential difficulties.
Description Data Classes [FBB+99, Rie96] are “dumb” data holders without com-
plex functionality, but other classes strongly rely on them. The lack of
functionally relevant methods may indicate that related data and be-
havior are not kept in one place; this is a sign of a non-object-oriented
design. Data Classes are the manifestation of a lack of data encapsu-
lation and of a poor data-functionality proximity.
Applies To Classes.
Detection We detect Data Classes based on their characteristics (see Fig. 5.6):
we search for “lightweight” classes, i.e., classes which provide almost
no functionality through their interfaces. Next, we look for classes
that define many accessors (get/set methods) and for those that de-
clare data fields in their interfaces. Finally, we compare the lists and
manually inspect the lightweight classes that declare many public
attributes and those that provide many accessor methods. The detec-
tion strategy in detail is:
Fig. 5.7. Data Class reveals many attributes and is not complex.
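The lightweight-but-data-revealing search above can be sketched as a filter. The WOC metric (ratio of functional public methods), the simplified rule, and the threshold values are our assumptions:

```java
// Sketch: a Data Class filter — a class offering little functionality
// through its interface while revealing data via public attributes
// or many accessors.
class DataClassFilter {
    static final int FEW = 3;                   // illustrative
    static final double ONE_THIRD = 1.0 / 3.0;  // illustrative

    // woc: ratio of functional (non-accessor) public methods.
    static boolean matches(double woc, int publicAttributes, int accessors) {
        boolean lightweight = woc < ONE_THIRD;
        boolean revealsData = publicAttributes + accessors > FEW;
        return lightweight && revealsData;
    }
}
```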
The name itself already suggests that the class is not really modelling
an abstraction in the system, but rather keeps together a set of data.
Looking closer, we notice that the class has five attributes. In Fig. 5.8
we depict the Property class together with the classes that use its
data. In spite of the fact that all attributes are declared as private,
the class is still a pure data holder, due to the fact that all (but one)
of its methods are accessors (see methods in red). Thus, the class
has no behavior, it just keeps some data, used by three other classes.
Although none of the involved classes are large, the fact that data
Refactoring The basic idea of any refactoring action on a Data Class is to put
together in the same class the data and the operations defined on
that data, and to provide proper services to the former clients of the
public data, instead of direct access to this data.
• This data-operation proximity (see Implementation Rule) can be
achieved in most cases by analyzing how clients of the Data Class
use its data. In this way we can identify pieces of functionality
(behavior) that could be extracted and moved as services to the
Data Class. This refactoring action is closely related to what needs
to be done when Feature Envy(84) is encountered; in other words,
refactoring a case of Feature Envy(84) can also have a positive
effect towards repairing an envied Data Class.
• In some other cases, especially if the Data Class is dumb and has
only one or a few clients, we could remove the class completely
from the system and put the data it contains in those classes (for-
mer clients) where the best data-operation proximity is achieved.
• If the Data Class is a rather large class with some functionality,
but also with many exposed attributes, it is very possible that only
a part of the class needs to be cured. In some cases this could
mean extracting the disharmonious parts together to a separate
class and applying the classical treatment, i.e., trying to extract
pieces of functionality from the data clients as services provided
by the new class.
Description Often a method starts out as a “normal” method but then more and
more functionality is added to it until it gets out of control, becoming
hard to maintain or understand. Brain Methods tend to centralize the
functionality of a class, in the same way as a God Class(80) central-
izes the functionality of an entire subsystem, or sometimes even a
whole system.
Impact A method should avoid size extremities (Proportion Rule). In the case
of Brain Methods the problem concerns overlong methods, which are
harder to understand and debug, and practically impossible to reuse.
A well-written method should have an adequate complexity, one that
is in concordance with the method’s purpose (Implementation Rule).
Detection The strategy for detecting this design flaw (see Fig. 5.9) is based on
the presumed convergence of three simple code smells described by
Fowler [FBB+ 99]:
• Long methods – These are undesirable because they affect the un-
derstandability and testability of the code. Long methods tend to
implement more than one piece of functionality and therefore use
many temporary variables and parameters, which makes them
more error-prone.
• Excessive branching – The intensive use of switch statements
(or if–else–if) is in most cases a clear symptom of a non-object-
oriented design, in which polymorphism is ignored.2
• Many variables used – The method uses many local variables but
also many instance variables.
2 The excessive use of polymorphism also introduces testability and analyzability problems [Bin99]. Yet, the emphasis in the context of this design flaw is on a very frequent case in which legacy systems migrated from structured to object-oriented programming.
[Fig. 5.9 terms: CYCLO ≥ HIGH; the method has deep nesting, MAXNESTING ≥ SEVERAL; the terms are composed with AND into Brain Method.]
3 Only the lines of code in the methods of the class are counted.
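The convergence of the three smells, plus deep nesting, can be sketched as a single AND-composed filter. The threshold values below are illustrative placeholders, not the book's calibrated ones:

```java
// Sketch: a Brain Method filter — overlong (LOC), heavily branched
// (CYCLO), deeply nested (MAXNESTING), using many variables (NOAV).
class BrainMethodFilter {
    static final int LOC_HIGH = 65;    // illustrative
    static final int CYCLO_HIGH = 10;  // illustrative
    static final int SEVERAL = 4;      // illustrative
    static final int MANY = 7;         // illustrative

    static boolean matches(int loc, int cyclo, int maxNesting, int noav) {
        return loc > LOC_HIGH
            && cyclo >= CYCLO_HIGH
            && maxNesting >= SEVERAL
            && noav > MANY;
    }
}
```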
Example Fig. 5.10 shows that Modeller is not a class with an excessive number
of methods, but it has a certain number of Brain Methods. Some of
the methods reach considerable sizes (eight methods are longer than
50 lines of code); the longest one, addDocumentationTag (annotated
as 1a in the figure), is 150 lines of code and is invoked by three other
methods, two of which are the second and third longest methods in
this class: addOperation (1b, 116 LOC) and addAttribute (1c, 108
LOC).
The Class Blueprint reveals other disharmonies in this class: there
are 12 attributes, all of them private (which is good), but only four
accessor methods. Moreover, the attributes are accessed both directly
and indirectly (using the accessors), denoting a certain inconsistency
or lack of an access policy. The class also contains long invocation
chains, such as:
String changeIndicator =
ProjectManager.getManager().
getCurrentProject().
getSaveRegistry().
hasChanged() ? " *" : "";
ArgoDiagram activeDiagram =
ProjectManager.getManager().
getCurrentProject().
getActiveDiagram();
The problem with such long invocation chains is that only one of
the “links” in the middle has to break (because some method has
changed) to make the whole chain break down.
Refactoring Fowler suggests [FBB+99] that in almost all cases a Brain Method
should be split, i.e., that one or more methods (operations) are to be
extracted. He also explains how to find the possible “cutting points”.
Description This design disharmony is about complex classes that tend to accu-
mulate an excessive amount of intelligence, usually in the form of
several methods affected by Brain Method(92).
This recalls the God Class(80) disharmony, because those classes
also have the tendency to centralize the intelligence of the system.
It looks like the two disharmonies are quite similar. This is partially
true, because both refer to complex classes. Yet the two problems are
distinct.
The fingerprint of a God Class is not just its complexity, but the
fact that the class relies for part of its behavior on encapsulation
breaking, as it directly accesses many attributes from other classes.
On the other hand, the Brain Class detection strategy is trying to
complement the God Class strategy by catching those excessively
complex classes that are not detected as God Classes either because
they do not abusively access data of “satellite” classes, or because
they are a little more cohesive.
Applies To Classes which are not a God Class(80) and contain at least one
method affected by Brain Method(92).
Detection The detection rule can be summarized as follows (see Fig. 5.11). A
class is a Brain Class if it has at least a few methods affected by Brain
Method(92), if it is very large (in terms of LOC), non-cohesive, and
very complex. If the class is a “monster” in terms of both size (LOC)
and functional complexity (WMC), then the class is considered to be
a Brain Class even if it has only one Brain Method(92).4 The detection
strategy in detail is:
1. Class contains more than one Brain Method(92) and is very
large. A class is very large if the total number of lines of code
from methods of the class is very high (see Fig. 5.12).
4 Looking carefully at the detection rule and comparing it to the one for God Class(80), you will notice that nothing hinders a God Class from also being detected as a Brain Class. For simplification, we exclude a priori classes classified as God Classes.
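The two variants of the rule — several Brain Methods in a very large class, or a single Brain Method in a class extreme in both size and complexity — can be sketched as follows. The threshold values and the cohesion cutoff are illustrative assumptions:

```java
// Sketch: a Brain Class filter over the number of Brain Methods,
// class size (LOC), complexity (WMC) and cohesion (TCC).
class BrainClassFilter {
    static final int LOC_VERY_HIGH = 350;  // illustrative
    static final int WMC_VERY_HIGH = 47;   // illustrative
    static final double TCC_LOW = 0.5;     // illustrative

    static boolean matches(int brainMethods, int loc, int wmc, double tcc) {
        boolean nonCohesive = tcc < TCC_LOW;
        boolean several = brainMethods > 1 && loc >= LOC_VERY_HIGH;
        boolean monster = brainMethods == 1
                          && loc >= 2 * LOC_VERY_HIGH     // extreme size...
                          && wmc >= 2 * WMC_VERY_HIGH;    // ...and complexity
        return nonCohesive && (several || monster);
    }
}
```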
Example In Fig. 5.13 we see that the class ParserDisplay not only is visually
deformed, but also plays strange tricks in terms of inheritance. As for
the visual deformation, this class implements some very large meth-
ods, the largest one (the tallest method box) having 576 lines of code
(this method is also the largest in the entire system), and another five
methods longer than 100 lines. In total, 13 methods are longer than
50 lines. Moreover, there is a large amount of intra-method duplication.
Fig. 5.13. A Class Blueprint of ParserDisplay with its completely abstract su-
perclass Parser and a Class Blueprint of FigClass.
If the duplication were removed, this class would become much lighter
and less problematic. As you can see, in order to restore one aspect of harmony the
other aspects must be considered as well. In this concrete example,
we would not have found the cause of the Brain Method(92) problem
if we had not looked at the duplication within the hierarchy.
Refactoring The primary characteristic of a Brain Class is the fact that it contains
Brain Methods. Therefore the main refactoring actions for these classes
must be directed towards curing the Brain Method(92) disharmonies.
Additionally, in our approach classes affected either by Brain Class(97)
or God Class(80) represent the starting point in the detection and
correction of identity disharmonies (see Sect. 5.9).
Apart from that, in our experience, there are at least three types
of Brain Class, each of them requiring a different treatment:
1. The methods suffering from Brain Method(92) contained in the
class are semantically related (oftentimes overloaded methods),
and contain a significant amount of duplicated code. Factoring
out the commonalities from these methods in form of one or more
private or protected methods, while making the initial methods
provide only the slight differences would significantly reduce the
complexity of the class.
2. A possible type of Brain Class appears when a class is conceived
in a procedural programming style. Consequently, the class is
mainly used as a grouping mechanism for a collection of some-
how related methods that provide some useful algorithms. In this
case the class is non-cohesive. Refactoring such a class requires
splitting it into two or more cohesive classes. Yet, performing such
a refactoring requires a substantial amount of contextual infor-
mation (e.g., which class(es) use(s) which parts of the initial class,
where the data on which each Brain Method operates is stored, etc.).
3. There are cases where a Brain Class proves to be rather harmless.
In several case studies we encountered cases where an excessively
complex class was a mature utility class, usually not very much
related to the business domain of the application (e.g., a class
modelling a Lisp interpreter in a 3-D graphics framework). If, ad-
ditionally, the maintainers of the system or an analysis of the
system’s history [RDGM04] show that no maintenance problems
have been raised by that class, then it makes no sense to start a
costly refactoring effort on that class just for the sake of getting
better metric values for the system.
Description The detection of code duplication plays an essential role in the as-
sessment and improvement of a design. But detected clones might
not be relevant if they are too small or if they are analyzed in isola-
tion. In this context, the goal of this detection strategy is to capture
those portions of code that contain a significant amount of duplica-
tion. What does significant mean? In our view a case of duplication is
considered significant if:
Detection In practice, duplications are rarely the result of pure copy–paste ac-
tions, but rather of copy–paste–adapt “mutations”. These slight mod-
ifications tend to scatter a monolithic copied block into small frag-
ments of duplicated code. The smaller such a fragment is, the lower
its refactoring potential, since the analysis becomes harder and the
importance granted to it decreases, too. So, for example, imagine we
found two operations that have five identical lines, followed by one
line that is different, which is followed by another four identical lines.
Did we find two clones (of five and four lines) or one single clone of
ten lines with one diverging line in the middle?
5.8 Significant Duplication 103
Now, with these metrics in mind we can revisit the example men-
tioned earlier in this section, with two functions having two exact
clones. In terms of the low-level duplication metrics introduced in
this section, we can now say that the first clone has a SEC value of
5, while the second one has a SEC value of 4. Between the two clones
Significant standalone exact clone: SDC ≥ 2 × (FEW + 1) + 1 and LB ≤ FEW.
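To make the fragment metrics concrete, here is a small sketch (not from the book) that computes the exact-clone fragments and the line bias between them for two copy–paste–adapt variants. The line-by-line alignment of the two sequences is an assumed simplification of real clone detection, and all function names are illustrative:

```python
def clone_fragments(lines_a, lines_b):
    # Scan two line sequences assumed to be aligned line by line (a strong
    # simplification of real clone detection) and collect the exact-clone
    # fragments as (start, SEC) pairs, SEC being the fragment's size.
    fragments, start = [], None
    pairs = list(zip(lines_a, lines_b))
    for i, (a, b) in enumerate(pairs):
        if a == b:
            if start is None:
                start = i
        elif start is not None:
            fragments.append((start, i - start))
            start = None
    if start is not None:
        fragments.append((start, len(pairs) - start))
    return fragments

def line_bias(fragments):
    # LB: the number of differing lines separating two neighboring fragments.
    return [b_start - (a_start + sec)
            for (a_start, sec), (b_start, _) in zip(fragments, fragments[1:])]
```

Run on the example from the text (five identical lines, one differing line, four identical lines), this yields two fragments with SEC values 5 and 4, separated by a line bias of 1.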
Example Looking at the ArgoUML case study shows that code duplication is
one of those omnipresent plagues; but now it can be quantified. In
the case of ArgoUML, we checked for Significant Duplication and
found that 239 classes (17% of all classes) are affected by it. Summing
the SDC duplication metric at the system level, we end up with more
than 10,000 duplicated lines!5
Duplication is a design disharmony that often appears in
conjunction with other disharmonies. Therefore, we believe that it
5 Note that one code line may be involved in more than one duplication chain, and thus it is multiply counted; still, the number of lines of code involved in duplication is impressive.
does not make sense to discuss just a single concrete example of du-
plication. So, the aspect of duplication will occur over and over again,
as we discuss in an integrated manner various design problems that
we encountered in ArgoUML .
[Figure: duplication between inheritance-related methods. Sibling classes B and C under a common parent A contain near-duplicate methods (m1/m1', m2/m2'); the duplicated behavior is pulled up into A, which also defines m3.]
A special case is the one where the duplication between two inheri-
tance-related methods is fragmented, i.e., the code is similar but not
identical. In this case you would probably be able to apply the Tem-
plate Method design pattern [GHJV95], as this would help separate
the common code (which goes into the closest ancestor class) from
the fragments that are different (which become the hook methods of
the pattern).
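A minimal sketch of such a Template Method refactoring; the ReportFormatter example and all names are hypothetical, not taken from ArgoUML:

```python
class ReportFormatter:
    # the closest ancestor hosts the formerly duplicated skeleton
    def render(self, data):
        header = "== report =="          # common code, pulled up
        body = self.format_body(data)    # hook: the fragment that differed
        return header + "\n" + body

    def format_body(self, data):         # hook to be overridden by subclasses
        raise NotImplementedError

class CsvReport(ReportFormatter):
    def format_body(self, data):
        return ",".join(str(x) for x in data)

class PlainReport(ReportFormatter):
    def format_body(self, data):
        return " ".join(str(x) for x in data)
```

The common skeleton lives once in the ancestor, while each subclass overrides only the fragment that originally differed between the near-duplicate methods.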
In this third case the two operations that share a duplicated block
are neither part of the same class nor of the same hierarchy; either
the two operations are part of two independent classes (in the sense
of classification) or one of them (or both) is a global function.
If you find duplicated code in methods belonging to unrelated
classes, there are three major options on where to place the common
code, extracted from the two (or more) classes:
• One hosts, one calls. In this case we notice that the code belongs
to one of the protagonist classes. Thus, it will host the common
code, in the form of a method, while the other class will invoke that
method. This usually applies when the portions of duplication are
not very large and especially not encountered in many methods.
If the duplication between two classes affects many methods, then
we are probably missing an abstraction, i.e., a third class. Thus, we
define the new class and place the duplicated code there. Now the
question is how to relate the two former classes to this third one.
The answer depends on the context, boiling down to two options:
association and inheritance.
• Third class hosts, both inherit. If we find that the two classes are
conceptually related, then they probably miss a common base
class. Consequently the third class becomes the base class of the
two.
Good examples for this case are the classes FigNodeModelEle-
ment and FigEdgeModelElement which indeed miss a common
base class.
• Third class hosts, both call. If the two unrelated classes involved
in duplication are not conceptually related we need to introduce
an association from the two classes to the third one and call from
both classes the method that now hosts the formerly duplicated
code.
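The last option could look like the following sketch, in which a hypothetical Validator class hosts the formerly duplicated check, and both formerly duplicating classes call it through an association (all names are illustrative):

```python
class Validator:
    # the "third class" hosting the formerly duplicated code
    def is_blank(self, s):
        return s is None or s.strip() == ""

class Customer:
    # conceptually unrelated to Invoice, so it holds an association ...
    def __init__(self):
        self.validator = Validator()

    def valid_name(self, name):
        # ... and calls the method that now hosts the duplicated check
        return not self.validator.is_blank(name)

class Invoice:
    def __init__(self):
        self.validator = Validator()

    def valid_reference(self, ref):
        return not self.validator.is_blank(ref)
```

Because Customer and Invoice are not conceptually related, an association to the third class is preferable to forcing a common base class on them.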
5.9 Recovering from Identity Disharmonies 109
Where to Start
1. Start with the “intelligence magnets”, i.e., with those classes that
tend to accumulate much more functionality than an abstraction
should normally have. In terms of the detection strategies pre-
sented so far, this means making a blacklist containing all classes
affected by the God Class(80) or Brain Class(97) disharmony.
2. For each of the classes in the blacklist built in Step 1, find the
disharmonious methods. A method is considered disharmonious if
at least one of the following is true:
• it is a Brain Method(92);
• it contains duplicated code;
• it accesses attributes from other classes, either directly or by
means of accessor methods.
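The two steps above could be sketched as simple filters over a metrics model; the dictionary-based model and its field names are assumptions for illustration:

```python
def blacklist(classes):
    # Step 1: classes affected by the God Class or Brain Class disharmony
    return [c for c in classes if c["god_class"] or c["brain_class"]]

def disharmonious(method):
    # Step 2: at least one of the three conditions makes a method suspect
    return (method["brain_method"]                   # it is a Brain Method
            or method["duplicated"]                  # it contains duplicated code
            or method["foreign_data_accesses"] > 0)  # it accesses foreign data
```

In practice the boolean flags would themselves be computed by the detection strategies presented earlier.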
How to Start
How should you start when you want to improve the identity harmony
of your system's classes? Assuming that for a class in the blacklist we
have gathered its disharmonious methods, in order to recover from
identity design disharmonies we follow the roadmap described in
Fig. 5.18, explained briefly below.
[Fig. 5.18: roadmap for a method with an identity disharmony. If the method contains intraclass duplication, remove the duplication; if it is a temporary field user, replace the attribute with a local variable; if it uses a Data Class as data provider, move behavior to the Data Class; if it is a Brain Method, extract methods; otherwise stop.]
– Move the attribute from its definition class to the class where the
user method belongs. This is very rarely the case, especially in
the context of Brain Class(97) and God Class(80). It applies only
to cases where the attribute belongs to a small class that has
no serious reason to live, and which will eventually be removed
from the system.
• Action 4: Refactor Brain Method. If you reached this step while in-
specting a method that was initially reported as a Brain Method(92),
first check whether this is still the case after performing Step 1
and Step 3. Sometimes, removing duplication and refactoring a
method for better data-behavior locality solves the case of the Brain
Method(92). If the problem is not solved, revisit Sect. 5.6, where we
discussed the main refactoring cases for a Brain Method(92).
6 Collaboration Disharmonies
Collaboration Rule
Rationale
Practical Consequences
1 The rule is also very much related to Pelrine's Object Manifesto, which states: Be private: do not let anybody touch your private data. Be lazy: delegate as much as possible.
6.1 Collaboration Harmony Rule 117
2 The term Collaborate refers both to the active (i.e., call another operation) and to the passive (i.e., be called (invoked) by another operation) aspects.
118 6 Collaboration Disharmonies
Description One of the frequent cases of excessive coupling that can be improved
is when a method is tied to many other operations in the system,
and these provider operations are concentrated in only one or a
few classes (see Fig. 6.2). In other words, this is the case where the
communication between the client method and (at least one of) its
provider classes is excessively verbose. Therefore, we named this
design disharmony Intensive Coupling.
The detection strategy is based on two main conditions that must Detection
be fulfilled simultaneously: the function invokes many methods and
the invoked methods are not very much dispersed into many classes
(Fig. 6.3).
Additionally, based on our practical experience, we impose a min-
imal complexity condition on the function, to avoid capturing config-
uration operations (e.g., initializers or UI configuring methods) that
call many other methods. These configuration operations reveal a less
harmful (and hardly avoidable) form of coupling, because the depen-
dencies can be much more easily traced and solved.
The detection strategy is composed of the following heuristics (see
Fig. 6.3):
[Fig. 6.3: the Intensive Coupling detection strategy as a conjunction (AND) of conditions on the number and dispersion of the methods called and on the nesting of the method's conditionals.]
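A simplified sketch of how such a detection strategy might be expressed; the concrete thresholds (FEW, HALF, SHALLOW) and the metric dictionary are assumptions, not the exact rule used by the book's tooling:

```python
# Assumed threshold values for illustration only.
FEW, HALF, SHALLOW = 4, 0.5, 2

def intensive_coupling(m):
    # m holds per-method metrics: CINT (number of distinct external methods
    # called), CDISP (dispersion: provider classes / CINT), MAXNESTING.
    calls_many = m["CINT"] > FEW             # invokes many methods ...
    low_dispersion = m["CDISP"] < HALF       # ... concentrated in few classes
    non_trivial = m["MAXNESTING"] > SHALLOW  # skip flat configuration methods
    return calls_many and low_dispersion and non_trivial
```

The last condition is what filters out the initializers and UI-configuring methods mentioned above, whose coupling is broad but shallow.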
Fig. 6.4. In Intensive Coupling an operation calls too many methods from a few
unrelated classes
Fig. 6.7. The essence of the refactoring solution in case of Intensive Coupling
Detection The detection rule is defined in the same terms as the one defined
for Intensive Coupling(120), with only one complementary difference:
we capture only those operations that have a high dispersion of their
coupling (Fig. 6.9). The detection strategy in detail is:
CDISP ≥ HALF
Fig. 6.10. In Dispersed Coupling an operation calls a few methods from each of
a large number of unrelated classes.
Fig. 6.11. The class ActionOpenProject is coupled with many classes. The
red classes are non-model classes, i.e., belong to the Java library. The blue
edges represent invocations.
Fig. 6.12. The class Modeller is coupled with many classes and suffers itself
from many other problems.
Description Not only outgoing dependencies cause trouble, but also incoming
ones. This design disharmony means that a change in an opera-
tion implies many (small) changes to a lot of different operations and
classes [FBB+99] (see Fig. 6.13). This disharmony tackles the issue
of strong afferent (incoming) coupling, and it regards not only the cou-
pling strength but also the coupling dispersion.
Impact An operation affected by Shotgun Surgery has many other design en-
tities depending on it. Consequently, if a change occurs in such an
operation, myriads of other methods and classes might need to change
as well. As a result, it is easy to miss a required change, thus causing
maintenance problems.
Detection We want to find the classes in which a change would significantly af-
fect many other places in the system. In detecting the methods most
affected by this disharmony, we consider both the strength and the
dispersion of coupling. In contrast to Intensive Coupling(120) and Dis-
persed Coupling(127), here we are interested exclusively in incoming
dependencies caused by function calls. In order to reveal especially
those cases where dependencies are harder to trace, we count only
those operations (and classes) that belong neither to the same class
nor to the same class hierarchy as the measured operation.
CC > MANY
Example In Fig. 6.15 we see an extreme case of Shotgun Surgery(133) that in-
volves several methods of class Project. The class is coupled with 131
classes (10% of ArgoUML) and has cyclic invocation dependencies
with the classes ProjectBrowser and CoreFactory (the second largest
class in the system). The classes above Project depend on it, while
Project itself depends on (i.e., invokes methods of) the classes below
it. The view reveals how fragile the system is if a major change is per-
formed on the class Project. Lots of classes in the whole system are
potentially affected by changes.
How to Start
Classification Disharmonies
Proportion Rule
Rationale
Practical Consequences
Presentation Rule
Rationale
The concept of inheritance allows for writing compact code and also
for the reuse of the code already implemented in one of its ancestor
classes. In this sense a descendant should always be in sync with
what has been defined by its ancestors, and not reinvent the wheel
or duplicate the code.
Practical Consequences
Implementation Rule
Rationale
Practical Consequences
1. The derived class denies the inherited bequest [FBB+ 99] (Refused
Parent Bequest(145)).
2. The derived class massively extends the interface of the base class
with services that do not really characterize that family of abstrac-
tions (Tradition Breaker(152)).
The shape of the hierarchy itself says a lot about the classifica-
tion harmony. As we will see, in most cases the Refused Parent
Bequest(145) and Tradition Breaker(152) disharmonies appear in an
over-bloated hierarchy with an inflation of classes.
In conclusion, while inheritance is (also) a powerful mechanism to
reuse code, subtyping is the actual point: it supports a better
understanding of a hierarchy than subclassing, since a subclass is a
more specialized version of its ancestor and not an unrelated concept
that is there only because it can reuse some code.
Another difficult issue related to inheritance is deciding when it is
useful to introduce a new class into the system. Often developers are
afraid of having many small classes and prefer to work instead with
fewer but larger classes, believing that they will have less complexity
to manage if they deal with fewer classes. Yet it is better to have
more classes conveying meaningful abstractions than a single large
one. However, having useless classes or classes without meaningful
behavior is not good either, because they pollute and complicate the
abstraction space: the challenge is to find the right level of
abstraction.
7.3 Refused Parent Bequest 145
Impact The primary goal of inheritance is certainly code reuse. However, ex-
tending base classes without looking at what they have to offer in-
troduces duplication and, in general, class interfaces that become in-
coherent and non-cohesive. An often overlooked part of the process
of adding or extending subclasses is to study the superclasses and
determine what can be reused, what must be added, and what could
be pushed into the superclasses to increase generality.
[Fig. 7.3. Main components of the Refused Parent Bequest detection strategy: the conjunction (AND) of “child uses only little of parent's bequest” and “functional complexity above average”.]
Example The unusual form of this hierarchy (see Fig. 7.4) already gives us a
first hint that its classes are afflicted by some problems. Moreover,
the fact that there is an abstract class (called ToDoPerspective) in the
148 7 Classification Disharmonies
[Fig. 7.6. Inspection and refactoring process for a Refused Parent Bequest: (B) if there are no real usages of the parent's bequest, make all unused protected members private and solve renaming dependencies with the former parent class; (C) otherwise, assess the dependencies of the protected members in the definition class, extract the protected members with loose dependencies to a new class, and solve the initial dependencies by delegation to the new class; repeat while the class still has RPB, then stop.]
In this case the cause of the problem is that the child class simply
does not belong in the hierarchy; in other words, the hierarchy might
be ill-designed. The most relevant symptom for this case is that
the child class has no inheritance-specific dependencies on the parent
class.
In some cases this goes together with the Tradition Breaker(152)
disharmony. An interesting aspect is that in some cases the “false
child” does belong to the hierarchy, but as a child class of another
parent (i.e., an initial “grandparent” or ancestor). This can be found
out by analyzing the dependencies between the disharmonious class
and the other ancestors.
The third case, probably the most interesting one, is when the parent
class has many child classes, and the bequest offered by it is relevant
only for some of these siblings, but not for the class affected by Re-
fused Parent Bequest. By accumulating the bequest needed by various
subsets of descendants, the total bequest becomes excessively large.
Consequently, the main symptoms in this case are:
• A large number of descendants.
• Often, there is more than one class exhibiting Refused Parent Be-
quest in the same hierarchy.
• Each descendant uses a small, non-overlapping portion of the to-
tal bequest.
Description This design disharmony takes its name from the principle
that the interface of a class (i.e., the services that it provides to the
rest of the system) should grow in an evolutionary fashion. This
means that a derived class should not break the inherited “tradition”
and provide a large set of services that are unrelated to those pro-
vided by its base class.
Applies To Classes. If C is the name of the class, the following conditions are
assumed: (1) C has a base class B, (2) B is not a third-party class and
(3) B is not an interface.
[Fig. 7.9 (excerpt): “method complexity in child class above average”; NOM ≥ HIGH; “parent's functional complexity above average”: AMW > AVERAGE.]
Detection In Fig. 7.8 we see a high-level view of the detection rule for a Tradition
Breaker. There are three main conditions that must be simultane-
ously fulfilled for a class to be put on the blacklist of classes that
break the inherited tradition by the interface that they define. These
conditions are:
• The size of the public interface of the child class has increased
excessively compared to its base class.
• The child class as a whole has a considerable size and complexity.
• The base class, even if not as large and complex as its child, must
have a “respectable” amount of functionality defined, so that it can
claim to have defined a tradition.
1. Excessive increase of the child class interface. To quantify the evo-
lution of a child's public interface compared to that of its par-
ent, we use two measures: (1) Newly Added Services (NAS) tells
us in absolute terms how many public methods were added to
the class; and (2) the Percentage of Newly Added Services (PNAS)
shows us the relative increase, i.e., how much of the class's
interface consists of newly added services. We use these metrics
in the following way:
a) More newly added services than the average number of meth-
ods per class. This threshold is based on statistical infor-
mation about the number of methods per class (see Table 2.1),
using the following logic: if a class adds more new methods
than the average number of methods (public or not) of a class,
then the measured class is an outlier with respect to NAS.
For Java this average value1 is 6.5.
b) Newly added services are dominant in the child class. We use
this metric to make sure that the absolute value provided by
NAS is a significant part of the entire interface of the mea-
sured class. Therefore, PNAS is a normalized metric and we set
the threshold so that NAS represents at least two-thirds of the
public interface.
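The two conditions above can be sketched as follows; the set-based model of public interfaces is a simplification of a real design model, and only the 6.5 average and the two-thirds threshold come from the text:

```python
# AVG_NOM = 6.5 is the Java average number of methods per class from Table 2.1.
AVG_NOM = 6.5

def nas(child_public, parent_public):
    # Newly Added Services: public methods of the child that are neither
    # inherited from nor overriding something in the parent's interface.
    return len(child_public - parent_public)

def pnas(child_public, parent_public):
    # Percentage of Newly Added Services, normalized by the child interface.
    return nas(child_public, parent_public) / len(child_public)

def excessive_interface_increase(child_public, parent_public):
    n = nas(child_public, parent_public)
    return n > AVG_NOM and pnas(child_public, parent_public) >= 2 / 3
```

For instance, a child that inherits three public methods and adds nine new ones has NAS = 9 (above 6.5) and PNAS = 0.75 (above two-thirds), so it satisfies the first Tradition Breaker condition.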
2. Child class has substantial size and complexity. To speak about
a relevant Tradition Breaker the child class must contain a sub-
stantial amount of functionality. This means that it must have a
substantial size (measured in this case by the number of meth-
ods) and accumulate a significant amount of logical complexity.
Therefore we require either the average complexity or the total
complexity of the class to be high. An additional requirement is
that the child class has a significant number of methods (NOM).
We use the following metrics (see Fig. 7.9):
1 Computed as the average between the lower value and the upper value of NOM/Class.
7.4 Tradition Breaker 155
Example In Fig. 7.10 we see a System Complexity view of the hierarchy whose
root class is named FigNodeModelElement. Visually striking is that
the hierarchy is top-heavy (the root class is by far the largest in terms
of methods and attributes) and unbalanced (there is a sub-hierarchy
on the left). Moreover, many direct subclasses of FigNodeModelEle-
ment look similar “from the outside” (i.e., they have a similar shape,
pointing to a possible duplication problem), and, as we will see, also
from the inside.
From the point of view of the disharmonies, nearly half of the
classes of this hierarchy are afflicted by at least one of two classifica-
tion disharmonies: Refused Parent Bequest(145) or Tradition Breaker.
Among the subclasses of FigNodeModelElement there is one in
particular which is striking because it is the only one which is both af-
fected by Refused Parent Bequest(145) and is also a Tradition Breaker,
namely FigObject. Additionally this class is also a Brain Class(97) that
contains two methods which are Brain Method(92).
[Refactoring flowchart for a Tradition Breaker: (B) check whether there are other Tradition Breakers among the siblings and whether they share common newly added services (NAS); if so, push up all common NAS to the parent class. If the class still is a TB, solve renaming dependencies with the former parent class; otherwise stop.]
In this case the derived class has an excessively large interface, i.e.,
it includes in its interface methods that should have been declared
Where to Start
How to Start
[Flowchart excerpt, Flawed Hierarchy: extract the commonality to a new class and solve dependencies by delegation (sometimes create new base classes to factor out commonality for siblings); then stop.]
In this appendix you will find definitions of the metrics used through-
out this book. These metrics are neither the best in the world, nor
magic. We chose them in the context of the detection strategies pre-
sented in Chapter 4. Thus, their most efficient use is in the context
of these strategies.
In Fig. A.1 we see all the containment (HAVE and BELONGS-TO) re-
lations that are relevant in the context of object-oriented design, i.e.,
what other entities does the measured entity have (contain), in the
sense of being a scope for these entities? This also includes the in-
verse relation: to which entity does the measured entity belong?
For example, an operation has parameters and local variables, while
it belongs to a class.
In Fig. A.2 we see all direct usage (USE and USED-BY) relations that
are relevant in the context of object-oriented design, i.e., what en-
tities does the measured entity use; and again the inverse relation:
by which entities is the measured one being used? For example, an
operation is using the variables that it accesses, while it is used by
the other operations that call (invoke) it. A class uses another class
by extending it through inheritance, but also uses other classes by
communicating with them.
1 The goal of this book is neither to list all possible questions nor to answer them, but to put you, the reader, in a position where you can ask and answer such questions yourself.
A.2 Alphabetical Catalogue of Metrics 167
CC - Changing Classes
Definition: the number of classes in which the methods that call the measured method are defined [Mar02a].
Used for: Shotgun Surgery(133).
Measured entity: method (user-defined; all visibilities; getters/setters, constructors, static and abstract methods included).
Involved relations: IS USED (called by): the user-defined classes, including abstract ones, that define the operations calling the measured operation (see CM).
CM - Changing Methods
Definition: the number of distinct methods that call the measured method [Mar02a].
Used for: Shotgun Surgery(133).
Measured entity: method (user-defined; all visibilities; getters/setters, constructors, static and abstract methods included).
Involved relations: IS USED (is called by): user-defined methods of all visibilities; getters/setters, constructors and static methods included, abstract methods excluded.
iPlasma
B.1 Introduction
iPlasma1 is an integrated environment for the quality analysis of object-
oriented software systems that supports all the necessary phases of
analysis: from model extraction (including scalable parsing for C++
and Java) up to high-level metrics-based analysis and the detection
of code duplication. iPlasma has three major advantages: extensibility
of the supported analyses, integration with further analysis tools, and
scalability, as it has already been used to analyze large industrial
projects of millions of lines of code (e.g., Eclipse and Mozilla).
Fig. B.1. The layered structure of the iPlasma quality assessment platform.
2 See http://recoder.sourceforge.net/
B.2 iPlasma at Work 177
Using INSIDER
(over 80) that should be displayed. As seen in Fig. B.2 the met-
rics are displayed for all the entities in the group. By selecting an
entity in the group we can display another group associated with
that entity (e.g., for a class, display the group of methods defined
in the class or the group of its ancestors, etc.).
• Group Manager. During a software analysis we usually need to
work with more than a single group. The groups that are cur-
rently open are displayed on the top-left side of the screen. The
Group Manager allows us to select a group that we want to see
in the Group Inspector. It also allows us to delete those groups
that are no longer relevant for the analysis. Last but not least, the
Group Manager allows us to create a new group, by filtering the
entities of the selected group based on a filtering condition, i.e., a
combination of metrics (as in detection strategies). Apart from the
predefined filters, new filters can be defined at run-time using the
Filter Editor (see windows at the bottom-right of Fig. B.2). The Fil-
ter Editor can be used not only to create new (sub)groups, but also
for revealing the entities in a group that fulfill the defined filtering
condition (see red highlighting in the Group Inspector in Fig. B.2).
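A hypothetical sketch of composing such a metrics-based filter condition; the helper, metric names, and thresholds below are illustrative and not INSIDER's actual API:

```python
def make_filter(**conditions):
    # conditions map a metric name to a predicate on its value,
    # e.g. NOM=lambda v: v > 10; the filter is their conjunction,
    # mirroring how detection strategies combine metric conditions.
    def apply(entity):
        return all(pred(entity[name]) for name, pred in conditions.items())
    return apply

# An illustrative God Class-like filter over a dictionary-based entity model.
god_class_like = make_filter(WMC=lambda v: v >= 47,      # very high complexity
                             ATFD=lambda v: v > 5,       # accesses foreign data
                             TCC=lambda v: v < 1 / 3)    # low cohesion
```

Applying such a filter to a group either produces a subgroup or highlights the matching entities, which corresponds to the two uses of the Filter Editor described above.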
• Entity Browser. When an entity is selected in the Group Inspector
on the bottom part of the screen we see various details about that
entity. For example, for a class we see the position of the class
in the class hierarchy, its methods and attributes, etc. The big
advantage of the Entity Browser is that any reference to another
design entity (e.g., the base class of the selected class) is a hyper-
link to that entity. Thus, by clicking it the details of that entity
will be displayed in a new tab of the Entity Browser. Additionally,
for operations, INSIDER allows us to get quick access to the ac-
tual source code of the operation. The code appears on demand in
a separate window, namely the Source Code Viewer (several such
windows can be open at the same time).
Tool Availability
http://loose.upt.ro/iplasma/
C
CodeCrawler
C.1 Introduction
Fig. C.1. CodeCrawler at work. Every visible item can be interacted with in
its own customized way.
CodeCrawler ’s top input field, e.g., metric values and other seman-
tic information, whether the class is abstract, etc. Moreover, using a
context menu the viewer can interact with the item in focus.
The polymetric views (implemented in the CodeCrawler tool) can
be created either programmatically by constructing the view objects,
or by using a View Editor, where each view can be composed using drag
and drop. In Fig. C.2 we see CodeCrawler's View Editor with the spec-
ification of the System Complexity view: the user can freely compose
a view, specify the types of items that will be displayed, and
define the way the visualization will be performed. The user can
choose among various types of nodes (class, method, package, etc.)
and edges (inheritance, invocation, containment, etc.). For every node
and edge the user can choose the figure type and assign metrics to the
figures; several dozen metrics are available, but as we will see in the
remainder of this book only certain metrics make sense for certain
polymetric views. The user can also choose the layout that he or she
wants to use. In the case of Fig. 3.8,
C.3 Industrial Validation 183
where class nodes and inheritance edges have been chosen, a simple
tree layout has been used.
Table C.1. A list of some of the case studies to which CodeCrawler has been applied.
Tool Availability
http://www.iam.unibe.ch/∼scg/Research/CodeCrawler/
http://www.cincomsmalltalk.com/
D
Figures in Color
The class Modeller is coupled with many classes and suffers itself
from many other problems.
ABW98. Sherman R. Alpert, Kyle Brown, and Bobby Woolf. The Design
Patterns Smalltalk Companion. Addison Wesley, 1998.
Aré03. Gabriela Arévalo. X-Ray views on a class using concept analy-
sis. In Proceedings of WOOR 2003 (4th International Workshop
on Object-Oriented Reengineering), pages 76–80. University of
Antwerp, July 2003.
BDW98. Lionel C. Briand, John Daly, and Jürgen Wüst. A Unified Frame-
work for Cohesion Measurement in Object-Oriented Systems.
Empirical Software Engineering: An International Journal, 3(2),
1998.
BDW99. Lionel C. Briand, John W. Daly, and Jürgen K. Wüst. A Unified
Framework for Coupling Measurement in Object-Oriented Sys-
tems. IEEE Transactions on Software Engineering, 25(1):91–121,
1999.
Bec97. Kent Beck. Smalltalk Best Practice Patterns. Prentice-Hall, 1997.
Bec00. Kent Beck. Extreme Programming Explained: Embrace Change.
Addison Wesley, 2000.
Ber74. Jacques Bertin. Graphische Semiologie. Walter de Gruyter, 1974.
Bin99. Robert V. Binder. Testing Object-Oriented Systems: Models, Pat-
terns, and Tools. Object Technology Series. Addison Wesley,
1999.
BK95. J.M. Bieman and B.K. Kang. Cohesion and reuse in an object-
oriented system. In Proceedings ACM Symposium on Software
Reusability, April 1995.
BMMM98. William J. Brown, Raphael C. Malveau, Hays W. McCormick,
III, and Thomas J. Mowbray. AntiPatterns: Refactoring Software,
Architectures, and Projects in Crisis. John Wiley Press, 1998.
Boe88. Barry W. Boehm. A spiral model of software development and
enhancement. IEEE Computer, 21(5):61–72, 1988.
196 References
Index

ArgoUML, 8, 21, 40–44, 81, 86, 90, 105, 106, 135, 186
Assessment, 3, 7, 13, 24, 46, 102, 110, 175, 177
  design, 3
  quality, 24, 175, 177
Bad smells, 5, 53, 80, 119
  code smells, 5, 53, 92
Best practices, 5, 70
Brain Class, 71, 78, 87, 97–99, 101, 109, 111, 113, 155, 171, 173, 174
Brain Method, 71, 78, 85, 93, 95–99, 101, 109, 110, 113, 123, 130, 131, 136, 138, 155, 170–172
C++, 14–16, 24, 26, 32, 60, 63, 175, 176, 179–181, 184
Class Blueprint, 7, 22, 44, 48, 57–63, 65, 67–70, 82, 94, 100, 123, 148, 187, 188, 192
  accessor layer, 60
  attribute layer, 60
  implementation layer, 60, 64, 65
  initialization layer, 60, 63, 65
  interface layer, 60, 64, 65
CodeCrawler, 35, 181–184
Cohesion, 17, 53, 55, 56, 80, 81, 84, 98, 111, 173
Cohesive
  non-cohesive, 18, 56, 97, 138
Coupling, 7, 24, 25, 28, 29, 31, 33, 41, 44, 53, 87, 115, 118–121, 123–133, 135, 137, 138, 169, 190, 191
  dispersion, 29, 128, 169
  dispersively coupled, 127, 130
  excessive, 29, 120, 130, 137
  intensity, 29, 128, 169
  intensively coupled, 33, 41, 120, 123, 124, 189
Data Class, 71, 78, 87–91, 113, 119, 136, 138, 172, 174, 187
Data collection, 12, 15, 32, 137, 138
Data-operation proximity, 84, 91
Dependencies, 26, 85, 87, 116, 118, 121, 126, 132–135, 138, 142, 158, 191
  incoming, 116, 134
  outgoing, 116, 118, 133
Design, 1–8, 11–13, 18, 20–23, 28, 36, 39, 45, 46, 48–55, 57, 67–70, 72, 73, 78, 80, 81, 84, 85, 88, 91, 92, 95–97, 105, 106, 108, 111, 115, 116, 120, 126, 133, 138, 152, 159, 160, 164–166, 176–179, 202
  characterize, 1, 4, 9, 11, 21, 23–25, 28–30, 44, 69, 98, 111, 144