
Chapter 20: This Class Is Too Big and I Don’t Want It to Get Any Bigger

Many of the features that people add to systems are little tweaks. They require the addition of a little code and maybe a few more methods. It’s tempting to just make these changes to an existing class. Chances are, the code that you need to add must use data from some existing class, and the easiest thing is to just add code to it. Unfortunately, this easy way of making changes can lead to some serious trouble. When we keep adding code to existing classes, we end up with long methods and large classes. Our software turns into a swamp, and it takes more time to understand how to add new features or even just understand how old features work.

I visited a team once that had what looked like a nice architecture on paper. They told me what the primary classes were and how they communicated with each other in the normal cases. Then, they showed me a couple of nice UML diagrams that showed the structure. I was surprised when I started to look at the code. Each of their classes could really be broken out into about 10 or so, and doing that would help them get past their most pressing problems.

What are the problems with big classes? The first is confusion. When you have 50 or 60 methods on a class, it’s often hard to get a sense of what you have to change and whether it is going to affect anything else. In the worst cases, big classes have an incredible number of instance variables, and it is hard to know what the effects are of changing a variable. Another problem is task scheduling. When a class has 20 or so responsibilities, chances are, you’ll have an incredible number of reasons to change it. In the same iteration, you might have several programmers who have to do different things to the class. If they are working concurrently, this can lead to some serious thrashing, particularly because of the third problem: Big classes are a pain to test. Encapsulation is a good thing, right? Well, don’t ask testers about that; they are liable to bite your head off.

Classes that are too big often hide too much. Encapsulation is great when it helps us reason about our code and when we know that certain things can be changed only under certain circumstances. However, when we encapsulate too much, the stuff inside rots and festers. There isn’t any easy way to sense the effects of change, so people fall back on Edit and Pray (9) programming. At that point, either changes take far too long or the bug count increases. You have to pay for the lack of clarity somehow.

The first issue to confront when we have big classes is this: How can we work without making things worse? The key tactics we can use here are Sprout Class (63) and Sprout Method (59). When we have to make changes, we should consider putting the code into a new class or a new method. Sprout Class (63) really keeps things from getting much worse. When you put new code into a new class, sure, you might have to delegate from the original class, but at least you aren’t making it much bigger. Sprout Method (59) helps also, but in a more subtle way. If you add code in a new method, yes, you will have an additional method, but at the very least, you are identifying and naming another thing that the class does; often the names of methods can give you hints about how to break down a class into smaller pieces.
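Here is a minimal Java sketch of Sprout Class. It assumes a hypothetical big InvoiceProcessor class that now needs a late-payment penalty. The new logic does not go inline. It lives in a small new class, and the big class gains only one delegating method. All names and the 50-cent rate are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sprouted class: it holds only the new behavior,
// so it can be created and tested without the big class.
class LatePenaltyCalculator {
    private final int penaltyPerDayInCents;

    LatePenaltyCalculator(int penaltyPerDayInCents) {
        this.penaltyPerDayInCents = penaltyPerDayInCents;
    }

    // Sums the penalty for each item; items that are not late add nothing.
    int penaltyFor(List<Integer> daysLate) {
        int total = 0;
        for (int days : daysLate) {
            if (days > 0) {
                total += days * penaltyPerDayInCents;
            }
        }
        return total;
    }
}

// Stand-in for the existing big class; its many other responsibilities are elided.
class InvoiceProcessor {
    private final List<Integer> daysLateForItems = new ArrayList<>();

    void recordItem(int daysLate) {
        daysLateForItems.add(daysLate);
    }

    // The only new code in the big class: delegate instead of growing.
    int latePenalty() {
        return new LatePenaltyCalculator(50).penaltyFor(daysLateForItems);
    }
}
```

LatePenaltyCalculator can be tested directly with a plain list of integers. InvoiceProcessor grows by only a single method.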

The key remedy for big classes is refactoring. It helps to break down classes into sets of smaller classes. But the biggest issue is figuring out what the smaller classes should look like. Fortunately, we have some guidance.

The single-responsibility principle is kind of hard to describe because the idea of a responsibility is kind of nebulous. If we look at it in a very naïve way, we might say, “Oh, that means that every class should have only a single method, right?” Well, methods can be seen as responsibilities. A Task is responsible for running using its run method, for telling us how many subtasks it has with its taskCount method, and so on. But what we mean by a responsibility really comes into focus when we talk about main purpose. Figure 20.1 shows an example.

Single-Responsibility Principle (SRP)

Every class should have a single responsibility: It should have a single purpose in the system, and there should be only one reason to change it.

Figure 20.1 Rule parser.

    RuleParser
      - current : string
      - variables : HashMap
      - currentPosition : int
      + evaluate(string) : int
      - branchingExpression(Node left, Node right) : int
      - causalExpression(Node left, Node right) : int
      - variableExpression(Node node) : int
      - valueExpression(Node node) : int
      - nextTerm() : string
      - hasMoreTerms() : boolean
      + addVariable(string name, int value)

We have a little class here that evaluates strings containing rule expressions in some obscure language. What responsibilities does it have? We can look at the name of the class to find one responsibility: It parses. But is that its main purpose? Parsing doesn’t seem to be it. It seems that it evaluates also.

What else does it do? It holds on to a current string, the string that it is parsing. It also holds on to a field that indicates the current position while it is parsing. Both of those mini-responsibilities seem to fit under the category of parsing.

Let’s take a look at the other variable, the variables field. It holds on to a set of variables that the parser uses so that it can evaluate arithmetic expressions in rules such as a + 3. If someone calls the method addVariable with the arguments a and 1, the expression a + 3 will evaluate to 4. So, it seems that there is this other responsibility, variable management, in this class.

Are there more responsibilities? Another way to find them is to look at method names. Is there a natural way to group the names of the methods? It seems that the methods kind of fall into these groups:

    evaluate    branchingExpression    nextTerm        addVariable
                causalExpression       hasMoreTerms
                variableExpression
                valueExpression

The evaluate method is an entry point of the class. It is one of only two public methods, and it denotes a key responsibility of the class: evaluation. All of the methods that end with the Expression suffix are kind of the same. Not only are they named similarly, but they all accept Nodes as arguments and return an int that indicates the value of a subexpression. The nextTerm and hasMoreTerms methods are similar, too. They seem to be about some special form of tokenization for terms. As we said earlier, the addVariable method is concerned with variable management.

To summarize, it seems that RuleParser has the following responsibilities:

• Parsing

• Expression evaluation

• Term tokenization

• Variable management

If we had to come up with a design from scratch that separated all of these responsibilities, it might look something like Figure 20.2.

Is this overkill? It could be. Often people who write little language interpreters merge parsing and expression evaluation; they just evaluate as they parse. But although that can be convenient, often it doesn’t scale well as a language grows. Another responsibility that is kind of meager is that of SymbolTable. If the only responsibility of SymbolTable is to map variable names to integers, the class isn’t giving us much advantage over just using a hash table or a list. Nice design, but guess what? It is pretty hypothetical. Unless we are choosing to rewrite this part of the system, our little multiclass design is a castle in the sky.

Figure 20.2 Rule classes with responsibilities separated.

    TermTokenizer          + nextTerm() : String, + hasMoreTerms() : boolean
    RuleEvaluator          + evaluate(string), + addVariable(string, int)
    SymbolTable            + addVariable(string, int)
    RuleParser             + parse(string) : Expression
    Expression {abstract}  + evaluateWith(SymbolTable)
    (associations labeled «creates» and «parameter»)
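To make Figure 20.2 concrete, here is a minimal Java sketch of the separated design. The text does not specify the rule language, so this sketch assumes a toy syntax: whitespace-separated terms that are integer literals or variable names, joined only by +. Several pieces are assumptions for illustration and are not part of the figure:

- the int return types that the figure leaves off evaluate and evaluateWith;
- the valueOf lookup on SymbolTable;
- the three Expression subclasses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Term tokenization: splits a rule into whitespace-separated terms (toy syntax assumed).
class TermTokenizer {
    private final String[] terms;
    private int position = 0;

    public TermTokenizer(String rule) {
        String trimmed = rule.trim();
        terms = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public boolean hasMoreTerms() { return position < terms.length; }

    public String nextTerm() {
        if (!hasMoreTerms()) throw new NoSuchElementException("no more terms");
        return terms[position++];
    }
}

// Variable management: little more than a wrapper around a map.
class SymbolTable {
    private final Map<String, Integer> values = new HashMap<>();

    public void addVariable(String name, int value) { values.put(name, value); }

    // Assumed lookup method; Figure 20.2 does not show one.
    public int valueOf(String name) {
        Integer value = values.get(name);
        if (value == null) throw new IllegalArgumentException("undefined variable: " + name);
        return value;
    }
}

// Expression evaluation: each node computes its value from a SymbolTable.
abstract class Expression {
    public abstract int evaluateWith(SymbolTable symbols);
}

class ValueExpression extends Expression {
    private final int value;
    ValueExpression(int value) { this.value = value; }
    @Override public int evaluateWith(SymbolTable symbols) { return value; }
}

class VariableExpression extends Expression {
    private final String name;
    VariableExpression(String name) { this.name = name; }
    @Override public int evaluateWith(SymbolTable symbols) { return symbols.valueOf(name); }
}

class SumExpression extends Expression {
    private final Expression left;
    private final Expression right;
    SumExpression(Expression left, Expression right) { this.left = left; this.right = right; }
    @Override public int evaluateWith(SymbolTable symbols) {
        return left.evaluateWith(symbols) + right.evaluateWith(symbols);
    }
}

// Parsing: builds an Expression tree from terms but evaluates nothing.
class RuleParser {
    public Expression parse(String rule) {
        TermTokenizer tokenizer = new TermTokenizer(rule);
        Expression result = operand(tokenizer.nextTerm());
        while (tokenizer.hasMoreTerms()) {
            String operator = tokenizer.nextTerm();
            if (!operator.equals("+")) {
                throw new IllegalArgumentException("unsupported operator: " + operator);
            }
            result = new SumExpression(result, operand(tokenizer.nextTerm()));
        }
        return result;
    }

    private Expression operand(String term) {
        return term.matches("-?\\d+")
                ? new ValueExpression(Integer.parseInt(term))
                : new VariableExpression(term);
    }
}

// Keeps the original public surface (evaluate, addVariable) and delegates.
class RuleEvaluator {
    private final SymbolTable symbols = new SymbolTable();
    private final RuleParser parser = new RuleParser();

    public void addVariable(String name, int value) { symbols.addVariable(name, value); }

    public int evaluate(String rule) { return parser.parse(rule).evaluateWith(symbols); }
}
```

With a RuleEvaluator, calling addVariable("a", 1) and then evaluate("a + 3") yields 4, which matches the earlier example for the original class. Notice how thin SymbolTable turns out to be. That is the "meager" responsibility mentioned above.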


In real-world cases of big classes, the key is to identify the different responsibilities and then ﬁgure out a way to incrementally move toward more focused responsibilities.

Seeing Responsibilities

In the RuleParser example in the last section, I showed a particular decomposition of a class into smaller classes. When I did that breakdown, I did it pretty much by rote. I listed all of the methods and started to think about what their purposes were. The key questions I asked were “Why is this method here?” and “What is it doing for the class?” Then I grouped them into lists, putting together methods that had a similar reason for being there.

I call this way of seeing responsibilities method grouping. It’s only one of many ways of seeing responsibilities in existing code.

Learning to see responsibilities is a key design skill, and it takes practice. It might seem odd to talk about a design skill in this context of working with legacy code, but there really is little difference between discovering responsibilities in existing code and formulating them for code that you haven’t written yet.

The key thing is to be able to see responsibilities and learn how to separate them well. If anything, legacy code offers far more possibilities for the application of design skill than new features do. It is easier to talk about design trade-offs when you can see the code that will be affected, and it is also easier to see whether structure is appropriate in a given context because the context is real and right in front of us.

This section describes a set of heuristics that we can use to see responsibilities in existing code. Note that we are not inventing responsibilities; we’re just discovering what is there. Regardless of what structure legacy code has, its pieces do identifiable things. Sometimes they are hard to see, but these techniques can help. Try to apply them even with code that you don’t have to change immediately. The more you start noticing the responsibilities inherent in code, the more you learn about it.

Heuristic #1: Group Methods

Look for similar method names. Write down all of the methods on a class, along with their access types (public, private, and so on), and try to find ones that seem to go together.

This technique, method grouping, is a pretty good start, particularly with very large classes. The important thing is to recognize that you don’t have to categorize all of the names into new classes. Just see if you can find some that look like they are part of a common responsibility. If you can identify some of these responsibilities that are a bit off to the side of the main responsibility of the class, you have a direction in which you can take the code over time. Wait until you have to modify one of the methods you’ve categorized, and then decide whether you want to extract a class at that point.
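For compiled Java code, reflection can produce the raw method list for grouping. The sketch below uses illustrative names (MethodInventory, describe, groupByKeyword) and does two things:

- It lists each declared method with its access level, sorted by name.
- It offers a crude first-pass grouping by a keyword contained in each name.

It only suggests candidates. Deciding which groups reflect real responsibilities is still a judgment call.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MethodInventory {
    // "access name" for each method declared directly on the class, sorted by name
    // so that similarly named methods tend to sit together.
    static List<String> describe(Class<?> type) {
        Method[] methods = type.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        List<String> lines = new ArrayList<>();
        for (Method method : methods) {
            if (method.isSynthetic()) continue; // skip compiler-generated methods such as lambda bodies
            lines.add(accessOf(method.getModifiers()) + " " + method.getName());
        }
        return lines;
    }

    private static String accessOf(int modifiers) {
        if (Modifier.isPublic(modifiers)) return "public";
        if (Modifier.isProtected(modifiers)) return "protected";
        if (Modifier.isPrivate(modifiers)) return "private";
        return "package";
    }

    // Crude first pass: each name goes under the first keyword it contains, else "other".
    static Map<String, List<String>> groupByKeyword(List<String> names, List<String> keywords) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String name : names) {
            String group = "other";
            for (String keyword : keywords) {
                if (name.contains(keyword)) {
                    group = keyword;
                    break;
                }
            }
            groups.computeIfAbsent(group, k -> new ArrayList<>()).add(name);
        }
        return groups;
    }
}
```

Consider the RuleParser method names from Figure 20.1 with the keywords Expression, Term, and Variable. The result is roughly the hand-made grouping shown earlier: evaluate lands in "other", the four Expression methods group together, nextTerm and hasMoreTerms share "Term", and addVariable stands alone.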

Method grouping is a great team exercise also. Put up poster boards in your team room with lists of the method names for each of your major classes. Team members can mark up the posters over time, showing different groupings of methods. The whole team can hash out which groupings are better and decide on directions for the code to go in.

Big classes can hide too much. This question comes up over and over again from people new to unit testing: “How do I test private methods?” Many people spend a lot of time trying to ﬁgure out how to get around this problem, but, as I mentioned in an earlier chapter, the real answer is that if you have the urge to test a private method, the method shouldn’t be private; if making the method public bothers you, chances are, it is because it is part of a separate responsibility. It should be on another class.

The RuleParser class earlier in this chapter is the quintessential example of this. It has two public methods: evaluate and addVariable. Everything else is private. What would the RuleParser class be like if we made nextTerm and hasMoreTerms public? Well, it would seem pretty odd. Users of the parser might get the idea that they have to use those two methods along with evaluate to parse and evaluate expressions. It would be odd to have those methods public on the RuleParser class, but it is far less odd, and actually perfectly fine, to make them public methods on a TermTokenizer class. This doesn’t make RuleParser any less encapsulated. Even though nextTerm and hasMoreTerms are public on TermTokenizer, they are accessed only privately, from inside the parser. This is shown in Figure 20.3.
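Here is a minimal Java sketch of the Figure 20.3 arrangement. The real rule language is not given, so the sketch assumes a toy syntax: whitespace-separated integer terms joined by +. nextTerm and hasMoreTerms are public on TermTokenizer, so a test can call them directly. RuleParser still exposes only evaluate and keeps its tokenizer as a local detail.

```java
import java.util.NoSuchElementException;

// Formerly private helpers of RuleParser, now public on their own class.
class TermTokenizer {
    private final String[] terms;
    private int currentPosition = 0;

    public TermTokenizer(String rule) {
        String trimmed = rule.trim();
        terms = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public boolean hasMoreTerms() {
        return currentPosition < terms.length;
    }

    public String nextTerm() {
        if (!hasMoreTerms()) throw new NoSuchElementException("no more terms");
        return terms[currentPosition++];
    }
}

// RuleParser keeps a single public method; the tokenizer never leaks out.
class RuleParser {
    public int evaluate(String rule) {
        TermTokenizer tokenizer = new TermTokenizer(rule); // used privately
        int total = Integer.parseInt(tokenizer.nextTerm());
        while (tokenizer.hasMoreTerms()) {
            String operator = tokenizer.nextTerm();
            if (!operator.equals("+")) {
                throw new IllegalArgumentException("unsupported operator: " + operator);
            }
            total += Integer.parseInt(tokenizer.nextTerm());
        }
        return total;
    }
}
```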

Heuristic #2: Look at Hidden Methods

Pay attention to private and protected methods. If a class has many of them, it often indicates that there is another class in the class dying to get out.

Figure 20.3 RuleParser and TermTokenizer.

    TermTokenizer
      + TermTokenizer(String)
      + nextTerm() : String
      + hasMoreTerms() : boolean
    RuleParser
      + evaluate(String) : int

When you are trying to break up a big class, it’s tempting to pay a lot of attention to the names of the methods. After all, they are one of the most noticeable things about a class. But the names of methods don’t tell the whole story. Often big classes house methods that do many things at many different levels of abstraction. For instance, a method named updateScreen() might generate text for a display, format it, and send it to several different GUI objects. Looking at the method name alone, you’d have no idea how much work is going on and how many responsibilities are nestled in that code.

For this reason, it pays to do a little extract method refactoring before really settling on classes to extract. What methods should you extract? I handle this by looking for decisions. How many things are assumed in the code? Is the code calling methods from a particular API? Is it assuming that it will always be accessing the same database? If the code is doing these things, it’s a good idea to extract methods that reflect what you intend at a high level. If you are getting particular information from a database, extract a method named after the information you are getting. When you do these extractions, you have many more methods, but you also might find that method grouping is easier. Better than that, you might find that you have completely encapsulated some resource behind a set of methods. When you extract a class for them, you’ll have broken some dependencies on low-level details.

Heuristic #3: Look for Decisions That Can Change

Look for decisions—not decisions that you are making in the code, but decisions that you’ve already made. Is there some way of doing something (talking to a database, talking to another set of objects, and so on) that seems hard-coded? Can you imagine it changing?
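The Java sketch below illustrates the extraction advice and Heuristic #3. It uses a hypothetical StatementScreen whose updateScreen once fetched and formatted a balance inline. The fetch has been extracted into a method named after the information it gets (balanceFor). The hard-coded decision about where balances come from now sits behind a BalanceSource interface. All of these names are invented for the example. To keep it self-contained, updateScreen returns a String instead of driving GUI objects.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical seam: the once hard-coded decision about where balances come from.
interface BalanceSource {
    int balanceInCents(String customerId);
}

// One possible source; a production version might query a database instead.
class InMemoryBalanceSource implements BalanceSource {
    private final Map<String, Integer> balances = new HashMap<>();

    void put(String customerId, int cents) {
        balances.put(customerId, cents);
    }

    @Override
    public int balanceInCents(String customerId) {
        return balances.getOrDefault(customerId, 0); // unknown customers read as zero
    }
}

class StatementScreen {
    private final BalanceSource source;

    StatementScreen(BalanceSource source) {
        this.source = source;
    }

    // After extraction, the top-level method reads as a sequence of named steps.
    String updateScreen(String customerId) {
        return formatBalance(balanceFor(customerId));
    }

    // Named after the information it gets, not after how it gets it.
    private int balanceFor(String customerId) {
        return source.balanceInCents(customerId);
    }

    // Formats cents as units.cents, keeping the sign in front.
    private String formatBalance(int cents) {
        String sign = cents < 0 ? "-" : "";
        int magnitude = Math.abs(cents);
        return String.format(Locale.ROOT, "Balance: %s%d.%02d", sign, magnitude / 100, magnitude % 100);
    }
}
```

Because BalanceSource has a single method, a test can pass a lambda such as id -> -150 in place of a real data source.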

Heuristic #4: Look for Internal Relationships

Look for relationships between instance variables and methods. Are certain instance variables used by some methods and not others?


It’s really hard to find classes in which all the methods use all of the instance variables. Usually there is some sort of “lumping” in a class. Two or three methods might be the only ones that use a set of three variables. Often the names help you see this. For instance, in the RuleParser class, there is a collection named variables and a method named addVariable. That shows us that there is an obvious relationship between that method and that variable. It doesn’t tell us that there aren’t other methods that access that variable, but at least we have a place to start looking.

Another technique we can use to find these “lumps” is to make a little sketch of the relationships inside a class. These are called feature sketches. They show which methods and instance variables each method in a class uses, and they are pretty easy to make. Here is an example:

    import java.util.ArrayList;
    import java.util.Date;
    import java.util.List;

    // Customer, FeeRider, RentalCalendar, and RateCalculator are
    // defined elsewhere in the system.
    class Reservation {
        private int duration;
        private int dailyRate;
        private Date date;
        private Customer customer;
        private List<FeeRider> fees = new ArrayList<FeeRider>();

        public Reservation(Customer customer, int duration,
                           int dailyRate, Date date) {
            this.customer = customer;
            this.duration = duration;
            this.dailyRate = dailyRate;
            this.date = date;
        }

        public void extend(int additionalDays) {
            duration += additionalDays;
        }

        public void extendForWeek() {
            int weekRemainder = RentalCalendar.weekRemainderFor(date);
            final int DAYS_PER_WEEK = 7;
            extend(weekRemainder);
            dailyRate = RateCalculator.computeWeekly(
                customer.getRateCode()) / DAYS_PER_WEEK;
        }

        public void addFee(FeeRider rider) {
            fees.add(rider);
        }

        int getAdditionalFees() {
            int total = 0;
            for (FeeRider rider : fees) {
                total += rider.getAmount();
            }
            return total;
        }

        int getPrincipalFee() {
            return dailyRate
                * RateCalculator.rateBase(customer)
                * duration;
        }

        public int getTotalFee() {
            return getPrincipalFee() + getAdditionalFees();
        }
    }

The first step is to draw circles for each of the variables, as shown in Figure 20.4.

Next, we look at each method and put down a circle for it. Then we draw a line from each method circle to the circles for any instance variables and methods that it accesses or modifies. It’s usually okay to skip the constructors. Generally, they modify each instance variable.
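For a large class, it can help to record the lines as data. The Java sketch below shows one way to do that; it is an illustration, not a technique from the text. Each method circle maps to the circles it has lines to, read off the Reservation code above with the constructor skipped. A simple graph traversal then collects the groups of connected circles, the "lumps". The FeatureSketch name and its methods are invented for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class FeatureSketch {
    // Each method circle mapped to the circles (variables or methods) it has lines to.
    private final Map<String, Set<String>> uses = new LinkedHashMap<>();

    void line(String method, String... features) {
        uses.computeIfAbsent(method, k -> new LinkedHashSet<>()).addAll(Arrays.asList(features));
    }

    // Groups of circles connected by lines ("lumps"), treating lines as
    // undirected and leaving out any circle named in ignored.
    List<Set<String>> lumps(Set<String> ignored) {
        Map<String, Set<String>> neighbors = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : uses.entrySet()) {
            String method = entry.getKey();
            if (ignored.contains(method)) continue;
            for (String feature : entry.getValue()) {
                if (ignored.contains(feature)) continue;
                neighbors.computeIfAbsent(method, k -> new LinkedHashSet<>()).add(feature);
                neighbors.computeIfAbsent(feature, k -> new LinkedHashSet<>()).add(method);
            }
        }
        List<Set<String>> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String start : neighbors.keySet()) {
            if (!seen.add(start)) continue; // already part of an earlier lump
            Set<String> lump = new LinkedHashSet<>();
            Deque<String> work = new ArrayDeque<>();
            work.push(start);
            while (!work.isEmpty()) {
                String node = work.pop();
                lump.add(node);
                for (String next : neighbors.get(node)) {
                    if (seen.add(next)) work.push(next);
                }
            }
            result.add(lump);
        }
        return result;
    }

    // Lines for the Reservation class (constructor skipped).
    static FeatureSketch forReservation() {
        FeatureSketch sketch = new FeatureSketch();
        sketch.line("extend", "duration");
        sketch.line("extendForWeek", "date", "extend", "dailyRate", "customer");
        sketch.line("addFee", "fees");
        sketch.line("getAdditionalFees", "fees");
        sketch.line("getPrincipalFee", "dailyRate", "customer", "duration");
        sketch.line("getTotalFee", "getPrincipalFee", "getAdditionalFees");
        return sketch;
    }
}
```

With nothing ignored, all eleven circles form one lump. getTotalFee only adds two other results, so it can be set aside. Setting it aside splits the sketch into two lumps: fees, addFee, and getAdditionalFees form one, and the remaining seven circles form the other.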

Figure 20.4 Variables in the Reservation class. (One circle each for duration, dailyRate, date, customer, and fees.)

Figure 20.5 shows the diagram after we’ve added a circle for the extend method:

Figure 20.5 extend uses duration.

If you’ve already read the chapters that describe effect sketching, you might notice that these feature sketches look a lot like effect sketches (155). Essentially, they are pretty close. The main difference is that the arrows are reversed. In feature sketches, arrows point in the direction of a method or variable that is used by another method or variable. In effect sketches, the arrow points toward methods or variables that are impacted by other methods and variables.

These are two different, completely legitimate ways of looking at interactions in a system. Feature sketches are great for mapping the internal structure of classes. Effect sketches (155) are great for reasoning forward from a point of change.
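The two kinds of sketch differ mainly in arrow direction, so one can be derived from the other mechanically. The Java sketch below works this way; SketchDirections, reverse, and reachableFrom are illustrative names, not from the text. It first reverses a "uses" map, so that each arrow points from a feature to the methods that use it. It then follows those arrows forward from a point of change. Treat the result as a rough first cut. It approximates an effect sketch but does not replace careful reasoning about return values and modified state.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class SketchDirections {
    // Feature-sketch direction is "method -> what it uses"; reversing every
    // arrow gives "feature -> methods that use it".
    static Map<String, Set<String>> reverse(Map<String, Set<String>> uses) {
        Map<String, Set<String>> affects = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : uses.entrySet()) {
            for (String used : entry.getValue()) {
                affects.computeIfAbsent(used, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return affects;
    }

    // Follows the reversed arrows from a point of change and collects everything
    // reachable (the start itself is included only if it lies on a cycle).
    static Set<String> reachableFrom(String start, Map<String, Set<String>> affects) {
        Set<String> reached = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            for (String next : affects.getOrDefault(work.pop(), Collections.emptySet())) {
                if (reached.add(next)) work.push(next);
            }
        }
        return reached;
    }
}
```

Take the Reservation lines from the feature sketch and start from duration. The traversal reaches extend and getPrincipalFee directly, and extendForWeek and getTotalFee through them.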

Is it confusing that they look somewhat the same? Not really. These sketches are disposable tools. They are the sort of thing that you sit down and draw up with a partner for about 10 minutes before you make your changes. Afterward you throw them away. There is little value in keeping them around, so there is little likelihood that they will be confused with each other.
