Alex Richey

Software Engineer @ Amazon

Notes on Ben Orenstein's "Refactoring from Good to Great"

I watched Ben Orenstein's talk "Refactoring from Good to Great" at the beginning of my career and it made big and, I think, positive impact on how I write code. Here are some notes I took on the original talk to make its content easier to share.

Don't think of the following advice as absolute truth. Some advice may make sense in some cases. Some advice may not. Use your own best judgement.

Method Extraction

Consider the following code. It's decent, but how could it be made better?

require 'date'
require 'ostruct'

class OrdersReport
 def initialize(orders, start_date, end_date)
   @orders = orders
   @start_date = start_date
   @end_date = end_date
 end

 def total_sales_within_date_range
   orders_within_range =
     @orders.select { |order| order.placed_at >= @start_date &&
                              order.placed_at <= @end_date }

   orders_within_range.
     map(&:amount).inject(0) { |sum, amount| amount + sum }
 end
end

class Order < OpenStruct
end

One thing we can do is extract a new method, orders_within_range(), like so.

require 'date'
require 'ostruct'

class OrdersReport
 def initialize(orders, start_date, end_date)
   @orders = orders
   @start_date = start_date
   @end_date = end_date
 end

 def total_sales_within_date_range
   orders_within_range.
     map(&:amount).inject(0) { |sum, amount| amount + sum }
  end

  private

  def orders_within_range
    @orders.select { |order| order.placed_at >= @start_date &&
                             order.placed_at <= @end_date }
  end
end

class Order < OpenStruct
end

There are three upshots to this refactor.

  • We go from one method with two lines to two methods with just one line each. Changes like this aren't always improvements, but they usually are, since they result in methods that are more focused.
  • This refactor makes it more likely for this code to be reused rather than rewritten by another developer.
  • This refactor causes readers of this code to focus on business logic, rather than incidental details. It "gives a hint" to readers that orders_within_range() isn't important to what this report is about.

These wins are small, but, in aggregate, they're worth doing.

Tell; Don't Ask

There's another problem with this code, which is that it violates the tell-dont-ask principle. This principle isn't a law. It's a maxim that, when followed, can sometimes (but not always) lead to better code. Here's the principle:

It's generally better to send a message to an object and have it perform work than to ask an object about its internal state and decide what work to do on its behalf.

We violate this principle in our new helper method when we ask order about its internal state.

def orders_within_range
  @orders.select { |order| order.placed_at >= @start_date &&
                           order.placed_at <= @end_date }
end

This code can be improved by refactoring it in the following way.

class OrdersReport
  ...
  
  def orders_within_range
    @orders.select { |order| order.placed_between?(start_date, end_date) }
  end
end

class Order < OpenStruct
  def placed_between?(start_date, end_date)
    placed_at >= start_date && placed_at <= end_date
  end
end

Now order handles the nuances of comparing dates and the internal details of this comparison don't leak out into code consuming it. This code better follows tell-dont-ask becuase order itself tells us whether it's in between two dates. What's more, in orders_within_range(), we don't ask order about its internal state anymore.

Data Clump

Now there's another code smell. start_date and end_date form a data clump. A data clump is when two or more pieces of information always appear together and are in some way dependent on each other. Another way to think about the idea is that, if one piece of information were removed, would the result be useful or even make sense? Does start_date make sense on its own?

We can create a new class DateRange to store this data clump and to make the mutual dependency between start_date and end_date explicit to readers.

require 'date'
require 'ostruct'

class OrdersReport
  def initialize(orders, date_range)
    @orders = orders
    @date_range = date_range
  end

  def total_sales_within_date_range
    orders_within_range.
      map(&:amount).inject(0) { |sum, amount| amount + sum }
  end

  private

  def orders_within_range
    @orders.select { |order| order.placed_between?(@date_range) }
  end
end

class DateRange < Struct.new(:start_date, :end_date)
end

class Order < OpenStruct
  def placed_between?(date_range)
    placed_at >= date_range.start_date &&
    placed_at <= date_range.end_date
  end
end

Is this change worth it? Bob Martin says that

Intermediate object oriented programmers are too reluctant to extract classes.

We should be aggressive, according to Bob Martin, about extracting classes because of the clarity that such refactors produce. That is to say, it's worth it to make the relationship between start_date and end_date explicit.

Coupling

There are two more concrete reasons that make the above refactor worthwhile. First, we've made our code less coupled. Coupling occurs when changing one piece of software requires you to change another. In other words, when two components are coupled, it's hard to change one component without breaking the other. Low coupling is good because it makes it easier for maintainers to respond to change.

In the above refactor, we reduced parameter coupling. The argument here is that functions that have more arguments are worse than functions that have fewer. The reason for this is that, for every additional parameter, we introduce another bit of potential coupling. In other words, we make it easier for a caller to accidentally cause our function to "blow up." By reducing the number of arguments required above from three to two, we reduce this potential coupling.

Cleaning Up

The second reason the above refactor is worthwhile is that it creates an ideal place for us hang new behavior. This insight leads us to another refactor.

require 'date'
require 'ostruct'

class OrdersReport
  def initialize(orders, date_range)
    @orders = orders
    @date_range = date_range
  end

  def total_sales_within_date_range
    orders_within_range.
	  map(&:amount).inject(0) { |sum, amount| amount + sum }
  end

  private

  def orders_within_range
    @orders.select { |order| order.placed_between?(@date_range) }
  end
end

class DateRange < Struct.new(:start_date, :end_date)
  def include?(date)
    date >= self.start_date && date <= self.end_date
  end
end

class Order < OpenStruct
  def placed_between?(date_range)
    date_range.include?(self.placed_at)
  end
end

It makes more sense for DateRange to do the date comparison than for order to do it, since DateRange is intrinsically concerned with dates and their comparisons. DateRange is also now more useful and more likely to be reused. Moreover, concerns are now clearly separated across modules and the concerns of each module make sense.

Null Object Pattern

Consider the following code.

require 'ostruct'

class JobSite
  attr_reader :contact

  def initialize(location, contact)
    @location = location
    @contact = contact
  end

  def contact_name
    if contact
      contact.name
    else
      'no name'
    end
  end

  def contact_phone
    if contact
      contact.phone
    else
      'no phone'
    end
  end

  def email_contact(email_body)
    if contact
      contact.deliver_personalized_email(email_body)
    end
  end
end

class Contact < OpenStruct
  def deliver_personalized_email(email)
    email.deliver(name)
  end
end

The complexity of the above code is due to the way it handles the case when contact is null. There are several "ugly" conditionals. This code can be improved by refactoring it to follow the null object pattern.

require 'ostruct'

class JobSite
  attr_reader :contact

  def initialize(location, contact)
    @location = location
    @contact = contact || NullContact.new
  end

  def contact_name
    contact.name
  end

  def contact_phone
    contact.phone
  end

  def email_contact(email_body)
    contact.deliver_personalized_email(email_body)
  end
end

class NullContact
  def name
    'no name'
  end

  def phone
    'no phone'
  end

  def deliver_personalized_email(email_body)
  end
end

class Contact < OpenStruct
  def deliver_personalized_email(email)
    email.deliver(name)
  end
end

For the cost of NullContact, we've removed complex conditionals, about 12 lines of code, and made the methods of JobSite more obvious and direct to readers.

There's a problem with this refactor, however, which is that NullContact and JobSite are coupled. If we change JobSite, we might have to add new methods to NullContact. This is a valid concern. In this case, the simplicity we get in exchange is worth the cost, but in other cases, the cost might not be worth it.

Depend Upon Abstractions

Consider the following code, which is concerned with charging customers. BraintreeGem is a payment provider.

class User
  SUBSCRIPTION_AMOUNT = 10.to_money

  def charge_for_subscription
    braintree_id = BraintreeGem.find_user(email).braintree_id
    BraintreeGem.charge(braintree_id, SUBSCRIPTION_AMOUNT)
  end

  def create_as_customer
    BraintreeGem.create_customer(email)
  end
end

class Refund
  def process!
    transaction_id = BraintreeGem.find_transaction(order.braintree_id)
    BraintreeGem.refund(transaction_id, amount)
  end
end

This code is decent, but what if we switch payment providers? To do that, I'd need to do "shotgun surgery," where I'd have to open up many files and change every case where BraintreeGem is used. This is bad because it's labor intensive and error prone. It would be better if I could change BraintreeGem in one place and everything would still work.

What if I don't think it's likely for my payment provider to change? Keep in mind that any software dependency can change and their APIs can change. It's worthwhile to guard against such possibilities.

Here's another way of writing the code above.

class User
  def charge_for_subscription
    PaymentGateway.new.charge_for_subscription(self)
  end

  def create_as_customer
    PaymentGateway.new.create_customer(self)
  end
end

class Refund
  def process!
    PaymentGateway.new.refund(self)
  end
end

# lib/payment_gateway.rb
class PaymentGateway
  SUBSCRIPTION_AMOUNT = 10.to_money

  def initialize(gateway = BraintreeGem)
    @gateway = gateway
  end

  def charge_for_subscription(user)
    braintree_id = @gateway.find_user(user.email).braintree_id
    @gateway.charge(braintree_id, SUBSCRIPTION_AMOUNT)
  end

  def create_customer(user)
    @gateway.create_customer(user.email)
  end

  def refund(refund_model)
    transaction_id = @gateway.find_transaction(order.braintree_id)
    @gateway.refund(transaction_id, order.amount)
  end
end

There's a number advantages to the above refactor.

  • My business logic only knows about PaymendGateway and we're not directly dependent on third-party code.
  • If I change payment providers from BraintreeGem, all I need to do is update PaymentGateway. I don't need to do "shotgun surgery."
  • Testing is improved. I only need to test my code that depends on PaymentGateway which is easier to stub. I don't test my dependencies.

When to refactor?

Kent Beck, author of Extreme Programming and other books, says

for each desired change, make the change easy (warning: this may be hard), then make the easy change

Good candidates for refactoring include "God" objects, high-churn files, and files with bugs.

  • "God" objects are objects that many things depend on and which violate the single responsibility principle.
  • High-churn files are files that change a lot. This usually means that the ideas expressed in these files are not well understood.
  • The presence of one bug often indicates the presence of another.