Thursday, 9 June 2011

Abstraction and OO

One observation I've heard when people start working with OO (especially when shown something silly like OOO) is that all the abstraction and indirection makes it difficult to get a clear picture of all the execution paths through the code. A colleague of mine (not a dev, but someone with a fair amount of programming experience) once told me OO makes it difficult to hold the entire program in your head at once. And this is certainly true to an extent, because that isn't what OO is for. For this post I wanted to provide a quick overview of how I think about OO and abstractions.

Understanding procedural code

For many of us our first introduction to programming is procedural code. We use variables with different scopes, conditionals and program flow operators like if, while and for, as well as function calls to jump around. This makes it fairly easy and natural for us to trace through program execution by stepping through the code. Now you quickly get to a level of complexity where you can no longer hold the entire program in your head, but there is a certain sense of reassurance that we could start from the beginning of the code, and step our way through to the end to understand the code.

By comparison OO can seem like you're drowning in a sea of indirection. You no longer just step through the functions being called, but you also need to understand the state of the objects those functions live in, what types are in use due to polymorphism, or even which method will be called once virtual method dispatch is taken into account. For example, you may try and trace the execution of an IFoo.DoSomething() call, only to find you have no idea which implementation of IFoo is being used. Maybe it's a CompositeFoo which aggregates a WidgetFoo and a GadgetFoo, and GadgetFoo may have an instance of an IAmAtABar, which will completely change what its DoSomething() method does. Surely this OO is a horrible beast to be avoided at all costs! (Cue functional programmers nodding in agreement ;))
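To make the IFoo scenario concrete, here is a minimal sketch in Python (the post's own examples are C#-flavoured, so this is a translation rather than real code from any project). The class names mirror the ones above; the at_a_bar flag is an assumed stand-in for the IAmAtABar collaborator, and the returned strings are made up purely so the behaviour is visible.

```python
from typing import Protocol


class Foo(Protocol):
    """Plays the role of IFoo: anything with a do_something() method."""

    def do_something(self) -> str: ...


class WidgetFoo:
    def do_something(self) -> str:
        return "widget"


class GadgetFoo:
    def __init__(self, at_a_bar: bool) -> None:
        # Assumption: a simple flag stands in for the IAmAtABar collaborator.
        self.at_a_bar = at_a_bar

    def do_something(self) -> str:
        # Internal state changes what this call actually does.
        return "gadget at a bar" if self.at_a_bar else "gadget"


class CompositeFoo:
    def __init__(self, *parts: Foo) -> None:
        self.parts = parts

    def do_something(self) -> str:
        # Delegates to whatever implementations it was given.
        return " + ".join(part.do_something() for part in self.parts)


def run(foo: Foo) -> str:
    # Reading this function alone, we cannot tell which do_something() runs.
    return foo.do_something()
```

Calling run(CompositeFoo(WidgetFoo(), GadgetFoo(at_a_bar=True))) walks through three different implementations, none of which is visible from run() itself.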

Hiding details is the point of abstraction

This is not a problem with OO. This is the very point of OO. OO allows us to abstract away details so we only deal with a cohesive, understandable amount of information relevant to the abstraction level at which we're working. It's ok not to understand the whole thing at once. I'm pretty sure you're not meant to.

Rather than tracing through a procedural program from top to bottom, for OO programs we move sideways along a plane of abstraction to find the collaborators at that level. Our ProcessOrderCommand calls BillCustomerCommand and ShipInventoryCommand. I don't know that BillCustomerCommand checks the customer is in our loyalty program and this order qualifies for a 10% discount. That's a different level of detail that lives at another level of abstraction. All these little details are mercifully hidden so we can understand that processing an order means billing a customer and shipping some inventory to them.
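The order-processing example might look roughly like this in Python. It is only a sketch: the Order fields, the execute() method name and the in-memory lists that record what happened are all assumptions for illustration, and the "order qualifies" check is reduced to a single loyalty flag.

```python
from dataclasses import dataclass


@dataclass
class Order:
    customer: str
    total: float
    in_loyalty_program: bool = False


class BillCustomerCommand:
    def __init__(self) -> None:
        self.charges: list[tuple[str, float]] = []

    def execute(self, order: Order) -> None:
        # Detail hidden from callers: loyalty members get a 10% discount.
        amount = order.total * 0.9 if order.in_loyalty_program else order.total
        self.charges.append((order.customer, round(amount, 2)))


class ShipInventoryCommand:
    def __init__(self) -> None:
        self.shipped: list[str] = []

    def execute(self, order: Order) -> None:
        self.shipped.append(order.customer)


class ProcessOrderCommand:
    def __init__(self, bill: BillCustomerCommand, ship: ShipInventoryCommand) -> None:
        self.bill = bill
        self.ship = ship

    def execute(self, order: Order) -> None:
        # At this level, processing an order is just "bill, then ship".
        self.bill.execute(order)
        self.ship.execute(order)
```

Reading ProcessOrderCommand tells us what processing an order means without mentioning discounts at all; we only drop down into BillCustomerCommand when billing is the thing we care about.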

Holding all the combinations of state at each level of abstraction in our heads becomes a near impossibility, but we don't need it. That's what our abstraction is for; encapsulating all the details and freeing us to work at the optimum level of abstraction for our current problem. We can then switch between levels of abstraction to get the information relevant to the problem we're trying to solve.

How can obscuring details be a good thing?

We pay a price for the traceability of procedural code. It tends to be hard to change because the code is all about implementation; the "how" rather than the "what" or "why". This can also make it hard to test, because isolating a section of code from the execution state is difficult, which makes it even harder to change with confidence.

OO trades off some of this traceability for the ability to use abstractions in the form of objects in our code. By hiding details at one level we can better see the main features of another; we can see the forest for the trees. Abstractions also let us express the "what" and "why" of the code. Because we're programming to abstractions we can potentially change the details those abstractions encapsulate without affecting the rest of the system. In fact we can modify the behaviour of our system just by adding objects, rather than modifying existing code (see the Open/Closed Principle of SOLID). This encapsulation of details also lets us isolate small units for testing purposes.
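As a small illustration of changing behaviour by adding objects rather than editing existing code, here is a hedged Python sketch. PricingRule, LoyaltyDiscount, FlatShipping and Checkout are hypothetical names invented for this example, not anything from a real codebase.

```python
from typing import Protocol


class PricingRule(Protocol):
    def apply(self, price: float) -> float: ...


class LoyaltyDiscount:
    def apply(self, price: float) -> float:
        return price * 0.9  # 10% off


class FlatShipping:
    def __init__(self, fee: float) -> None:
        self.fee = fee

    def apply(self, price: float) -> float:
        return price + self.fee


class Checkout:
    """Closed for modification: it only knows about the PricingRule shape."""

    def __init__(self, rules: list[PricingRule]) -> None:
        self.rules = rules

    def total(self, price: float) -> float:
        # Apply each rule in order; new rules need no change here.
        for rule in self.rules:
            price = rule.apply(price)
        return round(price, 2)
```

Adding FlatShipping later meant writing one new class and passing it in; Checkout and LoyaltyDiscount stayed untouched, and each rule can be tested in isolation.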

Abstraction can be painful

Now it's important to realise there are costs to the OO approach. We've talked about losing some of the ease with which we could trace through procedural code. This can also make concurrency difficult when we need to synchronise bits of state in different places. There can also be problems with the impedance mismatch between abstract concepts and technical implementation. A related problem is that of leaky abstractions, where the encapsulation we've chosen breaks down in places and affects other parts of the system, increasing coupling and actually making the code harder to change (the opposite effect of what OO is designed for).

While we can mitigate these problems, it's worth acknowledging that we will experience some level of pain from all these issues when using OO. OO gives us a lot, but there's no such thing as a free lunch. This is, incidentally, a great reason to look at other programming paradigms, as well as different frameworks and languages that all address these issues in different ways and to differing extents. Combining techniques (such as functional and object-oriented) can help give you some of the best of all worlds.

Getting the most from abstractions

There is a whole lot of design guidance that can help us with OO abstractions: SOLID, the 4 rules of simple design, TDD and related disciplines, GRASP, etc. I really recommend looking into all that stuff, but for this post I want to look at it from the more general viewpoint of what we want to get out of the abstractions (read: rant).

Note: Just to clarify, when I talk about abstractions here, I'm referring to an object or group of objects that encapsulate some part of your system. We're really talking OO design, but in the general terms of abstraction.
  • Don't mix levels of abstraction. Each piece of an abstraction should be at a similar level of detail.
  • Abstractions are lots of work. Don't have one if you're not willing to look after it. You will need to nurture it and help it grow into a useful member of your design society. Corollary: don't use too many abstractions. They should be small and cohesive. The aim is to do more with less code.
  • Define abstractions around things that need to change together. If you need to add lots of views to your app, writing a new view should not require changing bits of 7 different abstractions. Similarly, if you are only going to be using SQL Server, you don't need to abstract that fact away so you can plug in a new DB engine (although your data access details will probably live at a similar level of abstraction, so it won't be impossible to change either). Optimise for the things that change all the time. One big class that never needs to change is preferable to 30 tiny classes that all need to change all the time.
  • Favour wide abstractions over deep ones. In other words, favour aggregating/composing several collaborators, rather than having many layers of objects (A uses B uses C uses ... Z). Having to traverse many layers of abstraction down a deep hierarchy to pass one new piece of data is soul crushing.
  • Keep data close to where it is used. Having to pass the same piece of data through many abstractions is also soul crushing.
  • Don't hide the important stuff. You want to abstract away the unnecessary details, not the key feature of what you're working on.
  • Obey the Law of Demeter. She will try and tell you when your abstractions are leaking. You can then fix these leaks by applying the Tell, Don't Ask principle.
  • Tests can be another good guide to tell you when your abstractions are going wrong. If you need to do loads of setup or reach into lots of different collaborators then your abstraction is wrong. Unfortunately it doesn't tell you how to do it right, but if you write the test you wish you had first, then you have a better chance of getting the abstraction you need.
  • Avoid creating abstractions for testability alone. A well abstracted design should be naturally testable, but not all testable designs are well abstracted. The Reused Abstractions Principle (RAP) tells us to look for valuable abstractions. Just testing against an interface does not mean we have a good abstraction. (Thanks to Xerx for pointing this out.)
  • Modelling the real world as objects is probably not what you're after. Read Uncle Bob's Coffee Maker example (linked to here). (Aside: this doesn't conflict with the goals of DDD, which works to accurately reflect business needs and concepts.)
  • Layered / n-tier architectures do not necessarily help with abstractions. If you need to force your classes into presentation, business logic and data access layers, then your abstractions are not free to grow or (more importantly) shrink as required. Maybe the abstractions you choose will naturally group together into layers, or maybe each abstraction will have its own layers. I think forcing it can be detrimental though. If you want to change how you are loading data for a screen (say, optimised query instead of what your ORM generates), you don't want to fight the weight of every other data access abstraction in the code.
  • Separate infrastructure code concerns from app-specific abstractions. Abstractions should be modular; the infrastructure can compose them together. An example of this infrastructure is your friendly neighbourhood IoC container. (We also want to keep infrastructure small and prevent it bleeding into our abstractions; again, less is more.)
  • Ideally you want to be able to try out new abstractions without fighting against the existing ones and without breaking too many conventions. Admittedly I have not found a good solution for this, but keep it in mind when designing and consider how much work it would be to do this differently if required for another feature.
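Of the points above, the Law of Demeter and Tell, Don't Ask are probably the easiest to show in code. The following Python sketch uses made-up Wallet and Customer classes (assumptions for illustration only) to contrast reaching through an object with telling it what to do.

```python
class Wallet:
    def __init__(self, balance: float) -> None:
        self.balance = balance

    def debit(self, amount: float) -> None:
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount


class Customer:
    def __init__(self, wallet: Wallet) -> None:
        self._wallet = wallet

    def pay(self, amount: float) -> None:
        # The customer owns the decision about how payment happens.
        self._wallet.debit(amount)


def charge_by_asking(customer: Customer, amount: float) -> bool:
    # Leaky: the caller reaches through Customer into Wallet's state,
    # so it now depends on how both classes are built.
    wallet = customer._wallet
    if wallet.balance >= amount:
        wallet.balance -= amount
        return True
    return False


def charge_by_telling(customer: Customer, amount: float) -> None:
    # Tell, Don't Ask: one call to a direct collaborator.
    customer.pay(amount)
```

With charge_by_telling, Customer could switch from a Wallet to some other payment mechanism without the caller noticing; charge_by_asking would break.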

Finally, remember there is no single right abstraction for a given problem. This still seems more art than science, and practice and experience will trump all the rules and guidelines you can find. The trick is finding ways of experimenting productively with different options in both real and hobby projects.

Friday, 3 June 2011

Random software dev thoughts (June 2011 edition)

I've been struggling with a few thoughts on software development lately, and said as much on Twitter. OJ suggested I blog it, so here goes a rambling, incoherent post (even more so than usual). Entirely OJ's fault. ;)

Abstraction

Back in the old days I spent more time than I care to admit writing procedural code in classes. More recently I've also played around with Over-enthusiastic OO (OOO), using insanely fine-grained objects for everything. These approaches roughly correspond to not enough and too much abstraction.

And so begins my hunt for the right amount of abstraction. I find fine-grained objects with single responsibilities really nice to work with at a micro level, but at a macro level the weight of useless abstractions can be crushing.

This weight is a symptom of violating the Reused Abstractions Principle (RAP), which Mark explains very nicely. (I've also experienced this as the NAF problem, or "Not Another Factory". Anyone else felt this?) The trick is finding the "right" abstraction that captures a reusable aspect of the system. This is made harder in C# by the fact(?) that it makes effective unit testing difficult without creating abstractions everywhere.

So my options include hoping to drive out the right abstractions and assuming that this will magically make it easy to test without using otherwise useless abstractions, or starting to use more real stuff in unit tests (having larger units, with potentially more suitable abstractions), in which case my tests get messier as I throw loads of data through them to exercise all the code paths.

One thing I'll definitely rule out is going back to procedural code where classes are little more than a namespace, as I found this quickly became unmanageable; difficult to change, impossible to test effectively, and a nightmare to build on. At least my poorly chosen, testable abstractions are better than untestable spaghetti.

Refactoring

Another thing I'm struggling with is refactoring. I find the distinction between refactoring and redesign helps, but it's not always quite that simple. Sometimes despite refactoring the little section of the code you're currently working in and around, the tendrils of previous design decisions still reach out across many other classes. Once you have coupling like this, when can you ever clean it up by refactoring alone?

Contrived example: if you have ye olde layered architecture and are passing data (or transformations of the same data) through the layers and through multiple classes, untangling all the places this data has spread can only really be tackled in a redesign. And, as described in my original post, once you are in the territory of redesign you are optimising for today's cases and possibly making your next change harder if it tries to flex in a different direction. On the other hand, the overall design is getting increasingly bloated and harder to understand as you continue to build on it while making localised design changes using refactoring.

Is it possible to gradually refactor even these wide-spread problems? Or are there times when you just have to bite the bullet and redesign? Is it just a case of writing off the redesign time that could otherwise be spent on features your customers care about, in the hope it will speed you up later? Or is it a case of admitting defeat; the design can actually become too compromised to completely fix if we miss too many cues to refactor, and we just have to go into legacy code mode and keep making local improvements in the hope that these isolated changes will eventually link up and restore a fairly clean, usable design?

End rant

So those are the problems that are currently weighing on my mind.

On one hand, I can see an incredible amount of value in using very fine abstractions, but on the other abstractions can quickly turn into a dead weight that impedes progress. I'm torn between continuing to get the benefit of lots of small classes, and trying to reduce the amount of weight and noise contributed by the less-useful of my abstractions.

On the refactoring side, I've advocated small, localised refactoring in the hope of gradually pushing a design into good shape, but I've also observed times when this just doesn't seem possible. In these cases the only options seem to be redesigning larger areas of code, or just capitulating to entropy and keeping small islands of cleanliness within the mess (this is in keeping with Eric Evans' somewhat depressing observation that "the natural state of all software is a big ball of mud"). The problem here is trying to decide what approach to take, and balancing time invested now versus the future cost of not making that investment. I'm also bothered by the idea that the only evidence we have to base this decision on is poorly-informed guesswork and gambling based on that.

Experience over multiple projects would most definitely help with this, but as the real pain of this stuff seems to be felt in large-ish, multi-year projects, this experience and the chances to experiment with different approaches are hard to come by.

As a quick aside, I'm pretty sure these two problems I'm having are closely related.

Appreciate any thoughts or counselling on offer. :)