3. Object Oriented Thinking
Before we can get onto the really meaty stuff, it’s important for us to take a pause and check on our understanding of what object oriented programming is. Even though this isn’t a book for those first starting out in their development career, it becomes all too easy to form a fixed opinion of OO stuff when we first come across the idea as developers.
Further to this, there’s a plethora of online tutorials that all take pretty much the same approach to explaining it and, as such, these tend to propagate some rather fixed thinking in this area.
If we’re going to build our palaces and castles in beautiful and elegant PHP code, we must turn our attention first to what we are intending to build upon. In order to raise glorious edifices of logical magnificence that not only blend into the world wide web’s skyline but actually help to define it, we must focus on the groundwork.
Our applications need to not only suffer the slings and arrows of outrageous fortune but also stand firm against the whims and change requests of our business teams and product managers. These change requests are the earthquakes and floods that our application development must withstand. We know that they’re going to come. We can brace ourselves effectively against the flood. Just so long as our foundations are rock solid.
So let’s go back to the basics and examine what we already know.
All too frequently, a tutorial will present an object as a representation of a real world “thing”. The developer is expected to hang on to this notion while the author goes on to explain how real world things have a particular set of characteristics and attributes that define what the thing is and what it does.
The benefit of this approach is that the examples given are already familiar to the reader and as such allow him or her to connect the concepts with current knowledge and experience. Anyone setting out to learn object oriented PHP will know what a car is. Or that a dog is a type of animal. For anyone approaching object oriented development from a procedural background, something that is certainly prevalent in the PHP arena, this relationship between code and real-world objects can help the developer reach that “penny drop” moment sooner; that point where he or she will suddenly “get it”.
The danger here though, and it’s a pitfall that many of us have fallen into, is that the developer starts to cling on to this idea of linking the objects they create with real world examples. The next project that they take on, they’ll start hunting down the “nouns” in the project brief and planning their objects around them. This here is a User, that over there is a Product. It’s a perfectly valid start to the process of identifying and designing the objects that will be the key players in our new application. But it is only a start. Unfortunately, that is commonly where the tutorials end.
If you’re going to be guiding and mentoring the more junior members of your team, you’re going to see some quite iffy code along the way. Just to make sure that we’re on the same page, so to speak, I would like to set out the path that we’re going to take in order to reach object oriented thinking. It starts with a shiny new junior, a likeable chap that we’ll call Joe. The route that Joe has taken through the PHP learning landscape in order to arrive at our office OO ready is not an uncommon one. I’d like to say that it’s entirely fictional but that wouldn’t be quite true. You see, Joe’s path was actually the path that I took albeit with some hearty doses of artistic license added here and there. I have no shame.
From the outset, Joe learnt to script in PHP, building out the pages of his sites with one script for each. Here an index.php, there an aboutus.php. Things such as database access and variable assignments could be done at the top of the file, then down below the page itself is built up with HTML and peppered with inline PHP constructs. At the top, the program logic, at the bottom, the output.
After a while, Joe realises that he’s duplicating significant amounts of code across his scripts. This is the point where he starts breaking chunks out into separate files; header, footer, routines for accessing the database, others for building HTML tables. This of course is all in accordance with the tutorials that he has been following regarding the use of the include and require statements.
Before long, he’s creating libraries of commonly used functions which he can port from one project to another. Big old PHP files with names such as database.php, html.php and other such collections of useful functions gathered together in a single file.
What happens then when Joe starts reading up about object oriented development? He’s introduced to a Car class that not only has properties for things like the wheels and the engine, but also methods (functions!) for when the car needs to do things like move(), turn() and stop().
Joe thinks it’d be a great idea to wrap his carefully crafted library of functions in class statements. This now is the point where Joe could take one of two paths. Does he instantiate a Database object in order to use those transplanted library methods? Or does he add the static keyword to the function declarations so that he can call the class methods statically?
Well, instantiating the object doesn’t really look to be terribly useful. Let’s go with the static.
Now Joe’s coding regularly features things like this:
<?php
include("Config.php");
include("Database.php");
$conn = Database::connect($dbname, $dbuser, $dbpass);
Youch.
Eventually our developer will make the transition from wrapping his function libraries in class statements to identifying the nouns in his system and building objects around those. This is very much in line with the tutorials that he has followed. What we see next from Joe is the predominant but natural outcome of those tutorials and their habit of finishing a topic early.
Joe’s classes have become enormous. The classes at the centre of the application, whether this be a User class or a Product class, are truly huge, spanning thousands of lines of code and with methods so long the start and end of them cannot be viewed onscreen at the same time. Scroll, scroll, scroll.
What’s going on here?
The developer has fallen back on his procedural code knowledge once more in order to code up object methods that, rather than performing a single function, run through an entire process from start to finish. Perhaps the most obvious example of this is an object method, probably located in a class called ‘User’ and most likely named something along the lines of ‘create’ or ‘register’.
I’ll grant you that for many web applications, the user registration process can be a convoluted one, performing validation against a number of submitted form fields, creating a user record along with the login credentials, possibly also storing an address and linking the newly created user to it as well as hooking up any number of configuration settings. What has happened here is that the developer has taken a procedural process and simply transplanted it into an object method. What used to live as a single PHP page for receiving and processing a user registration has now been moved wholesale into a single method in the User class. Not a literal copy and paste operation you understand, but a selective extraction and remodelling of the code to squish it in between those opening and closing braces.
Joe starts by validating the parameters that were passed in to the register() method. If that’s all fine and dandy, he moves onto performing the database ops necessary to get the data into storage and extract the ids. This part may result in just one query being run, or it could be many; the basic user details could be accompanied by a row of default preference settings in one table, a physical home address in another. If that all proceeds ok, Joe then sends out the welcome-cum-verify email before finally returning a true or false back to the original invoker of the method.
In just that one method, we have a minimum of three fracture points - places where the process can fail - namely the validation stage, the database stage and the email sending stage. This leads to a brittle design that can fall over in a number of ways and be difficult to maintain at the same time.
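To make those three stages concrete, here is a minimal sketch of the kind of register() method that Joe might write. The method signature, the eight character password rule and the two static arrays (which stand in for the database and the mail server so that the example runs on its own) are all assumptions made purely for illustration.

```php
<?php
// Anti-pattern, shown only to illustrate three stages crammed into one method.
class User
{
    /** @var array[] Hypothetical stand-in for the user table. */
    public static array $storedRows = [];

    /** @var string[] Hypothetical stand-in for the outgoing mail queue. */
    public static array $sentEmails = [];

    public function register(string $email, string $password): bool
    {
        // Stage 1: validation (assumed rules: valid email, 8+ character password).
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false || strlen($password) < 8) {
            return false;
        }

        // Stage 2: storage. A real version would run one or more queries here.
        self::$storedRows[] = [
            'email'         => $email,
            'password_hash' => password_hash($password, PASSWORD_DEFAULT),
        ];
        $userId = count(self::$storedRows);

        // Stage 3: email. A real version would send the welcome-cum-verify email here.
        self::$sentEmails[] = "Welcome! Please verify account {$userId} for {$email}";

        return true;
    }
}

$user = new User();
$accepted = $user->register('joe@example.com', 'letmein123');
$rejected = $user->register('not-an-email', 'letmein123');
```

Notice that the caller only ever receives a true or a false, so it has no way of telling which of the three stages let it down.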
Now is a good time to introduce a key principle to object oriented thinking. I don’t recall where I first encountered this one but it has stayed with me ever since. It goes like this: An object should either know things or do things, but never both.
So many of the applications that we build are going to have a User model object. If such a model object represents what we know about a particular user, and we know that user registration is a process then it naturally follows that our User model class cannot have a register() method.
The more that you think about this, the more it makes sense. After all, why should all of our instances of the User class be lugging around a method whose purpose is to create the user record in the first place? When would such an instance have need of the register() method again?
If you were to take that knows things or does things principle and apply it to the model layer of your most recent application, how many model classes would it suggest that you change? How many of the entities in your model both know the details of the thing that they represent (i.e. hold its data) and provide ways to manipulate that data beyond the act of setting and getting it?
In most cases, the primary residents of our model layer will be objects that represent the data that lives inside our application. In this sense, these are the objects that know things. For instance, suppose we have an application that’s going to be handling lots of User instances. We ought to be confident that each instance knows the name, date of birth and email address of the User that it represents. In all likelihood, a full blown application will have User models that hold a lot more detail than that but this will serve us as a good starting point for the time being. None of these instances should be holding methods that go beyond managing the individual pieces of data that they represent. The methods of our model classes should be entirely introspective. Setters and getters are naturally of this ilk but what about the methods that we can identify as being processors?
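As a rough sketch, a User that only knows things might look like the following. The three pieces of data come from the description above; the constructor signature, the method names and the choice of DateTimeImmutable for the date of birth are assumptions of mine, and the typed properties need PHP 7.4 or later.

```php
<?php
// A "knower": holds the data for one user and offers nothing beyond getting and setting it.
final class User
{
    private string $name;
    private DateTimeImmutable $dateOfBirth;
    private string $email;

    public function __construct(string $name, DateTimeImmutable $dateOfBirth, string $email)
    {
        $this->name        = $name;
        $this->dateOfBirth = $dateOfBirth;
        $this->email       = $email;
    }

    public function getName(): string
    {
        return $this->name;
    }

    public function setName(string $name): void
    {
        $this->name = $name;
    }

    public function getDateOfBirth(): DateTimeImmutable
    {
        return $this->dateOfBirth;
    }

    public function getEmail(): string
    {
        return $this->email;
    }

    public function setEmail(string $email): void
    {
        $this->email = $email;
    }
}

$user = new User('Joe Bloggs', new DateTimeImmutable('1990-04-01'), 'joe@example.com');
$user->setEmail('joe.bloggs@example.com');
echo $user->getEmail(), PHP_EOL;
```

There is no register(), no validation and no emailing in sight. Every method looks inwards at the data that the instance holds.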
What do I mean by processors?
Processors are methods that do things. A method that validates user input is a processor. A method that triggers the sending of an email is a processor. In almost every case, unless a processor is specifically introspective, it can be moved out into a new object that’s designed to handle, to encapsulate that process.
For the registration procedure, ideally what we are looking for is a whole range of objects all collaborating in the user account creation process. Each object will have a tightly defined area of focus, performing a single task and performing it well. Having each tiny piece operating as a part of the whole is our goal here. We’re looking for a range of validator objects, each responsible for checking one part: a password validator can confirm that the offered password has the right number and range of characters, while a date validator can confirm that a submitted date of birth is in the right format and, perhaps more importantly, is within the correct range (over 18s only?).
When we take this approach, we’re neatly separating the logic that performs validation away from the logic that performs record creation. Continuing in an ideal fashion, our process for record creation should be nicely squirrelled away and separate from the objects that represent those data records in the first place.
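One possible shape for those collaborating validators is sketched below. The Validator interface, the class names, the field names and the rules themselves (eight characters with a letter and a digit, a Y-m-d date, a minimum age of 18) are all hypothetical; the point is only that each object checks one thing.

```php
<?php
interface Validator
{
    public function isValid(string $value): bool;
}

final class PasswordValidator implements Validator
{
    public function isValid(string $value): bool
    {
        // Assumed rule: at least 8 characters, including a letter and a digit.
        return strlen($value) >= 8
            && preg_match('/[A-Za-z]/', $value) === 1
            && preg_match('/\d/', $value) === 1;
    }
}

final class DateOfBirthValidator implements Validator
{
    private DateTimeImmutable $today;
    private int $minimumAge;

    public function __construct(DateTimeImmutable $today, int $minimumAge = 18)
    {
        $this->today      = $today;
        $this->minimumAge = $minimumAge;
    }

    public function isValid(string $value): bool
    {
        $date = DateTimeImmutable::createFromFormat('!Y-m-d', $value);

        // Reject unparseable input and dates that PHP rolled over, such as 1990-02-31.
        if ($date === false || $date->format('Y-m-d') !== $value) {
            return false;
        }

        // The date of birth must fall on or before today minus the minimum age.
        return $date <= $this->today->modify("-{$this->minimumAge} years");
    }
}

// Runs each field through its own validator and reports which fields failed.
final class RegistrationValidator
{
    /** @var array<string, Validator> */
    private array $validators;

    public function __construct(array $validators)
    {
        $this->validators = $validators;
    }

    /** @return string[] Names of the fields that failed validation. */
    public function failures(array $input): array
    {
        $failed = [];
        foreach ($this->validators as $field => $validator) {
            if (!$validator->isValid($input[$field] ?? '')) {
                $failed[] = $field;
            }
        }
        return $failed;
    }
}

$checker = new RegistrationValidator([
    'password' => new PasswordValidator(),
    'dob'      => new DateOfBirthValidator(new DateTimeImmutable('2024-06-01')),
]);

// 'hunter2' is only seven characters long, so the password field is reported.
$failed = $checker->failures(['password' => 'hunter2', 'dob' => '1990-04-01']);
print_r($failed);
```

Passing today’s date into the date validator, rather than having it ask the clock, is a design choice that keeps the object easy to test.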
Now that you’ve just read that aside (you did, didn’t you?), I’m going to make my first mention of the Single Responsibility Principle. The Single Responsibility Principle is, to my mind, the single most important of the five principles that make up the SOLID set. It also pleases me greatly that it’s the first one in the set. Familiarity with the SRP can only help to reinforce the idea that our objects should either know things or do things, but never both. If we have objects in our system that know things and do things at the same time, it’s a reasonably safe bet that we’re already violating the Single Responsibility Principle. When we get to that chapter, I hope to make it clear why this is.
Returning to Joe then, we know that his tutorials taught him to build his objects based around the nouns of his system. We also know that those self same beginner tutorials didn’t tell him when to stop adding methods to his objects. The good news is though that we’re now in a much better position to enlighten him as to when he’s putting too much into a single class.
Regrettably it’s not so easy to draw a line between the knowing and the doing. Adding processors to an object that’s only supposed to be knowing things is all too easy to do. Worse still, it usually begins with the tiniest little thing and before you know it, the slow but inexorable creep towards bloated classes has begun. How then are you supposed to watch out for this, outside of an all-out code audit?
Taking a finger in the air approach, you should start to feel uncomfortable whenever any of the following signs appear in an object method.
Conditional statements such as an if statement or a switch appear in a method, and those conditionals are not used in performing validation but are selecting different logic paths to follow based on an incoming parameter. Try to restrict the use of ‘if’ statements to validation only. In the event that you’re creating branched processes in your code because of the value of a particular property, you’re almost certainly going to be better served by creating an independent object for each branched process and utilising something like the Strategy pattern, or the Chain of Responsibility pattern in order to handle the processing.
You can’t see the start and end of a particular method at the same time. If a single method occupies more than a single screenful in your editor of choice, you have a problem. Look carefully at those methods to see if you can’t at least break them down - the chances are good that they’re doing more than one task. As a general rule of thumb, I’d suggest ensuring that your methods contain no more than twenty lines of active code.
There are lots of comments inside your methods. Nicely documented code is a good thing, but if you find a method that feels the need to explain every step it’s taking, it’s either taking too many steps or the author thinks you’re a numpty. The best methods are nice and short, with easy to follow code and a terse but helpful explanation of the intent in the docblock above it.
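Taking the first of those signs as an example, here is a minimal sketch of the Strategy pattern standing in for a switch on an incoming parameter. The discount scenario and every class name in it are hypothetical, chosen only to keep the example short. Amounts are held as whole pence to sidestep floating point surprises.

```php
<?php
// Each branch of what would have been a switch becomes its own small object.
interface DiscountStrategy
{
    public function apply(int $totalInPence): int;
}

final class NoDiscount implements DiscountStrategy
{
    public function apply(int $totalInPence): int
    {
        return $totalInPence;
    }
}

final class PercentageDiscount implements DiscountStrategy
{
    private int $percent;

    public function __construct(int $percent)
    {
        $this->percent = $percent;
    }

    public function apply(int $totalInPence): int
    {
        // Round to the nearest whole penny.
        return (int) round($totalInPence * (100 - $this->percent) / 100);
    }
}

final class Checkout
{
    private DiscountStrategy $discount;

    public function __construct(DiscountStrategy $discount)
    {
        $this->discount = $discount;
    }

    public function totalFor(int $subtotalInPence): int
    {
        // No if or switch here: the injected strategy decides what happens.
        return $this->discount->apply($subtotalInPence);
    }
}

$checkout = new Checkout(new PercentageDiscount(10));
echo $checkout->totalFor(20000), PHP_EOL;
```

Adding a new kind of discount now means adding a new class, not threading another case through an existing method.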
These are the types of things that we need to be looking out for when we’re reviewing the code of our more junior team members, and indeed the code that we produce ourselves. Detecting code smells is a knack that comes with both knowledge and experience but just by being aware of these three things, you’re already well on your way. Nevertheless, code smells are certainly rife in PHP. Somehow it just seems to be something that we in the PHP community have grown up with, although of course there are plenty of examples to be found in other languages too. Even so, they are certainly something that we need to guard against. Much of the advice in the upcoming chapters is geared towards not so much how to avoid code smells, but how to take the right approach to creating an application whose objects don’t stink.
Martin Fowler has an excellent bliki post on code smells which succinctly explains what they are but in doing so he makes mention of anaemic objects that might benefit from having behaviours added to them. I’m not in full agreement with the brevity of this post since I really am very keen to put forward the notion that any single class, and the objects that are instantiated from it, should have a laser-focussed intent and purpose. If you’re interested, you can find a list of the more common code smells online, but I would actually be keen to suggest that you look them up after we’re done with the first part of the book. Reading about them afterwards is much more likely to reinforce what you will have already read at that point.
Anyway, let’s bring this back on topic. If we were to continue along the path of tightening the focus of our imaginary User class, what questions should we be asking? We’ve already considered the possibility of removing the register() method since we’ve determined that it doesn’t belong within instances of our User class. How about password handling? This is perhaps the second most prevalent wrong thing to be found within a user model. For sure, we may want to accept and hold the hashed value of a user’s password but do we actually want to incorporate the hashing mechanism within the user class itself?
The immediate answer seems to be yes, since it’s something that we’ll be doing only in conjunction with the user’s own data. Nevertheless, we need to consider all of the things that we might want to do with passwords. For starters, we’ll need to be able to accept a password from the user when they register for an account, which then needs to be hashed appropriately. Obviously, we will also need to be able to check a password when they log in, generating a hash of the password that they’ve given us and checking it against the hash that we’ve already stored. Already, we have two processors for the most basic of operations.
Experience tells us that we are also going to have to provide some sort of password reset mechanism, since some of our users are likely to fall squarely into the can’t remember passwords for toffee camp. Do we also need to implement a lock-out mechanism after three failed login attempts?
Answering these questions leads us to the conclusion that actually, building password handling logic into our User class is maybe not such a good idea after all. Instead, we can wrap up all of these password related methods inside a new PasswordManager class, instances of which can either be injected into our User instances at creation time, or lazily loaded on request dependent upon our appetite for tight/loose coupling between a user instance and a password management processor.
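A minimal sketch of such a PasswordManager might look like this. The method names are my own assumptions, while password_hash(), password_verify() and password_needs_rehash() are PHP’s built-in password functions. The reset and lock-out mechanisms are left out to keep the example short.

```php
<?php
// A "doer": knows nothing about any particular user, it only processes passwords.
final class PasswordManager
{
    // Hash a plain text password for storage at registration time.
    public function hash(string $plainText): string
    {
        return password_hash($plainText, PASSWORD_DEFAULT);
    }

    // Check a password offered at login against the hash that we already hold.
    public function verify(string $plainText, string $storedHash): bool
    {
        return password_verify($plainText, $storedHash);
    }

    // Report whether a stored hash was made with an older algorithm or options.
    public function needsRehash(string $storedHash): bool
    {
        return password_needs_rehash($storedHash, PASSWORD_DEFAULT);
    }
}

$passwords = new PasswordManager();
$hash = $passwords->hash('correct horse battery staple');

var_dump($passwords->verify('correct horse battery staple', $hash)); // bool(true)
var_dump($passwords->verify('wrong guess', $hash));                  // bool(false)
```

The User instance only ever needs to hold the resulting hash. How that hash was produced, and how it gets checked, is no longer its concern.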
Simply by hiving off two very common processes, that of registration and password management, we’ve not only improved the focus of the User class dramatically, we have also created two additional classes with tightly defined areas of responsibility. That in itself is as awesome as a large and tasty pint of ice cold beer. Well, maybe almost as awesome - nothing really comes close to a large and tasty pint of ice cold beer. Ever.
So where does this leave us now? All being well, we have progressed from the stage where the most basic tutorials leave off. We are now a little better equipped to guard against thinking of our primary classes as silos for the ever expanding lists of processor methods that our application appears to need.
This is rather the key point that I want to make at this stage. All too often, it’s terribly easy to get stuck with the real world nouns idea when thinking about the objects that will come into play within our applications, when what we really need to develop is an ability to think of application objects in an abstract sense. There isn’t a tangible real world equivalent for a PasswordManager but if we can successfully keep that notion of objects only being able to know things or do things at the forefront, we’re a long way down the trail of instinctively knowing what should go where.
Summary
For our first chapter, I’ve rather concentrated on the idea of grouping our application’s objects into two distinct camps; the knowers and the doers. This is very much the key theme that I would like to introduce at this point. In a very general sense, our knowers are likely to be the principal citizens that reside in our model layer. They hold and represent the data that lies at the heart of our application. These are the users, the products, the orders and the invoices. They present an interface which is designed to set, retrieve, manipulate or transform the individual elements of data that they are responsible for.
Then there are the doers. The objects within our system that cause things to happen and whose interfaces comprise methods that we can call in order to trigger those things. These doers might mask the simplest of processes, such as the hashing of a password, or they might be a facade onto much more complex procedures, governing the various stages of user registration for example.
Clearly then, I’m really quite keen on this idea. Largely because I’ve witnessed the positive effects that it can have. It’s not so much the idea in itself per se, more that the end results speak for themselves. Smaller, tighter, leaner and meaner objects are so much more efficient and maintainable than the alternative: classes treated like silos into which we’ve dumped great quantities of superficially related methods almost as if we considered the class name to be little more than a namespace for a coagulated library of code.