Modernizing Legacy Applications In PHP

1. Consolidate Classes and Functions

Now that we have an autoloader in place, we can begin to remove all the include calls that only load up class and function definitions. When we are done, the only remaining include calls will be those that are executing logic. This will make it easier to see which include calls are forming the logic paths in our legacy application, and which are merely providing definitions.

We will start with a scenario where the codebase is structured relatively well. Afterwards, we will answer some questions related to layouts that are not so amenable to revision.

For the purposes of this chapter, we will use the term include to cover not just include but also require, include_once, and require_once.

Consolidate Class Files

First, we will consolidate all the application classes to our central directory location as determined in the previous chapter. Doing so will put them where our autoloader can find them. Here is the general process we will follow:

  1. Find an include statement that pulls in a class definition file.
  2. Move that class definition file to our central class directory location, making sure that it is placed in a sub-path matching the PSR-0 rules.
  3. In the original file and in all other files in the codebase where an include pulls in that class definition, remove that include statement.
  4. Spot check to make sure that all the files now autoload that class by browsing to them or otherwise running them.
  5. Commit, push, and notify QA.
  6. Repeat until there are no more include calls that pull in class definitions.
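Step 2 depends on knowing which sub-path PSR-0 expects for a given class. As a minimal sketch, the function below shows how a PSR-0 style autoloader might turn a class name into a file path under the central class directory. The name psr0_path() is hypothetical and not part of any library; the autoloader from the previous chapter may be organized differently.

```php
<?php
// Hypothetical helper: map a class name to a file path per PSR-0,
// relative to one base directory.
function psr0_path($base_dir, $class)
{
    // Ignore a leading namespace separator, e.g. \Mlaphp\Autoloader.
    $class = ltrim($class, '\\');
    $subpath = '';

    // Namespace portion: each namespace separator becomes a directory.
    $pos = strrpos($class, '\\');
    if ($pos !== false) {
        $namespace = substr($class, 0, $pos);
        $class = substr($class, $pos + 1);
        $subpath = str_replace('\\', DIRECTORY_SEPARATOR, $namespace)
                 . DIRECTORY_SEPARATOR;
    }

    // Class-name portion: each underscore becomes a directory.
    $subpath .= str_replace('_', DIRECTORY_SEPARATOR, $class);

    return $base_dir . DIRECTORY_SEPARATOR . $subpath . '.php';
}

// Where the autoloader would look, assuming DIRECTORY_SEPARATOR is "/":
echo psr0_path('/path/to/app/classes', 'User'), PHP_EOL;
// prints /path/to/app/classes/User.php
echo psr0_path('/path/to/app/classes', 'Mlaphp\\Autoloader'), PHP_EOL;
// prints /path/to/app/classes/Mlaphp/Autoloader.php
```

Under these rules a class named User must live at classes/User.php, which is why the move in step 2 has to respect the class name rather than the file's old location.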

For our examples, we will assume we have a legacy application with this partial file system layout:

/path/to/app/
    classes/                # our central class directory location
        Mlaphp/
            Autoloader.php  # A hypothetical autoloader class
    foo/
        bar/
            baz.php         # a page script
    includes/               # a common "includes" directory
        setup.php           # setup code
        db_functions.php    # a function definition file
    index.php               # a page script
    lib/                    # a directory with some classes in it
        sub/
            Auth.php        # class Auth { ... }
            Role.php        # class Role { ... }
            User.php        # class User { ... }

Your own legacy application may not match this exactly, but you get the idea.

Find A Candidate include

We begin by picking a file, any file, then we examine it for include calls. The code therein might look like this:

    <?php
    require 'includes/setup.php';
    require_once 'lib/sub/User.php';

    // ...
    $user = new User();
    // ...
    ?>

We can see that there is a new User class being instantiated. On inspecting the lib/sub/User.php file, we can see it is the only class defined therein.

Move The Class File

Having identified an include statement that loads a class definition, we now move that class definition file to the central class directory location so that our autoloader function can find it. The resulting file system layout now looks like this (note that User.php is now in classes/):

/path/to/app/
    classes/                # our central class directory location
        Mlaphp/
            Autoloader.php  # A hypothetical autoloader class
        User.php            # class User { ... }
    foo/
        bar/
            baz.php         # a page script
    includes/               # a common "includes" directory
        setup.php           # setup code
        db_functions.php    # a function definition file
    index.php               # a page script
    lib/                    # a directory with some classes in it
        sub/
            Auth.php        # class Auth { ... }
            Role.php        # class Role { ... }

Now the problem is that our original file is trying to include the class file from its old location, a location that no longer exists. We need to remove that call from the code …

index.php

    <?php
    require 'includes/setup.php';

    // ...
    // the User class is now autoloaded
    $user = new User();
    // ...
    ?>

… but of course there are likely to be other places where the code attempts to load the now-missing lib/sub/User.php file.

This is where a project-wide search facility comes in handy. We have different options here, depending on your editor/IDE of choice and operating system.

  • In GUI editors like TextMate, Sublime Text, and PhpStorm, there is usually a “Find in Project” menu item that we can use to search for a string or regular expression across all the application files at once.
  • In other editors like Emacs and Vim, there is generally a key-binding that will search all the files in a particular directory and its subdirectories for a string or regular expression.
  • Finally, if you are of the old school, you can use grep at the command line to search all the files in a particular directory and its subdirectories.

The point is to find all the include calls that refer to lib/sub/User.php. Because the include calls can be formed in different ways, we need to use a regular expression like this to search for the include calls:

    ^[ \t]*(include|include_once|require|require_once).*User\.php

If you are not familiar with regular expressions, here is a breakdown of what we are looking for:

    ^               Starting at the beginning of each line,
    [ \t]*          followed by zero or more spaces and/or tabs,
    (include|...)   followed by any of these words,
    .*              followed by any characters at all,
    User\.php       followed by User.php, and we don't care what comes after.

(Regular expressions use “.” to mean “any character” so we have to specify “User\.php” to indicate we mean a literal dot, not any character.)
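The same search can also be run from PHP itself. This is a minimal sketch, not a replacement for grep or an IDE: the function name find_include_candidates() is hypothetical, and it only scans files ending in .php. It uses the regular expression given above, wrapped in the delimiters that preg_match() requires.

```php
<?php
// Hypothetical helper: list every line under $dir that matches $pattern,
// similar to a project-wide regex search or `grep -rnE`.
function find_include_candidates($dir, $pattern)
{
    $matches = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        // Only look at PHP files.
        if ($file->getExtension() !== 'php') {
            continue;
        }
        $lines = file($file->getPathname(), FILE_IGNORE_NEW_LINES);
        foreach ($lines as $i => $line) {
            if (preg_match($pattern, $line)) {
                // Report the file, the 1-based line number, and the text.
                $matches[] = $file->getPathname() . ':' . ($i + 1) . ': ' . trim($line);
            }
        }
    }
    sort($matches);
    return $matches;
}

// The expression from the text, with PCRE delimiters added.
$pattern = '/^[ \t]*(include|include_once|require|require_once).*User\.php/';
```

As with any of the search tools, each reported line still has to be checked by hand.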

If we use a regular expression search to find those strings in the legacy codebase, we will be presented with a list of all matching lines and their corresponding files. Unfortunately, it is up to us to examine each line to see if it really is a reference to the lib/sub/User.php file. For example, this line might turn up in the search results …

    include_once("/usr/local/php/lib/User.php");

… but clearly it is not the User.php file we are looking for.

We could be more strict with our regular expression so that we search specifically for lib/sub/User.php but that is more likely to miss some include calls, especially those in files under the lib/ or sub/ directories. For example, an include in a file in sub/ could look like this:

    include 'User.php';

As such, it’s better to be a little loose with the search to get every possible candidate, then work through the results manually.

Examine each search result line, and if it is an include that pulls in the User class, remove it and save the file. Keep a list of each modified file, as we will need to test them later.

At the end of this, we will have removed all the include calls for that class throughout the codebase.

Spot Check The Codebase

After removing the include statements for the given class, we now need to make sure the application works. Unfortunately, because we have no testing process in place, this means we need to pseudo-test or “spot check” by browsing to or otherwise invoking the modified files. In practice this is generally not difficult, but it is tedious.

When we spot check we are looking specifically for “file not found” and “class not defined” errors. These mean, respectively, that a file tried to include the missing class file, or that the autoloader failed to find the class file.

To do the “testing” we need to set PHP error reporting so that it either shows us the errors directly, or logs the errors to a file that we examine while “testing” the codebase. In addition, the error reporting level needs to be strict enough that we actually see the errors. In general, error_reporting(E_ALL) is what we want, but because this is a legacy codebase, it may show more errors than we can bear (especially “undefined variable” notices). As such, it may be more productive to set error_reporting(E_ALL & ~E_NOTICE), which suppresses notices but still reports warnings and fatal errors such as a missing class. The error reporting values can be set either in a setup or bootstrap file, or in the correct php.ini file.
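A minimal sketch of spot-check settings as they might appear in a setup or bootstrap file. The log file location is a placeholder; any writable path works. The mask E_ALL & ~E_NOTICE drops notices but keeps fatal errors such as “class not found”, which a bare E_WARNING level would not report. Note also that PHP 8 raised undefined-variable messages from notices to warnings, so on PHP 8 this mask hides less of that noise.

```php
<?php
// Report everything except notices, so legacy "undefined variable"
// noise (on PHP 7 and earlier) does not bury warnings and fatal errors.
error_reporting(E_ALL & ~E_NOTICE);

// Show errors directly while spot checking ...
ini_set('display_errors', '1');

// ... and also log them for later review (placeholder path).
ini_set('log_errors', '1');
ini_set('error_log', sys_get_temp_dir() . '/spot_check.log');
```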

Commit, Push, Notify QA

After the “testing” is complete and all errors have been fixed, commit the code to source control and (if needed) push it to the central code repository. If you have a QA team, now would be the time to notify them that a new testing round is needed, and provide them the list of files to test.

Do … While

That is the process to convert a single class from include to autoloading. Go back through the codebase and find the next include that pulls in a class file and begin the process again. Continue doing so until all classes have been consolidated into the central class directory location and their relevant include lines have been removed. Yes, this is a tedious, tiresome, and time-consuming process, but it is a necessary step towards modernizing our legacy codebase.

Consolidate Functions Into Class Files

Not all legacy applications use a large set of classes. Often, instead of classes, there is a significant number of user-defined functions for core logic.

Using functions is not a problem in itself, but it does mean that we need to include the files where the functions are defined, and autoloading only works for classes. It would be good to find a way to load the function files automatically as well, since that would help us remove even more include calls.

The solution here is to move the functions into class files, and call the functions as static methods on those classes. That way, the autoloader can load up the class file for us, and then we can call the methods in that class.

This procedure is more complex than when we consolidated class files. Here is the general process we will follow:

  1. Find an include statement that pulls in a function definition file.
  2. Convert that function definition file into a class file of static methods; we need to pick a unique name for the class, and we may need to rename the functions to more suitable method names.
  3. In the original file and in all other files in the codebase where any functions from that file are used, change calls to those functions into static method calls.
  4. Spot check to see if the new static method calls work by browsing to or otherwise invoking the affected files.
  5. Move the class file to the central class directory location.
  6. In the original file and in all other files in the codebase where an include pulls in that class definition, remove the relevant include statement.
  7. Spot check again to make sure that all the files now autoload that class by browsing to them or otherwise running them.
  8. Commit, push, and notify QA.
  9. Repeat until there are no more include calls that pull in function definition files.

Find A Candidate include

We pick a file, any file, and look through it for include calls. The code in our chosen file might look like this:

    <?php
    require 'includes/setup.php';
    require_once 'includes/db_functions.php';

    // ...
    $result = db_query('SELECT * FROM table_name');
    // ...
    ?>

We can see that there is a db_query() function being used, and on inspecting the includes/db_functions.php file, we can see that function along with several others defined therein.

Convert The Function File To A Class File

Let’s say that the db_functions.php file looks something like this:

includes/db_functions.php

    <?php
    function db_query($query_string)
    {
        // ... code to perform a query ...
    }

    function db_get_row($query_string)
    {
        // ... code to get the first result row ...
    }

    function db_get_col($query_string)
    {
        // ... code to get the first column of results ...
    }
    ?>

To convert this function file to a class file, we need to pick a unique name for the class we’re about to create. It seems pretty clear in this case, both from the file name and from the function names, that these are all database-related calls. As such, we’ll call this class “Db.”

Now that we have a name, we will convert the file into a class, with the functions becoming static methods. We are not going to move the file just yet; leave it in place with its current file name.

If we change any function names, we need to keep a list of the old and new names for later use. After the changes, the file will look something like the following (note the changed method names):

includes/db_functions.php

    <?php
    class Db
    {
        public static function query($query_string)
        {
            // ... code to perform a query ...
        }

        public static function getRow($query_string)
        {
            // ... code to get the first result row ...
        }

        public static function getCol($query_string)
        {
            // ... code to get the first column of results ...
        }
    }
    ?>

The changes are minimal: we wrapped the functions in a uniquely named class, marked them as public static, and made minor changes to the function names. We made no changes at all to the function signatures or to the code inside the functions.

Change Function Calls To Static Method Calls

We have converted the contents of db_functions.php from function definitions to a class definition. If we try to run the application now, it will fail with “undefined function” errors. So, the next step is to find all of the relevant function calls throughout the application and rename them to static method calls on our new class.

There is no easy way to do this. This is another case where project-wide search-and-replace becomes very handy. Using our preferred project-wide search tool, search for the old function call, and replace it with the new static method call. For example, using a regular expression, we might do this:

Search for …


    db_query\s*\(

Replace with …


    Db::query(

The regular expression matches the opening parenthesis but not the closing one, because we do not need to match the parameters of the call. Requiring the parenthesis also keeps us from matching longer function names that begin with the name we are searching for, such as db_query_raw(). The expression also allows optional whitespace between the function name and the opening parenthesis, since some style guides recommend such spacing. Note that it will still match names that merely end in db_query, such as my_db_query(), so review each replacement before accepting it.
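The search-and-replace can also be scripted. In this minimal sketch, rename_function_calls() and the $renames map are hypothetical, not taken from any tool, and the function works on a string of source code, so reading and writing files is left out. It tightens the expression with a negative lookbehind so that calls such as my_db_query() and $obj->db_query() are left alone; every change should still be reviewed by hand.

```php
<?php
// Hypothetical map of old function names to new static method calls,
// i.e. the list of old and new names kept during the conversion.
$renames = array(
    'db_query'   => 'Db::query',
    'db_get_row' => 'Db::getRow',
    'db_get_col' => 'Db::getCol',
);

// Rewrite old function calls in a chunk of PHP source code.
function rename_function_calls($source, array $renames)
{
    foreach ($renames as $old => $new) {
        // The lookbehind skips names preceded by a word character, "$",
        // "->" or "::" (my_db_query, $obj->db_query, Foo::db_query);
        // requiring "(" skips longer names such as db_query_raw.
        $pattern = '/(?<![\w$>:])' . preg_quote($old, '/') . '\s*\(/';
        $source = preg_replace($pattern, $new . '(', $source);
    }
    return $source;
}
```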

Perform this search-and-replace for each of the old function names in the old function file, converting each to the new static method call in the new class file.

Spot Check The Static Method Calls

When we are finished renaming the old function names to the new static method calls, we need to run through the codebase to make sure everything works. Again, there is no easy way to do this. You may need to go so far as browsing to, or otherwise invoking, each file that was changed in this process.

Move The Class File

At this point we have replaced the contents of the function definition file with a class definition, and “testing” has shown that the new static method calls work as expected. Now we need to move the file to our central class directory location and name it properly.

Currently, our class definition is in the includes/db_functions.php file. The class in that file is named Db, so move the file to its new autoloadable location as classes/Db.php. Afterwards, the file system will look something like this:

/path/to/app/
    classes/                # our central class directory location
        Db.php              # class Db { ... }
        Mlaphp/
            Autoloader.php  # A hypothetical autoloader class
        User.php            # class User { ... }
    foo/
        bar/
            baz.php         # a page script
    includes/               # a common "includes" directory
        setup.php           # setup code
    index.php               # a page script
    lib/                    # a directory with some classes in it
        sub/
            Auth.php        # class Auth { ... }
            Role.php        # class Role { ... }

Do … While

Finally, we follow the same ending process as we did when moving class files …

  • Remove the related include calls for the function definition file throughout the codebase
  • Spot check the codebase
  • Commit, push, notify QA

… and repeat it for every function definition file we find in the codebase.

Common Questions

Should We Remove The Autoloader include Call?

If we placed our autoloader code in a class as a static or instance method, our search for include calls will reveal the inclusion of that class file. If we remove that include call, autoloading will fail, because the autoloader class itself cannot be autoloaded. This is a chicken-and-egg problem. The solution is to leave the autoloader include in place as part of our bootstrapping or setup code. If we are fully diligent about removing include calls, it is likely to be the only include of a class definition remaining in the codebase.

How Should We Pick Files For Candidate include Calls?

There are several ways to go about this. We could …

  • … manually traverse the entire codebase and work file-by-file.
  • … generate a list of class and function definition files, and then generate a list of files that include those files.
  • … search for every include call and look at the related file to see if it has class or function definitions.

What If An include Defines More Than One Class?

Sometimes a class definition file has more than one class definition in it. This can break the autoloading process. If a file named Foo.php defines both Foo and Bar classes, then the Bar class will never be autoloaded, because the autoloader looks for it in a file named Bar.php, which does not exist. Bar will be available only if something else has already loaded Foo.php.

The solution is to split the single file into multiple files. That is, create one file per class, and name each file for the class it contains per the PSR-0 naming and autoloading expectations.

What If The One-Class-Per-File Rule Is Disagreeable?

I sometimes hear complaints that the one-class-per-file rule is somehow “wasteful” or otherwise not aesthetically pleasing when examining the file system. Isn’t it a drag on performance to load that many files? What if some classes are only needed along with some other class, such as an Exception that is only used in one place? I have some responses here:

  • There is, of course, a performance cost to loading two files instead of one. The question is how much of a cost, and compared to what? I assert that, compared to the far greater performance issues likely lurking elsewhere in our legacy application, the drag from loading multiple files is a rounding error. If it really is a problem, a bytecode cache like APC (or OPcache, bundled with PHP since 5.5) will reduce these comparatively small performance hits.
  • Consistency, consistency, consistency. If some class files contain only one class and others contain several, that inconsistency will become a source of cognitive friction for everyone on the project. Inconsistency is one of the main themes running through legacy applications; let us reduce it as much as we can by adhering to the one-class-per-file rule.

If we feel that some classes “naturally” belong together, it is perfectly acceptable to place the subordinate or child classes in a subdirectory beneath the master or parent class. The subdirectory should be named for that higher class or namespace, per the PSR-0 naming rules.

For example, if we have a series of Exception classes related to a Foo class:

    Foo.php                        # class Foo { ... }
    Foo/
        NotFoundException.php      # class Foo_NotFoundException { ... }
        MalformedDataException.php # class Foo_MalformedDataException { ... }

Renaming classes in this way means we must also change every place in the codebase where they are instantiated or otherwise referenced.

What If A Class Or Function Is Defined Inline?

I have seen cases where a page script has one or more classes or functions defined inside it, generally when the classes or functions are used only by that particular page script.

In these cases, remove the class definitions from the script and place them in their own files in the central class directory location. Be sure to name the files for their class names per the PSR-0 autoloader rules. Similarly, move the function definitions to their own related class file as static methods, and rename the function calls to static method calls.

What If A Definition File Also Executes Logic?

I have also seen the opposite case, where a class file has some logic that gets executed as a result of the file being loaded. For example, a class definition file might look like this:

/path/to/foo.php

    <?php
    echo "Doing something here ...";
    log_to_file('a log entry');
    db_query('UPDATE table_name SET incrementor = incrementor + 1');

    class Foo
    {
        // the class
    }
    ?>

In the above case, the logic before the class definition will be executed when the file is loaded, even if the class is never instantiated or otherwise called.

This is a much tougher situation to deal with than when classes are defined inline with a page script. The class should be loadable without side effects, and the other logic should be executable without having to load the class.

In general, the easiest way to deal with this is to modify our relocation process. Cut the class definition from the original file and place it in its own file in the central class directory location. Leave the original file with its executable code in place, and leave all the related include calls in place as well. This allows us to pull out the class definition so it can be autoloaded, but scripts that include the original file still get the executable behavior.

For example, given the above combined executable code and class definition, we could end up with these two files:

/path/to/foo.php

    <?php
    echo "Doing something here ...";
    log_to_file('a log entry');
    db_query('UPDATE table_name SET incrementor = incrementor + 1');
    ?>

/path/to/app/classes/Foo.php

    <?php
    class Foo
    {
        // the class
    }
    ?>

This is messy, but it preserves the existing application behavior while allowing for autoloading.

What If Two Classes Have The Same Name?

When we start moving classes around, we may discover that “application flow A” uses a Foo class, and that “application flow B” also uses a Foo class, but the two classes of the same name are actually different classes defined in different files. They never conflict with each other because the two different application flows never intersect.

In this case, we have to rename one or both of the classes when we move them to our central class directory location. For example, call one of them FooOne and the other FooTwo, or pick better descriptive names of your own. Place them each in separate class files named for their class names, per the PSR-0 autoloading rules, and rename all references to these classes throughout the codebase.

What About Third-Party Libraries?

When we consolidate our classes and functions, we may find some third-party libraries in the legacy application. We don’t want to move or rename the classes and functions in a third-party library, because that would make it too difficult to upgrade the library later. We would have to remember what classes were moved where and which functions were renamed to what.

With any luck, the third-party library uses autoloading of some sort already. If it comes with its own autoloader, we can add that autoloader to the SPL autoloader registry stack in our setup or bootstrap code. If its autoloading is managed by another autoloader system, such as that found in Composer, we can add that autoloader to the SPL autoloader registry stack, again in our setup or bootstrap code.
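A minimal sketch of stacking autoloaders in a setup file. The helper register_class_dir() and the directory names are hypothetical, and the loader uses a simplified PSR-0 mapping. The key design point is that each autoloader returns quietly when it cannot find a file, so spl_autoload_register() can pass the class name on to the next autoloader in the stack; an autoloader that unconditionally requires a missing file would stop the chain with a fatal error.

```php
<?php
// Hypothetical helper: register an autoloader for one base directory.
// It stays silent when the file is missing, so the next autoloader on
// the SPL stack still gets a chance to find the class.
function register_class_dir($base_dir)
{
    spl_autoload_register(function ($class) use ($base_dir) {
        // Simplified PSR-0 mapping: namespace separators and
        // underscores both become directory separators.
        $subpath = str_replace(array('\\', '_'), DIRECTORY_SEPARATOR, ltrim($class, '\\'));
        $file = $base_dir . DIRECTORY_SEPARATOR . $subpath . '.php';
        if (is_file($file)) {
            require $file;
        }
    });
}

// In setup.php: our central class directory first, then a third-party
// library kept in its own directory (both paths are placeholders).
register_class_dir(__DIR__ . '/classes');
register_class_dir(__DIR__ . '/3rdparty/somelib');

// A Composer-managed library instead adds its own autoloader to the
// same stack when we include Composer's generated file:
// require __DIR__ . '/vendor/autoload.php';
```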

If the third-party library does not use autoloading, and depends on include calls both in its own code and in the legacy application, we are in a bit of a bind. We don’t want to modify the code in the library, but at the same time we want to remove include calls from the legacy application. The two solutions here are “least-worst” options:

  • Modify our application’s main autoloader to allow for one or more third-party libraries.
  • Write an additional autoloader for the third-party library and add it to the SPL autoloader registry stack.

Both of these options are beyond the scope of this book. You will need to examine the library in question, determine its class naming scheme, and come up with appropriate autoloader code on your own.

Finally, in terms of how to organize third-party libraries in the legacy application, it might be wise to consolidate them all to their own central location in the codebase. For example, this might be under a directory called 3rdparty/ or external_libs/. If we move a library, we should move the entire package, not just its class files, so we can upgrade it properly later. This will also allow us to exclude the central third-party directory from our search for include calls so that we don’t get extra search results from files that we don’t want to modify.

What About System-Wide Libraries?

System-wide library collections, like those provided by Horde and PEAR, are a special case of third-party libraries. They are generally located on the server file system outside of the legacy application so they can be available to all applications running on that server. The include statements related to these system-wide libraries generally depend on the include_path settings, or else are referenced by absolute path.

These present a special problem when trying to eliminate include calls that only pull in class and function definitions. If we are lucky enough to be using PEAR-installed libraries, we can modify our existing autoloader to look in two directories instead of one. This works because the PSR-0 naming conventions grew out of the Horde/PEAR conventions. The trailing autoloader code changes from this …

    <?php
        // convert underscores in the class name to directory separators
        // ($subpath was initialized earlier in the autoloader)
        $subpath .= str_replace('_', DIRECTORY_SEPARATOR, $class);

        // the path to our central class directory location
        $dir = '/path/to/app/classes';

        // prefix with the central directory location and suffix with .php,
        // then require it.
        require $dir . DIRECTORY_SEPARATOR . $subpath . '.php';
    ?>

… to this:

    <?php
        // convert underscores in the class name to directory separators
        $subpath .= str_replace('_', DIRECTORY_SEPARATOR, $class);

        // the paths to our central class directory location and to PEAR
        $dirs = array('/path/to/app/classes', '/usr/local/pear/php');
        foreach ($dirs as $dir) {
            $file = $dir . DIRECTORY_SEPARATOR . $subpath . '.php';
            if (file_exists($file)) {
                require $file;
                // stop at the first match so the class is not loaded twice
                return;
            }
        }
    ?>

For Functions, Can We Use Instance Methods Instead Of Static Methods?

When we consolidated user-defined global functions into classes, we redefined them as static methods. This left them just as globally accessible as the functions they replaced. If we feel particularly diligent, we can change them from static to instance methods. This involves more work, but in the end it can make testing easier and is a cleaner technical approach. Given our earlier Db example, using instance instead of static methods would look like this:

classes/Db.php

    <?php
    class Db
    {
        public function query($query_string)
        {
            // ... code to perform a query ...
        }

        public function getRow($query_string)
        {
            // ... code to get the first result row ...
        }

        public function getCol($query_string)
        {
            // ... code to get the first column of results ...
        }
    }
    ?>

The only added step when using instance methods instead of static ones is that we need to instantiate the class before calling its methods. That is, instead of this …

    <?php
    Db::query(...);
    ?>

… we would do this:

    <?php
    $db = new Db();
    $db->query(...);
    ?>

Even though it is more work in the beginning, I recommend instance methods over static ones. Among other things, it gives us a constructor method that can be called on instantiation, and it makes testing easier in many cases.
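To illustrate the testing benefit, here is a sketch that gives the instance-method Db an assumed constructor taking a PDO connection (the example above has no constructor), trimmed to getCol() for brevity. UserReport and FakeDb are hypothetical classes: because UserReport receives a Db object instead of calling Db::getCol() statically, a test can hand it a fake that never touches a database.

```php
<?php
// Instance-method Db, with an assumed constructor that receives its
// connection instead of reaching for a global.
class Db
{
    protected $pdo;

    public function __construct($pdo)
    {
        $this->pdo = $pdo;
    }

    public function getCol($query_string)
    {
        // Return the first column of every result row.
        return $this->pdo->query($query_string)->fetchAll(PDO::FETCH_COLUMN, 0);
    }
}

// Hypothetical consumer that depends on whatever Db object it is given.
class UserReport
{
    protected $db;

    public function __construct(Db $db)
    {
        $this->db = $db;
    }

    public function usernames()
    {
        return $this->db->getCol('SELECT username FROM users');
    }
}

// Hypothetical test double: canned results, no database needed.
class FakeDb extends Db
{
    public $rows = array();

    public function __construct()
    {
    }

    public function getCol($query_string)
    {
        return $this->rows;
    }
}
```

With static methods there is no object to substitute, so a test of UserReport would have to run real queries.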

You may wish to start by converting to static methods, and later convert the static methods to instance methods, along with all the related method calls. Your schedule and preferences will dictate which approach you choose.

Can We Automate This Process?

As I have noted before, this is a tedious, tiresome, and time-consuming process. Depending on the size of the codebase, it may take days or weeks of effort to fully consolidate the classes and functions for autoloading. It would be great if there were some way to automate the process to make it both faster and more reliable.

Unfortunately, I have not yet discovered any tools that make this process easier. As far as I can tell, this kind of refactoring is still best done “by hand” with strong attention to detail. Having obsessive tendencies and long periods of uninterrupted concentration on this task are likely to be of benefit here.

Review and Next Steps

At this point, we have made a big step forward in modernizing our legacy application. We have begun converting from an “include-oriented” architecture to a “class-oriented” one. Even if we later discover a class or function that we missed, that’s OK; we can follow the above process as many times as needed until all definitions have been moved to the central location.

We may still have lots of include statements in the application, but those that remain are related to the application flow, and not to pulling in class and function definitions. Any include calls that remain are executing logic. We can now see the flow of the application much better.

We have put in place a structure for new functionality. Any time we need to add a new behavior, we can place it in a new class, and that class will be autoloaded whenever we need it. We can stop writing new stand-alone functions; instead, we will write new methods on classes. These new methods will be much more amenable to unit tests.

However, the existing classes that we have consolidated for autoloading are likely to have globals and other dependencies in them. This makes them tightly bound to each other and difficult to write tests for. With that in mind, the next step is to examine the dependencies in our existing classes, and attempt to break those dependencies to improve the maintainability of our application.

2. Replace Includes In Classes

Even though we have Model View Controller separation now, we may still have many include calls in our classes. We want our legacy application to be free from the artifacts of its include-oriented heritage, where merely including a file causes logic to be executed. To do so, we will need to replace include calls with method calls throughout our classes.

For the purposes of this chapter, we will use the term include to cover not just include but also require, include_once, and require_once.

Embedded include Calls

Let’s say we extracted some action logic with an embedded include to a Controller method. The code receives information on a new user, calls an include to perform some common validation functionality, and then deals with success or failure of validation.

classes/Controller/NewUserPage.php

    <?php
        public function __invoke()
        {
            // ...
            $user = $this->request->post['user'];
            include 'includes/validators/validate_new_user.php';
            if ($user_is_valid) {
                $this->user_transactions->addNewUser($user);
                $this->response->setVars(array('success' => true));
            } else {
                $this->response->setVars(array(
                    'success' => false,
                    'user_messages' => $user_messages
                ));
            }

            return $this->response;
        }
    ?>

Here is an example of what the included file might look like.

includes/validators/validate_new_user.php


<?php
$user_messages = array();
$user_is_valid = true;

if (! Validate::email($user['email'])) {
    $user_messages[] = 'Email is not valid.';
    $user_is_valid = false;
}

if (! Validate::strlen($user['username'], 6, 8)) {
    $user_messages[] = 'Username must be 6-8 characters long.';
    $user_is_valid = false;
}

if ($user['password'] !== $user['confirm_password']) {
    $user_messages[] = 'Passwords do not match.';
    $user_is_valid = false;
}
?>

Let us ignore for now the specifics of the validation code. The point here is that the include file and any code using it are both tightly coupled to each other. Any code using the file has to initialize a $user variable before including it. Any code using the file also has an expectation of getting two new variables introduced into its scope ($user_messages and $user_is_valid).
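The coupling can be seen in a few lines of stock PHP. In this minimal sketch, a cut-down stand-in for validate_new_user.php (only the password check) is written to a temporary file and included from inside a hypothetical check_user() function. The included code reads the function's $user parameter and leaves $user_messages and $user_is_valid behind in the function's local scope. Renaming the parameter would break it.

```php
<?php
// Cut-down stand-in for includes/validators/validate_new_user.php,
// written to a temporary file so the sketch is self-contained.
$code = <<<'CODE'
<?php
$user_messages = array();
$user_is_valid = true;
if ($user['password'] !== $user['confirm_password']) {
    $user_messages[] = 'Passwords do not match.';
    $user_is_valid = false;
}
CODE;
$include_file = tempnam(sys_get_temp_dir(), 'validate_');
file_put_contents($include_file, $code);

// Hypothetical caller. It must name its variable $user, and it silently
// receives $user_messages and $user_is_valid from the included file.
function check_user($include_file, array $user)
{
    include $include_file;
    return array($user_is_valid, $user_messages);
}

list($ok, $messages) = check_user($include_file, array(
    'password' => 'secret',
    'confirm_password' => 'Secret',
));
unlink($include_file);
```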

We want to decouple this logic so that the logic in the include file does not intrude on the scope of the class methods in which it is used. We do this by extracting the logic of the include file to a class of its own.

The Replacement Process

The difficulty of extracting includes to their own classes depends on the number and complexity of the include calls remaining in our class files. If there are very few includes and they are relatively simple, the process will be easy to complete. If there are many complex interdependent includes, the process will be relatively difficult to work through.

In general, the process is as follows:

  1. Search the classes/ directory for an include call in a class.
  2. For that include call, search the entire codebase to find how many times the included file is used.
  3. If the included file is used only once, and only in that one class …
    1. Copy the contents of the included file code as-is directly over the include call.
    2. Test the modified class, and delete the include file.
    3. Refactor the copied code so that it follows all our existing rules: no globals, no superglobals, no new, inject dependencies, return instead of output, and no include calls.
  4. If the included file is used more than once …
    1. Copy the contents of the included file as-is to a new class method.
    2. Replace the discovered include call with inline instantiation of the new class and invocation of the new method.
    3. Test the class in which the include was replaced to find coupled variables; add these to the new method signature by reference.
    4. Search the entire codebase for include calls to that same file, and replace each with inline instantiation and invocation; spot check modified files and test modified classes.
    5. Delete the original include file; unit test and spot check the entire legacy application.
    6. Write a unit test for the new class, and refactor the new class so that it follows all our existing rules: no globals, no superglobals, no new, inject dependencies, return-not-output, and no includes.
    7. Finally, replace each inline instantiation of the new class in each of our class files with dependency injection, testing along the way.
  5. Commit, push, notify QA.
  6. Repeat until there are no include calls in any of our classes.

Search For include Calls

First, as we did in a much earlier chapter, we use our project-wide search facility to find include calls. In this case, we search only the classes/ directory, using the following regular expression:

    ^[ \t]*(include|include_once|require|require_once)

This should give us a list of candidate include calls in the classes/ directory.
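If no project-wide search facility is at hand, the same search can be scripted. The sketch below is one way to do it. find_include_lines() and find_include_calls_in_dir() are hypothetical names, the scan only looks at files ending in .php, and the pattern adds a \b word boundary so that a line beginning with something like includes_helper() is not reported.

```php
<?php
// Hypothetical helper: return every line of $source that begins with an
// include-like call. Same pattern as the search above, plus \b and .*$
// so that the whole matching line is returned.
function find_include_lines($source)
{
    $pattern = '/^[ \t]*(include|include_once|require|require_once)\b.*$/m';
    preg_match_all($pattern, $source, $matches);
    return $matches[0];
}

// Hypothetical helper: walk a directory such as classes/ and map each
// .php file path to the include lines found in it.
function find_include_calls_in_dir($dir)
{
    $found = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->getExtension() !== 'php') {
            continue;
        }
        $lines = find_include_lines(file_get_contents($file->getPathname()));
        if ($lines) {
            $found[$file->getPathname()] = $lines;
        }
    }
    return $found;
}

// Usage: print_r(find_include_calls_in_dir('/path/to/app/classes'));
```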

We pick a single include file to work with, then search the entire codebase for other inclusions of the same file. For example, if we found this candidate include

<?php
    require 'foo/bar/baz.php';
?>

… we would search the entire codebase for include calls to the file name baz.php:

    ^[ \t]*(include|include_once|require|require_once).*baz\.php

We search only for the file name because include calls in different locations may use different relative paths to reach the same file. It is up to us to determine which of these include calls reference the same file.

Once we have a list of include calls that we know lead to the same file, we count the number of calls that include that file. If there is only one call, our work is relatively simple. If there is more than one call, our work is more complex.
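The targeted search and the count can be scripted along the same lines. This hypothetical count_includes_of() helper builds the file-name pattern with preg_quote(), which escapes the dot in baz.php just as the pattern above does, and returns the number of matching lines per file. Like the pattern above, it also matches names such as foobaz.php, and it cannot tell which relative paths really resolve to the same file; that judgment stays with us.

```php
<?php
// Hypothetical helper: count, per .php file under $dir, the include-like
// lines that mention $target_name (a bare file name such as 'baz.php').
function count_includes_of($dir, $target_name)
{
    $pattern = '/^[ \t]*(include|include_once|require|require_once)\b.*'
             . preg_quote($target_name, '/') . '/m';
    $counts = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->getExtension() !== 'php') {
            continue;
        }
        // preg_match_all() returns the number of non-overlapping matches.
        $hits = preg_match_all($pattern, file_get_contents($file->getPathname()));
        if ($hits > 0) {
            $counts[$file->getPathname()] = $hits;
        }
    }
    return $counts;
}
```

Summing the result, for example array_sum(count_includes_of('/path/to/app', 'baz.php')), gives the number of candidate include calls to review before choosing between the single-call and multiple-call procedures.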

Replacing A Single include Call

If a file is used as the target of an include call only once, it is relatively easy to remove the include.

First, we copy the entire contents of the include file. We move back to the class where the include occurs, delete the include call, and paste the entire contents of the include file in its place.

Next, we run the unit tests for the class to make sure it still works properly. If they fail, we rejoice! We have found errors to be corrected before we continue. If they pass, we likewise rejoice, and move on.

Now that the include call has been replaced, and the file contents have been successfully transplanted to the class, we delete the include file. It is no longer needed.

Finally, we can return to our class file where the newly transplanted code lives. We refactor it according to all the rules we have learned so far: no globals or superglobals, no use of the new keyword outside of factories, inject all needed dependencies, return values instead of generating output, and (recursively) no include calls. We run our unit tests along the way to make sure we do not break any pre-existing functionality.

Replacing Multiple include Calls

If a file is used as the target of multiple include calls, it will take more work to replace them.

Copy include File To Class Method

First, we will copy the include code to a class method of its own. To do this, we need to pick a class name appropriate to the purpose of the included file. Alternatively, we may name the class based on the path to the included file so we can keep track of where the code came from originally.

As for the method name, we again pick something appropriate to the purpose of the include code. Personally, if the class is going to contain only a single method, I like to co-opt the __invoke() method for this. However, if there end up being multiple methods, we need to pick a sensible name for each one.
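One reason __invoke() suits a single-method class: PHP treats an object that defines __invoke() as callable, so $validator() does the same thing as $validator->__invoke(), and the object can be passed anywhere a callable is accepted. A minimal sketch with a hypothetical PasswordsMatch class:

```php
<?php
// Hypothetical single-purpose class that co-opts __invoke().
class PasswordsMatch
{
    public function __invoke(array $user)
    {
        return $user['password'] === $user['confirm_password'];
    }
}

$check = new PasswordsMatch();
$user = array('password' => 'secret', 'confirm_password' => 'secret');

$a = $check->__invoke($user);       // explicit method call
$b = $check($user);                 // same call, using the object as a function
$c = call_user_func($check, $user); // works anywhere a callable is accepted
```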

Once we have picked a class name and method, we create the new class in the proper file location, and copy the include code directly into the new method. (We do not delete the include file itself just yet.)

Replace The Original include Call

Now that we have a class to work with, we go back to the include call we discovered in our search, replace it with an inline instantiation of the new class, and invoke the new method.

For example, say the original calling code looked like this:

Calling Code


<?php
    // ...
    include 'includes/validators/validate_new_user.php';
    // ...
?>

If we extracted the include code to a Validator\NewUserValidator class as its __invoke() method body, we might replace the include call with this:

Calling Code


<?php
    // ...
    $validator = new \Validator\NewUserValidator;
    $validator->__invoke();
    // ...
?>

Using inline instantiation in a class violates one of our rules regarding dependency injection. We do not want to use the new keyword outside of factory classes. We do so here only to facilitate the refactoring process. Later, we will replace this inline instantiation with injection.

Discover Coupled Variables Through Testing

We have now successfully decoupled the calling code from the include file, but this leaves us with a problem. Because the calling code used to execute the include code inline, in its own scope, the variables needed by the newly extracted code are no longer available to it. We need to pass into the new class method all the variables it needs for execution, and to make its variables available to the calling code when the method is done.

To do so, we run the unit tests for the class that called the include. The tests will reveal to us what variables are needed by the new method. We can then pass these into the method by reference. Using a reference makes sure that both blocks of code are operating on the exact same variables, just as if the include was still being executed inline. This minimizes the number of changes we need to make to the calling code and the newly extracted code.
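The effect of the references can be checked in isolation. In this minimal sketch, a hypothetical validate_passwords() function stands in for the extracted method. Passing $user_messages and $user_is_valid by reference creates them in the caller's scope even though the caller never defined them, which is what the include used to do. A by-value parameter, by contrast, only changes the function's own copy.

```php
<?php
// Hypothetical stand-in for an extracted method: it reads $user and writes
// $user_messages and $user_is_valid, all by reference.
function validate_passwords(&$user, &$user_messages, &$user_is_valid)
{
    $user_messages = array();
    $user_is_valid = true;
    if ($user['password'] !== $user['confirm_password']) {
        $user_messages[] = 'Passwords do not match.';
        $user_is_valid = false;
    }
}

$user = array('password' => 'secret', 'confirm_password' => 'Secret');

// $user_messages and $user_is_valid do not exist yet; passing them by
// reference creates them here, just as the include used to.
validate_passwords($user, $user_messages, $user_is_valid);

// By contrast, a by-value parameter only changes the function's own copy.
function mark_invalid_by_value($user_is_valid)
{
    $user_is_valid = false;
}
$still_valid = true;
mark_invalid_by_value($still_valid);
```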

For example, say we have extracted the code from an include file to this class and method:

classes/Validator/NewUserValidator.php


<?php
namespace Validator;

class NewUserValidator
{
    public function __invoke()
    {
        $user_messages = array();
        $user_is_valid = true;

        // \Validate: the global class, not Validator\Validate
        if (! \Validate::email($user['email'])) {
            $user_messages[] = 'Email is not valid.';
            $user_is_valid = false;
        }

        if (! \Validate::strlen($user['username'], 6, 8)) {
            $user_messages[] = 'Username must be 6-8 characters long.';
            $user_is_valid = false;
        }

        if ($user['password'] !== $user['confirm_password']) {
            $user_messages[] = 'Passwords do not match.';
            $user_is_valid = false;
        }
    }
}
?>

When we test the class that calls this code in place of an include, the tests will fail, because the $user value is not available to the new method, and the $user_messages and $user_is_valid variables are not available to the calling code. We rejoice at the failure, because it tells us what we need to do next! We add each missing variable to the method signature by reference:

classes/Validator/NewUserValidator.php


<?php
    public function __invoke(&$user, &$user_messages, &$user_is_valid)
    {
        // ... extracted include code, unchanged ...
    }
?>

We then pass the variables to the method from the calling code:

Calling Code


<?php
    $validator->__invoke($user, $user_messages, $user_is_valid);
?>

We continue running the unit tests until they all pass, adding variables as needed. When all the tests pass, we rejoice! All the needed variables are now available in both scopes, and the code itself will remain decoupled and testable.

Not all variables in the extracted code may be needed by the calling code, and vice versa. We should let the unit testing failures guide us as to which variables need to be passed in as references.

Replace Other include Calls And Test

Now that we have decoupled our original calling code from the include file, we need to decouple all other remaining code from the same file. Given our earlier search results, we go to each file and replace the relevant include call with an inline instantiation of the new class. We then add a line that calls the new method with the needed variables.

Note that we may be replacing code within classes, or within non-class files such as view files. If we replace code in a class, we should run the unit tests for that class to make sure the replacement does not break anything. If we replace code in a non-class file, we should run the test for that file if it exists (such as a view file test), or else spot check the file if no tests exist for it.

Delete The include File And Test

Once we have replaced all include calls to the file, we delete the file. We should now run all of our tests and spot checks for the entire legacy application to make sure that we did not miss an include call to that file. If a test or spot check fails, we need to remedy it before continuing.

Write A Test And Refactor

Now that the legacy application works just as it used to before we extracted the include code to its own class, we write a unit test for the new class.

Once we have a passing unit test for the new class, we refactor the code in that class according to all the rules we have learned so far: no globals or superglobals, no use of the new keyword outside of factories, inject all needed dependencies, return values instead of generating output, and (recursively) no include calls. We continue to run our tests along the way to make sure we do not break any pre-existing functionality.
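What the refactored class ends up looking like depends on the code, but one possible result is sketched below. It assumes a hypothetical ValidateWrapper object that exposes the legacy Validate::email() and Validate::strlen() checks as instance methods (implemented here with PHP built-ins only so the sketch runs), injects it through the constructor, and takes $user by value. The two result variables stay references, so existing calls such as $validator->__invoke($user, $user_messages, $user_is_valid) keep working. The class is declared in the global namespace for brevity.

```php
<?php
// Hypothetical instance wrapper around the legacy static Validate checks.
// Built-ins stand in for the real rules so the sketch can run.
class ValidateWrapper
{
    public function email($value)
    {
        return filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
    }

    public function strlen($value, $min, $max)
    {
        $len = strlen($value);
        return $len >= $min && $len <= $max;
    }
}

// One possible refactoring of the extracted include code: no globals,
// no inline `new`, the dependency is injected, and nothing is echoed.
class NewUserValidator
{
    protected $validate;

    public function __construct(ValidateWrapper $validate)
    {
        $this->validate = $validate;
    }

    public function __invoke($user, &$user_messages, &$user_is_valid)
    {
        $user_messages = array();
        $user_is_valid = true;

        if (! $this->validate->email($user['email'])) {
            $user_messages[] = 'Email is not valid.';
            $user_is_valid = false;
        }

        if (! $this->validate->strlen($user['username'], 6, 8)) {
            $user_messages[] = 'Username must be 6-8 characters long.';
            $user_is_valid = false;
        }

        if ($user['password'] !== $user['confirm_password']) {
            $user_messages[] = 'Passwords do not match.';
            $user_is_valid = false;
        }
    }
}
```

A unit test can then construct the class with the wrapper (or a fake of it) and check both result variables for valid and invalid input.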

Convert To Dependency Injection and Test

When the unit test for our newly refactored class passes, we proceed to replace all our inline instantiations with dependency injection. We do so only in our class files; in our view files and other non-class files, the inline instantiation is not much of a problem.

For example, we may see this inline instantiation and invocation in a class:

classes/Controller/NewUserPage.php


<?php
namespace Controller;

class NewUserPage
{
    // ...

    public function __invoke()
    {
        // ...
        $user = $this->request->post['user'];

        $validator = new \Validator\NewUserValidator;
        $validator->__invoke($user, $user_messages, $user_is_valid);

        if ($user_is_valid) {
            $this->user_transactions->addNewUser($user);
            $this->response->setVars(array('success' => true));
        } else {
            $this->response->setVars(array(
                'success' => false,
                'user_messages' => $user_messages
            ));
        }

        return $this->response;
    }
}
?>

We move the $validator to a property injected via the constructor, and use the property in the method:

classes/Controller/NewUserPage.php


<?php
namespace Controller;

class NewUserPage
{
    // ...

    public function __construct(
        \Mlaphp\Request $request,
        \Mlaphp\Response $response,
        \Domain\Users\UserTransactions $user_transactions,
        \Validator\NewUserValidator $validator
    ) {
        $this->request = $request;
        $this->response = $response;
        $this->user_transactions = $user_transactions;
        $this->validator = $validator;
    }

    public function __invoke()
    {
        // ...
        $user = $this->request->post['user'];

        $this->validator->__invoke($user, $user_messages, $user_is_valid);

        if ($user_is_valid) {
            $this->user_transactions->addNewUser($user);
            $this->response->setVars(array('success' => true));
        } else {
            $this->response->setVars(array(
                'success' => false,
                'user_messages' => $user_messages
            ));
        }

        return $this->response;
    }
}
?>

Now we need to search the codebase and replace every instantiation of the modified class to pass the new dependency object. We run our tests as we go to make sure everything continues to operate properly.
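Injection pays off in the controller's own tests: a test can hand NewUserPage a fake validator and check each branch without running the real validation rules. The sketch below uses simplified, hypothetical stand-ins for the request, response, and transactions objects, and a global-namespace copy of the controller without constructor type hints. With the type hints shown above, the fake would need to extend \Validator\NewUserValidator instead.

```php
<?php
// Hypothetical, simplified stand-ins for the controller's collaborators.
class FakeRequest
{
    public $post = array();
}

class FakeResponse
{
    public $vars = array();

    public function setVars(array $vars)
    {
        $this->vars = array_merge($this->vars, $vars);
    }
}

class FakeUserTransactions
{
    public $added = array();

    public function addNewUser($user)
    {
        $this->added[] = $user;
    }
}

// Fake validator that rejects every user, to exercise the failure branch.
class RejectingValidator
{
    public function __invoke($user, &$user_messages, &$user_is_valid)
    {
        $user_messages = array('Rejected by fake validator.');
        $user_is_valid = false;
    }
}

// Global-namespace copy of the refactored controller, minus type hints.
class NewUserPage
{
    protected $request;
    protected $response;
    protected $user_transactions;
    protected $validator;

    public function __construct($request, $response, $user_transactions, $validator)
    {
        $this->request = $request;
        $this->response = $response;
        $this->user_transactions = $user_transactions;
        $this->validator = $validator;
    }

    public function __invoke()
    {
        $user = $this->request->post['user'];
        $this->validator->__invoke($user, $user_messages, $user_is_valid);

        if ($user_is_valid) {
            $this->user_transactions->addNewUser($user);
            $this->response->setVars(array('success' => true));
        } else {
            $this->response->setVars(array(
                'success' => false,
                'user_messages' => $user_messages,
            ));
        }

        return $this->response;
    }
}
```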

Commit, Push, Notify QA

At this point we have either replaced a single include call, or multiple include calls to the same file. Because we have been testing along the way, we can now commit our new code and tests, push it all to the common repository, and notify QA that we have new work for them to review.

Do … While

We begin again by searching for the next include call in a class file. When all include calls have been replaced by class method invocations, we are done.

Common Questions

Can One Class Receive Logic From Many include Files?

In the examples, we show the include code being extracted to a class by itself. If we have many related include files, it may be reasonable to collect them into the same class, each with its own method name. For example, the NewUserValidator logic might be only one of many user-related validators. We can reasonably imagine the class renamed as UserValidator with such methods as validateNewUser(), validateExistingUser(), and so on.

What About include Calls Originating In Non-Class Files?

In our search for include calls, we look only in the classes/ directory for the originating calls. It is likely that there are include calls that originate from other locations as well, such as the views/ directory.

For the purposes of our refactoring, we don’t particularly care about include calls that originate outside our classes. If an include is called only from non-class files, we can safely leave that include in its existing state.

Our main goal here is to remove include calls from class files, not necessarily from the entire legacy application. At this point, it is likely that most or all include calls outside our classes are part of the presentation logic anyway.

Review and Next Steps

After we have extracted all the include calls from our classes, we will have finally removed one of the last major artifacts of our legacy architecture. We can load a class without any side effects, and logic is executed only as a result of invoking a method. This is a big step forward for us.

We can now begin paying attention to the overarching end-to-end architecture of our legacy application.

As things stand now, the entire legacy application is still located in the web server document root. Users browse to each page script directly. This means that the URLs are coupled to the file system. In addition, each page script has quite a bit of repeated logic: load a setup script, instantiate a controller using dependency injection, invoke the controller, and send the response.

Our next major goal, then, is to begin using a Front Controller in our legacy application. The front controller will be composed of some bootstrapping logic, a router, and a dispatcher. This will decouple our application from the file system and allow us to start removing our page scripts entirely.

But before we do so, we need to separate the public resources in our application from the non-public resources.