8. Performance & stability - profiling and improving
Or, “Why is PHP sooooo slow?” (hint - it isn’t, mostly)
PHP script performance (a term we use to cover indicators such as speed of execution and resource usage) is an issue both for PHP-based websites and for other applications written in PHP. This chapter looks at the issues affecting PHP performance in general, specific performance considerations for non-web applications, and the tools and resources available for solving performance-related problems. We also look at the stability of long-running scripts, which is often closely tied to performance.
8.1 The background on performance
PHP is generally considered a “scripting” or “interpreted” language. This means that rather than compiling source code into machine-executable instructions and distributing those as a standalone program (as is usually the case with languages like C), PHP programs are distributed as PHP source code. The user of the software needs the PHP interpreter (also known as the PHP Virtual Machine) to run that source code, with the interpreter executing it on the fly as the application runs. This style of execution has upsides and downsides. The main upsides are ease of updating and deploying code (there is no compilation step) and fewer architecture issues (write once, run anywhere, with no need to compile for specific platforms). The main downside is the performance hit of interpreting the code while the application runs. Modern versions of PHP reduce this hit by first compiling the source code into intermediate “opcodes”, which are then executed. The source is parsed and compiled only once per run, so code that executes repeatedly (e.g. the body of a loop) does not have to be re-parsed each time. For frequently used scripts, such as web scripts and some applications, these opcodes can even be cached between script runs, so that subsequent runs can skip the compilation step.
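In current PHP releases, opcode caching is normally provided by the OPcache extension, which has been bundled with PHP since 5.5. As a hedged sketch, the snippet below checks whether opcodes are actually being cached for the SAPI you are running under. It relies only on OPcache's opcache_get_status() function, guarded in case the extension is not loaded; opcacheSummary() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: report whether OPcache is loaded and active for this SAPI.
function opcacheSummary(): array
{
    if (!function_exists('opcache_get_status')) {
        return ['loaded' => false, 'enabled' => false];   // extension not loaded at all
    }
    // Passing false omits the (potentially large) per-script listing.
    $status = opcache_get_status(false);
    if ($status === false) {
        return ['loaded' => true, 'enabled' => false];    // loaded but not active here
    }
    return [
        'loaded'         => true,
        'enabled'        => (bool) ($status['opcache_enabled'] ?? false),
        'cached_scripts' => $status['opcache_statistics']['num_cached_scripts'] ?? 0,
        'hit_rate'       => $status['opcache_statistics']['opcache_hit_rate'] ?? 0.0,
    ];
}

print_r(opcacheSummary());
```

Run from the command line, this will typically report that caching is not enabled, because OPcache is off for the CLI unless opcache.enable_cli is set.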
PHP is also a “high-level” language, which means that it uses abstractions to hide from the user much of the lower-level detail of the code the computer needs. Instead of dealing with implementation details like memory addressing and call stacks, PHP and other higher-level languages present the programmer with abstract concepts such as variables, arrays and functions. Much of the appeal of a language like PHP is its wide range of built-in functions and operators for performing common tasks, which leads to higher developer productivity. Versatile data structures (like PHP’s array type, which is actually a managed ordered map that can be used as a traditional array, a list, a hash table, a dictionary or several other common data structures) hide the details of memory management, simplify access, and come with a rich variety of related manipulation functions. However, these abstractions come at a performance cost. Behind every function, array operation or disk access are lower-level algorithms written in C. The implementation of these algorithms must cater to the “general” case and handle all possible uses of the algorithm, so they will rarely be as optimal as C algorithms written specifically for your particular need. This level of abstraction again adds overhead to PHP compared with lower-level languages like C. That said, the core PHP developers have invested a lot of time and effort in the last couple of years in optimising, trimming, refactoring and otherwise speeding up the C back-end, to great effect: speed has increased dramatically and memory consumption has fallen in many common cases. And for those common cases, these pre-built PHP algorithms are often better than those you might be able to write yourself!
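To make the “managed ordered map” point concrete, here is a short sketch using only core array functions. The variable names are illustrative; the point is that one type serves as a list, a dictionary, a stack and a queue, and always remembers insertion order.

```php
<?php
// As a list: auto-incrementing integer keys.
$list = [10, 20, 30];
$list[] = 40;

// As a dictionary / hash table: string keys.
$stock = ['apple' => 3, 'pear' => 5];
$stock['plum'] = 7;

// As a stack (last in, first out) via built-in functions.
$stack = [1, 2, 3];
array_push($stack, 4);
$top = array_pop($stack);        // 4

// As a queue (first in, first out); array_shift() re-indexes integer keys.
$queue = [1, 2, 3];
$head = array_shift($queue);     // 1

// Ordered: iteration follows insertion order, whatever the key type.
$mixed = [];
$mixed['z'] = 'first';
$mixed[5] = 'second';
$mixed['a'] = 'third';
echo implode(', ', array_keys($mixed)), PHP_EOL;   // z, 5, a
```

That flexibility is exactly the kind of general-purpose C implementation described above: convenient and usually fast, but typically carrying more bookkeeping per element than a structure written for one specific job.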
The final cause of performance problems is perhaps one of the most prevalent, but also one of the easiest to fix: the PHP programmer. Many developers aren’t aware of how to find and solve performance issues in their own code, and often blame PHP itself for their own failings, even where PHP is just as performant as other languages. While PHP takes away a lot of the pain and slog of programming, hiding and handling numerous tasks for you, it is still a general-purpose programming language, and it is still perfectly possible to write poorly performing code in it. Even with an appreciation of the issues, many developers aren’t aware of the tools and resources available to help them improve their programs’ performance.
For those who doubt that it is possible to write high-performance systems in PHP, the slides listed in the Further Reading section below give examples from one company showing that it’s not just possible, it happens in the real world. Appendix D also shows some of the things people are doing with PHP where performance clearly isn’t holding them back.
Further Reading: “More Than Websites: PHP And The Firehose @ Datasift” by Stuart Herbert
8.2 Specific issues for general purpose programming
As noted above, both web and non-web PHP scripts can have performance problems. However, there are some additional performance-related issues to take into account when writing longer-running scripts and scripts without memory restrictions. In the CLI SAPI, as with the traditional web model of PHP programming, PHP enforces limits on memory consumption and performs garbage collection on your behalf. This works well on the web, where the short lifetimes and modest resource use of most scripts mean that these limits and management processes are rarely noticed. When you write a general-purpose application, however, you will often want to remove the imposed memory limit so that your application can use all the memory it needs on systems with differing amounts of memory. The CLI SAPI already disables the execution time limit by default (max_execution_time is 0, meaning unlimited, unless you specify otherwise), and the memory limit can be removed by setting memory_limit to -1. Doing so transfers the burden of managing and limiting memory usage to your script. Likewise, in longer-running, resource-intensive and response-time-sensitive scripts, garbage collection requires a different approach to avoid unwanted pauses or unnecessary overhead. This can involve managing the garbage collection process manually from within your scripts.
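A minimal sketch of what taking over these responsibilities can look like in a CLI batch script, assuming PHP 7 or later. It lifts the memory limit with ini_set('memory_limit', '-1'), switches off automatic cycle collection with gc_disable() for the duration of a hot loop, and calls gc_collect_cycles() at points of its own choosing. processBatch() and its reference-cycle “work” are hypothetical stand-ins for real application code, and the collection interval is an arbitrary example value that you would tune by profiling.

```php
<?php
// Remove the memory limit for this process: -1 means "no limit", so the
// script itself is now responsible for keeping memory use under control.
ini_set('memory_limit', '-1');

// Hypothetical batch job that decides for itself when cycle collection runs.
function processBatch(int $items, int $collectEvery): array
{
    $wasEnabled = gc_enabled();
    gc_disable();                 // no automatic collection runs during the hot loop
    $freedCycles = 0;

    for ($i = 1; $i <= $items; $i++) {
        // Stand-in for real work that leaves a reference cycle behind.
        $parent = new stdClass();
        $child = new stdClass();
        $parent->child = $child;
        $child->parent = $parent;
        unset($parent, $child);   // refcounts stay above zero: only the cycle collector can free these

        if ($i % $collectEvery === 0) {
            $freedCycles += gc_collect_cycles();  // returns the number of cycles freed
        }
    }

    if ($wasEnabled) {
        gc_enable();              // restore the previous setting for the rest of the script
    }
    return ['processed' => $items, 'freed' => $freedCycles, 'peak_bytes' => memory_get_peak_usage()];
}

print_r(processBatch(10000, 1000));
```

The function restores whatever GC setting was active before it ran, so the rest of the script is unaffected. Whether periodic manual collection actually beats PHP’s automatic trigger depends on your workload, so compare peak memory and timings before committing to it.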
Inefficient programming and algorithm design on your part can also be more noticeable than on the web. The time an algorithm takes when generating a web page can get lost among the other overheads of transmitting and displaying the page, but when exactly the same algorithm runs on the command line, the pause may be plainly visible. Responsiveness is critical on the web, and its absence is just as noticeable to local users.
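As an illustration of the kind of algorithmic choice that shows up clearly on the command line, the sketch below solves the same problem two ways: checking membership with in_array() inside a loop, and building a lookup table once with array_flip() so that each check becomes an isset() key lookup. The function names and data are hypothetical. The fast version assumes the values are integers or strings, since array_flip() can only use those as keys, and it treats a numeric string such as "5" and the integer 5 as the same key, unlike the strict in_array() call.

```php
<?php
// Which of $wanted appear in $available? Two versions, same result.

// Roughly O(n * m): in_array() scans $available once for every item in $wanted.
function findSlow(array $available, array $wanted): array
{
    $found = [];
    foreach ($wanted as $item) {
        if (in_array($item, $available, true)) {
            $found[] = $item;
        }
    }
    return $found;
}

// Roughly O(n + m): build a hash lookup once, then each check is a single key lookup.
// Assumes int or string values, because array_flip() turns the values into keys.
function findFast(array $available, array $wanted): array
{
    $lookup = array_flip($available);
    $found = [];
    foreach ($wanted as $item) {
        if (isset($lookup[$item])) {
            $found[] = $item;
        }
    }
    return $found;
}

$available = range(0, 20000, 2);   // 10,001 even numbers
$wanted = range(1, 2000);

$t0 = hrtime(true);
$slow = findSlow($available, $wanted);
$t1 = hrtime(true);
$fast = findFast($available, $wanted);
$t2 = hrtime(true);

printf("in_array(): %.2f ms, isset(): %.2f ms\n", ($t1 - $t0) / 1e6, ($t2 - $t1) / 1e6);
```

Both functions return the same list; what differs is how the work grows with input size. Exact timings vary by machine and PHP version (hrtime() needs PHP 7.3 or later), so treat the printed numbers as something to measure rather than a promise.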
The rest of this chapter looks at how to profile and manage the performance of your scripts and the resources they use, including strategies to target the problems described above.
8.3 Profile, profile, profile!
We’ve all come across slow-running scripts (often our own!), and usually our first response is to start looking up ways to increase PHP’s speed. Compiling, caching, refactoring code, accelerators: these are the topics that googling for PHP performance or speed issues will readily turn up. We may have already read about them, and so we dive right in to try them out. My advice (derived from bitter personal experience) is to STOP RIGHT NOW. Throwing performance trick after performance trick at your code (often ones you find online, or in good books like this one), even where they appear sensible and you can see the logic, can end up complicating your code or adding extra dependencies for no good reason. Why? Because when we don’t know the root cause of the problem, we don’t know whether a particular solution, no matter how good on paper, will actually address the issue we are having in this particular case. And even if it does appear to work, we don’t know whether it was the simplest fix, and thus whether we’re saddling ourselves with extra “technical debt” when we don’t have to.
The step we often skip is to ask our script directly: “Why are you running so slowly?” If our script can tell us, we can then fix the issue directly, without resorting to external tools like compilers and caching systems. So how do we ask our script the “why” question? By “profiling” it.
A profiler, if you’re not familiar with the term, watches a piece of software (usually from the “inside”) as it runs and breaks down the time (and sometimes the resources) that each part of the program uses. The profile information is often reported down to the level of an individual line of code or function call, which helps us spot exactly where our scripts are slowing down. Is it that complex database query? A badly written loop? A function that’s called more times than expected? Disk or network access pausing execution? Whatever the problem, the profiler will point us to it. Once we know exactly what is causing the slowdown, the solution is usually apparent (or at the very least we can rule out potential solutions that won’t actually fix it). It may just mean rewriting a few lines of code, or caching some data instead of repeatedly generating it. It may point to problems external to PHP, such as a slow database server or a laggy network connection or remote resource. Of course, in some cases it may turn out to be a problem that is intractable from a PHP programming point of view and does indeed need the help of an accelerator or external caching system. Either way, by using a profiler to ask “why” before we start trying “what”, we will likely save time and avoid making unnecessary changes to our code or deployment environment.
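To see what reporting “down to the level of a function call” means in practice, here is a deliberately crude, hypothetical instrumenting profiler in plain PHP (7.4 or later). It wraps callables so that every call is counted and timed with hrtime(), then lists them by total time. Real profiling tools gather this automatically and far more completely; CallProfiler, loadUser and render are illustrative names only.

```php
<?php
// Crude instrumenting profiler: counts calls and total wall-clock time per named callable.
final class CallProfiler
{
    /** @var array<string, array{calls: int, ns: int}> */
    private array $stats = [];

    public function wrap(string $name, callable $fn): Closure
    {
        return function (...$args) use ($name, $fn) {
            $start = hrtime(true);
            try {
                return $fn(...$args);
            } finally {
                // Record the call even if $fn throws.
                $entry = $this->stats[$name] ?? ['calls' => 0, 'ns' => 0];
                $entry['calls']++;
                $entry['ns'] += hrtime(true) - $start;
                $this->stats[$name] = $entry;
            }
        };
    }

    /** Stats sorted by total time, most expensive first. */
    public function report(): array
    {
        $stats = $this->stats;
        uasort($stats, fn (array $a, array $b) => $b['ns'] <=> $a['ns']);
        return $stats;
    }
}

// Hypothetical functions standing in for real application code.
$profiler = new CallProfiler();
$loadUser = $profiler->wrap('loadUser', fn (int $id) => ['id' => $id, 'name' => "user$id"]);
$render   = $profiler->wrap('render', fn (array $user) => "<li>{$user['name']}</li>");

$html = '';
foreach (range(1, 200) as $id) {
    $html .= $render($loadUser($id));
}

foreach ($profiler->report() as $name => $s) {
    printf("%-10s %5d calls %10.3f ms\n", $name, $s['calls'], $s['ns'] / 1e6);
}
```

A report like this answers questions such as “is this function called more times than expected?” at a glance. The wrapper itself adds overhead to every call, a distortion that all instrumenting profilers share to some degree.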
With PHP you have several choices when it comes to profiling. You can manually profile your code by adding profiling/measuring statements directly to your code base, or you can use one of a number of tools to automatically profile your code for you. The former is quick and easy to do with no changes to your development environment, if you know roughly where in your code the problem lies. The latter, whilst requiring the setup and configuration of the tools, and learning how to use them the first time, provides more comprehensive profiling. It also doesn’t rely on you knowing where your problems may be located, and usually requires minimal or no changes to your code base. We’ll look at both options below.
8.4 Manual profiling
Manual profiling entails adding extra code to your source to measure time or resource usage directly from within your scripts. The following is an example of measuring the execution time of different lines of code.
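A minimal sketch of this approach, assuming PHP 7.4 or later: a hypothetical Stopwatch helper records hrtime(true) (a monotonic nanosecond timer) and memory_get_usage() at named checkpoints, then reports the time and memory change between consecutive checkpoints. The code being measured is an arbitrary placeholder.

```php
<?php
// Manual profiling: record time and memory at named checkpoints, then report deltas.
final class Stopwatch
{
    /** @var list<array{0: string, 1: int, 2: int}> */
    private array $marks = [];

    public function mark(string $label): void
    {
        // hrtime(true) returns a monotonic timestamp in nanoseconds (PHP 7.3+).
        $this->marks[] = [$label, hrtime(true), memory_get_usage()];
    }

    /** Time (ms) and memory change (bytes) since the previous checkpoint. */
    public function report(): array
    {
        $out = [];
        for ($i = 1, $n = count($this->marks); $i < $n; $i++) {
            [$label, $t, $mem] = $this->marks[$i];
            [, $prevT, $prevMem] = $this->marks[$i - 1];
            $out[$label] = ['ms' => ($t - $prevT) / 1e6, 'bytes' => $mem - $prevMem];
        }
        return $out;
    }
}

$sw = new Stopwatch();
$sw->mark('start');
$squares = array_map(fn (int $n) => $n * $n, range(1, 100000));
$sw->mark('build array');
$total = array_sum($squares);
$sw->mark('sum array');

foreach ($sw->report() as $label => $r) {
    printf("%-12s %8.3f ms %+12d bytes\n", $label, $r['ms'], $r['bytes']);
}
```

Placing mark() calls around a suspect block, then moving them inward as you narrow down the slow part, is the essence of manual profiling. Labels are used as array keys, so each checkpoint needs a unique name.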
8.5 Profiling tools
8.6 Low level profiling
8.7 Profiling - the likely results
Profiling can reveal a huge variety of performance problems. However, the following are some of the most common types of problems discovered when profiling PHP, along with some strategies for addressing them:
8.8 Silver bullets
Silver bullets are “solutions” that you can “throw” at your script, which will hopefully speed it up without you having to think too much about the cause of the slowdown. In the first draft of this section, the words “silver bullet” were always followed by a question mark. I took the question marks out because they cluttered the page, but the point stands: there are no guaranteed, universal silver bullets for increasing performance in PHP. Each of the so-called “performance solutions” below has downsides, isn’t suitable for everyone, and requires a reasonable amount of thought to implement. Nevertheless they can be useful, particularly when you have improved your script as much as you can manually and you are stretching the capabilities of PHP or the available hardware.
8.9 Silver bullet #1 - Better hardware
If funding allows, sending out for a beefier server or desktop can often produce instant performance gratification. In some cases hardware is cheaper than developer time, so it’s a no-brainer. However, there are some downsides, particularly for those who are cash-challenged:
8.10 Silver bullet #2 - Newer PHP versions
8.11 Silver bullet #3 - Opcode caching
8.12 Silver bullet #4 - Compiling
8.13 Silver bullet #5 - JIT compilers and alternative Virtual Machines
8.14 The SPL - Standard PHP Library
8.15 Garbage collection
8.16 Multi-threading and concurrent programming in PHP
8.17 Big data and PHP - MapReduce
8.18 Data caching
8.19 Know thy functions
8.20 Outsourcing code to other languages
8.21 Other performance tips and tricks
8.22 Stability and performance of long running processes
It’s not just performance that can be affected by the issues raised in this chapter. The stability of your scripts also suffers from poor resource management and non-optimal algorithms. By stability we essentially mean “not crashing”. PHP was originally “designed to die”: it was designed for short-running scripts, where the prospect of still running many minutes later, let alone many hours or days later, was not a big issue. That said, if you program carefully there is no reason why you can’t create scripts (usually daemons) that run more or less indefinitely.
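One common way to keep a long-running PHP daemon stable is to have it watch its own health and exit cleanly before it degrades, relying on an external process supervisor to start a fresh copy. The sketch below assumes exactly that arrangement; runWorker(), the job callables and the thresholds are hypothetical, and it writes to STDERR, which is only defined in the CLI SAPI.

```php
<?php
// Sketch of a long-running worker that stops itself cleanly before it degrades,
// assuming an external supervisor will start a fresh process afterwards.
function runWorker(callable $nextJob, int $maxJobs, int $memoryCeilingBytes): string
{
    $done = 0;
    while (true) {
        $job = $nextJob();
        if ($job === null) {
            return 'queue empty';
        }

        try {
            $job();
        } catch (Throwable $e) {
            // One bad job should not take the whole daemon down.
            fwrite(STDERR, 'Job failed: ' . $e->getMessage() . PHP_EOL);
        }
        $done++;

        if ($done % 100 === 0) {
            gc_collect_cycles();          // reclaim any reference cycles left by jobs
        }
        if (memory_get_usage() > $memoryCeilingBytes) {
            return 'memory ceiling reached';
        }
        if ($done >= $maxJobs) {
            return 'job limit reached';
        }
    }
}

// Usage with an in-memory queue of hypothetical jobs, one of which fails.
$queue = [];
for ($i = 0; $i < 5; $i++) {
    $queue[] = function () use ($i) { return $i * 2; };
}
$queue[] = function () { throw new RuntimeException('bad input'); };

$reason = runWorker(function () use (&$queue) { return array_shift($queue); }, 1000, 256 * 1024 * 1024);
fwrite(STDERR, "Worker stopping: $reason" . PHP_EOL);
// A real daemon would now exit(0) so that its supervisor restarts it.
```

Stopping after a fixed number of jobs or at a memory ceiling is a deliberately blunt safeguard: it bounds the damage from slow leaks you have not found yet, while catching Throwable keeps a single bad job from bringing down the whole process.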
The key areas to think about when designing a stable program are:
8.23 Avoid micro and premature optimisations
Now that I’ve convinced you of the need to consider performance and instructed you on the finer arts of speed and resource management in PHP, I’m going to try and convince you to put it to the back of your mind, at least for now. It is easy to get caught up with performance and optimisation and let it affect your workflow and productivity. While it is generally positive to follow good practice and consider the performance of your code at all stages of development, premature optimisation is often just that - premature, and micro-optimisations are not usually worth the paper they are written on.
As your author, I have an embarrassing confession to make. Having started programming in PHP many years ago, when the performance difference between single and double quotes around a string was measurable in some contexts (or at least it was “common knowledge” that it was), I still find myself automatically weighing which type of quote to use each time I type out a string, not for syntactical reasons but for performance reasons. That’s despite the fact that the difference between the two nowadays is vanishingly small. Old habits die hard, though, and many of the micro-optimisations you will find on the web fall into a similar category. Profile your code, and you will quite likely find that a single slow database call makes the time saved by calling isset() dozens of times to check variables, instead of falling back on @ to simply suppress warnings, look incredibly small (that’s not to say that there aren’t other good reasons to use isset(), though!). Developer time spent writing and thinking about micro-optimisations is typically much more expensive than the extra processor cycles used to execute unoptimised code.
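If you do want to check a claim like the isset()-versus-@ one for yourself rather than trusting “common knowledge”, a tiny benchmark harness is enough. This is a sketch assuming PHP 7.3 or later (for hrtime()); nsPerCall() and the $config array are hypothetical.

```php
<?php
// Tiny micro-benchmark harness: average wall-clock nanoseconds per call of $fn.
function nsPerCall(callable $fn, int $iterations = 100000): float
{
    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / $iterations;
}

$config = ['debug' => true];   // 'verbose' is deliberately missing

// Check for the key first with isset().
$withIsset = nsPerCall(function () use ($config) {
    return isset($config['verbose']) ? $config['verbose'] : null;
});

// Read it anyway and silence the undefined-key warning with @.
$withAt = nsPerCall(function () use ($config) {
    return @$config['verbose'];
});

printf("isset(): %.1f ns/call, @: %.1f ns/call\n", $withIsset, $withAt);
```

Compare the per-call figures with what your profiler reports for a single database query; the gap is typically several orders of magnitude. The closure call itself is also included in each measurement, which is one reason micro-benchmarks like this are easy to misread.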
Premature optimisation typically occurs when developers start thinking about how to scale their code before they’ve got any of it working and deployed. There is no point working out how to scale code that may never see the light of day because the project overruns and isn’t delivered. Unless you are sure of the volume of customers, users or data you will be processing, it’s usually a better idea to get something up and running, as a proof of concept if nothing else, and then scale it when you know that scaling will be needed. Indeed, scaling methods vary considerably depending on exactly what your code needs to do, and often this is not settled until you have something up and running. So even if you deliver on time, your scaling efforts may all have gone in the wrong direction for how your project ends up functioning.
“But, but, but…” you cry, your project is different. It may well be: you might be lucky enough to work on a large-scale, well-funded project with a known high-performance requirement, or be developing a tool specifically geared towards high-performance uses. Congratulations, you can ignore this section (if you’re even at the level where reading this book is useful!). For the rest of us, which means the majority of PHP projects, concentrate on getting it working and deployed first. Keep performance in mind during development, but don’t let it interfere with your workflow too much. Most projects fail for reasons other than poor performance.
Further Reading: “PHP: Require/Include vs Autoloader”. An example of an article that espouses an optimisation method that is definitely premature (if it is ever worth it). The article shows that you can save a grand total of 6 milliseconds in a script that runs the functions 100 times. That’s not even worth thinking about for most people, who would only call them a handful of times at most, and for whom 6 milliseconds isn’t a concern in the slightest anyway.