affiliate_link

Tuesday, September 1, 2009

Improving Performance by Profiling PHP Applications

One of the best things about programming in PHP is how easy it is to learn the language and its libraries. Everything seems to fit together and you can almost guess how to do a specific task, even not knowing much about the function that you need to use. A lot of times you will just try doing something and be surprised that the language has that specific feature.




All of this is very nice, but it takes a little while to get to know PHP's tricks, especially those related to performance and memory consumption. There are lots of subtle things in PHP that can save a lot of memory and give a small extra boost in performance. This article serves as an introduction to profiling PHP applications, and to making changes to a script and comparing the values of the optimizations.
To quote the dictionary, a profile is a formal summary or analysis of data, often in the form of a graph or table, representing distinctive features or characteristics. By placing timers within the code and running them over and over, we can obtain a profile of how fast or slow specific areas of a program are executed. This data can be used to discover bottlenecks and other areas that can be optimized to improve the program's performance.
Assuming you have read my series of articles about PEAR, the concept of profiling PHP scripts is covered in a simple manner by a PEAR library. We will use this library to create examples of profiling scenarios, as this is the easiest and most compatible way to analyze your current code. You don't even have to use a commercial product or compile an extension to profile your code.
The library name is PEAR::Benchmark and it is very useful to profile and do quick benchmarks on your code set. One of the classes available in this library is named Benchmark_Timer() and tracks the time difference from one function call to the next. When you are done with the benchmark, you can get verbose result of the script execution. It is as simple as this:
include_once("Benchmark/Timer.php");
$bench = new Benchmark_Timer;

$bench->start();
$bench->setMarker('Start of the script');

// now sleep for a few seconds
sleep(5);

$bench->stop();

// get the profiling information from this timer
print_r($bench->getProfiling());
?>
This script will output something like this:
Array
(
    [0] => Array
        (
            [name] => Start
            [time] => 1013214253.05751200
            [diff] => -
            [total] => 0
        )

    [1] => Array
        (
            [name] => Start of the script
            [time] => 1013214253.05761100
            [diff] => 9.8943710327148E-05
            [total] => 9.8943710327148E-05
        )

    [2] => Array
        (
            [name] => Stop
            [time] => 1013214258.04920700
            [diff] => 4.9915959835052
            [total] => 4.9916949272156
        )

)
That might seem like a bunch of weird numbers, but it is very useful if used in a bigger scale.

Dissecting the Numbers

As most people will guess, the topmost array entries are the actual calls to the Benchmark_Timer() class, like $bench->start(), $bench->setMarker(), and $bench->stop(). The number associated with each of these array entries is pretty straightforward. Let's look closer at them:
    [0] => Array
        (
            [name] => Start
            [time] => 1013214253.05751200
            [diff] => -
            [total] => 0
        )
The entry time refers to the UNIX timestamp of when the call to the start() method of Benchmark_Timer() was done. The diff item shows the difference in time between this call and the previous one -- since there is no previous one, it shows a dash. The total item is the total time of execution since the beginning of the benchmark until this specific method call. Seeing the output of the next array entry is useful as well.
    [1] => Array
        (
            [name] => Start of the script
            [time] => 1013214253.05761100
            [diff] => 9.8943710327148E-05
            [total] => 9.8943710327148E-05
        )
We can see from these numbers that the call to $bench->setMarker(....) was made 9.8943710327148E-05 seconds after the call to $bench->start(). No need to go after a calculator, that means 0.0000989 seconds. :)

A Benchmark Anecdote

The example was good, but it is really not a good example of how to make decisions related to your site, and especially to your site code. I will use a personal experience that I had while working as a contractor for a company that had a huge Web community on a niche Web site and was having performance problems.

I really didn't understand much of the code because it was written along the years with specific needs -- one block of include files relating to the site translation code, another one to log the site's usage, and so on. We both knew (me and the main programmer of that site) that something needed to be optimized, but didn't know what the problem was.
To shorten the story, I ended up going into the main script and adding several lines of $bench->setMarker() all over the script and its included files. We then analyzed the output of $bench->getProfiling() and were surprised by the outcome. It seemed that the problem was actually related to a function call that was being used hundreds of times on every page to get the translation code for a specific language name (i.e. "en" for "english"). Every time the function was called, the script would do a query to a MySQL database to get the real name from a table.

We ended up creating a cache system for this type of information. In two days we had a huge performance boost, and consequently the number of page views increased by 40% in the first week. All in all, this serves as an interesting example of how profiling your code can help on the general health of your Web application or Web site.

Benchmarking Function Calls

While the Benchmark_Timer() class is very useful in profiling the code on a script or page (and its included files), it is not very scientific, since you have to reload the script several times to get the mean of the profiled values, and it is not very specific to a class or function call.
The other class in the PEAR::Benchmark library, called Benchmark_Iterator, is intended to tackle this very issue -- running a series of executions for a very specific function or class method and displaying its profiling information. Its purpose is to get a consistent response from the benchmarks, since everybody knows that running a script one time and receiving an execution time of 10 seconds doesn't mean that it always performs like that.

In any case, let's see some examples:
// code to connect to the database here
include_once("DB.php");
$dsn = array(
    'phptype'  => 'mysql',
    'hostspec' => 'localhost',
    'database' => 'database_name',
    'username' => 'user_name',
    'password' => 'password'
);
$dbh = DB::connect($dsn);

function getCreatedDate($id)
{
    global $dbh;

    $stmt = "SELECT created_date FROM users WHERE id=$id";
    // as always, let's use PEAR::DB here
    $created_date = $dbh->getOne($stmt);
    if ((PEAR::isError($created_date)) || 
     (empty($created_date))) {
        return false;
    } else {
        return $created_date;
    }
}

include_once 'Benchmark/Iterate.php';
$bench = new Benchmark_Iterate;

// run the getDate function 10 times
$bench->run(10, 'getCreatedDate', 1);

// print the profiling information
print_r($bench->get());
?>
The code above will generate output similar to this:
Array
(
    [1] => 0.055413007736206
    [2] => 0.0012860298156738
    [3] => 0.0010279417037964
    [4] => 0.00093603134155273
    [5] => 0.00094103813171387
    [6] => 0.00092899799346924
    [7] => 0.0010659694671631
    [8] => 0.00096404552459717
    [9] => 0.0010690689086914
    [10] => 0.00093603134155273
    [mean] => 0.0064568161964417
    [iterations] => 10
)
This should be pretty self explanatory -- the mean item shows the average value for all 10 iterations of the getCreatedDate() function. The benchmark should probably run for more than a thousand iterations, but this example is good enough :).
I hope that with this article I've given some insight into practical ideas of how to quickly profile PHP scripts. Please be aware that analyzing your code is not something simple, as you need to know a lot about the language's specific features. However, adding timers while testing your code will help you isolate the sluggish functions and using iterations will help test them until you figure out the correct optimizations.

Reference: http://www.oreillynet.com/pub/a/php/2002/02/28/profilingphp.html?page=2

No comments: