How PHP Executes – from Source Code to Render – SitePoint

This article was peer reviewed by Younes Rafie. Thanks to all of SitePoint’s peer reviewers for making SitePoint content the best it can be!

Inspired by a recent article on how Ruby code executes, this article covers the execution process for PHP code.

Introduction


There’s a lot going on under the hood when we execute a piece of PHP code. Broadly speaking, the PHP interpreter goes through four stages when executing code:

  1. Lexing
  2. Parsing
  3. Compilation
  4. Interpretation

This article will skim through these stages and show how we can view the output from each stage to really see what is going on. Note that while some of the extensions used should already be a part of your PHP installation (such as tokenizer and OPcache), others will need to be manually installed and enabled (such as php-ast and VLD).

Stage 1 – Lexing

Lexing (or tokenizing) is the process of turning a string (PHP source code, in this case) into a sequence of tokens. A token is simply a named identifier for the value it has matched. PHP uses re2c to generate its lexer from the zend_language_scanner.l definition file.

We can see the output of the lexing stage via the tokenizer extension:

$code = <<<'code'
<?php
$a = 1;
code;

$tokens = token_get_all($code);

foreach ($tokens as $token) {
    if (is_array($token)) {
        echo "Line {$token[2]}: ", token_name($token[0]), " ('{$token[1]}')", PHP_EOL;
    } else {
        var_dump($token);
    }
}

Outputs:

Line 1: T_OPEN_TAG ('<?php
')
Line 2: T_VARIABLE ('$a')
Line 2: T_WHITESPACE (' ')
string(1) "="
Line 2: T_WHITESPACE (' ')
Line 2: T_LNUMBER ('1')
string(1) ";"

There are a couple of noteworthy points from the above output. The first point is that not all pieces of the source code are named tokens. Instead, some symbols are considered tokens in and of themselves (such as =, ;, :, ?, etc). The second point is that the lexer actually does a little more than simply output a stream of tokens. It also, in most cases, stores the lexeme (the value matched by the token) and the line number of the matched token (which is used for things like stack traces).
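Because named tokens arrive as arrays and single-character tokens arrive as plain strings, it can help to normalise them into one shape before working with them. The following is a minimal sketch of that idea; normalizeTokens() is a hypothetical helper name, and the line number for single-character tokens is inferred by counting newlines in the preceding lexemes, which is an assumption of this sketch rather than something the tokenizer reports.

```php
<?php
// normalizeTokens() is a hypothetical helper, not part of PHP itself.
// It gives every token the same shape: name, text (lexeme), and line.
function normalizeTokens(string $code): array
{
    $result = [];
    $line = 1; // line on which the next token is expected to start

    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Named token: [token id, lexeme, starting line]
            [$id, $text, $start] = $token;
            $name = token_name($id);
        } else {
            // Single-character token such as "=" or ";"
            $text = $token;
            $name = $token;
            $start = $line; // inferred, since no line number is supplied
        }

        $result[] = ['name' => $name, 'text' => $text, 'line' => $start];

        // Move past any newlines contained in this lexeme
        $line = $start + substr_count($text, "\n");
    }

    return $result;
}

$code = <<<'code'
<?php
$a = 1;
code;

foreach (normalizeTokens($code) as $token) {
    echo "Line {$token['line']}: {$token['name']}", PHP_EOL;
}
```

With this shape, the = and ; symbols can be handled by the same loop as T_VARIABLE or T_LNUMBER without an is_array() check at every use.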

Stage 2 – Parsing

The parser is also generated, this time with Bison via a BNF grammar file. PHP uses a LALR(1) (look ahead, left-to-right) context-free grammar. The look ahead part simply means that the parser is able to look n tokens ahead (1, in this case) to resolve ambiguities it may encounter while parsing. The left-to-right part means that it parses the token stream from left-to-right.

The generated parser stage takes the token stream from the lexer as input and has two jobs. It firstly verifies the validity of the token order by attempting to match them against any one of the grammar rules defined in its BNF grammar file. This ensures that valid language constructs are being formed by the tokens in the token stream. The second job of the parser is to generate the abstract syntax tree (AST) – a tree view of the source code that will be used during the next stage (compilation).
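The parser's first job (rejecting token sequences that match no grammar rule) can be observed from userland. The sketch below assumes PHP 7 or later, where parse errors surface as ParseError exceptions, and relies on the TOKEN_PARSE flag of token_get_all(), which runs the parser over the code in addition to the lexer. isValidPhp() is a hypothetical helper name.

```php
<?php
// isValidPhp() is a hypothetical helper used for illustration only.
// It reports whether the token order forms valid PHP, without running it.
function isValidPhp(string $code): bool
{
    try {
        // TOKEN_PARSE asks the tokenizer to parse the code as well as lex it
        token_get_all($code, TOKEN_PARSE);
        return true;
    } catch (ParseError $e) {
        // The tokens did not match any grammar rule
        return false;
    }
}

var_dump(isValidPhp('<?php $a = 1;'));   // tokens form a valid assignment
var_dump(isValidPhp('<?php $a = = 1;')); // two "=" tokens in a row
```

Both strings lex without trouble; only the parser can tell that the second one is not a valid construct.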

We can view a form of the AST produced by the parser using the php-ast extension. The internal AST is not directly exposed because it is not particularly “clean” to work with (in terms of consistency and general usability), and so the php-ast extension performs a few transformations upon it to make it nicer to work with.

Let’s Take a Take A look On the AST for a rudimentary piece of code:

$code = <<<'code'
<?php
$a = 1;
code;

print_r(ast\parse_code($code, 30));

Output:

ast\Node Object (
    [kind] => 132
    [flags] => 0
    [lineno] => 1
    [children] => Array (
        [0] => ast\Node Object (
            [kind] => 517
            [flags] => 0
            [lineno] => 2
            [children] => Array (
                [var] => ast\Node Object (
                    [kind] => 256
                    [flags] => 0
                    [lineno] => 2
                    [children] => Array (
                        [name] => a
                    )
                )
                [expr] => 1
            )
        )
    )
)

The tree nodes (which are typically of type ast\Node) have several properties:

  • kind – An integer value to depict the node type; each has a corresponding constant (e.g. AST_STMT_LIST => 132, AST_ASSIGN => 517, AST_VAR => 256)
  • flags – An integer that specifies overloaded behaviour (e.g. an ast\AST_BINARY_OP node will have flags to differentiate which binary operation is occurring)
  • lineno – The line number, as seen from the token information earlier
  • children – sub nodes, typically parts of the node broken down further (e.g. a function node will have the children: parameters, return type, body, etc)

The AST output of this stage is handy to work off of for tools such as static code analysers (e.g. Phan).
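Tools that consume the AST mostly do so by walking it recursively through the children of each node. Here is a rough sketch of such a walk. So that it runs without php-ast installed, it uses SketchNode, a hypothetical stand-in class that only mirrors the kind and children properties of ast\Node; collectKinds() is likewise an illustrative name.

```php
<?php
// SketchNode is a hypothetical stand-in for ast\Node (kind + children only),
// so that this example does not require the php-ast extension.
final class SketchNode
{
    public $kind;
    public $children;

    public function __construct(int $kind, array $children = [])
    {
        $this->kind = $kind;
        $this->children = $children;
    }
}

// Depth-first walk that records the kind of every node it visits.
function collectKinds($node, array &$kinds = []): array
{
    if ($node instanceof SketchNode) {
        $kinds[] = $node->kind;
        foreach ($node->children as $child) {
            // Children may be nodes or plain values (e.g. the literal 1)
            collectKinds($child, $kinds);
        }
    }

    return $kinds;
}

// Mirrors the tree for `$a = 1;` shown above:
// 132 = AST_STMT_LIST, 517 = AST_ASSIGN, 256 = AST_VAR
$tree = new SketchNode(132, [
    new SketchNode(517, [
        'var'  => new SketchNode(256, ['name' => 'a']),
        'expr' => 1,
    ]),
]);

print_r(collectKinds($tree));
```

A static analyser would do something more useful than collecting integers at each node, but the traversal would follow the same pattern.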

Stage 3 – Compilation

The compilation stage consumes the AST, where it emits opcodes by recursively traversing the tree. This stage also performs a few optimizations. These include resolving some function calls with literal arguments (such as strlen("abc") to int(3)) and folding constant mathematical expressions (such as 60 * 60 * 24 to int(86400)).
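To give a feel for what folding constant expressions means, here is a toy sketch that folds integer arithmetic in a tiny array-based expression tree. This is not how the Zend compiler is implemented; the tree shape ([operator, left, right]) and the foldConstants() name are assumptions made purely for illustration.

```php
<?php
// Toy expression tree: a leaf is an int literal or a variable name (string),
// and an inner node is [operator, left, right]. This shape is hypothetical.
function foldConstants($expr)
{
    if (!is_array($expr)) {
        return $expr; // leaf: nothing to fold
    }

    [$op, $left, $right] = $expr;

    // Fold the operands first, so nested constant expressions collapse
    $left = foldConstants($left);
    $right = foldConstants($right);

    if (is_int($left) && is_int($right)) {
        switch ($op) {
            case '+': return $left + $right;
            case '-': return $left - $right;
            case '*': return $left * $right;
        }
    }

    // At least one operand is not a known constant: keep the operation
    return [$op, $left, $right];
}

var_dump(foldConstants(['*', ['*', 60, 60], 24])); // int(86400)
print_r(foldConstants(['+', '$x', ['*', 2, 3]]));  // the 2 * 3 part becomes 6
```

The second call shows the limit of the technique: $x is not known at compile time, so the addition has to remain for the interpreter to perform.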

We can inspect the opcode output at this stage in a number of ways, including with OPcache, VLD, and PHPDBG. I’m going to use VLD for this, since I feel the output is more friendly to look at.

Let’s see what the output is for the following file.php script:

if (PHP_VERSION === '7.1.0-dev') {
    echo 'Yay', PHP_EOL;
}

Executing the following command:

php -dopcache.enable_cli=1 -dopcache.optimization_level=0 -dvld.active=1 -dvld.execute=0 file.php

Our output is:

line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   3     0  E > > JMPZ                                                     <true>, ->3
   4     1    >   ECHO                                                     'Yay'
         2        ECHO                                                     '%0A'
   7     3    > > RETURN                                                   1

The opcodes sort of resemble the original source code, enough to follow along with the basic operations. (I’m not going to delve into the details of opcodes in this article, since that would take several entire articles in itself.) No optimizations were applied at the opcode level in the above script – but as we can see, the compilation phase has made some by resolving the constant condition (PHP_VERSION === '7.1.0-dev') to true.

OPcache does more than simply caching opcodes (thus bypassing the lexing, parsing, and compilation stages). It also packs with it many different levels of optimizations. Let’s turn up the optimization level to four passes to see what comes out:

Command:

php -dopcache.enable_cli=1 -dopcache.optimization_level=1111 -dvld.active=-1 -dvld.execute=0 file.php

Output:

line     #* E I O op                           fetch          ext  return  operands
-------------------------------------------------------------------------------------
   4     0  E >   ECHO                                                     'Yay%0A'
   7     1      > RETURN                                                   1

We can see that the constant condition has been removed, and the two ECHO instructions have been compacted into a single instruction. These are just a taste of the many optimizations OPcache applies when performing passes over the opcodes of a script. I won’t go through the various optimization levels in this article though, since that would also be an article in itself.
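The compaction of the two ECHO instructions is an example of a peephole-style optimization: looking at neighbouring instructions and replacing them with something cheaper. The sketch below imitates that one transformation on a made-up opcode list of [name, operand] pairs. It is a toy, not OPcache's implementation. In particular, it ignores jump targets, which a real optimizer has to respect before merging instructions. mergeEchoes() is a hypothetical name.

```php
<?php
// Toy peephole pass over a hypothetical opcode list of [name, operand] pairs.
// Adjacent ECHO instructions with string operands are merged into one.
// Note: jump targets are deliberately ignored in this sketch.
function mergeEchoes(array $ops): array
{
    $out = [];

    foreach ($ops as [$name, $operand]) {
        $last = count($out) - 1;

        if ($name === 'ECHO'
            && $last >= 0
            && $out[$last][0] === 'ECHO'
            && is_string($operand)
            && is_string($out[$last][1])
        ) {
            // Append to the previous ECHO instead of emitting a new one
            $out[$last][1] .= $operand;
            continue;
        }

        $out[] = [$name, $operand];
    }

    return $out;
}

$ops = [
    ['ECHO', 'Yay'],
    ['ECHO', "\n"],
    ['RETURN', 1],
];

print_r(mergeEchoes($ops));
```

The result has two instructions instead of three, matching the shape of the optimized VLD output above.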

Stage 4 – Interpretation

The final stage is the interpretation of the opcodes. This is where the opcodes are run on the Zend Engine (ZE) VM. There’s actually very little to say about this stage (from a high-level perspective, at least). The output is pretty much whatever your PHP script outputs via commands such as echo, print, var_dump, and so on.
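Conceptually, interpreting opcodes boils down to a loop that fetches the next instruction, dispatches on its name, and moves an instruction pointer. The following toy loop handles only the three opcodes seen in the VLD output earlier, using a made-up array encoding. It is a sketch of the general idea and bears no relation to how the Zend Engine VM is actually structured; runOps() and the operand layout are assumptions.

```php
<?php
// Toy dispatch loop. Each instruction is [name, operand1, operand2], a
// hypothetical encoding chosen for this sketch. Output is collected in a
// string and returned rather than printed.
function runOps(array $ops): string
{
    $output = '';
    $ip = 0; // instruction pointer

    while (isset($ops[$ip])) {
        // Pad missing operands with null so destructuring always succeeds
        [$name, $a, $b] = $ops[$ip] + [null, null, null];

        switch ($name) {
            case 'JMPZ':
                // Jump to instruction $b when the condition $a is falsy
                $ip = $a ? $ip + 1 : $b;
                break;
            case 'ECHO':
                $output .= $a;
                $ip++;
                break;
            case 'RETURN':
                return $output;
            default:
                throw new RuntimeException("Unknown opcode: $name");
        }
    }

    return $output;
}

// Mirrors the unoptimized opcodes for file.php shown earlier
$ops = [
    ['JMPZ', true, 3],
    ['ECHO', 'Yay'],
    ['ECHO', "\n"],
    ['RETURN', 1],
];

echo runOps($ops);
```

Changing the JMPZ condition to false makes the loop jump straight to the RETURN instruction, so nothing is echoed.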

So instead of digging into anything complex at this stage, here’s a fun fact: PHP requires itself as a dependency when generating its own VM. This is because the VM is generated by a PHP script, due to it being simpler to write and easier to maintain.

Conclusion

We’ve taken a brief look through the four stages that the PHP interpreter goes through when running PHP code. This has involved using various extensions (including tokenizer, php-ast, OPcache, and VLD) to manipulate and view the output of each stage.

I hope this article has helped to provide you with a better holistic understanding of PHP’s interpreter, as well as shown the importance of the OPcache extension (for both its caching and optimization abilities).

Source: https://www.sitepoint.com/how-php-executes-from-source-code-to-render/