Narendra Dhami

Practical Uses for the PHP Tokenizer

Posted by Narendra Dhami on August 20, 2008

In this article we take a look at the PHP tokenizer and its potential for analyzing and processing PHP source code. We will build several working examples, which you can start using and extending for your own purposes.


When PHP has to process a request, the engine runs the code through several parsing passes until it is expressed as a set of instructions that the interpreter can execute. The first such step is “lexical scanning”, which splits the code into smaller strings called “tokens”. A token is the smallest meaningful unit of your source code: it can represent a reserved word (for, while, class, if, etc.), an operator (+, -, *, /, &&, etc.), a literal value (integers, floats, strings, etc.) or another special symbol.

The same lexical scanner that PHP uses is also available to userspace PHP developers through the function token_get_all(). It is very simple to use: you pass your PHP source code as text, and it returns an array of tokens, which we will process further in the examples in this article.
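To make this concrete, here is a minimal sketch of a call to token_get_all(). The source string and the variable names ($source, $tokens, $lines) are made up for illustration and are not part of the examples that follow. It relies on two documented behaviours: most tokens come back as arrays holding the token id, the matched text and the line number, while single-character tokens such as ; or + come back as plain strings. token_name() turns a numeric token id into its symbolic name.

```php
<?php
// Hypothetical one-line script to tokenize; any PHP source string would do.
$source = '<?php echo 1 + 2;';

// Run the lexical scanner over the source text.
$tokens = token_get_all($source);

$lines = array();
foreach ($tokens as $token) {
    if (is_array($token)) {
        // Array form: [0] => token id, [1] => matched text, [2] => line number.
        $lines[] = sprintf(
            '%-14s %-10s line %d',
            token_name($token[0]),
            var_export($token[1], true),
            $token[2]
        );
    } else {
        // Single-character tokens (';', '+', '(' ...) are returned as plain strings.
        $lines[] = sprintf('%-14s %s', '(character)', var_export($token, true));
    }
}

// Print one token per line.
echo implode(PHP_EOL, $lines), PHP_EOL;
```

Note that whitespace is reported as tokens too (T_WHITESPACE), so code that inspects the token stream usually has to skip or preserve it explicitly.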


