
Securing your site against code injections

All Internet applications have to secure their inner workings against attacks from outside. We all know sites that were successfully attacked and modified, often due to some inventive usage of input parameters. The challenge is to prevent that happening to your website.

How do we solve this problem? Here I show a well-known solution: filtering all input by default, and making it harder to get at the unfiltered input.

This solution is now part of anyMeta.

Downloads: Tainted.php r26521 (28.9 KB); GPL version 2 (17.6 KB)

Tainted values and PHP

Ruby has the concept of tainted variables: variables are untrusted until explicitly marked as safe. An example:

require 'cgi';

$SAFE = 1

cgi = CGI::new("html4")

expr = cgi["field"].to_s

if expr =~ %r{\A[-+*/\d\seE.()]*\z}
  expr.untaint
  result = eval(expr)
  # display result back to user...
else
  # display error message...
end

It looks fine at first sight. However, I am a bit worried about the .untaint method. Who can be sure that I didn't make a slight error in the filter? And who protects against lazy programmers who untaint all variables by default? Ruby does guard against this behaviour with its safety levels. In PHP we don't have anything similar.

What to do for PHP?

A solution for PHP is the filter extension developed by Rasmus Lerdorf and Derick Rethans. When installed and enabled, this extension populates $_POST, $_GET and the other superglobals with filtered data. The unfiltered data is only accessible through a special API.
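
For reference, here is a minimal sketch of the filter API as it later shipped with PHP 5.2 and up. It calls filter_var() on literal strings so that it runs from the command line; a real page would call filter_input(INPUT_GET, ...) on request data instead. The sample inputs are made up for illustration.

```php
<?php
// Validation filters return the converted value, or false when the input does not conform.
$page = filter_var('42', FILTER_VALIDATE_INT);
$bad  = filter_var('42; DROP TABLE users', FILTER_VALIDATE_INT);

// Sanitizing filters always return a string, with dangerous characters encoded.
$q = filter_var('<b>Hello</b>', FILTER_SANITIZE_SPECIAL_CHARS);

var_dump($page); // int(42)
var_dump($bad);  // bool(false)
echo $q, "\n";   // the angle brackets are encoded, so no tag reaches the browser
```

Note that validation and sanitizing are different operations: the first rejects bad input, the second rewrites it.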

Of course, here too we can't protect against sloppy programmers who dig in, always fetch the unsafe-raw-very-dangerous-dont-use-this data and skip the input filters.

I think the filter extension is brilliant, and a good solution to a problem that badly needs solving.

Why don't I use it?

Well, we have to host on different machines, and we don't always control the configuration of those servers. So if we rely on the filter extension, how do we protect our site when the extension is not there?

I will sketch the solution we arrived at, which relies on the new object-oriented features of PHP 5.

Wrapping the super globals

The idea starts with replacing the super globals ($_GET, $_POST, etc.) with object wrappers. The object wrappers behave like arrays, implementing iteration and indexing. All array values are objects. By default all values are filtered to strings. When you need the raw version of a variable, or a differently formatted one, you have to call special methods.

Once we have the wrappers we can simply overwrite the super globals:

$_GET  = new TaintedArray($_GET);

// This will echo a filtered value of the argument
echo $_GET['q'];

// The TaintedArray wrapper behaves as an array.
// isset() works through ArrayAccess; array_key_exists() no longer accepts objects in PHP 8.
echo 'Does index "a" exist? ', isset($_GET['a']) ? 'Yes' : 'No';

On entering the URL test.php?q=<b>Hello</b>, our script will just echo Hello, effectively preventing the HTML injection.

Introducing TaintedValue and TaintedArray

We will need two classes, one to wrap around a single value, and another to wrap around arrays. We will call them respectively TaintedValue and TaintedArray.

The TaintedValue object also has methods to fetch filtered versions of the wrapped data. They are the asSomething methods below. An asSomething method returns false when the wrapped value doesn't conform to the kind of value you are requesting.

class TaintedValue
{
    const ALLOWQUOTES    = 1;
    const ALLOWHTML      = 2;
    const CHECKDOMAIN    = 4;
    const SCHEMEREQUIRED = 8;
    const HOSTREQUIRED   = 16;
    
    protected $safe;
    protected $raw;
    
    public function __construct ( $value )

    public function asFilepath ()
    public function asFilename ()
    public function asText ( $flags = 0 )
    public function asLine ( $flags = 0 )
    public function asNumber ()
    public function asInt ()
    public function asBoolean ()
    public function asRegexp ( $regexp )
    public function asRegexpReplace ( $regexp, $replace = '' )
    public function asUrl ( $flags = 0 )
    public function asEmail ( $flags = 0 )

    public function filterUrl ( $flags = 0 )
    public function filterEmail ()

    public function __tostring ()
    public function get ()
    public function getRawUnsafe ()
}
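
To illustrate the asSomething convention, here is a minimal sketch of three of these methods. It is not the code from Tainted.php: the class name TaintedValueSketch, the trimming of the input and the exact patterns are assumptions made for this example.

```php
<?php
// Hypothetical stand-in class: holds the raw value and three asSomething methods.
class TaintedValueSketch
{
    protected $raw;

    public function __construct($value)
    {
        // Arrays and objects can never conform to any of the kinds below.
        $this->raw = is_scalar($value) ? trim((string) $value) : null;
    }

    // Returns an integer, or false when the value is not a plain (signed) integer.
    public function asInt()
    {
        if ($this->raw === null || preg_match('/^[+-]?\d+$/', $this->raw) !== 1) {
            return false;
        }
        return (int) $this->raw;
    }

    // Returns an int or float, or false when the value is not numeric.
    public function asNumber()
    {
        if ($this->raw === null || !is_numeric($this->raw)) {
            return false;
        }
        return $this->raw + 0;
    }

    // Returns the value unchanged when it matches $regexp, false otherwise.
    public function asRegexp($regexp)
    {
        if ($this->raw === null || preg_match($regexp, $this->raw) !== 1) {
            return false;
        }
        return $this->raw;
    }
}

$page = new TaintedValueSketch('0');
var_dump($page->asInt() === false); // bool(false): 0 is a valid integer, not a failure
```

Because 0 and false are both falsy, callers have to compare the result with === false, as the last line does.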

A TaintedArray is used to store TaintedValue objects. The TaintedArray must behave like an array, otherwise we lose the transparent application of our wrappers. We get an almost completely transparent replacement by extending the ArrayObject class of PHP 5. The only thing we have to add is some knowledge about tainted values, so that we can access the data in different ways.

class TaintedArray extends ArrayObject
{
    protected $raw;
    protected $tainted;
    
    public function __construct ( $array )

    public function __get ( $key )
    public function __set ( $key, $value )

    public function get ( $key )

    public function offsetSet ( $key, $value )
    public function offsetGet ( $key )
    public function exchangeArray ( $array )
    public function offsetUnset ( $key )
    public function append ( $value )

    public function getRawUnsafe ( $key )
}

I added the __set and __get methods so that we can access the stored TaintedValue objects as if they were attributes of the tainted array. The dangerous method is getRawUnsafe: it returns the stored data as-is, unfiltered and possibly containing any injection payload you can think of.

A more complete example demonstrates what we can do with the Tainted objects.

$_REQUEST  = new TaintedArray($_REQUEST);

// Even the keys are now safe to echo!
foreach ($_REQUEST as $key=> $value)
{
    echo '[',$key, '] = "' . $value . '" ';
    echo 'raw="', nl2br(htmlspecialchars($_REQUEST->$key->getRawUnsafe())), '"<br/>';
}

// Get two request vars, line and url, make sure they are what the names suggest..
$line  = $_REQUEST->line->asLine();
$url   = $_REQUEST->url->asUrl(TaintedValue::SCHEMEREQUIRED|TaintedValue::CHECKDOMAIN);

echo '<br/>line: "', nl2br(htmlentities($line)), '"';
echo '<br/>url: "', nl2br(htmlentities($url)), '"';

TaintedArray is almost an array

We now have some wrappers. Are we done? Are they completely transparent to our existing code?

Almost. We need one change...

$_REQUEST  = new TaintedArray($_REQUEST);

if (is_array($_REQUEST))  echo "array! ";
if (is_a($_REQUEST, 'ArrayObject')) echo "ArrayObject";

This will echo ArrayObject, because an object is not an array. (Yes, the PHP developers know about this one, it is a feature, not a bug!). So we need a simple wrapper:

function any_is_array ( $a )
{
    return is_array($a) || (is_object($a) && is_a($a, 'ArrayObject'));
}

Now, replace all your is_array() calls with any_is_array() and you are up and running with our Tainted objects!

Building the code for TaintedValue and TaintedArray

We start with a simple wrapper around a value:

class TaintedValue
{
    protected $safe;
    protected $raw;

    public function __construct ( $value )
    {
        $this->raw  = $value;
        $this->safe = some_strict_filter($value);
    }

    public function __tostring ()
    {
        return $this->safe;
    }

    public function get ()
    {
        return $this->safe;
    }

    public function getRawUnsafe ()
    {
        return $this->raw;
    }
}
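
The function some_strict_filter above is a placeholder. As a rough sketch of what such a default filter could do, assuming we want plain text that is safe to echo into HTML, the version below drops control characters, removes tags and encodes what is left. The choice of steps is an assumption of this example, not the filter used in Tainted.php.

```php
<?php
// Hypothetical implementation of the article's some_strict_filter() placeholder.
function some_strict_filter($value)
{
    $s = (string) $value;

    // Drop control characters, but keep tab, newline and carriage return.
    $s = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $s);

    // Remove HTML and PHP tags.
    $s = strip_tags($s);

    // Encode the remaining special characters, including both kinds of quotes.
    return htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
}

echo some_strict_filter('<b>Hello</b>'), "\n"; // prints Hello
```

A filter this strict mangles legitimate input such as HTML fragments, which is exactly why the asSomething methods exist for the cases that need something else.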

The TaintedValue objects are stored inside a TaintedArray object, which simulates the behaviour of the super global arrays.

class TaintedArray extends ArrayObject
{
    protected $raw;
    protected $tainted;
    
    public function __construct ( $array )
    {
        parent::__construct(array());

        $this->raw     = array();
        $this->tainted = array();
        foreach ($array as $key => $value)
        {
            $this->offsetSet($key, $value);
        }
    }

}

In the $raw property we store the raw data, in case our script needs access to the unfiltered data. In the $tainted property we store the values wrapped in a TaintedValue or TaintedArray object. Finally, in the ArrayObject itself we store the filtered version of all the arguments, so that when we request a variable we always get a filtered version, unless we do something special.

Now we can add the offsetSet method that initializes our array. It is pretty straightforward, though we need some extra methods for filtering the array keys and wrapping the array values:

    public function offsetSet ( $key, $value )
    {
        $val = $this->safeValue($value);
        $k   = $this->safeKey($key);

        $this->raw[$k]     = $value;
        $this->tainted[$k] = $val;

        if (get_class($val) == 'TaintedValue')
        {
            parent::offsetSet($k, $val->get());
        }
        else
        {
            parent::offsetSet($k, $val);
        }
    }

    protected function safeKey ( $key )
    {
        if (is_numeric($key))
        {
            $k = $key;
        }
        else
        {
            $k = preg_replace('/[^a-zA-Z0-9_\-\.]/', '_', $key);
        }
        return $k;
    }
    
    protected function safeValue ( $value )
    {
        if (is_object($value))
        {
            $class = get_class($value);
            if ($class != 'TaintedValue' && $class != 'TaintedArray')
            {
                trigger_error('only tainted objects allowed', E_USER_ERROR);
            }
            $val = $value;
        }
        else if (is_array($value))
        {
            $val = new TaintedArray($value);
        }
        else
        {
            $val = new TaintedValue($value);
        }
        return $val;
    }

When accessing the information stored in our arrays we need to distinguish between the safe filtered values, the wrapped tainted values and the unwrapped raw data. The last kind is the one we want to keep away from our code.

    public function __set ( $key, $value )
    {
        $this->offsetSet($key, $value);
    }

    public function __get ( $key )
    {
        return $this->get($key);
    }
 
    public function get ( $key )
    {
        if (isset($this->tainted[$key]))
        {
            return $this->tainted[$key];
        }
        else
        {
            trigger_error('unknown index ' . htmlspecialchars($key), E_USER_NOTICE);
        }
    }

    public function getRawUnsafe ( $key )
    {
        return $this->raw[$key];
    }

Now we just have to fill in the other methods to keep our three arrays in sync.
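
A minimal sketch of those remaining methods could look like the following. It is a simplified, self-contained variant and not the code from Tainted.php: key filtering is left out, TaintedValue is reduced to a strip_tags() filter, and the handling of a null key (which PHP passes for $array[] = $value) is an assumption of this example. The #[\ReturnTypeWillChange] attributes only silence return-type deprecation notices on PHP 8.1 and up; older versions read them as comments.

```php
<?php
// Reduced stand-in for TaintedValue; strip_tags() is an assumed filter.
class TaintedValue
{
    protected $safe;
    protected $raw;

    public function __construct($value)
    {
        $this->raw  = $value;
        $this->safe = strip_tags((string) $value);
    }

    public function get() { return $this->safe; }
}

class TaintedArray extends ArrayObject
{
    protected $raw     = array();
    protected $tainted = array();

    public function __construct($array = array())
    {
        parent::__construct(array());
        $this->exchangeArray($array);
    }

    #[\ReturnTypeWillChange]
    public function offsetSet($key, $value)
    {
        if ($key === null) {
            // $array[] = $value: let PHP pick the next index, then reuse it everywhere.
            $this->raw[] = $value;
            end($this->raw);
            $key = key($this->raw);
        } else {
            $this->raw[$key] = $value;
        }
        $val = is_array($value) ? new TaintedArray($value) : new TaintedValue($value);
        $this->tainted[$key] = $val;
        parent::offsetSet($key, $val instanceof TaintedValue ? $val->get() : $val);
    }

    #[\ReturnTypeWillChange]
    public function append($value)
    {
        // Route through offsetSet so all three stores receive the same key.
        $this->offsetSet(null, $value);
    }

    #[\ReturnTypeWillChange]
    public function offsetUnset($key)
    {
        unset($this->raw[$key], $this->tainted[$key]);
        if ($this->offsetExists($key)) {
            parent::offsetUnset($key);
        }
    }

    #[\ReturnTypeWillChange]
    public function exchangeArray($array)
    {
        // Reset all three stores, refill them, and return the old filtered values.
        $old = $this->getArrayCopy();
        parent::exchangeArray(array());
        $this->raw     = array();
        $this->tainted = array();
        foreach ($array as $key => $value) {
            $this->offsetSet($key, $value);
        }
        return $old;
    }

    public function get($key)
    {
        return isset($this->tainted[$key]) ? $this->tainted[$key] : null;
    }

    public function getRawUnsafe($key)
    {
        return $this->raw[$key];
    }
}
```

Every method that changes the contents funnels through offsetSet or touches all three stores at once, so the raw, tainted and filtered views cannot drift apart.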

The methods in the TaintedValue object are pretty straightforward, with the possible exception of the Email and Url filters, though example code for those can easily be found on the Internet.
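
As a starting point, a deliberately conservative sketch of the two filters is shown below. The class name TaintedUrlEmailSketch, the http/https allow list and the email pattern are assumptions of this example. The flag values are copied from the TaintedValue constants above, and CHECKDOMAIN is left out because it would need a DNS lookup.

```php
<?php
// Hypothetical sketch of asUrl() and asEmail(); stricter than the relevant RFCs on purpose.
class TaintedUrlEmailSketch
{
    const SCHEMEREQUIRED = 8;
    const HOSTREQUIRED   = 16;

    protected $raw;

    public function __construct($value)
    {
        $this->raw = is_scalar($value) ? trim((string) $value) : '';
    }

    // Returns the URL, or false when it does not conform.
    public function asUrl($flags = 0)
    {
        $url = $this->raw;

        // Reject empty values, whitespace, control characters, quotes and angle brackets.
        if ($url === '' || preg_match('/[\s<>"\'\x00-\x1F\x7F]/', $url)) {
            return false;
        }

        // Only http and https are allowed; this rejects javascript: and data: URLs.
        if (preg_match('/^([A-Za-z][A-Za-z0-9+.\-]*):/', $url, $m)) {
            if (!in_array(strtolower($m[1]), array('http', 'https'), true)) {
                return false;
            }
        } elseif ($flags & self::SCHEMEREQUIRED) {
            return false;
        }

        $parts = parse_url($url);
        if ($parts === false) {
            return false;
        }
        if (($flags & self::HOSTREQUIRED) && empty($parts['host'])) {
            return false;
        }
        return $url;
    }

    // Returns the address, or false when it does not look like user@host.tld.
    public function asEmail()
    {
        $pattern = '/^[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}$/';
        return preg_match($pattern, $this->raw) === 1 ? $this->raw : false;
    }
}
```

The scheme check also turns away a value like localhost:8080/path, which is an acceptable loss for a default filter: rejecting a few valid values is cheaper than letting one bad value through.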

Articles Thursday, April 6, 2006