Serializing objects with different class definitions
When it comes to (un)serializing objects in PHP, some things may surprise you. In this post I show what I found out last week while testing serialization with different class definitions. Doing so is generally bad practice, and it exposes one of the biggest drawbacks of using serialization for persistent object storage: the serialized data holds a frozen version of an object. As a project evolves and its classes change, the serialized information doesn’t change with them. When objects are unserialized with the new class definition, the result can be unexpected behavior. You should take care when using serialization for persistent or temporary storage (e.g. caching objects in memcache), because any change in the class definition may affect the unserialized objects and cause bugs and crashes.
Every example follows the same procedure. The file write.php declares a class X, creates an instance of it, and writes the serialized instance to a file with obj_write(). The second file, read.php, declares a different class X, reads the file and unserializes the object with obj_read(), which creates an instance of the new class X. Then it executes some code, such as calling print_r on the object or echoing some of its properties.
Helper functions:
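// Serialize an object and store it in obj.data.
function obj_write($obj)
{
    file_put_contents('obj.data', serialize($obj));
}

// Read obj.data and rebuild the object using the currently declared class.
function obj_read()
{
    return unserialize(file_get_contents('obj.data'));
}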
Example 1
Serializing an object with a public property.
class X
{
    public $a = 'A';
}
obj_write(new X());
read.php declares a blank class.
class X
{
}
$x = obj_read();
print_r($x);
echo $x->a;
As you might expect, the public property is restored correctly:
X Object
(
[a] => A
)
A
Example 2
Adding some protected properties.
class X
{
    public $a = 'A';
    protected $b = 'B';
    protected $c = 'C';
}
obj_write(new X());
read.php declares only one protected variable $c.
class X
{
    protected $c;
}
$x = obj_read();
print_r($x);
echo $x->a;
echo $x->b;
echo $x->c;
And this is the result:
X Object
(
[c:protected] => C
[a] => A
[b:protected] => B
)
A
Notice: Undefined property: X::$b in /home/gasper/phpser/test1-read.php on line 12
Fatal error: Cannot access protected property X::$c in /home/gasper/phpser/test1-read.php on line 13
As you can see, print_r prints the whole object, and properties $b and $c are both marked as protected. The difference shows up on access: for $x->b PHP reports an undefined property, while for $x->c it correctly raises a fatal error. Why doesn’t the fatal error already occur when accessing $b? According to the print_r output, $b is present in $x and marked as protected, just like $c. The only difference is that $b isn’t declared in the class definition, so I guess PHP consults the class definition when resolving property access, rather than the data stored in the object.
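To see what unserialize() actually receives, it helps to look at the raw serialized string. The snippet below is a small sketch, separate from the write.php/read.php pair, that prints the serialized form of a class with one public and one protected property. Protected properties are stored under a name prefixed with a NUL byte, an asterisk and another NUL byte, which is one plausible reason why a lookup for a plain, undeclared "b" does not find the value.

```php
<?php
class X
{
    public $a = 'A';
    protected $b = 'B';
}

$data = serialize(new X());

// Replace the NUL bytes with a visible "\0" so the string is printable.
echo str_replace("\0", '\0', $data), "\n";
// prints: O:1:"X":2:{s:1:"a";s:1:"A";s:4:"\0*\0b";s:1:"B";}
```

Note the length of 4 for the protected property name: the two NUL bytes and the asterisk are part of the stored key.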
Example 3
Now let’s twist things some more by modifying the definition of X in read.php:
class X
{
    public $c;
    function getB()
    {
        return $this->b;
    }
    function getC()
    {
        return $this->c;
    }
}
$x = obj_read();
print_r($x);
echo "a: " . $x->a;
echo "b: " . $x->b;
echo "c: " . $x->c;
echo "getB: " . $x->getB();
echo "getC: " . $x->getC();
As you can see, I’ve changed the visibility of $c to public, and I’ve written two getters. The visibility change tests whether I can shift the visibility of $c upon unserializing, while the getters will hopefully allow me to read $b and $c, which are protected in the original definition and not declared as protected in this one.
Here’s the output:
X Object
(
[c] =>
[a] => A
[b:protected] => B
[c:protected] => C
)
a: A
Notice: Undefined property: X::$b in /home/gasper/phpser/test1-read.php on line 20
b: c:
Notice: Undefined property: X::$b in /home/gasper/phpser/test1-read.php on line 9
getB:
getC:
The first strange thing is that the object now holds two $c properties: one protected and one public. While this might be expected (the serialized information specifically tells PHP to unserialize a protected variable $c), it’s still strange that I now have two variables named $c. I don’t think this is possible to achieve without serialization. If you subclass a class and widen the visibility of a property from protected to public, you still have only one property, so this behavior may come as a surprise. Still, both $x->c and getC() return an empty value, because the serialized object contained no value for a public $c.
The other thing is that I still can’t access $b, even through a getter. The property is obviously present in the object (as print_r shows), but even when it is accessed through a getter, which has access to the instance’s protected variables, PHP reports that it’s undefined. I can’t think of a fully satisfying explanation for that; my best guess is that an undeclared protected property is stored under a different internal name than the plain "b" the getter asks for. Either way, this again shows that care should be taken when serializing objects.
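For comparison, here is how ordinary subclassing behaves, with no serialization involved. This is a minimal sketch; the class names Base, Child, PrivateBase and PrivateChild are made up for the illustration. Widening a protected property to public in a subclass leaves a single property, while a private parent property stays a separate slot next to the child’s own property of the same name.

```php
<?php
class Base
{
    protected $c = 'C';
}

class Child extends Base
{
    public $c = 'C';          // widened from protected to public
}

class PrivateBase
{
    private $c = 'hidden';
}

class PrivateChild extends PrivateBase
{
    public $c = 'visible';    // a new property; the private one is kept too
}

// Casting to an array exposes every property slot the object holds.
echo count((array) new Child()), "\n";        // prints 1
echo count((array) new PrivateChild()), "\n"; // prints 2
```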
Conclusion
As stated before, serializing and unserializing objects with different versions of classes can cause a lot of trouble. If your classes rarely change, or if you have some means of invalidating the serialized objects (e.g. flushing the cache, or rewriting the rows in the database), then you’re probably fine, although you should always be aware of the possible consequences. Likewise, if you cache objects with public properties only, these seem to work fine, whether the properties are declared or not.
But if you have classes that change often, or if you rely heavily on persistent storage of serialized objects, you should use another approach. One way I can think of is reading the stored objects with the old version of the class and passing them to another script or service, which writes them out in the new format. This is possible to achieve, but it is quite fragile. Other options include using XML to store persistent object data, or perhaps even JSON. In these cases you don’t store the object itself, as you do with serialization, but only the subset of its properties that is essential to restoring it correctly. Upon recreating the object, these properties are read one by one into a blank object of the proper class version.
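One possible way to invalidate stale objects is to store a version number next to the serialized data and to refuse anything written by a different class version. The following is only a sketch of that idea: the CLASS_VERSION constant and the functions obj_write_versioned() and obj_read_versioned() are names I made up for this example, not part of PHP.

```php
<?php
class X
{
    // Hypothetical convention: bump this whenever the properties change.
    const CLASS_VERSION = 2;

    public $a = 'A';
}

// Store the class version alongside the serialized object.
function obj_write_versioned($obj, $file)
{
    $payload = array(
        'version' => $obj::CLASS_VERSION,
        'data'    => serialize($obj),
    );
    file_put_contents($file, serialize($payload));
}

// Return the object, or null when the stored version does not match.
function obj_read_versioned($file, $expectedVersion)
{
    if (!is_file($file)) {
        return null;
    }
    $payload = unserialize(file_get_contents($file));
    if (!is_array($payload)
        || !isset($payload['version'], $payload['data'])
        || $payload['version'] !== $expectedVersion) {
        return null; // stale or malformed: the caller rebuilds the object
    }
    return unserialize($payload['data']);
}

$file = tempnam(sys_get_temp_dir(), 'obj');
obj_write_versioned(new X(), $file);
var_dump(obj_read_versioned($file, 2) instanceof X); // bool(true)
var_dump(obj_read_versioned($file, 3));              // NULL
unlink($file);
```

The object itself is only unserialized after the version check passes, so a changed class never sees data written for an older definition.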
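The JSON variant could look roughly like this. It is a minimal sketch under my own assumptions: the methods toArray(), fromArray() and getB() are hypothetical names, and the class decides for itself which properties are essential. Because the properties are copied one by one into a fresh instance, keys that are missing or no longer known are simply skipped.

```php
<?php
class X
{
    public $a = 'A';
    protected $b = 'B';

    // Export only the properties needed to restore the object.
    public function toArray()
    {
        return array('a' => $this->a, 'b' => $this->b);
    }

    // Build a fresh instance of the current class and copy known keys into it.
    public static function fromArray(array $data)
    {
        $x = new self();
        if (isset($data['a'])) {
            $x->a = $data['a'];
        }
        if (isset($data['b'])) {
            $x->b = $data['b'];
        }
        return $x;
    }

    public function getB()
    {
        return $this->b;
    }
}

$json = json_encode((new X())->toArray());
$copy = X::fromArray(json_decode($json, true));

echo $json, "\n";         // prints {"a":"A","b":"B"}
echo $copy->getB(), "\n"; // prints B
```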
So, that’s it. Take care with serialization!