In my previous post, I wrote that I’m not a web guy. It wouldn’t take a genius to cross reference materials online and conclude that my field is security research or perhaps more broadly in the fundamental analysis of computer programs. Ultimately, I study programs and bugs. I have a marketable-level skill (some fools decide to pay me…) for having an intuition about the control a scenario gives to a potential attacker. Sometimes, I focus on risk mitigation, and in other cases I’m playing the aggressor.
Further, did I mention that I’m not a big fan of PHP? It’s not for me. It’s not for anybody. In fact, PHP is a security nightmare by design.
Look, look! Whose genius big idea was it to let your attacker choose which files to execute AND have great control of the environment in which those files execute? This is madness! You’re telling me that in running this server I’m giving blanket, public permission to point my executor at a text file of their choice with just a few constraints, where the primary constraint is based on limiting file paths we can point to from the server? A server that can process user-submitted content? This design degenerates into a pile of mixed code and data begging to get exploited.
I know what you’re probably thinking. The author is clearly not very experienced with PHP, and by his own admission, too. You learn the language, frameworks, and tools and it starts to make sense. While I think a developer can become skillful with a poor tool, a fundamental security mistake was made in the beginning, and the wages of this original sin is remote code execution.
Every time you boot the Linux kernel, in the default configuration, you’re going to see a log message in the kernel ring buffer that reads something like this:
[~] x86/mm: Checked W+X mappings: passed, no W+X pages found.
Briefly, this means that no memory page in the kernel should be both writable and executable at the same time. This constraint is so essential that we check on every boot and complain if our critical invariant does not hold. It’s something we want to verify on every user’s machine, on every deployment, to guard our desirable logical property jealously.
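To make the invariant concrete, here is a minimal userspace sketch of the same idea. It is not the kernel's check, which inspects the kernel's own page tables. Instead it scans text in the format of Linux's /proc/<pid>/maps and reports any mapping whose permissions include both write and execute. The function name find_wx_mappings and the SAMPLE text are made up for illustration.

```python
from typing import List, Tuple


def find_wx_mappings(maps_text: str) -> List[Tuple[str, str]]:
    """Return (address_range, perms) for each mapping that is both writable and executable.

    Expects text in the /proc/<pid>/maps layout, where the first field is the
    address range and the second is a permission string such as "r-xp".
    """
    offenders = []
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        addr_range, perms = fields[0], fields[1]
        if "w" in perms and "x" in perms:
            offenders.append((addr_range, perms))
    return offenders


# Illustrative sample only: two well-behaved mappings and one W+X offender.
SAMPLE = """\
00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/example
00651000-00652000 rw-p 00051000 08:02 173521 /usr/bin/example
7f0000000000-7f0000001000 rwxp 00000000 00:00 0
"""

if __name__ == "__main__":
    for addr_range, perms in find_wx_mappings(SAMPLE):
        print(f"W+X mapping: {addr_range} ({perms})")
```

On a Linux machine, you could feed it the contents of /proc/self/maps instead of the sample to look at the interpreter's own mappings.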
It’s important because the expression represents the divergence between code and data: the mathematical and prescriptive logic versus that which is consumed and processed. It’s as fundamental as the relationship between the actor and the object acted upon. Even in the realm of JIT compilers, where it’s tempting to think that in all the sawdust and smoke of dynamic code crafting we would allow writing and running the same block of memory, we carefully separate write and execute, because a single confusion between code and data is a necessary condition for remote code execution.
There are platforms that take this dichotomy to the next level. Some microcontrollers do not allow code loading and have completely separate data and executable spaces to prevent contamination of one by the other. This is usually called the Harvard Architecture in the popular texts. These are some of the most difficult targets to attack from a vulnerability research point of view.
On the surface, PHP seems like it would be immune to this kind of RCE. It’s a managed language, we have garbage collection, we can’t get dangling pointers and memory corruption (as long as our extensions behave, and even then it would be caused by the impurity of the non-PHP code alongside)… However, PHP is used to write web applications that use the Hypertext Transfer Protocol. The executable text of PHP is quite similar to the type of data it processes. PHP is often mixed with the HTML to be served, a wafer-thin membrane running between the actor and the object. Now, consider what it means to submit data, in the form of text, to a server whose code is itself interpreted text, stored in files we can call upon to be interpreted at will with our arguments, and possibly sharing a directory tree with user-submitted content.
I hope you can see that we might as well be exchanging code! Because PHP is so flexible, because it’s expressed in a runnable form that’s similar to the data and whose execution is directed by the client… eeek! It’s a design that encourages mistakes. This model provides too much attacker control, and the risk of confusing data and our application code is very high.
Coming back down to earth, we have bills to pay. There are good, common-sense ways to help mitigate these issues. We test, we debug, we define interfaces and strict rules in front of PHP to avoid passing most of the malicious requests. We place constraints, and through our web architecture we buttress PHP and patch around the design issue. We swiftly correct exploitable bugs through automatic updates enabled by default. These are important security measures that add value to any system, but they do not remove the essential confusion built into the language.
But if a person becomes so privileged as to architect a new software system, please, please implement a firm separation of application code and user data. Store these as far apart as you can, in dramatically different containers, and demand that we never mistake one for the other.
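As a rough sketch of what that separation can look like at the file-system level, the snippet below refuses any layout where the upload directory and the code directory overlap, and refuses any file name that would resolve outside the upload directory. The function name resolve_upload_path and the /srv/... paths are hypothetical, and a real system would need more than a path check.

```python
from pathlib import Path


def resolve_upload_path(upload_root: Path, code_root: Path, filename: str) -> Path:
    """Return the absolute destination for an upload, or raise ValueError.

    Rejects layouts where code and uploads share a tree, and file names
    that would land outside the upload directory.
    """
    upload_root = upload_root.resolve()
    code_root = code_root.resolve()

    # Code and user data must live in disjoint directory trees.
    if (
        upload_root == code_root
        or code_root in upload_root.parents
        or upload_root in code_root.parents
    ):
        raise ValueError("upload and code directories must not overlap")

    # Resolve ".." components (and absolute names) before checking containment.
    candidate = (upload_root / filename).resolve()
    if upload_root not in candidate.parents:
        raise ValueError("file name escapes the upload directory")
    return candidate


if __name__ == "__main__":
    uploads = Path("/srv/data/uploads")  # hypothetical data location
    code = Path("/srv/app")              # hypothetical code location
    print(resolve_upload_path(uploads, code, "cat.png"))
    try:
        resolve_upload_path(uploads, code, "../../app/shell.php")
    except ValueError as err:
        print("rejected:", err)
```

The point of the design is that the check fails closed: an overlapping layout is an error at configuration time, not something to be patched around per request.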