Shlomif's Technical Posts Community - Code/Markup Injection and Its Prevention

Code/Markup Injection and Its Prevention [Oct. 22nd, 2009|10:17 pm]

shlomif_tech

[shlomif]
[Current Location |Home]
[Current Music |Sam Cardon and Kurt Bestor - Rainmaker]

I'm running out of time for a post to Planet Perl Iron Man, so I'm going to prepare something quick, but hopefully enlightening. I'll use Perl as a demonstration language, but the practices I'm going to cover tend to be more universal.

Since the early UNIXes and before, operating systems have represented code and its data as sequences of fixed-size groups of bits called bytes. Starting with Unix, which was designed to run on machines such as the 16-bit PDP-11 and later the 32-bit VAX family of computers, this unit has generally been 8 bits. Today, there are many 8-bit-based text encodings (and many more binary encodings for binary data), and the interested reader is referred to Joel Spolsky's introduction on the subject and Juerd Waalboer's perlunitut or your language's equivalent document.

In any case, let's suppose we have a string where we want to embed a variable containing a string. In Perl we can do:

my $total = "Hello " . $adjective . " World!";

Or more simply:

my $total = "Hello $adjective World!";

So if we put "beautiful" in $adjective, we'll get "Hello beautiful World!" in $total and if we put "cruel" there we'll get "Hello cruel World!" there.

So far so good, as long as the result is plain text. However, what if it's in a more structured format? Let's say HTML:

# Untested
my $input = get_input_from_user_somehow();
print <<"EOF";
<p>
$input
</p>
EOF

The alert reader will notice that $input was inserted as is into the HTML output. And since we didn't check whether it contains special characters, nor escape them, a malicious user can insert arbitrary HTML code and even JavaScript code there. This in turn can wreak havoc upon the users of the page.

This form of HTML injection is called a cross-site scripting (XSS) attack. If present in web applications or web-sites, it may allow malicious crackers to set up traps for the unwary, and possibly gain access to sensitive information on the site, such as the passwords of users or administrators. And you did notice how easy it was to write code that exhibited this problem, right?
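
As a minimal sketch of the escaping that the HTML snippet above is missing, the code below replaces the characters that are special in HTML before printing the input. The escape_html helper is a hypothetical name made up for this illustration; in a real program a maintained module such as HTML::Entities from CPAN is probably a better choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original post): escape the
# characters that have a special meaning in HTML text and attributes.
sub escape_html {
    my ($text) = @_;
    my %entity_for = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&#39;',
    );
    $text =~ s/([&<>"'])/$entity_for{$1}/g;
    return $text;
}

# Stand-in for get_input_from_user_somehow(): a hostile value.
my $input = '<script>alert("XSS")</script>';
my $safe  = escape_html($input);

print <<"EOF";
<p>
$safe
</p>
EOF
```

With the escaping in place, the browser shows the script tag as text instead of running it.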

Here are some other forms of code or markup injection:

  1. Shell Command Injection - I've discussed it briefly in a different post about "shell variable injection" in Bash, but it also exists in Perl. Imagine doing system("ls $dir"); or, as some newcomers are tempted to do, `ls $dir` (although backticks still have some legitimate uses). Now I, as a malicious user, can put some malicious shell code in the $dir variable, which will wreak havoc on the system of the user that is running the script.

  2. SQL injection allows a user to inject malevolent SQL code that can do untold damage to the database. It is very common in web applications and many other applications that use SQL code. If you do something like "SELECT id FROM users WHERE name='$name'", then by putting single-quotes in the name and using SQL syntax, one can insert arbitrary SQL there and do a lot of damage. There was also a very nice xkcd comic about it.

  3. Perl Code injection - let's suppose we want to construct an optimised anonymous function ( sub { ... } ) on the fly. We can build its code and then use the string eval - eval "". A lot of Perl programmers think it should be avoided at all costs, but metaprogramming has some legitimate uses. Moreover, this can happen in other cases, like when we construct a Perl program (or a program in any other language) on the fly and execute it.

    In any case, if we insert a variable into the eval "" which was input from the user without being escaped or validated, we get arbitrary code execution.

  4. Regular expressions' code injection - imagine you want to see if a string is contained in a list of strings. One naïve way would be to concatenate the strings using a separator that is unlikely to be contained in them (such as \0) and then match this gigantic string using $haystack =~ m{$needle}. However, if $needle contains special regex characters, then the match can take a lot of time or, worse, yield an incorrect result. One way to avoid that is to use quotemeta (see perldoc -f quotemeta) or the equivalent \Q and \E regular expression escapes: $haystack =~ m{\Q$needle\E}. In this particular case, it is also probably better to use a hash, but naturally this was just one example where we'd like to embed some arbitrary (but plain) text inside a regular expression.
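
To make the regular expression case concrete, here is a small sketch that compares matching with and without \Q...\E. The strings in the haystack and the two helper subs are made up for this illustration.

```perl
use strict;
use warnings;

# Illustrative data: three strings joined with a NUL separator.
my $haystack = join "\0", 'foo.bar', 'a+b', 'plain';

# Hypothetical helpers: the first interpolates the needle as a regex,
# the second quotes it so that it is matched as literal text.
sub found_unescaped {
    my ($text, $needle) = @_;
    return $text =~ m{$needle} ? 1 : 0;
}

sub found_escaped {
    my ($text, $needle) = @_;
    return $text =~ m{\Q$needle\E} ? 1 : 0;
}

for my $needle ('a+b', 'a.b') {
    printf "%-3s unescaped=%d escaped=%d\n",
        $needle,
        found_unescaped($haystack, $needle),
        found_escaped($haystack, $needle);
}
```

The unescaped version misses 'a+b' (because + is read as a quantifier) and wrongly finds 'a.b' (because . matches the + character), while the escaped version gets both right.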

These are the prominent examples I can think of now, but they are not the only ones. Your program is in danger whenever it accepts text input from the user and passes it directly to an output format that has some grammar and syntax that can be influenced by this string.
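
For the string eval case mentioned above, one possible approach is to validate the input against a strict pattern before it gets anywhere near the generated code. This is only a sketch: make_multiplier is a hypothetical helper, and the assumption is that the only thing the user may supply is a decimal integer.

```perl
use strict;
use warnings;

# Hypothetical helper: build sub { $_[0] * N } on the fly for a
# user-supplied N, refusing anything that is not a plain integer.
sub make_multiplier {
    my ($factor) = @_;

    die "Invalid factor\n"
        unless defined $factor && $factor =~ /\A-?[0-9]+\z/;

    # Only validated digits are interpolated into the code string.
    my $sub = eval "sub { return \$_[0] * $factor; }";
    die "Compilation failed: $@" if !$sub;

    return $sub;
}

my $triple = make_multiplier('3');
print $triple->(14), "\n";

# A hostile value is rejected before eval "" ever sees it.
my $evil = eval { make_multiplier('1; system("echo INJECTED")') };
print "Rejected: $@" if !$evil;
```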

So how can we mitigate such code injection problems? There are many ways - sometimes providing alternatives and sometimes complementing each other:

  1. Make sure you have enough discipline to escape the input before it is passed to the output venue. Write automated tests for that.
  2. If you still want to allow some user input through unescaped, then analyse it to make sure it doesn't contain any malicious code that can abuse the system. For example, you may wish to restrict input only to certain HTML tags and attributes.
  3. Taint your data using unsafe typing or "kinding" and make sure that it can only be output after either being escaped or being untainted. Joel on Software recommends making the wrong code look wrong, which, while desirable and important, is probably less preferable than making the wrong code behave wrong and abort with a huge "You suck!" error or something. This may not always be possible given the limitations of the programming language, but it is a better ideal.
  4. Use auto-escaping features of your environment such as SQL place-holders (e.g. "SELECT * FROM mytable WHERE id = ?"), and the list form of system (see "perldoc -f system", e.g. system { $cmd[0] } @cmd).
  5. Perform frequent code reviews, black box tests, and encourage hackers to find problems in your code.
  6. Use complementary security measures that make sure that even if a problem occurs, its damage is mitigated. As examples, you can try running the script under an underprivileged operating system user, or as a database user that lacks certain database privileges.
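
As a rough sketch of the list-form idea from the list above, the code below runs an external command with a hostile argument using the list form of the pipe open, which bypasses the shell in the same way that the list form of system does. It assumes a UNIX-like system with an echo program in the path, and run_and_capture is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and return its output.
sub run_and_capture {
    my @cmd = @_;

    # With a program plus at least one argument, perl uses the list form
    # and executes the program directly instead of passing a string to sh.
    die "Need a program and at least one argument\n" if @cmd < 2;

    open my $fh, '-|', @cmd
        or die "Cannot run $cmd[0]: $!";
    local $/;    # slurp all of the output at once
    my $output = <$fh>;
    close $fh;

    return $output;
}

# In system("echo $dir") the ';' would start a second shell command.
my $dir = '/tmp; echo INJECTED';

# Here the whole value reaches echo as one ordinary argument.
print run_and_capture('echo', $dir);
```

The second command is never run: the semicolon is printed back as part of the text rather than being interpreted.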

There are probably several measures that I'm forgetting, so feel free to add them as comments or trackbacks. In any case, be careful when writing code that may cause code or markup injection because the consequences may be dire.

Comments:
From: brianphillips.myopenid.com
2009-10-23 02:45 am (UTC)

unsafe opens

Here's another one that I hadn't thought about until recently:

open(my $fh, "/some/directory/$filename");

Naturally, you could worry about $filename pointing to a file that the user shouldn't be opening, but there's another hidden danger:

my $filename = "|echo h4x0r3d|";
open(my $fh, "/some/directory/$filename");

The trailing pipe will cause perl to fork and execute the following command and allow the output to be read by the file handle:

% /some/directory/|echo h4x0r3d

Surprisingly, bash (and presumably other shells) doesn't error out at the attempt to execute `/some/directory/` (although some output ends up going to STDERR about /some/directory/ not being executable) and merrily executes the `echo` command, or any other command the hacker wants to execute.

Validating your input can easily prevent this from happening but this exploit is also a good reason to use the three-arg version of open so that user input can't affect how you're opening $filename.
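
A minimal sketch of the three-arg version, using the same hostile file name as above (can_open_for_reading is a hypothetical helper name made up for this example):

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $path could be opened for reading,
# and 0 otherwise.
sub can_open_for_reading {
    my ($path) = @_;

    # The explicit '<' mode means $path is only ever treated as a file
    # name, so the pipes in it cannot start a command.
    if (open my $fh, '<', $path) {
        close $fh;
        return 1;
    }
    return 0;
}

my $filename = '|echo h4x0r3d|';
my $path     = "/some/directory/$filename";

print can_open_for_reading($path)
    ? "opened $path\n"
    : "could not open $path: $!\n";
```

Here perl just looks for a file that is literally named "|echo h4x0r3d|" and fails if there is none. The file name still needs validating against things like "../" components, which the three-arg form does nothing about.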
From: shlomif
2009-10-23 09:00 am (UTC)

Re: unsafe opens

Very interesting, thanks! Of course you should use the three-args open. One thing I'd like to fix in bleadperl is the fact that open "|-", @cmd and open "-|", @cmd do not work in non-cygwin Win32, which is a huge problem.

I also did not mention String-ShellQuote, which I wanted to.

Thanks again.