By: Walker Rowe, February 02, 2017 (11:05 AM)

Understanding Buffer Overflow and Shellcode Exploits

Understanding Buffer Overflows

In the official biography of Bill Gates, Paul Allen, the Microsoft co-founder, tells of flying into an airport with the code he and Bill had sold to MITS to provide the BASIC language for its Altair computer. Mr. Allen says that as the airplane approached the airport he realized he had not written the code to load the program into the machine's memory. So he wrote that loader in machine language as the jet came in to land.

Regular programmers are mightily impressed by that because it is so complicated to program in machine language. Operating systems and other software are written in easier-to-read languages and then compiled into machine language, but few people write machine language directly. Machine language is made up of strings of 1s and 0s, or bits. Together they reduce a computer program to the simplest of steps. After all, the CPU cannot understand something like “open file.” Instead it supports basic operations like these: compare two values, jump to a specific address, write a value to memory, read a value from memory, add, multiply, subtract, divide, etc.

 

Buffer Overflow Explained

These very low-level operations are what a hacker tries to hijack when they use what are called buffer overflow and shellcode exploits. These are among the most common hacking techniques.

When a company like Adobe writes a program, it usually does so in a low-level programming language like C or C++. (I mention Adobe here because Adobe is frequently the target of buffer overflow attacks, which is what we explain here. It seems hardly a week goes by without a security advisory about yet another Adobe security flaw.)

C and C++ are harder to use safely than higher-level languages like Python or Java because the programmer has to manage memory by hand. And this is where programmers make mistakes.

Consider this example. Suppose a programmer wants to store the value “watermelon” in a string. In C, it looks like this in memory:

“watermelon” + \0

Where the “\0” is the null terminator, a byte with the value 00. (It is not the same as “\n”, which is the newline character.) That null is how the program knows where a string ends.

OK, letters are stored as numbers, which are usually written in hex. For example, in ASCII the letter A is 65 in decimal, which is 41 in hex. But for the moment let’s assume that the computer stores an A as just an “A.”
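To make the byte values concrete, here is a small sketch in Python. It only prints the numbers behind the letters; the variable names are mine and are not part of any real program.

```python
# Each character is stored as one byte; C adds a single 0 byte at the end.
text = "watermelon"
stored = text.encode("ascii") + b"\x00"   # the bytes a C string would occupy

print(ord("A"), hex(ord("A")))            # 65 0x41
print(stored.hex(" "))                    # 77 61 74 65 72 6d 65 6c 6f 6e 00
print(len(stored))                        # 11
```

The last byte, 00, is the null terminator, which is why ten letters take up eleven bytes.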

Then the memory of the computer will look something like:

watermelon00xyxyxyxyxyxyxy

Where x and y stand for uninitialized bytes beyond the end of “watermelon”. To say that they are uninitialized means their values have not been set yet. So they could contain any kind of garbage. The hacker wants to use that extra space to replace the garbage with their own instructions.

Now “watermelon” takes up 11 bytes in memory: 10 characters plus the null terminator. If the hacker can change the null terminator to something else, then all of a sudden the string is not 11 bytes long but much longer. In other words, the hacker has chopped off the end of the field so that the computer no longer knows where the string ends. When the computer reads “watermelon” it keeps reading past it, into whatever the hacker has put there.
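Here is a toy simulation of that idea in Python. It is only a sketch: a bytearray stands in for the computer's memory, read_c_string is a made-up helper that mimics how C string functions scan for the terminator, and the second 00 byte at the end is my assumption so that the read stops somewhere.

```python
def read_c_string(memory: bytearray, start: int) -> bytes:
    """Read bytes from `start` up to the first 0 byte, as C string functions do."""
    end = memory.index(0, start)   # raises ValueError if there is no terminator
    return bytes(memory[start:end])

# "watermelon", its terminator, then neighbouring bytes that belong to something else.
memory = bytearray(b"watermelon\x00xyxyxy\x00")

before = read_c_string(memory, 0)
memory[10] = ord("!")              # overwrite the terminator at offset 10
after = read_c_string(memory, 0)

print(before)                      # b'watermelon'
print(after)                       # b'watermelon!xyxyxy'
```

Nothing about the word “watermelon” changed. Only the one byte that marked its end did, and the string now swallows its neighbours.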

Shellcode and JavaScript

Shellcode is a small block of machine instructions that the hacker gets into a computer's memory in order to take over the computer. Often the vehicle used to do that is JavaScript, since JavaScript is used in many web pages. (Plus JavaScript can be hidden inside documents such as PDFs and elsewhere, which is what makes it particularly dangerous.)

So the hacker will study some program, like one of Adobe's, and look for a weakness. Then they write shellcode to exploit it.

Here is an example from Stack Overflow. In a JavaScript program, a programmer writes something like this.

var shellcode = unescape("%uc92b%u1fb1%u0cbd%uc … %ua07d%ued92%u09e1%u9631%u5580");

That is just a string of hexadecimal numbers. But if you translate that to assembly language code it looks something like:

mov cl,0xc

jmp 0x6a6a:0xbfc3183

That “mov” (move) means copy a value into a register or a memory location; here it puts the value 0xc into the register cl. “jmp” means jump, or go to the memory address shown. So the program logic leaves what it was doing and branches off to do something else.
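To see how the %u escapes in that unescape() call turn into raw bytes, here is a minimal sketch in Python. It decodes only the first two escapes that are visible in the string above. The helper name decode_percent_u is mine, and the byte order (low byte first, as on x86 PCs) is an assumption about how the string ends up in memory.

```python
import re
import struct

def decode_percent_u(escaped: str) -> bytes:
    """Turn %uXXXX escapes into bytes, each 16-bit unit stored low byte first."""
    units = re.findall(r"%u([0-9a-fA-F]{4})", escaped)
    return b"".join(struct.pack("<H", int(unit, 16)) for unit in units)

prefix = "%uc92b%u1fb1"   # only the first two escapes of the string shown above
raw = decode_percent_u(prefix)
print(raw.hex(" "))       # 2b c9 b1 1f
```

Those bytes are what a disassembler would then translate into instructions like “mov” and “jmp.”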

So if the hacker can do these three things, they can take over the watermelon program:

  1. Replace the null terminator at the end of “watermelon” with any other character, thus making that field longer.
  2. Write some instructions at that newly-created memory address.
  3. Cause the computer to execute those instructions by putting a jump instruction somewhere to the right of “watermelon.”

That in brief is how such buffer overflow exploits work.
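The memory mistake behind all of this can be sketched with a toy model in Python. This is a simulation only, not a working exploit: the bytearray, the 4-byte “SAFE” field that sits next to the buffer, and both copy functions are made up for illustration.

```python
BUF_SIZE = 11  # room for "watermelon" plus its terminator

def make_memory() -> bytearray:
    # An 11-byte buffer followed directly by 4 bytes the program relies on.
    return bytearray(b"\x00" * BUF_SIZE + b"SAFE")

def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Copy without looking at the buffer size, the way C's strcpy behaves."""
    memory[0:len(data)] = data

def checked_copy(memory: bytearray, data: bytes) -> None:
    """Refuse any input that would not fit, terminator included."""
    if len(data) + 1 > BUF_SIZE:
        raise ValueError("input too long for buffer")
    memory[0:len(data)] = data
    memory[len(data)] = 0

mem = make_memory()
unchecked_copy(mem, b"watermelon!HACK")   # 15 bytes into an 11-byte buffer
print(bytes(mem[BUF_SIZE:]))              # b'HACK'

try:
    checked_copy(make_memory(), b"watermelon!HACK")
except ValueError as err:
    print("rejected:", err)               # rejected: input too long for buffer
```

The unchecked copy quietly overwrites the neighbouring bytes, which is the overflow. The checked copy is the kind of length test the programmer forgot to write.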

Adobe Flash Exploit

For example, here’s one in Adobe Flash Player version 21. Fortunately this weakness was found by security researchers and not hackers with malicious intent. Adobe issued a patch after fellow researchers rated the flaw “critical.”

The complicated memory and code dump below shows hackers using this approach to infect a targeted PC with ransomware.

[Figure: Packet capture of the Magnitude Exploit Kit. Graphic source: Proofpoint]

So developers and hackers are playing cat and mouse. The programmers try to write programs that do not have memory errors. And the hackers and researchers look for those that do.

Walker Rowe

Walker Rowe is an American freelance tech writer and programmer living in Chile. He specializes in big data analytics, cybersecurity, and IoT and publishes the website SouthernPacificReview.com.

Notice: The views expressed here are those of the authors and do not necessarily represent or reflect the views of Cursive Security.
