In the last homework, we saw how to recognize a virus pattern. While the patterns we found were very common (a tricky jump and an interrupt hook), longer patterns can identify a specific virus. In this homework, you will write an obfuscator, which takes x64 code and modifies it so that it cannot be recognized in such a manner. Your program will read in an x64 assembly file, add obfuscation to the program, and print the output.
The program can be written in any language that you would like, with the following restrictions:
We provide a Makefile at the bottom that will ensure proper compilation. If you are not using C++, you will have to modify it, as described below. Specifically, there is a run target, so that calling make run will run your program.
Your program will read an x64 assembly file from standard input and write the output to standard output. Thus, you can run your program as follows:
cat vecsum.s | make run > vecsum-obfuscated.s
While testing, you may want to replace make run with whatever command you use to run your program.
The assembly files will follow a very strict format, intended to make parsing the file viable without a full-fledged parser:
These restrictions should allow for easy reading of the x64 assembly input. The intent here is not for you to have to spend all your time writing a complicated lexer and/or parser to read in the file. If there are other restrictions on the x64 assembly input that will make the parsing job easier, feel free to chat with me about it.
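Because the individual format rules are not reproduced here, the following is only a sketch of a line parser under stated assumptions: one label, directive, or instruction per line; labels end in a colon; comments start with ;; operands are separated by commas. AsmLine and parseLine are hypothetical names, not part of any provided code.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical representation of one input line, assuming each line holds
// at most one label, directive, or instruction.
struct AsmLine {
    std::string label;                  // "start" for "start:", else empty
    std::string opcode;                 // "add", "global", "section", ...
    std::vector<std::string> operands;  // e.g. {"rax", "[rdi+8*r10]"}
};

static std::string trim(const std::string& s) {
    size_t b = 0, e = s.size();
    while (b < e && std::isspace(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Splits a line into label / opcode / operands. A comma inside [...] is not
// treated as a separator; NASM memory operands have none, but tracking the
// bracket depth is cheap insurance.
AsmLine parseLine(const std::string& raw) {
    AsmLine out;
    std::string line = trim(raw.substr(0, raw.find(';')));  // drop comment
    if (line.empty()) return out;
    if (line.back() == ':') {  // label-only line
        out.label = line.substr(0, line.size() - 1);
        return out;
    }
    size_t sp = line.find_first_of(" \t");
    out.opcode = line.substr(0, sp);
    if (sp == std::string::npos) return out;  // e.g. "ret", "nop"
    std::string cur;
    int depth = 0;
    for (char c : line.substr(sp + 1)) {
        if (c == '[') ++depth;
        if (c == ']') --depth;
        if (c == ',' && depth == 0) {
            out.operands.push_back(trim(cur));
            cur.clear();
        } else {
            cur += c;
        }
    }
    out.operands.push_back(trim(cur));
    return out;
}
```

For example, parseLine("    add rax, [rdi+8*r10] ; sum") yields opcode add with operands rax and [rdi+8*r10], and parseLine("start:") yields just the label start.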
The output of your program should be an obfuscated x64 program THAT COMPUTES THE EXACT SAME RESULT. Comments should not be output, and you are free to output blank lines or not (it's probably easier not to). We are going to run the output of your code through NASM, so it needs to assemble without errors. Furthermore, your output should conform to the x64 formatting guidelines above, as we will run it through your program a second time.
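The stdin-to-stdout contract can be sketched as a single function. passThrough is a hypothetical name; it assumes that ; always starts a comment (a ; inside a string literal would be mishandled, which this sketch does not attempt to cover). Taking stream parameters lets the same code be tested with in-memory streams, while the real program would pass std::cin and std::cout.

```cpp
#include <iostream>
#include <sstream>  // std::istringstream / std::ostringstream for testing
#include <string>

// Reads assembly from `in`, drops comments and blank lines, trims leading
// and trailing whitespace, and writes each remaining line to `out`.
// Obfuscation passes would transform or expand `line` where marked.
void passThrough(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        line = line.substr(0, line.find(';'));  // assumes ';' starts a comment
        size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank after stripping
        size_t last = line.find_last_not_of(" \t\r");
        line = line.substr(first, last - first + 1);
        // ... obfuscation passes would go here ...
        out << line << '\n';
    }
}

// In obfuscator.cpp the entry point could then be:
// int main() { passThrough(std::cin, std::cout); }
```

Dropping the indentation is allowed, since leading spaces are optional in the expected format.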
You can start with the sample code provided in CS 2150 lab 8 (x64, part 1): Makefile, main.cpp, and vecsum.s. However, the vecsum.s has to be modified to conform to the above guidelines (reformatting of comments and removal of colons; all x64 opcodes stayed the same). Below is the vecsum.s file properly formatted, but without any comments:
global vecsum
section .text
vecsum:
    xor rax, rax
    xor r10, r10
start:
    cmp r10, rsi
    je done
    add rax, [rdi+8*r10]
    inc r10
    jmp start
done:
    ret
Running it through a VERY simple obfuscator might yield the following code. Note that the formatting (i.e. leading spaces) is optional, and is only included here for ease of reading. The inserted NOPs are marked with comments in the program below. This intentionally uses very simple obfuscations; obfuscation complexity is discussed below.
global vecsum
section .text
vecsum:
    xor rax, rax
    ; the following line is a nop obfuscation
    imul rdx, 1
    xor r10, r10
start:
    cmp r10, rsi
    je done
    ; the following line is a nop obfuscation
    add r11, 0
    add rax, [rdi+8*r10]
    inc r10
    ; the following line is a nop obfuscation
    nop
    jmp start
done:
    ret
Note that in the above program the obfuscations are clearly labeled. Not only are you not expected to do that, but it will be impractical when you are doing more advanced obfuscations. We did it here for clarity in understanding the program that resulted.
The program above uses the simplest obfuscations possible: three types of NOPs, namely nop itself, adding zero, and multiplying by 1. You can imagine many other NOPs: subtracting 0, exchanging (xchg opcode) a register with itself, etc. In each one, a random register can be chosen, which could be any of the x64 registers. One option would be to insert such a NOP after each line (that is not a ret or cmp) with some percentage chance. Note that add and imul modify the flags, so placing one between a cmp and the conditional jump that reads its result would change the program's behavior.
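A minimal sketch of this NOP insertion in C++, offered as an illustration rather than a reference solution. insertNops is a hypothetical name, and input lines are assumed to be already trimmed and comment-free. Unlike the sample output, it only uses flag-preserving NOPs (nop, xchg r, r, mov r, r, lea r, [r]), so it cannot disturb a later conditional jump; it still skips ret and cmp lines, as suggested above, along with labels and directives.

```cpp
#include <random>
#include <string>
#include <vector>

// After each instruction line (not a label, directive, ret, or cmp), insert
// a randomly chosen do-nothing instruction with probability p. Every form
// used here leaves all registers and all flags unchanged.
std::vector<std::string> insertNops(const std::vector<std::string>& lines,
                                    double p, std::mt19937& rng) {
    static const std::vector<std::string> regs = {
        "rax", "rbx", "rdx", "rsi", "rdi", "r10", "r11", "r12"};
    std::bernoulli_distribution coin(p);
    std::uniform_int_distribution<size_t> pickReg(0, regs.size() - 1);
    std::uniform_int_distribution<int> pickKind(0, 3);

    std::vector<std::string> out;
    for (const std::string& line : lines) {
        out.push_back(line);
        std::string op = line.substr(0, line.find(' '));
        bool isLabel = !line.empty() && line.back() == ':';
        bool isDirective = op == "global" || op == "section" || op == "extern";
        if (isLabel || isDirective || op == "ret" || op == "cmp") continue;
        if (!coin(rng)) continue;
        const std::string& r = regs[pickReg(rng)];
        switch (pickKind(rng)) {
            case 0: out.push_back("nop"); break;
            case 1: out.push_back("xchg " + r + ", " + r); break;
            case 2: out.push_back("mov " + r + ", " + r); break;
            default: out.push_back("lea " + r + ", [" + r + "]"); break;
        }
    }
    return out;
}
```

Seeding the generator differently on each run (for example from std::random_device) produces a different output every time; a fixed seed is handy for reproducing a bug. As noted below, NOPs like these remain easy to spot and strip.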
Implementing this will get you 2 points (out of 10); it was done with a 40-line Java program. This type of obfuscation doesn't get us very far: the NOPs are easily recognizable as NOPs, and can be easily removed by lex (or anything more powerful).
Your job is to implement more complicated obfuscations. For example, the program above contains an inc opcode, which could be replaced by multiple operations that yield the same result. Because these instructions are not, individually, NOPs, they are harder to detect and remove. Consider the various obfuscation techniques that we discussed in lecture.
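One way the inc/dec replacement might look, as a sketch rather than a definitive implementation. rewriteIncDec is a hypothetical helper that takes a trimmed line; memory operands are assumed to carry an explicit size such as qword. Each call picks at random among equivalent sequences, so the replacement does not always produce the same opcode pattern. All variants leave the operand with the same value and set ZF, SF, and OF exactly as inc/dec would, but they also write CF, which inc and dec leave alone; the sketch therefore assumes no later instruction depends on a carry produced before the inc/dec (for example an adc or jc).

```cpp
#include <random>
#include <string>
#include <vector>

// Replaces "inc X" / "dec X" with a randomly chosen equivalent sequence.
// Any other line is returned unchanged as a one-element vector.
std::vector<std::string> rewriteIncDec(const std::string& line, std::mt19937& rng) {
    size_t sp = line.find(' ');
    if (sp == std::string::npos) return {line};
    std::string op = line.substr(0, sp);
    std::string x = line.substr(sp + 1);  // register or sized memory operand

    std::vector<std::vector<std::string>> variants;
    if (op == "inc") {
        variants = {{"add " + x + ", 1"},
                    {"sub " + x + ", -1"},
                    {"not " + x, "neg " + x}};  // -(~x) == x + 1; neg sets the flags
    } else if (op == "dec") {
        variants = {{"sub " + x + ", 1"},
                    {"add " + x + ", -1"}};
    } else {
        return {line};
    }
    std::uniform_int_distribution<size_t> pick(0, variants.size() - 1);
    return variants[pick(rng)];
}
```

In vecsum above, inc r10 is followed by jmp start and the next flag consumer is je after a fresh cmp, so the extra CF write is harmless there.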
It is likely that you will need to generate more complicated assembly routines to demonstrate your code obfuscation - you will be submitting these as well.
There are three different platforms that people are using: Windows, Mac OS X, and Linux. As a result, there are differences in how to compile and run x64 assembly.
YOUR SUBMITTED PROGRAM MUST RUN ON A 64-BIT LINUX MACHINE! It must also be compiled using the -m64 flag (which generates 64-bit code).
You can look at CS 2150 lab 8 (x64, part 1), which discusses the various ways to compile x64 for the various platforms.
Your obfuscations may need to use temporary registers for their computations. One way to do this is to trace the registers throughout the execution of the program and see which ones are not being used, but this is beyond the scope of this homework.
For this homework, you can safely assume that the rcx, r8, and r9 registers are available, as they will not be used by the surrounding assembly code. Thus, you can use those three registers in your obfuscations (you don't have to, but you have that option). You may recall that these are the registers used to pass parameters 4-6 (from the register usage guidelines); thus, we will not provide you with subroutines that take more than three parameters.
Note that you must ensure that your provided assembly code (in x64.s and whatever else you test with) does not use these registers either.
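The reserved registers make some rewrites straightforward. The sketch below (splitThroughScratch is a hypothetical name) routes the source operand of a two-operand instruction through one of rcx, r8, or r9, turning for example add rax, [rdi+8*r10] into a load into the scratch register followed by a register-to-register add. It only handles 64-bit register destinations with register or memory sources: with a 32-bit destination the 64-bit load would read too many bytes, and immediates are left alone to avoid sign-extension surprises. Since mov does not change flags, the final instruction sets them exactly as the original did.

```cpp
#include <string>
#include <vector>

// Rewrites "op dst, src" as "mov scratch, src" + "op dst, scratch" when that
// is safe under the restrictions above; otherwise returns the original line.
std::vector<std::string> splitThroughScratch(const std::string& op,
                                             const std::string& dst,
                                             const std::string& src,
                                             const std::string& scratch) {
    // 64-bit registers the surrounding code may use (rcx/r8/r9 excluded).
    static const std::vector<std::string> reg64 = {
        "rax", "rbx", "rdx", "rsi", "rdi", "rbp",
        "r10", "r11", "r12", "r13", "r14", "r15"};
    static const std::vector<std::string> ops = {
        "add", "sub", "and", "or", "xor", "cmp", "mov"};
    auto contains = [](const std::vector<std::string>& v, const std::string& s) {
        for (const auto& e : v)
            if (e == s) return true;
        return false;
    };
    bool srcIsReg = contains(reg64, src);
    bool srcIsMem = !src.empty() && src.front() == '[' && src.back() == ']';
    if (!contains(ops, op) || !contains(reg64, dst) || !(srcIsReg || srcIsMem)) {
        return {op + " " + dst + ", " + src};  // leave untouched
    }
    return {"mov " + scratch + ", " + src, op + " " + dst + ", " + scratch};
}
```

One caveat when passes are combined, or when the output is obfuscated again: a scratch register is only free if no earlier rewrite is still holding a value in it at that point. If a later pass inserted something like mov r12, r12 between mov rcx, ... and add rax, rcx and then split it through rcx, the loaded value would be clobbered; giving each pass its own scratch register is one simple way to avoid this.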
You should submit the following files. BE SURE TO NAME THEM PROPERLY, including capitalization; otherwise, we can't call our testing scripts on your code, and we'll just give you a zero. For example, we will assume that your assembly file is called x64.s and your C++ file main.cpp. Your sample C++/assembly files need to compile to an executable named x64 (not a.out!). The submission system will call make to compile everything.
Obfuscation types will yield the following points:
Note that these obfuscations need to be general. So implementing just a dec replacement is not worth 2 points by itself, but replacing dec and inc (and similar instructions) can yield 2 points for that part. Replacing every occurrence with the same pattern of opcodes is not much of an obfuscation, as that pattern becomes easy to match and thus remove. The quality of each obfuscation's implementation will also be graded on a scale up to its point value (so a poor dec/inc implementation may only yield 1/2 points for that part).
In effect, scoring more than 6/10 on this homework requires at least two complicated algorithmic implementations.
NOTE: if your obfuscated code doesn't compile, you will get a very low score. Anybody can scramble a program so that it doesn't compile. It is far better to provide a small number of obfuscations that work properly than a lot that do not.
We are going to run your obfuscator on your provided source code (x64.s), and compile the result along with your main.cpp, and make sure that it works the same way that your original (un-obfuscated) x64.s and main.cpp worked.
We are also going to obfuscate our own assembly code. In particular, we are going to obfuscate our code multiple times, meaning we will take the obfuscated output and run it through the obfuscator again and again. Each resulting program should still compute the same result.
Below is a sample Makefile for an obfuscator written in C++. You are certainly welcome to use a more complicated Makefile; this is the minimum required for this assignment.
main:
     g++ -Wall obfuscator.cpp
     nasm -f elf64 -o x64.o x64.s
     g++ -m64 -Wall -c -o main.o main.cpp
     g++ -m64 -Wall -o x64 x64.o main.o

run:
     @./a.out
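You have to put the @ before the execution line! Without it, make echoes the command itself to standard output, and that text would end up in the obfuscated file. Your execution line will vary depending on your language: @./a.out for C/C++, @java Main.java for Java, @python obfuscator.py for Python, etc.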
Note that if you cut-and-paste this into a Makefile for you to use, you will have to replace the leading 5 spaces on those lines with a single tab. And the -Wall flag is there for your sanity (it turns on all warnings), but it is certainly not required.
This Makefile does a number of things:
The second line of the Makefile (g++ -Wall obfuscator.cpp) will change depending on your choice of implementation language:
For this homework, please name the obfuscator source code file obfuscator.* for whatever language you are using.
The second target is what will run your obfuscator. Here are some sample lines for various languages: