Tuesday, February 20, 2018

The Case of the Tricky Tool

Looks can be deceiving. There are times when you think an analysis is going to be easy, and everything points in that direction, until you hit a snag. This happens. Sometimes you've made an assumption that is wrong, sometimes there is a little trick the attacker is doing, and sometimes your tools fail you. This is one of those times.

The Malware

I received a malicious attachment in my email yesterday that uses a technique that I've started to see more and more in documents - utilizing the metadata fields to hold some of the malicious code. The advantage to this technique is that it spreads the code throughout the document and makes it more difficult to analyze. Despite this, all signs pointed to this being an easy document to analyze. As you'll see, I was wrong.

MD5: e618b9ef551fe10bf83f29f963468ade
SHA1: 93993320c636c884e6f1b53f9f878410efca02da
SHA256: d400d6392a17311460442e76b26950a0a07e8a85c210c31e87a042a659dc9c52

Once more, I used REMNux to statically analyze the file. Yes, I could have executed it with Lazy Office Analyzer to speed up my analysis, but frankly my Windows VM is temporarily fubar'd, so I was stuck doing it this way.

The first step in my analysis was to figure out what type of document I was dealing with. Relying on the trusty-old file command, we see that this is a composite document file. This means that we can use the oletools suite to analyze it.

Notice that file also gives us some information obtained from the metadata of the document. The metadata of a document is stored inside the document and contains information about it, such as the last time it was saved, how many characters are in it, and the author. As analysts, we can use this information as indicators, to track actors across multiple documents, and sometimes to provide attribution.

In this case, the author is set to "Caleb Cheng" and was last saved by the username "caive". Are these legitimate? There's no way for us to tell as they could have been modified after the document was saved, but we could at least create a yara rule to track these usernames and see if they pop up in other documents!

Speaking of Yara, my next step was to use the rules from the Yara Rules Project and see what they found inside of the document.

Two of the Yara rules indicated the document contained embedded VBA code. Not surprising as many malicious documents use VBA to execute their second stage malware. Yara didn't find any embedded executables, so this probably meant the document downloads its next stage to execute. The only way to find out was to extract the VBA and analyze it.

The olevba.py script from the oletools suite can be used to extract VBA code from Office documents. Initially, I didn't give it any options to so I could see how the code looked natively. Fortunately in this document, the VBA code was pretty short and easy to understand.

The code can be broken down as follows:
  1. Declares a number of variables.
  2. Reads the value of the metadata title field with the ThisDocument.BuiltInDocumentProperties("title") command.
  3. Converts the title to ASCII from Unicode. In reality, this is just converting the characters to their integer equivalents and putting them into an array.
  4. Loops through each letter of the title and subtracts 7 from its ASCII value.
  5. Converts the array back to Unicode. This is just converting the array back to a string.
  6. Reverses the string (e.g. turns "abc" to "cba").
  7. Uses the shell() function to execute some of the string, after replacing and splitting it up with multiple values.
From this, it appeared that the title of the document contained the encoded command that gets run, so I wanted to see what this value was. There are multiple ways to do this, but one of the easiest is to use the metadata extraction tool exiftool. This is where my trouble started.

As expected, the title field contained obfuscated text that needed to be decoded in order to see exactly what the malicious document was doing. Before I could decode it, however, I needed to extract it properly.

The Problem

Exiftool will display periods for both actual period characters and binary data, so I first attempted to use some command-line fu in order to properly extract the characters. Normally, this can be done with the following command, which will convert all characters into a hex-encoded string that you can put directly into a python script.
exiftool -title -b  resume.doc | hexdump -v -e '"\\" "x" 1/1 "%02X" ' ;
Since the VBA code was fairly straight-forward, I wrote a quick python script to decode it. However, when I ran the script I didn't get the results I expected. While I saw some PowerShell commands and an obfuscated URL, there were some binary characters that shouldn't have been there; it should have decoded cleanly to text.

This is where I spent the next few hours trying to figure out what was going on.

At first I thought the VBA code was doing some value conversions when it converted from Unicode to ASCII integer values; this was not the case. I tested this by writing some similar VBA code, launching it in Word, and debugging it to see the values before and after the conversion - all was as expected.

Then I went over my python script to make sure I hadn't made a programming error (this would not have been the first time). Everything was good.

Finally, I went back to the Word document itself to see if I could figure out if my command-line fu had worked correctly. Thats when I noticed something interesting. If I opened the document in a hex editor and compared the location where the title string was located at to what was extracted by exiftool, some of the binary characters were different.

In the example above, the top hex dump is from the document itself and shows exactly what is within the document. The bottom is what exiftool extracted. You can see that in the original document, the highlighted byte was 0x83 but when exiftool extracted it, the byte was converted to 0xC6 0x92. This occurred multiple times in the extraction.

I'm currently not sure why exiftool did this. I tested it with multiple options and the latest version and all did the same thing. I'm waiting to hear back from the developers to see if I found a bug.

Unfortunately, I was unable to come up with some genius command line fu to extract the real title string in one fell swoop. So how did I do it? I copied the bytes from the hexdump command above and did some copy and pasting to get it in the right format. Sometimes thats just the easiest option. If anyone comes up with anything, please let me know.

I should also note that some other metadata extraction tools, like olemeta.py from oletools, crashed when attempting to extract the title. I suspect this is because they expect this to be a string and not have binary characters in it.

The final python code I came up with is below and can be downloaded from here.

When run, it gave me the output I expected.

As initially thought, the malicious document downloads an executable, saves it to the file system, and executes it. At the time of this writing, the file (61a5af5acea342ee5ca8dbd2fba0de06) is still accessible at that IP address. We'll save that analysis for another day.

This analysis is a prime example as to why you should trust, but verify, your tool results. In the beginning I assumed that exiftool was extracting the data properly - and why not? It had never failed me before. However, when the results were not what I was expecting I had to dig deeper to see what the issue was. Without the knowledge on how to look into a file and compare what my tool was giving me to what I was actually getting, I wouldn't have been able to figure out that my tool was giving me incorrect data and work towards a process of getting it rectified.


I posted on the exiftool forum asking about the potential bug I found. Phil Harvey, the creator of exiftool, said the changing of the characters is because exiftool is attempting to convert the binary character to UTF-8. Unfortunately, outside of using the -v4 option to dump the output in hex and carve it back (which is what I did with hexdump), there is other workaround in exiftool at this time. I'll keep looking for a better way to do this going forward.

Monday, January 29, 2018

Document Analysis - 2018-newsletters.xls

Today I received what was clearly a malicious document in my email, so to celebrate the publishing of my second PluralSight course - Performing Malware Analysis on Malicious Documents - I thought I'd go through the analysis of the document.

The document came in as an attachment in email and was named 2018-newsletters.xls.

MD5: 46fecfa6c32855c4fbf12d77b1dc761d
SHA1: c028bc46683617e7134aa9f3b7751117a38a177d
SHA256: 4e8449f84509f4d72b0b4baa4b8fd70571baaf9642f47523810ee933e972ebd9

You can download the file from here. The password is the last 8 characters of the filename, all lowercase.

To analyze it, I'm going to use REMNux, the malware analysis Linux distribution put together by Lenny Zeltser. This distro has all the tools we need to analyze the document.

The first thing I need to do is figure out what type of Office document we're dealing with. By running the Linux file command on the document, it tells us we're dealing with the composite file format, or structure storage format, of Office. Knowing this helps us figure out what tools we can use on the file.

Next, I want to see if there's anything interesting inside of the document. There are lots of tools that can be used for this, but for now I'm just going to use Yara with the rules downloaded from the Yara Rules project.

Two yara rules get set off - Contains_VBA_macro_code and office_document_vba. Both rules indicate that the XLS contains VBA macro code. Macros are often used by attackers within documents to download additional malware or execute more code, such as PowerShell. If we didn't think this spreadsheet was malicious before, this certainly raises our suspicions.

Next, I'll try and extract the macro code. My favorite tool for doing this is olevba, which is part of the oletools by decalage. When I run it, I use the --deobf and --decode options to allow olevba to attempt to deobfuscate and decode any strings it can.

The resulting file is an excellent example of the obfuscation that attackers will go to in order to try and hide what they are doing from analysts. Lets look at a few of the functions and obfuscation performed.

In the example to the right, the first function that is executed by the XLS is Workbook_Open(). This function calls the VBA Shell() function; Shell() is used to execute operating system commands. The parameters to the Shell() function are other functions, which lead to other functions, which lead to obfuscated strings.

We can manually trace through the code to figure out what this is doing.
  1. The first parameter to Shell() is a function call to a function named tabretable().
  2. tabretable() calls 3 different functions, one of them being sunafeelo().
  3. sunafeelo() has 4 lines in it.
    1. The first line sets a variable to the string "external hard".
    2. The second line sets a variable to the string "cM" using the Chr() function. Chr() returns the ASCII equivalent of the number given to it. This is a technique that is often used by attackers to obfuscate strings.
    3. The third line creates the string "D.ex" by combining Chr(), a period, and the results from the Left() function. In this case, the Left() function returns the first 2 letters from the left side of the string "external hard", or "ex".
    4. The last line combines all of these together, along with the results from the Right() function. Here, Right() returns the right-most two characters from the string "free ", which are "e " (e plus a space).
The result from the first parameter to Shell() is "cMD.exe /c ", so we know its creating a command to execute on the system. I could go through all of the rest of the code to figure it out, but why should I if there are tools that will do it for me?

To do this, I'll use Lazy Office Analyzer (LOA). LOA works by setting breakpoints on various APIs and recording their parameters. This allows us to watch when the malicious document writes files, connects to URLS, and most importantly, executes commands.

In the image above (click to enlarge), you can see how I ran LAO. In the end, the document executes obfuscated PowerShell that we could go in and deobfuscate some more. However, we see the URL hxxps://softarez[.]cf/mkeyb[.]gif in the code, which we can infer means that it will be downloading and executing whatever is returned.

This site was not up at the time I analyzed it, but fortunately it was analyzed by someone on hybrid-analysis, and shows that the downloaded files is a Windows executable, which VirusTotal indicates is a Zbot variant.

However, with regards to analyzing the malicious Excel file, we're done. Since documents are typically used as the first stage of a malware compromise - in other words, they download or drop more malware to execute - we've figured out it does. The malicious document downloads an executable and runs it.

From here, we can start looking on our network for anyone accessing this site, as they will most likely have opened this document.

As I stated in the beginning of this post, my second PluralSight course was published and teaches how to analyze malicious documents. If you want to learn how to do everything I discussed here, plus a lot more, go check out the course. I welcome any feedback on it - good or bad - and any new courses you'd like to see from me.



MD5: 46fecfa6c32855c4fbf12d77b1dc761d
SHA1: c028bc46683617e7134aa9f3b7751117a38a177d
SHA256: 4e8449f84509f4d72b0b4baa4b8fd70571baaf9642f47523810ee933e972ebd9

  • hxxps://softarez[.]cf/mkeyb[.]gi