Tuesday, February 20, 2018

The Case of the Tricky Tool

Looks can be deceiving. There are times when you think an analysis is going to be easy, and everything points in that direction, until you hit a snag. This happens. Sometimes you've made an assumption that is wrong, sometimes there is a little trick the attacker is doing, and sometimes your tools fail you. This is one of those times.

The Malware

I received a malicious attachment in my email yesterday that uses a technique that I've started to see more and more in documents - utilizing the metadata fields to hold some of the malicious code. The advantage to this technique is that it spreads the code throughout the document and makes it more difficult to analyze. Despite this, all signs pointed to this being an easy document to analyze. As you'll see, I was wrong.

MD5: e618b9ef551fe10bf83f29f963468ade
SHA1: 93993320c636c884e6f1b53f9f878410efca02da
SHA256: d400d6392a17311460442e76b26950a0a07e8a85c210c31e87a042a659dc9c52
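Hashes like the three above are easy to reproduce once you have a copy of the sample. The sketch below uses only Python's standard hashlib module. It reads the file in chunks so large samples don't have to fit in memory. The function name file_hashes is my own and is not part of any tool mentioned here.

```python
import hashlib


def file_hashes(path: str, chunk_size: int = 65536) -> dict:
    """Return the MD5, SHA1, and SHA256 hex digests of the file at path."""
    hashers = {
        "MD5": hashlib.md5(),
        "SHA1": hashlib.sha1(),
        "SHA256": hashlib.sha256(),
    }
    with open(path, "rb") as f:
        # Feed each chunk to every hasher so the file is only read once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

For example, file_hashes("resume.doc") returns a dict keyed by "MD5", "SHA1", and "SHA256" that you can compare against published indicators.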

Once more, I used REMnux to statically analyze the file. Yes, I could have executed it with Lazy Office Analyzer to speed up my analysis, but frankly my Windows VM is temporarily fubar'd, so I was stuck doing it this way.

The first step in my analysis was to figure out what type of document I was dealing with. Running the trusty old file command shows that this is a composite document file, which means we can use the oletools suite to analyze it.

Notice that file also gives us some information obtained from the metadata of the document. The metadata of a document is stored inside the document and contains information about it, such as the last time it was saved, how many characters are in it, and the author. As analysts, we can use this information as indicators, to track actors across multiple documents, and sometimes to provide attribution.

In this case, the author is set to "Caleb Cheng" and the document was last saved by the username "caive". Are these legitimate? There's no way for us to tell, since they could have been modified after the document was saved, but we could at least create a Yara rule to track these usernames and see if they pop up in other documents!
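Before writing an actual Yara rule, you can check other samples for these names with a plain byte search. The sketch below is not Yara. It is a small Python stand-in that mimics Yara's ascii and wide string modifiers by searching for each name both as single-byte text and as UTF-16LE. I'm deliberately not assuming which encoding a given document uses for its metadata. The function name find_indicators is hypothetical.

```python
def find_indicators(data: bytes, indicators: list[str]) -> list[tuple[str, str]]:
    """Report which indicator strings appear in data, and in which encoding.

    Mimics the idea behind Yara's "ascii wide" modifiers: each indicator is
    searched for as single-byte ASCII and as UTF-16LE. Matching is
    case-sensitive, and indicators must be plain ASCII text.
    """
    hits = []
    for indicator in indicators:
        needles = {
            "ascii": indicator.encode("ascii"),
            "wide": indicator.encode("utf-16-le"),
        }
        for label, needle in needles.items():
            if needle in data:
                hits.append((indicator, label))
    return hits
```

For example, find_indicators(open("resume.doc", "rb").read(), ["Caleb Cheng", "caive"]) lists every name found and the encoding it was found in.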

Speaking of Yara, my next step was to use the rules from the Yara Rules Project and see what they found inside of the document.

Two of the Yara rules indicated the document contained embedded VBA code. That's not surprising, as many malicious documents use VBA to execute their second-stage malware. Yara didn't find any embedded executables, so this probably meant the document downloads its next stage to execute. The only way to find out was to extract the VBA and analyze it.

The olevba.py script from the oletools suite can be used to extract VBA code from Office documents. Initially, I ran it without any options so I could see how the code looked natively. Fortunately, in this document the VBA code was pretty short and easy to understand.

The code can be broken down as follows:
  1. Declares a number of variables.
  2. Reads the value of the metadata title field with the ThisDocument.BuiltInDocumentProperties("title") command.
  3. Converts the title to ASCII from Unicode. In reality, this is just converting the characters to their integer equivalents and putting them into an array.
  4. Loops through each letter of the title and subtracts 7 from its ASCII value.
  5. Converts the array back to Unicode. This is just converting the array back to a string.
  6. Reverses the string (e.g. turns "abc" to "cba").
  7. Uses the shell() function to execute some of the string, after replacing and splitting it up with multiple values.
From this, it appeared that the title of the document contained the encoded command that gets run, so I wanted to see what this value was. There are multiple ways to do this, but one of the easiest is to use the metadata extraction tool exiftool. This is where my trouble started.

As expected, the title field contained obfuscated text that needed to be decoded in order to see exactly what the malicious document was doing. Before I could decode it, however, I needed to extract it properly.
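Steps 3 through 6 of the code breakdown above are simple enough to reproduce in Python. This is a minimal sketch under two assumptions. First, the VBA works on one byte per character, so the function takes the raw title bytes stored in the document. Second, step 7's replace-and-split is left out, because those values aren't reproduced here.

```python
def decode_title(raw: bytes) -> bytes:
    """Undo the title obfuscation: subtract 7 from each value, then reverse.

    Assumes single-byte character values taken straight from the document.
    bytes() raises ValueError if any value would drop below 0.
    """
    shifted = bytes(b - 7 for b in raw)   # step 4: subtract 7
    return shifted[::-1]                  # step 6: reverse the string


def encode_title(plain: bytes) -> bytes:
    """Inverse of decode_title, useful for testing: reverse, then add 7."""
    return bytes(b + 7 for b in plain[::-1])
```

Notice that decode_title(b"\x83") returns b"|". A byte like 0x83, which looks like binary junk in the metadata, decodes to an ordinary pipe character. That is why getting the exact bytes out of the title matters.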

The Problem

Exiftool will display periods for both actual period characters and binary data, so I first attempted to use some command-line fu in order to properly extract the characters. Normally, this can be done with the following command, which will convert all characters into a hex-encoded string that you can put directly into a python script.
exiftool -title -b  resume.doc | hexdump -v -e '"\\" "x" 1/1 "%02X" ' ;
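If you'd rather not fight hexdump's format strings, the same conversion can be done in Python. This sketch produces the same \xNN escape string. It only formats the bytes it receives, so if a tool earlier in the pipeline has already altered the data, the output will faithfully reproduce that alteration.

```python
def to_python_escape(data: bytes) -> str:
    """Render bytes as a \\xNN escape string, like the hexdump command above."""
    return "".join(f"\\x{b:02X}" for b in data)
```

A small script that prints to_python_escape(sys.stdin.buffer.read()) can replace hexdump at the end of the exiftool pipeline.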
Since the VBA code was fairly straightforward, I wrote a quick python script to decode it. However, when I ran the script I didn't get the results I expected. While I saw some PowerShell commands and an obfuscated URL, there were some binary characters that shouldn't have been there; it should have decoded cleanly to text.

This is where I spent the next few hours trying to figure out what was going on.

At first I thought the VBA code was doing some value conversions when it converted from Unicode to ASCII integer values; this was not the case. I tested this by writing some similar VBA code, launching it in Word, and debugging it to see the values before and after the conversion - all was as expected.

Then I went over my python script to make sure I hadn't made a programming error (this would not have been the first time). Everything was good.

Finally, I went back to the Word document itself to see whether my command-line fu had worked correctly. That's when I noticed something interesting. When I opened the document in a hex editor and compared the bytes of the title string to what exiftool had extracted, some of the binary characters were different.

In the example above, the top hex dump is from the document itself and shows exactly what is within the document. The bottom is what exiftool extracted. You can see that in the original document, the highlighted byte was 0x83 but when exiftool extracted it, the byte was converted to 0xC6 0x92. This occurred multiple times in the extraction.

I'm currently not sure why exiftool did this. I tested it with multiple options and with the latest version, and all produced the same result. I'm waiting to hear back from the developers to see whether I found a bug.

Unfortunately, I was unable to come up with some genius command-line fu to extract the real title string in one fell swoop. So how did I do it? I copied the bytes from the hexdump command above and did some copying and pasting to get them into the right format. Sometimes that's just the easiest option. If anyone comes up with anything better, please let me know.

I should also note that some other metadata extraction tools, like olemeta.py from oletools, crashed when attempting to extract the title. I suspect this is because they expect the title to be a text string without binary characters in it.

The final python code I came up with is below and can be downloaded from here.

When run, it gave me the output I expected.

As initially thought, the malicious document downloads an executable, saves it to the file system, and executes it. At the time of this writing, the file (61a5af5acea342ee5ca8dbd2fba0de06) is still accessible at that IP address. We'll save that analysis for another day.

This analysis is a prime example of why you should trust, but verify, your tool results. In the beginning I assumed that exiftool was extracting the data properly, and why not? It had never failed me before. However, when the results were not what I was expecting, I had to dig deeper to see what the issue was. Without knowing how to look into a file and compare what my tool was giving me to what was actually there, I wouldn't have been able to figure out that my tool was giving me incorrect data and work toward getting it rectified.


I posted on the exiftool forum asking about the potential bug I found. Phil Harvey, the creator of exiftool, said the characters change because exiftool attempts to convert the binary characters to UTF-8. Unfortunately, outside of using the -v4 option to dump the output in hex and carving it back (which is what I did with hexdump), there is no other workaround in exiftool at this time. I'll keep looking for a better way to do this going forward.
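The 0x83 to 0xC6 0x92 change fits that explanation neatly. In the Windows-1252 code page, byte 0x83 is the character ƒ (U+0192), and U+0192 encoded as UTF-8 is exactly 0xC6 0x92. I haven't confirmed exiftool's exact code page handling. Assuming that is what it did, the conversion can be simulated in Python and, for most bytes, undone. This is a sketch, not an exiftool feature. The fallback for the five bytes that Windows-1252 leaves undefined is a guess.

```python
def simulate_utf8_conversion(raw: bytes) -> bytes:
    """Treat raw bytes as Windows-1252 text and re-encode them as UTF-8.

    Assumption: this reproduces the observed 0x83 -> 0xC6 0x92 change.
    Bytes undefined in cp1252 (0x81, 0x8D, 0x8F, 0x90, 0x9D) raise
    UnicodeDecodeError here.
    """
    return raw.decode("cp1252").encode("utf-8")


def recover_raw_bytes(converted: bytes) -> bytes:
    """Best-effort reversal: UTF-8 text back to the original single bytes."""
    out = bytearray()
    for ch in converted.decode("utf-8"):
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Guess: a character below U+0100 stands for the byte of the same value.
            if ord(ch) < 0x100:
                out.append(ord(ch))
            else:
                raise
    return bytes(out)
```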

Monday, January 29, 2018

Document Analysis - 2018-newsletters.xls

Today I received what was clearly a malicious document in my email, so to celebrate the publishing of my second Pluralsight course - Performing Malware Analysis on Malicious Documents - I thought I'd go through the analysis of the document.

The document came in as an attachment in email and was named 2018-newsletters.xls.

MD5: 46fecfa6c32855c4fbf12d77b1dc761d
SHA1: c028bc46683617e7134aa9f3b7751117a38a177d
SHA256: 4e8449f84509f4d72b0b4baa4b8fd70571baaf9642f47523810ee933e972ebd9

You can download the file from here. The password is the last 8 characters of the filename, all lowercase.

To analyze it, I'm going to use REMnux, the malware analysis Linux distribution put together by Lenny Zeltser. This distro has all the tools we need to analyze the document.

The first thing I need to do is figure out what type of Office document we're dealing with. Running the Linux file command on the document tells us we're dealing with Office's composite file format, also known as the structured storage format. Knowing this helps us figure out what tools we can use on the file.

Next, I want to see if there's anything interesting inside of the document. There are lots of tools that can be used for this, but for now I'm just going to use Yara with the rules downloaded from the Yara Rules project.

Two yara rules get set off - Contains_VBA_macro_code and office_document_vba. Both rules indicate that the XLS contains VBA macro code. Macros are often used by attackers within documents to download additional malware or execute more code, such as PowerShell. If we didn't think this spreadsheet was malicious before, this certainly raises our suspicions.

Next, I'll try and extract the macro code. My favorite tool for doing this is olevba, which is part of the oletools by decalage. When I run it, I use the --deobf and --decode options to allow olevba to attempt to deobfuscate and decode any strings it can.

The resulting file is an excellent example of the lengths attackers will go to in order to hide what they are doing from analysts. Let's look at a few of the functions and the obfuscation they use.

In the example to the right, the first function that is executed by the XLS is Workbook_Open(). This function calls the VBA Shell() function; Shell() is used to execute operating system commands. The parameters to the Shell() function are other functions, which lead to other functions, which lead to obfuscated strings.

We can manually trace through the code to figure out what this is doing.
  1. The first parameter to Shell() is a function call to a function named tabretable().
  2. tabretable() calls 3 different functions, one of them being sunafeelo().
  3. sunafeelo() has 4 lines in it.
    1. The first line sets a variable to the string "external hard".
    2. The second line sets a variable to the string "cM" using the Chr() function. Chr() returns the character corresponding to the character code given to it. This is a technique that is often used by attackers to obfuscate strings.
    3. The third line creates the string "D.ex" by combining Chr(), a period, and the results from the Left() function. In this case, the Left() function returns the first 2 letters from the left side of the string "external hard", or "ex".
    4. The last line combines all of these together, along with the results from the Right() function. Here, Right() returns the right-most two characters from the string "free ", which are "e " (e plus a space).
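To make the string building concrete, here is the same logic in Python. The character codes passed to chr() (99, 77, 68) are my assumption. They are simply the codes for "c", "M", and "D", and the real macro may compute them differently. The concatenation order is also assumed. vba_left and vba_right are hypothetical helpers that mimic VBA's Left() and Right().

```python
def vba_left(s: str, n: int) -> str:
    """Mimic VBA Left(): the first n characters of s."""
    return s[:n]


def vba_right(s: str, n: int) -> str:
    """Mimic VBA Right(): the last n characters of s."""
    return s[-n:] if n > 0 else ""


def sunafeelo() -> str:
    """Rebuild the string produced by the four lines of sunafeelo()."""
    a = "external hard"                   # line 1
    b = chr(99) + chr(77)                 # line 2: "cM" (character codes assumed)
    c = chr(68) + "." + vba_left(a, 2)    # line 3: "D" + "." + "ex" -> "D.ex"
    return b + c + vba_right("free ", 2)  # line 4: append "e " -> "cMD.exe "
```

This yields "cMD.exe ". The remaining "/c " presumably comes from the other functions that tabretable() calls.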
The result from the first parameter to Shell() is "cMD.exe /c ", so we know it's creating a command to execute on the system. I could go through the rest of the code to figure it out, but why should I when there are tools that will do it for me?

To do this, I'll use Lazy Office Analyzer (LOA). LOA works by setting breakpoints on various APIs and recording their parameters. This allows us to watch when the malicious document writes files, connects to URLs, and most importantly, executes commands.

In the image above, you can see how I ran LOA. In the end, the document executes obfuscated PowerShell that we could deobfuscate further. However, we see the URL hxxps://softarez[.]cf/mkeyb[.]gif in the code, which suggests that it will download and execute whatever is returned.

This site was not up at the time I analyzed it, but fortunately someone had analyzed it on hybrid-analysis, which shows that the downloaded file is a Windows executable; VirusTotal indicates it is a Zbot variant.

However, with regard to analyzing the malicious Excel file, we're done. Since documents are typically used as the first stage of a malware compromise (in other words, they download or drop more malware to execute), we've figured out what it does: the malicious document downloads an executable and runs it.

From here, we can start looking on our network for anyone accessing this site, as they will most likely have opened this document.

As I stated at the beginning of this post, my second Pluralsight course was published and teaches how to analyze malicious documents. If you want to learn how to do everything I discussed here, plus a lot more, go check out the course. I welcome any feedback on it - good or bad - and any new courses you'd like to see from me.



MD5: 46fecfa6c32855c4fbf12d77b1dc761d
SHA1: c028bc46683617e7134aa9f3b7751117a38a177d
SHA256: 4e8449f84509f4d72b0b4baa4b8fd70571baaf9642f47523810ee933e972ebd9

  • hxxps://softarez[.]cf/mkeyb[.]gi

Tuesday, April 4, 2017

Malware Analysis Course on Pluralsight!

Since 2010, I have been running my Introduction to Malware Analysis course at various conferences and organizations, and have taught over 200 students. I've heard from many of my former students that they've used what they learned in the course to help them successfully combat malware in their organizations - some have even gone into the malware analysis field themselves!

I only teach my course once or twice a year; for the past few years it has only been at DerbyCon. The problem with that is the material sits unused for most of the year, with no one gaining benefit from it.

So, when I was approached by the great people at Pluralsight to record my course and put it online, I jumped at the chance. It was a long process, but well worth it. This week, the course was released under the name Malware Analysis Fundamentals.

Malware Analysis Fundamentals is an online version of my Intro to Malware Analysis course. The course takes you from knowing nothing about malware analysis to being able to manually analyze malware in a safe and consistent manner. Like my regular course, you still analyze real malware using the techniques used by incident responders everywhere.

The one thing that I found out while creating this was that it's not possible to put everything from my sit-down course into the online version. If I did, the course would have been at least double the length and no one wants to sit through that! Therefore, Malware Analysis Fundamentals gets to the essence of the material and teaches the fundamentals needed to get the job done.

There is also more to come. I already have plans for other MA courses at Pluralsight, branching into more advanced techniques and courses on analyzing alternative forms of malware.

I hope you enjoy the course and I look forward to hearing everyone's thoughts on the course!

Tuesday, November 15, 2016

Malicious DNS Namespace Collisions

Over the last few weeks, I've noticed a problem come up in multiple places that I first saw many years ago and that apparently is still very common: DNS namespace collisions. DNS namespace collisions occur when a private domain name can be resolved on the public Internet, whether intentionally or not. ICANN has a lot of information on this if you are looking for a deep dive on the subject; I will instead focus on the potential security issues.

The Issue

Let's start with an example. Suppose you own the Internet domain example.org. This is your Internet presence - all your emails are @example.org, your web servers are in this domain, even your Active Directory domain is corp.example.org. All is well in the world.

When configuring hosts in your organization, one of the things you will do is set up your DNS suffix search list. This is the list of domains your systems will add to a host name if they can't initially resolve it. In our scenario, your DNS suffix search list is example.org and corp.example.org. So, if a host attempts to resolve mailserver, it might also try mailserver.example.org and mailserver.corp.example.org.
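Here is a simplified Python model of the suffix search list behavior described above. It doesn't reflect how any particular operating system implements the feature; real resolvers have additional rules and settings. Treat candidate_names, a hypothetical helper, as an illustration of the lookup order only.

```python
def candidate_names(host: str, suffix_search_list: list[str]) -> list[str]:
    """List the names a client might try for host, in order.

    Simplified: real resolvers differ in when they try the bare name and
    whether they append suffixes to names that already contain a dot.
    """
    candidates = [host]
    if "." not in host:
        # Single-label name: append each suffix from the search list in order.
        candidates += [f"{host}.{suffix}" for suffix in suffix_search_list]
    return candidates
```

For example, candidate_names("mailserver", ["example.org", "corp.example.org"]) gives the bare name followed by mailserver.example.org and mailserver.corp.example.org.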

And let's also suppose that you follow good security practices and have split DNS so no one on the Internet can resolve your internal host names. You also do not allow internal hosts to directly resolve Internet host names.

Your happy domain.

Any issues so far? Nope. The computer gods are smiling upon us.

As your organization expands, you find the need to add a new internal domain so you choose example.com. Uh oh! You don't own that domain on the Internet, but you'll only be using it on the internal network. Not an issue, right? No, it is a problem.

The issue is that you are using the domain example.com internally but do not own it; this is a DNS name collision. The issue comes into play as soon as a host accesses the Internet directly (from home, a client's network, etc.). When this happens, it won't be able to resolve hosts with the suffix example.org or corp.example.org, but as soon as it tries the suffix example.com (which you don't own), it will succeed.

So how is this an issue? In the best case, it isn't. If your hosts try to resolve something that example.com can't resolve then aside from some information leakage things should be OK. However, what if they try to resolve something that does exist in example.com and then try to start using it?

On the Internet, only example.com will resolve.
For example, suppose our hosts are on the Internet and are trying to resolve the internal mail server host name. The only name that resolves is mailserver.example.com, which we don't own. They then start to send emails (your private, internal-only emails) through a server you don't own. See the issue now?

This only happens if that host name already exists in the external domain, right? Wrong.

If DNS wildcards are used, now all of a sudden any host name is being resolved beyond your control and your hosts are sending data to potentially malicious servers. Think of how easy it would be to gain information on your organization or compromise your hosts if I could tell your hosts where their proxy, active directory, or mail servers were when they were outside your organization. And how would you ever know?
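To see why a wildcard makes this so dangerous, here is a toy Python model of a roaming host working through its suffix list against public DNS. The zone contents and the 203.0.113.10 address (a documentation range) are made up for illustration. The wildcard matching is also simplified compared to real DNS.

```python
def resolve_public(name: str, zone: dict[str, str]):
    """Look up name in a toy public zone, honoring simple wildcards.

    Simplified from real DNS wildcard rules: an exact record wins,
    otherwise the nearest "*.parent" record matches.
    """
    if name in zone:
        return zone[name]
    labels = name.split(".")
    for i in range(1, len(labels)):
        wildcard = "*." + ".".join(labels[i:])
        if wildcard in zone:
            return zone[wildcard]
    return None


def first_resolving(candidates: list[str], zone: dict[str, str]):
    """Return (name, address) for the first candidate that resolves, else None."""
    for name in candidates:
        address = resolve_public(name, zone)
        if address is not None:
            return name, address
    return None
```

With only a single wildcard record such as {"*.example.com": "203.0.113.10"}, a name the internal zone never published can still be resolved on the Internet: the host skips the unresolvable example.org candidates and lands on the attacker's address.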

This is not a theoretical attack. In the last few weeks I have found multiple organizations where this is occurring. Specifically, they are using domains internally that they do not own, and when their hosts leave the organization's network, those hosts resolve these domains to malicious IP addresses.

And there are organizations that are squatting on multiple domains (including obviously internal ones) and setting up wildcard DNS to point them to their own IPs. For what purpose? I don't know, but I suspect it can't be good.

Detection, Prevention, and Response

So how can you detect this? A few ways:
  1. Create a list from your DNS for all domains being used by your clients related to your organization. Make sure you own all those domains. If not, what IPs do they resolve to? Consider switching from them. (This is also a good threat hunting technique!)
  2. Windows hosts like to resolve wpad/wpad.dat when browsing. The DNS search suffix tends to get added to that, so look for any HTTP requests to the Internet for wpad.dat, then look for what domains the requests are to. Even if they are not your own hosts (e.g. consultants), you should still be concerned as they could be used as a pivot point into your network.
By the way, wpad.dat tells browsers which proxy to use, so it is not something you want your hosts fetching from a domain you don't control.
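Both checks are easy to script once you have the data exported. The Python sketch below makes two hypothetical assumptions about input: a list of queried host names from your DNS logs, and a list of (host, path) pairs for outbound HTTP requests from your proxy logs. The function names are also hypothetical. Matching owned domains by suffix is deliberately simple.

```python
def normalize(name: str) -> str:
    """Lowercase a DNS name and drop any trailing dot."""
    return name.lower().rstrip(".")


def unowned_names(queried: list[str], owned_domains: list[str]) -> list[str]:
    """Return queried names that don't fall under any domain you own."""
    owned = [normalize(d) for d in owned_domains]
    result = set()
    for raw in queried:
        name = normalize(raw)
        if not any(name == d or name.endswith("." + d) for d in owned):
            result.add(name)
    return sorted(result)


def wpad_suffixes(requests: list[tuple[str, str]]) -> list[str]:
    """From (host, path) pairs, return the suffixes appended to "wpad"
    in outbound requests for wpad.dat."""
    suffixes = set()
    for host, path in requests:
        host = normalize(host)
        if path.lower().endswith("wpad.dat") and host.startswith("wpad."):
            suffixes.add(host[len("wpad."):])
    return sorted(suffixes)
```

Any suffix that wpad_suffixes reports which isn't one of your own domains deserves the same follow-up as the names reported by unowned_names.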

Prevention of this is actually pretty easy - just make sure you own any domain you use, or use ones that do not have Internet TLDs. (However, from my research there may be issues on this with some versions of Windows.)

If you do find this happening on your network, I would suggest immediately looking to see what your hosts are resolving, what data is going out, and more importantly, what is coming back in.

I would also recommend blocking the IP addresses and external domains on your Internet devices to prevent internal hosts from accessing them.

In the end, this is a big problem that I don't think many realize is going on. Fortunately, it's fairly easy to detect and start investigating. Doing it now will probably save you a lot of hurt in the long run.

Wednesday, April 29, 2015


MASTIFF has been a pet project of mine for about two years now. While it has not progressed as far as I would have liked, we made a major announcement this week.

On Monday, a free online interface to MASTIFF was released at https://mastiff-online.korelogic.com/. This interface allows anyone to upload files, have MASTIFF process the files, and see the results generated.

If you are not familiar with MASTIFF, it is an open source framework for automating the static analysis of malware. It determines the type of file you are analyzing and runs only the static analysis techniques that apply to that file type. This allows fast extraction of data the analyst can then examine.
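The core idea is to pick the file type first and then run only the matching analyses. That idea can be sketched in a few lines of Python. This is not MASTIFF's code or plug-in API. The type checks use a handful of well-known leading "magic" bytes, and the analysis names are hypothetical placeholders.

```python
# Leading "magic" bytes for a few common formats, checked in order.
MAGIC = [
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "OLE"),  # Office 97-2003 composite documents
    (b"%PDF", "PDF"),
    (b"PK\x03\x04", "ZIP"),                        # also Office 2007+ documents
    (b"MZ", "PE"),                                 # Windows executables
]

# Hypothetical analysis steps per file type (not MASTIFF's real plug-ins).
ANALYSES = {
    "OLE": ["extract_metadata", "extract_vba"],
    "PDF": ["list_objects", "find_javascript"],
    "ZIP": ["list_members"],
    "PE": ["parse_headers", "list_imports", "extract_resources"],
}


def identify(data: bytes) -> str:
    """Return a file type name based on the leading bytes, or "unknown"."""
    for magic, file_type in MAGIC:
        if data.startswith(magic):
            return file_type
    return "unknown"


def analyses_for(data: bytes) -> list[str]:
    """Return only the analysis steps that apply to this file's type."""
    return ANALYSES.get(identify(data), [])
```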

The online interface was created for two reasons:

1. When you start processing a number of different file types, the prerequisites become cumbersome and difficult to install. The online interface alleviates this by allowing you to analyze files without installing everything.

2. Our #1 request was a web interface to the system. While the interface used on MASTIFF Online is not open source itself, we are hoping this will give users what they want.

If anyone has any questions/comments/suggestions to MASTIFF or the site, please let me know!

Monday, February 10, 2014

Installing Yara into IDA Pro 64-bit Linux

tl;dr Install a 32-bit VM, compile Yara, copy files over. See link below for files to just install.

Last Friday, pnX posted that he updated his awesome IDA plug-in, IDAScope, to include Yara support. This means that you can now run Yara sigs against files you are reversing to help in the analysis process.

After I installed the new version of IDAScope into IDA Pro, however, I received errors stating that Yara could not be imported. I thought this was odd as I had Yara installed on my system, until I remembered how IDA works on a 64-bit Linux system.

The following is based off my observations and experiences. If I am incorrect on this, please forgive me and let me know in the comments.

IDA is a 32-bit program. Even the 64-bit version of IDA is compiled as a 32-bit program.

$ file idaq idaq64
idaq:   ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), BuildID[sha1]=0xcb635dd38de5c73f050de37a0f2e492688b3ab9a, stripped
idaq64: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), BuildID[sha1]=0x1f03dcff4bfd776b23df71c8d9d471fb63b0bf48, stripped

This causes a number of interesting issues on 64-bit Linux systems, especially with Python. Hex Rays has fixed the majority of these in the default install so you don't have to worry about them, and for Python it does this by letting you install a bundled Python into the IDA Pro directory. (There are other ways, but I have not tried them.) This gives you a working "out of the box" product.

This also means that when you want to install a new Python library and use it in IDA, you have to install it into IDA's bundled Python directory as well. If this is a pure Python module, then no problem. Just copy it and it should work. Yara is different.

Since Yara compiles as a 64-bit library on a 64-bit system, and yara-python does the same, we can't just install it directly into the IDA Python directory. If you do, you'll receive errors that IDA is unable to load a 64-bit module.

In order to get Yara working, we'll need to compile it as a 32-bit library. The easiest way, IMO, to do this is to load a 32-bit Linux system into a VM, compile Yara, then copy the files into your IDA installation. I did this in a Debian 6.0.3 VM and it worked without a problem. Just to be safe, make sure you are using a system with Python 2.7 as well, since that is what IDA bundles.

There are two files you will need: the Yara library libyara.so.0 and the Yara Python library yara.so (located in the Python dist-packages directory after installation). Follow the instructions to compile and install Yara in your 32-bit VM, and copy the files onto your 64-bit system. libyara.so.0 goes into your base IDA install directory, and yara.so goes into the python directory underneath that.
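Before copying libyara.so.0 and yara.so into the IDA directory, it's worth confirming they really are 32-bit, since that's the whole point of the exercise. The file command shown earlier does this. If you'd rather check from Python, this sketch reads the ELF header fields directly: the class byte at offset 4, the data encoding at offset 5, and e_machine at offset 18. It only names a couple of machine types.

```python
import struct

# A few e_machine values from the ELF specification.
MACHINES = {3: "x86 (i386)", 62: "x86-64"}


def elf_arch(header: bytes) -> tuple[int, str]:
    """Return (bits, machine) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]        # e_ident[EI_CLASS]
    endian = {1: "<", 2: ">"}[header[5]]    # e_ident[EI_DATA]: little or big endian
    (machine,) = struct.unpack_from(endian + "H", header, 18)  # e_machine
    return bits, MACHINES.get(machine, f"machine {machine}")
```

For example, elf_arch(open("libyara.so.0", "rb").read(20)) should report 32 bits and x86 for a library that IDA's bundled 32-bit Python can load.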

After you do that, Yara-python will be installed and will work great!

Don't want to go through all the trouble of installing a 32-bit VM, compiling, and copying? I don't blame you. I uploaded the version I compiled to my Google Drive here.

yara-ida-libs.tgz (SHA256: 38674b584adf3932e5cd1cafbd0bb288b7db3302304a83041bad9295472aa064)

Just untar this into your base install dir for IDA and you should be good to go.

Hex Rays has published instructions on how to install Python packages from Pip on a 64-bit system. I recommend checking them out. This time, my way just felt easier.

Wednesday, September 4, 2013

Installing BinDiff on Linux Mint 14

I recently upgraded my system to Linux Mint 14 and went about re-installing all my software. When I got to Zynamics/Google BinDiff, I found I had an issue:
$ sudo dpkg -i bindiff401-debian50-amd64.deb

Selecting previously unselected package bindiff.
Unpacking bindiff (from bindiff401-debian50-amd64.deb) ...
dpkg: dependency problems prevent configuration of bindiff:
 bindiff depends on sun-java6-jre; however:
  Package sun-java6-jre is not installed.

Unfortunately, BinDiff requires sun-java6-jre, which is not in the Linux Mint repository, nor any other repository I could find. I could circumvent this by installing BinDiff by using the --ignore-depends=sun-java6-jre option to dpkg. However, every time I went to install updates I would get an error message that BinDiff was broken, and be prompted to uninstall it before I could continue.

However, I found a work-around - create a dummy package named sun-java6-jre using the tool equivs. (There are some docs out there on this, but I was unable to find a non-Google cached copy, so here was what I did.)

Linux Mint has equivs in its repository, so if it's not already installed, apt-get it.

Next, run equivs-control sun-java6-jre and this will create a file named sun-java6-jre that you will need to modify.

At minimum, you'll need to uncomment and/or fill out the following fields:
  • Package
  • Version
  • Maintainer
I also filled out the description fields so I would remember what it was.

After the file is modified, run equivs-build sun-java6-jre and you should see something similar to below:
$ equivs-build sun-java6-jre
dpkg-deb: building package `sun-java6-jre' in `../sun-java6-jre_6.0_all.deb'.

The package has been created.
Attention, the package has been created in the current directory,
not in ".." as indicated by the message above!
Once that has successfully completed, you should have a sun-java6-jre_6.0_all.deb file in your directory. If that failed, you probably forgot to modify one of the fields in the file.

Finally, just dpkg -i the new deb file and BinDiff, and you should be ready to go!
$ sudo dpkg -i sun-java6-jre_6.0_all.deb
Selecting previously unselected package sun-java6-jre.
(Reading database ... 237677 files and directories currently installed.)
Unpacking sun-java6-jre (from sun-java6-jre_6.0_all.deb) ...
Setting up sun-java6-jre (6.0) ...
$ sudo dpkg -i bindiff401-debian50-amd64.deb
Selecting previously unselected package bindiff.
(Reading database ... 237681 files and directories currently installed.)
Unpacking bindiff (from bindiff401-debian50-amd64.deb) ...
bindiff license has already been accepted
Setting up bindiff (4.0.1) ...

Then you are good to go!