In-depth analysis-The ISFB first loader
Earlier this year, I stumbled upon a captivating blog post by 0verfl0w, where he shared an in-depth analysis of one of the variants belonging to the notorious ISFB malware family. This sparked my curiosity and ignited a deep interest in exploring the rich history of this ever-evolving malware lineage.
Motivated by a desire to uncover the secrets and intricacies of this malware family, I embarked on a journey to scour the web for different variants. After careful consideration, I chose a particularly intriguing variant that stood out from the rest. In this blog post, I aim to provide an extensive and detailed analysis, shedding light on the inner workings and unveiling the hidden facets of this captivating variant.
Join me as we delve into the depths of this malware, exploring its techniques, encryption mechanisms, and potential impact. Together, we will uncover the secrets and gain a deeper understanding of this ever-evolving threat landscape.
ISFB has come a long way, evolving from a basic information stealer that targeted browser credentials and social media passwords into a sophisticated banking malware capable of extracting victims' financial information. The post by 0verfl0w_ linked above already goes over the different ways it can do so, so there's no point in repeating that here. With that said, let's get on with the reversing.
To start off, I would use something like pestudio to open up the malware and get a sense of whether or not it is packed, get an overview of its capabilities, and get an idea of the algorithms potentially used to obfuscate strings. In fact, pestudio is a great tool to begin with when conducting basic static malware analysis, which is usually what we do at the very beginning of the analysis process.
We can tell from the image below that it's a valid executable since we see the 0x5A4D magic value, which translates to the MZ string. We can also tell that it's a 32-bit executable bound to the GUI subsystem, hence we expect to see WinMain instead of just main. We can see a fairly high entropy value as well, though that alone is not enough to conclude that the malware is packed. There's a pretty high chance that it is, but we'll need to dig deeper to confirm that.
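The two checks above (the MZ magic and the entropy figure) are easy to reproduce by hand. The following is a minimal sketch, not what pestudio does internally; the function names are my own, and any threshold you apply to the entropy value is a rule of thumb rather than proof of packing.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def has_mz_signature(data: bytes) -> bool:
    """Check the DOS header magic: the word 0x5A4D is stored as the bytes 'MZ'."""
    return data[:2] == b"MZ"
```

Values close to 8 bits per byte suggest compressed or encrypted content, which is why a high figure hints at a packer without confirming one.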
You can go and take a look at the imports and strings. You'll probably notice that the strings are not revealing at all, and the same goes for the imports. None of the imports appears to be suspicious, which can lead us to think that the malware is probably packed.
A few different types of packers are out there: free/open-source packers, commercial packers, and custom packers. The majority of malware authors go with custom ones, because why pay for something that you can get for free? Custom packers typically allocate memory at run time and decompress/decrypt the malware into it.
With that in mind, let's open up the malware in x32dbg and put a few breakpoints on the APIs that are likely to be used to allocate virtual memory in the process address space, write the malware into it, and detect the presence of a debugger.
I have actually set a few breakpoints at: VirtualAlloc, VirtualProtect, IsDebuggerPresent, CreateProcessInternalA, CreateProcessInternalW, WriteProcessMemory.
Running the executable, we hit VirtualAlloc a few times. Each time, we follow the allocated memory in the dump until we see something that looks like a valid PE.
After doing that a few times, we can see what appears to be a valid PE copied to memory. Strangely enough, it doesn't look obfuscated at all. In other cases, you might need to do a little more work following the dumps until you get to the valid, deobfuscated PE.
Now we can just go ahead and dump that out of memory. You can do this with x32dbg; however, I prefer to use Process Hacker. In Process Hacker, we can just double-click the process under x32dbg, go to the Memory tab where the different memory allocations are listed, look for the memory region at 2B0000, and then save it to disk.
What we have just seen is a form of process injection called self-injection, since the malware injected itself into the same process. However, you may have noticed that we've also put breakpoints at the CreateProcessInternalA and CreateProcessInternalW APIs, just in case the malware spawns another process and injects itself into it.
Now, having saved the memory file to disk, let's open up the file in PEBear and look at the imports to confirm that this is the unpacked malware payload.
We can definitely see that we have all imports resolved correctly, which means we don't have to fix up the offsets in the section header table to unmap the malware and restore it in the file state.
One thing I want to point out here is that, in most cases, the dumped payload will be in its memory state, which requires the analyst to unmap it by setting the raw offsets to the same values as the virtual addresses. Only then will the imports be correctly resolved and the file be ready for investigation. In our case, again, we didn't have to do this, but I recall situations where the unmapping was necessary.
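For the cases where unmapping is needed, the fix-up can be scripted instead of done by hand in PEBear. Here is a minimal sketch, assuming the dump starts at the PE's base and that copying the virtual address and virtual size into the raw fields is good enough; the function name is my own, and real tools may additionally round sizes to the section alignment.

```python
import struct

SECTION_HEADER_SIZE = 40  # size of one IMAGE_SECTION_HEADER


def unmap_dump(dump: bytes) -> bytes:
    """Point each section's raw offset/size at its virtual address/size."""
    image = bytearray(dump)
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE image")
    num_sections = struct.unpack_from("<H", image, e_lfanew + 0x06)[0]
    opt_size = struct.unpack_from("<H", image, e_lfanew + 0x14)[0]
    table = e_lfanew + 0x18 + opt_size  # first section header
    for i in range(num_sections):
        hdr = table + i * SECTION_HEADER_SIZE
        # VirtualSize is at +8 and VirtualAddress at +12 in each header.
        virtual_size, virtual_addr = struct.unpack_from("<II", image, hdr + 8)
        # SizeOfRawData (+16) and PointerToRawData (+20) mirror the virtual values.
        struct.pack_into("<II", image, hdr + 16, virtual_size, virtual_addr)
    return bytes(image)
```

After this, a tool that reads sections by their raw offsets finds the data exactly where the loader placed it in memory.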
Upon opening the sample in IDA, we notice that there's not much going on here. The sample only calls one custom routine, making it relatively easy to identify the malicious code. Additionally, we see some other API calls, including HeapCreate, which creates a private heap object and returns a handle to it. In our case, the initial size is 0x400000, and all later memory allocations will be taken from this heap. We also notice a call to GetModuleHandleA with the module name set to NULL, which returns a handle to the current module. The heap handle and the module handle are both stored in global variables for later reference.
Upon jumping into this custom routine, we can see that there are many functions called from within it. Let's walk through it step by step. The first function being called is sub_402094. Let's take a look at that function and see if we can find anything interesting.
We can see that it first creates a manual-reset event object. I didn't fully understand this action until I looked into the second loader, which checks for the existence of this object to avoid having two instances of the malware running on the same machine. Next, it obtains the OS version and performs an OS version check. Based on the results of this check, it decides whether to continue with its course of execution or simply exit.
The main point here is the call to the OpenProcess API, which opens a handle to the current instance of the malware. The access mask is set to 0x10047A, a bitwise combination of process access rights that determines what the handle can be used for. This handle is then stored in a global variable for later reference. Finally, the routine returns either zero or a non-zero value. A zero return value indicates success, while any other value indicates failure.
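An access mask like 0x10047A is easier to read once it is split into its individual bits. The sketch below does that against the process access-right constants from the Windows SDK headers; the helper name is my own.

```python
# Process access rights as defined in the Windows SDK headers.
PROCESS_ACCESS_RIGHTS = {
    0x000001: "PROCESS_TERMINATE",
    0x000002: "PROCESS_CREATE_THREAD",
    0x000004: "PROCESS_SET_SESSIONID",
    0x000008: "PROCESS_VM_OPERATION",
    0x000010: "PROCESS_VM_READ",
    0x000020: "PROCESS_VM_WRITE",
    0x000040: "PROCESS_DUP_HANDLE",
    0x000080: "PROCESS_CREATE_PROCESS",
    0x000100: "PROCESS_SET_QUOTA",
    0x000200: "PROCESS_SET_INFORMATION",
    0x000400: "PROCESS_QUERY_INFORMATION",
    0x000800: "PROCESS_SUSPEND_RESUME",
    0x001000: "PROCESS_QUERY_LIMITED_INFORMATION",
    0x100000: "SYNCHRONIZE",
}


def decode_access_mask(mask: int) -> list:
    """Return the names of the known access rights set in mask, lowest bit first."""
    names = [name for bit, name in sorted(PROCESS_ACCESS_RIGHTS.items()) if mask & bit]
    known = sum(bit for bit in PROCESS_ACCESS_RIGHTS if mask & bit)
    if mask & ~known:
        # Report any bits this table does not cover instead of hiding them.
        names.append(hex(mask & ~known))
    return names


print(decode_access_mask(0x10047A))
```

For 0x10047A this lists PROCESS_CREATE_THREAD, PROCESS_VM_OPERATION, PROCESS_VM_READ, PROCESS_VM_WRITE, PROCESS_DUP_HANDLE, PROCESS_QUERY_INFORMATION and SYNCHRONIZE, which is the set of rights one would expect from code that intends to write into a process and start a thread in it.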
After renaming a few things, the function looks like this.
Getting back to our analysis, we can see that the return value of the earlier function is checked, and the malware either jumps into the if block or exits. The main point here is the call to NtQuerySystemInformation, which queries the system for a specific type of information determined by a SystemInformationClass value. In our case, IDA recognized this as SystemProcessorPerformanceInformation. We also notice that the first dword of the returned structure (IdleTime) is used to calculate a value, which is then passed into another routine, recognized as the config decryption routine.
One thing that stood out to me is that, for all of the ISFB variants I saw, they used a timestamp string of the malware compilation date to generate the magic value used in the config decryption routine. However, for this variant, they decided to go with a different approach. Instead of using a hard-coded value, they generate a value between 1 and 19 during run-time. This eliminates the need for any hard-coded values in the decryption routine. My thinking is that they did this to avoid using the compilation string as an IOC for malware detection.
With that covered, let's jump into the config decryption routine and see how it works. A keen eye would probably recognize the references to the constants 0x3C and 0x14 relative to the base address of the module. 0x3C is the offset of e_lfanew in the DOS header, and 0x14 is the offset of SizeOfOptionalHeader from the start of the NT headers. This tells me that it's walking the headers to finally get to the section header table.
Then it enters a loop checking the current section header's Name field against the hard-coded string 'ssb.'. This actually refers to '.bss'; it reads backwards because the name is compared as a little-endian dword. If you've already heard of ISFB before or read about it, then you may know that it stores its encrypted config in the .bss section. Therefore, it's not surprising that it parses the section header table looking for .bss to extract and decrypt the config.
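The header walk described above is straightforward to replicate on a file read from disk. This is a minimal sketch of the same traversal, not the malware's code; the function name is my own, and it compares the full section name rather than only its first dword.

```python
import struct

# '.bss' read as a little-endian dword; disassemblers render it as 'ssb.'.
BSS_DWORD = struct.unpack("<I", b".bss")[0]  # 0x7373622E


def find_section(image: bytes, name: bytes = b".bss"):
    """Return (virtual_address, size_of_raw_data) for the named section, or None."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]  # DOS header -> NT headers
    num_sections = struct.unpack_from("<H", image, e_lfanew + 0x06)[0]
    opt_size = struct.unpack_from("<H", image, e_lfanew + 0x14)[0]  # SizeOfOptionalHeader
    table = e_lfanew + 0x18 + opt_size  # section table follows the optional header
    for i in range(num_sections):
        hdr = table + i * 40  # each IMAGE_SECTION_HEADER is 40 bytes
        if image[hdr:hdr + 8].rstrip(b"\x00") == name:
            virtual_address = struct.unpack_from("<I", image, hdr + 12)[0]
            raw_size = struct.unpack_from("<I", image, hdr + 16)[0]
            return virtual_address, raw_size
    return None
```

The returned RVA and raw size are the two values the decryption routine goes on to use.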
Upon finding the targeted section header, it gets the raw size of the section and its RVA to use in the decryption process of the config. As far as the algorithm is concerned, when you first look at it, you can't really tell there's any decryption going on, as there's no reference to the usual xor or rotate operations. Instead, the math involved in the algorithm is addition and subtraction. Here are some key bits of the algorithm that may help you understand the process.
First, we can see that the routine takes an argument (arg_0), which is dynamically generated at run-time. We will refer to this as 'run_time' from now on. This value is moved into the edx register, then decremented by 1, and the result is bitwise ANDed with 1. Finally, the output value is stored in the edx register.
The reason why I am highlighting this is to show you how the run-time generated value is used in the process of decoding the config. By replicating this process and writing a static config decryptor, you can automate the process.
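The edx computation described above is small enough to mirror directly. This is a sketch of just that one step, with a function name of my own choosing; it is not the full magic-value derivation.

```python
def parity_term(run_time: int) -> int:
    """Mirror of the described steps: edx = (run_time - 1) & 1."""
    return (run_time - 1) & 1


# The term simply alternates between 0 and 1 across the possible run_time values.
terms = {run_time: parity_term(run_time) for run_time in range(1, 20)}
```

In other words, this step only records whether run_time is odd (result 0) or even (result 1).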
Getting back to our analysis, we can see in the other highlighted code block how the run_time generated value, bss_rva, and the calculated value in edx are used to calculate a value. This value will be added to each encoded dword, as we'll see later, to decode it.
Next, we enter a second loop that reads the encoded config a dword at a time. It tests whether the dword is NULL. If not, it does the needed math in the other highlighted code block with the help of what I call the 'magic value'. This is the value that was calculated earlier, as mentioned above, to decode that dword.
Lastly, you can see what appears to be a comparison against a hard-coded constant. If there's a match, it calls a custom-written memcpy to copy the decoded config to the location in bss where the original encoded config is. However, if there's no match, it knows that the current run_time value is not the one needed (remember, the run_time can be any integer between 1 and 19). Therefore, it gets another run_time value and does the whole process all over again.
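The overall shape of the loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the sample's exact arithmetic: I assume a plain addition modulo 2**32 per dword, and both derive_magic (how the magic value is built from run_time, bss_rva and the edx term) and looks_valid (the comparison against the hard-coded constant) are hypothetical callables you would have to fill in from the disassembly. A static decryptor can also try the values 1 to 19 in order instead of generating them randomly.

```python
import struct
from typing import Callable, Optional, Tuple

MASK32 = 0xFFFFFFFF


def decode_dwords(encoded: bytes, magic: int) -> bytes:
    """Add magic (mod 2**32) to each dword, stopping at the first NULL dword."""
    out = bytearray()
    usable = len(encoded) - len(encoded) % 4  # ignore a trailing partial dword
    for offset in range(0, usable, 4):
        dword = struct.unpack_from("<I", encoded, offset)[0]
        if dword == 0:
            break
        out += struct.pack("<I", (dword + magic) & MASK32)
    return bytes(out)


def brute_force_run_time(
    encoded: bytes,
    derive_magic: Callable[[int], int],   # hypothetical: run_time -> magic value
    looks_valid: Callable[[bytes], bool],  # hypothetical: check on the decoded data
) -> Optional[Tuple[int, bytes]]:
    """Try every run_time from 1 to 19 and return the first that decodes cleanly."""
    for run_time in range(1, 20):
        decoded = decode_dwords(encoded, derive_magic(run_time))
        if looks_valid(decoded):
            return run_time, decoded
    return None
```

Keeping the derivation and the validity check as parameters makes it easy to swap in the real logic once it has been lifted from the sample.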
After spending quite some time on this, I managed to figure out that 0x13 (19 in decimal) is the perfect run-time value that will decode the config properly. Finally, I managed to replicate this whole algorithm outside in Python and wrote a static config decryptor which you can find on my GitHub:
Getting back to the reversing, you can see that decrypt_config sits in a loop that constantly checks the return value of the function. As long as the return value is not 0x15, execution leaves the loop and continues. However, if the return value is 0x15, which indicates that the config wasn't decrypted properly, it loops back and goes for another iteration with another randomly generated value between 1 and 19.
With that said, we can see the malware tries to get the user's locale information and then checks it for 'RU'. If the locale is 'RU', it returns and exits; otherwise, it continues with the normal course of execution. My thinking is that this is just the malware being selective about which machines to infect, excluding Russian users. This makes me guess that either the author is Russia-based or is trying to make me think so. In either case, I don't care.
The next function is just a wrapper for GetModuleFileNameW. It obtains the full path name of the current module and writes it to the provided buffer. The returned pathname is then converted to its long form using the GetLongPathNameW function call and stored in a global buffer for later use.
Moving forward, we can see the malware creating a new thread by calling CreateThread with the start function argument set to point to SleepEx. This function will put the thread in an alertable state.
A thread is nothing more than a unit of code execution; it can also be viewed as a stream of code execution. In a malware context, the malware may create another thread of execution alongside the main thread and point it at its injected payload.
Each thread has a start function, which is the function the thread starts executing at. Here, the start function is SleepEx. With a little bit of googling, we can see that SleepEx will put the thread in an alertable state. This means that while the thread is waiting in this state, its queued APCs (Asynchronous Procedure Calls) are executed.
Next, the malware calls QueueUserAPC. This function adds an APC to the APC queue of the newly created thread. Notably, the first argument to the function is the APC routine to execute.
Let's go ahead and take a peek into that function. Initially, we see an OS version check to pick up the proper string-format security descriptor. This is then converted into a valid functional one with the call to a wrapper function that calls ConvertStringSecurityDescriptorToSecurityDescriptorA.
Moving forward, we see a call to a function that takes three arguments. The arguments passed to the function appear to be a buffer or struct base, an offset into that buffer/struct, and lastly a value calculated from the 'magic value' we saw earlier.
Because the function is quite large, I am going to try to be brief and focus only on the sections of that function that matter.
Going into that function, we can see references to magic values like 0x3C relative to base. This indicates to me that we are parsing the PE headers.
Following along, we can see it's trying to get a pointer to a blob of data after the section header table. It's quite unusual to have data stored in the PE after the section table.
After some research, I came to know that ISFB, also known as GOZI, has what researchers refer to as the JJ, FJ, and F1 structs. These structures store pointers to the compressed 2nd stage of the malware as well as config data.
It's common for malware to store config data or another executable inside, either in the rsrc section or append it as joined data near the end of the executable just like an overlay. Here, the malware authors decided to go with the other way. They created a struct that starts with the magic bytes 4A4A, which translates to ASCII JJ. This is parsed during execution, and we'll see how to get a pointer to the malware 2nd stage.
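To locate that blob statically, one can compute where the section header table ends and look for the JJ marker from there. The sketch below does only that; it scans forward for the two magic bytes rather than parsing the struct fields, so treat a hit as a candidate to verify by hand. The function name is my own.

```python
import struct

JJ_MAGIC = b"JJ"  # the bytes 4A 4A


def find_jj_offset(image: bytes) -> int:
    """Return the offset of the first 'JJ' marker after the section table, or -1."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    num_sections = struct.unpack_from("<H", image, e_lfanew + 0x06)[0]
    opt_size = struct.unpack_from("<H", image, e_lfanew + 0x14)[0]
    # The section table starts right after the optional header; each entry is 40 bytes.
    table_end = e_lfanew + 0x18 + opt_size + num_sections * 40
    return image.find(JJ_MAGIC, table_end)
```

Starting the search at the end of the section table avoids false hits in the DOS stub or in the section names themselves.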
We can see it walking the struct to get the pointer to the compressed 2nd stage. You might be asking, how in the world did I know that it points to compressed data? Well, we next see it calling a function which is doing a lot of math.
Going inside, we see references to constants that appear to be unique to the function: 0x7D00 and 0x500.
With a little bit of googling, I came to know that these constants are used in aPLib, which is a legitimate compression library. Honestly, looking further into the function and not seeing any XORs or crypto instructions led me to believe that it's decompressing the data, not decrypting it.
Notably, comparing that to the variant in the blog post I referenced at the very beginning, we see that it's almost the same algorithm and the same variant with few changes.
If you look inside this routine, you'll see references to offsets that are not consecutive. This is a sign that it's most likely accessing a struct rather than an array.
Tracing these offsets we see that the struct layout is as follows:
Having decompressed the data, we see an XOR operation with a hard-coded XOR key in the struct. This will decrypt the first 4 bytes of the decompressed data to fix up the headers. Finally, the function returns.
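The header fix-up is a single dword XOR, which the following sketch mirrors. It assumes the first 4 bytes are treated as one little-endian dword; the function name is my own and the key must be read from the struct in the sample.

```python
import struct


def fix_header_dword(decompressed: bytes, xor_key: int) -> bytes:
    """XOR the first 4 bytes (as a little-endian dword) with xor_key."""
    first = struct.unpack_from("<I", decompressed, 0)[0]
    return struct.pack("<I", first ^ xor_key) + decompressed[4:]
```

Because XOR is its own inverse, applying the function twice with the same key returns the original bytes, which is a quick sanity check when experimenting with a candidate key.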
Having parsed and decompressed the joined data, we see a call to a function which, in brief, allocates memory through a memory-mapped file (section object). A memory-mapped file is just an association of a file's content with a portion of the process's virtual memory. In simple words, it is a means to read from or write to large files, such as database files, without having to load the whole file into memory: part of the file, or the whole file, is mapped into memory and accessed through a pointer.
In our case, however, we are not mapping a file into memory, but allocating an amount of memory equal to 2 * full_pathname_length + 2.
With the memory allocated, we have one more function to look at before we wrap up our analysis. This function will copy the decompressed 2nd stage sections into the allocated memory, resolve the imports, put it in memory state, and then transfer execution to the 2nd stage entry point.
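The section-copy part of that last function can be illustrated offline. The sketch below lays a file-state PE out the way a loader would place it in memory; it is a simplification that deliberately leaves out import resolution, relocations and the jump to the entry point, and the function name is my own.

```python
import struct


def map_sections(raw: bytes) -> bytearray:
    """Copy headers and section data of a file-state PE to their virtual addresses."""
    e_lfanew = struct.unpack_from("<I", raw, 0x3C)[0]
    num_sections = struct.unpack_from("<H", raw, e_lfanew + 0x06)[0]
    opt_size = struct.unpack_from("<H", raw, e_lfanew + 0x14)[0]
    # SizeOfImage and SizeOfHeaders sit at +0x38 and +0x3C in the optional header.
    size_of_image, size_of_headers = struct.unpack_from("<II", raw, e_lfanew + 0x18 + 0x38)
    image = bytearray(size_of_image)
    headers = raw[:size_of_headers]
    image[:len(headers)] = headers
    table = e_lfanew + 0x18 + opt_size
    for i in range(num_sections):
        hdr = table + i * 40
        # VirtualAddress (+12), SizeOfRawData (+16), PointerToRawData (+20).
        virtual_addr, raw_size, raw_ptr = struct.unpack_from("<III", raw, hdr + 12)
        data = raw[raw_ptr:raw_ptr + raw_size]
        image[virtual_addr:virtual_addr + len(data)] = data
    return image
```

This is the inverse of the unmapping step an analyst performs on a memory dump: here the raw offsets are the source and the virtual addresses are the destination.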