18 January, 2019

Remotely compromise devices by using bugs in Marvell Avastar Wi-Fi: from zero knowledge to zero-click RCE

Introduction and motivation

With this research, I'm going to answer a question that has been open for quite some time: to what extent is the Marvell Wi-Fi FullMAC SoC (not) secure? Wireless devices with this chip haven't been thoroughly researched by the community yet, so they may contain a tremendous amount of unaudited code. That code may result in severe security issues affecting devices equipped with these WLAN cards. At the outset, I should mention that this article is based on the information I presented in my ZeroNights 2018 talk, so feel free to have a look at the original slides. There is also some notable research on the security of wireless SoCs. For example, Google Project Zero published a series of blog posts, starting in April 2017, describing the exploitation of the Broadcom Wi-Fi stack on smartphones. This topic was also discussed at the Black Hat 2017 conference. Some write-ups on smartphone baseband exploits may also help you understand the techniques used to reverse engineer the firmware of wireless SoCs.

How a wireless device works and starts up

In general, there are two main categories of Wi-Fi dongles: FullMAC and SoftMAC. Both need a firmware image that has to be uploaded every time the device starts up. The device manufacturer supplies the firmware images and the operating system device drivers. During startup, the driver uploads the firmware that enables the main functionality of the Wi-Fi SoC. The picture below illustrates the process.

[Figure: startup]

The main difference between SoftMAC and FullMAC dongles lies in their firmware functionality: the firmware of FullMAC dongles also implements the MLME (MAC Sublayer Management Entity). In other words, it can handle some Wi-Fi management frames and events entirely on the SoC, without any support from the operating system driver.

[Figure: dongles]

Obviously, the attack surface of FullMAC dongles is much wider, so these devices are of greater interest to us.

Interaction between Wi-Fi SoC and driver

There are two Linux kernel drivers that can be used to work with Marvell Wi-Fi:
  1. the mwifiex driver (sources are in the official Linux kernel repository)
  2. the mlan and mlinux drivers (sources are in the official steamlink-sdk repository)
Both of them have debug features that allow reading and writing SoC memory. A driver uses an internal format to send information to the Wi-Fi SoC and to receive events or responses from it. This internal format can be studied using, for example, the sources of Marvell's open-source mwifiex driver. Several types of data are exchanged with the Wi-Fi SoC:
  1. COMMAND
  2. EVENT
  3. DATA
  4. SINGLE PORT AGGREGATED DATA
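To make the list above concrete, here is a minimal C sketch of how a host might classify a buffer received from the SoC over SDIO. It rests on two assumptions, both of which you should verify against your kernel tree. First, it assumes the mwifiex SDIO convention of a 4-byte interface header: a little-endian 16-bit length followed by a little-endian 16-bit type. Second, it assumes the type values as I recall them from the mwifiex headers (MWIFIEX_TYPE_DATA = 0, MWIFIEX_TYPE_CMD = 1, MWIFIEX_TYPE_EVENT = 3, MWIFIEX_TYPE_AGGR_DATA = 10). The enum, macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packet types, assumed to match the mwifiex MWIFIEX_TYPE_* values. */
enum sdio_pkt_type {
    PKT_DATA      = 0,  /* MWIFIEX_TYPE_DATA */
    PKT_CMD       = 1,  /* MWIFIEX_TYPE_CMD: command or command response */
    PKT_EVENT     = 3,  /* MWIFIEX_TYPE_EVENT */
    PKT_AGGR_DATA = 10, /* MWIFIEX_TYPE_AGGR_DATA: single-port aggregated data */
};

#define INTF_HDR_LEN 4u /* assumed: le16 length + le16 type */

static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Parse the assumed SDIO interface header.
   Returns 0 on success, -1 if the buffer is too short or the length is inconsistent. */
static int parse_sdio_header(const uint8_t *buf, size_t buf_len,
                             uint16_t *pkt_len, uint16_t *pkt_type)
{
    assert(buf != NULL);
    if (buf_len < INTF_HDR_LEN)
        return -1;
    *pkt_len = get_le16(buf);
    *pkt_type = get_le16(buf + 2);
    /* A length smaller than the header or larger than the buffer is malformed. */
    if (*pkt_len < INTF_HDR_LEN || *pkt_len > buf_len)
        return -1;
    return 0;
}

/* Map a type value to the category names used in the list above. */
static const char *pkt_type_name(uint16_t type)
{
    switch (type) {
    case PKT_DATA:      return "DATA";
    case PKT_CMD:       return "COMMAND";
    case PKT_EVENT:     return "EVENT";
    case PKT_AGGR_DATA: return "SINGLE PORT AGGREGATED DATA";
    default:            return "UNKNOWN";
    }
}
```

Note the length check: a driver that trusts this field without validation is exactly the kind of escalation surface discussed later in the article.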
The diagram below shows the interaction between the Wi-Fi SoC and a device driver.

[Figure: interaction]

We prefer to think of commands as an API implemented by the firmware. All these commands can be sorted into several groups:
  1. READ/WRITE commands for SoC memory
  2. Extended version info from firmware (like w8897o-B0, RF8XXX, FP68, 15.68.7.p206 for SteamLink)
  3. Wi-Fi related stuff (like association, scanning, …)
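As an illustration of the first group, the sketch below serializes a 32-bit memory read or write request. Every constant here is an assumption, since the steamlink mlan driver may use a different layout. The sketch assumes the layout I recall from the mwifiex headers:

- HostCmd_CMD_MEM_ACCESS = 0x0086.
- A generic header of four little-endian 16-bit fields (command, size, seq_num, result).
- A body of action, reserved, addr and value.
- Action 0 means GET and action 1 means SET.

Wrapping the buffer in the transport header and sending it to the SoC is driver-specific and not shown. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants; check drivers/net/wireless/marvell/mwifiex/fw.h. */
#define CMD_MEM_ACCESS 0x0086u
#define ACT_GEN_GET    0x0000u
#define ACT_GEN_SET    0x0001u
#define CMD_HDR_LEN    8u  /* command, size, seq_num, result */
#define MEM_ACCESS_LEN 12u /* action, reserved, addr, value */

static void put_le16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Build a hypothetical MEM_ACCESS command into `out` (at least 20 bytes).
   Returns the number of bytes written. */
static size_t build_mem_access(uint8_t *out, uint16_t seq, uint16_t action,
                               uint32_t addr, uint32_t value)
{
    assert(action == ACT_GEN_GET || action == ACT_GEN_SET);
    const uint16_t size = CMD_HDR_LEN + MEM_ACCESS_LEN;
    put_le16(out + 0, CMD_MEM_ACCESS);
    put_le16(out + 2, size);
    put_le16(out + 4, seq);
    put_le16(out + 6, 0);      /* result: filled in by the firmware in its response */
    put_le16(out + 8, action);
    put_le16(out + 10, 0);     /* reserved */
    put_le32(out + 12, addr);  /* SoC address to read or write */
    put_le32(out + 16, value); /* ignored for reads */
    return size;
}
```

Because each such command moves only one 32-bit word, a dumping tool would loop over the address range. That is slow but sufficient for the analysis described below.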
Some of these commands can be accessed from user mode through driver-implemented IOCTLs or special debugfs files. One of the most useful driver features is the post-mortem firmware memory dump, which can help in debugging our dynamic instrumentation or exploit. The timeout mechanism appears to be implemented in the operating system driver: when a command response times out, the driver tries to dump the Wi-Fi SoC memory and store it in the host filesystem. Memory dumps have different formats in the mwifiex and mlan+mlinux drivers. I researched the mwifiex PCI driver and found that it stores a full Wi-Fi SoC memory dump in a format similar to the firmware image. The SDIO version of the mlan+mlinux driver stores only the ITCM, DTCM and SQRAM regions in raw binary format.

Firmware analysis

As described earlier, the Marvell Avastar Wi-Fi chipset family uses firmware files, which hold most of the device functionality. There is also a ROM, which contains the startup code and handles interaction with the host before the main firmware is loaded into chip RAM. Several firmware versions are available from the official linux-firmware git repository. So, first of all, it's worth investigating the firmware image that the driver uses to initialize the Wi-Fi SoC.

Static firmware file analysis

To get a basic idea of the structure of firmware RAM images, you may look at the Marvell Wi-Fi driver code that loads firmware into the Wi-Fi SoC (drivers/net/wireless/marvell/mwifiex/fw.h).
...

struct mwifiex_fw_header {
    __le32 dnld_cmd;
    __le32 base_addr;
    __le32 data_length;
    __le32 crc;
} __packed;

struct mwifiex_fw_data {
    struct mwifiex_fw_header header;
    __le32 seq_num;
    u8 data[1];
} __packed;

...
You may notice that the firmware file consists of a number of memory chunks with headers and checksums. A chunk header also contains the SoC address where the chunk is to be loaded. Using this knowledge, we can implement an IDA Pro loader for Marvell Avastar firmware files to investigate them further.

[Figure: loader]

After some reverse engineering, we find that the 88W8897 is built around an ARM946 microcontroller with 8 MPU regions, and all memory is RWX. The firmware file also contains references to ROM functions, so to research it further we'll need a ROM dump. Below is the memory map of the 88W8897 Wi-Fi chip. The unknown memory region seems to be a memory-mapped register area.

[Figure: layout]
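The chunk structure can be walked with a few lines of C. This is a minimal sketch that assumes each chunk is a 16-byte mwifiex_fw_header (all fields little-endian) immediately followed by data_length bytes of payload, as the driver structures above suggest. It does not interpret dnld_cmd values or verify the CRC, whose algorithm isn't covered in this article. The callback type and function names are hypothetical. In an IDA loader, the callback would create a segment at base_addr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FW_HDR_LEN 16u /* sizeof(struct mwifiex_fw_header) */

struct fw_chunk {
    uint32_t dnld_cmd;
    uint32_t base_addr;   /* SoC address where the payload is loaded */
    uint32_t data_length;
    uint32_t crc;         /* not verified in this sketch */
    const uint8_t *data;
};

typedef void (*chunk_cb)(const struct fw_chunk *chunk, void *ctx);

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk all chunks in a firmware image, calling `cb` (if non-NULL) for each.
   Returns the number of chunks, or -1 if a header or payload is truncated. */
static int walk_fw_chunks(const uint8_t *img, size_t img_len, chunk_cb cb, void *ctx)
{
    size_t off = 0;
    int count = 0;
    assert(img != NULL || img_len == 0);
    while (off < img_len) {
        if (img_len - off < FW_HDR_LEN)
            return -1;
        struct fw_chunk c;
        c.dnld_cmd    = get_le32(img + off);
        c.base_addr   = get_le32(img + off + 4);
        c.data_length = get_le32(img + off + 8);
        c.crc         = get_le32(img + off + 12);
        off += FW_HDR_LEN;
        if (c.data_length > img_len - off) /* payload runs past the end of the file */
            return -1;
        c.data = img + off;
        if (cb)
            cb(&c, ctx);
        off += c.data_length;
        count++;
    }
    return count;
}
```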

Dynamic firmware analysis. ThreadX runtime structures recovery

We can use the READ and WRITE commands to create a simple tool that dumps memory and instruments firmware. With this tool, we can obtain some runtime information. It's worth noting that the firmware looks like a big opaque binary. It contains only a few strings, which give absolutely no hint about how these devices operate or where and how to start hunting for bugs. However, after examining the ROM dump (obtained with our tool), we find that the firmware is based on ThreadX. ThreadX is a proprietary RTOS widely used in smart devices, and its source code can be obtained under a license. ThreadX is basically just a runtime environment: it contains functions for managing dynamic memory, threads, and communication between threads. It's one of the most popular RTOSes, with over 6 billion deployments (according to its website). ThreadX runtime structures can be located in memory by their ID fields, which contain specific values when a structure is valid. Take the thread structure, for example:
typedef  struct TX_THREAD_STRUCT
{
    /* The first section of the control block contains critical
       information that is referenced by the port-specific 
       assembly language code.  Any changes in this section could
       necessitate changes in the assembly language.  */
    ULONG       tx_thread_id;           /* Control block ID         */

    ...
} TX_THREAD;
The first 4-byte field, tx_thread_id, must contain the value 0x54485244 ("THRD" in ASCII). Locating these objects gives us even more information, because some ThreadX objects carry names that hint at their purpose. We implemented ThreadX runtime structure reconstruction as an IDA script, which can also be used to research memory dumps of other ThreadX-based devices. Below is a summary of the reconstructed ThreadX structures (addresses are valid for the default SteamLink firmware with internal version w8897o-B0, RF8XXX, FP68, 15.68.7.p206):
Object | Name              | Function address | Possible meaning
-------|-------------------|------------------|---------------------------------------------------
Thread | Idle              | 0xFFD06479       | consumes free CPU resources
Thread | MAC Tx            | 0xFFD50C39       | this thread can parse some Wi-Fi frames
Thread | MAC Tx Notify     | 0xFFD55B2F       | ???
Thread | MAC Mgmt          | 0xFFD13E55       | thread which parses management frames
Thread | CB Proc           | 0xFFD24859       | this thread handles internal communication with host
Thread | IccTask           | 0xFFD066D5       | ???
Timer  | SleepConfirmTmr   | 0xFFD1E055       | ???
Timer  | AP_NullPktDoneTmr | 0xFFD1DC55       | ???
Timer  | NullPktDoneTmr    | 0xFFD1DC55       | ???
Queue  | TxMgmt80211MsgQ   | -                | ???
Queue  | MacMgmtSMEMsgQ    | -                | ???
Queue  | TimerCbMsgQ       | -                | ???
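The core of the structure search can be sketched in C as a scan of a raw memory dump for aligned words equal to a ThreadX control block ID. This is a minimal illustration, not the IDA script itself. It makes three assumptions:

- The dump is little-endian, so 0x54485244 appears in the file as the bytes "DRHT".
- Control blocks are 4-byte aligned.
- Reporting candidates is enough. A real tool would validate other fields, for example pointers into known memory regions, before trusting a match.

The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_THREAD_ID 0x54485244u /* "THRD" */

/* Find 4-byte-aligned little-endian words equal to `id` in a memory dump.
   Stores up to max_hits offsets in `hits` and returns the total number found. */
static size_t find_tx_objects(const uint8_t *dump, size_t len, uint32_t id,
                              size_t *hits, size_t max_hits)
{
    size_t found = 0;
    assert(hits != NULL || max_hits == 0);
    for (size_t off = 0; off + 4 <= len; off += 4) {
        uint32_t w = (uint32_t)dump[off] | ((uint32_t)dump[off + 1] << 8) |
                     ((uint32_t)dump[off + 2] << 16) | ((uint32_t)dump[off + 3] << 24);
        if (w == id) {
            if (found < max_hits)
                hits[found] = off;
            found++;
        }
    }
    return found;
}
```

Adding a hit offset to the dump's base address gives the SoC address of a candidate TX_THREAD. The same scan works for other object types once their ID constants are known.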

Dynamic firmware analysis. Dynamic firmware instrumentation

When we start hunting for bugs in the firmware, we have very little information about our target:
  1. source code is unavailable
  2. the code that handles or parses frames is unknown
  3. the Wi-Fi SoC contains only a small amount of memory, just enough to serve its purpose, so we can't place our code inside it for fuzzing or coverage measurement.
Since there are no workarounds for points 1 and 3, we can study the running firmware to find the functions responsible for handling Wi-Fi frames. Several types of in-vivo runtime analysis on the Wi-Fi SoC are possible with the READ/WRITE commands:
  1. We can hook a single function (using some splicing technique).
  2. We can replace pointers to some debug- or log-like routines.
  3. We can trace block pool allocation/deallocation.
  4. We can even instrument entire code regions (not big ones, though) with static Thumb function calls (like DBI with function-level granularity).
The first three points are quite simple to implement in our Wi-Fi SoC research tool. The last one may look somewhat tricky. Basically, our tool uses the Capstone disassembly engine to find Thumb BL instructions (which are used for function calls) and replaces them with calls to our instrumentation stub. This stub is responsible for calling our custom DBI tool, calling the original firmware function with the correct parameters, and returning to the call site. You may think of it as your favorite DBI framework with function-level instrumentation granularity. The algorithm is quite straightforward:

[Figure: dbi_overview]

You can find more details about the instrumentation stub workflow in the picture below:

[Figure: stub_workflow]

So, to instrument code on the Wi-Fi SoC, we'll need to:
  1. Read the to-be-instrumented memory region from the Wi-Fi SoC.
  2. Disassemble it with Capstone.
  3. Create the structures that the patcher code will use to patch the firmware in memory and call the user-defined instrumentation routine.
  4. Copy these structures, the special patcher code, the stubs and the user-defined routine to the Wi-Fi SoC.
  5. Execute the patcher by hooking the extended version routine and invoking it through the regular firmware API. The driver rarely calls this firmware function, which ensures that we can disable interrupt handling and perform the instrumentation safely.
  6. After our instrumentation tool collects the necessary runtime information, copy the results out of Wi-Fi SoC memory using the firmware/driver feature for accessing Wi-Fi SoC memory.
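Step 3 hinges on rewriting Thumb BL instructions so that they point at the instrumentation stub. Below is a minimal sketch of encoding and decoding the classic two-halfword Thumb BL pair used on ARMv5T cores such as the ARM946. The target is the address of the BL plus 4 plus a signed 23-bit, halfword-aligned offset, which gives a range of about ±4 MiB. This is not the author's tool. It assumes the instrumented calls use this BL form, not BLX to ARM code. Capstone would be used beforehand to locate the instructions. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a Thumb BL pair at address `pc` that calls the Thumb function `target`.
   Returns false if the offset is odd or outside the +/-4 MiB BL range. */
static bool thumb_bl_encode(uint32_t pc, uint32_t target, uint16_t out[2])
{
    assert(out != NULL);
    int64_t offset = (int64_t)target - ((int64_t)pc + 4); /* PC reads 4 bytes ahead */
    if (offset % 2 != 0 || offset < -(1 << 22) || offset >= (1 << 22))
        return false;
    uint32_t imm = (uint32_t)offset;                        /* two's complement bits */
    out[0] = (uint16_t)(0xF000u | ((imm >> 12) & 0x7FFu)); /* first half: offset[22:12] */
    out[1] = (uint16_t)(0xF800u | ((imm >> 1) & 0x7FFu));  /* second half: offset[11:1] */
    return true;
}

/* Decode a Thumb BL pair at address `pc`; returns false if it isn't a BL pair. */
static bool thumb_bl_decode(uint32_t pc, const uint16_t in[2], uint32_t *target)
{
    if ((in[0] & 0xF800u) != 0xF000u || (in[1] & 0xF800u) != 0xF800u)
        return false;
    int32_t hi = (int32_t)(in[0] & 0x7FFu);
    if (hi & 0x400)
        hi -= 0x800;                                        /* sign-extend offset[22:12] */
    int32_t offset = hi * 4096 + (int32_t)((in[1] & 0x7FFu) << 1);
    *target = pc + 4u + (uint32_t)offset;
    return true;
}
```

In a patcher, you would decode the original BL to learn the real callee, record it for the stub, then encode a new BL from the same address to the stub. The stub must itself be Thumb code, because BL does not switch state.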
It's worth noting that this type of instrumentation has some microarchitectural problems. They occur because we overwrite instructions on the Wi-Fi SoC with new BL calls to the instrumentation stub. Because of I/D-cache incoherency, we may lose some results: some calls may not be instrumented, because the old original instruction is still valid in the I-cache. The firmware also appears to lock the ARM CP15 coprocessor registers against writes after initialization, so flushing the I-cache on the Wi-Fi SoC isn't a trivial task. Another technique for investigating the Wi-Fi SoC in vivo is static firmware instrumentation. However, it requires rebuilding the Wi-Fi firmware every time a new analysis payload is applied, and the device has to be rebooted to start the instrumented firmware. Several types of DBI tools can be helpful here:
  1. Tools that search for signatures in function parameters (like a BSSID or MAC address).
  2. Tools that collect information about call stacks (this information can help in RE or in firmware fuzzing).
  3. Tools that monitor the state of ThreadX block pools.
All of this shows us how the code processes frames, because we can customize our DBI tool with different client binaries. This is a big step forward in the absence of source code and of any RE hints such as logging strings or exported function names. After applying this type of dynamic analysis to the running firmware, we can understand which functions parse input frames and parameters, and where the input data is passed to them. After that, numerous binary analysis and bug hunting techniques can be applied.
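A DBI client of the first type boils down to checking whether any call argument points at memory that contains a known value, such as the BSSID of a test access point. The sketch below shows that check in isolation. It assumes the instrumentation stub passes the four argument registers (R0-R3) to the client and that the client has some way to tell whether an address is safely readable. Here, both are modeled on the host with plain pointers and a caller-supplied window length. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return 1 if `pat` occurs within the first `window` bytes at `p`. */
static int window_contains(const uint8_t *p, size_t window,
                           const uint8_t *pat, size_t pat_len)
{
    if (p == NULL || pat_len == 0 || pat_len > window)
        return 0;
    for (size_t i = 0; i + pat_len <= window; i++)
        if (memcmp(p + i, pat, pat_len) == 0)
            return 1;
    return 0;
}

/* Hypothetical client hook: `args` holds R0-R3 interpreted as pointers.
   Returns the index of the first argument whose window contains `pat`, or -1. */
static int dbi_client_find_pattern(const uint8_t *const args[4], size_t window,
                                   const uint8_t *pat, size_t pat_len)
{
    assert(args != NULL);
    for (int r = 0; r < 4; r++)
        if (window_contains(args[r], window, pat, pat_len))
            return r;
    return -1;
}
```

On the SoC, the client would record the caller's address (taken from the stub) whenever a match is found. This points at functions that receive frame data.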

Hunting for bugs

Although we've applied various types of binary analysis (both static and dynamic) to the firmware memory dump and to the Wi-Fi SoC in vivo, it's still hard to search for vulnerabilities manually.

Fuzzing

It looks like only two types of fuzzing are available:
  1. over-the-air random fuzzing
  2. fuzzing firmware in emulated environment
The first type enables direct fuzzing of the Wi-Fi SoC. However, the input mutation algorithm will be rather dumb, because we can't collect edge coverage. Generally, edge coverage can be collected using processor features such as JTAG, ARM ETM or Intel Processor Trace. This requires hardware support from the chip itself, plus some hardware hacking skills to use debug functionality on production-grade devices. It looks like a non-trivial engineering task. And dumb fuzzing is really dumb. So, let's proceed to the second type.

The second type of fuzzing relies on firmware emulation, so it's relatively easy to collect edge coverage and mutate input with a feedback-driven algorithm. This is really SMART fuzzing of wireless devices. You may be surprised, but a tool that lets you fuzz code this way already exists: afl-unicorn, a mix of the original AFL fuzzer and the Unicorn CPU emulator, originally created by Nathan Voss. You should check out the materials on how it works and how to fuzz arbitrary code, including the CGC binary example. So, to fuzz Wi-Fi firmware with afl-unicorn, you'll need to identify the parsing routines (for example, with our Wi-Fi SoC DBI tool). Then you write a fuzzer that feeds mutated input (Wi-Fi frames) into these routines. Basically, your fuzzer should do the following:
  1. Map the necessary memory regions using a modified version of Unicorn.
  2. Set up the register context.
  3. Read a mutated input file and map it into the emulator memory.
  4. Start code execution.
  5. Properly emulate firmware crashes by sending the appropriate signals.
This looks like a straightforward and effective technique, but there are still some drawbacks. The most notable is the dependency on the global state captured when the Wi-Fi SoC memory dump was created. This state can contain saved global variables that prevent the fuzzer from reaching certain execution paths. There is also no dynamic memory access sanitization, and checksum verification code is difficult to locate and remove. Communication between RTOS tasks can't be emulated either, which may keep potentially interesting execution paths out of reach. Still, some results can be obtained with this fuzzing technique:

[Figure: afl]

Using this technique, I managed to identify about 4 memory corruption issues in some parts of the firmware. However, AFL can mutate input in ways that would never reach the fuzzed function in practice (for example, because of sanitization checks performed before it is called). That makes it hard to assess the potential impact of these issues. I also tried to reproduce these bugs on different firmware versions and different wireless SoCs, and the bugs appear to be present in many of them.

The most interesting bug to be exploited

One of the discovered vulnerabilities was a special case of ThreadX block pool overflow. It can be triggered without user interaction while the device scans for available networks. The scan is launched every 5 minutes, whether or not the device is connected to a Wi-Fi network. That's why this bug is so cool: it allows exploiting devices with literally zero clicks, in any state of the wireless connection (even when the device isn't connected to any network). For example, one can achieve RCE on a Samsung Chromebook that has just been powered on. To summarize:
  1. It doesn’t require any user interaction.
  2. It can be triggered every 5 minutes in the case of the GNU/Linux operating system.
  3. It doesn’t require the knowledge of a Wi-Fi network name or passphrase/key.
  4. It can be triggered even when a device isn’t connected to any Wi-Fi network, just powered on.
Here, I will describe how to achieve arbitrary code execution on the Wi-Fi SoC. Escalation techniques are described later in this article.

Basic ThreadX block pool overflow exploitation

A ThreadX block pool is just a contiguous memory region split into smaller fixed-size blocks. Each block pool is described by a runtime structure, which can be found in the memory dump with the help of the IDA script described above. At the beginning of each free block, there is a pointer to the next free block, and the last free block holds a NULL pointer. The pointer to the first free block is stored in the ThreadX block pool management structure, which is used by the block pool allocation and release functions.

[Figure: tx-bp]

It's easy to see that an attacker who overflows a block can overwrite the pointer to the next free block, and thus control where the next block will be allocated. By placing that allocation over critical runtime structures or pointers, an attacker can achieve code execution.

[Figure: tx-bp-overflow]
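The free-list mechanics are easy to model on the host. The following C sketch is a simplified model of a ThreadX-style block pool, not ThreadX source. Each block starts with a pointer-sized header. While the block is free, the header links to the next free block (NULL for the last one). Once the block is allocated, the header points back to the pool. The demo overflows an allocated block into the header of the adjacent free block. It then shows that the allocation after next lands at an attacker-chosen address. Block sizes and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD 32u                           /* usable bytes per block */
#define SLOT    (sizeof(uint8_t *) + PAYLOAD) /* header + payload */
#define NBLOCKS 4u

typedef struct {
    uint8_t *available;              /* header of the first free block */
    uint8_t storage[SLOT * NBLOCKS]; /* one contiguous region split into blocks */
} block_pool;

static void pool_init(block_pool *p)
{
    for (size_t i = 0; i < NBLOCKS; i++) {
        uint8_t *hdr = p->storage + i * SLOT;
        uint8_t *next = (i + 1 < NBLOCKS) ? hdr + SLOT : NULL;
        memcpy(hdr, &next, sizeof next); /* free block: link to the next free one */
    }
    p->available = p->storage;
}

static void *pool_alloc(block_pool *p)
{
    uint8_t *hdr = p->available;
    if (hdr == NULL)
        return NULL;
    uint8_t *next;
    memcpy(&next, hdr, sizeof next); /* blindly trusts the stored link */
    p->available = next;
    memcpy(hdr, &p, sizeof p);       /* allocated block: point back at the pool */
    return hdr + sizeof(uint8_t *);
}

/* Model of the bug: frame data is copied into a block without a size check,
   spilling one pointer into the next (free) block's header. */
static void *demo_overflow(block_pool *p, uint8_t *fake_header)
{
    uint8_t payload[PAYLOAD + sizeof(uint8_t *)];
    memset(payload, 'A', PAYLOAD);                               /* fills the victim block */
    memcpy(payload + PAYLOAD, &fake_header, sizeof fake_header); /* overwrites next link */

    uint8_t *victim = pool_alloc(p); /* block 0 */
    assert(victim != NULL);
    memcpy(victim, payload, sizeof payload);

    (void)pool_alloc(p);   /* block 1; the free-list head becomes fake_header */
    return pool_alloc(p);  /* returns memory inside the attacker-chosen region */
}
```

In this model, the third allocation returns `fake_header + sizeof(uint8_t *)`. Whatever code later fills that "block" writes over the attacker-chosen region.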

Marvell Avastar ThreadX block pool overflow exploitation

Most memory management routines in Marvell Avastar firmware rely on special wrapper functions. These functions use a special metadata header at the beginning of each ThreadX block. By reverse engineering these functions, one can find that the headers can contain special pointers, which are called before a block is freed. So, in the case of Marvell Avastar firmware, an attacker can easily achieve code execution on the wireless SoC. Here is the pseudocode of the block deallocator, which allows an arbitrary pointer to be executed:

[Figure: deallocator]

To execute code this way, an attacker just needs to overwrite some additional space in the next block (this applies only when the next block is busy).

[Figure: overflow2]
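The second technique can be modeled the same way. The deallocator pseudocode isn't reproduced in the text, so the following C sketch rests entirely on an assumed, hypothetical header layout. Each block starts with a wrapper header holding a function pointer, which the free wrapper calls (if non-NULL) before releasing the block. Overflowing the preceding block far enough to replace that pointer redirects execution when the busy neighbor is freed. On the real SoC, the pointer would be a firmware address. Here, it is a host function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper metadata at the start of each busy block. */
typedef struct {
    void (*before_free)(void *payload); /* called by the free wrapper if set */
    uint32_t size;
} wrap_hdr;

#define WRAP_PAYLOAD 24u
#define WRAP_SLOT    (sizeof(wrap_hdr) + WRAP_PAYLOAD)

static int hijacked; /* set when the "attacker" callback runs */
static void attacker_code(void *payload) { (void)payload; hijacked = 1; }

/* Hypothetical free wrapper: runs the per-block callback, then releases the block. */
static void wrapped_free(uint8_t *block)
{
    wrap_hdr h;
    assert(block != NULL);
    memcpy(&h, block, sizeof h);
    if (h.before_free != NULL)
        h.before_free(block + sizeof h);
    /* ...the block would be returned to the ThreadX pool here... */
}

/* Two adjacent busy blocks; an unchecked copy into the first one overwrites the
   second one's header. Returns 1 if the injected callback ran. */
static int demo_busy_neighbor(void)
{
    uint8_t region[2 * WRAP_SLOT];
    wrap_hdr clean = {NULL, WRAP_PAYLOAD};
    memcpy(region, &clean, sizeof clean);
    memcpy(region + WRAP_SLOT, &clean, sizeof clean);

    wrap_hdr evil = {attacker_code, WRAP_PAYLOAD};
    uint8_t payload[WRAP_PAYLOAD + sizeof(wrap_hdr)];
    memset(payload, 'A', WRAP_PAYLOAD);
    memcpy(payload + WRAP_PAYLOAD, &evil, sizeof evil);
    memcpy(region + sizeof(wrap_hdr), payload, sizeof payload); /* the overflow */

    hijacked = 0;
    wrapped_free(region + WRAP_SLOT); /* freeing the busy neighbor */
    return hijacked;
}
```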

Combining all things together

So, we have two techniques for exploiting a ThreadX block pool overflow. One is generic and can be applied to any ThreadX-based firmware, provided it has a block pool overflow bug and the next block is free. The second is specific to Marvell's Wi-Fi firmware implementation and works when the next block is busy. By combining them, we can achieve reliable exploitation.

Example of escalation to the application processor on Valve Steamlink

Valve Steamlink is a simple desktop streaming device: it lets you run PC games on a computer and stream the gaming desktop to, for example, a TV, so you can play your PC games on your TV. Its firmware is based on a Debian-like GNU/Linux operating system with the Linux kernel "3.8.13-mrvl" running on an armv7l application processor. It has a Marvell 88W8897 wireless chipset, connected over the SDIO bus and driven by the proprietary mlan.ko and mlinux.ko device drivers. Fun fact: this device went out of production just a day before ZeroNights 2018 :) You may notice that the majority of devices using Marvell Wi-Fi are gaming devices, like the PS4 (maybe because of the high-performance 802.11ac and Bluetooth combo). They are difficult to research because of DRM protection. So, I chose SteamLink: it has no DRM, and it's easy to launch your own tools and kernel modules on it to research the wireless SoC. Microsoft Surface and Samsung Chromebook use Marvell Wi-Fi as well.

Escalation attack surface

To execute code on SteamLink's application processor, we'll need a second escalation, because by design the SDIO bus provides no direct memory access from the device to the host. Some buses, like PCIe, allow DMA, which makes escalation much simpler. In our case, exploiting the escalation vulnerability is similar to exploiting the remote one. The only difference is that the attacker sends data from the compromised Wi-Fi SoC over the SDIO bus rather than over the network. You may think of a typical device driver as a bridge between a device and an application or operating system. It receives data from the device, parses it and passes it to the application (operating system), and vice versa, so it contains code that parses data received from the device. In the case of the Marvell Wi-Fi driver, this code has to process many types of messages composed of information elements (IEs). In fact, the escalation attack surface is quite wide.

[Figure: escalation]

Exploitation of the AP device driver vulnerability

The discovered vulnerability is extremely simple to exploit: it's a stack-based buffer overflow. There are also no binary exploitation mitigations in the Linux kernel "3.8.13-mrvl". However, AGAIN because of I/D-cache incoherence and/or deferred write-back buffer commits, some preparatory steps are required. Also, we have no control over the stack, because the function epilogues pop the stack pointer from the stack itself:
LDMFD           SP, {R4-R11,SP,PC}
To successfully exploit the escalation bug, one should do the following:
  1. Call the v7_flush_kern_cache_louis Linux kernel function.
  2. Execute shellcode.
Because the stack pointer is lost, we can't place gadgets on the stack. Instead, we can rely on registers R4-R11, which are also restored from the stack before execution continues at the restored PC. First of all, we need a special gadget that makes calls through two different registers within one basic block. This gadget performs the two main actions: flushing the caches and invoking the shellcode. For example, this one is good enough:
BLX             R3
MOV             R1, R4
MOV             R2, R5
SUBS            R3, R0, #0
MOV             R0, R10
BNE             loc_C00E7678
BLX             R9
Although it contains a conditional branch, the branch will never be taken, because v7_flush_kern_cache_louis always returns 0. The gadget also doesn't clobber R9, which the attacker controls. However, the first call goes through R3, which isn't restored from the stack. So one has to find another gadget that places a controlled value in R3 before calling the main one, for example:
MOV             R3, R8
BLX             R7
The final gadget should calculate the location of the shellcode and transfer execution to it. R0, R1, R2, R3 and R12 may be useful here, as they may contain stack pointers. In the case of Marvell's driver, R12 indeed contains a stack address. Therefore, one should find a gadget that uses a controlled register and R12 to calculate the actual shellcode location and transfer execution to it, like this one:
LDR             R6, [R12,R4,LSL#4]
MOV             R7, R0
ADD             R4, R12, R4,LSL#4
MOV             R8, R2
BLX             R6
It should also be noted that an attacker can significantly increase the number of available gadgets by using the Thumb instruction encoding. In fact, R12 may point to several different locations at the time of the overflow; I think this depends on the current scanning state. One could research how to send the event buffer from the Wi-Fi SoC to the AP properly so that the stack layout is always the same. Overall, the exploit has a success rate of about 50-60%.

Exploit requirements for Valve Steamlink

In this research, I used an ALFA Networks wireless adapter in monitor mode, based on the Realtek 8187 wireless chipset. The exploit can be implemented with the Python Scapy framework. For some reason, the Ubuntu GNU/Linux distribution isn't good at injecting Wi-Fi frames fast, so it is better to use Kali. You can see a full-chain exploit demo in the video below. The demonstrated payload just periodically prints messages to the kernel log.

Wrapping up

A few important lessons can be learned from this story:
  1. Wireless devices expose a HUGE attack surface.
  2. Usually, there are no exploitation mitigations on wireless SoCs.
  3. Device drivers may expose a WIDE attack surface for escalation from a device to the host application processor, even when the device doesn't have direct access to host memory.

Update

Some media outlets misinterpreted the news about the vulnerabilities in the Marvell Wi-Fi stack and presented them as vulnerabilities in the ThreadX RTOS. That's wrong. Our article says nothing about the first vulnerability being in code provided by the ThreadX SDK. Moreover, we said that ThreadX only implements the scheduler and dynamic memory management functionality, and nothing else. In other words, the ThreadX SDK forms no attack surface, because it doesn't process any external user data. The bug in question is in third-party code (quite likely Marvell's). That code incorrectly validates the size of data received from a Wi-Fi frame when copying it into a ThreadX block pool. So the security issue lies in this code, not in the RTOS, which only manages memory.