Monday, December 5, 2016

[1days] [0days] [PoCs] More gstreamer FLIC / vmnc issues

Overview
A part of any intellectually honest full disclosure experiment is to disclose the less interesting findings alongside the more serious issues and exploits.

Accordingly, if you were looking for spectacular 0day exploits, this is not the post you are looking for. If you’re generally interested in software failure conditions, though, here’s a bunch.
While looking at the gstreamer FLIC and vmnc decoders, I noticed various other issues aside from the ones that have already been blogged about. Some of these issues are also serious while others are trivial. For most of the issues, I have to commend the gstreamer team on a great job looking for variants of my initial post and in particular giving the FLIC decoder a thorough examination. For many security “fixes”, a project or vendor will just patch up the immediate issues reported, but gstreamer looked at the surrounding area of the initial fault and patched up many additional issues independently. Therefore, most of the vulnerabilities disclosed in this post are now 1days and not 0days.

1: FLIC decoder: uninitialized output canvas
(CESA-2016-0005)
Looks like it’s still an 0day. Images are from Fedora 25 with all updates applied.

This one goes first because it is nice and visual. The bug is that the output canvas is allocated but not initialized. Therefore if you have a file that does not start off by clearing the screen, or rendering an entire first frame, you will have uninitialized heap memory in the output canvas. It looks kind of pretty. Is that pointers I see? :-)
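A minimal sketch of the usual remedy, assuming a hypothetical helper named alloc_canvas (this is not gstreamer's API, and not necessarily how upstream would fix it): allocate the canvas zero-filled, so pixels that no decoder command ever writes are deterministic rather than stale heap contents.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of gstreamer: returns a zero-filled
 * width x height canvas of 8-bit palette indices, or NULL on failure.
 * calloc() checks the multiplication for overflow and clears the memory,
 * so untouched pixels read back as 0 instead of old heap data. */
static uint8_t *alloc_canvas(size_t width, size_t height)
{
    if (width == 0 || height == 0)
        return NULL;
    return calloc(width, height);
}
```

The caller owns the returned buffer and releases it with free().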

[Images: flx_uninit1.png, flx_uninit2.png]

2: FLIC decoder: out-of-bounds writes in FLX_SS2 command
(CESA-2016-0006)
A near identical vulnerability to the one I exploited in the FLX_LC command. Caught and fixed by gstreamer upstream; patched in latest Ubuntu and Fedora updates.

3: FLIC decoder: integer underflow and subsequent wildness in FLX_BRUN command
(CESA-2016-0007)
Caught and fixed by gstreamer upstream; patched in latest Ubuntu and Fedora updates.

The faulty code:

 gulong count, lines, row;
...
   row = flxdec->hdr.width;
   while (row) {
     count = *data++;

     if (count > 0x7f) {
       /* literal run */
       count = 0x100 - count;
       row -= count;

       while (count--)
         *dest++ = *data++;

As you can see, no consideration was given as to whether any “run count” is in fact greater than the remaining number of pixels in a row. If that happens, integer underflow will occur on the row variable, which wraps around to a very large positive integer (close to 2^64 on 64-bit). The loop will continue, overflowing the output canvas and overreading the input buffer. With no obvious way to exit the wild copy loop, exploitation is not favored.
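For illustration, here is a bounds-checked version of the same row loop. It is a sketch under assumptions: the function name, signature and error convention are hypothetical and are not taken from the gstreamer fix. The key point is that each run count is compared against both the pixels left in the row and the bytes left in the input before anything is copied, so row can never wrap.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one BRUN-style row of `width` pixels from `src` into `dst`.
 * Returns the number of input bytes consumed, or 0 on malformed input.
 * Hypothetical sketch: names and signature are not gstreamer's. */
static size_t decode_brun_row(const uint8_t *src, size_t src_len,
                              uint8_t *dst, size_t width)
{
    size_t in = 0, row = width;

    while (row) {
        if (in >= src_len)
            return 0;
        size_t count = src[in++];

        if (count > 0x7f) {
            /* literal run */
            count = 0x100 - count;
            if (count > row || count > src_len - in)
                return 0;   /* would underflow row or overread src */
            for (size_t i = 0; i < count; i++)
                *dst++ = src[in++];
        } else {
            /* replicate run: one value byte repeated `count` times */
            if (count > row || in >= src_len)
                return 0;
            uint8_t v = src[in++];
            for (size_t i = 0; i < count; i++)
                *dst++ = v;
        }
        row -= count;       /* safe: count <= row was checked above */
    }
    return in;
}
```

With a 4-pixel row, a literal run of 5 is rejected up front; in the original loop the same input would leave row at 4 - 5, which as an unsigned 64-bit value is 2^64 - 1.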

4: FLIC decoder: out-of-bounds read with wild chunk size
(CESA-2016-0008)
In general, the FLIC decoder lacked any defences for reading off the end of the input buffer. Tons of bugs here. But a gstreamer rewrite to avoid raw pointer access and use buffer object APIs for reading and writing seems to have fixed this area reasonably. Bravo.

The most obvious way to demonstrate this was to declare a chunk in the file with a huge size. The input pointer would be incremented by this size and then the next chunk header read from a wild location, leading to a crash.
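The missing check can be sketched as follows. This is an illustration only: it assumes a chunk header of a 4-byte little-endian size followed by a 2-byte type, with the size counting the header itself, and the helper name is hypothetical rather than anything in gstreamer.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_HEADER_LEN 6u   /* assumed: 4-byte size + 2-byte type */

/* Returns the validated total chunk length at offset `off`, or 0 if the
 * header or the declared size would step outside the `len`-byte input. */
static size_t checked_chunk_len(const uint8_t *buf, size_t len, size_t off)
{
    if (off > len || len - off < CHUNK_HEADER_LEN)
        return 0;

    const uint8_t *p = buf + off;
    uint32_t size = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;

    if (size < CHUNK_HEADER_LEN || size > len - off)
        return 0;
    return size;
}
```

A decoder loop would advance its offset only by a non-zero return value, so a chunk declaring a huge size stops decoding instead of moving the read position to a wild location.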

5: FLIC decoder: integer overflow in output buffer allocation
(CESA-2016-0009)
Looks fixed in the code; untested.

This bug is more interesting than it first appears. The faulty line of code is simple enough:

        out = gst_buffer_new_and_alloc (flxdec->size * 4);

But what type is flxdec->size?

struct _GstFlxDec {
 gsize size;

Where gsize appears to be like a size_t.
So this is one of those interesting cases where this is a vulnerability on 32-bit but ok on 64-bit. On 64-bit, the largest value of size was 0xffff * 0xffff. Multiplying again by 4 cannot exceed the width of a 64-bit type. On 32-bit, integer overflow is possible and results in a memory corruption, although a fairly wild one!
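The difference is easy to see by doing the multiplication in both widths. The two helpers below are purely illustrative stand-ins for a 32-bit and a 64-bit gsize; they are not gstreamer code.

```c
#include <stdint.h>

/* size * 4 as computed with a 32-bit size type (wraps modulo 2^32). */
static uint32_t alloc_size_32(uint32_t size) { return size * 4u; }

/* size * 4 as computed with a 64-bit size type (cannot wrap here). */
static uint64_t alloc_size_64(uint64_t size) { return size * 4u; }
```

For size == 0xffff * 0xffff == 0xfffe0001, the 64-bit result is 0x3fff80004 while the 32-bit result is truncated to 0xfff80004. For size == 0x8000 * 0x8000, the 32-bit result wraps all the way to 0.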

6: vmnc decoder: wild read due to integer overflow
(CESA-2016-0010)
Fixed, possibly as an accidental side effect of fixing the more serious integer overflow CESA-2016-0002.

if (type == CURSOR_COLOUR) {
   datalen += rect->width * rect->height * dec->format.bytes_per_pixel * 2;
...
 if (len < datalen) {
   GST_LOG_OBJECT (dec, "Cursor data too short");

A simple integer overflow in calculating how much input data is required for a cursor of a given size.
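One way to avoid this class of bug is to do the multiplication in a wider type and reject results that do not fit the length type. The sketch below assumes 16-bit cursor dimensions and a small bytes-per-pixel value. The function names are hypothetical and this is not the upstream fix.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes width * height * bytes_per_pixel * 2 in 64 bits and fails if
 * the result cannot be represented as a non-negative 32-bit length. */
static bool cursor_data_len(uint16_t width, uint16_t height,
                            uint8_t bytes_per_pixel, uint32_t *out)
{
    uint64_t need = (uint64_t)width * height * bytes_per_pixel * 2u;

    if (need > INT32_MAX)
        return false;
    *out = (uint32_t)need;
    return true;
}

/* The same product truncated to 32 bits, as an unchecked decoder sees it. */
static uint32_t cursor_data_len_wrapped(uint16_t width, uint16_t height,
                                        uint8_t bytes_per_pixel)
{
    return (uint32_t)((uint64_t)width * height * bytes_per_pixel * 2u);
}
```

A 0x8000 x 0x8000 cursor at 4 bytes per pixel needs 2^33 bytes, which truncates to 0 in 32 bits, so a "len < datalen" test on the truncated value passes for any input length.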

Closing notes
Bugs bugs glorious bugs.


[1day] [PoC with $rip] Deterministic Linux heap grooming with huge allocations

Overview
In a previous blog post, I disclosed CESA-2016-0002, an 0day vulnerability (without exploit) in the vmnc decoder of the gstreamer media subsystem, which is installed by default in Fedora.

Because a Fedora fix was somewhat slow in coming, I decided to attempt to exploit this vulnerability. This would have to be another scriptless vulnerability. My previous scriptless exploit against the FLIC decoder showed that these can be tricky, at least for me.

TL;DR: I failed to get a full exploit going before Fedora issued a fix. At the time of writing, my Fedora 25 install just received gstreamer1-plugins-bad-free-1.10.1-1.fc25, which appears to fix the bug. However, Fedora 24 appears to remain unpatched.
Before stopping I did find another instance of a Linux allocator quirk that I think needs to be properly documented, discussed and fixed.

Recap of exploitation primitives
You can refer to the original post for a fuller description, but essentially, the vulnerability is an integer overflow in canvas allocation, leading to decoder commands operating on out of bounds memory. Because one of the decoder commands is “copy within canvas”, we have a very powerful exploitation primitive -- we can set both the source and the destination of the copy to be out of bounds, so we can start resolving ASLR by copying pointers around.

The main challenge in proceeding with exploitation is heap layout. If you run my original PoC vmnc_width_height_int_oflow.avi, you’ll get a crash with mappings something like this:

555555757000-55555645b000 rw-p 00000000 00:00 0 [heap]
7fffa0000000-7fffa006e000 rw-p 00000000 00:00 0
7fffa006e000-7fffa4000000 ---p 00000000 00:00 0
7fffa4000000-7fffa4022000 rw-p 00000000 00:00 0
7fffa4022000-7fffa8000000 ---p 00000000 00:00 0


The canvas dimensions for the video are 0xffff x 0x8001 x 16bpp, giving an allocation size of 65534 bytes. The crashing dereference address for the bad write off the end of the canvas is 0x7fffa40237fe. The corresponding mapping is the thread arena starting at 7fffa4000000 above. The immediate problem is that there’s not much of interest inside the affected thread arena. The decoder metadata object -- often a very interesting target for corruption -- is in the previous thread arena (the one of size 0x6e000). On 64-bit, we don’t have enough range in our heap corruption primitive to “wrap around” the address space and target that. On 32-bit, this is likely feasible. But we’re going after 64-bit today.
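The 65534 figure can be reproduced with a couple of lines of arithmetic. The helpers below are an illustrative sketch and are not the decoder's code. They assume the size is computed in a 32-bit type, with 16bpp meaning 2 bytes per pixel.

```c
#include <stdint.h>

/* Canvas size in bytes as a 32-bit computation sees it (wraps mod 2^32). */
static uint32_t canvas_size_32(uint16_t width, uint16_t height,
                               uint8_t bytes_per_pixel)
{
    return (uint32_t)width * height * bytes_per_pixel;
}

/* The true canvas size, computed without truncation. */
static uint64_t canvas_size_64(uint16_t width, uint16_t height,
                               uint8_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}
```

For 0xffff x 0x8001 at 2 bytes per pixel, the true size is 0x10000fffe (about 4GB), and the truncated size is 0xfffe, i.e. 65534 bytes.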

Sure, there’s a bunch of pointers inside the affected thread arena, but going after any of them with a scriptless attack is likely going to be a headache. And it may require heap grooming. Today, we decline to proceed here.

Linux heap behavior to the rescue!
The way we at least start to try advancing reliable exploitation is by abusing deterministic behavior for huge allocations in the Linux glibc allocator.

By default, the glibc allocator will fall back to using mmap() to allocate very large allocations, and do so for some fairly large number of mappings if necessary. The parameters here are tunable but on 64-bit Linux, typically up to 65536 allocations will be allowed via mmap() and anything >= 128kB will use mmap().

Our integer overflow primitive is a straightforward 16-bit width x 16-bit height overflow, so it’s fairly easy to pick some values that when multiplied together result in an integer overflow but still a large allocation size.

So when glibc calls mmap() to service a large allocation, what happens? The code is glibc/malloc/malloc.c, sysmalloc(), with nb being the number of bytes requested:

#define MMAP(addr, size, prot, flags) \
__mmap((addr), (size), (prot), (flags)|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0)
 if (av == NULL
     || ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
         && (mp_.n_mmaps < mp_.n_mmaps_max)))
     size = ALIGN_UP (nb + SIZE_SZ, pagesize);
     /* Don't try if size wraps around 0 */
     if ((unsigned long) (size) > (unsigned long) (nb))
       {
         mm = (char *) (MMAP (0, size, PROT_READ | PROT_WRITE, 0));

Simple enough, and there’s even care to avoid integer overflow :-) Of interest is the first parameter to mmap(), the address parameter, which is passed as NULL. This is telling the kernel: “figure out a suitable address yourself”.
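To make the size computation concrete, here is a small stand-alone restatement of it. It is a sketch, not glibc code: PAGE_SIZE is assumed to be 4096, SIZE_SZ is taken as sizeof(size_t), and the function name is hypothetical.

```c
#include <stddef.h>

#define SIZE_SZ   sizeof(size_t)    /* one chunk header word */
#define PAGE_SIZE ((size_t)4096)    /* assumed page size */

/* Mirrors the excerpt's computation: returns the length that would be
 * passed to mmap() for a request of nb bytes, or 0 if the rounded size
 * wraps around and the mmap() path would be skipped. */
static size_t mmap_request_len(size_t nb)
{
    size_t size = (nb + SIZE_SZ + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

    return size > nb ? size : 0;
}
```

A 200000 byte request is rounded up to 200704 bytes (49 pages), while a request close to the maximum size_t value wraps and returns 0.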

So how does the kernel decide where to put a mapping request? There are a few corner cases and complexities but for the cases we care about, we can look at the kernel x86_64 architecture specific default handling in arch_get_unmapped_area_topdown(). The algorithm is fairly simple: it picks the first address where the requested size fits, starting at the “mmap base” and working downwards in virtual address space. The “mmap base” is some random gap below the main process initial stack.

There are typically a few holes in the top down address space scan but if we cause a large allocation, we can make sure those holes are too small to fit, and that our allocation only fits below the recently allocated thread heap arena. Heap arenas are 64MB on 64-bit, and the way they are allocated can often leave huge 64MB address space holes between them. So a 128MB allocation should be nearly guaranteed to be placed just before the most recently allocated thread arena.
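The placement logic can be modelled with a toy first-fit search. This is a deliberately simplified sketch and not the kernel's implementation: the function and struct names are hypothetical, and the caller supplies the existing mappings sorted from high to low address.

```c
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* half-open: [start, end) */

/* Toy model of a top-down first-fit search. `maps` must be sorted by
 * descending address, non-overlapping, and entirely below `base`.
 * Returns the start address chosen for a `len`-byte mapping, or 0 if
 * nothing fits. */
static uint64_t place_topdown(const struct range *maps, size_t n,
                              uint64_t base, uint64_t len)
{
    uint64_t ceiling = base;             /* top of the gap being examined */

    for (size_t i = 0; i < n; i++) {
        if (ceiling - maps[i].end >= len)
            return ceiling - len;        /* fits in the gap above maps[i] */
        ceiling = maps[i].start;
    }
    return ceiling >= len ? ceiling - len : 0;
}
```

As a made-up layout, take everything from 0x7fffa8100000 up to the mmap base as occupied, leave a 1MB hole below that, and put the two thread arenas at 0x7fffa0000000-0x7fffa8000000. A 4kB request lands in the 1MB hole, but a 128MB request skips it and is placed at 0x7fff98000000, ending exactly where the lower arena begins.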

Some tests and some possible exploit paths
Let’s now try a 16bpp (64k colors) file with width == 0xffff and height == 0x8400: vmnc_fault_with_large_alloc.avi. We cause an integer overflow and a 134150144 byte (~128MB) allocation and the mappings will look like this:

555555757000-55555645b000 rw-p 00000000 00:00 0 [heap]
7fff98010000-7fffa0000000 rw-p 00000000 00:00 0
7fffa0000000-7fffa006e000 rw-p 00000000 00:00 0
7fffa006e000-7fffa4000000 ---p 00000000 00:00 0
7fffa4000000-7fffa4022000 rw-p 00000000 00:00 0
7fffa4022000-7fffa8000000 ---p 00000000 00:00 0


Very useful! Our 128MB allocation -- the 7fff98010000-7fffa0000000 mapping above -- is packed right up against a thread arena. It is also the thread arena that contains the decoder metadata, so one attack is to go after this. Let’s do that. The metadata object is defined like this:

typedef struct
{
 GstVideoDecoder parent;

 gboolean have_format;

 GstVideoCodecState *input_state;

 int framerate_num;
 int framerate_denom;

 struct Cursor cursor;
 struct RFBFormat format;
 guint8 *imagedata;
} GstVMncDec;
struct Cursor
{
 enum CursorType type;
 int visible;
 int x;
 int y;
 int width;
 int height;
 int hot_x;
 int hot_y;
 guint8 *cursordata;
 guint8 *cursormask;
};


There are possibilities here. The most obvious is to copy a valid pointer to a more interesting object on top of the imagedata value, which is the canvas pointer relative to which we can corrupt. The following demos apply to Fedora 25 with the v1.10.0-1 RPM versions of the various gstreamer1 packages.

Demo 1: $rip == 0x414141414141
Demo file: vmnc_rip_414141414141.avi. This crashes as noted when run in totem under gdb. It works because the GstVMncDec decoder object is consistently allocated at offset 0xb840 into the thread arena directly after our massive allocation. Therefore, we can use constant offsets in our PoC file to:
  1. Copy GstVMncDec::parent::srcpad on top of GstVMncDec::imagedata, causing the next canvas write to be relative to a GstPad object. (Note that the GstVMncDec object and the GstPad object are in different thread heap arenas, and the address delta between the arenas is not consistent, so this is a powerful primitive.)
  2. Write 0x414141414141 on top of GstPad::finalize_hook, a function pointer that will be called later.

In the world of scriptless exploits, pointing the instruction pointer to a known static constant might look impressive, but it’s worlds away from a successful exploit. Accordingly, to prove we’ve got just a little more control than that:

Demo 2: $rip == 0x7fffa400bdf0
Demo file: vmnc_rip_is_heap.avi. This crashes in a similar way. It demonstrates that our powerful copy primitive can, to an extent, resolve ASLR. In this instance, we’ve copied a heap pointer on top of a function pointer to show the level of control we have. The crash looks like this:

Thread 18 "multiqueue0:src" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb271d700 (LWP 3336)]
0x00007fffa400bdf0 in ?? ()
(gdb) p $rip
$10 = (void (*)()) 0x7fffa400bdf0
(gdb) x/1s $rip
0x7fffa400bdf0: "src"


We’ve pointed the instruction pointer to a heap chunk that contains a string. The crash is because my processor supports a non-execute bit :-) To proceed with an exploit, we’d need to choose a different path, but we’ve demonstrated a certain level of control beyond blindly nuking a function pointer.

Unfortunately, a reliable exploit may not be possible with this path in general. Although the GstVMncDec object is reliably placed at offset 0xb840 in its arena when the exploit is run under gdb, the arena layout jiggles around a little bit from run to run when run normally. The reason has not been investigated.

Demo 3: reliable crash in malloc_consolidate with $rbx == 0x41414141
Demo file: vmnc_malloc_consolidate_41414141.avi. In order to try and get a more reliable start to my exploit, I decided to target malloc arena metadata. This can be done very reliably because it occurs right at the beginning of an arena’s mapping. It is not subject to heap jiggle!

Thread 18 "task2" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb271d700 (LWP 3599)]
0x00007fffefc13008 in malloc_consolidate () from /lib64/libc.so.6
(gdb) i r
rbx            0x41414141       1094795585
(gdb) disass $rip-20,$rip+20
Dump of assembler code from 0x7fffefc12ff4 to 0x7fffefc1301c:
=> 0x00007fffefc13008 : mov    0x8(%rbx),%rax


The above effect is achieved by writing a pointer value on top of a malloc() bin pointer. When this bin is touched, a deterministic crash is achieved. The reliability here is strong, but I didn’t get far along with an exploit. The challenge is to find a primitive that will follow pointer chains. In order to “break out” of the arenas to something more interesting, it is necessary to follow the linked list of arena pointers until you find main_arena, inside glibc. The likely path forward to do this would be to iteratively copy the glibc malloc_state->next pointer on top of something else, such as one of the bin pointers, or malloc_state->top, and then abuse side effects from malloc() and free() calls made in the decode loop. Proceeding in this manner will require evading glibc’s various internal corruption checks, but we have the ability to edit memory structures and copy pointers around, so it is well within the bounds of possibility.

Closing notes
This is not the first time that highly deterministic Linux mmap() behavior has been taken advantage of. In fact, just last week, Google Project Zero published a wonderful exploit against Android’s shared memory handling. Amongst other tricks, the deterministic behavior of mmap() placement was abused in order to get a favorable virtual memory layout. What is interesting is that this was on the 64-bit ARM architecture. Whereas an argument could be made that 32-bit address space is so limited that fragmentation is a concern preventing stronger randomization, 64-bit address spaces provide an opportunity to place mappings a little less predictably.

On platforms with decent sized user address spaces (x86_64, 47 bits and aarch64, 39 bits), I think it’s time to randomize unhinted mapping requests. There are concerns to talk through such as fragmentation, page table memory bloat and TLB impact. However, some significant software already implements virtual address mapping randomization in user space. This includes Adobe Flash, as well as Google Chrome’s main allocator, PartitionAlloc. The concept is proven.
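As a rough illustration of what user space randomization can look like, the sketch below passes a random, page-aligned hint to mmap() instead of NULL. It is an assumption-laden example rather than how any particular allocator does it: the helper name is hypothetical, and it assumes Linux, 4kB pages, a 47-bit user address space and a glibc that provides getrandom().

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/random.h>

/* Hypothetical helper: map `len` anonymous bytes at a randomized hint.
 * The hint is advisory (no MAP_FIXED), so the kernel falls back to its
 * own choice if the hinted range is unavailable.
 * Returns MAP_FAILED on error. */
static void *mmap_random_hint(size_t len)
{
    uint64_t r;

    if (getrandom(&r, sizeof r, 0) != (ssize_t)sizeof r)
        return MAP_FAILED;

    /* Keep 46 random bits (one bit of headroom below the assumed 47-bit
     * limit) and clear the low 12 bits to page-align the hint. */
    uintptr_t hint = (uintptr_t)(r & ((UINT64_C(1) << 46) - 1));
    hint &= ~(uintptr_t)0xfff;

    return mmap((void *)hint, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

Because the hint is not binding, this only changes where the kernel starts looking. The fragmentation, page table and TLB concerns mentioned above apply to this approach as well.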


Accordingly, paging Kees Cook of the kernel hardening project…. :-)