#### The Story of Meltdown and Spectre

Jann Horn & Daniel Gruss

May 17, 2018

- Jann Horn
- Google Project Zero
- jannh@google.com

- Daniel Gruss
- Post-Doc @ Graz University Of Technology
- @lavados
- daniel.gruss@iaik.tugraz.at









#### 1337 4242

#### **FOOD CACHE**

#### Revolutionary concept!

Store your food at home, never go to the grocery store during cooking.

Can store **ALL** kinds of food.

ONLY TODAY INSTEAD OF \$1,300



ORDER VIA PHONE: +555 12345



























#### access

















[Figure: cache hits vs. cache misses]







7. Serve with cooked and peeled potatoes







## Wait for an hour

6




# LATENCY



### Parallelize

1. Wash and cut vegetables
2. Pick the basil leaves and set aside
3. Heat 2 tablespoons of oil in a pan
4. Fry vegetables until golden and softened






segfault at ffffffff81a000e0 ip 000000000400535
sp 00007ffce4a80610 error 5 in reader

- Kernel addresses are not accessible
- Are privilege checks also done when executing instructions out of order?

- Adapted code

```
*(volatile char*)0;
array[84 * 4096] = 0; // unreachable
```

- Static code analyzer is not happy

```
warning: Dereference of null pointer
*(volatile char*)0;
```

- Flush+Reload over all pages of the array
- "Unreachable" code line was actually executed
- Exception was only thrown afterwards





- Combine the two things

```
array[data * 4096] = 0;
```

- Then check whether any part of array is cached



- Flush+Reload over all pages of the array
- Index of cache hit reveals data
- Permission check is in some cases not fast enough
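
The "index of cache hit reveals data" step can be sketched in C. This is a minimal sketch, not the authors' code: it assumes the reload latency of each probe page has already been measured elsewhere, and the names `decode_cache_hit`, `NUM_PAGES`, and the threshold parameter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one probe page per possible byte value. */
#define NUM_PAGES 256

/*
 * Decode step of Flush+Reload: given the reload latency (in cycles) of
 * each probe page, return the index of the fastest page whose latency is
 * below hit_threshold, or -1 if no page looks cached.
 * The latencies are assumed to have been measured elsewhere.
 */
int decode_cache_hit(const uint64_t latency[NUM_PAGES], uint64_t hit_threshold) {
    int best = -1;
    uint64_t best_latency = hit_threshold;
    for (size_t i = 0; i < NUM_PAGES; i++) {
        if (latency[i] < best_latency) {
            best_latency = latency[i];
            best = (int)i;
        }
    }
    return best;
}
```

With `array[84 * 4096]` as the only cached page, the function returns 84. A real threshold has to be calibrated per machine from the hit and miss timing distributions.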




## Leaking Passwords from your Password Manager

[Screenshot: the password manager's "Saved Logins" dialog next to a memory dump in which the stored passwords are visible.]





**KAISER** /'kAIzə/ 1. [German] Emperor, ruler of an empire 2. largest penguin, emperor penguin

Kernel Address Isolation to have Side channels Efficiently Removed




- We published KAISER in May 2017
- Intel and others improved and merged it into Linux as KPTI (Kernel Page Table Isolation)
- Microsoft implemented a similar concept in Windows 10
- Apple implemented it in macOS 10.13.2 and called it "Double Map"
- All share the same idea: switching address spaces on context switch

## Meltdown and Spectre












**SPECTRE** 

## if <access in bounds>



- processor predicts outcomes of branches
- predictions are based on previous behavior
- predictions help with executing more things in parallel




Spectre Variant 1

[Animation frames: the same code is shown with `index = 0;` through `index = 6;`.]
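
The bounds-check pattern behind Spectre Variant 1 can be sketched in C as follows. This is an illustrative sketch rather than the code from the slides: the names `victim_read`, `data`, and `probe` and the array length `DATA_LEN` are assumptions. Architecturally, the function never touches memory for an out-of-bounds index. The concern is that a CPU whose branch predictor was trained with in-bounds indices may execute the two loads speculatively anyway.

```c
#include <stddef.h>
#include <stdint.h>

#define DATA_LEN 4          /* hypothetical array length */
#define PAGE_SIZE 4096

static uint8_t data[DATA_LEN] = {1, 2, 3, 4};
static uint8_t probe[256 * PAGE_SIZE];  /* one page per possible byte value */

/*
 * Bounds-checked read followed by a data-dependent access.
 * Architecturally, an out-of-bounds index never reaches the loads.
 * After the branch has been trained with in-bounds indices, a CPU may
 * still run the loads speculatively for an out-of-bounds index and leave
 * a trace in the cache.
 * Returns 1 if the access was in bounds, 0 otherwise.
 */
int victim_read(size_t index) {
    if (index < DATA_LEN) {                 /* if <access in bounds> */
        volatile uint8_t *p = &probe[data[index] * PAGE_SIZE];
        (void)*p;                           /* touch the probe page */
        return 1;
    }
    return 0;
}
```

An attacker would call the function repeatedly with small indices, then once with an out-of-bounds index, and finally run Flush+Reload over `probe`.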









### Branch Prediction: Other Patterns (Untested)

- type check
- out-of-bounds access into object table with function pointers

```
#include <stddef.h>

struct foo_ops {
  void (*bar)(void);
};
struct foo {
  struct foo_ops *ops;
};

struct foo **foo_array;
size_t foo_array_len;

void do_bar(size_t idx) {
  /* bounds check that may be bypassed speculatively */
  if (idx >= foo_array_len) return;
  foo_array[idx]->ops->bar();
}
```

```
kvm_x86_ops->handle_external_intr(vcpu);

struct kvm_x86_ops *kvm_x86_ops;

static struct kvm_x86_ops vmx_x86_ops = {
  [...]
  .handle_external_intr =
    vmx_handle_external_intr,
  [...]
};
```

(code simplified)

- instruction stream does not contain target address
- target must be fetched from memory
- CPU will speculate about branch target





- state is stored in a Branch Target Buffer (BTB)
  - indexed and tagged by (on Intel Haswell):
    - partial virtual address
    - recent branch history fingerprint [sometimes]
- allowed to be wrong
- often not tagged by security domain
- $\rightarrow$  Break ASLR across security domains ("Jump over ASLR" paper)



- Why not also the other way round?
- Inject misspeculation to controlled addresses across security domains
- Attack goal: Leak host memory from inside a KVM guest

- direct branches:
  - bits 0-30 of the source go into BTB indexing function
  - BTB collisions possible between different security contexts
- predictions are calculated for 32-byte blocks of source instructions
- conditional branches: predicts both taken/not taken and target address
- indirect branches: two prediction modes:
  - "monotonic target"
  - "targets that vary in accordance with recent program behavior"
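
The collision condition for direct branches can be written down as a small C helper. This is a simplified model based only on the statement above that bits 0-30 of the source address feed the BTB indexing function. The name `btb_sources_collide` is hypothetical, and equality of those bits is treated as a sufficient condition for a collision, not as a description of the real hardware hash.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 0-30 of the branch source address, per the model described above. */
#define BTB_SOURCE_MASK ((UINT64_C(1) << 31) - 1)

/*
 * Simplified model: two direct-branch source addresses are treated as
 * colliding in the BTB if their low 31 bits are identical, regardless of
 * which security context they belong to.
 */
bool btb_sources_collide(uint64_t source_a, uint64_t source_b) {
    return ((source_a ^ source_b) & BTB_SOURCE_MASK) == 0;
}
```

Under this model, a user-space branch at `0x0000000001a000e0` collides with a kernel branch at `0xffffffff81a000e0`, because the two addresses differ only in bit 31 and above.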





(explicit execution barriers omitted from diagram)

- hyperthreaded
- same code
- same memory layout (no ASLR)
- different indirect call targets
- process 1: Flush+Reload loop (always miss)
- target injection from process 2 can cause extra load

- shortcuts for minimal PoC
- BTB structure from prior research ("Jump over ASLR" paper)
  - Source address: low 31 bits
  - ... direct branches only
- collide low 31 bits of source address, assume relative target
- $\rightarrow$  leak rate:  $\approx 6$  bits/second; almost all the injection attempts fail!
- $\rightarrow\,$  CPU distinguishes injections and hypervisor execution
- $\rightarrow$  Theory:
  - injection only works for "monotonic target" prediction
  - CPU prefers history-based prediction
  - injection works when history-based prediction fails due to system noise causing evictions



#### history-based prediction

- branch source address might be used
- preceding branches are used
  - which information?
  - how many branches?
  - which kinds of branches?

reverse this sufficiently for injections?

fallback

#### "monotonic target" prediction

uses branch source address for lookup

injection seems to work, but not usually used

#### Predictor Reversing: History Length



- $\approx 26$ branches stored
- measurements get weird around the boundary [and are not yet entirely correct]







#### on Haswell:

- taken conditional branch  $\checkmark$
- not-taken conditional branch  $\times$
- unconditional direct jump
- unconditional indirect branch  $\checkmark$
- RET  $\checkmark$
- IRETQ  $\times$

### Address Bits in History



 $\rightarrow$  only low 20 bits of any address affect history

#### Predictor Reversing: Branch Type influence?



- kinda like ROP
- use RET instructions to add history entries
  - RET reads a target from RSP, jumps to the target, and advances RSP, all in a one-byte instruction
  - RET target is fed into predictor as target
  - RET target is always an IRETQ
- use IRETQ instructions to move between RET instructions
  - IRETQ target is fed into predictor as source (by the following RET)
  - IRETQ target, apart from the last one, is always RET

| Stack frame | Note                                                        |
|-------------|-------------------------------------------------------------|
| IRETQ frame |                                                             |
| RET frame   |                                                             |
| IRETQ frame |                                                             |
|             |                                                             |
| RET frame   |                                                             |
| IRETQ frame |                                                             |
| RET frame   | this RET frame and the next IRETQ frame create one history entry |
| IRETQ frame |                                                             |
|             | pivot stack to here; execute IRETQ                          |


- a predictor with one bit of history (taken / not taken) per conditional branch [Agner Fog]
- good: compact storage (only one bit per history entry)
- mismatch: Haswell doesn't seem to store not-taken branches at all
  - must still be able to differentiate between "taken, not taken" and "not taken; taken"
  - address of taken branch is probably used
- mismatch: seems to differentiate between many targets for a single history entry
- good: naturally forgets about old branches (shifted out)
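
The textbook model discussed above (one taken/not-taken bit per conditional branch, kept in a shift register) can be sketched in C. This illustrates the theory only, not Haswell's actual behaviour, which the list above says deviates from it. The type `branch_history`, the function `history_record`, and the width `HISTORY_BITS` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical width; chosen to echo the roughly 26 branches measured. */
#define HISTORY_BITS 26

typedef struct {
    uint32_t bits;  /* newest outcome in bit 0, oldest in the top bit */
} branch_history;

/*
 * Record one conditional branch outcome (nonzero = taken).
 * Shifting pushes the oldest outcome out of the register, so old
 * branches are forgotten without any explicit bookkeeping.
 */
void history_record(branch_history *h, int taken) {
    uint32_t mask = (UINT32_C(1) << HISTORY_BITS) - 1;
    h->bits = ((h->bits << 1) | (taken ? 1u : 0u)) & mask;
}
```

In this model "taken, not taken" and "not taken, taken" produce different register values (binary `10` and `01`), which is the property a predictor that stores only taken branches has to recover in some other way.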



- goal: read from arbitrary host-kernel-virtual addresses
- attacker type: controls guest ring 0; knows precise host kernel build
- misdirect first indirect call with memory operand after guest exit
  - provides speculative RIP control
  - requires breaking hypervisor code ASLR
- flush L3 cache line containing memory operand
  - requires L3 eviction sets (for long speculation)
  - requires identifying correct eviction set
- use gadget to call into BPF interpreter
  - requires register control: caller-saved registers stay intact after guest exit
  - requires data at known address: locate host physmap alias of guest memory
- use BPF bytecode to read arbitrary host data and leak it

- leak host code address bits from history buffer and branch target buffer (BTB) [dump\_hyper\_bhb, hyper\_btb\_brute]
- identify L3 cache sets using brute-force timing-based testing of eviction sets [cacheset\_identify]
- determine physical address of guest page using "load from physical address" gadget and timing [find\_phys\_mapping\_kassist]
- determine address of physmap region using memory load gadget and timing [find\_page\_offset]
- select L3 set containing the legitimate indirect call target using brute force [select\_set]

## Leaking host address bits (BHB)



approach: dump history buffer contents

- fill history buffer with state from VMCALL
- shift out some of the VMCALL state by padding the history buffer with zeroes, leaving 2 bits of unknown information
- compare history buffer against controlled history buffer using misprediction

### Leaking host address bits (BTB)



approach: execute an indirect call and observe where the CPU jumps

- perform VM exit (VMCALL / IN) to fill BTB with host jump addresses
- randomize history buffer to force predictor fallback
- execute CALL with mispredicted target
- place cache-signaling gadgets at all possible targets; two possible signals
- perform binary search over call targets
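
The binary search over call targets can be sketched as follows. This is a sketch under stated assumptions: the candidate targets are numbered `0 .. num_candidates-1`, and each round is abstracted into an oracle that reports which of the two cache signals was observed. The names `signal_oracle`, `find_call_target`, and `simulated_oracle` are hypothetical, and the simulated oracle merely stands in for the real "place gadgets, trigger the call, measure the cache" step.

```c
#include <stddef.h>

/*
 * Hypothetical oracle: returns nonzero if the speculated jump landed on a
 * candidate in [lo, mid), i.e. the "low half" signal was observed.
 */
typedef int (*signal_oracle)(size_t lo, size_t mid, void *ctx);

/* Narrow down the index of the mispredicted call target by halving. */
size_t find_call_target(size_t num_candidates, signal_oracle observed_low,
                        void *ctx) {
    size_t lo = 0, hi = num_candidates;   /* target index is in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (observed_low(lo, mid, ctx))
            hi = mid;                     /* signal from the lower half */
        else
            lo = mid;                     /* signal from the upper half */
    }
    return lo;
}

/* Simulation only: ctx points to the secret target index. */
int simulated_oracle(size_t lo, size_t mid, void *ctx) {
    size_t target = *(const size_t *)ctx;
    return target >= lo && target < mid;
}
```

With two distinguishable signals per round, the number of rounds grows with the logarithm of the number of candidates rather than linearly.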

Find host-physical address:

- poison BTB and evict function pointer from L1D+L2  $\rightarrow$  misspeculated host code
- Use the physical-load gadget (shown below) to brute-force the physical address
  - test guesses with *Flush+Reload*

```
; controlled r8, r9
mov rax,r8
movsxd r15,r9d
; load page_offset_base
mov r8,QWORD PTR [r15*8-0x7e594c40]
lea rdi,[rax+r8*1]
; page_offset_base + phys_addr_guess
mov r12,QWORD PTR [r8+rax*1+0xf8]
```

Find host-virtual address:

- physmap is 1GiB-aligned
- brute-force the physmap base address
- test guesses by attempting to access page\_offset\_base + phys\_guest\_page\_address
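
Because the physmap is 1 GiB-aligned, the search space is small enough to walk linearly. The loop below is a sketch of that idea in C, not the actual exploit code. The search range is a caller-supplied assumption, and the names `access_oracle`, `find_physmap_base`, and `simulated_access_oracle` are hypothetical. The oracle abstracts the "attempt the access and time it" step.

```c
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/* Hypothetical oracle: nonzero if an access to this host-virtual address
 * showed the expected timing signal. */
typedef int (*access_oracle)(uint64_t host_virtual_addr, void *ctx);

/*
 * Try every 1 GiB-aligned candidate base in [range_start, range_end) and
 * return the first one for which base + phys_guest_page_address passes
 * the oracle, or 0 if none does. range_start is assumed to be 1 GiB-aligned.
 */
uint64_t find_physmap_base(uint64_t range_start, uint64_t range_end,
                           uint64_t phys_guest_page_address,
                           access_oracle probe, void *ctx) {
    uint64_t base = range_start;
    while (base < range_end) {
        if (probe(base + phys_guest_page_address, ctx))
            return base;
        if (range_end - base <= GIB)
            break;                        /* avoid wrapping past the range */
        base += GIB;
    }
    return 0;
}

/* Simulation only: ctx points to the true host-virtual address. */
int simulated_access_oracle(uint64_t host_virtual_addr, void *ctx) {
    return host_virtual_addr == *(const uint64_t *)ctx;
}
```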



1. place Spectre gadget BPF bytecode in guest memory
2. "Flush" leak area
3. evict call target
4. mistrain branch predictor to BPF interpreter call gadget
5. execute VMCALL
6. "Reload" leak area $\rightarrow$ obtain value
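The final "Reload" step boils down to timing one access per slot of the leak area and picking the slot that was served from the cache. A minimal sketch of that decoding step, assuming a leak area of 256 slots (one per possible byte value) and an already measured latency per slot; `LEAK_SLOTS`, `decode_leaked_value`, and the threshold are illustrative names and values, not taken from the exploit:

```c
#include <stdint.h>

#define LEAK_SLOTS 256  /* assumption: one cache line/page per byte value */

/* Return the index of the fastest slot whose reload latency is below
 * hit_threshold (a cache hit), or -1 if every slot looks like a miss. */
int decode_leaked_value(const uint64_t latency[LEAK_SLOTS],
                        uint64_t hit_threshold) {
    int best = -1;
    uint64_t best_latency = hit_threshold;
    for (int i = 0; i < LEAK_SLOTS; i++) {
        if (latency[i] < best_latency) {
            best_latency = latency[i];
            best = i;
        }
    }
    return best;
}
```

The threshold separating hits from misses is machine-specific and has to be calibrated; a return value of -1 means the speculative access did not happen and the attempt should be repeated.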

# Defenses



- Trivial approach: disable speculative execution
- No wrong speculation if there is no speculation
- Problem: massive performance hit!
- Also: How to disable it?
- Speculative execution is deeply integrated into CPU





- Workaround: insert instructions stopping speculation
  - $\rightarrow$ insert after every bounds check
  - x86: LFENCE, ARM: CSDB
- Available on all Intel CPUs, retrofitted to existing ARMv7 and ARMv8
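A minimal sketch of the "barrier after the bounds check" idea on x86, using GCC/Clang inline assembly. `N`, `FAIL`, and `get_value_fenced` are illustrative names; on non-x86 targets the helper degrades to a plain compiler barrier so the sketch still compiles, which is explicitly not a speculation barrier.

```c
#define N 16        /* illustrative array size */
#define FAIL (-1)   /* illustrative failure value */

static int array[N];

static inline void speculation_barrier(void) {
#if defined(__x86_64__) || defined(__i386__)
    /* LFENCE: later instructions do not execute until earlier ones
     * (including the bounds check) have completed. */
    __asm__ volatile("lfence" ::: "memory");
#else
    /* Placeholder only: a compiler barrier, NOT a speculation barrier. */
    __asm__ volatile("" ::: "memory");
#endif
}

int get_value_fenced(unsigned int n) {
    if (n < N) {
        speculation_barrier();  /* no speculative load past this point */
        return array[n];
    }
    return FAIL;
}
```

The cost is one serializing instruction on every guarded access, which is why this is applied selectively rather than to every branch.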

#### Spectre Variant 1 Mitigations

```
// Unprotected
int array[N];
int get_value(unsigned int n) {
  int tmp;
  if (n < N) {
    tmp = array[n];
  } else {
    tmp = FAIL;
  }
  return tmp;
}
```

```
// Protected
int array[N];
int get_value(unsigned int n) {
  int *lower = array;
  int *ptr = array + n;
  int *upper = array + N;
  return __builtin_load_no_speculate(ptr, lower, upper, FAIL);
}
```
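Where such a builtin is not available, a different technique with the same goal is branchless index masking, similar in spirit to the Linux kernel's `array_index_nospec`. The sketch below is not the builtin shown above: it clamps the index with a data-dependent mask so that even a mispredicted bounds check reads element 0 instead of out-of-bounds memory. `index_mask` and `get_value_masked` are illustrative names, and the code assumes an arithmetic right shift on signed values (true for GCC and Clang).

```c
#include <limits.h>

#define N 16        /* illustrative array size */
#define FAIL (-1)   /* illustrative failure value */

static int array[N];

/* All-ones if index < size, zero otherwise, computed without a branch.
 * Assumes size <= LONG_MAX and arithmetic right shift of negative longs. */
static inline unsigned long index_mask(unsigned long index,
                                       unsigned long size) {
    return (unsigned long)(~(long)(index | (size - 1UL - index))
                           >> (sizeof(long) * CHAR_BIT - 1));
}

int get_value_masked(unsigned int n) {
    if (n < N) {
        /* Under misspeculation with n >= N the mask is 0, so idx == 0. */
        unsigned long idx = n & index_mask(n, N);
        return array[idx];
    }
    return FAIL;
}
```

An optimizing compiler may notice that the mask is redundant inside the `if` and drop it; production code hides the value from the optimizer (for example with an empty inline-assembly constraint), which this sketch omits.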

- Indirect Branch Restricted Speculation (IBRS):
  - Prevents branches at lower privilege level from influencing branches at higher privilege level
  - Must be re-enabled on every switch to higher privileges
- Indirect Branch Predictor Barrier (IBPB):
  - Flush branch-target buffer
- Single Thread Indirect Branch Predictors (STIBP):
  - Isolates branch prediction state between two hyperthreads
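These three controls are exposed through two model-specific registers. The sketch below only lists the register numbers and bit positions as documented by Intel and composes the value an operating system would write; the helper `spec_ctrl_value` is an illustrative name, and the privileged `wrmsr` itself is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_SPEC_CTRL 0x48u        /* read/write control register */
#define MSR_IA32_PRED_CMD  0x49u        /* write-only command register */

#define SPEC_CTRL_IBRS  (1ULL << 0)     /* restrict indirect branch speculation */
#define SPEC_CTRL_STIBP (1ULL << 1)     /* isolate predictors between hyperthreads */
#define PRED_CMD_IBPB   (1ULL << 0)     /* issue an indirect branch predictor barrier */

/* Compose the value to be written to IA32_SPEC_CTRL. */
uint64_t spec_ctrl_value(bool ibrs, bool stibp) {
    uint64_t value = 0;
    if (ibrs)
        value |= SPEC_CTRL_IBRS;
    if (stibp)
        value |= SPEC_CTRL_STIBP;
    return value;
}
```

IBRS and STIBP are modes that stay set in `IA32_SPEC_CTRL`, whereas IBPB is a one-shot command: writing `PRED_CMD_IBPB` to `IA32_PRED_CMD` flushes the predictor state once.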

#### Retpoline (compiler extension)





```
  push <call_target>
  call 1f
2:                    ; speculation will continue here
  lfence              ; speculation barrier
  jmp 2b              ; endless loop
1:
  lea 8(%rsp), %rsp   ; restore stack pointer
  ret                 ; the actual call to <call_target>
```

- $\rightarrow$ always predict to enter an endless loop
  - instead of the correct (or wrong) target function $\rightarrow$ performance?
- On Skylake or newer:
  - ret may fall back to the BTB for prediction
  - $\rightarrow$ microcode patches to prevent that
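From the C side nothing changes in the source: the compiler rewrites indirect calls into the thunk sequence above. As a hedged illustration, the function below contains an indirect call that GCC is expected to route through a retpoline thunk when built with `-mindirect-branch=thunk`, and Clang with `-mretpoline`; the names `handler_fn`, `twice`, and `dispatch` are made up for this example, and an optimizer may still inline or devirtualize the call when the target is known at compile time.

```c
typedef int (*handler_fn)(int);

/* Example call target. */
static int twice(int x) {
    return 2 * x;
}

/* The call through `handler` is an indirect branch; with retpolines
 * enabled it is emitted as a call to a thunk instead of `call *%reg`. */
int dispatch(handler_fn handler, int arg) {
    return handler(arg);
}
```

Inspecting the generated assembly (for example with `-S`) is the practical way to confirm whether a given call site was actually converted.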



- Prevent access to high-resolution timer
  - $\rightarrow$ Own timer using timing thread
- Flush instruction only privileged
  - $\rightarrow$ Cache eviction through memory accesses
- Just move secrets into secure world
  - $\rightarrow$ Spectre works on secure enclaves
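The first counter-argument, building your own timer, needs nothing more than a second thread that increments a shared counter in a tight loop. A minimal sketch using POSIX threads and C11 atomics; the function names `timer_start`, `timer_now`, and `timer_stop` are illustrative:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static _Atomic uint64_t counter;        /* incremented as fast as possible */
static atomic_bool stop_requested;
static pthread_t counter_thread;

static void *count_forever(void *unused) {
    (void)unused;
    while (!atomic_load_explicit(&stop_requested, memory_order_relaxed))
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    return NULL;
}

/* Start the counting thread; returns 0 on success. */
int timer_start(void) {
    atomic_store(&stop_requested, false);
    return pthread_create(&counter_thread, NULL, count_forever, NULL);
}

/* Read the current "time" in counter ticks. */
uint64_t timer_now(void) {
    return atomic_load_explicit(&counter, memory_order_relaxed);
}

/* Stop the counting thread and wait for it to exit. */
void timer_stop(void) {
    atomic_store(&stop_requested, true);
    pthread_join(counter_thread, NULL);
}
```

Reading `timer_now()` before and after a memory access gives a latency in ticks; the resolution depends on how fast the counting thread runs on its own core, so removing the hardware timer raises the bar without closing the channel.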





- attacks on crypto $\rightarrow$ "software should be fixed"
- attacks on ASLR $\rightarrow$ "ASLR is broken anyway"
- attacks on SGX and TrustZone $\rightarrow$ "not part of the threat model"
- Rowhammer attacks $\rightarrow$ "only affects cheap sub-standard modules"
- $\rightarrow$ for years we solely optimized for performance



After learning about an exploitable microarchitectural behavior you realize:

- it was documented in the Intel manual
- only now we understand the implications



- sometimes you can't see the wood for the trees: everything was documented
- optimizations often have security implications
- dedicate more time to identifying problems, not solely to mitigating known ones
