# High-Performance Scientific Computing Lecture 3: OpenCL

#### MATH-GA 2011 / CSCI-GA 2945 · September 19, 2012

### Today

- HW2
- Chips for Throughput
- Synchronization

- New here? Please send email
- Started looking for a final project yet?
- HW1 not found  $\rightarrow$  email
- Grading
- Overall pace
- HW3 out on the weekend

# HW2

HW2 problem 2

# Demo time

- Critical section
- Locks
- Atomics
  - Update: `x++;`
  - Capture: `v = x++;`
  - Structured: `v = x; x -= expr;` ("Test-and-set")
  - Compare-and-swap (not in OpenMP)
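The atomic forms listed above have close counterparts in C11 `<stdatomic.h>`, which also exposes compare-and-swap directly. The following is a minimal sketch that assumes a C11 compiler, and the helper names are hypothetical. `atomic_fetch_add` covers both update and capture. `atomic_compare_exchange_weak` runs in a retry loop to build an operation that has no dedicated instruction, here an atomic maximum.

```c
#include <stdatomic.h>

/* Update: x++ performed as one indivisible read-modify-write. */
void atomic_update(atomic_int *x)
{
    atomic_fetch_add(x, 1);
}

/* Capture: v = x++; the old value comes back from the same atomic step. */
int atomic_capture(atomic_int *x)
{
    return atomic_fetch_add(x, 1);
}

/* Compare-and-swap loop: atomically set *x = max(*x, val).
   Returns the value *x held just before this call took effect. */
int atomic_max_cas(atomic_int *x, int val)
{
    int old = atomic_load(x);
    /* On failure, atomic_compare_exchange_weak reloads 'old' with the
       current value of *x, so the loop re-checks against fresh data. */
    while (old < val && !atomic_compare_exchange_weak(x, &old, val))
        ;
    return old;
}
```

The same CAS retry pattern can build any read-modify-write operation that the hardware lacks. The cost is extra retries whenever many threads contend for the same location.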

- May OpenMP directives be nested?
  - What is an orphaned directive?
  - What is close nesting?
  - What is a 'dynamic extent' of a region?
- May a worksharing region be closely nested inside another one?
- What happens if I nest two critical regions of the same name?
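An *orphaned* directive appears outside the lexical extent of a `parallel` construct but can still execute within its dynamic extent. A typical case is a worksharing `for` inside a function that is called from a parallel region. The following is a minimal sketch with hypothetical function names. Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result.

```c
/* Orphaned worksharing directive: this 'omp for' is not lexically inside
   a parallel construct. Called from inside a parallel region, its loop
   iterations are divided among that team's threads; called from serial
   code, a single thread executes all iterations. */
void scale(double *a, int n, double s)
{
    #pragma omp for
    for (int i = 0; i < n; ++i)
        a[i] *= s;
    /* implicit barrier at the end of the worksharing region */
}

void scale_twice(double *a, int n)
{
    #pragma omp parallel
    {
        scale(a, n, 2.0);   /* runs within the dynamic extent of the parallel region */
        scale(a, n, 3.0);   /* safe: the implicit barrier in scale() orders the two loops */
    }
}
```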

- Corresponding getter function for `omp_set_num_threads()`?
- Relation between `omp_set_dynamic()` and `schedule(dynamic)`?
- What is wrong with this statement?

"A barrier region may not be closely nested inside a worksharing region." (from the OpenMP tutorial)

- What threads does a barrier bind to?
- What threads does a critical region bind to?
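One way to probe the first question is sketched below, assuming an OpenMP-capable compiler. The helper `query_threads` is hypothetical. `omp_set_num_threads()` stores a value that `omp_get_max_threads()` reads back. By contrast, `omp_get_num_threads()` reports the size of the team currently executing, which is 1 outside any parallel region. The `#ifdef _OPENMP` guard lets the same file build serially, in which case it reports one thread everywhere.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: records what the different thread-count queries report. */
void query_threads(int requested, int *max_threads, int *team_outside, int *team_inside)
{
#ifdef _OPENMP
    omp_set_num_threads(requested);
    *max_threads  = omp_get_max_threads();   /* value set above */
    *team_outside = omp_get_num_threads();   /* no parallel region: 1 */
    #pragma omp parallel
    {
        #pragma omp single
        *team_inside = omp_get_num_threads(); /* size of the current team */
    }
#else
    (void)requested;                          /* serial build: one thread */
    *max_threads  = 1;
    *team_outside = 1;
    *team_inside  = 1;
#endif
}
```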

# Chips for Throughput

### CPU Chip Real Estate



Die floorplan: VIA Isaiah (2008). 65 nm, 4 SP ops at a time, 1 MiB L2.

# "CPU-style" Cores



Credit: Kayvon Fatahalian (Stanford)

# Slimming down



### More Space: Double the Number of Cores



### ...again



### ... and again



Each core: Fetch/Decode, ALU (Execute), Execution Context

Credit: Kayvon Fatahalian (Stanford)

#### Idea #2

Amortize cost/complexity of managing an instruction stream across many ALUs

#### $\rightarrow$ SIMD



Credit: Kayvon Fatahalian (Stanford)
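Loops that map well onto this design apply one instruction sequence to many data elements. Below is a minimal sketch in C. The name `saxpy` follows the BLAS routine, but the function is a simplified stand-in rather than a BLAS call. The `#pragma omp simd` hint asks the compiler to vectorize the loop. It was added in OpenMP 4.0, after these slides. Compilers that do not support it ignore the pragma, and the result is unchanged.

```c
#include <stddef.h>

/* y[i] = a*x[i] + y[i]. Every iteration runs the same instruction
   sequence on different data, so one fetched/decoded instruction
   stream can drive several ALU lanes at once (SIMD). */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```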


#### Gratuitous Amounts of Parallelism!

Example:

128 instruction streams in parallel

16 independent groups of 8 synchronized streams
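The bookkeeping behind "16 groups of 8" can be sketched in a few lines, with hypothetical names. If the 128 streams are numbered consecutively, stream `id` belongs to group `id / 8` and occupies lane `id % 8` within that group.

```c
#include <assert.h>

enum { STREAMS_PER_GROUP = 8, NUM_GROUPS = 16 };   /* 16 x 8 = 128 streams */

/* Split a global stream index into (group, lane). Lanes in the same
   group run in lockstep; different groups proceed independently. */
void stream_coords(int id, int *group, int *lane)
{
    assert(id >= 0 && id < NUM_GROUPS * STREAMS_PER_GROUP);
    *group = id / STREAMS_PER_GROUP;
    *lane  = id % STREAMS_PER_GROUP;
}
```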

### Recent Processor Architecture

- Commodity chips
- "Infinitely" many cores
- "Infinite" vector width
- Must hide memory latency ($\rightarrow$ ILP, SMT)
- Compute bandwidth $\gg$ Memory bandwidth
- Bandwidth only achievable by *homogeneity*
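Here is a back-of-the-envelope illustration of why the gap between compute and memory bandwidth matters. It assumes single-precision SAXPY ($y_i \leftarrow a\,x_i + y_i$, $i = 1,\dots,n$) with no data reuse. $P$ (peak flop rate) and $B$ (memory bandwidth) are placeholders, not figures from these slides.

$$
\begin{aligned}
W &= 2n \ \text{flops}, \qquad Q = 3 \cdot 4n = 12n \ \text{bytes} \quad (\text{load } x_i,\ y_i;\ \text{store } y_i),\\
I &= \frac{W}{Q} = \frac{2n}{12n} = \frac{1}{6}\ \text{flop/byte},\\
T &\ge \max\left(\frac{W}{P},\ \frac{Q}{B}\right).
\end{aligned}
$$

The memory term dominates whenever $P/B > 1/6$ flop/byte. On such a chip this loop runs at the speed of memory, not of the ALUs.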



T200 08)

Nv Fermi (2010)



(2012)



AMD Tahiti (2012)



Nv GK2 (2012

# Synchronization

### What is a Barrier?
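
A barrier makes every thread of a team wait until all of them have arrived. Data that any thread wrote before the barrier can therefore be read safely by any other thread after it. Below is a sketch in OpenMP; the function `two_phase` is hypothetical. The first loop uses `nowait` to drop its implicit barrier, which is why the explicit `#pragma omp barrier` is required here. Compiled without OpenMP, the pragmas are ignored and the function runs serially.

```c
/* Each thread fills its share of 'a', then threads read entries of 'a'
   written by *other* threads. The barrier keeps any thread from starting
   phase 2 before the whole team has finished phase 1. */
void two_phase(int n, double *a, double *b)
{
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
            a[i] = (double)i;              /* phase 1: write a[i] */

        #pragma omp barrier                /* binds to the current team */

        #pragma omp for
        for (int i = 0; i < n; ++i)        /* phase 2: read a neighbour */
            b[i] = a[i] + a[(i + 1) % n];
    }
}
```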




### What is a Memory Fence?
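
Unlike a barrier, a fence makes no thread wait for any other. It only constrains the order in which one thread's memory operations become visible to others. Below is a minimal sketch with C11 fences (all names hypothetical) of the classic use: publishing data through a flag. The release fence keeps the payload store from moving after the flag store. The acquire fence keeps the payload load from moving before the flag load.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;
static atomic_bool ready = false;

/* Producer: ordinary store, release fence, then raise the flag. */
void produce(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: returns true and sets *out once a message is published. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;   /* guaranteed to see the producer's store */
    return true;
}
```

When `produce` and `try_consume` run on different threads, a consumer that observes `ready == true` is guaranteed to read the payload written before the release fence.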




Collaborative (inter-block) Global Memory Update:



Atomic Global Memory Update:



#### How?

`atomic_{add,inc,cmpxchg,...}(int *global, int value);`
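Below is a shared-memory analogue of the two update strategies, written as a sketch in C with OpenMP. The "blocks" are hypothetical fixed chunks of the input rather than OpenCL work-groups, and `#pragma omp atomic` stands in for OpenCL's `atomic_add`. Both functions compute the same sum. They differ in whether blocks write to private slots that are combined afterwards, or all contend for a single location.

```c
enum { NBLOCKS = 8 };   /* hypothetical number of "blocks" */

/* Sum of block b of x[0..n). */
static long block_sum(const int *x, int n, int b)
{
    int lo = (int)((long)n * b / NBLOCKS);
    int hi = (int)((long)n * (b + 1) / NBLOCKS);
    long s = 0;
    for (int i = lo; i < hi; ++i)
        s += x[i];
    return s;
}

/* Collaborative: every block writes its own slot in a shared array,
   then a second step (here: serial) combines the partial results. */
long sum_collaborative(const int *x, int n)
{
    long partial[NBLOCKS];
    #pragma omp parallel for
    for (int b = 0; b < NBLOCKS; ++b)
        partial[b] = block_sum(x, n, b);   /* no two blocks share a slot */

    long total = 0;                        /* runs after the implicit barrier */
    for (int b = 0; b < NBLOCKS; ++b)
        total += partial[b];
    return total;
}

/* Atomic: every block adds directly into one shared total. */
long sum_atomic(const int *x, int n)
{
    long total = 0;
    #pragma omp parallel for
    for (int b = 0; b < NBLOCKS; ++b) {
        long s = block_sum(x, n, b);
        #pragma omp atomic
        total += s;                        /* indivisible read-modify-write */
    }
    return total;
}
```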

### Questions?


#### Image Credits

- Isaiah die shot: VIA Technologies