Presentation2 HS OpenMP

OpenMP is an API that allows developers to write code that can run concurrently on multi-core CPUs. It uses compiler directives to specify parallel regions. OpenMP supports shared memory parallelism, thread creation and management, data sharing between threads, synchronization constructs, and parallelizing loops and tasks. It is portable, widely used in scientific computing, and provides a consistent model for parallel programming across platforms.


Parallel Programming with OpenMP
What is OpenMP
• OpenMP is an API (Application Programming Interface) that allows developers to write code that can be executed concurrently on multi-core CPUs and other shared-memory architectures.
• **1. Shared-Memory Parallelism:**
• **2. Compiler Directives:**
• **3. Parallel Regions:**
• **4. Thread Creation and Management:**
• **5. Data Sharing and Private Variables:**
• **6. Synchronization:**
• **7. Loop Parallelism:**
• **8. Task Parallelism:**
• **9. Performance Considerations:**
• **10. Portability:**
• **11. Error Handling:**
• **12. Examples of Use Cases:**
• **1. Shared-Memory Parallelism:**
• - OpenMP is designed for shared-memory parallelism, where multiple threads of execution share the same memory space within a single process.
• - It is well-suited for multi-core processors and SMP (Symmetric Multi-Processing) systems.

• **2. Compiler Directives:**
• - OpenMP uses compiler directives to specify parallel regions in the code.
• - These directives are pragmas (e.g., `#pragma omp`) that guide the compiler in generating parallel code.
• **3. Parallel Regions:**
• - A parallel region is a block of code that can be executed concurrently by multiple threads.
• - Threads are created automatically when entering a parallel region and terminated when exiting.
• - Parallel regions are created with `#pragma omp parallel`, optionally combined with work sharing as in `#pragma omp parallel for`.

• **4. Thread Creation and Management:**
• - OpenMP abstracts thread creation and management, making it easier for developers.
• - Threads are typically managed by the OpenMP runtime library, which handles tasks like thread creation, synchronization, and load balancing.
• **5. Data Sharing and Private Variables:**
• - OpenMP provides mechanisms for sharing data among threads, such as shared variables and thread-private variables.
• - Shared variables are accessible by all threads in a parallel region, while thread-private variables are unique to each thread.

• **6. Synchronization:**
• - OpenMP supports various synchronization constructs to coordinate threads, such as barriers (`#pragma omp barrier`) and critical sections (`#pragma omp critical`).
• - These constructs ensure that threads do not interfere with each other's execution when accessing shared resources.

• **7. Loop Parallelism:**
• - OpenMP is commonly used to parallelize loops using directives like `#pragma omp for`.
• - Parallel loops can be efficiently divided among threads, and loop iterations are executed concurrently.

• **8. Task Parallelism:**
• - OpenMP also supports task parallelism, allowing developers to specify tasks that can be executed independently.
• - Tasks can be created using `#pragma omp task` directives and can be used for more fine-grained parallelism (see the short sketch after this list).
• **9. Performance Considerations:**
• - Efficient use of OpenMP requires careful consideration of load balancing, data dependencies, and minimizing synchronization overhead.
• - Profiling and performance tuning tools can help identify bottlenecks and optimize parallel code.

• **10. Portability:**
• - OpenMP is supported by many compilers and is highly portable, making it easier to write parallel code that can run on various platforms.
• - It provides a consistent API for shared-memory parallelism across different systems.

• **11. Error Handling:**
• - OpenMP provides mechanisms for handling errors, such as runtime library functions for querying the number of threads and checking for errors during parallel execution.

• **12. Examples of Use Cases:**
• - OpenMP is commonly used in scientific computing, numerical simulations, data processing, and other applications where parallelism can be exploited to improve performance.
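• As the short sketch referenced in point 8, here is a minimal example of task parallelism; the fixed count of four tasks is only an assumption for illustration:

#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel        // create a team of threads
    {
        #pragma omp single      // one thread creates the tasks...
        {
            for (int i = 0; i < 4; i++) {
                #pragma omp task firstprivate(i)
                {
                    // ...and any thread in the team may execute each task
                    printf("task %d run by thread %d\n", i, omp_get_thread_num());
                }
            }
            #pragma omp taskwait   // wait until all tasks created above have finished
        }
    }
    return 0;
}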
Example
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        int thread_id = omp_get_thread_num();
        printf("Hello from thread %d\n", thread_id);
    }
    return 0;
}

In this code, the `#pragma omp parallel` directive creates a parallel region, and each thread prints its own thread ID. The code will run with multiple threads, and each thread will execute the specified block in parallel.
• Output on a computer with two cores, and thus two threads:

• Hello from thread 0

• Hello from thread 1

• On a computer with 24 hardware threads, I got 24 hellos, one per thread. On my desktop I get (only) 8. How many do you get?
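• With GCC or Clang, OpenMP support is enabled with the -fopenmp compiler flag; without it the pragmas are ignored and the program runs serially. The number of threads can be controlled with the OMP_NUM_THREADS environment variable or with the omp_set_num_threads() runtime call (a small sketch of the runtime functions appears after the barrier example below).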
Code example: adding two vectors

#include <stdio.h>
#include <omp.h>
#define N 1000

int main() {
    int A[N], B[N], C[N]; // Input and output arrays

    // Initialize the input arrays A and B
    for (int i = 0; i < N; i++) {
        A[i] = i;
        B[i] = 2 * i;
    }

    // Parallelize the vector addition using OpenMP
    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        C[i] = A[i] + B[i];
    }

    // Print the result (C)
    printf("Resultant vector (C):\n");
    for (int i = 0; i < N; i++) {
        printf("%d ", C[i]);
    }
    printf("\n");
    return 0;
}
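• Note: `#pragma omp parallel for` combines a parallel region and a work-sharing `for` construct in a single directive; the loop example later in these slides uses the two directives (`#pragma omp parallel` and `#pragma omp for`) separately, with the same effect.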
Some notes: private and shared vars
• In a parallel section variables can be private (each thread owns a copy of the variable) or shared among all threads. Shared variables must be used with care because they can cause race conditions.

• shared: the data within a parallel region is shared, which means visible and accessible by all threads simultaneously. By default, all variables in the work-sharing region are shared except the loop iteration counter.

• private: the data within a parallel region is private to each thread, which means each thread will have a local copy and use it as a temporary variable. A private variable is not initialized and its value is not maintained for use outside the parallel region. By default, the loop iteration counters in the OpenMP loop constructs are private.
Some notes: private and shared vars
#include <stdio.h>
#include <omp.h>

int main (int argc, char *argv[]) {
    int th_id, nthreads;

    // th_id is declared above.
    // It is specified as private, so each thread will have its own copy of th_id.
    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World from thread %d\n", th_id);
    }
    return 0;
}

Sharing variables is sometimes what you want; other times it is not, and it can lead to race conditions. Put differently, some variables need to be shared and some need to be private, and you, the programmer, have to specify which is which.
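• One way to make those choices explicit is the default(none) clause, which forces every variable used inside the region to appear in a shared(...) or private(...) clause. A minimal sketch (the variable names are only illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
    int th_id;        // will be private: each thread gets its own copy
    int total = 0;    // will be shared: visible to all threads

    // default(none) makes the compiler reject any variable whose
    // data-sharing attribute is not stated explicitly.
    #pragma omp parallel default(none) private(th_id) shared(total)
    {
        th_id = omp_get_thread_num();
        #pragma omp critical
        total += th_id;   // protected update of the shared variable
    }
    printf("sum of thread ids = %d\n", total);
    return 0;
}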
Some notes: synchronization
• critical: the enclosed code block will be executed by only one thread at a time, and not simultaneously executed by multiple threads. It is often used to protect shared data from race conditions.

• atomic: the memory update (write, or read-modify-write) in the next instruction will be performed atomically. It does not make the entire statement atomic; only the memory update is atomic. A compiler might use special hardware instructions for better performance than when using critical. (A short sketch follows this list.)

• ordered: the structured block is executed in the order in which iterations would be executed in a sequential loop.

• barrier: each thread waits until all of the other threads of a team have reached this point. A work-sharing construct has an implicit barrier synchronization at the end.

• nowait: specifies that threads completing assigned work can proceed without waiting for all threads in the team to finish. In the absence of this clause, threads encounter a barrier synchronization at the end of the work-sharing construct.
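• A minimal sketch of the atomic clause mentioned above (the shared counter and the loop bound of 1000 are only assumptions for illustration); replacing the atomic directive with a critical section would also be correct, but typically slower:

#include <stdio.h>
#include <omp.h>

int main(void) {
    int hits = 0;   // shared counter updated by all threads

    #pragma omp parallel for shared(hits)
    for (int i = 0; i < 1000; i++) {
        // Only the memory update below is atomic, not any surrounding code.
        #pragma omp atomic
        hits++;
    }
    printf("hits = %d (expected 1000)\n", hits);
    return 0;
}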
Some notes: synchronization
• on barriers: If we wanted all threads to be at a specific point in their execution before proceeding, we would use a barrier.
• A barrier basically tells each thread, "wait here until all other threads have reached this point...".

#include <stdio.h>
#include <omp.h>

int main (int argc, char *argv[]) {
    int th_id, nthreads;
    #pragma omp parallel private(th_id)
    {
        th_id = omp_get_thread_num();
        printf("Hello World from thread %d\n", th_id);
        #pragma omp barrier   // <----------- master waits until all threads finish before printing
        if ( th_id == 0 ) {
            nthreads = omp_get_num_threads();
            printf("There are %d threads\n", nthreads);
        }
    }
    return 0;
} //main

Some other runtime functions are:
• omp_get_num_threads
• omp_get_num_procs
• omp_set_num_threads
• omp_get_max_threads
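A minimal sketch of those runtime functions in use (the request for 4 threads is only an assumption for illustration):

#include <stdio.h>
#include <omp.h>

int main(void) {
    printf("Processors available: %d\n", omp_get_num_procs());
    printf("Max threads by default: %d\n", omp_get_max_threads());

    omp_set_num_threads(4);   // request 4 threads for subsequent parallel regions

    #pragma omp parallel
    {
        // Inside a parallel region, omp_get_num_threads() reports the team size.
        if (omp_get_thread_num() == 0)
            printf("Team size: %d\n", omp_get_num_threads());
    }
    return 0;
}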
Parallelizing loops

//compute the sum of two arrays in parallel
#include <stdio.h>
#include <omp.h>
#define N 1000000

int main(void) {
    // static so the large arrays do not overflow the stack
    static float a[N], b[N], c[N];
    int i;

    /* Initialize arrays a and b */
    for (i = 0; i < N; i++) {
        a[i] = i * 2.0;
        b[i] = i * 3.0;
    }

    /* Compute values of array c = a+b in parallel. */
    #pragma omp parallel shared(a, b, c) private(i)
    {
        #pragma omp for
        for (i = 0; i < N; i++) {
            c[i] = a[i] + b[i];
        }
    }

    /* Print one element to check the result. */
    printf("%f\n", c[10]);
    return 0;
}
Adding all elements of an array

//example4.c: add all elements in an array in parallel
//the array is distributed statically between threads
#include <stdio.h>
#include <omp.h>

int main() {
    const int N = 100;
    int a[N];

    //initialize
    for (int i = 0; i < N; i++)
        a[i] = i;

    //compute sum
    int local_sum, sum = 0;
    #pragma omp parallel private(local_sum) shared(sum)
    {
        local_sum = 0;
        #pragma omp for schedule(static,1)
        for (int i = 0; i < N; i++) {
            local_sum += a[i];
        }
        //each thread calculated its local_sum. All threads have to add to
        //the global sum. It is critical that this operation is atomic.
        #pragma omp critical
        sum += local_sum;
    }
    printf("sum=%d should be %d\n", sum, N*(N-1)/2);
    return 0;
}
Performance consideration
• Critical sections and atomic updates serialize execution and eliminate the concurrent execution of threads in those regions.
• If used unwisely, OpenMP code can be slower than serial code because of all the thread overhead.
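• One way to check whether parallelization pays off is to time the region with omp_get_wtime(). A minimal sketch, where the fill loop is just a stand-in for real work:

#include <stdio.h>
#include <omp.h>
#define N 10000000

int main(void) {
    static double a[N];

    double start = omp_get_wtime();   // wall-clock time in seconds

    #pragma omp parallel for
    for (int i = 0; i < N; i++) {
        a[i] = i * 0.5;               // stand-in for real work
    }

    double elapsed = omp_get_wtime() - start;
    printf("parallel loop took %f seconds\n", elapsed);
    // Compare against a serial run (e.g., OMP_NUM_THREADS=1) to see
    // whether the thread overhead is worth it for this problem size.
    return 0;
}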
Some comments
Exercises (OpenMP-1)
ref: https://www.r-ccs.riken.jp/en/wp-content/uploads/sites/2/2021/08/RIKEN-iphcss_day1-rev1.pdf
Exercises (OpenMP-2):
ref: https://www.r-ccs.riken.jp/en/wp-content/uploads/sites/2/2021/08/RIKEN-iphcss_day1-rev1.pdf
Exercises (OpenMP-3):
ref: https://www.r-ccs.riken.jp/en/wp-content/uploads/sites/2/2021/08/RIKEN-iphcss_day1-rev1.pdf
Tested at Fugaku
• Job submission
$ pjsub hello_world_omp.pjsh

• Observing the submitted jobs
$ pjstat
