
Radix Sort

To convert the given Parallel Straight Radix Sort function to OpenMP, we focus on three things: computing lCount (the per-thread counts of bit patterns), combining those counts into global counts and an exclusive prefix sum, and reordering the array. The work stays divided by thread ID inside a single parallel region, and the barriers (pth_barrier) that synchronize the original pthread code are replaced with OpenMP's #pragma omp barrier directive.
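
The sketch below shows this structure in isolation: a hypothetical worker function (not the actual sort) is called from a parallel region, and an orphaned #pragma omp barrier inside it synchronizes the whole team, which is exactly the role pth_barrier plays in the original code.

#include <omp.h>
#include <stdio.h>

// Hypothetical worker used only to illustrate the SPMD pattern: each thread does its
// own share of work, then waits at a barrier that is "orphaned" inside the function.
void worker(int tid) {
    printf("thread %d: local phase\n", tid);
    #pragma omp barrier              // legal here because the caller is a parallel region
    if (tid == 0) {
        printf("thread 0: serial phase after the barrier\n");
    }
}

int main(void) {
    #pragma omp parallel num_threads(4)
    {
        worker(omp_get_thread_num());
    }
    return 0;
}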

Key Concepts:

1. Parallelization:
o The counting of bit patterns into lCount happens in all threads at once: each thread counts only its own slice of A, so no extra work-sharing directive is needed (see the short digit-extraction example after this list).
o The global count (gCount) and the exclusive prefix sum (fCount) are computed by a single thread to avoid race conditions.
o The reordering step that distributes elements into tA is likewise done by every thread over its own slice, writing into disjoint slots determined by the scanned counts.
2. Barriers:
o OpenMP provides #pragma omp barrier to synchronize threads, replacing the pth_barrier calls used in the original code.
3. Data Sharing:
o Variables such as lCount, gCount, and fCount are shared, but lCount is indexed by thread ID so each thread updates only its own row, and gCount and fCount are written only by thread 0. This avoids data races without explicit locks.
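
As a quick illustration of the digit extraction that drives each counting pass, consider a 4-bit digit (mbits = 4, so M = 16 buckets): pass 0 examines the lowest 4 bits of a key, pass 1 the next 4 bits, and so on. This is exactly what the bits() helper in the code below computes; the snippet here is illustrative only.

#include <stdio.h>

// Illustrative only: which bucket the key 0xB4 falls into on the first two passes.
int main(void) {
    int key = 0xB4;                          // binary 1011 0100
    int d0 = (key >> (0 * 4)) & 0xF;         // pass 0 digit: 0x4 -> bucket 4
    int d1 = (key >> (1 * 4)) & 0xF;         // pass 1 digit: 0xB -> bucket 11
    printf("pass 0 bucket = %d, pass 1 bucket = %d\n", d0, d1);
    return 0;
}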

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define mbits 4
#define M 16
#define NUM_THREADS 4 // Number of threads (this can also be set at run time)

// Per-thread parameters, matching the fields the worker reads in the original code
typedef struct {
    int num;       // Number of elements to sort
    int keylength; // Number of bits per key
    int tid;       // Thread ID
} tParams;

int bufA[1000], bufB[1000];  // Backing storage; size should be set according to your case
int *A = bufA;               // Current input/output buffer (a pointer, so buffers can be swapped)
int *tA = bufB;              // Temporary buffer for each pass
int lCount[NUM_THREADS][M];  // Local counts, one row per thread
int gCount[M];               // Global counts summed over all threads
int fCount[M];               // Exclusive prefix sums of the counts (offset by -1)

// Extract mbits bits from 'value' starting at bit position 'start'
int bits(int value, int start, int mbits) {
    return (value >> start) & ((1 << mbits) - 1);
}

// Parallelized radix sort worker: called by every thread of the enclosing OpenMP
// parallel region (SPMD style, mirroring the original pthread code)
void ParallelStraightRadixSort(void *par) {
    tParams *lpar = (tParams *)par;
    int N = lpar->num;
    int b = lpar->keylength;
    int tNum = lpar->tid;
    int i, j, pass, tBits;
    int *tempPtr;
    int start, end;

    // Calculate the range of elements this thread will work on
    start = ((float)N / NUM_THREADS) * tNum;
    end = ((float)N / NUM_THREADS) * (tNum + 1);
    if (tNum == NUM_THREADS - 1) end = N; // Last thread handles any remaining elements

    // Loop over each pass of the radix sort (one group of mbits bits per pass)
    for (pass = 0; pass < (b / mbits); pass++) {

        // Clear this thread's row of local counts. No nested "parallel for" is needed:
        // each thread already owns its own row and its own [start, end) slice.
        for (j = 0; j < M; j++) {
            lCount[tNum][j] = 0;
        }

        // Count the digit (bit pattern) of every element in this thread's slice
        for (i = start; i < end; i++) {
            lCount[tNum][bits(A[i], pass * mbits, mbits)]++;
        }

        #pragma omp barrier // Synchronize all threads before calculating global counts

        // Only thread 0 computes the global counts and prefix sums
        if (tNum == 0) {
            // Sum the local counts (lCount) into the global count (gCount)
            for (i = 0; i < M; i++) {
                gCount[i] = 0;
                for (j = 0; j < NUM_THREADS; j++) {
                    gCount[i] += lCount[j][i];
                }
            }

            // Exclusive prefix sum of gCount into fCount, offset by -1 so that
            // count + offset gives the index of the last slot for each digit
            fCount[1] = gCount[0] - 1;
            for (i = 2; i < M; i++) {
                fCount[i] = fCount[i - 1] + gCount[i - 1];
            }
            fCount[0] = -1;

            // Scan lCount across threads: lCount[t][i] becomes the index of the
            // last slot that thread t may use for digit i
            for (i = 0; i < M; i++) {
                lCount[0][i] += fCount[i];
                for (j = 1; j < NUM_THREADS; j++) {
                    lCount[j][i] += lCount[j - 1][i];
                }
            }
        }

        #pragma omp barrier // Wait until the counts and scans are final

        // Reorder this thread's slice into tA, walking backwards so that equal keys
        // keep their relative order (the sort stays stable), decrementing the slot
        // index after each write
        for (i = end - 1; i >= start; i--) {
            tBits = bits(A[i], pass * mbits, mbits);
            tA[lCount[tNum][tBits]] = A[i];
            lCount[tNum][tBits]--;
        }

        #pragma omp barrier // Wait until every thread has finished reordering

        // Thread 0 swaps the A and tA pointers for the next pass
        if (tNum == 0) {
            tempPtr = tA;
            tA = A;
            A = tempPtr;
        }

        #pragma omp barrier // Make the swap visible to all threads before the next pass
    }
}

int main() {
    // Example usage with arbitrary data and parameters
    int n = 1000; // Array size (must not exceed the backing buffers above)

    // Initialize the array A with some random data (for demonstration purposes)
    for (int i = 0; i < n; i++) {
        A[i] = rand() % 10000;
    }

    // Parallel straight radix sort with OpenMP: every thread calls the worker with
    // its own private parameter block, so there is no race on the tid field
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        tParams params;
        params.num = n;          // Array size
        params.keylength = 32;   // Number of bits per key
        params.tid = omp_get_thread_num();
        ParallelStraightRadixSort(&params);
    }

    // Print the sorted array (optional). With 32-bit keys and mbits == 4 there are
    // 8 passes (an even number), so A points back to the original buffer here.
    for (int i = 0; i < n; i++) {
        printf("%d ", A[i]);
    }
    printf("\n");

    return 0;
}
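
As a small usage check, the print loop at the end of main can be followed (or replaced) by a verification pass. This is an illustrative fragment, assuming the global A and the array size of 1000 from the listing above:

// Verify that the result is non-decreasing (assumes the global A and n == 1000)
int sorted = 1;
for (int i = 1; i < 1000; i++) {
    if (A[i - 1] > A[i]) { sorted = 0; break; }
}
printf(sorted ? "array is sorted\n" : "array is NOT sorted\n");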

Key Changes and Considerations:

1. Parallel Count Calculation:
o Each thread clears and fills its own row of lCount while scanning its own slice of A; because the element range is already partitioned by thread ID inside the parallel region, no additional work-sharing directive is needed.
o The bit pattern of each element comes from bits(A[i], pass * mbits, mbits), so every pass examines the next group of mbits bits.
2. Global Count Calculation (gCount):
o This part is performed by thread tNum == 0, which sums the lCount rows of all NUM_THREADS threads; the other threads simply wait at the following barrier.
3. Prefix Sum Calculation:
o The exclusive prefix sum of gCount into fCount is done sequentially, again only in thread 0. This step is critical for correctly positioning the elements during the reordering phase (a small worked example follows this list).
4. Reordering Step:
o Moving the elements from A to tA is done in parallel by all threads. Each thread writes only into the slots reserved for it by the scanned lCount values, so the writes are disjoint.
5. Barriers:
o #pragma omp barrier synchronizes the threads after each significant step, ensuring that all threads complete their part before proceeding.
6. Array Pointer Swap:
o After each pass, A and tA are swapped; this is why they are declared as pointers rather than fixed arrays. The swap is done by thread 0 only, so the next pass works on the output of the previous pass.
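
To make the offset-by-one prefix sums concrete, here is a tiny worked example with only four buckets (M = 4) and a single thread; the numbers follow the same formulas used in the code above.

#include <stdio.h>

// Worked example of the exclusive prefix sum with the -1 offset for M = 4 buckets.
// With a single thread, count + offset gives the index of the LAST slot available
// for each digit, and the reordering loop decrements after every write.
int main(void) {
    int gCount[4] = {3, 1, 0, 2};   // digit histogram: three 0s, one 1, no 2s, two 3s
    int fCount[4];

    fCount[1] = gCount[0] - 1;      // = 2
    for (int i = 2; i < 4; i++) {
        fCount[i] = fCount[i - 1] + gCount[i - 1];   // fCount[2] = 3, fCount[3] = 3
    }
    fCount[0] = -1;                 // so that count + offset = last slot for digit 0

    // Last writable slot per digit: 0s -> 2, 1s -> 3, 3s -> 5
    // (digit 2 has no elements, so its value is never used)
    for (int i = 0; i < 4; i++) {
        printf("digit %d: last slot = %d\n", i, gCount[i] + fCount[i]);
    }
    return 0;
}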

Version 2 (OpenMP)

To convert the provided parallel straight radix sort code, which uses custom threading with barriers and locks, to OpenMP, we rely on OpenMP's constructs for parallelism, synchronization, and shared-memory management. This version shows only the worker function; it assumes the globals (A, tA, lCount, gCount, fCount), the bits() helper, and the tParams type defined in Version 1.
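
Because Version 2 omits the driver code, the launch pattern below is an assumption, not part of the original listing: the enclosing parallel region must create exactly NUM_THREADS threads, otherwise the explicit start/end partitioning leaves parts of A unprocessed.

// Assumed launch pattern for the worker below: the team size must match NUM_THREADS,
// or some slices of A would never be counted or reordered.
#pragma omp parallel num_threads(NUM_THREADS)
{
    tParams params;                  // private per thread, so there is no race on tid
    params.num = 1000;               // array size (example value)
    params.keylength = 32;           // bits per key
    params.tid = omp_get_thread_num();
    ParallelStraightRadixSort(&params);
}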

#include <omp.h>

#define mbits 4
#define M 16

// Worker function, called by every thread of an enclosing parallel region.
// Assumes the globals (A, tA, lCount, gCount, fCount), NUM_THREADS, bits(),
// and the tParams type from Version 1.
void ParallelStraightRadixSort(void *par) {
    tParams *lpar = (tParams *)par;
    int N = lpar->num;
    int b = lpar->keylength;
    int tNum = lpar->tid;
    int i, j, pass, tBits;
    int *tempPtr;
    int start, end;

    // Each thread works on its own contiguous slice of the array
    start = ((float)N / NUM_THREADS) * tNum;
    end = ((float)N / NUM_THREADS) * (tNum + 1);
    if (tNum == NUM_THREADS - 1) end = N;

    for (pass = 0; pass < (b / mbits); pass++) {

        // Reset this thread's local counts (no work-sharing directive is needed:
        // the work is already partitioned by thread ID)
        for (j = 0; j < M; j++) {
            lCount[tNum][j] = 0;
        }

        // Count the digits of this thread's slice
        for (i = start; i < end; i++) {
            lCount[tNum][bits(A[i], pass * mbits, mbits)]++;
        }

        #pragma omp barrier // Synchronize threads before computing the sums


        if (tNum == 0) {
            // Compute global counts and the exclusive prefix scan (offset by -1)
            for (i = 0; i < M; i++) {
                gCount[i] = 0;
                for (j = 0; j < NUM_THREADS; j++) {
                    gCount[i] += lCount[j][i];
                }
            }

            fCount[1] = gCount[0] - 1;
            for (i = 2; i < M; i++) {
                fCount[i] = fCount[i - 1] + gCount[i - 1];
            }
            fCount[0] = -1;

            // Scan lCount across threads: lCount[t][i] becomes the last slot for
            // digit i that thread t may write to
            for (i = 0; i < M; i++) {
                lCount[0][i] += fCount[i];
                for (j = 1; j < NUM_THREADS; j++) {
                    lCount[j][i] += lCount[j - 1][i];
                }
            }
        }

        #pragma omp barrier // Ensure all threads complete the sum and scan

        // Reorder this thread's slice into tA, walking backwards to keep the sort stable
        for (i = end - 1; i >= start; i--) {
            tBits = bits(A[i], pass * mbits, mbits);
            tA[lCount[tNum][tBits]] = A[i];
            lCount[tNum][tBits]--;
        }

        #pragma omp barrier // Synchronize before swapping arrays

        if (tNum == 0) {
            // Swap the buffer pointers for the next pass
            tempPtr = tA;
            tA = A;
            A = tempPtr;
        }

        #pragma omp barrier // Ensure the swap is complete before the next pass
    }
}

Explanation of Changes:

1. Parallelization of Loops:
o The loops that compute local counts (lCount) and the loop that moves data into the sorted array (tA) run in all threads at once because the function is called from inside a parallel region; each thread handles only its own [start, end) slice.
o Barrier synchronization (#pragma omp barrier) is used at the points where all threads must wait for the others before proceeding, replacing the custom pth_barrier of the original code.
2. Shared Data:
o The arrays lCount, gCount, and fCount are shared among all threads, but lCount is indexed by tNum, so each thread updates only its own row, and gCount and fCount are written only by thread 0. No explicit locks are needed because there are no conflicting writes.
3. Array Swapping:
o The pointer swap between A and tA is done by thread 0 only (if (tNum == 0)), which prevents a race on the pointers (see the sketch after this list for what happens when the number of passes is odd).
4. Synchronization:
o Barriers (#pragma omp barrier) separate the stages of the algorithm: counting must finish before the sums and prefix scan, the scan must finish before the reordering, and the reordering must finish before the buffers are swapped for the next pass.
5. Thread Management:
o Unlike the original code, which manages threads and barriers by hand, OpenMP creates and joins the thread team automatically; only the barriers and the thread-0 sections remain explicit.
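
A final caveat related to item 3: each pass ends with a pointer swap, so after an odd number of passes the global pointer A refers to the second buffer rather than the original storage. Reading the result through A is still correct, but if the caller needs the data back in the original buffer, a copy is required. The sketch below is illustrative only and assumes the bufA/bufB backing arrays and globals from the Version 1 listing:

#include <string.h>

// Illustrative only: call this after the parallel region if the sorted data must end
// up in the original backing array (bufA). With 32-bit keys and mbits == 4 the number
// of passes is even, so this is a no-op for the example above.
void copy_back_if_needed(int n) {
    if (A != bufA) {                       // pointers were swapped an odd number of times
        memcpy(bufA, A, n * sizeof(int));  // move the sorted data back into bufA
        A = bufA;
        tA = bufB;                         // bufB becomes scratch space again
    }
}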
