Radix Sort
To convert the given Parallel Straight Radix Sort function to OpenMP, we need to focus on
parallelizing the computation of lCount (which holds counts of bit patterns for each thread), the
prefix sum calculation across threads, and the reordering of the array. The synchronization in the
original code is managed with barriers (pth_barrier), which we will replace with OpenMP’s
#pragma omp barrier directive.
Key Concepts:
1. Parallelization:
o We can parallelize the loop that counts the bit patterns (lCount) across threads.
o We can also parallelize the computation of the global count (gCount) and the
exclusive prefix sum (fCount) in a way that avoids race conditions.
o The reordering step, where we distribute the elements into tA, can be parallelized by
having each thread place the elements of the same chunk of A that it counted, inside the
parallel region.
2. Barriers:
o OpenMP provides #pragma omp barrier to synchronize threads, similar to the
pth_barrier used in the original code (a minimal standalone example follows this list).
3. Data Sharing:
o We'll need to ensure proper data sharing for variables such as lCount, gCount,
and fCount. These variables are updated by multiple threads, so we need to be
careful to avoid data races and ensure they are correctly shared or private.
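Because the original pth_barrier calls are being replaced, one detail is worth calling out: #pragma omp barrier is only legal inside a parallel region (it cannot appear at file scope), and it makes every thread in the team wait until all have arrived. A minimal standalone illustration, with an arbitrary thread count:
#include <omp.h>
#include <stdio.h>
int main(void) {
    #pragma omp parallel num_threads(4)
    {
        int tid = omp_get_thread_num();
        printf("thread %d reached the barrier\n", tid);
        #pragma omp barrier                               // no thread passes this point...
        printf("thread %d passed the barrier\n", tid);    // ...until all four have arrived
    }
    return 0;
}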
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#define mbits 4
#define M 16                 // M = 2^mbits bit patterns examined per pass
#define NUM_THREADS 4        // Number of threads (this can be set dynamically)
#define N 1000               // Array size; set according to your case
int A[N];                    // Example input array
int tA[N];                   // Temporary array for storing results
int lCount[NUM_THREADS][M];  // Local counts for each thread
int gCount[M];               // Global counts for all threads
int fCount[M];               // Prefix sums of the counts
// Extract numbits bits of x starting at bit position pos
static int bits(int x, int pos, int numbits) { return (x >> pos) & ((1 << numbits) - 1); }
void StraightRadixSort(int num, int keylength) {
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        int tNum = omp_get_thread_num();
        int start = tNum * (num / NUM_THREADS);
        int end = (tNum == NUM_THREADS - 1) ? num : start + num / NUM_THREADS;
        for (int pass = 0; pass < keylength / mbits; pass++) {
            // Each thread counts the bit patterns in its own chunk of A
            for (int j = 0; j < M; j++) lCount[tNum][j] = 0;
            for (int i = start; i < end; i++) lCount[tNum][bits(A[i], pass * mbits, mbits)]++;
            #pragma omp barrier // Synchronize all threads before calculating global counts
            #pragma omp single  // One thread computes global counts, prefix scan, write positions
            {
                for (int j = 0; j < M; j++) {
                    gCount[j] = 0;
                    for (int t = 0; t < NUM_THREADS; t++) gCount[j] += lCount[t][j];
                }
                fCount[1] = gCount[0] - 1;
                for (int j = 2; j < M; j++) fCount[j] = fCount[j - 1] + gCount[j - 1];
                fCount[0] = -1;
                // Turn each thread's counts into its last write index for each bit pattern
                for (int j = 0; j < M; j++) {
                    int pos = fCount[j];
                    for (int t = 0; t < NUM_THREADS; t++) { pos += lCount[t][j]; lCount[t][j] = pos; }
                }
            } // Implicit barrier: all threads wait here before proceeding with the next phase
            // Reorder elements into temporary array tA based on the local counts
            for (int i = end - 1; i >= start; i--) {
                int tBits = bits(A[i], pass * mbits, mbits);
                tA[lCount[tNum][tBits]] = A[i];
                lCount[tNum][tBits]--;
            }
            #pragma omp barrier // All threads finish reordering before the copy back
            // Copy back so A holds the partially sorted data for the next pass
            for (int i = start; i < end; i++) A[i] = tA[i];
            #pragma omp barrier // Synchronize all threads before starting the next pass
        }
    }
}
int main() {
    // Example usage with arbitrary data and parameters
    int num = N;        // Array size
    int keylength = 32; // Number of bits per key
    // Initialize the array A with some random data (for demonstration purposes)
    for (int i = 0; i < num; i++) A[i] = rand() % 10000; // Random numbers for sorting
    StraightRadixSort(num, keylength);
    return 0;
}
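This first version compiles and runs as-is with any OpenMP-capable compiler; with GCC, for example (the file name is arbitrary):
gcc -fopenmp -O2 radix_omp.c -o radix_omp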
Version 2 OpenMP
To convert the provided parallel straight radix sort code, which uses custom threading with
barriers and locks, this second version leans on OpenMP's constructs for parallelism,
synchronization, and shared-memory management. The excerpt below shows the parts that differ
from Version 1: the exclusive prefix scan of the global counts and the pointer swap between passes.
#include <omp.h>
#define mbits 4
#define M 16
// Excerpt from the body of the per-pass loop inside the parallel region. In this
// version A and tA are int* pointers so they can be swapped between passes;
// i, tNum, and tempPtr are declared earlier in the parallel region.
if (tNum == 0) {
    // Exclusive prefix scan: fCount[i] is the last index before bucket i begins
    fCount[1] = gCount[0] - 1;
    for (i = 2; i < M; i++) {
        fCount[i] = fCount[i - 1] + gCount[i - 1];
    }
    fCount[0] = -1;
}
#pragma omp barrier // Ensure the sum and scan are complete before reordering begins
/* ... each thread reorders its chunk of A into tA, as in Version 1 ... */
#pragma omp barrier // All threads must finish reordering before the swap
if (tNum == 0) {
    // Swap arrays for the next pass
    tempPtr = tA;
    tA = A;
    A = tempPtr;
}
#pragma omp barrier // Ensure array swap is complete before next pass
} // end of per-pass loop
} // end of parallel region
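Note that because the pointers are swapped rather than copied back, the sorted data ends up behind whichever pointer A refers to after the final pass; if the number of passes (keylength / mbits) is odd, that is the buffer originally allocated as tA, so callers holding the original buffer should copy the result back once sorting finishes.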
Explanation of Changes:
1. Parallelization of Loops:
o The loops that calculate local counts (lCount) and those that move data into the
sorted array (tA) run inside a #pragma omp parallel region, with each thread
handling its own chunk of A.
o The barrier synchronization (#pragma omp barrier) is used to synchronize
threads at points where all threads must wait for others to complete before
proceeding, similar to the custom pth_barrier used in the original code.
2. Shared Data:
o The arrays lCount, gCount, and fCount are shared among all threads. Because
lCount is indexed by tNum, each thread updates only its own row, so no explicit
locks are needed and there are no data races on lCount; gCount and fCount are
written by a single thread between barriers.
3. Array Swapping:
o The array swapping step, where A and tA are exchanged, is done by thread 0 only
(if (tNum == 0)). This prevents race conditions on the pointer swap, ensuring
only one thread performs it (an equivalent formulation using #pragma omp single
is sketched after this list).
4. Synchronization:
o Barriers (#pragma omp barrier) are used to ensure proper synchronization
between the different stages of the algorithm. This is necessary because the sum
computation and prefix scan must be completed before the data can be moved to
the sorted array, and the array swap must wait for all threads to finish the sorting
for the current pass.
5. Task Management:
o Unlike the original code, which uses custom barriers and explicit thread
management, OpenMP handles thread creation, work distribution, and
synchronization through its directives and runtime.
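As mentioned under Array Swapping, the thread-0-plus-barrier pattern can also be expressed with #pragma omp single, which runs a block on exactly one thread and ends with an implicit barrier. A minimal standalone sketch, assuming A and tA are shared int* pointers (buffer names, sizes, and contents here are arbitrary):
#include <omp.h>
#include <stdio.h>
int buf1[4] = {3, 1, 2, 0};
int buf2[4];
int main(void) {
    int *A = buf1, *tA = buf2;
    #pragma omp parallel num_threads(4) shared(A, tA)
    {
        // ... counting and reordering phases of the pass would go here ...
        #pragma omp single
        {
            int *tempPtr = tA;   // exactly one thread performs the swap
            tA = A;
            A = tempPtr;
        } // implicit barrier: every thread sees the swapped pointers afterwards
        printf("thread %d sees A == buf2: %d\n", omp_get_thread_num(), A == buf2);
    }
    return 0;
}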