
Radix Sort

To convert the given Parallel Straight Radix Sort function to OpenMP, we focus on three things: computing lCount (the per-thread counts of bit patterns), combining those counts into global counts and an exclusive prefix sum, and reordering the array. The work stays divided by thread ID inside a single parallel region, and the barriers (pth_barrier) that synchronize the original pthread code are replaced with OpenMP's #pragma omp barrier directive.
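
The sketch below shows this structure in isolation: a hypothetical worker function (not the actual sort) is called from a parallel region, and an orphaned #pragma omp barrier inside it synchronizes the whole team, which is exactly the role pth_barrier plays in the original code.

#include <omp.h>
#include <stdio.h>

// Hypothetical worker used only to illustrate the SPMD pattern: each thread does its
// own share of work, then waits at a barrier that is "orphaned" inside the function.
void worker(int tid) {
    printf("thread %d: local phase\n", tid);
    #pragma omp barrier              // legal here because the caller is a parallel region
    if (tid == 0) {
        printf("thread 0: serial phase after the barrier\n");
    }
}

int main(void) {
    #pragma omp parallel num_threads(4)
    {
        worker(omp_get_thread_num());
    }
    return 0;
}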

Key Concepts:

1. Parallelization:
o The counting of bit patterns into lCount happens in all threads at once: each thread counts only its own slice of A, so no extra work-sharing directive is needed (see the short digit-extraction example after this list).
o The global count (gCount) and the exclusive prefix sum (fCount) are computed by a single thread to avoid race conditions.
o The reordering step that distributes elements into tA is likewise done by every thread over its own slice, writing into disjoint slots determined by the scanned counts.
2. Barriers:
o OpenMP provides #pragma omp barrier to synchronize threads, replacing the pth_barrier calls used in the original code.
3. Data Sharing:
o Variables such as lCount, gCount, and fCount are shared, but lCount is indexed by thread ID so each thread updates only its own row, and gCount and fCount are written only by thread 0. This avoids data races without explicit locks.
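
As a quick illustration of the digit extraction that drives each counting pass, consider a 4-bit digit (mbits = 4, so M = 16 buckets): pass 0 examines the lowest 4 bits of a key, pass 1 the next 4 bits, and so on. This is exactly what the bits() helper in the code below computes; the snippet here is illustrative only.

#include <stdio.h>

// Illustrative only: which bucket the key 0xB4 falls into on the first two passes.
int main(void) {
    int key = 0xB4;                          // binary 1011 0100
    int d0 = (key >> (0 * 4)) & 0xF;         // pass 0 digit: 0x4 -> bucket 4
    int d1 = (key >> (1 * 4)) & 0xF;         // pass 1 digit: 0xB -> bucket 11
    printf("pass 0 bucket = %d, pass 1 bucket = %d\n", d0, d1);
    return 0;
}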

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define mbits 4
#define M 16
#define NUM_THREADS 4 // Number of threads (this can also be set at run time)

// Per-thread parameters, matching the fields the worker reads in the original code
typedef struct {
    int num;       // Number of elements to sort
    int keylength; // Number of bits per key
    int tid;       // Thread ID
} tParams;

int bufA[1000], bufB[1000];  // Backing storage; size should be set according to your case
int *A = bufA;               // Current input/output buffer (a pointer, so buffers can be swapped)
int *tA = bufB;              // Temporary buffer for each pass
int lCount[NUM_THREADS][M];  // Local counts, one row per thread
int gCount[M];               // Global counts summed over all threads
int fCount[M];               // Exclusive prefix sums of the counts (offset by -1)

// Extract mbits bits from 'value' starting at bit position 'start'
int bits(int value, int start, int mbits) {
    return (value >> start) & ((1 << mbits) - 1);
}

// Parallelized radix sort worker: called by every thread of the enclosing OpenMP
// parallel region (SPMD style, mirroring the original pthread code)
void ParallelStraightRadixSort(void *par) {
    tParams *lpar = (tParams *)par;
    int N = lpar->num;
    int b = lpar->keylength;
    int tNum = lpar->tid;
    int i, j, pass, tBits;
    int *tempPtr;
    int start, end;

    // Calculate the range of elements this thread will work on
    start = ((float)N / NUM_THREADS) * tNum;
    end = ((float)N / NUM_THREADS) * (tNum + 1);
    if (tNum == NUM_THREADS - 1) end = N; // Last thread handles any remaining elements

    // Loop over each pass of the radix sort (one group of mbits bits per pass)
    for (pass = 0; pass < (b / mbits); pass++) {

        // Clear this thread's row of local counts. No nested "parallel for" is needed:
        // each thread already owns its own row and its own [start, end) slice.
        for (j = 0; j < M; j++) {
            lCount[tNum][j] = 0;
        }

        // Count the digit (bit pattern) of every element in this thread's slice
        for (i = start; i < end; i++) {
            lCount[tNum][bits(A[i], pass * mbits, mbits)]++;
        }

        #pragma omp barrier // Synchronize all threads before calculating global counts

        // Only thread 0 computes the global counts and prefix sums
        if (tNum == 0) {
            // Sum the local counts (lCount) into the global count (gCount)
            for (i = 0; i < M; i++) {
                gCount[i] = 0;
                for (j = 0; j < NUM_THREADS; j++) {
                    gCount[i] += lCount[j][i];
                }
            }

            // Exclusive prefix sum of gCount into fCount, offset by -1 so that
            // count + offset gives the index of the last slot for each digit
            fCount[1] = gCount[0] - 1;
            for (i = 2; i < M; i++) {
                fCount[i] = fCount[i - 1] + gCount[i - 1];
            }
            fCount[0] = -1;

            // Scan lCount across threads: lCount[t][i] becomes the index of the
            // last slot that thread t may use for digit i
            for (i = 0; i < M; i++) {
                lCount[0][i] += fCount[i];
                for (j = 1; j < NUM_THREADS; j++) {
                    lCount[j][i] += lCount[j - 1][i];
                }
            }
        }

        #pragma omp barrier // Wait until the counts and scans are final

        // Reorder this thread's slice into tA, walking backwards so that equal keys
        // keep their relative order (the sort stays stable), decrementing the slot
        // index after each write
        for (i = end - 1; i >= start; i--) {
            tBits = bits(A[i], pass * mbits, mbits);
            tA[lCount[tNum][tBits]] = A[i];
            lCount[tNum][tBits]--;
        }

        #pragma omp barrier // Wait until every thread has finished reordering

        // Thread 0 swaps the A and tA pointers for the next pass
        if (tNum == 0) {
            tempPtr = tA;
            tA = A;
            A = tempPtr;
        }

        #pragma omp barrier // Make the swap visible to all threads before the next pass
    }
}

int main() {
    // Example usage with arbitrary data and parameters
    int n = 1000; // Array size (must not exceed the backing buffers above)

    // Initialize the array A with some random data (for demonstration purposes)
    for (int i = 0; i < n; i++) {
        A[i] = rand() % 10000;
    }

    // Parallel straight radix sort with OpenMP: every thread calls the worker with
    // its own private parameter block, so there is no race on the tid field
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        tParams params;
        params.num = n;          // Array size
        params.keylength = 32;   // Number of bits per key
        params.tid = omp_get_thread_num();
        ParallelStraightRadixSort(&params);
    }

    // Print the sorted array (optional). With 32-bit keys and mbits == 4 there are
    // 8 passes (an even number), so A points back to the original buffer here.
    for (int i = 0; i < n; i++) {
        printf("%d ", A[i]);
    }
    printf("\n");

    return 0;
}
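
As a small usage check, the print loop at the end of main can be followed (or replaced) by a verification pass. This is an illustrative fragment, assuming the global A and the array size of 1000 from the listing above:

// Verify that the result is non-decreasing (assumes the global A and n == 1000)
int sorted = 1;
for (int i = 1; i < 1000; i++) {
    if (A[i - 1] > A[i]) { sorted = 0; break; }
}
printf(sorted ? "array is sorted\n" : "array is NOT sorted\n");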

Key Changes and Considerations:

1. Parallel Count Calculation:
o Each thread clears and fills its own row of lCount while scanning its own slice of A; because the element range is already partitioned by thread ID inside the parallel region, no additional work-sharing directive is needed.
o The bit pattern of each element comes from bits(A[i], pass * mbits, mbits), so every pass examines the next group of mbits bits.
2. Global Count Calculation (gCount):
o This part is performed by thread tNum == 0, which sums the lCount rows of all NUM_THREADS threads; the other threads simply wait at the following barrier.
3. Prefix Sum Calculation:
o The exclusive prefix sum of gCount into fCount is done sequentially, again only in thread 0. This step is critical for correctly positioning the elements during the reordering phase (a small worked example follows this list).
4. Reordering Step:
o Moving the elements from A to tA is done in parallel by all threads. Each thread writes only into the slots reserved for it by the scanned lCount values, so the writes are disjoint.
5. Barriers:
o #pragma omp barrier synchronizes the threads after each significant step, ensuring that all threads complete their part before proceeding.
6. Array Pointer Swap:
o After each pass, A and tA are swapped; this is why they are declared as pointers rather than fixed arrays. The swap is done by thread 0 only, so the next pass works on the output of the previous pass.
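
To make the offset-by-one prefix sums concrete, here is a tiny worked example with only four buckets (M = 4) and a single thread; the numbers follow the same formulas used in the code above.

#include <stdio.h>

// Worked example of the exclusive prefix sum with the -1 offset for M = 4 buckets.
// With a single thread, count + offset gives the index of the LAST slot available
// for each digit, and the reordering loop decrements after every write.
int main(void) {
    int gCount[4] = {3, 1, 0, 2};   // digit histogram: three 0s, one 1, no 2s, two 3s
    int fCount[4];

    fCount[1] = gCount[0] - 1;      // = 2
    for (int i = 2; i < 4; i++) {
        fCount[i] = fCount[i - 1] + gCount[i - 1];   // fCount[2] = 3, fCount[3] = 3
    }
    fCount[0] = -1;                 // so that count + offset = last slot for digit 0

    // Last writable slot per digit: 0s -> 2, 1s -> 3, 3s -> 5
    // (digit 2 has no elements, so its value is never used)
    for (int i = 0; i < 4; i++) {
        printf("digit %d: last slot = %d\n", i, gCount[i] + fCount[i]);
    }
    return 0;
}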

Version 2 (OpenMP)

To convert the provided parallel straight radix sort code, which uses custom threading with barriers and locks, to OpenMP, we rely on OpenMP's constructs for parallelism, synchronization, and shared-memory management. This version shows only the worker function; it assumes the globals (A, tA, lCount, gCount, fCount), the bits() helper, and the tParams type defined in Version 1.
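
Because Version 2 omits the driver code, the launch pattern below is an assumption, not part of the original listing: the enclosing parallel region must create exactly NUM_THREADS threads, otherwise the explicit start/end partitioning leaves parts of A unprocessed.

// Assumed launch pattern for the worker below: the team size must match NUM_THREADS,
// or some slices of A would never be counted or reordered.
#pragma omp parallel num_threads(NUM_THREADS)
{
    tParams params;                  // private per thread, so there is no race on tid
    params.num = 1000;               // array size (example value)
    params.keylength = 32;           // bits per key
    params.tid = omp_get_thread_num();
    ParallelStraightRadixSort(&params);
}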

#include <omp.h>

#define mbits 4
#define M 16

// Worker function, called by every thread of an enclosing parallel region.
// Assumes the globals (A, tA, lCount, gCount, fCount), NUM_THREADS, bits(),
// and the tParams type from Version 1.
void ParallelStraightRadixSort(void *par) {
    tParams *lpar = (tParams *)par;
    int N = lpar->num;
    int b = lpar->keylength;
    int tNum = lpar->tid;
    int i, j, pass, tBits;
    int *tempPtr;
    int start, end;

    // Each thread works on its own contiguous slice of the array
    start = ((float)N / NUM_THREADS) * tNum;
    end = ((float)N / NUM_THREADS) * (tNum + 1);
    if (tNum == NUM_THREADS - 1) end = N;

    for (pass = 0; pass < (b / mbits); pass++) {

        // Reset this thread's local counts (no work-sharing directive is needed:
        // the work is already partitioned by thread ID)
        for (j = 0; j < M; j++) {
            lCount[tNum][j] = 0;
        }

        // Count the digits of this thread's slice
        for (i = start; i < end; i++) {
            lCount[tNum][bits(A[i], pass * mbits, mbits)]++;
        }

        #pragma omp barrier // Synchronize threads before computing the sums


        if (tNum == 0) {
            // Compute global counts and the exclusive prefix scan (offset by -1)
            for (i = 0; i < M; i++) {
                gCount[i] = 0;
                for (j = 0; j < NUM_THREADS; j++) {
                    gCount[i] += lCount[j][i];
                }
            }

            fCount[1] = gCount[0] - 1;
            for (i = 2; i < M; i++) {
                fCount[i] = fCount[i - 1] + gCount[i - 1];
            }
            fCount[0] = -1;

            // Scan lCount across threads: lCount[t][i] becomes the last slot for
            // digit i that thread t may write to
            for (i = 0; i < M; i++) {
                lCount[0][i] += fCount[i];
                for (j = 1; j < NUM_THREADS; j++) {
                    lCount[j][i] += lCount[j - 1][i];
                }
            }
        }

        #pragma omp barrier // Ensure all threads complete the sum and scan

        // Reorder this thread's slice into tA, walking backwards to keep the sort stable
        for (i = end - 1; i >= start; i--) {
            tBits = bits(A[i], pass * mbits, mbits);
            tA[lCount[tNum][tBits]] = A[i];
            lCount[tNum][tBits]--;
        }

        #pragma omp barrier // Synchronize before swapping arrays

        if (tNum == 0) {
            // Swap the buffer pointers for the next pass
            tempPtr = tA;
            tA = A;
            A = tempPtr;
        }

        #pragma omp barrier // Ensure the swap is complete before the next pass
    }
}

Explanation of Changes:

1. Parallelization of Loops:
o The loops that compute local counts (lCount) and the loop that moves data into the sorted array (tA) run in all threads at once because the function is called from inside a parallel region; each thread handles only its own [start, end) slice.
o Barrier synchronization (#pragma omp barrier) is used at the points where all threads must wait for the others before proceeding, replacing the custom pth_barrier of the original code.
2. Shared Data:
o The arrays lCount, gCount, and fCount are shared among all threads, but lCount is indexed by tNum, so each thread updates only its own row, and gCount and fCount are written only by thread 0. No explicit locks are needed because there are no conflicting writes.
3. Array Swapping:
o The pointer swap between A and tA is done by thread 0 only (if (tNum == 0)), which prevents a race on the pointers (see the sketch after this list for what happens when the number of passes is odd).
4. Synchronization:
o Barriers (#pragma omp barrier) separate the stages of the algorithm: counting must finish before the sums and prefix scan, the scan must finish before the reordering, and the reordering must finish before the buffers are swapped for the next pass.
5. Thread Management:
o Unlike the original code, which manages threads and barriers by hand, OpenMP creates and joins the thread team automatically; only the barriers and the thread-0 sections remain explicit.
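
A final caveat related to item 3: each pass ends with a pointer swap, so after an odd number of passes the global pointer A refers to the second buffer rather than the original storage. Reading the result through A is still correct, but if the caller needs the data back in the original buffer, a copy is required. The sketch below is illustrative only and assumes the bufA/bufB backing arrays and globals from the Version 1 listing:

#include <string.h>

// Illustrative only: call this after the parallel region if the sorted data must end
// up in the original backing array (bufA). With 32-bit keys and mbits == 4 the number
// of passes is even, so this is a no-op for the example above.
void copy_back_if_needed(int n) {
    if (A != bufA) {                       // pointers were swapped an odd number of times
        memcpy(bufA, A, n * sizeof(int));  // move the sorted data back into bufA
        A = bufA;
        tA = bufB;                         // bufB becomes scratch space again
    }
}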
