Algorithm parallelization improves performance by dividing a computation into smaller sub-tasks that can be executed simultaneously on multiple processors. This section covers the basic concepts, types of parallelism, and practical examples to help you understand and implement parallel algorithms.
Key Concepts
- Parallel Computing: The simultaneous use of multiple compute resources to solve a computational problem.
- Concurrency vs. Parallelism: Concurrency is about structuring a program to deal with multiple tasks at once (their execution may interleave on a single processor), while parallelism is about executing multiple tasks at the same time on separate processors.
- Granularity: The size of the tasks into which a problem is divided. Fine-grained parallelism involves small tasks, while coarse-grained parallelism involves larger tasks.
- Speedup: The ratio of the time taken to solve a problem on a single processor to the time taken on multiple processors.
- Amdahl's Law: An upper bound on the speedup achievable with multiple processors. If a fraction p of a program can be parallelized and n processors are available, the speedup is at most 1 / ((1 - p) + p / n), so the serial fraction (1 - p) ultimately limits the gain (see the sketch after this list).
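To make Amdahl's Law concrete, here is a minimal sketch (the helper name amdahl_speedup is ours, purely for illustration) that evaluates the formula for a program that is 90% parallelizable:

def amdahl_speedup(p, n):
    # p: parallelizable fraction of the program (0..1); n: number of processors.
    return 1 / ((1 - p) + p / n)

# With p = 0.9 the speedup approaches 10x and no more, however many
# processors are added: the serial 10% dominates.
for n in [2, 4, 8, 1000]:
    print(n, round(amdahl_speedup(0.9, n), 2))  # 1.82, 3.08, 4.71, 9.91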
Types of Parallelism
- Data Parallelism: Distributing subsets of the same data across multiple processors and performing the same operation on each subset.
- Task Parallelism: Distributing different tasks (or threads) across multiple processors.
- Pipeline Parallelism: Dividing a task into stages and processing different stages in parallel, as sketched below.
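As a minimal illustration of pipeline parallelism (the two-stage design, the queue wiring, and the None end-of-stream sentinel are our choices, not the only way to build a pipeline), the two processes below form a pipeline: while stage 2 handles one item, stage 1 is already working on the next.

import multiprocessing

def stage1(q_in, q_out):
    # Stage 1: square each incoming item.
    for item in iter(q_in.get, None):
        q_out.put(item * item)
    q_out.put(None)  # propagate the end-of-stream sentinel

def stage2(q_in):
    # Stage 2: consume results as they arrive from stage 1.
    for item in iter(q_in.get, None):
        print(item)

if __name__ == "__main__":
    q1, q2 = multiprocessing.Queue(), multiprocessing.Queue()
    p1 = multiprocessing.Process(target=stage1, args=(q1, q2))
    p2 = multiprocessing.Process(target=stage2, args=(q2,))
    p1.start()
    p2.start()
    for x in range(10):
        q1.put(x)
    q1.put(None)  # signal that no more input is coming
    p1.join()
    p2.join()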
Practical Examples
Example 1: Parallel Sum of an Array
Let's start with a simple example of summing the elements of an array in parallel using Python's multiprocessing module.
import multiprocessing

def parallel_sum(arr, start, end, result, index):
    # Sum this process's chunk and store the partial result in shared memory.
    result[index] = sum(arr[start:end])

if __name__ == "__main__":
    arr = [i for i in range(1000000)]
    num_processes = 4
    chunk_size = len(arr) // num_processes
    processes = []
    # Typecode 'q' (signed 64-bit) is needed here: the partial sums exceed
    # the range of a 32-bit 'i'.
    result = multiprocessing.Array('q', num_processes)

    for i in range(num_processes):
        start = i * chunk_size
        # The last chunk absorbs any leftover elements.
        end = (i + 1) * chunk_size if i != num_processes - 1 else len(arr)
        process = multiprocessing.Process(target=parallel_sum, args=(arr, start, end, result, i))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

    total_sum = sum(result)
    print(f"Total Sum: {total_sum}")
Explanation:
- We divide the array into chunks and assign each chunk to a separate process.
- Each process calculates the sum of its chunk and stores the result in a shared array.
- Finally, we sum the partial results to get the total sum.
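The same reduction can be written more compactly with multiprocessing.Pool, which handles worker startup, chunk distribution, and result collection for you. A minimal sketch, assuming the array length divides evenly among the workers:

import multiprocessing

if __name__ == "__main__":
    arr = [i for i in range(1000000)]
    num_processes = 4
    chunk_size = len(arr) // num_processes  # assumes an even split
    chunks = [arr[i * chunk_size:(i + 1) * chunk_size] for i in range(num_processes)]
    with multiprocessing.Pool(processes=num_processes) as pool:
        partial_sums = pool.map(sum, chunks)  # each worker sums one chunk
    print(f"Total Sum: {sum(partial_sums)}")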
Example 2: Parallel Matrix Multiplication
Matrix multiplication can also be parallelized. Here is an example using Python's concurrent.futures module.
import concurrent.futures
import numpy as np

def matrix_multiply(A, B, result, row):
    # Compute one row of the product: result[row] = A[row] @ B.
    for j in range(len(B[0])):
        result[row][j] = sum(A[row][k] * B[k][j] for k in range(len(B)))

if __name__ == "__main__":
    A = np.random.rand(100, 100)
    B = np.random.rand(100, 100)
    result = np.zeros((100, 100))

    # Threads share memory, so every worker can write directly into `result`.
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(matrix_multiply, A, B, result, i) for i in range(len(A))]
        concurrent.futures.wait(futures)

    print("Resultant Matrix:")
    print(result)
Explanation:
- We use a thread pool to parallelize the multiplication of each row of matrix A with matrix B.
- Each thread computes one row of the resultant matrix; because the threads write to distinct rows, no locking is needed.
- Caveat: CPython's global interpreter lock (GIL) prevents threads from executing pure-Python code in parallel, so this version demonstrates the task structure rather than delivering a real speedup. For CPU-bound work, use processes (sketched below) or NumPy's vectorized product A @ B.
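For genuine CPU parallelism, the same row-wise decomposition can run in a process pool. This is a sketch under our own design choices (the helper multiply_row and the idea of returning rows instead of writing into a shared array are ours; worker processes cannot write into the parent's memory):

import concurrent.futures
import numpy as np

def multiply_row(args):
    # Each worker returns one row of the product; B is shipped along with
    # the row because worker processes do not share the parent's memory.
    A_row, B = args
    return A_row @ B

if __name__ == "__main__":
    A = np.random.rand(100, 100)
    B = np.random.rand(100, 100)
    with concurrent.futures.ProcessPoolExecutor() as executor:
        rows = list(executor.map(multiply_row, ((A[i], B) for i in range(len(A)))))
    result = np.vstack(rows)
    assert np.allclose(result, A @ B)  # sanity check against NumPy's product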
Practical Exercises
Exercise 1: Parallel Array Maximum
Write a parallel algorithm to find the maximum element in an array using Python's multiprocessing module.
Solution:
import multiprocessing

def parallel_max(arr, start, end, result, index):
    result[index] = max(arr[start:end])

if __name__ == "__main__":
    arr = [i for i in range(1000000)]
    num_processes = 4
    chunk_size = len(arr) // num_processes
    processes = []
    result = multiprocessing.Array('i', num_processes)

    for i in range(num_processes):
        start = i * chunk_size
        end = (i + 1) * chunk_size if i != num_processes - 1 else len(arr)
        process = multiprocessing.Process(target=parallel_max, args=(arr, start, end, result, i))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

    max_value = max(result)
    print(f"Maximum Value: {max_value}")
Exercise 2: Parallel Merge Sort
Implement a parallel version of the merge sort algorithm.
Solution:
import multiprocessing

def merge(L, R):
    # Merge two sorted lists into a single sorted list.
    merged = []
    i = j = 0
    while i < len(L) and j < len(R):
        if L[i] < R[j]:
            merged.append(L[i])
            i += 1
        else:
            merged.append(R[j])
            j += 1
    merged.extend(L[i:])
    merged.extend(R[j:])
    return merged

def merge_sort(arr):
    # Sequential merge sort; returns a new sorted list.
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    return merge(merge_sort(arr[:mid]), merge_sort(arr[mid:]))

def parallel_merge_sort(arr):
    # Small inputs are not worth the process-startup overhead.
    if len(arr) <= 1000:
        return merge_sort(arr)
    mid = len(arr) // 2
    # Worker processes receive copies of the data, so they must *return*
    # their sorted halves; sorting in place inside a child process would
    # leave the parent's data unchanged.
    with multiprocessing.Pool(processes=2) as pool:
        left, right = pool.map(merge_sort, [arr[:mid], arr[mid:]])
    return merge(left, right)

if __name__ == "__main__":
    arr = [i for i in range(1000000, 0, -1)]
    arr = parallel_merge_sort(arr)
    print("Sorted Array:")
    print(arr[:100])  # Print the first 100 elements to verify
Conclusion
In this section, we covered the basics of algorithm parallelization, including key concepts, types of parallelism, and practical examples. We also provided exercises to help you practice and reinforce your understanding of parallel algorithms. By leveraging parallel computing, you can significantly improve the performance of your algorithms, making them more efficient and scalable.