Abstract
In multiprocessor systems, data parallelism is the execution of the same task on data distributed across multiple processors. It involves splitting the data set into smaller partitions or batches. The process of splitting the data among the processors is called “Data Partitioning,” and it is an important factor in the efficiency of data parallel implementations. Data partitioning influences both the workload of each processing unit and the network traffic between processes, so poor partition quality can lead to serious performance problems. This research presents a data partitioning method that can be used to improve the performance of data parallel implementations. The proposed method uses an initial screening experiment that runs a portion of the data units. Regression is then used to build a prediction model of the processing time of each data unit. Using the estimated processing times, load balancing is achieved with a greedy algorithm that distributes the units across a parallel environment. Discrete event simulation serves as the application for this research. Comparisons between equal data partitioning and the proposed methodology indicate that time savings and an even load balance can be achieved.
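The three steps named in the abstract (screening, regression, greedy distribution) can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the single predictor `sizes`, the unit names, the measured times, and the longest-first greedy rule are all assumptions made for the example, since the abstract does not specify the regression model or the exact greedy rule.

```python
import heapq


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def greedy_partition(predicted, n_procs):
    """Assign units, longest predicted time first, to the least-loaded processor.

    This is one common greedy rule, assumed here for illustration.
    """
    heap = [(0.0, p) for p in range(n_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    parts = [[] for _ in range(n_procs)]
    ordered = sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)
    for unit, time in ordered:
        load, p = heapq.heappop(heap)  # least-loaded processor so far
        parts[p].append(unit)
        heapq.heappush(heap, (load + time, p))
    return parts


# Hypothetical data units described by a single made-up predictor ("size").
sizes = {"u0": 10, "u1": 80, "u2": 30, "u3": 60, "u4": 20, "u5": 40}

# Step 1: screening experiment. Only a portion of the units is actually run;
# these measured times are invented for the example.
screened = {"u0": 1.0, "u1": 8.0, "u3": 6.0}

# Step 2: regression on the screened units, then prediction for every unit.
intercept, slope = fit_line(
    [sizes[u] for u in screened], [screened[u] for u in screened]
)
predicted = {u: intercept + slope * s for u, s in sizes.items()}

# Step 3: greedy distribution of all units over two processors.
# For simplicity, the screened units are scheduled again here.
parts = greedy_partition(predicted, 2)
loads = [sum(predicted[u] for u in part) for part in parts]
print(parts, loads)
```

An equal partition that simply cut the unit list in half would ignore the predicted times. The greedy pass instead places each unit on whichever processor currently has the smallest predicted load, which is the balancing idea the abstract describes.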
Library of Congress Subject Headings
Multiprocessors--Data processing; Statistics; Parallel processing (Electronic computers)
Publication Date
9-2016
Document Type
Thesis
Student Type
Graduate
Degree Name
Industrial and Systems Engineering (MS)
Department, Program, or Center
Industrial and Systems Engineering (KGCOE)
Advisor
Rachel Silvestrini
Advisor/Committee Member
Katie McConky
Recommended Citation
Hidalgo Murillo, Manuel E., "Using Statistical Analysis to Improve Data Partitioning in Algorithms for Data Parallel Processing Implementation" (2016). Thesis. Rochester Institute of Technology. Accessed from
https://repository.rit.edu/theses/9388
Campus
RIT – Main Campus
Plan Codes
ISEE-MS
Comments
Physical copy available from RIT's Wallace Library at QA76.5 .H44 2016