http://www.computer.org/parascope/#journal
A godsend of a resource for the likes of me, who does research on Parallel Computing and HPC.
I definitely need to read up on most of the journals listed here, because the paper is due at the end of this month. Which reminds me, I have yet to find an explanation for the anomalous behaviour from 1K-10K (I have coined it such because of the anomalous behaviour in the tests from chunk size 1K to 10K). Possible explanations should crop up from hardware performance analysis using microbenchmarks, but that's the last resort. I should be able to find similar tests and results from other studies, right?
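If it does come to microbenchmarks, what I have in mind is something as plain as an MPI ping-pong sweep over that same 1K-10K range to see whether the network itself misbehaves at those message sizes. This is only a sketch, not my actual test harness; the repetition count and the 1K step are arbitrary, and it assumes exactly two ranks (e.g. mpirun -np 2).

/* Ping-pong latency sweep over the 1K-10K message range (sketch only). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int reps = 1000;                       /* arbitrary repetition count */
    for (int size = 1024; size <= 10240; size += 1024) {
        char *buf = malloc(size);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();
        if (rank == 0)
            printf("%5d bytes: %.3f us round-trip\n", size, (t1 - t0) / reps * 1e6);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}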
I'm starting to believe that I might be the only one with such a setup, and with a test problem like the one I have. Well, technically it's not really a parallel algorithm, but the idea of load delegation à la scheduling is something that people have definitely studied already. However, I have not found any resources showing that what I'm trying to do has been done before, with significant or conclusive results to back up the claims.
But for now, I'm still thinking that the performance degradation/irregularity from 1K-10K must be related to some hardware properties associated with the message sizes. After all, I have implemented the solution to the parallel prime number finding problem using MPI and an Ethernet interconnection network (with a hub in one setup and a switch in the other). The degradation/irregularity could also be brought about not only by hardware features and capabilities, but also by the actual scheduling algorithm being employed.
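For context, each slave's job in my setup boils down to counting the primes in whatever chunk it is handed. Roughly like the sketch below, which is not my actual code: plain trial division stands in for the real primality test, and the names and the sample chunk bounds are just placeholders.

/* Simplified worker-side kernel: count primes in the half-open chunk [lo, hi). */
#include <stdbool.h>
#include <stdio.h>

/* Trial division -- a stand-in, not necessarily the test actually used. */
static bool is_prime(long n)
{
    if (n < 2) return false;
    for (long d = 2; d * d <= n; d++)
        if (n % d == 0) return false;
    return true;
}

long count_primes_in_chunk(long lo, long hi)
{
    long count = 0;
    for (long n = lo; n < hi; n++)
        if (is_prime(n)) count++;
    return count;
}

int main(void)
{
    /* Example: count primes in one sample chunk. */
    printf("%ld primes in [1000, 2000)\n", count_primes_in_chunk(1000, 2000));
    return 0;
}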
I have only been using blocking I/O for the message passing from the slaves to the master. One issue could be the waiting time brought about by the first-come-first-served (FCFS) scheduling of the master when it comes to collecting results and distributing work. I have yet to confirm this, but on-paper analysis could be sufficient to explain some of the irregularities and performance degradation.
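To make the FCFS concern concrete, the master/slave exchange I have in mind looks roughly like the sketch below: the master blocks in MPI_Recv on MPI_ANY_SOURCE, so slaves are serviced strictly in arrival order, and any slave that finishes while the master is busy handing out the next chunk just sits and waits. This is a sketch only; the tags, the chunk and problem sizes, and the do_chunk() stub (which would be the prime counting kernel from the previous sketch) are placeholders, not my actual code.

/* Blocking, first-come-first-served master/slave skeleton (sketch). */
#include <mpi.h>
#include <stdio.h>

#define TAG_WORK   1
#define TAG_RESULT 2
#define TAG_STOP   3

/* Placeholder for the real per-chunk computation (e.g. counting primes). */
static long do_chunk(long lo, long hi) { return hi - lo; }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const long total = 1000000, chunk = 10000;   /* arbitrary sizes */

    if (rank == 0) {                             /* master */
        long next = 0, sum = 0;
        int active = 0;

        /* Seed every slave with an initial chunk. */
        for (int w = 1; w < nprocs; w++, next += chunk, active++) {
            long b[2] = { next, next + chunk };
            MPI_Send(b, 2, MPI_LONG, w, TAG_WORK, MPI_COMM_WORLD);
        }

        while (active > 0) {
            long result;
            MPI_Status st;
            /* Blocking receive: the first slave to report is served first (FCFS). */
            MPI_Recv(&result, 1, MPI_LONG, MPI_ANY_SOURCE, TAG_RESULT,
                     MPI_COMM_WORLD, &st);
            sum += result;
            if (next < total) {
                long b[2] = { next, next + chunk };
                MPI_Send(b, 2, MPI_LONG, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                next += chunk;
            } else {
                long stop[2] = { 0, 0 };
                MPI_Send(stop, 2, MPI_LONG, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
                active--;
            }
        }
        printf("total = %ld\n", sum);
    } else {                                     /* slave */
        for (;;) {
            long b[2];
            MPI_Status st;
            MPI_Recv(b, 2, MPI_LONG, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            long r = do_chunk(b[0], b[1]);
            MPI_Send(&r, 1, MPI_LONG, 0, TAG_RESULT, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}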
Another issue worth looking into would be the memory access and paging methods employed by the Linux kernel. I am not yet familiar with the internals, but I intend to look into them as I try to see whether the kernel has something to do with issues when it comes to handling the memory allocation (malloc and free) of irregular block sizes (those not a power of 2). Or, it might still be the actual hardware details that lie beneath the OS, which I might still have to investigate.
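If I ever get to that, a crude first probe would be to time malloc/free cycles for power-of-2 versus non-power-of-2 block sizes and see whether the allocator treats them differently. The sizes and iteration count below are arbitrary; this is only an illustration of the idea, not a benchmark I have actually run.

/* Crude malloc/free timing probe: power-of-2 vs "irregular" block sizes. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static double time_alloc(size_t size, int iters)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        char *p = malloc(size);
        if (!p) { perror("malloc"); exit(1); }
        memset(p, 0, size);        /* touch the block so pages are really mapped */
        free(p);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    const int iters = 100000;
    const size_t sizes[] = { 1024, 1000, 4096, 5000, 8192, 10000 };
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        printf("%6zu bytes: %.3f s for %d malloc/free cycles\n",
               sizes[i], time_alloc(sizes[i], iters), iters);
    return 0;
}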
So far, these are still just ideas which I may look into and elaborate on as I go along with my reading. I hope someone else has gone into these details, so that I can spare myself from having to research things I shouldn't be worrying about at this stage of my thesis.
Or, I could go until summer or the next semester to finish my thesis. But I wouldn't want that, would I? It's bad enough that the experiment I'm doing is hard, let alone the analysis and actual research involved in the write-up. Now I appreciate why theses are usually done at least in pairs. It gets really lonely as time goes on...
Chilled.