TY - JOUR
T1 - Dimensioning scientific computing systems to improve performance of Map-Reduce based applications
AU - Castañé, Gabriel G.
AU - Núñez, Alberto
AU - Filgueira, Rosa
AU - Carretero, Jesús
N1 - Funding Information:
This research was partially supported by the Spanish Ministry of Science and Innovation under the grants TIN2010-16497 and TIN2009-14312-C02-01 and the Santander-UCM Programme to fund research groups (GR35/10-A - group number 910606).
PY - 2012
Y1 - 2012
AB - Map-Reduce is a programming model widely used for processing large data sets on scientific clusters. Most research efforts focus on enhancing the model proposed by Google and alleviating its drawbacks. The requirements of Map-Reduce-based applications are often unclear, owing to the difficulty of meeting the overall system throughput and of exploring alternatives that achieve a good trade-off between the performance of basic subsystems such as storage, networking and CPU. In this paper we present a comparative performance evaluation of scaling up scientific computing systems using a Map-Reduce application model. This work focuses specifically on medium-size multi-core systems, which researchers frequently use to run scientific applications. The scaling process targets the three main resources: computing power, communications and storage. By performing an extensive set of simulations with the iCanCloud simulator, we also show that the main bottlenecks of these applications, when executed on cluster systems, lie in the storage and network systems. Hence, to increase the overall performance of such applications, computing power must be scaled up in proportion to the network and storage systems.
KW - Map-Reduce applications
KW - Modeling and simulation
KW - Performance prediction
KW - Scientific applications
KW - Scientific clusters
UR - http://www.scopus.com/inward/record.url?scp=84896976812&partnerID=8YFLogxK
U2 - 10.1016/j.procs.2012.04.024
DO - 10.1016/j.procs.2012.04.024
M3 - Conference article
AN - SCOPUS:84896976812
SN - 1877-0509
VL - 9
SP - 226
EP - 235
JO - Procedia Computer Science
JF - Procedia Computer Science
T2 - 12th Annual International Conference on Computational Science 2012
Y2 - 4 June 2012 through 6 June 2012
ER -