Abstract
Bayesian analysis using reversible jump Markov chain Monte Carlo (RJMCMC) algorithms improves the measurement accuracy, resolution and sensitivity of full waveform laser detection and ranging (LaDAR), but at a significant computational cost. Parallel processing has the potential to reduce the processing time substantially. However, although several strategies exist for parallelizing Markov chain Monte Carlo (MCMC), adapting these strategies to RJMCMC may degrade parallel performance.
In this paper, we describe an approach to parallel RJMCMC processing that combines data and sampling parallelism in a single framework. This approach, Data Parallel State Space Decomposed RJMCMC (DP SSD-RJMCMC), can be adapted to different parallel cluster sizes, improves sampling efficiency and maintains parameter estimation accuracy. Formally, it forms a group of parallel chains by decomposing the state space into subsets of parameter space. Each subset has a different but restricted dimensionality, and is assigned an independent chain of variable length. To further improve load balancing, we also employ data decomposition, forming a task queue and conducting dynamic task allocation. The MPI-based implementation on a 32-node Beowulf cluster leads to significant speedup, typically of the order of 15–25 times, while maintaining the estimation accuracy.
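The two ideas named above, restricting each chain to a subset of model dimensions and handing out work from a task queue, can be illustrated with a small sketch. It is not the paper's MPI implementation. The function names, the contiguous split of model orders, and the cost figures are all assumptions made for illustration, and the scheduling is simulated rather than run in parallel.

```python
import heapq


def decompose_dimensions(k_max, n_chains):
    """Split model orders 0..k_max into n_chains contiguous subsets.

    Assumption: a plain contiguous split; the paper's actual
    decomposition rule is not given in the abstract.
    """
    orders = list(range(k_max + 1))
    base, extra = divmod(len(orders), n_chains)
    subsets, start = [], 0
    for i in range(n_chains):
        size = base + (1 if i < extra else 0)
        subsets.append(orders[start:start + size])
        start += size
    return subsets


def dynamic_makespan(task_costs, n_workers):
    """Simulate a task queue: each task goes to the worker that frees up first."""
    free_at = [0.0] * n_workers
    heapq.heapify(free_at)
    for cost in task_costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


def static_makespan(task_costs, n_workers):
    """Baseline: tasks are dealt out round-robin before any work starts."""
    loads = [0.0] * n_workers
    for i, cost in enumerate(task_costs):
        loads[i % n_workers] += cost
    return max(loads)


if __name__ == "__main__":
    print(decompose_dimensions(7, 3))
    costs = [5, 1, 5, 1, 5, 1]  # hypothetical per-task run times
    print(static_makespan(costs, 2), dynamic_makespan(costs, 2))
```

With uneven task costs, as would arise from chains of variable length, the simulated queue finishes earlier than the fixed round-robin assignment. That is the load-balancing effect dynamic allocation is meant to provide.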
Original language | English |
---|---|
Pages (from-to) | 383-399 |
Number of pages | 17 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 73 |
Issue number | 4 |
Publication status | Published - Apr 2013 |