I am from Xi'an, a very old city in China. I graduated with an MSc in High Performance Computing from the Edinburgh Parallel Computing Centre (EPCC) at the University of Edinburgh, having earlier earned a BEng in Telecommunication Engineering from Xidian University in China. Before coming to Simula, I worked as a software engineer developing embedded software for GPS chips.
Here, I am a PhD student in the Computational Geoscience Group, researching how to efficiently use multicore-based parallel computers for solving partial differential equations (PDEs).
- Parallel programming models and algorithms
- Numerical solution of PDEs
- Performance optimization of parallel codes
- Multi-core architectures
1. Parallelizing a Lumped Particle Model with OpenMP on Multi-core Computers
Sand and mud particles suspended in a turbidity flow are initially transported by the flow, but will eventually settle onto the bottom of a river or the sea floor. This process is at the heart of computational studies of how sedimentary rocks are formed by erosion and deposition. A new method has been proposed that describes particle dispersion and diffusion by tracing groups of particles, i.e., "lumped particles", rather than tracing each particle individually. This model makes simulations much more practical when the number of particles is very large. More details of the algorithm can be found in Al-Khayat et al.
In this project, I am in charge of the parallel implementation, performance optimization, and parallel simulations. OpenMP, a shared-memory programming model, is adopted to parallelize the C++ code. Good parallel performance is essential for detailed studies of the effects of Brownian diffusion and uniform particle transport with this new numerical method. The whole simulation is divided into several steps, most of which run in parallel.
We have tested our code on several platforms. Initially, the speedup of the parallel parts was not ideal, due to the complex cache environment of multi-core architectures and NUMA effects. To optimize, we improve the OpenMP code's performance by controlling data locality with the "first touch in parallel" principle, exploiting the first-touch policy that many Linux/Unix operating systems use to distribute memory pages across NUMA nodes. We obtained better speedups on all platforms; on some of them the code even ran twice as fast. Since the memory configurations of different platforms vary, how to "touch" memory for good data locality matters substantially, and we are still studying this part. Other aims are to parallelize the remaining serial steps, exposing more parallelism, and to simulate more complex cases.
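A minimal sketch of "first touch in parallel" (illustrative only; it assumes a Linux-style first-touch page placement policy, and the function name is made up): the data is initialized with the same static OpenMP schedule that the compute loop later uses, so each memory page is first written, and hence placed, on the NUMA node of the thread that will work on it.

```cpp
#include <cstdlib>

// Sum n elements after a parallel first-touch initialization.
double first_touch_sum(long n) {
    double* x = static_cast<double*>(std::malloc(n * sizeof(double)));

    // First touch: each thread writes the chunk it will later read, so the
    // OS places those memory pages on that thread's NUMA node.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        x[i] = 1.0;

    // Compute loop with the identical schedule: threads mostly access
    // pages that are local to their NUMA node.
    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += x[i];

    std::free(x);
    return sum;
}
```

Had the initialization loop been serial, the master thread would have touched every page first and all data would have landed on one NUMA node, creating the remote-access bottleneck described above.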
The following are snapshots of a simulation.
A full simulation video is available below.
2. A numerical comparison of three parallel solvers for a mathematical model of sedimentation
Figure captions: the initial height of the basin, in meters, with the outlines of Lake Okeechobee and the Kissimmee River dotted in white; the fractions of mud after 10,000 years; the height of the basin after 10,000 years.
Simulating the evolution of marine basins is a challenging computational task due to the complex interplay between erosion, deposition and mass flow of sand, shale, silt, etc. Uncertainties in the transport modes and flow properties also require a large number of trial computations with different model settings and coefficients; see, e.g., Clark et al. Parallel computing is thus not only indispensable for achieving high spatial and temporal resolution, but also of great importance for handling the repeated computations.
In this project, I developed three parallel solvers, each based on a different numerical method (fully-explicit, semi-implicit and fully-implicit), and compared their stability, accuracy and computational cost on a test case of a lake with a river flowing into it. All implementations are parallelized with MPI, and the semi-implicit and fully-implicit solvers additionally use the external parallel numerical package Trilinos. All implementations achieve satisfactory parallel efficiency on multi-core clusters.
The fully-explicit method is the most straightforward to implement, but it only achieves first-order accuracy and imposes a strict restriction on the time step size, which substantially impairs its practicality in large simulations. Both the semi-implicit and fully-implicit methods offer much better numerical stability and second-order accuracy; of the two, the semi-implicit method is faster and shows better stability and accuracy. However, neither achieves ideal second-order accuracy when the spatial resolution is high and the time step size is very small, due to round-off error.
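The explicit time-step restriction can be made concrete with a toy problem (this is an illustration using the 1D diffusion equation u_t = D u_xx, not the project's sedimentation model): the fully-explicit scheme is only stable when dt <= dx^2/(2D), so halving the grid spacing forces roughly four times as many time steps.

```cpp
#include <cstddef>
#include <vector>

// Largest stable time step for the explicit scheme on the 1D diffusion
// equation u_t = D u_xx with grid spacing dx.
double max_stable_dt(double dx, double D) {
    return dx * dx / (2.0 * D);
}

// One fully-explicit (forward Euler) update of the interior points of u,
// with the boundary values held fixed.
void explicit_step(std::vector<double>& u, double D, double dx, double dt) {
    const std::vector<double> un = u;  // copy of the previous time level
    const double r = D * dt / (dx * dx);
    for (std::size_t i = 1; i + 1 < u.size(); ++i)
        u[i] = un[i] + r * (un[i + 1] - 2.0 * un[i] + un[i - 1]);
}
```

Implicit schemes avoid this restriction by solving a linear system each step, which is where a parallel solver package such as Trilinos comes in.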
Next, we will try to extend the model to cover sediment compaction and tectonics, and to optimize the solvers with mixed-mode programming, i.e., MPI combined with OpenMP or PThreads, for better performance. Techniques such as "first touch in parallel" and "thread binding" will be studied further.
O. Al-Khayat, A. M. Bruaset, and H. P. Langtangen. "A lumped particle modeling framework for simulating particle transport in fluids". Communications in Computational Physics.
S. R. Clark, A. M. Bruaset, T. O. Sømme, and T. Løseth. "A Flexible Stochastic Approach to Constraining Uncertainty in Forward Stratigraphic Models". Proceedings of the 18th IMACS World Congress, MODSIM09, Cairns, 2009.
The Trilinos Project Home Page. http://trilinos.sandia.gov/.