Research
Generating numerical solutions of the Eikonal equation, including its
many variations, have a broad range of applications both in the natural
sciences and in computational analysis. Efficient solvers on
cutting-edge, parallel architectures require new algorithms that
are not, in many cases, optimal, but that are designed to allow
asynchronous solution updates and have limited memory access
patterns. Our proposed method is appropriate for the type of fine-grained
parallelism found on modern massively-SIMD architectures, such as
graphics processors, and takes into account the particular constraints
and capabilities of these computing platforms.
Eikonal equation solution on parallel systems
The finite element method (FEM) is a widely employed numerical
technique for approximating the solution of partial differential equations
(PDEs) in various science and engineering applications. Many of these
applications benefit from fast execution of the FEM pipeline. One way to
accelerate the FEM pipeline is by exploiting advances in modern
computational hardware, such as the many-core streaming processors
like the graphical processing unit (GPU). In this work, we present the
algorithms and data-structures necessary to move the entire FEM
pipeline to the GPU.
FEM on many-core architecture
The levelset method has been used in a large number of applications,
ranging from geometry, fluid mechanics and computer vision to
manufacturing processes and virtually any applications that require
interface tracking. This study presents a parallel algorithm for solving the
levelset equation on fully unstructured 2D or 3D meshes or manifolds.
The method is suitable for the type of fine-grained parallelism found on
modern massively-SIMD architectures such as graphics processors and
takes into account the particular constraints and capabilities of these
computing platforms. In this work, we propose to combine the
narrowband scheme and domain decomposition for efficient levelset
equation solving. We present the efficient narrowband fast iterative
method (nbFIM) to compute the distance transform by solving an eikonal
equation and the patched narrowband (patchNB) scheme to evolve the
embedding. We also propose the Hybrid Gathering parallelism strategy
to enable regular and lock-free computations in both the nbFIM and
patchNB.
Solving levelset equations on GPUs
Numerical methods for elliptic partial dierential equations (PDEs) within
both continuous (CG) and discontinuous Galerkin (DG) frameworks
share the same general structure: local (elemental) matrix generation
followed by a global linear system assembly and solve. The lack of
inter-element communication and easily parallelizable nature of the local
matrix generation stage coupled with the parallelization techniques
developed for the linear system solvers make a numerical scheme for
elliptic PDEs a good candidate for implementation on streaming
architectures such as modern graphical processing units (GPUs). We
propose an algorithmic pipeline for mapping an elliptic finite element
method to the GPU and perform a case study for a particular method
within the DG framework, namely the hybridized DG (HDG) method.
This study provides comparison between CPU and GPU
implementations of the method as well as highlights certain
performance-crucial implementation details. The choice of the HDG
method for the case study was dictated by the computationally-heavy
local matrix generation stage as well as the reduced trace-based
communication pattern, which together make the method amenable to
the fine-grained parallelism of GPUs. We demonstrate that the HDG
method is well-suited for GPU implementation, obtaining total speedups
on the order of 30 times over a serial CPU implementation.
Hybridizable Discontinous Galerkin method on GPUs
The simulation of electrical activity in the heart, such as normal and
abnormal ventricular rhythms and ischemia, utilize computational
methods that rely on an underlying geometric model, or polygonal mesh
of cardiac tissues and boundaries. Because of the complex shape of
many biological structures, it is often difficult to create meshes that
conform to the boundaries between distinct regions. The resulting
meshes can be non-conformal, i.e., they have element faces that do not
align with the surface tangents and the elements represent a smooth
surface as a jagged boundary. We hypothesize that these jagged,
non-conformal meshes produce local concentrations of current that lead
to artifacts large enough to distort the resulting potential ?elds and
generate misleading results. In simulations of acute ischemia, these
artifacts can alter the location and severity of the epicardial elevations
and depressions, which, in turn, can impact clinical diagnosis. In the
case of defibrillation, these artifacts can distort the current density
computed through thin structures such as the myocardial wall.
Conformal meshing
Segmentation of the left atrium wall from delayed enhancement MRI is
challenging because of inconsistent contrast combined with noise and
high variation in atrial shape and size. This paper presents a method for
left-atrium wall segmentation by using a novel sophisticated
mesh-generation strategy and graph cuts on a proper ordered graph.
The mesh is part of a template/model that has an associated set of
learned intensity features. When this mesh is overlaid onto a test
image, it produces a set of costs on the graph vertices which eventually
leads to an optimal segmentation. The novelty also lies in the
construction of proper ordered graphs on complex shapes and for
choosing among distinct classes of base shapes/meshes for automatic
segmentation. We evaluate the proposed segmentation framework
quantitatively on simulated and clinical cardiac MRI.
Meshing and image segmentation
Zhisong Fu's homepage