Research
Generating numerical solutions of the Eikonal equation, including its many variations, have a broad range of applications both in the natural sciences and in computational analysis. Efficient solvers on cutting-edge, parallel architectures require new algorithms that
are not, in many cases, optimal, but that are designed to allow asynchronous solution updates and have limited memory access patterns. Our proposed method is appropriate for the type of fine-grained parallelism found on modern massively-SIMD architectures, such as graphics processors, and takes into account the particular constraints and capabilities of these computing platforms.
+ Home
+ Research
+ Publications
+ Personal
Eikonal equation solution on parallel systems
The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core streaming processors like the graphical processing unit (GPU). In this work, we present the algorithms and data-structures necessary to move the entire FEM pipeline to the GPU.
FEM on many-core architecture
The levelset method has been used in a large number of applications, ranging from geometry, fluid mechanics and computer vision to manufacturing processes and virtually any applications that require interface tracking. This study presents a parallel algorithm for solving the levelset equation on fully unstructured 2D or 3D meshes or manifolds. The method is suitable for the type of fine-grained parallelism found on modern massively-SIMD architectures such as graphics processors and takes into account the particular constraints and capabilities of these computing platforms. In this work, we propose to combine the narrowband scheme and domain decomposition for efficient levelset equation solving. We present the efficient narrowband fast iterative method (nbFIM) to compute the distance transform by solving an eikonal equation and the patched narrowband (patchNB) scheme to evolve the embedding. We also propose the Hybrid Gathering parallelism strategy to enable regular and lock-free computations in both the nbFIM and patchNB.
Solving levelset equations on GPUs
Numerical methods for elliptic partial dierential equations (PDEs) within both continuous (CG) and discontinuous Galerkin (DG) frameworks share the same general structure: local (elemental) matrix generation followed by a global linear system assembly and solve. The lack of inter-element communication and easily parallelizable nature of the local matrix generation stage coupled with the parallelization techniques developed for the linear system solvers make a numerical scheme for elliptic PDEs a good candidate for implementation on streaming architectures such as modern graphical processing units (GPUs). We propose an algorithmic pipeline for mapping an elliptic finite element method to the GPU and perform a case study for a particular method within the DG framework, namely the hybridized DG (HDG) method. This study provides comparison between CPU and GPU implementations of the method as well as highlights certain performance-crucial implementation details. The choice of the HDG method for the case study was dictated by the computationally-heavy local matrix generation stage as well as the reduced trace-based communication pattern, which together make the method amenable to the fine-grained parallelism of GPUs. We demonstrate that the HDG method is well-suited for GPU implementation, obtaining total speedups on the order of 30 times over a serial CPU implementation.
Hybridizable Discontinous Galerkin method on GPUs
The simulation of electrical activity in the heart, such as normal and abnormal ventricular rhythms and ischemia, utilize computational methods that rely on an underlying geometric model, or polygonal mesh of cardiac tissues and boundaries. Because of the complex shape of many biological structures, it is often difficult to create meshes that conform to the boundaries between distinct regions. The resulting meshes can be non-conformal, i.e., they have element faces that do not align with the surface tangents and the elements represent a smooth surface as a jagged boundary. We hypothesize that these jagged, non-conformal meshes produce local concentrations of current that lead to artifacts large enough to distort the resulting potential ?elds and generate misleading results. In simulations of acute ischemia, these artifacts can alter the location and severity of the epicardial elevations and depressions, which, in turn, can impact clinical diagnosis. In the case of defibrillation, these artifacts can distort the current density computed through thin structures such as the myocardial wall.
Conformal meshing
Segmentation of the left atrium wall from delayed enhancement MRI is challenging because of inconsistent contrast combined with noise and high variation in atrial shape and size. This paper presents a method for left-atrium wall segmentation by using a novel sophisticated mesh-generation strategy and graph cuts on a proper ordered graph. The mesh is part of a template/model that has an associated set of learned intensity features. When this mesh is overlaid onto a test image, it produces a set of costs on the graph vertices which eventually leads to an optimal segmentation. The novelty also lies in the construction of proper ordered graphs on complex shapes and for choosing among distinct classes of base shapes/meshes for automatic segmentation. We evaluate the proposed segmentation framework quantitatively on simulated and clinical cardiac MRI.
Meshing and image segmentation
Zhisong Fu's homepage