Bibliographic record

Title: Very fast finite element Poisson solvers on lower precision accelerator hardware : a "Proof-of-Concept" study for NVIDIA Tesla V100 / D. Ruda, S. Turek, D. Ribbrock, P. Zajac
Authors: Ruda, Dustin ; Turek, Stefan ; Ribbrock, Dirk ; Zajac, Peter
Published: [Dortmund] : [Technische Universität Dortmund, Fakultät für Mathematik], July 2021
Edition: Electronic resource
Extent: 1 online resource (22 pages) : diagrams
Series: Ergebnisberichte angewandte Mathematik ; no. 647
Subject headings: Finite element method
URN: urn:nbn:de:hbz:6:2-1526803
DOI: 10.17877/DE290R-22230
Accessibility: The document is publicly accessible online.
Files
Very fast finite element Poisson solvers on lower precision accelerator hardware [0.89 MB]
Abstract

Recently, accelerator hardware in the form of graphics cards including Tensor Cores, specialized for AI, has significantly gained in importance in the domain of high performance computing. For example, NVIDIA's Tesla V100 promises a computing power of up to 125 TFLOP/s achieved by Tensor Cores, but only if half precision floating point format is used. We describe the difficulties and discrepancy between theoretical and actual computing power if one seeks to use such hardware for numerical simulations, i.e., solving partial differential equations with a matrix-based finite element method, with numerical examples. If certain requirements, namely low condition numbers and many dense matrix operations, are met, the indicated high performance can be reached without an excessive loss of accuracy. A new method to solve linear systems arising from Poisson's equation in 2D that meets these requirements, based on "prehandling" by means of hierarchical finite elements and an additional Schur complement approach, is presented and analyzed. We provide numerical results illustrating the computational performance of this method and compare it to a commonly used (geometric) multigrid solver on standard hardware. It turns out that we can exploit nearly the full computational power of Tensor Cores and achieve a significant speed-up compared to the standard methodology without losing accuracy.
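
The abstract's point that half precision is only usable when condition numbers are low can be illustrated with a small sketch. This is not the method of the report: it is a hypothetical 1D model problem (the report treats 2D), discretized with standard finite differences and solved with the Thomas algorithm, where every arithmetic result is rounded to IEEE 754 half precision in software. The function names `to_half`, `identity`, and `solve_poisson_1d` are assumptions made for this illustration.

```python
import math
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def identity(x):
    """No rounding: keep Python's native double precision."""
    return x


def solve_poisson_1d(n, rnd):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1) with u(0) = u(1) = 0.

    Uses n interior grid points and the system tridiag(-1, 2, -1) u = h^2 f,
    solved by the Thomas algorithm. `rnd` is applied after every arithmetic
    operation to emulate a lower working precision.
    Returns the maximum error against the exact solution sin(pi x).
    """
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    rhs = [rnd(h * h * math.pi ** 2 * math.sin(math.pi * x)) for x in xs]
    diag = [rnd(2.0)] * n

    # Forward elimination of the sub-diagonal entries (all equal to -1).
    for i in range(1, n):
        w = rnd(1.0 / diag[i - 1])
        diag[i] = rnd(2.0 - w)
        rhs[i] = rnd(rhs[i] + rnd(w * rhs[i - 1]))

    # Back substitution; the super-diagonal entries are also -1.
    u = [0.0] * n
    u[n - 1] = rnd(rhs[n - 1] / diag[n - 1])
    for i in range(n - 2, -1, -1):
        u[i] = rnd(rnd(rhs[i] + u[i + 1]) / diag[i])

    return max(abs(ui - math.sin(math.pi * x)) for ui, x in zip(u, xs))


if __name__ == "__main__":
    # The condition number of this matrix grows like 4 / (pi * h)^2,
    # so refining the grid makes the half-precision solve less reliable.
    for n in (15, 31, 63, 127):
        err64 = solve_poisson_1d(n, identity)
        err16 = solve_poisson_1d(n, to_half)
        print(f"n = {n:4d}   double error = {err64:.2e}   half error = {err16:.2e}")
```

In double precision the error is dominated by the discretization and shrinks under refinement, while the emulated half-precision solve is limited by rounding errors amplified by the growing condition number. This is the kind of behavior that motivates reformulating the problem so that the matrices involved are well conditioned.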

Usage note
This media work may be used within the scope of German copyright law.