Error-controlled Lossy Compression for Large-scale Scientific Simulation Datasets
Detroit, MI 48201
Today's large-scale high-performance computing (HPC) scientific applications produce extremely large volumes of data, so compressing the data before storage or transmission is critical to the success of many scientific research projects. Lossless compression, however, suffers from severely limited compression ratios on scientific data because of the effectively random trailing mantissa bits in floating-point numbers. Error-bounded lossy compression has been studied for years because it not only significantly reduces the data size (by one or two orders of magnitude) but also strictly controls the data distortion according to users' requirements. In this talk, I will describe a significant research issue in HPC: error-controlled lossy compression for scientific simulations. I will first introduce the research background and motivation: why lossy compression matters to HPC applications, and various use cases such as reducing I/O time, storage footprint, and memory footprint. Then, I will discuss some technical details, including different state-of-the-art lossy compressors for scientific datasets, compression quality assessment methodology, and evaluation results. Finally, I will conclude with a vision of the future directions in the lossy compression community.
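To make the "error-bounded" idea concrete, here is a minimal, hypothetical sketch of uniform scalar quantization with an absolute error bound, the core mechanism used in error-controlled lossy compressors such as SZ. This is an illustration, not the actual code of any particular compressor; the function names and the entropy-coding step (omitted here) are assumptions.

```python
import numpy as np

def quantize(data, abs_err_bound):
    """Map each value to an integer bin of width 2 * abs_err_bound.

    The integer bins are far more compressible (e.g., by an entropy
    coder, which a real compressor would apply next) than raw floats.
    """
    return np.round(data / (2 * abs_err_bound)).astype(np.int64)

def dequantize(bins, abs_err_bound):
    """Reconstruct the bin-center values.

    Because each value was rounded to the nearest bin center, the
    pointwise reconstruction error is at most abs_err_bound.
    """
    return bins * (2 * abs_err_bound)

# Example: a synthetic "simulation field" and a user-specified bound.
rng = np.random.default_rng(0)
data = rng.normal(size=100_000)
bound = 1e-3

recon = dequantize(quantize(data, bound), bound)
max_err = np.max(np.abs(recon - data))
assert max_err <= bound * (1 + 1e-9)  # distortion strictly under user control
```

The key property shown is that the user's error bound is a hard guarantee on every data point, unlike generic lossy codecs (e.g., image codecs) that only control average distortion.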
Dr. Sheng Di is an assistant computer scientist at Argonne National Laboratory and a senior member of IEEE. He received his master's degree from Huazhong University of Science and Technology in 2007 and his Ph.D. degree from The University of Hong Kong in 2012. With 10+ years of solid experience developing distributed computing projects (including Cloud, HPC/Grid, and P2P), his strengths include theoretical analysis, system design/development, and performance optimization. He has broad research interests, including scientific data compression, data analysis, fault tolerance, resource discovery in HPC, Grid computing, P2P and Cloud computing, analysis and prediction of Google workloads based on the Google trace, optimization of distributed resource allocation, virtual machine migration, etc. He is a pioneer of error-controlled lossy compressors for scientific datasets and of multi-level checkpoint/restart models for large-scale HPC simulations. He has published 90+ refereed journal and conference papers (including TPDS, TC, TCC, TKDE, JPDC, SC'XY, IPDPS, HPDC, ICPP, CLUSTER, IWQoS, CCGrid, HiPC, Grid, CLOUD, UCC, Euro-Par, ICPADS, and so on) and has served as a program committee member 20+ times. He is the recipient of the 2018 IEEE-Chicago Distinguished Mentoring Award.