James Bednar - SOSA: The Scalable Open-Source Analysis Stack
Description: Are you a researcher, scientist, engineer, data analyst, or in some other technical role where you find yourself limited by your domain’s software stack? Many technical disciplines are built around their own legacy sets of tools for storing, reading, processing, plotting, analyzing, modeling, and exploring data. Typical tools in wide use are tied to outdated architectures and assumptions, and are not cloud or remote friendly (as they are tied to a local desktop GUI or OS), not scalable (as they are tied to a single CPU either by a license or from software limitations), and not general purpose (with expertise concentrated in a small group of maintainers and users). Given that the vast majority of tasks involved in any data processing are shared across different academic disciplines and industries, could there be a better way?
Yes! Consider using SOSA: The Scalable Open-Source Analysis Stack. SOSA is a collection of interoperable tools that form a solid basis for processing data of almost any kind. Some of these tools are only a couple of years old, while others have been in wide use for a decade or more, but they are all actively being maintained, extended, and applied to a wide variety of research areas and problems. These tools are domain independent (validated in many different contexts), scalable (from laptops to compute clusters to petabyte-scale supercomputers), scriptable (for parameter searches or automation), cloud friendly (usable locally or remotely with any file storage system), runnable on CPUs or GPUs, compositional (usable independently or together), instantly visualizable (at full scale, without subsampling), interactive (in web browsers), shareable (with results as HTML documents or apps), and open source (for both commercial and academic use).
In this talk I'll present the packages in SOSA, including Parquet, Kerchunk, pandas, xarray, RAPIDS, Dask, Numba, hvPlot, Panel, and Jupyter. I'll discuss how they achieve scalability, cloud friendliness, and the other properties listed above. I'll also show how you can customize this stack to apply to your own specific domain, with examples from a wide variety of existing scientific and research areas. Let SOSA do all the heavy lifting, and you can focus on your own domain!
Bio: Jim Bednar is the Director of Custom Services at Anaconda, Inc. Dr. Bednar holds a Ph.D. in Computer Science from the University of Texas, along with degrees in Electrical Engineering and Philosophy. He has published more than 50 papers and books about the visual system, software development, and reproducible science. Dr. Bednar manages the HoloViz project, a collection of open-source Python tools that includes Panel, hvPlot, Datashader, HoloViews, GeoViews, Param, Lumen, and Colorcet. Dr. Bednar was a Lecturer and Reader in Computational Neuroscience at the University of Edinburgh from 2004-2015, and previously worked in hardware engineering and data acquisition at National Instruments.