The National Science Data Fabric (NSDF) pilot, a collaborative effort connecting an open network of institutions, offers a modular, easily accessible data delivery environment. Configurable for both individual and shared scientific use, the environment is designed to take full advantage of economies of scale, filling a crucial gap in the current computational infrastructure. Funded by the National Science Foundation, the pilot promotes equitable access to data and cyberinfrastructure resources across a wide range of scientific domains. The active involvement of Historically Black Colleges and Universities, the Minority Serving Cyberinfrastructure Consortium, and Hispanic-Serving Institutions informs the NSDF’s development and advances inclusivity in data-driven science.
The vision of the NSDF is to establish a globally connected infrastructure that overcomes the limitations imposed by extreme-scale data. Its mission is to democratize access to large-scale scientific data by developing scalable solutions for data storage, movement, and processing that can be deployed on a variety of platforms, including commodity hardware and cloud computing.
Werner Sun, Director of CHESS IT, described the collaboration: "CHESS is collaborating with NSDF to develop a set of applications for data-intensive science, focusing on real-time visualization of large three-dimensional datasets. The eventual goal is for CHESS to be part of this national cyberinfrastructure, facilitating the transport of CHESS data across the country for analysis by other researchers."
Commissioned in November 2023, the NSDF Entry Point at CHESS is a customized server that connects CHESS to NSDF storage, compute, and networking components. Through the Entry Point, CHESS users gain access to NSDF dashboards and other easy-to-use, scalable tools, along with a complete software stack for accessing data services that hides much of the complexity of high-speed data movement.
A pivotal development in this collaboration is the NSDF dashboard built on OpenViSUS, a data-intensive analytics and visualization platform that streamlines data collection, improves data quality, and increases scientific productivity. By enabling real-time visualization of large three-dimensional datasets collected at CHESS, OpenViSUS lets experimenters perform preliminary analysis at the beamline, with data visualized in as little as 20 minutes. The integrated NSDF dashboards provide interactive data quality monitoring, so researchers can identify and address issues while data collection is still under way. The dashboards can be accessed onsite or offsite, allowing remote users to monitor experimental progress from their home institutions.
“We started with beamlines that are producing a lot of data: the 3A/FAST [Forming and Shaping Technology] beamline and the 4B/QM2 [Q-Mapping for Quantum Materials] beamline as well. Eventually we hope to incorporate more and more beamlines into this system,” says Dr. Sun.
“FAST users routinely collect several tens of terabytes of data per week, and they need to be able to rapidly visualize those datasets in order to make informed decisions about data collection strategy during complex in-situ experiments. The OpenViSUS dashboard is game-changing in terms of allowing this, especially for the hybrid on-site/remote teams we're seeing more often these days,” said Kate Shanks, FAST beamline scientist.
In the short term, real-time data visualization aims to improve the quality of the data being collected by giving CHESS users greater visibility into their experiments and enabling adjustments on the fly. In the long term, the collaboration aligns with CHESS's broader data initiatives, which address the challenges of managing large datasets through hardware investments, development of data services, and training materials. It also supports CHESS's data curation under the FAIR principles, which call for data to be findable, accessible, interoperable, and reusable.
Dr. Sun emphasizes the collaboration's impact on diversity, equity, and inclusion: “It is a central goal of the NSDF – to democratize data access so that individuals, organizations, research groups, from institutions across the board, including minority-serving institutions and under-resourced institutions, have the ability to access and benefit from the data that is being collected at major research centers across the US.”
The NSDF/CHESS collaboration represents a significant stride in creating a federated data fabric focused on inclusivity, scalability, and real-time data visualization. As CHESS continues to be at the forefront of data-intensive science, this partnership will ultimately lead to more efficient use of limited resources: beam time, travel expenses, and the one-of-a-kind datasets collected at CHESS.
As the collaboration continues to unfold, it promises to benefit not only the CHESS community but also other synchrotrons and large scientific projects; the NSDF already works with the IceCube Neutrino Observatory, the XENONnT dark matter experiment, and the Materials Commons. The NSDF pilot sets the stage for a connected infrastructure that transcends the boundaries between data domains, advancing the ethos of open science in the United States and around the world. This partnership is more than a technological advancement; it represents a paradigm shift toward democratized, collaborative, accessible, and equitable scientific exploration.
In total, this project funds five Principal Investigators through NSF award #2138811 to the University of Utah, “Piloting the National Science Data Fabric: A Platform Agnostic Testbed for Democratizing Data Delivery”:
- Valerio Pascucci (University of Utah)
- Frank Wuerthwein (San Diego Supercomputer Center)
- Alexander Szalay (Johns Hopkins University)
- John Allison (University of Michigan)
- Michela Taufer (University of Tennessee)