Publication of research datasets is now a requirement of most funding agencies and journals. Data curation is the process of ensuring that these datasets are findable, accessible, and usable. In the era of Big Data, datasets of hundreds of gigabytes or more are increasingly common. Such large datasets create challenges for both curating and publishing data, as they often cannot be accessed on standard computer hardware or hosted in traditional online repositories. This presentation provides an overview of a collaborative process between the CU Boulder Libraries and CU Boulder Research Computing in which high-performance computing infrastructure is used to curate and publish gigabyte- and terabyte-scale datasets in a manner that makes them accessible to the research community.
The University of North Dakota (UND) Genomics Core has launched GenomEX 2.0, the first comprehensive and user-friendly bioinformatics platform powered by Oracle Cloud Infrastructure. This innovative platform enables biologists to seamlessly install over 13,000 bioinformatics tools, generate and execute custom code or command lines, and receive real-time guidance from an AI-based bioinformatics assistant, all through intuitive, one-click processes. To support the computing requirements of any bioinformatics tool, the platform runs on Oracle Cloud Infrastructure, which provides cloud-based high-performance computing environments that are fully secured (built-in security features and compliance certifications), personalized (adjustable CPU/GPU counts and memory/storage capacity), dedicated (resources available 24/7 without any queue), and customizable (users have administrator rights), at unbeatable pricing. Through the combined expertise of the UND Genomics Core and Oracle, GenomEX 2.0 emerges as a powerful and unique bioinformatics platform, providing every biologist with the freedom to explore biological data independently, regardless of their coding proficiency.
In this presentation, we will share information on topics such as metrics, our user survey, and other approaches. As part of the talk, we would also like to open a discussion and exchange of information about how other sites measure the effectiveness of their HPC environments.