NATIONAL ARTIFICIAL
INTELLIGENCE RESEARCH RESOURCE

Annual Meeting

March 10 - 13, 2026 – Hyatt Regency Crystal City, Arlington, VA

Tutorials

Invitees may select from the Tutorials below when they register. All Tutorials will take place on March 10 (Day 1) of the conference, with half-day and full-day options. Each registrant may register for a single full-day tutorial or for two non-overlapping half-day tutorials (one AM, one PM).

Abstracts include information about learning objectives, intended audience, skill and knowledge prerequisites, learner-provided equipment (typically a laptop with WiFi and a browser), and details on how the tutorial will support access to additional technology resources relevant to the tutorial.

The deadline to register for tutorials is February 27.

 

Half-Day AM Tutorials


Teaching in the AI Classroom: Expanding Access to Interactive Computing through JupyterHub and CloudBank

Eric Van Dusen, University of California Berkeley

Sean Morris, University of California Berkeley

The National AI Research Resource (NAIRR) seeks to broaden equitable access to AI education and computing. This tutorial will showcase a proven pathway for onboarding educators and students into AI-ready environments using open-source JupyterHub infrastructure deployed through the CloudBank–2i2c partnership. Over the past four years, UC Berkeley and partner institutions have supported thousands of learners and faculty—particularly from community colleges and minority-serving institutions—through scalable cloud-based Jupyter environments.

This session provides a hands-on introduction to deploying and teaching with these AI-ready hubs. Participants will experience the full workflow from user onboarding to GPU-enabled notebooks running foundational AI exercises. The focus will be on low-friction setup: how instructors and students can log in with a browser, access curated environments (Python, pandas, visualization libraries, and AI capabilities through small models or API access), and immediately begin exploring AI workflows without local installs or admin privileges.

Attendees will learn how to open Kubernetes-based JupyterHubs (via CloudBank and 2i2c) to support scalable AI education; how to configure and manage classroom environments for diverse teaching contexts (from Data 8–style computing courses to applied AI labs); how to integrate NAIRR-supported GPU resources such as NRP and Jetstream2 into interactive teaching; and how to build sustainable faculty communities around shared AI teaching infrastructure. The workshop will include guided activities on both the instructor and learner sides—creating and sharing notebooks, monitoring usage, and connecting to cloud GPU resources. Participants will leave with templates and documentation they can immediately adapt for their own courses, as well as links to open instructional materials.

Target Audience & Prerequisites: This tutorial is intended for faculty, instructional staff, and research computing professionals who teach or support AI, machine learning, data science, or computational STEM courses. No prior cloud engineering or DevOps experience is required. Participants should have basic familiarity with Python and with teaching or supporting computational coursework, though the hands-on exercises are designed to be accessible to beginners.

Learning Objectives: By the end of the session, participants will be able to (1) launch and operate an educational JupyterHub; (2) configure reproducible AI-ready environments; (3) connect notebooks to NAIRR-supported GPU resources; and (4) onboard students into cloud-based workflows using only a web browser.
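As a small preview of objective (2), a classroom hub environment is typically configured through JupyterHub's standard `jupyterhub_config.py` traitlets. The sketch below is illustrative only; the authenticator choice, user names, and limits are placeholders, not the tutorial's actual deployment:

```python
# jupyterhub_config.py - minimal sketch of a classroom hub configuration
# (authenticator, user list, and limits are illustrative placeholders)
c.JupyterHub.authenticator_class = "dummy"   # swap for CILogon/OAuth in production
c.Authenticator.allowed_users = {"student1", "student2"}
c.Spawner.default_url = "/lab"               # open JupyterLab by default
c.Spawner.mem_limit = "2G"                   # per-user memory cap
```

In Kubernetes-based deployments such as those run through CloudBank and 2i2c, the same settings are usually expressed in a Helm `values.yaml` rather than edited directly.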

Technology Requirements: Learners will need a laptop with WiFi and a modern web browser. No local installation of Python or software tools is required.

Access & Account Support: Registrants will receive emailed instructions for accessing the workshop JupyterHub(s). During the tutorial, facilitators will provide real-time support for login issues, environment navigation, and GPU access. All tutorial materials—example notebooks, deployment guides, and environment specifications—will remain available after the workshop for participants to reuse at their home institutions. This session aligns closely with NAIRR goals of democratizing access to AI computing and expanding participation in AI teaching and research. It will engage stakeholders across institutions and backgrounds, modeling how a national AI classroom infrastructure could be deployed sustainably and equitably.


National Data Platform

Pedro Ramonetti, San Diego Supercomputer Center, UCSD

Saleem Alharir, University of Utah

Ilkay Altintas, San Diego Supercomputer Center, UCSD

Manish Parashar, University of Utah

The National Data Platform (NDP) is an online, data-driven ecosystem that integrates national cyberinfrastructure to advance open and collaborative research and education. NDP enables data providers to make datasets discoverable and accessible; researchers to seamlessly integrate data, computing services, and AI models into their scientific workflows; educators to incorporate computational resources into their courses; and students to engage in experiential learning through collaborative projects and open data challenges.

For Annual Meeting attendees, the tutorial provides a practical entry point into leveraging national cyberinfrastructure through NDP, showcasing how researchers can register and access datasets, connect institutional resources, and build reproducible scientific workflows that contribute to a growing network of open and FAIR research assets.

Session Objectives: The goal of this tutorial is to onboard researchers and educators to the National Data Platform (NDP) and demonstrate how its integrated ecosystem supports data-driven research collaboration and reproducibility.

The session will provide a guided, hands-on overview of:

 
  • Exploring and managing data through the NDP Catalog. Guiding participants in navigating the data catalog, exploring sample datasets and resources, and reviewing how new datasets are registered using NDP’s submission and metadata workflows.
  • Collaborative research environments. Using shared workspaces, dedicated catalogs, and project-based collaboration studios to facilitate team science and cross-institutional collaboration.
  • Connecting institutional infrastructure through NDP endpoints. Demonstrating how institutions can link their own computational or storage resources to NDP via lightweight endpoint scripts.

By the end of the tutorial, participants will understand how to leverage NDP to support reproducible, data-driven research and education within their own institutions, share resources with collaborators, and contribute to the broader NAIRR effort to democratize AI-enabled research across domains.

Relevance to the NAIRR Program and Annual Meeting Attendees: This tutorial highlights NDP as a core demonstration project within the NAIRR, designed to advance broad access to data, AI tools, and computational infrastructure. By demonstrating how NDP enables seamless data sharing, collaborative research, and integration with distributed resources, the session directly supports the NAIRR’s mission to democratize AI-driven research and education.

Session Design: This is a half-day (3-hour) hands-on tutorial. Participants should bring their own laptops and have an institutional email account.

Proposed Schedule

 
  • NDP Introduction and Tutorial Overview – 25 min
  • Registering on NDP – 5 min
  • Exploring and Adding Resources to the Data Catalog – 30 min
  • Working with Workspaces and Connecting Data to Computing Infrastructure – 30 min
  • Creating a Collaborative Project: Workspaces and Curated Catalogs – 45 min
  • Exploring the Education Hub: Classrooms and Data Challenges – 15 min
  • Overview of Running an NDP Endpoint – 20 min
  • Wrap-up and Discussion – 10 min

Target Audience: The tutorial is designed for three main audiences: researchers seeking to expand their collaboration opportunities, educators interested in integrating NDP into their courses, and data providers aiming to share their datasets, services, and workflows with broader communities.

Expected Background/Skill Levels: Participants are expected to have basic Python skills and familiarity with Jupyter Notebooks. Experience with Docker images is helpful but not required.


The Expanding Scope of AI in Research in the Context of Cloud Services

Rob Fatland, University of Washington

Ariel Rokem, University of Washington

Cloud computing provides the scale and elasticity required to develop and deploy the next generation of AI models and applications. This CloudBank-provided half-day workshop will explore the expanding scope of AI in research in the context of cloud services, focusing on the "big three" platforms: AWS, Microsoft Azure, and Google Cloud Platform. It is intended for researchers and research administrators who are enthusiastic about the potential to implement AI-based technology on the cloud, but who may at the same time be somewhat reluctant or intimidated by the plethora of terms and technologies. The tutorial objective is to demystify the topic of AI on the cloud by mapping out next steps for the Learner, and by giving cloud vendors an opportunity to articulate the utility of their respective AI services. The first half of the three-hour workshop will focus on practical cloud use and community building within a research domain via the "hack event" paradigm, which emphasizes hands-on learning. The second half will feature presentations by cloud providers on their respective environments, with emphasis on features and tools in the AI space in relation to research. Topics covered will include agentic AI, RAG, MCP, and a horizon-view perspective on the integration of core cloud services with cloud marketplace ecosystems.
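To make one of these terms concrete, the core loop of retrieval-augmented generation (RAG) can be sketched in a few lines of plain Python: retrieve the most relevant document, then ground the model's prompt in it. This is a conceptual toy only; production systems use vector embeddings and a hosted model endpoint, and all names below are illustrative:

```python
# Toy retrieval-augmented generation (RAG) sketch: pick the document with
# the highest term overlap, then build a prompt grounded in that context.

def score(query: str, doc: str) -> int:
    """Count query terms that also appear in the document (case-insensitive)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document with the highest term overlap with the query."""
    return max(docs, key=lambda d: score(query, d))

def build_prompt(query: str, context: str) -> str:
    """Compose a prompt that grounds the model's answer in the context."""
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Cloud object storage holds datasets for later analysis.",
    "GPU instances accelerate neural network training in the cloud.",
    "Billing alerts help control cloud spending.",
]
query = "How do I accelerate neural network training?"
prompt = build_prompt(query, retrieve(query, docs))
```

A real deployment would replace `score` with embedding similarity and send `prompt` to a cloud-hosted LLM API.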

Learners should bring a laptop with WiFi and a browser. Throughout the program we will provide URLs that the Learner is encouraged to explore in passing, creating a "breadcrumb history" that they will be able to investigate later, in depth and at their leisure.


Diamond: A Web User Interface for Neural Network Training across NAIRR GPU Clusters

Zhao Zhang, Rutgers University

Haotian Xie, Rutgers University

Yadu Babuji, University of Chicago

Diamond (https://diamondhpc.ai/) is a platform for training neural networks across federally funded GPU clusters provided by the NAIRR. Users can train a neural network in three easy steps: 1) container image building, 2) job composition, and 3) job execution. Diamond provides a uniform experience of neural network training across GPU clusters. The goals of this tutorial are: 1) introducing the basics of distributed neural network training, 2) making participants aware of the software configuration and portability challenge of neural network training across GPU clusters, 3) providing hands-on experience for users to resolve the aforementioned challenges with Diamond, and 4) sharing the latest progress and future features of Diamond with the community.

The tutorial is designed to benefit participants across a wide spectrum of expertise in HPC and deep learning (DL). For novice users with limited knowledge of HPC and DL, this tutorial offers a rare opportunity to gain hands-on experience without the burden of software installation and distributed training configuration. For participants who have already started exploring deep learning methods, the hands-on exercises of OpenFold training and OPT-125M fine-tuning will lower the barriers to trying new methods. For participants with access to multiple GPU clusters, this tutorial provides an efficient solution for scaling across multiple machines.

The audience of this tutorial, including domain scientists, DL practitioners, and HPC experts, will learn about the basics of distributed neural network training, Diamond container image customization and building, neural network workload composition, and training job management with Diamond.

This tutorial is designed for learners from various research domains who are interested in deep learning on supercomputers. Machine learning system engineers and researchers can leverage Diamond to efficiently scale to multiple supercomputers with minimal software configuration overhead. Supercomputing center staff may choose to support Diamond or deploy it locally for their users.

Learners need to bring their laptops with a WiFi connection and a browser such as Chrome or Firefox. Learners will also need to apply for TACC accounts or ACCESS accounts and set up multi-factor authentication before the tutorial.

 

 

Half-Day PM Tutorials


Democratizing AI: Building AI Agents Without Writing Code

Mohamed Farag, Carnegie Mellon University

This interactive tutorial examines the rapidly evolving domain of AI agents alongside the growing ecosystem of no-code platforms that is democratizing AI agent design and deployment. The session will enable participants to develop both practical proficiency and conceptual understanding in designing and building AI agents using no-code tools. The target audience for this tutorial includes researchers and educators with foundational to intermediate expertise in AI.

During this hands-on tutorial, we will (i) delineate the conceptual boundary between standalone Large Language Models (LLMs) and agentic systems; (ii) survey the current state of AI-agent development and principal agent archetypes; (iii) unpack the anatomy of AI agents; (iv) present a comparative evaluation of leading no-code AI-agent builders; (v) conduct a focused exploration of LangFlow as a No-Code Agent Builder (NCAB); (vi) guide participants through the end-to-end design of two different AI agents; and (vii) demonstrate practical integration pathways for leveraging agents within university learning ecosystems (e.g., Canvas, Gradescope, Moodle), highlighting authentication, data-governance, and workflow considerations.

The workshop is intended to serve simultaneously as an intellectual gateway and a practical accelerator for advancing agent-based approaches in research, pedagogy, and applied innovation. By the end of this session, attendees will:

 
  1. Understand the anatomy of AI agents and identify the key differences between LLMs and AI agents.
  2. Recognize common design patterns used in architecting AI agents.
  3. Gain insight into the capabilities of several no-code AI agent builders.
  4. Design and implement two small-scale AI agents within a no-code environment, each representing a distinct agentic pattern.
  5. Explore the integration of pre-developed agents into a university's Learning Management System (LMS), such as Canvas, Moodle, or Sakai.

This tutorial does not assume any prior learner knowledge, although a basic understanding of what Large Language Models are would be helpful. It also does not require any prerequisite software installation. Participants are, however, expected to bring a laptop with Wi-Fi capability and have access to an email account. All the tools used during the session are either open-source or offer a free-trial option, and they will be accessed online (i.e., no installations are required).


Distributed Deep Learning on GPU-based Clusters

Abhinav Bhatele, University of Maryland

Prajwal Singhania, University of Maryland

Lannie Dalton Hough, University of Maryland

Deep learning (DL) is rapidly becoming pervasive in almost all areas of computer science, and is even being used to assist computational science simulations and data analysis. A key behavior of deep neural networks (DNNs) is that they scale reliably: they continuously improve in performance as the number of model parameters and the amount of data grow. As the demand for larger, more sophisticated, and more accurate DL models increases, the need for large-scale parallel model training, fine-tuning, and inference has become increasingly important. Consequently, in recent years, several parallel algorithms and frameworks have been developed to parallelize model training and inference on GPU-based platforms. This tutorial will introduce the basics of the state-of-the-art in distributed deep learning. We will use large language models (LLMs) as a running example, and teach the audience the fundamentals of two essential tasks in working with LLMs: (i) continued training/fine-tuning of an LLM from a checkpoint, and (ii) inference on a trained LLM. We will cover algorithms and frameworks falling under the purview of data parallelism (PyTorch DDP and FSDP), tensor parallelism (AxoNN, YALIS), and hybrid parallelism (vLLM).

The learning objective of this tutorial is to familiarize attendees with the fundamentals of distributed model training/fine-tuning and inference. By the end of the tutorial, attendees will be well-equipped to leverage parallel deep learning, particularly LLMs, for their research and product development. This tutorial is designed for individuals who have basic experience with sequential/single-GPU model training/inference using a framework such as PyTorch or TensorFlow, and want to start doing parallel training/inference. Attendees should be comfortable with Python and ideally have introductory-level familiarity with PyTorch. Familiarity with running parallel programs on HPC clusters will be a plus. We will provide tutorial attendees with temporary login credentials on the day of the tutorial for access to a University of Maryland GPU cluster. Alternatively, attendees may utilize their existing accounts on NAIRR resources that include GPU access, such as Delta, Delta GPU, Vista, or Lonestar. Attendees will only need a laptop with Wi-Fi to remotely access (via terminal and ssh) one of the specified clusters. Attendees without these capabilities can still participate by following the tutorial presentation.
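The core idea behind the data-parallel frameworks mentioned above (PyTorch DDP and FSDP) can be sketched in plain Python with no GPUs: each worker computes a gradient on its own shard of the batch, the gradients are averaged (an "all-reduce"), and every replica applies the same update. This is a conceptual toy, not how real training is written; actual code would use torch.distributed:

```python
# Conceptual sketch of data parallelism: two "workers" each hold half the
# batch for the toy model y = w * x, gradients are averaged across workers,
# and a single shared weight is updated. Illustrative only.

def gradient(w, shard):
    """Mean-squared-error gradient dL/dw on one worker's data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    """Average gradients across workers (stand-in for an all-reduce)."""
    return sum(grads) / len(grads)

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # data from y = 2x
shards = [batch[:2], batch[2:]]     # two workers, half the batch each

w, lr = 0.0, 0.01
for _ in range(200):
    grads = [gradient(w, s) for s in shards]  # computed in parallel in DDP
    w -= lr * all_reduce_mean(grads)          # every replica applies this

print(round(w, 3))  # converges toward the true weight 2.0
```

Because the shards are equal-sized, the averaged shard gradients equal the full-batch gradient, which is exactly why data-parallel training reproduces single-device results.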

 


Training-Free Alignment of Large Language Models: Making AI Safer and Smarter Without Re-Training

Theja Tulabandhula, University of Illinois Chicago

Son The Nguyen, University of Illinois Chicago

Tommy Cheng, University of Illinois Chicago

Large Language Models (LLMs) continue to advance in capability, yet their safe and reliable deployment remains a major challenge. Traditional alignment methods such as reinforcement learning from human feedback (RLHF) or fine-tuning require substantial compute, proprietary data, and significant engineering effort, limiting who can participate in AI safety work. This tutorial introduces a complementary paradigm: training-free alignment, an emerging family of methods that steer and constrain LLM behavior at inference time, without retraining or updating model weights.

We will begin with a conceptual overview of alignment goals and failure modes, contrasting RLHF-style approaches with post-training control techniques such as prompt steering, activation and representation editing, rule-based scaffolding, and decoding-time interventions. We will then work through hands-on notebooks that apply these methods to open models (e.g., Llama 3–class models) hosted on NAIRR resources, using open evaluation datasets for reasoning, safety, and robustness.
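To give a flavor of the decoding-time interventions mentioned above, the toy sketch below edits next-token scores before sampling, leaving model weights untouched. The "model" here is just a hand-written logit table; with a real LLM, the same bias would be applied to the logits at each decoding step. Token names and values are illustrative:

```python
# Toy decoding-time intervention: suppress disallowed tokens by biasing
# their logits before sampling, with no retraining or weight updates.
import math

def softmax(logits):
    """Convert a dict of logits to a dict of probabilities."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: v / z for t, v in exps.items()}

def steer(logits, blocked, penalty=-1e9):
    """Add a large negative bias to blocked tokens' logits."""
    return {t: (v + penalty if t in blocked else v) for t, v in logits.items()}

# Hypothetical next-token logits a model might produce.
logits = {"rm -rf": 3.1, "explain": 2.9, "refuse": 1.0}
probs = softmax(steer(logits, blocked={"rm -rf"}))
choice = max(probs, key=probs.get)  # greedy decode picks "explain"
```

The same pattern underlies practical techniques such as logit biasing and constrained decoding; the hands-on notebooks will apply analogous interventions to open models.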

Learning objectives: By the end of the tutorial, participants will be able to:

 
  1. describe key families of training-free alignment methods and when they are appropriate;
  2. apply prompt-, decoding-, and objective-based steering techniques to real LLMs; and
  3. evaluate trade-offs among safety, controllability, and computational cost using simple, reproducible experiments.

Target audience: Researchers, educators, practitioners, and advanced students interested in AI safety, interpretability, and practical alignment techniques.

Prerequisites: Basic familiarity with Python and large language model APIs (or similar tools). Prior experience running Jupyter notebooks is helpful but not required.

Required learner technology: Learners will need a laptop with WiFi, a modern web browser, and the ability to access a cloud-hosted Jupyter environment.

Accounts and access to external technology: Approximately one week before the tutorial, we will email registrants with step-by-step instructions for (a) obtaining appropriate NAIRR access (where applicable), and/or (b) using a pre-configured JupyterHub or similar environment provisioned by the instructors. All exercises will be designed to run within modest resource limits so that participants can reproduce them during or after the tutorial, even on constrained allocations.

Throughout, we will emphasize workflows that align with the NAIRR mission: open, reproducible, and resource-efficient safety experimentation that can be adopted by a wide range of institutions, including those with limited compute. Participants will leave with concrete code examples, evaluation templates, and design patterns they can re-use in their own research, teaching, and applied AI projects.


Training Ensembles Across NAIRR Resources

Ian Ross, University of Wisconsin–Madison

Danny Morales, University of Wisconsin–Madison

Modern AI research requires training ensembles of models: hyperparameter optimization explores multiple configurations, cross-validation needs models trained on different data splits, and multiple models can be combined for better predictions. The traditional approach of training an ensemble of models sequentially is time-consuming and lengthens the time-to-insight. This tutorial demonstrates a throughput-oriented approach: plan once, then distribute your ensemble training across all available resources simultaneously.

This hands-on tutorial teaches you how to leverage services provided by the Partnership to Advance Throughput Computing (PATh) to train ensembles of machine learning models across the NAIRR resources. After planning and running the first training, scaling to dozens of models requires minimal additional effort.

What You'll Learn: The tutorial follows a natural progression: planning, testing, executing, and scaling (0→1 model→dozens of models). First, you'll learn how to think about ensemble training as a throughput-oriented distributed computing problem: breaking down a research question into independent jobs, reasoning about the resource needs, and planning the input and output data flow. Understanding what varies between jobs, what stays constant, where data lives, and where results go will provide a solid foundation for scaling out computing workloads.

Then, you'll implement your first training job. This is the 0→1 jump: writing a submit file, specifying your container and resources, running the job, and retrieving data (inputs or model checkpoints) via the Open Science Data Federation (OSDF).

Finally, you'll scale to dozens of models in parallel. You'll modify your submit file to use parameter substitution, submit dozens of jobs with a single command, and collect results from OSDF. You'll see that scaling from one to dozens of models requires changing just a few lines, not rewriting your workflow. We'll discuss how scaling to 100+ models uses the same pattern.
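As a rough illustration of that pattern (not the tutorial's actual materials, which will be pre-staged), an HTCondor submit file with parameter substitution might look like the following sketch; all file names, paths, and values are hypothetical:

```
# train.sub - sketch of an HTCondor submit file for ensemble training
# (file names, image, and resource values are hypothetical)
universe              = container
container_image       = docker://pytorch/pytorch:latest
executable            = train.sh
arguments             = --lr $(lr) --seed $(seed)
request_cpus          = 1
request_gpus          = 1
request_memory        = 8GB
request_disk          = 10GB
transfer_input_files  = model.py
should_transfer_files = YES
log                   = train_$(Cluster)_$(Process).log

# One job per line of params.txt (e.g. "0.001, 1"); submit all with:
#   condor_submit train.sub
queue lr, seed from params.txt
```

The `queue ... from` line is what turns one job description into dozens: adding a row to `params.txt` adds a training run, with no other changes to the workflow.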

Who Should Attend: This tutorial is designed for AI/ML researchers, graduate students, and educators who want to scale out their ensemble model training. Participants should have basic Linux command-line skills and familiarity with Python and machine learning frameworks (PyTorch, TensorFlow, or scikit-learn). No prior experience with distributed computing or HTCondor, the software used to coordinate and schedule training workloads, is required.

Prerequisites: Participants must bring a laptop with WiFi capability and an SSH client. Tutorial accounts on PATh Access Points will be provided, with setup instructions sent one week before the meeting. All training datasets, container images, and code will be pre-staged.

What You'll Take Home: Working code for ensemble training, understanding of OSDF data management patterns, a plan of execution for applying to your own research, and a clear path to requesting NAIRR allocations for your own research - or using existing allocations in new ways.

 

 

Full-Day Tutorials


AI-Powered Teaching: From Foundation to Future-Proof Curriculum

Vincent Nestler, California State University, San Bernardino

Brandon Gray, California State University, San Bernardino

Desmond Workman, California State University, San Bernardino

As AI transforms every sector, educators face a critical challenge: preparing students who are not just employable, but irreplaceable in an AI-augmented workforce. This hands-on tutorial equips educators with essential AI skills and practical tools to leverage NAIRR resources effectively in research and teaching.

Drawing from The AI Horizon Project (NSF NAIRR EAGER #2528858, PI: Nestler, Co-PI: Coulson)—which helps educators anticipate what lies just beyond the AI horizon, the line separating what we can see from what we cannot—this tutorial provides a comprehensive pathway from foundational AI literacy to curriculum transformation. Participants gain practical experience with prompt engineering, agentic IDEs, AI research assistants, local models, and frameworks for preparing students for the AI-driven future that's rapidly approaching.

What You'll Learn

 
  • Prompt Engineering Mastery: Learn battle-tested formulas (CRAFT, RTF, RISEN) to maximize effectiveness with NAIRR-provided models. Develop prompts for research tasks, content generation, and assessment through hands-on practice.
  • Agentic IDEs and "Vibe Coding": Use AI-assisted tools like Cursor IDE to build applications and analyze data without traditional programming barriers—essential for utilizing NAIRR's computational resources across disciplines.
  • NotebookLM for Research and Education: Master Google's AI assistant to synthesize literature, generate study guides, and create audio podcasts. Work with your own materials to build immediately usable educational resources.
  • Local Models and Privacy-Conscious AI: Deploy local AI models using Ollama. Learn trade-offs between cloud-based NAIRR resources and on-premises solutions, addressing privacy, costs, and accessibility.
  • Future-Proofing Students: Apply frameworks to help students identify emerging skills and build AI-augmented portfolios. Create actionable development plans adaptable to any discipline.
  • Curriculum Retooling Framework: Transform existing courses to organically integrate AI tools. Use proven templates to redesign curriculum components, leaving with immediately implementable changes.
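As a small illustration of turning one of these formulas into a reusable template, here is a sketch of an RTF-style (Role, Task, Format) prompt builder in Python. The field names reflect one common interpretation of the acronym, not necessarily the tutorial's official templates:

```python
# Sketch of a reusable RTF (Role, Task, Format) prompt template.
# The structure and example values are illustrative.

def rtf_prompt(role: str, task: str, fmt: str) -> str:
    """Assemble a Role-Task-Format prompt for an LLM."""
    return (f"You are {role}.\n"
            f"Task: {task}\n"
            f"Respond in this format: {fmt}")

prompt = rtf_prompt(
    role="a biology instructor writing exam questions",
    task="create 3 multiple-choice questions on cell respiration",
    fmt="numbered list, each with options A-D and the answer marked",
)
```

Keeping the formula in a function like this makes it easy to swap roles, tasks, and output formats across courses while holding the prompt structure constant.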

Prerequisites and Requirements: Participants should bring:

 
  • Laptop with WiFi capability
  • Google account (for NotebookLM)
  • Accounts: Claude.ai (free tier), Cursor IDE (free for educators)
  • Ollama installed (installation guide provided pre-workshop)

No coding experience is required. This tutorial welcomes educators from all disciplines, researchers transitioning to teaching roles, and research computing professionals supporting AI education initiatives.

Why This Tutorial Matters for NAIRR: NAIRR's mission to democratize AI resources succeeds only when educators can effectively guide students and researchers in using these tools. This tutorial directly addresses the skills gap preventing optimal NAIRR resource utilization. By helping educators see beyond the AI horizon—anticipating which tasks AI will augment, replace, or leave human-driven—participants leave ready to maximize research productivity, enhance teaching effectiveness, and prepare the next generation of AI-literate professionals for a workforce in transformation. Takeaways: Prompt templates repository, curriculum integration toolkit, future-proofing worksheets, local model setup guides, and access to The AI Horizon community - http://theaihorizon.org


AI-Enabled Education and Research on the National Research Platform

Mahidhar Tatineni, San Diego Supercomputer Center, UCSD

Mohammad Sada, San Diego Supercomputer Center, UCSD

Daniel Diaz, San Diego Supercomputer Center, UCSD

In this tutorial we will cover how the National Research Platform (NRP) can be used in the classroom and in research projects. The tutorial will be geared towards educational use of NRP, but the tools covered (Jupyter notebooks, LLMs as a service, AI workflows, and AI chatbots for the classroom) are of broad relevance to education and research alike. We describe the learning objectives, target audience, prerequisite knowledge, learner technology, and account requirements below.

Tutorial learning objectives: 1) teach students, researchers, and educators how to run AI-integrated Jupyter notebooks on the National Research Platform; 2) inform attendees about the available resources, including hardware (CPU, GPU, and custom inference cards from Qualcomm) and tools/services such as LLM services, chat UIs, and code development environments like Coder; and 3) teach educators how to set up custom JupyterHub instances for their classes. Hands-on material will give attendees access to national AI infrastructure, including CPUs, GPUs, and Qualcomm Cloud AI 100 Ultra accelerators. We will cover various methods of requesting these resources, including JupyterHub and Coder options that provide interactive access, lower barriers for students new to such resources, and offer an easy-to-use development environment. An example of NRP usage for an AI course will be provided, with sample class material leveraging the tools discussed. The tutorial will also introduce agentic AI development and integrated tools for an enhanced learning experience. The tutorial is structured with beginner material in the morning session and advanced material in the afternoon session.

Target audience: The morning session is aimed at students and researchers with beginner/intermediate skills; the afternoon session offers advanced material for students, researchers, and educators.

Learner knowledge and/or skill pre-requisites: Some familiarity with Jupyter notebooks and Python.

Required learner technology: Learners will need a laptop with WiFi and a web browser; a local installation of kubectl is preferred but not necessary (install information at: https://kubernetes.io/docs/tasks/tools/).

Learner Accounts: A training-specific namespace will be set up on NRP and a config file for access will be provided. For JupyterHub access, CILogon-based authentication is preferred (organization info will be collected), but workarounds are possible for attendees whose organizations are not part of CILogon.
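For attendees who want a preview of what running on NRP's Kubernetes infrastructure looks like, a minimal pod spec requesting one GPU might resemble the sketch below; the namespace, pod name, and image are hypothetical placeholders, and the tutorial will provide the actual configuration:

```yaml
# gpu-pod.yaml - sketch of a minimal pod requesting one GPU
# (namespace, name, and image are hypothetical placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: demo-gpu-pod
  namespace: nairr-training
spec:
  restartPolicy: Never
  containers:
    - name: notebook
      image: jupyter/scipy-notebook:latest
      command: ["sleep", "3600"]
      resources:
        requests:
          nvidia.com/gpu: 1
          memory: 8Gi
          cpu: "2"
        limits:
          nvidia.com/gpu: 1
          memory: 8Gi
          cpu: "2"
```

With the provided config file, such a spec would be applied with something like `kubectl --kubeconfig=config apply -f gpu-pod.yaml`.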


Enabling Reproducible AI Workflows for the NAIRR Ecosystem

Joe Stubbs, Texas Advanced Computing Center, University of Texas at Austin

Sean Cleveland, University of Hawaii

Christian Garcia, Texas Advanced Computing Center, University of Texas at Austin

Anagha Jamthe, Texas Advanced Computing Center, University of Texas at Austin

Nathan Freeman, Texas Advanced Computing Center, University of Texas at Austin

This tutorial aims to provide researchers with an introduction to the latest reproducible Machine Learning (ML) workflows and tools available through the NSF-funded Tapis v3 platform, including an Application Programming Interface (API) and a User Interface (UI). Using Tapis, researchers can discover AI/ML models and tools and deploy them directly to resources within the NAIRR ecosystem, supporting NAIRR's vision of providing computation, data, software, models, training, and educational materials to advance research, discovery, and innovation. Through hands-on exercises, participants will gain experience in developing ML workflows and deploying them on a variety of NAIRR resources such as Jetstream2, Chameleon, and Stampede3. We will emphasize the utilization of various Tapis core APIs, alongside specialized APIs such as Tapis Workflows, Tapis Pods, and ML Hub, all seamlessly integrated within the user-friendly TapisUI. Using these production-grade services, we will demonstrate the creation and facilitation of trustworthy, reproducible scientific machine learning workflows. By the end of this tutorial, researchers will be empowered to efficiently develop, deploy, and maintain their own ML workflows. We will not assume detailed knowledge or expertise in AI/ML and will provide brief introductions to all topics covered. Attendees will need a laptop with a web browser and WiFi access to follow along with the exercises. No additional software will be required, as we will make use of cloud-hosted software. We will also provide users with TACC training accounts that will be authorized to access the cloud environment.

Programming AI Accelerators for Scientific Machine Learning >

Programming AI Accelerators for Scientific Machine Learning

Varuni Katti Sastry, Argonne National Laboratory

Murali Emani, Argonne National Laboratory

Sylvia Howland, Cerebras

Timothy Clarke, SambaNova Systems

Scientific applications are increasingly adopting Artificial Intelligence (AI) techniques to advance science, and specialized hardware accelerators have been designed and built to run AI applications efficiently. However, given the wide diversity of hardware architectures and software stacks, it is challenging to understand the differences between these accelerators — their capabilities, programming approaches, and performance — particularly for scientific applications. In this tutorial, we will provide an overview of the AI accelerators deployed at the Argonne Leadership Computing Facility (ALCF), which are also available to the scientific community as part of the NAIRR resources. This session will focus on the SambaNova, Cerebras, Groq, and Tenstorrent systems.

Tutorial Learning Objectives: By the end of this tutorial, attendees will:

 
  • Understand the differences in architectural features and software stacks of the above-mentioned systems.
  • Gain hands-on experience pre-training and fine-tuning open-source Large Language Models (LLMs), including how to refactor code and optimize workflows on these systems.
  • Explore deployment strategies for AI inference solutions in scientific contexts.
  • Evaluate the performance implications of AI accelerators for scientific applications.

Target Audience: This tutorial is designed for:

 
  • Researchers, scientists, and engineers working in scientific domains who are interested in leveraging AI accelerators for their scientific workloads.
  • Members of the NAIRR community who want to maximize the potential of NAIRR resources for advancing scientific discovery, fostering collaboration, and driving impactful AI-driven solutions.

Skill Pre-Requisites: Attendees should have:

 
  • A basic understanding of AI and machine learning concepts.
  • Familiarity with Python programming and common AI frameworks such as PyTorch or TensorFlow.
  • Experience with running AI models on GPUs or other hardware accelerators (preferred but not mandatory).

Required Learner Technology: Attendees will need:

 
  • A laptop with WiFi connectivity and a web browser.
  • A local installation of Python 3.x and common AI libraries (e.g., PyTorch, TensorFlow, Hugging Face Transformers).
  • Access to SSH tools for connecting to remote systems (e.g., OpenSSH or PuTTY).

Learner Accounts and Access to External Technology: ALCF accounts can be requested at https://my.alcf.anl.gov/accounts/#/accountRequest.