Abstract & Details
Description
Award ID: 2346273
Python and R are the predominant open platforms for computation in academia and industry, driving innovation in data science and AI. R is largely developed by statisticians, while scientific Python is mostly built by researchers from the applied sciences. As a consequence, Python's statistical capabilities lack cutting-edge methods and techniques, and statisticians do not see their algorithmic innovations disseminated widely on this popular platform. This scoping project explores sustainable and effective pathways for establishing an open-source ecosystem that would catalyze the development of a robust set of statistical software for Python. The effort will also build a vibrant ecosystem of statisticians, domain practitioners, and software developers around the open platforms. The team aims to establish better software engineering practices in the statistical community and to provide onboarding pathways for young researchers, while documenting and implementing healthy and inclusive community practices that can be replicated in other communities. To anchor the effort, the project focuses on two pilot projects (R and Python) with different scopes, target audiences, and levels of maturity, and determines how they should be modified to comply with modern software engineering and community governance best practices. YAGLM is an open-source Python package that makes modern generalized linear models (GLMs) easily accessible to data scientists. GLMs are flexible and powerful generalizations of ordinary linear regressions that cover many statistical models widely used in applications. The ISLP open-source Python package accompanies the new introductory text on statistical learning and Python ("An Introduction to Statistical Learning: with Applications in Python").
Through detailed code and governance audits of these pilot projects, as well as feedback from the statistical community, the team will document the need for innovation within the current technological landscape; outline how to identify potential contributors and users; specify the necessary infrastructure, organization, and governance; and explore mechanisms for long-term sustainability. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
NSF Program Director: Florence Rabanal
| Status | Closed |
|---|---|
| Effective start/end date | 06/15/24 → 05/31/25 |
Lead and Sub-Awardee Organization(s)
Funding
- University of California-Berkeley: $299,737.00
Active Fiscal Year
- FY2024
- FY2025
Start Fiscal Year
- FY2024
TIP Programs
- (POSE) Pathways to enable Open-Source Ecosystems
Program Status
- Active
Key Technology Areas
- Artificial Intelligence (confidence score: 100%)
- Advanced Computing and Semiconductors (confidence score: 100%)
Technology Foci
- Advanced Computer Software (confidence score: 100%)
- Machine Learning (ML) (confidence score: 98%)
Congressional District at Award
- District n. 12 of California
Current Congressional District
- District n. 12 of California
United States
- California
Core Based Statistical Area (CBSA)
- San Francisco-Oakland-Fremont, CA
County
- Alameda, CA
Main Awarded Institution
- GS3YEVSS12N6