Statistical & Data Sciences
The Statistical & Data Sciences (SDS) Program links faculty and students from across the college who are interested in learning from data. At Smith, students learn statistics by doing: class time emphasizes problem-solving and hands-on contact with data. Many courses employ student-driven projects that allow students to pursue their interests in fields such as economics, psychology, political science, sociology, engineering, biology, environmental science, neuroscience and geology.
Requirements & Courses
Goals for Majors in Statistical and Data Sciences
- Identify and work with a wide variety of data types (including, but not limited to, categorical, numerical, text, spatial and temporal) and formats (e.g., CSV, XML, JSON, relational databases, audio and video).
- Extract meaningful information from data sets that have a variety of sizes and formats.
- Fit and interpret statistical models, including but not limited to linear regression models. Use models to make predictions, and evaluate the efficacy of those models and the accuracy of those predictions.
- Understand the strengths and limits of different research methods for the collection, analysis and interpretation of data. Be able to design studies for various purposes.
- Attend to and explain the role of uncertainty in inferential statistical procedures.
- Read and understand data analyses used in research reports. Contribute to the data analysis portion of a research project in at least one applied discipline.
- Compute with data in at least one high-level programming language, as evidenced by the ability to analyze a complex data set.
- Work in multiple languages and computational environments.
- Convey quantitative information in written, oral and graphical forms of communication to both technical and nontechnical audiences.
- Assess the ethical implications to society of data-based research, analyses, and technology in an informed manner. Use resources, such as professional guidelines, institutional review boards, and published research, to inform ethical responsibilities.
Statistical and Data Sciences Major
Requirements
Ten courses plus SDS 100
- SDS 100
- Five foundations and core courses:
- CSC 110 or CSC 120
- SDS 192
- MTH 211
- SDS 220 or SDS 201
- SDS 291
- One programming depth course: SDS 270, SDS 271, CSC 210, or CSC 294
- One statistics depth course: SDS 290, SDS 293/ CSC 293, MTH 320/ SDS 320 or a topic of SDS 390
- One data in context course: SDS 109/ CSC 109, FYS 189, CSC 235/ SDS 235, SDS 236 or SDS 237
- One application domain course. A student and their advisor should identify potential application domains of interest as early as possible, since many suitable courses will have prerequisites. Normally, this should happen during the fourth semester or at the time of major declaration, whichever comes first. The determination of whether a course satisfies the requirement will be made by the student’s major advisor. The requirement is normally satisfied by one of the following:
- A topic of SDS 300
- A research seminar (normally 300-level) or special studies of at least two credits. Normally, the domain would be outside of mathematics, statistics, and computer science.
- A departmental honors thesis in another major (normally not MTH or CSC).
- One capstone course: SDS 410
- Electives (as needed to fulfill the 10-course requirement): Provided that the requirements listed above are met, any of the courses listed above may be counted as electives to reach the 10-course requirement. Five College courses in statistics and computer science may be taken as electives. Additionally, any course crosslisted with SDS may be counted toward completion of the major.
Additional Guidelines
- With the exception of SDS 192, SDS 201, SDS 220, SDS 291, and SDS 410 (or any mandatory S/U course), students may switch any one of the remaining SDS courses from graded to S/U. If a student converts from an SDS minor to an SDS major after a core course has been taken with the S/U grading option, that will count as the student's one S/U course for the major.
- SDS 201 may be replaced by a 4 or 5 on the AP statistics exam. Replacement by AP courses does not diminish the total number of courses required for either the major or the minor.
- MTH 211 may be replaced by petition in exceptional circumstances.
- Any one of ECO 220, GOV 203, PSY 201 or SOC 204 may directly substitute for SDS 220 without the need to take another course, in both the major and minor. Note that SDS 220 and ECO 220 require calculus.
- Note that prerequisites above the 100 level for the application domain course count as credits inside the major.
- Five College equivalents may substitute with permission of the program.
Mathematical Statistics Major
Information on the interdepartmental major in mathematical statistics can be found on the Mathematical Sciences page of this catalog.
Statistical and Data Sciences Minor
Requirements
Six courses plus SDS 100
- SDS 100
- Four foundation and core courses:
- CSC 110 or CSC 120
- SDS 192
- SDS 220 or SDS 201
- SDS 291
- One programming depth course: SDS 270, SDS 271, CSC 210, or CSC 294
- One data in context course: CSC 109/ SDS 109, FYS 189, CSC 235/ SDS 235, SDS 236 or SDS 237
- Should these three requirements be fulfilled by fewer than six courses, any of the courses in SDS or CSC that count towards the major may be counted towards the minor.
- Normally, no more than one course graded S/U will be counted toward the minor.
- If a student converts from an SDS minor to an SDS major after a core course has been taken with the S/U grading option, that S/U will count as the student's one S/U for the major.
Applied Statistics Minor
The interdepartmental minor in applied statistics offers students a chance to study statistics in the context of a field of application of interest to the student. The minor is designed with enough flexibility to allow a student to choose among many possible fields of application.
Requirements
Five courses plus SDS 100
- SDS 100
- One introductory statistics course: SDS 201, SDS 220, PSY 201, ECO 220, SOC 204 or GOV 203
- SDS 290 and SDS 291
- Two application domain courses. A student and their advisor should identify potential application domains of interest as early as possible, since many suitable courses will have prerequisites. The determination of whether a course satisfies the requirement will be made by the student’s minor advisor. The requirement is normally satisfied by two of the following:
- A topic of SDS 300
- A research seminar (normally 300-level) or special studies of at least two credits. Normally, the domain would be outside of mathematics, statistics, and computer science.
- A departmental honors thesis in another major (normally not MTH or CSC).
- Among the courses used to satisfy the student’s major requirement, a maximum of two courses can count towards the minor.
- Only one introductory statistics course may count toward the minor.
- Normally, no more than one course graded S/U will be counted towards the minor.
- If a student converts from an SDS minor to an SDS major after a core course has been taken with the S/U grading option, that S/U will count as the student's one S/U for the major.
- Students who have taken AP Statistics in high school and received a 4 or 5 on the AP Statistics Examination, or who have had other equivalent preparation in statistics, are not required to repeat the introductory statistics course, but they are required to complete five courses.
Courses
SDS 100 Laboratory: Reproducible Scientific Computing with Data (1 Credit)
The practice of data science rests upon computing environments that foster responsible uses of data and reproducible scientific inquiries. This course develops students’ ability to engage in data science work using modern workflows, open-source tools and ethical practices. Students learn how to author a scientific report written in a lightweight markup language (e.g., markdown) that includes code (e.g., R), data, graphics, text and other media. Students also learn to reason about ethical practices in data science. S/U only. Concurrent registration required in any of: SDS 192, SDS 201, SDS 220, SDS 290 or SDS 291. Restrictions: Not open to students who have already completed any of: SDS 192, SDS 201, SDS 220, SDS 290 or SDS 291. Enrollment limited to 30. Students not registered for a corequisite course will be dropped without notification.
Fall, Spring
SDS 109/ CSC 109 Communicating with Data (4 Credits)
Offered as SDS 109 and CSC 109. The world is growing increasingly reliant on collecting and analyzing information to help people make decisions. Because of this, the ability to communicate effectively about data is an important component of future job prospects across nearly all disciplines. In this course, students learn the foundations of information visualization and sharpen their skills in communicating using data. This course explores concepts in decision-making, human perception, color theory and storytelling as they apply to data-driven communication. This course helps students build a strong foundation in how to talk to people about data, for both aspiring data scientists and students who want to learn new ways of presenting information. Enrollment limited to 40. {M}
Fall, Spring
SDS 192 Introduction to Data Science (4 Credits)
An introduction to data science using Python, R and SQL. Students learn how to scrape, process and clean data from the web; manipulate data in a variety of formats; contextualize variation in data; construct point and interval estimates using resampling techniques; visualize multidimensional data; design accurate, clear and appropriate data graphics; create data maps and perform basic spatial analysis; and query large relational databases. Prerequisite: concurrent registration in SDS 100 required for students who have not previously completed SDS 201, SDS 220, SDS 290 or SDS 291. {M}
Fall, Spring
SDS 201 Statistical Methods for Undergraduates (4 Credits)
(Formerly MTH 201/ PSY 201). An overview of the statistical methods needed for undergraduate research, emphasizing methods for data collection, data description and statistical inference, including an introduction to study design, confidence intervals, testing hypotheses, analysis of variance and regression analysis. Techniques for analyzing both quantitative and categorical data are discussed. Applications are emphasized and students use R for data analysis. This course satisfies the basic requirement for the psychology major. Students who have taken MTH 111 or equivalent should take SDS 220, which also satisfies the basic requirement. Prerequisite: concurrent registration in SDS 100 required for students who have not completed SDS 192, SDS 220, SDS 290 or SDS 291. Restrictions: Students do not normally earn credit for more than one course on this list: ECO 220, GOV 203, MTH 220, PSY 201, SDS 201, SDS 220 or SOC 204. Enrollment limited to 40. {M}
Fall, Spring
SDS 220 Introduction to Probability and Statistics (4 Credits)
(Formerly MTH 220/SDS 220). An application-oriented introduction to modern statistical inference: study design, descriptive statistics, random variables, probability and sampling distributions, point and interval estimates, hypothesis tests, resampling procedures, and multiple regression. A wide variety of applications from the natural and social sciences are used. This course satisfies the basic requirement for biological science, engineering, environmental science, neuroscience, and psychology. Prerequisite: MTH 111, or equivalent; SDS 100 must be taken concurrently for students who have not completed SDS 192, SDS 201, SDS 290 or SDS 291. Restrictions: Students do not normally earn credit for more than one course on this list: ECO 220, GOV 203, MTH 220, PSY 201, SDS 201, SDS 220 or SOC 204. Enrollment limited to 40. {M}
Fall, Spring
SDS 235/ CSC 235 Visual Analytics (4 Credits)
Offered as CSC 235 and SDS 235. Visual analytics techniques can help people to derive insight from massive, dynamic, ambiguous and often conflicting data. During this course, students learn the foundations of the emerging, multidisciplinary field of visual analytics and apply these techniques toward a focused research problem in a domain of personal interest. Students who elect to take this course as a programming intensive course should have previously taken CSC 212. In this track, students learn to use R, Python and HTML5/JavaScript to develop custom visual analytic tools. Students preferring a non-programming intensive track may elect to use existing visual analytic software, such as Tableau or Plotly. Designations: Theory, Programming. Prerequisite: CSC 120 or equivalent. {M}
Fall, Spring, Variable
SDS 236 Data Journalism (4 Credits)
Data journalism is the practice of telling stories with data. This course will focus on journalistic practices, interviewing data as a source, and interpreting results in context. We will discuss the importance of audience in a journalistic context, and will focus on statistical ideas of variation and bias. The course will include hands-on work with data, using appropriate computational tools such as R, Python, and data APIs. In addition, we will explore the use of visualization and storytelling tools such as Tableau, plot.ly, and D3. No prior experience with programming or journalism is required. Prerequisites: An introductory statistics course (including SDS 220, SOC 204, GOV 203, ECO 220, PSY 201). Enrollment limited to 20. WI {M}
Fall, Spring, Variable
SDS 237 Data Ethnography (4 Credits)
This course introduces the theory and practice of data ethnography, demonstrating how qualitative data collection and analysis can be used to study data settings and artifacts. Students will learn techniques in field-note writing, participant observation, in-depth interviewing, documentary analysis and archival research and how they may be used to contextualize the cultural underpinnings of datasets. Students will learn how to visualize datasets in ways that foreground their sociopolitical provenance in R. Students will also learn how ethnographic methods can be leveraged to improve data documentation and communication. The course will introduce debates regarding the politics of technoscientific fieldwork. Recommended prerequisite: SDS 192. Enrollment limited to 40. {S}
Fall, Spring
SDS 238 Community-Based Data Science (4 Credits)
This course introduces concepts in human-centered design and design justice, considering how their principles can be applied in the context of community-based data science work. Students learn how to define social problems, engage stakeholders, design data science solutions, and evaluate social impact. Students also learn techniques in collaborative data science project planning and execution, engaging best practices (e.g. version control and code review) in the context of a community-based data science project. Strategies for effectively communicating project approach, outcomes, and impact are addressed throughout the course. Enrollment limited to 24.
Fall, Spring, Alternate Years
SDS 270 Programming for Data Science in R (4 Credits)
This course is not about data analysis—rather, students learn the R programming language at a deep level. Topics may include data structures, control flow, regular expressions, functions, environments, functional programming, object-oriented programming, debugging, testing, version control, documentation, literate programming, code review and package development. The major goal for the course is to contribute to a viable, collaborative, open-source, publishable R package. Prerequisites: SDS 192 and CSC 110, or equivalent. Enrollment limited to 40. {M}
Fall, Spring
SDS 271 Programming for Data Science in Python (4 Credits)
This course covers the skills and tools needed to process, analyze and visualize data in Python and work on collaborative projects. Topics include functional and object-oriented programming in Python, data wrangling in Pandas, visualization in Matplotlib and seaborn, as well as creating a reproducible workflow: debugging, testing and documenting programs, and effectively using version control. The major goal for the course is to create a viable, open-source Python package like those in the Python Package Index (PyPI). Prerequisites: SDS 192 and CSC 110. Enrollment limited to 40. (E) {M}
Fall, Spring, Variable
SDS 290 Research Design and Analysis (4 Credits)
(Formerly MTH/SDS 290). A survey of statistical methods needed for scientific research, including planning data collection and data analyses that provide evidence about a research hypothesis. The course can include coverage of analyses of variance, interactions, contrasts, multiple comparisons, multiple regression, factor analysis, causal inference for observational and randomized studies and graphical methods for displaying data. Special attention is given to analysis of data from student projects such as theses and special studies. Statistical software is used for data analysis. Prerequisites: One of the following: PSY 201, SDS 201, GOV 203, ECO 220, SDS 220 or a score of 4 or 5 on the AP Statistics examination or the equivalent; concurrent registration in SDS 100 required for students who have not completed SDS 192, SDS 201, SDS 220 or SDS 291. Enrollment limited to 40. {M}
Fall, Spring
SDS 291 Multiple Regression (4 Credits)
(Formerly MTH 291/ SDS 291). Theory and applications of regression techniques: linear and nonlinear multiple regression models, residual and influence analysis, correlation, covariance analysis, indicator variables and time series analysis. This course includes methods for choosing, fitting, evaluating and comparing statistical models and analyzes data sets taken from the natural, physical and social sciences. Prerequisite: SDS 201, PSY 201, GOV 203, SDS 220, ECO 220 or equivalent or a score of 4 or 5 on the AP Statistics examination; concurrent registration in SDS 100 required for students who have not completed SDS 192, 201, 220 or 290. Enrollment limited to 40. {M}{N}
Fall, Spring
SDS 293/ CSC 293 Machine Learning (4 Credits)
Offered as CSC 293 and SDS 293. The field of statistical learning encompasses a variety of computational tools for modeling and understanding complex data. In this introductory course, we will explore many of the most popular of these tools, such as sparse regression, classification trees, boosting and support vector machines. In addition to unpacking the mathematics underlying the computational methods, students will also gain hands-on experience in applying these techniques to real datasets using R. Prerequisite: SDS 201, SDS 220 or CSC 210, or equivalent intro statistics course. Enrollment limited to 60. {M}
Fall, Spring, Annually
SDS 300di Seminar: Topics in the Applications of Statistics and Data Science-Disability Inclusion and Data Analytics (4 Credits)
Students learn the social model of disability and critical disability theory as well as research design and process, and work on a research project analyzing public data on disability inclusion. The statistical methods covered in this course may include logistic regression, multivariate analysis and factor analysis. Students are expected to submit their final projects to a journal, conference or competition by the end of the semester. Prerequisite: SDS 201, SDS 220 or ECO 220. Restrictions: Juniors and seniors only. Enrollment limited to 15. Instructor permission required. {M}
Fall, Spring, Variable
SDS 300fs Seminar: Topics in the Applications of Statistics and Data Science-Understanding Food Systems through Engaging the Data (4 Credits)
This course examines the global food system, with a focus on the US, through the examination of established data used to study the system and recommend food policy. In the United States, the US Department of Agriculture (USDA) oversees much of the food system, both promoting food products and regulating health impacts. In general, the operation of these systems generates a vast amount of data, much of which is in large open or semi-open online databases. Researchers and policy makers draw on these databases to aid in their decision making. This course aims to familiarize students with the data and its uses. Prerequisite: SDS 201 or SDS 220. Restrictions: Juniors and seniors only. Enrollment limited to 15. Instructor permission required. {M}{S}
Fall, Spring, Alternate Years
SDS 320/ MTH 320 Mathematical Statistics (4 Credits)
Offered as MTH 320 and SDS 320. An introduction to the mathematical theory of statistics and to the application of that theory to the real world. Discussions include functions of random variables, estimation, likelihood and Bayesian methods, hypothesis testing and linear models. Prerequisites: a course in introductory statistics, MTH 212 and MTH 246, or equivalent. Enrollment limited to 20. {M}
Spring
SDS 338/ GOV 338 Research Seminar in Political Networks (4 Credits)
Offered as GOV 338 and SDS 338. How does the behavior of a state, politician or interest group affect the behavior of others? Does Massachusetts’s decision to legalize recreational marijuana influence Vermont’s marijuana policies? From declarations of war to congress members' voting alignments, social scientists are increasingly looking to political networks to recognize the interconnectedness of the world. This course presents the essentials of social network analysis and how they can be applied to American politics. Prerequisites: SDS 220 or an equivalent introductory statistics course. Restrictions: Juniors and seniors only. Enrollment limited to 12. Instructor permission required. {S}
Fall, Spring, Alternate Years
SDS 364/ PSY 364 Research Seminar: Intergroup Relationships (4 Credits)
Offered as PSY 364 and SDS 364. Research on intergroup relationships and an exploration of theoretical and statistical models used to study mixed interpersonal interactions. Example research projects include examining the consequences of sexual objectification for both women and men, empathetic accuracy in interracial interactions and gender inequality in household labor. A variety of skills including, but not limited to, literature review, research design, data collection, measurement evaluation, advanced data analysis and scientific writing are developed. Prerequisites: PSY 201, SDS 201, SDS 220 or equivalent; and PSY 202. Restrictions: Juniors and seniors only. Enrollment limited to 12. Instructor permission required. {M}{N}{S}
Fall, Spring, Alternate Years
SDS 390be Topics in Statistical and Data Sciences-Methods in Biostatistics and Epidemiology (4 Credits)
Epidemiology concerns the distribution and determinants of disease in human populations, while biostatistics focuses on the development and application of statistical methods to a wide range of topics in biology, medicine and public health. This course focuses on foundational concepts in epidemiology, including measures of association and common epidemiological study designs, and statistical methods for public health data. Discussions include categorical data analysis (contingency table analysis, multinomial regression, ordinal regression and Poisson regression) and survival analysis (Kaplan-Meier estimators and Cox proportional hazards models). No background in biology is expected or required. Prerequisites: SDS 291 and [MTH 112 or (MTH 111 and MTH 153)]. Restrictions: SDS 390 may be taken a total of 3 times with different topics. Enrollment limited to 30. {M}
Fall, Spring, Variable
SDS 390cd Topics in Statistical and Data Sciences-Categorical Data Analysis (4 Credits)
Theory and applications of statistical methods for the analysis of categorical data. The course includes an overview of statistical methods for analyzing discrete data including binary, multinomial and count response variables. Nominal and ordinal responses are considered. Discussions may include contingency table and chi-squared analyses, logistic, Poisson and negative-binomial regression models. R statistical software is used. Prerequisites: SDS 291 or SDS 290 or equivalent. Restrictions: SDS 390 may be taken a total of 3 times with different topics. Enrollment limited to 30.
Fall, Spring, Variable
SDS 400 Special Studies (1-4 Credits)
Normally for juniors and seniors. Instructor permission required.
Fall, Spring
SDS 410 Capstone in Statistical & Data Sciences (4 Credits)
This one-semester course leverages students’ previous coursework to address a real-world data analysis problem. Students collaborate in teams on projects sponsored by academia, government or industry. Professional skills developed include: ethics, project management, collaborative software development, documentation and consulting. Regular team meetings, weekly progress reports, interim and final reports, and multiple presentations are required. Prerequisites: SDS 192, SDS 291 and CSC 111. Restrictions: Statistical and Data Science majors only. Enrollment limited to 20. Instructor permission required. {M}
Fall, Spring
SDS 430D Honors Thesis (4 Credits)
Department permission required.
Fall, Spring
Crosslisted Courses
CSC 109/ SDS 109 Communicating with Data (4 Credits)
Offered as SDS 109 and CSC 109. The world is growing increasingly reliant on collecting and analyzing information to help people make decisions. Because of this, the ability to communicate effectively about data is an important component of future job prospects across nearly all disciplines. In this course, students learn the foundations of information visualization and sharpen their skills in communicating using data. This course explores concepts in decision-making, human perception, color theory and storytelling as they apply to data-driven communication. This course helps students build a strong foundation in how to talk to people about data, for both aspiring data scientists and students who want to learn new ways of presenting information. Enrollment limited to 40. {M}
Fall, Spring
CSC 235/ SDS 235 Visual Analytics (4 Credits)
Offered as CSC 235 and SDS 235. Visual analytics techniques can help people to derive insight from massive, dynamic, ambiguous and often conflicting data. During this course, students learn the foundations of the emerging, multidisciplinary field of visual analytics and apply these techniques toward a focused research problem in a domain of personal interest. Students who elect to take this course as a programming intensive course should have previously taken CSC 212. In this track, students learn to use R, Python and HTML5/JavaScript to develop custom visual analytic tools. Students preferring a non-programming intensive track may elect to use existing visual analytic software, such as Tableau or Plotly. Designations: Theory, Programming. Prerequisite: CSC 120 or equivalent. {M}
Fall, Spring, Variable
CSC 252 Algorithms (4 Credits)
Covers algorithm design techniques ("divide-and-conquer," dynamic programming, "greedy" algorithms, etc.), analysis techniques (including big-O notation, recurrence relations), useful data structures (including heaps, search trees, adjacency lists), efficient algorithms for a variety of problems and NP-completeness. Designation: Theory. Prerequisites: CSC 210, MTH 111 and MTH 153. Enrollment limited to 30. {M}
Fall, Spring, Alternate Years
CSC 294 Computational Machine Learning (4 Credits)
An introduction to machine learning from a programming perspective. Students develop an understanding of the basic machine learning concepts (including underfitting/overfitting, measures of model complexity, training/test set splitting and cross validation), but with an explicit focus on machine learning systems design (including evaluating algorithmic complexity and development of programming architecture) and on machine learning at scale. Principles of supervised and unsupervised learning are demonstrated via an array of machine learning methods including decision trees, k-nearest neighbors, ensemble methods and neural-networks/deep-learning, as well as dimension reduction, clustering and recommender systems. Students implement classic machine learning techniques, including gradient descent. Designations: Theory, Programming. Prerequisites: CSC 210, CSC 250, (MTH 112 or MTH 211), and knowledge of Python. Enrollment limited to 40. {M}
Fall, Spring, Annually
CSC 325 Seminar: Responsible Computing (4 Credits)
When is disruption good? Who is responsible for ensuring that an innovation has a positive impact? Are these impacts shared equitably? How can biases be eliminated from algorithms, if they exist? What assurances can anyone make about the technology they develop? What are the limitations of professional ethics? This seminar examines the ethical implications (i.e., ethics, justice, political philosophy) of computing and automation. Participants explore how to design technology responsibly while contributing to progress and growth. Discussions include: intellectual property; privacy, security and freedom of information; automation; globalization; access to technology; artificial intelligence; mass society; and emerging issues. Prerequisite: CSC 210. Restrictions: Juniors and seniors only. Enrollment limited to 12. Instructor permission required. {S}
Fall, Spring, Variable
ECO 220 Introduction to Statistics and Econometrics (5 Credits)
Summarizing, interpreting and analyzing empirical data. Attention to descriptive statistics and statistical inference. Topics include elementary sampling, probability, sampling distributions, estimation, hypothesis testing and regression. Assignments include use of statistical software to analyze labor market and other economic data. Prerequisite: ECO 150 or ECO 153. Restrictions: Students do not normally earn credit for more than one course on this list: ECO 220, GOV 203, MTH 220, PSY 201, SDS 201, SDS 220 or SOC 204. Enrollment limited to 55. {M}{S}
Fall, Spring, Annually
ECO 240 Econometrics (4 Credits)
This course offers an introduction to the basic principles of econometrics and the methods used to present and analyze economic data. Knowledge of statistical methods is essential for understanding and evaluating critically much of what is written about economics and social policy. The main goal of the course is for you to leave it as an informed and critical consumer of empirical studies and with the foundational skills to conduct your own original empirical research. Prerequisites: ECO 150, ECO 153, MTH 111 and either ECO 220, SDS 220 or SDS 291. {M}{S}
Fall, Spring, Annually
FYS 189 Data and Social Justice (4 Credits)
Students examine sociopolitical forces that impact the availability, structure and governance of data regarding various social justice issues. Students learn techniques for presenting data in ways that foreground the contexts of data production and remain accountable to diverse communities. Datasets about health equity, housing justice, environmental justice and carceral justice are studied, analyzed and visualized. Students identify institutions and stakeholders involved in data production, unpack the vested interests animating data semantics, consider what people and problems get erased in data structuring and evaluate ethical tradeoffs that data scientists grapple with as they plan for data presentation. Restrictions: First years only; students are limited to one first-year seminar. Enrollment limited to 16. WI {S}
Fall, Spring, Variable
GOV 203 Empirical Methods in Political Science (5 Credits)
The fundamental problems in summarizing, interpreting and analyzing empirical data. Discussions include research design and measurement, descriptive statistics, sampling, significance tests, correlation and regression. Special attention is paid to survey data and to data analysis using computer software. Restrictions: Students do not normally earn credit for more than one course on this list: ECO 220, GOV 203, MTH 220, PSY 201, SDS 201, SDS 220 or SOC 204. Enrollment limited to 75. {M}{S}
Spring
GOV 338/ SDS 338 Research Seminar in Political Networks (4 Credits)
Offered as GOV 338 and SDS 338. How does the behavior of a state, politician or interest group affect the behavior of others? Does Massachusetts’s decision to legalize recreational marijuana influence Vermont’s marijuana policies? From declarations of war to congress members' voting alignments, social scientists are increasingly looking to political networks to recognize the interconnectedness of the world. This course presents the essentials of social network analysis and how they can be applied to American politics. Prerequisites: SDS 220 or an equivalent introductory statistics course. Restrictions: Juniors and seniors only. Enrollment limited to 12. Instructor permission required. {S}
Fall, Spring, Alternate Years
MTH 246 Probability (4 Credits)
An introduction to probability, including combinatorial probability, random variables, discrete and continuous distributions. Prerequisites: MTH 153 and MTH 212 (may be taken concurrently), or equivalent. {M}
Fall
MTH 320/ SDS 320 Mathematical Statistics (4 Credits)
Offered as MTH 320 and SDS 320. An introduction to the mathematical theory of statistics and to the application of that theory to the real world. Discussions include functions of random variables, estimation, likelihood and Bayesian methods, hypothesis testing and linear models. Prerequisites: a course in introductory statistics, MTH 212 and MTH 246, or equivalent. Enrollment limited to 20. {M}
Spring
MTH 354 Mathematics of Deep Learning (4 Credits)
The developments of Artificial Intelligence (AI) are tied to an unprecedented reshaping of the human experience throughout society, impacting the arts, literature, science, politics, commerce, law, education, etc. Despite these consequential effects, understanding of AI is mostly empirical. The state of knowledge of deep learning has recently been likened to a pseudo-science like alchemy. Progress in understanding AI rests on truly interdisciplinary approaches that are equally informed by mathematics, computer science, statistics and data science. The course goals are: (1) Understand the mathematical foundations of deep learning, (2) Develop proficiency in using mathematical tools to analyze deep learning algorithms, (3) Apply mathematical concepts to implement real-world applications of deep learning. Not recommended for first-years. Prerequisites: MTH 211 and MTH 212. Enrollment limited to 12. {M}
Fall, Spring, Variable
PSY 201 Statistical Methods for Undergraduate Research (5 Credits)
An overview of the statistical methods needed for undergraduate research emphasizing methods for data collection, data description and statistical inference including an introduction to study design, confidence intervals, testing hypotheses, analysis of variance and regression analysis. Techniques for analyzing both quantitative and categorical data are discussed. Applications are emphasized, and students use R and other statistical software for data analysis. This course satisfies the basis requirement for the psychology major. Students who have taken MTH 111 or the equivalent or who have taken AP STAT should take SDS 220, which also satisfies the major requirement. Restrictions: Students do not normally earn credit for more than one course on this list: ECO 220, GOV 203, MTH 220, PSY 201, SDS 201, SDS 220 or SOC 204. Enrollment limited to 40. {M}
Fall, Spring, Variable
PSY 364/ SDS 364 Research Seminar: Intergroup Relationships (4 Credits)
Offered as PSY 364 and SDS 364. Research on intergroup relationships and an exploration of theoretical and statistical models used to study mixed interpersonal interactions. Example research projects include examining the consequences of sexual objectification for both women and men, empathetic accuracy in interracial interactions and gender inequality in household labor. A variety of skills including, but not limited to, literature review, research design, data collection, measurement evaluation, advanced data analysis and scientific writing are developed. Prerequisites: PSY 201, SDS 201, SDS 220 or equivalent; and PSY 202. Restrictions: Juniors and seniors only. Enrollment limited to 12. Instructor permission required. {M}{N}{S}
Fall, Spring, Alternate Years
SOC 204 Statistics and Quantitative Research Methods for Sociology (5 Credits)
This project-based course covers the study of statistics for the analysis of sociological data and the study of methods for quantitative sociological research more generally. Topics in statistics include descriptive statistics, probability theory, correlation, deduction and induction, error and bias, confidence intervals and simple linear regression. Topics in research methods include positivism, research design, measurement, sampling methods and survey design. All students participate in a lab which emphasizes the use of computer software to analyze real data. Students design and complete a survey research project over the course of the semester. Prerequisite: SOC 101. Restrictions: Students do not normally earn credit for more than one course on this list: ECO 220, GOV 203, MTH 220, PSY 201, SDS 201, SDS 220 or SOC 204. Enrollment limited to 40. {M}{S}
Fall
Additional Programmatic Information
It is possible for a Smith student to obtain a master of science in statistics from the University of Massachusetts Amherst in five years (four years at Smith plus one at UMass), through the Fifth Year MS in Statistics Program. Interested students should consult with the director of the program.
Students interested in pursuing graduate work in statistics or data science should consult with their major adviser to plan an appropriate course of study. In either case, a solid foundation in mathematics (calculus I, II, and III, as well as linear algebra) is essential.
Graduate Programs in Statistics
The ASA maintains several lists of graduate programs in statistics that may help you find options that suit your needs.
Graduate Programs in Data Science
Because data science is a newer discipline, its graduate programs are still in their infancy. The ASA maintains a list of graduate programs in “Big Data”, although this should not be conflated with data science. A more comprehensive list of data science degree programs is maintained by datascience.community.
Advisers
Benjamin Baumer, Shiya Cao, Kaitlyn Cook, Randi Garcia, Albert Y. Kim, Katherine Kinnaird, Scott LaCombe, Lindsay Poirier.
Choosing Your Adviser
If you wish to declare an SDS major or minor and need an adviser, please fill out the SDS major/minor adviser request form.
Students interested in doctoral programs in Statistics should consider the Major in Mathematical Statistics jointly operated by SDS and MTH.
Programs we have recently sent students to that have fulfilled major requirements:
- PRESHCO (Programa de Estudios Hispánicos en Córdoba). Language requirement: Spanish.
For MST Majors:
- CIMAT: Mathematics and Stats program in Guanajuato, Mexico. Courses taught in English.
For more information about study abroad, see the Smith study abroad page or contact the SDS study abroad adviser, Scott LaCombe.
Students who received a score of 4 or 5 on the AP Statistics Exam should take SDS 290 (Research Design and Analysis) or SDS 291 (Multiple Regression).
Contact Department of Statistical & Data Sciences
Wright Hall 226
Smith College
Northampton, MA 01063
Phone: 413-585-3520 Email: kdunphy@hghgjm.com
Kelley Dunphy
Administrative Coordinator
Randi L. Garcia
Chair, Program in Statistical & Data Sciences