INST 414 / SDSI 414: Data Science Techniques (Spring 2026)

Format: Blended — async video lectures (online) + in-person Friday labs

Lab Location: BLD4 3321

Lab Time: Fridays 9:00 to 10:30 a.m.

Instructor: Zubin Jelveh (zjelveh@umd.edu)

Office Hours: TBD

Course Description

This course provides a strong foundation in key data science techniques. As the amount of data available to public- and private-sector organizations continues to grow, it has become critical to have data scientists who can leverage this data to generate insights and enable data-driven decision-making, while ensuring that these decisions are based on fair and ethical principles.

You will learn foundational data science techniques including basic probability and statistics, supervised machine learning algorithms (such as Naive Bayes, linear and logistic regression, decision trees, and random forests), feature engineering, working with Python libraries, principles of model selection and evaluation (including cross-validation), and considerations around fairness and bias in machine learning applications. We’ll also touch on text mining techniques and explore generative AI.

Course format: Lecture content is delivered asynchronously through video. Friday sessions are hands-on labs where you’ll apply techniques to real datasets with instructor support.

Note: This course primarily focuses on foundational supervised learning techniques.

Note: Knowledge of Python is assumed.

Course Objectives

After completing this course, students will be able to:

  1. Apply basic probability concepts and relate them to uncertainty in predictions
  2. Implement standard machine learning algorithms (linear regression, logistic regression, decision trees, random forests)
  3. Select and evaluate models using cross-validation and appropriate performance metrics
  4. Engineer features from raw data to improve model performance
  5. Identify and address fairness considerations in machine learning applications
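
As a rough illustration of objective 3, the sketch below shows the idea behind k-fold cross-validation: split the data into k folds, train on k-1 of them, score on the held-out fold, and average the scores. It is a minimal sketch and not course material. The helper names (k_fold_indices, cross_val_accuracy, fit_majority, predict_majority) and the toy data are hypothetical, and it uses only the Python standard library to stay self-contained. In practice a library such as scikit-learn provides this functionality.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(fit, predict, X, y, k=5, seed=0):
    """Average held-out accuracy of a model over k folds."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        # Fit only on the training folds, then score on the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


# Hypothetical baseline "model": always predict the most common training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


# Toy data invented for illustration: 20 points, 15 labeled 0 and 5 labeled 1.
X = [[i] for i in range(20)]
y = [0] * 15 + [1] * 5
print(cross_val_accuracy(fit_majority, predict_majority, X, y, k=5))
```

Because the baseline always predicts the majority label, its cross-validated accuracy equals the share of the majority class in the toy data. Any model worth selecting should beat that number on held-out folds.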

Prerequisites

Minimum grade of C- in MATH115 and STAT100; minimum C- in INST126 or GEOG276; minimum C- in INST201, INST301, or BSOS233; minimum C- in a social science course (AASP101, ANTH210, ECON200, GVPT170, PSYC100, SOCY100, etc.); and minimum C- in BSOS233 or INST314.

Technical Requirements

  • A laptop capable of running Python (bring to Friday labs)
  • No software purchases required — we use free, open-source tools
  • Access to ELMS for async lecture videos, assignments, and announcements

Working with Lab Notebooks

Lab notebooks open in Google Colab directly from the links below.

Important: When you open a lab notebook, immediately click File → Save a copy in Drive. This creates your own version in your Google Drive that you can edit and save.

If you skip this step, your work will not be saved.

Turn off AI assistance: Go to Settings → AI Assistance and uncheck everything. AI-generated code is not allowed on assignments in this course.

Assessment

Homework Assignments (30%): Four homework assignments (listed as Problem Sets 1-4 in the Weekly Schedule) that build on lecture and lab content. Assignments will be distributed and submitted through ELMS.
Weekly Quizzes (20%): Ten weekly quizzes covering lecture material, administered through ELMS.
Midterm Exam (25%): Covers material from Weeks 1-8 (probability through random forests). Scheduled for Week 9 (4/3).
Final Project or Exam (25%): Details to come.
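
The four weights above sum to 100%. The sketch below shows how they would combine into a single course score, assuming a plain weighted average with no dropped scores, curve, or rounding rule. The syllabus does not state the exact calculation, so treat that as an assumption. The function name course_grade and the example scores are hypothetical.

```python
# Weights are taken from the Assessment section; everything else is illustrative.
WEIGHTS = {
    "homework": 0.30,
    "quizzes": 0.20,
    "midterm": 0.25,
    "final": 0.25,
}


def course_grade(scores):
    """Return the weighted average of component scores on a 0-100 scale.

    Assumes each component is already averaged to a single 0-100 score.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the four components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for illustration only.
example_scores = {"homework": 90, "quizzes": 80, "midterm": 70, "final": 85}
print(round(course_grade(example_scores), 2))  # 81.75
```

Here the result works out as 0.30 × 90 + 0.20 × 80 + 0.25 × 70 + 0.25 × 85 = 27 + 16 + 17.5 + 21.25 = 81.75.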

Weekly Schedule

Week 1 (1/30): Data Science Background


Week 2 (2/6): Probability


Week 3 (2/13): Conditional Probability

Problem Set 1 Out


Week 4 (2/20): Bayes’ Theorem / Making Predictions


Week 5 (2/27): Performance Metrics / Naive Bayes

Problem Set 1 Due


Week 6 (3/6): Linear and Logistic Regression

Problem Set 2 Out


Week 7 (3/13): Overfitting / Decision Trees


Spring Break (3/20): No Class


Week 8 (3/27): Random Forest / Gradient Boosting

Problem Set 2 Due


Week 9 (4/3): Midterm (No Lab)


Week 10 (4/10): Fairness

Problem Set 3 Out


Week 11 (4/17): Sources of Bias, Part I


Week 12 (4/24): Sources of Bias, Part II

Problem Set 3 Due / Problem Set 4 Out


Week 13 (5/1): Text Mining


Week 14 (5/8): Text Mining / Generative AI

Problem Set 4 Due


Course Policies

Communications

ELMS — Official course site for video lectures, materials, assignments, announcements, and grades. Make sure your email and announcement notifications are enabled.

Email — Administrative requests, quick clarifications. Please prefix the subject line with [INST414]. If you haven’t received a reply within 2 days, please email again.

Office Hours — Complex technical questions.

Accessibility and Disability Support

The University of Maryland is committed to creating and maintaining a welcoming and inclusive educational, working, and living environment for people of all abilities. Students with disabilities who require accommodations should contact the Accessibility and Disability Service (ADS) at 301-314-7682 or adsfrontdesk@umd.edu. Please inform me of any accommodations as soon as possible, preferably within the first two weeks of the semester.

More information: https://www.counseling.umd.edu/ads/

LLM Policy

While you are allowed and encouraged to use LLMs (ChatGPT, Copilot, etc.) to help you learn the material — for example, to explain concepts, debug error messages, or explore topics in more depth — do not use them to complete assignments. If it is clear that an LLM has been used on an assignment, you will receive an automatic zero.

UMD Policies

It is our shared responsibility to know and abide by the University of Maryland’s policies that relate to all courses, which include topics like academic integrity, student and instructor conduct, accessibility and accommodations, attendance and excused absences, grades and appeals, and copyright and intellectual property.

Please visit www.ugst.umd.edu/courserelatedpolicies.html for the full list of campus-wide policies.