MATHEMATICAL METHODS in DATA SCIENCE (with Python)

Author: Sebastien Roch, Department of Mathematics, University of Wisconsin-Madison

This textbook on the mathematics of data has two intended audiences:

  • For students majoring in math or other quantitative fields like physics, economics, engineering, etc.: it is meant as an invitation to data science and AI from a rigorous mathematical perspective.

  • For mathematically inclined students in data science-related fields (at the undergraduate or graduate level): it can serve as a mathematical companion to machine learning, AI, and statistics courses.

Content-wise, it is a second course in linear algebra, multivariable calculus, and probability theory, motivated by and illustrated with data science applications. As such, the reader is expected to be familiar with the basics of those areas and to have been exposed to proofs, but no knowledge of data science is assumed. Moreover, while the emphasis is on the mathematical concepts and methods, coding is used throughout. Basic familiarity with Python will suffice. The book provides an introduction to some specialized packages, especially NumPy, NetworkX, and PyTorch.

The book is based on Jupyter notebooks that were developed for MATH 535: Mathematical Methods in Data Science, a one-semester advanced undergraduate and Master’s level course offered at UW-Madison.

Important

To run the code in these notes, you need to import the following libraries.

# PYTHON 3
import numpy as np
from numpy import linalg as LA
import matplotlib.pyplot as plt
import pandas as pd
import networkx as nx
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, TensorDataset
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import mmids
seed = 535
rng = np.random.default_rng(seed)
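
The last two lines above create a seeded NumPy random generator. As a minimal sketch of why that matters (the variable names `x`, `y`, and `rng_again` are illustrative and not taken from the notes), re-creating the generator with the same seed reproduces the same random draws:

```python
import numpy as np

seed = 535
rng = np.random.default_rng(seed)

# Draw a few standard normal samples from the seeded generator.
x = rng.normal(size=3)

# Re-creating the generator with the same seed reproduces the same draws.
rng_again = np.random.default_rng(seed)
y = rng_again.normal(size=3)

print(np.array_equal(x, y))  # True
```

Note that the generator is stateful: repeated calls on the same `rng` object yield new values, so results match only when the code is rerun from a fresh generator.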

The file mmids.py is here.
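
Before running a notebook, it can help to confirm that every module in the import list, including the local file mmids.py, can be found by Python. The following is a hedged sketch using only the standard library; the list `required` and the variable `missing` are illustrative names, not part of the notes.

```python
import importlib.util

# Top-level modules used by the import block; "mmids" refers to the local
# file mmids.py, which is assumed to sit in the working directory.
required = ["numpy", "matplotlib", "pandas", "networkx",
            "torch", "torchvision", "mmids"]

# find_spec returns None when a top-level module cannot be located.
missing = [name for name in required
           if importlib.util.find_spec(name) is None]

print("Missing modules:", missing)
```

An empty list means all imports should resolve; any name printed needs to be installed or, for mmids, downloaded and placed next to the notebook.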

All datasets can be downloaded from the GitHub page of the notes.

Jupyter notebooks containing just the code are provided at the end of each chapter. Running them in Google Colaboratory is recommended.

Note

If you find typos, please open an issue on GitHub using the button provided in the top-right menu.

Image credit: Sidebar logo made with Midjourney