Course Outcome:
After taking this course, students will be able to understand unsupervised machine learning algorithms, implement them in Python, and apply them to real-world datasets.
Course Topics and Approach:
Unsupervised Machine Learning involves finding patterns in datasets. The core of this course is the study of the following algorithms:
Clustering: Hierarchical, DBSCAN, K Means & Gaussian Mixture Model
Dimension Reduction: Principal Component Analysis
Unlike many other courses, this course:
- Has a detailed presentation of the math underlying the above algorithms, including normal distributions, expectation maximization, and singular value decomposition
- Has a detailed explanation of how the algorithms are converted into Python code, with lectures on code design and the use of vectorization
- Has questions (programming and theory) with solutions so learners can practice the course material
The course code is then used in case studies involving real-world data, performing dimension reduction and clustering on the Iris Flowers Dataset, the MNIST Digits Dataset (images), and the BBC Text Dataset (articles).
Course Audience:
This course is designed for:
- Scientists, engineers, programmers, and others interested in machine learning/data science
- No prior experience with machine learning is needed
- Students should have knowledge of:
  - Basic linear algebra (vectors, transpose, matrices, matrix multiplication, inverses, determinants, linear spaces)
  - Basic probability and statistics (mean, covariance matrices, normal distributions)
  - Python 3 programming
- Students should have a Python installation, such as the Anaconda platform, on their machine with the ability to run programs in the command window and in Jupyter Notebooks
Teaching Style and Resources:
- Course includes many examples with plots and animations to help students gain a better understanding of the material
- Course has many exercises with solutions (theoretical, Jupyter Notebook, and programming) to allow students to gain additional practice
- All resources (presentations, supplementary documents, demos, code, solutions to exercises) are downloadable from the course GitHub site
2021.08.28 Update:
- Section 9.5: added an Autoencoder example
- Section 9.6: added this new section with an Autoencoder demo
2021.11.02 Update:
- Sections 2.3, 2.4, 3.4, 4.3: updated the code to run in more recent versions of Python and matplotlib, and updated the presentations to point out the changes
- Added English captions to the course videos
Introduction
Introduction to Unsupervised Machine Learning with Python Course
Information about course audience, prerequisites, and how to get most from course
Information about the course GitHub site and resources, installing the Anaconda distribution if required, installing Python packages, and testing the setup
Python Demos
This brief section gives an overview of the demos in Section 2
Jupyter notebook demo of basic numpy functionality used in the course
Exercises for Section 2.1
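For orientation, the kind of basic numpy functionality this demo covers looks like the following (a minimal sketch, not taken from the course notebooks):

```python
import numpy as np

# Create an array and inspect its shape
x = np.array([1.0, 2.0, 3.0, 4.0])
print(x.shape)        # a 1-D array of length 4

# Vectorized arithmetic applies elementwise, with no Python loop
y = 2.0 * x + 1.0

# Reductions summarize an array in one call
total = x.sum()
mean = x.mean()
```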
Jupyter notebook demos of numpy matrix operations functionality used in the course
Exercises for Section 2.2
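The matrix operations used throughout the course can be sketched with numpy as follows (illustrative only; the course demos go into more depth):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

C = A @ B                 # matrix product
At = A.T                  # transpose
Ainv = np.linalg.inv(A)   # inverse (A has nonzero determinant, -2)
I = A @ Ainv              # approximately the 2x2 identity
```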
Jupyter notebook demos of basic matplotlib plotting functionality used in this course
Exercises for Section 2.3
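A minimal example of the basic matplotlib plotting pattern the demos build on (a sketch under the assumption of a non-interactive backend, not the course's own demo code):

```python
import matplotlib
matplotlib.use("Agg")     # non-interactive backend, suitable for scripts
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
# fig.savefig("sine.png") or plt.show() would save/render the figure
```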
Jupyter notebook demos of matplotlib colormesh, scatter plot, and animation functionality used in this course
Exercises for Section 2.4
Jupyter notebook demo of basic pandas functionality for reading data from csv files
Exercises for Section 2.5
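The pandas pattern for reading a CSV into a feature matrix looks roughly like this (the data here is a made-up stand-in for a real file):

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a dataset file on disk
csv_text = """sepal_length,sepal_width,label
5.1,3.5,setosa
7.0,3.2,versicolor
"""

df = pd.read_csv(io.StringIO(csv_text))
features = df[["sepal_length", "sepal_width"]].to_numpy()
```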
Jupyter notebook demo of generating dataset using sklearn datasets functionality
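Generating a synthetic clustering dataset with sklearn can be as simple as the following sketch (parameter values are illustrative, not the course's):

```python
from sklearn.datasets import make_blobs

# 150 two-dimensional points drawn around 3 cluster centers
X, y = make_blobs(n_samples=150, centers=3, n_features=2, random_state=0)
```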
Review of Mathematical Concepts
Review of what is covered in Section 3
Description of data for Unsupervised Machine Learning and demo of using sklearn and wordcloud to process and visualize text. Students will be able to set up datasets for their applications and use basic sklearn functionality to convert text to feature matrices.
Exercises for Section 3.1
Review of computational complexity and its relevance to the algorithms, with demos using the numpy package. Students will be able to estimate the complexity power (the exponent p in a runtime of roughly c*n^p) using numpy.
Exercises for Section 3.2
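The idea of estimating a complexity power can be sketched as follows; the runtimes below are hypothetical numbers chosen for illustration, not measurements from the course:

```python
import numpy as np

# Hypothetical runtimes t at problem sizes n. If t ~ c * n**p, then
# log t = log c + p * log n, so the slope of a log-log fit estimates p.
n = np.array([100, 200, 400, 800])
t = np.array([0.010, 0.041, 0.159, 0.640])   # roughly quadratic growth

p, log_c = np.polyfit(np.log(n), np.log(t), 1)
```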
Description of distance measures and how to compute them using numpy package functionality. Students will be able to compute distances between vectors using numpy.
Exercises for Section 3.3
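Typical numpy computations for the distance measures discussed here (a minimal sketch, not the course code):

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([4.0, 6.0, 2.0])

euclidean = np.linalg.norm(u - v)      # sqrt(9 + 16 + 0) = 5.0
manhattan = np.sum(np.abs(u - v))      # 3 + 4 + 0 = 7.0

# Pairwise distances between rows of a data matrix via broadcasting
X = np.array([[0.0, 0.0], [3.0, 4.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
```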
Description of singular value decomposition and demo of how to compute the SVD using numpy. Students will understand what the singular value decomposition is, how to compute it, and how it will be used in the course.
Exercises for Section 3.4
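Computing an SVD with numpy and checking the factorization can be sketched as follows (illustrative, not the course demo):

```python
import numpy as np

X = np.array([[3.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])

# Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Reconstruct X from the factors to verify the decomposition
X_rec = U @ np.diag(s) @ Vt
```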
Review of mean, variance, and covariance, which are used in various unsupervised machine learning algorithms. Demo shows how to use numpy functions to compute mean, variance, and covariance.
Exercises for Section 3.5
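The numpy calls for mean, variance, and covariance used throughout the clustering algorithms look like this (a minimal sketch with made-up data):

```python
import numpy as np

# Rows are samples, columns are features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

mu = X.mean(axis=0)              # per-feature mean: [3., 4.]
var = X.var(axis=0)              # per-feature population variance
C = np.cov(X, rowvar=False)      # 2x2 sample covariance matrix (ddof=1)
```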
Hierarchical Clustering
Description of the Hierarchical Clustering Algorithm. Students will be able to understand the algorithm, its complexity, and its strengths and weaknesses.
Exercises for Section 4.1
Description of course code design for the Hierarchical Clustering Algorithm. Given this code design, students will be able to implement the algorithm using Python.
Walkthrough of course Hierarchical Clustering code. Students will be able to understand and use the course Hierarchical Clustering code.
Exercises for Section 4.3
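The course builds its own Hierarchical Clustering implementation; as a quick external cross-check of results, scipy's agglomerative clustering functions can be used like this (a sketch, not the course code):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated pairs of points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])

Z = linkage(X, method="single")                  # agglomerative merge tree
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree at 2 clusters
```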
DBSCAN Clustering
Description of the DBSCAN algorithm
Exercises for Section 5.1
Review of DBSCAN code design for course
Walkthrough of course DBSCAN code
Exercises for Section 5.3
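For comparison with the course's own DBSCAN code, sklearn's implementation illustrates the algorithm's key behavior of labeling sparse points as noise (a sketch with made-up data):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one far-away point that DBSCAN flags as noise (-1)
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [20.0, 20.0]])

labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
```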
K Means Clustering
Description of the K Means Clustering Algorithm. Students will be able to understand the algorithm, its complexity, and its strengths and weaknesses.
Exercises for Section 6.1
Review of course K Means code design
Walkthrough of course K Means code
Exercises for Section 6.3
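The two alternating steps of K Means can be sketched in a few lines of vectorized numpy (a minimal illustration, not the course implementation; real code would use a better initialization such as k-means++):

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Minimal K Means: alternate assignment and update steps until converged."""
    centroids = X[:k].copy()                 # naive deterministic initialization
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated groups of three points each
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centroids = kmeans(X, 2)
```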
Gaussian Mixture Model Clustering
Description of the Normal Distribution Probability Density Function for one dimension and multiple dimensions.
Exercises for Section 7.1
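The multivariate normal density discussed here can be written out directly from its formula (a sketch for illustration; numerically robust code would avoid the explicit inverse):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Multivariate normal density N(x; mu, Sigma)."""
    d = len(mu)
    diff = x - mu
    inv = np.linalg.inv(Sigma)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ inv @ diff) / norm

# At the mean of a standard 2-D normal, the density is 1 / (2*pi)
p = gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2))
```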
Description of the Gaussian Mixture Model Clustering Algorithm. Students will be able to understand the algorithm, its complexity, and its strengths and weaknesses.
Exercises for Section 7.2
Review of course Gaussian Mixture Model code design
Walkthrough of course Gaussian Mixture Model code
Exercises for Section 7.4
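For comparison with the course's own Gaussian Mixture Model code, sklearn's implementation of the same EM-based algorithm can be exercised on synthetic data (a sketch, not the course code):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(8.0, 0.5, size=(50, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)
```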
Comparison of Clustering Algorithms
Description of the Silhouette Index for measuring the quality of a clustering
Exercises for Section 8.1
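The Silhouette Index is available in sklearn, which makes for a quick sanity check: tight, well-separated clusters should score near +1 (a sketch with made-up data):

```python
import numpy as np
from sklearn.metrics import silhouette_score

X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
              [10.0, 10.0], [10.0, 10.1], [10.1, 10.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

score = silhouette_score(X, labels)   # mean silhouette over all points
```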
Comparison of the DBSCAN, K Means, and Gaussian Mixture Model clustering algorithms on six sklearn datasets
Dimension Reduction
Overview of the dimension reduction algorithms
Description of the Principal Component Analysis Algorithm and Jupyter Notebook demo.
Exercises for Section 9.1
Review of design for Principal Component Analysis code.
Walkthrough of Principal Component Analysis code.
Exercises for Section 9.3
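The core of PCA, computing principal directions from the SVD of the centered data matrix, can be sketched as follows (a minimal illustration, not the course implementation):

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the centered data matrix (rows = samples)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]        # principal directions (rows)
    Z = Xc @ components.T                 # projected coordinates
    var_explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return Z, components, var_explained

# Data lying almost on the line y = x: one component captures most variance
X = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9]])
Z, comps, var_explained = pca(X, 1)
```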
Application of Principal Component Analysis to MNIST Digits Dataset.
Exercises for Section 9.4
Description of how Autoencoders can be used for dimension reduction.
This optional section has a demo on using autoencoders for dimension reduction.
Case Studies
Description of the Purity and Bar Chart metrics for measuring the quality of a clustering, plus a demo and walkthrough of the Python implementation
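The Purity metric (the Bar Chart metric is not shown here) can be sketched as follows: each cluster is credited with its majority true class, and purity is the fraction of points so credited. This is an illustration, not the course's implementation:

```python
import numpy as np

def purity(true_labels, cluster_labels):
    """Fraction of points belonging to the majority true class of their cluster."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    correct = 0
    for c in np.unique(cluster_labels):
        members = true_labels[cluster_labels == c]
        correct += np.bincount(members).max()   # size of the majority class
    return correct / len(true_labels)

# Cluster 0 is all class 0; cluster 1 holds two class-1 points and one class-0
score = purity([0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1])
```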
Discussion of using clustering algorithms and PCA to reduce dimension to find clusters in the Iris Flower Dataset
Exercises for Section 10.2
Discussion of using clustering algorithms and PCA to reduce dimension to find clusters in the MNIST Digits Dataset
Exercises for Section 10.3
Discussion of using clustering algorithms and PCA to reduce dimension to group articles for the BBC Text dataset
Exercises for Section 10.4
Concluding Remarks and Thank You
Summary of algorithms, listing of software packages for clustering and PCA, and thank you
Optional
Bonus Lecture