Although the term “deep learning” is widely used, it is not well defined.
For our purposes, “deep” refers to any neural-network-based model that contains more than one hidden layer. This ranges from Multi-Layer Perceptrons all the way up to convolutional or recurrent neural networks with hundreds of hidden layers.
The Multi-Layer Perceptron (MLP) is the original “neural network”.
It is useful when you have a “traditional” (bag-of-variables) dataset, need a nonlinear model, and have enough data to train one.
Beware of making MLPs too “deep” - they can become hard to train.
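To make “more than a single hidden layer” concrete, here is a minimal plain-Python sketch of an MLP forward pass, assuming ReLU activations on the hidden layers, a linear output layer, and randomly initialised (untrained) weights. The layer sizes and input values are made up for illustration; real MLPs are built and trained with a framework such as PyTorch or TensorFlow.

```python
import math
import random

def relu(v):
    """Element-wise ReLU: max(0, x)."""
    return [max(0.0, x) for x in v]

def dense(x, weights, bias):
    """Fully connected layer: y_j = sum_i W[j][i] * x[i] + b[j]."""
    return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
            for row, b_j in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in); biases start at zero."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_forward(x, layers):
    """Apply each hidden layer followed by ReLU; the last layer stays linear."""
    for weights, bias in layers[:-1]:
        x = relu(dense(x, weights, bias))
    weights, bias = layers[-1]
    return dense(x, weights, bias)

rng = random.Random(0)
# 4 input variables -> two hidden layers of 8 units -> 1 output.
# Two hidden layers make this "deep" under the definition above.
sizes = [4, 8, 8, 1]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = [0.5, -1.2, 3.0, 0.7]  # one made-up "bag of variables" row
y = mlp_forward(x, layers)
print(len(layers) - 1, "hidden layers ->", y)
```

Each extra hidden layer is one more `dense` + `relu` step; stacking many of them is what makes very deep MLPs harder to train.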
Image from https://anhreynolds.com/blogs/cnn.html
Each \(N\times N\) patch in the Input is “compared” (via dot product) to the filter (or kernel) and the result creates a single pseudo-pixel in the Output.
Image from https://anhreynolds.com/blogs/cnn.html
An example kernel that performs vertical edge detection. Notice how it responds strongly at the boundary between the “lighter” and “darker” pixels in the input.
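The sliding dot product and the edge-detection kernel can be sketched in plain Python as follows. This assumes stride 1 and no padding (“valid” convolution), so an H x W input and an N x N kernel give an (H - N + 1) x (W - N + 1) output. Like most deep learning libraries, it actually computes cross-correlation (the kernel is not flipped). The 6x6 image and the 3x3 kernel values are illustrative choices.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation): slide an N x N kernel over
    the image with stride 1 and no padding, taking the dot product of the
    kernel with each N x N patch."""
    n = len(kernel)
    out_h = len(image) - n + 1
    out_w = len(image[0]) - n + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the patch whose top-left corner is (i, j)
            total = 0.0
            for a in range(n):
                for b in range(n):
                    total += kernel[a][b] * image[i + a][j + b]
            row.append(total)
        output.append(row)
    return output

# 6x6 image: lighter pixels (10) on the left half, darker pixels (0) on the right half.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# A 3x3 vertical edge detector: positive column on the left, negative on the right.
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

for row in conv2d(image, vertical_edge):
    print(row)  # each of the 4 rows prints [0.0, 30.0, 30.0, 0.0]
```

The output is zero over uniform regions and large only in the middle columns, where the patch straddles the light/dark boundary.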
In practice, we don’t hand-craft the filters; we let the network learn them. In other words, the values in the filter are weights (parameters) of the model.
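To show that filter values really are just trainable weights, the sketch below starts from an all-zero 3x3 kernel and recovers a vertical edge detector by plain gradient descent on mean squared error. The training data is hypothetical: random 6x6 images whose targets were produced by a “hidden” edge kernel. The learning rate and step count are assumptions chosen for this toy problem. A real CNN learns many filters jointly via backpropagation in a framework rather than with a hand-written gradient like this.

```python
import random

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation with stride 1 (the operation a CNN layer applies)."""
    n = len(kernel)
    return [[sum(kernel[a][b] * image[i + a][j + b] for a in range(n) for b in range(n))
             for j in range(len(image[0]) - n + 1)]
            for i in range(len(image) - n + 1)]

rng = random.Random(42)

# Hypothetical training data: random images, targets made by a kernel the model never sees.
true_kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
images = [[[rng.random() for _ in range(6)] for _ in range(6)] for _ in range(5)]
targets = [conv2d(img, true_kernel) for img in images]

# The filter values are ordinary trainable weights, initialised to zero here.
kernel = [[0.0] * 3 for _ in range(3)]
learning_rate = 0.2

for step in range(1500):
    grad = [[0.0] * 3 for _ in range(3)]
    count = 0
    for img, target in zip(images, targets):
        pred = conv2d(img, kernel)
        for i, row in enumerate(pred):
            for j, p in enumerate(row):
                err = p - target[i][j]
                count += 1
                # d(err^2)/d kernel[a][b] = 2 * err * (pixel under kernel[a][b])
                for a in range(3):
                    for b in range(3):
                        grad[a][b] += 2 * err * img[i + a][j + b]
    # One gradient-descent step on the mean squared error
    for a in range(3):
        for b in range(3):
            kernel[a][b] -= learning_rate * grad[a][b] / count

for row in kernel:
    print([round(w, 3) for w in row])
```

Because the model is linear in the kernel weights, this toy loss is convex and gradient descent converges to the hidden edge detector; deeper networks stack nonlinearities, so training is not this clean.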
Image from https://anhreynolds.com/blogs/cnn.html
Recurrent Neural Networks (RNNs) & LSTMs
Generative Adversarial Networks (GANs)

LSTM Cell image: https://upload.wikimedia.org/wikipedia/commons/5/56/LSTM_cell.svg
Autoencoder Image: https://upload.wikimedia.org/wikipedia/commons/2/23/Autoencoder-BodySketch.svg
Examples of modern generative models: Stable Diffusion, DALL-E 3, Sora (video), MusicGen (audio).
Transformer networks were introduced for natural language processing (NLP) but have since become the dominant architecture across many domains: text, vision, audio, and time-series.
The key feature of transformer networks is the self-attention mechanism, which allows the network to weigh different parts of the input differently based on relevance. This replaces the recurrent state used in RNNs.
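The self-attention computation from Vaswani et al. is softmax(Q K^T / sqrt(d_k)) V, where Q (queries), K (keys), and V (values) are linear projections of the same input sequence. Below is a minimal single-head sketch in plain Python. For readability it assumes identity projection matrices and a made-up 3-token input; a real transformer learns the projections and uses several heads in parallel.

```python
import math

def matmul(A, B):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(v):
    """Numerically stable softmax of a list of scores."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the rows (tokens) of X."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)  # scores[i][j]: how relevant token j is to token i
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights

# Three tokens with 2-dimensional embeddings (illustrative values).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
I = [[1.0, 0.0], [0.0, 1.0]]  # identity projections, an assumption for readability
output, weights = self_attention(X, I, I, I)
for row in weights:
    print([round(w, 3) for w in row])
```

Each output row is a weighted average of all value vectors, so every token can draw on every other token in one step instead of passing information through a recurrent state.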
Vision Transformers (ViT) apply the same architecture to images (split into patches), and now match or exceed CNNs on many vision benchmarks.
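The “split into patches” step is what lets a transformer read an image as a sequence. The sketch below cuts an image into non-overlapping square patches and flattens each into a vector (one “token”). It assumes a single-channel image whose sides are divisible by the patch size. The original ViT paper used 16x16 patches; a 4x4 image with 2x2 patches keeps the example readable. In a real ViT each flattened patch is then linearly projected and combined with a position embedding, which is not shown here.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (list of rows) into non-overlapping patch_size x patch_size
    patches, scanning left to right and top to bottom, and flatten each patch."""
    h, w = len(image), len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            patch = [image[top + r][left + c]
                     for r in range(patch_size) for c in range(patch_size)]
            patches.append(patch)
    return patches

image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 image, pixel values 0..15
tokens = image_to_patches(image, 2)
print(len(tokens), "tokens:", tokens)
# 4 tokens: [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```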

Images: A. Vaswani et al., “Attention Is All You Need,” arXiv:1706.03762 [cs], Dec. 2017 [Online]. Available: http://arxiv.org/abs/1706.03762.
An LLM (Large Language Model) is a transformer-based model pre-trained on massive text corpora, then fine-tuned to follow instructions. LLMs can generate fluent text and perform a broad range of language and reasoning tasks.
Is identifying birds that different from identifying objects in an image? They are both visual tasks… They both require us to use similar parts of our vision system. It makes sense that a network trained for one task might quickly become proficient at a different (but similar) task.
Labeled data is usually hard to get. (Correctly labeled data is even harder.)
We need techniques to train networks with fewer labeled examples.
The trick is to first train the network on a task whose labels can be generated automatically; the final training on the real task then requires less labeled data.
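One well-known “task that can be automated” is masked-word prediction, used to pretrain BERT-style language models: hide a word in unlabeled text and ask the network to predict it, so the hidden word itself is the label. The sketch below only shows how such (input, label) pairs can be generated with no human labeling. The function name, the `[MASK]` token string, and masking exactly one word per sentence are illustrative assumptions, not a specific library’s procedure.

```python
import random

def make_masked_examples(sentences, mask_token="[MASK]", seed=0):
    """Turn unlabeled sentences into (input, label) pairs automatically by hiding
    one randomly chosen word per sentence; the hidden word becomes the label."""
    rng = random.Random(seed)
    examples = []
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue  # nothing to mask in an empty sentence
        position = rng.randrange(len(words))
        label = words[position]
        masked = words[:position] + [mask_token] + words[position + 1:]
        examples.append((" ".join(masked), label))
    return examples

unlabeled = [
    "the cat sat on the mat",
    "birds can fly over the lake",
]
for text, label in make_masked_examples(unlabeled):
    print(f"{text!r} -> {label!r}")
```

Any amount of raw text can be turned into training pairs this way; the expensive, human-labeled data is then needed only for the smaller final training step.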
Few-Shot Learning survey paper: https://arxiv.org/abs/1904.05046
In LLMs: Few-shot learning can be accomplished by including a few example (input and output) pairs in the prompt, followed by the new input whose output is unknown. The LLM then uses the pattern in the examples to predict the output for the new input.
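A few-shot prompt is just text, so it can be assembled programmatically. The sketch below builds one from solved examples followed by the unknown input, leaving the final output blank for the LLM to complete. The “Input:/Output:” layout, the instruction sentence, and the sentiment examples are assumptions for illustration; sending the prompt to a particular LLM is not shown.

```python
def build_few_shot_prompt(examples, query, instruction="Classify the sentiment of each review."):
    """Assemble a few-shot prompt: an instruction, several solved (input, output)
    examples, then the new input with its output left blank for the model."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

examples = [
    ("The food was wonderful.", "positive"),
    ("Service was slow and rude.", "negative"),
    ("Great value for the price.", "positive"),
]
prompt = build_few_shot_prompt(examples, "The soup was cold.")
print(prompt)
```

No weights are updated here: the “learning” happens entirely through the examples placed in the prompt.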
You can often use an existing model that can either apply directly to your task or can be fine-tuned through transfer learning to fit your task.
Hugging Face (https://huggingface.co) has emerged as the central hub for pre-trained models, datasets, and live demos. The transformers, diffusers, and datasets libraries provide a unified interface to thousands of models.
Other sources:
This article looks at three kinds of machine learning tasks (Classification, Regression, and Clustering) and presents some of the best-known models for each.
https://elitedatascience.com/machine-learning-algorithms
Here is a similar article that looks at many more kinds of machine learning tasks (and of course the same ones as above as well).
https://www.dataquest.io/blog/top-10-machine-learning-algorithms-for-beginners/
Papers with Code maintains a repository of state-of-the-art research papers and provides open-source implementations and evaluation metrics.
https://paperswithcode.com/sota
Deep Learning Book (by Ian Goodfellow, Yoshua Bengio, and Aaron Courville) - a comprehensive textbook covering a wide range of deep learning topics. https://www.deeplearningbook.org/
Papers with Code - a website that aggregates recent research papers along with the open-source implementations and evaluation metrics available for each of them. https://paperswithcode.com
MIT Deep Learning Series - a collection of video lectures by prominent researchers, designed to give a broad overview of the field of machine learning and deep learning. https://deeplearning.mit.edu/
Coursera Deep Learning Specialization - a series of online courses providing a graduate-level introduction to deep learning. https://www.coursera.org/specializations/deep-learning
PyTorch.org - the leading open-source platform for deep learning, with broad adoption across both research and industry. https://pytorch.org/
TensorFlow.org - a widely used open-source platform for constructing and training machine learning models, including deep learning. https://www.tensorflow.org/
Deep Learning & How to Choose the Right Model

CS 4/5623 Fundamentals of Data Science