Although the term “deep learning” is commonly used, it is not precisely defined.
For our purposes, let “deep” refer to any neural-network-based model that contains more than a single “hidden layer”. This covers everything from Multi-Layer Perceptrons (MLPs) all the way up to convolutional or recurrent neural networks with hundreds of hidden layers.
The Multi-Layer Perceptron (MLP) is the original “neural network”.
MLPs are useful when you have a “traditional” (bag-of-variables) tabular dataset, need a nonlinear model, and have enough data.
Beware of making MLPs too “deep” - they can become hard to train.
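As a concrete illustration, here is a minimal sketch of a small MLP in PyTorch. The layer sizes, input width, and class count are hypothetical, chosen only to show the structure:

import torch
import torch.nn as nn

# A small MLP: two hidden layers, so "deep" by the definition above.
model = nn.Sequential(
    nn.Linear(20, 64),   # 20 input features (hypothetical tabular data)
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 3),    # 3 output classes (hypothetical)
)

x = torch.randn(8, 20)   # a batch of 8 examples
logits = model(x)        # shape: (8, 3)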
Image from https://anhreynolds.com/blogs/cnn.html
Each \(N\times N\) patch in the Input is “compared” (via dot product) to the filter (or kernel) and the result creates a single pseudo-pixel in the Output.
Image from https://anhreynolds.com/blogs/cnn.html
An example kernel that performs vertical edge detection. Notice how it responds strongly at the boundary between the “lighter” and “darker” pixels in the input.
In practice, we don’t hand-craft the filters—we let the network learn them. (In other words, the values in the filter are weights (or parameters) in the model.)
Image from https://anhreynolds.com/blogs/cnn.html
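To make the patch-by-patch dot product concrete, here is a minimal NumPy sketch of a 2D convolution with a hand-crafted vertical-edge kernel. The kernel values and the toy image are illustrative, not taken from the figures above:

import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over every patch; each dot product
    # yields one pseudo-pixel in the output.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i+kh, j:j+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A simple vertical-edge-detection kernel (Sobel-like; illustrative).
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

# A toy image that is bright on the left, dark on the right.
image = np.ones((6, 6))
image[:, 3:] = 0.0

print(conv2d(image, kernel))  # large values at the light/dark boundary

In a CNN, the loop above is what a convolutional layer computes, except that the kernel values are learned weights rather than hand-picked constants.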
RNN Image: https://upload.wikimedia.org/wikipedia/commons/b/b5/Recurrent_neural_network_unfold.svg
LSTM Cell image: https://upload.wikimedia.org/wikipedia/commons/5/56/LSTM_cell.svg
Autoencoder Image: https://upload.wikimedia.org/wikipedia/commons/2/23/Autoencoder-BodySketch.svg
Image: Dan, Y., Zhao, Y., Li, X. et al. Generative adversarial networks (GAN) based efficient sampling of chemical composition space for inverse design of inorganic materials. npj Comput Mater 6, 84 (2020). https://doi.org/10.1038/s41524-020-00352-0
Transformer networks are a type of neural network architecture used in natural language processing (NLP) tasks such as machine translation and sentiment analysis.
The key feature of transformer networks is the self-attention mechanism, which lets the network weigh different parts of the input sequence differently based on relevance. Self-attention replaces the recurrent state used in an RNN as the way sequential relationships are modeled.
Images: A. Vaswani et al., “Attention Is All You Need,” arXiv:1706.03762 [cs], Dec. 2017 [Online]. Available: http://arxiv.org/abs/1706.03762.
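Here is a minimal sketch of the scaled dot-product attention at the heart of the transformer, following the formula in Vaswani et al.; the sequence length and model width are hypothetical:

import math
import torch

def scaled_dot_product_attention(q, k, v):
    # Each query is compared (dot product) to every key; softmax turns
    # the scores into relevance weights, which are used to mix the values.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    weights = torch.softmax(scores, dim=-1)
    return weights @ v

seq_len, d_model = 5, 16          # hypothetical sequence length and width
q = torch.randn(seq_len, d_model)
k = torch.randn(seq_len, d_model)
v = torch.randn(seq_len, d_model)
out = scaled_dot_product_attention(q, k, v)  # shape: (5, 16)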
An LLM (Large Language Model) is a type of NLP model designed to generate text that reads like human writing.
Is identifying birds really that different from identifying other objects in an image? Both are visual tasks, and both use similar parts of our visual system. It makes sense that a network trained for one task might quickly become proficient at a different (but similar) task.
Labeled data is usually hard to get. (Correctly labeled data is even harder.)
We need techniques to train networks with fewer labeled examples.
The trick is to first pretrain the network on a task whose labels can be generated automatically; the final, task-specific training then requires much less labeled data (see the sketch after the link below).
Few-Shot Learning survey paper: https://arxiv.org/abs/1904.05046
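For example, here is a minimal transfer-learning sketch using a torchvision ResNet pretrained on ImageNet (torchvision ≥ 0.13 weights API). The number of target classes and the choice to freeze everything but the last layer are assumptions for illustration:

import torch.nn as nn
from torchvision import models

# Start from a network pretrained on ImageNet.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pretrained feature extractor...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the final classification layer for the new task
# (e.g., 10 bird species -- a hypothetical target task).
model.fc = nn.Linear(model.fc.in_features, 10)

# During fine-tuning, only the new layer's parameters are updated,
# so far fewer labeled examples are needed.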
You can often use an existing model that either applies directly to your task or can be fine-tuned via transfer learning to fit it. Some places to look are listed below:
This article looks at three kinds of machine learning tasks (Classification, Regression, and Clustering) and presents some of the best-known models for each.
https://elitedatascience.com/machine-learning-algorithms
Here is a similar article that covers many more kinds of machine learning tasks (including the ones above).
https://www.dataquest.io/blog/top-10-machine-learning-algorithms-for-beginners/
Papers with Code maintains a repository of state-of-the-art research papers and provides open-source implementations and evaluation metrics.
https://paperswithcode.com/sota
Deep Learning Book (by Ian Goodfellow, Yoshua Bengio, and Aaron Courville) - a comprehensive textbook covering a wide range of deep learning topics. https://www.deeplearningbook.org/
Papers with Code - a website that aggregates recent research papers along with open-source implementations and evaluation metrics for each. https://paperswithcode.com
MIT Deep Learning Series - a collection of video lectures by prominent researchers, designed to give a broad overview of the field of machine learning and deep learning. https://deeplearning.mit.edu/
Coursera Deep Learning Specialization - a series of online courses providing a graduate-level introduction to deep learning. https://www.coursera.org/specializations/deep-learning
TensorFlow.org - a popular open-source platform for constructing and training machine learning models, including deep learning. https://www.tensorflow.org/
PyTorch.org - another popular open-source platform for constructing ML models. Probably more popular than TensorFlow among ML researchers at the moment. https://pytorch.org/
Deep Learning & How to Choose the Right Model
CS 4/5623 Fundamentals of Data Science