This note will be accompanied by a live demonstration of a Python-based analysis of a dataset (“Wine Quality”).
https://www.python.org/ : The official Python site has the most current documentation, and links to tutorials, forums, etc.
https://packaging.python.org/tutorials/installing-packages/ : How to install Python packages. Assumes the pip package manager is installed.
https://docs.astral.sh/uv/ : uv is an extremely fast Python package and project manager. The same can be done with pip and a virtual environment tool alone, but uv combines these tasks and makes them easier.
The ideal for a user who needs multiple “small” projects is to use an environment manager like uv so that you can have a “\(\textrm{project} = \textrm{directory}\)” logical model, as seen with the yellow boxes above.
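To make the “project = directory” idea concrete, here is a minimal sketch that uses only the standard-library venv module. It is not how uv works internally; it only illustrates the manual route of giving each project directory its own isolated environment, which uv automates. The helper name make_project and the project names are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def make_project(root: Path, name: str) -> Path:
    """Create a project directory that carries its own virtual environment."""
    project_dir = root / name
    project_dir.mkdir(parents=True, exist_ok=True)
    # with_pip=False keeps this sketch fast and offline; a real project would
    # normally want pip, or would let uv manage the environment instead.
    venv.create(project_dir / ".venv", with_pip=False)
    return project_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical project names; each gets an independent environment.
        for name in ("wine-quality", "other-project"):
            project = make_project(Path(tmp), name)
            print(project.name, (project / ".venv" / "pyvenv.cfg").exists())
```

Each directory ends up with its own .venv folder, so packages installed for one project do not affect another.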
https://jupyter.org/ : “Project Jupyter exists to develop open-source software, open-standards, and services for interactive computing across dozens of programming languages.”
With uv, JupyterLab can be launched by running: uv run jupyter lab
See Computational and Inferential Thinking, Chapter 3: https://www.inferentialthinking.com/chapters/03/programming-in-python.html
However, we will not focus on the datascience library they use; instead, we will use widely available, standard libraries (next slides).
Python Libraries for Data Science
Although almost any library could be used in a data science analysis, the following libraries are very commonly used for data analysis, wrangling, cleaning, and visualization, as well as mathematical modeling.
Two datasets are included, related to red and white vinho verde wine samples, from the north of Portugal. The goal is to model wine quality based on physicochemical tests (see [Cortez et al., 2009]).
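As a rough sketch of a first look at such data, the example below reads semicolon-delimited rows with a quoted header line and summarizes the quality column, using only the standard library. The file layout, the column names, and the sample values are assumptions for illustration; the sample rows are made up and are not real measurements.

```python
import csv
import io
from collections import Counter
from statistics import mean


def load_wine(source):
    """Read semicolon-delimited wine rows into a list of dicts with float values."""
    reader = csv.DictReader(source, delimiter=";")
    return [{key: float(value) for key, value in row.items()} for row in reader]


def summarize_quality(rows):
    """Return the mean quality score and the number of wines per score."""
    scores = [int(row["quality"]) for row in rows]
    return mean(scores), Counter(scores)


# Tiny made-up sample in the assumed layout (not real measurements).
SAMPLE = io.StringIO(
    '"alcohol";"pH";"quality"\n'
    "9.4;3.51;5\n"
    "9.8;3.20;5\n"
    "11.2;3.16;6\n"
    "10.5;3.30;7\n"
)

if __name__ == "__main__":
    rows = load_wine(SAMPLE)
    avg, counts = summarize_quality(rows)
    print(f"{len(rows)} wines, mean quality {avg:.2f}")
    print(sorted(counts.items()))
```

To run this on a downloaded copy of the data, SAMPLE could be replaced with an opened file such as open("winequality-red.csv", newline=""), where the file name is an assumption about how the dataset is saved locally.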
Python Tools
CS 4/5623