Data scientists must be knowledgeable about every aspect of the data lifecycle, from acquisition to archival storage.
The main phases are:
Common data quality issues
What kind of analysis are you performing?
What questions are you hoping to answer?
What will be done with the answers once you find them?
Descriptive Modeling
In descriptive modeling, you generally run the analysis and report what the data show. For example, you might analyze historical housing data for an area to see how housing prices fluctuated as industry came to or left the area.
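As a rough illustration of the housing example, a descriptive analysis might simply group sale prices by year and report summary statistics. This is a minimal sketch: the records below are made up for illustration, and `summarize_by_year` is a hypothetical helper, not part of any library.

```python
from statistics import mean, median

# Hypothetical (year, sale price) records; the values are invented for illustration.
sales = [
    (2018, 150_000), (2018, 170_000),
    (2019, 160_000), (2019, 200_000),
    (2020, 210_000), (2020, 230_000),
]

def summarize_by_year(records):
    """Group prices by year and report count, mean, and median for each year."""
    by_year = {}
    for year, price in records:
        by_year.setdefault(year, []).append(price)
    return {
        year: {"n": len(prices), "mean": mean(prices), "median": median(prices)}
        for year, prices in sorted(by_year.items())
    }

summary = summarize_by_year(sales)
for year, stats in summary.items():
    print(year, stats)
```

Nothing is predicted here; the output only describes what already happened, which is the defining trait of a descriptive model.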
Predictive Modeling
Predictive models, whether supervised or unsupervised, are often “trained” on an initial sample of the kind of data you are interested in. The trained model can then make predictions about new values as needed.
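The train-then-predict pattern can be sketched with one simple supervised technique, an ordinary least squares line fit. This is only one of many possible predictive models; the training values are made up, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input value."""
    slope, intercept = model
    return slope * x + intercept

# Invented training sample: floor area and price (arbitrary units).
train_x = [50, 70, 90, 110]
train_y = [100, 140, 180, 220]

model = fit_line(train_x, train_y)      # "training" step
new_prediction = predict(model, 100)    # prediction for an unseen value
print(model, new_prediction)
```

The model is fitted once on the initial sample and then reused for any new input, which mirrors the two steps described above.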
Whether your model is descriptive or predictive, you need some way to evaluate and interpret the results.
This ties back to the original question you are hoping to answer.
How well did your analysis answer the question?
What insights does it give you into the process that produced the data you are analyzing?
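For a predictive model, one common way to put a number on "how well" is to compare predictions against known values that were held back from training. The sketch below computes mean absolute error and the coefficient of determination (R²); the two lists are invented values used only for illustration, and these metrics are an assumed choice, not the only valid ones.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the prediction errors, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Invented held-out values and the corresponding model predictions.
actual = [100, 150, 200, 250]
predicted = [110, 140, 210, 240]

mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(f"MAE = {mae}, R^2 = {r2:.3f}")
```

A metric alone does not answer the original question; it still has to be interpreted in terms of what level of error is acceptable for the decision at hand.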
Results are not useful until someone knows about them!
In industry, this usually means informing management about the findings and their implications or potential ramifications.
In research/academia, this means presenting at a conference, publishing in a journal, writing a book, or teaching in a classroom.
Archive the process behind your experiment properly, including the data, code, and settings used, so that it can be repeated in the future.
The “digital age” makes that easier in some ways, and harder in others.
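One lightweight way to support repeatability is to save a small manifest alongside the results. The sketch below is an assumption about what such a record might contain (a hash of the input data, the parameters, the random seed, and the Python version); `build_manifest` and its field names are hypothetical.

```python
import hashlib
import json
import platform

def build_manifest(data_bytes, params, seed):
    """Collect the details someone would need to rerun the analysis."""
    return {
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),  # fingerprint of the input data
        "params": params,                                       # analysis settings
        "seed": seed,                                           # random seed, if any was used
        "python_version": platform.python_version(),            # interpreter used for the run
    }

# Invented example input; in practice this would be the bytes of the real data file.
example_data = b"year,price\n2018,150000\n"
manifest = build_manifest(example_data, {"model": "linear"}, seed=42)

# Serialize to JSON so the record can be stored next to the results.
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

The hash lets a future reader confirm they are working with the same data, which is one of the ways digital tools make archiving easier.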

Image: https://www.flickr.com/photos/doctorow/49501794586 - Cory Doctorow
The Data Analysis Life Cycle

CS 4/5623