Learn Data Analysis with Python in this comprehensive tutorial for beginners, with exercises included!
Data Analysis has been around for a long time, but until a few years ago it was practiced using closed, expensive and limited tools like Excel or Tableau. Python, SQL and other open tools and libraries have changed Data Analysis forever.
In this tutorial you'll learn the whole process of Data Analysis: reading data from multiple sources (CSVs, SQL, Excel, etc.), processing it with NumPy and Pandas, visualizing it with Matplotlib and Seaborn, and cleaning and transforming it to create reports.
⚠️ Note: Instead of loading the notebooks on notebooks.ai, use Google Colab. Here are instructions for loading a notebook directly from GitHub into Google Colab: https://colab.research.google.com/github/googlecolab/colabtools/blob/master/notebooks/colab-github-demo.ipynb#scrollTo=K-NVg7RjyeTk
⭐️ Course Contents ⭐️
⌨️ Part 1: Introduction
What is Data Analysis? Why Python? What other options are there? What's the cycle of a Data Analysis project? What's the difference between Data Analysis and Data Science?
🔗 Slides for this section: https://docs.google.com/presentation/d/1fDpjlyMiOMJyuc7_jMekcYLPP2XlSl1eWw9F7yE7byk/edit#slide=id.p
⌨️ Part 2: Real Life Example of a Python/Pandas Data Analysis project (00:11:11)
A demonstration of a real-life data analysis project using Python, Pandas, SQL and Seaborn. Don't worry, we'll dig deeper in the following sections.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/FreeCodeCamp-Pandas-Real-Life-Example
⌨️ Part 3: Jupyter Notebooks Tutorial (00:30:50)
A step-by-step tutorial to learn how to use Jupyter Notebooks.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/ds-content-interactive-jupyterlab-tutorial
⌨️ Part 4: Intro to NumPy (01:04:58)
Learn why NumPy became such an important library for data processing in Python. Learn about the low-level details of computation and memory storage, and why tools like Excel will always be limited when processing large volumes of data.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/freecodecamp-intro-to-numpy
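As a small taste of the "memory storage" and "computation" ideas above, here is a minimal sketch (not taken from the course notebooks) showing that a NumPy array stores fixed-size values in one contiguous buffer and that a vectorized expression replaces an explicit Python loop. The array size of one million elements is an arbitrary choice for illustration.

```python
import numpy as np

# One million 64-bit integers stored in a single contiguous block of memory.
a = np.arange(1_000_000, dtype=np.int64)
print(a.itemsize)  # 8 (bytes per element)
print(a.nbytes)    # 8000000 (bytes for the whole array)

# Vectorized: the squaring and the sum run in compiled code over the whole array.
total = (a ** 2).sum()

# Equivalent pure-Python version: one interpreted step per element.
total_loop = sum(x ** 2 for x in range(1_000_000))

print(int(total) == total_loop)  # True
```

Timing both versions (for example with `%timeit` in Jupyter) is a simple way to see the difference on your own machine.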
⌨️ Part 5: Intro to Pandas (01:57:08)
Pandas is arguably the most important library for Data Processing in the Python world. Learn how it works and how its main data structure, the DataFrame, compares to other tools like spreadsheets or the DataFrames used for Big Data.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/freecodecamp-intro-to-pandas
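To make the spreadsheet comparison concrete, here is a minimal sketch of a DataFrame used the way you might use a spreadsheet: a computed column, a filter and a pivot-table-style aggregation. The products, regions and numbers are made up for illustration and are not from the course material.

```python
import pandas as pd

# Made-up sales data: labeled rows (the index) and named, typed columns.
df = pd.DataFrame(
    {
        "product": ["pen", "mug", "pen", "cap"],
        "region": ["North", "South", "North", "South"],
        "units": [10, 4, 7, 12],
        "price": [2.5, 4.0, 2.5, 3.0],
    },
    index=["a", "b", "c", "d"],
)

# A computed column, like a spreadsheet formula applied to every row at once.
df["revenue"] = df["units"] * df["price"]

# Label-based lookup, similar to referencing a single cell.
print(df.loc["b", "revenue"])  # 16.0

# Boolean filtering, similar to a spreadsheet filter.
filtered = df[df["units"] > 5]

# Group and aggregate, similar to a pivot table.
by_region = df.groupby("region")["revenue"].sum()
print(by_region)
```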
⌨️ Part 6: Data Cleaning (02:47:18)
Learn the different types of issues we'll face with our data (null values, invalid values, statistical outliers, etc.) and how to clean them.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/data-cleaning-rmotr-freecodecamp
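Here is a minimal sketch of handling the three kinds of issues listed above with Pandas. The data is hypothetical, and the cleaning rules (dropping missing ages, treating "?" as unknown, rejecting negative ages, and the 1.5 × IQR outlier rule) are common illustrative choices, not necessarily the ones used in the course notebooks.

```python
import pandas as pd

# Hypothetical raw data containing nulls, invalid values and an outlier.
raw = pd.DataFrame({
    "age": [25, None, 31, -4, 29, 27, 240],
    "sex": ["F", "M", "M", "F", "?", "F", "M"],
})

# 1. Null values: count them per column, then drop rows with a missing age.
print(raw.isnull().sum())
df = raw.dropna(subset=["age"]).copy()

# 2. Invalid values: "?" is not a valid category, and ages can't be negative.
df["sex"] = df["sex"].where(df["sex"] != "?")  # "?" becomes NaN (unknown)
df = df[df["age"] >= 0]

# 3. Statistical outliers: keep ages within 1.5 IQR of the quartiles
#    (an assumed rule of thumb; other thresholds are equally valid).
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

print(df)
```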
⌨️ Part 7: Reading Data from other sources (03:25:15)
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/RDP-Reading-Data-with-Python-and-Pandas
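As a rough preview of this section, here is a minimal sketch that loads the same small table from a CSV, a SQL database and JSON into Pandas DataFrames. To keep it self-contained it reads from in-memory text and an in-memory SQLite database instead of real files or servers; the table name `scores` and its contents are made up.

```python
import io
import sqlite3

import pandas as pd

# CSV: read_csv accepts a file path, a URL or any file-like object.
csv_text = "name,score\nAna,90\nBen,75\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# SQL: read_sql runs a query through a database connection (in-memory SQLite here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)", [("Ana", 90), ("Ben", 75)])
from_sql = pd.read_sql("SELECT name, score FROM scores", conn)
conn.close()

# JSON: a list of records becomes one row per record.
json_text = '[{"name": "Ana", "score": 90}, {"name": "Ben", "score": 75}]'
from_json = pd.read_json(io.StringIO(json_text))

# Excel files follow the same pattern with pd.read_excel("some_file.xlsx"),
# which needs an engine such as openpyxl installed.
print(from_csv, from_sql, from_json, sep="\n\n")
```

Whatever the source, the result is the same kind of DataFrame, so everything from the previous sections applies unchanged.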
⌨️ Part 8: Python Recap (03:55:19)
If your Python or coding skills are rusty, check out this section for a quick recap of Python's main features and control flow structures.
🔗 Notebooks: https://github.com/ine-rmotr-curriculum/ds-content-python-under-10-minutes