Tasks Overview
🗓️ 5-Day Data Analyst Onboarding Overview
Welcome to your 5-day onboarding sprint! This program is designed to introduce you to our real-world data stack and workflows. Each day you’ll tackle a focused, hands-on challenge that simulates a core part of our pipeline, from raw data ingestion to analysis and integration.
In every daily_tasks/day_X/ folder, you’ll find:
- 🔗 A link to the task’s GitHub Issue (in the README or markdown files)
- 📂 All necessary datasets, database connection details, and starter materials
- 🧑‍🏫 Material for the morning masterclass, where a team lead gives guided explanations and walkthroughs
📍 What to Expect Each Day
📊 Day 1 – Onboarding & Setup + Data Exploration & Submission
Get access to the GitHub repository and Slack. Meet the team.
Start with a small dataset. Explore it using Google Sheets.
You’ll answer simple analytical questions and share your work.
🧰 Skills: Data loading, quick exploration, Google Sheets, basic markdown submission
🧹 Day 2 – Data Cleaning & Preparation with Python
Work with messy data. You’ll clean, transform, and prepare a dataset for analysis.
Handle duplicates, missing values, naming conventions, and outliers.
🧰 Skills: pandas, data types, cleaning strategies, CSV export
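The cleaning steps listed for Day 2 can be sketched roughly as follows. This is a minimal illustration, not the actual task solution: the column names, the sample rows, and the outlier cap of 1000 are all made up for the example, and the real dataset may call for different choices (for instance, filling missing values instead of dropping them).

```python
import pandas as pd

# Hypothetical messy input; the real onboarding dataset will differ.
raw = pd.DataFrame(
    {
        "Customer ID": [1, 2, 2, 3, 4],
        " Order Total ": ["10.5", "20.0", "20.0", None, "9999"],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the Day 2 cleaning steps to a copy of df."""
    out = df.copy()
    # Naming conventions: trim whitespace, lowercase, use snake_case.
    out.columns = out.columns.str.strip().str.lower().str.replace(" ", "_")
    # Duplicates: drop rows that are exact repeats.
    out = out.drop_duplicates()
    # Data types: convert text to numbers; unparseable values become NaN.
    out["order_total"] = pd.to_numeric(out["order_total"], errors="coerce")
    # Missing values: drop rows without a total (an assumed strategy).
    out = out.dropna(subset=["order_total"])
    # Outliers: keep totals at or below an assumed cap of 1000.
    out = out[out["order_total"] <= 1000]
    return out.reset_index(drop=True)


cleaned = clean(raw)

# CSV export: with no path, to_csv returns the CSV text.
# Pass a filename instead to write a file.
csv_text = cleaned.to_csv(index=False)
```

Keeping every step inside one function makes it easy to rerun the same cleaning on a refreshed file and to explain each decision in your submission.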
🗃️ Day 3 – SQL + Python Exploration
Connect to the PostgreSQL training DB and explore it with SQL inside Python notebooks.
Answer analytical questions using real queries and joins.
🧰 Skills: psycopg2/sqlalchemy, pandas.read_sql, JOINs, filtering, logic in SQL
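As a rough sketch of the Day 3 workflow, the example below runs a JOIN with a filter through pandas.read_sql. So that it runs anywhere, it uses an in-memory SQLite database as a stand-in for the PostgreSQL training DB; the tables, columns, and rows are hypothetical and do not reflect the real schema.

```python
import sqlite3

import pandas as pd

# Stand-in for the training DB: an in-memory SQLite database
# with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);
    """
)

# Revenue per customer, counting only orders of at least a given size.
# The threshold is passed as a parameter rather than pasted into the SQL.
query = """
    SELECT c.name, COUNT(*) AS n_orders, SUM(o.total) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE o.total >= ?
    GROUP BY c.name
    ORDER BY revenue DESC
"""
summary = pd.read_sql(query, conn, params=(10.0,))
conn.close()
```

Against PostgreSQL you would pass a psycopg2 connection or SQLAlchemy engine built from the connection details in the day's folder instead of the SQLite connection. Note that the placeholder syntax depends on the driver: psycopg2 uses %s where SQLite uses ?.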
🧮 Day 4 – Data Integration & Schema Design
Integrate a new dataset into the existing PostgreSQL schema.
Identify keys, clean/prep with Python, and write a script to append it to the database.
🧰 Skills: Data modeling, ETL scripting, foreign key relationships, INSERTs via Python
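One possible shape for the Day 4 append script is sketched below: check each new row's foreign key against the existing table, convert types, then insert the valid rows in a single transaction. It again uses in-memory SQLite as a stand-in for PostgreSQL, and the schema, the load_orders helper, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked; PostgreSQL always does.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    """
)

# Hypothetical new dataset, e.g. rows parsed from a CSV (totals are text).
new_rows = [
    {"order_id": 10, "customer_id": 1, "total": "25.0"},
    {"order_id": 11, "customer_id": 2, "total": "40.0"},
    {"order_id": 12, "customer_id": 99, "total": "7.5"},  # unknown customer
]


def load_orders(conn, rows):
    """Insert rows whose customer exists; return (inserted count, rejected rows)."""
    known = {r[0] for r in conn.execute("SELECT customer_id FROM customers")}
    valid = [
        (r["order_id"], r["customer_id"], float(r["total"]))
        for r in rows
        if r["customer_id"] in known
    ]
    rejected = [r for r in rows if r["customer_id"] not in known]
    # The connection context manager commits on success and
    # rolls back if any insert raises.
    with conn:
        conn.executemany(
            "INSERT INTO orders (order_id, customer_id, total) VALUES (?, ?, ?)",
            valid,
        )
    return len(valid), rejected


inserted, rejected = load_orders(conn, new_rows)
```

Returning the rejected rows instead of silently dropping them gives you something concrete to report: which records could not be linked to an existing key, and why.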
🚀 Day 5 – Project Wrap-up & GitHub Deployment
Finalize your project by integrating all components from Days 1-4 into a cohesive whole. Prepare your codebase for public sharing, ensuring all scripts are well-documented and your repository is structured professionally. Push your complete project to your personal GitHub, ready to showcase your new skills!
🧰 Skills: Project management, code organization, Git & GitHub, documentation, data storytelling
We’re here to help — ask questions during the daily masterclass or on Slack!
Let’s build something great. 💪