Projects
Python · BigQuery · Google Colab · Power BI
01

Chicago Taxi Analysis

Explored Chicago's taxi industry using Google BigQuery, sampling 100,000 trips to uncover demand patterns across peak hours and neighbourhoods and to identify how a company could increase its trip volume.

The project is split into two parts:

  • A Google Colab notebook that connects to BigQuery via Google Cloud, uses Pandas to clean and explore the data, and uses Plotly Express for visualisations.
  • A Power BI report spanning four pages:
    • Company — fleet size vs trip volume per company, revealing a strong positive correlation (R=0.96) between the two, with a full breakdown table
    • Community Area Analysis — Pareto chart by pick-up area with a Top N selector and company filter, showing the top 10 areas account for 85% of all trips, plus average fare by area
    • Day Analysis — date range selector with trips over the year and by day of the week, with May to October the busiest period and demand peaking Wednesday to Friday
    • Time Analysis — trips by time of day and duration, with 54% of trips between 12:00 and 20:00 and 65% under 20 minutes
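The cumulative share behind a Pareto chart like the one on the Community Area Analysis page can be sketched in Pandas as follows. This is an illustration only: the Power BI report would compute it with measures that are not shown here, and the column name `pickup_community_area`, the helper `pareto_table`, and the ten sample rows are assumptions made up for the example, not project data.

```python
import pandas as pd


def pareto_table(trips: pd.DataFrame, area_col: str = "pickup_community_area") -> pd.DataFrame:
    """Count trips per area, largest first, and add running shares of the total."""
    # value_counts() sorts by count in descending order, which is the Pareto ordering
    table = trips[area_col].value_counts().rename("trips").to_frame()
    table["share"] = table["trips"] / table["trips"].sum()
    table["cumulative_share"] = table["share"].cumsum()
    return table


# Made-up pick-up area codes (hypothetical values, not taken from the dataset)
trips = pd.DataFrame({"pickup_community_area": [8, 8, 8, 8, 8, 32, 32, 32, 76, 76]})

table = pareto_table(trips)

# Share of all trips covered by the top N areas (N = 2 here)
top_n = 2
top_n_share = table["cumulative_share"].iloc[top_n - 1]
print(table)
print(f"Top {top_n} areas cover {top_n_share:.0%} of trips")
```

Reading `cumulative_share` at row N gives the "top N areas account for X% of trips" figure directly, which is the same quantity a Top N selector exposes interactively.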

The full BigQuery dataset spans 2013 onwards with over 200 million trips. For this project I restricted the data to 2022 and sampled 100,000 rows using Pandas, a trade-off to keep processing times manageable.
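A minimal sketch of the filter-then-sample step, assuming the trips have already been loaded into a DataFrame with a datetime column. The column name `trip_start_timestamp`, the helper `sample_year`, and the five synthetic rows are assumptions for illustration; the notebook's actual code may differ.

```python
import pandas as pd


def sample_year(trips: pd.DataFrame, year: int, n: int, seed: int = 42) -> pd.DataFrame:
    """Keep trips that started in `year`, then draw up to `n` rows at random."""
    in_year = trips[trips["trip_start_timestamp"].dt.year == year]
    # Cap n at the number of available rows so small inputs do not raise an error
    return in_year.sample(n=min(n, len(in_year)), random_state=seed)


# Tiny synthetic stand-in for the trips table (not real data)
trips = pd.DataFrame(
    {
        "trip_start_timestamp": pd.to_datetime(
            [
                "2021-12-31 23:50",
                "2022-01-01 00:10",
                "2022-06-15 14:30",
                "2022-10-02 18:05",
                "2023-01-01 00:05",
            ]
        ),
        "fare": [12.5, 8.0, 23.75, 15.0, 9.25],
    }
)

# In the project this would be year=2022, n=100_000
sample = sample_year(trips, year=2022, n=2)
print(sample)
```

Fixing `random_state` keeps the sample reproducible between notebook runs, so figures computed from it do not shift each time the cell is re-executed.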

A natural extension would be to include multiple years to answer questions like "How did COVID-19 affect the taxi industry in Chicago?"

This project strengthened my understanding of Pandas and Plotly in Python notebooks, as well as Measures and Parameters in Power BI for user-defined visualisations.