This project is divided into two parts:
• Google Colab Notebook
• PowerBI Report
The Google Colab Notebook uses a Google Cloud connection to retrieve the data from BigQuery, loads it into a pandas DataFrame for cleaning and exploration, and uses Plotly Express for visualisations.
The PowerBI report connects to a SQL database to retrieve the data and uses several methods to display it interactively for the user.
On the Community Area Analysis tab, the user can select the Top N Community Areas to view and filter by Company; a Pareto Chart displays the cumulative percentage of trips.
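The calculation behind a Pareto chart can be sketched in pandas. This is only an illustration, not the PowerBI measure used in the report: the function name `pareto_table`, the column name `community_area`, and the choice to express the percentage relative to all trips (rather than only the Top N) are assumptions.

```python
import pandas as pd


def pareto_table(trips: pd.DataFrame, area_col: str = "community_area",
                 top_n: int = 10) -> pd.DataFrame:
    """Count trips per area, keep the top N, and add a cumulative percentage.

    Hypothetical helper: the column name and the denominator (all trips,
    not just the top N) are assumptions made for this sketch.
    """
    total = len(trips)
    counts = (
        trips[area_col]
        .value_counts()              # trips per area, sorted descending
        .head(top_n)                 # keep only the Top N areas
        .rename_axis(area_col)
        .reset_index(name="trips")
    )
    # Running share of all trips covered by the areas listed so far.
    counts["cumulative_pct"] = counts["trips"].cumsum() * 100 / total
    return counts


# Made-up data: 5 trips in area 8, 3 in area 32, 2 in area 28.
demo = pd.DataFrame({"community_area": [8] * 5 + [32] * 3 + [28] * 2})
result = pareto_table(demo, top_n=2)
print(result)
```

With this denominator, the last value in `cumulative_pct` is the share of all trips that the selected Top N areas account for.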
The Day Analysis tab gives the user the option to restrict the date range for the data, and displays the number of trips during the selected period.
The full BigQuery dataset contains over 200 million taxi trips from 2013 onwards. For this project I restricted the data to 2022 and then used Pandas to take a random sample of 100,000 rows.
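A minimal sketch of this filter-then-sample step is shown here. It assumes the trips are already in a DataFrame with a datetime column called `trip_start_timestamp`; that column name, the helper `sample_year`, and the fixed seed are illustrative assumptions, and the notebook may do this differently.

```python
import pandas as pd


def sample_year(trips: pd.DataFrame, year: int = 2022, n: int = 100_000,
                seed: int = 42,
                ts_col: str = "trip_start_timestamp") -> pd.DataFrame:
    """Keep trips that started in `year`, then draw up to `n` rows at random.

    Hypothetical helper: `ts_col` and `seed` are assumptions for this sketch.
    """
    in_year = trips[trips[ts_col].dt.year == year]
    # Cap n so the sketch also works on frames smaller than the sample size.
    return in_year.sample(n=min(n, len(in_year)), random_state=seed)


# Made-up data: 10 daily trips starting 2021-12-30, so 8 of them fall in 2022.
demo_trips = pd.DataFrame({
    "trip_start_timestamp": pd.date_range("2021-12-30", periods=10, freq="D"),
    "fare": range(10),
})
sampled = sample_year(demo_trips, year=2022, n=5)
print(len(sampled))
```

Fixing `random_state` makes the sample reproducible between notebook runs, which helps when the same subset needs to feed both the notebook and the report.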
This decision was made because larger datasets are difficult to process, but a possible avenue for further analysis would be to include more years and more data points. One question that would require data from previous years is "How did Covid-19 affect the taxi industry within Chicago?"
This project has helped improve my understanding of Pandas and Plotly within Python notebooks, as well as of using Measures and Parameters within PowerBI to create user-defined visualisations.