
Analysing my Uber Cab Usage

A conversation with a friend about Uber prompted me to analyze my Uber usage pattern. While doing that, I thought of sharing the results here along with a few tips on analyzing and visualizing the data. So, here we are.

Data Source

I obtained the data for my Uber rides from its trip page, then processed and cleaned the raw data before using it for my analysis. The analysis below covers April 2014 (that’s when I started using Uber) to April 2016.

You can download the data used for this exercise from here.

Completed Uber Rides vs Cancelled Uber Rides

The first thing that I looked at was the ratio of completed to canceled rides.

Completed Uber Rides vs Cancelled Uber Rides

2014 has the lowest ratio of completed to canceled rides.
But looking at percentages when the base is not the same can be deceptive. For example, let’s say I lost 20% of my total investment in the share market last month while my friend lost 1%. On the surface, it appears that I lost more than my friend. But what if I tell you that my total investment was Rs 1,000 (so I lost Rs 200), while my friend’s total investment was Rs 1,000,000 (so he lost Rs 10,000)? Will you still sympathize with me? I doubt it.
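The arithmetic behind this example can be checked with a short Python sketch. The function name `absolute_loss` is just an illustrative helper, not something from the analysis itself; the figures are the ones quoted above.

```python
def absolute_loss(investment, loss_pct):
    """Return the amount lost, given the amount invested and the percentage lost."""
    return investment * loss_pct / 100

# Figures from the example: 20% of Rs 1,000 versus 1% of Rs 1,000,000.
my_loss = absolute_loss(1_000, 20)
friend_loss = absolute_loss(1_000_000, 1)

# The smaller percentage turns out to be the larger amount.
print(f"Me: Rs {my_loss:,.0f}; friend: Rs {friend_loss:,.0f}")
```

Reporting the base next to the percentage is what lets a reader make this comparison on their own.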

So, let’s look at the complete picture in the current scenario:

Completed Uber Rides vs Cancelled Uber Rides

Well, there isn’t a huge variation in the rides booked year-over-year, so we could have omitted the ‘Total Rides Booked’ column, but as a best practice let’s keep it.

Note: I did not use a chart here to represent this data. Contrary to popular belief, graphs are not always the best medium to provide information. Sometimes a table does that more efficiently.
Also, note the use of color here. I used green for completed rides (positive behavior) and red for canceled rides (negative behavior).
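If the trip data is available as rows of year and status, a table like the one above could be built along these lines. This is only a sketch: the `trips` records are made up for illustration (they are not my actual rides), and the `summarize` helper and the status labels are assumptions about how the cleaned data might look.

```python
from collections import Counter

# Hypothetical (year, status) records, made up for illustration only.
trips = [
    (2014, "completed"), (2014, "cancelled"), (2014, "completed"),
    (2015, "completed"), (2015, "completed"), (2015, "cancelled"),
    (2015, "completed"),
]

def summarize(trips):
    """Return {year: (total, completed, cancelled, completed_pct)}."""
    counts = Counter(trips)
    summary = {}
    for year in sorted({y for y, _ in trips}):
        completed = counts[(year, "completed")]
        cancelled = counts[(year, "cancelled")]
        total = completed + cancelled
        # Keep the base (total) next to the percentage so neither misleads.
        summary[year] = (total, completed, cancelled,
                         round(100 * completed / total, 1))
    return summary

print("Year  Total  Completed  Cancelled  Completed %")
for year, (total, completed, cancelled, pct) in summarize(trips).items():
    print(f"{year}  {total:5d}  {completed:9d}  {cancelled:9d}  {pct:11.1f}")
```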

Uber Rides Completed Per Year

Next, I looked at my ride completion history:

Rides Completed

From 2014 to 2015, the number of rides that I completed shot up by 144%. From 2015 to 2016, it increased by 45%.

There is a mistake in the above chart and statement. Can you spot it?

I am not comparing the same duration while calculating the percentage increase in the number of rides. As I mentioned at the beginning, I started using Uber in April 2014. So, in 2014, I was active on Uber for 9 months. In 2015, I was active on Uber for 12 months. And for 2016, the data that we have covers only the first 4 months. So, if as an analyst I present the percentage increase as-is, I will not be showing the true picture. Instead, I should either provide a disclaimer, or look at the same duration year-over-year (i.e., Apr 2014-Mar 2015, Apr 2015-Mar 2016 and so on).
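One way to compare like with like is to bucket rides into 12-month windows that start in April. The sketch below assumes a list of ride dates; the dates shown are invented for illustration, and `window_start_year` and `pct_change` are hypothetical helper names.

```python
from collections import Counter
from datetime import date

def window_start_year(d, start_month=4):
    """Map a date to the year in which its 12-month window (Apr to Mar) began."""
    return d.year if d.month >= start_month else d.year - 1

def pct_change(old, new):
    """Percentage change from old to new."""
    return 100 * (new - old) / old

# Hypothetical ride dates, made up for illustration only.
rides = [
    # Window Apr 2014 to Mar 2015
    date(2014, 5, 3), date(2014, 11, 20), date(2015, 2, 14),
    # Window Apr 2015 to Mar 2016
    date(2015, 4, 1), date(2015, 9, 9), date(2015, 12, 25),
    date(2016, 1, 5), date(2016, 2, 2), date(2016, 3, 30),
]

per_window = Counter(window_start_year(d) for d in rides)
change = pct_change(per_window[2014], per_window[2015])
print(f"Apr 2014-Mar 2015: {per_window[2014]} rides")
print(f"Apr 2015-Mar 2016: {per_window[2015]} rides ({change:+.0f}%)")
```

Because both windows span 12 months, the percentage change is no longer distorted by a partial year.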

Interesting observation: I have completed more rides in the first 4 months of 2016 than I completed in 2014 and 2015 together!

What led to so many rides in 2016? Let’s dig deeper into the data to find the answer:

monthly rides taken

Once we drill down to rides per month, we notice that the majority of the rides in 2016 were completed in the first two months. That was when I moved to a new city and did not have my own vehicle. I got my bike in March, and the number of rides completed in that month dropped. In April of this year, the number of rides that I completed was the same as the number of rides completed in April 2015.

Tips:

  1. Add axis titles and a chart title to graphs to make them easy to understand.

  2. Don’t use colors randomly. Use colors and patterns to link two separate charts showing similar data cuts. In the first graph above, I used blue color for 2014, orange for 2015 and purple for 2016. In the next graph, I used the same color pattern for monthly data, so that even without looking at the axis one can realize which year each bar represents.

  3. I did not use gridlines in the first graph. Instead, I showed the actual data points on top of each bar. This makes the graph look clean. In the second graph, however, I used gridlines instead of data labels. This is because the second graph has a lot of data points, so data labels would clutter the graph.

Cities Covered

Now, let’s look at the cities where I booked Uber:

Uber Rides Taken by City

Uber Rides by Car

A look at the cars used for these trips:

Uber Rides by Car

  1. A majority of my Uber rides have been on Uber Go, which is the cheapest option available on Uber.
  2. I need to start using Uber Pool more often to do my bit in reducing the traffic and pollution.

Payment Analysis

So, how much have I spent on Uber so far, and what mode of payment did I use? Let’s look at the answers to these questions.

Payment Methods Used

Clearly, Paytm is my preferred mode of payment.

Another question that immediately comes to mind is, how much does one Uber ride cost me on average?

Applying Excel’s AVERAGE function to the completed rides in the spreadsheet gives Rs 215. However, if I use the median instead, I get Rs 138. Which one should I believe? We’ll find out in the next article, where we will discuss different methods of calculating averages. 🙂
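A gap like this between the two figures usually means a few expensive rides are pulling the mean up. The sketch below shows the effect using Python's `statistics` module; the `fares` list is hypothetical and is not my actual ride data.

```python
from statistics import mean, median

# Hypothetical fares in Rs, made up for illustration only.
# One long, expensive trip sits far above the rest.
fares = [90, 110, 120, 140, 150, 160, 900]

avg = mean(fares)      # sensitive to the single expensive trip
mid = median(fares)    # middle value, unaffected by how large the outlier is

print(f"Mean: Rs {avg:.0f}, Median: Rs {mid:.0f}")
```

When the mean sits well above the median, as it does for this made-up list, the median is often closer to what a typical ride costs.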
