Analysis of Delays in the Airline Industry
Note 1: What you will learn? This assignment provides you with a real-world operational dataset
and guides you step-by-step to find where the problems are and how to improve
performance. You will gain experience with data analysis and data visualization
which are highly in demand in today’s job market. If you know how to answer
questions from data, you are a hot commodity!
Note 2: Knowing
how to learn quickly is key. Learn Tableau! This assignment is based
on learning! You will need to learn new things on your own. The best education is
one that teaches you how to learn new things rather than giving you specific
answers. No one clearly knows what skills will be the most important ten years
from now, so much of what you learn now can become obsolete. But, those who
know how to learn new things quickly will be the most successful.
For the same reason, I recommend you learn
Tableau for doing this assignment rather than solely relying on Excel.
If you do the assignment with
Tableau (or combination of Tableau and Excel) it will be worth double, i.e. 10%
of the final grade or a full letter grade.
If you do it solely with Excel,
it will be worth 5% of the final grade, i.e. half a letter grade.
I will share resources for learning
Tableau. Depending on time, I will also provide some tutorials for it in class.
Tableau is one of the hottest software packages for data analysis and I can
guarantee this will be one of highlights of your resume. However, if you can’t
do Tableau, all questions can also be answered with Excel.
Note 3: Seek help. If do not have time to do all questions, do as many as you can. Or
you can come to my office hours and we will do some questions together. My main
goal is that you learn a new skill; so ask for help any time and as much as you
You just got hired as a business analyst at
Delta Airlines. Your manager gives you your first big assignment: Analysis of Delays
in the Airline Industry.
Delta has recently started a new initiative
to improve its on-time performance and reduce flight delays. Your manager asks
you to analyze flight data to help her understand the major sources of delay
and the most problematic areas. She wants to gain an overall picture of the
airline industry and how Delta, ATL airport and the State of Georgia compares
with others. She has limited resources so asks you to figure out what factor
causes the most delays. A delayed flight is defined as one that arrives later
December has historically been one of the
toughest months for airlines, so your manager gives you a data from December
2019. The dataset includes every flight performed by Delta’s rivals in December
and their delay information. Here is the link to the dataset: https://www.dropbox.com/s/yohfkqgwvqdstz1/Flights%20Dec%202019.xlsx?dl=0
Now buckle up for some serious data
Understand the dataset.
In the attached
excel sheet, you will find various sheet that include the actual dataset, data
definitions, and links for additional information. Read all of the sheets and
make sure you understand what each variable is.
Get a general sense of the
dataset. Report the following:
carriers/airports/flights are there in the dataset?
What percentage of all
flights are delayed?
You first need
to create a new binary variable called Delayed (1 = delayed arrival, 0=
no delayed arrival) and then compare the total number of flights with the total
number of delayed flights.
What is the average amount
of arrival delay in the industry?
Who are the busiest?
Provide two bar charts: the count of departures per state and per airline.
What are the ranks of Georgia and
Where most problems are.
From the previous analysis
you find that frequency of flights vastly differ across airlines and airports.
So simply counting the delays does not reveal worst performers. In other words,
an airline/airport with a large number of flights will also have a large number
of delayed flights, but does it mean it is managed badly? Instead of total
number of delays, we focus on the percentage of delayed flights which is a
better metric for operational performance.
Provide two bar charts: the percentage of delayed flights per airport and per
airline. Sort descending. Only keep the top 20 airports.
What are the ranks of ATL and
Now let’s focus on the sources
of delays. First you notice that a vast majority of flights are not delayed. To
make the analysis easier, only keep the delayed flights (in Tableau you can
simply filter for the delayed flights). You have data on five sources of
delays. Make sure you understand what each source means (definitions available
in the excel file). Do the following to figure out, how much each source
Provide a pie chart that includes the five categories (the size of the pie should be
proportional to the total delay duration that each source caused.) What is the
top source of delay overall?
Repeat the same pie chart but this time only for Delta Airlines. What is the top source of
delay for Delta?
Any other analyses?
There is an
unlimited number of additional analyses you can do. Be creative, ask a question
yourself and draw a graph that answers it. Some suggestions might be looking at
the flight time (e.g., da of month, day of week) and how it related to the
chance of delay.
Based on your analysis, write a
recommendation to your manager and explain where should she focus her attention
regarding the delays (one paragraph at least).
Your submission must be a word document with
numbered answers (written and graphical) to questions 2-7 and all
sub-questions. Example format (fictional answers):
5 carriers. 10 airports. 1000
a. [Bar chart of the count of departures per state]
[Bar chart of the count of
departures per airline]
b. Georgia ranks number 3 among all states. Delta ranks number one
among all airlines.