
Author: Bradley Shedden
Downloads:
Table Of Contents
-
Overview
-
Data Preparation
-
Exploratory Data Analysis
-
Conclusion
Tasks Completed
-
Created a new schema and Imported the World Layoffs dataset into MySQL
-
Investigated and cleaned the World Layoffs dataset
-
Performed exploratory data analysis (EDA) in MySQL to identify trends and patterns.
-
Utilized the following SQL expressions: Aggregate functions, Logical statements (AND/OR's), CTE's, Data types, Joins, Window Functions (DENSE RANK & ROW NUMBER) & Subqueries
Overview
In this project, I have taken raw data on world layoffs, cleaned the data in MySQL, and conducted exploratory data analysis to better understand the dataset.
The dataset for this project is data containing world layoffs from the 3 year period of March 2020 to March 2023 right in the midst of Covid-19. This dataset is interesting because not only does it showcase the effect COVID-19 had on layoffs but it also showcases the companies, industries and trends for job security during a pandemic like what we saw from Covid-19. ​​​​
Data Preparation
Data Integration:
To start working on the raw dataset I created a new schema and imported the data into MySQL using the import wizard. I then created a duplicate table called layoffs_staging so I am not working with the raw dataset, this is insurance against mistakes or program errors as if any are made I still have the original unchanged data to refer back to.
​
Data Cleaning
The process of preparing the data for analysis was done in 4 main stages with those being: Removing duplicates, standardising the data, addressing null and blank values, and finally removing any unnecessary data.​
​​​​
Documented cleaning procedure
Removing duplicates
• Wrote a query using a CTE partitioning on chosen columns to identify duplicates.
• Double checked results to ensure the returned columns were true duplicates finding that not all were.
• I found that this was because I didn’t partition by enough columns, with this in mind I revised my original query and tested again.
• Created a new table called layoff_staging2 with row numbers so duplicates can be deleted. This was done by right clicking on the layoffs_staging table selecting ‘copy to clipboard’ then ‘create statement’ after pasting the create statement I added 1 additional row called row_num with the integer datatype
​
Standardising the Data
• Searched the dataset for incorrect spacings finding some in the company column.
• Updated and set the company column using the TRIM function fixing incorrect spacing.
• Searched dataset for multiple instances of the same category through spelling errors, capitalisation, and formatting errors.
• Found 3 instances of a repeated category of 'crypto currency' in the industry column, I searched further confirming how the instances were presented, and standardised all the variations into one format.
• In the country column I found 1 miss-spelling of ‘United States’ that had a full stop and fixed the mistake by removing it.
• Looked over all column data types and found that the ‘date’ column was a text data type instead of a date data type. I first reformatted the date column to a proper date format and then changed the 'date' column from a text to a date data type using MODIFY COLUMN.
​
Addressing Null and Blank Values
I found many NULL and blank values In the dataset. I can see that they are mainly found in the ‘total_laid_off’ and the ‘Percentage_laid_off’ columns with some others being present in other columns as well.
​
I am not willing to indiscriminately remove all Null and blank values as although some columns contain these Null/ Blank values the other columns still contain useful data. With this in mind, I will delete rows where a NULL and Blank value can be found in both the ‘total_laid_off’ and the ‘Percentage_laid_off’ columns as this data doesn't contribute toward this analysis without those fields occupied. For the other columns, I will investigate if they can be filled with pre-existing data in the dataset.
​
• Found Null/ Blank values in the industry column and updated them with pre-existing data from the same company
• Utilised the AND function to identify and delete rows that contain NULLS in both the ‘total_laid_off’ and the ‘Percentage_laid_off’ columns
​
Removing Unnecessary Data
• Used the DROP COLUMN function to remove the row_num column I added to address duplicate data.
​
​
Exploratory Data Analysis
Data Statistics
-
383,659 Total layoffs worldwide from March 2020 to March 2023
-
475 Total Companies
-
The average layoff rate was 26% of a company's workforce​
​
Findings​​​
-
Queried: What was the first and last instance of layoffs in the dataset?
-
Result: The dataset spanned a total of just 5 days shy of 3 years from March 2020 to March 2023.
​
-
Queried: What was the maximum amount of people laid off and the maximum percentage laid off?
-
Result: The maximum amount of people laid off in a single day was 12,000 people and the maximum percentage was 1 meaning that 100% of the company's staff was laid off.
​​
-
Queried: What was the total amount of companies that laid off all staff and what specific companies they were?
-
Result: A total of 116 companies laid every staff member off in a span of 3 years.
​​
-
Queried: What companies laid off the most people and how many people in total have they laid off?
-
Result: The top 5 were:
-
18,150 from Amazon
-
12,000 from Google
-
11,000 from Meta
-
10,090 from Salesforce
-
10,000 from Microsoft
-
​​
-
Queried: Using multiple CTE’s and DENSE RANK I searched to find the top 5 companies by the amount of layoffs for each year.​
​​​​​​​​​​
-
Queried: What industries have laid off the most people and how many in total have been laid off from those industries?
-
Result: The top 5 were:
-
45,182 people from the Consumer sector
-
43,613 people from Retail
-
33,748 people from transportation
-
28,344 people from finance
-
25,953 people from healthcare
-
​​​
-
Queried: What industries had the least layoffs?
-
Result: The top 5 were :
-
Manufacturing with 20
-
Fin-Tech with 215
-
Aerospace with 661
-
Energy with 802
-
Legal with 836.
-
​​​
-
Queried: What country laid off the most people?
-
Result: The top 5 were:
-
The United States
-
India
-
The Netherlands
-
Sweden
-
Brazil
-
​​​
-
Queried: How were the layoffs distributed throughout the 3 years?
-
Result:
-
80,998 got laid off in 2020
-
15,823 got laid off in 2021
-
160,661 got laid off in 2022
-
and 125,677 got laid off in 2023
-
​
(A note-worthy thing to keep in mind is that the total for 2023 is 125,677 however the database only has 3 months of data meaning that 2023 is on track to overtake 2022’s number of layoffs.)​​
​
-
Queried: What months did the most layoffs happen?
-
Result: November was the highest month with the second highest being January. It seems that most layoffs are centered around the new year, however, December is way down at the 11th spot perhaps because of Christmas?
​​
-
Queried: Created a rolling total of the total amount of people laid off by month from March 2020 to March 2023 to see how the layoffs trended over time. I then quickly imported the result into Excel to visualise the data.​​​​​
-
Result: From this visualisation, we can see a gradual cumulative increase of layoffs throughout 2020 until around May 2022 where the amount of layoffs peaks up dramatically.
​
​
​
​
​
​
​
​
​
​
​
​
​
​​​​
​
​End of Exploratory Data Analysis
