BACK

Data Science Job Postings Analysis

Author: Bradley Shedden

Downloads:

Working Sheet

Analysis Report

Overview
Questions & Hypothesis
Data
Data Preparation
Analysis and Results
Conclusion

Tasks Completed

Extracted and cleaned job datasets from Kaggle to ensure data quality and accuracy.
Transformed and combined datasets using Power Query to prepare data for analysis.
Performed exploratory data analysis (EDA) in Excel to identify trends and patterns.
Created an interactive dashboard in Excel to visualise key data science job metrics, such as most common tools, salaries, industries, and regions, that is filterable by job type and state.
Produced an analysis report on the project outlining hypotheses, procedures, visualisations and findings.

Overview

The objective of this case study is for myself and any other people aspiring to enter the data science field to gain a broad understanding of the data science career landscape in Australia. This is so time can be best directed in study as well as getting an overall understanding of the career path. As I am personally pursuing a role in data analytics my analysis will focus on that field however, I will provide my finalised excel dashboard for people interested in data engineering and data scientist’s roles, this will help in my own and other peoples understanding of the fields of data science.

This will be done by analysing metrics such as most common skills required, industries, states, salaries etc. As a result of this analysis, I aim to produce a filterable interactive dashboard and compelling visualisations of data answering my questions.

Questions & Hypothesis:

Question 1:

What are the most important skills for Data Analysts?

Hypothesis: I predict that Excel, SQL and a visualisation tool like Tableau or Power BI will prove to be the most popular from knowledge attained in my certificate in data analytics

Question 2:

What state in Australia hires the most Data Analysts?

Hypothesis: I predict that Victoria or New South Wales, due to the high population in the cities of Melbourne and Sydney

Question 3:

What Industry/ Sector has the highest demand for Data analysts?

Hypothesis: I predict that the government and finance sectors due to the scale of the industries and the large amount of data they collect that requires analysis to make usable.

Question 4:

What is the average salary for Data Analysts and how does it differ by state?

Hypothesis: I predict that the salary of a data analyst will sit in the range of around $75,000 to $95,000. This is due to the role involving specific skills and knowledge in the IT area of knowledge. In terms of differences between states I would guess that Victoria would have a higher average salary due to population density that would push up the competition and demand for analysts in the state.

Data

Dataset 1: By: NADZMI

https://www.kaggle.com/datasets/nadzmiagthomas/australia-data-science-jobs/code

You can find Nadzmi’s source code at: https://github.com/nadzmi27/Multithread-Webscraping

Data Description:

This data was collected on August and December of 2022 by Nadzmi using Web scrapping of Glassdoor with Selenium (Python). The dataset is from Kaggle containing columns such as: title of jobs, location of jobs, company, salary estimates, ratings, skills and etc.

Dataset 2: By: AUSTIN NGUYEN

https://www.kaggle.com/datasets/austinnguyen00/data-jobs-in-australia

Data Description:

This data was collected on September of 2022 by AUSTIN NGUYEN using Web scrapping of LinkedIn with Selenium (Python). The dataset is from Kaggle containing columns such as: Job Type, State, Seniority level, Employment type, Industries and Skills

Explanation:

I decided to use these datasets as they have a great number of metrics that will allow many different insights to be derived from them. Some restrains of these datasets is that due to the dataset containing two collection dates there are many duplicates, there seems to be many jobs rows that seem unrelated to data science, there is a lot of unnecessary data for my use case and there are many instances of blank cells.

Data Preparation

Data Cleaning

The process of preparing the data for analysis was thorough cutting the original row count from 2232 down to 894 rows through the removal of unnecessary and duplicate data. Additionally, through modification and formatting the data was prepped for my specific analysis requirements.

Data cleaning overview:

Deleted unnecessary columns and rows that would not contribute to the analysis
Formatted columns and row names to be consistent, elementary and easier to work with
Imported additional dataset modifying it to match the first
Combined both datasets using Power Query
Used the Removed duplicates tool deleting a total of 880 rows
Deleted web scrapping formatting errors
Formatted columns to appropriate text, currency or number categorisation

Data Integration: The two datasets were integrated into one using Power query and modifying the second dataset’s structure to match the first.

Verification: Once finished preparing the data for analysis, I verified that data through the use of the remove duplicates tool, looking through column filters for blanks/ out of place options and using sort options to identify potential outliers that would skew the analysis.

Documented cleaning procedure

Deleted Job description as it was unnecessary for my analysis
Autofitted column and row widths with alt, H, O, A/I
Noticed not all the job rows fit the Data Science job description so created a new column and used the following Function to identify rows that I need:

=IF((COUNTIF(A2, "*Data*") + COUNTIF(A2, "*Analyst*") + COUNTIF(A2, "*Scien*"))>0, TRUE, FALSE)

Returning 1097 True results and 991 False results.

Filtered new column to false and looked over rows as well as looking over the filtered category of the Job title column to ensure nothing was missed
Created a backup sheet and deleted all False Rows
Deleted True or False column on new sheet
Used alt, W, F, R to freeze the top row for ease of navigating the spreadsheet
Deleted “Company revenue”, “Url”,”Company founded”, “Company Culture and Values”, “Company Senior Management”, “Company Friend Recommendation”, “Company CEO Approval” Columns
Used Find and replace to remove “_yn” in the column names of programs / skills
Used find and replace on Company size column to remove the word “Employees” from data moving it to the column header in brackets
Selected Data set and replaced blank cells with “Unkown”
Selected entire dataset and removed 869 duplicates using ALT, A, M leaving 228 rows
--------
Imported additional dataset
Autofitted column and row widths with alt, H, O, A/I
Added individual skills / program columns to match original dataset
Added country column and filled “Australia” after confirming with state and location column
Used =COUNTIF($J2,"*Column name*") function to replicate original skill columns format
Deleted row 593 due to formatting error from web scrapping
Removed 11 duplicates leaving 770 rows
Moved and renamed columns to match original dataset
Used power query to combine datasets
Moved combined data to new sheet
Using filter tool renamed state column data to be consistent
Copied over Job title column data from original dataset into job title column
Filtered to hide all “Data Analyst, Data Scientist and Data Engineer” roles skimmed remaining cells reformatting ones that fell into the later categories and deleted all other rows leaving 894 rows, This is because I am only analysing the 3 job types mentioned previously.
Using CTRL 1 formatted columns to Text, Currency and Number
Consolidated company sector column names between the two datasets
Created two new columns and used the SUM function to count the total number of instances of each skill column.
Found that columns Julia, mongodb, Cassandra and bigml contained no instances after cleaning and thus deleted those columns
Transformed data into a Table using CTRL T
Inserted Pivot tables to conduct analysis

Analysis and Results

Data Statistics

893 total Jobs
278 Data Analyst
453 Data Engineer
162 Data Scientists
34 Sectors

Insights

From the initial exploring stage of analysis, I found the following insights:

Majority of roles are concentrated in the south eastern states
Majority of roles are full time
Over half of all roles analysed can be found in two sectors

Question 1: What are the most important skills for Data Analysts?

The most reoccurring skills for data analysts to have, were SQL, Python, Excel, Tableau, R and Power BI in that order of relevance. Here are the percentage of jobs that listed the following skills:

30.9% SQL
16.3% Python
12.6% Excel
12.4% Tableau
11.6% R
6.2% Power BI

Question 2: What state in Australia hires the most Data Analysts?

New south wales hire the most data analysts with a total of 39.93% of data analyst Job postings. In close second Victoria with 38.49%

Question 3: What Industry/ Sector has the highest demand for Data analysts?

The industry with the highest demand for data analysts is the IT services and consulting industry at 30% of posted jobs belonging to this sector. In close second the finance industry came in at 19% with these two sectors just coming shy of 50% of all posted data analyst positions.

Question 4: What is the average salary for Data Analysts and how does it differ by state?

The average salary for data analysts from a total of 278 jobs came to $106,916 for the whole of Australia the below is the state breakdown:

NSW: Average of $104,972 from a total of 111 jobs
VIC: Average of $96,456 from a total of 107 jobs
QLD: Average of $114,181 from a total of 20 jobs
SA: Average of $132,619 from a total of 8 jobs
TAS: Average of $185,000 from a total of 4 jobs
WA: Average of $122,247 from a total of 16 jobs
NT: Insufficient data
ACT: Insufficient data

Conclusion

SQL, Python, Excel, Tableau, R and Power BI are the most common skills needed

From this information more directed and effective study time can be spent on learning and honing skills that are in demand.

NSW and Victoria have the most data analyst positions

With this knowledge educated decision making can be made for targeting applying efforts

IT services and Finance are the hiring sectors for data analysts

Industry knowledge learning efforts can be made for a popular sector of interest.

Average salary of around $100,000

With this knowledge expected salary conversations can rely on data of past position salaries with further in depth understanding based on which state the position is in.

Limitations

Some limitations encountered during this project was the following:

limited time frame of the data, the analysis would prove more significant with more up to date data
Limited amount of data, with more detailed data more accurate insights could be derived.
No Historical data that could be used to see if any trends over time emerge.

Further Analysis potential

After completing this analysis project, I can see that further time and effort in analysing historical data would be interesting to see for how the trend of data analyst positions is increasing or decreasing.

Additionally, a view into more specific break downs of the top metrics holders in areas like sectors or states would be interesting to see. An approach to this maybe the breakdown of the company hiring the most data analysts from each sector or the breakdown of data analyst jobs from cities and rural locations.

Recommendations

After the completion of this project my recommendation for people aspiring to get into a data analytics role in Australia would be to learn the most important skills of SQL, Python, Excel, Tableau, R and Power BI as well as study industry knowledge from the IT or finance industry. By doing this you will build up the knowledge and skills that are most commonly sought after from hiring companies increasing your chances of being hired.