
Author: Bradley Shedden
Downloads:

Table Of Contents
-
Overview
-
Questions & Hypothesis
-
Data
-
Data Preparation
-
Analysis and Results
-
Conclusion
Tasks Completed
-
Extracted and cleaned job datasets from Kaggle to ensure data quality and accuracy.
-
Transformed and combined datasets using Power Query to prepare data for analysis.
-
Performed exploratory data analysis (EDA) in Excel to identify trends and patterns.
-
Created an interactive dashboard in Excel to visualise key data science job metrics, such as most common tools, salaries, industries, and regions, that is filterable by job type and state.
-
Produced an analysis report on the project outlining hypotheses, procedures, visualisations and findings.
Overview
The objective of this case study is for myself and any other people aspiring to enter the data science field to gain a broad understanding of the data science career landscape in Australia. This is so time can be best directed in study as well as getting an overall understanding of the career path. As I am personally pursuing a role in data analytics my analysis will focus on that field however, I will provide my finalised excel dashboard for people interested in data engineering and data scientist’s roles, this will help in my own and other peoples understanding of the fields of data science.
​
This will be done by analysing metrics such as most common skills required, industries, states, salaries etc. As a result of this analysis, I aim to produce a filterable interactive dashboard and compelling visualisations of data answering my questions.
​
Questions & Hypothesis:​
Question 1:
What are the most important skills for Data Analysts?
Hypothesis: I predict that Excel, SQL and a visualisation tool like Tableau or Power BI will prove to be the most popular from knowledge attained in my certificate in data analytics
​
Question 2:
What state in Australia hires the most Data Analysts?
Hypothesis: I predict that Victoria or New South Wales, due to the high population in the cities of Melbourne and Sydney
​
Question 3:
What Industry/ Sector has the highest demand for Data analysts?
Hypothesis: I predict that the government and finance sectors due to the scale of the industries and the large amount of data they collect that requires analysis to make usable.
​
Question 4:
What is the average salary for Data Analysts and how does it differ by state?
Hypothesis: I predict that the salary of a data analyst will sit in the range of around $75,000 to $95,000. This is due to the role involving specific skills and knowledge in the IT area of knowledge. In terms of differences between states I would guess that Victoria would have a higher average salary due to population density that would push up the competition and demand for analysts in the state.
​
Data​
Dataset 1: By: NADZMI
https://www.kaggle.com/datasets/nadzmiagthomas/australia-data-science-jobs/code
You can find Nadzmi’s source code at: https://github.com/nadzmi27/Multithread-Webscraping
​
Data Description:
This data was collected on August and December of 2022 by Nadzmi using Web scrapping of Glassdoor with Selenium (Python). The dataset is from Kaggle containing columns such as: title of jobs, location of jobs, company, salary estimates, ratings, skills and etc.
​
Dataset 2: By: AUSTIN NGUYEN
https://www.kaggle.com/datasets/austinnguyen00/data-jobs-in-australia
​
Data Description:
This data was collected on September of 2022 by AUSTIN NGUYEN using Web scrapping of LinkedIn with Selenium (Python). The dataset is from Kaggle containing columns such as: Job Type, State, Seniority level, Employment type, Industries and Skills
​
Explanation:
I decided to use these datasets as they have a great number of metrics that will allow many different insights to be derived from them. Some restrains of these datasets is that due to the dataset containing two collection dates there are many duplicates, there seems to be many jobs rows that seem unrelated to data science, there is a lot of unnecessary data for my use case and there are many instances of blank cells.
​
Data Preparation
Data Cleaning
The process of preparing the data for analysis was thorough cutting the original row count from 2232 down to 894 rows through the removal of unnecessary and duplicate data. Additionally, through modification and formatting the data was prepped for my specific analysis requirements.
​
Data cleaning overview:
-
Deleted unnecessary columns and rows that would not contribute to the analysis
-
Formatted columns and row names to be consistent, elementary and easier to work with
-
Imported additional dataset modifying it to match the first
-
Combined both datasets using Power Query
-
Used the Removed duplicates tool deleting a total of 880 rows
-
Deleted web scrapping formatting errors
-
Formatted columns to appropriate text, currency or number categorisation
​
Data Integration: The two datasets were integrated into one using Power query and modifying the second dataset’s structure to match the first.
​
Verification: Once finished preparing the data for analysis, I verified that data through the use of the remove duplicates tool, looking through column filters for blanks/ out of place options and using sort options to identify potential outliers that would skew the analysis.
Documented cleaning procedure
-
Deleted Job description as it was unnecessary for my analysis
-
Autofitted column and row widths with alt, H, O, A/I
-
Noticed not all the job rows fit the Data Science job description so created a new column and used the following Function to identify rows that I need:
=IF((COUNTIF(A2, "*Data*") + COUNTIF(A2, "*Analyst*") + COUNTIF(A2, "*Scien*"))>0, TRUE, FALSE)
Returning 1097 True results and 991 False results.
-
Filtered new column to false and looked over rows as well as looking over the filtered category of the Job title column to ensure nothing was missed
-
Created a backup sheet and deleted all False Rows
-
Deleted True or False column on new sheet
-
Used alt, W, F, R to freeze the top row for ease of navigating the spreadsheet
-
Deleted “Company revenue”, “Url”,”Company founded”, “Company Culture and Values”, “Company Senior Management”, “Company Friend Recommendation”, “Company CEO Approval” Columns
-
Used Find and replace to remove “_yn” in the column names of programs / skills
-
Used find and replace on Company size column to remove the word “Employees” from data moving it to the column header in brackets
-
Selected Data set and replaced blank cells with “Unkown”
-
Selected entire dataset and removed 869 duplicates using ALT, A, M leaving 228 rows
-
--------
-
Imported additional dataset
-
Autofitted column and row widths with alt, H, O, A/I
-
Added individual skills / program columns to match original dataset
-
Added country column and filled “Australia” after confirming with state and location column
-
Used =COUNTIF($J2,"*Column name*") function to replicate original skill columns format
-
Deleted row 593 due to formatting error from web scrapping
-
Removed 11 duplicates leaving 770 rows
-
Moved and renamed columns to match original dataset
-
Used power query to combine datasets
-
Moved combined data to new sheet
-
Using filter tool renamed state column data to be consistent
-
Copied over Job title column data from original dataset into job title column
-
Filtered to hide all “Data Analyst, Data Scientist and Data Engineer” roles skimmed remaining cells reformatting ones that fell into the later categories and deleted all other rows leaving 894 rows, This is because I am only analysing the 3 job types mentioned previously.
-
Using CTRL 1 formatted columns to Text, Currency and Number
-
Consolidated company sector column names between the two datasets
-
Created two new columns and used the SUM function to count the total number of instances of each skill column.
-
Found that columns Julia, mongodb, Cassandra and bigml contained no instances after cleaning and thus deleted those columns
-
Transformed data into a Table using CTRL T
-
Inserted Pivot tables to conduct analysis
​
Analysis and Results
Data Statistics
-
893 total Jobs
-
278 Data Analyst
-
453 Data Engineer
-
162 Data Scientists
-
34 Sectors
​
Insights
From the initial exploring stage of analysis, I found the following insights:
-
Majority of roles are concentrated in the south eastern states
-
Majority of roles are full time
-
Over half of all roles analysed can be found in two sectors
​
Question 1: What are the most important skills for Data Analysts?
The most reoccurring skills for data analysts to have, were SQL, Python, Excel, Tableau, R and Power BI in that order of relevance. Here are the percentage of jobs that listed the following skills:
-
30.9% SQL
-
16.3% Python
-
12.6% Excel
-
12.4% Tableau
-
11.6% R
-
6.2% Power BI
​​
​​
​
​
​
​
​
​
​​​​​
Question 2: What state in Australia hires the most Data Analysts?
New south wales hire the most data analysts with a total of 39.93% of data analyst Job postings. In close second Victoria with 38.49%
​
​
​
​
​
​
​
​
​
​
​
​
​
​
​​​​
Question 3: What Industry/ Sector has the highest demand for Data analysts?
The industry with the highest demand for data analysts is the IT services and consulting industry at 30% of posted jobs belonging to this sector. In close second the finance industry came in at 19% with these two sectors just coming shy of 50% of all posted data analyst positions.
​​​
​
​
​
​
​
​
​
​
​
​
​
​
​
​
Question 4: What is the average salary for Data Analysts and how does it differ by state?
The average salary for data analysts from a total of 278 jobs came to $106,916 for the whole of Australia the below is the state breakdown:
-
NSW: Average of $104,972 from a total of 111 jobs
-
VIC: Average of $96,456 from a total of 107 jobs
-
QLD: Average of $114,181 from a total of 20 jobs
-
SA: Average of $132,619 from a total of 8 jobs
-
TAS: Average of $185,000 from a total of 4 jobs
-
WA: Average of $122,247 from a total of 16 jobs
-
NT: Insufficient data
-
ACT: Insufficient data
Conclusion
​
-
SQL, Python, Excel, Tableau, R and Power BI are the most common skills needed
From this information more directed and effective study time can be spent on learning and honing skills that are in demand.
​
-
NSW and Victoria have the most data analyst positions
With this knowledge educated decision making can be made for targeting applying efforts
​
-
IT services and Finance are the hiring sectors for data analysts
Industry knowledge learning efforts can be made for a popular sector of interest.
​
-
Average salary of around $100,000
With this knowledge expected salary conversations can rely on data of past position salaries with further in depth understanding based on which state the position is in.
​
Limitations
Some limitations encountered during this project was the following:
-
limited time frame of the data, the analysis would prove more significant with more up to date data
-
Limited amount of data, with more detailed data more accurate insights could be derived.
-
No Historical data that could be used to see if any trends over time emerge.
​
Further Analysis potential
After completing this analysis project, I can see that further time and effort in analysing historical data would be interesting to see for how the trend of data analyst positions is increasing or decreasing.
Additionally, a view into more specific break downs of the top metrics holders in areas like sectors or states would be interesting to see. An approach to this maybe the breakdown of the company hiring the most data analysts from each sector or the breakdown of data analyst jobs from cities and rural locations.
​
Recommendations
After the completion of this project my recommendation for people aspiring to get into a data analytics role in Australia would be to learn the most important skills of SQL, Python, Excel, Tableau, R and Power BI as well as study industry knowledge from the IT or finance industry. By doing this you will build up the knowledge and skills that are most commonly sought after from hiring companies increasing your chances of being hired.


