Kaggle's State of Machine Learning and Data Science 2021

The document provides an overview of the results from a survey of over 25,000 data scientists and machine learning engineers. It focuses on the responses from those identifying as employed data scientists. It covers topics like demographics of data scientists, their education and experience levels, details about their employment including salaries and company sizes, and popular technologies used. While the field remains male-dominated, education levels are high with most having graduate degrees and ongoing learning is common. Experience levels are lower for machine learning than programming. Pay varies significantly globally with most US data scientists earning over $100,000.

State of Machine Learning and Data Science 2021


Insights from Kaggle’s annual user survey focused
on working data scientists.

October 14, 2021


Table of Contents

Overview & Report Methodology

01 Data Scientist Profile

02 Education

03 Data Science & Machine Learning Experience

04 Employment

05 Technology

Conclusion

Kaggle | State of ML & Data Science 2021


Overview & Methodology

This is our 5th year conducting an in-depth user
survey & publicly sharing the results.

Over 25,000 data scientists and ML engineers
submitted responses on their backgrounds and
day-to-day experience: everything from educational
details to salaries to preferred technologies and
techniques.

This report is focused only on a slice of the data –
the 14% of respondents who are currently
employed with the job title of “data scientist”. It’s a
follow-up analysis to a report we published last
year with the same criteria.

We organized the report into five sections: 01 Data
Scientist Profile, 02 Education, 03 Data Science &
Machine Learning Experience, 04 Employment, and
05 Technology.

Note: there are many other job titles that support
data science and ML workflows, and also many
students and data enthusiasts who aren’t full-time,
employed data scientists. You can find their
responses in the complete 2021 survey dataset on
Kaggle. We highly encourage you to conduct your own
analyses and share them with the broader community;
we’d love to see them!
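
As a starting point for that kind of analysis, here is a minimal Python sketch of the slicing step this report describes: keeping only respondents whose job title is "data scientist" and computing their share of all respondents. The inline CSV, the `job_title` column name, and the `filter_by_title` helper are hypothetical stand-ins. The real survey file uses its own column names, which should be checked before adapting this.

```python
import csv
import io

# Hypothetical stand-in for the survey file. The column names and rows
# below are invented for illustration and are not real survey data.
SAMPLE_CSV = """respondent_id,job_title,country
1,Data Scientist,India
2,Student,India
3,Software Engineer,Brazil
4,Data Scientist,United States of America
5,Data Analyst,Japan
"""


def filter_by_title(rows, title="Data Scientist", column="job_title"):
    """Keep only the rows whose job title matches `title` (case-insensitive)."""
    wanted = title.strip().lower()
    return [row for row in rows if row[column].strip().lower() == wanted]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
data_scientists = filter_by_title(rows)

# Share of all respondents that falls into the slice, as a percentage.
share = 100 * len(data_scientists) / len(rows)
print(f"{len(data_scientists)} of {len(rows)} respondents ({share:.0f}%)")
```

Passing a different `title` selects another slice in the same way, for example `filter_by_title(rows, title="Student")`.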

You can find a detailed summary of our survey
methodology here.

Many survey questions were multiple choice with
the ability for respondents to select all options that
applied to them. For that reason, you may see
visualizations where the total percentage is more
than 100%.
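
A small Python sketch may make this arithmetic concrete. The responses and option names below are made up for illustration and are not survey data, and `option_shares` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical answers to one "select all that apply" question.
# Each set holds the options chosen by a single respondent.
responses = [
    {"Jupyter", "VSCode"},
    {"Jupyter"},
    {"Jupyter", "PyCharm", "VSCode"},
    {"RStudio"},
]


def option_shares(responses):
    """Return the percentage of respondents who selected each option."""
    counts = Counter(option for selected in responses for option in selected)
    total = len(responses)
    return {option: 100 * n / total for option, n in counts.items()}


shares = option_shares(responses)

# Each share is at most 100%, but the shares overlap because one
# respondent can count toward several options, so the sum can exceed 100%.
total_percentage = sum(shares.values())
print(shares, total_percentage)
```

Each percentage is computed against the number of respondents, not the number of selections, which is why the bars in a multi-select chart need not add up to 100%.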

Also, all monetary amounts captured in the report
are in USD.

01 Data Scientist Profile

Gender

Data science is still suffering from a large gender gap in
the workplace, as 82% of users identify as men.

Figure: Gender Identity of Data Scientists

Gender (cont.)

Looking over the past five years, there has been no
meaningful change in gender distribution.

Figure: Gender Identity of Data Scientists

Age

Data science remains a fairly young profession, with more
than half of all data scientists being between the ages of
22 and 34.

Figure: Age ranges of data scientists

Country

Data scientists live and work all around the globe, and
more than 40% of survey respondents live outside of the
10 countries where we had the most respondents.

Figure: Most Common Nationalities

Country (cont.)

Country demographics are nearly the same as last year,
with two countries having far more representation in the
Kaggle community. India makes up 24.4% of Kaggle data
scientists, while 12.2% reside in the United States. Brazil
is a distant third, at under 4.3%.

Figure: Most Common Nationalities

02 Education

Higher Education

Graduate degrees continue to be the norm for data
scientists, with over 62% having obtained either a
Master’s or doctoral degree. Fewer than 5% of data
scientists have no degree beyond a high school diploma.

Figure: Education Levels of Kaggle Data Scientists

Higher Education

Looking year-over-year, it is becoming more common to
be employed as a data scientist without having an
advanced degree, although advanced degrees are still
the norm (~64%).

Figure: Education Levels of Data Scientists Year over Year

Ongoing Learning

Data science and machine learning techniques progress
rapidly, so it’s no surprise that most Kaggle data
scientists maintain ongoing education.

Coursera remains the most popular ongoing data science
learning resource.

Figure: Popular Ongoing Learning Resources

Ongoing Learning (cont.)

Kaggle Learn Courses had the biggest popularity growth
(9%) since last year.

Figure: Most Popular Learning Platforms Year over Year

03 Data Science & Machine Learning Experience

Programming Experience

While most Kaggle data scientists have at least a few
years of experience under their belt, a growing share
have taken up programming within the last year (14.6% vs
9% in 2020).

Figure: Programming Experience for Data Scientists, Global vs USA

Machine Learning Experience

Most Kaggle data scientists are newer to machine
learning than programming. Slightly more than 55% of
data scientists have less than three years of experience.
Fewer than 6% of professional data scientists have been
using machine learning for a decade or more. As with
programming, US data scientists have more machine
learning experience than the global respondents.

Figure: Years of Machine Learning Experience

04 Employment

Pay

Companies in the United States are most likely to pay in
the six figures, based on these survey results. Global
companies have lower salary ranges that are more evenly
distributed.

There are regional trends as well: in India, for example,
nearly 90% make less than $50,000 USD per year.

Figure: Global Salary Distribution

Pay (cont.)

Comparing salaries between our two largest countries,
most US-based data scientists make over $100,000 per
year while less than 3% of India-based data scientists
make over $100,000 per year.

Figure: Salary Distribution US vs India

Pay (cont.)

Looking at the most common salaries by country, we see
that US companies are more likely to pay higher salaries.
Companies in Germany and Japan follow, with
significantly higher salaries than the other included
regions.

Figure: Median Salary by Country

Companies Employing Data Science

Like last year, large enterprises and small startups are the
most common choices of data scientists in this survey.
Over half of employers have fewer than 250 employees.
Yet one in five data scientists work at companies with
over 10,000 employees.

Figure: Company Size of Data Science Employers

Data Science Teams

The sizes of data science teams didn’t meaningfully
change from last year: over half of data scientists still
work at companies with five or fewer people on the data
science team, yet one in five work on a team with 20 or
more data scientists.

Figure: Data Science Team Size

Spending

There’s plenty of money being spent on machine learning
and cloud computing products, but not by all data
scientists.

There’s quite a range, with over a quarter of data
scientists claiming to have spent no money at all, while
one in 10 has spent over $100,000 USD in the last five
years.

Figure: Enterprise Spending on Cloud Computing Products (Global)

Spending

Data scientists from the US spend more money in the
cloud than their global counterparts. The highest
spending level drew more than twice as many responses
in the US as in other countries.

Figure: Cloud Spending by Country

05 Technology

Interactive Development Environments

Jupyter-based IDEs continue to be the go-to tool for data
scientists, with around three-quarters of Kaggle data
scientists using them. Visual Studio Code is in the
second spot with 38%.

Figure: IDE Popularity

Interactive Development Environments (cont.)

Looking year over year, VSCode is continuing its
popularity gain.

Figure: Top IDE Popularity Year Over Year

Note: In the previous figure Jupyter and JupyterLab were separate choices,
whereas in this figure they were combined in order to be consistent with how
the question was structured in 2019 (and to allow for comparison with 2018
where JupyterLab was not yet an option).
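
The effect of combining two multiple-choice options can be sketched in Python. This assumes that a respondent who selected both options is counted once in the combined category. The report does not say how the merge was done, and the responses, option labels, and `share_selecting_any` helper below are hypothetical.

```python
# Hypothetical answers to a "select all that apply" IDE question.
# Each set holds the options chosen by a single respondent.
responses = [
    {"Jupyter Notebook", "JupyterLab"},
    {"JupyterLab", "Visual Studio Code"},
    {"Visual Studio Code"},
    {"Jupyter Notebook"},
    {"PyCharm"},
]


def share_selecting_any(responses, options):
    """Percentage of respondents who selected at least one of `options`."""
    options = set(options)
    hits = sum(1 for selected in responses if selected & options)
    return 100 * hits / len(responses)


jupyter_share = share_selecting_any(responses, {"Jupyter Notebook"})
lab_share = share_selecting_any(responses, {"JupyterLab"})

# Combined category: a respondent who picked both options counts only once,
# so the combined share can be smaller than the sum of the separate shares.
combined_share = share_selecting_any(
    responses, {"Jupyter Notebook", "JupyterLab"}
)
print(jupyter_share, lab_share, combined_share)
```

Under this assumption the combined figure is not the sum of the two separate bars, so numbers read off the previous figure cannot simply be added to reproduce the year-over-year chart.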

Methods & Algorithms

Like last year, the most commonly used algorithms were
linear and logistic regression, followed closely by decision
trees and random forests.

Of more complex methods, gradient boosting machines
and convolutional neural networks were the most popular
approaches.

Figure: Methods and Algorithms Usage

Methods & Algorithms (cont.)

We also saw strong year-over-year growth in the use of
large language models such as transformer networks
(BERT, GPT-3, etc.).

Figure: Popular ML Algorithms

Machine Learning Frameworks

Python-based tools continue to dominate the machine
learning frameworks.

Like last year, Scikit-learn, a Swiss Army knife applicable
to most projects, is at the top, with over 80% of data
scientists using it. TensorFlow and Keras, notably used in
combination for deep learning, were each selected on
about half of the data scientist surveys. Gradient
boosting library xgboost is fourth, with about the same
usage as 2020 and 2019.

The most popular of the new tools added to the survey
this year is Huggingface, reaching over 10%.

Figure: Machine Learning Framework Usage

Machine Learning Frameworks (cont.)

Despite being used less frequently overall, we continue to
see strong year-over-year growth of the PyTorch
framework.

Figure: ML Framework Popularity

Enterprise Cloud Computing

The three big players in cloud computing continue to be
Amazon Web Services, Google Cloud Platform, and
Microsoft Azure, in that order of usage.

Figure: Cloud Provider Popularity

Enterprise Cloud Computing (cont.)

Those who use cloud services were also asked about
specific products in the survey. Amazon's Elastic
Compute Cloud was the most popular cloud computing
product, but Google Cloud's Compute Engine and Azure's
Virtual Machines also have strong adoption. One in four
did not name a cloud product.

Figure: Cloud Computing Products (AWS/GCP/Azure)

Enterprise Cloud Computing (cont.)

Likewise, Amazon's Simple Storage Service (S3) was the
most popular data storage product, but Google Cloud
Storage and Azure Data Lake Storage also have strong
adoption.

Figure: Data Storage Product (AWS/GCP/Azure)

Enterprise Machine Learning Tools

Like last year, Amazon SageMaker was by far the most
popular choice among enterprise ML customers. Another
exciting product is Databricks: it had similar adoption to
Azure ML Studio (~13%) and greater adoption than Google
Cloud Vertex AI (~8%).

Figure: Enterprise Machine Learning Product Usage

Enterprise Big Data

Regarding databases, there isn't a clear favorite among
data scientists. MySQL, PostgreSQL, and Microsoft SQL
Server maintained the top three spots.

Figure: Database Product Popularity

Machine Learning Experiments

Compared to last year, there are more data scientists
using tools to keep track of and manage their
experiments. TensorBoard continues to be a favorite
(22.3%), with MLflow following close behind (18%).

Figure: Usage of Machine Learning Experiment Tools

Automated Machine Learning

Google Cloud AutoML maintained its top position in the
AutoML category.

Figure: Automated Machine Learning Framework Usage

Automated Machine Learning (cont.)

Adoption of Google Cloud's AutoML technology has
grown steadily over the past several years.

Figure: Regular Usage of Google Cloud AutoML

Tensor Processing Units

Google Cloud's Tensor Processing Units (TPUs) also
showed strong year-over-year growth.

Figure: Regular Usage of TPUs

Conclusion
Kaggle has published the complete dataset of responses
for the community to review, and we’ll run a competition
from October 14 to November 28th, 2021 to learn even
more about data science practitioners in 2021.
