Kaggle's State of Machine Learning and Data Science 2021
01 Data Scientist Profile
02 Education
03 Data Science & Machine Learning Experience
04 Employment
05 Technology
Conclusion
Overview & Report Methodology 3 Kaggle | State of ML & Data Science 2021
This is our fifth year conducting an in-depth user survey and publicly sharing the results.
This report is focused only on a slice of the data –
the 14% of respondents who are currently
employed with the job title of “data scientist”. It’s a
follow-up analysis to a report we published last
year with the same criteria.
You can find a detailed summary of our survey
methodology here.
01
Data Scientist Profile
Data science still suffers from a large gender gap in the workplace: 82% of respondents identify as men.
Figure: Gender Identity of Data Scientists
Looking over the past five years, there has been no meaningful change in gender distribution.
Figure: Gender Identity of Data Scientists
Data science remains a fairly young profession: more than half of all data scientists are between the ages of 22 and 34.
Figure: Age Ranges of Data Scientists
Data scientists live and work all around the globe, and more than 40% of survey respondents live outside the 10 countries with the most respondents.
Figure: Most Common Nationalities
Country demographics are nearly the same as last year, with two countries far better represented in the Kaggle community than any others. India accounts for 24.4% of Kaggle data scientists, while 12.2% reside in the United States. Brazil is a distant third, at less than 4.3%.
Figure: Most Common Nationalities
02
Education
Graduate degrees continue to be the norm for data scientists, with over 62% having obtained either a master’s or doctoral degree. Fewer than 5% of data scientists have no degree beyond a high school diploma.
Figure: Education Levels of Kaggle Data Scientists
Year over year, it is becoming more common to be employed as a data scientist without an advanced degree, although advanced degrees are still the norm (~64%).
Figure: Education Levels of Data Scientists Year over Year
Data science and machine learning techniques progress rapidly, so it’s no surprise that most Kaggle data scientists pursue ongoing education.
Figure: Popular Ongoing Learning Resources
Kaggle Learn Courses showed the biggest growth in popularity, up 9% since last year.
Figure: Most Popular Learning Platforms Year over Year
03
Data Science & Machine Learning Experience
Programming Experience
While most Kaggle data scientists have at least a few years of experience under their belt, a growing share have taken up programming within the last year (14.6%, versus 9% in 2020).
Figure: Programming Experience for Data Scientists, Global vs USA
Machine Learning Experience
Most Kaggle data scientists are newer to machine learning than to programming. Slightly more than 55% of data scientists have less than three years of experience, and fewer than 6% of professional data scientists have been using machine learning for a decade or more. As with programming, US data scientists have more machine learning experience than the global respondents.
Figure: Years of Machine Learning Experience
04
Employment
Based on these survey results, companies in the United States are the most likely to pay six-figure salaries. Globally, salary ranges are lower and more evenly distributed.
Figure: Global Salary Distribution
Comparing our two largest countries, most US-based data scientists make over $100,000 per year, while fewer than 3% of India-based data scientists do.
Figure: Salary Distribution, US vs India
Looking at median salaries by country, we see that US companies tend to pay the highest salaries. Companies in Germany and Japan follow, with significantly higher salaries than the other regions shown.
Figure: Median Salary by Country
As last year, large enterprises and small startups are the most common employers of data scientists in this survey. Over half of employers have fewer than 250 employees, yet one in five data scientists work at companies with over 10,000 employees.
Figure: Company Size of Data Science Employers
The size of data science teams didn’t meaningfully change from last year: over half of data scientists still work at companies with five or fewer people on the data science team, yet one in five work on a team of 20 or more data scientists.
Figure: Data Science Team Size
There’s plenty of money being spent on machine learning and cloud computing products, but not by all data scientists.
Figure: Enterprise Spending on Cloud Computing Products (Global)
Data scientists in the US spend more money in the cloud than their global counterparts: the highest spending level received more than twice the share of responses in the US compared to other countries.
Figure: Cloud Spending by Country
05
Technology
Jupyter-based IDEs continue to be the go-to tool for data scientists, used by around three-quarters of Kaggle data scientists. Visual Studio Code holds second place with 38%.
Figure: IDE Popularity
Year over year, VSCode continues to gain popularity.
Figure: Top IDE Popularity Year Over Year
Note: In the previous figure Jupyter and JupyterLab were separate choices,
whereas in this figure they were combined in order to be consistent with how
the question was structured in 2019 (and to allow for comparison with 2018
where JupyterLab was not yet an option).
As last year, the most commonly used algorithms were linear and logistic regression, followed closely by decision trees and random forests.
Figure: Methods and Algorithms Usage
Python-based tools continue to dominate machine learning framework usage.
Figure: Machine Learning Framework Usage
The three big players in cloud computing continue to be Amazon Web Services, Google Cloud Platform, and Microsoft Azure, in that order of usage.
Figure: Cloud Provider Popularity
Respondents who use cloud services were also asked about specific products. Amazon's Elastic Compute Cloud (EC2) was the most popular cloud computing product, but Google Cloud's Compute Engine and Azure's Virtual Machines also have strong adoption. One in four did not name a cloud product.
Figure: Cloud Computing Products (AWS/GCP/Azure)
Likewise, Amazon's Simple Storage Service (S3) was the most popular data storage product, but Google Cloud Storage and Azure Data Lake Storage also have strong adoption.
Figure: Data Storage Product (AWS/GCP/Azure)
Regarding databases, there isn't a clear favorite among data scientists. MySQL, PostgreSQL, and Microsoft SQL Server kept the top three spots.
Figure: Database Product Popularity
Compared to last year, more data scientists are using tools to track and manage their experiments. TensorBoard remains a favorite (22.3%), with MLflow close behind (18%).
Figure: Usage of Machine Learning Experiment Tools
Google Cloud AutoML maintained its top position in the AutoML category.
Figure: Automated Machine Learning Framework Usage
Adoption of Google Cloud's AutoML technology has grown steadily over the past several years.
Figure: Regular Usage of Google Cloud AutoML
Google Cloud's Tensor Processing Units (TPUs) also showed strong year-over-year growth.
Figure: Regular Usage of TPUs