Data Science Dictionary

The document is a comprehensive data science dictionary that defines key terms and concepts in the field, such as algorithms, artificial intelligence, big data, and machine learning. Each entry includes a definition, an example, and a translation in Portuguese. It serves as a reference for understanding fundamental data science terminology.

Uploaded by Victor Sousa

Data Science Dictionary

• Algorithm
Definition: A defined set of instructions or steps used to perform calculations or solve problems in
computing. In data science, algorithms process data to uncover patterns or make predictions.
Example: The team designed an algorithm to identify fraudulent transactions in real-time.
Translation: Algoritmo.

• Artificial Intelligence (AI)
Definition: A branch of computer science that aims to simulate human-like intelligence in machines,
allowing them to reason, learn, and make decisions autonomously.
Example: AI is increasingly being used to power chatbots that handle customer inquiries efficiently.
Translation: Inteligência Artificial.

• Big Data
Definition: Datasets too large or complex to be processed efficiently with traditional data-processing
techniques. Big Data is often analyzed to reveal patterns, trends, and associations.
Example: The retail company used big data analytics to understand shopping habits across multiple
regions.
Translation: Big Data (grandes volumes de dados).

• Bias
Definition: A systematic error introduced into data analysis or machine learning models, leading to
skewed results or conclusions. Bias can arise from data collection methods or model design.
Example: The model's predictions were inaccurate due to bias in the training data, which
overrepresented one demographic group.
Translation: Viés.

• Clustering
Definition: A type of unsupervised machine learning where the goal is to group data points into
clusters based on their similarities. It helps identify natural patterns in the data.
Example: Clustering was used to segment customers based on purchasing behavior, enabling more
targeted marketing strategies.
Translation: Agrupamento.

• Correlation
Definition: A statistical measure that describes the strength and direction of a relationship between
two variables. It indicates how one variable changes as the other does.
Example: A negative correlation was found between daily screen time and sleep hours among
teenagers: the more screen time, the fewer hours of sleep.
Translation: Correlação.
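
As an illustration of how strength and direction are measured, here is a minimal sketch of the Pearson correlation coefficient, one common correlation measure. The function name and the screen-time and sleep figures are hypothetical, chosen so that sleep falls as screen time rises.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalized; the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: daily screen hours and sleep hours for five teenagers.
screen_hours = [1, 2, 3, 4, 5]
sleep_hours = [9, 8, 7, 6, 5]
r = pearson_r(screen_hours, sleep_hours)
print(r)  # close to -1: a strong negative relationship
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.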

• Data Cleaning
Definition: The process of identifying and rectifying errors or inconsistencies in a dataset to ensure
its accuracy and completeness. This is a critical step before analysis can take place.
Example: Data cleaning removed duplicate entries and filled in missing values, making the dataset
ready for analysis.
Translation: Limpeza de dados.
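
A minimal sketch of two typical cleaning steps: standardizing inconsistent labels and then dropping duplicate records. The record layout (name, city) and the sample rows are assumptions made for illustration only.

```python
def clean(rows):
    """Normalize whitespace and letter case, then drop duplicate rows."""
    seen = set()
    cleaned = []
    for name, city in rows:
        # Fix inconsistencies first so that " SP" and "sp" compare equal.
        record = (name.strip(), city.strip().upper())
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned

# Hypothetical raw data with one duplicate hidden by inconsistent formatting.
raw = [("Ana", "sp"), ("Ana", " SP"), ("Bruno", "RJ")]
cleaned = clean(raw)
print(cleaned)
```

Normalizing before deduplicating matters here: run in the opposite order, the two "Ana" rows would not be recognized as the same record.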

• Data Visualization
Definition: The graphical representation of data, typically using charts, graphs, and plots, to make
complex data more accessible and understandable.
Example: The data scientist created an interactive dashboard that visualized trends in sales over the
last quarter.
Translation: Visualização de dados.

• Exploratory Data Analysis (EDA)
Definition: The process of analyzing data sets to summarize their main characteristics, often using
visual methods like histograms, box plots, and scatter plots. EDA is crucial for identifying patterns,
outliers, and data structure.
Example: EDA showed that the data had several outliers, which required further investigation.
Translation: Análise exploratória de dados.

• Feature Engineering
Definition: The process of transforming raw data into meaningful features that can improve the
performance of machine learning algorithms. It often involves techniques like scaling, encoding, and
creating new variables.
Example: Feature engineering involved creating interaction terms between product price and
customer age to enhance the model’s predictive power.
Translation: Engenharia de características.

• Forecasting
Definition: The practice of predicting future values based on historical data using statistical models
and machine learning techniques.
Example: Time series forecasting was used to predict next quarter’s sales based on historical trends.
Translation: Previsão.

• Gradient Descent
Definition: An optimization algorithm used in machine learning to minimize a loss function by
iteratively adjusting the model's parameters. It is commonly used in training deep learning models.
Example: The model’s performance improved as gradient descent minimized the error between
predicted and actual values.
Translation: Descida do gradiente.
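
The iterative adjustment can be sketched on a one-parameter problem. This is an illustrative toy, not a training routine: the loss f(w) = (w - 3)^2, the learning rate, and the step count are all assumptions.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Toy loss f(w) = (w - 3)**2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(w_min)  # approaches 3
```

With this learning rate, each step shrinks the distance to the minimum by a constant factor. A rate that is too large would make the updates overshoot and diverge.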

• Graph
Definition: A data structure made up of nodes (vertices) and edges (connections) that represents
relationships among entities. It is widely used to model networks such as social media or
transportation systems.
Example: The social network analysis used graphs to find the most influential nodes within a
community.
Translation: Grafo.
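
A small sketch of a graph stored as an adjacency structure. The names and friendships are hypothetical, and the number of connections (degree) is used only as a simple stand-in for influence.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected graph as a mapping from node to set of neighbors."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

# Hypothetical friendships in a small social network.
edges = [("ana", "bruno"), ("ana", "carla"), ("ana", "davi"), ("bruno", "carla")]
graph = build_graph(edges)

# The node with the most connections.
most_connected = max(graph, key=lambda node: len(graph[node]))
print(most_connected)
```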

• Hypothesis Testing
Definition: A statistical method used to evaluate assumptions about a population based on sample
data. It involves testing a null hypothesis against an alternative hypothesis.
Example: Hypothesis testing showed that the new marketing campaign significantly increased
customer engagement.
Translation: Teste de hipótese.

• Hyperparameter
Definition: A setting chosen before the learning process begins that controls how training proceeds,
such as the learning rate, the number of trees, or the regularization strength.
Example: The model's accuracy improved after tuning the hyperparameters, including the learning
rate and batch size.
Translation: Hiperparâmetro.

• Imputation
Definition: The process of replacing missing values in a dataset with substituted values, often using
statistical techniques such as mean, median, or predictive models.
Example: Imputation helped fill in missing data for age and income, which was crucial for the
analysis.
Translation: Imputação.
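
A minimal sketch of median imputation, assuming missing values are represented as `None`. The ages are hypothetical.

```python
from statistics import median

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]

# Hypothetical ages with two missing entries.
ages = [25, None, 40, 31, None]
imputed = impute_median(ages)
print(imputed)
```

The median is less sensitive to outliers than the mean, which is one reason to prefer it for skewed variables such as income.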

• Inferential Statistics
Definition: A branch of statistics that allows for making predictions or generalizations about a
population based on a sample. It involves techniques like hypothesis testing and confidence intervals.
Example: Inferential statistics showed that the average annual salary in the region was significantly
higher than the national average.
Translation: Estatística inferencial.

• JSON (JavaScript Object Notation)
Definition: A lightweight, human-readable data format used to represent structured data objects. It is
commonly used in APIs to transmit data between servers and clients.
Example: The API returned data in JSON format, which was then parsed and analyzed for further
insights.
Translation: JSON (Notação de Objetos JavaScript).
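
A short sketch of parsing a JSON response with Python's standard `json` module. The payload and its field names are invented for illustration.

```python
import json

# Hypothetical API response body.
payload = '{"user": "ana", "purchases": [12.5, 7.0, 30.0]}'

record = json.loads(payload)      # JSON text -> Python dict
total = sum(record["purchases"])  # analyze the parsed data
round_trip = json.dumps(record, sort_keys=True)  # Python dict -> JSON text

print(record["user"], total)
```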

• Joint Distribution
Definition: A probability distribution that models the relationship between two or more random
variables. It shows how likely different combinations of values for the variables are.
Example: The joint distribution of age and income helped identify purchasing patterns among
different demographic groups.
Translation: Distribuição conjunta.
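
An empirical joint distribution can be estimated by counting how often each combination of values occurs. The sketch below assumes two categorical variables (an age group and an income band) with made-up observations.

```python
from collections import Counter

def joint_distribution(pairs):
    """Estimate P(X = x, Y = y) as the relative frequency of each pair."""
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: n / total for pair, n in counts.items()}

# Hypothetical (age group, income band) observations.
observations = [
    ("young", "low"),
    ("young", "low"),
    ("young", "high"),
    ("older", "high"),
]
joint = joint_distribution(observations)

# Marginal probability of "young": sum the joint over all income bands.
p_young = sum(p for (age, _), p in joint.items() if age == "young")
print(joint, p_young)
```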

• K-Means
Definition: A popular unsupervised machine learning algorithm used to partition data into k distinct
clusters based on feature similarity.
Example: K-Means clustering divided the customer base into five distinct segments, each with
unique characteristics.
Translation: K-Means.
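
The assign-then-update loop at the heart of K-Means can be sketched in one dimension. This is a simplified illustration: the starting centroids are fixed by hand rather than chosen at random, and the spending figures are hypothetical.

```python
def k_means_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spend for six customers, with k = 2.
spend = [10, 12, 11, 50, 52, 49]
centroids, clusters = k_means_1d(spend, centroids=[10.0, 50.0])
print(centroids, clusters)
```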

• Kernel
Definition: A function used in machine learning, particularly in Support Vector Machines (SVMs),
that measures the similarity of two data points as if they had been mapped into a higher-dimensional
space, where they are easier to separate, without computing that mapping explicitly.
Example: The use of the Gaussian kernel allowed the SVM model to successfully classify non-linear
data.
Translation: Kernel.

• Linear Regression
Definition: A statistical method for modeling the relationship between a dependent variable and one
or more independent variables by fitting a linear equation to observed data.
Example: Linear regression was used to predict house prices based on size, location, and amenities.
Translation: Regressão linear.
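
For a single independent variable, fitting a linear equation by ordinary least squares has a closed-form solution, sketched below. The house sizes and prices are hypothetical and were chosen to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: size in square meters, price in thousands.
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(sizes, prices)

predicted = slope * 90 + intercept  # predicted price for a 90 m^2 house
print(slope, intercept, predicted)
```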

• Logistic Regression
Definition: A statistical model that estimates the probability of a binary outcome (e.g., yes/no,
success/failure) from one or more independent variables; a threshold on that probability gives the
predicted class.
Example: Logistic regression was employed to classify emails as either spam or non-spam.
Translation: Regressão logística.
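
The sketch below shows only the prediction step of logistic regression: a linear score passed through the sigmoid function to give a probability. The single feature (number of links in an email) and the weight and bias values are assumptions, not fitted coefficients.

```python
from math import exp

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1 / (1 + exp(-z))

def spam_probability(num_links, weight=0.8, bias=-2.0):
    """Probability of spam from a hypothetical one-feature model."""
    return sigmoid(weight * num_links + bias)

p_few = spam_probability(0)   # email with no links
p_many = spam_probability(5)  # email with five links

# Classify with a 0.5 threshold.
labels = ["spam" if p >= 0.5 else "non-spam" for p in (p_few, p_many)]
print(p_few, p_many, labels)
```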

• Machine Learning (ML)
Definition: A subset of artificial intelligence that enables systems to automatically improve their
performance with experience and data, without being explicitly programmed.
Example: Machine learning algorithms were used to predict credit card fraud by analyzing
transaction patterns.
Translation: Aprendizado de máquina.

• Model
Definition: A mathematical representation of a real-world process that is used to make predictions or
inform decisions based on input data.
Example: The predictive model forecasted customer churn based on their usage patterns and
demographic data.
Translation: Modelo.

• Neural Network
Definition: A computational model inspired by the structure of the human brain, consisting of
interconnected nodes (neurons) that process data for tasks like classification and regression.
Example: The deep neural network achieved high accuracy in recognizing handwritten digits.
Translation: Rede neural.

• Normalization
Definition: The process of adjusting the range of numerical data so that it fits within a specific scale,
often to improve the performance of machine learning algorithms. Normalization typically rescales
data into a range between 0 and 1, or -1 and 1.
Example: Before training the model, the data was normalized to ensure that the features with larger
numerical ranges did not dominate the results.
Translation: Normalização.
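
Rescaling into the range 0 to 1 is often done with min-max scaling, sketched here. The income values are hypothetical, and the function assumes the values are not all identical.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    # Assumes hi != lo; identical values would cause a division by zero.
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical annual incomes.
incomes = [20000, 50000, 80000]
scaled = min_max_scale(incomes)
print(scaled)
```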
