Data Science Dictionary

• Algorithm
Definition: A defined set of instructions or steps used to perform calculations or solve problems in
computing. In data science, algorithms process data to uncover patterns or make predictions.
Example: The team designed an algorithm to identify fraudulent transactions in real-time.
Translation: Algoritmo.
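Code sketch: a minimal Python illustration of a rule-based algorithm; the transaction records and the 10,000 threshold are invented for demonstration.
    # Flag transactions whose amount exceeds a fixed threshold.
    def flag_suspicious(transactions, threshold=10_000):
        return [t for t in transactions if t["amount"] > threshold]

    transactions = [{"id": 1, "amount": 250}, {"id": 2, "amount": 15_000}]
    print(flag_suspicious(transactions))  # [{'id': 2, 'amount': 15000}]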

• Artificial Intelligence (AI)
Definition: A branch of computer science that aims to simulate human-like intelligence in machines,
allowing them to reason, learn, and make decisions autonomously.
Example: AI is increasingly being used to power chatbots that handle customer inquiries efficiently.
Translation: Inteligência Artificial.

• Big Data
Definition: Extremely large or complex datasets that cannot be processed efficiently with traditional
data-processing techniques. Big Data is often analyzed to reveal patterns, trends, and associations.
Example: The retail company used big data analytics to understand shopping habits across multiple
regions.
Translation: Big Data (grandes volumes de dados).

• Bias
Definition: A systematic error introduced into data analysis or machine learning models, leading to
skewed results or conclusions. Bias can arise from data collection methods or model design.
Example: The model's predictions were inaccurate due to bias in the training data, which
overrepresented one demographic group.
Translation: Viés.

• Clustering
Definition: A type of unsupervised machine learning where the goal is to group data points into
clusters based on their similarities. It helps identify natural patterns in the data.
Example: Clustering was used to segment customers based on purchasing behavior, enabling more
targeted marketing strategies.
Translation: Agrupamento.

• Correlation
Definition: A statistical measure that describes the strength and direction of a relationship between
two variables. It indicates how one variable changes as the other does.
Example: A negative correlation was found between daily screen time and sleep hours among
teenagers: more screen time was associated with less sleep.
Translation: Correlação.
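Code sketch: a minimal Python illustration using numpy; the screen-time and sleep values are invented and perfectly linear, so the coefficient comes out as -1.0.
    import numpy as np

    screen_time = np.array([1, 2, 3, 4, 5, 6])          # hours per day
    sleep_hours = np.array([9, 8.5, 8, 7.5, 7, 6.5])    # hours per night
    r = np.corrcoef(screen_time, sleep_hours)[0, 1]     # Pearson correlation coefficient
    print(round(r, 2))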

• Data Cleaning
Definition: The process of identifying and rectifying errors or inconsistencies in a dataset to ensure
its accuracy and completeness. This is a critical step before analysis can take place.
Example: Data cleaning removed duplicate entries and filled in missing values, making the dataset
ready for analysis.
Translation: Limpeza de dados.
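Code sketch: a minimal Python illustration using pandas; the DataFrame and its column names are invented.
    import pandas as pd

    df = pd.DataFrame({"customer": ["Ana", "Ana", "Bruno"], "age": [34, 34, None]})
    df = df.drop_duplicates()                          # remove duplicate rows
    df["age"] = df["age"].fillna(df["age"].median())   # fill missing ages with the median
    print(df)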

• Data Visualization
Definition: The graphical representation of data, typically using charts, graphs, and plots, to make
complex data more accessible and understandable.
Example: The data scientist created an interactive dashboard that visualized trends in sales over the
last quarter.
Translation: Visualização de dados.
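Code sketch: a minimal Python illustration using matplotlib; the monthly sales figures are invented.
    import matplotlib.pyplot as plt

    months = ["Jan", "Feb", "Mar", "Apr"]
    sales = [120, 135, 150, 142]
    plt.plot(months, sales, marker="o")
    plt.title("Quarterly sales trend")
    plt.xlabel("Month")
    plt.ylabel("Sales (units)")
    plt.savefig("sales_trend.png")   # or plt.show() for an interactive window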

• Exploratory Data Analysis (EDA)
Definition: The process of analyzing data sets to summarize their main characteristics, often using
visual methods like histograms, box plots, and scatter plots. EDA is crucial for identifying patterns,
outliers, and data structure.
Example: EDA showed that the data had several outliers, which required further investigation.
Translation: Análise exploratória de dados.
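Code sketch: a minimal Python illustration using pandas; the toy dataset is invented, with one deliberate outlier in the price column.
    import pandas as pd

    df = pd.DataFrame({"price": [10, 12, 11, 13, 95], "units": [5, 7, 6, 8, 2]})
    print(df.describe())        # summary statistics for each column
    print(df["price"].skew())   # strong skew hints at the outlier (95)
    df.hist()                   # per-column histograms (requires matplotlib)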

• Feature Engineering
Definition: The process of transforming raw data into meaningful features that can improve the
performance of machine learning algorithms. It often involves techniques like scaling, encoding, and
creating new variables.
Example: Feature engineering involved creating interaction terms between product price and
customer age to enhance the model’s predictive power.
Translation: Engenharia de características.
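Code sketch: a minimal Python illustration using pandas and numpy; the columns and values are invented.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"price": [10.0, 25.0, 40.0], "customer_age": [22, 35, 58]})
    df["price_x_age"] = df["price"] * df["customer_age"]   # interaction term
    df["log_price"] = np.log(df["price"])                  # rescaled version of a raw feature
    print(df)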

• Forecasting
Definition: The practice of predicting future values based on historical data using statistical models
and machine learning techniques.
Example: Time series forecasting was used to predict next quarter’s sales based on historical trends.
Translation: Previsão.
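Code sketch: a minimal Python illustration; a simple linear-trend extrapolation with numpy stands in for a full time series model, and the quarterly sales figures are invented.
    import numpy as np

    sales = np.array([100, 108, 115, 123, 131])    # past five quarters
    t = np.arange(len(sales))
    slope, intercept = np.polyfit(t, sales, 1)     # fit a straight-line trend
    next_quarter = slope * len(sales) + intercept  # extrapolate one step ahead
    print(round(next_quarter, 1))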

• Gradient Descent
Definition: An optimization algorithm used in machine learning to minimize a loss function by
iteratively adjusting the model's parameters. It is commonly used in training deep learning models.
Example: The model’s performance improved as gradient descent minimized the error between
predicted and actual values.
Translation: Descida de gradiente.
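Code sketch: a minimal Python illustration with numpy; gradient descent fits a one-parameter linear model y = w * x to invented data, and the learning rate and iteration count are arbitrary choices.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = 2.0 * x                               # true weight is 2
    w, lr = 0.0, 0.05                         # initial weight and learning rate
    for _ in range(200):
        grad = np.mean(2 * (w * x - y) * x)   # derivative of the mean squared error
        w -= lr * grad                        # step against the gradient
    print(round(w, 3))                        # converges close to 2.0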

• Graph
Definition: A data structure made up of nodes (vertices) and edges (connections) that represents
relationships among entities. It is widely used to model networks such as social media or
transportation systems.
Example: The social network analysis used graphs to find the most influential nodes within a
community.
Translation: Grafo.
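Code sketch: a minimal Python illustration; a small social graph is stored as an adjacency list with invented names, and node degree is used as a rough proxy for influence.
    # Adjacency-list representation of a small undirected social graph
    graph = {
        "Ana":   ["Bruno", "Carla"],
        "Bruno": ["Ana"],
        "Carla": ["Ana"],
    }
    most_connected = max(graph, key=lambda node: len(graph[node]))
    print(most_connected)   # Ana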

• Hypothesis Testing
Definition: A statistical method used to evaluate assumptions about a population based on sample
data. It involves testing a null hypothesis against an alternative hypothesis.
Example: Hypothesis testing showed that the new marketing campaign significantly increased
customer engagement.
Translation: Teste de hipótese.
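Code sketch: a minimal Python illustration using scipy; the engagement scores are invented, and a two-sample t-test compares the groups.
    from scipy import stats

    before = [3.1, 2.9, 3.4, 3.0, 3.2]      # engagement before the campaign
    after = [3.8, 3.6, 4.0, 3.7, 3.9]       # engagement after the campaign
    t_stat, p_value = stats.ttest_ind(after, before)
    print(p_value < 0.05)                   # True suggests rejecting the null hypothesis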

• Hyperparameter
Definition: A configuration value set before training begins that controls how the model learns,
such as the learning rate, number of trees, or regularization strength.
Example: The model's accuracy improved after tuning the hyperparameters, including the learning
rate and batch size.
Translation: Hiperparâmetro.
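Code sketch: a minimal Python illustration using scikit-learn; a small grid search over two random forest hyperparameters on the built-in iris dataset.
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = load_iris(return_X_y=True)
    grid = {"n_estimators": [50, 100], "max_depth": [2, 4]}   # candidate hyperparameters
    search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=3)
    search.fit(X, y)
    print(search.best_params_)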

• Imputation
Definition: The process of replacing missing values in a dataset with substituted values, often using
statistical techniques such as mean, median, or predictive models.
Example: Imputation helped fill in missing data for age and income, which was crucial for the
analysis.
Translation: Imputação.
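Code sketch: a minimal Python illustration using scikit-learn; the age and income values are invented, and missing entries are replaced with column medians.
    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[25, 50000], [32, np.nan], [np.nan, 61000]], dtype=float)
    imputer = SimpleImputer(strategy="median")   # replace NaNs with each column's median
    print(imputer.fit_transform(X))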

• Inferential Statistics
Definition: A branch of statistics that allows for making predictions or generalizations about a
population based on a sample. It involves techniques like hypothesis testing and confidence intervals.
Example: Inferential statistics showed that the average annual salary in the region was significantly
higher than the national average.
Translation: Estatística inferencial.
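Code sketch: a minimal Python illustration using scipy; the salary sample is invented, and a 95% confidence interval for the population mean is computed from it.
    import numpy as np
    from scipy import stats

    sample = np.array([52000, 48500, 61000, 57500, 50500, 59000])   # sampled salaries
    mean = sample.mean()
    ci = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=stats.sem(sample))
    print(mean, ci)   # point estimate and 95% confidence interval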

• JSON (JavaScript Object Notation)
Definition: A lightweight, human-readable data format used to represent structured data objects. It is
commonly used in APIs to transmit data between servers and clients.
Example: The API returned data in JSON format, which was then parsed and analyzed for further
insights.
Translation: JSON (Notação de Objetos JavaScript).
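Code sketch: a minimal Python illustration using the standard library; the payload contents are invented.
    import json

    payload = '{"user": "victor", "purchases": [19.9, 45.0]}'
    data = json.loads(payload)           # parse JSON text into a Python dict
    print(sum(data["purchases"]))        # 64.9
    print(json.dumps(data, indent=2))    # serialize back to formatted JSON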

• Joint Distribution
Definition: A probability distribution that models the relationship between two or more random
variables. It shows how likely different combinations of values for the variables are.
Example: The joint distribution of age and income helped identify purchasing patterns among
different demographic groups.
Translation: Distribuição conjunta.
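Code sketch: a minimal Python illustration using pandas; the age groups and income labels are invented, and a normalized cross-tabulation serves as an empirical joint distribution.
    import pandas as pd

    df = pd.DataFrame({
        "age_group": ["18-25", "18-25", "26-40", "26-40", "26-40"],
        "income":    ["low", "high", "low", "high", "high"],
    })
    joint = pd.crosstab(df["age_group"], df["income"], normalize=True)
    print(joint)   # each cell is the joint probability of that combination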

• K-Means
Definition: A popular unsupervised machine learning algorithm used to partition data into k distinct
clusters based on feature similarity.
Example: K-Means clustering divided the customer base into five distinct segments, each with
unique characteristics.
Translation: K-Means.
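Code sketch: a minimal Python illustration using scikit-learn; the six two-dimensional points are invented and fall into two obvious groups.
    import numpy as np
    from sklearn.cluster import KMeans

    X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)            # cluster assignment for each point
    print(kmeans.cluster_centers_)   # coordinates of the two centroids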

• Kernel
Definition: A mathematical function used in machine learning, particularly in Support Vector
Machines (SVMs), to map data into a higher-dimensional space to make it easier to separate.
Example: The use of the Gaussian kernel allowed the SVM model to successfully classify non-linear
data.
Translation: Kernel.
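Code sketch: a minimal Python illustration using scikit-learn; concentric-circle data generated by make_circles is not linearly separable, so an RBF (Gaussian) kernel SVM is used.
    from sklearn.datasets import make_circles
    from sklearn.svm import SVC

    X, y = make_circles(noise=0.1, factor=0.3, random_state=0)   # non-linear toy data
    model = SVC(kernel="rbf")                                    # Gaussian (RBF) kernel
    model.fit(X, y)
    print(model.score(X, y))                                     # training accuracy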

• Linear Regression
Definition: A statistical method for modeling the relationship between a dependent variable and one
or more independent variables by fitting a linear equation to observed data.
Example: Linear regression was used to predict house prices based on size, location, and amenities.
Translation: Regressão linear.
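Code sketch: a minimal Python illustration using scikit-learn; the house sizes and prices are invented, with size as the only predictor.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    size = np.array([[50], [80], [120], [200]])          # square meters
    price = np.array([150000, 240000, 360000, 600000])   # sale prices
    model = LinearRegression().fit(size, price)
    print(model.predict([[100]]))   # predicted price for a 100 m² house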

• Logistic Regression
Definition: A statistical model used to predict a binary outcome (e.g., yes/no, success/failure) based
on one or more independent variables.
Example: Logistic regression was employed to classify emails as either spam or non-spam.
Translation: Regressão logística.
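Code sketch: a minimal Python illustration using scikit-learn; the single feature (suspicious words per email) and the spam labels are invented.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[1], [2], [3], [8], [9], [10]])   # suspicious words per email
    y = np.array([0, 0, 0, 1, 1, 1])                # 0 = not spam, 1 = spam
    model = LogisticRegression().fit(X, y)
    print(model.predict([[2], [9]]))                # predicted classes
    print(model.predict_proba([[9]]))               # class probabilities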

• Machine Learning (ML)
Definition: A subset of artificial intelligence that enables systems to automatically improve their
performance with experience and data, without being explicitly programmed.
Example: Machine learning algorithms were used to predict credit card fraud by analyzing
transaction patterns.
Translation: Aprendizado de máquina.

• Model
Definition: A mathematical representation of a real-world process that is used to make predictions or
inform decisions based on input data.
Example: The predictive model forecasted customer churn based on their usage patterns and
demographic data.
Translation: Modelo.

• Neural Network
Definition: A computational model inspired by the structure of the human brain, consisting of
interconnected nodes (neurons) that process data for tasks like classification and regression.
Example: The deep neural network achieved high accuracy in recognizing handwritten digits.
Translation: Rede neural.
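Code sketch: a minimal Python illustration using scikit-learn's MLPClassifier on the built-in digits dataset; the hidden-layer size and iteration count are arbitrary choices.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))   # accuracy on held-out digits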

• Normalization
Definition: The process of adjusting the range of numerical data so that it fits within a specific scale,
often to improve the performance of machine learning algorithms. Normalization typically rescales
data into a range between 0 and 1, or -1 and 1.
Example: Before training the model, the data was normalized to ensure that the features with larger
numerical ranges did not dominate the results.
Translation: Normalização.
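Code sketch: a minimal Python illustration using scikit-learn; the two columns have very different ranges and are rescaled to [0, 1].
    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    X = np.array([[1.0, 2000.0], [2.0, 3000.0], [3.0, 5000.0]])
    scaled = MinMaxScaler().fit_transform(X)   # rescale each column to the [0, 1] range
    print(scaled)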
