MUNAR - Linear Regression.ipynb - Colaboratory

This document provides instructions for performing linear regression analysis on various datasets using Python. It demonstrates how to use single and multiple features to predict an outcome variable using linear regression. Key steps include importing data, determining correlations, plotting the data, calculating the linear regression line, and using multiple linear regression to predict real estate prices based on size and year features.


Linear Regression
Objective(s):
This activity aims to perform regression analysis using linear regression

Intended Learning Outcomes (ILOs):

Demonstrate how to use python to predict the outcome using linear regression.
Demonstrate how to use single and multiple features to predict the outcome using linear
regression.

Resources:

Jupyter Notebook
stores_dist.csv
real_estate_price_size_year.csv
Ames_Housing_Sales.csv

Procedure:
Code Text

Import Google Drive

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Import the libraries and the data

#import the libraries

import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

#import the stores_dist.csv data

salesDist = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Linear Regression/stores-dist.csv')

#check the imported data


salesDist.head()

   district  annual net sales  number of stores in district
0         1             231.0                            12
1         2             156.0                            13
2         3              10.0                            16
3         4             519.0                             2
4         5             437.0                             6

Rename the annual sales to sales and number of stores in district to stores

#rename the annual sales to sales and the  number of stores in district to stores

salesDist = salesDist.rename(columns={'annual net sales':'sales','number of stores in district':'stores'})

#check the salesDist data to verify if the columns were renamed.

salesDist.head()

   district  sales  stores
0         1  231.0      12
1         2  156.0      13
2         3   10.0      16
3         4  519.0       2
4         5  437.0       6

Determine the correlation

#check the correlation

salesDist.corr()


          district     sales    stores
district  1.000000  0.136103 -0.230617
sales     0.136103  1.000000 -0.912236
stores   -0.230617 -0.912236  1.000000

Interpret the correlation.

There is a negative correlation between sales and stores.

Drop the column with the lowest correlation and verify the dataframe if the column was deleted.

sales = salesDist.drop('district',axis=1)

sales.head()

   sales  stores
0  231.0      12
1  156.0      13
2   10.0      16
3  519.0       2
4  437.0       6

From the correlation coefficient data, what type of correlation did you observe between annual net sales and number of stores in the district?

Negative correlation
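
As a quick cross-check, the same Pearson coefficient reported by .corr() can be computed directly with NumPy. This is a minimal sketch, assuming the sales dataframe defined above.

# Cross-check the Pearson correlation between stores and annual net sales
r = np.corrcoef(sales['stores'], sales['sales'])[0, 1]
print('Pearson r between stores and sales: {:.6f}'.format(r))  # approximately -0.912236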

Create a plot to visualize the data. You will also assign stores as the independent variable x and
sales as the dependent variable y .

# dependent variable for y axis

y = sales['sales']

# independent variable for x axis
x = sales.stores

# Display the plot inline

%matplotlib inline

# Increase the size of the plot

plt.figure(figsize=(20,10))

# Create a scatter plot: Number of stores in the District vs. Annual Net Sales

plt.plot(x,y, 'o', markersize = 15)


# Add axis labels and increase the font size

plt.ylabel('Annual Net Sales', fontsize = 30)

plt.xlabel('Number of Stores in the District', fontsize = 30)

# Increase the font size on the ticks on the x and y axis

plt.xticks(fontsize = 20)

plt.yticks(fontsize = 20)

# Display the scatter plot

plt.show()

Calculate the slope and y-intercept of the linear regression line.

m, b = np.polyfit(x,y,1) 


print ('The slope of line is {:.2f}.'.format(m))

print ('The y-intercept is {:.2f}.'.format(b))

print ('The best fit simple linear regression line is {:.2f}x + {:.2f}.'.format(m,b))

The slope of line is -35.79.

The y-intercept is 599.38.

The best fit simple linear regression line is -35.79x + 599.38.

Using the linear regression line, you can predict the annual net sales based on the number of stores
in the district.

# Function to predict the net sales from the regression line

def predict(query):

    if query >= 1:

        predict = m * query + b

        return predict

    else:

        print ("You must have at least 1 store in the district to predict the annual net sale

# Enter the number of stores in the function to generate the net sales prediction.

predict(4)

456.2313681207654

predict(16)

26.786342565077348
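
Equivalently, NumPy's polyval can evaluate the fitted line without a custom function. A small sketch using the m and b computed above:

# Evaluate the fitted line y = m*x + b for several store counts at once
queries = np.array([4, 10, 16])
print(np.polyval([m, b], queries))  # the first and last values should match predict(4) and predict(16)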

Using statsmodels to perform Multivariable Linear Regression

Import the libraries. Make sure to install all the libraries to avoid errors.

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import statsmodels.api as sm

import seaborn as sns

Load the data real_estate_price_size_year.csv and verify the data.

data = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Linear Regression/real_estate_price_size_year.csv')
data.head()


        price     size  year
0  234314.144   643.09  2015
1  228581.528   656.22  2009
2  281626.336   487.29  2018
3  401255.608  1504.75  2015
4  458674.256  1275.46  2009

Show the descriptive statistic analysis

data.describe()

               price         size         year
count     100.000000   100.000000   100.000000
mean   292289.470160   853.024200  2012.600000
std     77051.727525   297.941951     4.729021
min    154282.128000   479.750000  2006.000000
25%    234280.148000   643.330000  2009.000000
50%    280590.716000   696.405000  2015.000000
75%    335723.696000  1029.322500  2018.000000
max    500681.128000  1842.510000  2018.000000

Interpret the count, mean, min and std

The average real estate price in the dataset is 292,289.47. The cheapest property, at 154,282.13, is about 52.78% of the average price. The standard deviation is 77,051.73. Dividing the standard deviation by the mean gives a coefficient of variation of about 0.26, well below 1, so the dataset is fairly concentrated around the mean.
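
The coefficient of variation mentioned above can be computed directly from the dataframe. A minimal sketch, assuming the data dataframe loaded above:

# Coefficient of variation = standard deviation / mean
cv = data['price'].std() / data['price'].mean()
print('Coefficient of variation of price: {:.2f}'.format(cv))  # roughly 0.26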

Calculate the multiple linear regression. Set the dependent variable and independent variables

# Following the regression equation, our dependent variable (y) is the price

y = data ['price']

# Similarly, our independent variables (x) are the size and year

x1 = data [['size','year']]


# Add a constant. Essentially, we are adding a new column (equal in length to x) that consists only of 1s
x = sm.add_constant(x1)

# Fit the model, according to the OLS (ordinary least squares) method, with dependent variable y and independent variable x
results = sm.OLS(y,x).fit()

/usr/local/lib/python3.7/dist-packages/statsmodels/tsa/tsatools.py:117: FutureWarning: I
x = pd.concat(x[::order], 1)

# Print a nice summary of the regression.

results.summary()

OLS Regression Results


Dep. Variable: price R-squared: 0.776
Model: OLS Adj. R-squared: 0.772
Method: Least Squares F-statistic: 168.5
Date: Fri, 18 Feb 2022 Prob (F-statistic): 2.77e-32
Time: 14:54:29 Log-Likelihood: -1191.7
No. Observations: 100 AIC: 2389.
Df Residuals: 97 BIC: 2397.
Df Model: 2
Covariance Type: nonrobust
coef std err t P>|t| [0.025 0.975]
const -5.772e+06 1.58e+06 -3.647 0.000 -8.91e+06 -2.63e+06
size 227.7009 12.474 18.254 0.000 202.943 252.458
year 2916.7853 785.896 3.711 0.000 1357.000 4476.571
Omnibus: 10.083 Durbin-Watson: 2.250
Prob(Omnibus): 0.006 Jarque-Bera (JB): 3.678
Skew: 0.095 Prob(JB): 0.159
Kurtosis: 2.080 Cond. No. 9.41e+05

Warnings:

[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

[2] The condition number is large, 9.41e+05. This might indicate that there are

strong multicollinearity or other numerical problems.

Interpret the result. Explain the adjusted R-squared, coef, and standard error.

The result of the OLS regression shows that size and year are useful predictor variables for price. Both variables are statistically significant, with p-values of 0.000. Each coefficient gives the change in the mean of the dependent variable (price) for a one-unit shift in the corresponding independent variable (size or year) while holding the other variable in the model constant. Of the two independent variables, year has the larger positive coefficient (2,916.79 versus 227.70 for size), so a one-unit change in year shifts the mean price more than a one-unit change in size. The R-squared of the regression is the fraction of the variation in the dependent variable (price) that is accounted for by the independent variables (size and year). Adjusted R-squared, a modified version of R-squared, adds precision and reliability by penalizing additional independent variables that would otherwise inflate the R-squared measurement.
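
With the fitted results object, the regression equation can also be applied to a hypothetical listing; the size of 750 and year of 2015 below are made-up inputs for illustration only. A sketch, assuming the results variable from the OLS fit above:

# Predicted price = const + 227.70 * size + 2916.79 * year (coefficients from the summary above)
params = results.params
predicted_price = params['const'] + params['size'] * 750 + params['year'] * 2015
print('Predicted price: {:,.2f}'.format(predicted_price))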

Using sklearn to perform Linear Regression

Import the data using Pandas. Check the data. Examine the data types and shape of the dataset.

import pandas as pd
import numpy as np

# Import the data using the file path

data = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Linear Regression/Ames_Housing_Sales.csv')

print(data.shape)

(1379, 80)

data


      1stFlrSF  2ndFlrSF  3SsnPorch Alley  BedroomAbvGr BldgType BsmtCond  ...
0        856.0     854.0        0.0  None             3     1Fam       TA  ...
1       1262.0       0.0        0.0  None             3     1Fam       TA  ...
2        920.0     866.0        0.0  None             3     1Fam       TA  ...
3        961.0     756.0        0.0  None             3     1Fam       Gd  ...
4       1145.0    1053.0        0.0  None             4     1Fam       TA  ...
...        ...       ...        ...   ...           ...      ...      ...  ...
1374     953.0     694.0        0.0  None             3     1Fam     None  ...
1375    2073.0       0.0        0.0  None             3     1Fam       TA  ...
1376    1188.0    1152.0        0.0  None             4     1Fam       Gd  ...
1377    1078.0       0.0        0.0  None             2     1Fam       TA  ...
1378    1256.0       0.0        0.0  None             3     1Fam       TA  ...

1379 rows × 80 columns

data.dtypes.value_counts()

object     43
float64    21
int64      16
dtype: int64

# Select the object (string) columns

mask = data.dtypes == np.object

categorical_cols = data.columns[mask]

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:2: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html

# Determine how many extra columns would be created

num_ohc_cols = (data[categorical_cols]

                .apply(lambda x: x.nunique())

                .sort_values(ascending=False))

# No need to encode if there is only one value

small_num_ohc_cols = num_ohc_cols.loc[num_ohc_cols>1]

# Number of one-hot columns is one less than the number of categories

small_num_ohc_cols -= 1

# This is 215 columns, assuming the original ones are dropped. 

# This is quite a few extra columns!

small_num_ohc_cols.sum()

215

Create a new data set where all of the above categorical features will be one-hot encoded.

Use the dataframe .copy() method to create a completely separate copy of the dataframe
for one-hot encoding
On this new dataframe, one-hot encode each of the appropriate columns and add it back to
the dataframe. Be sure to drop the original column.
For the data that are not one-hot encoded, drop the columns that are string categoricals.


from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# Copy of the data

data_ohc = data.copy()

# The encoders

le = LabelEncoder()

ohc = OneHotEncoder()

for col in num_ohc_cols.index:

    

    # Integer encode the string categories

    dat = le.fit_transform(data_ohc[col]).astype(np.int)

    

    # Remove the original column from the dataframe

    data_ohc = data_ohc.drop(col, axis=1)

    # One hot encode the data--this returns a sparse array

    new_dat = ohc.fit_transform(dat.reshape(-1,1))

    # Create unique column names

    n_cols = new_dat.shape[1]

    col_names = ['_'.join([col, str(x)]) for x in range(n_cols)]

    # Create the new dataframe

    new_df = pd.DataFrame(new_dat.toarray(), 

                          index=data_ohc.index, 

                          columns=col_names)

    # Append the new data to the dataframe

    data_ohc = pd.concat([data_ohc, new_df], axis=1)

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:13: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html
  del sys.path[0]

data_ohc


      1stFlrSF  2ndFlrSF  3SsnPorch  BedroomAbvGr  BsmtFinSF1  BsmtFinSF2  BsmtFullBath  ...
0        856.0     854.0        0.0             3       706.0         0.0             1  ...
1       1262.0       0.0        0.0             3       978.0         0.0             0  ...
2        920.0     866.0        0.0             3       486.0         0.0             1  ...
3        961.0     756.0        0.0             3       216.0         0.0             1  ...
4       1145.0    1053.0        0.0             4       655.0         0.0             1  ...
...        ...       ...        ...           ...         ...         ...           ...  ...
1374     953.0     694.0        0.0             3         0.0         0.0             0  ...
1375    2073.0       0.0        0.0             3       790.0       163.0             1  ...
1376    1188.0    1152.0        0.0             4       275.0         0.0             0  ...
1377    1078.0       0.0        0.0             2        49.0      1029.0             1  ...
1378    1256.0       0.0        0.0             3       830.0       290.0             1  ...

1379 rows × 295 columns

# Column difference is as calculated above
data_ohc.shape[1] - data.shape[1]

215

print(data.shape[1])

# Remove the string columns from the dataframe
data = data.drop(num_ohc_cols.index, axis=1)

print(data.shape[1])

80
37
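
As a cross-check on the encoding above, pandas can build an equivalent one-hot encoded frame in a single call. This is only a sketch, not part of the notebook's pipeline; it assumes a fresh copy of the raw CSV and the categorical_cols computed earlier.

# Hypothetical cross-check: re-load the raw data and one-hot encode all string columns at once
raw = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Linear Regression/Ames_Housing_Sales.csv')
data_ohc_alt = pd.get_dummies(raw, columns=list(categorical_cols))
print(data_ohc_alt.shape)  # should also come out to 295 columns, matching data_ohc above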

Create train and test splits of both data sets. To ensure the data gets split the same way, use the same random_state in each of the two splits.
For each data set, fit a basic linear regression model on the training data.
Calculate the mean squared error on both the train and test sets for the respective models.

from sklearn.model_selection import train_test_split

y_col = 'SalePrice'

# Split the data that is not one-hot encoded
feature_cols = [x for x in data.columns if x != y_col]
X_data = data[feature_cols]
y_data = data[y_col]

X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, 
                                                    test_size=0.3, random_state=42)
# Split the data that is one-hot encoded
feature_cols = [x for x in data_ohc.columns if x != y_col]
X_data_ohc = data_ohc[feature_cols]
y_data_ohc = data_ohc[y_col]

X_train_ohc, X_test_ohc, y_train_ohc, y_test_ohc = train_test_split(X_data_ohc, y_data_ohc, 
                                                    test_size=0.3, random_state=42)

# Compare the indices to ensure they are identical
(X_train_ohc.index == X_train.index).all()

True

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error

LR = LinearRegression()

# Storage for error values

error_df = list()

# Data that have not been one-hot encoded

LR = LR.fit(X_train, y_train)

y_train_pred = LR.predict(X_train)

y_test_pred = LR.predict(X_test)

error_df.append(pd.Series({'train': mean_squared_error(y_train, y_train_pred),

                           'test' : mean_squared_error(y_test,  y_test_pred)},

                           name='no enc'))

# Data that have been one-hot encoded

LR = LR.fit(X_train_ohc, y_train_ohc)

y_train_ohc_pred = LR.predict(X_train_ohc)

y_test_ohc_pred = LR.predict(X_test_ohc)

error_df.append(pd.Series({'train': mean_squared_error(y_train_ohc, y_train_ohc_pred),

                           'test' : mean_squared_error(y_test_ohc,  y_test_ohc_pred)},

                          name='one-hot enc'))

# Assemble the results

error_df = pd.concat(error_df, axis=1)

error_df

             no enc   one-hot enc
train  1.131507e+09  3.177269e+08
test   1.372182e+09  3.664592e+16

For each of the data sets (one-hot encoded and not encoded):

Scale all of the non-one-hot-encoded values using one of the following: StandardScaler, MinMaxScaler, MaxAbsScaler.
Compare the errors calculated on the test sets.


# Mute the SettingWithCopy warnings

pd.options.mode.chained_assignment = None

from sklearn.preprocessing import StandardScaler, MinMaxScaler, MaxAbsScaler

scalers = {'standard': StandardScaler(),

           'minmax': MinMaxScaler(),

           'maxabs': MaxAbsScaler()}

training_test_sets = {

    'not_encoded': (X_train, y_train, X_test, y_test),

    'one_hot_encoded': (X_train_ohc, y_train_ohc, X_test_ohc, y_test_ohc)}

# Get the list of float columns, and the float data

# so that we don't scale something we already scaled. 

# We're supposed to scale the original data each time

mask = X_train.dtypes == np.float
float_columns = X_train.columns[mask]

# initialize model

LR = LinearRegression()

# iterate over all possible combinations and get the errors

errors = {}

for encoding_label, (_X_train, _y_train, _X_test, _y_test) in training_test_sets.items():

    for scaler_label, scaler in scalers.items():

        trainingset = _X_train.copy()  # copy because we don't want to scale this more than once
        testset = _X_test.copy()

        trainingset[float_columns] = scaler.fit_transform(trainingset[float_columns])

        testset[float_columns] = scaler.transform(testset[float_columns])

        LR.fit(trainingset, _y_train)

        predictions = LR.predict(testset)

        key = encoding_label + ' - ' + scaler_label + 'scaling'

        errors[key] = mean_squared_error(_y_test, predictions)

errors = pd.Series(errors)

print(errors.to_string())

print('-' * 80)

for key, error_val in errors.items():

    print(key, error_val)

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:16: DeprecationWarning: `np


Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/relea
app.launch_new_instance()

not_encoded - standardscaling 1.372182e+09

not_encoded - minmaxscaling 1.372179e+09

not_encoded - maxabsscaling 1.372198e+09


one_hot_encoded - standardscaling 5.029834e+26

one_hot_encoded - minmaxscaling 8.065328e+09

one_hot_encoded - maxabsscaling 8.065328e+09

--------------------------------------------------------------------------------

not_encoded - standardscaling 1372182358.934498

not_encoded - minmaxscaling 1372179001.352261

not_encoded - maxabsscaling 1372198037.9660723

one_hot_encoded - standardscaling 5.0298339033526313e+26

one_hot_encoded - minmaxscaling 8065327607.218111

one_hot_encoded - maxabsscaling 8065327607.199247

Plot predictions vs actual for one of the models.

import matplotlib.pyplot as plt

import seaborn as sns

%matplotlib inline

sns.set_context('talk')

sns.set_style('ticks')

sns.set_palette('dark')

ax = plt.axes()

# we are going to use y_test, y_test_pred

ax.scatter(y_test, y_test_pred, alpha=.5)

ax.set(xlabel='Ground truth', 

       ylabel='Predictions',

       title='Ames, Iowa House Price Predictions vs Truth, using Linear Regression');


[ Linear Regression ]

Import Package

# Libraries required for data analysis and preprocessing
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Libraries required for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Libraries required for linear regression model application
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error

# Libraries required for residualization
import scipy.stats

Import Data
Bureau of Customs Import Data for January 2022 from https://customs.gov.ph/import-reports/

org_df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/Linear Regression/DailyfiledJan
org_df.head()

       hs_code country   weight  amount_php        tax
0  84381020000      IN    175.0    60119.33       0.00
1  85437090700      CN     75.0    34971.04     349.71
2  39021090000      TW  90000.0  6557462.10  655746.21
3  44152000000      JP  14520.0    48879.11    3421.54
4  39231090200      JP  37312.0  1097936.98  164690.55

Data Preprocessing


Remove Missing Values

To deal meaningfully with the imported data, we will check for and delete rows recorded as NaN or null. Let's check for missing values first.

org_df.isnull().sum()

hs_code       0
country       1
weight        0
amount_php    0
tax           0
dtype: int64

df = org_df.dropna()

df

            hs_code country    weight  amount_php        tax
0       84381020000      IN    175.00    60119.33       0.00
1       85437090700      CN     75.00    34971.04     349.71
2       39021090000      TW  90000.00  6557462.10  655746.21
3       44152000000      JP  14520.00    48879.11    3421.54
4       39231090200      JP  37312.00  1097936.98  164690.55
...             ...     ...       ...         ...        ...
258263  85176299000      CN      1.53    50025.08       0.00
258264  85176299000      CN     10.50   343920.74       0.00
258265  85176299000      CN      9.06   297022.83       0.00
258266  85176299000      CN      0.77    25533.74       0.00
258267  84389029000      US     12.20   151972.67       0.00

258267 rows × 5 columns

df.describe()


            hs_code        weight    amount_php           tax
count  2.582670e+05  2.582670e+05  2.582670e+05  2.582670e+05
mean   7.101635e+10  1.902956e+04  1.445547e+06  2.588446e+04
std    2.298279e+10  7.856476e+05  2.190546e+07  4.577029e+05
min    1.012100e+09  0.000000e+00  0.000000e+00  0.000000e+00
25%    5.807909e+10  1.840000e+00  6.706795e+03  0.000000e+00
50%    8.471302e+10  1.977000e+01  4.110362e+04  1.501700e+02
75%    8.537102e+10  3.298050e+02  2.744186e+05  3.703160e+03
max    9.704000e+10  2.024130e+08  3.445777e+09  1.059243e+08

Calculate the tax rate as follows:

tax_rate = tax / price

From the current data, price is the import amount per item and tax is the tax per item, so we can calculate the tax rate in this way.

df = df.assign(tax_rate = df['tax'] / (df['amount_php']))

df.describe()


            hs_code        weight    amount_php           tax       tax_rate
count  2.582670e+05  2.582670e+05  2.582670e+05  2.582670e+05  258260.000000
mean   7.101635e+10  1.902956e+04  1.445547e+06  2.588446e+04       0.052744
std    2.298279e+10  7.856476e+05  2.190546e+07  4.577029e+05       0.063685
min    1.012100e+09  0.000000e+00  0.000000e+00  0.000000e+00       0.000000
25%    5.807909e+10  1.840000e+00  6.706795e+03  0.000000e+00       0.000000
50%    8.471302e+10  1.977000e+01  4.110362e+04  1.501700e+02       0.030000
75%    8.537102e+10  3.298050e+02  2.744186e+05  3.703160e+03       0.100000
max    9.704000e+10  2.024130e+08  3.445777e+09  1.059243e+08       0.650000

The tax_rate variable is added. Let's draw a rugplot to look at the distribution of tax rates.

A rugplot is a graph that draws a small line segment (a "rug") for every observation to show the distribution of the data.

sns.rugplot(df['tax_rate'])

<matplotlib.axes._subplots.AxesSubplot at 0x7eff217fa590>
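
Note that the count for tax_rate (258,260) is slightly lower than for the other columns (258,267): rows whose amount_php is 0 produce 0/0 and therefore a NaN tax rate. A quick check, as a small sketch assuming the df dataframe above:

# Rows where the import amount is zero cannot yield a meaningful tax rate
print((df['amount_php'] == 0).sum())   # zero-amount rows
print(df['tax_rate'].isnull().sum())   # NaN tax rates produced by 0/0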

Log Transformation

A. Explore Correlations
Correlation is a measure of the degree of linear relationship of variables.

Because regression techniques represent the relationship in which the independent variable affects
the dependent variable, the weight for that variable in the regression formula may vary depending
on the direction and strength of the correlation.

numeric_df = df[['weight', 'amount_php', 'tax']]

numeric_df.head()


    weight  amount_php        tax
0    175.0    60119.33       0.00
1     75.0    34971.04     349.71
2  90000.0  6557462.10  655746.21
3  14520.0    48879.11    3421.54
4  37312.0  1097936.98  164690.55

The correlation of each feature can be quantified through a heatmap and seen visually at the same time. The bar shown on the right side of the heatmap indicates what color corresponds to each correlation strength.

corr = numeric_df.corr()
sns.heatmap(corr, annot=True)

<matplotlib.axes._subplots.AxesSubplot at 0x7eff18ebd790>

The formula used to calculate the correlation here is Pearson Correlation coefficient, a
quantification of the linear correlation between two variables X and Y. 1 means perfect positive
linear correlation, 0 means no linear correlation, and -1 means perfect negative correlation. The
graph above shows that the variable that shows the strongest linear correlation with tax is price. It's
a natural result because taxes are levied at a fraction of the price of the product.
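
The same coefficient can be obtained for a single pair of variables with scipy.stats, which was imported above. A minimal sketch, assuming numeric_df:

# Pearson correlation between price (amount_php) and tax, with its p-value
r, p_value = scipy.stats.pearsonr(numeric_df['amount_php'], numeric_df['tax'])
print('Pearson r = {:.3f}, p-value = {:.3g}'.format(r, p_value))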

However, it is dangerous to judge the correlation only from figures like this. As mentioned, all the Pearson correlation coefficient shows is a linear correlation. That is, even if there is a strong nonlinear correlation, the Pearson correlation coefficient can come close to zero. Therefore, it is recommended that you always draw a pair plot as well to check the correlation. The pairplot provided by the seaborn library visualizes the pairwise relationship of each variable.

sns.pairplot(numeric_df, markers='x', diag_kind="kde")


<seaborn.axisgrid.PairGrid at 0x7eff18ebd190>

The graphs on the diagonal of the pairplot represent the distribution of each variable, and the rest are scatterplots of pairs of variables. As previously identified by the correlation coefficient, price and tax show the strongest linear correlation. However, we can see that not all data points lie on the upward-sloping line. If you look at the third graph from the left in the last row, where the x-axis is price and the y-axis is tax, you can see that some taxes remain low, possibly underreported, even as prices rise.

B. Log Transformation

If you look at the pairplot, you can see that the data is concentrated at small values, with a long right tail, rather than evenly distributed. Let's visualize the distribution through histograms.

numeric_df.hist(figsize=(15,10))


array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7eff18a2c510>,

<matplotlib.axes._subplots.AxesSubplot object at 0x7eff18a26910>],

[<matplotlib.axes._subplots.AxesSubplot object at 0x7eff189d5d10>,

<matplotlib.axes._subplots.AxesSubplot object at 0x7eff18996350>]],

dtype=object)

Log transformation controls the spread of the distribution by reducing the deviation of the data. We're going to apply the log() function of the NumPy library, but since log of values between 0 and 1 is negative (and log(0) is undefined), let's add 1 to the whole data and then apply the function.
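
For reference, NumPy also has a dedicated helper for this log(x + 1) transform; a one-line equivalent of the cell below, as a sketch:

# np.log1p(x) computes log(1 + x) and is numerically more accurate for small x
numeric_log_df = np.log1p(numeric_df)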

numeric_log_df = np.log(numeric_df + 1)

numeric_log_df.hist(figsize=(15,10))


array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7eff188538d0>,

<matplotlib.axes._subplots.AxesSubplot object at 0x7eff188b38d0>],

[<matplotlib.axes._subplots.AxesSubplot object at 0x7eff1881c350>,

<matplotlib.axes._subplots.AxesSubplot object at 0x7eff187d2850>]],

dtype=object)

When the histogram is redrawn after the log transformation, we can see that the data is much more evenly distributed. Let's look at the basic statistics and pairplot graphs again.

numeric_log_df.describe()

              weight     amount_php            tax
count  258267.000000  258267.000000  258267.000000
mean        3.706474      10.677909       4.585706
std         3.058671       2.726455       4.250274
min         0.000000       0.000000       0.000000
25%         1.043804       8.811026       0.000000
50%         3.033510      10.623876       5.018405
75%         5.801529      12.522413       8.217212
max        19.125821      21.960415      18.478235

sns.pairplot(numeric_log_df, markers='x', diag_kind="kde")


<seaborn.axisgrid.PairGrid at 0x7eff1869dc50>

After the log transformation, we can see that the relationship between each pair of variables is visualized more clearly.

Let's look at the correlation coefficients again.

corr = numeric_log_df.corr()

sns.heatmap(corr, annot=True)


<matplotlib.axes._subplots.AxesSubplot at 0x7eff183b9bd0>

Feature Scaling - MinMax Scaling


Here, we will use the MinMax scaler, which uses the minimum and maximum values of each feature to rescale the data values to a range between 0 and 1. The MinMax scaler subtracts the minimum value of the feature and divides by the difference between the maximum and the minimum value.
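
The same rescaling can be written out by hand, which makes the formula explicit. A minimal sketch, assuming numeric_log_df from above; the MinMaxScaler used below does the equivalent column by column:

# Manual min-max scaling: (x - min) / (max - min), applied column by column
manual_scaled = (numeric_log_df - numeric_log_df.min()) / (numeric_log_df.max() - numeric_log_df.min())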

# create object

scaler = MinMaxScaler()

# Fit -> transform

scaler.fit(numeric_log_df)

scaled = scaler.transform(numeric_log_df)

# Convert to a data frame because it is returned in an array form

numeric_log_df = pd.DataFrame(data = scaled, columns=numeric_log_df.columns)

numeric_log_df

          weight  amount_php       tax
0       0.270341    0.501088  0.000000
1       0.226434    0.476416  0.317128
2       0.596449    0.714746  0.724827
3       0.501069    0.491663  0.440418
4       0.550413    0.633364  0.650053
...          ...         ...       ...
258262  0.048532    0.492718  0.000000
258263  0.127699    0.580507  0.000000
258264  0.120704    0.573831  0.000000
258265  0.029854    0.462095  0.000000
258266  0.134908    0.543317  0.000000

258267 rows × 3 columns

Applying Algorithms


Split Training-Test Datasets


We will use 80% of the total data as training data and the remaining 20% as test data.

X = numeric_log_df[['weight', 'amount_php']]

Y = numeric_log_df['tax']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, test_size=0.2)

Multi Linear Regression


Now, let's model linear regression by utilizing multiple independent variables.

Defining Variables

x_train = X_train[['weight', 'amount_php']]

y_train = Y_train

The independent variables x are weight and amount_php (price), and the dependent variable y is tax.

Creating Model and Modeling


Let's create a model.

multi_fitter = LinearRegression()
multi_fitter.fit(x_train, y_train) 

LinearRegression()

Through the fit() function, multi_fitter utilized the training data to model the linear relationship
between the independent and dependent variables.

Checking the Regression Coefficient

multi_fitter.coef_

array([ 0.99175674, -0.47404556])

Visualizing Actual and Predicted Values

y_predict = multi_fitter.predict(x_train)

plt.scatter(y_train, y_predict, alpha=0.4)


plt.xlabel("Actual Tax")

plt.ylabel("Predicted Tax")

Text(0, 0.5, 'Predicted Tax')

Checking R²; Coefficient of Determination

Let us check the coefficient of determination.

multi_fitter.score(x_train, y_train)

0.3003469124609752

The multi-regression model has an explanatory power of about 30%.
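
The same value can be reproduced from the definition R² = 1 - SS_res / SS_tot. A small sketch, assuming the fitted multi_fitter and the training data above:

# R-squared by hand: 1 - (residual sum of squares / total sum of squares)
y_hat = multi_fitter.predict(x_train)
ss_res = np.sum((y_train - y_hat) ** 2)
ss_tot = np.sum((y_train - y_train.mean()) ** 2)
print(1 - ss_res / ss_tot)  # should match multi_fitter.score(x_train, y_train)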

MSE, RMSE
We will measure the error of the model through MSE and RMSE.

mse = mean_squared_error(y_train, multi_fitter.predict(x_train))

rmse = np.sqrt(mse)

mse, rmse

(0.03697998323651459, 0.19230180247858986)

The RMSE value, which converts the indicator of the error into units similar to the actual value,
shows that the tax predicted by the model has an error of approximately 0.192.


K-Fold

scores = cross_val_score(multi_fitter, x_train, y_train, cv=4)

scores

array([0.29782897, 0.29998959, 0.30076604, 0.30274157])

Checking the Result of K-Fold


Let's check the average value of the score.

scores.mean()

0.3003315436263042

Cross-validation results also show approximately 30% of explanatory power.

Applying Test Data


We will evaluate the final model by utilizing the test data we separated before modeling.

x_test = X_test[[ 'weight', 'amount_php']]

y_test = Y_test

A. Visualizing Actual and Predicted Values

Let's visualize the actual tax value of the test data and the tax value that the model predicted from
the test data.

The more accurately the model predicts, the closer the distribution of the points will be to a straight
line with a slope of 1.
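
To make this easier to judge, a dashed y = x reference line can be drawn on the same axes; a small sketch to run together with the scatter code below, assuming the scaled values lie roughly in the range 0 to 1:

# Dashed y = x reference line: points on it are predicted exactly right
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')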

y_predict = multi_fitter.predict(x_test)

plt.scatter(y_test, y_predict, alpha=0.4)

plt.xlabel("Actual Tax")

plt.ylabel("Predicted Tax")


Text(0, 0.5, 'Predicted Tax')

B. MSE, RMSE
Let's estimate the error through MSE and RMSE.

mse = mean_squared_error(y_test, multi_fitter.predict(x_test))

rmse = np.sqrt(mse)

mse, rmse

(0.03688809427338566, 0.19206273525435813)

It has an RMSE of about 0.192, similar to the error on the training data.

C. R²; Coefficient of Determination

multi_fitter.score(x_test, y_test)

0.3055014734172349

We obtained coefficients of determination similar to those in the training data.

If there is a large difference in error and coefficient of determination between validation on the training data and evaluation on the test data, the model may be overfit to the training data and unsuitable for generalization.

Interpretation
The model created is unsatisfactory at predicting tax using amount_php and weight as independent variables, as shown by the low R-squared value of 0.30.

