Report of Profit Prediction
Internship report on
PROFIT PREDICTION OF 50 COMPANIES
A Dissertation work submitted in partial fulfilment of the requirement for the award of the degree of
Internship
By
Abstract
Table of Contents
Abstract
1 Introduction
1.1 Data Science
1.2 Machine Learning
2 Existing Methods
2.1 Issues in Existing Systems
3 Proposed Method
3.1 Algorithm
4 Methodology
4.1 Data Collection
4.2 Data Preprocessing
4.3 Feature Selection
4.4 Split Data into Train and Test Sets
4.5 Train the Model
4.6 Evaluate the Model
4.7 Optimize the Model
4.8 Deploy the Model
5 Implementation
5.1 Source Code
6 Conclusion
7 References
1. Introduction
2. Existing Methods
Existing approaches to predicting a company's profit from its expenses,
such as R&D spend, administration cost, and marketing spend, often
rely on manual calculations or basic statistical techniques. These
may not accurately capture the complex relationships between the
variables.
Machine learning models, on the other hand, can learn from data and
make accurate predictions based on patterns in the data. In this context,
linear regression models have been widely used for predicting
continuous target variables such as profit. The model estimates the
relationship between the independent variables and the dependent
variable by fitting a linear equation to the data.
1. Limited Accuracy
3. Limited Scope
3. Proposed Method
The proposed system offers a more accurate and efficient method for
predicting company profits, which can be useful for businesses in
making informed financial decisions. The model can also be further
improved by incorporating additional relevant variables or using more
advanced algorithms, such as neural networks or decision trees.
3.1 Algorithm
4. Predict the profit values for the testing set using the trained model.
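The linear regression equation is:
y = b0 + b1*x1 + b2*x2 + b3*x3
where y is the predicted value of Profit, x1, x2, and x3 are the input
variables (R&D Spend, Administration Cost, and Marketing Spend),
b0 is the intercept, and b1, b2, and b3 are the coefficients of the
input variables. All four values are learned during training.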
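The relationship described here, a prediction y built from an intercept b0 and one coefficient per input variable, can be illustrated with a minimal sketch. It assumes NumPy and uses synthetic data rather than the report's dataset; the names X, true_b, and predict are hypothetical and chosen only for this example. The coefficients are estimated by ordinary least squares, which is one common way to fit a linear regression.

```python
import numpy as np

# Synthetic stand-in data (hypothetical values, not taken from the report's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(50, 3))      # columns play the role of x1, x2, x3
true_b = np.array([5.0, 0.8, -0.1, 0.3])   # b0, b1, b2, b3 used to generate y
y = true_b[0] + X @ true_b[1:]             # noise-free, so the fit can recover true_b

# Prepend a column of ones so the intercept b0 is learned with the other coefficients.
A = np.column_stack([np.ones(len(X)), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares fit


def predict(x1, x2, x3):
    """Evaluate y = b0 + b1*x1 + b2*x2 + b3*x3 with the learned coefficients."""
    return b[0] + b[1] * x1 + b[2] * x2 + b[3] * x3


print(np.round(b, 3))
print(predict(10, 20, 30))
```

Because the synthetic target contains no noise, the learned coefficients match the ones used to generate the data. With real data they would only approximate the underlying relationship.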
4. Methodology
The methodology for building an ML model that can predict the profit
value using linear regression can be broken down into the following
steps:
4.3 Feature Selection: Determine which features are most relevant for
predicting the profit value of a company. In this case, the selected
features are R&D Spend, Administration Cost, and Marketing Spend.
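One simple way to judge how relevant each feature is, is to rank the features by the absolute value of their correlation with Profit. The sketch below assumes pandas and NumPy and uses a synthetic table. The column names follow the feature names used in this report, and the generated values are hypothetical. It is an illustration of the idea, not the selection procedure actually used for the report.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in table (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
n = 50
df = pd.DataFrame({
    "R&D Spend": rng.uniform(0, 150_000, n),
    "Administration Cost": rng.uniform(50_000, 150_000, n),
    "Marketing Spend": rng.uniform(0, 400_000, n),
})
# Hypothetical target: driven mostly by R&D Spend, slightly by Marketing Spend, plus noise.
df["Profit"] = (
    50_000
    + 0.8 * df["R&D Spend"]
    + 0.05 * df["Marketing Spend"]
    + rng.normal(0, 5_000, n)
)

# Absolute Pearson correlation of each candidate feature with the target, highest first.
relevance = df.corr()["Profit"].drop("Profit").abs().sort_values(ascending=False)
print(relevance)
```

Correlation only captures linear, one-feature-at-a-time relationships, so it is a screening aid rather than a definitive ranking.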
4.4 Split Data into Train and Test Sets: Split the data into a training
set and a test set. The training set will be used to train the linear
regression model, while the test set will be used to evaluate the model's
performance.
4.5 Train the Model: Train a linear regression model using the training
data.
4.6 Evaluate the Model: Evaluate the performance of the model using
the test data. This may involve metrics such as mean squared error or
R-squared.
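The two metrics named in this step can be computed directly from their definitions. The sketch below assumes NumPy. The helper names mse and r_squared and the sample values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r_squared(y_true, y_pred):
    """R-squared: 1 - SS_res / SS_tot, the share of target variance explained."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Hypothetical actual and predicted profits, for illustration only.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
print(mse(actual, predicted), r_squared(actual, predicted))
```

A lower mean squared error and an R-squared closer to 1 both indicate a better fit on the test data.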
4.8 Deploy the Model: Once the model has been optimized, it can be
deployed for use in predicting the profit value of a company based on
R&D Spend, Administration Cost, and Marketing Spend.
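The report does not say how the model is deployed. One possible approach, sketched here, is to serialize the fitted model and load it again wherever predictions are needed. The sketch assumes scikit-learn and Python's standard pickle module. The training rows, the variable names, and the predict_profit helper are all hypothetical.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in for the optimized model: fitted on a few hypothetical rows
# (columns: R&D Spend, Administration Cost, Marketing Spend).
X = np.array([
    [10.0, 5.0, 20.0],
    [20.0, 6.0, 25.0],
    [30.0, 4.0, 35.0],
    [40.0, 7.0, 30.0],
    [50.0, 5.5, 45.0],
])
y = np.array([30.0, 45.0, 62.0, 74.0, 93.0])
trained = LinearRegression().fit(X, y)

# Serialize the fitted model; in practice these bytes would be written to a file.
blob = pickle.dumps(trained)


def predict_profit(rd_spend, administration, marketing, model_bytes=blob):
    """Load the stored model and predict the profit for one company."""
    model = pickle.loads(model_bytes)
    return float(model.predict([[rd_spend, administration, marketing]])[0])


print(predict_profit(25.0, 5.0, 30.0))
```

Pickled models should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.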
5. Implementation
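# Import the libraries used for loading and exploring the data
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset and inspect it
dataset = pd.read_csv('50_Startups.csv')
dataset.head()
dataset.tail()
dataset.describe()
print('There are', dataset.shape[0], 'rows and', dataset.shape[1], 'columns in the dataset')
dataset.isnull().sum()  # count missing values per column
dataset.info()

# Correlation matrix of the columns, shown as a heatmap
c = dataset.corr()
c
sns.heatmap(c, annot=True, cmap='Blues')
plt.show()

# Box plot of Profit to check for outliers
outliers = ['Profit']
plt.rcParams['figure.figsize'] = [8, 8]
sns.boxplot(data=dataset[outliers], orient='v', palette='Set2', width=0.7)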
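# Pairwise scatter plots of all columns
sns.pairplot(dataset)
plt.show()

# Features: every column except the last. Target: the column at index 3,
# which assumes Profit is the fourth and last column of the dataset.
x = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values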
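The listing does not show how x_train, x_test, y_train, y_test, and model, which the Testing step relies on, are created. A minimal sketch of steps 4.4 and 4.5 follows, assuming scikit-learn's train_test_split and LinearRegression. The 80/20 split and the random_state value are assumptions, not values stated in this report, and synthetic arrays stand in for the x and y built from the dataset so that the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix x (3 expense columns) and target y (profit).
rng = np.random.default_rng(42)
x = rng.uniform(0, 100_000, size=(50, 3))
y = (
    40_000
    + 0.8 * x[:, 0]
    + 0.02 * x[:, 1]
    + 0.05 * x[:, 2]
    + rng.normal(0, 1_000, 50)
)

# Hold out 20% of the rows for testing; test_size and random_state are assumed values.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Fit the linear regression model on the training rows only.
model = LinearRegression()
model.fit(x_train, y_train)

print(model.intercept_, model.coef_)
```

Fixing random_state makes the split reproducible, so repeated runs evaluate the model on the same test rows.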
Testing
# Predict profit for the test set
y_pred = model.predict(x_test)
# R-squared of the model on the test set
testing_data_model_score = model.score(x_test, y_test)
Model Evaluation
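import numpy as np
from sklearn.metrics import mean_squared_error

# Root mean squared error between the actual and predicted test-set profits
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print('Root mean squared error of the model is', rmse)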
6. Conclusion
The proposed system has several advantages over the existing systems,
as it uses more relevant features and a better machine learning
algorithm. This model can be used by investors and businesses to make
more informed decisions about where to invest their money and how to
improve their profits.