PRACTICAL FILE
- Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
- Code
import pandas as pd
data = {
"FamilyName": ["Shah", "Vats", "Vats", "Kumar", "Vats", "Kumar", "Shah", "Shah", "Kumar",
"Vats"],
"Gender": ["Male", "Male", "Female", "Female", "Female", "Male", "Male", "Female",
"Female", "Male"],
"MonthlyIncome": [44000.00, 65000.00, 43150.00, 66500.00, 255000.00, 103000.00,
55000.00, 112400.00, 81030.00, 71900.00]
}
df = pd.DataFrame(data)
"(A)"
family_gross_income = df.groupby("FamilyName")["MonthlyIncome"].sum()
print("Familywise Gross Monthly Income:")
print(family_gross_income)
"(B)"
family_income_stats = df.groupby("FamilyName")["MonthlyIncome"].agg(["max", "min"])
print("\nHighest and Lowest Monthly Income for Each Family:")
print(family_income_stats)
"(C)"
low_income_members = df[df["MonthlyIncome"] < 80000.00]
print("\nMonthly Income of Members Earning Less Than Rs. 80000:")
print(low_income_members)
"(D)"
female_stats = df[df["Gender"] == "Female"]
total_females = len(female_stats)
average_female_income = female_stats["MonthlyIncome"].mean()
print(f"\nTotal Number of Females: {total_females}")
print(f"Average Monthly Income of Females: Rs. {average_female_income:.2f}")
"(E)"
average_income = df["MonthlyIncome"].mean()
print("\nAverage income is:",average_income)
df_filtered = df[df["MonthlyIncome"] >= average_income]
print("Data Frame After Removing Rows with Income Below Average Income:")
print(df_filtered)
Practical 1. Write programs in Python using the NumPy library to do the following:
a. Create a two-dimensional array, ARR1, having random values from 0 to 1. Compute the mean, standard deviation, and variance of ARR1 along the second axis.
b. Create a 2-dimensional array of size m x n integer elements, also print the shape, type and data type of the array and then reshape it into an n x m array, where n and m are user inputs given at the run time.
c. Test whether the elements of a given 1D array are zero, non-zero and NaN. Record the indices of these elements in three separate arrays.
d. Create three random arrays of the same size: Array1, Array2 and Array3. Subtract Array2 from Array3 and store in Array4. Create another array Array5 having two times the values in Array1. Find covariance and correlation of Array1 with Array4 and Array5 respectively.
e. Create two random arrays of the same size 10: Array1 and Array2. Find the sum of the first half of both the arrays and the product of the second half of both the arrays.
f. Create an array with random values. Determine the size of the memory occupied by the array.
g. Create a 2-dimensional array of size m x n having integer elements in the range (10, 100). Write statements to swap any two rows, reverse a specified column and store the updated array in another variable.
- Code
import numpy as np
#"A"
array1 = np.random.rand(5,4)
print("array1:", array1)
# "(B)"
m = int(input("Enter the no. of rows(m):"))
n = int(input("Enter the no. of columns(n):"))
array2 = np.random.randint(1,100,size=(m,n))
print("Original Array:", array2)
print("Shape", array2.shape)
print("Type:", type(array2))
print("DataType:", array2.dtype)
reshaped_array = array2.reshape(n,m)
print("Reshaped Array:",reshaped_array)
# "(C)"
# "(D)" ARR1 =
np.random.rand(10) ARR2 =
np.random.rand(10) ARR3 =
np.random.rand(10)
ARR5 = 2 * ARR1
# "(E)"
# "(F)"
array = np.random.rand(10, 10)
memory_size = array.nbytes
print("Memory occupied by the array (in bytes):", memory_size)
#"(G)"
column = 0
reversed_array = array.copy()
reversed_array[:, column] = reversed_array[::-1, column]
print("Array after reversing column {}:\n".format(column), reversed_array)
Practical 2. Do the following using PANDAS Series:
a. Create a series with 5 elements. Display the series sorted on index and also sorted on values separately.
b. Create a series with N elements with some duplicate values. Find the minimum and maximum ranks assigned to the values using 'first' and 'max' methods.
c. Display the index value of the minimum and maximum element of a Series.
- Code
import pandas as pd
# "(A)"
series1 = pd.Series([45,50,23,67,30], index = ['a','b','c','d','e'])
print("Original Series:\n",series1)
sorted_series_by_index =series1.sort_index()
print("Series1 sorted by index:\n", sorted_series_by_index)
sorted_series_by_values = series1.sort_values()
print("Series1 sorted by value:\n", sorted_series_by_values)
# "(B)"
ranks_first = series2.rank(method='first')
print("\nRanks (method='first'):\n", ranks_first)
ranks_max = series2.rank(method='max')
print("\nRanks (method='max'):\n", ranks_max)
min_rank_first = ranks_first.min()
max_rank_first = ranks_first.max()
min_rank_max = ranks_max.min()
max_rank_max = ranks_max.max()
#"(C)"
series = pd.Series([45, 23, 78, 12, 56], index=['a', 'b', 'c', 'd', 'e'])
print("Original Series:\n", series)
min_index = series.idxmin()
max_index = series.idxmax()
print("\nIndex of the minimum element:", min_index)
print("Index of the maximum element:", max_index)
Practical 4. Consider two excel files having attendance of two workshops, each of duration 5 days. Each file has three fields 'Name', 'Date', 'Duration' (in minutes), where names may be repetitive within a file. Note that duration may take one of three values (30, 40, 50) only. Import the data into two data frames and do the following:
a. Perform merging of the two data frames to find the names of students who had attended both workshops.
b. Find names of all students who have attended a single workshop only.
c. Merge two data frames row-wise and find the total number of records in the data frame.
d. Merge two data frames row-wise and use two columns viz. names and dates as multi-row indexes. Generate descriptive statistics for this hierarchical data frame.
- Code
import pandas as pd
workshop1 = pd.read_excel(r"C:\Users\fq1089\OneDrive\Documents\DSA Folder\workshop1.xlsx")
workshop2 = pd.read_excel(r"C:\Users\fq1089\OneDrive\Documents\DSA Folder\workshop2.xlsx")
workshop1.columns = workshop1.columns.str.strip().str.lower()
workshop2.columns = workshop2.columns.str.strip().str.lower()
workshop1_names = set(workshop1['name'].unique())
workshop2_names = set(workshop2['name'].unique())
single_workshop_names = (workshop1_names ^ workshop2_names)
print("\nNames of students who attended only one workshop:")
print(single_workshop_names)
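The listing above covers only part (b). A minimal sketch for parts (a), (c) and (d) follows, assuming the lowercased 'name' and 'date' column names produced by the strip/lower step above.
# (a) Students who attended both workshops: inner merge on 'name'
both_workshops = pd.merge(workshop1[['name']].drop_duplicates(),
                          workshop2[['name']].drop_duplicates(), on='name')
print("\nNames of students who attended both workshops:")
print(both_workshops['name'].tolist())
# (c) Row-wise merge and total number of records
combined = pd.concat([workshop1, workshop2], axis=0, ignore_index=True)
print("\nTotal number of records after row-wise merge:", len(combined))
# (d) Row-wise merge with ('name', 'date') as a multi-row index; descriptive statistics
hierarchical = pd.concat([workshop1, workshop2], axis=0).set_index(['name', 'date'])
print("\nDescriptive statistics for the hierarchical data frame:")
print(hierarchical.describe())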
- excel data
- Output
Practical 5. Using Iris data, plot the following with proper legend and axis labels (download IRIS data from https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn datasets):
a. Load data into pandas' data frame. Use pandas.info() method to look at the info on datatypes in the dataset.
b. Find the number of missing values in each column (check the number of null values in a column using df.isnull().sum()).
c. Plot bar chart to show the frequency of each class label in the data.
d. Draw a scatter plot for Petal Length vs Sepal Length and fit a regression line.
e. Plot density distribution for feature Petal Width.
f. Use a pair plot to show pairwise bivariate distribution in the Iris Dataset.
g. Draw heatmap for any two numeric attributes.
h. Compute mean, mode, median, standard deviation, confidence interval and standard error for each numeric feature.
i. Compute correlation coefficients between each pair of features and plot heatmap.
- Code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import sem, norm
# a. Load data into a pandas data frame and use pandas.info()
from sklearn.datasets import load_iris
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
print("Dataset Info:")
iris_df.info()
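# Parts (b) and (c): a brief sketch using the iris_df frame built above.
# b. Number of missing values in each column
print("\nMissing values per column:\n", iris_df.isnull().sum())
# c. Bar chart of the frequency of each class label
iris_df['species'].value_counts().plot(kind='bar')
plt.xlabel('Species')
plt.ylabel('Frequency')
plt.title('Frequency of Each Class Label')
plt.show()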
# d. Scatter plot for Petal Length vs Sepal Length with regression line
sns.lmplot(
data=iris_df,
x=iris.feature_names[0], # Sepal Length
y=iris.feature_names[2], # Petal Length
hue='species',
markers=['o', 's', 'D'],
ci=None
)
plt.title('Scatter Plot with Regression Line')
plt.show()
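# Parts (e), (f) and (g): a brief sketch, assuming the 'petal width (cm)' and
# 'petal length (cm)' column names from sklearn's feature_names.
# e. Density distribution for Petal Width
sns.kdeplot(data=iris_df, x='petal width (cm)', hue='species', fill=True)
plt.title('Density Distribution of Petal Width')
plt.show()
# f. Pair plot of pairwise bivariate distributions
sns.pairplot(iris_df, hue='species')
plt.show()
# g. Heatmap for two numeric attributes (petal length and petal width)
sns.heatmap(iris_df[['petal length (cm)', 'petal width (cm)']].corr(), annot=True, cmap='coolwarm')
plt.title('Heatmap of Petal Length and Petal Width')
plt.show()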
"Mean": mean,
"Mode": mode,
"Median": median,
"Standard Deviation": std,
"Standard Error": se,
"95% Confidence Interval": (ci_lower, ci_upper)
}
print("\nStatistical Measures for Each Numeric Feature:")
for col, stat in stats.items():
print(f"\nFeature: {col}")
for key, value in stat.items():
print(f" {key}: {value}")
- Output
Practical 3. Create a data frame having at least 3 columns and 50 rows to store numeric data generated using a random function. Replace 10% of the values by null values whose index positions are generated using a random function. Do the following:
a. Identify and count missing values in a data frame.
b. Drop the column having more than 5 null values.
c. Identify the row label having maximum of the sum of all values in a row and drop that row.
d. Sort the data frame on the basis of the first column.
e. Remove all duplicates from the first column.
f. Find the correlation between first and second column and covariance between second and third column.
g. Discretize the second column and create 5 bins.
- Code
import pandas as pd
import numpy as np
np.random.seed(42)
data = np.random.rand(50, 3)
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])
# Replace about 10% of the values with nulls at randomly generated positions
for _ in range(int(0.10 * df.size)):
    df.iat[np.random.randint(0, 50), np.random.randint(0, 3)] = np.nan
# a. Identify and count missing values
missing_values_count = df.isnull().sum().sum()
print("Total missing values:", missing_values_count)
# d. Sort on the first column, then e. remove duplicates from it
df_sorted = df.sort_values(by='Column1')
df_no_duplicates = df_sorted.drop_duplicates(subset='Column1')
# f. Correlation (Column1, Column2) and covariance (Column2, Column3)
correlation = df_no_duplicates['Column1'].corr(df_no_duplicates['Column2'])
covariance = df_no_duplicates['Column2'].cov(df_no_duplicates['Column3'])
print("Correlation:", correlation, "Covariance:", covariance)
# g. Discretize the second column into 5 bins
bins = pd.cut(df_no_duplicates['Column2'], bins=5, labels=False)
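Parts (b) and (c) are sketched below; because the null positions are random, which column (if any) is dropped and which row is removed will vary between runs.
# b. Drop the column(s) having more than 5 null values
df_dropped_cols = df.drop(columns=df.columns[df.isnull().sum() > 5])
print("Columns remaining after dropping:", list(df_dropped_cols.columns))
# c. Identify the row label with the maximum row sum and drop that row
max_sum_label = df.sum(axis=1).idxmax()
df_dropped_row = df.drop(index=max_sum_label)
print("Row label with the maximum row sum (dropped):", max_sum_label)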