Load a Pandas dataframe with a selected dataset. Identify and count the missing values in the dataframe. Clean the data by removing noise as follows:
a. Drop duplicate rows.
b. Detect the outliers and remove the rows that contain outliers.
c. Identify the most positively correlated and the most negatively correlated attributes.
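The task leaves the dataset open. To try the script without a file of your own, a small synthetic dataset works. The one below is a hypothetical example: the column names `column1` and `column2` are chosen to match the placeholders in the script, and the missing value, duplicate row, and outlier are planted on purpose so that every cleaning step has something to find.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names match the placeholders used in the script
rng = np.random.default_rng(seed=0)
n = 50
base = rng.normal(loc=50, scale=5, size=n)
sample = pd.DataFrame({
    "column1": base,
    "column2": 2 * base + rng.normal(scale=1, size=n),  # rises with column1 (positive)
    "column3": -base + rng.normal(scale=1, size=n),     # falls as column1 rises (negative)
})

sample.loc[3, "column3"] = np.nan   # plant one missing value
sample.loc[7, "column1"] = 500.0    # plant one obvious outlier
sample = pd.concat([sample, sample.iloc[[0]]], ignore_index=True)  # plant one duplicate row

print(sample.isnull().sum().sum())  # 1
print(sample.duplicated().sum())    # 1

# Uncomment to write the file that the script reads:
# sample.to_csv("your_dataset.csv", index=False)
```

The fixed seed makes the random values reproducible, so the planted problems are the only missing value, duplicate, and extreme value in the frame.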

 




CODE



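import numpy as np
import pandas as pd

# Load the dataset (replace 'your_dataset.csv' with the path to your own file)
df = pd.read_csv('your_dataset.csv')

# Identify and count the missing values in each column
missing_values = df.isnull().sum()
print("Missing values per column:")
print(missing_values)
print("Total missing values:", missing_values.sum())

# Drop duplicate rows (the first occurrence of each duplicate is kept)
rows_before = len(df)
df = df.drop_duplicates()
print("Duplicate rows removed:", rows_before - len(df))

# Detect outliers with the IQR rule and remove the rows that contain them
def remove_outliers(df, column):
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    # Keep only rows inside [lower_bound, upper_bound]. Rows with a missing value
    # in this column are dropped too, because comparisons with NaN are False.
    return df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]

# Replace these placeholders with the numeric columns you want to check
columns_to_check = ['column1', 'column2']

for column in columns_to_check:
    df = remove_outliers(df, column)

# Correlation matrix of the numeric columns only
# (in pandas 2.x, df.corr() raises an error if the frame has text columns)
correlation_matrix = df.select_dtypes(include='number').corr()

# Keep each pair of attributes once and drop the diagonal, where every
# attribute has a correlation of 1.0 with itself
upper_triangle = np.triu(np.ones(correlation_matrix.shape, dtype=bool), k=1)
pairs = correlation_matrix.where(upper_triangle).unstack().dropna()

# Find the most positively correlated attributes (largest coefficients first)
positive_corr = pairs.sort_values(ascending=False)
print("Most positively correlated attributes:")
print(positive_corr.head())

# Find the most negatively correlated attributes (smallest coefficients first;
# only values below 0 indicate a negative correlation)
negative_corr = pairs.sort_values(ascending=True)
print("Most negatively correlated attributes:")
print(negative_corr.head())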
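One design point in the outlier step: the loop over `columns_to_check` recomputes each column's quartiles on rows that earlier columns have already filtered, so the result can depend on the order of the columns. A minimal alternative sketch, assuming you want every bound taken from the same data, computes all bounds once and drops any row that falls outside them in at least one column. The function name `remove_outliers_iqr` and the parameter `k` are hypothetical; `k=1.5` is the same multiplier the script uses.

```python
import pandas as pd

def remove_outliers_iqr(df, columns, k=1.5):
    """Drop rows lying outside [Q1 - k*IQR, Q3 + k*IQR] in any of `columns`.

    All quartiles are computed once on the input frame, so the column order
    does not affect the result. Rows with NaN in a checked column are dropped,
    as in the loop-based version, because comparisons with NaN are False.
    """
    subset = df[columns]
    q1 = subset.quantile(0.25)
    q3 = subset.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series index with the columns
    inside = subset.ge(q1 - k * iqr) & subset.le(q3 + k * iqr)
    return df[inside.all(axis=1)]

# Usage on a tiny frame: only the row with a = 100 lies outside the bounds
data = pd.DataFrame({"a": [1, 2, 3, 4, 5, 100], "b": [10, 11, 12, 13, 14, 15]})
cleaned = remove_outliers_iqr(data, ["a", "b"])
print(cleaned.index.tolist())  # [0, 1, 2, 3, 4]
```

For column `a`, the quartiles are 2.25 and 4.75, so the bounds are -1.5 and 8.5 and only the value 100 is rejected; every value in `b` lies within its bounds of 7.5 and 17.5.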

