Quiz: Data PreProcessing

Monday, 10 May 2021

1. Which of the following are examples of data quality problems?

A. Duplicate Data

B. Correlation between features

C. Missing values

D. All of the Above  


2. Which method is used to encode categorical variables?

A. LabelEncoder

B. OneHotEncoder

C. None of the Above

D. All of the Above 


3. Which of the following is a valid approach to imputation?

A. Imputation with mean/median

B. Imputing with random numbers

C. Imputing with one

D. All of the above


4. What is the purpose of feature scaling?

A. Accelerating the training time

B. Getting better accuracy

C. Both A and B

D. None


5. In standardization, the features are rescaled to have:

A. Mean 0 and Variance 0

B. Mean 0 and Variance 1

C. Mean 1 and Variance 0

D. Mean 1 and Variance 1 


6. What is a Dummy Variable Trap?

A. Multicollinearity among the dummy variables

B. One variable predicts the value of another

C. Both A and B

D. None of the Above


7. Which of the following is/are feature scaling techniques?

A. Standardization

B. Normalization

C. Min-Max Scaling

D. All of the Above 


8. What's the best way to handle missing values in a dataset?

A. Dropping the missing rows or columns

B. Imputation with mean/median/mode value

C. Taking missing values into a new row or column

D. All of the above 

Solution:

1. D, 2. D, 3. A, 4. C, 5. B, 6. C, 7. D, 8. B


Hint:


What is Standardization?

In standardization, the values are centered on the mean and scaled to unit standard deviation. That is, the mean of the attribute becomes 0 and the resulting distribution has a standard deviation of 1.
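
As a rough illustration of the definition above, the sketch below computes z = (x - mean) / std for a single feature using only the Python standard library. The function name `standardize` and the sample ages are made up for this example, and the population standard deviation is assumed; a library scaler may make different choices.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to mean 0 and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant feature has no spread, so z-scores are undefined.
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / std for v in values]


ages = [20, 30, 40, 50, 60]  # made-up sample data
scaled = standardize(ages)
print(scaled)
print(fmean(scaled), pstdev(scaled))  # mean close to 0, std close to 1
```

After scaling, the value that was equal to the mean (40 here) maps to 0, and the other values are expressed as a number of standard deviations above or below it.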


What is Normalization?

In normalization, values are shifted and rescaled so that they lie between 0 and 1. It is also called Min-Max scaling.
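
A minimal sketch of this rescaling, assuming the usual formula (x - min) / (max - min) applied to one feature at a time. The function name `min_max_scale` and the sample data are hypothetical and only serve the illustration.

```python
def min_max_scale(values):
    """Shift and rescale values so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has zero range, so the formula would divide by zero.
        raise ValueError("cannot min-max scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 50, 60]  # made-up sample data
normalized = min_max_scale(ages)
print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Because the result depends only on the minimum and maximum, a single extreme value can squeeze all the other values into a narrow part of the 0 to 1 range.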

There is no hard and fast rule for deciding which one to use on the data. The best approach is to try each of them on the dataset and compare the results.
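
One way to run such a comparison is sketched below. It is only an illustration under stated assumptions: the dataset is made up, the helper names are hypothetical, and a leave-one-out 1-nearest-neighbour classifier stands in for whatever model is actually being trained.

```python
from statistics import fmean, pstdev


def standardize(col):
    """Rescale one feature column to mean 0 and unit standard deviation."""
    mean, std = fmean(col), pstdev(col)
    return [(v - mean) / std for v in col]


def min_max(col):
    """Rescale one feature column to the range 0 to 1."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]


def scale_rows(rows, scaler):
    """Apply a column-wise scaler to every feature and return the rows again."""
    cols = [scaler(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def loo_1nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    hits = 0
    for i, x in enumerate(rows):
        # Squared Euclidean distance from point i to every other point.
        dists = [
            (sum((a - b) ** 2 for a, b in zip(x, y)), labels[j])
            for j, y in enumerate(rows)
            if j != i
        ]
        hits += min(dists)[1] == labels[i]
    return hits / len(rows)


# Made-up data: (age in years, income in dollars) with a binary label.
rows = [(25, 40000), (30, 42000), (35, 41000),
        (47, 43000), (50, 40500), (55, 41500)]
labels = [0, 0, 0, 1, 1, 1]

results = {
    "raw": loo_1nn_accuracy(rows, labels),
    "standardized": loo_1nn_accuracy(scale_rows(rows, standardize), labels),
    "min-max": loo_1nn_accuracy(scale_rows(rows, min_max), labels),
}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

On this toy data the unscaled run misclassifies every point, because differences in income swamp differences in age, while both scaled versions classify all six points correctly. On a real dataset the two techniques can give different results, which is why trying both is worthwhile.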