Monday, 10 May 2021

Quiz: Data PreProcessing

1. Which of the following are examples of data quality problems?

A. Duplicate Data

B. Correlation between features

C. Missing values

D. All of the Above  


2. Which method is used for encoding categorical variables?

A. LabelEncoder

B. OneHotEncoder

C. None of the Above

D. All of the Above 


3. Which of the following is a valid approach to imputation?

A. Imputation with mean/median

B. Imputing with random numbers

C. Imputing with one

D. All of the above


4. What is the purpose of feature scaling?

A. Accelerating the training time

B. Getting better accuracy

C. Both A and B

D. None


5. In standardization, the features are rescaled to have:

A. Mean 0 and Variance 0

B. Mean 0 and Variance 1

C. Mean 1 and Variance 0

D. Mean 1 and Variance 1 


6. What is a Dummy Variable Trap?

A. Multicollinearity among the dummy variables

B. One variable predicts the value of another

C. Both A and B

D. None of the Above


7. Which of the following is/are feature scaling techniques?

A. Standardization

B. Normalization

C. Min-Max Scaling

D. All of the Above 


8. What's the best way to handle missing values in the dataset?

A. Dropping the missing rows or columns

B. Imputation with mean/median/mode value

C. Taking missing values into a new row or column

D. All of the above 

Solution:

1. D, 2. D, 3. A, 4. C, 5. B, 6. C, 7. D, 8. B


Hint:


What is Standardization?

In standardization, values are centered on the mean and scaled to unit standard deviation: the feature's mean is subtracted from each value, and the result is divided by the feature's standard deviation. The rescaled attribute therefore has mean 0 and standard deviation 1.
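To make the definition concrete, here is a minimal sketch in plain Python. The function name `standardize` and the sample list `ages` are made up for illustration. It assumes the population standard deviation, which is also what scikit-learn's StandardScaler uses.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread, so it cannot be standardized.
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical sample feature
scaled = standardize(ages)
print(scaled)
```

After scaling, the middle value (which equals the mean) becomes 0, and the values on either side become negative or positive multiples of the standard deviation.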


What is Normalization?

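In normalization, values are shifted and rescaled so that they lie between 0 and 1: the feature's minimum is subtracted from each value, and the result is divided by the range (maximum minus minimum). It is also called Min-Max scaling.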
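The same idea can be sketched in a few lines of plain Python. The function name `min_max_scale` and the sample list `ages` are illustrative assumptions, not part of any library.

```python
def min_max_scale(values):
    """Shift and rescale values so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has zero range, so the division is undefined.
        raise ValueError("cannot min-max scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical sample feature
normalized = min_max_scale(ages)
print(normalized)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Because the result depends only on the minimum and maximum, a single extreme outlier can squeeze all the other values into a narrow band, which is one reason to compare it against standardization on real data.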

There is no hard and fast rule for deciding which one to use. The best way is to try each on the dataset and compare the results.


Tuesday, 5 January 2021

Installing Multiple Python Versions

mkdir -p /opt
cd /opt
sudo yum install tk-devel gdbm-devel
mkdir  python
cd python
export http_proxy=http://www-proxy-idc.in.oracle.com:80
wget http://www.python.org/ftp/python/2.7.9/Python-2.7.9.tgz
tar xvzf Python-2.7.9.tgz
echo $PWD
cd Python-2.7.9
./configure --prefix=/opt/python2.7
echo $PWD
make
sudo make install
sudo ln -s /opt/python2.7/bin/python2.7 /usr/bin/python27
sudo ln -s /opt/python2.7/bin/idle2.7 /usr/bin/idle-python27
sudo ln -s /opt/python2.7/bin/pip2.7 /usr/bin/pip27
 

python27



curl https://bootstrap.pypa.io/pip/2.7/get-pip.py -o get-pip.py

python27 get-pip.py
pip27 install pandas
pip27 install cx_Oracle




Install the Oracle Instant Client (needed by cx_Oracle):

mkdir  /opt/oracle
cd /opt/oracle/
wget https://download.oracle.com/otn_software/linux/instantclient/185000/instantclient-basic-linux.x64-18.5.0.0.0dbru.zip
unzip instantclient-basic-linux.x64-18.5.0.0.0dbru.zip
cd instantclient_18_5/

echo $PWD   # should print /opt/oracle/instantclient_18_5

export LD_LIBRARY_PATH=/opt/oracle/instantclient_18_5:$LD_LIBRARY_PATH
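
A quick way to confirm that the export took effect is to check it from Python before importing cx_Oracle. This is only a sketch: the helper name `library_path_contains` is made up for illustration, and it inspects the environment variable only, not whether the client libraries actually load.

```python
import os


def library_path_contains(directory, env=None):
    """Return True if `directory` is listed in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    wanted = os.path.normpath(directory)
    # Normalize each entry so a trailing slash does not cause a mismatch.
    return any(os.path.normpath(e) == wanted for e in entries if e)


if __name__ == "__main__":
    print(library_path_contains("/opt/oracle/instantclient_18_5"))
```

Note that the export only lasts for the current shell session, so the check will report False in a new terminal unless the line is also added to a startup file such as ~/.bashrc.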



For Python 3.6:
wget http://www.python.org/ftp/python/3.6.5/Python-3.6.5.tgz
tar xvzf Python-3.6.5.tgz
echo $PWD
cd Python-3.6.5
./configure --prefix=/opt/python3.6
echo $PWD
make
sudo make install
sudo ln -s /opt/python3.6/bin/python3.6 /usr/bin/python36
sudo ln -s /opt/python3.6/bin/idle3.6 /usr/bin/idle-python36
sudo ln -s /opt/python3.6/bin/pip3 /usr/bin/pip36
which pip36
pip36 install django
pip36 install pandas
pip36 install cx_Oracle
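
With several interpreters installed side by side, it helps to confirm which one a given command actually starts. The sketch below can be saved to a file and run with python27 or python36. The function name `interpreter_summary` is an illustrative choice, and the script only reports what the running interpreter says about itself.

```python
import sys


def interpreter_summary():
    """Describe the running interpreter: major.minor version and install prefix."""
    version = "%d.%d" % (sys.version_info[0], sys.version_info[1])
    return "Python %s installed under %s" % (version, sys.prefix)


if __name__ == "__main__":
    print(interpreter_summary())
```

If the symlinks above are in place, running it with python36 should report a prefix of /opt/python3.6, and running it with python27 should report /opt/python2.7.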