Question: Write equation for Linear Regression?
Ans: y = a + bx, where a is the intercept and b is the slope of the single predictor x. With several predictors this generalizes to y = b0 + b1x1 + ... + bnxn.
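As a quick, purely illustrative sketch (not part of the original answer), the intercept a and slope b can be estimated from data, here with scikit-learn on made-up numbers:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y is roughly 2 + 3x plus a little noise (numbers made up)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

model = LinearRegression().fit(X, y)
print("a (intercept):", model.intercept_)
print("b (slope):", model.coef_[0])
```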
Question: Write equation for Logistic Regression?
Ans: Logistic regression passes a linear combination of the inputs through the sigmoid (logistic) function, 1 / (1 + e^-value), where value = a + bx. The sigmoid maps any real number to a probability between 0 and 1.
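A tiny sketch of the sigmoid, with hypothetical coefficients a and b chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued input to the (0, 1) interval
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients a (intercept) and b (slope), made up for this example
a, b = -1.0, 0.5
x = np.array([-2.0, 0.0, 2.0, 6.0])
print(sigmoid(a + b * x))  # predicted probabilities of the positive class
```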
Question: How will You calculate AUCROC (Area Under Curve ROC) value manually?
Ans: Plot the ROC curve (true positive rate vs. false positive rate at each threshold) and sum the areas of the trapezoids formed by consecutive points, i.e. apply the trapezoidal rule.
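A minimal sketch of the idea, assuming you already have the ROC points (the FPR/TPR values below are hypothetical):

```python
import numpy as np

# Hypothetical ROC points (FPR, TPR), ordered from (0, 0) to (1, 1)
fpr = np.array([0.0, 0.1, 0.3, 0.6, 1.0])
tpr = np.array([0.0, 0.5, 0.7, 0.9, 1.0])

# Each slice contributes width (delta FPR) times average height (mean of adjacent TPRs)
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
print("AUC:", auc)  # equivalent to np.trapz(tpr, fpr)
```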
Question: What performance metrics have you used in model building?
Ans:
Confusion Matrix
F1 Score
Gain and Lift Charts
Kolmogorov Smirnov Chart
AUC – ROC
Log Loss
Gini Coefficient
Concordant – Discordant Ratio
Root Mean Squared Error
Cross Validation (Not a metric though!)
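A hedged sketch showing how a few of the metrics above can be computed with scikit-learn; the label vectors are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, log_loss,
                             mean_squared_error, roc_auc_score)

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]                   # actual classes (made up)
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]                   # hard predictions
y_proba = [0.2, 0.6, 0.8, 0.9, 0.4, 0.1, 0.7, 0.3]   # predicted P(class = 1)

print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
print("AUC-ROC:", roc_auc_score(y_true, y_proba))
print("Log loss:", log_loss(y_true, y_proba))
# The Gini coefficient is commonly derived from AUC: Gini = 2 * AUC - 1
print("Gini:", 2 * roc_auc_score(y_true, y_proba) - 1)
# RMSE (a regression metric), shown on toy continuous values for completeness
print("RMSE:", np.sqrt(mean_squared_error([3.0, 5.0], [2.5, 5.5])))
```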
Question: Assumptions of Linear Regression?
Ans: 5 key assumptions:
Linear relationship (Outliers need to be checked)
Multivariate normality (can be checked with a histogram or a Q-Q-Plot)
No or little multicollinearity (can be tested with 3 criteria: correlation matrix, tolerance, Variance Inflation Factor (VIF))
No auto-correlation
Homoscedasticity (can be checked using a scatter plot of residuals vs. fitted values)
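As an illustrative sketch (not part of the original answer, and assuming statsmodels is available), the multicollinearity check via VIF could look like this; values above roughly 5-10 are usually read as a warning sign:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Toy feature matrix; x3 is deliberately close to a combination of x1 and x2
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
X["x3"] = X["x1"] + 0.5 * X["x2"] + rng.normal(scale=0.1, size=100)

# Add an intercept column so the VIFs are computed against a proper regression
X_const = sm.add_constant(X)
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i + 1) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif)  # x1 and x3 should show clearly inflated values
```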
Question: What are the different ways you have used to treat missing values and outliers?
Ans:
For missing values, the approaches below can be used (but are not limited to these):
- If a feature has too many missing values, then drop the whole feature (column).
- If the feature is too important to drop, then introduce another binary feature as isnull of this feature and impute the null values of the existing feature with median/mean.
- If there are very few missing values in a feature and removing those rows doesn't hurt the sample size then remove the rows.
- If removing rows with missing values in either of the features reduces the sample size drastically, then go for imputation. There are multiple ways for that:
- impute with mean/median of the column.
- impute with the mean/median of that column computed over the N nearest neighbors
- If it's a time series data set, then use a Markov chain to predict the missing values
- If each row is a time series and your algorithm doesn't demand the rows to be of the same size, then leave it as is. One example would be dynamic time warping distance between time series.
For Outliers:
- Remove the outliers. (Trimming)
- Replacing the values of outliers or reducing the influence of outliers through outlier weight adjustments. (Winsorization)
- Estimate the values of outliers using robust techniques.
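A short pandas sketch of two of the ideas above, median imputation with an is-null indicator and winsorization; the column name, values, and percentile cut-offs are made up:

```python
import numpy as np
import pandas as pd

# Toy column with missing values and one extreme value (all numbers made up)
df = pd.DataFrame({"income": [40.0, 55.0, np.nan, 62.0, 300.0, np.nan, 48.0]})

# Missing values: add an is-null indicator, then impute with the median
df["income_isnull"] = df["income"].isna().astype(int)
df["income"] = df["income"].fillna(df["income"].median())

# Outliers: winsorize by clipping to the 5th and 95th percentiles
lo, hi = df["income"].quantile([0.05, 0.95])
df["income_winsorized"] = df["income"].clip(lower=lo, upper=hi)
print(df)
```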
Question: What is Feature Selection?
Ans:
“Feature selection is the process of selecting a subset of relevant features (variables or predictors) from all available features, to be used in model building.”
With a high number of features (high dimension), data analysis becomes challenging for engineers in the fields of Machine Learning and Data Mining. Feature selection gives an effective way to solve this problem by removing irrelevant and redundant data, which can reduce computation time, improve learning accuracy, and facilitate a better understanding of the learning model or the data.
Question: How many Features to have in the Model?
Ans: One important consideration is the trade-off between predictive accuracy and model interpretability: if we use a large number of features, predictive accuracy is likely to go up while model interpretability goes down.
If we have a small number of features, the model is easy to interpret and less likely to overfit, but it will tend to give lower prediction accuracy.
If we have a large number of features, the model is harder to interpret and more likely to overfit, but it can give higher prediction accuracy.
Question: Types of Feature Selection?
Ans: A high number of features in the data increases the risk of overfitting the model.
Feature selection methods help reduce the dimensionality of the feature space without much loss of information.
Below are some methods used for feature selection:
a) Filter Methods
b) Wrapper Methods (subset selection, forward stepwise selection, backward stepwise selection)
c) Embedded Methods (shrinkage: LASSO regression, Ridge regression)
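As an example of the embedded (shrinkage) approach, here is a hedged sketch using LASSO, where features whose coefficients are driven to zero are dropped; the synthetic dataset and alpha value are illustrative only:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 20 features, only 5 of which are actually informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # features with a non-zero coefficient survive
print("Selected feature indices:", selected)
```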
Question: Explain Decision Tree?
Ans: A decision tree is one of the most powerful and popular tools for classification and prediction. It is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
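A minimal scikit-learn sketch on the Iris data, shown only to make the structure concrete:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each printed line is either an internal-node test on an attribute or a leaf class
print(export_text(tree, feature_names=list(iris.feature_names)))
```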
Question: Difference between K-Means and KNN?
Ans:
These are completely different methods.
K-means is a clustering algorithm that tries to partition a set of points into K sets (clusters) such that the points in each cluster tend to be near each other. It is unsupervised because the points have no external classification.
K-Nearest Neighbors (K-NN) is a classification (or regression) algorithm that, in order to determine the classification of a point, combines the classification of the K nearest points. It is supervised because you are trying to classify a point based on the known classification of other points.
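To make the contrast concrete, a small sketch on made-up points that runs both: K-Means sees only the points, while K-NN also needs the labels:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]])  # toy points
y = np.array([0, 0, 1, 1])                                      # labels (only K-NN uses them)

# Unsupervised: partition the points into K = 2 clusters, no labels involved
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", kmeans.labels_)

# Supervised: classify a new point from the labels of its K = 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print("Predicted class:", knn.predict([[4.5, 5.2]]))
```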
Question: What is K in KNN?
Ans: K is just the number of neighbors "voting" to classify the point.
Question: What is a confusion matrix? Explain.
Ans: A confusion matrix is a table that describes the performance of a classifier/classification model. It contains information about the actual and predicted classifications made by the classifier, and this information is used to evaluate the performance of the classifier.
The confusion matrix is only used for classification tasks, and as such cannot be used for regression models or other non-classification models.
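A minimal sketch with scikit-learn on made-up labels; rows are actual classes and columns are predicted classes:

```python
from sklearn.metrics import confusion_matrix

y_actual    = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up true labels
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0]  # made-up classifier output

# Rows are actual classes, columns are predicted classes (label order 0, 1)
print(confusion_matrix(y_actual, y_predicted))

# For the binary case the four cells can be unpacked directly
tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
print("TN, FP, FN, TP:", tn, fp, fn, tp)
```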
Question: What is cross-validation and what is its purpose?
Ans: Cross-validation is a technique in which we train our model using a subset of the data set and then evaluate it using the complementary subset of the data set. The purpose of cross-validation is model checking, not model building.
Now, suppose we have two models: a linear regression model and a neural network. To find out which model is better at predicting the test set points, we can do K-fold cross-validation. But once we have used cross-validation to select the better-performing model, we train that model (whether it be the linear regression or the neural network) on all the data. We don't use the actual model instances we trained during cross-validation for our final predictive model.
Note that there is a technique called bootstrap aggregation (usually shortened to 'bagging') that does, in a way, use model instances produced similarly to cross-validation to build up an ensemble model.
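A hedged sketch of the workflow described above: use K-fold cross-validation to compare two candidate models, then refit the winner on all of the data (the specific models and dataset are illustrative only):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor

X, y = load_diabetes(return_X_y=True)

candidates = {
    "linear regression": LinearRegression(),
    "neural network": MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000,
                                   random_state=0),
}

# 5-fold cross-validation is used only to compare the models (model checking)
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in candidates.items()}
print(scores)

# The chosen model is then retrained on ALL the data for the final predictor
best_name = max(scores, key=scores.get)
final_model = candidates[best_name].fit(X, y)
```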
Question: Whats the difference between Parameter and Hyper-Parameter?
Ans: A model hyperparameter is a configuration that is external to the model and whose value cannot be estimated from data.
- They are often used in processes to help estimate model parameters.
- They are often specified by the practitioner.
- They can often be set using heuristics.
- They are often tuned for a given predictive modeling problem.
E.g: The learning rate for training a neural network, the C and sigma hyperparameters for support vector machines, the k in k-nearest neighbors.
A model parameter is a configuration variable that is internal to the model and whose value can be estimated from data.
- They are required by the model when making predictions.
- Their values define the skill of the model on your problem.
- They are estimated or learned from data.
- They are often not set manually by the practitioner.
- They are often saved as part of the learned model.
E.g: The weights in an artificial neural network, the support vectors in a support vector machine, the coefficients in a linear regression or logistic regression.
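A small sketch to make the distinction concrete: k (a hyperparameter) is specified by the practitioner, often tuned via grid search, while the coefficients of a linear regression (parameters) are estimated from the data:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

X, y = load_diabetes(return_X_y=True)

# Hyperparameter: k is set from outside the model, here tuned with grid search
search = GridSearchCV(KNeighborsRegressor(), {"n_neighbors": [3, 5, 11]}, cv=5)
search.fit(X, y)
print("Best k (hyperparameter):", search.best_params_)

# Parameters: the coefficients are estimated from the data during fitting
linreg = LinearRegression().fit(X, y)
print("Learned coefficients (parameters):", linreg.coef_)
```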
Question: What is the difference between supervised and unsupervised machine learning?
Ans: Supervised learning requires labeled training data: you need to know which data point belongs to which class or carries which label. Unsupervised learning, on the other hand, does not require labeled data.
Question: What is the difference between L1 and L2 regularization?
Ans: L1 regularization tends to produce sparse solutions: many coefficients are driven to exactly zero, so variables are effectively either kept or dropped. It corresponds to placing a Laplacian prior on the terms. L2 regularization, on the other hand, tends to spread the penalty among all the terms, shrinking every coefficient towards zero without eliminating any, and corresponds to a Gaussian prior.
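An illustrative comparison (the synthetic dataset and alpha value are chosen arbitrarily): L1 (Lasso) zeroes out many coefficients, while L2 (Ridge) shrinks them without eliminating them:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 15 features, only 4 of which actually carry signal
X, y = make_regression(n_samples=200, n_features=15, n_informative=4,
                       noise=5.0, random_state=1)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

print("L1 coefficients set exactly to zero:", int(np.sum(lasso.coef_ == 0)))
print("L2 coefficients set exactly to zero:", int(np.sum(ridge.coef_ == 0)))
```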
Question: What are the Assumptions of Naive Bayes?
Ans: The key assumption of Naive Bayes is conditional independence: each feature is treated as independent of the others given the class, and each feature is given equal weight (importance). Because the features are modelled individually and weighted equally, transformations that mix features together (such as PCA) are typically not applied before Naive Bayes.
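A minimal Gaussian Naive Bayes sketch on the Iris data; each feature's likelihood is modelled independently given the class, which is exactly the conditional-independence assumption described above:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Each feature gets its own per-class mean and variance, estimated independently
# of the other features -- this is the "naive" assumption
nb = GaussianNB().fit(X, y)
print(nb.predict(X[:5]))
```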