# Implement Commonly Asked ML Algorithms in Interviews from Scratch

This article is here to make sure you don’t fail at the simple stuff during the interview. For example, as a machine learning engineer, you might get an interview question like: could you explain how K-means works?

**Supervised Training**

1. KNN

2. Logistic Regression

In logistic regression, we want to predict **probabilities** by applying the sigmoid function to the output of a linear model. During training, the cost function we usually choose is **cross-entropy loss**.
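To make this concrete, here is a minimal from-scratch sketch in plain Python. The function names (`sigmoid`, `predict_proba`, `cross_entropy`, `train`) and the default `lr` and `epochs` values are my own illustrative choices, not anything standard, and the training loop is plain batch gradient descent.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Linear score w.x + b passed through the sigmoid."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def cross_entropy(y_true, y_prob, eps=1e-12):
    """Average binary cross-entropy over labels (0/1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def train(X, y, lr=0.1, epochs=1000):
    """Batch gradient descent; lr and epochs are illustrative defaults."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # For sigmoid + cross-entropy, dLoss/dz is simply (prediction - label).
            error = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += error * xij / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias
```

A nice detail to mention in the interview: with sigmoid plus cross-entropy, the gradient with respect to the linear score collapses to `prediction - label`, which is why the update looks so simple.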

3. Linear Regression

In linear regression, we want to predict **continuous values** by fitting a linear function to the training data. During training, the cost function we usually choose is **mean squared error**.
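As a rough sketch, the single-feature case can be solved in closed form (ordinary least squares) instead of by gradient descent. The names `fit_line` and `mse` are hypothetical helpers for illustration, and the code assumes the inputs are not all identical (otherwise the variance term is zero).

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers a slope of 2 and an intercept of 1, and the MSE on that data is 0.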

4. Perceptron

**Perceptron** is a very simple binary classifier with a **linear decision boundary**, composed of a **weighted sum of the inputs** (as in linear regression) plus an **activation function**. During training, the cost function we usually choose is a **hinge-style loss** (the perceptron criterion, which only penalizes misclassified examples). In fact, **a neural network with a single neuron** and no activation is the same as linear regression! The only difference is that the neural network post-processes the weighted input with an activation function [7].
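Here is a small sketch of the classic perceptron learning rule, assuming labels are encoded as +1 and -1 and the activation is a step function. The names `predict` and `train_perceptron`, plus the `lr` and `epochs` defaults, are illustrative assumptions.

```python
def predict(weights, bias, x):
    """Step activation on the weighted input: returns +1 or -1."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1

def train_perceptron(X, y, lr=1.0, epochs=20):
    """Perceptron rule: update the weights only on misclassified examples."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if predict(weights, bias, xi) != yi:
                # Move the decision boundary towards the misclassified point.
                weights = [w + lr * yi * xij for w, xij in zip(weights, xi)]
                bias += lr * yi
    return weights, bias
```

On linearly separable data such as the logical AND gate this converges in a handful of epochs; on data that is not linearly separable (XOR, for example) it never settles, which is a common follow-up question.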

5. Decision Tree

**Decision Tree** is a supervised machine learning algorithm. The idea is to build a **binary tree** that splits our data. Unlike logistic regression, there’s **no need to calculate gradients** to determine what the best DT looks like. Instead, it uses **greedy search**: for each feature, it tries every unique value as a threshold and keeps the split with the highest **information gain**, which is measured with **entropy**. **Entropy** plays the role of the loss (impurity measure) that we normally use for DT [8].
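The greedy search for a single split can be sketched as follows. This only finds the best split at one node (a full tree would apply it recursively), and the names `entropy`, `information_gain`, and `best_split` are hypothetical helpers chosen for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

def best_split(X, y):
    """Try every feature and every unique value as a threshold (greedy search)."""
    best = (None, None, 0.0)  # (feature index, threshold, gain)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= threshold]
            right = [yi for row, yi in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue  # a split that leaves one side empty is useless
            gain = information_gain(y, left, right)
            if gain > best[2]:
                best = (j, threshold, gain)
    return best
```

If one feature separates the classes perfectly, the chosen split has an information gain equal to the parent entropy, since both children are pure.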

6. Random Forest

7. AdaBoost (Gradient Boosting)

8. Naive Bayes -> For DS roles, interviewers somehow like to ask these probability questions, maybe because they think it makes them look smart, but just make sure you know Naive Bayes.

**Bayes Theorem**
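For reference, the theorem itself, followed by the way Naive Bayes uses it. The symbols are my own notation: $y$ is the class and $x_1, \dots, x_n$ are the features, and the second line relies on the "naive" assumption that features are conditionally independent given the class.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The predicted class is simply the $y$ that maximizes the right-hand side of the second line.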

**Unsupervised Training**

1. k-means clustering

2. PCA (dimensionality reduction)
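Since the intro raised the "how does K-means work?" question, here is one possible minimal sketch: alternate between assigning each point to its nearest centroid and moving each centroid to the mean of its points, until the assignments stop changing. Initializing the centroids with the first `k` points is a simplifying assumption of mine to keep the example deterministic (real implementations usually pick them randomly or with k-means++), and the names `euclidean` and `k_means` are illustrative.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_means(points, k, max_iters=100):
    # Assumption: naive initialization with the first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = []
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: nothing moved between clusters
        assignments = new_assignments
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments
```

Worth saying out loud in the interview: the result depends on the initialization, and the algorithm only finds a local optimum.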

Fundamental **Statistics** Concepts [6]:

- **Events** could be **independent**, **dependent**, and **mutually exclusive**.
- **Probability** is the likelihood of an **event** occurring.
- **Joint probability** is the likelihood of **more than one event** occurring at the **same time**, P(A and B). It is the probability of the **intersection** of two or more events, written as **P(A ∩ B)**.
- If event A and event B are independent of each other, **P(A ∩ B) = P(A) * P(B)**.
- If event A and event B are **mutually exclusive, P(A ∩ B) = 0.**
- **Conditional probability** is the probability that an **event** *B* occurs given that event *A* has already occurred, **P(B|A)**. It matters when event B is dependent on event A.
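These definitions are easy to check by brute force on a tiny sample space. The sketch below assumes a single roll of a fair six-sided die; `OUTCOMES`, `prob`, `joint`, and `conditional` are hypothetical names, and events are written as predicates over an outcome.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # assumption: one roll of a fair six-sided die

def prob(event):
    """P(event), where event is a predicate over a single outcome."""
    favourable = sum(1 for o in OUTCOMES if event(o))
    return Fraction(favourable, len(OUTCOMES))

def joint(event_a, event_b):
    """P(A and B): both events happen on the same roll."""
    return prob(lambda o: event_a(o) and event_b(o))

def conditional(event_b, event_a):
    """P(B | A) = P(A and B) / P(A)."""
    return joint(event_a, event_b) / prob(event_a)

def is_even(o):
    return o % 2 == 0

def is_odd(o):
    return o % 2 == 1

def at_most_2(o):
    return o <= 2

def greater_than_3(o):
    return o > 3
```

With these, "even" and "at most 2" turn out to be independent (`joint` equals the product of the two probabilities), "even" and "odd" are mutually exclusive (`joint` is 0), and "greater than 3" depends on "even" (`conditional(greater_than_3, is_even)` differs from `prob(greater_than_3)`).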

Classic **Textbook Questions** that might be asked during an ML interview:

- **Generative models** vs **Discriminative models** [5]

- Can you explain what batch normalisation is?
- Can you explain what bias and variance errors are?

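Bias and variance are a way to decompose our machine learning model’s error: bias is the error from overly simple assumptions (underfitting), and variance is the error from being too sensitive to the particular training data (overfitting).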
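Written out for squared error, the decomposition looks like the following. The notation is my own assumption: the data is generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, $\hat{f}$ is the learned model, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The last term is the part no model can remove, which is why we only ever trade bias against variance.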

Basic **Terms** we use a lot in ML algorithms:

- **Epochs**: number of passes over the entire training dataset that the ML algorithm has completed. For example, **one epoch** means the entire training dataset is passed into the ML algorithm **only once**. [1][2]
- **Batch Size**: Because the whole dataset is often too big to feed into the computer **at once**, which would raise a memory error, we divide the dataset into several **batches**. So, batch size is the number of training examples in a single batch.
- **Iterations**: number of batches needed to complete **one epoch**. For example, if there are 2000 training examples in our entire training dataset and the batch size is 500, then it takes 4 iterations to complete one epoch. If the batch size is the whole training dataset, then the number of epochs is the number of iterations.
- **Gradient**: Normally, it refers to **the derivative of the cost function with respect to the parameters**, such as weight and bias in machine learning.
- **Cost Function**: tells us **how good our model is at making predictions for a given set of parameters (weight and bias, for example)**. In order to have **gradients**, this function should be **differentiable** with respect to the parameters.
- **Gradient Descent**: It’s an iterative optimization algorithm **to find a minimum of the cost**. There are a couple of optimizers, such as SGD, Adam, etc. [3]
- **Learning rate**: It’s an important hyperparameter of most ML algorithms. Basically, it tells us **how far we go in the negative gradient direction in each step**. For example, if you choose a smaller learning rate, training might be slow but it will reach our minimum. On the contrary, if you choose a larger learning rate, training might be fast but it can jump around and never reach our minimum.
- **Adaptive learning rate**: Principally, adaptive learning rate algorithms such as **AdaGrad (Accumulating Historical Gradients)** and **Adam** **automatically** adjust the learning rate during training based on the history of the gradients. Put in simple words, we hope for a larger LR in the beginning of training and a smaller LR in the end. **LR cannot be one-size-fits-all: different parameters get different LRs.**
- **Normalizing**: It’s an important **data pre-processing** step to ensure all values of the input data **are within the same range**, which speeds up and stabilizes gradient-based training [4]. For example, normalize our features so that they are all in the range -1 to 1. In practice, min-max scaling is a common choice.
- **Regularization**: It’s a way to prevent overfitting and reduce the variance of a NN by putting a limitation on the weights, such as an L1 or L2 penalty (the L2 penalty is often called weight decay), which makes the NN more robust to noise.
- **Euclidean Distance**: square root of the sum of the squared differences between the coordinates of two points.
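Several of these terms show up together in one small training loop. Below is a rough sketch of mini-batch gradient descent on a one-feature linear model with mean squared error as the cost; the helper names (`iterations_per_epoch`, `min_max_scale`, `minibatch_gd`) and the default hyperparameters are illustrative assumptions, and `min_max_scale` assumes the values are not all equal.

```python
import math

def iterations_per_epoch(n_examples, batch_size):
    """Number of batches needed to see the whole dataset once."""
    return math.ceil(n_examples / batch_size)

def min_max_scale(values, low=-1.0, high=1.0):
    """Rescale values linearly into the range [low, high]."""
    v_min, v_max = min(values), max(values)
    return [low + (v - v_min) * (high - low) / (v_max - v_min) for v in values]

def minibatch_gd(xs, ys, lr=0.1, epochs=200, batch_size=2):
    """Fit y = w * x + b by mini-batch gradient descent on the MSE cost."""
    w, b = 0.0, 0.0
    steps = 0  # total iterations = epochs * iterations per epoch
    for _ in range(epochs):
        for start in range(0, len(xs), batch_size):
            bx = xs[start:start + batch_size]
            by = ys[start:start + batch_size]
            # Gradient of the batch MSE with respect to w and b.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(bx, by)) / len(bx)
            grad_b = sum(2 * (w * x + b - y) for x, y in zip(bx, by)) / len(bx)
            # Step in the negative gradient direction, scaled by the learning rate.
            w -= lr * grad_w
            b -= lr * grad_b
            steps += 1
    return w, b, steps
```

With 2000 examples and a batch size of 500, `iterations_per_epoch` gives 4, matching the example above. Trying a much larger `lr` in `minibatch_gd` is a quick way to watch the "jump around and never reach the minimum" behaviour for yourself.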

Ref:

[2]. https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9

[3]. https://ml-cheatsheet.readthedocs.io/en/latest/optimizers.html#sgd

[4]. https://ml-cheatsheet.readthedocs.io/en/latest/linear_regression.html#normalization

[5]. https://medium.com/better-programming/generative-vs-discriminative-models-d26def8fd64a

[6]. https://medium.com/@mlengineer/joint-probability-vs-conditional-probability-fa2d47d95c4a

[7]. https://ml-cheatsheet.readthedocs.io/en/latest/nn_concepts.html?highlight=perceptron#loss-functions

[8]. https://towardsdatascience.com/entropy-how-decision-trees-make-decisions-2946b9c18c8

[9]. Stanford machine learning cheat sheet: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-deep-learning-tips-and-tricks#data-processing