Hello and welcome to this R tutorial. Here we are at the final round of regression, with our final regression model: random forest regression.

In a previous section we saw the decision tree regression model.

So now that decision tree regression doesn’t have any secrets for you, you will perfectly understand random forest regression, because a random forest is just a team of decision trees, each one making some prediction of your dependent variable, and the ultimate prediction of the random forest itself is simply the average of the different predictions of all the different trees in the forest.

And actually, at the end of the previous section about decision trees, I asked you an enigma.

The enigma was: knowing the result we got with one tree, what would be the result with ten trees, or 100 trees, or 500 trees, in terms of visualization and in terms of prediction?

So I hope that after watching the intuition tutorials made by Kirill, you actually asked yourself this question and tried to predict what’s going to happen here with random forest regression.

So let’s find out about that.

We are going to build a random forest regression model and see what happens.

So let’s do it.

We are going to start by selecting the right folder as a working directory.

So it’s in part two regression.

And here is the final regression model we are building.

Random forest regression.

So let’s go inside, and that’s the right folder that we want to set as working directory, the one with the Position_Salaries.csv file.

So let’s click on this More button and Set As Working Directory.

All good.

And now let’s take our regression template to build this model efficiently.

So we are actually going to take everything from here to the bottom, but we will only include the code section that visualizes the regression model results in higher resolution, because you understood that the decision tree regression model is a non-continuous regression model, and since a random forest is a combination of decision trees, it’s a combination of non-continuous regression models, and intuitively we can guess that the random forest regression is not going to be continuous either.

So since this code doesn’t work for non-continuous regression models, we will actually use this one that works perfectly for them.

So I’m going to copy this, paste it here, and remove the section that is not appropriate for non-continuous regression models.

Here we go.

And now the template is ready.

Let’s change the basics.

Let’s replace ‘Regression Model’ here by ‘Random Forest Regression’: visualising the random forest regression results, and fitting random forest regression to our dataset.

OK great.

So now let’s build the model which is in this section here.

So let’s remove this.

And as usual, we’re going to import the right library for the job and then use a function to build our random forest regressor.

So the package we are going to import is called randomForest.

So for those of you who don’t have the package installed in your packages here, well, you can check it out.

Mine is already installed because I used it before, but I’m going to write this line here for those of you who need to install it: install.packages, parentheses, and in quotes randomForest, so no capital R but then a capital F-o-r-e-s-t. All right: randomForest.

And I’m not going to install it because mine is already installed, so I’m going to put it in comments.

But if you want to install it, you just need to select this line as I just did and press Command or Ctrl plus Enter to execute it.

And this will install the package properly.

But here I’m going to put it in comment by pressing Command plus Shift plus C.

Here we go.

And now what we have to do is to add this line, library(randomForest), to automatically select the box here and import the randomForest package automatically when we execute the whole code or the section.
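
As a quick sketch, the two lines discussed above look like this (the install line is commented out, assuming you only need to run it once):

```r
# Run this once if the randomForest package is not installed yet:
# install.packages('randomForest')

# Load the package so the randomForest() function is available:
library(randomForest)
```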

So that’s important, and now it’s time to build the regressor, so let’s do it.

We’re going to call the regressor ‘regressor’ as usual, to keep things simple, and then equals.

And now the function that we’re going to use is also randomForest, written the same way. So now let’s add some parentheses, and let’s press F1 to have a look at the arguments.

The arguments are here, and the first argument is data, but as you can see it specifies that it’s an optional data frame. We could use this argument to build our regressor, but we are going to use the main arguments to specify the independent variables on one side and the dependent variable on the other side.

And to do this we are going to use these two arguments x and y.

So x will contain the matrix of features, that is the independent variables, and y will contain the dependent variable vector, that is the Salary column.

So let’s first input these two arguments. The first argument is x equals, and we have several ways to take our independent variables. One of the ways is to take our dataset here and then choose the right columns of the independent variables. You know, our dataset is composed of two columns: the first column, indexed by 1, which is the independent variable column, and the second column, indexed by 2, which is our dependent variable column. So here we need index 1, because we want to take the independent variable. Now, the next argument is y, the dependent variable vector.

And now, as you can see, y is expected to be a response vector; it’s actually a vector, whereas with the syntax here we would input a data frame. So by using this index 1 and the two brackets here, I would actually input a data frame, and to get a vector I need to use another trick, another technique, which is to use the dollar sign and then the name of the column, which is of course Salary. And that will give me a vector.

So just to recap: this syntax here will give me a data frame, because we’re taking some sub data frame of our original dataset, and here, by using the dollar sign syntax, that is by taking dataset$Salary, I’m actually taking the Salary column of our dataset, which makes it a vector. And that’s exactly what we want, because the y argument here is expecting a vector.

So we’re all good.

And now we actually need to input a third argument.

Can you guess what that is?

For those of you who followed the Python tutorial, well, you will guess what it’s going to be. It’s actually going to be ntree, the number of trees in the forest.

Well, of course we’re building a random forest, so it’s a lot better if we can choose the number of trees that we build in our forest, and it’s even better considering the fact that we’re going to play around with different numbers of trees: we’re going to start with a forest of 10 trees, and then, you know, we’ll try with a lot more than 10 trees, like 100 trees, or 300 trees, or 500 trees. So let’s input this third argument, ntree, and we’re going to start with 10 trees. All right, so let’s start with this, and those are all the arguments we need to build a random forest.

We only need the independent variables, the dependent variable, and the number of trees, and that will already make a robust random forest regression model; then we will make it even more robust by adding more trees to the forest.

But before we continue, let’s set the random factors to something fixed, so that we all get the same results. You know, in Python we used the random_state parameter equal to zero; here we can do the same in R by using the set.seed function.

And in this function we put a seed, and you know we can use whatever we want: in Python people usually take 0 or 42, and what I like to do is take either 123 or 1234. So let’s use this seed to get the same results, and that will make this tutorial easier to follow if you’re coding at the same time.
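
Putting the pieces together, the fitting step we just described can be sketched like this; the Position_Salaries.csv file, the two-column dataset, and the seed 1234 are taken from the regression template used throughout this part, so adjust them if yours differ:

```r
# Import the dataset (Position_Salaries.csv from the course folder)
dataset = read.csv('Position_Salaries.csv')
dataset = dataset[2:3]   # keep only the Level and Salary columns

# Fit random forest regression to the dataset
library(randomForest)
set.seed(1234)                                 # fix the random factors for reproducible results
regressor = randomForest(x = dataset[1],       # data frame with the independent variable
                         y = dataset$Salary,   # dependent variable as a vector
                         ntree = 10)           # start with a forest of 10 trees
```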

So now we’re all good; we’re actually all good with the whole code.

We don’t have anything to replace.

The only thing that we’ll do now is, you know, try several random forests with several numbers of trees, look at the visualization results, and look at the prediction to see if we’re getting close to the supposed 160K-per-year salary of our new employee that is about to be hired.

So let’s do it.

Let’s execute the sections one by one.

So let’s import the data set first.

Here we go.

Dataset well imported: we make sure we have our two columns, the independent variable Level and the dependent variable Salary.

Perfect.

Now, no need to split the dataset into a training set and a test set.

No need to apply feature scaling, and now it’s time to create our first random forest.

So let’s do this: let’s execute this code section here, and here it is, random forest well created. Perfect.

So now it’s time to have fun.

Would you like to visualize the results first, or get the prediction?

Well, first let’s maybe visualize the results, because we want to make sure we have the right model, and we want to validate it because we will try several numbers of trees. Here we are starting with ten trees, so we want to see if it looks like a correct model.

So I’m going to execute this section.

Here we go and let’s see what we’ll get.

OK, so first of all, this looks fine; we don’t seem to have any problem here.

The only thing that we can improve very quickly is, you know, those straight lines here: they are supposed to be vertical, and to get a better representation of this we just need to increase the resolution, as we did for decision tree regression. So let’s just add one more zero to the increment; that will be sufficient. Let’s re-execute this, and now it’s much better: it almost looks like some vertical straight lines, representing this non-continuous model better.
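
For reference, the higher-resolution visualization section of the template, with the grid increment changed from 0.1 to 0.01, looks roughly like this; `dataset` and `regressor` are assumed to exist from the previous steps, and the plot title is just illustrative:

```r
library(ggplot2)

# Dense grid of levels: 0.01 increment instead of 0.1, so the
# vertical parts of the steps look close to truly vertical
x_grid = seq(min(dataset$Level), max(dataset$Level), 0.01)

ggplot() +
  geom_point(aes(x = dataset$Level, y = dataset$Salary),
             colour = 'red') +                  # real observations
  geom_line(aes(x = x_grid,
                y = predict(regressor, newdata = data.frame(Level = x_grid))),
            colour = 'blue') +                  # model predictions on the grid
  ggtitle('Random Forest Regression') +
  xlab('Level') +
  ylab('Salary')
```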

And so now what can we say.

Let’s zoom in on this plot to have a better look. Now that’s interesting.

OK.

So the answer to the enigma that I asked you in the previous section, and that I was asking you again in this tutorial, is that we simply get more steps in the stairs by having several decision trees instead of one decision tree.

We have a lot more steps in the stairs than what we had with one decision tree, and therefore we have a lot more splits of the whole range of levels, and therefore a lot more intervals: each straight horizontal line here, separated by these vertical lines, is one interval, that is one split. And the fact that we get more steps in the stairs is actually quite intuitive, because, you know, if we take for example this prediction here for the 6.5 level, well, what happened for this prediction is that we had 10 trees voting on which step the salary of the 6.5-level position would be.

And then the random forest took the average of all the different predictions of the salary of the 6.5 level made by all the different trees in the forest.

And for example, if we take the level 4 position, 10 votes were made; each of these 10 votes corresponds to one prediction of the level 4 salary made by each one of those ten trees, and then the random forest took the average of these 10 predictions, and this average is nothing else than the prediction of the level 4 salary made by the random forest itself.

And so we get more steps simply because the whole range of levels is split into more intervals, and that is because the random forest is calculating many different averages of its decision trees’ predictions in each of these intervals.

So that’s what happened; it’s quite intuitive.

However, there is something important to point out here: if we add a lot more trees to our random forest, well, it doesn’t mean we’ll get a lot more steps in the stairs, because the more trees you add, the more the average of the different predictions made by the trees converges to the same average. You know, these trees are based on the same technique, entropy and information gain, so the more trees you add, the more the average of these votes will converge to the same ultimate average, and therefore it will converge to a certain shape of the stairs here.

So that’s important to visualize this as well.

And now that we have our intuition of the visualization of the random forest regression results, let’s see what happens with the prediction.

So let’s see what prediction we get.

Remember that this employee said that his previous salary was 160K. And now let’s see what says the random forest composed of 10 trees.

So let’s look at that, and it says that the previous salary was a hundred and forty-one thousand dollars. That’s actually a very dangerous prediction, because we are way below the 160K salary that this new employee is said to have had in his previous company.
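
The prediction itself comes from the same line as in the regression template, passing the 6.5 level in a one-row data frame (assuming `regressor` is the forest fitted above; the exact value you get depends on the seed):

```r
# Predict the salary of the 6.5 level with the fitted random forest
y_pred = predict(regressor, newdata = data.frame(Level = 6.5))
y_pred   # with ntree = 10, the tutorial got about 141K here
```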

So if we trust this prediction, we will actually think this employee is bluffing. But no worries, we will not stop here.

Right now we’re going to try random forests with a lot more than 10 trees.

So let’s pick for example 100 trees and let’s see what we’ll get.

So I’m going to rebuild the model.

Here we go.

And now let’s look at the graphic results.

And as I was telling you, we don’t get many more steps in this plot of our new random forest regression. You know, we multiplied our number of trees by 10, but the number of steps was definitely not multiplied by 10.

We can compare that very quickly: this is the previous plot and this is the new one, ten versus one hundred trees. We can see that we have maybe a few more steps, but definitely not ten times the previous number of steps.

So the reason for this, the explanation, is related to this convergence idea that I talked to you about. And so what changes here with 100 trees, in terms of the plot, is not the number of steps, which barely increased, but a better choice, a better location, of the steps in the stairs with respect to our salary axis. That means that maybe the steps are better located to make our ultimate predictions of the salaries for each of our levels from 1 to 10.

So to check that out, we simply need to make our final prediction: predicting the salary of the 6.5 level. So let’s recap: the employee is saying around 160K, and the random forest with 10 trees said a hundred and forty-one K. And now let’s see what says the random forest with 100 trees. Executed, and now it says 166K.

So much better.

We’re getting close to the supposed real salary of 160K, and besides, we’re now actually on the good side of the negotiation, because we will no longer think that this employee is bluffing.

So since the prediction seems to be improving as we increase the number of trees, let’s actually try with 500 trees. That’s a huge forest we have now.

So let’s execute this to build our new huge forest of 500 trees.
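
If you want to reproduce the whole experiment in one go, a small loop over the three forest sizes works; this is just a convenience sketch, not part of the original template, and the exact numbers printed depend on the seed:

```r
library(randomForest)

# Compare predictions for the 6.5 level with forests of increasing size
for (n in c(10, 100, 500)) {
  set.seed(1234)   # re-seed before each fit, just as re-running the code section does
  regressor = randomForest(x = dataset[1], y = dataset$Salary, ntree = n)
  y_pred = predict(regressor, newdata = data.frame(Level = 6.5))
  cat(n, 'trees ->', y_pred, '\n')
}
```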

Here we go.

Our forest is created.

Let’s have a quick look at the visualization plot results.

But it’s going to be the same thing: we will not get a lot more steps, maybe a few more.

Well actually let’s check it out.

Well, definitely not: we seem to have the same number of steps in the stairs.

But as I was telling you, each of the steps in these stairs might actually be better located to make each ultimate prediction of the salaries for each of the 10 levels here.

So the best way to check that out is actually to get our ultimate prediction of the salary of this 6.5 level.

And let’s check it out.

Let’s see if we get a better prediction than the 166K.

Executing.

And right on the spot: we hit the bull’s eye with a predicted salary of 160,458.

So awesome job by this random forest with 500 trees, because it predicted almost the same salary as the supposed 160K salary that this employee simply said to have had in his previous company. And actually, so far, before we made this random forest with 500 trees, the best model that made the closest prediction to this 160K salary was the polynomial regression model; and now the random forest regression is beating the polynomial regression model, because now we get a prediction that is almost the same as the real value.

So right on the spot.

Congratulations.

We actually made our final model, and now I just want to conclude this tutorial by making a transition to one of our future parts, Part 10, in which we will build some ensemble machine learning models.

There are some models that are a combination of several machine learning models, and you know, in machine learning these are actually among the best models.

You know, when you have a team of several machine learning models, they can actually make an awesome prediction, because unlike when we have one single machine learning model, where that one model has to be right on its own, well, you are more likely to get the correct prediction with ten machine learning models predicting the same thing than with just one model.

So that’s actually what we did here.

Well, we had a team of the same machine learning models, which were decision tree regression models, but in the future we’ll make a team of different machine learning models.

So that’s going to be very fun.

That’s going to be very powerful as well.

And I look forward to getting there with you.

So now I’m telling you congratulations for two things: first, for building this very powerful regression model, the random forest regression model, and second, for having built all our regression models. We built some linear regression models, some non-linear regression models, some non-linear, non-continuous regression models, and some non-linear, non-continuous ensemble regression models.

So congratulations, you’re definitely on your way to becoming an expert in machine learning.

But wait for what’s coming next.

So speaking of what’s coming next I look forward to seeing you in the next sections or next parts.

And until then, enjoy machine learning.