In the previous tutorial, we learnt how to download and install the R programming language and the RStudio IDE. To work with any dataset, we first have to import it into our IDE. In this article, I’m going to show you how to import a dataset in R, so we can start preparing it for building Machine Learning models.
Getting the Dataset
I have prepared the dataset that we are going to use in this tutorial. It’s just a simple dataset that we will use for the whole data preprocessing phase. I have also included a data_preprocessing.R file, and this file contains the template that we are using to prepare our data for Machine Learning. Both of these files are in a zip file. To download the dataset and the template, click here.
Paste the downloaded files into your preferred working directory. By working directory, I mean the folder in which you are going to keep all your project files. I placed my Data.csv file in a folder on the desktop that I named “Data Science”. This is the folder that I’m going to navigate to and set as the working directory in RStudio.
To import our dataset, we first have to set a working directory. To do this, go to the ‘Files’ tab, which sits in the bottom right pane of RStudio in the default layout, and navigate to where you stored the ‘Data.csv’ file.
Open the folder in which you stored your dataset. Then, under ‘More’, select ‘Set As Working Directory’.
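If you prefer typing to clicking, the same step can be done from the console with R's built-in setwd() and getwd() functions. The snippet below is only a sketch: the path is an assumption based on a “Data Science” folder on the desktop, so replace it with your own location.

```r
# Hypothetical location: change this to the folder that holds Data.csv
project_dir <- file.path(path.expand('~'), 'Desktop', 'Data Science')

# Only switch if the folder actually exists, so the script does not stop with an error
if (dir.exists(project_dir)) {
  setwd(project_dir)
}

print(getwd())                  # the current working directory
print(file.exists('Data.csv'))  # TRUE once the working directory is set correctly
```

If the last line prints FALSE, R is still looking in a different folder, and the import step will not find the file.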
Once inside this directory, open a new R file in RStudio and name it ‘data_preprocessing.R’ (or open the template you downloaded). Save this file in the same directory. In this file, we will need only one line of code to import our dataset from the ‘.csv’ file.
Importing the Dataset
We need to create a variable that will hold the dataset itself. We will call this variable ‘dataset’. Type dataset = read.csv('Data.csv') using plain straight quotes, because curly quotes copied from a web page will cause an error in R. Select the line you just added and then press the ‘Ctrl + Enter’ keys on your keyboard to execute the command.
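A common stumbling block at this step is a "cannot open file" error, which usually means the working directory is not the folder that contains the file. As a rough sketch, the import can be wrapped in a small helper that checks for the file first. Note that import_dataset is a hypothetical name I made up for illustration; it is not part of the template.

```r
# Hypothetical helper (not part of the template): check that the file exists before reading it
import_dataset <- function(path = 'Data.csv') {
  if (!file.exists(path)) {
    stop("Cannot find '", path, "' in ", getwd(),
         ". Check that the working directory is set correctly.")
  }
  read.csv(path)
}

# Run the import only when Data.csv is present in the working directory
if (file.exists('Data.csv')) {
  dataset = import_dataset('Data.csv')
  str(dataset)          # column names and types
  print(head(dataset))  # first six rows
}
```

The plain one-line read.csv call does the same job. The helper only makes the failure message easier to understand.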
In the ‘Environment’ pane, you should see the dataset you imported. Double-click on it and you’ll see the dataset in a table view in RStudio.
As you can see, the indexes here start at 1, unlike in Python, where they start at zero. We have now imported our dataset.
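To see the 1-based indexing in action without touching the dataset, here is a small illustration. The vector values is a made-up example and has nothing to do with Data.csv.

```r
# Made-up vector, used only to illustrate indexing
values <- c('a', 'b', 'c')

print(values[1])               # first element: "a"
print(length(values[0]))       # index 0 selects nothing, so the length is 0
print(values[length(values)])  # last element: "c"
```

The same rule applies to data frames, so dataset[1, ] refers to the first row of the imported data.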
On to the Next One 😉
Let’s continue with taking care of missing data in the next article. Please share this article, and remember to subscribe to get notified every time we post. Thank you.