Topic 3: Importing data into R

Suriyati Ujang

Week 5

In this topic, you will learn about :

  1. Importing from flat files with utils base package
  1. Importing Excel data file
  1. Importing data from other statistical software
  1. Exporting data

Data Wrangling

Data wrangling is the process of converting raw data into a usable form.

Some examples of data wrangling include:

  1. Merging multiple

    data sources into a single dataset for analysis

  2. Identifying gaps

    in data (for example, empty cells in a spreadsheet) and either filling or deleting them

  3. Deleting data

    that’s either unnecessary or irrelevant to the project you’re working on

  4. Identifying extreme outliers

    in data and either explaining the discrepancies or removing them so that analysis can take place

Packages

Flat files with utils base package

Flat files are simple text files where data is stored in a tabular format, such as CSV (Comma-Separated Values) files.

Function Usage
read.table( ) Reads a file in table format and creates a data frame from it
read.csv( ) Where commas are used as separators and periods are used as decimals
read.csv2( ) Where semicolons are used as separators and commas are used as decimals.
read.delim( ) When numbers in your file use periods as decimals.
read.delim2( ) When numbers in your file use commas as decimals
read.fwf( ) Read a table of fixed width formatted data into a data.frame.

Example

About 80 cereal products with their dietary characteristics.

View original data

# Load the haven package
library(haven)

# Read data from a csv file (.csv format)
my_data <- read.csv2(
  "https://perso.telecom-paristech.fr/eagan/class/as2013/inf229/data/cereal.csv")

# Print the first few rows of the data
head(my_data)
                       name         mfr        type calories protein fat sodium
1                    String Categorical Categorical      Int     Int Int    Int
2                 100% Bran           N           C       70       4   1    130
3         100% Natural Bran           Q           C      120       3   5     15
4                  All-Bran           K           C       70       4   1    260
5 All-Bran with Extra Fiber           K           C       50       4   0    140
6            Almond Delight           R           C      110       2   2    200
  fiber carbo sugars potass vitamins shelf weight  cups    rating
1 Float Float    Int    Int      Int   Int  Float Float     Float
2    10     5      6    280       25     3      1  0.33 68.402973
3     2     8      8    135        0     3      1     1 33.983679
4     9     7      5    320       25     3      1  0.33 59.425505
5    14     8      0    330       25     3      1   0.5 93.704912
6     1    14      8     -1       25     3      1  0.75 34.384843

Importing Excel data file

R provides several packages that allow you to read data from Excel files.

Package: readxl

Function: read_excel()

# Load the readxl package
library(readxl)

# Read data from an Excel file 
my_data <- read_excel(
  "data.xlsx", sheet = "Sheet1")

# Print the first few rows 
print(head(my_data))

Package: openxlsx

Function: loadWorkbook() and readWorkbook()

# Load the openxlsx package
library(openxlsx)

# Read data from an Excel file
my_wb <- loadWorkbook("data.xlsx")
my_data <- readWorkbook(
  my_wb, sheet = "Sheet1")

# Print the first few rows of the data
print(head(my_data))

Importing data from other statistical software

reading data from other statistical software formats such as SPSS, Stata, and SAS.

# Load the foreign package (usually not required as it comes with base R)
library(foreign)

# Read data from an SPSS file (.sav format)
my_data <- read.spss("data.sav")

# Print the first few rows of the data
print(head(my_data))

reading data from SPSS and Stata file formats. It provides more consistent handling of data types and value labels.

# Load the haven package
library(haven)

# Read data from a Stata file (.dta format)
my_data <- read_dta("data.dta")

# Print the first few rows of the data
print(head(my_data))

supports reading data from SPSS, Stata, and SAS file formats.

# Load the readstat package
library(readstat)

# Read data from a SAS file (.sas7bdat format)
my_data <- read_sas("data.sas7bdat")

# Print the first few rows of the data
print(head(my_data))

Exporting data

R provides several functions and packages for exporting data to various file formats. Exporting data is essential when you want to save the results of your data analysis, share data with others, or use the data in other applications.

  1. write.table( ) and write.csv( )
  2. write.xlsx( )
  3. write_dta( )
  4. write_sav( )

1. write.table( ) and write.csv( ):

These functions are used to write data frames to text files in tabular format, such as CSV files.

Example: Exporting Data to CSV

# Create a sample data frame
my_data <- data.frame(
  Name = c("John", "Jane","Mike"),
  Age = c(25, 30, 22),
  Score = c(80, 85, 70)
)

# Export data to CSV file
write.csv(my_data, 
          file = "data.csv", 
          row.names = FALSE)

working directory

2. write.xlsx( ):

The openxlsx package provides the write.xlsx( ) function to export data frames to Excel files (.xlsx format).

Example: Exporting Data to Excel

# Load the openxlsx package
library(openxlsx)

# Export data to Excel file
write.xlsx(my_data, file = "data.xlsx")

3. write_dta( ):

The haven package provides the write_dta( ) function to export data frames to Stata files (.dta format).

Example: Exporting Data to Stata

# Load the haven package
library(haven)

# Export data to Stata file
write_dta(my_data, file = "data.dta")

4. write_sav( ):

The haven package also provides the write_sav( ) function to export data frames to SPSS files (.sav format).

Example: Exporting Data to SPSS

# Load the haven package
library(haven)

# Export data to SPSS file
write_sav(my_data, file = "data.sav")

Thank you