R Tutorial (Part 1): The Basics

Published

January 10, 2025

How Does This Tutorial Work?

This tutorial is intended for students who have never worked with R and RStudio before, or who want to refresh their knowledge by starting again with the absolute basics. It guides you step by step through the fundamentals of the statistics program R. Statistical software such as R is needed in many areas of study and research. R is therefore an important tool for empirical work: it will not only help you understand statistical concepts better, but also allow you to carry out statistical analyses yourself.

In this tutorial we explain the structure and basic functioning of R and RStudio. Throughout the tutorial you will find boxes of different colors:

Note

The blue boxes contain useful hints that explain working with R and RStudio in more detail.

Tip

The green boxes cover further topics on the functionality of R and RStudio. If you like, you can skip these boxes on your first pass so that it is easier to focus on the basics.

Warning

The yellow boxes point out pitfalls and possible sources of error. Pay particular attention to these points so that R always does what you want.

Try it out yourself!

In the orange boxes you will find a number of exercises. Please do all the exercises yourself on your own computer.

How to Use This Tutorial

Learning R is just like learning a new language: you have to go over the content again and again until you can apply what you have learned on your own. Don't be too hard on yourself, and don't expect to have internalized everything after working through the tutorial once. Our tutorial is meant to encourage you to try out the material right away. For this reason, we have supplemented the text and examples with images and short GIFs (short silent videos) that show how to use the program. The exercises in the orange boxes are also a central part of the tutorial. In the first of these boxes, we ask you to install R and RStudio on your own computer. Our experience shows that you will only master R and RStudio if you work with them yourself from the very beginning and review the material regularly.

Please do it! It will definitely be worth it!

1 First Steps in R and RStudio

Working with R, as we do here, involves two programs: R and RStudio. They are not the same, but they build on each other.

What is R?

Programming language with a focus on statistics, open-source and free
The “engine”, the program that performs all our calculations
Can do anything we need (and much more)
Should be updated regularly

What is RStudio?

Additional program (editor) for easier use of R, open-source and free
Accesses R in the background, so it does not work unless R is installed
Can do anything R can, but is more user-friendly
Should be updated regularly

1.1 Installation

Note

If you are using a private computer, you need to know how to install programs on it. The individual steps depend on your computer and operating system. Since there are many different computers and operating systems, we cannot offer a detailed installation guide for each of them.

Please familiarize yourself with your device so that you can also install and use programs (“apps”) such as R and RStudio.

The following steps give a brief overview of the most common ways to install programs:

Installation on Computers with Windows Operating System

On Windows, installation files are usually downloaded as executable files with the .exe extension. Double-clicking this file starts the installer, and you only have to follow the instructions on the screen.

Installation on Computers with Mac Operating System

On a Mac, downloaded installation files come in two variants that require different steps. You can recognize them by their file extension, which is either “.pkg” (e.g. the program R) or “.dmg” (e.g. the program RStudio).

File Extension pkg

Double-click the downloaded installation file in the “Downloads” folder (its icon looks like an open parcel).
Follow the instructions displayed on the screen.

File Extension dmg

Downloading the DMG file: Most programs are downloaded from the manufacturer’s website as a .dmg file. This file is usually found in the “Downloads” folder or on the desktop.
Open the DMG file: Double-click the downloaded DMG file to mount it. A new window with the program icon appears. If no window appears, check your desktop - a “virtual drive” is displayed there.
Drag-and-drop: Drag the program icon into the “Applications” folder in the Finder. This step copies the application permanently to your system.
Remove the virtual drive: Right-click the virtual drive on the desktop after copying and select “Eject” or press CMD+E to safely remove the drive.
Security queries: When you open an application that has been downloaded from the Internet, macOS may display a security warning. Confirm that you trust the application by clicking “Open”.

Installation on Computers with Linux Operating System

If you’re using Linux as an operating system, you probably don’t need a guide of this kind… ;-)

To install R, click [here](https://posit.co/download/rstudio-desktop/) and then click 1: Install R. Then we select the correct version for our operating system, download it and install it (like any other program).

To install RStudio afterwards, click [here](https://posit.co/download/rstudio-desktop/) and then click 2: Install RStudio. This should automatically download the correct version for our operating system, which we can then install (like any other program).

Try it out yourself!

Install R and RStudio on Your Computer.

Note: Unfortunately, R and RStudio cannot be installed on tablets running iOS or Android, so you need a laptop or desktop computer. It doesn’t matter whether you use Mac, Windows or Linux.

Updating R and RStudio

R and RStudio should be updated regularly so that we benefit from the latest features and security updates. RStudio usually reminds us automatically when an update is available, and we can take this reminder as an occasion to install the latest version of R as well. Unfortunately, R itself does not remind us of updates.

Neither R nor RStudio updates itself automatically. To update R and RStudio, we simply go back to [the website from above](https://posit.co/download/rstudio-desktop/) and download the latest version, as if we were installing the program for the first time.

On a Mac, a new version of R can simply be installed over the existing one. On Windows, we need to uninstall the old version manually (e.g. via Start > Settings > Apps > Apps & Features) if we want to prevent several versions of R from being installed at the same time. Normally, RStudio automatically finds the latest version of R installed on your computer once you close RStudio completely and reopen it.

RStudio can normally be installed over the existing version on both Mac and Windows, without having to delete the old version manually.

After installing R and RStudio, we only work with RStudio. R itself does not have to be opened at any time (if you want, you can delete the desktop shortcut to R directly to avoid confusion with RStudio).

Next, we open RStudio by clicking on the RStudio shortcut on the desktop, in the Start menu on Windows, or in the Finder under Applications on Mac. When we open RStudio for the first time, the window is divided into three areas: the Console (left), the Environment window (top right), and the multifunction window with tabs such as Help (bottom right).

1.2 R as a Calculator

After opening RStudio, we can directly enter and run R commands/code in the window called Console (bottom left). To get started, we use R like a calculator to execute simple calculations. Type…

1 + 8

…in the Console and confirm the input with the Enter or Return key (↵).

Note

All R commands that we run in the Console or in scripts (we will find out what those are later) are shown in grey boxes in this tutorial.

1 + 8

[1] 9

Directly below these boxes is the output (the result) of the executed command, which appears in the Console on the next line after pressing the ↵ key. Before the result, you will always see [1]. We can ignore the reason for this for now.

You can copy the code in the grey boxes by hovering over the top right corner of the box and clicking the copy icon. You can then paste the code into your Console (or script) with Command + V on Mac or Ctrl + V on Windows.

Copy Code from the Website

Using the symbols on our keyboard we can execute the most important calculation types:

Symbol or Function	Calculation/Operation
`+`	Addition
`-`	Subtraction
`*`	Multiplication
`/`	Division
`^`	Exponentiation
`.`	Decimal point
`(` and `)`	Structure of the calculation (Parentheses)
`sqrt()`	Square root
`log()`	(Natural) Logarithm

Try it out!

7 / 8

Solution

[1] 0.875

1.6 * 7

Solution

[1] 11.2

log(54)

Solution

[1] 3.988984

3^2

Solution

[1] 9

(3 + 6) * 4

Solution

[1] 36

3 + 7 * 4

Solution

[1] 31
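The last two exercises differ only in their parentheses. The following short sketch (not one of the original exercises) lets R confirm the usual order of operations: exponentiation comes before multiplication, and multiplication before addition.

```r
# Multiplication is carried out before addition ...
3 + 7 * 4      # same as 3 + (7 * 4): 31
(3 + 7) * 4    # parentheses force the addition first: 40

# ... and exponentiation before multiplication
2 * 3^2        # same as 2 * (3^2): 18
```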

1.3 Logical Comparisons

We can compare two numbers in R and determine whether they are equal or unequal. We can also check whether one number is larger or smaller than another. R answers such a question with either yes (TRUE) or no (FALSE). These comparisons are called logical comparisons and are not limited to numbers (more about this later).

For example, if we’re interested in whether the number 7 is greater than the number 3, we can find out with the code 7 > 3:

7 > 3

[1] TRUE

Since 7 is actually larger than 3, R gives us the response TRUE.

Symbol on the Keyboard	Calculation or Operation
`==`	Equal
`!=`	Not equal
`>` or `<`	Greater than or less than
`>=` or `<=`	Greater than or equal to, or less than or equal to

Warning

To check whether two numbers are identical, you have to use the operator ==, which returns TRUE (yes, the numbers are identical) or FALSE (no, the numbers are not identical). A single = assigns a value to a variable (more about that later).
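A minimal sketch of this difference (the object name x is only an example; assignments are explained in section 1.5):

```r
x = 5     # a single = assigns the value 5 to x (nothing is printed)
x == 5    # a double == compares: is x equal to 5? TRUE
x == 6    # FALSE
```

In the rest of this tutorial we use the assignment arrow <- for assignments rather than a single =.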

Try it out!

8 > 7

Solution

[1] TRUE

3 > 4

Solution

[1] FALSE

3 <= 4

Solution

[1] TRUE

4 >= 4

Solution

[1] TRUE

6 == 7

Solution

[1] FALSE

6 != 7

Solution

[1] TRUE

8 != 8

Solution

[1] FALSE

Combined comparisons with logical AND or logical OR

In addition to the simple logical comparisons above, we can also combine several logical comparisons. The most important ways to link them are the logical AND and the logical OR.

Symbol on the Keyboard	Operation	Meaning
`&`	logical AND	Are both comparisons true?
`|`	logical OR	Is at least one of the comparisons true?

In response to a combined logical comparison you get the return value TRUE or FALSE again. To make sure that R evaluates all symbols in the order we want, it can be useful to place brackets.

Example 1:
Is the number 7 greater than 5 AND is the number 9 less than 8?

(7 > 5) & (9 < 8)

[1] FALSE

Example 2:
Is the number 7 greater than 5 OR is the number 9 smaller than 8 (or both)?

(7 > 5) | (9 < 8)

[1] TRUE
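Because R evaluates comparisons such as > and < before & and |, the brackets in the two examples are not strictly required; they mainly make the code easier to read. A quick check, as a sketch:

```r
# Without brackets: the comparisons are evaluated first, then & or |
7 > 5 & 9 < 8   # FALSE, same as (7 > 5) & (9 < 8)
7 > 5 | 9 < 8   # TRUE,  same as (7 > 5) | (9 < 8)
```

If you are ever unsure about the order, adding brackets as in the examples above is the safer choice.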

1.4 Data Types Part 1

In the following we want to introduce you to the most important data types that we will encounter in our work with R.

numeric, integer, double

As we have seen in section 1.2, R can handle numbers. For this data type, R uses the term numeric. In some situations, R further distinguishes whether a number is a whole number (integer) or a decimal number (double). Mathematical calculations can be carried out with each of these three data types. For our applications, it usually doesn’t matter whether R treats a number as numeric, integer or double, so we will not discuss the differences in detail.
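If you are curious how R classifies a number, the base functions class() and typeof() show it (functions in general are covered in section 3.1). The L suffix used below, which marks a whole number explicitly as an integer, is an extra not covered elsewhere in this tutorial:

```r
class(7)     # "numeric"
typeof(7)    # "double": numbers are stored as decimal numbers by default
typeof(7L)   # "integer": the L suffix creates an integer
7L + 0.5     # calculations work across both types: 7.5
```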

logical

In section 1.3, a logical comparison returned information that can take only two values: TRUE or FALSE. This information has the data type logical in R. If a value is stored as logical, R knows that only these two values are possible.

Note

In fact, some mathematical calculations are also possible with the data type logical. How is that possible? In R, the logical value TRUE is treated as $1$ and the value FALSE as $0$. Thus, the calculation TRUE + TRUE should result in $2$:

TRUE + TRUE

[1] 2

This property of logical values may seem pointless at first, but we can make use of it on many occasions. For example, for any set of logical values, we can simply add up all the elements to find out how many of them are TRUE.
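As a sketch of this idea: the function c(), which combines several values into one object and is only introduced later, collects a few logical values, and sum() then counts how many of them are TRUE. The object name answers is only an example, and the assignment arrow <- is explained in section 1.5.

```r
# Four comparisons, combined into one object with c()
answers <- c(8 > 7, 3 > 4, 3 <= 4, 6 == 7)
answers        # TRUE FALSE TRUE FALSE
sum(answers)   # TRUE counts as 1, FALSE as 0: 2
```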

character (string)

Another important capability of R is handling text, which can vary greatly in length, from a single letter to an entire book. Text is a sequence of characters in a fixed order (a “string” of characters), which is why text values are called strings. In R, the terms character and string are used interchangeably.

Due to the variable length of text, it is necessary to clearly mark where a string begins and ends. This is done using quotation marks "" placed around the text.

"This is my text"

[1] "This is my text"

Of course, text cannot be used for meaningful mathematical calculations, so trying to do so produces an error message stating that part of the calculation is not a number (non-numeric):

"This is my text" + 1

Error in "This is my text" + 1: non-numeric argument to binary operator
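The quotation marks decide the data type: a number written in quotation marks is text as well. A quick way to see this is the base function class(), used here only as an illustration:

```r
class(7)                   # "numeric"
class("7")                 # "character": the quotation marks turn 7 into text
class("This is my text")   # "character"
# "7" + 1 therefore fails with the same error as above
```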

1.5 Assignments

Everything we work with in R is referred to as an object. For example, the result of a logical comparison is an object of type logical. Often, we don’t want to create objects just once and immediately use them; instead, we may want to perform further calculations with them later. For this, we can consciously create objects with names we assign, using what is called an assignment. This allows us to reuse the created and named objects without needing to re-execute the original command.

The assignment is done with the assignment arrow <- (a “less than” sign directly followed by a “minus” sign). To the left of the arrow is the object name, which we later use to access the contents of the object. To the right of the arrow is the operation whose result is the content we want to store. Once we have created the object, it appears in the Environment window (top right in RStudio), which also shows useful information about the object, such as its data type.

An Example

We want to run the operation $\sqrt{x}$ for $x = 7$ once and save the result as an object called “A”.

Typically, the programming process is:

Determine the object name
Assignment arrow
Operation

A <- sqrt(7)

Assignment

If we want to check whether the assignment has worked and the object A now contains the square root of $7$, we can simply run the object A. That is, we enter A into the Console after the assignment and press the Enter key:

A <- sqrt(7)
A

[1] 2.645751

If we now execute only the command to the right of the assignment arrow, we get the same value:

sqrt(7)

[1] 2.645751

Note

If we execute the assignment command in the Console, R does not display the value that was assigned. With the assignment

A <- sqrt(7)

we have only instructed R to perform the assignment, not to display the value. The content of the object is only displayed when we run either the object name to the left of the assignment arrow (A) or the command to the right of the assignment arrow (sqrt(7)) in the Console:

A

[1] 2.645751

sqrt(7)

[1] 2.645751

An object can also be overwritten by assigning a new content to it with the assignment arrow. However, the old object is then lost. Therefore, when assigning an object, we have to be aware of whether we’re already using the selected name elsewhere and whether we really want to overwrite the object for good. An overview can be found in the Environment window, because all created objects are listed there.
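A minimal sketch of overwriting, reusing the object A from the example above:

```r
A <- sqrt(7)
A           # 2.645751
A <- 10     # overwrites A: the square root of 7 is no longer stored
A           # 10
```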

It is recommended to choose clear and meaningful object names, so that you know intuitively what an object contains. For example, pers_ID <- 123456 assigns a person ID to an object.

Note

The naming of objects is up to you. Choose the names that best describe the content of the named object. However, some characters are not allowed: as a rule, an object name must not contain spaces or hyphens and must not start with a digit or an underscore.

Allowed:

my_object <- 4
my.object <- 4
my2ndobject <- 4
objectNumber2 <- 4

Not allowed:

my object <- 4
my-object <- 4
2ndObject <- 4
_objectNumber2 <- 4

Reserved Object Names

In addition to the above rules, there are some specific words that are not allowed as object names because they are reserved for special objects. You can display the list of all reserved words using the following command.

?Reserved
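If you are unsure whether a name is allowed, the base function make.names() can serve as a rough check: it returns a syntactically valid name unchanged and otherwise shows a corrected version (invalid characters become a dot, an X is prepended where needed, and reserved words get a dot appended). We use it here only to illustrate the rules above; it is not a required step when creating objects.

```r
# Valid names come back unchanged
make.names(c("my_object", "my.object", "my2ndobject", "objectNumber2"))

# Invalid names are modified:
# "my.object" "my.object" "X2ndObject" "X_objectNumber2"
make.names(c("my object", "my-object", "2ndObject", "_objectNumber2"))

# Reserved words get a dot appended: "if."
make.names("if")
```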

Try it out!

Calculate $16^3$ and assign the result to the object $z$.
Calculate the square root of $z$ and assign the result to the object $y$.
Calculate the natural logarithm of $y$ and assign the result to the object $x$.
Perform steps 1 - 3 without assigning intermediate results in one line of code and compare the result with the value stored in $x$.

Solution

z <- 16 ^ 3
y <- sqrt(z)
x <- log(y)
x

[1] 4.158883

log(sqrt(16 ^ 3))

[1] 4.158883

Of course it is also possible to assign any other data type to an object. For example, if we save text in an object, this object will have the type character. We can find information on the data type contained in an object in the Environment window.

Create a Character Object
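For example, we can store the text from section 1.4 in an object and check its type with the base function class() (the object name my_text is only an illustration):

```r
my_text <- "This is my text"   # store a character string in an object
my_text                        # prints "This is my text"
class(my_text)                 # "character"
```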

2 Reproducible work with RStudio

2.1 R Scripts

So far, we have programmed in the Console. In theory, this would be sufficient to use all functions of R. However, this approach is not recommended! As soon as we close RStudio, all our calculations are lost and are not easily reproducible.

Using R in the Console is like shouting individual tasks to our butler James while decorating the ballroom. James immediately carries out each task as we call it out. A more efficient approach would be to think through everything we want to accomplish, write a list of tasks, and hand it to James. He would then complete each task in the specified order.

The advantage of this approach is that we can revisit the list the next day to see exactly which steps were needed to decorate the ballroom in exactly the same way as before.

Such a procedure is also efficient when using R. The list of work assignments for our butler R is called a script.

Create a Script

To create a new script, we click on File > New File > R Script.

Alternatively, we click on the New File icon in the top left of the toolbar and choose R Script. The new script will open in the upper left area.

Create a Script

When we first opened RStudio after the installation, the screen was divided into three areas. After creating a script, a fourth area, the script editor, appears at the top left.

Adjusting the RStudio Windows

The following GIF shows how we can change the size of the individual windows in RStudio.

Change Window Sizes

You can also arrange the Console on the right side of the screen, next to the script. This can be useful if we want to see both the script and the Console at a large size. To arrange the Console on the right, we go to the menu bar > Tools > Global Options > Pane Layout. Here we can select the window arrangement and confirm with Apply.

Arrange Console on the Right

In the script, we can now write all the commands that we want to execute. Unlike in the Console, however, pressing the Enter key does not execute the command; it simply inserts a line break, and we can write another command on the next line.

The script is a collection of commands that are performed to reach a specific goal in a specified order (line by line from top to bottom).

If we want to run a line of the script, we can place the cursor in that line (e.g., by clicking somewhere in the line with the mouse) and then:

press the key combination Ctrl + Enter (or Ctrl + Return on some keyboards). On a Mac, Command + Enter also works.
click the Run icon in the top right corner of the script window.

If we want to run several lines or specific parts of one or more lines, we can highlight the desired commands and then execute them using either option 1 (Ctrl + Enter) or option 2 (clicking Run).

Execute a Script

Try it out!

Open a new script in RStudio
Copy the following lines and paste them into the new script.
```
x <- 4
y <- 5
x + y # Addition 1
x <- x + y
x + y # Addition 2
```
Select all lines in the script and run the code with Ctrl-Enter (Mac: command-Enter). What do you notice?
Solution
```
x <- 4
y <- 5
x + y # Addition 1
```
```
[1] 9
```
```
x <- x + y
x + y # Addition 2
```
```
[1] 14
```
Addition 1 and 2 have different results, because the value of object x is changed between the two calculations.
Swap the third (x + y) and fourth line (x <- x + y), select the entire code again and execute it. What do you notice?
Solution
```
x <- 4
y <- 5
x <- x + y
x + y # Addition 1
```
```
[1] 14
```
```
x + y # Addition 2
```
```
[1] 14
```
Addition 1 and 2 now have the same result because the value of x was changed before the first addition. Although the same commands were executed overall, their order makes a crucial difference.

Save Script

To save a script, we have several options:

In the menu bar: File > Save
Using the key combination Ctrl + S (Windows) or Command + S (Mac)
By clicking the disk (save) icon in the upper left corner of the script window in RStudio.

Comments

In many cases, it makes sense to include comments in our script so that other people, or we ourselves, can later understand what was calculated.

Comments are written after a # character. When executing the code, R recognizes that everything from the # to the end of the line is a comment and does not execute it.

# calculating 4 to the power of 5:
4 ^ 5

[1] 1024

For comments that go over more than one line, there must be a # at the beginning of each line. It is also possible to write a comment directly behind an executable command:

4 ^ 5 # calculating 4 to the power of 5

[1] 1024

We can see that only the code before the # is executed.
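A comment that spans several lines, as described above, simply repeats the # at the start of each line. A short sketch:

```r
# Calculate 4 to the power of 5.
# This comment spans two lines, so each line starts with #.
4 ^ 5   # only the code before the # is executed: 1024
```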

What are “useful” comments?

When you are just learning R, it makes sense to comment on what the commands in your script do. The more familiar you become with R, the less you will need such comments, because the commands themselves show what your code does (after all, you can always look up how a command works later). It is therefore more useful to comment on why a certain piece of code is needed to achieve the goal of your analysis (and why you didn’t choose another way). Such information is not apparent from the code itself, and it is quite amazing how quickly you forget what you were thinking when you wrote something.

2.2 Dealing with the Workspace

When we work in R, we usually create different objects (e.g. x <- 42), which are then displayed in our Environment window in RStudio. At some point we are finished with our statistical analysis for today and want to close RStudio. If we close RStudio, we are asked by the program if we want to save the Workspace, which contains all our R objects from the Environment window (and a few other things).

Unlike our R script, we do not want to save the Workspace (in 99% of cases)!

Why don’t we save the workspace?

Only what is written in our R script is reproducible! If we have tried things in the R Console, deleted or adapted commands in our script, we don’t know exactly how the objects came about in the Environment.
Therefore, it is good practice to force ourselves to include all essential steps for the analysis in the script by starting with a “fresh” working environment (= empty Environment) every time we work with R.

If we saved the Workspace when closing RStudio, all R objects from the previous session would automatically reload upon reopening. While this might seem convenient, it often causes issues in practice.

Adjusting Workspace Settings

In RStudio’s settings, we can specify that…

no workspace should be loaded when opening, and
when closing RStudio the workspace is not saved and we are no longer asked about it.

To do this, we click on Tools > Global Options and, under General, uncheck “Restore .RData into workspace at startup” and set “Save workspace to .RData on exit” to “Never”.

Try it out!

Adjust the workspace settings in RStudio as described.
Create at least one R object, e.g. with the command x <- 10. Close RStudio and note that you are no longer asked if you want to save the workspace (if you have made changes to an R script, you will still be asked if you want to save the script).
Reopen RStudio and confirm that the previously created R object no longer exists. You can either try to display the object (e.g. by entering the name of the object in the Console and pressing Enter) or you can make sure that the Environment window in RStudio is empty.

2.3 Restarting R in the background

Even while working in RStudio, it makes sense to reset R regularly and create a “fresh working environment” (= empty Environment). This ensures that the code in our R script works as expected and that we can actually reproduce all essential analysis steps.

Instead of closing and reopening RStudio, we can instead restart R in the background by either…

clicking on Session > Restart R or
pressing Command + Shift + 0 on a Mac or Ctrl + Shift + F10 on Windows.

After that, we can either…

manually re-run our code in the script step by step, or
use the key combination Command + Shift + Enter (Mac) or Ctrl + Shift + Enter (Windows) to execute the complete script that is currently open.

Try it out!

Create a new R script containing only the following three lines:
```
x <- 1
y <- 2
x + y
```
Run the three commands in a row in the script line by line.
Enter the following command directly in the Console: y <- x
Re-run the third line of the script and compare the result with the result you would expect from reading the script alone.
Restart R in the background by using the described key combination.
Re-execute the complete script by using the described key combination. Check the result.

3 Working with Data

3.1 Functions

Arguments


Functions perform operations for us by accepting specific input values, known as arguments. This is similar to mathematical functions, which specify the calculations to be performed with the argument $x$ to produce a result $y$, e.g.:

\[ f(x) = x^2 + 5 \]

Similarly, functions in R can process one or more arguments and return results such as mean values, sums or more complex operations.

An Example

One relatively simple function is the exponential function $e^x$. Its only argument is the exponent.

Using the exponential function in R for $x = 1$:

exp(1)

[1] 2.718282

It is often useful to use a result of functions as an argument for another function. These functions are then nested in a certain order.

An Example

To calculate $e^{\sqrt{x}}$ we would first calculate the square root of $x$ with the function sqrt(x) and then insert the result as an argument in the exponential function exp():

Warning

The order of nesting matters, e.g.:

$e^{\sqrt{1}} \neq \sqrt{e^{1}}$

exp(sqrt(1))

[1] 2.718282

sqrt(exp(1))

[1] 1.648721

In mathematics, it is also possible to process more than one argument in one function, such as in the equation for a plane in three-dimensional space:

\[ f(x, y) = 2x + 3y \]

Similarly, in R, many functions require multiple arguments. These arguments are separated by commas within a command and have distinct names to differentiate them. Some arguments have default settings, which R uses if no value is specified for that argument.

An Example

This becomes clearer with the example of the round(x, digits) function, which rounds a number to a chosen number of decimal places. As the first argument, we pass the number to be rounded; as the second argument, the number of decimal places. The name of the argument that determines the number of decimal places is digits. Its default value (used if you don’t pass an argument for digits) is zero, so the number is rounded to zero decimal places if digits is not specified.

The call round(4.12345, digits = 2) rounds the number 4.12345 to two decimal places, i.e. to 4.12.
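To see the default value of digits at work, compare calls of round() with and without the digits argument:

```r
round(4.12345)               # digits not given, the default 0 is used: 4
round(4.12345, digits = 1)   # 4.1
round(4.12345, digits = 2)   # 4.12
```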

Do arguments have to be named?

You will notice as we proceed that arguments can be specified in two ways. In the example above, round(4.12345, digits = 2), the first argument, the number to be rounded (4.12345), is provided unnamed, while the second argument, the number of desired decimal places (2), is provided named with digits =.

In principle, both approaches are possible for any argument, but they will only produce the same result if the expected order of arguments for the command is followed. If arguments are left unnamed, the command cannot distinguish whether 2 refers to the number to be rounded or the number of decimal places. We can clarify this for the command by adhering to the order of arguments specified in the corresponding help page (see section 3.1.2).

The process becomes less error-prone and much clearer if we follow this convention:

The first argument of a command (the main argument with which the command works) is not named.
All further arguments that we want to specify are named with the corresponding name (e.g. digits =).

Try it out!

round(4.12345, 2)

[1] 4.12

yields the same result as

round(x = 4.12345, digits = 2)

[1] 4.12

On the other hand,

round(2, 4.12345)

[1] 2

does not lead to the desired result.

The command assumes that the number 2 should be rounded to 4.12345 decimal places. A fractional number of decimal places makes no sense, so the command simply uses 4 decimal places instead, and rounding 2 to four decimal places leaves it unchanged.

By naming the arguments, we can give them in any order:

round(digits = 2, x = 4.12345)

[1] 4.12

Although this reversal of the order would technically work, it is rather uncommon. A widely accepted style convention suggests that the main argument of the command, which is expected in the first position, should remain unnamed, while all subsequent arguments can be provided in any order but should be explicitly named:

round(4.12345, digits = 2)

[1] 4.12

Functions without Arguments

In R there are also functions without any argument, such as:

Sys.time() # the point in time at which this command was executed

[1] "2024-12-10 16:44:36 CET"

On a technical level, whenever something happens in R (calculating, displaying, or processing any kind of information), a function is executed.
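This even applies to the arithmetic and comparison operators from sections 1.2 and 1.3: + and > are themselves functions, which can be called like any other function if their name is enclosed in backticks. This is only meant to illustrate the point; in practice we always write 1 + 8 or 7 > 3.

```r
`+`(1, 8)   # the same as 1 + 8: 9
`>`(7, 3)   # the same as 7 > 3: TRUE
```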

Help

If we want to understand how a function works, such as the names of its arguments, we need to consult the help documentation. The help section can be found in the “multifunction window” at the bottom right, under the Help tab.

We can open the help window even faster by typing in a question mark right before the function’s name in the Console (e.g., ?log) and executing the command.

On the help page, we can see under Description or Usage that the log() function, by default, calculates the natural logarithm (which is almost always the one needed in statistics). However, we could specify a different base using the base argument.
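For example, the following sketch compares the default natural logarithm with logarithms to base 10 and base 2, set via the base argument:

```r
log(100)              # natural logarithm (base e): 4.60517
log(100, base = 10)   # base-10 logarithm: 2
log(8, base = 2)      # base-2 logarithm: 3
```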

Try it out!

Use the help function to find out why the following calculation returns 0 as its result.

round(0.5, digits = 0)

[1] 0

Solution

On the help page for ?round, under Details, it says:

Note that for rounding off a 5, the IEC 60559 standard (see also ‘IEEE 754’) is expected to be used, ‘go to the even digit’. Therefore round(0.5) is 0 and round(-1.5) is -2.
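To see the “go to the even digit” rule for several numbers at once, here is a short sketch (it uses c(), which combines several values and is introduced later):

```r
round(c(0.5, 1.5, 2.5, -1.5))   # 0  2  2 -2
```

Note that 0.5 and 2.5 are rounded down while 1.5 is rounded up: in each case the result is the even neighbor.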

The help in R is only useful if we already know the name of a function. But what do we do if we don’t know the name of a function (or have forgotten it)? In other words, how can we find out in practice which commands or functions in R can help us with a specific task?

Typically, we enter a question formulated as precisely as possible into the search engine of our choice.

Oftentimes, such a search reveals that a useful function for us is included in a specific R package.
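Besides a web search, R can also search a little on its own, assuming we remember at least part of a function's name or topic. apropos() lists the available objects whose names contain a given text, and ?? (shown as a comment here, since in RStudio it opens its results in the Help tab) searches the help pages of all installed packages. Packages that are not installed yet cannot be found this way, which is why a web search is usually still needed.

```r
# names of all available functions/objects that contain "round"
apropos("round")

# full-text search in the help pages of installed packages:
# ??logarithm
```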

Packages

When we install R, a large number of functions are already available in the base version. However, it is possible to significantly extend the functionality through so-called packages. Most of these packages are continuously maintained and expanded by a large group of developers and are freely available to all R users.

Often, packages are created by researchers for a specific purpose and contain a series of commands designed to build upon each other and help solve a typical problem.

If we want to use commands from a package, we first need to download (“install”) the package. This can be done either with the command install.packages("PACKAGENAME") or by using the Packages tab in the bottom-right panel of RStudio.

Installing Packages

In both cases, we need to know the exact name of the package. For example, if we enter the name incorrectly in the command install.packages("PACKAGENAME") (or forget the quotation marks), the command will fail:

install.packages("Psych") # the package is called "psych" with a lowercase p

Warning: package 'Psych' is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

Warning: Perhaps you meant 'psych' ?

Caution

The error message…
Warning: package 'Psych' is not available for this version of R
…does not mean that our version of R is outdated!
In this case, we simply made a typo (R is correct: there is no package called ‘Psych’ for our version of R, but there is one called ‘psych’).

Just as we only need to install programs on our computer once, packages also generally need to be installed only once to be used repeatedly. However, after an update to R (not RStudio), it is often necessary to reinstall previously installed packages.

To use the commands from a package, we first need to make it available to R. This process is called loading the package. The best way to do this is with the command…

library(PACKAGENAME)

… which we normally place at the very beginning of our script, so we don’t forget to load the necessary packages.

Of course, the command must not only be in the script, it also has to be executed!

Note

A package usually only has to be installed once, but after each restart of R or RStudio it needs to be loaded again so that we can use the commands in it.
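The difference between installed and loaded can be checked directly in R. A minimal sketch, using psych as the example package: the first line checks whether the package is installed on this computer, the second whether it is currently loaded in this R session. Both return TRUE or FALSE; after a restart, the second one returns FALSE until library(psych) has been run again.

```r
# installed on this computer?
"psych" %in% rownames(installed.packages())

# loaded (attached) in the current R session?
"package:psych" %in% search()
```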

An Example

For our example, we want to use the so-called logistic function:

\[ f(x) = \frac{1}{1 + e^{-x}} \]

Since we read online that the package psych includes a command logistic() that can compute this function, we will install and load this package. Before the package is installed and loaded, we cannot use the command it provides.

logistic(0)

Error in logistic(0): could not find function "logistic"

Therefore, we install the package once, as described above, via the Packages tab or using the following code:

# install once:
install.packages("psych")

If installing the package worked, we can now load the package:

# load:
library(psych)

Then we can use the command logistic():

logistic(0)

[1] 0.5
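As a quick plausibility check, we can evaluate the formula at x = 0 directly in base R. Base R also contains plogis() (the distribution function of the logistic distribution), which with its default settings computes the same curve; it is shown here only for comparison.

```r
1 / (1 + exp(-0))  # the formula from above, evaluated at x = 0
# [1] 0.5
plogis(0)          # base R equivalent, no package required
# [1] 0.5
```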

Try it out!

1. If you haven’t already completed the steps described above, try using the function logistic() before installing and loading the psych package.
2. Then, install the package as described above.
3. Load the package as described above.
4. Execute the following commands:

   logistic(0)
   logit(0.5) # this is the inverse function

5. Close RStudio, open it again, and try running the commands from step 4 again.

Solution

Without loading the package, it won’t work:

logistic(0)

Error in logistic(0): could not find function "logistic"

logit(0.5) # this is the inverse function

Error in logit(0.5): could not find function "logit"

Only after we have loaded the package does it work:

library(psych)
logistic(0)

[1] 0.5

logit(0.5) # this is the inverse function

[1] 0

Caution

When installing a package using the function install.packages("PACKAGENAME"), the package name must be in quotation marks. However, when loading it with the function library(PACKAGENAME), quotation marks are not required.

Note

When we open a script that tries to load a package that has not been installed yet, RStudio notifies us: at the top of the script, the message “Package PACKAGENAME required but is not installed.” appears, together with the options Install and Don’t Show Again. Clicking Install installs the package directly.

Writing Custom Functions

If we need a function that has not yet been implemented in R and is not included in any package, it is also possible to write our own functions.

This allows us to execute specific code repeatedly without having to rewrite (or copy) the entire code each time.

Functions are also objects and can be stored in variables. We then call the function using these variables. A function is structured as follows:

function_name <- function(<function parameters>) {
    # Code
}

To call a function, we add () after its name, with any arguments inside the parentheses.
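Both points, functions stored in objects and the role of (), can be tried out with a tiny function. The names add_one and increment are made up for this illustration. Typing add_one without () would only print the function's code instead of running it.

```r
# a hypothetical function that adds 1 to its argument
add_one <- function(x) {
    x + 1
}

add_one(2)            # with (): the function is called
# [1] 3

increment <- add_one  # the function object is stored under a second name
increment(2)          # calling it via the new name gives the same result
# [1] 3

class(add_one)        # functions are objects of class "function"
# [1] "function"
```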

Now, we can write a simple function that executes the logistic function described above when called:

logistic_function <- function(number) {
    1 / (1 + exp(-number))
}

This code does not produce any output yet because, although we have written the function, we have not called it. To execute the function, we use its name and add () with the argument the function should work with (just like with the functions we’ve used so far).

When writing the function, we specified in the () after function that it should work with an object called number. In the next line, within the {}, we wrote the operation $\frac{1}{1 + e^{-x}}$ with number in place of x; this line defines what our function computes.

A brief test of the function shows that it does what it’s supposed to do:

logistic_function(number = 0)

[1] 0.5
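We did not need a return statement here: an R function automatically returns the value of the last expression it evaluates. The same function can also be written with an explicit return(), which some people find easier to read in longer functions. A sketch (the name logistic_function_explicit is only for illustration):

```r
logistic_function_explicit <- function(number) {
    result <- 1 / (1 + exp(-number))  # store the intermediate result
    return(result)                    # explicitly hand it back to the caller
}

logistic_function_explicit(number = 0)
# [1] 0.5
```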

Many functions take multiple arguments. For example, we can write a function that returns the sum of two numbers (essentially recreating what the existing sum() function does for two numbers). We use the two placeholders x and y as arguments, which will later be replaced by the numbers we want to add when we call the function:

my_sum <- function(x, y) {
    x + y
}

my_sum(x = 3, y = 5)

[1] 8
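What we learned about arguments earlier also applies to our own functions: arguments can be passed by position or by name, and named arguments may appear in any order. A short sketch with my_sum() (its definition is repeated so the block runs on its own):

```r
my_sum <- function(x, y) {
    x + y
}

my_sum(3, 5)          # by position: x = 3, y = 5
# [1] 8
my_sum(y = 5, x = 3)  # by name, in a different order
# [1] 8
```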

Try it out!

Write a function square() that takes a number as an argument and returns its square.
Solution
```
square <- function(x) {
  x * x
}

square(x = 3)
```
Write a function logit_function() that calculates the inverse of the logistic function: \[ f(y) = \ln\left(\frac{y}{1 - y}\right) \]
Solution
```
logit_function <- function(number) {
  log(number / (1 - number))
}

logit_function(number = 0.5)
```
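Since the logit function is the inverse of the logistic function, applying one after the other should give back the starting value. A sketch that checks this for the two custom functions (their definitions are repeated so the block runs on its own):

```r
logistic_function <- function(number) {
    1 / (1 + exp(-number))
}

logit_function <- function(number) {
    log(number / (1 - number))
}

logit_function(logistic_function(2))    # back to 2 (up to rounding)
# [1] 2
logistic_function(logit_function(0.5))  # back to 0.5
# [1] 0.5
```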

3.2 Data Structures

Simple Data Structures: Vectors

So far, we have only stored a single value when assigning an object. However, we often want to work with a whole set of values. To store a series of values of the same data type, we use vectors. You’ve probably encountered the term “vector” in math class.

If we want to combine multiple components (e.g., numbers) into a vector in R, we use the function c(). As we learned earlier with function arguments, the individual values are separated by commas inside the parentheses.

As we know from math class, we can also perform calculations with vectors. For example, we can subtract two vectors of the same length from each other:

\[ \vec{v} = \begin{pmatrix} 3 \\ 8 \\ 5 \end{pmatrix} - \begin{pmatrix} 1 \\ 5 \\ 2 \end{pmatrix} = \begin{pmatrix} 3 - 1 \\ 8 - 5 \\ 5 - 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 3 \end{pmatrix} \]

or subtract a specific number from each element of a vector:

\[ \vec{v} = \begin{pmatrix} 3 \\ 8 \\ 5 \end{pmatrix} - 2 = \begin{pmatrix} 3 - 2 \\ 8 - 2\\ 5 - 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 6 \\ 3 \end{pmatrix} \]

In R, we first need to create the desired vector using c() and can then perform calculations with it as usual:

c(3, 8, 5) - 2

[1] 1 6 3

As we can see, vectors in R are always displayed as rows, but this makes no difference for our purposes.

For longer vectors, it can be useful to first store the vector in an object and then perform the operation using that object.

# Store a vector in an object
mein_Vektor <- c(3, 8, 5)
mein_Vektor - 2

[1] 1 6 3

Both approaches lead to the same result, so it’s up to us which one we choose.
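The first calculation from math class, subtracting two vectors of the same length, works the same way in R: the elements are subtracted pairwise, position by position.

```r
c(3, 8, 5) - c(1, 5, 2)
# [1] 2 3 3
```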

In math class, you have probably only encountered vectors with numbers, which can be used for calculations. However, the structure of a vector is useful for many purposes. A vector, in general terms, is an object that contains any number of elements of the same type in a fixed order. As we will see, it is also very useful to group multiple elements into a vector for other data types (such as logical values or characters) in many situations.

A vector always has exactly one data type. For example, a vector of type logical can be created in much the same way as with numbers, using the c() command:

c(TRUE, FALSE, TRUE)

[1]  TRUE FALSE  TRUE

The “identifier” for logical values is that TRUE and FALSE are written exactly this way: all uppercase, without typos, and without quotation marks.

Creating a vector of type character works in exactly the same way. Here, the “identifier” for character values is the use of "" around each element:

c("Homer", "Marge", "Bart", "Lisa", "Maggie")

[1] "Homer"  "Marge"  "Bart"   "Lisa"   "Maggie"
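The data type of a vector can be checked with the function class(). A short sketch using the vectors from this section:

```r
class(c(3, 8, 5))
# [1] "numeric"
class(c(TRUE, FALSE, TRUE))
# [1] "logical"
class(c("Homer", "Marge", "Bart", "Lisa", "Maggie"))
# [1] "character"
```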

Caution

Different data types cannot be mixed in a vector.
If we include different data types in a vector, R tries to find a common denominator, which usually results in the type character.

c(3, "word", TRUE)

[1] "3"    "word" "TRUE"

When we run this code and display the “mixed” vector, we see that all elements are enclosed in "". This indicates that the number 3 and the logical value TRUE have now been converted to character values, losing their original properties.

The conversion of data types often happens without R issuing a warning, making it easy for us as users to overlook. However, converting data to character can result in the loss of certain properties. For example, we can no longer perform mathematical operations, such as addition, on the character value "3".
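The word “usually” matters: if no character value is involved, R picks a different common type. Numbers and logical values, for example, are combined into a numeric vector, with TRUE becoming 1 and FALSE becoming 0. A number stored as character can be turned back into a number with as.numeric() before we calculate with it. A short sketch (the object name mixed is only for illustration):

```r
mixed <- c(3, TRUE)  # a number and a logical value
mixed
# [1] 3 1
class(mixed)
# [1] "numeric"

# "3" + 1            # would stop with: non-numeric argument to binary operator
as.numeric("3") + 1  # convert to a number first, then add
# [1] 4
```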

Try it out!

1. Assign the birth years of three (fictional) people to the vector birth_year.
2. How old did each of the three people turn in 2023?
3. Assign the names of the people from (1) to the vector names, in the same order.
Solution
```
birth_year <- c(1998, 2002, 1988)

2023 - birth_year
```
```
[1] 25 21 35
```
```
names <- c("Markus", "Philipp", "Moritz")
names
```
```
[1] "Markus"  "Philipp" "Moritz"
```
