Mastering Vectors in R: A Foundation for Efficient Data Manipulation
Prepared By
DataScienceEra Community Team
Web: https://datascienceera.com/
Introduction
Welcome to “Mastering Vectors in R: A Foundation for Efficient Data Manipulation.” In the realm of R programming, vectors serve as the bedrock upon which data manipulation and analysis are built. Understanding and harnessing the power of vectors is crucial for anyone venturing into the world of R.
In this comprehensive session, we will embark on a journey through the intricacies of vectors, the cornerstone data structure in R programming. By the end of our exploration, you’ll possess a firm grasp of how vectors work, their diverse types, operations that can be performed on them, and their pivotal role in data analysis and programming logic.
Lecture Objectives:
- Understanding Vectors: Define what vectors are in R and their significance in programming.
- Types of Vectors: Explore different types of vectors (numeric, character, logical) and their usage.
- Vector Operations: Learn basic operations like indexing, subsetting, and arithmetic on vectors.
- Vector Functions: Introduce key functions for manipulating and working with vectors in R.
- Applications: Discuss real-world applications and scenarios where vectors are utilized in data analysis and programming.
So let’s get started!
The Definition of a Vector:
Vectors are fundamental data structures in R used to store elements of the same data type in an ordered sequence.
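As a small illustration of the “same data type” rule, the sketch below checks a vector’s type with typeof() and shows what c() does when its inputs are of different types. The names ages and mixed are made up for this example.

```r
# A vector holds values of a single type; typeof() reports that type.
ages <- c(20, 25, 35)
typeof(ages)     # "double"

# Mixing types does not raise an error: c() converts every element
# to the most flexible type present (here, character).
mixed <- c(1, "a", TRUE)
typeof(mixed)    # "character"
```

This silent conversion is why it helps to keep numbers, text, and logical values in separate vectors.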
Significance:
Understanding the role of vectors in R is crucial. They offer efficiency in handling data and performing operations seamlessly.
Suppose we have the following data:
ID | Age | Gender | Salary |
---|---|---|---|
1 | 20 | Male | $12,000 |
2 | 25 | Female | $20,000 |
3 | 35 | Female | $30,000 |
4 | 40 | Male | $2,000 |
5 | 42 | Female | $10,000 |
Now we will store each variable’s data separately; this is how we can create a vector.
# Creating an ID vector
ID <- seq(1, 5)
print(ID)
[1] 1 2 3 4 5
In this code snippet written in R, we’re creating an “ID” vector using a function called ‘seq()’. Let’s break it down:
ID <- seq(1, 5)
: This line sets up a variable named ‘ID’ and uses the ‘seq()’ function to generate a sequence of numbers.
seq(1, 5)
: The ‘seq()’ function makes a sequence of numbers starting from 1 to 5. In R, ‘seq()’ helps create a sequence based on the provided starting and ending values.
print(ID)
: Lastly, the ‘print()’ function is used to show the content of the ‘ID’ vector. This vector holds the sequence of numbers from 1 to 5.
This code segment’s purpose is to make a series of ID numbers from 1 to 5 and then display that sequence in the R console or output window.
# Creating an Age vector
Age <- c(20, 25, 35, 40, 42)
print(Age)
[1] 20 25 35 40 42
In this R code snippet, we’re dealing with the creation of an “Age” vector using the ‘c()’ function. Let’s break it down:
Age <- c(20, 25, 35, 40, 42)
: This line creates a variable called ‘Age’ and uses ‘c()’ to gather specific ages into a vector.
c(20, 25, 35, 40, 42)
: The ‘c()’ function helps bring together these ages (20, 25, 35, 40, 42) into the ‘Age’ vector.
print(Age)
: Lastly, ‘print()’ is used to show what’s inside the ‘Age’ vector. It displays the series of ages we gathered.
This code snippet’s main goal is to form a collection of ages (20, 25, 35, 40, 42) and then exhibit this collection in the R console or output window.
# Creating Gender vector
Gender<-c("Male","Female","Female","Male","Female")
print(Gender)
[1] "Male"   "Female" "Female" "Male"   "Female"
This R code snippet is all about crafting a “Gender” vector using the ‘c()’ function. Let’s break it down:
Gender <- c("Male", "Female", "Female", "Male", "Female")
: This line initializes a ‘Gender’ variable and uses ‘c()’ to make a vector.
c("Male", "Female", "Female", "Male", "Female")
: The ‘c()’ function gathers these words ("Male", "Female", "Female", "Male", "Female") and places them in the ‘Gender’ vector.
print(Gender)
: Lastly, ‘print()’ is used to display what’s inside the ‘Gender’ vector. It shows the gender labels we gathered.
This code snippet’s purpose in R is to create a character vector named ‘Gender’. This vector holds one gender label per respondent: "Male", "Female", "Female", "Male", "Female". The ‘print()’ function displays the vector in the R console or output window.
# Creating a vector of Salary variable
Salary <- c(12000, 20000, 30000, 2000, 10000)
print(Salary)
[1] 12000 20000 30000 2000 10000
This snippet of R code is dedicated to generating a “Salary” vector using the ‘c()’ function. Let’s break it down:
Salary <- c(12000, 20000, 30000, 2000, 10000)
: This line initializes a variable named ‘Salary’ and uses the ‘c()’ function to create a vector.
c(12000, 20000, 30000, 2000, 10000)
: The ‘c()’ function combines the provided numerical values (12000, 20000, 30000, 2000, 10000) into the ‘Salary’ vector.
print(Salary)
: Finally, the ‘print()’ function is used to display the content of the ‘Salary’ vector, which contains the specified salary values (12000, 20000, 30000, 2000, 10000).
This code snippet in R is used to create a vector named ‘Salary’ that holds a series of salary values: 12000, 20000, 30000, 2000, 10000. The ‘print()’ function then showcases the contents of this vector in the R console or output window.
Syntax:
Demonstrating various ways to create vectors:
- Using the c() function to concatenate elements into a vector.
- Sequence generation with seq() to create sequences of numbers.
- Repeating values with rep() to generate vectors with repeated elements.
Types of Vectors
1. Numeric vector
Detailing the creation of numeric vectors and their operations like addition, subtraction, multiplication, and division. Explaining how to generate sequences of numbers and use mathematical functions on numeric vectors.
# Numeric vector operations
Salary_addition <- Salary + 2
print(Salary_addition)
[1] 12002 20002 30002 2002 10002
In R, arithmetic on a vector is applied to every element at once. Let’s explore this code snippet:
Salary_addition <- Salary + 2
: This line creates a new vector called ‘Salary_addition’. R takes each value in the ‘Salary’ vector and adds 2 to it; the single value 2 is reused for every element.
The ‘print()’ function then displays the new salaries. This is handy when you want to change every value in a vector by the same amount, like giving everyone a raise of a fixed number of dollars.
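Subtraction follows the same pattern as addition. The short sketch below is an illustration only; the name Salary_subtraction and the amount 500 are assumptions made for this example.

```r
Salary <- c(12000, 20000, 30000, 2000, 10000)

# Subtract a fixed amount (a hypothetical 500 deduction) from every salary
Salary_subtraction <- Salary - 500
print(Salary_subtraction)
# [1] 11500 19500 29500  1500  9500
```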
# Vector Multiplication
Salary_multiplication <- Salary * 3
print(Salary_multiplication)
[1] 36000 60000 90000 6000 30000
In R, vector multiplication allows you to multiply each element in a vector by a specified value.
Salary_multiplication <- Salary * 3
: This line creates a new vector called ‘Salary_multiplication’ by multiplying each value in the ‘Salary’ vector by 3.
# vector division
Salary_division <- Salary / 2
print(Salary_division)
[1] 6000 10000 15000 1000 5000
In R, vector division allows you to divide each element in a vector by a specified value.
Salary_division <- Salary / 2
: This line creates a new vector called ‘Salary_division’ by dividing each value in the ‘Salary’ vector by 2.
# Applying mathematical functions
Salary_result <- sqrt(Salary)
print(Salary_result)
[1] 109.54451 141.42136 173.20508 44.72136 100.00000
In R, mathematical functions can be applied to vectors to perform operations on each element.
Salary_result <- sqrt(Salary)
: This line creates a new vector named ‘Salary_result’ by applying the square root function to each value in the ‘Salary’ vector.
# Exponentiation of a vector
Salary_result_power <- Salary_result ^ 2
print(Salary_result_power)
[1] 12000 20000 30000 2000 10000
In R, you can perform exponentiation on vectors by raising each element to a specific power.
Salary_result_power <- Salary_result ^ 2
: This line creates a new vector named ‘Salary_result_power’ by squaring each value in the ‘Salary_result’ vector. Because ‘Salary_result’ holds the square roots of the salaries, squaring it returns the original salary values (up to floating-point rounding).
# Taking the natural logarithm of a vector
Salary_log <- log(Salary)
print(Salary_log)
[1] 9.392662 9.903488 10.308953 7.600902 9.210340
In R, you can compute the natural logarithm of elements in a vector using the logarithm function.
Salary_log <- log(Salary)
: This line creates a new vector named ‘Salary_log’ by taking the natural logarithm of each value in the ‘Salary’ vector.
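For readers who need a base other than e, the following sketch shows two related calls from base R, log10() and the base argument of log(), plus exp() as the inverse of the natural logarithm. The variable names are chosen for this example.

```r
Salary <- c(12000, 20000, 30000, 2000, 10000)

# Base-10 logarithm, two equivalent spellings
log_base10_a <- log10(Salary)
log_base10_b <- log(Salary, base = 10)

# exp() undoes the natural logarithm (up to floating-point rounding)
recovered <- exp(log(Salary))
all.equal(recovered, Salary)   # TRUE
```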
# Absolute value of a vector
Salary_abs <- abs(Salary)
print(Salary_abs)
[1] 12000 20000 30000 2000 10000
In R, you can calculate the absolute value of elements in a vector using the ‘abs()’ function.
Salary_abs <- abs(Salary)
: This line creates a new vector named ‘Salary_abs’ containing the absolute values of each element in the ‘Salary’ vector.
2. Character Vector
- Exploring the creation of character vectors to store text data and perform operations like concatenation and manipulation of strings.
- Discussing the importance of character vectors in handling categorical data.
# Creating a character vector
Gender<-c("Male","Female","Female","Male","Female")
print(Gender)
[1] "Male"   "Female" "Female" "Male"   "Female"
# Concatenating strings in character vectors
Gender_string <- paste(Gender, collapse = ", ")
print(Gender_string)
[1] "Male, Female, Female, Male, Female"
Creating a Character Vector
In R, a character vector stores text values, such as categories like ‘Male’ and ‘Female’.
Gender <- c("Male", "Female", "Female", "Male", "Female")
: This line creates a character vector named ‘Gender’ containing five gender labels.
Concatenating Strings in Character Vectors
The ‘paste()’ function in R is used to concatenate strings within a vector.
Gender_string <- paste(Gender, collapse = ", ")
: This line creates a new string named ‘Gender_string’ by joining the elements of the ‘Gender’ vector into a single string, with each element separated by a comma and a space.
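The collapse argument joins everything into one string. As a contrast, the sketch below uses the sep argument, which pastes vectors together element by element and returns a vector of the same length. The names labels and upper_gender are made up for this example.

```r
ID <- seq(1, 5)
Gender <- c("Male", "Female", "Female", "Male", "Female")

# Element-wise pasting: one label per respondent
labels <- paste(ID, Gender, sep = ": ")
labels[1]        # "1: Male"
length(labels)   # 5

# Many string functions are also applied to every element
upper_gender <- toupper(Gender)
upper_gender[2]  # "FEMALE"
```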
# Handling categorical data with character vectors
categorical_data <- factor(Gender)
print(categorical_data)
[1] Male   Female Female Male   Female
Levels: Female Male
Handling Categorical Data
In R, categorical data (like gender labels ‘Male’ and ‘Female’) can be represented as factors, which are useful for statistical analysis and modeling.
categorical_data <- factor(Gender)
: This line converts the character vector ‘Gender’ into a factor named ‘categorical_data’.
The ‘factor()’ function in R converts the character vector ‘Gender’ into a factor by identifying unique values and assigning them as levels/categories.
Reasons for converting categorical data to factors include compact storage (each value is kept as an integer code that points to a level), correct treatment as a categorical variable by statistical modeling functions, and a clearer representation of categories for analysis. Here, the resulting ‘categorical_data’ factor has two levels, ‘Female’ and ‘Male’, which R sorts alphabetically by default.
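A few base R functions make the structure of a factor visible. The sketch below is an illustration; the names gender_levels, gender_counts, and gender_codes are assumptions for this example.

```r
Gender <- c("Male", "Female", "Female", "Male", "Female")
categorical_data <- factor(Gender)

# The distinct categories, in alphabetical order by default
gender_levels <- levels(categorical_data)     # "Female" "Male"

# How many observations fall in each category
gender_counts <- table(categorical_data)      # Female: 3, Male: 2

# The integer codes stored behind the labels
gender_codes <- as.integer(categorical_data)  # 2 1 1 2 1
```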
3. Logical Vectors
- Introducing logical vectors that contain Boolean values (TRUE or FALSE) and their application in conditions, filtering, and logical operations.
- Demonstrating logical operations such as AND, OR, NOT on logical vectors.
- Revisiting the dataset with an added column that records whether each respondent received an increment.
List of Logical Components of Vector
Logical Component | Symbol | Example in R |
---|---|---|
AND (Conjunction) | & (element-wise), && (single values) | TRUE & FALSE |
OR (Disjunction) | \| (element-wise), \|\| (single values) | TRUE \| FALSE |
NOT (Negation) | ! | !TRUE |
XOR (Exclusive OR) | xor() | xor(TRUE, FALSE) |
Equality | == | 5 == 5 |
Inequality | != | 5 != 3 |
Greater Than | > | 10 > 5 |
Less Than | < | 3 < 7 |
Greater Than or Equal To | >= | 6 >= 6 |
Less Than or Equal To | <= | 4 <= 5 |
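The operators in the table can be tried on two small logical vectors. This is a sketch with made-up vectors a and b; note that in R the ^ symbol is exponentiation, so exclusive OR is written with the xor() function.

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

and_result <- a & b       # TRUE FALSE FALSE FALSE
or_result  <- a | b       # TRUE TRUE TRUE FALSE
not_result <- !a          # FALSE FALSE TRUE TRUE
xor_result <- xor(a, b)   # FALSE TRUE TRUE FALSE

# && and || are meant for single TRUE/FALSE values, e.g. inside if()
single_and <- TRUE && FALSE   # FALSE
```

For whole vectors, & and | are the operators to use; && and || are intended for length-one conditions, and recent versions of R raise an error when they are given longer vectors.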
Suppose we store this information in a column named Increment, like this:
ID | Age | Gender | Salary | Increment |
---|---|---|---|---|
1 | 20 | Male | 12000 | TRUE |
2 | 25 | Female | 20000 | FALSE |
3 | 35 | Female | 30000 | TRUE |
4 | 40 | Male | 2000 | TRUE |
5 | 42 | Female | 10000 | FALSE |
Salary <- c(12000, 20000, 30000, 2000, 10000)
print(Salary)
[1] 12000 20000 30000 2000 10000
# Creating a logical vector
logical_vector <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
print(logical_vector)
[1] TRUE FALSE TRUE TRUE FALSE
A logical vector in R stores values that are either ‘TRUE’ or ‘FALSE’, representing logical conditions.
logical_vector <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
: This line creates a logical vector named ‘logical_vector’ containing logical values.
Logical vectors are used to represent logical conditions, often used for filtering or making decisions in programming.
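Because TRUE counts as 1 and FALSE as 0 in arithmetic, logical vectors are also convenient for counting. The sketch below uses base R functions; the result names are chosen for this example.

```r
logical_vector <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

n_true    <- sum(logical_vector)     # 3 respondents got an increment
share     <- mean(logical_vector)    # 0.6, i.e. 60 percent
positions <- which(logical_vector)   # 1 3 4
```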
# Adding conditions based on salary
# Create a logical vector with the OR operator (|):
# TRUE where Salary is greater than 12000 or equal to 12000
Salary_logical <- Salary > 12000 | Salary == 12000
# Print the logical vector
print(Salary_logical)
[1] TRUE TRUE TRUE FALSE FALSE
Adding Conditions Based on Salary
In R, you can create logical vectors based on specific conditions applied to numeric vectors like salaries.
Salary_logical <- Salary > 12000 | Salary == 12000
: This line creates a logical vector named ‘Salary_logical’ based on the condition that checks if salaries are greater than 12000 or equal to 12000.
The ‘OR’ operator (|) in R evaluates the logical ‘OR’ condition between two expressions: ‘Salary > 12000’ and ‘Salary == 12000’, producing a logical vector based on these conditions. The resulting ‘Salary_logical’ vector contains ‘TRUE’ for salaries greater than 12000 or equal to 12000, and ‘FALSE’ otherwise. The same condition can be written more compactly as ‘Salary >= 12000’.
# AND operator
And_operation <- Salary > 12000 & Salary == 12000
print(And_operation)
[1] FALSE FALSE FALSE FALSE FALSE
Using the AND Operator
In R, the ‘AND’ operator (‘&’) evaluates logical ‘AND’ conditions between two expressions.
And_operation <- Salary > 12000 & Salary == 12000
: This line creates a logical vector named ‘And_operation’ based on the condition that checks if salaries are simultaneously greater than 12000 and equal to 12000.
The ‘AND’ operator (&) in R evaluates the logical ‘AND’ condition between ‘Salary > 12000’ and ‘Salary == 12000’, producing a logical vector based on these conditions. The resulting ‘And_operation’ vector contains ‘FALSE’ for all entries because salaries cannot be both greater than 12000 and equal to 12000 simultaneously.
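For contrast, here is a sketch of an AND condition that some salaries can satisfy. The range 10000 to 20000 and the name mid_range are assumptions chosen for illustration.

```r
Salary <- c(12000, 20000, 30000, 2000, 10000)

# TRUE only where both conditions hold for the same element
mid_range <- Salary >= 10000 & Salary <= 20000
print(mid_range)
# [1]  TRUE  TRUE FALSE FALSE  TRUE
```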
# Create a logical vector based on the condition (Salary not equal to 12000)
Salary_logical <- Salary != 12000
# Print the logical vector
print(Salary_logical)
[1] FALSE TRUE TRUE TRUE TRUE
Creating a Logical Vector Based on Salary
In R, you can create a logical vector based on a specific condition applied to a numeric vector like salaries.
Salary_logical <- Salary != 12000
: This line creates a logical vector named ‘Salary_logical’ based on the condition that checks if salaries are not equal to 12000.
The ‘!=’ operator in R checks for inequality. In this case, ‘Salary != 12000’ evaluates whether each salary value is not equal to 12000. The resulting ‘Salary_logical’ vector contains ‘TRUE’ for entries where the salary is not equal to 12000 and ‘FALSE’ where the salary is equal to 12000.
Vector Operations: Indexing and Subsetting
Explaining Access to Vector Elements
- Accessing elements of a vector using indices and logical conditions.
- Discussing the use of square brackets [ ] for indexing and subsetting vectors.
Accessing Elements Using Indices
# Accessing elements using indices
element_ID<-ID[3] # refers to accessing the third element in the vector named ID
print(element_ID)
[1] 3
This code snippet retrieves the value of the third element in the ‘ID’ vector and stores it in the variable ‘element_ID’. Subsequently, the ‘print()’ function displays the value of this element.
element_age <- Age[3] # refers to accessing the third element in the vector named Age
print(element_age)
[1] 35
element_Gender <- Gender[3] # refers to accessing the third element in the vector named Gender
print(element_Gender)
[1] "Female"
element_Salary <- Salary[3] # refers to accessing the third element in the vector named Salary
print(element_Salary)
[1] 30000
element_Salary_logical <- Salary_logical[3] # refers to accessing the third element in the vector named Salary_logical
print(element_Salary_logical)
[1] TRUE
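Square brackets accept more than a single position. The sketch below shows a few common index forms in base R; the result names are made up for this example.

```r
Age <- c(20, 25, 35, 40, 42)

first_and_third <- Age[c(1, 3)]      # 20 35  (several positions)
second_to_fourth <- Age[2:4]         # 25 35 40  (a range)
all_but_first <- Age[-1]             # 25 35 40 42  (negative index drops)
last_element <- Age[length(Age)]     # 42
```

R indices start at 1, and a negative index removes that position rather than counting from the end.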
In R, you can retrieve data of a specific respondent by indexing across multiple vectors. For instance:
# Accessing data of the third respondent using indexing
third_respondent <- c(ID[3], Age[3], Gender[3], Salary[3], Salary_logical[3])
print(third_respondent)
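[1] "3" "35" "Female" "30000" "TRUE"
This code snippet creates a vector ‘third_respondent’ by collecting elements from various vectors (‘ID’, ‘Age’, ‘Gender’, ‘Salary’, ‘Salary_logical’) using their respective indices [3]. It constructs a collection of data representing the third respondent. Because a vector can hold only one data type, R converts every value to character, which is why the numbers and the logical value appear in quotes.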
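If the original types matter, one option is to collect the values in a list instead of a vector. This is a sketch of an alternative, not the approach used in this tutorial; the names as_vector and as_list are made up for the example.

```r
ID <- seq(1, 5)
Age <- c(20, 25, 35, 40, 42)
Gender <- c("Male", "Female", "Female", "Male", "Female")
Salary <- c(12000, 20000, 30000, 2000, 10000)

# c() must produce a single type, so the numbers become text
as_vector <- c(ID[3], Age[3], Gender[3], Salary[3])
typeof(as_vector)          # "character"

# A list keeps each element's own type
as_list <- list(ID = ID[3], Age = Age[3], Gender = Gender[3], Salary = Salary[3])
is.numeric(as_list$Age)    # TRUE
as_list$Salary * 2         # 60000
```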
Subsetting with Logical Conditions
# Subsetting with logical conditions
subset_vector1 <- Age[Age > 30]
print(subset_vector1)
[1] 35 40 42
This code snippet generates a subset ‘subset_vector1’ of the ‘Age’ vector, containing values that are greater than 30. It uses a logical condition (Age > 30) to filter elements and retain only those meeting the specified criterion.
Conditions can also be combined. For example:
# Subsetting with logical conditions
## Adding multiple conditions
subset_vector2 <- Age[Age > 30 & Salary <= 12000]
print(subset_vector2)
[1] 40 42
Here ‘subset_vector2’ keeps only the ages of respondents who are older than 30 and whose salary is at most 12000. Both conditions are evaluated element by element and combined with the ‘&’ operator.
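A condition built from one vector can also filter a different vector of the same length, and which() returns the matching positions instead of the values. The sketch below is illustrative; the result names are assumptions.

```r
Age <- c(20, 25, 35, 40, 42)
Gender <- c("Male", "Female", "Female", "Male", "Female")

# Gender of respondents older than 30
gender_over_30 <- Gender[Age > 30]     # "Female" "Male" "Female"

# Positions (row numbers) of those respondents
positions_over_30 <- which(Age > 30)   # 3 4 5
```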
Vector Arithmetic
Demonstrating Element-wise Operations
- Element-wise operations between vectors, including addition, subtraction, multiplication, and division.
- Highlighting the importance of vectors of equal length for arithmetic operations.
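The point about equal lengths deserves a quick sketch. R does not strictly require equal lengths: when one vector is shorter, its values are reused (recycled) until the longer one is covered. The vectors x and y below are made up for this example.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)

# y is recycled as 10, 20, 10, 20
recycled_sum <- x + y
print(recycled_sum)
# [1] 11 22 13 24

# If the longer length is not a multiple of the shorter one,
# R still returns a result but issues a warning, for example
# c(1, 2, 3) + c(10, 20).
```

Recycling happens silently when the lengths divide evenly, so checking length() on both vectors before combining columns of a dataset is a cheap safeguard.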
Suppose we want to create a column ‘Total Income’ based on the ‘Salary’.
ID | Age | Gender | Salary | Increment | Total Income |
---|---|---|---|---|---|
1 | 20 | Male | 12000 | TRUE | 13200 |
2 | 25 | Female | 20000 | FALSE | 22000 |
3 | 35 | Female | 30000 | TRUE | 33000 |
4 | 40 | Male | 2000 | TRUE | 2000 |
5 | 42 | Female | 10000 | FALSE | 10000 |
Element-wise Arithmetic Operations
In R, you can perform element-wise arithmetic operations based on specific conditions within vectors.
# Element-wise arithmetic operations
Salary <- c(12000, 20000, 30000, 2000, 10000)
# Suppose we want to give an increment when Salary is greater than or equal to 12000
## The increment is 10% (0.1) of the original Salary
# Create a logical vector based on the condition (Salary greater than or equal to 12000)
salary_condition <- Salary >= 12000
# Initialize the Increment vector as zero
Increment <- rep(0, length(Salary))
# Calculate increment for elements meeting the condition
Increment[salary_condition] <- Salary[salary_condition] * 0.1
print(Increment)
[1] 1200 2000 3000 0 0
Let’s simplify it:
- Starting Salaries: We have a group of salaries: 12000, 20000, 30000, 2000, 10000.
- Setting Conditions: We decide to give a 10% raise to salaries that are 12000 or more.
- Marking Eligible Salaries: We create a logical vector to mark which salaries are 12000 or above.
- Preparing for Raises: A new vector called ‘Increment’ is created, initially filled with zeros and with the same length as ‘Salary’.
- Calculating Raises: For salaries that meet the condition, a 10% raise is calculated and placed in the matching positions of the ‘Increment’ vector.
- Displaying Raises: We use ‘print()’ to show the ‘Increment’ vector containing extra amounts for eligible salaries.
This code helps figure out who gets a raise based on their salary. It’s like saying, “Hey, if your salary is 12000 or more, here’s a 10% raise for you!” It’s pretty cool, right?
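The same increments can be computed in a single step with base R’s ifelse(), which picks a value for each element depending on a condition. This is an alternative sketch, not the method used above; the name Increment_ifelse is chosen for this example.

```r
Salary <- c(12000, 20000, 30000, 2000, 10000)

# For each salary: 10% of the salary if it is at least 12000, otherwise 0
Increment_ifelse <- ifelse(Salary >= 12000, Salary * 0.1, 0)
print(Increment_ifelse)
# [1] 1200 2000 3000    0    0
```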
# Now the Total Income is
Total_income <- Salary + Increment
print(Total_income)
[1] 13200 22000 33000 2000 10000
This code snippet adds the ‘Salary’ vector and the ‘Increment’ vector element-wise to calculate the ‘Total Income’. Each salary is added to the increment in the same position, generating the resulting ‘Total_income’ vector.
Comparison of vectors
# Comparisons between vectors
comparison_vector <- Salary > Increment
print(comparison_vector)
[1] TRUE TRUE TRUE TRUE TRUE
Let’s simplify it:
- Comparing Salaries and Increments: We’re comparing each salary in the ‘Salary’ vector with the corresponding raise amount in the ‘Increment’ vector.
- Marking the Comparison: A new logical vector called ‘comparison_vector’ is made to mark where salaries are greater than their respective increments.
- Displaying the Comparison: We use ‘print()’ to show the ‘comparison_vector’, which tells us where salaries exceed their calculated increments.
This code helps us find out where salaries have surpassed their expected increments. It’s like saying, “Hey, if your salary is greater than your calculated raise, here’s a mark for you!” It’s a neat way to track those cases, right?
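When a comparison produces many TRUE and FALSE values, base R’s all(), any(), and sum() summarize them. The sketch below re-creates the two vectors so it runs on its own; the result names are made up.

```r
Salary <- c(12000, 20000, 30000, 2000, 10000)
Increment <- c(1200, 2000, 3000, 0, 0)
comparison_vector <- Salary > Increment

every_salary_larger <- all(comparison_vector)   # TRUE
any_exception <- any(!comparison_vector)        # FALSE
count_larger <- sum(comparison_vector)          # 5
```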
Functions for Creation
Exploring Functions: seq(), rep(), vector()
- An exploration of functions like seq(), rep(), and vector() in R to create sequences, repeated values, and pre-sized vectors, respectively.
Some vector creation functions are as follows:
Function | Description |
---|---|
seq() | Generates sequences of numbers or other objects based on defined start, end, and increment values. |
rep() | Produces vectors with repeated values or sequences of values. |
vector() | Creates vectors of a specified length and data type, filled with that type’s default value (for example, 0 for numeric). |
# Using seq() and rep() to create sequences and repeated values
sequence_vector <- seq(1, 10, by = 2)
print(sequence_vector)
[1] 1 3 5 7 9
This code snippet uses the seq() function to create a sequence called ‘sequence_vector’. The sequence starts at 1, ends at 10, and progresses in steps of 2. Consequently, it generates the sequence [1, 3, 5, 7, 9], incrementing by 2 at each step.
repeated_values <- rep(5, times = 4)
print(repeated_values)
[1] 5 5 5 5
This code snippet uses the rep() function to create a vector called ‘repeated_values’. It repeats the value 5 four times, generating the vector [5, 5, 5, 5].
# Creating a pre-sized numeric vector
empty_vector <- vector("numeric", length = 5)
print(empty_vector)
[1] 0 0 0 0 0
This code snippet uses the vector() function to generate a numeric vector named ‘empty_vector’. It specifies the length of the vector as 5, so the result is not truly empty: it has 5 elements of numeric data type, each set to the default value 0. When printed, it displays [0, 0, 0, 0, 0]. Such a vector is often used as a placeholder to be filled in later.
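A few other arguments of these creation functions are worth knowing. The sketch below uses only base R; the variable names are assumptions for this example.

```r
# seq(): ask for a number of points instead of a step size
five_points <- seq(0, 1, length.out = 5)      # 0.00 0.25 0.50 0.75 1.00

# rep(): 'each' repeats every element, 'times' repeats the whole vector
each_twice  <- rep(c("A", "B"), each = 2)     # "A" "A" "B" "B"
whole_twice <- rep(c("A", "B"), times = 2)    # "A" "B" "A" "B"

# vector(): the default fill value depends on the type
blank_text  <- vector("character", length = 2)  # "" ""
blank_flags <- vector("logical", length = 2)    # FALSE FALSE

# A vector with no elements at all has length 0
truly_empty <- numeric(0)
length(truly_empty)                           # 0
```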
Manipulation Functions for Vectors
Function | Description |
---|---|
sort() | Arranges the elements of a vector in ascending order. |
rev() | Reverses the order of elements in a vector. |
unique() | Extracts unique elements from a vector, removing duplicates. |
In R, manipulation functions like sort(), rev(), and unique() provide essential capabilities to modify vectors, facilitating tasks such as sorting, reversing, and extracting unique elements. These functions are valuable in data manipulation and maintaining data integrity by managing vector elements effectively.
# Sorting, reversing, and extracting unique elements
sorted_vector_asc <- sort(Age) # for Ascending order
print(sorted_vector_asc)
[1] 20 25 35 40 42
sorted_vector_desc <- sort(Age,decreasing = TRUE) # for descending order
print(sorted_vector_desc)
[1] 42 40 35 25 20
reversed_vector <- rev(Total_income)
print(reversed_vector)
[1] 10000 2000 33000 22000 13200
unique_elements <- unique(Salary)
print(unique_elements)
[1] 12000 20000 30000 2000 10000
The code snippet demonstrates several operations on vectors in R:
- Sorting: The ‘Age’ vector is sorted in ascending and descending order using the sort() function.
- Reversing: The order of elements in the ‘Total_income’ vector is reversed using the rev() function.
- Extracting Unique Elements: Unique elements from the ‘Salary’ vector are extracted using the unique() function.
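Two follow-up cases may help here. The Salary values above are already distinct, so unique() changes nothing; applying it to ‘Gender’ shows duplicates being removed. And base R’s order() returns the positions that would sort a vector, which lets one vector be arranged by another. The result names in this sketch are made up.

```r
Age <- c(20, 25, 35, 40, 42)
Gender <- c("Male", "Female", "Female", "Male", "Female")
Salary <- c(12000, 20000, 30000, 2000, 10000)

# Duplicates are dropped; first appearances are kept in order
unique_gender <- unique(Gender)       # "Male" "Female"

# Positions that sort Salary from lowest to highest
salary_order <- order(Salary)         # 4 5 1 2 3

# Ages arranged from the lowest-paid to the highest-paid respondent
age_by_salary <- Age[salary_order]    # 40 42 20 25 35
```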
Conclusion
In this document, we explored various vector operations in R, emphasizing their significance and usefulness in data manipulation and analysis. From basic arithmetic operations to more advanced functionalities, we covered a wide array of operations.
Vector operations are fundamental in data science and programming. They play a crucial role in efficient data handling, manipulation, and mathematical computations. Having a strong grasp of these operations is essential for effective data analysis and manipulation in R.
These operations merely scratch the surface of the extensive capabilities of R programming when working with vectors.
DataScienceEra, an organization dedicated to advancing knowledge in data science, extensively utilizes these fundamental concepts in its data analysis and research endeavors.
Thank you for exploring vector operations in R with us! Stay tuned for more insights into the world of data science from DataScienceEra!
Signature: DataScienceEra Community Team