Samplebadu

R by Example: Data Frames

R 4.x

This example works with tabular data in R. It constructs a data frame with heterogeneous columns, accesses columns with dollar-sign and bracket notation, filters rows with logical subsetting, and adds a new column dynamically.

Code

# Creating a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Married = c(TRUE, FALSE, TRUE)
)

# Inspecting structure
str(df)
summary(df)

# Accessing columns
ages <- df$Age
name_values <- df[["Name"]]

# Subsetting (Filtering)
# Select rows where Age > 28
subset_df <- df[df$Age > 28, ]

# Adding a new column
df$Height <- c(165, 180, 175)

Explanation

Data frames are two-dimensional tabular data structures that organize data into rows and columns, much like spreadsheets or database tables. Each row represents an observation or case, and each column represents a variable. Unlike matrices, which require all elements to be the same type, data frames allow heterogeneous columns: one column can be numeric, another character, another logical. Within a single column, however, all elements must share the same type.
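The contrast with matrices can be seen in a minimal sketch. The object names (m, people, col_classes, mixed) are illustrative and not part of the original example.

```r
# A matrix holds a single type, so mixing text and numbers coerces
# every element to character.
m <- cbind(c("Alice", "Bob"), c(25, 30))
print(typeof(m))

# A data frame keeps a separate type for each column.
people <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
col_classes <- sapply(people, class)
print(col_classes)

# Within one column (a vector), mixed values are still coerced to one type.
mixed <- c(25, "thirty")
print(class(mixed))
```

Under R 4.x defaults, the Name column stays character rather than being converted to a factor.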

Data frames are implemented as lists of equal-length vectors, which explains their flexibility: each list element is one column. All columns must have the same number of rows, and column names serve as variable identifiers. Rows can also be named, though by default they are numbered sequentially. The str() function displays the internal structure, showing the data type and the first few values of each column, while summary() provides a summary appropriate to each column's type.
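The list-of-vectors structure described above can be checked directly. This is a small sketch that rebuilds the example data frame. The variable result and the choice to name rows after the Name column are assumptions made for illustration.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Married = c(TRUE, FALSE, TRUE)
)

# A data frame is a list whose elements are the columns.
print(is.list(df))
print(length(df))      # number of columns
print(names(df))       # column names

# Row names default to sequential numbers but can be replaced.
print(rownames(df))
rownames(df) <- df$Name
print(df["Bob", "Age"])

# Columns whose lengths cannot be recycled to match raise an error.
result <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) "error"
)
print(result)
```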

Column access supports several syntaxes: the dollar-sign operator, as in df$Age; double brackets [[ ]], which accept a column name held in a variable and so suit programmatic access; and single brackets [ ], as in df["Age"], which return a data frame rather than a vector. Row subsetting uses a logical vector produced by a condition such as df$Age > 28, and only the rows where that vector is TRUE are kept. The bracket syntax [rows, columns] selects rows and columns at once, and leaving a position empty selects everything in that dimension.
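The following sketch compares the access forms on the same data frame. Names such as col_name, keep, older, and older_names are hypothetical and introduced only for this illustration.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Married = c(TRUE, FALSE, TRUE)
)

col_name <- "Age"
by_dollar <- df$Age           # numeric vector
by_double <- df[[col_name]]   # same vector, column name taken from a variable
by_single <- df[col_name]     # one-column data frame, not a vector

# The condition yields one logical value per row.
keep <- df$Age > 28
print(keep)

older <- df[keep, ]                # matching rows, all columns
older_names <- df[keep, "Name"]    # matching rows, one column (a vector)
print(older)
print(older_names)

# Assigning to a new name with $ appends a column.
df$Height <- c(165, 180, 175)
print(ncol(df))
```

Note that df[keep, "Name"] drops to a plain vector by default. Passing drop = FALSE keeps the result as a data frame.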

Code Breakdown

Line 2: data.frame() constructs a table from named columns of equal length.
Line 9: str(df) displays the internal structure, showing each column's type and sample values.
Line 13: df$Age extracts a column as a vector using dollar-sign notation.
Line 18: df[df$Age > 28, ] filters rows with a logical vector; the empty column position selects all columns.