R by Example: Apply Family
Apply functions over data structures without explicit loops. This code example shows apply() for matrices, lapply() returning lists, sapply() simplifying results, and tapply() for grouped operations on vectors.
Code
# Create sample data
mat <- matrix(1:12, nrow=3)
my_list <- list(a = 1:5, b = 6:10, c = 11:15)
# apply() - for matrices/arrays
# MARGIN: 1 = rows, 2 = columns
row_sums <- apply(mat, 1, sum)
col_means <- apply(mat, 2, mean)
# lapply() - returns a list
squared_list <- lapply(my_list, function(x) x^2)
# sapply() - simplifies to vector/matrix
squared_vec <- sapply(my_list, function(x) sum(x^2))
# tapply() - apply function to subsets
values <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c("A", "B", "A", "B", "A", "B"))
group_means <- tapply(values, groups, mean)
Explanation
The apply family of functions provides tools for applying functions over various data structures without writing explicit loops, often resulting in cleaner and more concise code. Some members, such as lapply(), run their loop in C, but the supplied R function is still called once per element, so the speed gain over a well-written R loop is usually modest; apply() itself is implemented with an R loop. The family includes apply() for matrices and arrays, lapply() for lists, sapply() for simplified output, vapply() for type-safe output, mapply() for multivariate operations, and tapply() for grouped operations.
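To make the loop-free style concrete, the sketch below computes the same per-element totals twice, once with an explicit for loop and once with vapply(). The list `scores` and its values are made up for illustration and are not part of the example above.

```r
# Hypothetical sample data (an assumption for this sketch)
scores <- list(math = c(90, 80, 70), art = c(60, 100), music = c(75, 85, 95, 65))

# Explicit loop: preallocate the result, then fill it element by element
loop_totals <- numeric(length(scores))
names(loop_totals) <- names(scores)
for (i in seq_along(scores)) {
  loop_totals[i] <- sum(scores[[i]])
}

# Same computation without a loop; FUN.VALUE declares one number per element
apply_totals <- vapply(scores, sum, FUN.VALUE = numeric(1))

print(apply_totals)                          # math 240, art 160, music 320
print(identical(loop_totals, apply_totals))  # TRUE
```

Both versions do the same work; the vapply() call simply moves the bookkeeping (preallocation, indexing, naming) out of user code.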
Apply family function characteristics:
apply(X, MARGIN, FUN) applies function to matrix margins where MARGIN 1 is rows, 2 is columns
lapply(X, FUN) applies function to each list element, always returns a list
sapply(X, FUN) simplifies lapply output to vector or matrix when possible
vapply(X, FUN, FUN.VALUE) requires output type specification for safety and performance
mapply(FUN, ...) applies function to multiple list or vector arguments in parallel
tapply(X, INDEX, FUN) applies function to vector subsets grouped by factors
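The main example does not exercise vapply() or mapply(), and it only shows sapply() returning a vector. The following sketch fills those gaps; the `words` list and all variable names are hypothetical, chosen only for illustration.

```r
# Hypothetical sample data (an assumption for this sketch)
words <- list(first = c("a", "bb"), second = c("ccc", "dddd"))

# sapply() returns a matrix when every result has the same length > 1
char_counts <- sapply(words, nchar)   # 2 x 2 matrix, one column per list element

# sapply() falls back to a list when result lengths differ
ragged <- sapply(list(1:2, 1:3), function(x) x * 2)

# vapply() checks each result against FUN.VALUE
lengths_int <- vapply(words, length, FUN.VALUE = integer(1))  # first 2, second 2

# A mismatch is an error rather than a silent change of shape:
# nchar() returns two values per element here, but integer(1) promises one
checked <- tryCatch(
  vapply(words, nchar, FUN.VALUE = integer(1)),
  error = function(e) "shape mismatch caught"
)

# mapply() walks several vectors in parallel: 1 + 4, 2 + 5, 3 + 6
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)  # 5 7 9

print(char_counts)
print(is.list(ragged))  # TRUE
print(checked)
print(pair_sums)
```

Because the return type of sapply() depends on the data, vapply() is generally the safer choice inside functions and scripts that run unattended.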
While the apply family is powerful for many tasks, modern alternatives such as the dplyr and purrr packages are often recommended for complex operations on large datasets because of their consistent syntax and optimized performance. The apply family remains fundamental for base R programming and is widely used in statistical computing workflows.
Code Breakdown
apply(mat, 1, sum) applies sum function to each row (MARGIN=1).
lapply() applies function to each list element, returns list.
sapply() simplifies lapply output to vector when possible.
tapply(values, groups, mean) computes mean for each factor level.
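To check the breakdown against actual values, the example can be rerun with the results noted in comments. The data is recreated here so the block runs on its own; recall that matrix() fills column by column, so the first row of mat is 1, 4, 7, 10.

```r
# Recreate the data from the example
mat <- matrix(1:12, nrow = 3)
my_list <- list(a = 1:5, b = 6:10, c = 11:15)
values <- c(1, 2, 3, 4, 5, 6)
groups <- factor(c("A", "B", "A", "B", "A", "B"))

row_sums <- apply(mat, 1, sum)                        # 22 26 30
col_means <- apply(mat, 2, mean)                      # 2 5 8 11
squared_list <- lapply(my_list, function(x) x^2)      # list with elements a, b, c
squared_vec <- sapply(my_list, function(x) sum(x^2))  # a = 55, b = 330, c = 855
group_means <- tapply(values, groups, mean)           # A = 3, B = 4

print(row_sums)
print(col_means)
print(squared_list$a)  # 1 4 9 16 25
print(squared_vec)
print(group_means)
```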
