R by Example: Factors
This example shows how to represent categorical data efficiently with factors: creating a factor from a character vector, defining and ordering levels, internal integer storage with character labels, and type conversion behavior.
Code
# Creating a factor
gender <- factor(c("Male", "Female", "Female", "Male"))
# Inspecting levels
levels(gender) # "Female" "Male"
# Changing levels order (useful for plotting)
rating <- factor(c("Low", "High", "Medium", "Low"),
                 levels = c("Low", "Medium", "High"),
                 ordered = TRUE)
# Summary counts categories
summary(gender)
# Converting back to string
char_vec <- as.character(gender)
# Converting to integer (returns internal codes)
int_vec <- as.numeric(gender)
Explanation
Factors are specialized data structures for representing categorical data with a fixed and known set of possible values called levels. Internally, a factor is a vector of integer codes plus a levels attribute holding the character labels, so each distinct label is stored once and every element is an integer index into that label table. For vectors with many repeated values, this can use less memory than the equivalent character vector.
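The integer-plus-labels layout described above can be inspected directly. The following is a minimal sketch; the colours vector is hypothetical sample data, not part of the original example.

```r
# Hypothetical sample data for illustration only
colours <- factor(c("red", "blue", "red", "green", "blue"))

levels(colours)      # the label table: "blue" "green" "red"
as.integer(colours)  # the stored codes: 3 1 3 2 1
typeof(colours)      # "integer"

# Indexing the label table with the codes rebuilds the original strings
rebuilt <- levels(colours)[as.integer(colours)]
identical(rebuilt, as.character(colours))  # TRUE
```

Each element holds only a position in the label table, which is why as.character() can always recover the original strings.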
Factor characteristics include:
- Levels are sorted alphabetically by default unless explicitly specified
- Ordered factors support comparison operations like < and > based on level order
- Unordered factors represent nominal categories without inherent ordering
- The factor() function creates factors with automatic level detection
- Levels can be explicitly defined using the levels parameter
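The difference between ordered and unordered factors shows up as soon as values are compared. This sketch reuses the rating factor from the code above; two_genders and gender_cmp are names introduced here for illustration.

```r
rating <- factor(c("Low", "High", "Medium", "Low"),
                 levels = c("Low", "Medium", "High"),
                 ordered = TRUE)

# Comparisons follow the level order, not alphabetical order
rating < "High"        # TRUE FALSE TRUE TRUE
rating[1] < rating[2]  # TRUE, because Low comes before High
max(rating)            # High

# For an unordered factor, < is not meaningful: R warns and returns NA
two_genders <- factor(c("Male", "Female"))
gender_cmp <- suppressWarnings(two_genders[1] < two_genders[2])
gender_cmp             # NA
```

Passing ordered = TRUE is what enables the comparisons; supplying levels alone fixes the display and sort order but leaves the factor nominal.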
Type conversion with factors requires caution. Calling as.numeric() on a factor returns the internal integer codes (1, 2, 3, etc.), not the labels. For a factor whose labels are numeric strings such as "10" or "20", direct conversion therefore returns level indices rather than the values; convert to character first, as in as.numeric(as.character(x)) for a factor x. Factors are essential for statistical modeling, where categorical variables need to be treated differently from continuous numeric variables and free-form text.
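The numeric-string pitfall is easiest to see with a small case. This is a sketch with a hypothetical sizes vector; note that the levels are sorted as strings, so "5" comes after "20".

```r
# Hypothetical sample data: numbers stored as factor labels
sizes <- factor(c("10", "20", "10", "5"))

levels(sizes)                     # "10" "20" "5" (string sort order)
as.numeric(sizes)                 # 1 2 1 3  (level indices, not the values)
as.numeric(as.character(sizes))   # 10 20 10 5

# Equivalent idiom: convert the levels once, then index by the codes
as.numeric(levels(sizes))[sizes]  # 10 20 10 5
```

The second form converts only the distinct labels rather than every element, which may be preferable for long vectors with few levels.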
Code Breakdown
- factor() converts character vector to categorical with automatic level detection.
- levels(gender) returns unique categories sorted alphabetically.
- levels = c(...) explicitly defines category order for ordinal data.
- as.numeric(gender) returns internal integer codes, not character values.
