Go by Example: Strings and Runes
Dive into Go's handling of text with strings and runes. This example explains the difference between bytes and Unicode code points (runes), and shows how to iterate correctly over UTF-8 encoded strings and how indexing accesses individual bytes.
Code
package main

import "fmt"

func main() {
	// String basics: the literal mixes 1-byte ASCII and 3-byte Chinese characters
	s := "Hello, 世界"
	fmt.Println("String:", s)
	fmt.Println("Length in bytes:", len(s))

	// range decodes UTF-8: i is the starting byte index, r is the rune (Unicode code point)
	for i, r := range s {
		fmt.Printf("Index %d: %c (rune: %v)\n", i, r, r)
	}

	// Indexing returns a byte (uint8), not a rune
	fmt.Println("First byte:", s[0])
}
Explanation
In Go, a string is a read-only sequence of bytes. Usually those bytes are UTF-8 encoded text, but Go does not require it: a string can hold arbitrary bytes. This design differs from languages in which strings are arrays of characters. Go also defines the rune type, an alias for int32 that represents a single Unicode code point. The distinction between bytes and runes is crucial for handling international text correctly.
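The byte view and the rune view of the same string can be compared directly. The following is a minimal standalone sketch; the helper names runeIsInt32 and byteAndRuneLens are hypothetical, and the only library calls come from the standard fmt and unicode/utf8 packages.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeIsInt32 (hypothetical helper) assigns a rune to an int32 with no
// conversion, which compiles only because rune is an alias for int32.
func runeIsInt32() int32 {
	var r rune = '世'
	var n int32 = r
	return n
}

// byteAndRuneLens (hypothetical helper) reports how many bytes and how many
// runes s contains, using the []byte and []rune conversions.
func byteAndRuneLens(s string) (int, int) {
	return len([]byte(s)), len([]rune(s))
}

func main() {
	fmt.Printf("%U\n", runeIsInt32()) // U+4E16

	b, r := byteAndRuneLens("Hello, 世界")
	fmt.Println(b, r) // 13 9

	// A string is just bytes, so it can hold data that is not valid UTF-8.
	raw := "\xff\xfe"
	fmt.Println(len(raw), utf8.ValidString(raw)) // 2 false
}
```

Converting to []rune decodes the whole string up front, which allocates; it is convenient for counting or random access by rune, but a range loop avoids the copy.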
Because UTF-8 is a variable-width encoding, a single character (code point) occupies 1 to 4 bytes. ASCII characters such as 'H' take 1 byte, while the Chinese characters in "世界" take 3 bytes each. The built-in len function returns the byte count, not the character (rune) count. For the string "Hello, 世界", len returns 13 (7 bytes for "Hello, " plus 6 for the two Chinese characters), even though the string contains only 9 characters.
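To count characters rather than bytes, the standard unicode/utf8 package provides RuneCountInString, and RuneLen reports how many bytes a given rune needs. This sketch assumes a standalone program; the helper widths is hypothetical, and the sample runes (an accented Latin letter and an emoji) are chosen only to cover all four UTF-8 widths.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// widths (hypothetical helper) returns the number of UTF-8 bytes that each
// rune of s occupies.
func widths(s string) []int {
	var out []int
	for _, r := range s {
		out = append(out, utf8.RuneLen(r))
	}
	return out
}

func main() {
	s := "Hello, 世界"
	fmt.Println("bytes:", len(s))                    // bytes: 13
	fmt.Println("runes:", utf8.RuneCountInString(s)) // runes: 9
	fmt.Println("widths:", widths(s))                // widths: [1 1 1 1 1 1 1 3 3]

	// One rune per UTF-8 width: ASCII 'H', 'é' (U+00E9), '世', and an emoji (U+1F600).
	for _, r := range []rune{'H', '\u00e9', '世', '\U0001F600'} {
		fmt.Printf("%c needs %d byte(s)\n", r, utf8.RuneLen(r))
	}
}
```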
Iterating over a string with the range keyword decodes the UTF-8 automatically, yielding one rune at a time along with the byte index at which it starts. If range meets a byte sequence that is not valid UTF-8, it yields the replacement rune U+FFFD and advances a single byte. This built-in UTF-8 awareness makes range the safest way to process a string rune by rune. Direct indexing such as s[0] returns an individual byte, not a rune, and indexing or slicing into the middle of a multi-byte character produces a fragment that is not valid UTF-8 on its own. The language specification requires Go source code to be UTF-8, so Unicode literals such as "世界" can appear directly in code.
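The difference between range and indexing shows up in the byte offsets and in what happens when a slice cuts a rune in half. A minimal sketch, assuming a standalone program; runeStarts and decode are hypothetical helpers, while utf8.ValidString and utf8.RuneError come from the standard unicode/utf8 package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeStarts (hypothetical helper) returns the byte offset at which each
// rune of s begins, exactly as a for-range loop reports it.
func runeStarts(s string) []int {
	var idx []int
	for i := range s {
		idx = append(idx, i)
	}
	return idx
}

// decode (hypothetical helper) collects the runes that a for-range loop yields.
func decode(s string) []rune {
	var rs []rune
	for _, r := range s {
		rs = append(rs, r)
	}
	return rs
}

func main() {
	s := "Hello, 世界"

	// 世 starts at byte 7 and occupies 3 bytes, so 界 starts at byte 10.
	fmt.Println(runeStarts(s)) // [0 1 2 3 4 5 6 7 10]

	// Indexing yields one byte (uint8): the first of 世's three bytes.
	fmt.Printf("s[7] = %#x\n", s[7]) // s[7] = 0xe4

	// Slicing through the middle of a rune produces invalid UTF-8 ...
	part := s[7:8]
	fmt.Println(utf8.ValidString(part)) // false

	// ... which range decodes as utf8.RuneError (U+FFFD).
	fmt.Println(decode(part)[0] == utf8.RuneError) // true
}
```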
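- Rune Literals: Rune literals are enclosed in single quotes (e.g., 'A', '⌘') and denote integer constants whose default type is rune (int32). String literals use either double quotes ("hello") for interpreted strings, in which escapes such as \n are processed, or backticks for raw strings, in which they are not.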
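Because rune literals are integers, they can be printed as numbers and used in arithmetic, and the two string literal forms treat backslashes differently. A minimal sketch, assuming a standalone program; nextRune and literalLens are hypothetical helpers, and only the standard fmt package is used.

```go
package main

import "fmt"

// nextRune (hypothetical helper) relies on runes being integers:
// adding 1 to a code point gives the next code point.
func nextRune(r rune) rune {
	return r + 1
}

// literalLens (hypothetical helper) compares the same text written as an
// interpreted literal, where \t becomes one tab byte, and as a raw literal,
// where the backslash and the t stay as two separate bytes.
func literalLens() (int, int) {
	interpreted := "tab:\there"
	raw := `tab:\there`
	return len(interpreted), len(raw)
}

func main() {
	// An untyped rune literal gets the default type rune, which %T reports as int32.
	a, cmd := 'A', '⌘'
	fmt.Printf("%T %d %U\n", a, a, a)       // int32 65 U+0041
	fmt.Printf("%T %d %U\n", cmd, cmd, cmd) // int32 8984 U+2318

	fmt.Println(string(nextRune('A'))) // B

	i, r := literalLens()
	fmt.Println(i, r) // 9 10
}
```

Raw literals are handy for regular expressions and multi-line text, where escaping every backslash would hurt readability.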

