Building a Token Bucket Rate Limiter in Go with Chi and sync primitives

Rate limiting is a problem every HTTP service eventually faces. In this post I will walk through building an in-memory token bucket rate limiter in Go, using chi for routing and nothing but the standard library's sync package for concurrency. No Redis, no external state. just goroutines, sync.RWMutex, and sync.Mutex used deliberately at two different levels.
The companion Rust implementation lives at rate-limiter-rs and makes the same design decisions using Axum and std::sync::RwLock/Mutex. The locking structure in both projects is intentionally identical, which makes the comparison meaningful.
The token bucket algorithm
Each client gets a bucket that holds up to Capacity tokens. Every request consumes one token. A background goroutine refills tokens at a fixed rate over time. When the bucket is empty the request is rejected with a 429 and a Retry-After header.
| Time | Event | Tokens remaining | Response |
|---|---|---|---|
| t=0s | Bucket created | 10 | |
| t=0s | 3 requests arrive | 7 | 200 |
| t=1s | Refill fires (+1) | 8 | |
| t=1s | 9 requests arrive | 0 (last gets rejected) | 200 × 8, 429 × 1 |
| t=2s | Refill fires (+1) | 1 | |
| t=2s | Next request | 0 | 200 |
The critical constraint: the check ("do I have a token?") and the decrement ("consume one") must be atomic. Two concurrent requests for the same client, with one token remaining, must not both succeed.
Configuration
type Config struct {
Capacity float64
RefillRate float64
RefillInterval time.Duration
CleanupTTL time.Duration
CleanupInterval time.Duration
}
func DefaultConfig() Config {
return Config{
Capacity: 10,
RefillRate: 1,
RefillInterval: 100 * time.Millisecond,
CleanupTTL: 5 * time.Minute,
CleanupInterval: 60 * time.Second,
}
}
Tokens is float64 because refills are time-proportional and produce fractional values. We only consume whole tokens, so a client with 0.8 tokens cannot make a request.
The store: two levels of locking
This is where the interesting concurrency work happens.
type Bucket struct {
Mu sync.Mutex
Tokens float64
Capacity float64
LastRefill time.Time
}
type Store struct {
mu sync.RWMutex
buckets map[string]*Bucket
}
There are two distinct locks:
Store.muis async.RWMutexthat protects the map itself. Many goroutines can hold the read lock simultaneously to look up existing clients. Only inserting a new client requires the write lock.Bucket.Muis async.Mutexper bucket that protects the token count and the refill timestamp. This lock is held for the duration of a token check-and-decrement.
This two-level approach means contention is per-client, not global. Two requests for different clients can proceed in parallel even while one is holding its bucket lock.
Double-checked insert
func (s *Store) GetOrInsert(clientID string, capacity float64) *Bucket {
s.mu.RLock()
b, ok := s.buckets[clientID]
s.mu.RUnlock()
if ok {
return b
}
s.mu.Lock()
defer s.mu.Unlock()
if b, ok = s.buckets[clientID]; ok {
return b
}
b = &Bucket{
Tokens: capacity,
Capacity: capacity,
LastRefill: time.Now(),
}
s.buckets[clientID] = b
return b
}
The first RLock is the optimistic fast path: most clients will already have a bucket and we can return immediately without ever taking a write lock. When a new client is seen, we upgrade to a write lock and check again, the second check is necessary because another goroutine may have inserted a bucket between the two locks.
Refill iteration
func (s *Store) Each(f func(*Bucket)) {
s.mu.RLock()
defer s.mu.RUnlock()
for _, b := range s.buckets {
f(b)
}
}
Each holds a read lock on the map while iterating. The function f receives a pointer to each bucket and is expected to lock bucket.Mu itself before touching tokens. This lets refills for different clients happen concurrently if the caller chooses, though in practice the refill loop is single-goroutine, so it locks and unlocks each bucket sequentially.
The rate limiter
func (rl *RateLimiter) Allow(clientID string) error {
bucket := rl.store.GetOrInsert(clientID, rl.config.Capacity)
bucket.Mu.Lock()
defer bucket.Mu.Unlock()
if bucket.Tokens >= 1 {
bucket.Tokens--
return nil
}
retryAfter := int(math.Ceil(1 / rl.config.RefillRate))
return &limiterrors.RateLimitExceeded{RetryAfterSecs: retryAfter}
}
GetOrInsert returns a pointer to the bucket. We then lock it, check the token count, decrement if possible, and return. The lock is released by the deferred Unlock. If two goroutines arrive simultaneously for the same client with one token remaining, only one will find Tokens >= 1 while holding the lock. The other will see 0 and get a 429.
Time-proportional refill
func (rl *RateLimiter) RefillAll() {
rl.store.Each(func(b *store.Bucket) {
b.Mu.Lock()
defer b.Mu.Unlock()
elapsed := time.Since(b.LastRefill).Seconds()
tokensToAdd := elapsed * rl.config.RefillRate
b.Tokens = math.Min(b.Tokens+tokensToAdd, b.Capacity)
b.LastRefill = time.Now()
})
}
Rather than adding a fixed number of tokens on every tick, we compute how many seconds have passed since the last refill and multiply by the refill rate. This means the refill is correct regardless of how often the task fires or whether a tick was delayed. A bucket untouched for 3 seconds gets 3 * RefillRate tokens added in one pass.
Background goroutines
func StartRefillTask(rl *limiter.RateLimiter, stop <-chan struct{}) {
interval := rl.Config().RefillInterval
go func() {
ticker := time.NewTicker(interval)
defer ticker.Stop()
for {
select {
case <-stop:
return
case <-ticker.C:
rl.RefillAll()
}
}
}()
}
func StartCleanupTask(rl *limiter.RateLimiter, stop <-chan struct{}) {
interval := rl.Config().CleanupInterval
go func() {
ticker := time.NewTicker(interval)
defer ticker.Stop()
for {
select {
case <-stop:
return
case <-ticker.C:
before := rl.Store().Len()
rl.Cleanup()
after := rl.Store().Len()
if removed := before - after; removed > 0 {
slog.Info("cleaned up inactive buckets",
"removed", removed,
"remaining", after,
)
}
}
}
}()
}
Both tasks receive a stop <-chan struct{} channel. When main receives SIGINT or SIGTERM it calls close(stop), which unblocks both select cases and lets the goroutines return cleanly before the server shuts down.
The cleanup task prevents unbounded memory growth. Without it every unique client IP stays in the map forever. The RemoveInactive method holds the write lock on the map while going over it and deleting stale entries.
Chi middleware
func RateLimit(rl *limiter.RateLimiter) func(http.Handler) http.Handler {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
clientID := extractClientID(r)
if err := rl.Allow(clientID); err != nil {
if rle, ok := err.(*limiterrors.RateLimitExceeded); ok {
rle.Write(w)
return
}
ie := &limiterrors.InternalError{Message: err.Error()}
ie.Write(w)
return
}
next.ServeHTTP(w, r)
})
}
}
func extractClientID(r *http.Request) string {
if key := r.Header.Get("X-API-Key"); key != "" {
return key
}
if forwarded := r.Header.Get("X-Forwarded-For"); forwarded != "" {
if ip, _, found := strings.Cut(forwarded, ","); found {
return strings.TrimSpace(ip)
}
return strings.TrimSpace(forwarded)
}
addr := r.RemoteAddr
if i := strings.LastIndex(addr, ":"); i != -1 {
return addr[:i]
}
return addr
}
The middleware is a closure that captures *RateLimiter. It type-asserts the error to *RateLimitExceeded to write the 429 with the correct Retry-After header. Any other error falls through to the internal error handler.
Client ID resolution prefers an explicit X-API-Key, falls back to the first IP in X-Forwarded-For, and finally strips the port from RemoteAddr.
Graceful shutdown
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
go func() {
slog.Info("listening", "addr", addr)
if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
slog.Error("server error", "err", err)
os.Exit(1)
}
}()
<-quit
slog.Info("shutdown signal received -- draining requests")
close(stop)
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
defer cancel()
srv.Shutdown(ctx)
close(stop) signals both background goroutines to exit. srv.Shutdown(ctx) stops accepting new connections and waits for in-flight requests to complete, with a 10-second timeout.
Testing
func TestConcurrentRequestsSameClientOnlyCapacitySucceed(t *testing.T) {
rl := makeLimiter()
total := 20
capacity := int(testConfig().Capacity)
var mu sync.Mutex
var ok, fail int
var wg sync.WaitGroup
for range total {
wg.Add(1)
go func() {
defer wg.Done()
err := rl.Allow("192.168.1.3")
mu.Lock()
if err == nil {
ok++
} else {
fail++
}
mu.Unlock()
}()
}
wg.Wait()
if ok != capacity {
t.Errorf("expected %d allowed, got %d", capacity, ok)
}
if fail != total-capacity {
t.Errorf("expected %d rejected, got %d", total-capacity, fail)
}
}
Twenty goroutines fire simultaneously for the same client. Exactly Capacity (5 in the test config) must succeed. This test directly validates that bucket.Mu.Lock() in Allow prevents any two goroutines from both passing the Tokens >= 1 check when only one token remains.
The full test suite:
| Test | Scenario |
|---|---|
TestClientExhaustsBucketAndIsRejected |
5 requests succeed, 6th is 429 |
TestTokensRefillAndRequestsSucceedAgain |
Sleep 1.1s, call RefillAll, next request succeeds |
TestConcurrentRequestsSameClientOnlyCapacitySucceed |
20 goroutines, exactly capacity succeed |
TestDifferentClientsHaveIndependentBuckets |
Client A exhausted does not affect client B |
TestInactiveBucketsAreCleanedUp |
TTL=0 cleanup removes all buckets |
Go vs Rust: the same locking structure, different enforcement
Both implementations use the same two-level locking pattern:
| Go | Rust | |
|---|---|---|
| Map guard | sync.RWMutex + map |
std::sync::RwLock<HashMap> |
| Per-bucket guard | sync.Mutex per bucket |
std::sync::Mutex<Bucket> |
The algorithm is identical. The difference is how the language enforces the contract.
In Go, bucket.Mu.Lock() and defer bucket.Mu.Unlock() are explicit calls. defer makes the release automatic on function exit, but nothing in the type system prevents you from accessing bucket.Tokens without holding the lock. The correctness is a discipline enforced by the programmer, not the compiler.
In Rust, mutex.lock().unwrap() returns a MutexGuard. You cannot access the data inside the Mutex without holding the guard, because the data is only accessible through the guard's Deref impl. The guard releases the lock when it goes out of scope --- enforced by the borrow checker. You cannot forget to release it and you cannot access the protected data without it.
Running it
go run ./cmd/server
# GET http://localhost:3000/ping
for i in $(seq 1 11); do
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:3000/ping
done
go test ./tests/... -v
Project structure
cmd/server/main.go
internal/
types/types.go
store/store.go
limiter/limiter.go
tasks/tasks.go
middleware/middleware.go
errors/errors.go
tests/ratelimit_test.go
Source code: github.com/lethuzulu/rate-limiter-go
The Rust version using Axum and DashMap: github.com/lethuzulu/rate-limiter-rs




