Rabin-Karp String Search in Zig

Rabin-Karp String Search in Zig

# Rabin-Karp String Search in Zig ## Introduction String searching is a fundamental problem in computer science. Given a large text (haystack) and a smaller pattern (needle), we want to find all occurrences of the pattern in the text. The naive approach checks every possible position, resulting in O(n×m) time complexity where n is haystack length and m is needle length. The **Rabin-Karp algorithm** improves average-case performance using **rolling hash** techniques, achieving O(n+m) average time complexity. ## The Mathematics Behind Rabin-Karp ### Hashing as a Fingerprint The core idea: use a hash function to compute a "fingerprint" of the needle and compare it against fingerprints of substrings in the haystack. If hashes match, we verify character-by-character to avoid false positives from hash collisions. ### Rolling Hash Formula We treat strings as numbers in base B (typically 256 for ASCII): ``` hash("abc") = ((a × B + b) × B + c) % P ``` Where: - a, b, c are ASCII values - B = 256 (base) - P = a large prime (101 in our implementation) **Rolling property**: When sliding the window by one character, we can compute the new hash from the old hash in O(1): ``` new_hash = (B × (old_hash - leading_char × B^(m-1) % P) + trailing_char) % P ``` This allows us to update the hash in constant time per shift, avoiding recomputation from scratch. ### Modular Arithmetic Using modulo P keeps hash values bounded and prevents overflow. The choice of prime P reduces collision probability. In practice, we use a larger prime (like 10^9+7) for fewer collisions, but 101 works for demonstration. ## Implementation Details ### Core Algorithm (Zig) ```zig pub fn rabinKarp(haystack: []const u8, needle: []const u8) ?usize { if (needle.len == 0) return 0; if (needle.len > haystack.len) return null; const base: u64 = 256; const prime: u64 = 101; var needle_hash: u64 = 0; var window_hash: u64 = 0; var i: usize = 0; // Compute initial hashes while (i haystack.len) return null; var i: usize = 0; while (i <= haystack.len - needle.len) { var j: usize = 0; while (j < needle.len and haystack[i + j] == needle[j]) { j += 1; } if (j == needle.len) return i; i += 1; } return null; } ``` ## Complexity Analysis | Algorithm | Average Case | Worst Case | Space | |-----------|--------------|------------|-------| | Naive | O(n×m) | O(n×m) | O(1) | | Rabin-Karp| O(n+m) | O(n×m) | O(1) | **Worst case** occurs when many hash collisions happen (e.g., both strings consist of repeated characters). The average case assumes a good hash function with low collision probability. ## Expected Benchmark Results When compiled and run with `zig build-exe benchmark.zig && ./benchmark`, we expect: ``` Rabin-Karp vs Naive Search Benchmark (Zig) Haystack size: 100 characters Rabin-Karp average: ~X ns Naive average: ~Y ns Speedup: Z.x Haystack size: 1000 characters Rabin-Karp average: ~X ns Naive average: ~Y ns Speedup: Z.x Haystack size: 10000 characters Rabin-Karp average: ~X ns Naive average: ~Y ns Speedup: Z.x Haystack size: 100000 characters Rabin-Karp average: ~X ns Naive average: ~Y ns Speedup: Z.x Haystack size: 1000000 characters Rabin-Karp average: ~X ns Naive average: ~Y ns Speedup: Z.x ``` **Expected trend**: As haystack size increases, Rabin-Karp's speedup over naive search becomes more dramatic. For small strings (< 100 chars), naive may be faster due to Rabin-Karp's hash overhead. For large strings (10^5+ chars), Rabin-Karp should show 2-10x speedup depending on needle length and hash function. ## Why Zig? Zig is a modern systems programming language that offers: - **No hidden control flow** – explicit and predictable performance - **Manual memory management** – no garbage collector pauses - **Compile-time execution** – potential for algorithm specialization - **C interoperability** – easy to integrate with existing codebases - **Simple, readable syntax** – clear mapping to algorithm logic Our implementation takes advantage of: - Slices for efficient string views - Fixed-width integers for arithmetic precision - Comptime for potential compile-time optimization - Built-in testing framework ## Building and Testing ```bash # Compile zig build-exe rabin_karp.zig -O ReleaseSafe zig build-exe benchmark.zig -O ReleaseSafe # Run tests zig test rabin_karp.zig # Run benchmarks ./benchmark ``` ## Conclusion The Rabin-Karp algorithm demonstrates how mathematical insight (modular arithmetic, rolling computation) can transform a naive O(n×m) problem into an efficient O(n+m) average-case solution. Our Zig implementation is concise, type-safe, and showcases Zig's suitability for performance-critical algorithms. The key takeaway: **choose the right algorithm**. For single searches, naive may suffice. For multiple patterns or massive texts, Rabin-Karp shines. Zig gives us the tools to implement it with confidence and performance. --- *Part of the Dex Time Challenge #25 – Implementing Rabin-Karp in Zig*