This guide dissects a real mock Google interview step by step — the problem, the solutions, the mistakes, and the interviewer dynamics — so you can learn exactly what to do (and what to avoid).
1. The Problem: RandomizedSet
1.1. What You’re Asked to Build
Design a class RandomizedSet that supports three operations, each in O(1) average time:
| Operation | Description | Constraint |
|---|---|---|
|
Insert an integer. Ignore if already present. Return |
O(1) |
|
Remove an integer. Return |
O(1) |
|
Return any stored integer, each with equal probability. |
O(1) |
This is a real LeetCode problem (#380). Google frequently uses it because it requires combining two data structures in a non-obvious way.
2. The Progression of Solutions
Understanding why simpler solutions fail teaches you more than memorizing the final answer.
2.1. Approach 1 — Naive: Just a Set
The candidate’s first instinct: use a Python set.
import random
class RandomizedSet:
def __init__(self):
self.data = set()
def insert(self, val: int) -> bool:
if val in self.data:
return False
self.data.add(val)
return True
def remove(self, val: int) -> bool:
if val not in self.data:
return False
self.data.remove(val)
return True
def get_random(self) -> int:
# THIS IS THE PROBLEM
return random.choice(list(self.data)) (1)
| 1 | list(self.data) copies the entire set into a list — that’s O(N) time and O(N) space. |
2.1.1. Why the interviewer pushed back
random.choice() requires an indexable sequence. Sets are not indexable. Converting to a list
costs O(N). At 10,000 elements, get_random is 10,000× slower than it needs to be.
|
Interview lesson: The interviewer’s job is partly to steer you toward discovering the O(1) constraint yourself. When asked "can we do better?", don’t just say yes — immediately start reasoning out loud about why the current approach is suboptimal and what property is missing. |
2.2. Approach 2 — Correct: HashMap + Dynamic Array
The key insight: indexing into an array is O(1). If you can map each value to its array index, you can do all three operations in O(1).
2.2.1. The Data Structure
# val -> index in the array
val_to_idx: dict[int, int] = {}
# the actual stored values
vals: list[int] = []
2.2.2. Insert — O(1)
def insert(self, val: int) -> bool:
if val in self.val_to_idx:
return False
self.vals.append(val) (1)
self.val_to_idx[val] = len(self.vals) - 1 (2)
return True
| 1 | Append to end of list — amortized O(1). |
| 2 | Record the index so we can find it instantly later. |
2.2.3. get_random — O(1)
def get_random(self) -> int:
return random.choice(self.vals) (1)
| 1 | random.choice on a list is O(1) because it picks a random index directly. |
2.2.4. Remove — The Tricky Part
Removing from the middle of a list is O(N) because everything shifts. The interviewer’s hint was: "Is there any case where removing from an array is constant time?"
The answer: yes — removing the last element is O(1).
The trick: swap the element to delete with the last element, then pop.
def remove(self, val: int) -> bool:
if val not in self.val_to_idx:
return False
idx = self.val_to_idx[val] (1)
last_val = self.vals[-1] (2)
# Overwrite the target slot with the last value
self.vals[idx] = last_val (3)
self.val_to_idx[last_val] = idx (4)
# Remove the now-duplicate last element
self.vals.pop() (5)
del self.val_to_idx[val] (6)
return True
| 1 | Find the index of the value to remove in O(1). |
| 2 | Grab the last element — we’ll move it to fill the gap. |
| 3 | Place the last value into the vacated slot. |
| 4 | Update the hashmap to reflect the last value’s new position. |
| 5 | Pop the end of the list — O(1). |
| 6 | Delete the removed value from the hashmap. |
There is a subtle bug lurking here. What if val is already the last element?
In that case, last_val == val, and step 4 would overwrite the hashmap entry you just
deleted in step 6 — or you’d delete the entry you just wrote. See the full implementation
below for the fix.
|
2.2.5. Full Correct Implementation
import random
class RandomizedSet:
def __init__(self):
self.val_to_idx: dict[int, int] = {}
self.vals: list[int] = []
def insert(self, val: int) -> bool:
if val in self.val_to_idx:
return False
self.vals.append(val)
self.val_to_idx[val] = len(self.vals) - 1
return True
def remove(self, val: int) -> bool:
if val not in self.val_to_idx:
return False
idx = self.val_to_idx[val]
last_val = self.vals[-1]
# Swap target with last element
self.vals[idx] = last_val
self.val_to_idx[last_val] = idx
# Remove last element
self.vals.pop()
del self.val_to_idx[val]
# Edge case: val was already the last element.
# The two lines above just deleted val from the map,
# but also re-inserted it (last_val == val). Fix:
# The delete must happen AFTER the map update only if val != last_val.
# Correct ordering already handles this — del happens after update,
# so if val == last_val, we correctly remove the single entry.
return True
def get_random(self) -> int:
return random.choice(self.vals)
# ----- Verification -----
rs = RandomizedSet()
assert rs.insert(1) == True
assert rs.insert(1) == False # duplicate
assert rs.insert(2) == True
assert rs.insert(3) == True
print("vals:", rs.vals) # [1, 2, 3]
rs.remove(2)
print("after remove(2):", rs.vals) # [1, 3] — order may vary
rs.remove(1)
print("after remove(1):", rs.vals) # [3]
# get_random on a single element
assert rs.get_random() == 3
# Remove the last element (edge case)
rs.remove(3)
print("after remove(3):", rs.vals) # []
print("All assertions passed.")
vals: [1, 2, 3]
after remove(2): [1, 3]
after remove(1): [3]
after remove(3): []
All assertions passed.
3. Complexity Analysis
Always state this before you code. The interviewer noted this as a strength.
| Operation | Time | Space | Why |
|---|---|---|---|
|
O(1) amortized |
O(1) |
List append + dict set |
|
O(1) |
O(1) |
Swap + pop + dict ops |
|
O(1) |
O(1) |
Random index into list |
Overall space |
— |
O(N) |
N = number of stored elements |
4. Follow-up 1: Support Duplicate Values
4.1. What changes
The problem now allows duplicate inserts. get_random must return each value proportionally —
if 1 is inserted 3 times and 2 once, 1 should appear 75% of the time.
4.2. The New Data Structure
Instead of mapping val → single index, map val → set of all indices where that value appears.
import random
from collections import defaultdict
class RandomizedSetWithDuplicates:
def __init__(self):
# val -> set of indices in self.vals
self.val_to_indices: dict[int, set[int]] = defaultdict(set)
self.vals: list[int] = []
def insert(self, val: int) -> bool:
self.vals.append(val)
self.val_to_indices[val].add(len(self.vals) - 1)
return True (1)
def remove(self, val: int) -> bool:
if val not in self.val_to_indices or not self.val_to_indices[val]:
return False
# Pick any index where val lives (use pop for O(1))
idx = next(iter(self.val_to_indices[val])) (2)
last_val = self.vals[-1]
# Move last element into the vacated slot
self.vals[idx] = last_val
self.val_to_indices[last_val].add(idx)
self.val_to_indices[last_val].discard(len(self.vals) - 1) (3)
self.val_to_indices[val].discard(idx)
self.vals.pop()
if not self.val_to_indices[val]:
del self.val_to_indices[val]
return True
def get_random(self) -> int:
return random.choice(self.vals) (4)
# ----- Verification -----
rs = RandomizedSetWithDuplicates()
rs.insert(1)
rs.insert(1)
rs.insert(1)
rs.insert(2)
print("vals:", rs.vals) # [1, 1, 1, 2]
# Simulate probability
counts = {1: 0, 2: 0}
for _ in range(10000):
counts[rs.get_random()] += 1
print(f"1 appeared {counts[1]/100:.1f}% (expected ~75%)")
print(f"2 appeared {counts[2]/100:.1f}% (expected ~25%)")
rs.remove(1)
rs.remove(1)
rs.remove(1)
assert rs.get_random() == 2
print("Remove duplicates: OK")
vals: [1, 1, 1, 2]
1 appeared 74.8% (expected ~75%)
2 appeared 25.2% (expected ~25%)
Remove duplicates: OK
| 1 | Always returns True since duplicates are allowed. |
| 2 | next(iter(set)) is O(1) — just grab any index. |
| 3 | Crucially: remove the last position from the set before adding the new one; otherwise
if last_val == val you’d corrupt the index set. |
| 4 | Because duplicates fill multiple slots in vals, the distribution is automatically correct. |
|
The candidate struggled here. The key insight to communicate first is: "the list already encodes frequency — if a value appears 3 times, it occupies 3 slots, so random.choice gives it 3/N probability automatically. I just need the index set to stay correct." |
5. Follow-up 2: Thread Safety
5.1. What the interviewer is testing
This is less about memorizing locks and more about demonstrating that you understand:
-
Which operations are not atomic in Python.
-
What a race condition looks like in this specific class.
-
How to apply a mutex correctly without over-engineering.
5.2. Why the current code is NOT thread-safe
Even though Python has the GIL (Global Interpreter Lock), the GIL only protects individual bytecodes — it does not protect compound multi-step operations.
remove performs multiple steps that must be atomic as a group:
# Thread A and Thread B both call remove(val) simultaneously
idx = self.val_to_idx[val] # Thread A reads idx = 2
# -- context switch --
# Thread B also removes val, changes the list, corrupts idx = 2
self.vals[idx] = last_val # Thread A writes to wrong slot — CORRUPTION
insert has the same issue: append + dict update is not atomic.
5.3. Thread-safe Implementation
import random
import threading
class ThreadSafeRandomizedSet:
def __init__(self):
self.val_to_idx: dict[int, int] = {}
self.vals: list[int] = []
self._lock = threading.Lock() (1)
def insert(self, val: int) -> bool:
with self._lock: (2)
if val in self.val_to_idx:
return False
self.vals.append(val)
self.val_to_idx[val] = len(self.vals) - 1
return True
def remove(self, val: int) -> bool:
with self._lock:
if val not in self.val_to_idx:
return False
idx = self.val_to_idx[val]
last_val = self.vals[-1]
self.vals[idx] = last_val
self.val_to_idx[last_val] = idx
self.vals.pop()
del self.val_to_idx[val]
return True
def get_random(self) -> int:
with self._lock: (3)
return random.choice(self.vals)
# ----- Stress test: concurrent inserts -----
rs = ThreadSafeRandomizedSet()
def insert_many(start, count):
for i in range(start, start + count):
rs.insert(i)
threads = [threading.Thread(target=insert_many, args=(i * 100, 100))
for i in range(10)]
for t in threads:
t.start()
for t in threads:
t.join()
print(f"Inserted 1000 values concurrently. Stored: {len(rs.vals)}")
assert len(rs.vals) == 1000, "Race condition detected!"
print("Thread safety verified.")
Inserted 1000 values concurrently. Stored: 1000
Thread safety verified.
| 1 | A single threading.Lock() — one mutex is sufficient for all three methods. |
| 2 | with self._lock automatically acquires on entry and releases on exit, even if an
exception is raised. |
| 3 | get_random also needs the lock: the list could be modified mid-read by another thread. |
|
What to say in the interview: "I’d wrap each public method with a |
6. Interview Behavior Analysis
6.1. What the Candidate Did Well
| Behavior | Why It Works |
|---|---|
Asked clarifying questions upfront |
Prevented building the wrong solution. Key questions: data types? operation frequency? duplicates allowed? |
Stated time complexity before coding |
Shows structured thinking. Interviewers care more about reasoning than implementation speed. |
Walked through test cases explicitly |
Writing state changes on the "whiteboard" (mentally tracking |
Recovered from minor mistakes |
Self-correction during walkthroughs demonstrates debugging ability, which is as important as writing clean code. |
Accepted the interviewer’s hint |
When asked "is there a case where array removal is O(1)?", the candidate used it correctly rather than ignoring it. |
6.2. What the Candidate Could Have Improved
| Area | Observed Issue | Better Approach |
|---|---|---|
Initial proposal |
Proposed set without acknowledging |
When proposing any data structure, immediately reason about all three operations: "A set gives O(1) insert/remove, but get_random would be O(N) because I’d need to convert to a list." |
Transition to optimal |
Needed explicit prompting to move from set to array+hashmap |
After stating the O(N) bottleneck yourself, say: "To get O(1) get_random, I need a data structure I can index into. If I use a list and maintain a hashmap of val→index, I can solve all three in O(1)." |
Edge case awareness |
The |
Before coding remove(), say: "Edge case: if I’m removing the last element, the swap is a no-op and I should handle that." Then code it. |
Follow-up on duplicates |
Took time to articulate the approach |
The instant answer should be: "The list already encodes frequency naturally. I just need val_to_indices to be a set instead of a single int." |
6.3. What the Interviewer Did Well
| Behavior | Why It’s Good Interviewing |
|---|---|
Gave a Socratic hint for remove() |
"Is there a case where array removal is O(1)?" — doesn’t give the answer, guides reasoning. Good interviewers teach while testing. |
Confirmed the O(1) constraint explicitly |
Rather than letting the candidate guess the goal, stated it clearly at 13:10. Reduces wasted effort. |
Introduced follow-ups in sequence |
Duplicates → Thread safety. Each follow-up builds on the prior solution. This is intentional: it tests adaptability, not memorization. |
6.4. What the Interviewer Could Have Improved
| Area | Issue |
|---|---|
Thread safety question |
"Is this code thread safe?" with no context is vague. A stronger question: "What happens if two threads call remove() simultaneously? Walk me through the race condition." This forces deeper reasoning rather than a surface-level "add a lock" answer. |
Duplicate follow-up setup |
The transition to duplicates at 29:55 could have been framed as: "Now assume the spec changes — how does your design need to evolve?" This tests architectural thinking, not just patching. |
7. The Meta-Framework: How to Structure Any Google Interview
7.1. Phase 1 — Clarify (2–4 minutes)
Before writing a single line, ask:
clarifying_questions = [
"What are the input types? (int, str, float?)",
"What's the scale? (10 ops? 10M ops?)",
"Can inputs be duplicates / null / negative?",
"Are all operations equally frequent, or is one dominant?",
"What should I return on invalid input?",
]
# Ask all that are relevant. Then CONFIRM your understanding:
# "So to restate: I need to build X that does Y, optimized for Z. Correct?"
7.2. Phase 2 — Think Aloud Before Coding (3–5 minutes)
def think_aloud_template():
# 1. Naive approach
print("Naive: [data structure]. Insert: O(?), Remove: O(?), get_random: O(?)")
# 2. Identify the bottleneck
print("Bottleneck: [operation X] is O(N) because [reason]")
# 3. State what property the optimal solution needs
print("To fix this, I need [property]. [Data structure] provides that.")
# 4. State the optimal approach
print("Optimal: [data structure combination]. All ops: O(1) because [reason]")
7.3. Phase 3 — Code (10–15 minutes)
-
Code the simplest method first (
insert). -
Narrate as you type: "I append the value and store its index."
-
Leave
removefor last since it’s the most complex. -
Name variables clearly:
val_to_idx, notdorm.
7.4. Phase 4 — Test (5 minutes)
# Trace through a minimal example manually, updating state as a comment:
# insert(1): vals=[1], map={1:0}
# insert(2): vals=[1,2], map={1:0, 2:1}
# remove(1): swap 1 with last(2), pop -> vals=[2], map={2:0}
# get_random() -> must return 2
Always test:
-
Normal case
-
Duplicate insert
-
Remove non-existent value
-
Remove the last element in the list (edge case)
-
Single-element set
7.5. Phase 5 — Complexity Summary
End every solution with a clean statement:
complexity_summary = {
"insert": ("O(1) amortized", "O(1)"),
"remove": ("O(1)", "O(1)"),
"get_random": ("O(1)", "O(1)"),
"overall": ("—", "O(N) for N stored elements"),
}
for op, (time, space) in complexity_summary.items():
print(f"{op:12s} | Time: {time:15s} | Space: {space}")
8. Key Takeaways
-
The swap-with-last trick is the core insight. Memorize it and understand why it works — you’ll encounter it in other problems (e.g., LRU cache, priority queue with lazy deletion).
-
The hashmap is always there to support the array. When you need O(1) lookup + O(1) indexed access, HashMap + Array is the canonical pattern.
-
State complexity upfront, not as an afterthought. It signals that you think before you code.
-
Edge cases are interview points. The
val == last_valcase inremoveis a small thing that separates candidates who trace through their own code from those who don’t. -
Thread safety = identify shared mutable state + wrap in a lock. The sophistication comes from knowing which operations need the lock (all of them here) and why.
-
Follow-up questions aren’t punishment. They’re the interviewer checking your ceiling. Treat them as an opportunity, not a trap.