Your Metrics Dropped 10%. What Do You Do?": A Guide to Nailing Root Cause Analysis
Introduction
You're in a Product Manager or Technical Program Manager interview. You've nailed the "design a product" question and talked about your leadership style. Then, the interviewer leans in and says:
"Imagine you're the PM for Instagram Reels, and this morning, you see that daily active users have dropped by 10%. What do you do?"
This isn't a design, strategy, or behavioral question. This is an Execution (or Analytical ) question.
It’s a "the-house-is-on-fire" scenario. They aren't testing your creativity; they are testing your structure, analytical rigor, and grace under pressure . Your answer here reveals if you are a structured, data-driven leader or just someone who guesses.
The Biggest Mistake: Jumping to Conclusions
The weakest candidates hear this question and immediately start guessing.
- "It's probably a bug from the last release."
- "Maybe a competitor launched a new feature."
- "I'd check if the servers are down."
While any of these could be true, guessing is the fastest way to fail this interview. It shows a lack of structured thinking. A senior PM or TPM never guesses; they follow a framework to diagnose the problem systematically.
A 5-Step Framework for Root Cause Analysis (RCA)
Instead of guessing, take a breath and walk the interviewer through a clear, methodical plan. This framework is based on the one we teach in our Kracd.com interview prep guides.
Step 1: Clarify and Gather Context First, don't accept the premise without understanding it. Ask clarifying questions to narrow the scope.
- "A 10% drop is significant. When did this drop start? Did it happen suddenly at 9 AM, or has it been a slow decline over the past 72 hours?"
- "How are we measuring 'daily active users'? Has the definition of this metric or the data pipeline that measures it changed recently?"
Step 2: Form High-Level Hypotheses Once you have more context, bucket your potential problems. A "great" answer groups hypotheses logically.
- "To investigate this, I'd group my hypotheses into a few key areas:"
- External Factors: Seasonality (is it a holiday?), a competitor's action, a major news event, or a regional outage (e.g., network provider).
- Internal Technical Issues: This could be a bug in a recent release, a server-side outage, API failures, or a data pipeline error that is misreporting the metric.
- Internal Product Changes: Did we just launch a new feature that's cannibalizing Reels? Did we change the user interface? Did a recent marketing campaign just end?
Step 3: Prioritize Your Investigation (Gather Data) You can't check everything at once. Explain what you would check first and why.
- "My first step wouldn't be to check the code; it would be to validate the data itself. I'd check our analytics pipeline to ensure the metric isn't being underreported. That's the fastest and easiest thing to check."
- "Second, I'd check for any system-wide outages or errors. Our internal dashboards would show this quickly."
- "Third, assuming the data is correct and there are no outages, I'd start to segment the drop. If the drop is only on the new Android build, we've found our culprit. If it's global and for all users, it's more likely a server-side or major external event."
Step 4: Refine Hypotheses and Drill Down Explain how you'd use the data to get closer to the truth.
- "Okay, let's say we've confirmed the data is accurate and the drop is only on iOS users who updated in the last 24 hours."
- "This allows us to eliminate all external factors, all server-side issues, and all other platforms. The root cause is almost certainly a bug in the latest iOS release. My next step would be to work with the engineering lead to analyze that build's changes."
Step 5: Identify the Root Cause and Propose a Solution Always end by stating the solution and the next steps.
- "We've found the root cause: The new update introduced a login bug for 15% of iOS users. The solution is twofold:
- Short-term (Stop the bleeding): Initiate an emergency hotfix and roll back the update for users who haven't installed it yet.
- Long-term (Prevent it from happening again): Conduct a post-mortem. Why did our testing and QA not catch this? We need to improve our automated test coverage for login flows."
Stop Practicing. Start Performing.
Knowing this framework is good. But being able to perform it flawlessly under pressure—while a hiring manager is grading you—is what gets you the job.
In our Mastering Product Management and Art of Program Execution (TPM) guides, we don't just give you theories. We give you the evaluation rubrics that FAANG companies actually use to score your answer.
You'll learn what separates a "Good" answer from a "Great" one:
- A "Good" answer is systematic. (e.g., "I'd check internal, then external factors.")
- A "Great" answer is systematic, prioritized, and shows deep business intuition. (e.g., "I'd check data integrity first. Then, I'd segment by user type and platform...")
Our guides, complete with 1:1 mock interview options with actual FAANG hiring managers , are the single best investment you can make in your tech career.
👉 Get the PM Prep Guide or Get the TPM Prep Kit today and walk into your next interview with the confidence of an insider.
FAQs
Q1: What's the difference between an Execution and an Analytical question?They are very similar. An Analytical question focuses on how you think with data (e.g., "What metrics would you track...?") . An Execution question focuses on how you act on that data (e.g., "The data is bad. What do you do?") . The "Metrics Drop" question tests both at the same time.
Q2: What if I don't find a single root cause?That's a very realistic outcome. A great answer would be to state that! For example: "My analysis indicates the 10% drop isn't from one bug. It's a 5% drop from a new competitor promotion (which we can't control) and a 5% drop from our new, confusing UI. I would prioritize forming a tiger team to A/B test a fix for the UI, as that's what's in our control."
Q3: How technical do I need to be as a PM to answer this?You don't need to read code, but you must understand the system. Knowing to ask "was it a client-side or server-side change?" or "was it an API issue?" is crucial. Our PM Prep Guide has a dedicated "Technical Questions" module just for this.

.jpg)
.jpg)

































.webp)






