Last week, a routine Pipedrive update turned into a very un-routine service interruption, and many of you felt it. We want to walk you through what happened, how we got things back on track and what we’re doing to prevent it from happening again. To be clear: this wasn’t caused by a security breach or attack, and no data was lost.
So, what happened?
Early in the morning on June 10 (UTC), one of our systems ran a routine security update. Unfortunately, that update caused a key networking component to restart in a way that disrupted how different parts of our infrastructure communicate with each other.
At the same time, we also rolled out a completely unrelated network configuration change to DNS (Domain Name System – the system that directs traffic to the right servers when you open a website). This DNS change didn’t cause the outage or make things worse. However, because it happened just before the issues began, we initially thought it might be related. This led our investigation down the wrong path for a while, which delayed us from identifying the outage’s true root cause.
Once we pinpointed the cause of the outage, we quickly fixed it and Pipedrive was fully back online later that day.
Here’s what you might’ve seen
For most customers, Pipedrive simply wasn’t accessible during the incident.
Here’s what that looked like in practice:
If you tried to log in or sign up during the outage, it didn’t work
Pages didn’t load or behaved strangely
Integrations and API calls failed
Our support team may have been slower to respond (because our tools were affected too)
The timeline (in UTC)
| Time | What happened |
| --- | --- |
| 06:06 | First system alerts indicated a problem |
| 06:17 | We officially kicked off the incident investigation |
| 06:42 | Cross-team response call started |
| 09:23 | We found the problem’s root cause – a networking component restarted in a way we didn’t expect |
| 09:59 | First region came back online |
| 11:43 | Most systems were back up |
| 12:19 | Final region was restored |
| 14:39 | Incident closed – everything stable again |
Changes we’ve made since
We’ve already made changes to stop this exact scenario from repeating, including adjustments to how we apply updates and how we protect critical networking settings.
One of the biggest takeaways for our infrastructure team is this: avoid tunnel vision. Just because a root cause seems obvious doesn’t mean it’s the right one. We should always explore alternative explanations, especially when the symptoms don’t fully add up.
We’re also improving how we roll out infrastructure changes and sharpening our ability to pinpoint real problems faster. On top of that, we’re committing to more frequent status page updates during incidents, so you always know what’s happening and what we’re working on.
A note from Paulo
We know you count on Pipedrive to get work done and when it’s not working, it gets in your way. That’s on us. We’re truly sorry for the disruption on June 10.
This wasn’t just a blip. It was a serious incident, and we’ve been digging deep to fully understand what went wrong and, more importantly, how to prevent it from happening again. Our teams moved fast to fix the issue and now we’re focused on tightening up the right parts of our system and learning from where we got stuck.
We aim to keep Pipedrive simple and dependable: a tool you can trust to do its job every day.
Thank you for your trust in us. We’re committed to learning from this and doing better, and we’re grateful to have you with us.
– Paulo Cunha, CEO of Pipedrive