How we solved outages and performance issues in Helpmonks
From the end of March 2019 until the middle of April 2019, we experienced outages and performance issues in Helpmonks. What follows is a report of what happened and how we solved it.
The performance issues arose because we had moved our core database to a new cloud provider with a full-service package that did not perform in production as it had during our stress tests. After several support calls and upgrades to the underlying operating system and virtualization layer, we managed to bring the performance back. However, we experienced first-hand the "Support Service" that we had wanted and realized that it was no better than what our own team can provide. Long story short, after two weeks of what seemed like an endless nightmare of issues, we moved everything back to our private cloud. This time, we added more servers, so the database is now scaled across several servers and networks.
The outages had two main causes: the load balancer provided by our cloud provider, and issues in our code when parsing incoming emails. Furthermore, we identified a problem with our distributed storage, which failed under some circumstances.
The provider confirmed the issue with the load balancer and told us that no fix would be coming soon, so we had to look for an alternative. However, as the load balancer is the first point of entry to our web app and API, we could not just "configure" something. Besides, we wanted to continue caching calls to static assets and also create a fail-safe setup. In the end, we settled on HAProxy with a second HAProxy server and keepalived, i.e., if the active load balancer becomes unavailable, traffic switches to the other one automatically.
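To illustrate the fail-over idea, here is a minimal Python sketch of how an active load balancer could be chosen from a pair. It is not how keepalived is implemented and not our production configuration. The `Balancer` class, the `elect_active` function, and the priority values are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Balancer:
    name: str
    priority: int   # hypothetical: higher value is preferred as the active node
    healthy: bool   # hypothetical: result of a periodic health check


def elect_active(balancers):
    """Return the healthy balancer with the highest priority, or None."""
    candidates = [b for b in balancers if b.healthy]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.priority)


primary = Balancer("lb1", priority=150, healthy=True)
backup = Balancer("lb2", priority=100, healthy=True)

# While both are healthy, the higher-priority node serves traffic.
active = elect_active([primary, backup])

# If the primary fails its health check, the backup takes over.
primary.healthy = False
active_after_failure = elect_active([primary, backup])
```

In this sketch, `active` is the primary and `active_after_failure` is the backup, which mirrors the automatic switch described above.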
We fixed the storage issue by using several storage servers that are clustered with GlusterFS. The significant benefit is that the whole GlusterFS cluster (no pun intended) can be mounted with a "volume file", so the disk stays mounted even when some servers in the group are unavailable or have a network interruption.
In addition to all the hardware and fail-over configurations, we also enhanced our code by running multiple threads within one application, so that a single email that throws a parsing error no longer takes down the entire parsing application. This was rarely ever an issue, but Murphy's law hit us in full force and caused another outage while we were dealing with the problems outlined above. Nowadays, we deploy multiple threads within the parsing task to make sure that this does not happen again.
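The isolation idea can be sketched as follows, assuming a worker pool in which each email is parsed as its own task. This is an illustration in Python and not our actual parsing code. `parse_email` is a deliberately simplistic, hypothetical parser, and `parse_all` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_email(raw):
    """Hypothetical parser: split headers from body, raise on malformed input."""
    if "\n\n" not in raw:
        raise ValueError("missing header/body separator")
    header_block, body = raw.split("\n\n", 1)
    headers = {}
    for line in header_block.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    return {"headers": headers, "body": body}


def parse_all(raw_emails, workers=4):
    """Parse each email in its own task and collect failures instead of crashing."""
    parsed, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [(i, pool.submit(parse_email, raw))
                   for i, raw in enumerate(raw_emails)]
        for index, future in futures:
            try:
                parsed.append((index, future.result()))
            except Exception as exc:
                # One bad email is recorded and skipped; the others still succeed.
                failed.append((index, str(exc)))
    return parsed, failed
```

Calling `parse_all` with a batch that contains one malformed message returns the successfully parsed emails alongside a list of failures, so a single bad email no longer stops the rest of the batch.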
We learned a lot during this time and have made sure that every single service you depend on is configured in a fail-over manner and remains available even if a server or a network is down.
Today, we can report that all the issues are a thing of the past and that our performance is not only back to what it was before, but surpasses all expectations.
We're fully aware that many of our customers around the world experienced performance issues and sometimes couldn't access their emails in Helpmonks. We are sorry about this. We know this caused a severe interruption in the daily workflow for many of you. We are working diligently to make sure this does not happen again in Helpmonks.
Thank you for being a customer, thank you for your patience during these rough times, and thank you for your understanding.