All of your transactions have been restored to your accounts and they will show up as they normally do on both the web and mobile apps.
What the heck happened?
Now, a little more explanation about what happened and what we're doing to future-proof the site. On the morning of Tuesday, June 30th, the database started getting hung up on certain queries, which caused a backlog of queries to form. When this happens, the server tries its best to work through the queue as fast as possible, but in doing so it becomes overloaded and the queue essentially grinds to a halt. For you, this showed up as a flood of error messages and being unable to log into the site.
We were able to get things running again for the rest of the day, but two days later, on the morning of Thursday, July 2nd, the database had the same issue. We scrambled to get things running again, but everything that had worked in the past simply wasn't working. We narrowed the problem down to a single table in our database, the Transactions table. This is where we store every transaction that ClearCheckbook users add to the site, all 75+ million of them. Whenever this table changes (transactions are added, edited or deleted), the indexes that help the database find data quickly also have to be updated. Normally this is quick, but on a table of 75 million rows it can take a long time.
To try to speed up reading from and writing to this table, we archived the transactions of users who hadn't logged into the site for a few months. Unfortunately, this didn't help much, and after a few hours the database was overloading again. By then it was Friday, July 3rd, and we were scrambling to figure things out, but nothing we did seemed to help at all. That's when we made some calls and, fortunately, even on the Friday afternoon before Independence Day in the United States, we were able to reach some outside help who could meet with us on Saturday, July 4th.
The first thing we did was back up all transactions and require them to be restored manually when you logged back in. This worked fairly well and at least got the site operational. The problem was that many people reported that not all of their transactions had been restored, and they were worried that their data was lost.
What we're doing in the short term
This obviously wasn't a long-term solution, and the confusion it caused made us speed up our plans to break the giant Transactions table into about 30 smaller tables based on user ID. This will be our mid- to long-term solution as far as the database schema goes. Tonight's downtime was necessary so we could move all of the transactions from the giant table into the smaller ones. In the short term, this will help the site stay up on our current hardware, since the database will no longer have to search through 75 million transactions on a regular basis.
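For readers curious how splitting by user ID works, here is a minimal sketch. The post doesn't say exactly how user IDs are assigned to the new tables, so this assumes a simple modulo split across 30 tables with hypothetical names `transactions_0` through `transactions_29` and a hypothetical `user_id` column. It is an illustration, not ClearCheckbook's actual code.

```python
NUM_TRANSACTION_TABLES = 30  # "about 30" per the post; the exact count is an assumption


def transactions_table_for(user_id: int, num_tables: int = NUM_TRANSACTION_TABLES) -> str:
    """Return the (hypothetical) table that holds this user's transactions.

    Assumes a modulo split: every transaction for a given user lands in the
    same smaller table, so a lookup never has to scan the other tables.
    """
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    return f"transactions_{user_id % num_tables}"


def transactions_query_for(user_id: int) -> tuple[str, tuple[int]]:
    """Build a parameterized query against the user's table (hypothetical schema)."""
    table = transactions_table_for(user_id)
    # Table names can't be bound as query parameters, so the name is built only
    # from the integer shard number; the user ID itself is still passed as a parameter.
    return f"SELECT * FROM {table} WHERE user_id = %s", (user_id,)


if __name__ == "__main__":
    print(transactions_table_for(12345))   # transactions_15
    print(transactions_query_for(12345))
```

The appeal of this kind of split is that each user's reads and writes, including the index updates described above, only touch one table holding roughly a thirtieth of the data. A range-based split (for example, user IDs 1 to 100,000 in one table) would be an equally plausible alternative.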
What's next
The next step, and what we're working on now, is migrating to a cloud-based solution. The outside help we mentioned earlier also excels at these transitions and has handled them for other companies in the past. We're working with them now to optimize our site code for its next home in the cloud, which will let us scale up as the site grows without having to maintain our own servers.
A heartfelt apology
I have to admit that this is the single most stressful situation I've ever been in. When I took on ClearCheckbook as my full-time job back in 2009, I knew there would be growing pains along the way. I use the site daily myself for both personal and business accounts, so when the site is having problems, I know how frustrating it can be.
What compounds the frustration is receiving threatening, profane and hate-filled messages on my personal cell phone and email accounts, on our social media outlets and on the ClearCheckbook blog while we're working our hardest to resolve the problems. At the height of the issues, we were receiving several hundred emails an hour. There was simply no way we could respond to each message individually and still have time to work on the database problems.
I can't express how sorry I am that this happened the way it did. Please know that we have a plan of action to prevent this kind of mess from happening in the future, and I'll post more information about it as the plan and timeline solidify.
If you're a premium member who contacted us to cancel your membership or request a refund, we searched through all of the messages and tried to fulfill every request, but some have probably slipped through the cracks. If you haven't heard back from us and still wish to stop using the site, please contact us again through the Contact Us link at the bottom right of the page.
Again, from the bottom of my heart, I'm truly sorry for the inconvenience the downtime caused you.
Brandon OBrien
Founder, ClearCheckbook