Your website is down! What happened? In this blog, Mike Griffiths, Managing Director at Reckless, takes you through four of the best things you can do, from a tech point of view, to mitigate issues during that most important period.
It’s your busiest time of the year. You’ve done all of your planning and preparation: you’ve made sure you have plenty of stock, you have extra staff ready in the warehouse, and customer services are ready. The orders start flooding in, but then the worst happens… suddenly, they stop.
The website is down. Your storefront, your only source of revenue, is totally closed at the busiest time imaginable. Your stock and staff are sitting there waiting. It’s the worst-case scenario.
So what happened? Unfortunately, it could be any number of things. In this blog, Network Members and eCommerce specialists Reckless take you through four of the best things you can do, from a tech point of view, to mitigate issues during that most important period.
A lot of these points are about website performance. If you implement the measures below, you’ll end up with a faster website, which is not only more reliable but can also noticeably improve your conversion rate.
When we onboard a client, one of the first things we do is implement Cloudflare. Cloudflare is sometimes described as a content delivery network (CDN), but it’s much more than that. It sits between the internet and your server, doing a few things:
That all sounds very fancy, but the key takeaway is that it will speed things up and keep things safe. The best bit: you can get started on the free plan.
The main benefit for your website is the caching of requests. If Cloudflare can answer a visitor’s request from its cache, it won’t contact your server at all. That means a substantial percentage of requests never hit your server, so your resources go much further.
Let’s take a look at an example from one of our clients where we have implemented Cloudflare:
In the last 30 days, nearly 80% of all requests were served from Cloudflare’s cache. That’s 378 GB of data that would otherwise have been sent from our servers, which Cloudflare took care of instead.
You’ll need to make sure the assets on your website can be cached easily, and you’ll need a way to implement cache busting (forcing browsers and Cloudflare to fetch a fresh copy) for when you change your assets, such as a CSS or JS change.
Cloudflare Images isn’t part of the free Cloudflare suite of tools, but it’s a worthwhile and small investment, at $5 per 100,000 images stored and $1 per 100,000 images served. It does a few things for you:
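To see why a high cache hit ratio takes so much load off your server, here’s a toy model of an edge cache in Python. It’s a simplified sketch, not how Cloudflare works internally: `EdgeCache`, `ttl_seconds` and the `origin` callback are hypothetical names, and a real CDN also honours `Cache-Control` headers, cache keys, purges and so on.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class EdgeCache:
    """Toy model of a caching proxy that sits between visitors and your origin server."""
    origin: Callable[[str], bytes]  # hypothetical: fetches a response from your server
    ttl_seconds: float = 3600.0     # how long a cached copy is treated as fresh
    store: Dict[str, Tuple[float, bytes]] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0
    bytes_from_cache: int = 0

    def get(self, path: str) -> bytes:
        now = time.monotonic()
        entry = self.store.get(path)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            # Fresh copy at the edge: the origin server never sees this request.
            self.hits += 1
            self.bytes_from_cache += len(entry[1])
            return entry[1]
        # Miss (or stale copy): ask the origin once, then keep the response.
        self.misses += 1
        body = self.origin(path)
        self.store[path] = (now, body)
        return body

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


origin_calls = []


def origin(path: str) -> bytes:
    origin_calls.append(path)
    return b"x" * 1000  # pretend every asset is 1 KB


cache = EdgeCache(origin=origin)
for _ in range(5):
    cache.get("/css/app.css")
print(cache.hit_ratio)  # 0.8: only the first request reached the origin
```

Every hit is a request, and its bytes, that your server never has to handle, which is where savings like the figures above come from.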
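A common way to handle cache busting is to put a hash of each file’s contents into its filename: any change produces a new URL, so browsers and Cloudflare simply fetch the new file, while unchanged files can keep a long cache lifetime. A minimal sketch, assuming a build step you control; `fingerprinted_name` is a hypothetical helper, and most front-end build tools can generate hashed filenames for you.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprinted_name(path: Path, digest_length: int = 8) -> str:
    """Return a name like 'app.<hash>.css' that changes whenever the file's contents change."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:digest_length]
    return f"{path.stem}.{digest}{path.suffix}"


# A fingerprinted URL never serves different content, so it can be cached for a long time.
LONG_LIVED_CACHE_CONTROL = "public, max-age=31536000, immutable"  # one year

with tempfile.TemporaryDirectory() as tmp:
    css = Path(tmp) / "app.css"
    css.write_text("body { color: black; }")
    before = fingerprinted_name(css)
    css.write_text("body { color: navy; }")  # a CSS change...
    after = fingerprinted_name(css)          # ...produces a new URL, busting the old cache entry

print(before, after)
```

Your templates then need to reference the fingerprinted name; build tools typically emit a manifest mapping `app.css` to its hashed version for exactly this purpose.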
As a developer, you should build a website with a performance-first approach. The goal is a website that scores perfectly in the various online tools, such as Google’s PageSpeed Insights.
Attaining that 100/100 score is perfectly achievable. That is, until the required third-party scripts are added to the site. Every marketer knows that you can’t run an eCommerce store without a bare minimum of tracking in place. A typical eCommerce store would have at least the following (or equivalents):
Not to mention CRM tools like HubSpot, or customer data platforms such as Segment.
Adding those tools can cripple your PageSpeed score and may well cause you to fail some of your Core Web Vitals metrics. But you can’t do without them, so what’s the solution?
Enter Zaraz, Cloudflare’s tag manager, which is similar to Google Tag Manager (GTM). There are various differences between them (see Cloudflare Zaraz vs Google Tag Manager (GTM)), but the key difference is that Zaraz offloads the processing of these third-party tools from the browser onto Cloudflare’s network.
To give you an illustrative example, let’s say you have an eCommerce store that has been developed with no third-party scripts present. It scores a healthy 90/100 on PageSpeed. Great.
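Conceptually, the idea is that instead of the browser downloading and running a separate script for each vendor, the page sends one small event and a server-side component reshapes it for each tool. The Python sketch below models only that idea; it is not how Zaraz is actually implemented, and the event fields, tool names and formatter functions are all hypothetical.

```python
from typing import Callable, Dict

Event = Dict[str, object]


# Hypothetical per-tool formatters: each reshapes one generic event into that tool's payload.
def to_analytics(event: Event) -> Event:
    return {"event_name": event["name"], "value": event["value"], "currency": event["currency"]}


def to_ads_pixel(event: Event) -> Event:
    return {"ev": str(event["name"]).lower(), "amount": event["value"]}


TOOLS: Dict[str, Callable[[Event], Event]] = {
    "analytics": to_analytics,
    "ads_pixel": to_ads_pixel,
}


def fan_out(event: Event) -> Dict[str, Event]:
    """In this model, runs server-side: one event from the browser becomes one payload per tool."""
    return {tool: formatter(event) for tool, formatter in TOOLS.items()}


payloads = fan_out({"name": "Purchase", "value": 49.99, "currency": "GBP"})
print(payloads["ads_pixel"])  # {'ev': 'purchase', 'amount': 49.99}
```

Adding another tool in this model means adding a formatter on the server, not another script for every visitor’s browser to download and run.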
You add your tracking scripts and check the score again, and it’s now a very sad 40/100.
Move those scripts over to Zaraz and the score could jump back up to 85/100.
Zaraz isn’t the silver bullet that fixes everything, and there are a number of niggles you might need to iron out, but it’s well worth a look if you’re suffering with third-party tracking speed woes.
Alright, onto the next point. The typical eCommerce ecosystem consists of:
When a customer transacts on the website it will usually go through the following steps:
Pretty straightforward. Unfortunately, ERPs are third-party systems that your eCommerce store has no direct control over, yet relies on heavily.
At your busiest times, the ERP is going to be hit frequently. You may have taken every precaution to make sure your eCommerce store is robust enough to cope, but if your ERP falls over, how are you going to take orders? Your customers will go to check out and see errors or white screens. Not ideal.
Over the years I’ve seen many, many providers integrate their ERP in this heavily reliant way. But as an eCommerce operator, you shouldn’t rely on your ERP always being available.
Here’s how we handle it instead:
Following this flow, we can keep transacting no matter what happens to the ERP. If it’s slow, it doesn’t affect the user experience. If it’s totally unavailable, the customer can still transact.
There are things to consider with this approach, the main one being stock management. It’s important to manage stock within the eCommerce platform, but to sync it with the ERP regularly. This way, if the ERP becomes unavailable, you’re still unlikely to oversell a product, because your eCommerce store should already know the live stock levels from before the ERP went offline.
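A minimal sketch of this decoupled flow, assuming orders are committed to the eCommerce platform’s own database first and a scheduled background job pushes them to the ERP afterwards. `Store`, `place_order`, `sync_pending_orders` and `push_to_erp` are hypothetical names; in a real application this job would usually run on a queue worker.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class ErpUnavailable(Exception):
    """Raised by the (hypothetical) ERP client when the ERP is down or times out."""


@dataclass
class Order:
    order_id: str
    lines: Dict[str, int]  # SKU -> quantity
    synced_to_erp: bool = False


@dataclass
class Store:
    orders: List[Order] = field(default_factory=list)

    def place_order(self, order: Order) -> str:
        # Checkout only writes to the eCommerce platform's own database; the ERP isn't involved.
        self.orders.append(order)
        return "confirmed"

    def sync_pending_orders(self, push_to_erp: Callable[[Order], None]) -> int:
        """Background job: push unsynced orders to the ERP, leaving failures for the next run."""
        pushed = 0
        for order in self.orders:
            if order.synced_to_erp:
                continue
            try:
                push_to_erp(order)
            except ErpUnavailable:
                break  # ERP is down: stop for now and retry on the next scheduled run
            order.synced_to_erp = True
            pushed += 1
        return pushed


def erp_down(order: Order) -> None:
    raise ErpUnavailable(order.order_id)


store = Store()
print(store.place_order(Order("1001", {"SKU-1": 2})))  # confirmed, even though the ERP is down
print(store.sync_pending_orders(erp_down))              # 0: the order stays pending
print(store.sync_pending_orders(lambda order: None))    # 1: the ERP is back and the order syncs
```

Because a push can fail after the ERP has already accepted an order, it’s worth making the ERP import idempotent (for example, keyed on the order ID) so a retry can’t create duplicates.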
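One subtlety when syncing: the ERP’s stock figures won’t yet reflect orders still waiting to be pushed to it, so copying them straight over could briefly set stock too high. A minimal sketch, assuming the ERP reports on-hand quantities per SKU and you track which orders haven’t been synced yet (the function and argument names are hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable


def reconcile_stock(erp_stock: Dict[str, int],
                    pending_order_lines: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Sellable stock = what the ERP reports, minus what we've sold but not yet sent to the ERP."""
    reserved: Counter = Counter()
    for lines in pending_order_lines:
        reserved.update(lines)  # adds each unsynced order's SKU quantities
    return {sku: max(qty - reserved[sku], 0) for sku, qty in erp_stock.items()}


erp_stock = {"SKU-1": 10, "SKU-2": 3}
pending = [{"SKU-1": 2}, {"SKU-1": 1, "SKU-2": 5}]
print(reconcile_stock(erp_stock, pending))  # {'SKU-1': 7, 'SKU-2': 0}
```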
The bottleneck on most eCommerce platforms is the database. If something is going to choke up and become unavailable, it’s going to be that.
Usually you have three problems with databases on busy sites:
Users never connect to the database; your application does. Usually, your application connects to the database at the start of a request, keeps that connection open while it responds to the request, and then closes the connection. Remember, we’re talking about request time here (the time it takes the server to respond to the request), not the time someone spends lingering on the page. As such, we’re dealing with milliseconds.
Before your busy period, it’s important to stress test your site to understand how many concurrent visits you can handle, and then make sure your infrastructure can handle far more than you’re anticipating.
OK, so let’s assume we have infrastructure in place to handle more traffic than we’re anticipating. For the sake of this example, let’s say that the average web page takes around 100ms for the server to query the database, compile the response and send it to the user. Let’s also say that your infrastructure can handle 1,000 concurrent requests (that’s a lot, given it’s just a 100ms window). Great. But what happens if one of your queries hangs?
It’s entirely possible that you have one rogue query that is slow only under certain circumstances, which may not be immediately obvious. If that one query takes 2 seconds to respond, your 100ms window suddenly jumps to 2,100ms. Requests take longer to process, and the site begins to slow down for visitors. Your application quickly reaches a state where it’s queuing connections because none are available. Some frustrated visitors hit refresh a few times, which exacerbates the issue. It quickly snowballs until the site goes down. You can bring it back up again, but the same snowball effect happens a few minutes later. Nightmare.
So how do you prevent it? You profile your queries.
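The numbers in this example make the snowball easy to quantify. As a rough approximation (Little’s law, assuming every request runs the slow query and the server stays saturated), the number of requests per second you can serve is your concurrency limit divided by how long each request holds a slot:

$$
\text{throughput} \approx \frac{\text{concurrent requests}}{\text{time per request}}, \qquad
\frac{1000}{0.1\ \text{s}} = 10{,}000\ \text{req/s}
\quad\text{vs.}\quad
\frac{1000}{2.1\ \text{s}} \approx 476\ \text{req/s}
$$

In other words, one 2-second query cuts capacity by roughly 95%, so any traffic above about 476 requests per second starts to queue.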
Two options here:
MySQL has a built-in slow query log. You don’t want to run it in production for too long, as it can slow things down, but running it for a short period in the run-up to your busy period can show which queries are the bottleneck in your application.
The other option is a query profiling tool. We utilise Laravel heavily for our applications, so we often use Clockwork, which shows our developers which queries take the longest to run on each page request, and which part of the code issued each query (something that can be difficult to track down from a slow query log).
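MySQL’s slow query log is controlled by the `slow_query_log` system variable, with `long_query_time` setting the threshold in seconds, and MySQL ships a `mysqldumpslow` utility for summarising the log. If you want to rank the worst offenders yourself, a minimal Python parser could look like this. It assumes the standard log layout, where each entry has a `# Query_time: ...` header followed by the SQL; `slowest_queries` is a hypothetical helper, and the sample log is made up for illustration.

```python
import re
from typing import List, Tuple

QUERY_TIME = re.compile(r"^# Query_time: ([\d.]+)")


def slowest_queries(log_text: str, top: int = 5) -> List[Tuple[float, str]]:
    """Return (seconds, sql) pairs for the slowest statements in a MySQL slow query log."""
    entries: List[Tuple[float, List[str]]] = []
    for line in log_text.splitlines():
        match = QUERY_TIME.match(line)
        if match:
            # Each entry starts with a '# Query_time:' header; the SQL follows it.
            entries.append((float(match.group(1)), []))
        elif entries and line.strip() and not line.startswith(("#", "SET timestamp=", "use ")):
            entries[-1][1].append(line.strip())
    ranked = sorted(entries, key=lambda entry: entry[0], reverse=True)
    return [(seconds, " ".join(sql_lines)) for seconds, sql_lines in ranked[:top]]


# Made-up sample in the slow query log's format, for illustration only.
sample = """# Time: 2024-11-29T09:00:01.000000Z
# User@Host: shop[shop] @ localhost []  Id:    12
# Query_time: 2.104312  Lock_time: 0.000105 Rows_sent: 8  Rows_examined: 1250000
SET timestamp=1732870801;
SELECT product_id FROM order_lines WHERE order_id IN (SELECT order_id FROM order_lines WHERE product_id = 42);
# Time: 2024-11-29T09:00:02.000000Z
# User@Host: shop[shop] @ localhost []  Id:    13
# Query_time: 0.612000  Lock_time: 0.000090 Rows_sent: 1  Rows_examined: 90000
SET timestamp=1732870802;
SELECT COUNT(*) FROM orders;
"""

for seconds, sql in slowest_queries(sample):
    print(f"{seconds:.2f}s  {sql}")
```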
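Clockwork is a PHP tool, but the idea behind a query profiler is simple to illustrate: time every query and record where in the code it was issued. Here’s a rough Python sketch of that idea using only the standard library’s `sqlite3`, `time` and `traceback` modules; `ProfiledConnection` is a hypothetical wrapper, not part of Clockwork or any other library.

```python
import sqlite3
import time
import traceback
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class QueryRecord:
    sql: str
    duration_ms: float
    source: str  # "file:line in function" for the code that issued the query


class ProfiledConnection:
    """Hypothetical wrapper: times each query and records which code issued it."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.records: List[QueryRecord] = []

    def execute(self, sql: str, params: Sequence = ()) -> list:
        caller = traceback.extract_stack(limit=2)[0]  # the frame that called execute()
        start = time.perf_counter()
        rows = self.conn.execute(sql, params).fetchall()  # fetch so timing covers the real work
        elapsed_ms = (time.perf_counter() - start) * 1000
        source = f"{caller.filename}:{caller.lineno} in {caller.name}"
        self.records.append(QueryRecord(sql, elapsed_ms, source))
        return rows

    def slowest(self, n: int = 3) -> List[QueryRecord]:
        return sorted(self.records, key=lambda r: r.duration_ms, reverse=True)[:n]


def load_product_page(db: ProfiledConnection) -> None:
    db.execute("SELECT name FROM products WHERE id = ?", (1,))
    db.execute("SELECT COUNT(*) FROM products WHERE name LIKE ?", ("%9%",))


db = ProfiledConnection(sqlite3.connect(":memory:"))
db.conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
db.conn.executemany("INSERT INTO products (name) VALUES (?)", [(f"Product {i}",) for i in range(10_000)])
load_product_page(db)
for record in db.slowest():
    print(f"{record.duration_ms:.3f} ms  {record.source}  {record.sql}")
```

Recording the caller alongside the timing is what makes the output actionable: it points straight at the line of code to fix, rather than just the SQL text.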
Once you have identified the slow queries, then what?
Well, the obvious thing to do is to try to make the queries themselves more efficient. Unfortunately, that isn’t always possible; sometimes you just need to crunch numbers to get the results you need. There are usually various caching tactics you can employ to mitigate this, but another strategy is to identify the most computationally expensive parts of your page and understand how and when they are being loaded.
One of the best examples of this is an “Other Customers Bought” block on a product listing page of an eCommerce store. Usually this is a carousel of products that features towards the bottom of the product listing page, or perhaps as an upsell in the cart.
Technically speaking, this is a very expensive feature. If it’s querying live data then it is effectively trawling through every order line item in every order on your store, and compiling a list of similar products from that. If you find that your eCommerce platform is doing this live on every page load then it’s probably a good indication that you need to assess whether you’re on the right platform to begin with - but let’s look at how you could mitigate the problem.
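To make that cost concrete, here’s a rough sketch of the work involved, moved into an offline batch job so that each page load only needs a cheap lookup. This is one possible caching approach; it assumes each order is available as a list of product IDs, and `build_also_bought` is a hypothetical helper you would run on a schedule, storing its results.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List


def build_also_bought(orders: Iterable[List[str]], top_n: int = 4) -> Dict[str, List[str]]:
    """Offline job: for each product, the products most often bought in the same order."""
    co_counts: Dict[str, Counter] = defaultdict(Counter)
    for products in orders:
        # Every ordered pair of distinct products in the same order counts as one co-purchase.
        for a, b in permutations(set(products), 2):
            co_counts[a][b] += 1
    return {product: [other for other, _ in counts.most_common(top_n)]
            for product, counts in co_counts.items()}


orders = [["mug", "coffee", "filter"], ["mug", "coffee"], ["mug", "teapot"]]
also_bought = build_also_bought(orders)  # run on a schedule, store the result
print(also_bought["mug"][0])  # coffee
```

The product page then does a single lookup, such as `also_bought.get(product_id, [])`, instead of scanning every order line on every request.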
Two possible options for this very slow query would be:
You’re not going to catch everything every time, so it’s important to have something in place that catches errors as they happen.
There are lots of error trackers available, and they all have pros and cons. We utilise Laravel heavily, and as such tend to lean towards an error tracker called Flare.
Flare is great because it has the context of your application’s code. It runs both in your server-side code, catching and logging every warning or exception, and in the browser, catching every client-side error and warning, even those thrown by third-party scripts.
Most error trackers can be configured to alert developers when a certain threshold of error numbers or severity is met. These thresholds can be adjusted for your busy periods.
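Under the hood, a threshold alert is essentially a count over a sliding time window. The sketch below shows the idea, assuming you can hook into each error as it’s reported; `ErrorRateAlert`, its `threshold` and `window_seconds` parameters and the `notify` callback are hypothetical, and in practice you’d configure this in your error tracker rather than write it yourself.

```python
from collections import deque
from typing import Callable, Deque


class ErrorRateAlert:
    """Call `notify` when more than `threshold` errors arrive within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float, notify: Callable[[int], None]):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.notify = notify
        self.timestamps: Deque[float] = deque()

    def record_error(self, at: float) -> None:
        self.timestamps.append(at)
        # Forget errors that have fallen out of the window.
        while at - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) > self.threshold:
            self.notify(len(self.timestamps))


alerts = []
monitor = ErrorRateAlert(threshold=3, window_seconds=60, notify=alerts.append)
for t in [0, 10, 20, 30]:  # four errors within 30 seconds
    monitor.record_error(t)
print(alerts)  # [4]
```

Lowering `threshold` or lengthening `window_seconds` makes the alert more sensitive ahead of a busy period; a real setup would also add a cooldown so a single incident doesn’t notify people on every subsequent error.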
Because Flare has the context of your entire application it can also be used as a query profiler, helping to diagnose slower queries and how/when they’re being run in your production environment.
We’ve run through the different ways you can ensure your eCommerce store is ready for busy periods. Keeping things running like clockwork is challenging, but having the right team of people around you is invaluable.
Prepare for the worst, have mitigation strategies in place, and make sure your team is ready to respond if it does happen.
The GO! Network is a free-to-use marketing intermediary, connecting in-house marketers with vetted agency partners that can help solve your challenge. If you’re looking to review or take on support, let us know here.