Alejandro Crosa

Collected Notes' first “outage”!

I started receiving messages from people in some countries reporting that the CSS was not loading for the site. But every time I checked, everything seemed normal. So I messaged some friends and asked them whether they saw any errors, and they all reported that everything worked as expected.

Turns out only certain geographic locations were getting a 404 for the CSS stylesheet on the site. After a lot of head-scratching and talking to @_nlgonzalez and others, I realized that I run more than one replica of the site, so while k8s is rolling out a new deployment, some of the hosts are still serving the old version. Do you see the problem?

If a user comes to the site requesting an asset, CloudFront fetches it from the app and caches the response. But during a deploy the request might land on a host that still has the stale assets and doesn't have the new one, so CloudFront gets a 404 and caches that response instead.

So what happened was that some Argentinian users accessed the site while I was deploying, and the 404 got cached at their edge location :)
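
To make the sequence concrete, here is a minimal sketch of the failure in Python. It is a toy model, not how CloudFront or k8s actually work: the asset paths, the edge names, and the `AppHost` and `Edge` classes are all made up for illustration, and the toy cache never expires entries, whereas a real CDN applies TTLs.

```python
from dataclasses import dataclass, field

# Hypothetical asset paths: one shipped by the old version of the site,
# one shipped by the new version.
OLD_ASSET = "/assets/site-v1.css"
NEW_ASSET = "/assets/site-v2.css"


@dataclass
class AppHost:
    """One replica of the app; it only has the assets of the version it runs."""
    assets: set

    def get(self, path):
        return 200 if path in self.assets else 404


@dataclass
class Edge:
    """A toy edge location that caches whatever status the origin returned."""
    cache: dict = field(default_factory=dict)

    def get(self, path, origin_host):
        if path not in self.cache:
            # Error responses are stored just like successful ones.
            self.cache[path] = origin_host.get(path)
        return self.cache[path]


updated = AppHost({NEW_ASSET})  # replica already running the new version
stale = AppHost({OLD_ASSET})    # replica still running the old version

# Hypothetical edge locations.
edges = {"buenos-aires": Edge(), "elsewhere": Edge()}

# Mid-deploy: the request for the new stylesheet lands on the stale replica.
during = edges["buenos-aires"].get(NEW_ASSET, stale)

# After the deploy every replica is updated, but the first edge never asks again.
after_same_edge = edges["buenos-aires"].get(NEW_ASSET, updated)
after_other_edge = edges["elsewhere"].get(NEW_ASSET, updated)

print(during, after_same_edge, after_other_edge)  # 404 404 200
```

The last line is the whole bug: the edge that was hit during the deploy keeps answering 404, while an edge that first saw the request afterwards answers 200, which is why everything looked fine from where I was checking.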


almost 4 years ago
