500 Billion Impressions and Counting: How Movable Ink Handles Holiday Volume

Cyber Week 2017 was historic for consumer spending, with $6.59 billion spent on Cyber Monday alone. A healthy portion of that spend can be attributed to clever email marketing campaigns, and as a result Cyber Week was historic for Movable Ink, too. 

We hit nearly 2 billion impressions in 24 hours and reached a total of 500 billion all-time impressions. We’re proud to say we had 100% uptime during Cyber Week, just like we have for every Cyber Week since 2010.

So, how do we do it? We chatted with Lee Bankewitz, SVP of Engineering here at Movable Ink, to find out how our platform handles the extra volume and how we prepare for Cyber Week.

Q: How do you prepare both the platform and engineering team for such a huge influx of demand around Black Friday and Cyber Week?

A: On the platform side, we try to anticipate the peaks we will see by working with our customers and understanding their campaign calendars and use cases. The system is fully automated at this point, so no manual intervention is necessary for scaling. For the holidays, though, we get involved “just in case” and tweak a few things here and there.

As for the team, we have a dual on-call 24/7 schedule from Black Friday through Cyber Monday for each of our teams. This year, those folks had the most boring on-call tours we’ve seen yet, which is the way we like it!

Q: What technical changes did you deploy to help scale the platform up to handle the huge number of impressions served?

A: Our system is already self-healing and has several layers of redundancy, including complete datacenter redundancy. There are many services and communication layers in the architecture, and each of them has its own health metrics, which are continuously monitored. When a system shows early signs of degraded performance, it automatically expands to absorb the additional load and alerts the on-call staff.

For the weekend, we added plenty of extra capacity in anticipation of the extra volume, and we tweaked the self-healing configuration to be more “paranoid,” so to speak. Basically, if anything showed the slightest signs of degradation, that system would react quickly and overcompensate. The on-call staff would be alerted and spot-check things afterwards, but that’s more of a formality at this point.
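
To make the idea concrete, here is a minimal sketch of how a health check with a normal and a “paranoid” setting might behave. It is an illustration only, not Movable Ink's actual implementation: the `ScalingPolicy` class, the `evaluate` function, the choice of p95 latency as the health metric, and every threshold and scale factor are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy: when to react and how hard."""
    latency_threshold_ms: float  # p95 latency above this counts as degraded
    scale_factor: float          # multiply capacity by this when degraded


# Illustrative values only; real thresholds are not given in the interview.
NORMAL = ScalingPolicy(latency_threshold_ms=200.0, scale_factor=1.5)
PARANOID = ScalingPolicy(latency_threshold_ms=120.0, scale_factor=3.0)


def evaluate(policy, p95_latency_ms, instances):
    """Return (new_instance_count, alert_on_call) for one health check."""
    if p95_latency_ms <= policy.latency_threshold_ms:
        return instances, False
    # Degraded: expand capacity (by at least one instance) and alert on-call.
    new_count = max(instances + 1, math.ceil(instances * policy.scale_factor))
    return new_count, True


# The same 150 ms reading is ignored by NORMAL but triggers PARANOID.
print(evaluate(NORMAL, 150.0, 10))    # (10, False)
print(evaluate(PARANOID, 150.0, 10))  # (30, True)
```

With these made-up numbers, the paranoid policy reacts earlier (a lower threshold) and overcompensates (a larger scale factor), and the on-call staff are alerted only after capacity has already been added.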

Q: Did you experience any interruptions or outages? How did you prepare to handle outages if they arose?

A: None! It was a very smooth holiday weekend, with absolutely zero interruptions. There were weeks of preparation, including capacity planning, on-call scheduling, and communication templates for all sorts of worst-case scenarios, but in the end everything ran smoothly, keeping our perfect BF/CM record intact.

Q: Was your behavior tracking script impacted by the increase in demand on both your platform and clients’ websites?

A: Our on-site tracking script performed as well as it always has, despite a tremendous increase in online traffic. Our customers all see a big spike in site visits during the holidays, and our system needs to scale to accommodate all of those spikes at the same time! We observed half a billion on-site visits _per day_ from BF to CM. We are proud to report that we absorbed this traffic without incident while staying well within our SLAs.