Category Archives: Data

Holiday Report: The Final Word

Over the 2015 holiday season — from Black Friday through the first week of the new year — we’ve been tracking consumer email behavior across industries, while keeping a close eye on…

Continue reading

Serving Files: S3 and High Availability

At Movable Ink we use Amazon S3 heavily, storing millions of files and serving them to hundreds of millions of users. It has a number of very compelling qualities: it has great performance characteristics and a blistering eleven 9’s of durability: Amazon replicates our data in such a way that, in theory, 99.999999999% of objects are retained.

However, durability and uptime are not one and the same, as many S3 customers found out when an internal configuration issue impacted services on Monday morning. The problem affected buckets in US Standard, the most commonly used US S3 region.

We’re pretty conscious of potential single points of failure, and tend to have redundancy at multiple tiers: each layer is spread across multiple hosts, which are connected at multiple points to the layers above and below it. This manifests as multiple load balancers, app servers, and availability zones, with the entire setup replicated across geographically separate datacenters thousands of miles apart. With all of that redundancy, of course we want our S3 serving to be redundant as well.

S3 buckets are tied to a geographical location, and most correspond to one of Amazon’s datacenters. However, US Standard stores data on both the east coast and the west coast. Given that it can be accessed from either coast, my first concern was consistency: what would happen if you wrote data on one side and then immediately tried to read it from the other? We tested it, and it was surprisingly consistent, which seemed strange for something served out of two different regions.

It turns out there is no replication happening; S3 only writes to the region of the endpoint you use when uploading:

Amazon S3 automatically routes requests to facilities in Northern Virginia or the Pacific Northwest using network maps. Amazon S3 stores object data only in the facility that received the request.

Given this, we should really be treating US Standard as a single point of failure. So how can we make it redundant?

The strategy we take is to store data in different S3 regions, then come up with a way to point users and our backend services at whichever region is currently active. AWS actually has a couple of tools to facilitate the former. S3 supports file creation notifications to SNS or SQS, and you could set up AWS Lambda to automatically copy files to a different region. But even better than that, a few months ago Amazon released Cross-Region Replication to do exactly what we want. Setup is simple (a sketch of the API calls follows the list below):

  • Turn on versioning on the source bucket. This comes at an extra cost since you pay for all previous versions of your files, but since we’ve already decided that this data is very important, it’s worth it. After all, we’re talking about doubling our storage costs here.
  • Turn on cross-region replication. As part of the setup, you’ll create another versioned bucket in the destination datacenter and an IAM policy to allow data transfer between the two.
  • Do a one-time manual copy of all of your files from the source bucket to the destination bucket. Replication only copies files that are added or changed after replication is enabled. Make sure the permissions are identical.
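
Here’s a rough sketch of the first two steps using boto3; the bucket names, regions, and IAM role ARN are placeholders rather than our actual configuration:

```python
# Rough sketch with boto3; bucket names, regions, and the IAM role ARN are
# placeholders, not our actual configuration.
import boto3

BUCKETS = {
    "example-assets-us-east-1": "us-east-1",   # source (US Standard / primary)
    "example-assets-us-west-2": "us-west-2",   # destination (replica)
}

# Step 1: versioning must be enabled on both buckets before replication works.
for name, region in BUCKETS.items():
    s3 = boto3.client("s3", region_name=region)
    s3.put_bucket_versioning(
        Bucket=name,
        VersioningConfiguration={"Status": "Enabled"},
    )

# Step 2: enable cross-region replication on the source bucket, using an IAM
# role that allows S3 to read from the source and write to the destination.
source = boto3.client("s3", region_name="us-east-1")
source.put_bucket_replication(
    Bucket="example-assets-us-east-1",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/example-s3-replication-role",
        "Rules": [
            {
                "Prefix": "",    # empty prefix = replicate every object
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::example-assets-us-west-2"},
            }
        ],
    },
)
```

Replication only applies to objects written after it is enabled, so the one-time backfill of existing files still has to happen separately, for example with the AWS CLI’s aws s3 sync between the two buckets.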

Now every time we add a file to the source bucket, it is (eventually) replicated to the destination bucket. If all of our access is through our backend services, this may be good enough since failing over is a simple configuration change. But many of the references to our S3 buckets are buried in HTML documents or managed by third parties. How can we make it easy to switch between the buckets?

Our initial idea was to set up a subdomain on a domain we control as a CNAME to our S3 bucket, then handle failover with DNS. S3 allows this, with one big caveat: your bucket must be named exactly the same as the domain. If you want to reference your S3 bucket as foo.example.com, the bucket itself must be named foo.example.com, with the DNS record pointing at foo.example.com.s3.amazonaws.com. Combined with S3’s restriction that bucket names are globally unique, only one bucket can ever be referenced from foo.example.com, so this doesn’t work.

Amazon has a CDN service, CloudFront, which allows us to set an S3 bucket as the origin for a distribution. We can then CNAME our subdomain to the CloudFront distribution’s endpoint. In the event of a regional S3 failure, we update CloudFront to point at the backup S3 bucket. We can either turn on caching and reap some latency benefits, or set the cache time-to-live to zero so the distribution acts as a pure pass-through.
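
Here’s a hypothetical sketch of that origin swap using boto3; the distribution ID and bucket hostnames are made up, not our real setup:

```python
# Hypothetical failover script: repoint a CloudFront distribution's S3 origin
# at the cross-region replica. The distribution ID and hostnames are made up.
import boto3

DISTRIBUTION_ID = "E1EXAMPLE123"
PRIMARY_ORIGIN = "example-assets-us-east-1.s3.amazonaws.com"
BACKUP_ORIGIN = "example-assets-us-west-2.s3-us-west-2.amazonaws.com"

cloudfront = boto3.client("cloudfront")

# update_distribution requires the full current config plus its ETag.
response = cloudfront.get_distribution_config(Id=DISTRIBUTION_ID)
config = response["DistributionConfig"]
etag = response["ETag"]

# Swap the S3 origin's domain name over to the replica bucket.
for origin in config["Origins"]["Items"]:
    if origin["DomainName"] == PRIMARY_ORIGIN:
        origin["DomainName"] = BACKUP_ORIGIN

cloudfront.update_distribution(
    DistributionConfig=config,
    Id=DISTRIBUTION_ID,
    IfMatch=etag,
)
```

CloudFront then has to push the updated configuration out to its edge locations, which presumably accounts for most of the failover window described below.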

We would have preferred to set up two CloudFront distributions and switch between them with DNS, but Amazon has a similar restriction disallowing two distributions from sharing the same CNAME. Even so, this setup lets us respond to an S3 outage in minutes, routing traffic to an unaffected region. In our tests, failover fully completes in 5-10 minutes.

Building applications in the cloud means expecting failure, but it’s not always straightforward, especially when using third-party services like S3. Even with our final setup, it’s not completely clear what CloudFront’s dependencies and failure modes are. But importantly, we control the DNS, so we can implement our own fixes rather than waiting for Amazon.

If you’re interested in working on challenging problems like this, check out Movable Ink’s careers page.

– Michael Nutt, CTO

API Integration video for email

[VIDEO] Building Next-Gen Emails with API Integrations

Did you ever wish you could somehow combine social media and email marketing in totally new ways? Or pull in reservation information from an internal database and display it, in real-time, in your emails?

Well, good news – that’s all possible. By using API integrations within email campaigns, marketers can tap into a whole new world of opportunities. We recently published an eBook with examples of how brands are using API integrations in email today, and we wanted to build on that by showing how it looks in action.

Our product marketing specialist, Steven Joya, put together a brief video that takes you through the history of dynamic email content and how API integrations are implemented from start to finish.

Check it out:

Continue reading

How Lenscrafters Raised Click-Throughs by 60%

If you’re wearing glasses, there’s a good chance you’ve heard of Lenscrafters. You might even be wearing a pair you bought from the company right now. The brand has stores all over the country that sell eyewear and offer eye exams.

One of the most important marketing tools for Lenscrafters is email. When it’s time for a sale or an appointment reminder, the company depends on email to drive results. That means getting online sales for glasses and, most importantly, getting people to book appointments.

Ivy Inniger, Director of CRM/Loyalty at Lenscrafters North America, wanted to make it as easy as possible to book appointments. To do that, Inniger and the team used Movable Ink’s technology to power custom-built API integrations in an email campaign.

Here’s how it worked:

Continue reading

Machine Learning and Data Science (xx) Love at Movable Ink

Here at Movable Ink, we’re serious about gender equality. We’re excited to support initiatives aimed at bringing more women into STEM fields and to get involved with the community efforts advocating for this cause.

This past week, Movable Pink hosted the inaugural event of the NYC Women in Machine Learning and Data Science meetup. This group fosters informal discussions on machine learning and data science topics, with the purpose of building a community around women in these fields. The inaugural event consisted of a series of lightning talks and culminated in a keynote delivered by Claudia Perlich, Chief Data Scientist at Dstillery. If you have not heard of Claudia yet, let’s just say she’s a data science rockstar; her accomplishments include winning the Advertising Research Foundation’s (ARF) Grand Innovation Award and being named to Crain’s New York annual 40 Under 40 list.

Lightning talks – Speakers from across the industry gave five-minute lightning talks on a range of machine learning and data science topics:

  • Fashionably Data – speaker Anna Smith
  • Big Data and Hadoop – speaker Esther Kundin
  • Python Map Reduce vs Scalding – speaker Emily Samuels
  • Clustering Ferguson – speaker Dara Elass

Who attended? We had a diverse crowd spanning both industry and academia. Some of the represented groups included:

  • Columbia University
  • NYU
  • Acxiom
  • Blenheim Capital
  • Bloomberg
  • Dstillery
  • Metis
  • RentTheRunway
  • Spotify

We were excited to see such a great turnout and so much enthusiasm around machine learning and data science. Stay tuned for future events from the NYC Women in Machine Learning and Data Science meetup. Meanwhile, take a moment to check out some of the event pictures here.

The Problem with Predictive Analytics

Though some might object, most marketers would likely agree that relevance is an important factor in driving optimal email program performance and revenue. Marketers are frenetically capturing, then filtering, terabytes of customer data in an attempt to find the needle in the haystack that tells them where, why, and how each individual is most interested. The difficulty is that there is an almost infinite number of data sources, and much of the data is only relevant at the point in space and time at which it is captured. Given the time it takes to structure and filter the data, then trigger or launch a campaign, the opportunity to use it most effectively is often long past.

Typically, marketers do not use their email database as their Database Of Record (DBOR). Instead, customer data are captured from various points (web clicks, views, purchases, forms, social, apps, 3rd parties, etc.) and stored in a DBOR, while email interaction data (opens, clicks) are stored in a separate email database. It isn’t until all these data are merged that we can begin to tell the story of the individual customer. Though some claim a “real-time” connection between the two, most email platforms only merge the data once every 24 hours. In our era of perpetual motion and continual connectivity, 24 hours, or even 2-4 hours, is sooo 2000-LATE. Static and historical data certainly has value, but it is often out of context for the present moment.

Marketers understand this challenge, and many have turned to predictive analytics to optimize relevance. While this often does bring some lift, one quickly hits a ceiling because the practice still largely relies on static or aging data points, continual data refreshes, and a big fat guess. So much time, effort, and budget are expended trying to control the data, and it’s usually a losing fight. Marketers can’t control data any more than we can control the individual customer. Truthfully, the only one controlling customer data is the customer. The customer has ultimate control over when the email is opened, where it is opened, and on what device, and such contexts are some of the most important factors in relevance. If you believe that, then you probably also have to believe that the most critical moment occurs not when the email is sent, but when it is opened.

Marketing is a continual evolution, and those at the front of the March of Progress have grown beyond Predictive and have begun using Agile email solutions, such as Movable Ink, that detect and leverage “time-of-open” data, such as time, location, device, and even the weather, to render contextually personalized experiences. For the uninitiated, I’ll explain by way of example… In the average database, my profile would tell you that I’m male, 38, and live in the Bay Area, where it is often cool (I don’t mean in a hipster way…grrr) and foggy, and that my past purchases indicate a penchant for outerwear. It might seem a reasonable assumption for a retailer to send me an email compelling me to take advantage of a remnant-inventory sale on Member’s Only windbreakers, and if I download and purchase via their super-slick mobile app, I can celebrate an additional 15% off! I’d be powerless to resist, right?

But remember, only the customer controls the data. It just so happens that I’m writing this blog post on my laptop, while visiting clients…in Chicago…where it’s presently 86F with 90% humidity. Given my current disposition, what would I want with a jacket…or an app download? (I mean, I would want that Member’s Only jacket, but sure as hell not right now.) Time-of-open data would have told the marketer that I’m not home, that it’s hot where I am, and that I’m reading on a laptop rather than an app-enabled device. What I really want right now is the t-shirt-and-shorts email, linked to the specific product page. Predictive Analytics can’t predict such vagaries and variables. Open-time optimization can.