Flipboard is a news aggregator and social network aggregation company with 500 million users and 100 million monthly active users. Flipboard was founded as one place to find the stories for your day, bringing together your favorite news sources with social content to give an in-depth view into everything from political issues to technology trends to travel inspiration.
Over the years, in partnership with the world’s greatest publishers, we’ve built a curated experience with a plurality of voices, where people can find quality stories of any interest, investing in their lives and passions.
The Flipboard platform has 100 million active users who generate 200k-300k reads/sec and 60k-120k writes/sec to Apache HBase. Flipboard supported the platform with a self-managed Cloudera cluster on Amazon EC2, running HBase with 5,600 regions and 40 TB of data on EBS.
As the data volumes grew and user throughput requirements increased, Cloudera-based HBase was not scalable enough to support Flipboard's platform.
Because Flipboard stores a very large payload for each key in HBase, the right bucket sizes were difficult to identify. The Mactores data engineering team therefore tested various Amazon EMR configurations, using the Yahoo! Cloud Serving Benchmark (YCSB) tool to benchmark each combination and arrive at the recommended Amazon EMR deployment. Because of the very high traffic and the requirement that average latency for “get” requests stay below 100 ms, the Mactores team tuned the bucket cache, the block cache, and various other HBase parameters to support the S3-based HBase.
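A benchmark sweep of this kind can be sketched in Python as follows. This is only an illustration, not the Mactores team's actual tooling: the candidate values, the S3 path, and the helper names `candidate_configs` and `pick_best` are assumptions, and the result records are expected to come from your own YCSB runs. The property names (`hbase.emr.storageMode`, `hbase.rootdir`, `hbase.bucketcache.size`, `hfile.block.cache.size`) are standard Amazon EMR and HBase settings.

```python
from itertools import product

# Illustrative candidate values only; the real search space is not
# given in the case study.
BUCKET_CACHE_MB = [8192, 16384]
BLOCK_CACHE_FRACTION = [0.2, 0.4]


def candidate_configs(s3_root="s3://example-bucket/hbase"):
    """Yield one EMR configuration list per parameter combination.

    s3_root is a placeholder path, not a real bucket.
    """
    for bucket_mb, block_fraction in product(BUCKET_CACHE_MB, BLOCK_CACHE_FRACTION):
        yield [
            # Tells EMR to keep HBase data in S3 instead of HDFS.
            {"Classification": "hbase",
             "Properties": {"hbase.emr.storageMode": "s3"}},
            {"Classification": "hbase-site",
             "Properties": {
                 "hbase.rootdir": s3_root,
                 "hbase.bucketcache.size": str(bucket_mb),
                 "hfile.block.cache.size": str(block_fraction),
             }},
        ]


def pick_best(results, max_read_latency_ms=100.0):
    """Return the highest-throughput result that meets the latency target.

    Each result is a dict with the (hypothetical) keys 'name',
    'read_latency_ms' and 'throughput_ops', filled in from a YCSB run.
    Returns None if no configuration meets the target.
    """
    passing = [r for r in results if r["read_latency_ms"] < max_read_latency_ms]
    if not passing:
        return None
    return max(passing, key=lambda r: r["throughput_ops"])
```

Each configuration list would be applied to a test cluster, benchmarked with YCSB, and the measured latency and throughput passed to `pick_best`. Filtering on latency first and then maximizing throughput mirrors the stated requirement of sub-100 ms average “get” latency.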
Now that it relies on managed services, the Flipboard team has much more time to focus on moving the product forward. With the platform modernized, the team has the capacity to add functionality that improves the user experience. With traffic spikes now handled properly, costs are much more manageable and predictable.
After migrating to Amazon EMR, Flipboard benefits from autoscaling EMR clusters, which add region servers to absorb user traffic spikes. Because Amazon EMR HBase backed by S3 supports read replicas, Flipboard can now separate read traffic from write traffic and provide high throughput to users who perform interactive actions on the platform. As the next step in its modernization, Flipboard plans to migrate some tables to DynamoDB and to re-evaluate its use cases for multi-column-family databases.
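As a rough illustration of the read/write separation described above, the sketch below builds the EMR configuration for a primary HBase-on-S3 cluster and for a read-replica cluster that points at the same root directory. The function name `hbase_on_s3_config` and the S3 path are assumptions made for this example; `hbase.emr.storageMode` and `hbase.emr.readreplica.enabled` are the documented Amazon EMR properties, and Flipboard's actual settings are not given in the source.

```python
def hbase_on_s3_config(root_dir, read_replica=False):
    """Build an EMR configuration list for HBase on S3.

    root_dir is the S3 location of the HBase root directory; a
    read-replica cluster must use the same location as the primary.
    """
    hbase_props = {"hbase.emr.storageMode": "s3"}
    if read_replica:
        # Marks this cluster as a read-only replica of the primary.
        hbase_props["hbase.emr.readreplica.enabled"] = "true"
    return [
        {"Classification": "hbase", "Properties": hbase_props},
        {"Classification": "hbase-site",
         "Properties": {"hbase.rootdir": root_dir}},
    ]


# Placeholder path, not a real bucket.
ROOT = "s3://example-bucket/hbase"
primary = hbase_on_s3_config(ROOT)
replica = hbase_on_s3_config(ROOT, read_replica=True)
```

Lists like these could be supplied as the cluster configurations when each EMR cluster is created, with write traffic sent to the primary and read traffic to the replica.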
The Flipboard team had already spent too many hours establishing a sensible starting point to resolve this ongoing challenge.
The AWS team recommended working with a partner experienced in HBase challenges. Mactores proposed starting with an assessment, an approach that aligned with the Flipboard team's needs.
The assessment results clearly documented the previous efforts and the strategy needed to move forward with a production solution.