Relay Network is a technology company that offers a new personalized, mobile 1:1 channel for businesses to more effectively connect and engage with their customers. Relay’s secure and compliant feed channel combines the timeliness of text with the multifaceted experience of scrolling social feeds, helping over 100 industry-leading clients increase revenue, lower cost to serve, and improve customer relationships.

The company operates on AWS, with complex workloads that demand robust business continuity measures. Relay’s commitment to delivering value to customers keeps it in the vanguard of new technologies that offer product efficiencies, process improvements, and materially better outcomes for customers and end users.

The Challenge

To guarantee the quality of customer experience, Relay consistently prioritized business continuity measures across its business and technical teams. However, this demanded intensive manual effort, diverted resources, and relied on complex workflows. Brendan Putek, Director of DevOps, recognized the inefficiency of these efforts in the area of Disaster Recovery and prioritized finding a better way forward.

Their DR Recovery Time Objective (RTO) had historically been 96 hours—which satisfied customer SLAs but left much to be desired with the recovery process and team experience during a disaster. When a major healthcare customer required a 24-hour RTO, it was clearly time to figure out a new solution.

With hundreds of thousands of AWS resources to manage and a small engineering team already handling multiple priorities, implementing an updated DR solution manually would have required approximately 18 months and at least three full-time engineers.

Additionally, they needed to maintain their 12-hour Recovery Point Objective (RPO) while managing legacy systems, multiple deployment tools and technology stacks, each with its own peculiarities, and an infrastructure that was primarily immutable and ephemeral.

The Solution

Relay Network found their solution in Arpio: a Disaster Recovery platform specially built for complex AWS environments. The tool’s ability to automatically discover and link resources across AWS accounts proved crucial for their implementation. This approach allowed them to maintain strict account boundaries for security while efficiently replicating their infrastructure for DR purposes.

Arpio particularly excelled in handling their complex environment, which combined legacy systems with AWS services such as DynamoDB, RDS, ElastiCache, OpenSearch, EC2, ECS Fargate, Lambda, EventBridge, SQS, S3, and Step Functions. Beyond dynamic resource discovery and linking, integration with existing AWS infrastructure made a significant difference. Other key functionality included filtering by tags, which allowed Relay to easily exclude non-production resources in the region.
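
To illustrate the idea of tag-based filtering described above, here is a minimal Python sketch. It is not Arpio's implementation or API. The inventory records, the `environment` tag key, and the `select_by_tag` helper are all hypothetical names chosen for the example.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical resource record: the field names ("id", "service", "tags")
# are assumptions for illustration, not Arpio's or Relay's actual schema.
Resource = Dict[str, Any]


def select_by_tag(
    resources: Iterable[Resource], key: str, allowed_values: Set[str]
) -> List[Resource]:
    """Keep only resources whose tag `key` has a value in `allowed_values`.

    Resources without the tag are excluded, so untagged or non-production
    items never reach the set that gets protected.
    """
    selected = []
    for resource in resources:
        tags = resource.get("tags", {})
        if tags.get(key) in allowed_values:
            selected.append(resource)
    return selected


# Made-up inventory mixing production and non-production resources.
inventory = [
    {"id": "db-orders", "service": "rds", "tags": {"environment": "production"}},
    {"id": "db-orders-test", "service": "rds", "tags": {"environment": "staging"}},
    {"id": "queue-events", "service": "sqs", "tags": {"environment": "production"}},
    {"id": "scratch-bucket", "service": "s3", "tags": {}},
]

protected = select_by_tag(inventory, "environment", {"production"})
print([r["id"] for r in protected])  # ['db-orders', 'queue-events']
```

In this sketch the staging database and the untagged bucket are left out, which mirrors the effect of excluding non-production resources that share a region with production ones.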

The Results

Relay Network successfully implemented Arpio as their DR solution within the customer’s required timeframe. The implementation proved remarkably efficient, requiring only 1.8 full-time equivalent (FTE) resources over nine months.

Other customer requirements were easily met: Relay showed a 12x improvement in RTO after implementing Arpio. Brendan noted, “The actual hands-on work during recovery tests takes only 20 minutes, with the remainder primarily consisting of system initialization time.” This dramatic improvement enabled them to meet their customers’ requirements while maintaining their existing RPO.
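
The figures quoted in this case study can be put side by side with a little arithmetic. The sketch below rests on two assumptions that the text does not state outright: that the 12x improvement is measured against the historical 96-hour RTO, and that the manual estimate can be taken at its lower bound of three engineers for 18 months.

```python
# Figures quoted in the case study.
BASELINE_RTO_HOURS = 96      # historical RTO
RTO_IMPROVEMENT_FACTOR = 12  # reported improvement after adopting Arpio
MANUAL_ENGINEERS = 3         # "at least three full-time engineers"
MANUAL_MONTHS = 18           # "approximately 18 months"
ARPIO_FTE = 1.8              # reported FTE for the Arpio rollout
ARPIO_MONTHS = 9             # reported duration of the rollout

# Assumption: the 12x figure is relative to the 96-hour baseline.
implied_rto_hours = BASELINE_RTO_HOURS / RTO_IMPROVEMENT_FACTOR

# Effort in engineer-months; the manual figure is a lower bound.
manual_effort = MANUAL_ENGINEERS * MANUAL_MONTHS
arpio_effort = ARPIO_FTE * ARPIO_MONTHS

print(f"Implied RTO: {implied_rto_hours:.0f} hours")
print(f"Manual estimate: at least {manual_effort} engineer-months")
print(f"Arpio rollout: about {arpio_effort:.1f} engineer-months")
```

Under those assumptions, the implied RTO sits well inside the 24-hour requirement, and the rollout took less than a third of the estimated manual effort.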

Arpio has also helped the Relay team identify potential issues early and proactively address future risk points. For example, Brendan noted that the account separation of their DR environment has enhanced their security posture.

“Arpio gets a 10/10 Recommendation”

More than a year after onboarding, and after innumerable DR tests, Relay Network’s DR implementation with Arpio has been a resounding success. The project met their immediate customer requirements and established a foundation for future improvements in their disaster recovery capabilities, all while requiring significantly less resource investment than a manual implementation would have demanded.

“This wouldn’t have been possible without the support of the Arpio team. Both on a daily basis with our questions, to roadmap items for supporting key resources based on our feedback, it’s a partnership, not just a vendor relationship.” – Brendan Putek, Director of DevOps, Relay Network