Optimizing AWS Infrastructure: Preparing for Large Registrations

    Effortless scaling, increased registration volume, and revenue growth

    Big events should be fun, not stressful. Our customer runs a registration platform that saw spikes of hundreds or thousands of sign-ups within a few minutes. During big events, the stack would collapse at around 2,000 concurrent users: auto scaling lagged, pages took ages to load, and some would-be attendees abandoned the process altogether. It was not only embarrassing, it was costing real money!


    Challenges

    • High traffic in a short time: Thousands of people hit the website at once, creating huge spikes in concurrency.
    • Frequent crashes: The application would fail at around 2,000 simultaneous registrations, ruining the user experience.
    • Inefficient auto scaling: Resources did not scale fast enough at peak load, which brought the service down.
    • Poor user experience: Users ran into slowness and downtime and abandoned the process, which hurt their perception of the brand.
    • Revenue impact: System failures led to fewer completed registrations and lost revenue.
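
    To illustrate why slow scaling hurts during a short spike, here is a toy model in Python. It is only a sketch: the traffic figures, capacities, and lag values are made-up assumptions, not measurements from the customer's platform, and the function name is hypothetical. It counts requests that exceed capacity when each scale-out step takes a fixed number of minutes to come online.

```python
def unserved_requests(arrivals_per_min, base_capacity, step_capacity, lag_minutes):
    """Count requests above capacity when scale-out steps arrive after a lag.

    Simplifications (assumptions): one scale-out step is requested in every
    overloaded minute, and requests above capacity are lost, not queued.
    """
    capacity = base_capacity
    pending = []  # minutes at which requested capacity comes online
    dropped = 0
    for minute, arrivals in enumerate(arrivals_per_min):
        # Bring online any capacity whose launch lag has elapsed.
        capacity += step_capacity * sum(1 for t in pending if t == minute)
        pending = [t for t in pending if t > minute]
        if arrivals > capacity:
            dropped += arrivals - capacity
            pending.append(minute + lag_minutes)
    return dropped


# Illustrative spike: requests per minute (invented numbers).
spike = [500, 2000, 2000, 2000, 500]
slow = unserved_requests(spike, base_capacity=1000, step_capacity=1000, lag_minutes=3)
fast = unserved_requests(spike, base_capacity=1000, step_capacity=1000, lag_minutes=1)
print(f"dropped with 3-minute lag: {slow}, with 1-minute lag: {fast}")
```

    In this model the same spike loses three times as many requests when new capacity takes three minutes instead of one to arrive, which is the kind of gap that shows up as abandoned registrations.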

    The Solution

    • To uncover the bottlenecks, we simulated heavy concurrency using Apache JMeter in non-GUI mode on an AWS EC2 instance and cross-validated the results with Blazemeter.
    • We found that both the API layer and the database were choking.
    • We optimized request handling at the Nginx layer, removed the API Gateway (it had limitations under maximum parallel traffic), and moved the WAF rules to the Application Load Balancer.
    • Throughout, continuous monitoring with New Relic and Datadog alerted us to anomalies, allowing us to fix issues before users experienced them.
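
    A non-GUI JMeter run like the one described above could be launched along these lines. This is a minimal sketch, not the team's actual setup: the file names, the `threads` and `rampup` property names, and the helper function are assumptions for illustration. It uses JMeter's standard command-line options (`-n` for non-GUI mode, `-t` for the test plan, `-l` for the results log, `-J` to set a JMeter property, and `-e -o` to generate an HTML report).

```python
import shlex


def jmeter_command(test_plan, results_file, threads, ramp_up_seconds, report_dir=None):
    """Build the argument list for a non-GUI JMeter run.

    The 'threads' and 'rampup' properties are hypothetical names; the test
    plan would need to read them, e.g. via ${__P(threads)}.
    """
    cmd = [
        "jmeter",
        "-n",                      # non-GUI mode
        "-t", test_plan,           # test plan (.jmx) to run
        "-l", results_file,        # file to log sample results to
        f"-Jthreads={threads}",
        f"-Jrampup={ramp_up_seconds}",
    ]
    if report_dir:
        cmd += ["-e", "-o", report_dir]  # generate an HTML report after the run
    return cmd


# Hypothetical file names and load figures, for illustration only.
cmd = jmeter_command("registration.jmx", "results.jtl", threads=10000, ramp_up_seconds=120)
print(shlex.join(cmd))
```

    The resulting list can be passed to `subprocess.run` on the load-generating instance. Running without the GUI keeps JMeter's own resource usage low, which matters when a single machine has to generate thousands of concurrent users.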

    The Impact

    With Knackforge Cloud Services in place, the customer experienced:

    • 10,000+ concurrent registrations: The platform handled its target load in both test conditions and production without skipping a beat.
    • Smooth scaling: The optimized stack auto-scaled proactively and avoided crashes.
    • Revenue: Higher registration success rates during peak times brought in more attendees and, with them, more revenue.

    Technologies Used:

    • Apache JMeter
    • Blazemeter
    • AWS EC2 Auto Scaling
    • Nginx
    • AWS Application Load Balancer
    • New Relic
    • Datadog