How we managed to send 75k emails per hour
The ardent team at KnackForge always loves to get its hands dirty with challenging projects. In this connection, we recently took on an interesting newsletter-sending project from one of our potential clients, who is doing relatively big business in Internet marketing.
In brief, we were asked for a custom Drupal-based system for sending out newsletter emails. Tentatively, 600k emails were to be sent per month. A newsletter list would have up to 80k users, and we would be limited to a couple of lists to begin with.
We got started with a site from the client that already had simplenews, mandrill, mimemail, ckeditor, etc. pre-configured for us. I have covered this at length in my previous blog, Leveraging CKeditor template to theme Drupal contents, and it needs little explanation for anyone who has already built a newsletter site in Drupal. Everything looked straightforward until this point :)
On top of it, we introduced a few features:
- Scheduled newsletter sending
- Checking broken links
- CKEditor template per newsletter
- Making mandrill obey the 'From name' and 'From address' set in simplenews (#214473)
- Reject list handling (#2130153)
- User activity and reports
- Importing subscribers from a plain text file
Challenge 1: Handling reject list
We wanted to remove hard-bouncing emails from the subscribers list. To do that, we had to pull the list of those emails from Mandrill by making an API call. The Mandrill contrib module uses its own custom implementation for API calls from Drupal (instead of the official library), and that implementation didn't support fetching the reject list.
Our next thought was to use the official library supported by the Mandrill service provider. To our surprise, both codebases used the same class name, so loading both at once resulted in a serious conflict (#2152809).
As a workaround, we extended the module's class and ported the reject list API calls from the official library into our custom subclass. This did help us get the job done right. The removal was handled as a cron job. Of course, the dev branch of the Mandrill module had some half-baked code to support the official library, but it was not usable on a production site :(
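The pruning step itself is simple once the reject list is in hand. Here is a minimal sketch in Python (for brevity, not the PHP that ran on the site). It assumes each reject entry is a dictionary with "email" and "reason" keys and that hard bounces carry the reason "hard-bounce"; that matches our understanding of Mandrill's reject list, but treat the field names as assumptions. The function names are hypothetical.

```python
def hard_bounced(rejects):
    """Collect lower-cased addresses whose reject reason is a hard bounce."""
    return {
        entry["email"].lower()
        for entry in rejects
        if entry.get("reason") == "hard-bounce"  # assumed reason value
    }


def prune_subscribers(subscribers, rejects):
    """Return the subscribers that are not on the hard-bounce list."""
    bounced = hard_bounced(rejects)
    return [address for address in subscribers if address.lower() not in bounced]


if __name__ == "__main__":
    rejects = [
        {"email": "gone@example.com", "reason": "hard-bounce"},
        {"email": "full@example.com", "reason": "soft-bounce"},
    ]
    print(prune_subscribers(["Gone@example.com", "full@example.com"], rejects))
```

Only hard bounces are removed here; soft bounces stay subscribed, since they may be temporary.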
Challenge 2: Miserable throughput / send rate
Though we had a good idea of the subscriber count and the number of emails to be handled, we had given little thought to throughput, by which we mean the number of emails sent out from our site per unit of time. Unfortunately, we were able to send only 30 to 50 emails per minute. It took up to 15 hours to finish sending a newsletter to all subscribers. This is certainly not acceptable when the expected send rate is 72k per hour.
So what caused this?
- We were using Elysia Cron (the best of the contrib modules) to trigger jobs like scheduled sending, reject list removal, user activity, last interaction, etc., and simplenews cron was just one among them.
- Some jobs took much longer than anticipated to complete once started and left no room for simplenews cron to take charge of sending emails, which is ultimately the highest-priority job. (At least the hosting provider did let us run the Apache process as long as it wanted.)
- Besides, simplenews by default sends only one email at a time.
- We were using Mandrill's API implementation, and a send API call, which is an HTTP request, takes about a second to complete in the ideal case.
- By sending one email at a time, the best throughput we could get was 60 emails per minute, or 3600 per hour. Still far from the expected rate.
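The arithmetic behind that ceiling is worth spelling out. A small Python sketch, using only the figures quoted above (about one second per send call, one email per call, a 72k per hour target); the function names are ours for illustration.

```python
def emails_per_hour(seconds_per_call, emails_per_call):
    """Throughput of a single sequential sender."""
    return 3600 / seconds_per_call * emails_per_call


def hours_to_send(total_emails, rate_per_hour):
    """Time needed to push a given number of emails at a fixed rate."""
    return total_emails / rate_per_hour


if __name__ == "__main__":
    best_case = emails_per_hour(seconds_per_call=1.0, emails_per_call=1)
    print(best_case)                         # best-case emails per hour
    print(hours_to_send(72_000, best_case))  # hours for one hour's worth of the target
```

With one email per call, the only levers are more parallel senders or more emails per call, which is what the fixes below explore.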
Fix 1: Concurrent sending
We thought sending would be faster if we had multiple Apache processes pushing send API calls to Mandrill in parallel. We started looking at the HTTP Parallel Request & Threading Library (httprl) module. The idea was to have multiple requests process the email sending (ideally each running simplenews_cron()), but this didn't work well: the requests were not all received at Mandrill's end, or something went wrong along the way. Emails were sent to only a limited number of subscribers. In other words, not all API calls were logged at Mandrill's end. This could be because we were sending faster at a given moment than Mandrill could accept.
Fix 2: Concurrent Queuing and continuous sending
Since the concurrent API calls didn't work, we thought of decoupling the email preparation job (simplenews cron) from the sending job (a custom job that makes the API call). Mandrill had a setting to write the email data to a Drupal queue (mandrill_queue). So our custom httprl-based implementation triggered concurrent calls that kept writing email data to mandrill_queue, and another custom cron job simply took the email data from the queue and kept sending, one by one, until the queue was empty. This trial brought the send time down to 6 to 7 hours for a subscriber count of 70k, mainly because of the continuous sending. We made several further attempts along these lines but could only get down to a send time of 3 or 4 hours. Still not our best.
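The shape of that decoupling can be sketched as a producer/consumer pair. The Python below is only an illustration of the pattern, not our Drupal code: an in-memory queue stands in for mandrill_queue, threads stand in for the concurrent httprl requests, and `send` is a hypothetical callable standing in for the Mandrill API call.

```python
import queue
import threading


def prepare_emails(recipients, mail_queue):
    """Producer: build email data and write it to the queue."""
    for address in recipients:
        mail_queue.put({"to": address, "subject": "Newsletter"})


def drain_queue(mail_queue, send):
    """Consumer: send queued emails one by one until the queue is empty."""
    sent = 0
    while True:
        try:
            item = mail_queue.get_nowait()
        except queue.Empty:
            return sent
        send(item)
        sent += 1


def run(recipients, send, workers=4):
    """Fill the queue concurrently, then drain it with a single sender."""
    mail_queue = queue.Queue()
    chunks = [recipients[i::workers] for i in range(workers)]
    threads = [
        threading.Thread(target=prepare_emails, args=(chunk, mail_queue))
        for chunk in chunks
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return drain_queue(mail_queue, send)


if __name__ == "__main__":
    outbox = []
    print(run([f"user{i}@example.com" for i in range(10)], outbox.append))
```

The sender still makes one call per email, so the gain comes only from never waiting on preparation, which is consistent with the limited improvement we saw.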
Fix 3: Bulk sending / Merge tags and drush script as crontab
In the meantime, we were in touch with the Mandrill support team to learn the best strategies for increasing throughput. Their understanding and timely help directed us to bulk sending of emails (multiple addresses in the TO field). That looked convincing to us.
But we had another question to answer along the way: how do we handle dynamic content specific to each recipient? For instance, the name, the unsubscribe link, etc.
Merge tags were suggested to handle this. They are similar to tokens in Drupal and should be familiar to anyone used to Mailchimp. Merge tags are written with the syntax *|VARIABLE_NAME|*.
For instance, *|FNAME|* would represent "first name" and is replaced by its equivalent value before the actual email is sent. The list of merge variables and their values per recipient has to be included in the email data. See http://help.mandrill.com/entries/21678522-How-do-I-use-merge-tags-to-add-dynamic-content-.
Note: Bulk sending requires unchecking the option "Expose The List Of Recipients When Sending To Multiple Addresses" under "Sending options" in the Mandrill account. Otherwise all addresses in the TO field will be exposed to all recipients.
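To make the idea concrete, here is a hedged Python sketch of a bulk message with per-recipient merge variables. The field names ("to", "merge_vars", "rcpt", "vars", "preserve_recipients") follow our understanding of Mandrill's send API and should be checked against its documentation; `render` is a purely local, hypothetical helper that imitates the substitution Mandrill performs on its side, and it only matches upper-case tag names for simplicity.

```python
import re

# Matches merge tags such as *|FNAME|* (upper-case names only in this sketch).
MERGE_TAG = re.compile(r"\*\|([A-Z0-9_]+)\|\*")


def render(template, values):
    """Replace merge tags locally; unknown tags are left untouched."""
    return MERGE_TAG.sub(
        lambda match: str(values.get(match.group(1), match.group(0))), template
    )


def build_message(subject, html, recipients):
    """Build one message for many recipients, each with its own merge values."""
    return {
        "subject": subject,
        "html": html,
        "to": [{"email": r["email"], "type": "to"} for r in recipients],
        # Assumed per-message counterpart of the account option in the note above.
        "preserve_recipients": False,
        "merge_vars": [
            {
                "rcpt": r["email"],
                "vars": [{"name": k, "content": v} for k, v in r["vars"].items()],
            }
            for r in recipients
        ],
    }


if __name__ == "__main__":
    people = [{"email": "asha@example.com", "vars": {"FNAME": "Asha"}}]
    print(build_message("Hello", "<p>Hi *|FNAME|*</p>", people))
    print(render("Hi *|FNAME|*", {"FNAME": "Asha"}))
```

The template is sent once per call, and only the small per-recipient variable lists differ, which is what makes a single call for thousands of addresses practical.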
So we moved away from httprl, concurrent sending, the queue, etc. They helped speed up the process but required more server resources. We focused instead on bulk sending. Along the way we moved to a relatively high-end server. The simplenews_cron implementation was disabled, and we arrived at a forked version of it that supports multiple emails in the TO field, together with a merge tags implementation to handle dynamic content. The final version of the code sent 2000 emails per API call (2 to 3 seconds each). This was turned into a drush script, run from the CLI via *nix crontab, which processes the emails in the spool table until it is empty. In less than an hour or so, our site is able to send as many as 75k emails. The joy of accomplishment.
Of course, I have to thank and appreciate our colleagues (especially Ganesan) for being patient and gracious through the several rewrites we had to do before reaching the desired throughput. And, essentially, the client for their trust and for allowing the several trials that brought out the best in us.
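The spool-draining loop of such a script boils down to taking a batch, sending it in one call, and repeating until nothing is left. A minimal Python sketch of that loop follows; a plain list stands in for the spool table, and `send_batch` is a hypothetical callable standing in for the bulk API call.

```python
def send_spool(spool, send_batch, batch_size=2000):
    """Send the spool in batches until it is empty; return the number of calls."""
    calls = 0
    while spool:
        batch = spool[:batch_size]
        send_batch(batch)        # one API call covers the whole batch
        del spool[:batch_size]   # remove rows only after the call returns
        calls += 1
    return calls


if __name__ == "__main__":
    spool = [f"user{i}@example.com" for i in range(75_000)]
    print(send_spool(spool, lambda batch: None))  # number of bulk calls made
```

Removing rows only after the call returns means an interrupted run can simply be restarted by the next crontab invocation, at the cost of possibly resending one batch.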
Challenge 3: Reports per newsletter and per subscriber
For a given newsletter we wanted to show:
- Subscriber count at the moment the newsletter was sent (the simplenews module shows only the most recent subscriber count)
- Click rate and open rate
- Count of total sent, hard bounces, soft bounces, rejected emails, complaints, unsubscribe, opens, clicks, unique opens and unique clicks
For a given subscriber we wanted to show:
- List of newsletters sent, together with date, click count and open count
- Last interaction of the subscriber
Mandrill's Exports calls and Tags calls gave us the raw data needed for the above reports. However, to associate the received data with the intended newsletter, we had to set unique custom tags on each send. Thanks to hook_mandrill_mail_alter(), we were able to do all the overrides needed. We sync this data to a local data store periodically by running a cron job, and construct the reports from it as needed.
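As a rough illustration of the tagging and report arithmetic, here is a small Python sketch. The tag format and the rate definitions (unique opens or clicks divided by total sent) are assumptions made for this example, not necessarily what our reports use, and the dictionary keys are hypothetical names for the synced counts.

```python
def newsletter_tag(newsletter_id):
    """Build a unique tag so stats can be traced back to one newsletter."""
    return f"newsletter-{newsletter_id}"  # hypothetical tag format


def rates(stats):
    """Compute open and click rates from locally synced counts."""
    sent = stats["sent"]
    if sent == 0:
        return {"open_rate": 0.0, "click_rate": 0.0}
    return {
        "open_rate": stats["unique_opens"] / sent,
        "click_rate": stats["unique_clicks"] / sent,
    }


if __name__ == "__main__":
    synced = {newsletter_tag(42): {"sent": 200, "unique_opens": 50, "unique_clicks": 10}}
    for tag, stats in synced.items():
        print(tag, rates(stats))
```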