The Radicati Group, an international technology market research firm, predicts that by 2019, business emails will account for 128.8 billion mails sent and received per day.
This hyper-growth is fueled by rapid digitization. With email being your primary identity, it also becomes a sink for every notification coming your way.
IT administrators often need to dig up and restore email data. This information can be useful as a reference for business decisions or during litigation. In many countries, retaining it is mandatory by law.
Many unpredictable circumstances can lead to the loss of crucial email data unless an efficient email backup or archival system is in place to prevent the loss and recover from it gracefully.
Let’s take a look at some data loss horror stories.
Scenario A: You’re just back from vacation, full of energy and ready to get to work…but a few hours later an unfortunate incident leads to a small office fire…luckily the office sprinklers put it out. However, in the process, your laptop is now wet and the data on it (including all downloaded mail) is inaccessible…lost.
Scenario B: A deadly virus infects your computer network and servers. This corrupts the email back-up, throwing everyone out of gear and disrupting the workflow.
Scenario C: One of your users accidentally deletes mail and wants it back.
Scenario D: Your users have to trim their mailboxes by deleting “unimportant” email on a regular basis to adhere to the mailbox quota policies of your organization. At some point, they may want to access historical data, which has been deleted.
During pre-sales and service-level discussions with prospects and customers who want help managing their email data more effectively, we continuously learn about the different, and sometimes quite innovative, methods their IT teams use to secure a copy of corporate email.
The purpose of this article is to share these practices/strategies and highlight the pros, cons, and application of each so that you can make a more informed choice when designing your data management strategy.
What’s the requirement?
As part of information security, compliance, and IT requirements, businesses need to ensure that ALL the emails transacted by users are kept safe in a secondary or alternate store to facilitate:
- Quick retrieval of specific emails or the entire mailbox
- Fast search across the mailboxes of all users for regulation compliance, knowledge discovery
- DIY – An easy way for users to help themselves find and download their own mail.
The Mail Flow
If you are familiar with the mail flow architecture, please feel free to skip to the next section.
Before we get into the technicalities of the different ways to archive email, let’s understand the hops in the mail flow path and the points at which we can capture email.
Put simply:
- A mail is received by your MTA/mail server/service.
- The mail is delivered to the Inbox of the recipient user on the server.
- The user now connects to the mail server using a mail client (web, desktop, mobile) and downloads the mail using POP or views/works with the mail using IMAP. On certain systems, the user may use MAPI or ActiveSync to access the mail.
What’s the difference between Email Archiving and Backup?
Before we proceed, we should be clear on the core differences between Email Backup and Archiving.
An email backup is a collection of data stored on (usually removable) non-volatile storage media for purposes of recovery in case the original copy of data is lost or becomes inaccessible.
Thus, a backup contains a snapshot of the state of the mail store at the time the backup was taken. It does not contain emails that were deleted from the mailbox or downloaded to the user’s PC between two backups. After a crash, messages that arrived after the last snapshot cannot be recovered from it.
An email archive is a collection of data objects, perhaps with associated metadata, in a storage system whose primary purpose is the long-term preservation and retention of that data. An email archive contains ALL the mails sent and received by the users, and the capture happens in real time.
The objective is for the organization to have a copy of every mail sent or received, for a defined period of time, by selected or all users, irrespective of how they access their mail. Every mail is therefore stored and retrievable, without the gaps that scheduled backups leave.
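To make the difference concrete, here is a minimal Python sketch. It is a toy model, not any real mail server: the `Mailbox` class and its `deliver`, `delete`, and `snapshot` methods are hypothetical names invented for this illustration. It assumes the archive receives a copy at delivery time, while a backup only copies whatever is in the inbox when it runs.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Toy mailbox (hypothetical) that journals every delivered mail."""
    inbox: dict = field(default_factory=dict)    # message id -> body
    archive: dict = field(default_factory=dict)  # journal copy, kept on delete

    def deliver(self, msg_id, body):
        self.inbox[msg_id] = body
        self.archive[msg_id] = body  # archive copy is taken at delivery time

    def delete(self, msg_id):
        self.inbox.pop(msg_id, None)  # user deletion does not touch the archive

    def snapshot(self):
        return dict(self.inbox)  # a backup copies only the current state


def demo():
    box = Mailbox()
    box.deliver("m1", "Quarterly report")
    backup_1 = box.snapshot()
    box.deliver("m2", "Contract draft")
    box.delete("m2")  # received and deleted between two backups
    backup_2 = box.snapshot()
    backed_up = set(backup_1) | set(backup_2)
    return sorted(backed_up), sorted(box.archive)
```

Calling `demo()` returns the message IDs found in the two backups and those in the archive. The mail `m2` appears only in the archive, because it was received and deleted between the two snapshots.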
Top 5 Strategies to Secure Email Data
1. Periodic BACKUP from the end points
Endpoints are the client devices and software from which users access their email and download it.
Many organizations focus on taking periodic backups of the email data on the client device, typically PST or EML files, using a tool that copies these files at specific intervals to a secondary storage medium. Some organizations leave it to the end users to ensure that their data is safe.
The data backed up will contain the state of the user’s mailbox on the endpoint (such as a desktop) at the time of the backup.
Typically, these backups are nothing but a collection of client data files (PST, EML, or another format, depending on the client software). They do not support searching for emails or selectively downloading email without considerable effort.
Furthermore, this method is too technical for end users to access the backup themselves to search through or restore mail.
The use of this method:
In case of loss, failure or corruption of a user’s endpoint device/software, the administrator can restore the last good backup to get that user up and running quickly.
Some challenges with periodic backup from the endpoints are:
- Since these backups are taken at specific intervals, and not continuously, a backup will not include mails that were received and then deleted between two backups.
- If the endpoints are not online, the data backup jobs cannot be executed and may lead to data inconsistency.
- This method of backup does not store the data in a search-ready, restore-ready state.
- To find a mail, an administrator must know the period to look into, restore the correct PST file into a staging client setup, sync all the mails, and then search or restore as required.
- This process can take several hours or even run across days.
- This method of backup does not allow the users to help themselves.
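The periodic copy job at the heart of this strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `backup_mail_files`, the folder layout, and the timestamped target folder are all hypothetical, and the backup root is assumed to live outside the source folder. Real tools also have to deal with files that the mail client keeps open.

```python
import shutil
from datetime import datetime
from pathlib import Path

MAIL_SUFFIXES = {".pst", ".eml"}  # client data files mentioned above


def backup_mail_files(source_dir, backup_root, now=None):
    """Copy every PST/EML file under source_dir into a timestamped folder.

    Hypothetical helper: assumes backup_root is outside source_dir.
    """
    now = now or datetime.now()
    target = Path(backup_root) / now.strftime("%Y%m%d-%H%M%S")
    copied = []
    for path in sorted(Path(source_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in MAIL_SUFFIXES:
            dest = target / path.relative_to(source_dir)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)  # copy2 also preserves file timestamps
            copied.append(dest)
    return copied
```

Each run produces one more folder of raw client files. Nothing in it is indexed, which is why finding a single mail later means picking the right folder, loading the file into a client, and searching there.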
2. Many to one Email Journaling, coupled with download to client PC [BACKUP]
It’s quite easy on most mail systems to configure a journaling rule to send a copy of every mail transacted by selected or all users to an email ID (typically called a journal email ID).
Generally, this journal email ID is just another mailbox in the same mail system. The administrator configures a desktop email client, such as MS Outlook or Thunderbird, to POP mail from this journal email ID onto a local PC to create PST/EML files.
These PST and EML files are then backed up to secondary storage as described in the first strategy above and are regularly rotated to prevent endpoint storage bloat.
This is a feel-good or notional backup, which suffers from the same limitations as the first strategy.
The use of this method:
This method of email backup collects all mail for all users in one account and then splits it across multiple PST/EML files, making it difficult to process the data or even restore a specific email for a user. It has little practical use beyond check-mark compliance with an information security requirement that all data be backed up.
Some challenges with this method are:
Besides the challenges of periodically backing up the email data from the client as described in the first strategy above, the next big problem is that these backup files contain the data for all users, making it even more difficult to sift through and find information if and when required.
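To see why a single journal mailbox is hard to sift through, consider what it takes just to work out which mails belong to which user. The sketch below uses Python's standard `email` package; the function name `index_by_user` and the sample domain are hypothetical. It assumes each journaled mail is available as plain RFC 5322 text and relies only on the From, To, and Cc headers, so it would miss Bcc recipients.

```python
from collections import defaultdict
from email import message_from_string
from email.utils import getaddresses


def index_by_user(raw_messages, domain):
    """Map each internal address to the Message-IDs it sent or received.

    Hypothetical helper: looks at From/To/Cc headers only.
    """
    per_user = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        headers = (msg.get_all("From", [])
                   + msg.get_all("To", [])
                   + msg.get_all("Cc", []))
        users = {addr.lower() for _, addr in getaddresses(headers)
                 if addr.lower().endswith("@" + domain)}
        for user in sorted(users):
            per_user[user].append(msg["Message-ID"])
    return dict(per_user)
```

An archival platform does this kind of per-user mapping at ingestion time. With a journal mailbox rotated into PST/EML files, the administrator has to redo it by hand, file by file, whenever someone asks for a mail.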
3. Mailstore BACKUPS from the server or backend storage
This is typically done by all mail administrators, who periodically back up the entire mail storage from the backend to a secondary store.
The backup captures the mailbox storage using a snapshot tool, a file copy tool such as rsync, or a database backup tool, depending on the mail solution deployed.
Since a lot of users POP their mail and automatically delete it from the mail store on the server, it is likely that this method of backup will capture even less data than the first strategy of backing up the endpoints.
The use of this method:
This method is a must for backend server maintenance and management procedures, which are used to restore the mail server or storage in case of an irrecoverable crash during a disaster. This method supports a DR strategy and should not be confused with an email backup, which can be used for selective restoration or search.
Some challenges with this method are:
This method is not useful while attempting to restore individual email or a full mailbox of a user since this is a raw backup of the mailbox storage in its most native format and is only meant to be used during a server restore procedure.
4. Email ARCHIVING to a separate system on-premise
It’s quite easy on most email systems to configure a journaling rule to send a copy of every mail transacted by selected or all users to a separate on-premise archival platform, which ingests the email and retains it in a search-ready form.
Broadly, the main reason to keep this data on-premise could be an infosec requirement. However, we have seen major acceptance from enterprises and even government organisations towards moving their data storage and data management workloads to the public cloud.
What you can do with the data in an on-premise archival platform depends on the capabilities of that platform. You may want to evaluate whether the platform keeps all data online and search-ready, whether you can quickly find and restore any information, and whether your users can access their own archived email safely and securely without being able to tamper with it.
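To illustrate what "search-ready" means in practice, here is a toy in-memory archive in Python that indexes each mail as it is ingested. It is a sketch under stated assumptions, not how any particular archival product works: the `ArchiveIndex` class is hypothetical, it handles only plain-text, single-part mails, and it indexes just the subject and body.

```python
import re
from collections import defaultdict
from email import message_from_string


class ArchiveIndex:
    """Toy archive (hypothetical): stores each mail and indexes its words."""

    def __init__(self):
        self.messages = {}                # Message-ID -> raw mail text
        self.postings = defaultdict(set)  # word -> set of Message-IDs

    def ingest(self, raw):
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"]
        self.messages[msg_id] = raw
        # Assumes a single-part, plain-text mail.
        text = f"{msg.get('Subject', '')} {msg.get_payload()}"
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[word].add(msg_id)
        return msg_id

    def search(self, query):
        """Return the IDs of mails that contain every word in the query."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)
```

Because the index is built at ingestion, a search across all mailboxes is a lookup rather than a restore-and-scan exercise. Retrieving a specific mail is then a matter of reading `messages[msg_id]`.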
Check out this article, which explains the components and costs of maintaining an on-premise setup.
While this method doesn’t replace strategy 3 (a server or mail store backup), it can certainly help you retire strategies 1 and 2, and improve the productivity of the users and IT team.
5. Email ARCHIVING on the cloud
In this method, you would configure your primary mail platform to push/journal a copy of every mail transacted to a separate operational infrastructure on the cloud.
By having the data at a separate location in the cloud, you are improving redundancy, reliability, and safety of the data.
What you can do with the data in the cloud archival platform depends on the capabilities of that platform.
For example: Does the platform allow you to keep all data online and search-ready? Can you quickly find and restore any information? Can your users access their own archived email safely and securely, without being able to tamper with it?
This article explains the components and costs of maintaining an on-premise setup versus opting for a cloud solution.
While this method doesn’t replace strategy 3 (a server or mail store backup), it can certainly help you retire strategies 1 and 2, and improve the productivity of the users and IT team.
If you are using an on-premise mail server platform, then strategy 3, which is to maintain server/backend storage backups, is a must from a service operations and DR perspective.
Strategies 1 and 2, which are methods to back up email data, may no longer be required if you opt for strategy 4 or 5, which is mail archiving. By design, a copy of every mail is captured and stored in the archive store. Many archiving platforms also give you tools to search for email, restore mail selectively, and let your users access their own archives (self-help), besides a host of other features.
Here is a table covering all the above strategies and mapping them against required features to manage email data effectively.
| Strategy | 100% of mail archived | Fast search across mailboxes for compliance | Organisation-wide knowledge discovery | End-user self-service for discovery and recovery | Data safe offsite | All data online and search-ready |
| --- | --- | --- | --- | --- | --- | --- |
| Periodic BACKUP from the end points | No | Not possible | Not possible | Not available | No. Most backups are stored at the same site | Not possible |
| Many to one journaling | Yes | Not easily possible | Not easily possible | Not available | No. Most backups are stored at the same site | Not possible |
| Mail store BACKUPS | Not possible | Not possible | Not possible | Not possible | No. Most backups are stored at the same site | Not possible |
| Mail ARCHIVING on-premise | Yes | Yes, but not scalable | Yes, but not scalable | Sometimes available | No. Typically at the same site | Maybe. Some have secondary volumes |
| Mail ARCHIVING on cloud | Yes | Yes, and elastic | Yes, and elastic | Yes | Yes | Yes |
Once you are convinced that email archiving is the optimal email data management solution, you may want to read this post on How to choose between on-premise, dedicated-on-cloud or a SaaS-on-cloud for archiving of email.
Also, see why Vaultastic is a great fit for archiving emails to the cloud from any primary email platform.