There are many email archiving products that allow companies to apply retention and disposition to emails, including EMC SourceOne. The problem with these products is that they tend to approach email archiving from a storage perspective rather than from a records management perspective. Here is a list of records management challenges that are unique to email:
- Single copy – Most emails have multiple recipients (in the To, Cc, or Bcc fields). So, if retention rules are defined for multiple users, you can end up with duplicate email records. However, most email archiving products do a good job of de-duplicating emails, so you do not keep multiple copies of the same record.
- Categorization/Classification – Retention in records management is typically tied to a fileplan. Based on the content of a document, the process it is derived from or applied to, or its creator, the document is classified into the appropriate record series in the fileplan. The challenge with email is that it is typically short, so there may not be enough content to classify it correctly. Some emails can be categorized by process (e.g., invoice sent), which allows them to be placed in the appropriate record series and retained for the appropriate amount of time. Most emails, however, have to be categorized based on who the owner is (e.g., the CEO) and retained for a generic amount of time.
Some email archiving products allow you to define rules to better categorize emails; however, this requires a “true understanding” of the business. For example, a rule to “immediately delete emails whose subject contains ‘party’” would do a good job of filtering out birthday party invites; however, if your company is in the business of hosting parties, you would not want to use this rule. Other email archiving products use heuristics: they analyze a sample of your company's email, and you teach them which kinds of emails should be records and which can be considered non-records. The challenge with heuristics is that you need a good sample, which usually means a large number of emails, and you still have to train the system on how to categorize. Training a heuristic system is easier than developing categorization rules; however, gathering a good sample is typically harder than finding the examples needed to develop rules.
A few vendors apply both categorization strategies: heuristics first, then rules. I believe this is the best approach to solving a classification problem with limited content.
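To make the hybrid approach a little more concrete, here is a minimal Python sketch, not any vendor's actual implementation. A tiny naive Bayes model stands in for whatever heuristics a product really uses: it learns from a hand-labeled sample of emails. Rules written by a records manager then get the final say on the model's label. The `Email` fields, the "record"/"non-record" labels, the sample emails, and the `ceo@example.com` rule are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Email:
    sender: str
    subject: str
    body: str


def tokens(email: Email) -> list[str]:
    # Lowercase alphabetic words from subject and body; digits are dropped
    return re.findall(r"[a-z]+", f"{email.subject} {email.body}".lower())


class NaiveBayes:
    """Heuristic step (stand-in): multinomial naive Bayes learned from a labeled sample."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()

    def train(self, email: Email, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts.setdefault(label, Counter()).update(tokens(email))

    def predict(self, email: Email) -> str:
        vocab_size = len(set().union(*self.word_counts.values()))
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokens(email):
                # Laplace smoothing keeps unseen words from zeroing out the score
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# A rule sees the email and the heuristic's label; it returns a new label or None
Rule = Callable[[Email, str], Optional[str]]


def classify(email: Email, model: NaiveBayes, rules: list[Rule]) -> str:
    label = model.predict(email)  # heuristics first...
    for rule in rules:            # ...then rules refine the result
        override = rule(email, label)
        if override is not None:
            return override
    return label


def ceo_mail_is_record(email: Email, label: str) -> Optional[str]:
    # Hypothetical policy: anything the CEO sends is kept as a record
    return "record" if email.sender == "ceo@example.com" else None


# Hypothetical labeled sample used to train the heuristic step
model = NaiveBayes()
samples = [
    (Email("ap@example.com", "Invoice 1042 sent", "please find attached invoice for payment"), "record"),
    (Email("ap@example.com", "Invoice overdue", "invoice payment reminder"), "record"),
    (Email("hr@example.com", "Birthday party", "cake in the break room for the party"), "non-record"),
    (Email("hr@example.com", "Lunch party friday", "join us for the party lunch"), "non-record"),
]
for sample, label in samples:
    model.train(sample, label)
rules = [ceo_mail_is_record]

print(classify(Email("alice@example.com", "party invite", "party at five"), model, rules))
print(classify(Email("ceo@example.com", "party invite", "party at five"), model, rules))
```

With this toy sample, a party invite from an ordinary user falls through to the model's "non-record" label, while the same invite from the CEO is kept as a record because the rule overrides the model. Running the rules first instead would let a blunt rule, like the “party” example above, pre-empt the model entirely.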
- Owner of email/record – Whereas a document tends to have a well-defined owner/creator, the owner of an email can be vague. For example, an email author may send a request to a group of people for the latest version of an SOP document. Several people reply with different versions of the document (because there is no version control or content management system in place). If emails are categorized by author, retention may differ based on each author's role (e.g., CEO emails are retained for 7 years, everyone else's for 3 years). As a result, the same SOP document can be retained for different periods depending on how the email carrying it was categorized.
This challenge is not so much a technical issue as a question of how the classification rules are defined. In the world of records management, a document has a real owner. A request for a document is not typically treated as a record itself, so you do not run into these kinds of issues. You may find that a document fits into multiple record series, but at the end of the day, the records managers decide where it fits most appropriately.
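The retention mismatch described above is easy to reproduce. The minimal Python sketch below assumes a hypothetical role-based schedule matching the example in the text (CEO 7 years, everyone else 3 years) and assumes identical attachments can be recognized by a SHA-256 hash of their bytes. It flags any attachment whose copies end up with more than one disposition date. The author addresses, dates, and the "SOP-017" content are made up for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule keyed on the email author's role
RETENTION_YEARS_BY_ROLE = {"CEO": 7}
DEFAULT_RETENTION_YEARS = 3


@dataclass
class Reply:
    author: str
    author_role: str
    sent: date
    attachment: bytes


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 landing in a non-leap year
        return d.replace(year=d.year + years, day=28)


def disposition_date(reply: Reply) -> date:
    # Retention follows the author's role, not the document being carried
    years = RETENTION_YEARS_BY_ROLE.get(reply.author_role, DEFAULT_RETENTION_YEARS)
    return add_years(reply.sent, years)


def conflicting_retention(replies: list[Reply]) -> dict[str, set[date]]:
    """Map attachment hash -> disposition dates, keeping only attachments with more than one date."""
    dates: dict[str, set[date]] = {}
    for reply in replies:
        digest = hashlib.sha256(reply.attachment).hexdigest()
        dates.setdefault(digest, set()).add(disposition_date(reply))
    return {digest: ds for digest, ds in dates.items() if len(ds) > 1}


# Two replies carrying the same (hypothetical) SOP file on the same day
sop_v3 = b"SOP-017 revision 3"
replies = [
    Reply("ceo@example.com", "CEO", date(2020, 1, 15), sop_v3),
    Reply("analyst@example.com", "Analyst", date(2020, 1, 15), sop_v3),
]
for digest, ds in conflicting_retention(replies).items():
    print(digest[:12], sorted(ds))
```

The same bytes get two disposition dates seven and three years out, which is exactly the situation a document-centric fileplan avoids by assigning the document one owner and one record series.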
- Versioning/supersession – SOP documents typically have a shelf life and are replaced periodically. When a new version of a document is declared as a record, the new record supersedes the old one. This is a straightforward concept. How does it apply to email? Emails are normally not versioned; instead, you typically have an email chain going back and forth between various users. Which of the emails should be the record, and which should be superseded? The reality is that supersession does not really fit email (hence the title of this blog post).
Some vendors treat an email chain as a single unit and apply one retention period based on the earliest or latest date in the chain. The idea of superseding an email does not really exist in email archiving products. Again, this is not so much a technical issue as a reflection of how different emails are from typical documents/records.
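As a rough illustration of the chain-based approach, the Python sketch below groups messages into chains by following `In-Reply-To` links back to the earliest message the archive holds, then applies one retention period to the whole chain, anchored on either the earliest or the latest date. It is a simplification under stated assumptions: real threading also uses the `References` header and has to cope with missing messages, and the message IDs, dates, and 3-year period are made up. Note that there is no supersession step; every message in a chain simply shares one disposition date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Message:
    message_id: str
    in_reply_to: Optional[str]
    sent: date


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 landing in a non-leap year
        return d.replace(year=d.year + years, day=28)


def root_id(msg: Message, by_id: dict[str, Message]) -> str:
    # Walk In-Reply-To links back to the oldest message we actually hold;
    # the seen set guards against malformed, circular reply links
    seen = {msg.message_id}
    while msg.in_reply_to in by_id and msg.in_reply_to not in seen:
        msg = by_id[msg.in_reply_to]
        seen.add(msg.message_id)
    return msg.message_id


def chain_dispositions(messages: list[Message], years: int, anchor: str = "latest") -> dict[str, date]:
    """One disposition date per chain, keyed by the chain's root message ID."""
    if anchor not in ("earliest", "latest"):
        raise ValueError("anchor must be 'earliest' or 'latest'")
    by_id = {m.message_id: m for m in messages}
    chains: dict[str, list[date]] = {}
    for m in messages:
        chains.setdefault(root_id(m, by_id), []).append(m.sent)
    pick = min if anchor == "earliest" else max
    return {root: add_years(pick(dates), years) for root, dates in chains.items()}


# Hypothetical archive: one three-message chain plus one unrelated message
thread = [
    Message("<1@example.com>", None, date(2021, 3, 1)),
    Message("<2@example.com>", "<1@example.com>", date(2021, 3, 5)),
    Message("<3@example.com>", "<2@example.com>", date(2021, 4, 10)),
    Message("<4@example.com>", None, date(2021, 6, 1)),
]
print(chain_dispositions(thread, years=3, anchor="latest"))
print(chain_dispositions(thread, years=3, anchor="earliest"))
```

Anchoring on the latest date keeps the whole chain as long as its last reply requires; anchoring on the earliest date disposes of later replies sooner than their own send dates would suggest. Either way, the chain is retained or destroyed as one unit.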
I strongly believe that companies need to include email as part of their overall records management strategy; however, their current records management system may not be up to the task of email archiving. Likewise, email archiving systems should not be used as records management systems. In most situations, I feel a company needs both types of systems, along with a well-thought-out, unified records strategy that covers both documents and emails.