Email & Records Management – Putting a Square Peg in a Round Hole

There are many email archiving products that allow companies to apply retention and disposition to emails, including EMC SourceOne. The problem with these kinds of products is that they tend to approach email archiving from a storage perspective rather than a records management perspective. Here is a list of records management challenges that are unique to email:

  1. Single copy – Most emails have multiple recipients (in the to, cc, or bcc fields). So, if retention rules are defined per user, you can potentially end up with duplicate email records. However, most email archiving products do a good job of de-duplicating emails, so you do not keep multiple copies of the same record.
  2. Categorization/Classification – Retention in records management is typically associated with a fileplan. Based on the content of a document, the process the document is derived from or applied to, or the creator of the document, the document is classified into the appropriate record series in the fileplan. The challenge with email is that it's typically short, so there may not be enough content to classify it correctly. Some emails can be categorized to a process (eg invoice sent), and can then be placed in the appropriate record series and retained for the appropriate amount of time. Most emails have to be categorized based on who the owner is (eg CEO) and retained for a generic amount of time.

    Some email archiving products allow you to define rules to better categorize emails; however, this requires a “true understanding” of the business. For example, a rule such as “immediately delete emails whose subject contains party” would do a good job of filtering out birthday party invites; however, if your company is in the business of hosting parties, you would not want to use this rule for categorization. Other email archiving products use heuristics: they analyze a sample of your company's email, and you teach the system which kinds of emails should be records and which can be considered non-records. The challenge with heuristics is that you need a good sample, which tends to mean a large sample of emails. Plus, you will need to train the system on how to categorize. Training a heuristic system is easier than developing rules for categorization; however, gathering a good sample is typically harder than coming up with the examples needed to develop rules.

There are a few vendors that apply both strategies of categorization – use heuristics first and then apply rules for categorization.  I believe this is the best approach in solving a classification problem with limited content.
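A minimal sketch of that combined approach, heuristics first and rules second, might look like the following. Everything here is hypothetical and for illustration only: the `Email` class, the word-count "heuristic", and the record series names (`finance/invoices`, `executive-correspondence`, `general-correspondence`) are assumptions, not the behavior of any particular archiving product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Email:
    sender_role: str
    subject: str
    body: str


def _words(email):
    return (email.subject + " " + email.body).lower().split()


def train(samples):
    """Count how often each word appears in record vs. non-record samples."""
    counts = {"record": Counter(), "non-record": Counter()}
    for email, label in samples:
        counts[label].update(_words(email))
    return counts


def heuristic_label(email, counts):
    """Pick the label whose training samples share more words with the email."""
    scores = {label: sum(c[w] for w in _words(email)) for label, c in counts.items()}
    if scores["record"] == scores["non-record"]:
        return None  # not enough content to decide either way
    return max(scores, key=scores.get)


def rule_label(email):
    """Explicit business rules (hypothetical), applied after the heuristic pass."""
    if "invoice" in email.subject.lower():
        return "finance/invoices"
    if email.sender_role == "CEO":
        return "executive-correspondence"
    return None


def classify(email, counts):
    """Heuristics filter out non-records; rules then choose a record series."""
    if heuristic_label(email, counts) == "non-record":
        return "non-record"
    return rule_label(email) or "general-correspondence"
```

The heuristic pass only has to answer the coarse question (record or not), which is where a short email gives it the least trouble; the rules then handle placement in the fileplan, where business knowledge matters most.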

  3. Owner of email/record – Whereas a document tends to have a well-defined owner/creator, the owner of an email can be vague. For example, an email author may send a request to a group of people for the latest version of an SOP document. Several people reply with different versions of the desired document (due to a lack of version control or of a Content Management System). Assuming that email is categorized based on its author, retention periods may differ by author role (eg CEO emails are retained for 7 yrs, everyone else's for 3 yrs). Therefore, you can have a situation where the same SOP document gets retained for different lengths of time because of how the emails carrying it were categorized.

This challenge is not so much a technical issue as a question of how the rules of classification are defined. In the world of records management, a document has a real owner. The request for a document is not typically treated as a record itself, so you do not run into these kinds of issues. You may find that a document fits into multiple record series, but at the end of the day, the records managers decide where the document fits most appropriately.
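To make the SOP example concrete, here is a small sketch that flags attachments which would be retained for different periods depending on who sent them. The role table mirrors the 7 yr/3 yr example above; the function names and the idea of keying on a content hash are assumptions for illustration.

```python
import hashlib
from collections import defaultdict

# Retention by author role, following the example above (CEO 7 yrs, everyone else 3).
RETENTION_YEARS = {"CEO": 7}
DEFAULT_YEARS = 3


def retention_for(role):
    return RETENTION_YEARS.get(role, DEFAULT_YEARS)


def conflicting_retention(emails):
    """Return {attachment digest: set of retention periods} for conflicts only.

    `emails` is an iterable of (author_role, attachment_bytes) pairs.
    """
    periods = defaultdict(set)
    for author_role, attachment in emails:
        digest = hashlib.sha256(attachment).hexdigest()
        periods[digest].add(retention_for(author_role))
    return {d: p for d, p in periods.items() if len(p) > 1}
```

A report like this does not resolve anything by itself; it just surfaces the cases where a records manager has to decide which series the document really belongs to.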

  4. Versioning/supersession – SOP documents typically have a shelf life and get replaced on a periodic basis. When a new version of the document is declared as a record, the new record supersedes the old record. This is a straightforward concept. How does this apply to email? Emails are normally not versioned; you typically have an email chain going back and forth between various users. Which one of the emails should be the record and which should be superseded? The reality is that supersession does not really fit with emails (hence the title of this blog post).

Some vendors treat an email chain as a single unit and apply a single retention period based on the earliest or latest date in the chain. The idea of superseding an email does not really exist in email archiving products. Again, this is not a technical issue so much as a reflection of how different emails are from typical documents/records.
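The chain-level approach can be sketched in a few lines. This is an illustration under stated assumptions rather than any vendor's implementation: the function name, the `basis` parameter, and the handling of a Feb 29 anchor date are all hypothetical choices.

```python
from datetime import date


def thread_disposition_date(message_dates, retention_years, basis="latest"):
    """Compute one disposition date for a whole email chain.

    The retention clock starts at the earliest or latest message in the
    chain, depending on `basis`.
    """
    if basis not in ("earliest", "latest"):
        raise ValueError("basis must be 'earliest' or 'latest'")
    anchor = max(message_dates) if basis == "latest" else min(message_dates)
    try:
        return anchor.replace(year=anchor.year + retention_years)
    except ValueError:
        # Feb 29 anchor landing in a non-leap year: fall back to Feb 28.
        return anchor.replace(year=anchor.year + retention_years, day=28)


chain = [date(2011, 3, 1), date(2011, 3, 15), date(2011, 4, 2)]
print(thread_disposition_date(chain, 3, basis="latest"))    # 2014-04-02
print(thread_disposition_date(chain, 3, basis="earliest"))  # 2014-03-01
```

Note that neither choice marks any message as superseded; every email in the chain simply shares one disposition date.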

I strongly believe that companies need to include email as part of their overall records management strategy; however, their current records management system may not be up to the task of email archiving. Likewise, email archiving systems should not be used as records management systems. I feel that in most situations a company needs both types of systems, but it should also have a well-thought-out, unified records strategy that covers both documents and emails.

Johnny Gee

Review – Alfresco 3 Cookbook

Alfresco 3 Cookbook by Snig Bhaumik.

It seems that I have become a regular reviewer of Alfresco books for Packt.  The publisher invited me to review the latest book about Alfresco and provided me a free copy of the book to review.  I was somewhat excited about reading this book because the purpose of the book was to provide “minimum theory…maximum action.”  I assumed that this book would be filled with tons of sample code that I could use on a regular basis.

If you are a first-time Alfresco developer and have not read many books on Alfresco, Snig’s cookbook is an excellent reference. If you have read other Alfresco books, about half the material will be familiar and is what you would typically find in an Alfresco tutorial. If you have already implemented several Alfresco applications, go directly to the workflow chapter; you probably already know the solutions/sample code presented in the rest of the cookbook.

Chp 1-5 cover the various features of Alfresco from both a user and an administrator perspective. Skip these chapters if you are already familiar with Alfresco.

Chp 6-7 talk about simple UI customization and how to configure custom content types, aspects, and search. Again, there is nothing new if you have played with Alfresco before.

Chp 8 introduces the Alfresco JavaScript API. This is where the cookbook starts to shine. Snig’s approach to presenting each problem and its solution/sample code follows this structure:

1) Getting Ready (intro to the problem)
2) How to do it (instructions/solution overview)
3) How it works (technical explanation of the solution)
4) There’s more (optional supplemental info).

Sample code is great for users who are not interested in learning to build Alfresco solutions.  Snig’s “How it works” section presents the secret sauce on what the sample code does and how it solves the problem.  I strongly believe that explaining how to solve the problem is more valuable than the actual solution itself.  Snig does a good job of explaining the solutions he presents throughout the book.

Chp 9 goes into more detail about FreeMarker templates. If you haven’t worked with FreeMarker templates, Snig presents plenty of examples that are useful from a day-to-day perspective.

Chp 10 discusses the web scripts provided by Alfresco. These scripts are built as RESTful APIs and can be customized without Java or Eclipse. Some of the solutions include:
1) Show home details -> good for describing user/project profiles
2) Display details of document (via search) -> good example of how to render in JSON
3) Sending emails using mail templates -> details how to include a ticket when calling another web script, so that the user doesn’t have to enter his credentials again.

Chp 11 is the best chapter in the book; it covers how Alfresco uses jBPM from top to bottom. Snig presents an excellent example of how to build a workflow, using an SDLC as the sample process. He then describes how it works in great detail. For this chapter, I wish he had covered how it works before how to do it.

Chp 12-14 cover additional features of Alfresco, like integration with Outlook, email, and file servers. These chapters detail how to configure these features and do not provide samples on how to customize the integrations.

I would recommend this book to someone who has been asked to start building a prototype tomorrow. If you are planning to learn Alfresco as a profession, there are other books about Alfresco that cover the theory in more detail.

O Documentum Developer, Where Art Thou or rather Where Shall Thou Go?

I recently received an email from a Documentum developer asking about what technologies he should learn for the future, especially for job security.  Although I know Documentum technology well, I do not necessarily have a gauge on the job market.  Other bloggers like Virginia Backaitis tend to have a good pulse on the market as well as the development community.

That being said, I saw a recent blog posting about how Microsoft might be dumping .NET for HTML5 and JavaScript. This would be a game changer in the Documentum development community, which is basically composed of two camps: Java developers and .NET developers. When Documentum version 5 came out, it was a big shift from Documentum’s proprietary API and docbasic to Java. Documentum 5.3 provided a COM bridge that allowed Microsoft developers to call Documentum DFC (Java classes). With version 6.5, this COM bridge is no longer supported, and Microsoft developers are forced to write web service clients that call Documentum DFS (web services). Web services allow developers to integrate disparate systems running on different technologies (eg Java vs .NET). This was a great advancement from a technical point of view, but practically it meant that a project might require both a Java developer and a Microsoft .NET developer.

If Microsoft is now going to switch development technologies away from .NET and to HTML + JavaScript, this means that learning proprietary .NET may no longer be necessary. I bring this up because even EMC is planning to rewrite its Taskspace application using Spring and ExtJS. I’m not sure if the SDK/customization model will be based on extended JavaScript, but I’m starting to see convergence on web development technology.

If I were a betting man, I would wager some money on learning ExtJS. I wouldn’t double down on it though.
