Dispatcher + SaaS = Incredibly Intelligent Recognition

A long time ago, in a field far, far away from Documentum, I worked at the USDA while I was an undergrad.  One of the research projects at the lab was automating the detection of healthy chicken carcasses.  This inspection was (and is) mainly done by human inspectors who look at color and noticeable markings on the chicken's skin to determine whether the chicken was healthy.  Since the detection relied on visual inspection, the lab was researching whether we could build a system that would do it automatically, with no human intervention.

I will spare you the gruesome details of collecting data from a chicken slaughterhouse, but in the end we had data from hundreds of chickens, most of which were healthy and a few of which were deemed not suitable for sale.  We took all this data and fed it to a neural network application.  For those of you who are not familiar with the terminology, it's a type of artificial intelligence modeling.  The ability of the neural network to “learn” and to correctly determine whether a chicken was healthy relied heavily on the amount of data we fed into the system.  The more data we provided, the better we could train the neural network to recognize healthy (vs sick) chickens.  If I remember correctly, the results of the initial testing pointed towards an 85% success rate.  This was still below the rate of human detection, but it was a good start.

So how does this relate to Captiva Dispatcher?  Dispatcher is designed to intelligently recognize document types.  The way it does this is similar to what I worked on at the USDA.  For it to determine the document type, a customer has to provide a varied sampling of documents to “train” Dispatcher.  Sampling is pretty straightforward if you are trying to identify predefined forms.  It becomes a bigger challenge when there is no consistent template or structure that you can provide as a good sample to train Dispatcher.  This is the case for accounts payable: there is no consistent look and feel for invoices from vendor to vendor.  This is especially problematic for large companies like Walmart; imagine the number of invoices that Walmart gets from all of its vendors.

Now imagine you are a small-to-medium business (SMB).  You probably have significantly fewer vendor invoices than Walmart, which means you do not have a good sampling of the various permutations of invoices you could get.  This is where SaaS can give smaller businesses competitive advantages similar to those of larger companies.  If Dispatcher could be configured to run as SaaS, you could harness the knowledge (aka form recognition data) from hundreds if not thousands of customers.  This data could then be used to train Dispatcher to recognize more and more variants of invoices.  Dispatcher's intelligence would get better and better over time as more variations of invoices are submitted and Dispatcher learns to recognize them.
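To make the pooling idea concrete, here is a toy sketch in Python.  It is not how Dispatcher works internally (the post does not describe that); it is just a nearest-sample classifier that compares the words on a page with the words on labeled training pages.  Every name and every sample below is made up for illustration.

```python
from typing import List, Set, Tuple

# (tokens found on a page, document type label); purely illustrative data.
Sample = Tuple[Set[str], str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tokens two pages have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify(tokens: Set[str], training: List[Sample]) -> Tuple[str, float]:
    """Return the label and score of the most similar training sample."""
    best_label, best_score = "unknown", 0.0
    for sample_tokens, label in training:
        score = jaccard(tokens, sample_tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Samples one small business might have on its own (hypothetical).
own_samples: List[Sample] = [
    ({"invoice", "number", "bill", "to", "total", "due"}, "invoice"),
    ({"purchase", "order", "ship", "to", "quantity"}, "purchase_order"),
]

# Samples contributed by other customers of the shared service (hypothetical).
shared_samples: List[Sample] = [
    ({"statement", "amount", "payable", "remit", "vendor"}, "invoice"),
]

# An invoice from a new vendor whose layout the small business has never seen.
new_page = {"amount", "payable", "remit", "to", "quantity"}

alone = classify(new_page, own_samples)
pooled = classify(new_page, own_samples + shared_samples)
print("own samples only:", alone[0])
print("pooled samples:  ", pooled[0])
```

With only its own two samples, the small business mislabels the new page as a purchase order, because that is the closest thing it has seen.  Once the shared sample is in the pool, the same page comes back as an invoice.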

The power of many can be used to help the one.

When good intentions go awry

A few months back I was reading the “Using Registered Tables vs Object Types” design patterns discussion on the EMC Developer Network.

I’m a Documentum old-timer, so I was more comfortable using registered tables. Those of you who have used registered tables as lookups know that it's pretty straightforward: create a table and then REGISTER it so that you can query against it using DQL. The hitch is that if you don't have privileges on the database (or the database is managed by a different group), you are highly dependent on that group to create the table schema with the appropriate permissions. You are also dependent on them to move the table and its values from one environment to the next, OR you have to keep strict version management of your table population scripts. Anyway, I have followed this tedious process for many, many years.
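For readers who have not used registered tables, the two DQL statements involved look roughly like the strings this small Python sketch builds.  The table, owner, and column names are hypothetical, and the statement shapes follow my understanding of the DQL reference, so check them against your Content Server version.  The underlying database table still has to exist first, which is exactly where the DBA dependency comes in.

```python
from typing import Dict


def dql_literal(value: str) -> str:
    """Quote a string for DQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def register_table_dql(owner: str, table: str, columns: Dict[str, str]) -> str:
    """Build a REGISTER TABLE statement for an existing database table."""
    column_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"REGISTER TABLE {owner}.{table} ({column_defs})"


def lookup_query_dql(owner: str, table: str, key_column: str,
                     value_column: str, key: str) -> str:
    """Build a SELECT against the registered table for one lookup key."""
    return (f"SELECT {value_column} FROM {owner}.{table} "
            f"WHERE {key_column} = {dql_literal(key)}")


# Hypothetical lookup table; the names are assumptions for illustration only.
register_stmt = register_table_dql(
    "dm_dbo", "vendor_lookup",
    {"vendor_code": "CHAR(16)", "vendor_name": "CHAR(64)"},
)
query_stmt = lookup_query_dql(
    "dm_dbo", "vendor_lookup", "vendor_code", "vendor_name", "ACME")

print(register_stmt)
print(query_stmt)
```

The generated statements can be pasted into any DQL client.  The sketch only builds text; it does not connect to a repository.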

After reading the design patterns discussion, I decided that for my new project I would try using object types for lookups. I was not too concerned about security on the objects, given that the users were using Taskspace and would not have the ability to delete or manipulate the lookup objects.  I created custom types with a null supertype, since I wasn't planning to have any content associated with them.  See Laurence Hart’s posting on using Contentless Objects as Lookups.

So I created quite a few custom types and several hundred object instances of my custom lookup types.  It was very easy to change the values of the lookups and to create new ones using DA or any other Documentum application.  Not only did I not have to worry about managing table population scripts, but I also did not have to get the DBA team involved, since custom object types can be created using DA or DAB.  Everything worked as expected.
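In DQL terms, a lookup type with a null supertype and its instances could be written along the lines of the statements this Python sketch generates.  The type name, attribute names, and values are hypothetical, the statement shapes follow my understanding of the DQL reference, and creating a type with a null supertype normally requires elevated privileges, so treat this as a rough illustration rather than the exact steps used on the project.

```python
from typing import Dict, List


def dql_literal(value: str) -> str:
    """Quote a string for DQL, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_type_dql(type_name: str, attributes: Dict[str, str]) -> str:
    """Build a CREATE TYPE statement for a contentless lookup type."""
    attr_defs = ", ".join(f"{name} {dtype}" for name, dtype in attributes.items())
    return f"CREATE TYPE {type_name} ({attr_defs}) WITH SUPERTYPE NULL"


def create_object_dql(type_name: str, values: Dict[str, str]) -> str:
    """Build a CREATE ... OBJECT statement for one lookup instance."""
    set_clauses = ", ".join(
        f"SET {name} = {dql_literal(value)}" for name, value in values.items())
    return f"CREATE {type_name} OBJECT {set_clauses}"


# Hypothetical lookup type and values, for illustration only.
type_stmt = create_type_dql(
    "acme_status_lookup", {"code": "CHAR(16)", "label": "CHAR(64)"})

lookup_rows: List[Dict[str, str]] = [
    {"code": "NEW", "label": "Newly received"},
    {"code": "HOLD", "label": "On hold"},
]
object_stmts = [create_object_dql("acme_status_lookup", row) for row in lookup_rows]

print(type_stmt)
for stmt in object_stmts:
    print(stmt)
```

As with any generated DQL, the output is plain text to run from a DQL client; nothing here talks to a repository.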

Fast forward a couple of months, and we were ready to deploy our Taskspace application from the DEV environment to the testing environment.  Taskspace was designed to use the docapp archive feature of DAB to move form templates (components), processes, tabs, roles, and presets from one environment to the next.

Lo and behold, DAB does not support archiving contentless objects with a null supertype.  There is no way to include object instances of this kind in a docapp archive.  This makes sense at the object type definition level: if a custom type is not derived from dm_sysobject, then it doesn't have the i_folder_id attribute, which means the object lives in what I like to call “La La Land”.  Application Builder supports inserting objects, but it assumes that each object resides in some folder.  Unfortunately, there is no “Insert Persistent Objects.”

I haven't tried Composer to see whether this is supported, but it really was a surprise to me that this feature was not in DAB.  In the end I had to remap the custom lookup types to be derived from dm_sysobject.  I hope my experience with this will prevent someone else from going down the wrong path.
