While reviewing a new book on Solr, an open-source enterprise search engine, it occurred to me that Google has not completely conquered the search world. To achieve high accuracy and relevance in search results, content needs to be parsed and indexed in a way that uniquely identifies each document. Otherwise, you could end up with hundreds or even thousands of matches for a generic search criterion. Additional tags can be applied to the index with taxonomy and auto-tagging tools to enhance search results. However, this approach requires a detailed taxonomy that is appropriate for your enterprise.
For the purpose of this discussion, let’s assume that I work for a large fictitious company and have been asked to create a training course for a new application. I want to find examples of all the typical materials that are created for a training course: a course outline, a PowerPoint presentation template, and lab exercises. If I search using these keywords, I probably won’t find anything useful because the search criterion is very generic. What I need is the ability to search for: “need to create training materials”.
In my fictitious company, there is no training department where I can ask someone directly for this information, nor is there a corporate intranet or central repository that makes these templates available. So how do I find this information, and how does this fit with people, places, and purpose?
I believe what is missing from enterprise search, specifically on the indexing side of the equation, is the context of how content is created. We have already conquered the issue of “places” (or sources) through federated search, which various ECM vendors have created and now support. We are no longer tied to searching a single global repository or a single vendor’s repository. Also, most ECM repositories have some security in place that is adhered to when access is provided through federated search.
The only issue with security IS the actual management of security. Here lies the “people” challenge. For most applications, we can define security groups and assign users to the appropriate groups. This kind of authorization is done at the application level. You can extend the authorization model to multiple applications by creating global groups (e.g., Documentum Federation). The challenge becomes mapping the authorization from one vendor to another. Try to map SharePoint security to Documentum security at an enterprise level. The “people” challenge will eventually get solved when application designers get comfortable with the idea of roles (vs. groups) from a security perspective.
So how does “purpose” fit in with searching? Purpose is similar to tagging, but more user-friendly. I see associating a purpose with content at creation time as the means to bring context to search results.
Let’s look at this from a content creator’s perspective in my fictitious company before we address how this would affect searching. If I were creating training material from scratch, I would create a space/site to gather and store all of the relevant information that would help generate new training material content (possibly creating federated links to documents that exist outside of this space). As the “purpose” of this space, I would indicate that I was “creating training materials for application X”. This purpose would be associated with all content created in the space. The relationship would not necessarily be added to the index, but could be stored in some global relationship database that is integrated with the enterprise search engine.
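To make the idea of a global relationship database a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration: the table names, the function names, and the example URIs are hypothetical, and a real enterprise deployment would look quite different. It only shows the shape of the relationship: a space carries a purpose, and every piece of content registered in that space inherits it.

```python
import sqlite3


def create_store():
    """Create an in-memory relationship store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE space (
            id      INTEGER PRIMARY KEY,
            purpose TEXT NOT NULL
        );
        CREATE TABLE content (
            uri      TEXT PRIMARY KEY,
            space_id INTEGER NOT NULL REFERENCES space(id)
        );
        """
    )
    return conn


def create_space(conn, purpose):
    """Register a new space/site and the purpose it was created for."""
    cur = conn.execute("INSERT INTO space (purpose) VALUES (?)", (purpose,))
    return cur.lastrowid


def add_content(conn, space_id, uri):
    """Associate a document (local or federated link) with a space."""
    conn.execute(
        "INSERT INTO content (uri, space_id) VALUES (?, ?)", (uri, space_id)
    )


def purpose_of(conn, uri):
    """Return the purpose inherited by a document, or None if unknown."""
    row = conn.execute(
        "SELECT s.purpose FROM content c "
        "JOIN space s ON s.id = c.space_id WHERE c.uri = ?",
        (uri,),
    ).fetchone()
    return row[0] if row else None


# Example data; the URIs are made up for this sketch.
conn = create_store()
space_id = create_space(conn, "creating training materials for application X")
add_content(conn, space_id, "sharepoint://hr/course-outline.docx")
add_content(conn, space_id, "documentum://eng/lab-exercises.pdf")
```

Because the purpose lives on the space rather than on each document, the content creator states it once and never has to tag individual files.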
Imagine now that when a user searches for something generic, he/she can also filter (or constrain) the search criteria based on the purpose of the search. I think the technology is already out there from an auto-tagging/taxonomy perspective. The only thing missing is automating the assignment of “purpose” during content creation. Purpose must be attached to content in all its forms across the enterprise (e.g., documents, images, emails, movies) in order for the feature to be useful. I’m not sure if Solr will support this, but since it’s open source, I can definitely look into whether I can integrate this into the search engine.
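As a rough sketch of what purpose-constrained searching could look like on the query side: if the purpose were exposed to Solr as an indexed field (an assumption on my part, since the relationship might instead live in a separate database), Solr's standard q and fq request parameters could carry the keywords and the purpose filter respectively. The field name purpose and the helper function below are hypothetical.

```python
from urllib.parse import urlencode


def build_solr_params(keywords, purpose=None):
    """Build a query string for a Solr select request.

    `q` holds the generic keywords; `fq` (filter query) narrows the
    results to one purpose. The `purpose` field is hypothetical and
    would have to exist in the index schema.
    """
    params = [("q", keywords)]
    if purpose:
        # Escape backslashes and double quotes so the purpose can be
        # sent as a quoted phrase.
        escaped = purpose.replace("\\", "\\\\").replace('"', '\\"')
        params.append(("fq", f'purpose:"{escaped}"'))
    return urlencode(params)


query_string = build_solr_params(
    "course outline", "creating training materials for application X"
)
```

The resulting string would be appended to the select URL of whichever Solr core or collection holds the enterprise index. The keywords stay generic while the filter supplies the context.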