Search & Discovery Solutions: A Primer

I've written several posts describing Search & Discovery Solutions (SDSs). This post will pull them together in an overview.

Simply stated:

Semantics – the study of meaning
Semantic search – tries to understand the searcher's intent to improve findability

How a Search Engine Indexes Content
The core of an SDS is a search engine that indexes and searches a wide range of product data.

Search engines operate with an index – an optimized file format that supports rapid data access and display of search results. They usually do not store the complete sources.

A search engine must give fast results. For example, one high-tech manufacturing company reports that search results display in less than two seconds when accessing 16 million items and metadata from within 24 million documents, drawings and images.

The usual approach to SDS indexing is based on text values. This is in stark contrast to relational databases that store data in tables, records, attributes and values. What's more, an index eliminates any need for an intermediate relational database to help with queries.
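To make the idea of a text-based index concrete, here is a minimal Python sketch of an inverted index, one common way to organize such a structure. It is an illustration only, not how any particular SDS product works; the document IDs, sample text and function names are all invented for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

# Hypothetical sample content, invented for illustration.
documents = {
    "DWG-001": "Bracket assembly drawing, aluminum housing",
    "SPEC-014": "Material spec for aluminum bracket",
    "MEMO-203": "Supplier meeting notes on steel fasteners",
}

index = build_index(documents)

# A lookup is a single dictionary access, not a scan of every document.
hits = sorted(index["aluminum"])
print(hits)
```

Note that the index holds only tokens and document IDs, not the source files themselves, which matches the point above that search engines usually do not store the complete sources.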

Utility programs mine the current contents in the enterprise's IT infrastructure to create the index. The index is separate from – and does not change – the existing data attributes.

Searching can begin once indexing has been completed during implementation of an SDS. As changes in content occur, the utility updates the index to ensure changes are available for searching.
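Keeping the index current can be sketched as an "upsert": when a document changes, its old entries are dropped and the new text is indexed in their place. The following Python sketch assumes a simple in-memory index; the class name `Index`, its methods and the sample document are hypothetical and chosen only to illustrate the update step.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

class Index:
    """A toy in-memory index that supports updates (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # token -> document IDs
        self.doc_tokens = {}              # document ID -> tokens currently indexed

    def remove(self, doc_id):
        """Drop every entry that points at the given document."""
        for token in self.doc_tokens.pop(doc_id, set()):
            self.postings[token].discard(doc_id)
            if not self.postings[token]:
                del self.postings[token]

    def upsert(self, doc_id, text):
        """Index a new document, or re-index one whose content changed."""
        self.remove(doc_id)
        tokens = set(tokenize(text))
        for token in tokens:
            self.postings[token].add(doc_id)
        self.doc_tokens[doc_id] = tokens

    def search(self, term):
        """Return the sorted IDs of documents containing the term."""
        return sorted(self.postings.get(term.lower(), set()))

idx = Index()
idx.upsert("SPEC-014", "Material spec for aluminum bracket")
before = idx.search("aluminum")

# The source document is revised; the utility re-indexes it.
idx.upsert("SPEC-014", "Material spec for titanium bracket")
after_old_term = idx.search("aluminum")
after_new_term = idx.search("titanium")
```

After the second `upsert`, a search for the old material no longer returns the document, while a search for the new material does. The source file itself is never modified by this process.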

Searching Dimensions

Deep Search – drives past metadata into file and document contents by taking advantage of "full text" capability
Wide Search – accesses all forms of product data stored throughout the enterprise in multiple repositories

Deep Search
With full text capability:

"The search engine examines all words/objects in every stored document as it tries to match the search criteria entered by the user. This distinguishes it from searches based on metadata or parts of the original texts represented in databases, such as titles or selected sections." - Wikipedia

Full text does not rely on consistent metadata; it drives through metadata to access all relevant product data. In contrast, a real "gotcha" with a relational database query is that it relies on consistent metadata – a highly questionable assumption. Put another way, do you trust your company's metadata?
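The difference between a metadata-only lookup and a full text search can be shown with a small Python sketch. The records, field names and functions below are invented for illustration, and the search is a plain substring scan rather than a real index, which keeps the example short.

```python
# Hypothetical records with inconsistent titles, invented for illustration.
records = [
    {"id": "DOC-1", "title": "Housing rev B",
     "body": "Gasket material changed to Viton for the pump housing."},
    {"id": "DOC-2", "title": "Gasket change notice",
     "body": "See attached drawing."},
    {"id": "DOC-3", "title": "Untitled",
     "body": "Test report: gasket leak observed during pressure test."},
]

def metadata_search(records, term):
    """Match the term against the title field only."""
    term = term.lower()
    return [r["id"] for r in records if term in r["title"].lower()]

def full_text_search(records, term):
    """Match the term against the title and the document contents."""
    term = term.lower()
    return [
        r["id"] for r in records
        if term in r["title"].lower() or term in r["body"].lower()
    ]

meta_hits = metadata_search(records, "gasket")
full_hits = full_text_search(records, "gasket")
print(meta_hits)
print(full_hits)
```

The metadata search finds only the one record whose title happens to mention the term. The full text search also finds the two records where the term appears only in the contents.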

Deep, full text searching delivers significant benefits, among them:

Negating the need for users to remember where the needed data is stored
Avoiding costly reorganization and migration of data to satisfy relational database retrievals

The significance of full text search becomes more apparent in view of the two basic types of data: structured and unstructured. Structured data has defined field formats – for example, a product number stored in PLM and ERP systems.

Structured data is usually already accessible in current systems; however, unstructured data is the most prevalent form – more than 80% of a company's data, according to numerous surveys.

Unstructured data deserves attention because of its overwhelming presence. It exists in CAD drawings, MS Word documents, PDFs, emails, and more.

Wide Search
Product data may exist in many files and formats throughout the enterprise. Therefore, an array of appropriate connectors may be needed. These individual software modules extract metadata, data relationships and file contents. They allow users to discover product data previously hidden in isolated silos, as an optics technology development company found.

NorthRidge – Application Embedded Client

Yet, most unstructured data is not reachable by PLM systems that do not have an SDS with Wide search capabilities, robbing design engineers of potentially valuable information.

So, in addition to the common files (e.g., MS Word files), the enterprise's specific needs will vary depending on its position in the R&D/PD spectrum. For example, the ability to open CAD files is essential for product development, but not for the needs of pure research. Cracking these files will expose textual data and relationships among files as they reference one another.

It may sound simple enough to add more data sources. But determining the inclusions and exclusions is not a matter to be taken lightly. There will be trade-offs. For instance, security guidelines must be maintained to protect personnel records and sensitive intellectual property information.

More inclusions should lead to greater user acceptance, but with the disadvantage of adding potentially massive amounts of data. Before deciding what not to include, the risk of incomplete data for decision-making must be evaluated.

The Searching Process – The User Experience
A key component of an SDS is a purpose-designed user interface (UI) that is focused on the needs of engineering and manufacturing users. A single point-of-access eliminates the need for multiple sign-ons.

Perception Software – Starting a Search

When a user enters a search criterion – a key word or phrase – in the UI, the SDS presents an initial results list for further analysis and action. To facilitate this process, an SDS may use powerful techniques for normalizing and classifying data – think synonyms. The means to relate similar terms include 1) Taxonomy (hierarchical relationships) and/or 2) Ontology (a network of relationships) methodologies.
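As a rough illustration of how a taxonomy can relate similar terms, the Python sketch below expands a search term to include the narrower terms beneath it in a hierarchy. The taxonomy contents and the `expand` function are hypothetical; a real SDS would draw on a far larger, curated vocabulary.

```python
# A hypothetical taxonomy: each term maps to its narrower (child) terms.
TAXONOMY = {
    "fastener": ["bolt", "screw", "rivet"],
    "bolt": ["hex bolt", "carriage bolt"],
}

def expand(term, taxonomy):
    """Return the term followed by every narrower term beneath it."""
    expanded = [term]
    for child in taxonomy.get(term, []):
        expanded.extend(expand(child, taxonomy))
    return expanded

terms = expand("fastener", TAXONOMY)
print(terms)
# ['fastener', 'bolt', 'hex bolt', 'carriage bolt', 'screw', 'rivet']
```

A search for "fastener" can then match documents that mention only "rivet" or "hex bolt". An ontology would generalize this from a tree to a network, where a term can be related to many others in more than one way.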

Refinements to the selection criteria may be necessary to access the most relevant documents. Some familiar refinements include Boolean logic with AND/OR/NOT operators; wild card entries; mandatory, prohibited, and optional clauses; and more. All SDSs provide this basic capability.
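Boolean refinements map naturally onto set operations over an index. The Python sketch below assumes a toy index that maps each term to a set of document IDs; the parameter names `all_of`, `any_of` and `none_of` are invented here to stand for AND, OR and NOT, and do not reflect any vendor's query syntax.

```python
# Hypothetical toy index: term -> IDs of documents containing it.
index = {
    "aluminum": {"DWG-001", "SPEC-014"},
    "steel": {"SPEC-022", "MEMO-203"},
    "bracket": {"DWG-001", "SPEC-014", "SPEC-022"},
    "drawing": {"DWG-001"},
}
all_ids = {"DWG-001", "SPEC-014", "SPEC-022", "MEMO-203"}

def boolean_search(index, all_ids, all_of=(), any_of=(), none_of=()):
    """Combine term lookups with AND, OR and NOT using set operations."""
    results = set(all_ids)
    for term in all_of:                      # AND: keep docs with every term
        results &= index.get(term, set())
    if any_of:                               # OR: keep docs with at least one term
        either = set()
        for term in any_of:
            either |= index.get(term, set())
        results &= either
    for term in none_of:                     # NOT: drop docs with the term
        results -= index.get(term, set())
    return sorted(results)

# bracket AND (aluminum OR steel) NOT drawing
hits = boolean_search(
    index, all_ids,
    all_of=["bracket"], any_of=["aluminum", "steel"], none_of=["drawing"],
)
print(hits)
```

Here the query returns the two specification documents and excludes the drawing, even though the drawing mentions both "bracket" and "aluminum".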

The more exploratory the inquiry, rather than focused on a specific product, process or part number, the more necessary context-sensitive semantic technology becomes. The IHS Goldfire software is a good example in this case.

IHS Goldfire – Results of Semantic Search

To supplement this discussion, look at some representative vendor offerings and examine available open source options. In fact, you may be able to download a free trial version and see for yourself. Initial search results are surrounded by several navigational aids and refinement methods showing possible areas to explore further.

Using Deep/Wide search capabilities, the presented results are generated by driving past metadata to access desired content in every relevant record; the search criteria are highlighted in each listed record containing them.

Thumbnail previews of images will also be shown. These displays of retrieved results then can be iteratively refined to narrow the results list to find data that is most "relevant" – a key capability in semantic technology and in an SDS.

In addition to the displayed results, a user will see filtering choices to narrow subsequent searches; dates and file types are two examples of many. Further exploration is highly likely; a typical initial search may reveal hundreds of choices.

Relevant – the desired search end is an "aha" moment. Two factors come into play: 1) SDS algorithms and 2) user actions.

Algorithms determine the ranking of the results to be presented by taking into account factors such as the frequency of the criteria and their proximity to other words. Relevance ranking methods can range from simple to highly complex – based on sophisticated mathematical logic.
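At the simple end of that range, a ranking function might just add up how often the search terms occur and give a bonus when they appear close together. The Python sketch below does exactly that. The scoring formula, function names and sample documents are assumptions made for illustration; production engines use considerably more refined methods.

```python
import re
from itertools import combinations

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(text, query_terms):
    """Score = total term occurrences + a bonus for terms that sit close together."""
    tokens = tokenize(text)
    terms = list(dict.fromkeys(t.lower() for t in query_terms))  # de-duplicate
    positions = {t: [i for i, tok in enumerate(tokens) if tok == t] for t in terms}

    frequency = sum(len(p) for p in positions.values())

    proximity = 0.0
    for a, b in combinations(terms, 2):
        if positions[a] and positions[b]:
            # Smallest distance, in words, between any occurrence of a and of b.
            gap = min(abs(i - j) for i in positions[a] for j in positions[b])
            proximity += 1.0 / gap
    return frequency + proximity

# Hypothetical documents, invented for illustration.
documents = {
    "RPT-007": "Pump seal failure report: the pump seal leaked",
    "LST-112": "Seal supplier list for the new pump",
    "SCH-031": "Pump maintenance schedule",
}
query = ["pump", "seal"]

ranked = sorted(documents, key=lambda d: score(documents[d], query), reverse=True)
print(ranked)
```

The failure report ranks first because both terms occur twice and sit side by side. The supplier list mentions each term once, several words apart, and the maintenance schedule contains only one of the two terms.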

The second factor is the ability of a user to determine the correct choices for further discovery and ultimately, decision-making.

Actify – Displays Relationships for Exploration

Getting to the relevant data also depends on the user's ability to take advantage of the displayed navigational aids. For instance, the typical SDS may show relationships as an aid to decision-making, such as all related documents for a CAD drawing.

Exalead OnePart – Display Filtering Options

Faceted navigation is also a key discovery tool. Facets are the categories into which the SDS has grouped relevant records, with a number of them in each category. In turn, the records in a particular category may include several types. Thus, faceted navigation helps the user to identify and select a specific category for further exploration.
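Faceted navigation can be sketched in Python as two small steps: count the results in each category, then keep only the records in the category the user selects. The result records, field names and functions below are hypothetical.

```python
from collections import Counter

# Hypothetical search results with metadata, invented for illustration.
results = [
    {"id": "DWG-001", "file_type": "CAD", "year": 2012},
    {"id": "DWG-002", "file_type": "CAD", "year": 2013},
    {"id": "SPEC-014", "file_type": "PDF", "year": 2013},
    {"id": "RPT-007", "file_type": "PDF", "year": 2011},
    {"id": "MEMO-203", "file_type": "Word", "year": 2013},
]

def facet_counts(results, field):
    """Count how many results fall into each value of the given field."""
    return Counter(r[field] for r in results)

def drill_down(results, field, value):
    """Keep only the results in the facet category the user selected."""
    return [r for r in results if r[field] == value]

by_type = facet_counts(results, "file_type")
cad_only = drill_down(results, "file_type", "CAD")
cad_by_year = facet_counts(cad_only, "year")

print(dict(by_type))
print([r["id"] for r in cad_only])
```

The counts are what a user would see beside each facet, and each drill-down narrows the list so the remaining facets can be recomputed for the next step.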

Now, the significance of Deep and Wide searching should be more understandable. With Deep, a user is assured that all occurrences of the search criteria have been identified; with Wide, a user is assured that all designated files have been accessed.

From a user's standpoint, the SDS will have done its job when relevant data is presented. To keep the power of an SDS in perspective, though, bear in mind that algorithms can't judge the quality of the data, and duplicates are only identified, not removed. Well-informed users are crucial to getting results with an SDS.

The result should be timely decisions that reduce the costs, times and risks of product development.

If you are considering a semantic search solution… Look Out. The term "semantic technology" is arguably one of the most abused of all marketing terms. Everyone seems to have his or her own definition and interpretation of desirable features, much as has happened with PLM over the years.

My title states, "A Primer." Consider further developing your understanding of semantic search technology to boost your company's chances for success with an SDS.

Acknowledgement for Assistance
The van der Roest Group, integration solutions and services for CAE/CAD/PDM/PLM.