In the previous post we wrote about the need to map words to a more general meaning. For example, to relate movies, we often need to find common ground between their keywords. A fictional example is a movie about a gory zombie squirrel and one about a genetically enhanced giant shark. Keywords like ‘zombie-squirrel’ and ‘giant-shark’ have nothing in common syntactically, but both describe animals and both share the theme ‘when-animals-attack’, along with other non-trivial similarities.
What we need is a combination of a database and a lightweight ontology to connect words in a semantically sound way. Furthermore, the system should let us measure this similarity by using a hierarchical representation of the concepts underlying the words. This sounds like WordNet, and indeed it provides a lot of the functionality we require. We have already toyed with the idea of synsets, though not in the strict WordNet sense: we tried to group very similar words into sets so that keywords which occur only once can be mapped to a shared set.
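To make the idea of measuring similarity over a hierarchy more concrete, here is a minimal sketch. It does not use WordNet itself. The small is-a table `HYPERNYMS` and the helper names are invented for illustration, and the score (one over one plus the number of edges between two words through their closest shared ancestor) is just one simple choice among many.

```python
# Toy is-a hierarchy (child -> parent). Every entry is made up for this
# example; a real system would draw these links from a lexical database.
HYPERNYMS = {
    "zombie-squirrel": "squirrel",
    "squirrel": "rodent",
    "rodent": "animal",
    "giant-shark": "shark",
    "shark": "fish",
    "fish": "animal",
    "animal": "entity",
    "spaceship": "vehicle",
    "vehicle": "artifact",
    "artifact": "entity",
}


def path_to_root(word):
    """Return the chain of concepts from `word` up to the top of the hierarchy."""
    path = [word]
    while path[-1] in HYPERNYMS:
        path.append(HYPERNYMS[path[-1]])
    return path


def lowest_common_ancestor(a, b):
    """Return the most specific concept shared by both words, or None."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    return None


def path_similarity(a, b):
    """Score in (0, 1]: 1 / (1 + edges between a and b); 0.0 if unrelated."""
    common = lowest_common_ancestor(a, b)
    if common is None:
        return 0.0
    distance = path_to_root(a).index(common) + path_to_root(b).index(common)
    return 1.0 / (1.0 + distance)


if __name__ == "__main__":
    print(lowest_common_ancestor("zombie-squirrel", "giant-shark"))
    print(path_similarity("zombie-squirrel", "giant-shark"))
    print(path_similarity("zombie-squirrel", "spaceship"))
    print(path_similarity("zombie-squirrel", "kaiju"))  # keyword not in the table
```

In this toy table the two animal keywords meet at ‘animal’ and therefore score higher than the squirrel and the spaceship, which only meet at the root. A keyword that is missing from the table has no links at all and scores zero against everything else.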
The good news is that WordNet has a lot of potential. The bad news is that, because movies are often described with highly unusual keywords, neither a straightforward synonym mapping nor an efficient measurement of the distance between two words is always possible. In other words, more research is required to maximize the benefit for the problem at hand.