Metadata Overload


What is metadata again? The term “metadata” refers to the information that is used to “tag” other information. It is data about data. Huh? All right, here’s an example: when researchers submit their scientific papers for publication, they have to submit a list of keywords along with their papers. These keywords are an example of metadata. Another example of metadata is the information a library puts in its cataloging system.

Today, people on the Internet are entering a variety of metadata in a variety of formats. Yet most data remains hidden from us. There is a very simple reason for that: we don’t know the vocabulary of the metadata. We face this problem most often when searching for specialized information, such as medical or scientific information. Not knowing the metadata vocabulary can seriously limit access to the data.

One of the ways the problem is handled today is via faceted search. Faceted search just means a search that uses classification categories. Do you recall the early Yahoo! that worked like a directory with subheadings like “education”, “entertainment”, etc.? That is a pretty good example of faceted search. The fact that the approach failed, especially post-Google, is a good indicator of the problems with it. One of the key problems Yahoo! had was that users didn’t want to waste time digging five layers deep to find something. Another problem was that the categories just weren’t clear to some users, especially as they got deeper. Then there was the problem that sometimes information doesn’t belong to the intuitive category but to an arcane category that nobody knows. Let me boil down the last point: Yahoo!’s categorization system assumed that you knew something about the term you were searching for and hence could pin down where it would be in the tree. If you don’t know anything about the term, then finding an object via a classification system is virtually impossible.

If the user is fine with spending the time to “tree down” (and that’s a big if), we still need a system that bridges the user vocabulary with the system vocabulary, and by that I mean the metadata. One way to solve this problem is to build glossaries. The other way is to track query word pathways from searches and make them available to people for future searches. For example, if somebody searched for the term “cold remedy” and clicked on “Drug A”, the system should suggest information about Drug A as a link for the next user. Of course, we will need to base our suggestion function on multiple users, and perhaps implement a system that allows users to vote for the pathways they find useful. That way, we won’t be imposing a hierarchy or classification from above, but instead using a classification system built by other users.
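To make the pathway idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `PathwayTracker` class, its method names, and the scoring rule (number of distinct users plus net votes) are hypothetical, not a description of any existing system.

```python
from collections import defaultdict


class PathwayTracker:
    """Hypothetical store of query -> clicked-result pathways."""

    def __init__(self, min_users=2):
        # Only suggest a pathway once this many distinct users have followed it.
        self.min_users = min_users
        # (query, result) -> set of user ids who followed that pathway
        self._users = defaultdict(set)
        # (query, result) -> net vote count
        self._votes = defaultdict(int)

    @staticmethod
    def _normalize(query):
        # Lowercase and collapse whitespace so "Cold  Remedy" matches "cold remedy".
        return " ".join(query.lower().split())

    def record_click(self, user_id, query, result):
        self._users[(self._normalize(query), result)].add(user_id)

    def vote(self, query, result, delta=1):
        self._votes[(self._normalize(query), result)] += delta

    def suggest(self, query, limit=3):
        q = self._normalize(query)
        candidates = []
        for (seen_query, result), users in self._users.items():
            if seen_query == q and len(users) >= self.min_users:
                # Assumed scoring rule: distinct users plus net votes.
                score = len(users) + self._votes.get((seen_query, result), 0)
                candidates.append((score, result))
        # Highest score first; ties broken alphabetically.
        candidates.sort(key=lambda pair: (-pair[0], pair[1]))
        return [result for _, result in candidates[:limit]]


tracker = PathwayTracker(min_users=2)
tracker.record_click("u1", "cold remedy", "Drug A")
tracker.record_click("u2", "Cold  Remedy", "Drug A")
tracker.record_click("u3", "cold remedy", "Drug B")
print(tracker.suggest("cold remedy"))  # ['Drug A']
```

“Drug B” is held back here because only one user has followed that pathway so far. Requiring several distinct users before suggesting anything is one simple way to base the suggestion on multiple users rather than on a single click.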

While the approach described above will help solve most of the problems, it is still sub-optimal. I believe that the best way forward is to leverage both the user metadata and the system metadata. In short, create a metadata system that is open to user input. I am proposing the creation of an “interactive metadata system”. The first example that comes to mind is a system that allows user comments and reviews. Such systems are already available but haven’t been used for parsing specialized databases, the place where they are needed most.
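As a rough illustration of what “open to user input” could mean, the sketch below keeps the cataloguer’s keywords and the readers’ tags side by side and lets a search match either vocabulary. The `Record` class, the `search` function, and the sample keyword “antihistamine” are all made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    title: str
    # Keywords assigned by the cataloguer (assumed to be stored in lowercase).
    system_keywords: set = field(default_factory=set)
    # Tags contributed by readers.
    user_tags: set = field(default_factory=set)

    def add_user_tag(self, tag):
        self.user_tags.add(tag.strip().lower())

    def matches(self, term):
        term = term.strip().lower()
        return term in self.system_keywords or term in self.user_tags


def search(records, term):
    """Return the titles of records whose system or user metadata matches the term."""
    return [r.title for r in records if r.matches(term)]


record = Record("Drug A", system_keywords={"antihistamine"})
record.add_user_tag("Cold remedy")
print(search([record], "cold remedy"))    # ['Drug A']
print(search([record], "antihistamine"))  # ['Drug A']
```

A reader who only knows the everyday phrase and a specialist who knows the catalogue term both land on the same record, which is the bridge between user vocabulary and system vocabulary that the post is asking for.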

This user metadata is currently entered in user-defined systems like users’ websites and blogs. Today we have to use secondary sources to add metadata about a piece of information, and that involves creating a site that includes all the relevant links. This practice will remain, to a certain degree, because user needs are very diverse. We can, however, bring the people who use the data into the process by building an alternate database of how a particular article was referenced by other users.
