Taking tagging to task
Tom Coates addressed something I’d been meaning to blog about: the increasing use of keywording as a primary organizational system. It’s been brought to the forefront by the blogging community’s quick adoption of Flickr and del.icio.us for the simpler tasks of sharing photos and links, and is gaining a lot of popularity. This makes sense, since users at this level are more interested in collaborative data than the average user; after all, bloggers are the ones putting themselves into the system, and not just expecting results from it. Flickr and del.icio.us excel at finding ways of aggregating these entries together to make finding new items more happenstance than research.
Tom’s ideas and outlines for integration of keywords in Safari are rather spot on, and it’s not like we haven’t seen Apple attempt to handle the difficult task of taxonomy before. In fact, we can look at the latest versions of both Address Book and iPhoto for examples. Address Book uses a mildly clumsy “group” system where you can drag and drop a person’s vCard into different categories, but provides no method of directly editing that person’s groups after they’ve been filed into them (although you can hold down the Option key and have the groups they are in highlighted in yellow, that’s about all you can do at an overview level).
As for iPhoto: keywords have been a time-tested approach to image databases, so it was natural for iPhoto (and also Flickr) to support them to please photo DB users. Keywords in photography are used as search criteria because, after all, how else can we tell a computer about an image? Researchers at Purdue are developing an image search engine that finds images based on a user-provided image or sketch, but in reality, these are unique cases: these users know what they want ahead of time, can literally visualize what they need and can provide a good enough illustration to support it, and have no need to refine the process through language. A system like this is best applied to an application that creates relationships between existing picture data, and not necessarily one that has a human operator.
What we are concerned with in normal computing are people who don’t know what they want or exactly where things go, but can describe it well enough to begin looking, which is why keywords work. We’ve already seen nested categorization in an open system in action at DMOZ. We’ve also seen it go from its early stages as a useful alternative to the Yahoo! directory into something just as granular and complicated as the Dewey decimal system—which is not surprising: systematics is an element of library science after all. Each level deeper becomes a level removed from a user’s initial interaction with a system, and another level of complexity when that user wants to manipulate that item’s classification.
So how valuable is categorical inheritance to a user? Do you care that the latest Wired article on CD copy protection was filed under /Society/Issues/Intellectual Property/Music Freedom/Corrupted Audio CDs at DMOZ? Not much at all, it would seem these days, especially when faced with a high volume of material (in this case, photos and bookmarks). But we’re still faced with the problem of relationship: there’s no way that del.icio.us can know that the “starwars” tag is related to “lucas”. And yet “lucas” doesn’t have to be about George Lucas (it could be about Lucas, Texas), and the point is that keywords-as-categories break down when treated singularly. As long as we have synonyms and homonyms, we can’t have flawless keyword categories, because keyword meanings are not enforced. Shared tagging works only if everyone’s talking about the same thing. And typing them in the same way. And... hell, you’ll find yourself with “video” and “videos” and “quicktime” and “mpeg” tags soon enough if you’re not careful. Which is perfectly fine from a keyword perspective (an item may be a video and a quicktime one at that), but not if you want to see all videos under one grouping. It seems clear that the next step for a Flickr- or del.icio.us-like system is the application of heuristics to automatically create relationships between keywords by following how duplicate links are being filed by different users. (This reminds me of a different image database experiment: Carnegie Mellon developed a human-influenced system for creating keywords for pictures, presented as a word game. You get paired off against another user and have to match his or her descriptions for a set of pictures in a limited amount of time. What you are actually doing is agreeing on search terms the computer can use: “Yes, we both agree the word _____ describes something in this photo. Computer, approve this keyword.”)
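To make the heuristic idea a little more concrete, here is a rough sketch of one way it might work: whenever the same link has been filed by more than one user, treat every pair of tags that landed on it as a vote that those tags are related. This is only my guess at an approach, not how Flickr or del.icio.us actually do anything. The function name, the (user, url, tags) record shape, and the sample data are all made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def tag_relationships(bookmarks):
    """Count how often two tags end up on the same multi-user link.

    `bookmarks` is an iterable of (user, url, tags) records. This shape
    is an assumption for the sketch, not any real service's data model.
    """
    users_per_url = defaultdict(set)
    tags_per_url = defaultdict(set)
    for user, url, tags in bookmarks:
        users_per_url[url].add(user)
        # Lowercase so "StarWars" and "starwars" count as the same tag.
        tags_per_url[url].update(tag.lower() for tag in tags)

    pair_counts = Counter()
    for url, tags in tags_per_url.items():
        # Only links filed by at least two different users count as duplicates.
        if len(users_per_url[url]) < 2:
            continue
        # Sort so each pair is always stored in the same order.
        for first, second in combinations(sorted(tags), 2):
            pair_counts[(first, second)] += 1
    return pair_counts


# Hypothetical sample data: the URLs and users are invented.
bookmarks = [
    ("alice", "http://example.com/episode3", ["starwars", "film"]),
    ("bob", "http://example.com/episode3", ["lucas", "StarWars"]),
    ("carol", "http://example.com/thx", ["lucas", "film"]),
    ("dave", "http://example.com/thx", ["lucas"]),
    ("erin", "http://example.com/lucas-tx", ["lucas", "texas"]),
]

relationships = tag_relationships(bookmarks)
for pair, count in relationships.most_common():
    print(pair, count)
```

With this sample data, “lucas” picks up a relationship to “starwars” and “film” because other people filed the same links, while the lone Lucas, Texas bookmark contributes nothing since nobody else has filed it. A real system would need far more than raw counts (“video” versus “videos” is still unsolved here), but the basic signal is cheap to collect.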
The ultimate message seems to be that it’s not that we need tags to make organizing our lives easier, it’s that we don’t need to organize things so much. We want communication—a way of saying what we want and getting it, without having to “look it up.” This is oddly enough related to my earlier Quicksilver review. We want to have it both ways—file when we need to regularly revisit items (like iPhoto’s Photo Album), and keyword when we just want a quick way to search for it later.
This all comes back to a study a while ago tracking the loss of bookmark usage to search engines: that people were relying on the Yahoos and Googles of the world to find pages that they’d previously been to. When we get to keyword our own data, it lets us create our “search” terms in our own language. And we’ll have this feature on the desktop soon enough when Microsoft’s search in Longhorn and Apple’s Spotlight in OS X Tiger are released; the value of this kind of freeform search is now clear. Nesting elements beyond two or three levels is more work than reward. We want stuff now. We’re impatient. The Internet has made it so.
