By Maureen McClarnon
My last post discussed how content receives very basic metadata and how it gets from the outside world into our systems. Before the customer sees that content, though, it goes through another step: indexing. But talking about indexing without talking about taxonomies and vocabularies is something of an emu-and-egg problem. (Why emus? Because their eggs are beautiful and the birds themselves are very weird-looking. I get enough chickens on my Instagram feed. Fun fact: male emus take on the incubating and hatching responsibilities.)
We index content using both humans and machines (well, software and algorithms; the group’s name is “Machine Aided Indexing”), but the terms used to index that content come from the vocabulary group. And sometimes our vocabulary just doesn’t contain the necessary term, so an indexer requests an addition: there’s the emu and the egg.
The Vocabulary Services team exists to provide standard terms for indexers so that users can find what they need, regardless of what they enter in the search box. One part of the group, Named Authorities, maintains “authority files”: the names of persons, companies, and organizations, along with any variations of those names, plus fact-box information for each entity. Other authority files cover events, geographic place names, creative works, and character names.
Take T.S. Eliot, the man who gave us the dense poetry of “The Waste Land” and “Four Quartets,” but also the poems that became the musical Cats. He wrote under seven different pseudonyms, male and female. Search using any of them and you will end up back at T.S. Eliot, because our authority team did the work to connect those dots.
The average user probably won’t search for Charles Augustus Conybeare; I will grant you that. But it may comfort you to know that someone searching In Context products doesn’t need to know that Beyoncé’s last name is Knowles, or that JAY-Z is really Shawn Carter, to access their biographies or articles about them (as individuals or as a couple).
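For the technically curious, here is a minimal sketch of the idea behind an authority file, written in Python: every variant form of a name points back to a single preferred form, so a search for any variant lands on the same entity. The records, the matching rules, and the function names (`normalize`, `build_index`, `resolve`) are hypothetical illustrations, not a description of how Gale's systems actually work.

```python
from __future__ import annotations

import unicodedata

# Hypothetical authority records: preferred form -> variant forms.
# Illustrative only; these are NOT Gale's actual authority data.
AUTHORITY_FILE = {
    "Eliot, T. S.": ["T.S. Eliot", "Charles Augustus Conybeare"],
    "Beyoncé": ["Beyoncé Knowles", "Beyonce Knowles"],
    "JAY-Z": ["Jay Z", "Shawn Carter"],
}


def normalize(name: str) -> str:
    """Reduce a name to a match key: strip accents, ignore case,
    and drop spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.casefold() if c.isalnum())


def build_index(authority_file: dict[str, list[str]]) -> dict[str, str]:
    """Point every form (preferred or variant) at its preferred form."""
    index: dict[str, str] = {}
    for preferred, variants in authority_file.items():
        for form in [preferred, *variants]:
            # If two records share a variant, the later record wins here.
            index[normalize(form)] = preferred
    return index


INDEX = build_index(AUTHORITY_FILE)


def resolve(query: str) -> str | None:
    """Return the preferred form for a search string, or None if unknown."""
    return INDEX.get(normalize(query))


if __name__ == "__main__":
    for query in ["shawn carter", "Beyonce", "charles augustus conybeare", "Cats"]:
        print(f"{query!r} -> {resolve(query)!r}")
```

With these sample records, searching for "shawn carter" resolves to "JAY-Z" and "charles augustus conybeare" resolves to "Eliot, T. S.", while an unknown string such as "Cats" returns nothing. A real authority file would also carry the fact-box data and would need a deliberate policy for a variant shared by two different entities; this sketch simply lets the last record win.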
There’s another group in Vocabulary Services that’s in charge of the subject vocabulary: the extremely descriptive part of “descriptive metadata.” “Descriptive” doesn’t mean “subjective,” though. The decision to add a term to the vocabulary and make it a “preferred term” must be backed by evidence. Is it in the Library of Congress Subject Headings? Is it the term most frequently used in the literature on the subject? Will the term still be used in the future? Is there any other term, or set of terms, that could describe the same concept? Once the term has been “hazed,” the vocabulary editor defines, or “scopes,” it, telling the indexers what the term means and under which conditions to apply it. The vocabulary editor also provides a list of “non-preferred terms”: words that someone might use when searching for the same concept, but that don’t pass the evidentiary threshold for “preferred” status.
Emotions are very subjective. The strongest emotions have the fewest synonyms:
Notice that “Tenderness” has the parenthetical “Affection” appended to it. That’s called a “qualifier,” and it’s there in case anyone is tempted to apply the term to the tenderness of veal. Doing so would automatically populate the indexing with “Affection,” the preferred term, and while an indexer may have great affection for veal, that won’t help the radical vegan high school student looking to back up her life choices with information on farming practices.
Why does this matter? Don’t we talk to each other constantly without first defining all of our terms? Sure we do. But allow me to give you examples of words untethered from meaning.
Without metadata-type standards, doing something as simple as buying clothes online would be impossible, because “Small,” “Medium,” and “Large” would have no meaning… Oh, wait! That’s a terrible example! But it’s also an excellent one: in the world of women’s clothing, sizing is untrustworthy metadata. The online shopper needs to pay attention not just to the stated size, but also to customer reviews indicating whether the garment runs “true to size,” and to other auxiliary indicators. Free shipping and returns are vital to the online clothing industry because everybody’s doing their own thing: the same words, and even the same numerical sizes, can refer to very different swathes of material shaped into a garment.
At the opposite end of the spectrum there’s Starbucks, which wasn’t happy with “Small,” “Medium,” and “Large,” descriptive terms that have worked quite well for years in the beverage context. Instead, customers are faced with a mishmash of English and Italian. “Tall” (small) makes the other sizes sound made-up, as if they’re just trying to be fancy, and that’s sort of true: grande is “large” in Italian, but medium in Starbucks-ese; venti means “twenty,” but an iced venti is a 24-ounce drink; trenta, or “thirty,” is a 31-ounce drink. What if every specialty coffee joint that’s sprung up in the past five years decided to be like ’bucks and make up its own names and sizes? We already have words for these things, and they work really well! (Dictionary.com has a nice post about this language muckery.)
Whether you have an affection for or an aversion to coffee, you should be able to order the size you want without verbal acrobatics (or embarrassing terminology). The same goes for ordering clothes, emu eggs, or tickets for Cats: good metadata makes finding these things possible. When you search Gale products, metadata will also help you distinguish among T.S. Eliot’s poem “Ash Wednesday,” David Landon’s “Ash Wednesday: Coffee at Starbucks,” the religious holiday, and the approximately 50 other creative works with “Ash Wednesday” in the title.
Next time: Indexers use our words!