Metadata! What is it good for? Part 2: Metadata Boot Camp

Metadata! What is it good for? Part 2: Metadata Boot Camp

| By Maureen McClarnon |

Last week I said that content without metadata is just a shapeless blob, bouncing around inside a database without context, meaning, or any access point. This week, in the spirit of making and keeping resolutions (which is way easier when we have entire groups of people to keep them on our behalf), let’s resolve to never let content blobs remain shapeless!

Picture this: every year, a variety of content lines up at our doors, hailing from hundreds of different vendors—including Gale itself. Reference material, primary source archives, and periodicals are all in this mix, and much like a beginner’s fitness class, they arrive in varying states of readiness. What do I mean by readiness? What we need to do is extract some very basic metadata from each one before it can move on to the next level (indexing).

Here’s my handy table of metadata types, culled for this week’s example:

The Information Services Group is the first stop in the metadata training circuit. The folks at Information Services take the new recruits and start whipping them into shape by pulling out metadata and entering it into our systems in a standardized format.

BISAC link,  THEMA link

The key here is the standardized format. Because the content arrives from a huge variety of vendors, the descriptive metadata can’t simply be pulled directly from the content and plunked into our systems and products. We like standards! After all, a search on “title” could bring back the title of a publication or an article, but if those fields are separated out and disambiguated, the options, and the search, are much more useful.

Information Services makes all of the disparate metadata consistent. Then the content will display correctly across all of our products, be discoverable by our search services, and by those of our partners. Once beginning metadata training is completed, and the content can complete the metadata equivalent of ten push-ups, MARC records are created, making our bibliographic data available to every ILS (integrated library system).

The cherry on the top of the work Information Services does, at least for what users see, has to be the citation generation. Who among us has not spent hours going through one style manual or another trying to format a bibliography, only to be sure, at the end, that something…something…is still wrong? The citation generator is a personal trainer for your references! Remember: good form is more important than the number of reps.

Gale-created content, in case you’re wondering, has its document-level metadata captured during the authoring process—the equivalent of training it from childhood.

The next stop for primary source archive documents is the Content Conversion team, where the content is digitized, making it ready for product use and indexing. Content Conversion also extracts and standardizes that all-important document-level metadata that gives users an entrée into these fascinating resources. “Document,” in this case, can mean breaking a collection down to a page or article and providing metadata at that level. Over 175 million pages have been through the Content Conversion program to date. (There are 41 different collections under the umbrella of Gale Primary Sources—be sure to check them out and see what piques your interest.)

Periodical content works out with the Content Analysis and Management (CAM) team, where they standardize it even further. Remember: the content is just starting its life here at Cengage. Information Services gave it some shape and purpose, but not all of it is ready for prime time: in fact, the periodical feeds arrive in over 200 different formats. The CAM team defines and implements logic that takes the content blobs and “translates” them into display-ready documents for our products. This includes making sure that tables and images show up correctly in the body of the text, and that any special characters (ampersands, diacritics, etc.) are properly rendered, and enabling the collection of article-level metadata.  This team gets the New York Times and other high-value sources ready for indexers every day.

Morning metadata workouts, anyone?

Our content is blobby no more! Cengage could send it into the world with just this most basic training, but we like to work more holistically when we get our content into shape: next up, metadata gets metaphysical with indexing and vocabulary.

Check out our blog from last week: “Metadata! What is it good for? Part 1”


About the Author


Maureen McClarnon is a Senior Metadata Architect who lifts pies for exercise. 8 a.m. is the sweet spot for working out, if anyone’s asking. She celebrated her 10-year mark at Cengage this week


 


 

Leave a Comment