| By Charlie Close |
I want to talk about the virtues of technical recordkeeping. Seriously.
In my work at Gale as a data scientist, I turn our various data sets into metrics, and use the metrics to infer how to make better products and more sales. These inferences are usually expressed in the form of spreadsheets and charts.
Before that, I worked for many years on software development teams. Software engineers are not known for wanting to document what their code does or why it does it. They say the code is the documentation, and they have good reasons for saying it.
Therefore you could understand how I might not choose to write down what I’m doing with the data as I do it. But I’ve learned, sometimes the hard way, how important it is to document my work.
You Aren’t Going to Remember
Sometimes work comes to me from coworkers as one-off requests for information. A typical request might be, “Tell us whether a new product feature is being used the way we expected.”, or “I’m calling my customer in Texas this week. What products should I try to sell?”
Like most tasks, these come and go. No matter how clear and important a request looks at the moment, it will seem like ancient history in a few days.
Whenever I get a request, I open a new task in our work-tracking system and record the following information.
- Who requested it.
- The content of the original request.
- Any assumptions I make.
- Any initial data sets (spreadsheets, text files, etc.) given to me by the requester.
- Any data sets I use or create while fulfilling the request.
- If the data can change over time, then what data I used and the timeframe it spanned at the time. For example, when I query usage data, I record that the data currently covers from “this” month to “that” month.
- Any database queries, data processing scripts, or server commands I run, in order.
All of the above variables can affect the output I send back to the requester, and many of them are open to interpretation. I might decide to handle a request this week differently from how I handled a similar request last week. If I ever need to remember why I made the changes I did, it’s important to write it down at the time.
Someone Is Going to Ask about the Details
You might think the past is the past and it isn’t important to remember what you did last week. That’s true, unless someone asks you about it later. And you know someone is going to ask you about it later. Not every request, but some requests, and you don’t know which ones. The report that you sent and forgot about: the requester is just now getting around to looking at it and has questions. Did it cover just the libraries in her market, or all libraries? Did you look at the top five searches, or the top ten? How did you make this particular calculation?
Let me tell you, the feeling of being asked questions that I should be able to answer but can’t is like reaching the top of a tall roller coaster just before it plunges down. It feels even worse to construct a data set, build a report on it, and then not know how to recreate the data set because I didn’t write down the steps.
That Someone Might Be You
Although requesters often ask follow-up questions, the person who uses my records the most is me.
One of the great things about my job is that I get to learn all the time. I may know how I want to process a given batch of data, but I don’t always know how to perform every individual step. I have to figure it out. (Working with data can feel like wrestling snakes, which explains, I’m sure, why the Python language is called that.)
Whenever I learn a new bit of technique, that learning is stored as a real world example in the work-tracking system as a side effect of saving the artifacts of the request. This turns out to be useful when I encounter the same problem again and I remember that I solved it but I don’t remember how I solved it. Now I can look it up. The work-tracking system is, in effect, a personal knowledge base that I’m building up as I go.
Another thing that happens is when someone asks me to make a report like the one I did for a colleague, but different. For example, “That report you did for Madeline for California…can you do the same thing for me for Florida?”
I have a record of the report I did for Madeline and I can look it up by Madeline’s name. Changing it to handle Florida: easy. Even better, if my records show that a particular data product has a wide audience, based on multiple requests, I can automate it so that Madeline and her coworkers can run it as often as they want, and configure it to their individual needs. Putting the power in the requesters’ hands leaves me more time to ask interesting questions and generate valuable insights from the data: win-win.
Giving good support to people like Madeline and avoiding the occasional roller coaster have proven to me over time that keeping up my documentation more than pays for itself.
About the Author
Charlie Close is a data scientist at Gale and former analyst on Gale’s search engine team. His research includes study of user behavior to improve Gale products.