Thursday, September 18, 2014

What's Data Got to Do with It?, by Elizabeth Psyck


Let’s get one thing out of the way: While “data” is technically the plural of “datum”, colloquial usage has shifted to make the use of “data” as a singular acceptable. I have a Google frequency map that backs me up on this.

Whether or not you agree with me on the singular/plural use of the word data, it’s hard to argue that data isn’t becoming more important to libraries and librarians. Whether you are collecting and analyzing your own statistics to see if your library still needs a reference desk or reading the latest ITHAKA report, data is almost impossible to avoid. Like many librarians, I don’t come from a data- or statistics-heavy discipline, and I learned everything I’m about to share on the job. Trust me, even if it looks scary right now, you can do it. With these 7 handy tips, you’ll soon be a data superstar! Or at least someone who can look critically at a report full of numbers and ask the right questions about what those numbers might mean.

Qualitative vs. Quantitative
Qualitative research involves descriptions, thoughts, feelings, opinions, and other things that can be observed but not measured. Quantitative research involves numbers, measurements, and cold, hard facts. Neither type of research is objectively better, but the type you choose shapes the questions you can answer and the arguments you can make.

Independent vs. Dependent Variables
Independent variables are the inputs, or the things you do, that influence (directly or not) the dependent variable: the results, or the outcome. (For that to make sense, you also need a theory about how and why these variables are related.) My personal (possibly unpopular) opinion is that it’s incredibly difficult to frame library work in terms of independent and dependent variables, and that we should be careful about getting too hung up on those terms, because they can imply causation.

Causation vs. Correlation
Closely related to my last point: correlation is not causation. Correlation is when one variable changes consistently with another. Causation is when one variable causes the other to change. It’s really, really hard to argue causation in the real world because people and behaviors are complicated, which makes it nearly impossible to isolate influencing factors. Did Student A get a better grade than Student B because A met with a librarian? Or is it because B had 3 papers due that week, was only taking this class to meet a general education requirement, and was OK with getting a B-? I’m not saying we should never argue causation, but isolating the impact of the library so that you can rule out all other possible factors (which is how you prove causation) is extremely challenging.
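
To make the confounding problem concrete, here is a small Python simulation. Everything in it is hypothetical: the `motivation` score, the probabilities, and the grade formula are invented for illustration and are not drawn from any real library study. In this made-up class, motivation drives both whether a student meets a librarian and the student’s grade, while meeting a librarian has no effect on grades at all.

```python
import random
from statistics import mean


def simulate_students(n=10_000, seed=42):
    """Return (motivation, met_librarian, grade) tuples for a hypothetical class.

    Assumption for illustration only: a hidden 'motivation' score drives
    BOTH whether a student books a librarian consultation AND their grade.
    Meeting a librarian has no effect on the grade in this model.
    """
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        motivation = rng.random()                       # hidden confounder in [0, 1)
        met = rng.random() < motivation                 # motivated students book more often
        grade = 60 + 35 * motivation + rng.gauss(0, 5)  # grade ignores `met` entirely
        students.append((motivation, met, grade))
    return students


def grade_gap(students):
    """Average grade of students who met a librarian minus those who did not."""
    met = [g for _, m, g in students if m]
    not_met = [g for _, m, g in students if not m]
    return mean(met) - mean(not_met)


students = simulate_students()
naive_gap = grade_gap(students)

# Crude way to "hold motivation constant": only compare students
# whose motivation falls in a narrow band.
band = [s for s in students if 0.45 <= s[0] < 0.55]
band_gap = grade_gap(band)

print(f"naive gap: {naive_gap:.1f} points")
print(f"gap within a narrow motivation band: {band_gap:.1f} points")
```

Even though the consultation does nothing in this model, students who met a librarian end up with noticeably higher average grades (about a dozen points under these assumptions). Comparing only students with similar motivation makes nearly all of that gap disappear. With real students, of course, we can’t see or measure “motivation” this neatly, which is exactly why causal claims are so hard to make.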

Samples Matter
Convenience samples – a research group chosen because they were available and easy to get involved – are bad. Don’t be a bad researcher. OK, that’s probably a little harsh, but I do think that libraries rely way too much on convenience samples. I understand why, but research based on convenience samples doesn’t support the sweeping arguments it is often used to make. Example: asking the people in your library whether it’s a welcoming environment only tells you whether the people using your library at that time, on that day, find it welcoming. It’s a biased sample, because many people who don’t find your building welcoming just won’t come in the front door. A random sample drawn from everyone you serve, not just current visitors, would help you reach the people who study in a coffee shop or the student center instead of the library.
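
A toy simulation shows how a convenience sample can mislead. Again, the numbers are assumptions chosen for illustration, not real survey data: 60% of a hypothetical student body finds the library welcoming, those students are in the building on a given day half the time, and everyone else only 5% of the time. We also assume everyone answers the survey honestly.

```python
import random


def simulate_campus(n=20_000, share_welcoming=0.6, seed=7):
    """Return (finds_welcoming, in_library_today) pairs for a hypothetical campus.

    Assumption for illustration only: students who find the library welcoming
    visit on a given day with probability 0.5; everyone else with probability 0.05.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        welcoming = rng.random() < share_welcoming
        visit_prob = 0.5 if welcoming else 0.05
        in_library_today = rng.random() < visit_prob
        population.append((welcoming, in_library_today))
    return population


def share_saying_welcoming(group):
    """Fraction of a group who find the library welcoming."""
    return sum(1 for welcoming, _ in group if welcoming) / len(group)


population = simulate_campus()
rng = random.Random(1)

# Convenience sample: whoever happens to be in the building today.
convenience = [p for p in population if p[1]]
# Random sample: drawn from the whole student body, wherever they study.
random_sample = rng.sample(population, 500)

print(f"true share:          {share_saying_welcoming(population):.0%}")
print(f"convenience sample:  {share_saying_welcoming(convenience):.0%}")
print(f"random sample (500): {share_saying_welcoming(random_sample):.0%}")
```

Under these assumptions, surveying whoever is in the building suggests that well over 90% of students find the library welcoming, while a random sample of just 500 students lands close to the true figure of about 60%. The convenience sample isn’t too small; it’s pointed at the wrong people.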

Age Matters
Data goes stale. Your library’s last large survey on information literacy might have taken place in 2008. That doesn’t seem so long ago to many of us (myself included), but for context, I was still an undergraduate in 2008, and my undergraduate classmates have since finished law school and become assistant district attorneys. Old data doesn’t represent current students. Think of data as a snapshot of a single moment in a rapidly changing environment.

Beware of Anecdata
Humans are storytellers, which means that stories are more meaningful to many of us than numbers. Just because a story feels meaningful doesn’t mean it actually is. Don’t fall into the trap of anecdata: giving more weight to the stories we tell (patron X writes a letter about how important a service is) than to the numbers (only 10 people used the service in the past 6 months). Remember, each anecdote is a single data point.

Be Consistent
Whatever you do, be consistent about the questions you ask and how you interpret the results. If you aren’t consistent, your results won’t be comparable from one study to the next.

Elizabeth Psyck is the government documents librarian at Grand Valley State University. If you’re extremely angry about her use of data as a singular, you can reach her at or @psyckology. She finds writing biographies in third person weird, but not quite as weird as writing them in first person.
