538, the World Cup, and Facebook: Telling Stories about Data

As many of you already know, I’ve been following the World Cup. My team, Germany, won. Watching the World Cup has always involved reading news reports and commentary about the matches. This year I decided to include 538 in my reading.

538 is Nate Silver’s website. Nate Silver became famous predicting US elections. He is a master of analyzing big data to make predictions. It works well for elections. But it doesn’t work so well for the World Cup, at least not for me. First, the site predicted Brazil to win for a long time.

But it’s not just that 538 did not accurately predict the winners. I think that 538 misses the point of a World Cup. Crunching data about the teams doesn’t tell the whole story. And the World Cup is stories. Many stories. As a fan you learn the stories of your team and its history. You might start with world history, which is very salient for a Germany fan. England versus Argentina similarly (the Falklands War of 1982, followed by the 1986 quarter-final). It also involves stories about the teams’ previous encounters. Germany versus Argentina has happened before, even in Finals. And those stories are recounted, and reflected on, in the build-up to a game. You might tell stories about strategy. Certainly the Germans have been telling those, about a decade-long commitment to raising German players. How you structure a league to encourage more domestic players who can also play for the national side. How you balance the demands of a national league and a national team.

In a nutshell, context matters. These stories of world politics, former World Cups, and the arc of time turn statistics about the players into something richer. 538 tells none of those stories. And I suppose that’s exactly what it wants to be, a “science” of the World Cup. But my World Cup isn’t statistics, it’s larger, more discursive and has a multi-decade narrative arc.

Reflecting on this caused me to revisit the Facebook study. Yes, that Facebook study. The study reported data. But it was data about people. And I think some of the response could be interpreted as people feeling that there was more to the story than just statistical reporting of the outcomes. Is it a similar type of human dimension, an infusion of humanity? This is the question I’ve kept wondering about since reflecting on the problems of both of these data-driven reports. 538 reduces football to data. In so doing it loses the human dimension. The Facebook study started as data and the public raised human concerns and considerations. If I have a takeaway it is that fields like social computing, or any data science of humans, need to seriously pay attention to the stories that we tell about people. How we frame or potentially reduce people is something that the public will care about, for it is their humanity, their stories that we seek to tell.

That Facebook Study

I’m following Michael Bernstein’s suggestion that Social Computing researchers join the conversation.

Facebook and colleagues at Cornell and the University of California, San Francisco published a study in which it was revealed that ~600,000 people had their Newsfeed curated to see either positive or negative posts. The goal was to see how seeing happy or sad posts influenced the users. Unless you’ve been without Internet connectivity you have likely heard about the uproar it’s generated.

Much has been said; Michael links to a list and some more essays that he’s found. Some people have expressed concerns about the role that corporations play in shaping our views of the world (via their online curation of it). Of course they do that every day, but this study focused attention on that curation process by telling us, at least for a week, how it was done for the subjects of the study. Others have expressed concern about the ethics of this study.

What do I think?

I’ve been dwelling on the ethical concerns. It helps that I’m teaching a course on Ethics and Computing. And that I’m doing it in Oxford, England. So I’m going to start from here.

First, this study has caused me to reflect on the peculiar situation that exists in the United States with regard to ethical review of science, and the lack of protection for individuals who participate in it.

In the United States, only institutions that take Federal Government research dollars are required to have Institutional Review Boards (IRBs). The purpose of an IRB is to review any study involving human subjects to ensure that it meets certain ethical standards. The IRB process has its origin in the appalling abuses conducted in the name of science like the Tuskegee Experiment. Facebook does not take Federal research money, and is therefore not required to have an IRB. The institutions by which research gets published are also not required to perform ethical reviews of work that they receive.

I find myself asking whether individuals who participate in a research study, irrespective of who funds that work, have the right to be protected. Currently there’s an inconsistency: in some research the answer is yes, and in others it is no. It seems very peculiar to me that who funds the work determines whether the research is subject to ethical review and whether the people who participate have protection.

Second, most of the responses I’ve read have been framed in American terms. But social computing, including this study, aspires to be a global science. What I mean is that nowhere did I read that these results only apply to a particular group of people from a particular place. And with the implication of being global comes a deeper and broader responsibility: to respect the values of the citizens that it touches in its research.

The focus on the IRB is uniquely American. Meanwhile I am in Europe. I’ve been learning more about European privacy laws, and my understanding is that they provide a broader protection for individuals (for example, not distinguishing based on who pays for the research), and also place a greater burden on those who collect data about people to inform them, and to explicitly seek consent in many cases. I interpret these laws as reflecting the values that the 505 million European Union citizens have about their rights.

I’ve not been able to tell whether European citizens were a part of the 600,000 people in the study. The PNAS report said that it was focused on English speakers, which perhaps explains why the UK was the first country to launch an inquiry. If European citizens were involved we might get more insight into how the EU and its member nations view ethical conduct in research. If they were not, there is still some possibility that we will learn more about what the EU means when it asks “data controllers” (i.e. those collecting, holding, and manipulating data about individuals) to be transparent in their processes.

I’ve read a number of pieces that express concern about what it means to ask people to consent to a research study. Will we lose enough people that we can’t study network effects? How do we embed it into systems? These are really good questions. But, at the same time, I don’t think we can or should ignore citizens’ rights, and this will mean being knowledgeable about systems that do not just begin and end with the IRB. It’s not just because it’s the law, but because without it I think we demonstrate a lack of respect for others’ values. And I often think that’s quite the point of an ethical review, to get beyond our own perspective and think about those we are studying.

Lack of a Critical Education: An Explanation for IT problems?

A New York Times article about sexism in the tech industry has been making the rounds on Facebook. One explanation that some of my friends have used to address why such rampant and explicit misogyny exists is the lack of education. Not engineering/computing education, but a well-rounded one in which people would come to understand why it’s inappropriate and why having a diverse workforce actually matters.

I was making the same argument the other day about a different topic. When Snowden, Assange, and Manning decided to leak intelligence secrets, all of them claimed they had done so because to do otherwise would be ethically wrong. I/You/the NSA may disagree, but they all agree that they had a moral/ethical/civic duty to do so. As I said to a colleague, what drives this moral/ethical/civic sensibility? I shared the thought with my colleague that perhaps a lack of a well-rounded education might play a role here.

For decades we’ve shortchanged all education. It cost us too much. Further, we’ve long prioritized the sciences over the social sciences and the humanities. (We now find it alarming that Congress ridicules the sciences, but as another colleague of mine pointed out, that’s how many/some in the sciences have long treated the social sciences/humanities.) But it is just these maligned disciplines that would have gone some way to create the critical thinkers that seem to have vanished from the tech sector. And now we have an industry that’s unabashed in its misogyny. We have “rogue” technologists who now have the power to decide when to leak secrets, and who decide to do so based on moral principles that, at least to some, are questionable. I wonder whether we did it to ourselves and if there is worse to come.

P.S. If you want to be even more depressed, here’s a timeline of sexist incidents in the tech sector (thanks to the friend of another colleague).

A Future for Academia Driven by Metrics

By now, anyone who knows me knows that I am a *huge* fan of metrics. Particularly when they are used uncritically. So perhaps it was inevitable that I would end up in an environment where metrics play an increasingly ubiquitous role: academia.

I want to introduce three metrics.

Student credit hours: a number that measures, by class and by faculty member, the number of students a person has taught. You will have a larger number if you teach larger classes. It’s also the number that is at the beginning of a formula that computes the portion of the Institute’s state budget (and presumably how that is divided, although that part of the budgeting process is a complete mystery to me). Higher is better, and in fairness I can imagine that larger classes can create their own organizational structures that need managing and more potential problem cases.

What’s missing in this metric are some other fundamentals about class.

  1. Smaller might be better for the student experience including but not limited to mentoring, one-on-one time with individuals, managing different learning styles… and that this might be exactly what distinguishes a University education at a bricks and mortar institution from an online experience.
  2. Class preparation time: do classes with more students involve more course preparation time? I taught a class recently that was about 1000 pages of reading for 12 people, but it would have still been 1000 pages if it had been 120 people.
  3. The lack of institutional support for, say, grading, that larger classes receive.

Research expenditure. This metric measures the amount of money that the Institute receives when a faculty member spends their grant. Again, bigger is better. But this metric assumes that all research costs the same. It doesn’t, and funding agencies know that. The metric does not account for how much it costs to do a particular kind of research.

H-index. I’ve already written about this.

Imagine my joy when someone suggested that we plot all three against each other for an individual. What would that mean? Someone with a larger class, in an area of research that was more expensive to do, and with a high h-index does well. So, should we optimize (which is the purpose of metrics, to drive behaviour) for large classes at the expense of giving students the opportunities that come from small ones? Should we optimize for expensive and popular research, and ignore the intellectual, social and political good that might come from less expensive research areas? Should we give even more legitimacy to the papers of an h-index and not ask about the papers that were potentially unpopular but changed a person’s thinking, deepened their intellect…?

Needless to say, this epitomizes all that worries me about metrics. The desire to rank and compare, and to use numbers to support that, is to think uncritically. Sadly, it’s all too common in academia.

Photocopier: Physical, Digital, Organizational and a Craft too!

A couple of days ago I finally learnt the username/password combination and the network name for the third floor mopier (scanner, photocopier, printer). Perhaps it’s because I worked at Xerox for some years, but it always frustrates me when there’s a device I can’t print or photocopy on. This one took me some time to figure out how to operate for a variety of reasons.

I stood next to it several times. Nothing about its physical self revealed its digital self to me. Sometimes you can get a printer to print out its network configuration. But this machine did not allow you to touch any buttons without being logged in first. And so standing there next to it in the physical world changed nothing about my ability to print to it in the digital world. I was having the reverse of the experience in which your computer “discovers” a printer but you can’t discover it in the physical world. (I recall with vague embarrassment the time I spent printing to a machine that I thought was just outside my office. It said Gutenberg on the front of the machine, which I read as its network name, but that is actually the name of a machine located in a different building on campus. Luckily I printed out an email, so the person receiving the printout was able to email me to let me know that I was mistaken about the name of that machine.)

The key to discovering its online name was to find out what username/password combination worked. Who should I ask? The printer’s physical existence is in a space that I don’t understand organizationally. Does it belong to the School of Interactive Computing? Does it belong to the School of Language, Media and Culture? Does it belong to IMTC? Not clear to me because the physical location (which for many other parts of the third floor I can easily read and interpret) was ambiguous. I wondered who to ask.

Quite by chance someone tells me what the username/password combination is, and I log on to the photocopier. I have some photocopying to do. I first learnt to photocopy in graduate school. Need several chapters of a book? No Google search facility back then (WAIS and Gopher if I recall correctly) that would likely yield a probably illegal copy of what you were looking for. No, it was off to the library and then over to the photocopy room. It was a time when people would say that part of learning to be a graduate student was learning how to photocopy, smiling, but acknowledging a truth about the importance of being able to master that skill.

The Department of Information and Computer Science at the University of California, Irvine had a dedicated staff member in the photocopy room. He took care of the several machines that were in the room (no doubt he did other things, but this was the primary place I encountered him). ICS had a system of usernames and passwords associated with individuals and caps. So the book chapter copying was always a dilemma of balancing the desire to have the reference material against the annual cap. That was until I got the username and password combination for a project that was very rich. DARPA funding meant that the project’s cap was infinite. Now, all that stood between me and the book was the ability to photocopy it. I taught myself a variety of useful skills for efficient copying: double-sided, two pages on each side, shrink to fit. I prefer short edge binding over long edge. After a while I was able to size up a book and pretty much get the exact amount of shrinkage right first time.

Once I had mastered the art of photocopying, ICS provided further opportunities. For a while I was spiral binding most of my photocopies using the machine that cuts rectangular holes down one side of the photocopy stack and the other machine that inserts the spiral binding. I put front and back covers on some of my efforts. I still have one of those to this day, the photocopied proceedings of the first conference on Software Engineering held in Garmisch. And then there was the experiment with the glued binding. There were binders that had glue on the inside of the spine and a machine that would heat it up. You would then stick the paper to be bound in, let the glue run over the pages, and then take the entire thing out of the machine. The trick with this machine was in heating but not overheating the glue. And I have to admit the machine made me nervous; I worried about the potential for fire. I’m actually not sure whether that was a valid concern, but I worried about it, and consequently I decided to return to spiral binding even though glue-bound photocopies sat flush on the shelf.

Of course, I couldn’t experiment while the staff member was there. I was not using my correct code. Perhaps I was photocopying more than I should. I had no idea whether graduate students were “allowed” to use these other machines. So most of these skills were developed in the small hours of the night. Walking home with my latest creation afterwards, I primarily feared the roving packs of raccoons that wandered around campus, generally annoyed by the presence of humans out during the time in which they occupied campus. Sometimes I hid from them so as not to provoke their ire. After all, I had something to read in hand.

I’m scanning a book chapter on the third floor mopier. I’m going to send it to myself so that I can read it on my iPad. I like reading academic papers and books on my iPad. I decide to add my email address to the list of frequently used emails so that I don’t have to type it all in each time I do this. I look through the list of emails already there, and now I’m even more curious about the organizational history of the machine. There are various addresses in there. Some are graduate students who have since graduated. I’m surprised to learn that this machine has been on the third floor for longer than I thought. But there are some addresses for people who’ve never worked proximate to this machine while it’s been in this location. I wonder why they are there. I wonder whether the machine lived somewhere else in a former life, proximate to those users. I’ve never really thought about reading an organizational history from a photocopier, but I see at least two departmental identities as well as some longevity of history represented in the collections of emails that make up the frequent users of the machine.

The TSRB 3rd floor photocopier is now something I can print to. But it’s given me far more than that: an opportunity to reflect on how this machine lives in the physical and digital worlds, a recollection of learning how to photocopy, and a view of the institutional elements of the machine.

MOOC Participation: Diversity and Assumptions of Development

Continuing my series of posts about MOOCs. Today’s is about a type of open/development rhetoric I keep hearing associated with MOOCs. It’s well meant I am quite sure, but I’ve heard the following sentiment: MOOCs will allow anyone from any continent to access content. And that in turn leads to increased education, skills for all.

I have a number of problems with this argument.

Starting with the obvious, this sentiment makes important assumptions about access: that access to the Internet and its content is uniform across the world. But it’s not. The Internet is a very different experience if you have a smartphone as your only means of access, versus if you have a laptop. Behind the hardware, there are questions of corporate policies and pricing mechanisms that influence access. Bandwidth caps and bandwidth pricing can influence how people use their phones, and in many parts of the world also how they use the wired network.

Behind these crucial practical questions of access lurk other assumptions, which warrant questioning. Is the content we create relevant or useful for everyone? What assumptions do the producers of content make about, say, what has been previously taught? What assumptions are made about the types of hardware and software the students have access to? And most critically, what assumptions get made about why the person is taking the course and whether that content will ultimately be most useful?

Although it’s not used too much, I have heard the word “Africa” used to describe diversity. I do think it’s well meant, but it risks collapsing all of these questions into a stereotype of a person. Africa is not a person, nor is it a country; it’s a continent of great diversity in all senses. A person from Africa may well contribute to diversity in a MOOC setting, but so might a person from America.

Like others, I see this as being part of understanding the participation divide that shapes the Internet today. Some of that divide is the question of access, its costs, modalities, and so forth. But that’s not all that shapes the participation divide. When we overly simplify an entire continent we close down the question of what shapes participation in very problematic ways. If we are really committed to understanding how online education might help more people learn, the participation divide is precisely the question we ought to open up, to really take account of the highly diverse population of people that have some reach to the Internet. Because it’s only when we actually take diversity seriously that we have any shot at getting to something better than more education for the already well educated.

The Mean, Misogynistic Internet—Another Diversity Problem for MOOCs?

In academia, discipline, women on February 6, 2013 at 12:49 pm

Yesterday I wrote about what made me stay with Computing despite the horrible gender imbalance: the personal encouragement I received from teachers who went out of their way to support me. Today I want to broach another piece of why I’m reluctant to offer a MOOC: the comments.

I’ve been looking at comments that others have received on their MOOC offerings. No surprises in some ways, they look like a lot of Internet comments. Some are mean, some are stupid, and some are sexist. Of course there are some helpful comments too, but not all.

A few weeks ago a colleague of mine posted this story about a British female academic who argued a position on immigration and was vilified on Twitter as a result of it. The remarks made about her are vile, with levels of misogyny that are depressing. Clearly MOOCs are not the same as arguing a position on immigration, but the same patterns of misogyny exist. It’s rare, but I have received remarks in my teaching evaluations that exhibit this quality. I see Rate a Prof being used in similar ways. Why should MOOCs be exempt?

In discussing this with a colleague, he told me about how a video of his technology that featured a woman received a misogynistic comment about her. He removed the comment, but I’m not sure one can moderate comments about MOOCs. I can see that as appearing problematic. It’s easy to imagine being accused of moderating comments in such a way that the course reviews were biased towards the positive. The very commentators who want to make their vile remarks might be just as angry about having their comments removed. Censorship and freedom of speech are powerful arguments.

I am not willing to expose myself to a situation where any person can use comments to promote attitudes that defy belief, comments that will subsequently end up in one of Google’s data centers, forever associated with my name. That’s my name, my reputation. And how will other women see those comments? What will they think of the people who take those classes? That people who like Computing hate women. Great.

On a more personal level, and even if the remarks were removed, I still have to live with the idea that someone out there really hates me, hates what I represent, hates what I’ve achieved. Probably more than one person. I already have moments of self-doubt. And then we add in that these people will choose to express that hatred in the most disgusting of ways. It may be electronically deleted from the record, but it won’t be deleted from my mind. I’ll still have to live with the idea that someone said that about me. I don’t find that a terribly compelling argument for offering myself up to that situation.

I think this warrants more discussion than it’s receiving, because of course it’s not the Internet itself; it’s the fact that it’s a forum for the still far too widespread misogyny that exists in the real world. Further, because of the chronic diversity problem that Computing has, it’s hardly surprising that most of the people promoting MOOCs are just the sort of people who don’t experience the Internet as a minority and would be far less likely to be exposed to the mean, misogynistic Internet out there.

Diversity and Service

In academia, academic management, computer science, discipline, women on February 4, 2013 at 8:15 am

As I mentioned in a previous post, I recently read this article about the advantages of being married for male academics versus the disadvantages of being married for women academics. It’s left me with a lot of questions. And being inspired by Female Science Professor’s question “why don’t more senior women in STEM blog?” I want to continue.

In addition to teaching, research, and publishing responsibilities, service constitutes a major part of a professor’s career. … The gender breakdown within a department plays a significant role. Typically, there are more men than women within a discipline, and yet committees seek as much diversity as possible. Women, then, are often asked to do double the amount of service as men, a number that increases for women of color. While service is certainly considered when promoting, publications play a much larger role.

I understand the logic, to have a diversity of representation/voices at the table and so forth. But this is clearly the flip side of it, that women and minorities can get over-serviced. And since time is limited, service will eat into other important activities like research and teaching. This is a serious problem. But I don’t know what to do to change it. In the long-term we do need to recruit and retain women and minorities in STEM, but what do we do in the short-term? There seems to be a conflict here: we want to hear from diverse voices but in so doing we ask them to participate in things that compete for their precious research time.

One short-term piece of advice I would offer to anyone who fits this potential category, is to be very aggressive about saying no. Benchmark your service against a non-minority in your department at your rank. Do no more. (Read studies such as Link et al. “A time allocation study of university faculty” to see broad trends and uneven distributions as a reminder to do no more.)

The Marriage Advantage for some Faculty

In academia, academic management, computer science, discipline, women on January 28, 2013 at 10:56 am

I was just catching up on Female Science Professor’s blog (fabulous). Last year she asked “why don’t more senior women in STEM blog?”

I’ve been quiet on my blog for a while; I had lost touch with it. It was out of my routine. So it sat quietly.

Recently I read this article about the advantages of being married for male academics versus the disadvantages of being married for women academics. It’s left me with a lot of questions.

Female professors were more likely to have a spouse or partner with a doctoral degree, 54.7 percent to men’s 30.9 percent. Their partners were also more likely to work in academe, 49.6 percent to 36.3 percent.

I wonder whether the same is true in Computing? I was thinking of my department, counting up the numbers of men and women married to other academics. There’s a difference.

One woman is quoted with her theory about why the balance is the way it is:

“I have a theory about this,” said Tara Nummedal, an associate professor of history at Brown University. “It seems pretty clear that smart women are going to find men who are engaged, but I just don’t see that it works the other way.”

I have another theory, based on my experience of dating, which is that some men find dating women with doctorates (when they don’t have one) difficult. I recall with some pain a date in which I was subjected to something that felt a bit like being on a quiz show. Yes, I happen to know that the second-longest river in the U.S. is the Mississippi, since the longest is the Missouri, but I didn’t need to spend an evening playing this game. And, more crucially, a Ph.D. is not actually about being good at quiz questions. You can guess that the relationship didn’t last long, but this experience was emblematic of the problems I had dating non-Ph.D.s.

She added that a female professor with a stay-at-home spouse is quite rare, but often sees men with stay-at-home wives, allowing them to fully commit themselves to their professions.

I’ve wondered this before also. In one job I had, where I was one of a very small number of women, two of us were single and the other married to an academic. There were some single men in the department, but it was a small fraction of the entire department and a healthy number of my male colleagues, including all the managers, had stay-at-home wives. At that time being married to someone who could take care of all the things that arise in life that require being dealt with during office hours seemed like a huge advantage to me. Some of it was probably that I was often lonely (I had very much made my employment decision because I knew it would advance my career and not my personal life, that was hard, but I think it was crucial for getting to the next steps where I was able to balance both). Years later, I’m not sure whether it’s an advantage or not, because I’ve not ever experienced it. I have no comparison points, nor am I sure that the division of labor that I’ve described is ideal (accurate, enthusiastically embraced)… and I am more aware that my salary is a luxury that these families do not have. But, returning to the point of the article, I think it’s important to pay attention to the last part of the sentence, if there is the possibility for someone to fully commit themselves because that’s what the relationship supports, then yes, I still think that is a type of advantage.

I’ll cover another piece of this article later. That’s enough for now.

Program Chair: Reasons to Say Yes…

In academia, academic management, discipline, research on January 27, 2013 at 4:14 pm

I’ve chaired the papers track of a couple of conferences now. I could write about the process itself, but instead I want to write about the learning experience of doing this. The first conference I ever co-Papers Chaired was CHI 2006 (with Tom Rodden). I owe Tom a huge thank you because he taught me several useful management strategies that I used during both processes, and that I have also found useful in my day-to-day activities.

And that is a good reason to volunteer to chair a conference. One of the reasons you’ll hear most often for agreeing to do this kind of service is that it’s good for the community. And you do give your time, as you do as a reviewer, member of the program committee and so forth. Another is because it looks good on the vita. I was told, for example, that serving for CHI meant that the community trusted me with the products of their academic research. I’ll add another one into the mix. For anyone who has ever complained about the way a conference is run, or what happened to their paper, nothing beats seeing what the processes are by which the conference is put together. Actually, I think it should be mandatory that anyone who complains, especially more than once, have to get involved with the organization of the conference.

And today I want to offer another reason: what you learn in doing this. Papers Chairing throws up a myriad of management situations. Each one requires a thoughtful response, and many require subtle negotiation to balance the needs of the various parties. As a program chair, you are responsible for ensuring that everyone who is giving their time to review etc. gets a fair shake and feels that you support them in their service. I like doing that. I feel it’s a great way to say thank you. Sometimes it’s harder though, as you have to work something out as best you can. Sometimes there are difficult messages to write, and the practice in getting tone as well as content right is invaluable.