teaching and presenting with the iPad is broken

I was hoping, desperately hoping, that Keynote for the iPad would become my dedicated presentation and teaching device. Imagine it: highlighting, circling, presenter notes, and all of the media I could want in a seamless experience, all pumped out of the video out cable to a projector.

Unfortunately, it’s nowhere near that experience yet. It turns out that video out is handled in very app-specific ways. Keynote, for example, projects the slides onto the secondary display and simply shows which slide number you’re on. That’s right: you can’t see your slides while you present. If you want to see what you’re presenting, you’ll need to look behind you.

But it’s worse than that. Let’s say I want to show a YouTube video; when I leave Keynote, the display stops signaling altogether. So rather than an elegant black screen, or mirroring the iPad display, you get your projector’s big ugly blue no signal screen. The experience is quite broken. Any app that doesn’t explicitly support video out simply doesn’t provide a signal. I can’t project anything on the web, for example, or sketch in front of students.

Luckily, most of these are software limitations. I hope that the lack of mirroring isn’t a hardware limitation. Does Apple know these things? Is it rectifying these issues? Who knows. They’re not known for their transparency. Maybe I’ll find out that everything is fixed with iPhone OS 4.0…

What the iPad is and isn’t

After 4 hours of continuous use, I can confidently say that the iPad rocks in many ways, and fails in only a few. It’s a genius way to browse the web, to write short emails, to listen to music, to watch short videos, to use Facebook and Twitter, to give simple presentations, to read news, and to show photos. There’s literally no better experience out there for most of these activities. It’s also a great sketchpad; not as great as a real sketchpad, mind you, but oh so much easier to share and archive.

It fails in the obvious places. The onscreen keyboard is bearable. I can type a lot faster than on my iPhone or any cell phone, but I’m not getting my typical 80 wpm. A wireless keyboard would make up for a lot of these limitations, but it sort of defeats the purpose of carrying something slim and lightweight. I’ve typed a lot on it in the past few hours and I feel a bit held back, but not so much that I don’t feel productive.

There are still some ways where multitouch is inherently limited. One out of every 10 times I tap or drag, it doesn’t do what I want. This is no different than on the iPhone, but I’ve noticed myself acclimate to the inaccuracy. The device hasn’t gotten any smarter; I’ve just gotten more tolerant.

The iPhone UI toolkit still breaks many pervasive web conventions. For example, I’m typing this in a text field in a WordPress page, and scrolling up to edit the previous paragraphs is incredibly slow, even with the two-fingered scroll interaction, compared to a scroll wheel.

But I already love this thing. For all of the activities I mentioned earlier, the iPad is the clear winner. It’ll sit next to me at my desk and be a constant source of distraction during work. I’m thinking it’ll be a dedicated calendar while I use my laptop. Time to start exploring what else this form excels at! Like multitouch visual programming, hint hint…

spreadsheet error costs time and money, yet again

Back in November, I got my first water, sewage, and gas bill from a company called ISTA. My apartment management had taken a while to set up the billing after the previous billing company went out of business (or dropped the contract, I don’t remember exactly what happened). So I hadn’t actually paid water and sewage for two or three months.

So when the bill came for $173, I wasn’t too surprised. I didn’t really remember what I’d paid the previous year, but this seemed reasonable for a few months of water, sewage and gas. I wrote the check, and forgot about ISTA.

Forty-five days later, I got the next bill, but this time something seemed wrong: $463 and it had only been a month and a half. What the hell was going on? I looked back on my old bills and noticed that my average 30-day bill was about $30, even in the winter. Either the company was trying to extort money from me or somebody had made an accounting error.

I looked more closely at the bill, which had three columns: previous usage, current usage, and usage. The difference between the first two columns was exactly 1000. The value in the third column was 10,000. Was there some hidden multiplier I didn’t understand? Maybe there was some rate that just happened to be 10, and I had just kept my apartment and showers really warm this winter.

So I called ISTA and disputed the bill. They immediately escalated it to their dispute manager, who called me back after a few days. They said that there had been a misread meter, that they had corrected the reading, and that the bill was now only about $270 after they had applied the credit. When I got the call, I had a meeting to be at, so I didn’t think about it much.

After I got home that night, it still didn’t seem right. $270 for 45 days? What happened to the rates? They must have gone up by a factor of 10! So the next day, I called ISTA back, and spoke to a nice lady about my problem. Rather than call the dispute manager again, she told me she was opening my spreadsheet. She proceeded to walk through the calculations with me, describing the rates and the formulas. I jotted them down on paper as we went. Finally, we got to the final total calculation, and she said, “so this times the multiplier is … wait, it shouldn’t be.” She immediately put me on hold.

A few minutes later, she came back, saying that she needed to have the accounting department look at my spreadsheet. My spreadsheet, implying that every customer has their own. She said that the dispute manager would, yet again, call me back in a few days.

Four days later, the dispute manager called me back and explained that there was some sort of disagreement between billing and accounting, regarding the cause of the problem. Billing thought it was the spreadsheet and accounting thought it was the meter readings. She said she’d call back in a few more business days, after they’d worked out their differences.

When she did call back, she leveled with me: accounting was wrong, there was an error in the spreadsheet, and after fixing the multiplier cell, my bill was reduced by a factor of 10. After the credit calculations, they determined that I had overpaid from the previous bill by about $100, and that I probably wouldn’t have a bill for the next two cycles. She apologized for how long it took to resolve the issue, but reassured me that it wouldn’t happen to me again.

But I wasn’t thinking about me at this point. I was thinking about all of the other customers, whose spreadsheets probably had the same error. Would the accountants audit all of the spreadsheets that copied the error? How many customers would call about the bills? How many would insist, like I did, that there was a spreadsheet error, and demand that it be properly diagnosed? And how much of this feedback would ever make it to the accountants writing the buggy spreadsheets?
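For what it’s worth, the kind of audit I’m imagining is not complicated. Here is a minimal sketch in Python, under loud assumptions: I have no idea how ISTA actually stores its data, so the `BillRow` fields, the account names, the meter readings, and the expected multiplier of 1 are all made up for illustration. The only numbers taken from my bill are the difference of 1000 between the readings and the billed usage of 10,000.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BillRow:
    """One row of a hypothetical billing sheet (field names are assumptions)."""
    account: str
    previous_reading: int
    current_reading: int
    billed_usage: int


def implied_multiplier(row: BillRow) -> Optional[float]:
    """Return billed usage divided by the metered difference, or None if the meter did not move."""
    delta = row.current_reading - row.previous_reading
    if delta == 0:
        return None
    return row.billed_usage / delta


def audit(rows: List[BillRow],
          expected_multiplier: float = 1.0,
          tolerance: float = 1e-9) -> List[Tuple[str, Optional[float]]]:
    """Flag every row whose implied multiplier differs from the expected one."""
    flagged = []
    for row in rows:
        multiplier = implied_multiplier(row)
        if multiplier is None:
            # No metered consumption: any nonzero billed usage is suspicious.
            if row.billed_usage != 0:
                flagged.append((row.account, multiplier))
        elif abs(multiplier - expected_multiplier) > tolerance:
            flagged.append((row.account, multiplier))
    return flagged


# Hypothetical readings; only the 1000 difference and 10,000 usage mirror my bill.
rows = [
    BillRow("apt-a", 52000, 53000, 10000),
    BillRow("apt-b", 40000, 40800, 800),
]
suspicious = audit(rows)
print(suspicious)
```

Run over every customer’s sheet, a check like this would list the accounts that copied the bad multiplier cell, rather than waiting for each customer to call in.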

Oh, end-user programming. Your manifestations in society abound.

my juxtaposition on the ipad

Yeah, I’m a little late to the discussion. But as I’ve contemplated over past weeks the merits of the iPad’s form and function, trying to imagine what I’d do with it and what others might do with it, I keep coming back to the same problem: the iPad, like the iPod and iPhone, doesn’t support juxtaposition. That’s what all of this whining about multitasking is about. So much of what we do on computers is comparing, contrasting, and cross-referencing between applications, and yet that’s one of the major things the iPad cannot do.

I wish copy and paste were enough, but it’s not. It’s about writing an email about the news article you have open, or quickly checking the status on some build, or reading a dictionary definition online while you’re writing. You can’t do these things on single-task UIs, because the cost of leaving one app, opening another, and then returning to an app is at least 30 seconds. That, and everything you might want to juxtapose against has to be kept in your head for these 30 seconds. Good luck with that when you’re trying to think.

So maybe Apple decided the device wasn’t for thinking or creating. Maybe it’s just for consuming. But even consumption takes juxtaposition. I find myself on my iPhone all the time, wanting to read a Yelp review and see where a place is on the map at the same time.  Because this isn’t possible, the Yelp app tries to do maps well, and the Maps app will probably try to incorporate reviews, leading to substandard experiences in both apps. Or, another example was when I was doing my taxes online: I was referencing advice in forums about teacher deductions (which I found out I can’t take), while trying to decide how to answer a TurboTax question. On the iPad, I’d have to go back and forth between the two, memorizing all the numbers and exceptions in the forum post in order to act upon them in the tax software.

People are going to realize this soon, too, and Apple’s going to suffer for it. Either Apple is just waiting for the right time to support juxtaposition, or their designers just have no idea how people produce and consume information.

managing time management

I’m no fan of time. It is a relentless, immovable force, destroying plans, inducing stress, and bringing unpredictable change to our delicate expectations of consistency. But I’m slowly realizing that of all of my scholarly responsibilities, time is the one constant, the ever present subject. Teaching is designing pace and order of content. Research is predicting and inventing the future based on all that has passed. And doing both of these well requires me to carefully sequence my thought and actions to best leverage what little time I have. To manage my time.

I don’t feel like an expert at this. But people often comment on my ability to manage time well, and I’m always struck by people’s inability to understand where their time goes. What am I seeing that others can’t? What am I doing to control time?

After thinking about it a bit, I think it boils down to three basic skills. First, I’ve always been able to prioritize. That might sound simple, but it’s an astounding act to take the thousands of things one cares about and rank them. This is because there’s nothing inherently ordered about the things one cares about. The laundry isn’t inherently less important than writing a lecture if all my clothes are soiled. Knowing how to prioritize and constantly reprioritize as situations change needs to be a conscious, explicit act, if one hopes to influence the future. Otherwise, the forward thrust of time will always dominate and dictate what’s important.

But knowing what’s important isn’t enough. Being able to articulate what’s important, and define what that means, is important as well. For instance, my allergies have been bothering me a lot lately and I’ve wanted them to bother me less. But there are a lot of ways to articulate that goal, some more useful and actionable than others. For example, when I say, “bother me less,” I have to be very careful to know what I mean by “bother” and “less.” Do I want to get off allergy pills? Do I want to buy less facial tissue? Do I want fewer sinus headaches? These are all different goals and I might do different things to accomplish each. They also might improve my life in different ways. It’s not until I start giving meaning to verbs and adjectives in my goals that I can start to evaluate what’s really important to me.

Finally, a clearly articulated goal isn’t enough to accomplish it. A third skill is being able to take a well articulated goal and carefully break it down into smaller goals. For example, if I decide I want fewer sinus headaches, I need to find out what causes sinus headaches and see what causes I have control over in my life. This might lead to several new activities and goals, such as drinking more fluids. How can I drink more fluids? Maybe I need a water bottle. What’s a good water bottle? And so on. Without breaking down the future into manageable parts, it would be quite difficult to find a moment where progress seems actionable. Big goals only lead us to imagine big, unwieldy futures. A good small goal is something you can imagine getting done in an hour.

So how does one get good at these things? Practice is probably the best approach. Having accomplished a goal before makes it a lot easier to know how to break down the goal into smaller goals, with confidence. It also gives one practice at articulating goals. Another trick I like to use is to write to do items about to do items, such as “break down the goal of reducing sinus headaches”. That way, the first and most important step of accomplishing the goal becomes a legitimate step in a process, and something that I can accomplish in 20 minutes.

Tools can help with all of this, but most are just glorified lists. Most of the important parts of managing time come with discipline.

Emerson interview (part 2); writing for HCI venues

Here is part two of Emerson Murphy-Hill’s interview with me. This part covers some of the challenges in publishing in HCI venues.

Q: A prominent proponent of empirical software engineering once told me that he typically spends a full page discussing the threats to validity of his evaluations. At the same time, it’s not unusual to find a CHI paper that doesn’t discuss threats. How does one choose which threats to include and exclude, and how to present those threats, to the CHI community?

Most CHI papers clearly discuss threats, just not in a section titled “threats to validity.” This tradition comes from CHI’s cognitive psychology research, where the threats were inherent to the study design and discussed throughout the method and discussion sections. There never needed to be a separate section because it was expected that discussion of the limitations would appear throughout the article. As a guideline, one should always discuss all non-obvious threats to validity. It’s a necessary part of honest scholarly work.

Q: Where do you draw the line about whether a threat is obvious?

Some threats are common to all empirical research: the sample size was too small, the study may not generalize, situations may not have been representative. These are standard disclaimers and it’s always worth mentioning them briefly. The ones to really spend time on are the definitions and measures one uses and what likelihood they have of actually capturing the concept of interest (the construct validity) and whether they have any meaning for the real world (ecological validity).

Q: Have you had an HCI reviewer suggest that your work is better suited for a software engineering venue, or vice versa? If so, how did you deal with the suggestion? If not, how do you think you preempted it in the first place?

No, I’ve never had a reviewer suggest that. Of course, the work that I publish at HCI venues usually has more to do with the actual work of software engineers, their collaborations, or their interactions with users, as opposed to conventional software engineering research on automation. I think one of the main stumbling blocks that software engineering researchers will have trying to publish at HCI venues is demonstrating that the problems they work on are of significance. For example, a common type of software engineering paper will find some specific set of circumstances that can be exploited to automate bug finding or prove correctness within a certain set of assumptions. In general, HCI researchers aren’t interested in these types of narrow contributions, unless there’s some good evidence that the set of circumstances exploited is large and generalizable to some degree.

Q: In an HCI paper, where do you make the argument about generalizability? Is there room for speculation?

Andrew: There’s always room for speculation. That’s what discussion and limitation sections are for. The whole point of studies is to use a kernel of rigorous and trusted analysis in order to make predictions about the larger context of the world. In fact, I think too many software engineering papers simply report results and ignore what impact a tool design or study might have on our understanding of software engineering. Tools, after all, are embodiments of theories about the world, and they have just as much potential to teach us about our surroundings as studies – perhaps more.

Q: As a reviewer for HCI venues, what is the most common mistake that you see software researchers making?

Being more fascinated with technology itself than what technology does for people (whether those people are technology users or hardcore software developers). More often than not, I will read software engineering papers published at HCI venues that try hard to persuade me that the clever tricks they devised are interesting enough to overcome the minimal impact the tricks will have on users’ work and experience with a tool.

I also see software engineering researchers try to make knowledge contributions about software development practice without citing the large body of work done at CSCW and other conferences about group work. HCI researchers tend to view software development as just one of many examples of collaborative work. The argument that it’s special and unique usually doesn’t fly without evidence.

Q: Although HCI submissions are often anonymous, people tend to be suspicious of “outsiders,” and may treat outsiders’ work with some undue hostility. What can a software researcher do to avoid identifying himself as an outsider in the HCI community?

All HCI researchers are outsiders. There’s not enough of a concentration on any one topic or problem for there to be a common core. The best way to avoid sounding naive is to read as much about a topic outside of your discipline as possible. HCI draws from cognitive science, psychology, design, computer science, engineering, anthropology, social psychology, communication, education, and several other fields. Chances are, there’s work in all of those fields you should at least be aware of, if not read and cite.

Q: Suppose that you attempt to solve a usability problem for a certain kind of software tool; HCI researchers may perceive that you are solving only a very narrow problem, and thus your contribution is small. How do you deal with that?

The typical solution to this problem is finding a community that thinks your problem is broad instead of narrow. HCI research tends to have a fairly broad view of the world, since it’s so applied, so understandably, many problems will be viewed as small (just like any non-academic would view our problems as narrow). The best one can do is demonstrate what relation the problem has to society and what impact it might have on the world – not just on the tool users.

Q: Where should software researchers send their human-centered papers?

Anon: CHI is an obvious choice, but it’s the premier conference in HCI, which makes it a difficult target even for very experienced HCI researchers. Beyond CHI, what would we recommend? I’ve been investing in VL/HCC, the logical successor to the sadly defunct Empirical Studies of Programmers (ESP) conference. It’s a strong secondary conference, with a first-rate community, but with less content about professional SEs than I would like. I’m hard-pressed to recommend another HCI conference.

Emerson Murphy-Hill interviews me (part 1)

About a month ago, Emerson Murphy-Hill (currently a post-doc at UBC) asked if he could interview me about the challenges of doing HCI research about Software Engineering (and vice versa). I’ll post our interview in two parts: the first, listed here, covers HCI and software engineering research and the second covers publishing in HCI venues.

Q: What are the biggest differences between HCI and software research?

Both pursuits are very problem-driven: we want things that work, that demonstrably solve issues, and move practice and our understanding of practice forward. Not only that, but both pursuits are largely interested in solving the same problem: we want to find ways of creating software and technology that achieves its requirements and ultimately serves customer and user needs.

Where HCI and software research differ is in their methods. Most HCI research focuses on understanding the context being designed for and using this understanding to drive innovation. Software research often works the other way around, seeking innovative technological approaches to well-established problems, but not often doing formative research to discover new problems. The other difference between the two pursuits is the breadth of their methodological toolboxes. HCI researchers will use whatever method is appropriate for a research question, whether that means controlled experiments, interviews, ethnographic field work, or any number of theoretical frameworks and formalisms. Software research tends to be more restricted methodologically, focusing mainly on first order logic and quantitative empiricism. In my view, this restricted set of epistemological tools prevents software researchers from seeing the larger context of the problems we try to solve.

Q: It sounds like you are suggesting that software research could make more of an effort to do formative research to find problems. Are there any areas in software research that you think that we don’t truly understand the nature of the problems?

I think that software research, as a whole, severely underestimates the role of communication, coordination, and management in successful software projects. The importance of this factor has been claimed for decades and studied infrequently for nearly 20 years, but it plays such a minor role in most software engineering research. This is surprising, since most other engineering disciplines focus heavily on the actual human process of engineering different goods.

Another area that is understudied is the notion of requirements and what they actually mean. Ultimately, the role of software in humanity is to support humanity, but more often than not, software engineers let the medium, rather than humanity, dictate the design. There are interesting connections between Requirements Engineering and HCI, in that both seek to elicit requirements, but using different methods. I’ve seen very little work that bridges these different approaches to software design. Generally, most software research focuses on “getting the design right” rather than “getting the right design.”

Q: I noticed that you mentioned that software research has begun to use quantitative empiricism, but did not mention qualitative empiricism. Do we not use both?

In my experience, software engineering researchers are highly skeptical of qualitative methods. It is quite rare to see a paper at a top conference that uses qualitative methods exclusively. I’ve personally received reviews that suggested I convert my qualitative observations into numbers in order to make them more objective (which, epistemologically, is both ineffective and naive). Unfortunately, there are some questions for which quantification is insufficient. For example, if you want to know how a software development team selects which bugs to address first, what do you measure? Some objects of study are processes and activities, with no single measurable dimension.

I understand why the community is skeptical; we come from quantitative traditions; most software engineering researchers only have a vague idea of what sociologists and anthropologists do. It’s unfortunate that at the moment, to publish research about inherently qualitative phenomena, one has to create artificial and unhelpful measurements of those phenomena to make it palatable.

Q: Do you have any recommended reading for doing research in HCI?

There’s no small set of reading that would suffice. HCI spans over 50 years, 20 conferences, and at least 20 journals. CHI, the leading HCI conference, is the second largest ACM conference, second only to SIGGRAPH. Deciding what to read can be daunting and there’s really no way to reduce its complexity.

Instead, I’d suggest that learning to do research in HCI is more about choosing which methods you want to excel at. Personally, I focus on user interface design, evaluation, and empirical research, and that only covers a small subset of the kinds of methods that people use in HCI.

I can recommend some books, which offer some perspective on the mindset of HCI researchers. For example, Bill Buxton’s Sketching User Experiences [2] is a fantastic look at what it means to design systems and how finding the right design (the HCI part) can greatly simplify getting the design right (the software engineering part). I very much subscribe to his perspective (which isn’t surprising, since Bill unofficially advised my advisor, Brad Myers, at Toronto).

Q: At what point in the research process should researchers consider what venue to submit to? Should the venue influence how you conduct your research?

There are definitely more experienced people to ask than me! But perhaps I have a fresh perspective on the issue, as I straddle the boundary between the HCI and software engineering worlds. Ideally, researchers would select important, interesting problems and publish the work when it’s done. The venue should only matter once one knows what the contribution is and which communities would appreciate it.

Unfortunately, the conference culture in both HCI and software engineering tends to incentivize work of limited and conservative scope. This has been discussed across a variety of venues, including HCI articles and conference panels, as well as several ICSE keynotes and papers. That, and a lot of good work gets rejected because it’s not yet fully formed. I believe that journals, with their multiple rounds of review and lack of deadlines, offer a healthier process with which to vet and disseminate academic research.

Q: Tasks used in HCI studies often appear to require little domain expertise and can be conducted in a short amount of time whereas software studies often require substantial domain expertise and can be difficult to structure to complete in a short amount of time. Is this statement true in your experience and if so, how have you managed the issue?

I don’t think this is a fair characterization of HCI research. In the past, HCI has focused a lot on novice tasks, partly because user interfaces were so bad; there is also a subset of HCI research that focuses on input techniques, which are more amenable to experimentation because of the more limited variance in human motor performance. But in the past few decades, there’s been a broad focus in HCI on supporting experts and expert teamwork in a variety of domains. Designing studies to support these activities is just as difficult as, or more difficult than, designing studies to evaluate software tools. This is one reason why HCI has adopted so many other kinds of methodologies: one can’t design a controlled experiment to learn how first-responders use cell phones to coordinate. We have the same challenges when designing controlled experiments to learn about coordination in software teams.

I deal with this challenge in my own work in a few ways. First, like researchers in all other empirical fields, I carefully design my measurements, stating their limitations and potential confounds, and then move forward despite the threats. The ultimate product of any empirical work is not the one perfectly designed study, but a large collection of studies that repeatedly demonstrate consistent and convergent results across a variety of contexts and with a variety of operationalizations. There is still an attitude in software engineering research that a single study should suffice; we need to move away from that view and start to plan for decades of study and experimentation on fundamental issues.

Another way that I deal with this challenge is to design studies that explain what my tools do for people and how they do it. For example, I’m designing a study at the moment with James Fogarty and Kayur Patel to evaluate how their integrated classifier development environment helps developers find bugs. The goal of the study is less about showing a difference in success (since success in the real world depends on too many other factors) and more about explaining what the tool does differently that contributes to developers’ success. To do this, we’re asking participants to verbally state changes in their goals, and associating these shifts in goals with the use of different parts of the tool. This way, the study result is not “participants were more successful,” but “participants were more successful because they spent more time confirming fewer hypotheses.” This is the kind of knowledge that helps design other debugging tools.

Q: So do you think that any parts of HCI or software research will have a lasting impact?

Well, this is a controversial topic within HCI, but I personally believe that there is fundamental HCI research and then there are applications of HCI methods (which are actually the methods of other communities, such as cognitive psychology, anthropology, and design). My body of work, for example, is largely an application of HCI methods to the problems of software engineering practice. I view the core areas of HCI as input and output devices and anything else having to do with feedback and interactivity. This latter category has and will continue to have lasting impact.

Software engineering, like HCI, has made several foundational contributions to practice, such as version control, limited forms of model checking, compilers, debuggers, and development environments. However, many of the coordination, planning, and management aspects of software engineering have moved along largely without the help of research. I think the challenge for software engineering research is to recognize that many of the fundamental challenges in practice are human challenges, and that many basic software engineering tools must be designed with these challenges in mind.

One philosophical issue surrounding the future of both applications-driven HCI research and software engineering is whether the domains we study and design for are moving targets. Psychology, medicine, and the natural sciences operate under the assumption that people and nature don’t change in their fundamental nature (or at least not very quickly). This makes it possible to advance knowledge with empirical study over the course of 100 years. Can we make the same assumptions for the nature of coordination in software development? Are there really fundamental, unchanging aspects of software engineering practice, or are all of the challenges we observe ephemeral? This is an open question that neither HCI nor software research has begun to address.

Q: You mention that doing HCI studies is hard. How might one get started doing an empirical evaluation for the first time, considering both the need to get useful results and the high likelihood of making a mistake?

To really get good at empirical evaluation, a lot of things are necessary. First, find an expert at empirical evaluation who is interested in applying their skills outside of their content area. These might be statisticians, experimental psychologists, or researchers in policy departments. Second, get a good book about epistemology: there’s no end of gentle introductions to the power and perils of measurement. I recommend The Numbers Game [1] for an intuitive sense of the complexity of measuring things. The key thing is to learn to be extremely skeptical about the validity, reliability, and semantics of measurement.

The rest of the challenge is knowing your audience. Do you really need an experimental study to support your claims? Or would finding one person to adopt your tool for a week suffice? Do you really need to demonstrate causality, or are there other more pressing questions that might be interesting to investigate? There are lots of ways to gain confidence that your design choices were good by some measure.

what’s surprising?

A common complaint of research is that it’s not “surprising.” For example, a reviewer might say, “The study was well done, but the results weren’t really that surprising.”, or, “I found the results a bit predictable.”

But what do these statements really mean? Do they mean, “Had you asked me the research question, I could have guessed the results with some degree of confidence.”? Or, “If you asked your research question of 100 experts, 95 of their guesses would have been right.”?

Maybe we might intend for them to mean that, but they don’t actually capture what happens when a reviewer reads a paper. What usually happens is the reviewer reads the research question and thinks, “Hm, I could guess, but I’m not sure.” Then, upon reading the results, the reviewer thinks, “Well of course, that’s not surprising at all.” The test executed here is not whether an expert can confidently predict the answer to a research question, but whether in hindsight it seems plausible that an expert could have guessed the result.

In this sense, what makes a result “surprising” has less to do with what we know as scientists and more to do with what we think we know about what other researchers know.

This social fabric that apparently underlies our judgements of what is known has other interesting effects on what is accepted as advancing knowledge. For example, that some finding has not been published is rarely a satisfactory argument for why something should be published. What underlies this belief is that it is not our goal as scientists to document everything that we know. Instead, it is our job to document the subset of what we know that is interesting, important, and surprising.

But aren’t most judgements of what is interesting and important grounded in the present? How are we to know what is interesting or important in the future? Who are we to judge that the future of humanity will find no interest in the uninteresting, unimportant results of today? Take, for example, a recent review I wrote on a paper about using multitouch, tabletop displays for engineering design. I argued that it was unclear what problem was being solved. But what if it solves a problem that doesn’t exist yet? Or what if it solves it in such a way that another problem I hadn’t even thought of becomes trivial? On what basis could I really judge whether the work would have future worth?

All of this makes me think I don’t give papers a fair shake. Maybe I’ll adopt a new reviewing protocol: instead of reading the paper straight through and recording my thoughts, I’ll look at the authors’ research question and try to answer it myself for five minutes. Then, I’ll read the paper and if they came up with a different solution or answer than mine (that is of course reliable, sound, etc.), whether or not I’m surprised, the authors get credit for discovering or inventing something that I didn’t know. Of course, if I guessed their results or solution in a mere five minutes, what could they possibly have contributed?

the semblance of objectivity in numbers

I just received my first ever first-authored conference paper rejection from FSE. The primary reasons, quoted from the reviews, include:

  • “The qualitative nature of the study … is liable to misinterpretation and bias.”
  • “I was expecting a quantitative analysis: is there any correlation between some of the characteristics and between [the results] and the time a bug takes to resolve and its resolution status?”
  • “I would have thought that what types of elements to look for in discussion should be decided before by the researchers as it should be based on the problem”
  • “I was expecting concrete advice on HOW the tools should structure the discussion.”

I was hoping the reviewers would have been more epistemologically informed. For example, the first and second quotes are quite telling: they imply that some forms of empiricism are not subject to misinterpretation or bias. But quantitative empirical measures are just as subject to bias as any other measure. For example, if I had counted certain kinds of data and run correlations between these counts and other outcome measures, not only would one in twenty of them be “statistically significant” by chance, but whether there was any real meaning in the variables depends on the construct validity of the quantitative measurements. For example, if I had correlated hyperboles with bug resolution time, not only would the hyperbole measure have the same limitations as it did as a qualitative classification, but the bug resolution time would have any number of contextual factors that could influence its true reflection of the hyperbole’s impact on consensus. Transforming empirical observations into numbers does NOT make them objective, nor does it prevent bias and misinterpretation.
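
The one-in-twenty point is easy to check with a quick simulation. What follows is only a rough sketch under assumptions of my own, not anything from the rejected paper: the variables are independent Gaussian noise, the significance threshold is the conventional 0.05, and the p-value uses the Fisher z-transform approximation so that nothing beyond the Python standard library is needed. The function names and parameters are hypothetical, chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def p_value(r, n):
    """Approximate two-sided p-value for r via the Fisher z-transform.

    This is an approximation (reasonable for moderate n), not an exact t-test.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(num_tests=2000, sample_size=50, alpha=0.05, seed=0):
    """Fraction of correlations between unrelated noise variables that
    come out "statistically significant" at the given alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_tests):
        # Two variables with no relationship whatsoever.
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        if p_value(pearson_r(xs, ys), sample_size) < alpha:
            hits += 1
    return hits / num_tests


if __name__ == "__main__":
    print(f"'significant' correlations in pure noise: {false_positive_rate():.3f}")
```

The printed fraction should land near 0.05, which is the whole problem: correlate enough counts against enough outcome measures and some of them will look meaningful even when nothing is there. And this only covers chance; it says nothing about whether the counts measured anything real in the first place.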

The third quote is ironic: this reviewer seems to believe that the only way to analyze a problem is to make some assumption about its nature upfront. The whole point of qualitative research is that the more you make upfront assumptions, the more you bias your findings. What this reviewer is proposing would have lessened the objectivity of the results and prevented us from uncovering the trends we did.

The last quote reveals the systemic bias in software engineering research (and also some HCI venues): qualitative studies are only valuable if they explicitly inform design. What this really reduces to is a view that material goods are real work, but the production of knowledge comes for free. Building a system or automating some activity, even if the system and automation are entirely impractical in the real world, is more valuable than understanding the real world. The comment also reveals the reviewer’s lack of understanding about design: innovations don’t come from studies, they come from people. Studies can support design decisions (and the results throughout our rejected submission have been quite valuable in our current design efforts), but they cannot generate ideas. People generate ideas.

Had I really wanted the paper in, I would have littered the submission with arbitrary, but seemingly objective quantifications and correlations of our data (which is what most quantifications are in software engineering papers). This has worked in past papers and is a tried and true workaround for the software engineering community’s lack of experience with qualitative methods. Reviewers would have thought, “I don’t get all of this qualitative stuff, but these numbers are great.” I decided not to do this on principle, since doing so would have only made the results seem more objective without adding any real objectivity.

So much for principle. Time to start correlating things!

tough T

I just spent a day at Edward Tufte’s course on information design at the Seattle Marriott Waterfront. I’ve always known his work, I’ve talked about it in design classes, I’ve told students to read his books, but not once have I heard him speak. Now I can confidently say that his captions speak louder than words. Snicker.

That’s not to say he wasn’t insightful. The books have always been a nice translation of classic design principles into static visual information design, but most of the course was simply him parroting his own words. What made it unbearable was that he spoke them with the lifeless apathy of a statistics professor. Oh wait, he was one.

Aside from his lack of spark, there were a number of nice things about the day. I got a box full of his books; I got a refresher on visual information design; I had a chance to think more about forms of dissemination for my research (I tire of limiting my influence to academic publications). It was also a nice calm before my early May storm of deadlines.