Friday, November 30, 2012

The Challenge of New Research

Since last spring quarter, I’ve found myself in an awkward place as I try to work on my Qualifying Paper (QP). The ideal case, as one of my Advisors recently stated, is for students to decide on a QP topic during the spring of the 1st year, to read and conceptualize over the summer, to write a proposal in the fall, to collect data in the winter, and to write up the research in the form of a QP in the spring of the 2nd year. Very few students follow this idealized path, but my process is especially odd because there are a number of fundamental assumptions about the research process built not only into the pathway as a whole, but also into the particular (and particularly ordered) milestones along the way.

Over the last year, I’ve increasingly come to identify myself as a Learning Analytics (LA) researcher. LA is a field which is in its infancy, and as such has very few established norms, seminal papers, and codified methods. My belief is that, like in other areas that have undergone data revolutions, education and education research will be totally upended by LA as it matures. In a world of easily available big data, in short, it’s increasingly difficult to justify research that ignores big data, especially if the primary reason for ignoring it is discomfort or a lack of expertise, and not its irrelevance (since it can be relevant to almost any research agenda).

The implications of a coming data revolution in education for an individual researcher who is still very much “in training” are many. I find myself torn in multiple directions, needing not only to develop a broad and deep understanding of existing literature in educational research – particularly on topics that interest me – but also needing urgently to develop computational and statistical competencies that go well beyond what is traditionally taught in schools of education. Regression models don’t cut it anymore.

This raises an essential point about the research process – particularly for an apprentice researcher – at this stage in the development of LA as a field. In traditional education research, tremendous emphasis is put on the development of good research questions. This is equally important in LA research, but with a key difference: in LA the most important and interesting questions are liable to arise well after data has been collected and analysis has already begun. In traditional research, collecting data before articulating, motivating, and conceptualizing key research questions and developing or appropriating instruments would be insanity. In LA – at least for a beginner who is still developing technical competencies, in addition to conceptual ones – it’s sometimes impossible to ask the question without first knowing what the data looks like, how it can be manipulated, and what kinds of questions are liable to yield interesting results.

Now, I’ve heard in my first year and a half as a PhD student the admonition that research should never be driven by methods. But research is always driven by methods. It’s a fine ideal to say that we should ask the most important and interesting questions, regardless of the methods necessary to answer them, but in practice that’s an impossible way to do research. Now, what I’m not talking about here is the narrow refusal of so-called “quantitative” and “qualitative” researchers to use each other’s methods to answer questions. It is patently absurd for a researcher to refuse to ask a certain type of research question, generating the answer to which would require conducting interviews, simply because his own professional expertise is in regression models. That’s what collaboration is for.

No, what I mean is that in the very definition of what makes a good research question in Education there are implicit methodological biases that go well beyond the quantitative / qualitative divide. There is a set of methods which, though vastly different from each other, generally constrain the entirety of modern educational research. These are effective methods, no doubt, in many cases. They are tried and true, tested and approved, beloved and practiced. They are what any first year PhD student in a school of Education will be trained in, or at least exposed to, as a part of the core curriculum. They are what Advisors teach their advisees. I speak, of course, of regression modeling, of survey design, of psychological 2-by-2s, of “think alouds” and semi-structured interviews, of observation protocols, of video capture, of discourse analysis, of… you get the idea: the bread and butter of research in education. This set of methods for data collection and analysis constrains research questions, not because researchers cannot ask questions that these methods cannot answer, but rather because researchers do not call such questions research questions.

If I want to know how students use language in a Massive Open Online Course, broadly, I cannot use any of those existing methodologies without significantly altering my question. Such was my first pass at developing a research question, in which I planned to do discourse analysis on a subset of threads in the forums, sampled based on certain key elements (namely, debates around a particular problem from one of the problem sets). To do that research, however, would have been arbitrarily limiting, driven by the methods of traditional educational research. The data set is so much richer than that. The problem is that there is no existing literature from which to borrow research questions, and no set of accepted methods for this kind of data that can help scope a research agenda. In falling back on traditional educational research, my first efforts at crafting a study were arbitrarily limiting, driven by a lack of technical expertise.

So what is the best question to pose? The answer is: the first question that an LA researcher needs to ask is exactly, “what is the best question to pose?” The first pass at data analysis ought to be driven by theory, of course, and a sense of what questions might be interesting, but the wonder of computation and massive data sets is that it is relatively cheap to get data, relatively cheap to test hypotheses with it, and relatively easy to totally change one’s research questions. In contrast to traditional educational research, where a change in research question may totally invalidate all of the collected data, in LA research there is so much data that ignoring significant chunks of it is essential to making progress in the first place.

In my case, in particular, the question generation process is even more emergent because my computational and statistical abilities are a work in progress. I know I am interested in language use, in one form or another, but I am still a novice in the area of Natural Language Processing (NLP), and thus do not know what questions are answerable and what are not. Every time I learn to do something new computationally, the possible research questions I might be able to ask change. As research questions change with technical skill, so too do the needs and demands for conceptual literature reviews. I’ve been hesitant to commit to a single conceptual framing, therefore, and to do an in-depth literature review because almost every week the interesting and important questions look so different that the literature I need to review changes.

What’s more, I can think of only one researcher / research team in LA so far that has done much NLP (Simon Buckingham Shum’s group at the Open University in the UK). Of course I should learn from Simon, but his research alone hardly constitutes a field from which I can draw accepted norms, best practices, and a technical or conceptual training regimen. NLP itself is a fairly well-developed field, but as I begin to learn its methods and apply them to my big educational data sets I am faced with a conundrum: spend more time building technical expertise, or use my current expertise – limited though it is – to begin to do educational research. This is an important point: LA is not, in the end, a computer science field. It uses computer science, it uses NLP, it uses machine learning, it uses database management… but at the end of the day it’s a branch of educational research, with its roots in the Learning Sciences. I suspect many LA researchers – and particularly Educational Data Mining (EDM) researchers – would disagree with that statement, valuing the higher prestige ties to computer science and mathematics over the less sexy relationship with education, but I think it is vitally important that we don’t let computational power totally overwhelm the theoretical and scientific knowledge generated by the last century of research in education, lest we make the classic mistake of assuming that learning is simpler than it is.

So the question is unresolved: draw on expertise from NLP and other computational fields (data visualization, for example, and data mining for the purposes of clustering before performing NLP analyses) and thereby invest my time and energy into expanding my technical base, which in turn opens up richer and more interesting avenues for research, or begin to develop a stronger conceptual base for the kinds of research questions I feel I can ask now, and make progress on completing the QP. The answer, of course, is both. The problem, of course, is that each step in the former direction heretofore has forced a (totally unanticipated) redefinition of the problem in the latter. Add to this that the process of LA research is almost necessarily collaborative (in my case it is), and the problem becomes even more intractable. I cannot arbitrarily stop my collaborators from developing new and exciting questions simply because I have now invested time and energy into better conceptualizing a prior question. Ultimately, as I build my technical expertise, and as the field adopts norms and expectations, and as training in LA becomes more formal and less ad-hoc, this problem begins to go away, both for me and those who will come after me. But for now there is a tension.

Such is the challenge of doing new research.

Thursday, November 29, 2012


In memory,
  imagined language
    signals desires,
      pleading silently
        with constrained,
          subtle innuendo.
        Such secrets
      tease edges
    of devious
  minds, seeking
their owners.

Sunday, November 4, 2012

Why I'm Voting for Jill Stein in 10 Arguments

As an aspiring researcher in a School of Education, it seems almost a foregone conclusion that I should be voting for Barack Obama. The Academy is famously lefty in rhetoric, but center-lefty in its actual electoral practice, which pretty much sums up the President. As a student who is actively being socialized to the academy, I have certainly felt significant pressure (not direct pressure, which is ineffective anyway, but the indirect pressure of assumed common perspective) to support President Obama in this election.

It's not just a matter of social, economic, or foreign policy, you see. It's a matter of our very livelihood in the Academy: research tends to do better under Democrats than Republicans, and especially President Obama, who is bullish on educational technology and the transformative potential of innovations like Learning Analytics (my emerging research area).

So it seems like I should vote for the President. But I'm not going to. I'm voting for Green Party nominee Jill Stein.

The following are a set of perspectives and arguments for why I would do such a crazy thing. You'll note that "I live in California, which is 'safe' for Obama" is not one of my reasons. I might think longer about my vote if I still lived in my native Colorado. But in the end I would still vote for Dr. Stein.

1) I reject the idea that politics can easily be placed on a linear spectrum.

The left-right dichotomy in political discourse serves to marginalize minority perspectives, and to ossify the debate. Because we apply broad categories called "left" and "right" to the Democrats and Republicans respectively, we ignore the complexity of actual policy-based problem solving. The Green Party, if it can be characterized as a fringe left-wing party, is a satellite of the Democrats. In reality, there is no line that can be drawn which contains all three points: Republican, Democrat, and Green. The Green Party does not offer policies "left" of the Democrats. They offer policies fundamentally and categorically different from the Democratic Party in complex ways.

2) Jill Stein is a Doctor, and she would make decisions as scientifically as possible.

Lawyers and businessmen often make good public speakers, and sometimes make good leaders. Barack Obama is both a good speaker and, I think, a good leader. But Dr. Stein is a scientist, who believes in making policy decisions based upon scientific knowledge, and not political expediency. That may be 'impractical,' but even getting the idea of data-driven policy on the table seems worthwhile to me. Too often our political discourse is dictated by sophistical pandering, so much so that we now inhabit a culture in which "you have your facts, I have mine," has become an acceptable rebuttal. It may be a futile fight, but it is worth fighting for a political discourse based on data and scientific reasoning. I don't believe either major party wants to fight that battle.

3) The Green Party has been right about climate change for decades.

Whereas the Democratic Party has certainly used rhetoric about climate change to energize its base, it has shown little to no leadership in actually passing legislation that would curb the potentially catastrophic impact of global warming. The Green Party was formed by and large because of the recognition that combating climate change and other environmental issues caused by overpopulation and international industrialization would require a paradigmatic shift in the way we organize our governments and societies. Though this has been an unpopular insight, it is increasingly proving to be true. Eventually America - and the rest of the planet - will need leaders with the foresight of people like Dr. Stein.

4) The Green Party is international.

Unlike other major political parties, the Green Party is an international organization with local branches. Their international focus is unique in global politics, and is entirely necessary at a time when the world is increasingly interconnected, and what happens in Europe or Asia has as much or more impact on what happens in the United States as what happens in Ohio. Parties that are merely national, who see "foreign policy" as a separate category of political discourse from "domestic policy," are woefully obsolete.

5) The Green Party and Dr. Stein have not been bought and sold.

I believe that Barack Obama is more idealistic than a President can afford to be. And afford is exactly the right word. In order to be elected President in the United States, you need to raise lots and lots of money. In order to raise money, you have to raise money from major corporations, who then have expectations around what kind of policies you will or will not enact. That's vastly oversimplified, of course - it's more about shaping the conversation and determining who gets to come to the table than dictating policy decisions. For example, there were no significant advocates for single payer health care allowed at the table in the Affordable Care Act conversations. That put boundaries on the debate.

Electoral necessity means Democrats and Republicans have to constrain the debate. But innovation does not come from putting narrow boundaries on conversations and limiting acceptable solutions before the conversation even begins. The biggest benefit to politicians who are not awash in corporate money is that they are free to engage in real dialogue and real problem solving.

6) We must change our democratic processes.

Related to the above point, the way we elect Presidents (and other officials) simply does not make sense, and simply is not democratic. There are several problems:

- Wealth has a disproportionate impact on an individual's political clout.
- Corporations count as "individuals," and are wealthier than any real person, and therefore more powerful.
- Politicians are by-and-large more focused on reelection than governance.
- Plurality is a poor way of choosing a winner in an election, and run-off is too expensive. We need Instant Run-off Voting.
- Along with plurality, federal representation by geography alone is outdated, as it ensures that only a limited set of ideologies get represented in the government. Most other first world countries apportion seats by party (also flawed, but better).
- Our current system creates far too much incentive to cynically manipulate voter turnout.

Raising these issues is important, because neither major party stands to gain by doing so. Supporting third parties is - in a wonderful catch-22 - the only way to raise these problems to prominence so they become a part of the discourse.

7) This is not the most important election of my lifetime.

Every single election of my lifetime has been described as the most important election of my lifetime. The truth is, none of them are (I suppose one will be, but it's hard to know now which). While the Democratic and Republican parties do offer stark differences on many issues, it's worth noting that on many fundamental issues of process (see point 6, above) there is no difference between the major parties. Because there is no difference, there is no discourse. Because there is no discourse, there is no opportunity for change. As long as we buy the hype around how important the differences between Democrats and Republicans are (and how vitally close the election is), we continue to push aside the opportunity to raise more fundamental issues. Indeed, it is particularly in states where elections are close that third parties - Green, Libertarian, Constitution, whatever - stand to have the greatest impact on the debate. That is, as long as we don't silence them.

8) Revolutionaries spoil corrupt systems.

I was going to call this point two different things. First, I was going to say: "third parties are often responsible for positive change." Then I was going to say: "third parties are not actually spoilers" (except inasmuch as they open up uncomfortable, but important conversations). I combined them into one.

To the first sub-point: it took a sustained and politically meaningful push by the Socialist Party to get FDR to actually adopt his New Deal reforms. Recall that he was President for a significant amount of time before he even began to implement reforms. Consider, more recently, how President Clinton's economic policies were at least in part shaped by the rise of Ross Perot. It is saddening that the Democratic Party, instead of adopting the pieces of the Green Party platform that made Ralph Nader so (relatively) successful in the late 90s and early 2000s, actively worked to shut him out of the debates. Again, the lesson from history is: significant third party support demonstrates an avenue for opening up a new conversation. The Democrats of 1996-2012 have chosen instead to work to actively suppress 'marginal left-wing' Greens (see point 1).

To the second sub-point: Nader did not, in fact, cost Gore the election in 2000. For one thing, the logic of apportioning Nader votes to Gore and saying, "see the difference between Bush and Gore was smaller than the number of Nader votes" is deeply flawed. By that very logic, Pat Buchanan cost Bush more electoral votes than Nader cost Gore. But more importantly, exit polls suggest that Nader voters, if they had not voted for Nader, by and large would not have voted, period. It is naive to think that Nader was just a "more left-wing" version of Gore (again, see point 1).

In short, Nader was a "spoiler" because we re-wrote the narrative to describe it that way. In reality, though, what Nader spoiled was an already corrupt system. It was laid bare by virtue of his presence in the race.

9) The Presidency isn't as important as you think.

I am continually amazed at how important Americans think the Presidency is. Simply because the Presidency is the most visible elected office in the country does not mean it is an all-powerful office. In reality, while President Obama may be the most powerful man in America, it would be quite easy to find two people combined (or 538 people...) who wield far more power than the President. Which is to say: even if the President gets to set an agenda in a broad sense, whether that agenda gets enacted or not has a lot to do with Congress, with the weather, with whether or not Greece stays in the Eurozone, with Chinese environmental regulations, with Mexican immigration patterns, with Iranian protesters, and so on. The President has direct control over precious little, so let's not over-emphasize his importance. Our nation was designed to ensure that no one man or woman was so important that his or her decisions would make or break our society.

10) The student debt bubble is as risky as the sub-prime mortgage bubble was.

Perhaps the most compelling argument for Barack Obama, to me, is his forward-thinking approach to education. He has instituted an office of educational technology, and the common core movement has blossomed under his administration. That said, there are still huge swaths of the educational conversation that I feel Jill Stein and the Green Party are willing to engage with and that the Democrats and Republicans are not. Chief among these is the student debt bubble, and who ought to pay for education. I am perhaps naive in believing that education is one of those services for which the costs ought to be socialized, simply because an educated populace is a necessity for democracy. While the Democratic Party still uses that rhetoric, it is worth noting that modern public universities are as expensive now as private universities once were. The cost of higher education is unsustainable, as is the issuing of massive (and undischargeable in bankruptcy) student debt.

Thursday, November 1, 2012

Two Word Clouds

The following word clouds were inspired by the conversation this week in Education's Digital Future, a class I'm currently taking. The question: what is a University? In particular, we talked some about Stanford. While these word clouds are far from a total picture of what Stanford is, they are two interesting perspectives.

The data for the first word cloud comes from Stanford's Wikipedia page:

What Wikipedia thinks about Stanford.

The data for the second word cloud comes from the mission statements from Stanford's six schools:

What Stanford's six schools think about Stanford.

In my opinion, the most interesting word that is in the Wikipedia page, but not in the missions, is "campus." The most interesting word that is in the mission statements but not Wikipedia is, I think, "resources," though "interdisciplinary," "collaboration," and "ideas" are notable as well.
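For anyone curious how a comparison like this could be done by hand rather than by eyeballing the clouds, here is a minimal sketch in Python. It is not how the clouds above were made. The stop-word list, the function names, and the two text snippets are all placeholders I made up for illustration; you would paste in the real page text and mission statements yourself.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real word-cloud tool would use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "its", "on", "with"}


def word_counts(text):
    """Lowercase the text, pull out alphabetic words, and count the non-stop-words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def only_in_first(first, second):
    """Return words counted in `first` that never appear in `second`, most frequent first."""
    unique = {w: n for w, n in first.items() if w not in second}
    return sorted(unique, key=lambda w: (-unique[w], w))


# Placeholder snippets, NOT the actual Wikipedia or mission-statement text.
wiki_text = "The campus is located in California. The campus has a long history."
mission_text = "Our mission is to share ideas and resources for interdisciplinary collaboration."

wiki = word_counts(wiki_text)
missions = word_counts(mission_text)

print(only_in_first(wiki, missions))   # words unique to the first text
print(only_in_first(missions, wiki))   # words unique to the second text
```

Sorting by frequency roughly mirrors what a word cloud does with font size, so the first few entries of each list are the candidates for "most interesting" words that one source has and the other lacks.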

We discussed seven metaphors for the University in class: temple, sieve, hub, incubator, mangle, quasi-sovereign, and fluid. All are fitting in certain ways, but none is a perfect metaphor. I think the mission statement word cloud emphasizes the "incubator" aspects of Stanford, in that they reflect both the commitment to training students and to producing research (and ideas). Wikipedia, on the other hand, reflects Stanford's role as a "hub" (and possibly "temple"), emphasizing its physical location and its place in history.

(Both word clouds made at