DENIZENS of the Twitter-verse, please be advised: Whether you are a Libyan celebrating the demise of Col. Muammar el-Qaddafi, a New Zealand office worker sleepily starting your day or a California teenager trying out the latest slang, your words are being analyzed.
Twitter is many things to many people, but lately it has been a gold mine for scholars in fields like linguistics, sociology and psychology who are looking for real-time language data to analyze.
Twitter’s appeal to researchers is its immediacy — and its immensity. Instead of relying on questionnaires and other laborious and time-consuming methods of data collection, social scientists can simply take advantage of Twitter’s stream to eavesdrop on a virtually limitless array of language in action.
At the University of Texas, for example, a group of linguists and social psychologists has been monitoring Twitter to track on-the-ground sentiment over the course of the Arab Spring, particularly in Egypt and Libya. After the death of Colonel Qaddafi, the linguist David Beaver and his assistants quickly summoned thousands of Arabic-language tweets before and after the event. They zeroed in on messages known to be from Libya by using Twitter’s system of geocoding. (Posts from cellphones, for instance, very often encode the user’s geographic coordinates.) The tweets were then automatically translated from Arabic to English and fed into a text-analysis computer program.
The researchers were able to create a dynamic portrait of Libya’s Twitter traffic. The overall traffic skyrocketed in the hours after Colonel Qaddafi’s death was announced, as did terms related to positive sentiment like “good” and “wonderful.” Religious sentiment was also on display, with a significant increase in the frequency of words like “Allah,” “sacrifice” and “gospel.”
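For readers curious about the mechanics, the before-and-after comparison described above can be approximated in a few lines of Python. This is only a rough sketch, not the Texas team's actual program: the word list holds just the two positive terms quoted here, the tweets are invented, and the cutoff hour is a placeholder rather than the real time of the announcement.

```python
import re
from datetime import datetime

# Assumption: a toy lexicon holding only the two positive terms quoted in the article.
POSITIVE_TERMS = {"good", "wonderful"}


def positive_rate(tweets):
    """Return the share of word tokens in (timestamp, text) pairs that are positive terms."""
    tokens = [w for _, text in tweets for w in re.findall(r"[a-z']+", text.lower())]
    if not tokens:
        return 0.0
    hits = sum(1 for w in tokens if w in POSITIVE_TERMS)
    return hits / len(tokens)


def split_by_event(tweets, event_time):
    """Split (timestamp, text) pairs into those posted before and at-or-after event_time."""
    before = [t for t in tweets if t[0] < event_time]
    after = [t for t in tweets if t[0] >= event_time]
    return before, after


# Hypothetical cutoff and invented sample tweets, for illustration only.
EVENT_TIME = datetime(2011, 10, 20, 12, 0)
SAMPLE_TWEETS = [
    (datetime(2011, 10, 20, 9, 0), "waiting for news"),
    (datetime(2011, 10, 20, 13, 0), "a wonderful day"),
    (datetime(2011, 10, 20, 14, 0), "good news at last"),
]

before, after = split_by_event(SAMPLE_TWEETS, EVENT_TIME)
print("tweets before:", len(before), "positive rate:", positive_rate(before))
print("tweets after:", len(after), "positive rate:", positive_rate(after))
```

A real analysis would draw on a far larger lexicon and on translated text, so the numbers a toy like this produces mean nothing in themselves; the point is only the shape of the comparison.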
In this burgeoning field of Twitterology, moods are also being gauged on a more global level. Two sociologists at Cornell University, Scott A. Golder and Michael W. Macy, recently published a study in the journal Science that looked at how emotions may relate to the rhythms of daily life, across many English-speaking countries. They observed a gradual falloff in positive terms from the beginning of the workday, bottoming out in the late afternoon.
One criticism of “sentiment analysis,” as such research is known, is that it takes a naïve view of emotional states, assuming that personal moods can simply be divined from word selection. This might seem particularly perilous on a medium like Twitter, where sarcasm and other playful uses of language often subvert the surface meaning.
James W. Pennebaker, a social psychologist at the University of Texas who pioneered the text-analysis program often used in this kind of research, warns that positive and negative emotion words are the “low-hanging fruit” in such studies, and that deeper linguistic analysis should be explored to provide a “richer, more nuanced view” of how people present themselves to the world.
But even if we can’t expect Twitter to be an unerring emotional barometer, it is proving extremely valuable for understanding how language varies among different demographic groups. A team of computational linguists at Carnegie Mellon University led by Jacob Eisenstein and Brendan O’Connor has used geocoded tweets to build maps of regional language use across the United States. The amount of data available for analysis is many orders of magnitude bigger than what could be collected with traditional dialect surveys.
From these mountains of data can be gleaned hidden patterns of informal English, like the profusion of hella as a form of emphasis in Northern California, as in, “It’s hella cold out there.” Slangy phonetic spellings also show distinct patterns of distribution, with New Yorkers preferring suttin to sumthin (for something) and Californians writing koo or coo for cool. Even emoticons differ from region to region.
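The basic bookkeeping behind such regional comparisons is simple enough to sketch. The Python below is an illustration under stated assumptions, not the Carnegie Mellon team's method: it supposes that each tweet has already been assigned a region label from its geocode, it tracks only the four spellings mentioned above, and the sample tweets are made up.

```python
from collections import Counter, defaultdict

# The variant spellings mentioned in the article, mapped to the standard word.
VARIANTS = {
    "suttin": "something",
    "sumthin": "something",
    "koo": "cool",
    "coo": "cool",
}


def variant_counts_by_region(tweets):
    """Count tracked variant spellings per region.

    tweets: iterable of (region, text) pairs; the region label is assumed
    to have been derived from the tweet's geocode in an earlier step.
    """
    counts = defaultdict(Counter)
    for region, text in tweets:
        for word in text.lower().split():
            word = word.strip(".,!?\"'")  # drop surrounding punctuation
            if word in VARIANTS:
                counts[region][word] += 1
    return counts


# Invented sample data, for illustration only.
REGION_SAMPLE = [
    ("New York", "say suttin then"),
    ("New York", "suttin is up"),
    ("California", "that is koo."),
    ("California", "sumthin like that"),
]

region_counts = variant_counts_by_region(REGION_SAMPLE)
for region, counter in region_counts.items():
    print(region, dict(counter))
```

Simple matching of this kind will also catch coo in its ordinary sense, which is one reason real studies lean on statistical models rather than raw tallies.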
This study attracted negative attention this year from Senator Tom Coburn of Oklahoma, who listed it as one of the “questionable” projects financed by the National Science Foundation in a report challenging the foundation’s budget for the social sciences. But the research was vigorously defended by Randal E. Bryant, dean of Carnegie Mellon’s School of Computer Science, who pointed to its real-world applications. “The key finding was that seemingly meaningless slang and jargon can reveal important properties of the author’s identity, a point of interest for both corporations and the intelligence community,” Mr. Bryant said.
Still, the Twitterologists will continue to have a tough row to hoe in justifying their research to those who think that Twitter is a trivial form of communication. No less a figure than Noam Chomsky has taken Twitter to task recently for its “superficiality.”
“It is not a medium of a serious interchange,” Mr. Chomsky said, a blanket charge that ignores the diversity of voices to be found on Twitter. Regardless of how unserious Twitter exchanges may appear on the surface, many of Mr. Chomsky’s fellow linguists are discovering that Twitter can help uncover truths about our social interactions that are quite serious indeed.
Ben Zimmer, a former On Language columnist for The New York Times Magazine, is the executive producer of VisualThesaurus.com and Vocabulary.com.
A version of this op-ed appeared in print on October 30, 2011, on page SR9 of the New York edition with the headline: Twitterology: A New Science?