Background

For Russia, the last few decades have seen growing freedoms, wider availability of technology and a period of great change. This all coincided with and contributed to a growing internet.

Through all of this, Russian youth took to the internet – Russian-language sites collectively coming to be known as “RuNet” – and quickly started making their presence known. After a large increase in exposure in the early 2000s, Russian internet use grew rapidly.

Today social networking dominates Russian internet usage just as it does in America. Here, Facebook; there, V Kontakte (“In Contact”). In 2009, RuNet users’ average usage was far above the world average, with social networking accounting for 59% of it. Twitter usage during these years, however, was sparse: Twitter didn’t make a Russian-language interface available until April of 2011. Of course, this didn’t stop Russian users from making use of Twitter – our corpus contains tweets from as early as 2008.

Even before social networking rose to such a prominent role in RuNet, the blogosphere had become an important part of life for many users. “Where traditional media may lag in reportage or analysis of important events, LiveJournal, or blogging more generally, has come to serve a crucial social function in analysing, disseminating and influencing offline events.” For Russian users, this free and open self-publishing harkens back to the days of state censorship and clandestine distribution of banned texts. Nowadays, blog usage carries a less politically charged and dramatic meaning for the average user, but it nevertheless occupies a place in the lives of RuNet users that it does not hold for American users.

LiveJournal, the most popular blogging site on RuNet, was created by an American teen in 1999. In 2007, it was sold to a Moscow-based media company. Currently, Russian users generate 58.7% of its pageviews; American users, only 9.7%. In America, a more traditional (that is, “text-centric”) blog site fails to break the top 20 most popular sites (Wordpress.com comes in at 26). In Russia, LiveJournal is the 10th most popular site.

Given how widely text-based blogs are used on RuNet (where posts are often written in a style befitting magazine contributions), a study of language use, and of how it differs between blogs and a character-limited social networking site such as Twitter, will arguably produce more interesting results for Russian users than for any other global user base.

Our Project

Our project began with the hypothesis that a distinct stylistic difference would be found in the use of “non-standard language” (slang, “chat-speak”, swearing, etc.) between blog sites and Twitter. Blog sites, we believed, would contain fewer instances of such language than Twitter, where the writing style is more informal and further restricted by a 140-character limit.

This hypothesis seemed to be almost common sense. It is a conclusion that most would come to after glancing at blog posts alongside tweets. Blog posts are simply written in a longer-form, more professional style than tweets. Tweets, naturally, will contain more of this non-standard language. But trusting an impression and basing a conclusion on human intuition has often proven to be a poor method of analysis. So our project sought to confirm this with more certainty.
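One way to make the comparison concrete is to measure what share of the words in each source are non-standard. The following is only a minimal sketch of that idea, not the method used in the project: the word list NON_STANDARD, the helper names and the sample texts are all hypothetical, and a real study would need a carefully curated lexicon and a proper Russian tokenizer.

```python
import re

# Hypothetical lexicon of non-standard forms (assumption, for illustration only).
NON_STANDARD = {"лол", "имхо", "спс", "щас"}

# Runs of word characters; in Python 3 this covers Cyrillic letters as well.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def non_standard_rate(texts, lexicon=NON_STANDARD):
    """Return the fraction of tokens across all texts that appear in the lexicon."""
    total = 0
    hits = 0
    for text in texts:
        tokens = tokenize(text)
        total += len(tokens)
        hits += sum(1 for token in tokens if token in lexicon)
    # Avoid division by zero for an empty collection.
    return hits / total if total else 0.0


# Made-up sample data standing in for the two halves of a corpus.
blog_posts = ["Сегодня я хочу рассказать о новой книге."]
tweets = ["лол щас приду", "спс за ссылку"]

blog_rate = non_standard_rate(blog_posts)
tweet_rate = non_standard_rate(tweets)
```

Under the hypothesis, tweet_rate would come out higher than blog_rate. Normalizing by token count rather than comparing raw counts matters here, since blog posts are far longer than tweets.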

To do this, we obtained a corpus of Russian internet language to work with; our process for obtaining and managing it can be found here.