When we began looking at the Russian Internet, we were particularly interested in the differences between the use of Russian on Twitter and on traditional blogging platforms. As discussed on our about page and corpus page, the majority of Russian blogging occurs on LiveJournal, but all of our blog posts came from sites popular in America: WordPress and Blogger. Because of this, we doubted that the blog posts in our corpus were representative of Russian blogging. This wariness was validated when we explored further and found that the majority of the blog posts were miscategorized and, furthermore, that every blog post containing non-standard language was among them.

The blog posts on these fringe blogging sites used non-standard language neither regularly nor heavily, which does support our hypothesis that blogs are treated as a more standard and traditional form of communication than microblogging platforms like Twitter. However, a wider and deeper analysis of Russian-language blog posts from the more popular LiveJournal site would be required to support this hypothesis substantially. The following table, which shows the most common non-standard words used in the Russian blog posts and on Twitter, makes the preliminary difference in register that we found easy to see.

| Twitter word | Gloss | Blog word | Gloss |
|---|---|---|---|
| твиттере | Twitter | блоге | blog |
| щас | now | гороскоп | horoscope |
| пиздец | swear | цветочки | flower, crafts |
| о_о | emoticon | вышивать | embroider |
| твит | tweet | фотки | photos |
| вконтакте | "v kontakte" - in contact | конфетку | "sweet" |
| бля / блять | swear | обожаю | I adore |
| обожаю | I adore | вязать | knit |

After exploring the differences between the blog posts in our corpus and the tweets, we decided to turn our attention to Twitter itself.

Over the time period our data covered, Twitter’s Russian userbase grew rapidly, and the tweet count spiked as a result. Even after accounting for this growth (by calculating our data as percentages rather than as raw counts of word usage), we saw usage of slang and swearing rise steeply and steadily over this time.
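To illustrate why percentages are needed once the tweet count spikes, here is a minimal Python sketch. It is not the code we used: the function name `usage_share`, the toy tweets, and the choice of total tokens per period as the denominator are all assumptions made for this example.

```python
from collections import Counter

def usage_share(tweets, marked_words):
    """Return {period: percentage of tokens in that period that are marked words}.

    `tweets` is an iterable of (period, text) pairs. Dividing by the total
    number of tokens per period is an assumption made for this sketch.
    """
    totals = Counter()  # all tokens seen in each period
    hits = Counter()    # tokens in each period that belong to marked_words
    for period, text in tweets:
        tokens = text.lower().split()
        totals[period] += len(tokens)
        hits[period] += sum(1 for token in tokens if token in marked_words)
    return {p: 100.0 * hits[p] / totals[p] for p in totals if totals[p]}

# Toy data (invented): the second period has twice as many tweets.
sample = [
    ("2009", "щас буду дома"),
    ("2010", "щас щас иду"),
    ("2010", "это новый твит"),
]
shares = usage_share(sample, {"щас"})
```

In this toy data the raw count of "щас" doubles from one period to the next, but its share of all tokens stays the same, so a rise measured in percentages cannot be explained by the larger number of tweets alone.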

What this means is not exactly clear. However, our hypothesis is that as Twitter’s user count increases and users begin interacting more (shown through the sharp rise in usage of retweets and hashtags), the language of the internet and the language of everyday life blend more and more.

Words we marked as “internet language” rose in usage over this time, but not nearly to the extent that slang and swear words did. This supports our hypothesis, and also suggests another: that as Twitter became more popular, it reached a wider audience. This new audience, in contrast to RuNet’s “early adopters,” employed “common slang” and swearing rather than internet slang.

These trends clearly deserve more exploration so that their causes and significance can be examined fully. To this end, we have created an interactive graph for exploring the use of different words in the Twitter corpus; it can be found on our graph page. We have also provided the corpus itself for study and manipulation, marked up in XML so that it is easier to handle and more useful than it was in its original raw form. That file can be found here.
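As a starting point for working with the XML file, the following Python sketch counts marked-up words by category using the standard library's `xml.etree.ElementTree`. The element and attribute names (`corpus`, `tweet`, `w`, `type`, `date`) and the sample content are hypothetical placeholders; check the actual file and adjust them to match its markup.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup, invented for this sketch; the real corpus may use
# different element and attribute names.
SAMPLE = """<corpus>
  <tweet date="2010-03"><w type="slang">щас</w> иду</tweet>
  <tweet date="2010-04">новый <w type="internet">твит</w></tweet>
</corpus>"""

def count_marked_words(xml_text):
    """Count each (category, word) pair found in <w> elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for word in root.iter("w"):
        counts[(word.get("type"), word.text)] += 1
    return counts

counts = count_marked_words(SAMPLE)
```

Counts gathered this way can then be converted to percentages per time period before periods of different sizes are compared.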