RuNet is a project for Computational Methods in the Humanities, a course at the University of Pittsburgh taught by Dr. Birnbaum. Currently developed by Jay Boehmer and Joe Petrich, it investigates how Russians use language on the internet. We do this by comparing two corpora, one of blog posts and one of tweets, and looking at how they differ in style, vocabulary, and register. To isolate the effect of domain, we control for other factors, including date and topic.

Recently, we compiled a list of non-standard words that occur in blogs and tweets. We then extracted the tweets that contain these words and marked them up in XML for retweets, hashtags, swear words, and date. This marked-up XML file can be found here.
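
As a rough illustration of the extraction and markup step described above, here is a minimal Python sketch. It is not the project's actual pipeline: the word lists, the element and attribute names (`tweet`, `swear`, `hashtag`, `date`, `retweet`), and the rule that a retweet starts with "RT " are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical word lists; the project's real lists are not reproduced here.
NONSTANDARD = {"щас", "чё"}
SWEARS = {"блин"}

# A token is either a hashtag or a plain word (\w is Unicode-aware in Python 3).
TOKEN = re.compile(r"#\w+|\w+")


def markup_tweet(text, date):
    """Return a <tweet> element, or None if the tweet has no non-standard word.

    Element and attribute names are assumptions, not the project's schema.
    """
    tokens = [m.group().lower() for m in TOKEN.finditer(text)]
    if not any(tok in NONSTANDARD for tok in tokens):
        return None  # tweet is not extracted

    # Assumption: a retweet is any tweet whose text begins with "RT ".
    is_retweet = "true" if text.startswith("RT ") else "false"
    tweet = ET.Element("tweet", {"date": date, "retweet": is_retweet})
    tweet.text = ""

    last = None  # most recently added child element
    pos = 0      # end of the last token that was wrapped in markup
    for m in TOKEN.finditer(text):
        tok = m.group()
        if tok.startswith("#"):
            tag = "hashtag"
        elif tok.lower() in SWEARS:
            tag = "swear"
        else:
            continue  # ordinary word: stays as plain text
        # Attach the plain text that precedes this token.
        between = text[pos:m.start()]
        if last is None:
            tweet.text += between
        else:
            last.tail = (last.tail or "") + between
        last = ET.SubElement(tweet, tag)
        last.text = tok
        pos = m.end()

    # Attach whatever plain text follows the final marked-up token.
    rest = text[pos:]
    if last is None:
        tweet.text += rest
    else:
        last.tail = (last.tail or "") + rest
    return tweet


def build_corpus(tweets):
    """Wrap every extracted tweet from (text, date) pairs in a <corpus> root."""
    root = ET.Element("corpus")
    for text, date in tweets:
        element = markup_tweet(text, date)
        if element is not None:
            root.append(element)
    return root
```

Calling `ET.tostring(build_corpus(pairs), encoding="unicode")` on a list of `(text, date)` pairs would serialize the result. Keeping the unmarked words as plain text inside each `tweet` element preserves the original wording, so the markup can be stripped again without loss.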