Final Reflection

Initially, my project was meant to examine governmental control over the internet by looking at the January revolution in Egypt. On Twitter, major events are generally tagged with what is called a hash tag: a keyword preceded by the hash character (#). Hash tags group tweets together by theme and allow searches for a particular event or meme. For the revolution in Egypt the most common hash tag was #jan25. My plan, then, was to gather tweets using that as my keyword.

I quickly ran into two problems. First, Twitter only gives access to a limited history of tweets. This meant that by the time I started my data collection in April, the tweets that were sent out live from the event were no longer accessible to me. Second, many of the tweets were in Arabic, which I do not read and which would not be easy for me to work with programmatically. So, my thought was to do this as a proof of concept and gather current tweets in English that were still dealing with the ongoing changes in Egypt.

My first plan for gathering tweets and creating a visualization was to use the Python programming language. It had been recommended to me in the past as a good learning language, and I had previously begun learning it. However, when I met with Don Craig from the DXARTS program, he recommended Processing as an easier, quicker way to go about this. Since it was another language I had heard was relatively simple to learn, I looked into it, quickly changed my mind about Python, and switched to Processing.

When I first looked into accessing the Twitter API, it appeared as though I could just use a web browser to gather data. I spent one lab session doing so, but it was awkward and time consuming. So, I went online and searched for ways to gather tweets directly from Processing. I found a program called Simple Processing Twitter by a woman who goes by Robotgrrl online. It is an implementation of the Twitter4J Java library for Processing and is quite simple and powerful. I was able to use her code to write a program that gathered tweets from Twitter as they were being published.

Later in the term the perfect event presented itself on which to try my program. It was announced on May 2nd that Osama bin Laden had been killed in Pakistan. This was a major event all over the world so Twitter was very active. I modified my program to gather 100 tweets with the keywords “bin laden” every half second. I ran it until Twitter timed me out and was able to gather roughly 73,000 tweets. Although in the scheme of things this is a relatively small sample, it is more than enough for the proof of concept I’m working on now.

After gathering the data I needed to write software to sort it and generate 3D coordinates. At first I couldn’t figure out the best way to do this. I started chatting with a programmer friend of mine, Greg Williams, and he suggested making each tweet into an object. This suggestion was what allowed me to do what I wanted to do. I created an object that stored the username, date, and content of the tweet separately, then also had space to store the coordinates. This allowed me to sort the tweets, analyze the keywords, and store the coordinates, all in one place.

As written now my program sorts the tweets alphabetically by username to generate the x coordinate, then by time to generate the y coordinate. The z coordinate is created by analyzing the tweets for common keywords from throughout the sample. I used the top 15 keywords and assigned them numerical values. Then, each time a keyword appears in the tweet, a variable called weight is increased by the associated value. Weight then becomes the z coordinate.
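
The weighting step described above could be sketched roughly as follows. This is an illustration in plain Java rather than the code from my sketch: the class name `KeywordWeight`, the method `weight`, and the keyword values are all assumptions made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the names and keyword values here are illustrative
// assumptions, not the code used in the project.
class KeywordWeight {

    // Add a keyword's value to the weight every time it appears as a whole word.
    static int weight(String content, Map<String, Integer> keywordValues) {
        int weight = 0;
        // Lower-case the tweet and split on anything that is not a letter or digit.
        for (String word : content.toLowerCase().split("[^a-z0-9]+")) {
            Integer value = keywordValues.get(word);
            if (value != null) {
                weight += value;
            }
        }
        return weight;
    }

    public static void main(String[] args) {
        Map<String, Integer> keywords = new LinkedHashMap<String, Integer>();
        keywords.put("obama", 3);    // made-up values for illustration
        keywords.put("pakistan", 2);
        // The result would become the tweet's z coordinate.
        System.out.println(weight("Obama says bin Laden killed in Pakistan", keywords));
    }
}
```

A tweet that contains none of the keywords gets a weight of 0, which is the behavior discussed in the next paragraph.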

The downside to this method of creating the z coordinate is that many tweets do not contain any of the top 15 keywords. This means that out of my 100 test tweets almost all are assigned 0 as the z coordinate. I expect this to change somewhat when all 73,000 are included, but I may need to either change how the coordinate is assigned or increase the number of keywords used. Somewhere between 50 and 100 may make the visualization more interesting. When getting into that large of a list, though, I’ll probably need to create a function to load them from a file rather than hard code them into the software. This will also make the program more flexible when doing other events or memes.
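
Loading the keywords from a file might look something like the sketch below. It assumes a hypothetical file layout of one `keyword,value` pair per line, and the names `KeywordLoader` and `parse` are invented for the example. It takes an array of lines so that, in Processing, it could be fed the result of `loadStrings()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assumes each line looks like "keyword,value".
class KeywordLoader {

    // Turn lines of "keyword,value" into a keyword -> value map.
    // Blank lines and lines that do not fit the assumed format are skipped.
    static Map<String, Integer> parse(String[] lines) {
        Map<String, Integer> keywords = new LinkedHashMap<String, Integer>();
        for (String line : lines) {
            String[] parts = line.split(",");
            if (parts.length != 2) {
                continue;
            }
            try {
                int value = Integer.parseInt(parts[1].trim());
                keywords.put(parts[0].trim().toLowerCase(), value);
            } catch (NumberFormatException e) {
                // The value was not a whole number; ignore this line.
            }
        }
        return keywords;
    }
}
```

With something like this in place, changing the keyword list for a different event would mean editing a text file rather than the program.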

The major hurdle I faced in doing this project was learning to program in Processing. It has been six or seven years since I’ve done any programming (beyond making other people’s JavaScript work in my webpages), so refreshing myself on the concepts took a bit. I had also never done any object-oriented programming, so learning about objects and how to write them was a fun challenge. As with any new skill, the more I did the better I got. I feel now that I have a solid foundation on which to improve my skills and I fully intend to do so in the future. As Processing is based on the Java programming language, moving to the more robust and powerful Java should be relatively easy. I’m excited for this new challenge.

Currently the visualization I’ve created only has 100 points. Also, because it is just points I’m not sure I can export it as a DXF file as I had planned. It looks from the Processing documentation that I need a triangle-based image to do that. I plan to experiment anyway because I need some method to translate these points into a file format that can be read by 3D imaging software. This is the immediate next step in my project.
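
One possible route, offered only as an untested idea, is to skip DXF and write the points as Wavefront OBJ vertex lines (`v x y z`), a plain-text format that many 3D programs can read. The sketch below assumes rows in the `x,y,z,...` layout my coordinate file uses; the names `PointExport` and `toObjVertices` are made up, and whether a particular importer accepts a file containing only vertices would need to be checked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts coordinate rows into OBJ vertex lines.
class PointExport {

    // Each input row is expected to start with "x,y,z"; extra fields are ignored
    // and rows with fewer than three fields are skipped.
    static String[] toObjVertices(String[] rows) {
        List<String> vertices = new ArrayList<String>();
        for (String row : rows) {
            String[] fields = row.split(",");
            if (fields.length < 3) {
                continue;
            }
            vertices.add("v " + fields[0].trim() + " " + fields[1].trim() + " " + fields[2].trim());
        }
        return vertices.toArray(new String[0]);
    }
}
```

In Processing the resulting array could be written out with `saveStrings()`, the same call used for the CSV file.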

After figuring out how to export the file, I’ll move it into Blender, the open source 3D imaging software I have. I should be able to create a solid piece based on the points. Once that is complete I’ll be able to send it out for printing and get back a physical artifact that represents this event as seen on Twitter.

Ultimately I’d like to do this for multiple events and even memes on Twitter. I’d also like to start experimenting with ways to color code the pieces so they are not just plain plastic.

I’d also like to rewrite the software so it groups tweets by network rather than alphabetically by username. So, if one user retweets another, their tweets will be near each other. I think this will result in much more interesting points than simply sorting alphabetically. I thank my friend Dave Proctor for this suggestion.
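
A first step toward that grouping could be a sort key that places a retweet with the user being retweeted. The sketch below is an assumption about how this might work, not existing code: it only recognizes tweets that begin with the conventional `RT @username` prefix, and the names `RetweetGroup` and `groupKey` are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: picks the name a tweet should be grouped under.
class RetweetGroup {

    // Matches a leading "RT @username" and captures the username.
    private static final Pattern RETWEET =
        Pattern.compile("^\\s*RT\\s+@(\\w+)", Pattern.CASE_INSENSITIVE);

    // Retweets are grouped under the original author; everything else
    // is grouped under the tweet's own author.
    static String groupKey(String username, String content) {
        Matcher matcher = RETWEET.matcher(content);
        String key = matcher.find() ? matcher.group(1) : username;
        return key.toLowerCase();
    }
}
```

Sorting on this key instead of the raw username would put a user's tweets next to the retweets of them, though it would not capture replies or mentions.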

“Java™ 2 Platform, Standard Edition, v 1.4.2 API Specification”, http://download.oracle.com/javase/1.4.2/docs/api/overview-summary.html

“Processing Forum”, accessed May 25th, 2011, http://forum.processing.org
This forum is open to anyone using Processing and is a great resource for finding answers to programming problems not covered in the documentation. If a question hasn’t been asked already, one can post one’s problem and get input from programmers from around the world.

“Java Language Specification 2nd Edition”, accessed May 25th, 2011, http://java.sun.com/docs/books/jls/second_edition/html/jTOC.doc.html
This is a resource covering the nuts and bolts of programming in Java. Very useful for advanced users of Processing wanting to go beyond the documentation on processing.org.

“Object Oriented Programming Tutorial”, accessed May 25th, 2011, http://processing.org/learning/objects/
This outlined the basics of what an object is, how it can be used, and how to create one. It was essential to writing my own object.

“Java Notes”, accessed May 25th, 2011, http://leepoint.net/notes-java/index.html
This is a tutorial for the Java programming language. Useful for basic to advanced concepts.

“Java 2s”, accessed May 25th, 2011, http://www.java2s.com/
This is a tutorial website for many programming languages. It includes a great Java tutorial as well as the documentation for the language.

“Twitter4j”, accessed May 25th, 2011, http://twitter4j.org/en/index.html
Twitter4j is the Java library that allows access to the Twitter API. It is what Robotgrrl’s Simple Processing Twitter is based on.

“Simple Processing Twitter”, accessed May 25th, 2011, http://robotgrrl.com/blog/2011/02/21/simple-processing-twitter/
This code allows direct access to the Twitter API from within a Processing program.

Shiffman, Daniel. 2008. Learning Processing: a beginner’s guide to programming images, animation, and interaction. Amsterdam: Morgan Kaufmann/Elsevier.
This is a step by step guide to learning Processing. It starts with very basic concepts like drawing shapes and lines, then moves on to more advanced aspects of the language.

Posted in Uncategorized | Leave a comment

Tweets sorted and coordinates written to file

Somehow the post I wrote Thursday didn’t make it on here. Thursday morning I finished writing the code to sort all the tweets by username and date/time and write the resulting coordinates to a file. Here’s the final code:

import java.util.*; //Collections and Comparator; later Processing releases import less of java.util by default

Tweet tweet;
int l;
String[] lines;
int index = 0;
ArrayList<Tweet> tweets = new ArrayList<Tweet>();
String[] cordArray;
int xcord = 0;
int ycord = 0;
int zcord = 1;
String xcordString;
String ycordString;
String zcordString;
String xyzString;

void setup(){
  lines = loadStrings("tweets0.txt");
}

void draw(){
  //Load the tweets into Tweet objects within the tweets ArrayList
  for(int index = 0; index < lines.length; index++){
    String[] pieces = split(lines[index], '~');
    if(pieces.length == 3){
      tweet = new Tweet(index, 0, 0, 1, pieces[0], pieces[1], pieces[2]);
      tweets.add(tweet);
    }
  }

  /*println(tweets.size());
  for(int i = 0; i < tweets.size(); i++){
    println(tweets.get(i).username);
  }*/

  //Sort the tweets by username
  Collections.sort(tweets, new UserComparator());
  for(int l = 0; l < tweets.size(); l++){
    println(tweets.get(l).username);
  }

  //Write the xcord to each tweet
  for(int x = 0; x < tweets.size(); x++){
    tweets.get(x).xcord = x * 100 + 1;
  }

  //Sort the tweets by date
  Collections.sort(tweets, new DateComparator());
  for(int l = 0; l < tweets.size(); l++){
    println(tweets.get(l).date);
  }

  //Write the ycord to each tweet
  for(int y = 0; y < tweets.size(); y++){
    tweets.get(y).ycord = y * 100 + 1;
  }

  /*Write zcord to each tweet
  for(int z = 0; z < tweets.size(); z++){
    tweets.get(z).zcord = z + 1;
  }*/

  //Size the output array to the number of tweets actually loaded
  cordArray = new String[tweets.size()];

  //Write x y and z cords to cords array as strings
  for(int cords = 0; cords < tweets.size(); cords++){
    xcord = tweets.get(cords).xcord;
    xcordString = Integer.toString(xcord);
    ycord = tweets.get(cords).ycord;
    ycordString = Integer.toString(ycord);
    zcord = tweets.get(cords).zcord;
    zcordString = Integer.toString(zcord);
    xyzString = xcordString + "," + ycordString + "," + zcordString + ",20";
    cordArray[cords] = xyzString;
  }

  //Write the coordinates to a text file
  saveStrings("tweets.csv", cordArray);
  noLoop();
}

class Tweet {
  int id;
  int xcord;
  int ycord;
  int zcord;
  String username;
  String date;
  String content;

  Tweet(int tempid, int tempxcord, int tempycord, int tempzcord, String tempuser, String tempdate, String tempcontent) {
    id = tempid;
    xcord = tempxcord;
    ycord = tempycord;
    zcord = tempzcord;
    username = tempuser;
    date = tempdate;
    content = tempcontent;
  }

  String getUsr(){
    return username;
  }

  String getDate() {
    return date;
  }

  String getCont(){
    return content;
  }
}

//Orders tweets alphabetically by username
class UserComparator implements Comparator {
  int compare(Object o1, Object o2) {
    String user1 = ((Tweet) o1).getUsr();
    String user2 = ((Tweet) o2).getUsr();
    return user1.compareTo(user2);
  }
}

//Orders tweets by comparing the date strings as text; this is only
//chronological if the stored date format sorts the same way as time does
class DateComparator implements Comparator {
  int compare(Object o1, Object o2) {
    String date1 = ((Tweet) o1).getDate();
    String date2 = ((Tweet) o2).getDate();
    return date1.compareTo(date2);
  }
}

2D image generated

I have successfully generated a 2D image of 100 of my tweets sorted based on username and date! I used the same visualization tool used to create the video for the band Radiohead’s song House of Cards.

Tweets stored in objects

I was struggling with how to break the tweets apart and sort them in order to assign coordinates in 3D space. I ran the problem by my friend Greg, a professional programmer, and he recommended creating a class so that each tweet could be stored in an object. So, I took his advice and created a Tweet object in Processing with properties for username, date, and content. I was able to store each tweet as an object within an array. Next I had to write a Comparator class to use the Collections.sort() method. Now to write the comparator for the date and figure out how I’m going to analyze the content for keywords.

Preliminary results of keyword analysis

I’ve begun using AntConc to analyze the tweets I gathered last weekend. The top words are “bin” and “laden” (predictably); however, “bin” is listed only 73,065 times and “laden” 72,577 times. This indicates to me that either there is something odd going on with the keyword analysis or the search function performs an OR search rather than an AND search. The next most common string is “rt”, which indicates there are 48,739 retweets in the dataset. “Osama” is the third most common word at 37,014. There were at least 28,838 links posted in the dataset (as indicated by the occurrences of “http”), and the most common link shortener was bit.ly, with 6,502 occurrences. However, as “http” occurs together with the string “bit” only 6,500 times, it is likely that some links were posted without the requisite “http” before them.
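
The AND-versus-OR question could also be checked directly by counting how many tweets contain both words, only one of them, or neither. The following is a minimal sketch of that idea in plain Java; the names `SearchCheck` and `countMatches` are assumptions, and it matches whole words only, so “cabin” does not count as “bin”.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: tallies how the two search words co-occur.
class SearchCheck {

    // Returns counts in the order {both, only first, only second, neither}.
    static int[] countMatches(String[] contents, String first, String second) {
        int[] counts = new int[4];
        for (String content : contents) {
            // Split into lower-case words so matching ignores case and punctuation.
            List<String> words = Arrays.asList(content.toLowerCase().split("[^a-z0-9]+"));
            boolean hasFirst = words.contains(first.toLowerCase());
            boolean hasSecond = words.contains(second.toLowerCase());
            if (hasFirst && hasSecond) {
                counts[0]++;
            } else if (hasFirst) {
                counts[1]++;
            } else if (hasSecond) {
                counts[2]++;
            } else {
                counts[3]++;
            }
        }
        return counts;
    }
}
```

If the search really were an AND search, the last three counts would be expected to be close to zero.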

One limitation of AntConc is that it excludes numbers and all punctuation. This means that all the tweets with “May 1st” make “st” show up as a word. It also means that “don’t” becomes “don” and “t”. I will have to look at the documentation and see if there is a way around this.
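
If no setting turns out to help, a small word counter that keeps digits and apostrophes inside words would avoid both problems. The sketch below shows one way that might be done; it is not how AntConc works, and the names `WordCounter` and `count` are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts words without dropping digits or apostrophes.
class WordCounter {

    // A word is a run of letters or digits, optionally joined by apostrophes.
    private static final Pattern WORD = Pattern.compile("[a-z0-9]+(?:'[a-z0-9]+)*");

    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        // Lower-case the text and treat curly apostrophes like straight ones.
        String normalized = text.toLowerCase().replace('\u2019', '\'');
        Matcher matcher = WORD.matcher(normalized);
        while (matcher.find()) {
            String word = matcher.group();
            Integer seen = counts.get(word);
            counts.put(word, seen == null ? 1 : seen + 1);
        }
        return counts;
    }
}
```

With this pattern “1st” and “don't” are each counted as single words, so “st” and “t” no longer appear on their own.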

Mid Term Report

This project has changed significantly over the last five weeks. Initially I grew interested in analyzing Twitter data because of issues of censorship in many countries around the world. I thought looking at the January revolution in Egypt would be an interesting way to do so, especially since internet access was cut off at one point during the event. By gathering tweets from that period, I thought I could get a good view of what was going on and how it changed as the government tried to censor the citizenry, and so examine issues of censorship in Egypt.

As the term progressed I realized that I was too far out from the event to gather tweets from that period easily. The Twitter API only allows access so far back in the public timeline. So, I started by gathering more current tweets about Egypt. These are currently available on the data page of this site. Many of these tweets turned out to be in Arabic, which makes working with them in Processing difficult, as it does not seem to support Arabic text by default.

Last Sunday my project took a very distinct turn. Osama bin Laden was killed and Twitter exploded talking about it. I had already been working with Processing to gather tweets as an alternative to accessing the API via a web browser (as I did initially). I modified Robotgrrl’s Simple Processing Twitter program to gather more tweets, limit them to English, and then write the username, date, and tweet to a text file. So, I was part way to gathering mass data from Twitter programmatically.

I did not think to start my program running during President Obama’s address to the nation. Immediately afterward, however, I set to work gathering data. I rewrote my program to access the Twitter timeline a total of 1000 times with a half second pause between each call. This was done with a simple for loop. I was able to gather a total of 73,200 tweets. Unfortunately Twitter timed out my access before the full 1000 cycles of data collection, so I was unable to obtain the full 100,000. However, I now have plenty of data to analyze. Having this current event to analyze makes this project much more timely, even if it is a divergence from the original plan.
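
The shape of that collection loop can be sketched as below. This is not my actual sketch, which talks to Twitter through Twitter4J: the `TweetSource` interface, the `FakeSource` stand-in, and the `Collector` class are all hypothetical names used so the loop can be shown without any real API calls or account keys.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for whatever actually fetches tweets.
interface TweetSource {
    List<String> fetch(int count) throws Exception;
}

// Fake source that returns full batches until it "times out".
class FakeSource implements TweetSource {
    private final int callsBeforeTimeout;
    private int calls = 0;

    FakeSource(int callsBeforeTimeout) {
        this.callsBeforeTimeout = callsBeforeTimeout;
    }

    public List<String> fetch(int count) throws Exception {
        if (calls >= callsBeforeTimeout) {
            throw new Exception("access timed out");
        }
        calls++;
        List<String> batch = new ArrayList<String>();
        for (int i = 0; i < count; i++) {
            batch.add("tweet " + calls + "-" + i);
        }
        return batch;
    }
}

class Collector {
    // Ask for perCall tweets up to cycles times, pausing between calls,
    // and stop early as soon as the source fails.
    static List<String> collect(TweetSource source, int cycles, int perCall, long pauseMillis) {
        List<String> all = new ArrayList<String>();
        for (int cycle = 0; cycle < cycles; cycle++) {
            try {
                all.addAll(source.fetch(perCall));
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            } catch (Exception e) {
                break; // e.g. the service cut off access
            }
        }
        return all;
    }
}
```

As a check on the arithmetic, 1000 cycles of 100 tweets would give 100,000 tweets; if every call had returned a full 100, a total of 73,200 would correspond to a cutoff after 732 cycles, which is what `Collector.collect(new FakeSource(732), 1000, 100, 500)` simulates.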

My next step in this process is to begin keyword analysis of the event. After my presentation on Tuesday, Stacy suggested some tools for conducting keyword analysis that will not require me to write a program from scratch in Processing. I will be looking at AntConc tomorrow and exploring its features. Hopefully it will output in a format that Processing is able to work with, perhaps as an array or hashmap.

After doing keyword analysis I want to work on a way to turn the data generated into a three dimensional map of the event. Ultimately I would like to develop a means to create the 3D image and print it in 3D. Having a physical representation of events in the world based on the immediate reaction of thousands of people around the world could be a powerful way to examine events. This has turned into as much of an art project as a research project and ultimately I would like to create enough of these to be part of a gallery showing.

New data gathered

With the announcement of Osama Bin Laden’s death I had a great opportunity to gather data about an event in real time. I modified my program to capture 100 tweets every half second and came up with 73,200 tweets to work with. They are stored in structured text files so I can easily break them into components later. It’ll be interesting to see what I can do with them.
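
As an illustration of what “structured” means here, each tweet can be written as one line with the username, date, and content separated by a tilde, the delimiter my sorting code splits on. The sketch below is a hypothetical helper, not the code from my sketch; the names `TweetRecord`, `format`, and `parse` are made up.

```java
// Hypothetical helper: one tweet per line as username~date~content.
class TweetRecord {

    // Build one line; line breaks inside the tweet are replaced with spaces
    // so that every record stays on a single line.
    static String format(String username, String date, String content) {
        String oneLine = content.replace('\n', ' ').replace('\r', ' ');
        return username + "~" + date + "~" + oneLine;
    }

    // Split a line back into {username, date, content}, or return null
    // if the line does not have all three parts.
    static String[] parse(String line) {
        // A limit of 3 keeps any tildes inside the tweet text in the content.
        String[] pieces = line.split("~", 3);
        return pieces.length == 3 ? pieces : null;
    }
}
```

Splitting with a limit of three means a tweet whose text contains a tilde is still read back as three parts, at the cost of assuming usernames and dates never contain one.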

Problems with Arabic

I tried gathering tweets with the hash tag #jan25 using my write-to-file sketch. Unfortunately Processing does not deal with Arabic well, and the Arabic text came out as garbage characters. Another hurdle to jump.

Success writing to file

I have successfully written the results of a search made with Robotgrrl’s Simple Processing Twitter to a text file. I can’t post the Processing sketch online because it contains the keys to use my personal Twitter account with it, but I will post the text file and an explanation of how I did it. Check out the Processing Sketches page of the website to see.

PHP

Our class sessions with Arthur Lee have inspired me to rewrite part of my website in PHP. I’ll be reworking the background page to become a regularly updated list of articles related to Egypt scraped from major news sites (probably NPR and the New York Times).
