18147 tweets about the Toronto Mayoral election
Toronto Mayoral Election: @RobFordTeam vs @G_Smitherman vs @JPantalone
- Tweets with @RobFordTeam are in red.
- Tweets with @G_Smitherman are in purple.
- Tweets with @JPantalone are in teal.
- Tweets mentioning two or more are multicoloured.
- Tweets mentioning none of them are transparent.
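The legend above boils down to a small classification rule. Here is a minimal sketch of it in Python, assuming mentions are matched case-insensitively on the @handle; the function name `tweet_colour` and the regex are my own illustration, not the code behind the visualisation.

```python
import re

# Handles and colours taken from the legend; keys are lowercased for matching.
HANDLE_COLOURS = {
    "robfordteam": "red",
    "g_smitherman": "purple",
    "jpantalone": "teal",
}

# Assumption: a mention is "@" followed by letters, digits or underscores.
MENTION_RE = re.compile(r"@(\w+)")


def tweet_colour(text):
    """Return the legend colour for a single tweet's text."""
    mentioned = {m.lower() for m in MENTION_RE.findall(text)}
    hits = [handle for handle in HANDLE_COLOURS if handle in mentioned]
    if not hits:
        return "transparent"      # mentions none of the candidates
    if len(hits) == 1:
        return HANDLE_COLOURS[hits[0]]
    return "multicoloured"        # mentions two or more candidates
```

For example, `tweet_colour("@g_smitherman vs @JPantalone")` falls into the multicoloured bucket, while a tweet that only carries #voteTO stays transparent.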
Tweets were gathered by searching the Twitter API for:
#voteTO OR #RobFord OR #Smitherman OR #Pantalone OR “Rob Ford” OR “George Smitherman” OR “Joe Pantalone” OR @g_smitherman OR @robfordteam OR @jpantalone OR from:robfordteam OR from:g_smitherman OR from:jpantalone OR to:robfordteam OR to:g_smitherman OR to:jpantalone since:2010-10-24 until:2010-10-25
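A query string like this is easier to maintain when it is assembled from lists. The sketch below is one way to do that, under my own assumptions: `build_query` is a hypothetical helper, it uses straight double quotes around the full names, and the order of terms within each group may differ slightly from the string above.

```python
def build_query(hashtags, names, handles, since, until):
    """Join search terms with OR and append the date window."""
    terms = ["#" + tag for tag in hashtags]
    terms += ['"%s"' % name for name in names]
    # Each handle is searched as a mention, as a sender and as a recipient.
    for prefix in ("@", "from:", "to:"):
        terms += [prefix + handle for handle in handles]
    return " OR ".join(terms) + " since:%s until:%s" % (since, until)


query = build_query(
    hashtags=["voteTO", "RobFord", "Smitherman", "Pantalone"],
    names=["Rob Ford", "George Smitherman", "Joe Pantalone"],
    handles=["robfordteam", "g_smitherman", "jpantalone"],
    since="2010-10-24",
    until="2010-10-25",
)
```

Changing the date window or adding a candidate then means editing one list entry rather than a long string.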
First version. Just highlighted Ford and Smitherman in red and blue.
Toronto Mayoral Election: Rob Ford vs George Smitherman
- Tweets with @RobFordTeam are in red.
- Tweets with @G_Smitherman are in blue.
- Tweets mentioning both are in black.
Twitter API: to cache or not to cache? When, why and how?
How do Twitter-based apps access tweets going back several months? Do they keep the data in databases, or do they re-run the search each time a visitor makes a query?
I’m asking because most of my playing around with the Twitter API has focused on the query functionality. This is a great way to pull in data, but it is slow, and I believe it would be much quicker if I had that data stored in a CSV file or a database. So the question I’m asking myself is whether I should set up a cron job that bakes out a CSV or writes to a database once per day for the specified query strings. That would let me quickly pull in old tweets going back as far as I’d like. But I would still need to write another set of methods to handle incoming queries that are not already in the daily cron list.
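To make the database option concrete, here is a rough sketch of the storage half of such a daily job, using Python's built-in `sqlite3`. The table layout, the helper names (`open_store`, `store_tweets`, `load_tweets`) and the shape of each tweet (a dict with `id`, `created_at` and `text`) are all assumptions for illustration; the actual call to the Twitter API is left out.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the tweet store; ':memory:' is handy for testing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        " id INTEGER, query TEXT, created_at TEXT, text TEXT,"
        " PRIMARY KEY (id, query))"
    )
    return conn


def store_tweets(conn, query, tweets):
    """Save tweets for one query; rows already stored are skipped."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, query, created_at, text)"
            " VALUES (?, ?, ?, ?)",
            [(t["id"], query, t["created_at"], t["text"]) for t in tweets],
        )


def load_tweets(conn, query):
    """Return every stored tweet for a query, oldest id first."""
    rows = conn.execute(
        "SELECT id, created_at, text FROM tweets WHERE query = ? ORDER BY id",
        (query,),
    )
    return [{"id": r[0], "created_at": r[1], "text": r[2]} for r in rows]
```

Because of `INSERT OR IGNORE`, the cron job can safely re-fetch overlapping results each day without creating duplicates. Queries that are not on the daily list would still need a separate live-search path.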
In looking at the Design Patterns section of the Twitter API FAQ, I came across this suggestion for caching:
“We recommend that you cache API responses in your application or on your site if you expect high-volume usage. For example, don’t try to call the Twitter API on every page load of your hugely popular website. Instead, call our API once a minute and save the response on your end, displaying your cached version on your site.”
Maybe this is the answer, but at what point does the traffic and response time warrant this approach? Lots of questions!
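The pattern in the quoted FAQ advice can be sketched as a small time-based cache: call the API at most once per minute per query and serve the saved response in between. This is only an illustration under my own assumptions. `CachedQuery` is a made-up name, and `fetch` stands in for whatever function actually calls the Twitter API.

```python
import time


class CachedQuery:
    """Serve a saved response until it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=60, clock=time.monotonic):
        self._fetch = fetch    # hypothetical: function(query) -> API response
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so the cache can be tested
        self._cache = {}       # query -> (fetched_at, response)

    def get(self, query):
        now = self._clock()
        entry = self._cache.get(query)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]    # still fresh: no API call
        response = self._fetch(query)
        self._cache[query] = (now, response)
        return response
```

A cache like this also covers queries nobody has asked before: the first visitor pays for the slow API call and everyone within the next minute gets the saved copy. It keeps everything in memory, so it does not help with tweets going back several months; that still needs a persistent store.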