
Uncovering affinity of artists to multiple genres from social behaviour data

Posted on August 14, 2008

Claudio Baccigalupo and I have a paper at ISMIR entitled Uncovering affinity of artists to multiple genres from social behaviour data.  The paper details a project we worked on for the past year or so involving popular music listening activity from a pool of MusicStrands (MyStrands) users.

We provide not only the paper, but also the dataset and the code used in our analysis.  All of this is available at the website we have set up for the project.

The main contribution of the project is an analysis and illustration of genres as “fuzzy sets” rather than boolean labels.  Through a co-occurrence analysis of hundreds of thousands of user playlists, a frequency-based “affinity” metric is formed between artists and genres.  This affinity metric is a more detailed expression of the style of a given artist’s music.  An awareness of the predominant genres is a basic part of any person’s understanding of the vast corpus of popular music.  However, genres are typically used as boolean categorical labels; that is, an artist is understood to be associated with only one given genre.

By expressing a connection to multiple genres through our affinity metric, a more detailed picture of the artist emerges.  We give a lot more examples on the website, so be sure to check it out.
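To make the idea of a frequency-based affinity a little more concrete, here is a minimal Python sketch. It is not the metric from the paper: it simply assumes that each artist carries one conventional genre label and defines affinity as the share of an artist's playlist co-occurrences that fall on artists of each genre. The function name, the artist names, and the toy playlists are all invented for illustration.

```python
from collections import Counter, defaultdict


def genre_affinity(playlists, primary_genre):
    """Return {artist: {genre: affinity}}, with affinities summing to 1 per artist.

    Assumption (not the paper's definition): an artist's affinity to a genre
    is the fraction of its playlist co-occurrences that involve artists
    conventionally labelled with that genre.
    """
    counts = defaultdict(Counter)
    for playlist in playlists:
        artists = set(playlist)  # count each artist once per playlist
        for artist in artists:
            for other in artists:
                if other != artist and other in primary_genre:
                    counts[artist][primary_genre[other]] += 1

    affinity = {}
    for artist, genre_counts in counts.items():
        total = sum(genre_counts.values())
        affinity[artist] = {g: c / total for g, c in genre_counts.items()}
    return affinity


# Toy data, invented for illustration only.
playlists = [
    ["artist_a", "artist_b", "artist_c"],
    ["artist_a", "artist_c", "artist_d"],
    ["artist_a", "artist_b"],
]
primary_genre = {
    "artist_a": "Country",
    "artist_b": "Country",
    "artist_c": "Folk",
    "artist_d": "Rock",
}

affinity = genre_affinity(playlists, primary_genre)
print(affinity["artist_a"])
```

In this toy example artist_a is labelled Country, yet its co-occurrences spread over Country, Folk, and Rock, which is the kind of multi-genre picture a single boolean label hides.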

RJDBC to the Rescue

Posted on August 4, 2008

I have been using the statistical language R for some time now to do some of my intensive off-line calculations.  The great things about R are that it’s fast, well documented, extensive, and open source.

However, I regularly run into problems configuring and connecting to Oracle databases, mainly because the APIs are extremely obtuse, the code for these systems is closed, and licences are extremely expensive.  Even the ODBC drivers (a universal database connectivity layer) are pretty difficult to find (at least for my OS X platform).  The whole process for getting everything working is usually pretty convoluted.

However, the RJDBC package for R simplifies everything by using a Java database connector.  Rather than setting up ODBC profiles, downloading drivers, etc., everything can be handled in R by loading the package, pointing it at the JDBC driver class and the jar file that contains it (Oracle class package available for the mac here), and formatting a single connection string, like so:

library(RJDBC)

# point the R package at the appropriate Java driver class, as well as the jar file that contains it
drv <- JDBC("oracle.jdbc.driver.OracleDriver", "/location/of/ojdbc14.jar")
# the "easy" connection string spec for Oracle DBs
conn <- dbConnect(drv, "jdbc:oracle:thin:@<host>:<port default=1521>:<SID>", "username", "password")

# etc…
# A_TABLE is a placeholder; substitute a real table name
query <- "SELECT * FROM A_TABLE"
dbGetQuery(conn, query)

And that’s it!  Maybe this will save someone else some time and headaches. Arguably, the use of an additional Java layer may negatively affect performance in some cases, but in my case it is not noticeable.

Literate Versus Spatial Search

Posted on July 21, 2008

One of the major problems facing my research in interactive search visualization is the difference in search strategies for list-based layouts versus other approaches. List-based approaches dominate current methods of searching the Web for information. Anyone who’s used Google to find something online is familiar with the standard “10 relevant results on a page, in a list, with title and ‘snippet’ information”.

However, there’s an interesting tension between the strategies involved in presenting information to users in this fashion. There are the TF/IDF and “PageRank” strategies that Google uses to filter and order (resp.) the relevant pages for each query. These are essentially two different information “spaces” that are used to relate documents to a query. TF/IDF essentially helps to match a set of documents that are “semantically close” to a query, while PageRank helps to identify pages that are close to the “center” of a network of linked pages (hypothetically, these “central” pages have higher quality information).
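A toy Python sketch of this two-step “filter, then order” idea follows. It is only an illustration under simplifying assumptions, not a description of what Google actually runs: TF/IDF here is a basic term-frequency times log inverse-document-frequency sum, PageRank is plain power iteration with a damping factor of 0.85, every link target is assumed to appear as a key in the link map, and the documents and links are invented.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, docs):
    """Score each document against the query with a basic TF-IDF sum."""
    n_docs = len(docs)
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    scores = {}
    for name, tokens in tokenized.items():
        term_counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            doc_freq = sum(1 for t in tokenized.values() if term in t)
            if doc_freq == 0 or not tokens:
                continue  # term appears nowhere, or the document is empty
            tf = term_counts[term] / len(tokens)
            idf = math.log(n_docs / doc_freq)
            score += tf * idf
        scores[name] = score
    return scores


def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # dangling page: spread its rank evenly over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Invented toy corpus and link graph.
docs = {
    "a": "jazz guitar lessons",
    "b": "jazz history and jazz records",
    "c": "bread recipes",
}
links = {"a": ["b"], "b": ["a"], "c": ["b"]}

scores = tf_idf_scores(["jazz"], docs)
ranks = pagerank(links)

# Step 1: TF/IDF filters to documents that match the query at all.
matches = [name for name, score in scores.items() if score > 0]
# Step 2: PageRank orders the matches by how "central" they are.
ranked = sorted(matches, key=lambda name: ranks[name], reverse=True)
print(ranked)
```

The point of the sketch is that the two spaces are independent: page “c” never makes the list because it does not match the query, while the order of “a” and “b” is decided purely by the link structure, not by how well each matches the query.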

In a sense, PageRank is a sort of filter or lens that you can’t turn off. You can’t really tell how much PageRank is affecting the order of items presented, and you also can’t tell if Google has “enhanced” a query by tailoring its results to a more “useful” set of documents.

In fact, most people probably don’t even spend much time following the results exactly as they are arranged. Instead, they quickly scan the list of titles and summaries looking for more keywords that suggest the kind of content they are interested in. In this sense, Google doesn’t even need to provide a perfectly valid semantic or document-link space as long as the user gets something useful in the top 10 results. This method of search is really more about reading, and is therefore a literate form of search.

However, this isn’t typically the way that we find things in the real world. Generally, we do so by using our senses to size up our environment. Is there a good smell in the air? Is there a group of people moving in a certain direction? Maybe there’s something interesting in that direction? All of these strategies involve notions of proximity/approximation, social awareness, and direction (resp.).

The strategies involve not knowing explicitly what you want or where you want to go, but gradually manipulating your environment until you identify a source or local optimum of the resource in the surroundings. In order to maximize your efficiency in this environment, you want to have control over how you can move through these spaces.

In the Google example, this may mean going “against” the PageRank position, or only operating in a certain “radius” from the center of the network of document links. Perhaps you don’t want to find something “too popular”, or are looking for something new. Maybe you want to find other complementary topics or content for the document space you’re already familiar with?

Many of these strategies involve more control over the information spaces than the search portals provide. In addition, they also involve the use of human generated meta-data to a greater extent. Sometimes you want to follow the herd, but sometimes it makes sense not to.

I’ve been interested in how to handle this scenario with a recent paper on “Visualizing Social Links in Exploratory Search” (here’s my author’s copy).

One of the problems I had was how to overcome the familiarity with and reliance on literate search strategies in a spatial interface. In general, if I use the two dimensions of an applet to display document candidates, it often becomes impossible to lay out the “snippets” of text necessary to describe the documents in greater detail. We handled this situation by essentially removing these snippets from both the simple “list” interface and the “map”-like interface. The interfaces using maps fared slightly better than the list in this study, but none of them were particularly well liked due to the absence of snippet text.

So… the question is, how do we overcome or accommodate literate search expectations in a spatial environment? Tag clouds? Force-directed snippet placement? Abbreviations? It seems like there should be a way to do it.

The Internet is Watching You

Posted on July 10, 2008

This is one of those issues that I think is very scary for American citizens, but that (still) isn’t making much press.

Essentially, this means that we are moving more towards an Orwellian “Big Brother” state, all in the name of protection from terrorism.

Originally, the government was only supposed to be monitoring international phone calls.  However, it was in fact monitoring every communication, both phone and internet, and both domestic and international.

Keep in mind that nearly anybody can be targeted directly through this process.  The “six degrees of Kevin Bacon” effect is pretty much true for our society as well, although probably at a slightly higher degree.

There’s a surprisingly small chain of acquaintances you need to follow to link yourself to nearly everyone else in the country or planet.  If one of those individuals is targeted for investigation, you will most likely have all of your information pulled and filtered as well. Keep in mind the government has already lied to us once about the extent and nature of the information it looks at.

Some people may not have a problem with this, but keep in mind that the guy in the ABC documentary was able to walk in, without security clearance, and gather information on this illegal setup in the AT&T building.  Others may be able to do the same, and unscrupulous employees with access to the actual machines may be able to as well.  So, even if you say you’re not doing “anything wrong”, you’re still at risk from a breach of information.

Others might say “I’m not important enough for criminals to target”… but keep in mind that other people are.  Politicians, judges, CEOs, etc.  It’s not a question of if, it’s a question of when this power will be abused.  Information about or against certain people is worth much more than the lifelong salary of an NSA employee or security guard.  The blackmail or abuse of information for such people will have a far-reaching negative impact for all others.

Furthermore, the government has just recently enacted a FISA amendment which effectively eliminates a good deal of the legal responsibility of phone companies. Here’s the actual bill, and here’s a Wikipedia entry describing it.

Here’s the relevant section (802A):

[A] civil action may not lie or be maintained in a Federal or State court against any person for providing assistance to an element of the intelligence community, and shall be properly dismissed, if the Attorney General certifies to the district court of the United States in which such action is pending that . . . (4) the assistance alleged to have been provided . . . was –

(A) in connection with intelligence activity involving communications that was (i) authorized by the President during the period beginning on September 11, 2001, and ending on January 17, 2007 and (ii) designed to prevent or detect a terrorist attack, or activities in preparation of a terrorist attack, against the United States and

(B) the subject of a written request or directive . . . indicating that the activity was (i) authorized by the President; and (ii) determined to be lawful.

This means, even if the phone companies do something illegal like monitor internet traffic for the government, or something even more illegal like leak the information to others (accidentally or otherwise), the Attorney General can give them a get out of jail free card, saying that they were helping with national defense. Then, they cannot be held accountable by state judicial systems (which most individuals would work through to settle claims).  Furthermore, the government’s ability to (legally) monitor and retain information has been greatly increased  (note that these limitations have not mattered to them before).

Department of Defense officials play up the role of needing these powers to catch terrorists and sexual predators and deviants.  However, keep in mind that they did not need these powers to stop 9/11; they could’ve done so with competent (legal) intelligence gathering and sharing.

The worst part is, neither one of the two presidential candidates voted against the recent FISA bill. Obama voted for it, McCain didn’t bother showing up.

It’s very, very sad to see it happen in my generation.  I had thought of the Internet and Web as one of the greatest achievements of mankind. Now it is more of an impending tool of oppression.

Candidate Mapping at IV08

Posted on July 9, 2008


I’m here in London for the IV08 Information Visualisation conference where I’m presenting a paper titled “Candidate Mapping: Finding Your Place Amongst the Candidates.” This is the product of an independent study I did with Katy Börner. You can grab my author’s copy here, and can see my presentation slides here.

The main thing that people had issue with in my talk was the use of metric analyses with (initially) non-metric data. The argument against using metric analysis on such data is that in many cases you can’t compare “apples to oranges.” In my case, I had to define a distance metric between candidates based on the similarity of several issue stances. These issue stances could be simple support/oppose positions, or something “in between” these positions. For instance, some candidates supported same sex marriage, but indicated that they wanted the states to decide for themselves. This is a “weaker” position since it doesn’t indicate the strongest possible endorsement.
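As a rough illustration of what such a distance could look like, here is a small Python sketch. The encoding is an assumption made for this example, not the scheme from the paper: +1 means “supports”, -1 means “opposes”, and intermediate values stand for weaker positions. The distance is the mean absolute difference over the issues both candidates have a stance on. The candidate names, the second issue, and all of the values are hypothetical.

```python
def stance_distance(a, b):
    """Mean absolute difference over the issues both candidates have a stance on."""
    shared = [issue for issue in a if issue in b]
    if not shared:
        raise ValueError("no shared issues to compare")
    return sum(abs(a[issue] - b[issue]) for issue in shared) / len(shared)


# Hypothetical encoding: +1.0 = supports, -1.0 = opposes,
# values in between = a weaker or qualified position.
candidates = {
    "candidate_x": {"same_sex_marriage": 1.0, "issue_2": -1.0},
    "candidate_y": {"same_sex_marriage": 0.5, "issue_2": -1.0},  # "let the states decide"
    "candidate_z": {"same_sex_marriage": -1.0, "issue_2": 1.0},
}

d_xy = stance_distance(candidates["candidate_x"], candidates["candidate_y"])
d_xz = stance_distance(candidates["candidate_x"], candidates["candidate_z"])
d_yz = stance_distance(candidates["candidate_y"], candidates["candidate_z"])
print(d_xy, d_xz, d_yz)
```

The choice of 0.5 for the weaker position is exactly the kind of decision the “apples to oranges” objection targets: the numbers impose an interval scale on data that was originally only ordinal.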


HaXe and the Future of Web Development

Posted on June 27, 2008


Even though I’m focusing more on several theoretical issues these days, I’ve also been keenly interested in web development. When I say Web development, I mean just about everything involved with creating, accessing, and analyzing information using the WWW.

Development on the web takes place on two main fronts:

  1. The server side backend: This aspect deals with managing raw information and processing it into information that client side systems can use. Coding involves using declarative languages such as SQL to describe relationships between pieces of information, and object oriented languages like Java to define and enforce structural aspects of pieces of information.
  2. The client side frontend: This aspect deals with managing the information from the server using code downloaded from the server running on applications supported by the client. The code can contain a document object (such as HTML), or methods of manipulating HTML content on the client side (using JavaScript), or it can include a self-contained virtual machine application that can run inside a page (using SWF/Java).

While the Web is completely open to a wide variety of formats, it is imperative for most producers of content to understand the ecosystem of common languages, protocols, and obstacles currently in practice. Web developers can spend years learning the wide variety of languages necessary to create more complex websites, then spend more years learning how to make their site behave the same on different browsers, and then spend even more years learning how to make their website “gracefully degrade” with the limited feature sets of primitive browsers.


Relaunching the Blog

Posted on June 24, 2008

It’s been over a year since I posted anything to this blog. This has been due to a variety of personal and professional reasons, but I think I need to get back on it and start again!

This next time around, I’m going to focus more on research/development issues, and I have quite a few of them. In general, the blog will now concern itself with:

  1. Acoustic and social music analysis
  2. Cognitive Science and Computational Neuroscience
  3. Visualization and dimensionality reduction
  4. Web development trends
  5. Interaction methods for any of the above

That should be about it. It’s hard to get things going again; I don’t know where to begin.

Post-chi wrap up

Posted on May 7, 2007

I recently made it back from the CHI conference in sunny San Jose, California. Even though my paper wasn’t accepted, I did get accepted as a student volunteer, and decided to go for some networking and to get out of Bloomington for a while.

I met some interesting folks from Attenex, and the Visual Communication Lab at IBM (many-eyes) that seemed to be in line with some of the stuff I’ve been up to recently.

The conference itself was pretty good, although I didn’t have as many “WOW” moments as I’ve had at previous CHIs. Maybe folks are just saving up for next year (in Florence, Italy), or maybe I’m just getting old and jaded. The closing plenary, in particular, stood out as being pretty bad. It wouldn’t have been passable as an undergraduate seminar discussion, let alone the “coup de grâce” of an academic/professional research conference. I honestly hope that I’m missing something, because it was just crazy. Note to self: In the future, do NOT run a slideshow for more than 15 minutes!

However, there were some high points. Richie, myself, Paul, and Anelie (sp?) from Open University in England invented our own drinking game using magnetic rocks, and I finally got to ride one of those ballyhooed Segways that all the tech pundits love to poke fun at.

All in all, there’s too much to mention here, and it’s a welcome distraction from all the tedium of my program and my other recent airline misfortune. It was also good seeing my old friends in their new “recent alum” status. I’m glad you talked your bosses into ponying up some cash for the trip!

Filed Under HCI

A true solution to sustainability problems

Posted on May 1, 2007

I can’t help but think that Cookie Monster has the answer to all of our problems on sustainability.

Filed Under Informatics

More Airline Fun

Posted on April 28, 2007

Well… I made it to San Jose for the CHI conference. However, there were more shenanigans en route. The flight out of Indianapolis was late AGAIN (no weather excuses this time… just 45 minutes late for the hell of it).

When Richie and I arrived in O’Hare, we had missed the connecting flight… so, we said “hello” to a nice fat 3 hour layover and overpriced terminal food and drink. We ended up not getting into San Jose until well past midnight (3AM Bloomington time). Luckily, our baggage made it through intact. We got to the hotel room, and met up with our third roommate. Unfortunately, since we didn’t get into the hotel until late, there was only one guy working behind the counter at the Ramada. He said there was no way he could get an extra cot for us that evening… which was pretty pathetic, considering that’s a standard request at a hotel. The best he could do was provide a single extra pillow. We were trying to decide who got the bed, and who got the floor, when luckily the manager’s better sense caught up with him and he decided to give us a cot (thanks “bro”).

I’m pretty jet lagged, and am trying to just get through the day. However, it’s been fun hanging out and trying to stay caffeinated to hold conversations. I’m sure people think I’m on drugs or something.

Oh, and by the way, I flew on American Airlines this time, not United. I’m beginning to think I’m just cursed with air travel.

© Copyright Self Conscious White Noise