CTO Articles
Published in IT World
Web Rings and the problem of finding people on the Web

By Sean McGrath

Do you remember the old nugget from your statistics classes about birthdays? The one which asserts that in a class of 30 or so people, the chances of two people having the same birthday are much higher than you would expect? Statistics never was my thing and although I have read through the argument many times over the years, I have totally forgotten how it goes. The argument has a way of seeping straight out of my system like so much prune juice.

Given my track record with statistics, I am in no position to make the following statement but I will make it anyway: in any particular area of the Web, the chances of multiple individuals doing similar jobs, with similar interests, sharing the same name are higher than you would expect. I have no proof of this. What I do have is an example (hardly a compelling argument, I know) of one such name clash. Namely, the name Sean McGrath.

What are the chances of multiple bloggers having the name Sean McGrath? Pretty good, I think you will agree. It is an ordinary name. There are probably thousands of people with that name in the world. Given the increasing prevalence of blogs, some of them are bound to be Sean McGrath blogs.

What are the chances of multiple geekish/technical bloggers having the name Sean McGrath? In places where the name is common, like Ireland, it would not be too surprising to find some.

Let us keep going. What are the chances of multiple geekish/technical bloggers with the name Sean McGrath, both in Ireland, both working a lot with the Java ecosystem and both dabbling in Python? Unlikely as it may seem, this is true[1].

What's my point? Two things really. Firstly, I am just highlighting a single case of an interesting name clash in cyberspace. A case I am personally very familiar with. Secondly, I am wondering how common this is out there and what the future holds if it is commonplace.
Personal websites - be they blogs or good old fashioned personal domain names - are not going away. We are witnessing exponential growth in these things at the moment. Finding out about individuals by "Googling" for them is a very common practice in all aspects of life. Therein lies the problem. When one or more of the people sharing a name are inordinately well linked compared to the rest, they will likely drown the others out of the search results.

For example, one of my earliest influences in thinking about the analysis and design of computer programs was Michael Jackson. No, not that one[2]. This one[3]. How likely is it that anybody searching for the latter will find what they are looking for by simply Googling the name?

Solutions? Well, one phenomenon that partly addresses the problem is the so-called social networks created by applications such as Orkut and LinkedIn. Using these, one can search for people exclusively, cutting out all the other stuff in cyberspace. Another technique is to use blog searching tools such as Blogpulse[4], Technorati[5], Feedster[6] etc. Again, the primary effect is to separate the Web of people from the rest of the Web.

Neither approach addresses the core problem though. How do you find the John Smith you are really after in a sea of John Smiths? I have a suggestion for how this might be done quite simply. All we have to do is all change our names to be 1024 digit unique numbers... That was a joke.

Seriously though, I think we could go a long way to alleviating the problem of name clashes by using the Webring concept[7]. What if folks with the same name linked to each other in a web ring? That way, once you get on the ring as it were, you can navigate from one false positive to the next false positive, until you get to the person you are looking for.
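To make the ring idea concrete, here is a minimal sketch in Python of hopping from one namesake to the next until the right one turns up. Everything in it is made up for illustration: the `.example` site names, the `RING` and `ABOUT` tables and the `walk_ring` function are assumptions, not part of any real web ring software.

```python
# Hypothetical ring: each site points to the next site run by a namesake.
RING = {
    "sean-one.example": "sean-two.example",
    "sean-two.example": "sean-three.example",
    "sean-three.example": "sean-one.example",
}

# Hypothetical one-line descriptions a visitor might read on each site.
ABOUT = {
    "sean-one.example": "XML and Python consultant",
    "sean-two.example": "Java engineer",
    "sean-three.example": "landscape photographer",
}


def walk_ring(start, ring, is_the_one):
    """Follow 'next' links from start until is_the_one(site) is true.

    Returns (matching_site_or_None, sites_visited_in_order). The walk stops
    when it comes back to a site it has already seen, or when a site has
    no onward link (a broken ring).
    """
    visited = []
    current = start
    while current is not None and current not in visited:
        visited.append(current)
        if is_the_one(current):
            return current, visited
        current = ring.get(current)
    return None, visited


# Looking for the photographer, starting from the first false positive.
found, hops = walk_ring(
    "sean-one.example", RING, lambda site: "photographer" in ABOUT[site]
)
```

The worst case is one trip around the whole ring, so the searcher's effort grows with the number of namesakes on the ring rather than with the size of the Web.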
Of course, to enable automation, we would want to add some semantic markup, perhaps in the form of a microformat, to indicate "this is a link to another website, the owner of which has the same name". Would that work? I don't see why not. I think I might give it a try. All I need now is to find someone whose name leads to potentially confusing[8] name clashes...
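As a rough sketch of what that automation might look like, the Python below pulls "same name" links out of a page using only the standard library's `html.parser`. The `rel="same-name"` value is purely an assumption made up for this example, not a registered microformat, and the sample page and its `.example` addresses are hypothetical too.

```python
from html.parser import HTMLParser


class SameNameLinkParser(HTMLParser):
    """Collects the href of every <a> whose rel list includes 'same-name'.

    'same-name' is a hypothetical rel value invented for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attr.get("rel") or "").split()
        if "same-name" in rel_values and attr.get("href"):
            self.links.append(attr["href"])


def same_name_links(html):
    """Return the same-name link targets found in an HTML string, in order."""
    parser = SameNameLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical page fragment with one ring link and one ordinary link.
SAMPLE_PAGE = """
<p>Not the Sean McGrath you wanted? Try
<a rel="same-name" href="http://other-sean.example/">the next one</a> or go
<a href="http://unrelated.example/">somewhere else</a>.</p>
"""

ring_links = same_name_links(SAMPLE_PAGE)
```

A crawler or browser plug-in could follow only the links returned here and ignore everything else on the page.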
[1] http://seanmcgrath.blogspot.com and http://blogs.sun.com/smg