Friday 5 December 2008, 12:10 PM
The real time web?
I posted a comment there - repeated here, in the interests of discussion...
"It's interesting to think how journalism is evolving in terms of the real-time versus the static web (although I'm not sure exactly how to define either to exclude the other). How much journalism these days is spotting patterns form in the real-time web? How much is mining the static web? (There is another form of journalism, which involves spending time in the real world, but it may be falling out of fashion.)
Journalism was the original search engine, albeit with a rather baroque query interface. It tends to adopt the most efficient use of people and technology to produce good data, being a notoriously Darwinist entity, and it's quite good at adapting quickly - hasn't taken long for blogs to make their mark. So I think it's a good thing to track if you want to sniff out utility on the Web - after all, journalism is the first draft of history.
I'm not sure that there's a huge great wobbly lump of wondermoney sitting at the end of the real-time web search rainbow. And if there is, I wonder if it's much bigger than the one sitting a day further down the line, where the massive outpouring of us auto-digitising hominids has been filtered by the mechanisms we have, more or less, in place now.
Google's big problem isn't that it can't be Google a day earlier, it's that it can't be cleverer about imparting meaning to what it filters. For now, and until AI gets a lot better, the new worth of the Web is how we humans organise, rank and connect it. The good stuff takes time and thought, and so far nobody's built an XML-compliant thought accelerator."
Comments on this post
I could see search engines offering advertising dollars to websites not only for running the ads but also for feeding the content of new pages, RSS-like, straight to Google. It would eliminate the download step for their service. There's no reason a blog couldn't publish on its website and also straight to Google.
Something just occurred to me. The guys at Langley, the ones with the big computers, they're already doing it. Real-time web is already a reality. They just won't let us have access to it. Maybe a "Freedom of Information Act" filing will get the source code! Ha! Our tax dollars at work.
It's been too long since I checked your blog, Rupert; always a good read. Light sycophancy over, this is a very interesting discussion point, as the copious references to philosophy in the John Battelle article and discussion attest.
Journalism as the original search engine is an illuminating thought; it had never occurred to me, despite my dependence on the journalistic services of cr@p filtering, context provision and, ultimately, dissemination. One thing the proliferation of the web (in its various forms) has taught me is the value of a good information source, preferably one which provides the jump to knowledge without applying too much personal bias. I can’t see this service being replaced; for me the process represents an intrinsic and defining part of human nature.
So how can we make search cognizant of what is real time and what is static? There seems to be a fork here, one which you and John highlight in your posts. The newer feed-focused web technologies all (I think) feature the notion of time in their formats, although the elements are not mandatory and the formats differ. So I agree there is the potential to create real-time search over these sources, and can see no significant technical challenges besides scale. However, doing the same for the static web, and this includes blogs, is practically impossible.
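To make the point about feed timestamps concrete, here is a minimal Python sketch, assuming the two most common feed formats: RSS 2.0, whose optional `pubDate` element uses an RFC 822 style date, and Atom, whose `updated` element uses RFC 3339. The function names `parse_rfc3339` and `entry_times` are hypothetical, invented for illustration, and the choice to treat a timestamp without a zone as UTC is an assumption, not something either format guarantees.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_rfc3339(text):
    """Parse an Atom-style RFC 3339 timestamp; a trailing 'Z' means UTC."""
    text = text.strip()
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


def to_utc(when):
    """Normalise to UTC. Assumption: a timestamp with no zone is already UTC."""
    if when is None:
        return None
    if when.tzinfo is None:
        return when.replace(tzinfo=timezone.utc)
    return when.astimezone(timezone.utc)


def entry_times(feed_xml):
    """Return (title, UTC datetime or None) pairs from an RSS 2.0 or Atom feed."""
    root = ET.fromstring(feed_xml)
    results = []
    # RSS 2.0: each <item> may carry an optional <pubDate>.
    for item in root.iter("item"):
        raw = item.findtext("pubDate")
        when = parsedate_to_datetime(raw.strip()) if raw else None
        results.append((item.findtext("title"), to_utc(when)))
    # Atom: each <entry> carries an <updated> element in the Atom namespace.
    for entry in root.iter(ATOM_NS + "entry"):
        raw = entry.findtext(ATOM_NS + "updated")
        when = parse_rfc3339(raw) if raw else None
        results.append((entry.findtext(ATOM_NS + "title"), to_utc(when)))
    return results
```

Entries with no date come back as `None` rather than being guessed at, which is roughly the position a real-time index would be in: it can only order what the publisher chose to timestamp.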
Personally I hate finding blog posts that give no indication of publish or update times, and will disregard them if they are unknown to me, but it’s a very common foible. But even if they do, the display formats and HTML-style markups all differ wildly, making scraping difficult at best. Is there a way to standardize? Not really. Unofficial adherence to a specific date notation might work, but that imposes fatal bounds on the information sources. The value-add from Technorati suffers in a similar fashion.
Gazing forward with ignorant eyes, it seems that even the semantic web will not intrinsically support a temporal construct. Which makes me wonder whether pursuing the ideal is in any way worthwhile?


