Rupert Goodwins

Mixed Signals

Any sufficiently advanced information is indistinguishable from noise

Friday 5 December 2008, 12:10 PM

The real time web?

Posted by Rupert Goodwins

Over on John Battelle's Searchblog, there's an interesting post on what might happen when search catches up with the realtime web - when you can ask not just what's on the Web in the static pages, but what people are doing right now. He thinks it'll be very significant when we get to search across all those Web services like Twitter at once.

I posted a comment there - repeated here, in the interests of discussion...

"It's interesting to think how journalism is evolving in terms of the real-time versus the static web (although I'm not sure exactly how to define either to exclude the other). How much journalism these days is spotting patterns form in the real-time web? How much is mining the static web? (There is another form of journalism, which involves spending time in the real world, but it may be falling out of fashion.)

Journalism was the original search engine, albeit with a rather baroque query interface. It tends to adopt the most efficient use of people and technology to produce good data, being a notoriously Darwinist entity, and it's quite good at adapting quickly - hasn't taken long for blogs to make their mark. So I think it's a good thing to track if you want to sniff out utility on the Web - after all, journalism is the first draft of history.

I'm not sure that there's a huge great wobbly lump of wondermoney sitting at the end of the real-time web search rainbow. And if there is, I wonder if it's much bigger than the one sitting a day further down the line, where the massive outpouring of us auto-digitising hominids has been filtered by the mechanisms we have, more or less, in place now.

Google's big problem isn't that it can't be Google a day earlier, it's that it can't be cleverer about imparting meaning to what it filters. For now, and until AI gets a lot better, the new worth of the Web is how we humans organise, rank and connect it. The good stuff takes time and thought, and so far nobody's built an XML-compliant thought accelerator."

Comments on this post

Xwindowsjunkie

I could see that search engines might offer advertising dollars to websites not only for running the ads but for feeding the content of new pages, "RSS-like", straight to Google. It would eliminate the download step for their service. There's no reason a blog couldn't publish on the website and also straight to Google.

Something just occurred to me. The guys at Langley, the ones with the big computers, they're already doing it. Real-time web is already a reality. They just won't let us have access to it. Maybe a "Freedom of Information Act" filing will get the source code! Ha! Our tax dollars at work.

Updated by Xwindowsjunkie on Dec 8, 2008 8:17 AM

Simon W

It's been too long since I checked your blog, Rupert; always a good read. Light sycophancy over, this is a very interesting discussion point; the copious references to philosophy in the John Battelle article and discussion attest to that.

Journalism as the original search engine is an illuminating thought; it had never occurred to me, despite my dependence on the journalistic services of cr@p filtering, context provision and, ultimately, dissemination. One thing the proliferation of the web (in its various forms) has taught me is the value of a good information source, preferably one which provides the jump to knowledge without applying too much personal bias. I can’t see this service being replaced; for me the process represents an intrinsic and defining part of human nature.

So how can we make search cognizant of what is real time and what is static? There seems to be a fork here, one which you and John highlight in your posts. The newer feed-focused web technologies all (I think) feature the notion of time in their formats, although the elements are not mandatory and the formats differ. So I agree there is the potential to create real-time search over these sources, and I can see no significant technical challenges besides scale. For the static web, however, and this includes blogs, it is practically impossible.
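As a rough illustration of the point about feed formats carrying time in differing, optional elements, here is a minimal Python sketch. It assumes RSS 2.0's `pubDate` (RFC 822 style) and Atom's `updated` or `published` (RFC 3339 style) as the two formats in question; the comment itself names no specific format. The function name `entry_timestamp` is hypothetical, and treating a zone-less date as UTC is a simplification chosen for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds (RFC 4287).
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def entry_timestamp(item: ET.Element) -> Optional[datetime]:
    """Return a UTC timestamp for one feed entry, or None if it carries no date."""
    # RSS 2.0 style: an optional <pubDate> in RFC 822 notation.
    pub = item.findtext("pubDate")
    if pub and pub.strip():
        dt = parsedate_to_datetime(pub.strip())
    else:
        # Atom style: <updated>, falling back to <published>, in RFC 3339 notation.
        stamp = item.findtext(ATOM_NS + "updated") or item.findtext(ATOM_NS + "published")
        if not stamp or not stamp.strip():
            return None
        # Older Python versions do not accept a trailing "Z", so spell it out.
        dt = datetime.fromisoformat(stamp.strip().replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Assumption for this sketch: a date with no zone is treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

Once every entry is reduced to a single UTC timestamp, or to "no date", entries from different feeds can be sorted on one timeline. That covers only the feed side of the fork; it does nothing for pages that publish no machine-readable date at all.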

Personally I hate finding blog posts that give no indication of publish or update times, and will disregard them if they are unknown to me, but it’s a very common foible. Even when posts do show a date, the display formats and HTML markup all differ wildly, making scraping difficult at best. Is there a way to standardize? Not really; unofficial affiliation to a specific date notation might work, but that imposes fatal bounds on the information sources. The value add from Technorati suffers in a similar fashion.

Gazing forward with ignorant eyes, it seems that even the semantic web will not intrinsically support a temporal construct. Which makes me wonder whether pursuing the ideal is in any way worthwhile.

Updated by Simon W on Jan 23, 2009 4:31 PM
