Friday, July 23, 2004

[emerging tech] "Web Engineering: The Evolution of New Technologies" & the Ultimate Killer App

Dateline: China
Reviews of papers from the current (July/August 2004) issue of Computing in Science & Engineering, special issue on "Web Engineering: The Evolution of New Technologies."  To order articles from this issue, first click on .
Earlier this week I posted excerpts from the lead article in the current issue of CiSE.  The article was titled, "Managing XML Data: An Abridged Overview," which is a good, accurate title.  The excerpts contain useful links, too. 
I am going to take a variety of approaches for handling four other papers in this special issue.  However, I first want to provide a link to the introduction to this special issue, i.e., .  The intro itself provides a few useful references and links.
The second article is titled, "Information Retrieval Techniques for Peer-to-Peer Networks."  Fortunately, a full-text PDF copy of this paper can be accessed at either or, although the URL for the former looks a little bit too generic and might change at a moment's notice (also, the two versions are slightly different).  I have 19 bookmarks on my smartphone for this paper, but I guess I can summarize by saying that IR for P2P networks is hard and very different from "traditional" search.  The last statement actually says a lot -- read between the lines.  This paper covers all the usual suspects and also includes Skype.  This paper is based upon the lead author's Master's thesis, which can be accessed from .  Other papers by the lead author can be accessed at .  This is an important issue which needs to be resolved, especially as collaborative grid computing (CGC) comes to life.
Two figures; 20 references (28 references in the preprint).
Less luck with the paper titled, "Web Searching and Information Retrieval," i.e., I couldn't find a free copy on the Web.  The author's site is woefully outdated, too.  The author does speak favorably of a particular approach to decentralized P2P web crawling called "Apoidea."   A copy of a paper describing Apoidea can be accessed at ; accompanying slides can be accessed at .  As described in the CiSE paper, "Apoidea is both self-managing and uses the resource's geographical proximity to its peers for a better and faster crawl."
Two figures; 21 references.
To request a copy of this article click on: or (I'm not sure which address works; I already have a copy of this article so I don't need to contact the author!).
"Web Mining: Research and Practice" is not available, either, but a lot of excellent info on the senior author's projects related to this paper is available.  First, take a look at the eBiquity research areas at .  Next, you may want to take a look at the abstracts for papers published as part of the eBiquity Group at (current through December 2004 -- it doesn't get more current!!).  Move on to their "Semantic Web" page at .  I then downloaded a PDF copy of their paper titled, "Mining Domain Specific Texts and Glossaries to Evaluate and Enrich Domain Ontologies" (see ).  It looks like a relatively recent paper, newer than the CiSE paper (different authors and different subject matter, though).  The PDF is part of their Semantic Web research, whereas the CiSE paper is more "generic."  Anyway, the "Web Mining" paper is another call for distributed mining techniques, and covers fuzzy clustering as well as content-based recommender systems -- but doesn't forget good ol' HITS (Hyperlink-Induced Topic Search), the basis for IBM's Clever and Google (to a certain extent).
No figures; 31 references.
To request a copy of this article click on: .
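For readers who haven't run into HITS before, here is a minimal sketch of the textbook hub/authority iteration in Python.  This is my own illustration, not code from the CiSE paper: the function name, the toy link graph and the fixed iteration count are all assumptions made for the example.  The idea is simply that a good "hub" links to good "authorities," and a good authority is linked to by good hubs, so the two scores are updated in turn and renormalized until they settle.

```python
import math


def hits(graph, iterations=50):
    """Textbook HITS sketch (hypothetical helper, not from the CiSE paper).

    graph maps each page to the list of pages it links to.
    Returns two dicts: hub scores and authority scores.
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(graph)
    for targets in graph.values():
        pages.update(targets)

    hubs = {p: 1.0 for p in pages}
    auths = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: add up the hub scores of the pages linking in.
        new_auths = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for dst in targets:
                new_auths[dst] += hubs[src]

        # Hub update: add up the fresh authority scores of the pages linked to.
        new_hubs = {
            p: sum(new_auths[dst] for dst in graph.get(p, []))
            for p in pages
        }

        # Normalize both vectors to unit length so the scores stay bounded.
        a_norm = math.sqrt(sum(v * v for v in new_auths.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        auths = {p: v / a_norm for p, v in new_auths.items()}
        hubs = {p: v / h_norm for p, v in new_hubs.items()}

    return hubs, auths


# Toy link graph (made-up page names): two portals pointing at two documents.
toy_web = {
    "portal_a": ["paper", "slides"],
    "portal_b": ["paper"],
    "paper": [],
    "slides": [],
}

hubs, auths = hits(toy_web)
print("best hub:", max(hubs, key=hubs.get))
print("best authority:", max(auths, key=auths.get))
```

In the toy graph the page that both portals point to comes out as the top authority, and the portal with more outgoing links to well-regarded pages comes out as the top hub.  Real systems run this over a query-specific subgraph of the Web rather than a hand-built dictionary.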
Finally, "Intelligent Agents on the Web: A Review" was very disappointing.  The lead author has impeccable credentials, but his paper is based on yesterday's news:  Old, outdated, buried stuff (like Firefly).  Matter of fact, the only live link I can recall finding was Recursion Software's "Voyager" home page (see ), which states that the "Voyager applications development platform provides the software layer which handles communications across the network for distributed JAVA applications."  (Looks interesting.)
I did a little more digging and surfed over to two stand-by sites (both referenced directly or indirectly in the "Intelligent Agents" paper), namely the MIT Media Lab Software Agents page and Oren Etzioni's (oops, I mean the University of Washington, Department of Computer Science) page.  At the MIT projects page (see ) is a listing of several "commonsense" projects, e.g., "Using Commonsense Reasoning to Enable the Semantic Web" (see ).  A draft White Paper on this is available at , as is a presentation at along with a couple of video demos.  I also downloaded a paper on GOOSE (GOal-Oriented Search Engine) at .  At UWash I went to their XML data management page (see ) and then grabbed two papers:  One on "Probabilistic Methods For Querying Global Information Systems" dated 14 July 2004 (see ) and another titled, "Learning Text Patterns for Web Information Extraction and Assessment" dated May 2004 (see ).  (To download other unrestricted reports, go to .)  Frankly, I need a bit of time to digest the two recently published UWash papers.
As the chair of the Internet and Web applications session of the First International Conference on Autonomous Agents (1996), I have a soft spot for agent-oriented everything (especially Web apps).  I remember an old saying from IJCAI (International Joint Conference on Artificial Intelligence) in the mid-70's:  Artificial intelligence is better than none.  (I probably still have a button with this saying somewhere.)  I'm keeping the faith, sans the hype and more toward the realities of software agents.  BTW, this CiSE paper isn't bad if you don't have any background in this space.  It covers the basics, such as ACLs, but with an "updated" perspective.
No figures; 27 references.
To request a copy of this article click on: .
The Ultimate Killer App
BTW, the "Ultimate Killer App" is attached and in some browsers it will automatically download.  (See the bottom of this message.)  You have to admit, this really is the ultimate killer app!!
I've never sent an attachment this way simultaneously to both my e-newsletter and blogs (and blog variants).  Just in case the attachment isn't included, I've uploaded it to the "Photos" section of the e-newsletter (see .)
>> Note to AlwaysOn readers: You'll need to go to the e-newsletter ( ) in order to see the "Ultimate Killer App."  You can try the blogs, but no guarantees.
Tidbits on Enterprise Software
.NET wins converts.  For the VARBusiness story see .  Evans Data reports that .NET usage showed a sharp YoY increase in adoption with 52% saying they use .NET and 68% saying they plan to deploy .NET apps by 2005.  In May, Forrester reported that 56% of developers consider .NET their primary development environment contrasted with 44% for J2EE.  (It must have been a binary choice!)  VARBusiness found in a May survey that 53% have already deployed a .NET app and 66% plan to do so within the next 12 months.  In the VARBusiness survey, the most important reasons for going with .NET were ease of use and quicker time to market.  A developer goes on to state that .NET development time is to Java what Java is to C++.  (Wow, what a claim!)
Python and Perl beat Java?  (See for the PDF file.)  Actually, an indirect "attack" against all "mainstream" programming languages, notably Java, C and C++.  The idea is that the "mainstream" languages are ill-suited for many distributed computing and integration apps.  Gives a "thumbs up" to Python, Perl and PHP, with a peek at PEAK -- the Python Enterprise Application Kit.  (Sorry for the pun.)  PEAK's developers claim future superiority over J2EE.  They also knock Java for not being suited to rapid application development.  PEAK's developers believe a Python-based approach to component-based apps will result in systems that are simpler, faster and easier to install, manage and maintain than variants in J2EE.  PEAK, however, is still immature.
Grid computing takes off.  Another survey from Evans Data (see ).  37% of database developers are implementing or planning to implement a grid computing architecture.  In related data, 34% of companies are focusing their database development work on BI (business intelligence) platforms.  See also Oracle's spin on this at .
The spoils of ROI.  From IDC's Group VP, Solutions Research, there are several issues which must be addressed in order to maximize IT ROI.  (See .)  Four of the key issues are:
  • Should the IT agenda include investment in outsourcing technologies or services?
  • Does the future of the business include operations in, or electronic trade with, additional countries - China, for example?
  • Are the services of an outside provider being considered to help in managing proliferating applications or complex "interenterprise" business relationships?
  • What role will utility computing play in the future of IT?
(All items in bold are my emphasis.)  The article goes on to discuss various ways of evaluating ROI, including one of my favorite ways, ROA (real options analysis). 
TTFN.  Have a GREAT weekend!
David Scott Lewis
President & Principal Analyst
IT E-Strategies, Inc.
Menlo Park, CA & Qingdao, China
WARNING:  To avoid spam (well, to avoid getting at least some spam), I'm using a Gmail account with a special address.  However, I have NOT been able to access the messages in my Gmail account for the past FOUR days!!  Not sure how long this will last.  In the interim, also use: -- but also Cc: the above address.  Of course, if you already know me, feel free to send messages to my primary and secondary e-mail accounts.  (If you know me, you already know what they are.  The primary account is working fine.)
(current blog postings optimized for MSIE6.x) (access to blog content archives in China) (current blog postings for viewing in other browsers and for access to blog content archives in the US & ROW) (AvantGo channel)
To automatically subscribe click on .