Ajax: The De Facto Standard of Web 2.0

AJAX

What is Ajax?

If you are thinking about getting information on Ajax FC or the city of Ajax, then you've just hit the wrong page. We are here to talk about the term AJAX, which stands for Asynchronous JavaScript and XML. The term Ajax was coined in 2005 by Jesse James Garrett and is used to describe a combination of technologies: JavaScript, XML, CSS, XHTML, the DOM and, most importantly, the XHR (XMLHttpRequest) object. While the name AJAX has only recently been coined, the underlying technologies (JavaScript, XML, SOAP, and so on) have been maturing for years. XHR was created by Microsoft in IE5 in March of 1999 as an ActiveX object, and it worked only in IE. XHR was then adopted by Mozilla 1.0 and Safari 1.2, and a new generation of web applications was born. Its core technique is centered on communicating with the server without a page refresh; it is also known as the XMLHTTP technique.
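
To make this concrete, here is a minimal sketch of how a page might use the XHR object to fetch data from the server without a refresh. The /data.xml URL and the "output" element are placeholders invented for this sketch, not part of any particular application.

// Create the XHR object; older IE versions expose it only as an ActiveX object.
function createXHR() {
    if (window.XMLHttpRequest) {
        return new XMLHttpRequest();                     // Mozilla, Safari, IE7+
    } else if (window.ActiveXObject) {
        return new ActiveXObject("Microsoft.XMLHTTP");   // IE5, IE6
    }
    return null;
}

var xhr = createXHR();
xhr.open("GET", "/data.xml", true);                      // true = asynchronous
xhr.onreadystatechange = function () {
    // readyState 4 means the response has arrived; status 200 means success.
    if (xhr.readyState === 4 && xhr.status === 200) {
        document.getElementById("output").innerHTML = xhr.responseText;
    }
};
xhr.send(null);                                          // no request body for GET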

Ajax is all about improving the user experience. Since the beginning of the Web, and until recently, users had nothing to do but fill in some forms, click some links and then wait for the whole page to reload. This was not only monotonous but tedious as well. This is what could be called the Web 1.0 model. Google Maps (http://maps.google.com) really ignited the Ajax fire. Google, through the clever use of XHR callbacks, provided the first in-page scrollable map. There was no waiting around for a bunch of ads to refresh. It really opened up a new pattern of web UI which we prefer to call Web 2.0. Real-time live searches, web-based chatting, dragging and dropping, instant feedback and so on have really established AJAX as a tool for building richer applications. Gmail, Google Suggest, Google Maps, Flickr, etc. are examples of Ajax applications that have sparked people's imagination.

The four defining principles of Ajax

1. The browser hosts an application, not content

Classically, when a user logs into a site, a session is initialized, several server-side objects are created and pages are served to the browser. The home page is dished up to the browser in a stream of HTML markup that mixes together standard boilerplate presentation and user-specific data. The browser acts as a dumb terminal, simply rendering whatever is sent to it. Ajax, however, moves some application logic to the browser. When the user logs in, a more complex document is delivered to the browser, a large proportion of which is JavaScript. This document will stay with the user throughout the session, although it will probably alter its appearance considerably while the user is interacting with it. It knows how to respond to user input and is able to decide whether to handle the user input itself or to pass a request on to the web server.
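
As a rough illustration of this principle (the element ids and the /inbox/messages URL are invented for this sketch), the JavaScript delivered at login can route user actions itself, touching the server only when it actually needs fresh data. It reuses the createXHR() helper from the earlier sketch.

// Part of the "application" delivered to the browser at login.
// Some actions are handled entirely on the client; others go to the server.
function onUserAction(action) {
    if (action === "toggle-help") {
        // Purely client-side: no server round trip needed.
        var help = document.getElementById("help-panel");
        help.style.display = (help.style.display === "none") ? "block" : "none";
    } else if (action === "refresh-inbox") {
        // Needs fresh data: ask the server asynchronously.
        var xhr = createXHR();
        xhr.open("GET", "/inbox/messages", true);
        xhr.onreadystatechange = function () {
            if (xhr.readyState === 4 && xhr.status === 200) {
                document.getElementById("inbox").innerHTML = xhr.responseText;
            }
        };
        xhr.send(null);
    }
}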

2. The server delivers data, not content

As we noted, the classic web app serves up the same mixture of boilerplate, content, and data at every step. An Ajax-based application can behave more intelligently: it sends asynchronous requests to the server, and the server sends back only the relevant data, since everything else in the page layout is already in place.
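
The sketch below shows the idea, under the assumption of a hypothetical /stock-price endpoint that returns a small XML fragment rather than a whole page; only one element of the existing layout is updated.

// The server returns only data (for example <price symbol="ACME">42.50</price>),
// not a whole new page.
function refreshPrice(symbol) {
    var xhr = createXHR();
    xhr.open("GET", "/stock-price?symbol=" + encodeURIComponent(symbol), true);
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
            var priceNode = xhr.responseXML.getElementsByTagName("price")[0];
            // Only the one cell that shows the price is updated;
            // the rest of the page layout is untouched.
            document.getElementById("price-" + symbol).innerHTML =
                priceNode.firstChild.nodeValue;
        }
    };
    xhr.send(null);
}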

3. User interactions with the application can be fluid and continuous

In classic web applications, the only way to contact the server was either to click a hyperlink or to submit a form and wait for the server to respond. Ajax completely changes this style of communicating with the server. With Ajax, we can contact the server without interrupting the user's flow. Interactions with the server can be triggered by a mouse drag or movement, or by every keystroke, as in Google Suggest.
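
A minimal keystroke-driven sketch follows; the "search-box" and "suggestions" elements and the /suggest URL are assumptions for illustration only, and a short delay is added so that fast typing does not flood the server.

// Fire a suggestion request on every keystroke, with a short delay.
var pendingTimer = null;

document.getElementById("search-box").onkeyup = function () {
    var query = this.value;
    if (pendingTimer) clearTimeout(pendingTimer);
    pendingTimer = setTimeout(function () {
        var xhr = createXHR();
        xhr.open("GET", "/suggest?q=" + encodeURIComponent(query), true);
        xhr.onreadystatechange = function () {
            if (xhr.readyState === 4 && xhr.status === 200) {
                // The server returns a ready-to-insert list of suggestions.
                document.getElementById("suggestions").innerHTML = xhr.responseText;
            }
        };
        xhr.send(null);
    }, 200);   // wait 200 ms after the last keystroke
};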

4. This is real coding and requires discipline

Ajax doesn't use JavaScript just to add fancy decorations or designs to a site; it really has to do some serious work. In an Ajax application, the code that you deliver when users launch the application must run until they close it, without breaking.

What can you do with Ajax?

The technology for Ajax has been around since 1998, and a handful of applications (such as Microsoft's Outlook Web Access) had already put it to use. But Ajax didn't really catch on until early 2005, when a couple of high-profile web applications (such as Google Suggest and Google Maps) put it to work, and Jesse James Garrett wrote the article that coined the term Ajax, putting everything under one roof. Google has been the pioneer in developing Ajax applications and has done more than anyone to raise their profile. Some of the applications of Ajax are:

a. Auto complete

This is the Ajax feature that reacts to keystrokes, tries to make intelligent guesses about what the user is going to type and then suggests it. For example, Gmail suggests email addresses from the address book as you type.

b. Searching in real time with live searches

This feature gives results for your queries instantly as you enter them. Google Suggest is an example of this feature; you can try it at http://www.google.com/webhp?complete=1

c. Chatting

Ajax has enabled us to have web-based chat services. http://iloveim.com is an example of a site based on this technology.

d. Dragging and dropping

As you might have seen on many e-commerce sites, you can drag the items you want to buy and drop them into your cart. This has been made possible by Ajax. Ajax is essentially about the asynchronous handling of events. With Ajax, a rich application can be built without any additional browser plug-ins whatsoever, as Ajax is supported by all major browsers (IE, Firefox/Netscape/Mozilla) with slight variations. So Ajax has been a major tool in making the web UI more attractive and user friendly, and it is definitely here to stay.
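
As a rough sketch of the drag-and-drop case, the handler below sends an asynchronous request when a product is dropped on the cart; the element ids, the /cart/add URL and the assumption that product elements set their id via dataTransfer in an ondragstart handler are all invented for illustration.

// When a product is dropped on the cart, tell the server asynchronously;
// the page itself never reloads.
var cart = document.getElementById("cart");

cart.ondragover = function (event) {
    event.preventDefault();                    // allow the drop
};

cart.ondrop = function (event) {
    event.preventDefault();
    // The draggable product element is assumed to have called
    // event.dataTransfer.setData("text", productId) in its ondragstart handler.
    var productId = event.dataTransfer.getData("text");
    var xhr = createXHR();
    xhr.open("POST", "/cart/add", true);
    xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
            document.getElementById("cart-count").innerHTML = xhr.responseText;
        }
    };
    xhr.send("productId=" + encodeURIComponent(productId));
};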

WEB CRAWLER: How Does It Work?

On the Internet there are hundreds of millions of pages providing information on an amazing variety of topics, so retrieving useful information from the web is a daunting task. How do you obtain the required information from those millions of pages? Internet search engine sites like google.com, yahoo.com and live.com are the obvious answer. These are special sites on the Web that are designed to help people find information stored on other sites. At first glance it seems like magic: these sites appear to understand what we intended to search for. Search engines come in two kinds: crawler-based search engines and human-powered directories. Crawler-based search engines create their listings automatically and track any changes on web pages, whereas a human-powered directory depends on humans for its listings. So, for the rapidly growing web, a crawler-based search engine is the better option.

Crawler-based search engines work in three major steps:

a) Crawling

b) Indexing

c) Searching

Crawling:

Web crawlers are programs that locate and gather information on the web. They recursively follow hyperlinks present in known documents to find other documents. The usual starting points are lists of heavily used servers and very popular pages. In this way, the spider system quickly begins to travel, spreading out across the most widely used portions of the Web. The spider revisits sites on a regular basis, such as every month or two, to look for changes.
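
Below is a minimal breadth-first crawling sketch, written in JavaScript for consistency with the rest of the article and assuming a modern Node.js runtime with the built-in fetch. A real crawler would also respect robots.txt, rate limits and a proper HTML parser.

// Minimal breadth-first crawler sketch (Node.js with built-in fetch).
async function crawl(seedUrls, maxPages) {
    const queue = [...seedUrls];           // frontier of URLs to visit
    const visited = new Set();
    const pages = {};                      // url -> raw HTML

    while (queue.length > 0 && visited.size < maxPages) {
        const url = queue.shift();
        if (visited.has(url)) continue;
        visited.add(url);

        try {
            const response = await fetch(url);
            const html = await response.text();
            pages[url] = html;

            // Extract absolute http(s) links and add unseen ones to the frontier.
            const links = html.match(/href="(https?:\/\/[^"]+)"/g) || [];
            for (const link of links) {
                const target = link.slice(6, -1);   // strip href=" and trailing quote
                if (!visited.has(target)) queue.push(target);
            }
        } catch (err) {
            // Skip pages that fail to download; a real crawler would retry or log.
        }
    }
    return pages;
}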

Indexing:

An index helps to find information as quickly as possible. The index is also known as a catalog. If a web page changes, the index is updated with the new information. Indexing basically consists of two steps:

a) Parsing

b) Hashing

a. Parsing:

The parser extracts links for further crawling. It also removes tags, JavaScript, comments and so on from the web pages and converts the HTML document to plain text. Regular expressions are used extensively for the automated analysis of the text. A parser designed to run on the entire Web must handle a huge array of possible errors.
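
A rough sketch of such a parser is shown below, using regular expressions as described above; real parsers are far more robust and handle malformed markup, encodings and relative URLs.

// Strip markup and pull out links from a downloaded page.
function parsePage(html) {
    // Collect href targets before the tags are removed.
    const links = [];
    const hrefPattern = /href="([^"]+)"/g;
    let match;
    while ((match = hrefPattern.exec(html)) !== null) {
        links.push(match[1]);
    }

    // Remove scripts, comments, and all remaining tags to get plain text.
    const text = html
        .replace(/<script[\s\S]*?<\/script>/gi, " ")
        .replace(/<!--[\s\S]*?-->/g, " ")
        .replace(/<[^>]+>/g, " ")
        .replace(/\s+/g, " ")
        .trim();

    return { text: text, links: links };
}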

b. Hashing:

After each document is parsed, it is encoded into numbers. For hashing, a formula known as a hash function is applied to attach a numerical value to a word, so every word is converted into a wordID. An inverted index maintains the relationship between wordIDs and docIDs, which makes it quick to find the documents containing a given word.
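
The following toy sketch shows the idea: a simple polynomial hash turns each word into a wordID, and an inverted index maps each wordID to the list of docIDs that contain it. Real engines use much stronger hashes and also store positions and weights.

// Toy hash function and inverted index: wordID -> list of docIDs.
function wordID(word) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
        hash = (hash * 31 + word.charCodeAt(i)) % 1000003;   // simple polynomial hash
    }
    return hash;
}

function buildInvertedIndex(documents) {        // documents: docID -> plain text
    const index = {};                           // wordID -> array of docIDs
    for (const docID in documents) {
        const words = documents[docID].toLowerCase().split(/\W+/).filter(Boolean);
        for (const word of words) {
            const id = wordID(word);
            if (!index[id]) index[id] = [];
            if (index[id].indexOf(docID) === -1) index[id].push(docID);
        }
    }
    return index;
}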

Searching:

Not all the documents matching the index are equally relevant. Among the millions of documents, only the most relevant ones should be listed. In the simplest case, a search engine could just store each word and the URL where it was found. In reality, this would make for an engine of limited use, since there would be no way of telling whether the word was used in an important or a trivial way on the page, whether it was used once or many times, or whether the page contained links to other pages containing the word. So, to provide quality search results efficiently, the searching process has to complete the following steps (a minimal sketch follows the list):

· Parse the query.

· Convert words into wordIDs using hash function.

· Compute the rank of that document for the query.

· Sort the documents by rank.

· List only the top N documents.
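
The sketch below walks through these steps, reusing wordID() and the inverted index from the indexing sketches; the ranking here is deliberately naive (documents are scored by how many of the query words they contain), whereas real engines combine many signals such as term frequency and link structure.

// Sketch of the search steps above.
function search(query, index, topN) {
    // 1. Parse the query and 2. convert its words into wordIDs.
    const ids = query.toLowerCase().split(/\W+/).filter(Boolean).map(wordID);

    // 3. Compute a rank for every document that matches at least one word.
    const scores = {};                          // docID -> score
    for (const id of ids) {
        for (const docID of index[id] || []) {
            scores[docID] = (scores[docID] || 0) + 1;
        }
    }

    // 4. Sort the documents by rank and 5. keep only the top N.
    return Object.keys(scores)
        .sort((a, b) => scores[b] - scores[a])
        .slice(0, topN);
}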

For those who are interested in the implementation of a web crawler, check out any of the open source crawlers listed below:

Heritrix is the Internet Archive's archival-quality crawler, designed for archiving periodic snapshots of a large portion of the Web (written in Java).

ht://Dig includes a Web crawler in its indexing engine (written in C).

Larbin is a simple web crawler (written in C).

Nutch is a scalable crawler written in Java and released under an Apache License. It can be used in conjunction with the Lucene text indexing package.

WIRE - Web Information Retrieval Environment (Baeza-Yates and Castillo, 2002) is a web crawler written in C++ and released under the GPL. It includes several policies for scheduling page downloads and a module for generating reports and statistics on the downloaded pages, so it has been used for Web characterization.

Ruya is an open source, breadth-first, level-based web crawler written in Python.

Universal Information Crawler is a simple web crawler written in Python.

DataparkSearch is a crawler and search engine released under the GNU General Public License.