
Wednesday, December 14, 2011

Data Mining and Web Mining

Data mining (DM), also known as knowledge discovery (Frawley et al., 1992), is one of the most rapidly growing fields, driven by the high demand for extracting added value from the large-scale databases that keep accumulating as information technology grows. In general, data mining can be defined as a series of processes for extracting added value, in the form of knowledge that cannot be found by manual inspection, from a collection of data (Pramudiono, 2003). Web mining is the application of data mining techniques to the web in order to acquire knowledge and information from the web. Web mining can be categorized into three different scopes, namely web content mining, web structure mining and web usage mining (Srivastava et al., 2000).

Association Rules and Apriori Algorithm

Association rules are one of the functions of data mining techniques, used to discover associations among variables, or a correlation structure between items or objects, in transaction databases, relational databases and other information stores. As an illustration of association rules in web log analysis, a discovered pattern might be: "if someone visits the CNN website, there is a 60% probability that they also visit the Detik website in the same month." In this illustration, the discovered pattern can potentially yield interesting pieces of information needed by the related companies. The process in the association rules technique is to find rules that satisfy a minimum support and a minimum confidence. The first and still most widely used algorithm for mining association rules is the Apriori algorithm (Agrawal & Srikant, 1994).
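To make the idea concrete, below is a minimal sketch of the frequent-itemset step of Apriori in Python. The session data, the site names and the 0.5 support threshold are purely illustrative and not taken from the article.

from itertools import combinations

def apriori(transactions, min_support):
    """Return every itemset whose support is at least min_support."""
    transactions = [set(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = list(current)

    # Grow candidates level by level: a k-itemset can only be frequent
    # if all of its (k-1)-subsets are frequent (the Apriori property).
    k = 2
    while current:
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k and all(frozenset(s) in current for s in combinations(union, k - 1)):
                candidates.add(union)
        current = {c for c in candidates if support(c) >= min_support}
        frequent.extend(current)
        k += 1
    return frequent

# Hypothetical sessions: websites visited by each user in a month.
sessions = [
    ["cnn", "detik", "sports"],
    ["cnn", "detik"],
    ["cnn", "weather"],
    ["detik", "sports"],
]
print(apriori(sessions, min_support=0.5))

From the frequent itemsets found this way, the confidence of a rule such as "cnn implies detik" is obtained as support({cnn, detik}) divided by support({cnn}), and only rules above the minimum confidence are kept.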

Web Crawler 
Web crawler (also known as web spider or web robot) is a program or automated script that explores the WWW in a methodical, automated manner. Other, less commonly used names for a web crawler are ants, automatic indexers, bots and worms (Kobayashi & Takeda, 2000).
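As a rough illustration of how such a program works, the sketch below crawls breadth-first from a seed page using only the Python standard library. The seed URL, page limit and timeout are illustrative assumptions; a real crawler would also respect robots.txt, throttle its requests and filter duplicates more carefully.

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, max_pages=10):
    visited = set()
    frontier = deque([seed_url])
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", errors="ignore")
        except Exception:
            continue                       # skip pages that cannot be fetched
        visited.add(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative links
            if absolute.startswith("http") and absolute not in visited:
                frontier.append(absolute)
    return visited

if __name__ == "__main__":
    for page in crawl("http://example.com"):
        print(page)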

Extended Log File Format 
The Extended Log File Format was designed to meet several needs (Baker & Behlendorf, 1996); a small parsing sketch follows the list:
* Permit control over the data recorded.
* Support the needs of proxies, clients and servers in a common format.
* Provide robust handling of character escaping issues.
* Allow exchange of demographic data.
* Allow summary data to be expressed.
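Log files in this format contain "#" directive lines (such as #Version, #Date and #Fields) followed by space-separated entry lines whose layout is declared by the #Fields directive. The sketch below is a minimal reader for such files; the file name and the example field names are illustrative assumptions.

def read_extended_log(path):
    """Parse an extended log file into a list of field-name -> value dicts."""
    fields = []
    records = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                # Directive line, e.g. "#Fields: date time c-ip cs-uri-stem sc-status"
                if line.startswith("#Fields:"):
                    fields = line[len("#Fields:"):].split()
                continue
            values = line.split()
            records.append(dict(zip(fields, values)))
    return records

# Usage with a hypothetical log file:
# for entry in read_extended_log("access.log"):
#     print(entry["c-ip"], entry["cs-uri-stem"])

Records read this way (visitor IP, requested page, timestamp and so on) are the raw material for the web usage mining and association rule steps described above.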

2 comments:

  1. Based on the article above, the research at the link below could be used as a reference:

    http://repository.gunadarma.ac.id/bitstream/123456789/2635/1/Ekon-16.pdf

    Thank you
