The main reason data mining has attracted so much attention in the information industry in recent years is the wide availability of large amounts of data and the pressing need to turn that data into useful information and knowledge.
Data mining is the activity of extracting, or "mining", knowledge from large amounts of data. This knowledge can be very useful for development. The steps in data mining are as follows:
- Data cleaning (to remove noise and inconsistent data)
- Data integration (where multiple data sources are combined)
- Data selection (where data relevant to the analysis task is retrieved from the database)
- Data transformation (where the data is transformed or consolidated into a form suitable for mining, for example by summary or aggregation operations)
- Data mining (the essential process in which intelligent methods are applied to extract data patterns)
- Pattern evaluation (to identify the truly interesting patterns that represent knowledge, based on some measures of interestingness)
- Knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user).
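The steps above can be sketched as a small pipeline. This is only a toy illustration, not a real mining system: the records, the field names (customer, item, amount), and the "above-average spender" pattern are all assumptions made up for the example.

```python
from statistics import mean

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [{"customer": "ann", "item": "printer", "amount": 120.0},
           {"customer": "bob", "item": "computer", "amount": None}]
store_b = [{"customer": "ann", "item": "computer", "amount": 900.0},
           {"customer": "cy", "item": "printer", "amount": 80.0}]

def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [r for source in sources for r in source]

def select(records, item):
    """Data selection: keep only records relevant to the analysis task."""
    return [r for r in records if r["item"] == item]

def transform(records):
    """Data transformation: aggregate the amounts per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

def mine(totals):
    """Data mining (toy): find customers who spend more than the average."""
    average = mean(list(totals.values()))
    return {c: t for c, t in totals.items() if t > average}

def evaluate(patterns, min_amount):
    """Pattern evaluation: keep patterns that pass an interestingness threshold."""
    return {c: t for c, t in patterns.items() if t >= min_amount}

def present(patterns):
    """Knowledge presentation: format the result for the user."""
    return [f"{customer}: {total:.2f}" for customer, total in sorted(patterns.items())]

records = integrate(clean(store_a), clean(store_b))
selected = select(records, "printer")
totals = transform(selected)
patterns = evaluate(mine(totals), min_amount=100.0)
report = present(patterns)
print(report)
```

In practice each stage is far more involved, and the stages are usually repeated several times rather than run once in a straight line.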
The architecture of a typical data mining system has several main components, namely:
- Database, data warehouse, or other information storage.
- Database or data warehouse server.
- Knowledge base.
- Data mining engine.
- Pattern evaluation module.
- Graphical user interface.
Data mining can be applied to several types of data, including:
- Relational database: A database system, also called a database management system (DBMS), consists of a collection of interrelated data, known as a database, and a set of software programs to manage and access the data.
- Data warehouse: A data warehouse is a repository of information gathered from a variety of sources, stored under a unified schema, and usually residing at a single site.
So what kinds of patterns can be mined?
Data mining functionalities are used to specify the kind of patterns to be found in data mining tasks. In general, data mining tasks can be classified into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in the database. Predictive mining tasks draw inferences from the current data in order to make predictions.
Concept / Class Description
Data can be associated with classes or concepts. For example, in the All Electronics store, the classes of goods for sale include computers and printers, and the concepts for customers include big spenders and budget spenders. It is very useful to describe individual classes and concepts in summarized, concise, and yet precise terms. Such a description of a class or concept is called a class / concept description.
Association Analysis
Association analysis is the discovery of association rules showing attribute-value conditions that frequently occur together in a given set of data. Association analysis is widely used for market basket or transaction data analysis.
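As a rough illustration of how an association rule is scored, the sketch below computes the two usual measures, support and confidence, for a rule such as "computer => printer". The transactions are invented for the example, and the function names are my own, not taken from any library.

```python
# Hypothetical market-basket data: each transaction is a set of purchased items.
transactions = [
    {"computer", "printer"},
    {"computer", "software"},
    {"computer", "printer", "software"},
    {"printer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base

rule_support = support({"computer", "printer"}, transactions)
rule_confidence = confidence({"computer"}, {"printer"}, transactions)
print(rule_support, rule_confidence)
```

A rule is normally reported only when both values exceed user-chosen minimum thresholds.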
Classification and prediction
Classification and prediction may need to be preceded by relevance analysis, which attempts to identify the attributes that do not contribute to the classification or prediction process. These attributes can then be excluded.
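One possible way to do such a relevance check is to compute the information gain of each attribute with respect to the class label and drop the attributes that add nothing. This is just one technique among several, chosen here for illustration; the rows and the attribute names (income, store, buys) are made up.

```python
from collections import Counter
from math import log2

# Hypothetical training rows; "buys" is the class label.
rows = [
    {"income": "high", "store": "A", "buys": "yes"},
    {"income": "high", "store": "A", "buys": "yes"},
    {"income": "low", "store": "A", "buys": "no"},
    {"income": "low", "store": "A", "buys": "no"},
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in class entropy obtained by splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

gains = {a: information_gain(rows, a, "buys") for a in ("income", "store")}
# Keep only attributes that actually reduce uncertainty about the class.
relevant = [a for a, g in gains.items() if g > 1e-9]
print(gains, relevant)
```

Here "store" has the same value in every row, so it tells us nothing about the class and is excluded.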
Cluster Analysis
Unlike classification and prediction, which analyze class-labeled data objects, clustering analyzes data objects without consulting a known class label. In general, the class labels are not present in the training data simply because they are not known to begin with. Clustering can be used to generate such labels.
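To show how clustering can produce labels for unlabeled data, here is a minimal sketch of k-means on one-dimensional values. K-means is only one of many clustering methods and is my choice for the example; the data points and the starting centroids are assumptions.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Very small k-means for 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point gets the index of its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:
            break  # nothing moved, so the clustering is stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled values, e.g. amounts spent by six customers.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
labels, centroids = kmeans_1d(points, centroids=[1.0, 11.0])
print(labels, centroids)
```

The returned labels can then be used as class labels, for example "budget" for cluster 0 and "big spender" for cluster 1.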
Outlier Analysis
A database may contain data objects that do not follow the general behavior or model of the data. These data objects are called outliers. Outliers can be detected using statistical tests that assume a distribution or probability model for the data, or using distance measures, where objects that are far away from any other cluster are considered outliers.
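A simple statistical test for outliers is the z-score: how many standard deviations a value lies from the mean. The sketch below assumes the data is roughly normally distributed. The threshold of 2.0 and the sample readings are arbitrary choices for the example.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than threshold standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can stand out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical readings with one obviously unusual value.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = zscore_outliers(readings)
print(outliers)
```

Note that a single extreme value also inflates the mean and the standard deviation, so on small samples more robust measures such as the median may work better.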
Evolution Analysis
Data evolution analysis describes and models regularities or trends for objects whose behavior changes over time. Although this may include characterization, discrimination, association, classification, or clustering of time-related data, the distinctive features of this kind of analysis include time-series data analysis, sequence or periodicity pattern matching, and similarity-based data analysis.
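As a very small example of time-series analysis, the sketch below smooths a series with a moving average and checks whether the smoothed trend keeps rising. The monthly sales figures and the window size are invented for the illustration.

```python
def moving_average(series, window):
    """Simple moving average; returns one value per full window."""
    if window <= 0 or window > len(series):
        return []
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Hypothetical monthly sales figures.
monthly_sales = [10, 12, 14, 16, 18]
smoothed = moving_average(monthly_sales, window=3)
# The trend is rising if every smoothed value is larger than the one before it.
is_rising = all(later > earlier for earlier, later in zip(smoothed, smoothed[1:]))
print(smoothed, is_rising)
```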
Doing data mining well means dealing with several major issues concerning mining methodology and user interaction, performance, and the diversity of database types. These issues come up often in practice.