Thursday, December 15, 2011

Data Preprocessing

Data preprocessing is one of the stages of data mining. Why does data need to be cleaned before it is processed?
Because the data we start with is usually not in good shape. Common problems include:
- Incomplete: values are missing for certain attributes.
- Noisy: the data contains errors, or outlier values that deviate from what is expected.
- Inconsistent: the codes or names used do not match.
Good decisions must be based on good-quality data, and a data warehouse needs consistent integration of quality data.
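
As a rough illustration of the three problems above, here is a minimal Python sketch that flags each one in a tiny data set. The records, field names, the 0 to 120 age range, and the allowed country code are all made up for this example; real checks depend on your own data.

```python
# Hypothetical sample records; names and values are invented for illustration.
records = [
    {"name": "Ann", "age": 34, "country": "ID"},
    {"name": "Budi", "age": None, "country": "Indonesia"},  # missing age, different country code
    {"name": "Citra", "age": 29, "country": "ID"},
    {"name": "Dewi", "age": 310, "country": "ID"},  # age far outside the expected range
    {"name": "Eko", "age": 41, "country": "ID"},
]


def find_incomplete(rows):
    """Return indices of rows with at least one missing (None) value."""
    return [i for i, row in enumerate(rows) if any(v is None for v in row.values())]


def find_noisy(rows, field, low, high):
    """Return indices of rows whose value for `field` is outside [low, high]."""
    return [
        i
        for i, row in enumerate(rows)
        if row[field] is not None and not (low <= row[field] <= high)
    ]


def find_inconsistent(rows, field, allowed):
    """Return indices of rows whose value for `field` is not an allowed code."""
    return [i for i, row in enumerate(rows) if row[field] not in allowed]


print("incomplete:", find_incomplete(records))
print("noisy:", find_noisy(records, "age", 0, 120))
print("inconsistent:", find_inconsistent(records, "country", {"ID"}))
```

A fixed range is only one simple way to spot noisy values; statistical rules or domain knowledge may fit better in practice.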


Some criteria to consider when judging whether data is good:

  • Accuracy
  • Completeness
  • Consistency
  • Timeliness
  • Value added
  • Interpretability
  • Accessibility
  • Contextual
  • Representational

Techniques or methods used in data preprocessing include:
Data cleaning
Removing incorrect values, fixing messy data, and checking for inconsistencies.
Data integration
Combining data from multiple sources (databases, data cubes, or files) into a suitable data store.
Data transformation
Normalization and aggregation, so that the data ends up in a uniform form.
Data reduction
Representing the data in a smaller form that still yields the same analytical results.
Data discretization
Part of data reduction, but important in its own right, especially for numerical data.
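
To make three of the techniques listed above more concrete (cleaning, transformation, and discretization), here is a small sketch using only the Python standard library. The input values, the valid range, the median fill, min-max scaling, and equal-width bins are all assumptions chosen for this example; they are one possible choice for each step, not the only one.

```python
from statistics import median

# Hypothetical ages: one missing value (None) and one obvious error (310).
ages = [34, None, 29, 310, 41]


def clean(values, low, high):
    """Data cleaning: replace missing or out-of-range values with the median
    of the valid values."""
    valid = [v for v in values if v is not None and low <= v <= high]
    fill = median(valid)
    return [v if (v is not None and low <= v <= high) else fill for v in values]


def min_max_normalize(values):
    """Data transformation: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Data discretization: equal-width binning into indices 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it to the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


cleaned = clean(ages, 0, 120)
print("cleaned:", cleaned)
print("normalized:", min_max_normalize(cleaned))
print("bins:", discretize(cleaned, 3))
```

Cleaning runs first here so that the bad value 310 does not stretch the range used for normalization and binning.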

