What is knowledge discovery and data mining?
Data mining is the activity in which machine learning techniques are applied to find patterns in the relationships between data elements. Data mining is one step in the knowledge discovery process [1], which seeks to gain insight into the relationships between data elements.
To provide a better understanding of a knowledge discovery endeavour, a general process model is useful. Such a process model consists of the set of processing steps needed to complete a knowledge discovery and data mining (KDDM) project [1]. Various process models have been proposed. A popular KDDM process model is depicted in the figure below. The process model consists of six steps and several feedback loops.
Figure: KDDM process model.
The process unfolds in intentional cycles as follows:
1. In the first step of the KDDM process, a general understanding of the application domain and the relevant prior knowledge is developed. During this step the data mining problem and the objectives of the knowledge discovery and data mining endeavour are defined.
2. The second step involves the identification and acquisition of appropriate data sources, data exploration, data sampling, and the selection of appropriate, relevant and interesting attributes.
3. Data preparation, the third step, involves preprocessing the data set into the correct structure and form for use with the selected machine learning technique. During this step the appropriate machine learning technique, or combination of techniques, is identified in line with the data mining objectives set out during step one.
4. Step four, the data mining step, involves the application of the selected machine learning techniques to the prepared data.
5. During the fifth step, the usefulness of the discovered patterns is evaluated in the context of the data mining objectives, and any alternative actions needed are identified.
6. In the final step, the useful knowledge learned is deployed for practical use.
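The six steps and one of the feedback loops can be sketched in code. This is only an illustrative sketch, not part of the original process model: the step names in `Step`, the `run_kddm` function, and the `evaluate` callback are hypothetical labels chosen here, and the single loop back from evaluation to data preparation is an assumption standing in for the several feedback loops the model allows.

```python
from enum import IntEnum
from typing import Callable, List


class Step(IntEnum):
    """Hypothetical labels for the six KDDM steps described above."""
    DOMAIN_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    DATA_MINING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def run_kddm(evaluate: Callable[[int], bool], max_cycles: int = 3) -> List[Step]:
    """Return the sequence of steps visited during one project.

    `evaluate(cycle)` is a placeholder for step five: it returns True when
    the patterns found in that cycle are judged useful. If they are not,
    the sketch loops back to data preparation (an assumed feedback loop).
    """
    # Steps one and two are performed once in this simplified sketch.
    trace = [Step.DOMAIN_UNDERSTANDING, Step.DATA_UNDERSTANDING]
    for cycle in range(1, max_cycles + 1):
        trace += [Step.DATA_PREPARATION, Step.DATA_MINING, Step.EVALUATION]
        if evaluate(cycle):
            # Useful knowledge was found, so it is deployed and the run ends.
            trace.append(Step.DEPLOYMENT)
            break
    return trace


# Example: the patterns are only judged useful on the second cycle.
trace = run_kddm(lambda cycle: cycle == 2)
print([step.name for step in trace])
```

In a real project the feedback could return to any earlier step, for instance back to the first step when evaluation shows that the objectives themselves need revising.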
Notes
- Originally published as part of Wilgenbus, E.F., 2013. The file fragment classification problem: a combined neural network and linear programming discriminant model approach. Master's thesis, North West University.