Data Mining System Architecture
The data mining system architecture consists of the following modules, as shown in Figure 1.
Figure 1: The architecture of a data mining system
1. Data sources: The data mining system gathers data from various sources to perform the analysis task. These sources include data warehouses, flat files, databases, the World Wide Web (WWW), spreadsheets, and other kinds of information repositories. Data selection and data preprocessing techniques are applied to this data.
2. Database or Data Warehouse Server: The database or data warehouse server is the central store responsible for fetching the relevant data according to the data mining request or query issued by the user.
3. Knowledge Base: The data mining process may refer to a knowledge base, a repository of knowledge about a particular domain that guides the search for interesting patterns. This knowledge may include “concept hierarchies”, which organize features or feature values into several levels of abstraction. It may also include “user beliefs”, which can be used to assess the interestingness of a pattern according to its unexpectedness. Other examples of domain knowledge are additional thresholds or interestingness constraints and metadata (i.e., data about data).
4. Data Mining Engine: This module is essential to the data mining system. It contains a set of functional modules for performing tasks such as summarization, association analysis, classification, regression, cluster analysis, and outlier detection.
5. Pattern Evaluation: This module typically applies thresholds or interestingness constraints to identify the interesting patterns. It also communicates with the data mining engine to help focus the search on interesting patterns.
6. Graphical User Interface: This module mediates between users and the data mining system. It lets the user submit a data mining request or query and supply information that helps guide the search. Depending on the user’s data mining application, the mined knowledge is presented to the user using visualization techniques.
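
The way these modules hand data to one another can be illustrated with a minimal sketch in Python. It is not a definitive implementation: the toy transactions, the function names (fetch_relevant, mine_item_pairs, evaluate_patterns, run_request), and the choice of a minimum-support threshold as the interestingness constraint are all assumptions made for illustration. Each function stands in for one module from the list above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical in-memory data source: each record is one transaction.
TRANSACTIONS = [
    {"region": "north", "items": ["bread", "milk"]},
    {"region": "north", "items": ["bread", "butter", "milk"]},
    {"region": "south", "items": ["bread", "milk"]},
    {"region": "north", "items": ["butter", "jam"]},
]

# Knowledge base (assumed form): a minimum-support threshold that acts
# as the interestingness constraint.
KNOWLEDGE_BASE = {"min_support": 0.5}


def fetch_relevant(records, region):
    """Database/warehouse server role: return only the data the request needs."""
    return [r["items"] for r in records if r["region"] == region]


def mine_item_pairs(baskets):
    """Data mining engine role: count how often each pair of items co-occurs."""
    counts = Counter()
    for basket in baskets:
        # Sort and deduplicate so each pair has one canonical form.
        counts.update(combinations(sorted(set(basket)), 2))
    return counts


def evaluate_patterns(counts, n_baskets, min_support):
    """Pattern evaluation role: keep pairs whose support meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


def run_request(region):
    """User interface role: accept a request and return the mined knowledge."""
    baskets = fetch_relevant(TRANSACTIONS, region)
    counts = mine_item_pairs(baskets)
    return evaluate_patterns(counts, len(baskets), KNOWLEDGE_BASE["min_support"])
```

On this toy data, run_request("north") keeps only the pair ("bread", "milk"), which appears in two of the three selected transactions; every other pair falls below the assumed threshold of 0.5 and is filtered out by the pattern evaluation step.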