The amount of electronically available data has grown rapidly,
both because electronic data-gathering devices (e.g., point-of-sale
terminals and remote sensing devices) are used more widely and
because data storage has become easier and cheaper with increasing
computing power and disk storage capacity.
Database management systems (DBMSs) give access to
the stored data, but they provide no analysis of it.
Analysis is required to reveal the hidden relationships
within the data, for instance, for decision support.
As databases have grown in size, there is a strong need
for automated analysis techniques. The solution is data mining,
which has been defined as:
- the non-trivial extraction of implicit, previously unknown,
and potentially useful information from data.
(William J. Frawley, Gregory Piatetsky-Shapiro and Christopher J. Matheus)
- a variety of techniques to identify nuggets of information or
decision-making knowledge in bodies of data, and extracting these in
such a way that they can be put to use in the areas such as decision
support, prediction, forecasting and estimation. The data is often
voluminous, but as it stands of low value as no direct use can be made
of it; it is the hidden information in the data that is useful.
([ ])
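The difference between retrieving stored data and extracting implicit information from it can be illustrated with a small sketch. The basket data and the function names `lookup` and `frequent_pairs` below are hypothetical and chosen only for illustration; simple pair counting stands in here for the far more elaborate techniques used in practice.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale records: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]


def lookup(item):
    """Retrieval, as a DBMS query would do it: return the stored baskets that contain item."""
    return [basket for basket in transactions if item in basket]


def frequent_pairs(min_support):
    """Analysis: count how often each pair of items is bought together and
    keep the pairs that occur in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    print(lookup("beer"))
    print(frequent_pairs(2))
```

The lookup only returns what was stored, whereas the pair counts expose a relationship (items that tend to be bought together) that is not explicit in any single record.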
Data mining has many synonyms and related areas of research.
One of the most popular alternative names for the area
is Knowledge Discovery in Databases (KDD).
In the list of frequently asked questions [ ],
KDD is characterized as follows:
- The notion of Knowledge Discovery in Databases (KDD) has been given
various names, including data mining, knowledge extraction, data
pattern processing, data archaeology, information harvesting,
siftware, and even (when done poorly) data dredging. Whatever the
name, the essence of KDD is non-trivial process of identifying
(1) valid, (2) novel, (3) potentially useful, and (4)
ultimately understandable patterns in data. [ ]
The structure of this chapter is partly based on
WWW course material prepared at the Queen's University of Belfast [ ].
For introductory texts on the topic see, e.g., [ , , ].
Heikki Hyötyniemi
Tue Aug 5 14:39:14 EET DST 1997