The Human Factor
Despite advances in computational methods, humans remain an indispensable part of the data mining process. One reason they are still needed in what would otherwise be an automated process is that current technologies assume uniform and relatively simple data structures. Very large, complex databases, replete with multiple potential relationships, present scalability issues that may require significant computational time on powerful computer systems. In addition, many of the traditional data mining methods were developed for homogeneous numerical data. However, bioinformatics databases increasingly hold text sequences, protein structures, and other data sets that are anything but homogeneous.
The technical challenges associated with data mining are compounded by the lack of statistical methods that can adequately assess the significance of figures calculated from very large data sets. Similarly, because few bioinformatics databases are static and most are growing exponentially with time, the statistical assumption of a fixed population from which samples are drawn is violated. As a result, a statistical analysis of a particular relationship at one point in time may yield a different result a month or two later. These and similar challenges remain to be solved.
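The effect of a growing database on a statistical result can be illustrated with a small simulation. The following Python sketch is hypothetical and uses only synthetic data: the function names, the snapshot sizes, and the assumption that records added later follow a different relationship than the earliest records are all invented for illustration and are not taken from any real bioinformatics database. It recomputes the correlation between two variables each time the simulated database reaches a new size.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_snapshot(snapshot_sizes=(100, 200, 400, 800),
                            early_records=100, seed=0):
    """Return (size, correlation) pairs for a database that keeps growing.

    Assumption (illustrative only): the first `early_records` entries
    follow y = x + noise, while every later entry follows y = -x + noise,
    so the population being analyzed is not fixed over time.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    xs, ys = [], []
    results = []
    for size in snapshot_sizes:
        # Append new synthetic records until the database reaches `size`.
        while len(xs) < size:
            x = rng.gauss(0.0, 1.0)
            slope = 1.0 if len(xs) < early_records else -1.0
            xs.append(x)
            ys.append(slope * x + rng.gauss(0.0, 0.5))
        # Re-run the same analysis on everything collected so far.
        results.append((size, pearson(xs, ys)))
    return results


if __name__ == "__main__":
    for size, r in correlation_by_snapshot():
        print(f"records={size:4d}  correlation={r:+.3f}")
```

Under these assumptions, the same analysis reports a strong positive relationship on the earliest snapshot and a negative one once the later records dominate, even though the method itself never changed. Real databases rarely shift this abruptly; the sketch exaggerates the drift to make the point visible.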