COMPARISON OF NEURAL NETWORK AND MULTIVARIATE DISCRIMINANT ANALYSIS IN SELECTING NEW COWPEA VARIETY

Adewole, Adetunji Philip*
Department of Computer Science, University of Agriculture, Abeokuta
philipwole@yahoo.com

Sofoluwe, A. B.
Department of Computer Science, University of Lagos, Akoka

Agwuegbo, Samuel Obi-Nnamdi
Department of Statistics, University of Agriculture, Abeokuta
agwuegbo_son@yahoo.com

ABSTRACT
In this study, a neural network (NN) algorithm and a multivariate discriminant analysis (MDA) based model were developed to classify ten (10) varieties of cowpea widely planted in Kano. To demonstrate the validity of the models, the case study was used to build a multilayer feedforward neural network, and its classification performance was compared against that of multivariate discriminant analysis. Two groups of data (Spray and No Spray) were used. Twenty kernels were used as the training and test data sets for classifying cowpea seed varieties. The neural network classified the new cowpea seed varieties on the basis of the information it was trained with. Finally, the two methods were compared in terms of their strengths and weaknesses. The NN performed better than the MDA, so the NN could be considered as a support tool in the process of selecting new cowpea varieties.

KEYWORDS: Cowpea, Multivariate Discriminant Analysis (MDA), Neural Network (NN), Perceptron.

1.0 Introduction
The history of neural networks begins with the earliest model of the biological neuron given by [5]. This model describes a neuron as a linear threshold computing unit with multiple inputs and a single output of either 0, if the nerve cell remains inactive, or 1, if the cell fires. A neuron fires if the sum of its inputs exceeds a specified threshold. In functional form, f(x) = 1 for x greater than some threshold, and f(x) = 0 otherwise (commonly known as the indicator function). In theory, such a "system" of neurons provides a possible model for biological neural networks such as the human nervous system. The [5] model was utilized in the development of the first artificial neural network by [12] in 1958. This network was based on a unit called the perceptron, which produces an output scaled as 1 or -1 depending upon the weighted, linear combination of its inputs. Variations on the perceptron-based artificial neural network were further explored during the 1960s by [12] and by [15], among others. In 1969, [6] demonstrated that the perceptron is incapable of representing simple functions that are not linearly separable, such as the "exclusive or" (XOR). Because of this fundamental flaw (and Rosenblatt's untimely death), the study of neural networks fell into something of a decline during the 1970s. However, this limitation was overcome in the early 1980s. According to [11], the post-perceptron era began with the realization that adding (hidden) layers to the network could yield significant computational versatility. This led to a considerable revival of interest in ANNs (especially multilayered feedforward structures), which continues to this day.
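To make the threshold model described at the start of this introduction concrete, the following Java fragment sketches a single linear threshold unit in the spirit of the [5] neuron: it outputs 1 when the weighted sum of its inputs exceeds the threshold and 0 otherwise. The class name, weights and threshold are purely illustrative and are not taken from this paper.

```java
// A minimal threshold unit: fires (outputs 1) when the weighted sum of its
// inputs exceeds the threshold, otherwise stays inactive (outputs 0).
public class ThresholdNeuron {
    private final double[] weights;
    private final double threshold;

    public ThresholdNeuron(double[] weights, double threshold) {
        this.weights = weights;
        this.threshold = threshold;
    }

    // The indicator function f(x): 1 if the weighted input sum exceeds the threshold, else 0.
    public int fire(double[] inputs) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return sum > threshold ? 1 : 0;
    }

    public static void main(String[] args) {
        // Illustrative weights and threshold only.
        ThresholdNeuron neuron = new ThresholdNeuron(new double[]{0.5, 0.5}, 0.7);
        System.out.println(neuron.fire(new double[]{1.0, 1.0})); // 1: sum 1.0 > 0.7
        System.out.println(neuron.fire(new double[]{1.0, 0.0})); // 0: sum 0.5 <= 0.7
    }
}
```

A perceptron in the sense of [12] adds a rule for adjusting the weights from labelled examples and scales the output as 1 or -1 rather than 1 or 0.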
Presently, much research on neural networks is taking place within two areas: (a) the aforementioned multilayered feed-forward networks, also known as multilayer perceptrons, and (b) symmetric recurrent networks, also known as attractor neural networks or Hopfield nets. The former model is used for classification problems, while the latter is used for developing associative memory systems. The investigation of neural network structures and performance has taken on a substantially pragmatic feel in recent years. There is greater interest in using neural networks as problem-solving algorithms than in developing them as accurate representations of the human nervous system. ANNs have been implemented to solve a variety of problems involving pattern classification and pattern recognition.

2.0 Neural Networks and Standard Statistical Techniques
Similarities between ANNs and statistical methods certainly exist. Indeed, neural networks have been categorized as a form of nonlinear regression. It has also been observed that multiple linear regression, a standard statistical tool, can be expressed in terms of a simple ANN node. For example, given the linear equation y = b0 + b1x1 + ... + bnxn, the xi can be taken as the inputs to a node, the bi as the corresponding connection weights, and b0 as the bias (intercept) term, with a linear activation (a short illustrative sketch of such a node is given below).

There are at least two key differences between ANNs and statistical methods. A major drawback often remarked upon is that the internal functional structure of an ANN remains unknown once it has been trained. In effect, a neural network remains a "black box" that may produce useful results but cannot be precisely understood. Statistical procedures do not exhibit this sort of opaque design. The construction of a neural network is also something of an ad hoc process, whereas statistics commonly offers formalized guidelines for fitting the best model.

The performance of ANNs has been extensively compared to that of various statistical methods within the areas of prediction and classification. In particular, a fair amount of literature has been generated on the use of ANNs in time series forecasting. In an examination of two time series without noise, [4] concluded that basic neural networks substantially outperform conventional statistical methods. [7] found that a neural network and the Box-Jenkins forecasting system performed about the same for the analysis of 75 different time series. Interestingly, the memory of a time series has been shown to influence the relative performance of ANNs and the Box-Jenkins model: the Box-Jenkins model slightly outperforms ANNs for time series with long memory, while the reverse tends to be true for time series with short memory. Stern has also concluded that for time series analysis "NNs work best in problems with little or no stochastic component". Neural network and statistical approaches to pattern classification have been compared by a number of researchers, and for the most part the reviews are mixed. For instance, [3] concluded that neural networks show little promise as real-world classifiers, while a case study examined by Yoon points to the superiority of ANNs over classical discriminant analysis.
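Returning to the single-node view of multiple linear regression noted above, the fragment below shows such a node in Java. The coefficients are invented for illustration and are not fitted to the cowpea data.

```java
// A single linear node computing y = b0 + b1*x1 + ... + bn*xn.
// The xi are the node's inputs, the bi its connection weights,
// and b0 the bias (intercept); the activation is the identity.
public class LinearNode {
    private final double bias;       // b0
    private final double[] weights;  // b1 .. bn

    public LinearNode(double bias, double[] weights) {
        this.bias = bias;
        this.weights = weights;
    }

    public double output(double[] x) {
        double y = bias;
        for (int i = 0; i < weights.length; i++) {
            y += weights[i] * x[i];
        }
        return y;
    }

    public static void main(String[] args) {
        // Illustrative coefficients only.
        LinearNode node = new LinearNode(1.0, new double[]{0.5, -2.0});
        System.out.println(node.output(new double[]{4.0, 1.0})); // 1.0 + 2.0 - 2.0 = 1.0
    }
}
```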
In a comprehensive study of classification techniques, [6] rated the performance of a large selection of neural network, statistical, and machine learning algorithms on a variety of data sets. In the analysis of their results, they presented the top five algorithms for twenty-two different data sets based on error rates. Though not conclusive, the study would seem to suggest that neural networks are not necessarily replacements for, or even preferable alternatives to, standard statistical classification techniques.

2.1 Multivariate Discriminant Analysis
The term multivariate discriminant analysis refers to several different types of analyses. Classificatory discriminant analysis is used to classify observations into two or more known groups on the basis of one or more quantitative variables. Classification can be done by either a parametric or a nonparametric method. A parametric method is appropriate only for approximately normal within-class distributions; it generates either a linear discriminant function or a quadratic discriminant function. When the distribution within each group is not assumed to follow any specific form, such as the multivariate normal, nonparametric methods can be used to derive classification criteria. These methods include the kernel method and nearest-neighbor methods. The kernel method uses uniform, normal, biweight, or triweight kernels to estimate the group-specific density at each observation. The within-group covariance matrices or the pooled covariance matrix can be used to scale the data. The performance of a discriminant function can be evaluated by estimating error rates (probabilities of misclassification); both error count estimates and posterior probability error rate estimates can be computed. In multivariate statistical applications, the data collected often come from distributions other than the normal. Various forms of nonnormality can arise, such as qualitative variables or variables with underlying continuous but nonnormal distributions. If the multivariate normality assumption is violated, the use of parametric discriminant analysis might not be appropriate, and when a parametric classification criterion (linear or quadratic discriminant function) is derived from a nonnormal population, the resulting error rate estimates might be biased.

2.1.1 Discriminant Function
A simple linear discriminant function transforms an original set of measurements on a sample into a single discriminant score. The score, or transformed variable, represents the sample's position along the line defined by the linear discriminant function. We can therefore think of the discriminant function as a way of collapsing a multivariate problem down into a problem that involves only one variable. One method that can be used to find the discriminant function is regression, where the dependent variable consists of the differences between the multivariate means of the groups. In matrix notation, we must solve an equation of the form

[Sp²][λ] = [D]    (1)

where Sp² is the m x m matrix of pooled variances and covariances of the m variables, and λ is the vector of coefficients of the discriminant function. Solving for the coefficients,

[λ] = [Sp²]⁻¹[D]    (2)

Equation (2) is an ordinary linear solve once Sp² and D are known (a minimal numerical sketch is given below).
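As an illustration of equation (2), the Java sketch below solves [Sp²][λ] = [D] by Gauss-Jordan elimination, which is equivalent to multiplying D by the inverse of Sp². The 2 x 2 pooled matrix and mean-difference vector in main are invented for illustration and are not taken from the cowpea data.

```java
// Solve [Sp2] * lambda = [D] for the discriminant coefficients (equation 2)
// by Gauss-Jordan elimination with partial pivoting; equivalent to lambda = inv(Sp2)*D.
public class DiscriminantSolve {
    public static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        double[][] m = new double[n][n + 1];       // augmented matrix [a | b]
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = b[i];
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;                        // choose the largest pivot in this column
            for (int r = col + 1; r < n; r++)
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
            double[] tmp = m[col]; m[col] = m[pivot]; m[pivot] = tmp;
            for (int r = 0; r < n; r++) {           // eliminate the column in all other rows
                if (r == col) continue;
                double f = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) m[r][c] -= f * m[col][c];
            }
        }
        double[] x = new double[n];
        for (int i = 0; i < n; i++) x[i] = m[i][n] / m[i][i];
        return x;
    }

    public static void main(String[] args) {
        // Illustrative pooled variance-covariance matrix and mean-difference vector.
        double[][] sp2 = {{4.0, 1.0}, {1.0, 3.0}};
        double[] d = {2.0, 1.0};
        double[] lambda = solve(sp2, d);
        System.out.printf("lambda = [%.4f, %.4f]%n", lambda[0], lambda[1]);
    }
}
```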
To compute the discriminant function, we must determine the various entries in the matrix equation. The mean differences are found simply as

Dj = Āj - B̄j = (ΣAj)/na - (ΣBj)/nb    (3)

or, in expanded form,

    [D1]   [Ā1]   [B̄1]
    [D2] = [Ā2] - [B̄2]
    [..]   [..]   [..]
    [Dm]   [Ām]   [B̄m]

We must also construct the matrices of sums of squares and cross products of all variables, whose (j, k) entries are the corrected sums of products

SPA = ΣAjAk - (ΣAj)(ΣAk)/na    (4)
SPB = ΣBjBk - (ΣBj)(ΣBk)/nb    (5)

Then the matrix of pooled variance can be found as

Sp² = (SPA + SPB)/(na + nb - 2)    (6)

It can be observed that this equation for the pooled variance is exactly the same as that used in the T² test of the equality of multivariate means, although the amount of mathematical manipulation that must be performed to calculate the coefficients of a discriminant function appears large. The set of coefficients that is found consists of the entries in the discriminant function equation of the form

R = λ1ψ1 + λ2ψ2 + λ3ψ3 + ... + λmψm    (7)

The discriminant index, R0, is the point along the discriminant function line which lies exactly halfway between the center of group A and the center of group B: substituting the multivariate mean of group A into equation (7) (that is, setting ψj = Āj) gives RA, substituting the mean of group B gives RB, and R0 lies halfway between them.

2.2 Multilayer Feedforward Neural Network
Feedforward neural networks (FF networks) are the most popular and most widely used models in many practical applications. Feed-forward ANNs allow signals to travel one way only, from input to output; there is no feedback (loops), i.e. the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs, and they are extensively used in pattern recognition. This type of organisation is also referred to as bottom-up or top-down, and such networks are known by many different names, such as "multilayer perceptrons". The manner in which the neurons of a neural network are structured is intimately linked with the learning algorithm used to train the network; we may therefore speak of the learning algorithms used in the design of neural networks as being structured. Learning algorithms can be considered in terms of two fundamentally different classes of neural network architecture: single-layer feed-forward and multi-layer feed-forward. The diagram below describes the structure of the multilayer feed-forward neural network.

Fig. 1: Structure of the multilayer feed-forward neural network (input layer, hidden layer, output layer).

a. The activity of the input units represents the raw information that is fed into the network.
b. The activity of each hidden unit is determined by the activities of the input units and the weights on the connections between the input and the hidden units.
c. The behavior of the output units depends on the activity of the hidden units and the weights between the hidden and output units.

This simple type of network is interesting because the hidden units are free to construct their own representations of the input. The weights between the input and hidden units determine when each hidden unit is active, and so by modifying these weights, a hidden unit can choose what it represents.
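The input-to-hidden-to-output computation in (a)-(c) above can be sketched as a forward pass in Java. The sigmoid activation and the weight values below are illustrative assumptions; they are not the weights learned by the package described in Section 3.0.

```java
// Forward pass of a small multilayer feed-forward network:
// input layer -> hidden layer -> output layer.
public class FeedForward {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // One layer: out[j] = sigmoid(bias[j] + sum_i w[j][i] * in[i])
    static double[] layer(double[] in, double[][] w, double[] bias) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double sum = bias[j];
            for (int i = 0; i < in.length; i++) sum += w[j][i] * in[i];
            out[j] = sigmoid(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative weights; a trained network would have learned these values.
        double[][] wHidden = {{0.2, -0.4, 0.1}, {0.7, 0.3, -0.5}};
        double[] bHidden = {0.1, -0.1};
        double[][] wOutput = {{1.2, -0.8}};
        double[] bOutput = {0.05};

        double[] input  = {1.0, 0.5, -1.0};                // raw information fed into the network
        double[] hidden = layer(input, wHidden, bHidden);  // hidden-unit activity
        double[] output = layer(hidden, wOutput, bOutput); // output-unit behavior
        System.out.println("network output = " + output[0]);
    }
}
```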
3.0 Implementation
A package was developed and implemented using Java and S-Plus as the computing environment, and was run on a Pentium IV with a 1.80 GHz processor, 512 MB of RAM and 40 GB of local disk. The packages can work on any Windows-based operating system, but for best performance Windows XP Professional is recommended. The implementation is started by launching the interface, followed by the menu editor, which gives the user access to the whole neural network application. The two classes of data (No Spray, Spray) were first tested for similarity/dissimilarity, and a correlation test was performed in S-Plus to generate the training weights from the given data; the results are shown below.

Table 1: Results of the analysis.

Variable              Rank Sum of Squares (RSS)   Weight   Std. Error      S-Plus term
Good Seed             84900.2                      0.52    0.000000e+00    spray.gs
Bad Seed              84900.2                      0.45    0.000000e+00    spray.bs
Germinating Seed      1824.32                      0.40    0.000000e+00    spray.sw
Seed Weight           7339.87                      0.25    0.000000e+00    spray.sg
Days to flowering     12295.91                    -0.31    0.000000e+00    spray.df
Days to maturity      1828.34                      0.27    0.000000e+00    spray.dm
Plant stand/hectare   20272.38                     0.24    0.000000e+00    spray.sh

Degrees of freedom: 30 total; 28 residual.
Residual standard error (on weighted scale): 55.06496.

The model for the correlation and the neural network is as follows: Y = cowpea yield, modelled as Y = XW + error, where the weights are given by the least-squares estimate W = (X'X)⁻¹X'Y.

The figure shows the application as it appears when it initially starts up. Initially, the network activation weights are all zero; the network has learned no patterns at this point.

Fig. 2: Neural network before training.

To train the network to recognize the pattern 1000000, enter 1000000 under "Input pattern to run or train" and click the "Train" button. Notice that the weight matrix adjusts to absorb the new knowledge.

Fig. 3: Neural network after the first training.

Now the network can be tested. Enter the pattern 1000000 into "Input pattern to run or train" (it should still be there from the training). The output will be "1000000": this is an auto-associative network, so it echoes the input if it recognizes it. Now try something that does not match the training pattern exactly. Enter the pattern "01000000" and click "Run". The output will now be "0111000"; the neural network did not recognize "01000000", and the closest thing it knew was "0111000". Next, a side effect of the training can be tested. Enter "0111111", which is the binary inverse of the pattern the network was trained with ("1000000"); the network is always trained for the binary inverse too, so if you enter "00000001" the network will recognize it. Finally, enter "0000000", which is completely off base and not close to anything the neural network knows. The neural network responds with "0111000"; it tries to make a correction, but it has no idea what is meant. The network can be explored further: it can be taught more than one pattern, and as new patterns are trained it builds upon the matrix already in memory. Pressing "Clear" clears out the memory.
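The auto-associative training and recall behavior walked through above (echoing a trained pattern, recognizing its binary inverse, and pulling unrecognized inputs toward the nearest stored pattern) can be approximated with a Hopfield-style memory trained by a Hebbian rule. The sketch below is an illustrative reconstruction under that assumption, not the actual code of the package.

```java
// Hopfield-style auto-associative memory: Hebbian training on bipolar (+1/-1)
// encodings of the 0/1 patterns, one-shot recall by thresholding W*x.
// Trained patterns (and their binary inverses) become stable; unknown inputs
// are pulled toward the nearest stored pattern.
public class AutoAssociative {
    private final int n;
    private final double[][] w;   // activation weight matrix, starts at zero

    public AutoAssociative(int n) { this.n = n; this.w = new double[n][n]; }

    private static int[] toBipolar(String bits) {
        int[] x = new int[bits.length()];
        for (int i = 0; i < bits.length(); i++) x[i] = bits.charAt(i) == '1' ? 1 : -1;
        return x;
    }

    public void train(String pattern) {            // corresponds to the "Train" button
        int[] x = toBipolar(pattern);
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (i != j) w[i][j] += x[i] * x[j];    // Hebbian update
    }

    public String run(String pattern) {            // corresponds to the "Run" button
        int[] x = toBipolar(pattern);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < n; i++) {
            double sum = 0;
            for (int j = 0; j < n; j++) sum += w[i][j] * x[j];
            out.append(sum >= 0 ? '1' : '0');
        }
        return out.toString();
    }

    public void clear() {                          // corresponds to the "Clear" button
        for (double[] row : w) java.util.Arrays.fill(row, 0.0);
    }

    public static void main(String[] args) {
        AutoAssociative net = new AutoAssociative(7);
        net.train("1000000");
        System.out.println(net.run("1000000")); // echoes a trained pattern
        System.out.println(net.run("0111111")); // the binary inverse is also stable
    }
}
```

In this sketch, training on a pattern automatically makes its bipolar inverse stable as well, which corresponds to the side effect noted above, and clearing the weight matrix corresponds to the "Clear" button.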
4.0 Discussion of Results
Ten (10) varieties of cowpea were used for classification, and the results are shown below.

Table 2: Classification of ten cowpea varieties.

Variety        MDA   Neural Network   Test
IT845-224-4    A     A                A
IT86D-716      R     A                A
IT90K-277-2    A     A                A
Tvu-13743      A     A                A
Tvu-1890       A     A                A
Tvu-14476      A     A                A
IT86D-715      A     A                A
IT86D-719      A     A                A
Tvu13731       A     A                A
TvNu72         A     A                A

A = Accept, R = Reject

For the NN, the results may be considered very promising. Of the ten (10) varieties tested, all were accepted in the test; only one (IT86D-716) was rejected by the multivariate discriminant analysis but accepted by the neural network.

5.0 Conclusion
To explore new ways of supporting the process of selecting new cowpea varieties, a neural network (NN) algorithm and a multivariate discriminant analysis (MDA) based model were developed to classify ten (10) varieties of cowpea widely planted in Kano. The comparison of the two methods (NN and MDA) showed that neural networks can be considered a promising technique for developing support tools for the selection of new cowpea varieties. For the NN, the results may be considered very promising: of the ten (10) varieties tested, all were accepted in the test, and the one variety (IT86D-716) rejected by the multivariate discriminant analysis was accepted by the neural network.

REFERENCES
[1] Lapedes, A. S. and Farber, R. M. (1987). How neural nets work. NIPS 1987, 442-456.
[2] Elisa, R. and Marco, B. (2007). Multivariate statistical tools for the evaluation of proteomic 2D-maps. Proteomics, 4, 53-66.
[3] Lan, J., Hu, M. Y., Patuwo, B. E. and Zhang, G. P. (2010). An investigation of neural network classifiers with unequal misclassification costs and group sizes. Decision Support Systems, 48(4), 582-591.
[4] Lapedes, A. and Farber, R. (1987). Nonlinear signal processing using neural networks. International Journal of Forecasting, 14, 323-337.
[5] McCulloch, W. S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
[6] Minsky, M. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, Mass.
[7] Patil, P. N. and Ratnaparkhi, M. V. (2000). A nonparametric test for size biasedness. Journal of the Indian Statistical Association, 38, 369-382.
[8] Patil, P. N. and Speckman, P. (2004). Constrained kernel regression. Journal of the Indian Statistical Association, 42, 87-98.
[9] Potts, W. J. E. (2000). Neural Network Modeling Course Notes. Cary: SAS Institute, Inc.
[10] Qian, N. and Sejnowski, T. J. (1988). Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology, 202, 865-884.
[11] Schalkoff, R. J. (1987). Analysis of the weak solution approach to image motion estimation. Pattern Recognition, 20(2), 189-197.
[12] Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386.
[13] Rosenblatt, F. (1960). Perceptron simulation experiments. Proceedings of the IRE, 48, 301-309.
[14] Trumbo, B. E., Norton, J. A. and Freerks, L. (1999). Using CIS/ED to make a reading list on neural nets. STATS Magazine, winter 1999.
[15] Widrow, B. and Hoff, M. E., Jr. (1960). Adaptive switching circuits. IRE WESCON Convention Record, Part 4, 96-104.
[16] Zhang, G., Patuwo, B. E. et al. (1998). Forecasting with artificial neural networks: the state of the art. International Journal of Forecasting, 14(1), 35-62.