A Review of Some Cluster Analysis Methods
Kennedy, James N.
Journal: A I I E Transactions
Volume: 6
Language: English
DOI: 10.1080/05695557408974956
Date: September, 1974
To cite this article: James N. Kennedy (1974) A Review of Some Cluster Analysis Methods, A I I E Transactions, 6:3, 216-227. To link to this article: http://dx.doi.org/10.1080/05695557408974956. Published online: 09 Jul 2007.
A Review of Some Cluster Analysis Methods

JAMES N. KENNEDY
Illinois Bell Telephone Company
225 West Randolph Street
Chicago, Illinois 60606

Abstract: Cluster analysis is the term used to describe a variety of procedures for grouping items with similar characteristics. The purpose of this paper is to outline some of the algorithms used in cluster analysis and to show how these methods can be used in industrial management. Examples of applications to organizational problems, classification of employee work groups and work planning problems are discussed.

Use of Cluster Analysis to Form Divisions in a Company

In many industrial problems it is necessary to group similar items together, either to obtain insights about the underlying structure of the data or to classify the items under study. A technique used to form groups of similar items is called cluster analysis. The term describes a variety of procedures for grouping items and does not refer to one specific algorithm. The methods generally do not result in an optimum grouping of items but provide good solutions to classification problems.

To illustrate the procedures involved we will use cluster analysis to form an organizational structure from twenty work groups within a company. That is, we wish to form clusters of work groups which will be designated as divisions of the company. Each division would be under the direction of a different division manager. The measure of association between groups used in the analysis is a mutual interdependency index. The index was determined from a survey of line and staff managers familiar with the functions of the pairs of groups involved. A rating of zero for two groups indicated minimum interdependency while a rating of eight indicated maximum interdependency.
The average ratings for the twenty groups in the study are summarized in Table 1. (Received June 1973; revised May 1974; revised July 1974.)

The clustering algorithm we will use is a variation of one described by Rao (23), who attributes it to K. D. Tocher. The procedure begins by selecting the two groups with the greatest interdependency to form the nucleus of the first cluster. The third group added to the cluster is the one which has the greatest average interdependency with the first two. The fourth added is the group with the greatest average interdependency with the first three, and so forth. If at any stage there is no group to be added which has a sufficiently high interdependency index with the groups in the cluster, a new cluster is formed. Table 2 illustrates the computations involved. The results are summarized in Table 3.

The first cluster was formed by choosing the pair of groups with the highest interdependency score. This pair was (16, 17) with a value of 7.38. To select the group with the highest interdependencies with Groups 16 and 17, we write the interdependencies in column form in Table 2 and compute the total interdependencies. Group 5 has the greatest total interdependency, 12.26. The average interdependency of Group 5 with Groups 16 and 17 is 12.26/2 or 6.13. The total interdependency of the three groups is 7.38 + 12.26 or 19.64. The average index for the three is 19.64/3 = 6.55. These results are summarized in Table 3.

In order to determine the next group to be added to the cluster, we enter the interdependencies for Group 5 and add these to the totals for Groups 16 and 17. We find

Table 1: Interdependency index scores for twenty work groups.
[Table 1 data: a 20 x 20 symmetric matrix of interdependency scores; the individual entries are too garbled in this copy to reproduce reliably.]

that Group 12 has the greatest sum of interdependencies, 13.06. The average interdependency with the three groups in the cluster is 13.06/3 = 4.35. The overall average interdependency between groups in the cluster is (19.64 + 13.06)/6 = 5.45. These values are also entered in Table 3.

We continue to add groups to Cluster 1 until there is no group with a sufficiently high average interdependency index per term added. We have arbitrarily selected 4.00 as this cutoff. After Group 19 is added, the group with the highest average interdependency score is Group 6 with an average interdependency of 22.87/6 = 3.81. Hence, we start a new cluster using the same procedures described above. That is, the nucleus of Cluster 2 is selected from the pairs not in Cluster 1.
The two groups with the highest interdependency index are 6 and 18 with a score of 6.36. No other groups are added to Cluster 2 because all have average interdependencies of less than 4.00 with Groups 6 and 18. Cluster 3 is formed from Groups 7 and 8. Group 9 is added to Cluster 3, but further additions are precluded because the cutoff criterion of an average score of 4.00 or more is not met. This leads to the formation of Clusters 4, 5, and 6 using the same methods. That is, we choose among those pairs not in a cluster the pair with the greatest interdependency score. We add groups until our cutoff criterion is no longer satisfied. Group 1 is not placed in a cluster with other groups because its interdependency scores are less than 4.00.

Comments on Clustering Algorithm

The above algorithm provides a relatively simple method of clustering a set of items. There are several possible variations which we might also use. For example, we could limit the number of groups in a cluster to a specific number, like 3 or 5. A new cluster would be formed when that value was exceeded. Another approach might be to initially select several pairs of groups with high interdependencies as cluster nuclei and then add groups to these clusters so that the interdependency scores for the clusters are a maximum. Likewise, the results of the clustering algorithm can be adjusted on a judgment basis to conform with the experience of the analyst. Thus, in our example, we might add Group 20 to Cluster 2 or Cluster 4 in order to equalize the cluster sizes even though the algorithm did not specifically place it in that cluster. The results of the clustering algorithm for this sample indicate how the work groups should be arranged into organizational divisions so that the groups within a division have the greatest mutual dependency.
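The grouping procedure described above can be sketched in a few lines of code. This is a minimal illustration, not the computation used in the paper: the function name `tocher_clusters`, the NumPy matrix representation, and the tie-breaking behavior are assumptions made for the sketch.

```python
import numpy as np

def tocher_clusters(sim, cutoff=4.0):
    """Greedy clustering on a symmetric matrix of similarity
    (interdependency) scores, in the spirit of the Tocher-style
    procedure described in the text.

    Each cluster is seeded with the most similar unclustered pair;
    items are then added one at a time, always taking the unclustered
    item whose average similarity to the current cluster is highest,
    as long as that average meets the cutoff (4.00 in the example).
    """
    unclustered = set(range(sim.shape[0]))
    clusters = []
    while len(unclustered) >= 2:
        # Seed: the unclustered pair with the greatest similarity score.
        best, i, j = max((sim[i, j], i, j)
                         for i in unclustered for j in unclustered if i < j)
        if best < cutoff:
            break  # no pair qualifies; the rest stay unclustered
        cluster, unclustered = [i, j], unclustered - {i, j}
        while unclustered:
            # Candidate with the highest average similarity to the cluster.
            cand = max(unclustered, key=lambda k: sim[cluster, k].mean())
            if sim[cluster, cand].mean() < cutoff:
                break  # start a new cluster instead
            cluster.append(cand)
            unclustered.remove(cand)
        clusters.append(cluster)
    clusters.extend([k] for k in sorted(unclustered))  # singletons, like Group 1
    return clusters
```

As in the example, a cluster is grown only while the average index of the group being added stays at or above the cutoff, and an item whose scores never reach the cutoff is left by itself.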
It is apparent that this final arrangement is not unique and depends a great deal on the criteria used by the analyst in forming the clusters. Nevertheless, it provides a broad framework for classifying and organizing the work groups.

Measures of Distance for Multivariate Clustering Procedures

In the example discussed above, an interdependency index was used to determine how the groups should be arranged in clusters. When the objects being clustered are described by more than one measurement, it is necessary to compute a scalar value from these multivariate measurements to be used in the clustering procedures. One such scalar value is

Table 2: Example of computations of clusters for data shown in Table 1. [The worksheet shows, for each cluster in turn, the column totals of interdependencies used to select each group added; the individual entries are too garbled in this copy to reproduce reliably.]
[Further Table 2 worksheet values are not legible in this copy.]

the distance between the two points in the Euclidean space generated by the multivariate values of the two objects involved. The distance between objects with two dimensions is determined from the relationship

d_ij = [(x_i1 - x_j1)^2 + (x_i2 - x_j2)^2]^(1/2)

This, of course, is also the computational procedure for determining the length of the hypotenuse of a right triangle from the squares of the other two sides. The distance between two points in more than two-dimensional space can be computed from the relationship

d_ij = [Σ_{m=1}^{k} (x_im - x_jm)^2]^(1/2)

Table 3: Example of clustering of organizational groups according to interdependencies shown in Table 1. (A cutoff value of 4.00 was used to determine when a new cluster should be started.)

Cluster 1: groups added 16,17; 5; 12; 4; 19. Sum of index scores: 7.38, 19.64, 32.70, 49.01, 70.62. No. of terms: 1, 3, 6, 10, 15. Average value per term added: 7.38, 6.13, 4.35, 4.08, 4.32. Average cluster index: 7.38, 6.55, 5.45, 4.90, 4.71.
Cluster 2: groups added 6,18. Sum: 6.36. Terms: 1. Average per term added: 6.36. Cluster index: 6.36.
Cluster 3: groups added 7,8; 9. Sums: 6.13, 16.77. Terms: 1, 3. Average per term added: 6.13, 5.32. Cluster index: 6.13, 5.59.
Cluster 4: groups added 2,3. Sum: 5.02. Terms: 1. Average per term added: 5.02. Cluster index: 5.02.
Cluster 5: groups added 10,11; 15. Sums: 5.00, 13.82. Terms: 1, 3. Average per term added: 5.00, 4.41. Cluster index: 5.00, 4.61.
Cluster 6: groups added 13,14; 20. Sums: 4.20, 12.21. Terms: 1, 3. Average per term added: 4.20, 4.00. Cluster index: 4.20, 4.07.
Cluster 7: Group 1 alone (no pairing reaches the cutoff).
where k is the number of variables for objects i and j. The difficulty with using this measure of distance is that it is affected by changes of scale. Thus, if one of the variables is income, the distance between two objects would be altered if the income was measured in dollars rather than thousands of dollars. To overcome this drawback, we can standardize the variables by dividing each measurement by its standard deviation. The distance measurement then becomes

d_ij = [Σ_{m=1}^{k} (x_im - x_jm)^2 / s_m^2]^(1/2)

The standard deviation of each of the variables is usually computed from all objects being considered in the analysis. The standardization of the variables eliminates the problems associated with the scale of the variables, but it does not correct for the correlation between variables. In order to do this we must use the Mahalanobis (17) distance measure, which is computed from the relationship

D_ij^2 = (X_i - X_j)' S (X_i - X_j)

where S is the inverse of the variance-covariance matrix for the variables involved. This distance measurement adjusts for the correlation between variables. If no correlation exists, the result is the same distance value computed for standardized variables. See Morrison (22) for a further discussion of distance measures.

Example of Clustering Using Standardized and Mahalanobis Distance Measures

To illustrate the clustering methods involved in using standardized Euclidean and Mahalanobis distance measurements, we will use data obtained from studies of employees' attitudes toward work in seven different work groups in the same company. A questionnaire designed to measure work attitudes was administered to each employee in the work groups. The questionnaire consisted of 55 questions arranged in nine batteries. A description of the batteries and the number of questions included in each battery is shown below.
Battery / Subject:
1. Work is interesting
2. Job not wasteful of time and effort
3. Need more freedom in planning
4. Reasonable say on how job is done
5. Job provides opportunities
6. Job provides feedback
7. Not too closely supervised
8. Job worth putting effort into
9. Feedback from supervisors
(The number of questions in each battery is not legible in this copy.)

The average scores for each group for the nine batteries are shown in Table 4. The scores represent the average sum of responses for questions in each battery. The questions were scored so that the most negative value was zero and the most positive value was six. The variances and covariances are computed for the individual respondents in each work group and then pooled. These estimates could be adjusted by dividing by the size of each group n to obtain estimates of the variances for the group averages. But since the resulting distances would be proportional to the distances without the adjustment, the division was not made.

The squares of the standardized Euclidean and Mahalanobis distances are shown in two matrices in Table 5. To demonstrate how the Euclidean distance values are computed we will use two groups, 1 and 2. The computations are shown in Table 6. The distance is computed from the sum of the squares of the differences of the nine variables using the relationship

d_12^2 = Σ_{m=1}^{9} (x_1m - x_2m)^2 / s_m^2

The Mahalanobis distances are obtained by pre- and post-multiplying the inverse of the variance-covariance matrix by the vector of differences. This is also shown in Table 6.

The algorithm we will use to form the clusters is described by Frank and Green (10) and is similar to the algorithm in our first example. The steps involved in the computations are as follows:

1. Distances are computed for each pair of objects and shown in matrix form as indicated in Table 5.
2. The pair with the smallest distance is chosen as the node of the first cluster.
3.
Table 4: Summary of average responses for the nine questionnaire batteries. [The table gives the average battery scores for the seven work groups, the standard deviations of individual scores, the pooled covariance matrix of individual scores, and the inverse of the covariance matrix; most entries are too garbled in this copy to reproduce reliably. For battery 1 the group averages read 53.50, 44.14, 57.94, 42.60, 60.71, 56.76, and 57.19 for groups 1 through 7.]

Table 5: Standardized distances d_ij^2 between group averages for the seven work groups.

Additional points are added to this cluster (based on closeness to the objects in the cluster) until:
group number a) Some prespecified number of points has been clustered. b) The point to be added to the cluster exceeds some prespecified distancecutoff or threshold number. 4. We then proceed to the next pair of points which are closest together of all unclustered points and the above process is repeated. 5. The algorithm can also be modified to shift points from cluster to cluster to obtain final clusters which are best in the sense of having the lowest withincluster distance summed over all clusters at a given stage in the clustering. 1. 2. 3. 4. 5. 6. 7. The computations are shown in Table 7 and the results in Table 8. A cutoff distance of 1.10 for the "Average Distance per Term Added" was used to determine whether a new cluster should be formed. That is, when the item to be added had an average of 1.10 or more a new cluster was formed. 1. 2. 3. 4 5. 6. 7. 1 2 3  0.60 0.62 1.06   4 5 6 7 0.66 0.27 1.08  0.55 1.16 0.51 1.47  2.11 2.55 2.03 1.88 2.67  0.49 0.61 0.55 0.81 0.24 1.98 Mahalanobis distance 6 ! i between group averages for seven work groups group number 1 2   1.03  3  4  5  6 7  1.09 1.63 0.96 0.35 1.55 0.66 1.33 0.67 1.33 1.67 2.17 1.70 2.05 1.15     0.87 0.92 1.02 1.03 0.30 1.21  AIIE TRANSACTIONS, Volume 6, No. 3   is described by five variables each having two categories, zero and one, the fractional match for two objects, i and j , might be Variables Table 6: Examole of the computation of standardized Euclidean and Mahalanobis distances. Squared standardized Euclidean distances 4, = z (  f k l  jEk,)'/sk ' (53.5044.14)' (26.1325.96)' (18.21 17.90)' + 41.44 = 420.44 63.25 (11.21  i a . o i ) 2 (13.8315.08)~ (27.17 29.57)' + 77.27 33.92 66.5 6 d:2  2  3  4 0 0 1 0 0 1 1 1 5  7 + + d:, object i object j + + (15.9212.93)' (19.67 20.36)' + 48.75 40.5 1 2 (12.75 13.31)' 15.38 1 1 0 The objects match on variables 1 and 4. 
The fractional match then is 2/5 = .40. An alternative measure excludes weak matches like the zero-zero match on the first variable in the example: it would only include variables where a one occurs for either object i or j. With the zero-zero match excluded, the computed fractional match would be 1/4 = .25. This type of measure is discussed by Rogers and Tanimoto (24).

The squared Mahalanobis distance for groups 1 and 2, also shown in Table 6, is

D_12^2 = (X_1 - X_2)' S (X_1 - X_2) = 1.03

where S is the inverse of the covariance matrix given in Table 4. (The squares of the distances are used throughout to reduce computational effort.)

Measures of Similarity Used in Cluster Analysis

When objects are described by dichotomous or polychotomous variables, indexes of similarity are frequently used in place of "distance" measures. One similarity index is the "percent match" of the values of the variables for two objects, computed from the relationship

Fractional Match: S_ij = m/p

where m is the number of variables which match and p is the number of variables compared.

Similarity Indexes for Mixed Discrete and Continuous Variables
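Before turning to mixed variables, the two binary matching coefficients above can be sketched directly. The helper names are illustrative, and the example vectors are chosen only to reproduce the .40 and .25 values computed in the text (the paper's actual variable values are not legible in this copy):

```python
def fractional_match(x, y):
    """Simple matching coefficient S = m/p: the fraction of the p
    variables on which two binary vectors agree (0-0 and 1-1
    agreements both count as matches)."""
    assert len(x) == len(y)
    m = sum(a == b for a, b in zip(x, y))
    return m / len(x)

def match_excluding_weak(x, y):
    """Variant that drops weak 0-0 matches: only variables where a one
    occurs for either object enter the comparison (the Jaccard form of
    the coefficient)."""
    kept = [(a, b) for a, b in zip(x, y) if a == 1 or b == 1]
    if not kept:
        return 0.0
    return sum(a == b for a, b in kept) / len(kept)

# Two objects on five binary variables that match on variables 1 and 4:
# variable 1 is a weak 0-0 match, variable 4 a 1-1 match.
i_obj = (0, 1, 0, 1, 1)
j_obj = (0, 0, 1, 1, 0)
```

Here `fractional_match(i_obj, j_obj)` gives 2/5 = .40, while `match_excluding_weak(i_obj, j_obj)` drops the 0-0 match and gives 1/4 = .25.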
For example, if an object When an object is described by a combination of discrete and continuous variables, the similarity index is computed Table 7: Example of computations of clusters using squares of standardized Euclidean and Mahalanobis distances Squared Standardized Euclidean Distances Cluster #2 Cluster #I 5 7 1 0.55 0.49 )104) 2 1.16 0.61 1.77  3  0.60 2.37 1.06 1 4 Total  3 1.51 0.55 1.06 0.62 (168(   2  4 1.47 2.28 0.66 2.94 1.08 4.02 0.27 5  0.81 0.24         1.98 4.65 2.11 6.76 2.03 2.55 11.34 1.88 13.22        G x 6 7 2.67 0.81 T s  Total  Total  13431 8.79   Squared Mahalanobis distances Cluster #1 Group 1 5  7  0.66 0.87 11.531 1  2 1.33 0.92 2.25 1.03 3.28 3 0.67 1.02 1.69 1.09 (278( 4 1.33 1.03 0.30 2.36 0.96 3.32     1.21 2.36 1.67 4.03 1.70      5 6 7 1.15 0.30 Total September 1974, AIIE TRANSACTIONS Total  3  Total   1429( Cluster #2 2   4  5.73   1.63 4.91 1.55 4.87  Total   Group 6 Cluster #3 Total     2.17 2.05 4.22     Group 6    221 Steowise Clustering Methods Table 8: Determination of clusters using standardized Euclidean and Mahalanobis distances. Standardized Euclidean distance results . Average Group Total No. of distance Average distance Cluster numbers distance terms per term between groups number added 5, 7 1 3 2 4 6 0.24 1.28 2.96 6.39 10.68  0.24 0.52 0.56 0.86 0.86  1 3 6 10 15  0.24 0.43 0.49 0.64 0.71  1 We will now discuss two clustering methods which involve sequential techniques of forming clusters. The first method is discussed by King (16) and is described as a stepwise clustering method. The steps involved in the algorithm are as follows. Step 1A covariance matrix is computed for the items to be clustered. If there are n items there will be n(n + 1)/2 different terms in the matrix. 
The covariance terms for the items i and j would be computed from the relationship

s_ij = Σ_{p=1}^{q} (x_ip - x̄_i)(x_jp - x̄_j)/(q - 1)

where q is the number of variables and x̄_i is the mean of the values for item i.

Mahalanobis distance results (Table 8, continued). Cluster 1, groups added: 5,7; 1; 3. Total distance: 0.30, 1.83, 4.61. No. of terms: 1, 3, 6. Average distance per term added: 0.30, 0.76, 0.93. Average distance between groups: 0.30, 0.61, 0.77. Cluster 2: groups 2,4 with distance 0.35. Cluster 3: group 6 alone.

To return to similarity indexes for mixed discrete and continuous variables: the continuous variables are coded into discrete form. A simple procedure would be to assign a value of one if the value is greater than the median value for the objects involved and a zero for values below the median. If finer class intervals are desired, k intervals can be coded using k - 1 dummy variables. An alternative approach to computing similarity coefficients for mixed continuous and discrete variables is to determine the range R_k of each variable for the objects being studied and divide the difference by the range value. That is,

s_ij = Σ_k (1 - |x_ik - x_jk|/R_k)

This yields fractional matching values for each continuous variable.

Step 2. A correlation matrix for the items to be clustered is determined from the elements of the covariance matrix obtained in Step 1. The correlation between the ith and jth items is computed from the relationship

r_ij = s_ij/(s_ii s_jj)^(1/2)

Step 3. The clustering begins by selecting the pair of items with the greatest correlation coefficient as the nucleus of the first cluster. A new item is formed from the two items in the cluster by summing the respective values of the two items. The index value of the new item is the minimum of the indexes (i, j) and is designated b, while the maximum of the indexes (i, j) is designated m. The value of the pth variable of the new item b is determined from the sum

x_bp = x_ip + x_jp

for all p variables for the two items.

Step 4. The bth column and row of the correlation matrix is revised for all remaining items.
The revised correlation coefficients are computed as follows:

r_bk = (s_ik + s_jk)/[(s_ii + 2 s_ij + s_jj) s_kk]^(1/2)

where i and j are the indexes of the items selected (this is the correlation of the summed item with each remaining item k). The mth row and column of the correlation matrix is assigned a large negative value, like -99, to eliminate further selections.

Drawbacks of Similarity Indexes

Matching-type similarity indexes of the type described above do not take into account the correlations between variables and, therefore, do not properly weight some of the underlying factors causing the correlation. Also, variables with fewer categories will tend to have more matches and thus will have greater importance in determining the similarity coefficient. For example, if sex is divided into two categories while age is classified by years into, say, ten categories, it would be more likely for a match to occur for sex than for age. This assumes that the coding has two states for the first variable and ten for the second. If dummy or indicator variables are used to code the information, the matching coefficients could be unequal for the various age classifications. In addition, when there are a large number of variables, many of the matches will be due to chance. This reduces the sensitivity of the matching fractions. This sometimes can be overcome by selecting a subset of variables for classification purposes. However, this also may cause some of the information regarding the objects to be lost. Also see (26).

Step 5. Steps 3 and 4 are repeated for the correlation matrix of the original and revised items. That is, the maximum correlation coefficient is used to determine the two items which should be merged in a cluster. The items might be original items or some combination of the items merged in Step 3. The correlation matrix is then revised as shown in Step 4. The process continues until all items are placed in a cluster. This takes n - 1 passes. The last pass clusters all items together. In the next-to-last pass there are two clusters. In general, after p passes there are n - p clusters.
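The stepwise procedure of Steps 1 through 5 can be sketched as follows. This is an illustrative reconstruction, not King's program: `np.corrcoef` recomputes the correlations of the summed item vectors on each pass, which yields the same coefficients as the incremental revision in Step 4, and the recorded merge history stands in for the -99 bookkeeping.

```python
import numpy as np

def king_stepwise(data):
    """King-style stepwise clustering sketch.

    `data` is an (n_items, n_variables) array. On each of the n - 1
    passes, the two items (original or previously merged) with the
    largest correlation coefficient are merged by summing their
    vectors (Step 3); the merged item keeps the smaller index b.
    Returns the merge list and the cluster membership after each pass.
    """
    items = {i: data[i].astype(float) for i in range(len(data))}
    members = {i: [i] for i in items}          # cluster membership by index
    merges, history = [], []
    while len(items) > 1:
        keys = sorted(items)
        # Row-wise correlations between the current item vectors.
        corr = np.corrcoef(np.array([items[k] for k in keys]))
        np.fill_diagonal(corr, -np.inf)        # ignore self-correlations
        a, b = np.unravel_index(np.argmax(corr), corr.shape)
        lo, hi = sorted((keys[a], keys[b]))
        items[lo] = items[lo] + items[hi]      # new item = sum of the two
        members[lo] = members[lo] + members[hi]
        del items[hi], members[hi]             # stand-in for the -99 rows
        merges.append((lo, hi))
        history.append([sorted(v) for v in members.values()])
    return merges, history
```

After p passes, `history[p - 1]` lists the n - p clusters, so the analyst can pick the pass whose arrangement suits the problem, as the text describes.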
The number of clusters used in the classification is left to the judgment of the analyst. Thus, we may choose the arrangement that involves 2, 3 or more clusters depending upon the problem involved.

Numerical Example of Stepwise Clustering

To illustrate the numerical computation of stepwise clustering, we will use the data for the seven work groups shown in Table 4. Each group has measurements for nine variables. We compute the variance-covariance matrix using the vectors of average responses for the seven groups. The covariance terms are computed from the relationship

s_ij = Σ_{p=1}^{9} (x_ip - x̄_i)(x_jp - x̄_j)/(9 - 1)

where i = 1, ..., 7 and j = 1, ..., 7. The results are shown in Table 9 along with the correlation coefficients for the pairs of items. The correlation coefficients are computed from the terms of the covariance matrix using the relationship

r_ij = s_ij/(s_ii s_jj)^(1/2)

Table 9: Example of King's stepwise clustering procedure: variances and covariances of the group averages, the first-pass correlation coefficients, and the revised correlation coefficients. [Most entries are too garbled in this copy to reproduce reliably.]

The correlation coefficients in Table 9 are all greater than .94. This raises the question of whether there is sufficient difference between the groups to consider assignment to different clusters. We could test this hypothesis using a statistical measure like the Mahalanobis D statistic, Rao (23), to determine if there is a significant difference between groups.
If the difference was not significant we would not proceed with the clustering algorithm. Our chief interest here, however, is explaining the algorithm, so we will proceed under the assumption that there are differences between items.

The algorithm begins by selecting the pair of groups with the greatest correlation coefficient. We can see from Table 9 that this pair is 5 and 7 with a correlation of .998. We form a new group by summing the variables in groups 5 and 7. We designate this group number 5, the minimum of the indexes. We then recompute the correlation coefficients for group 5 and the other groups as shown in Step 4. The correlations between group 7 and the other groups are assigned a value of -99 so that they won't be included in subsequent selections. The correlation coefficients resulting from this operation are also shown in Table 9.

We proceed in the same manner for the other clusters, each time summing the groups merged and recomputing the affected correlation coefficients. The results are shown in Table 10. At pass number 4 we have three clusters consisting of groups (1, 3, 5, 7), (2, 4), and (6). This is the same result obtained using Mahalanobis distance measurements in Table 8. The clustering procedure using standardized distances yielded two clusters. In the stepwise algorithm there are two clusters at pass number 5. The composition is (2, 4) and (1, 3, 5, 6, 7). This is slightly different from the standardized distance results, which yielded the arrangement (1, 2, 3, 4, 5, 7) and (6).

Fisher's Stepwise Clustering Method

Under a stepwise clustering method discussed by W. D. Fisher (9), any number of measures of group likeness can be used in the algorithm. These include correlation coefficients, distance measurements and measures of similarity. Fisher's method is similar in concept to King's method discussed above. That is, the clustering proceeds on a sequential basis through n - 1 passes. On the last pass all items are grouped together.
The intermediate results show the arrangements for (n − p) clusters. Wadycki (31) developed a computer program for stepwise clustering based on Fisher's algorithm. He uses a criterion for clustering designated as r*, defined in terms of the sum of all the similarity measures between pairs of items in a group g and of n_g, the number of variables in subset g; there are n_g terms in the summation for r*. Wadycki used his program to compare King's and Fisher's methods of stepwise clustering and concluded that the two methods give similar but not identical results. He points out, however, that Fisher's method does not require the covariance matrix; it is therefore easier to implement and may have broader application than King's method.

"Hill Climbing" Algorithm for Cluster Analysis

A computer program which permits the analyst to use either distance or similarity measurements in forming clusters has been developed by Rubin and Friedman (26) and is available through the IBM Corporation. The program permits the user to choose the criterion for forming clusters from several statistical and similarity measures. The statistical criteria for clustering objects are discussed by the authors in a paper (12) published in 1967. The criteria used in the programs include the ratio of the determinant of the total sums-of-squares matrix to the determinant of the within-group sums-of-squares matrix. This is the reciprocal of Wilks' lambda value (see Wilks (30)) and is expressed as

Wilks' Criterion = |T| / |W|

where T = B + W, the sum of the between-group and within-group sums-of-squares matrices. The object is to maximize this ratio so that the within variances for the clusters are as small as possible. The value of log(max |T|/|W|) is used to indicate the best number of clusters to form. Another criterion that the authors suggest is to minimize the trace of the within-cluster sums-of-squares and cross-products matrix W. This is equivalent to minimizing the Euclidean distances within clusters. A third criterion is called Hotelling's Trace Criterion (Hotelling (14)) and is written

Hotelling's Trace Criterion = Trace W^{-1}B.

Based on experience with the program, the authors recommend the use of the criterion |T|/|W| as a basis for forming clusters.

Description of Algorithm

The program uses what the authors call a "hill climbing" algorithm, which is a variation of steepest ascent methods. It is a heuristic procedure which may lead to local rather than global optimal values. The algorithm starts with a partition of the data into g clusters. This original partition may be stated by the user or computed by the program. Each object is considered for movement into each of the other groups to determine if the move improves the criterion; if it does, the object is moved. This continues through all objects. The algorithm is repeated until a pass occurs in which no objects can be moved, which usually takes about half a dozen passes; this achieves a local optimum. The situation is then varied by reassigning the objects in a cluster to a nearby cluster and repeating the hill-climbing procedure. If no better partition is found, the algorithm stops. The program has considerable flexibility and permits the investigator to try various approaches on the data at hand. It also allows the user to interact with the program so that the best overall partition can be achieved.

Variable-Object Clustering: the Bond-Energy Algorithm

In our previous discussions of clustering methods we have concentrated on procedures which involved clustering of objects. We will now outline a technique which clusters variables as well as objects. This procedure is useful for organizing a data matrix as well as for clustering. The method is discussed by McCormick, Schweitzer and White (20) and is referred to by the authors as the Bond-Energy Algorithm. It assumes that the variables have similar scales and ranges of values. The criterion used in the algorithm to measure the effectiveness of the arrangement of the M × N matrix A is expressed as

ME(A) = \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{N} a_{ij} \left[ a_{i,j+1} + a_{i,j-1} + a_{i+1,j} + a_{i-1,j} \right]

where the values a_{0,j} = a_{M+1,j} = a_{i,0} = a_{i,N+1} = 0, since these expressions are outside the dimensions of the matrix. We wish to determine the arrangement of columns and rows in the matrix which maximizes this measure of effectiveness. That is, the maximization is evaluated over all possible permutations of rows and columns, which amounts to M!·N! arrangements of the matrix. The problem reduces to the examination of the column arrangements and then the row arrangements. The algorithm suggested by the authors is as follows:

1. Place one of the columns in a column location arbitrarily. Set k = 1.
2. Try placing individually each of the remaining n − k columns in each of the k + 1 possible positions (to the left and right of the columns already placed) and compute each column's contribution to the measure of effectiveness. Place the column that gives the largest incremental contribution in its best location. Increment k by 1 and repeat until k = n.
3. When all columns have been placed, repeat the procedure on the rows. The row placement is unnecessary, however, if the input array is symmetric, since the final row and column orderings will be identical.

The authors report that this algorithm is computed quickly by a computer. It reduces a matrix to block form where similar rows and columns are grouped together. We will use a numerical example to illustrate the computations. The matrix used consists of four rows and three columns; its measure of effectiveness is four. [The original shows the 4 × 3 example matrix and the candidate arrangements, with their measures of effectiveness, for each selection step.]
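A minimal sketch of the measure of effectiveness and the greedy column-placement procedure (steps 1 and 2 above) in Python with NumPy; the function names are mine, not the authors':

```python
import numpy as np

def measure_of_effectiveness(A):
    """ME(A): half the sum over all entries of a_ij times its four
    neighbors, with zeros assumed outside the matrix boundary."""
    P = np.pad(A, 1)                  # zero border supplies the boundary terms
    core = P[1:-1, 1:-1]
    nbrs = P[1:-1, :-2] + P[1:-1, 2:] + P[:-2, 1:-1] + P[2:, 1:-1]
    return 0.5 * np.sum(core * nbrs)

def greedy_order(A):
    """Place columns one at a time: at each step, try every remaining
    column in every open slot and keep the placement that gives the
    largest measure of effectiveness."""
    n = A.shape[1]
    placed, remaining = [0], list(range(1, n))
    while remaining:
        best = None                   # (ME, column, position)
        for col in remaining:
            for p in range(len(placed) + 1):
                trial = placed[:p] + [col] + placed[p:]
                me = measure_of_effectiveness(A[:, trial])
                if best is None or me > best[0]:
                    best = (me, col, p)
        _, col, p = best
        placed.insert(p, col)
        remaining.remove(col)
    return placed
```

Rows are ordered the same way by applying `greedy_order` to the transpose. Like the authors' procedure, this is greedy and need not reach the global optimum.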
We begin by placing column 3 in position 1 and then test whether each of the remaining columns should be placed to the left or right of position 1. The measures of effectiveness when column 1 is placed to the left or to the right of column 3 are both two. We next test the (2, 3) arrangements; the measures of (2, 3) and (3, 2) are also both two. We therefore arbitrarily choose the (1, 3) arrangement. We must now test column 2 in three locations. The measures of effectiveness are M.E.(1, 2, 3) = 4, M.E.(2, 1, 3) = 6, and M.E.(1, 3, 2) = 4. We therefore choose (2, 1, 3) as the arrangement with the most similarity of columns.

For the arrangement of rows, we begin with row 1 in the (2, 1, 3) matrix above and test rows 2, 3, and 4 to determine the one which maximizes our criterion. [The original shows the candidate matrices and their measures of effectiveness for each selection step.] We choose rows 4, 2, 1, 3 and columns 2, 1, 3 as the most preferable arrangement. This procedure does not necessarily give an optimal arrangement, since we could obtain different values with new starting columns.

Applications of the Algorithm

The authors describe applications of the algorithm to problems of airport design and classification of aircraft. The procedure could also have been used to determine the organization of the groups discussed in our first example. The method might also prove useful in organizing data matrices for linear programming applications.

Other Clustering Methods

The algorithms discussed in the previous sections of this paper are only a few of the clustering methods used in classification problems. Some additional methods are described briefly below.
Graphical Methods

Chernoff (5) discusses a procedure which involves using a computer to draw faces with a nose, mouth, eyes, and a distinct contour of the facial structure. A sample of k items is represented by k faces. The analyst groups the faces which have the greatest similarity. The author states that people have developed a sense of discrimination for human faces which should be helpful in distinguishing between items described by multivariate measurements. The examples cited in the paper involve faces drawn for geological specimens involving twelve variables. Other graphical representations were developed by Anderson (2). These are circles of fixed radius with rays of various lengths and directions extending from the boundary; the author calls them "glyphs." They are more primitive than Chernoff's representation but are easier to construct.

Principal Component Methods

Under principal component procedures the data dimensions are reduced from p variables to 2 or 3. The items are then plotted on a two- or three-dimensional graph, and points which lie in the same locality are placed in a cluster. See Rao (23).

Dynamic Programming Procedure

A clustering algorithm was developed by Jensen (15) using dynamic programming procedures. The program is most suitable for a small number of items and clusters; when the problem size becomes large, the computer storage requirements become excessive.

Information Theory Formulation

Several authors have reported on formulations which tend to minimize the measure of information within the clusters. These include Theil (25) and Lance and Williams (19).

Hierarchical Structures

Another method of representing similarity between items involves constructing "trees" showing the relationships at various levels of similarity. Hierarchical structures are discussed by Constantenescu (6), McQuitty (21), Lance and Williams (18), and Edwards and Cavalli-Sforza (8).

Summaries of Clustering Methods

Other summaries of clustering methods are contained in articles by Ball (3), Bonner (4), Cormack (7), Frank and Green (10), Gower (11), and Sneath (27).

Books on Clustering Methods

Books on clustering methods include works by Anderberg (1), Fisher (9), Sneath and Sokal (28), and Tryon and Bailey (29).

References

(1) Anderberg, M. R., Cluster Analysis for Applications, Academic Press (1973).
(2) Anderson, E., "A Semigraphical Method for the Analysis of Complex Problems," Technometrics, 2, 3, pp. 387-391 (1960).
(3) Ball, G. H., "Data Analysis in the Social Sciences: What About the Details?" Proc. Fall Joint Computer Conference, Stanford, pp. 533-559, MacMillan, New York (1965).
(4) Bonner, R. E., "On Some Clustering Techniques," IBM J. Res. Devl., 8, pp. 22-32 (1964).
(5) Chernoff, H., "The Use of Faces to Represent Points in k-Dimensional Space Graphically," Journal of the American Statistical Association, 68, 342, pp. 361-368 (June 1973).
(6) Constantenescu, P., "The Classification of a Set of Elements with Respect to a Set of Properties," Comp. J., 8, pp. 352-357 (1966).
(7) Cormack, R. M., "A Review of Classification," Journal of the Royal Statistical Society, Series A, pp. 321-367 (1971).
(8) Edwards, A. W. F. and Cavalli-Sforza, "A Method for Cluster Analysis," Biometrics, 21, pp. 362-375 (1965).
(9) Fisher, Walter D., Clustering and Aggregation in Economics, Johns Hopkins Press (1969).
(10) Frank, R. E. and Green, P. E., "Numerical Taxonomy in Marketing Analysis: A Review Article," Journal of Marketing Research, pp. 83-94 (February 1968).
(11) Gower, J. C., "A Survey of Numerical Methods Useful in Taxonomy," Acarologia, 11, pp. 357-376 (1969).
(12) Friedman, H. P. and Rubin, J., "On Some Invariant Criteria for Grouping Data," Journal of the American Statistical Association, 62, pp. 1159-1178 (1967).
(13) Hartigan, J. A., "Representation of Similarity Matrices by Trees," Journal of the American Statistical Association, 62, pp. 1140-1158 (1967).
(14) Hotelling, H., "Multivariate Analysis," Statistics and Mathematics in Biology, edited by Kempthorne, Hafner, New York (1954).
(15) Jensen, R. E., "A Dynamic Programming Algorithm for Cluster Analysis," Operations Research, pp. 1034-1056 (1969).
(16) King, B. F., "Stepwise Clustering Procedures," Journal of the American Statistical Association, 62, pp. 86-101 (1967).
(17) Mahalanobis, P. C., "On the Generalized Distance in Statistics," Proc. Nat. Inst. Sci. India, 2, pp. 49-55 (1936).
(18) Lance, G. N. and Williams, W. T., "A General Theory of Classificatory Sorting Strategies, II. Clustering Systems," Computer Journal, 10, pp. 271-277 (1967).
(19) Lance, G. N. and Williams, W. T., "Note on a New Information-Statistic Classificatory Program," Computer Journal, 11, p. 195 (1968).
(20) McCormick, W. J., Schweitzer, P. J. and White, T. W., "Problem Decomposition and Data Reorganization by a Clustering Technique," Operations Research, 20, 5, pp. 993-1009 (1972).
(21) McQuitty, L. L., "Multiple Hierarchical Classification of Institutions and Persons with Reference to Union-Management Relations and Psychological Well-Being," Educational and Psychological Measurement, 22, pp. 513-531 (1962).
(22) Morrison, D. G., "Measurement Problems in Cluster Analysis," Management Science, pp. 775-780 (August 1967).
(23) Rao, C. R., Advanced Statistical Methods in Biometric Research, Hafner Publishing Company, Darien, Conn. (1952).
(24) Rogers, D. J. and Tanimoto, T., "A Computer Program for Classifying Plants," Science, 132, pp. 1115-1122 (1960).
(25) Theil, H., Economics and Information Theory, Rand McNally, Chicago (1967).
(26) Rubin, J. and Friedman, H. P., "A Cluster Analysis and Taxonomy System for Grouping and Classifying Data," IBM Contributed Program Description, IBM Scientific Center, 590 Madison Avenue, New York, N.Y. 10022 (1967).
(27) Sneath, P. H. A., "Some Statistical Problems in Numerical Taxonomy," The Statistician, 17, pp. 1-12 (1967).
(28) Sneath, P. H. A. and Sokal, R. R., Principles of Numerical Taxonomy, W. H. Freeman & Company, San Francisco, Calif. (1963).
(29) Tryon, R. C. and Bailey, D. E., Cluster Analysis, McGraw-Hill Book Co. (1970).
(30) Wilks, S., "Multidimensional Scatter," Contributions to Probability and Statistics, edited by I. Olkin, Stanford University Press, pp. 597-614 (1960).
(31) Wadycki, Walter J., "Clustering of Correlation Matrices: A Step-Wise Procedure," unpublished paper, University of Illinois at Chicago Circle, Chicago, Illinois (1973).

Mr. James N. Kennedy is Business Research Director, Management Sciences Division, at Illinois Bell. He received his BS in Mathematics from John Carroll University and his MSIE from the Illinois Institute of Technology. He is currently an instructor in statistics at IIT in the Evening Division. He has written articles on management sciences which have appeared in the Financial Executive, Journal of Marketing Research, and Journal of Industrial Engineering. Mr. Kennedy is a member of ASA and ASQC.