=====================
File: Cluster
Author: Ingmar Krusch, Yacoub Ahmed, Andreas Kaenner
email:
Release: 25.4.98
Compatibility: PR 2
Location: contrib/miscellaneous
Description: Searching for clusters with fuzzy-algorithm in 2-D data-sets.
Notes: Cluster was designed to learn some basics about fuzzy-clustering
       and especially to learn the Be-API. Try it out.
=====================

I'm sorry that so little documentation is available for this program in
English, but due to other work I have no time to make a full translation
of the German version. The German version is in Word for Windows format
--> CLUSTER.DOC.

If you want to learn more about fuzzy-clustering, try a search via Yahoo
or Alta-Vista!

Cluster was made by three students of the university of Emden, Germany:

  Ingmar Krusch
  Jacoub Ahmed
  Andreas Kaenner

Many thanks to _prof. Frank Klawonn_ for his help :-)


_Short_ step-by-step instruction:

Basics:

Start a Cluster session by double-clicking the Cluster icon. There
should be a set of data points in the blue part of the main window. At
the bottom you can see three icons. The active icon has a red border.
From left to right we have the rotate icon, the zoom icon and the
"show me the right cluster" icon ;-).

At this point the zoom icon is active. While the pointer is over the
blue part of the window, press the left mouse button and hold it down.
If you move the mouse forward you zoom in; if you move it in the other
direction you zoom out.

Now press the right button and hold it down. Move the mouse around.
This moves your visible region.

Now try to rotate the view. Click on the rotate icon. You will see that
the red cross in the middle of the window disappears. In its place
there is a red circle that indicates the rotate mode. Press the left
mouse button over the view and hold it down. Move the pointer around
the red circle. This rotates your visible region, but it is not very
accurate.
For example, it is not possible to get back to the normal alignment
(Y to the top and X to the right) by hand. Try pressing the shift key
while you are rotating the view. Now the rotation angle should snap in
at every 45 degrees.

As a result of the rotation, the x and y coordinates are no longer tied
to the horizontal and vertical axes of your monitor. For this reason
you can see a black axis in the top-right corner, which helps you
navigate through your data. You can turn it off by pressing the left
mouse button while the pointer is over the text "Layout" at the bottom
of the window. There is another item in this pop-up menu called
"Background". If you mark it you will see a grey-structured background
behind your data points. (I don't use this feature very often because
it is slow and the data points sometimes become invisible.)

If you press the right mouse button in rotate mode you can move the
view as described above.

The "show me the right cluster" icon is described later in this
document.


How to find clusters:

Move the mouse to the left part of the window and press the right mouse
button. Choose "new" from the pop-up menu. A new "job" appears in the
job list. A job is an independent thread with its own settings for the
algorithms. The job list lets you compare different algorithms and/or
settings. What you see is the status view of the job. You cannot change
any settings in this view; it is for information only.

Now press the right mouse button while the pointer is over the first
job. Choose "start" from the pop-up menu. After the job has finished
the calculation it will be marked. Click on the status view of the
first job. Now the top bar is red to indicate that the results you see
in the data view were produced by this job. At the bottom of the status
view you see the number of clusters the job found (4).

Now we can talk about the "show me the right cluster" icon ;-). If you
press this icon you are in the cluster-switch mode.
The red cross or the red circle disappears. After that, click into the
data view. With every click the focus changes to another cluster. In
the bottom-left corner of the data view you will see the number of the
active cluster and the algorithm which was used.

The color of a data point indicates how strongly the data point belongs
to the cluster. Black stands for 100% and white for 0%. The big yellow
circle indicates the middle of the specific cluster (its prototype).
The smaller yellow circle indicates the farthest point which still
belongs 50% to the cluster. The red points are for debugging purposes.
They describe the location of the prototypes after each iteration.

If you press the right mouse button in the cluster-switch mode you can
move the view as described above.


How to choose the cluster-algorithm:

To set the algorithm you have to open the settings window of a job.
There are several ways to do that. Press the right mouse button while
the pointer is over the job and choose "Settings..." from the pop-up
menu. You can also choose it from the "Jobs" menu in the menu bar at
the top of the window. If you prefer the second method you have to
activate the job first (it must have a red bar!).

In the upper part of the window you can make your own settings. Below
that you see the numeric results of the calculation. If you click on
the pop-up menu named "prototypes" you will see another item called
"Zugehoerigkeiten". It is a German word that describes how strongly the
data points belong to a cluster; the usual English term is "membership
degrees". If you haven't started the job yet there are no
"Zugehoerigkeiten". Press the Start button and they will appear! They
range from 0.0 to 1.0.

OK, let's talk about the settings. I can't explain all the algorithms,
but I can give you a quick overview (very rough!). There are three main
algorithms implemented. From top to bottom:

  FCM (Fuzzy-C-Means)        // keywords for a search via Yahoo ...
      (looks for circle-shaped clusters)
  GK  (Gustafson-Kessel)
      (looks for elliptic-shaped clusters)
  GG  (Gath & Geva)
      (looks for elliptic-shaped clusters too, but with another
      algorithm)

Fuzzy-C-Means is the fastest algorithm. Gath & Geva is the slowest.

"eps" is the precision for the iteration. If you change "Steps-Pro" or
"Steps-Poss" to something other than zero, the iteration stops after
the number of steps you have set.

FCM and GK are available with an option called "possibilistic
clustering". If you want to use this option, mark it at the right side
of "Steps-Poss". The possibilistic algorithms are not independent. You
can choose them only in conjunction with the probabilistic algorithms.
(The probabilistic algorithm initializes the possibilistic algorithm.)

Every algorithm has its own advantages. Try them out and compare the
results. Please note that they are executed from left to right and from
top to bottom. Every algorithm should be initialized by the algorithm
above it in the list, except for the possibilistic algorithms. Those
should run last.

Some examples:

  FCM/Pro + FCM/Poss                     // OK
  FCM/Pro + FCM/Poss + GK/Pro            // makes no sense!
  FCM/Pro + GK/Pro + GK/Poss             // OK
  FCM/Pro + GK/Pro + GK/Poss + GG        // makes no sense!

Below the algorithm settings you can set how many clusters you expect
in the data. If you don't know anything about your data, it is a good
idea to mark the "automatic clustering" field. The label "ClusterCount"
will then change to "Max. Cluster". If you choose this option the job
will run for "Max. Cluster" clusters, then for "Max. Cluster"-1, and so
on. At the end the job picks the best result and discards the others.
Try changing the "Max. Cluster" field to 10. Click on "Start" and wait
a little bit. After the job has finished you will see (look at the
Prototypes or Zugehoerigkeiten) that it has found 2 prototypes.

With the Fuzzifier you can set the fuzziness of the result. Good values
are 1.5 to 2.5.
The value 1.0 is not allowed!

If you are looking for linear clusters (lines) in your data, you should
set the "Linear-Factor". It determines the length of the first vector
of an elliptic cluster in relation to its second vector. In the status
view you can see how many linear clusters were found. If you use FCM
exclusively there are no vectors, because FCM looks only for
circle-shaped clusters!

Mark the "Log" item if you want to see the red points for debugging
purposes.

The "Reset" button calculates the start prototypes. They are spread
randomly over the data field.


How to change the background color:

You can change the background color with the menu item "Preferences"
in the menu "Project". The changes take effect only if you have not
activated the grey-structured background. See "Basics:" above.


What about other data:

In the Data directory you will find additional data files. You can load
them via the "Data" menu item or via drag-and-drop. The data format is
self-explanatory:

  normal 100 1 1        // the 100 is the number of points.
  x y
  -10.000000 -10.000000
  -9.800000 -9.800000
  -9.600000 -9.600000
  -9.400000 -9.400000
  ... ...


Some additional notes:

The jobs work in independent threads.

Perhaps there will be a recompile for the Intel platform, but no future
enhancements.

If someone has an idea for a program using fuzzy-cluster algorithms,
mail us:

  Andreas : kaenner@server.et-inf.fho-emden.de
or
  Ingmar  : fastjack@server.et-inf.fho-emden.de

Andreas Kaenner, Ingmar Krusch, Jacoub Ahmed
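

Appendix: a small sketch of the "Zugehoerigkeiten" calculation

The "Zugehoerigkeiten" (memberships between 0.0 and 1.0) and the
Fuzzifier described above can be illustrated with the textbook formula
of probabilistic Fuzzy-C-Means. The Python sketch below is not taken
from the Cluster sources; the function name fcm_memberships and its
parameters are made up for this illustration, and the prototypes are
assumed to be known already. It also shows one plausible reason why a
Fuzzifier of 1.0 cannot be used: the exponent 2 / (fuzzifier - 1) is
undefined for that value.

```python
import math


def fcm_memberships(point, prototypes, fuzzifier=2.0):
    """Return the membership of one 2-D point in each cluster.

    Textbook probabilistic FCM: the memberships of a point sum to 1.0.
    (Hypothetical helper, not part of Cluster itself.)
    """
    if fuzzifier <= 1.0:
        # The exponent below is 2 / (fuzzifier - 1): undefined at 1.0.
        raise ValueError("fuzzifier must be greater than 1.0")

    distances = [math.dist(point, p) for p in prototypes]

    # A point lying exactly on a prototype belongs fully to that cluster.
    if 0.0 in distances:
        hits = [1.0 if d == 0.0 else 0.0 for d in distances]
        return [h / sum(hits) for h in hits]

    exponent = 2.0 / (fuzzifier - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Two assumed prototypes on the x-axis.
prototypes = [(0.0, 0.0), (10.0, 0.0)]

# A point halfway between both prototypes belongs 50% to each cluster.
print(fcm_memberships((5.0, 0.0), prototypes))

# A point close to the first prototype belongs almost fully to it.
print(fcm_memberships((1.0, 0.0), prototypes))
```

A larger Fuzzifier pushes the memberships of such a point closer
together (a fuzzier result); a value just above 1.0 pushes them towards
0 and 1.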