I. Description of the program counterprop:

The program trains a neural network according to the counterpropagation
algorithm. It receives its inputs either from input files or interactively
from the user. The outputs are written to a file. The parameters under user
control are the dimensionality of the inputs and outputs, the number of
nodes in the hidden layer, and the training process parameters. The details
are given with the description of the file formats.

usage:
    a) counterprop
or
    b) counterprop -l input_file data_file weight_file
or
    c) counterprop -o weight_file data_file output_file

When format a) is used, the program prompts the user for the names of the
data and weight files, and then prompts for the structural parameters of
the counterpropagation network. If output generation is performed, the
program also asks for the name of the output data file.

Format b) is used for learning. In learning, the parameters for the network
are placed in the input_file, the training data are placed in the
data_file, and after the training the weights are written to the
weight_file.

Format c) is used for output generation. In output generation the program
gets the parameters for the network, along with the nodes' weights, from
the weight_file, reads the test data from the data_file, and writes the
outputs (data and their classification) to the output_file.

The file formats are specified later.

When the program executes in learning (training) mode, it generates output
on each iteration. The output gives the iteration number and the measured
error. In each iteration all the samples are presented. In the first phase
the measured error is the mean distance between the inputs and their
respective winner nodes. In the second phase it is the mean distance
between the output weights of the winner node and the desired output.

II. File formats

1. input_file

The input file is in a fixed format, i.e. all the parameters presented
below must be present in the file. The format of the file is given by an
example.

2 1              | Dimension of input is 2, dimension of output is 1
3                | Hidden layer contains 3 nodes.
0.5 0.2 0.01     | Phase 1: eta(start)=0.5; eta(end)=0.2, converge on error<=0.01
0.3 0.01 0.001   | Same for phase 2.
100 50           | Max. iterations in phase 1 are 100, max. in phase 2 are 50
12               | 12 inputs will be given in the data file.

In the input file, all the values in one row are separated by one space.
All values that are shown on one line in the example must remain on the
same line.

Preferred values:

The dimensions of the input space and the output space are
problem-dependent. The number of dimensions is not limited in any way.

The size of the hidden layer is also problem-dependent. Larger problems,
i.e. problems with more clusters, need larger hidden layers. The size of
the hidden layer is not limited in any way, except that it must be greater
than zero.

The end learning rate (eta) value should be at least one order of magnitude
smaller than the starting learning rate. The learning rate shrinks linearly
with each iteration, so that at the start of the training it equals the
starting learning rate, while at the end (if the maximum number of
iterations is reached) it equals the ending learning rate.

In phase 1, the starting eta value should be higher than 0.2, so that all
nodes have a chance to converge on their respective clusters. The end value
should be around 0.01, so that no large movements are allowed at the end.

In phase 2, the starting and ending eta values may be arbitrary, but larger
learning rates lead to faster convergence. Since all samples in the same
cluster should have the same output value, the convergence criterion may be
arbitrarily low. If it is set to 0.0 and the learning rate is too low, the
network takes infinitely long to reach it.
If the learning rate is set to 1.0, convergence is reached very fast.

2. data_file

The data file is in a fixed format. It specifies all the training samples.
Its format differs between the learning and output generation modes.

In the learning mode, the format is as follows (given by an example):

0.0 1.0 0   | sample (0.0, 1.0), desired output is 0
0.5 0.9 0   | sample (0.5, 0.9), desired output is 0
0.9 0.9 1   | sample (0.9, 0.9), desired output is 1
0.6 0.2 0   | etc.

The important thing to notice is that each input pattern is placed on one
line, with the values separated by exactly one space. The value of the
class associated with the input is placed on the same line as the input
sample, separated from it by one space.

In output generation the data file is the file of test samples. Its format
is the same as the format of the data file in learning, with two
exceptions:
    1. the desired outputs are omitted.
    2. the first line in the data_file gives the number of test input
       samples.

For example, an output generation data file would be as follows:

4           | 4 test input samples
0.0 1.0     | sample (0.0, 1.0)
0.5 0.9     | sample (0.5, 0.9)
0.9 0.9     | sample (0.9, 0.9)
0.6 0.2     | etc.

3. weight_file

The weight_file contains the most basic structural parameters, as well as
data on the hidden nodes: their input and output weights. Two lines are
output per hidden node. The first line contains the weights on the
connections between the input nodes and the hidden node, while the second
one contains the weights on the connections between the hidden node and the
output nodes. An example of such a file (for the input file example above)
is:

2 1                 | Dimension of input is 2, dimension of output is 1
3                   | Hidden layer contains 3 nodes.
0.542478 0.072443   | Hidden node 0: input weights
0.032431            | Hidden node 0: output weights
0.081976 0.895905   | Hidden node 1: input weights
0.004230            | Hidden node 1: output weights
0.898993 0.897992   | etc.
0.970493            |

The number of weights from the input nodes to a hidden node is the same as
the dimension of the input, and the number of weights from a hidden node to
the output nodes is the same as the dimension of the output.

4. output_file

The output file contains the outputs generated by the counterpropagation
network when running output generation. An example of such a file (for the
test data example above) is:

0.000000 1.000000 0.004230   | Sample (0.0, 1.0), output close to 0
0.500000 0.900000 0.004230   | Sample (0.5, 0.9), output close to 0
0.900000 0.900000 0.970493   | Sample (0.9, 0.9), output close to 1
0.600000 0.200000 0.032431   | etc.

Notice that the outputs are not exactly the same as the desired outputs.
This happens when the convergence criterion in phase 2 is not set to an
extremely low value.

III. Program organization

The program is organized in one module - counterprop.

Description of the module:

The program parses the input arguments (main()) and supplies them to the
training function (counterprop()). It also determines which of the usage
formats has been used, and calls the counterprop() function accordingly.
The counterprop() function in turn opens the relevant files (openfile()),
reads the input (readinput()), and initializes the random number generator
(initrandom()). If the network is performing learning, the counterprop()
function initializes the network (initnetwork()), trains it
(trainnetwork()), and outputs the results (writeoutputweight()). If the
network is performing output generation, the counterprop() function
calculates the outputs for the test data (getoutput()), and outputs the
data, along with its newly associated class. The function readinput() also
generates the main control structure (called the network structure),
depending on the input parameters.
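
The choice that main() makes between the three usage formats could be
sketched roughly as follows. This is only an illustration: the enum, the
function name classify_usage() and the exact argument checks are
assumptions made for this sketch and are not taken from the program source.

```c
#include <string.h>

/* Hypothetical names: the original source does not show these. */
enum usage_mode { MODE_INTERACTIVE, MODE_LEARN, MODE_OUTPUT, MODE_INVALID };

/* Classify the command line into one of the documented usage formats.
 * argc and argv are the values passed to main(); argv[0] is the program name. */
enum usage_mode classify_usage(int argc, char **argv)
{
    if (argc == 1)
        return MODE_INTERACTIVE;   /* a) counterprop */
    if (argc == 5 && strcmp(argv[1], "-l") == 0)
        return MODE_LEARN;         /* b) -l input_file data_file weight_file */
    if (argc == 5 && strcmp(argv[1], "-o") == 0)
        return MODE_OUTPUT;        /* c) -o weight_file data_file output_file */
    return MODE_INVALID;           /* anything else is not a documented format */
}
```

In such a sketch, main() would call classify_usage(argc, argv) once and
then pass the three file names (argv[2] to argv[4]) on to counterprop().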
If no arguments are supplied at run-time, the user is interactively queried
whether the network is to be trained or output is to be generated.
Depending on the choice, the user is then queried for the input parameters
and for the names of the data, weight, and possibly output files.

The initnetwork() function assigns random weights to all nodes. All weights
are chosen from the interval (0.0, 1.0).

The training of the network consists of two phases. Hence, the
trainnetwork() function simply calls the phase1() and phase2() functions.

In phase1(), the counterpropagation network trains the weights from the
input to the hidden nodes. In this phase, for each iteration the network is
presented with all the training samples. Within one iteration the learning
rate remains constant, but as iterations progress, the learning rate is
decreased linearly. For each input sample, the network is trained by
finding the winner for that sample (closest()), and updating the weights
that lead from the input nodes to the winner node (update()). The update()
function moves the weights of the winner node closer to the input sample.
The error is calculated as the mean distance from the training inputs to
the winner nodes before updates are performed. The network converges when
the error measure falls below a user-determined limit. The network also
stops training when the specified number of iterations has been run, even
if it has not converged.

The closest() function traverses the entire network to find the closest
node. It uses the function get_euclidean() to find the Euclidean distance
between two vectors, i.e. between the training sample and the weights of
each node.

The phase2() function performs the training of weights from the hidden to
the output nodes. Its control structure is the same as that of phase1(),
i.e. for each iteration the learning rate is computed, then all the samples
are presented, until either the network converges or the number of
iterations exceeds the user-specified bound.
For each input sample, phase2() first finds the closest hidden node
(closest()), then moves the hidden node closer to the input, but only at
the learning rate used at the end of phase 1 (which is presumably small).
Finally, the weights from the node to the output nodes are modified so that
they are closer to the desired output. The last operation uses the same
update() function that is used to update the weights from the input to the
hidden nodes. The error measure is computed as the mean Euclidean distance
between the winner node's output weights and the desired output.

The getoutputs() function simply presents all the test samples to the
network. The output of the network is the set of weights from the winner
node for that test sample to all the output nodes.

IV. Data structure

The most important data structure in the program is the "network"
structure. A pointer to this structure is passed to all functions that need
to read or modify part or all of the network.

Description of the parts of the structure network:

The structure has 3 distinct parts:
    1. Structural data;
    2. Training parameters; and
    3. Input storage.

1. Structural data

This part contains data relevant to the 'physical' structure of the
network: the dimensionality of the input and output space, the number of
nodes in the input, hidden and output layer, and the weights in the
network.

a) io_nodes - an integer array which gives the number of input/output
   dimensions. io_nodes[0] gives the dimension of the input space, while
   io_nodes[1] gives the dimension of the output space. Each hidden node is
   associated with exactly io_nodes[0] weights towards the input nodes, and
   with io_nodes[1] weights towards the output nodes.

b) num_hidden - an integer which gives the number of nodes in the hidden
   layer of the network.

c) weights - a three-dimensional array in which all the weights in the
   network are kept. Its usage is:
       weights[layer][i][j]
   meaning: weight between hidden node i and input/output node j.
   layer == 0 gives the weights between the hidden and the input nodes,
   while layer == 1 gives the weights between the hidden and the output
   nodes.

2. Training parameters

This part contains all the training parameters. The training parameters are
unused when the network is used to generate outputs.

a) eta1_start, eta1_end - two doubles giving the learning rate at the start
   and at the end of phase 1 of the training. The learning rate is linearly
   decreased between these two values as the training progresses.

b) eta2_start, eta2_end - two doubles giving the learning rate at the start
   and at the end of phase 2 of the training. The learning rate is linearly
   decreased between these two values as the training progresses.

c) converge1, converge2 - two doubles giving a limit on the error, below
   which the network is said to converge, in phase 1 and phase 2
   respectively.

d) num_iter1, num_iter2 - two integers giving the maximum number of
   iterations in phase 1 and phase 2 respectively.

e) learning - an integer specifying whether the network is used for
   learning or output generation. When set to 1, the network is used for
   learning. When set to 0, the network is used for output generation.

3. Input storage

This part contains all the input data: the number of inputs and the
training patterns.

a) num_inputs - an integer stating the number of inputs the machine will
   receive.

b) inputs - a 1-dimensional array giving the values of all training/test
   input samples. inputs[input#] gives the input number 'input#'. Each
   input is stored in an 'input_s' structure. The input_s structure
   contains two arrays:
       1. the array input - an array of io_nodes[0] doubles - gives the
          actual input value.
       2. the array output - an array of io_nodes[1] doubles - gives either
          the desired output (when the network is being trained), or the
          actual output (when the network is used for output generation).
          The array output is either read from a file (for training
          samples), or is generated by the network (for test samples).
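
The fields described in this section could be declared along the following
lines. The field names are the ones listed above. The concrete C types
(pointer-based arrays, double weights), the grouping into one struct, and
the helpers alloc_weights() and free_weights() are assumptions of this
sketch, since the actual declarations are not reproduced in this document.

```c
#include <stdlib.h>

/* One training or test sample. */
struct input_s {
    double *input;    /* io_nodes[0] values: the input sample */
    double *output;   /* io_nodes[1] values: desired or generated output */
};

struct network {
    /* 1. Structural data */
    int io_nodes[2];        /* [0] input dimension, [1] output dimension */
    int num_hidden;         /* number of hidden nodes */
    double ***weights;      /* weights[layer][i][j], layer 0 = input side */

    /* 2. Training parameters */
    double eta1_start, eta1_end;   /* phase 1 learning rate range */
    double eta2_start, eta2_end;   /* phase 2 learning rate range */
    double converge1, converge2;   /* error limits for convergence */
    int num_iter1, num_iter2;      /* maximum iterations per phase */
    int learning;                  /* 1 = learning, 0 = output generation */

    /* 3. Input storage */
    int num_inputs;
    struct input_s *inputs;        /* inputs[input#] */
};

/* Hypothetical helper: release everything alloc_weights() allocated. */
void free_weights(struct network *net)
{
    if (net->weights == NULL)
        return;
    for (int layer = 0; layer < 2; layer++) {
        if (net->weights[layer] == NULL)
            continue;
        for (int i = 0; i < net->num_hidden; i++)
            free(net->weights[layer][i]);
        free(net->weights[layer]);
    }
    free(net->weights);
    net->weights = NULL;
}

/* Hypothetical helper: allocate zero-filled weights with the shape
 * weights[2][num_hidden][io_nodes[layer]]. Returns 0 on success, -1 on
 * allocation failure (after freeing any partial allocation). */
int alloc_weights(struct network *net)
{
    net->weights = calloc(2, sizeof *net->weights);
    if (net->weights == NULL)
        return -1;
    for (int layer = 0; layer < 2; layer++) {
        net->weights[layer] = calloc((size_t)net->num_hidden,
                                     sizeof **net->weights);
        if (net->weights[layer] == NULL) {
            free_weights(net);
            return -1;
        }
        for (int i = 0; i < net->num_hidden; i++) {
            net->weights[layer][i] = calloc((size_t)net->io_nodes[layer],
                                            sizeof ***net->weights);
            if (net->weights[layer][i] == NULL) {
                free_weights(net);
                return -1;
            }
        }
    }
    return 0;
}
```

With this layout, weights[0][i] holds the io_nodes[0] input weights of
hidden node i and weights[1][i] holds its io_nodes[1] output weights, which
matches the two lines written per hidden node in the weight_file.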