I. Description of the program art:

The program trains a neural network according to the art algorithm. It receives its inputs either from input files or interactively from the user. The outputs (the weights and classes) are written to an output file. The parameters under user control are the dimensionality of the inputs and the vigilance parameter. The details are given with the description of the file formats.

usage:
   a) art
or
   b) art -l input_file data_file weight_file
or
   c) art -o weight_file data_file output_file

When format a) is used, the program prompts the user for the names of the data and weight files, and then prompts for the structural parameters of the art network. If output generation is performed, the program also asks for the name of the output data file.

Format b) is used for learning. In learning, the parameters for the network are placed in the input_file, the training data are placed in the data_file, and after training the weights are written to the weight_file.

Format c) is used for output generation. In output generation the program reads the parameters for the network, along with the nodes' weights, from the weight_file, reads the test data from the data_file, and writes the outputs (the data and their classification) to the output_file.

The file formats are specified below.

II. File formats

1. input_file

The input file is in a fixed format, i.e. all the parameters presented below must be present in the file. The format of the file is given by an example:

   7   | Dimension of input space is 7
   0.7 | Vigilance parameter is 0.7
   5   | 5 inputs will be given in the data_file

In the input file, all values that are shown on one line in the example must remain on the same line.

Preferred values:

The dimension of the input space is problem-dependent. The number of dimensions is not limited in any way.
The vigilance criterion controls the degree of similarity required between the input sample and the winner node that accepts it. If the degree of similarity is unsatisfactory, the winner is discarded and another winner is sought. For low values of the vigilance parameter, the nodes in the network will be fairly general, while for high values they will be fairly specific. The network will therefore develop a small number of general nodes or a large number of specific nodes, depending on the vigilance parameter.

2. data_file

The data file is in a fixed format. It specifies all the training samples. Its format differs between the learning and output generation modes.

In the learning mode, the format is as follows (given by an example):

   1 1 0 0 0 0 1 | sample 1
   0 0 1 1 1 1 0 | sample 2
   1 0 1 1 1 1 0 | etc.

The important thing to notice is that each input pattern is placed on one line, with the values separated by exactly one space.

In output generation the data file is the file of test samples. The format of the file is the same as the format of the data file in learning, with one exception: the first line in the data_file gives the number of test input samples. For example, an output generation data file would be as follows:

   3             | 3 test input samples
   1 1 0 0 0 0 1 | sample 1
   0 0 1 1 1 1 0 | sample 2
   1 0 1 1 1 1 0 | etc.

3. weight_file

The weight_file contains the most basic structural parameters, as well as data on the nodes: their top-down and bottom-up weights. An example of such a file (for the input file example above) is:

   7                                  | Dimension of input space is 7
   2                                  | 2 nodes in the network
   0 0 0 1 1 1 0                      | Top-down weights of node 1
   0.00 0.00 0.00 0.28 0.28 0.28 0.00 | Bottom-up weights of node 1
   1 1 0 0 0 0 1                      | Top-down weights of node 2
   0.28 0.28 0.00 0.00 0.00 0.00 0.28 | Bottom-up weights of node 2

4. output_file

The output file contains the outputs generated by the art network when running output generation.
An example of such a file (for the test data example above) is:

   1 1 0 0 0 0 1                      | test sample 1
   1 1 0 0 0 0 1                      | top-down weights of the winner node
   0.28 0.28 0.00 0.00 0.00 0.00 0.28 | bottom-up weights of the winner node
   0 0 1 1 1 1 0                      | test sample 2
   0 0 0 1 1 1 0                      | top-down weights of the winner node
   0.00 0.00 0.00 0.28 0.28 0.28 0.00 | bottom-up weights of the winner node
   1 0 1 1 1 1 0                      | test sample 3
   0 0 0 1 1 1 0                      | top-down weights of the winner node
   0.00 0.00 0.00 0.28 0.28 0.28 0.00 | bottom-up weights of the winner node

III. Program organization

The program is organized in one module, art.

Description of the module:

The program parses the input arguments (main()) and supplies them to the training function (art()). It also determines which of the usage formats has been used, and calls the art() function accordingly.

The art() function in turn opens the relevant files (openfile()), reads the input (readinput()), and initializes the random number generator (initrandom()). If the network is performing learning, the art() function trains the network (trainnetwork()) and outputs the results (writeoutputweight()). If the network is performing output generation, the art() function calculates the outputs for the test data (get_outputs()) and outputs the data (writeoutputdata()).

The function readinput() also generates the main control structure (called the network structure), depending on the input parameters. If no arguments are supplied at run-time, the user is interactively asked whether the network is to be trained or output is to be generated. Depending on the choice, the user is then queried for the input parameters and for the names of the data, weight, and possibly output files.

The trainnetwork() function runs until no further changes are made to the network. In each iteration the trainnetwork() function presents the network with all training samples.
For each training sample, the best matching node is found (best_node()), and if such a node exists, it is updated (update_best()). If not, a new node is added to the network (new_node()).

The best_node() function forms a list of active nodes (initially all nodes, via activate_all()), and then proceeds to find the node with the largest output (largest_output()) and to test whether it is satisfactory under the vigilance criterion (satisfactory()). It repeats this operation until either a node that satisfies the vigilance criterion is found, or all the nodes have been removed from the list of active nodes.

The largest_output() function simply determines which of the active nodes has the largest scalar product of the input sample and the bottom-up weights. The satisfactory() function checks whether a node satisfies the vigilance criterion for a given input sample.

The update_best() function updates both the top-down and the bottom-up weights of the best-matching node. Finally, the new_node() function creates a new node, with its top-down and bottom-up weights set according to the training input sample for which new_node() was invoked.

The get_outputs() function presents each test sample to the already trained art network. For each sample the best matching node (the node with the largest output) is found, using the largest_output() function. The bottom-up weights of the winner node are recorded in the 'output' field for each test input.

IV. Data structure

The most important data structure in the program is the "network" structure. A pointer to this structure is passed to all functions that need to read or modify part or all of the network.

Description of the parts of the structure network:

The structure has 3 distinct parts:
   1. Structural data;
   2. Training parameters; and
   3. Input storage.

1. Structural data

This part contains data relevant to the 'physical' structure of the network: the dimensionality of the input space, the number of nodes, and the nodes themselves.

   a) indim - an integer which gives the number of input dimensions. Each node will have exactly indim weights, to specify its position in input space.
   b) num_nodes - an integer which gives the number of nodes in the network.
   c) list - a linked list of nodes. Parameters for each node are placed in a structure 'node'. The structure contains two arrays of weights: the top-down weights 't' and the bottom-up weights 'b'. It also contains an integer 'active', signifying whether the node is in the list of active nodes, and a pointer 'next' to the next node in the linked list.

2. Training parameters

This part contains all the training parameters. The training parameters are unused when the network is used to generate outputs.

   a) vigilance - a double giving the value of the vigilance parameter.
   b) learning - an integer specifying whether the network is used for learning or output generation. When set to 1, the network is used for learning. When set to 0, the network is used for output generation.

3. Input storage

This part contains all the input data: the number of inputs and the training (or testing) patterns.

   a) num_inputs - an integer stating the number of inputs the machine will receive.
   b) inputs - a 1-dimensional array holding all training (or testing) inputs. inputs[input#] gives the input numbered 'input#'. Each input is stored in an 'input_s' structure. The input_s structure contains two arrays. The array 'input', an array of indim integers, gives the actual input values. The array 'output', an array of indim doubles, gives the bottom-up weights of the winner node for this input sample. The array 'output' is used only for output generation. Its values during network training are undefined.
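
The structure described above, together with the node search from section III, can be sketched in C roughly as follows. This is an illustration only, not the program's actual source. The field and function names come from the description, but the element types (for example, integer top-down weights) are assumptions. The vigilance test shown is the textbook ART1 criterion, which the description does not confirm the program uses.

```c
#include <assert.h>
#include <stddef.h>

/* Field names follow the description above; the element types and the
   exact layout used by the real program are assumptions. */
typedef struct node {
    int *t;            /* top-down weights, indim entries */
    double *b;         /* bottom-up weights, indim entries */
    int active;        /* nonzero while the node is in the active list */
    struct node *next; /* next node in the linked list */
} node;

typedef struct input_s {
    int *input;        /* indim input values */
    double *output;    /* winner's bottom-up weights (output generation only) */
} input_s;

typedef struct network {
    /* 1. structural data */
    int indim;
    int num_nodes;
    node *list;
    /* 2. training parameters */
    double vigilance;
    int learning;      /* 1 = learning, 0 = output generation */
    /* 3. input storage */
    int num_inputs;
    input_s *inputs;
} network;

/* Return the active node whose bottom-up weights have the largest scalar
   product with x, or NULL if no node is active. Ties keep the earlier node. */
node *largest_output(const network *net, const int *x)
{
    node *best = NULL;
    double best_out = 0.0;
    for (node *n = net->list; n != NULL; n = n->next) {
        if (!n->active)
            continue;
        double out = 0.0;
        for (int i = 0; i < net->indim; i++)
            out += n->b[i] * x[i];
        if (best == NULL || out > best_out) {
            best = n;
            best_out = out;
        }
    }
    return best;
}

/* ASSUMPTION: textbook ART1 vigilance test,
   sum(t[i] * x[i]) >= vigilance * sum(x[i]).
   The description does not state the formula the program uses. */
int satisfactory(const network *net, const node *n, const int *x)
{
    int match = 0, norm = 0;
    for (int i = 0; i < net->indim; i++) {
        match += n->t[i] * x[i];
        norm += x[i];
    }
    return (double)match >= net->vigilance * (double)norm;
}

/* Search for the best-matching node: repeatedly take the active node with
   the largest output and drop it from the active list if it fails the test. */
node *best_node(network *net, const int *x)
{
    assert(net != NULL && x != NULL);
    for (node *n = net->list; n != NULL; n = n->next)
        n->active = 1;                 /* the role of activate_all() */
    for (;;) {
        node *n = largest_output(net, x);
        if (n == NULL)
            return NULL;               /* caller would add a new node */
        if (satisfactory(net, n, x))
            return n;
        n->active = 0;
    }
}
```

With the two nodes from the weight_file example and a vigilance of 0.7, this sketch accepts the sample 1 1 0 0 0 0 1 at the node with top-down weights 1 1 0 0 0 0 1. The sample 1 0 1 1 1 1 0 shares only 3 of its 5 set bits with 0 0 0 1 1 1 0 (a ratio of 0.6), so under the assumed criterion both nodes are rejected and best_node() returns NULL, which is the case where training would call new_node().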