I. Description of the package: The package consists of 2 programs: boltztrain and boltzoutput 1. boltztrain boltztrain is used for training a boltzmann machine, using either the annealing or the mean-field annealing scheme. It receives its parameters in an input file, writes its outputs (the weights) in an output file, and has the possibility to continue with training, i.e. to start from a determined set of weights, instead of initializing them randomly. A number of parameters for the boltzmann machine is under user control, such as: number of nodes, learning rate, choice of training algorithm, choice of training patterns, termination criterion etc. The details are given with the description of the file formats. usage: a) boltztrain or b) boltztrain [-c] input_file data_file output_file [continuation_file] When format a) is used, the program prompts the user for the names of the data and weights file, and then proceeds to prompt the user on the structural parameters of the boltzmann machine. There is no continue_training option when format a) is used. The file formats are specified later. When the boltztrain program executes, it generates output each iteration. The output specifies which iteration the program is in, as well as an error measure. The error measure is the sum of the squared differences between the probabilities p and p' (when both i/o are clamped, and when only outputs are clamped, respectively). The lower the error measure, the smaller the weight changes, i.e. the closer the algorithm is to termination. The continuation file is simply the output file from a previous execution. A restriction is that the same file name MUST NOT be used as both the output_file and the continuation_file name. 2. boltzoutput boltzoutput is used for testing a boltzmann machine. The choice of the output generation algorithm may be specified by the user to any one of the following: annealing, mean-field annealing, or hopfield-style. boltzoutput receives its inputs in the form of a weights file, in the same format as the output file generated by boltztrain. boltzoutput also expects another file, the test-data file, to be specified by the user. This file contains the test data. usage: a) boltzoutput or b) boltzoutput [-(a|h|m)] weight_file test_data_file When format a) is used, the user is prompted for the names of the weight_file, the data_file, and the type of output generation. When the boltzoutput program is executed, it generates outputs for each pattern in the test data. These outputs are printed out by printing out the whole state of the boltzmann machine, i.e. the state of the input layer, the hidden layer(s), and the output layer. II. File formats 1. input_file for boltztrain The input file is in a fixed format, i.e. all the parameters I will present below must be present in the file. The format for the file will be given by an example. Input file: Comments and descriptions 3 | Number of layers 2 2 1 | Number of nodes in each layer. The input layer is first 0 1 0 | Existence of internal connections in each layer. 1 is YES 300 0.001 0.09 | #iterations learning_rate convergence_criterion 40 0.95 0.5 m | starting_temperature beta_parameter end_temperature training algorithm: m - mean-field; a - annealing 4 | #training patterns In the input file, all the values in one row are separated by one space. All values that are shown in one line in the example, must remain in the same line. Preferred values: The number of layers is usually set to 3. ,meaning one input, one hidden and one output layer. The number of nodes in each layer is problem dependent. The existence of interconnections between the nodes in one layer also depends on the problem. Usually it is a good idea to interconnect the nodes in the hidden layer(s), but interconnecting the output nodes as well sometimes gives good results. The number of iterations simply limits the time that it will take the algorithm to execute. Since an option to continue the training exists, it is irrelevant whether this value will be set high enough. If after execution it is found that the iterations were too few, training can be simply continued by specifying the weight_file generated by boltztrain, as the continuation file and running boltztrain once more. The learning rate is the critical parameter. Usually, learning rates of 0.1 should be good enough, but, specially when mean-field annealing is used, it is much better to use a very low learning rate (such as 0.01, or even 0.005). If the learning rate is too high, the machine won't converge in a local minimum). The convergence criterion stops the machine's further training when a certain criterion is satisfied. The criterion used here is: sum((p1-p2)*(p1-p2))/total_weights where p1 - probabilities when both inputs and outputs are clamped, p2 - probabilities when only inputs are clamped total_weights - normalizes the sum, and makes it independent on the size of the machine To minimize the probability that the criterion will be satisfied due to a random choice, the algorithm keeps training the machine until two successive steps that satisfy the criterion occur. The starting and ending temperatures, as well as the beta parameter are all parameters of the simulated annealing process. For the "regular" annealing, I have found that good results are obtained by setting the start and end temperatures to 5 and 0.1 respectively, and setting the beta parameter to 0.9. For the mean-field annealing, setting the low temperature to a too low value makes all the states in the machine settle on or close to one of the {OFF, ON} states, defeating the purpose of the mean-field annealing scheme. I have found that setting the low temperature to 0.5 gives good enough results. The high temperature and the beta parameter may be set to the same values as in the "regular" annealing scheme, although setting them to 40 and 0.95 gives somewhat better results. Finally, the number of training patterns, as well as the training patterns themselves depend on the actual problem. 2. Data_file for boltztrain The datafile is in a fixed format. It specifies all the input-output training sample pairs. The format will be given by an example: 1 -1 ; 1 | input_pattern ; output_pattern -1 1 ; 1 | (for every pattern) 1 1 ; -1 | -1 -1 ; -1 | The important thing to notice is that one input pattern (and its corresponding desired output) are placed in one line, all of the values being intersperced with exactly one space, and the input and desired output are separated by a single semicolon 3. weight_file for both boltztrain and boltzoutput The weight_file is generated by the boltztrain program, and used by the boltzoutput program. This file is not under user control. It contains the basical information about the network (#layers, #nodes in each layer, internal connections), in the same format as the input_file. It then contains a list of the weights in the machine. An example of such a file (for the input file example above) is: 3 | number of layers 2 2 1 | number of nodes in the layers 0 1 0 | internal connectivity of the layers 0.391631 | various weights 0.107526 -0.429760 0.263403 -0.412605 -0.457438 0.007050 4. test_data_file for boltzoutput The test_data_file has fixed format. It specifies the number of test patterns that will be presented to the network, and the test patterns themselves. The format of the test patterns is similar to the format of the training patterns for the input_file, with the exception that the desired outputs are not present in the test_file. An example is presented below: 4 | 4 test patterns 1 1 | first pattern 1 -1 | second pattern -1 1 | third pattern -1 -1 | fourth pattern III. Program organization The programs are organized in three modules: 1. boltztrain 2. boltzoutput 3. boltz_common Since both the training and output generation are based on some common principles, all the functions dealing with common problems have been placed in the boltz_common module. The program boltztrain consists of the first and third modules, while the program boltzoutput consists of the first and second modules. Description of the modules: 1. boltztrain The boltztrain module implements the training algorithm. It parses the input arguments (main()), and supplies them to the training function (boltzmann()), which in turn opens the relevant files (openfiles()), reads the input (readinput()), initializes the random number generator (initrandom()), initializes the weights (initweight()), and trains the machine (trainboltzmann()). In the event of the user requesting continuation of execution, the weights are not initialized to random numbers, but are read instead (readweights()). The function readinput() also generates the main control structure (called the machine structure), by calling functions from the common module. The trainboltzmann() function initializes the state of the machine to a random state (initstate()), and initializes the exponent lookup table (init_exp_lookup()). The reason for using an exponent lookup table is that the repeated computation of the exponent is necessary in the algorithm, and it takes too much time. The exponent lookup itself is implemented in the common module. The trainboltzmann() function then proceeds to repeat the training of the boltzmann machine, until no changes (i.e. the convergence criterion) is satisfied two iterations in a row) occur. One training cycle consists of calling phases one and two (phase1() and phase2()) and modifying the weights afterwards (modify_weights()). This training cycle is repeated for all input patterns, and for each input pattern the machine is left to enter a stable state, either by annealing (anneal()) or by mean_field annealing (mean_field_anneal()). After the end of the annealing process statistics about which connections connect nodes which are simultaneously on or off are collected (update_counters() or update_mean_anneal_counters()). These statistics consist of simply counting the number of occurences of the same state on the ends of each connection, for every input pattern. At the end of the phase, these counters are normalized by the total possible number of updates, so that they will represent probabilities of the two states being equal accross each connection (normalize_counters()). The difference between the two phases is that in phase 1, both the input and output nodes are clamped, while in phase 2, only the input nodes are clamped. A node being clamped means that it is prohibited for that node to change its state during the annealing/mean_field annealing processes. Also, to get better performance on the test data, a noisy clamping of the inputs (and outputs in phase 1) is performed (noisyclamp()). With noisy clamping of inputs, every node has a certain probability of fliping its state, either from an ON to an OFF state, or from an OFF to an ON state. The noisy clamping is performed once for each input : at the start of phase 1. If the input sample has been changed in phase 1, it will remain the same in phase 2. Because in the second phase some of the nodes (that were clamped in the first phase) are not clamped any longer, and therefore are able to change their state, the probabilities of nodes accross one connection being equal in phase one, and the corresponding probabilities in phase two are different (in the general case), and they are recorded in two different storage spaces. The details of the data structure will be given later. Finally, the modify_weights() function simply modifies all weights based on the recorded probabilities p1 and p2 (standing for probabilities in phases one and two respectively). The modify_weights() function returns an error measure, based on the squared differences in the probabilities p1 and p2. That measure is summed accross all weights, and then divided by the total number of weights, to give a measure of the average weight change. 2. boltzoutput The boltzoutput module implements the output generation algorithm. The main() function just parses the input arguments, supplies defaults (if needed) and calls the output generation function (boltzoutput()). boltzoutput() opens the input files (openfiles()), reads the machine parameters and generates the machine structure (readinputs()), and depending upon the actual output generation algorithm requested (annealing, mean-field annealing, or hopfield-style), calls the appropriate function. The function readinputs() reads the machine parameters (#layers, #nodes in each layer, and internal connections of each layer), calls functions from the common module to generate the machine structure, and then reads in the weights of the machine (which were presumably calculated by the boltztrain program). By setting the mean_field field of the structure machine to false (0), when the function anneal_output() is called, it will perform the regular annealing algorithm for each input pattern, and generate the output patterns. The anneal_output() function initializes the random number generator (initrandom()) and the exponent table lookup (init_exp_lookup()), and for each input test pattern performs the annealing algorithm (anneal_style()), and writes the obtained output. The anneal_style() function initializes the state of the machine to a random state, copies the inputs to the input nodes, clamps the input layer, sets the start temperature, the end temperature and the beta parameter to 20.0, 0.1 and 0.95 respectively, and calls the annealing function from the common module (anneal()). If the mean_field field of the structure machine were set to true (1), the anneal_output() function would set the values for the start temperature, end temperature and the beta parameter to 40.0, 0.5, and 0.9 respectively, and calls the mean field annealing function from the common module (mean_field_anneal()). If the generation algorithm requested was the hopfield algorithm, the hopfield_output() function would have been called. The hopfield_output() function initializes the random number generator (initrandom()) and the exponent table lookup (init_exp_lookup()), and then for each input, performs the hopfild-style output generation (hopfield_style()) and writes the output (writeoutput()). The hopfield_style() function initializes the machine to a random state (initstate()), copies the input test pattern to the input node layer, and while changes occur, randomly selects one node (select_node()), checks whether its energy change is larger than 0 (energy_change()), and if so, flips the state of the node (flip_state()). All the functions mentioned in the last paragraph are from the common module. 3. boltz_common The common module contains several functions grouped in a few categories: a) random number functions; b) allocation and deallocation of the structure machine; c) state changes in the machine; d) annealing, calculating energy changes etc. ; e) exponent table lookup; and f) mean-field annealing functions. a) random number functions initrandom() initializes the random number generator by using the number of milliseconds since the 1st of January 1970 as a seed. get_rand() returns a random number that is within certain limits. get_1bit() is a function that returns just one random bit: 0 or 1. b) allocation and deallocation of the structure machine; deallocate_weight_matrix() is used to deallocate storage for the weight matrix, and for the probabilities p1 and p2 associated with the weight matrix. deallocate_internal_weight() is used to deallocate storage for the internal weights, and for the probabilities p1 and p2 associated with the internal matrix. deallocate_mean_field() is used to deallocate storage used for the mean values of the states, and for the probabilities of nodes being ON or OFF. deallocate_machine() is used to deallocate all the storage allocated for the structure machine. get_machine() allocates an empty structure machine, and initializes all its internal pointers to NULL. allocate_weight_matrix() is the opposite of deallocate_weight_matrix(). allocate_internal_weight_matrix() is the opposite of deallocate_internal_weight(). allocate_mean_field() is the opposite of deallocate_mean_field(). allocate_weights() calls the allocate_weight_matrix() and allocate_internal_weight() functions to allocate storage for all the weights (inter-layer and internal), storage for the states, for the 'clamped' array, for the inputs and desired outputs of the machine, for all the various probabilities, c) state changes in the machine; The functions in this category make the choice of the values for the ON and the OFF states invisible to the outside world. flip_state() flips the state of a node from ON to OFF or from OFF to ON. noisy_flip_state() - similar to flip_state(), only has a certain probability of making an error. d) annealing, calculating energy changes etc. ; select_node() selects one node randomly. Only those nodes which are in non-clamped layers have a chance of being selected. energy_change() computes the change in the energy of the matrix if one node were to flip its state. It is independent on the values used for ON and OFF. initstate() initializes the state to the machine to all ON, then randomly swithes some nodes to OFF. get_num_changes() determines how many changes of node state will be attempted on a certain temperature. Used solely by the annealing algorithm. accept_change() makes a decision whether to accept an energy change or not. If the energy change is positive, i.e. if it increases the energy function of the whole network, then the change is accepted immediately. Otherwise, there is a certain chance that such a change will be accepted, byt the chance falls with the fall of the 'temperature' parameter. anneal() performs the simulated annealing algorithm(). For every temperature in the selected range, a number of attempts will be made to change the state of some nodes in the machine (selected by delect_node()). If the change of energy (energy_change()) is accepted (accept_change()), then the state of the node is flipped (flip_state()). anneal_1_step() performs just one step of the annealing process, i.e. selects one node, and attempts to flip its state on a fixed temperature. e) exponent table lookup; and To reduce the time needed to compute the exp(x) function, a table lookup is implemented. The initialization of the table (init_exp_lookup()) just places 750 values of exp(x) in the table, for values of x ranging from 0.0 to 14.98, with a step of 0.02. exp_lookup() returns the value of the closest exponent. No interpolation is done. Simply the closest value (i.e. the first lowest value) is lookes up in the table and returned. f) mean-field annealing functions. The get_mean_input() function calculates the mean input to a node by summing up the products of the mean_states and the weights of all nodes that have a connection with the particular node. The mean_field_anneal() function first assigns the "real" values of the nodes to the mean values, then calculates the mean values of the nodes by repeatedly applying the get_mean_input() function to all the nodes. This is performed for a set of temperatures in a manner similar to annealing. At the end, the probabilities of a node being on or off are set. The probabilities of a node being on or off depend on its mean value. IV. Data structures The data structure used by all modules is the structure named "machine". A pointer to that structure is passed to all those functions that need to read/modify parts/all of the machine. This approach is used as opposed to a global structure which every function will be allowed to change. An even better approach would be subdividing the structure machine into few structures, with different contents, e.g. weights, probabilities, layers, etc. However, the latest approach is not used. Description of the parts of the structure machine: The structure has 5 distinct parts: 1. Machine data; 2. Input storage; 3. Training algorithm data; 4. Mean-field annealing data; and 5. Training algorithm probabilities. 1. Machine data This part contains data relevant to the 'physical' structure of the machine: the number of layers, number of nodes, values of weights etc. a) weights - this is a 3-dimensional array in which all weights between nodes on different layers are kept. Its usage is: weights[layer][i][j], meaning: weight between node 'i' in layer 'layer', and node 'j' in layer 'layer+1'. b) internal - this is a 3-dimensional array in which all weights between nodes on the same layer are kept. Its usage is: internal[layer][i][j], meaning: weight between nodes 'i' and 'j', both on layer 'layer'. No self-excitatory weights exist, i.e. the value of internal[layer][i][i] is undefined. Also, since this matrix is a symmetric matrix (accross its second and third dimensions), internal[layer][i][j] would be the same as internal[layer][j][i]. This implementation chooses to store only one of the above mentioned weights, i.e. internal[layer][i][j] is the value of the weight between nodes i and j on the layer 'layer' if and only if i>j. If i