
Procedia Computer Science 101, 2016, Pages 187–196
YSC 2016. 5th International Young Scientist Conference on Computational Science

On the applicability of spiking neural network models to solve the task of recognizing gender hidden in texts

Alexander Sboev (1, 2, 3, 4, 5), Tatiana Litvinova (2), Danila Vlasov (1, 4), Alexey Serenko (2), and Ivan Moloshnikov (2, 4)

(1) MEPhI National Research Nuclear University, Moscow, Russia
(2) National Research Center Kurchatov Institute, Moscow, Russia
(3) Plekhanov Russian University of Economics, Moscow, Russia
(4) JSC “Concern ‘Systemprom’ ”, Moscow, Russia
(5) Moscow Technological University (MIREA), Moscow, Russia
Sboev AG@nrcki.ru

Peer-review under responsibility of organizing committee of the scientific committee of the 5th International Young Scientist Conference on Computational Science. © 2016 The Authors. Published by Elsevier B.V. doi:10.1016/j.procs.2016.11.023

Abstract
We analyze two approaches to using spiking neural networks in the task of recognizing the gender of a text author. Such networks can be implemented in neuromorphic hardware with ultra-low power consumption. The first approach is to obtain synaptic weights for the spiking network by training a formal network, and we show the results obtained with it. The second approach is to create a supervised learning algorithm for spiking networks based on biologically plausible plasticity rules, and we discuss possible ways to construct such algorithms.

Keywords: supervised learning, spike-timing-dependent plasticity, artificial neural networks, spiking neural networks

Introduction

Over the last few years, interest in spiking neural networks has grown greatly. This is a result of the appearance of neuromorphic hardware capable of running such networks. This, in turn, creates the need to develop approaches that can be implemented on such hardware to solve practical tasks. Hardware with ultra-low power consumption makes it possible to solve these tasks on autonomous devices, so the problem of training spiking neural networks becomes particularly relevant. The task of predicting the gender of a text author on the basis of linguistic parameters, which could be implemented on such devices, is important, in particular, for security or conversational purposes.

There are generally two approaches to using spiking networks in a classification task. Learning algorithms for artificial networks are more developed than those for spiking nets. The direct approach is therefore to convert a trained formal network into a spiking one. In [1], each formal neuron is replaced with several spiking ones, which, together with the encoding and decoding machinery, reproduce its activation function. Furthermore, one can simply transfer the synaptic weights from a trained formal network to a spiking network of the same topology [2]. In Section 1.2 we show that after such a transfer the spiking network achieves higher accuracy than the formal one in the Fisher's Iris classification task. In Section 1.3 we apply this approach to the gender recognition task.

Figure 1: Algorithm steps.

Another approach is to implement learning in spiking neural networks by biologically inspired learning rules. A number of synaptic plasticity models suitable for supervised learning have been published [3, 4, 5]. However, none of them is yet based solely on current knowledge of the operating rules of biological neural systems, namely, on the Hebb principle. As the biologically plausible long-term plasticity model we consider spike-timing-dependent plasticity (STDP) [6]. It was shown in [7] to be suitable for unsupervised learning, but a supervised learning protocol based on it has not yet been developed. In Section 2.2 we demonstrate STDP parameters that make it possible to obtain several different synaptic weight distributions. In Section 2.3 we show that any desired weight values can be reached given the proper value of
correlation between input and output spike sequences. Based on this fact, in Section 2.4 we suggest a supervised learning algorithm suitable for the classification of rate-coded binary vectors.

1 ANN to SNN mapping approach

1.1 Network parameters and learning algorithm

Following [2], we used a combined learning algorithm involving artificial (ANN) and spiking (SNN) neural networks. It consists of the following steps (Fig. 1):

1. Training the artificial neural network using backpropagation. The neurons' activation function was ReLU for the hidden layers and Softmax for the output layer. Neuron biases were set to zero. Input data were normalized so that the L2 norm of each vector was 1.
2. Transferring the synaptic weights to the spiking neural network. The integrate-and-fire neuron model was chosen, in which the membrane potential V obeys

$$\frac{dV}{dt} = \sum_i \sum_{s \in S_i} w_i \, \delta(t - s),$$

where S_i is the sequence of spikes (spike train) on the i-th input synapse and w_i is the synaptic weight. Whenever the potential exceeds the threshold Θ, it is reset to zero and the neuron fires a spike.
3. Encoding the input data into spike trains. An input vector component x was encoded by a Poisson spike train with mean frequency x · ν_max.
4. Optimizing the spiking network parameters. Besides ν_max and Θ, the simulation time T and the simulation step Δt were adjusted.

According to [2]:
• the simulation time should be set long enough to eliminate the probabilistic influence of the spike trains;
• correct classification is impossible if it requires a neuron to fire several spikes in one simulation step. Therefore, the total input a neuron receives during one simulation step must not exceed the threshold. This condition is confidently fulfilled if

$$\sum_i w_i \le \frac{\Theta}{\nu_{\max} \, \Delta t}. \qquad (1)$$

To fulfill (1), all spiking neural network weights are divided by the normalization factor M. This factor is the same for all neurons in a layer but unique for each layer:

$$M = \max_j \left( \frac{\sum_i w_{ij}}{\Theta} \right), \qquad (2)$$

where w_ij is the weight of the i-th synapse of the j-th neuron in the current
layer.

The conditions above are necessary but not sufficient. Achieving maximal classification accuracy therefore still requires adjusting ν_max and Θ.

1.2 Fisher's Iris classification

To test the algorithm described above, we solved the popular toy task of Fisher's Iris classification. The network had 4 neurons in the input layer, a single hidden layer, and 3 neurons in the output layer. Spiking network weights were normalized according to (2). Each input vector was presented for 10 s. The classification result was given by the output neuron that fired the most spikes during the simulation.

1.2.1 Results

The mean classification error is the ratio of wrongly classified input samples to the total number of samples. For the ReLU network it was 0.04 ± 0.01 on the training set and 0.06 ± 0.04 on the test set, averaged over 20 realizations of the split into training and testing sets. The spiking network can achieve higher accuracy than the ReLU one (Fig. 2): with adjusted Θ and ν_max, its error decreases to 0.04 ± 0.01. The higher Θ is, the higher the accuracy, because the neuron has to integrate more input spikes before it fires an output spike.

1.3 Gender prediction

RusPersonality [8] is the first corpus of Russian-language texts labeled with data on their authors. This free-to-use corpus contains over 1,850 documents from 1,145 respondents, 230 words per document on average, and it is currently expanding. A unique aspect of our corpus is the breadth of its metadata (gender, age, personality, neuropsychological testing data, education level, etc.).
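The ANN-to-SNN mapping of Section 1.1 can be sketched in pure Python. The sketch covers three pieces: per-layer weight normalization (2), a non-leaky integrate-and-fire layer driven by rate-coded inputs, and readout by the most active output neuron. It is a minimal illustration under stated assumptions, not the authors' code. The function names are hypothetical. Poisson trains are approximated by one Bernoulli draw per simulation step with probability x · ν_max · Δt, which is assumed to be at most 1. Multi-layer networks are assumed to be simulated by feeding one layer's output trains into the next.

```python
import random
from typing import List, Sequence

# weights[j][i] is the weight of the i-th synapse of the j-th neuron (w_ij in the text)
Matrix = List[List[float]]
Trains = List[List[bool]]  # trains[i][t] is True if input i spikes at step t


def normalization_factor(weights: Matrix, theta: float) -> float:
    """Eq. (2): M = max over neurons j of (sum_i w_ij) / theta.

    Assumes at least one neuron of the layer has a positive summed weight.
    """
    return max(sum(row) / theta for row in weights)


def normalize_layer(weights: Matrix, theta: float) -> Matrix:
    """Divide every weight of one layer by the same factor M."""
    m = normalization_factor(weights, theta)
    return [[w / m for w in row] for row in weights]


def poisson_trains(x: Sequence[float], nu_max: float, dt: float,
                   n_steps: int, rng: random.Random) -> Trains:
    """Rate coding: component x_i fires with probability x_i * nu_max * dt per step."""
    return [[rng.random() < xi * nu_max * dt for _ in range(n_steps)] for xi in x]


def run_if_layer(weights: Matrix, inputs: Trains, theta: float) -> Trains:
    """Non-leaky integrate-and-fire layer driven by delta-pulse synapses."""
    n_steps = len(inputs[0])
    v = [0.0] * len(weights)
    out = [[False] * n_steps for _ in weights]
    for t in range(n_steps):
        active = [i for i, train in enumerate(inputs) if train[t]]
        for j, row in enumerate(weights):
            v[j] += sum(row[i] for i in active)  # each input spike adds w_ij
            if v[j] > theta:  # threshold exceeded: spike and reset to zero
                out[j][t] = True
                v[j] = 0.0
    return out


def classify(output_trains: Trains) -> int:
    """Index of the output neuron that fired the most spikes."""
    counts = [sum(train) for train in output_trains]
    return max(range(len(counts)), key=counts.__getitem__)


if __name__ == "__main__":
    w = normalize_layer([[0.5, 1.5], [1.0, 0.2]], theta=1.0)  # M = 2.0
    out = run_if_layer(w, [[True] * 10, [False] * 10], theta=1.0)
    print(w, [sum(tr) for tr in out], classify(out))
```

Chaining layers amounts to calling run_if_layer on the previous layer's output trains. This loop emits at most one spike per neuron per step and discards any excess potential at reset. That is why condition (1) matters.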
Another advantage is that all our samples were written especially for this corpus, in contrast to the common approach of retrieving texts from social networks. Therefore, they do not contain any borrowings or citations. All respondents were given a few topics to write about, the same for male and female participants. This, along with the large number of participants, makes it possible to focus on peculiarities caused by the authors' demographic characteristics rather than by their individual styles (in the current paper, gender).

Figure 2: Fisher's Iris classification error of the spiking network on the test set, divided by the error of the ReLU network, as a function of the maximum input frequency ν_max for different neuron thresholds Θ. The error is averaged over 20 realizations of the split into training and testing sets, and then over independent realizations of the input spike trains. For clarity, deviation bars are not shown for every point.

As the input data for gender prediction, the following set of context-independent features was used to describe a text:
• Morphological features: the number of nouns, numerals, adjectives, prepositions, verbs, pronouns, interjections, articles, conjunctions, participles, infinitives, and finite verbs;
• Syntactical parameters: syntactic relations of different types;
• Derivative coefficients, which are different ratios of parts of speech (Trager index, dynamics coefficient, etc.);
• The number of exclamation marks, question marks, dots, and emoticons;
• The number of words pertaining to a particular “Emotion” group, e.g., “Anxiety” or “Discontent”, 37 categories in total.

The highest gender classification accuracy obtained on our corpus is 0.86 ± 0.05 [9], achieved with a sophisticated combination of learning algorithms. Here, however, we are interested in the difference in accuracy between the spiking network and the ReLU network rather than in the absolute accuracy values. The training set contained 364 texts, and the testing set 187. The network topology was 141 input neurons, 81 neurons in the first hidden layer, 19 neurons in the second hidden layer, and an output layer. Weight mapping was performed both with and without normalization (2).

Figure 3: Dependence of the gender recognition error on the maximum input frequency ν_max for different neuron thresholds Θ, and also with weight normalization (2).

1.3.1 Results

The classification error of the ReLU neural network on the testing set was 0.22. Fig. 3 shows the mean classification error of the spiking neural network on the test set for different Θ and ν_max, with and without normalization (2). Again, as in the Iris classification task, the best accuracy was obtained at high input frequencies and thresholds. The lowest error is 0.22, indicating that no losses occurred during mapping.

2 The principal possibility of applying Spike-Timing-Dependent Plasticity to the gender recognition task

2.1 Materials and methods

In the Spike-Timing-Dependent Plasticity model, the strength of each synapse is described by a weight 0 ≤ w ≤ w_max. Its change depends on the exact moments t_pre of presynaptic spikes and t_post of postsynaptic spikes:

$$\Delta w = \begin{cases} -W_- \left( \frac{w}{w_{\max}} \right)^{\mu_-} \exp\left( -\frac{t_{pre} - t_{post}}{\tau_-} \right), & \text{if } t_{pre} - t_{post} > 0; \\ W_+ \left( 1 - \frac{w}{w_{\max}} \right)^{\mu_+} \exp\left( -\frac{t_{post} - t_{pre}}{\tau_+} \right), & \text{if } t_{pre} - t_{post} < 0, \end{cases} \qquad (3)$$

where W_+ = 0.03, W_− = 1.035 · W_+, and τ_+ = τ_− = τ_corr = 20 ms. The rule with μ_+ = μ_− = 0 is called additive STDP, and with μ_+ = μ_− = 1 multiplicative STDP. Intermediate values 0 ≤ μ ≤ 1 are also possible. Additive STDP needs an additional constraint to keep the weight from falling below zero or exceeding the maximum value w_max = 1: if w + Δw > w_max, then Δw = w_max − w; if w + Δw < 0, then Δw = −w.

An important part of the STDP rule is the scheme for pairing pre- and postsynaptic spikes when evaluating the weight change. Besides the all-to-all scheme, there exist several nearest-neighbour schemes [6]. We used the restricted symmetric scheme (Fig. 4). In this scheme, a presynaptic spike is paired with the last preceding postsynaptic spike, and vice versa, but a spike can participate neither in two depression pairs nor in two potentiation pairs.

Figure 4: The restricted symmetric spike pairing scheme. Ticks denote spikes. A gray line means that the pair of spikes is taken into account in the STDP weight change rule: potentiation in the pre-before-post case and depression in the post-before-pre case.

As the neuron model we used the Leaky Integrate-and-Fire model, in which the membrane potential dynamics is

$$\frac{dV}{dt} = -\frac{V(t) - V_{resting}}{\tau_m} + \frac{I_{syn}(t)}{C_m} + \frac{I_{ext}}{C_m};$$

when V ≥ V_th = −54 mV, V is reset to V_resting = −70 mV. During the refractory period τ_ref the neuron is insensitive to synaptic input. The membrane capacitance is C_m = 300 pF, and the membrane leakage time constant is τ_m = 10 ms. The postsynaptic current is of exponential form: a presynaptic spike from synapse i at time t_sp adds

$$w_i(t_{sp}) \, \frac{q_{syn}}{\tau_{syn}} \, e^{-\frac{t - t_{sp}}{\tau_{syn}}} \, \Theta(t - t_{sp})$$

to I_syn. Here q_syn = 0.75 nC, τ_syn is the synaptic time constant, w_i is the synaptic weight, and Θ(t) is the Heaviside step function.

2.2 The possibility of reaching non-bimodal weight distributions by non-additive STDP

In the case of additive STDP, only 0 and w_max = 1 are stable weight values. Using non-additive STDP makes it possible to reach a wider range of
weight distributions. To investigate the ability of the weights to converge to the target, we used the protocol of [10]. First, we record the output train of a neuron with the target weights and without STDP. This train is then considered the desired output. Next, the neuron, now with STDP turned on, receives the same input trains and is forced to fire spikes at the desired moments by stimulation with current impulses. Fig. 5 shows two examples of target weight distributions that can be reached during learning with the parameters we found, μ_+ = 0.06 and μ_− = 0.01.

Figure 5: Target synaptic weights and weights reached after applying the protocol described in Section 2.2. In the left plot, the target weights are all equal to 0.5; in the right plot, the target weights are distributed uniformly between 0 and 1.

2.3 Correlative nature of STDP

2.3.1 The correlation measure

The direction of the average weight change is determined by the amount of correlation between the input and output spike trains. We define the normed cross-correlation function as

$$\Gamma(\Delta t) = \frac{\sum_k S_{pre}(k \, t_{bin}) \, S_{post}(k \, t_{bin} + \Delta t)}{\sum_k S_{pre}(k \, t_{bin}) \cdot \sum_k S_{post}(k \, t_{bin})},$$

where S_pre/post(t) indicates a pre- or postsynaptic spike, respectively, at time t, and t_bin is the simulation step. The quantity

$$I = \sum_{\Delta t = 0}^{\tau_{corr}} \Gamma(\Delta t)$$

can then be used as a rough correlation indicator, where τ_corr is the STDP time window constant.

2.3.2 Results

Here we artificially generated input and output spike trains with different amounts of correlation. When STDP is applied to these trains, the weights reach an equilibrium state (Fig. 6A). With the obtained weights, the neuron in turn produces an output whose correlation with the input equals that of the initial artificial signal (Fig. 6B). STDP was non-additive, with the parameters as in Section 2.2. Thus, any desired weight value can be reached by making the neuron generate output with the proper amount of correlation with the corresponding input. Based on this fact, we
suggest the following protocol of supervised learning.

Figure 6: Results of applying STDP to artificially generated input and output spike trains. A: dynamics of the Euclidean norm of the weight vector during weight convergence. B: dependence of the correlation indicator I on the weights obtained by STDP learning on these trains. It is shown both for the artificially constructed input and output trains (“artificial output”) and for the output that the neuron produces with the established weights (“neuron output”).

2.4 A proposal for a practical learning algorithm based on STDP

As the input data we used 10-dimensional binary vectors with half of the components equal to 1 and the other half equal to 0. Each component equal to 1 was encoded by 10 synapses of the neuron receiving independent Poisson trains with a mean frequency of 30 Hz. Each component equal to 0 was encoded by 10 independent 2-Hz trains. Let each vector belong to one of two classes. For class C_+, a high output frequency is expected as the proper classification. Vectors from class C_− should produce a low mean output frequency. Our model consisted of a single neuron with 100 incoming synapses, all excitatory. STDP was additive with W_+ = 0.01 and W_− = 1.035 · W_+. Initially, all weights were set to 0.4.

2.4.1 The learning protocol

Input vectors are presented to the neuron in an alternating manner: a vector from C_+, then a vector from C_− for 1.5 s. During the presentation of a vector from C_−, the neuron is stimulated with a constant current. This current is high enough to make the mean output rate close to the highest possible rate, 1/τ_ref.

2.4.2 Results

While the neuron receives an input vector from class C_+, a synapse receiving high-frequency input contributes more to the neuron's output, so STDP rewards its weight more. Weights of synapses receiving components equal to 1 increase in 66% of cases, and weights of synapses receiving components equal to 0 decrease in 66% of cases, with the parameters we have
chosen. When a vector from class C_− is presented, the neuron output is caused by the stimulating current and is poorly correlated with the input. Therefore, all weights decrease. To keep them from falling to zero, a vector from C_− is presented for only 1.5 s, shorter than a vector from C_+. The weights of high-frequency inputs decrease more, due to the higher number of post-before-pre events.

We took six binary vectors S1–S6, three of which are linearly separable from the other three. The target weights w_target that separate them are known, so we monitored the deviation between the actual and target weights during learning (Fig. 7):

$$\beta(t) = \frac{\sum_{i=1}^{100} \left| w_i(t) - w_{target}^{i} \right|}{\sum_{i=1}^{100} w_{target}^{i}}.$$

After 6,045 s of learning (310 cycles of presenting the whole set of vectors), the neuron clearly distinguishes the classes by its mean firing rate, as shown in Fig. 8.

Figure 7: Deviation β between actual and target weights during learning.

Figure 8: Mean firing rate of the neuron in response to the input vectors after learning. The first three vectors belong to class C_+, and the second three to class C_−. The firing rate was averaged over multiple tries, each having independent 30-s input spike trains.

Conclusion

There is a straightforward way to obtain a spiking network that recognizes gender hidden in texts. One trains a well-studied artificial network and then uses its weights in a spiking network implemented on low-power hardware. Mapping the ANN to an SNN gives the same classification error of 0.22 for both networks, indicating lossless mapping. It is also possible to implement supervised learning in a spiking network with spike-timing-dependent plasticity by controlling the correlation between the input and output spike trains.
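The STDP-based technique summarized above rests on the pair-based weight update of eq. (3) in Section 2.1, which can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' simulation code. The function name stdp_dw is hypothetical, and times are in milliseconds. The case t_pre = t_post, which eq. (3) does not cover, is assumed to produce no change. The hard bounds are applied for every μ. Choosing which spikes form pairs (the restricted symmetric scheme of Fig. 4) is left to the caller.

```python
import math

# Parameters quoted in Section 2.1 (times in ms, weights dimensionless).
W_PLUS = 0.03
W_MINUS = 1.035 * W_PLUS
TAU_PLUS = 20.0
TAU_MINUS = 20.0
W_MAX = 1.0


def stdp_dw(w: float, t_pre: float, t_post: float,
            mu_plus: float, mu_minus: float) -> float:
    """Weight change of eq. (3) for one (pre, post) spike pair.

    mu_plus = mu_minus = 0 gives additive STDP, = 1 multiplicative STDP.
    """
    lag = t_pre - t_post
    if lag > 0:  # post before pre: depression
        dw = -W_MINUS * (w / W_MAX) ** mu_minus * math.exp(-lag / TAU_MINUS)
    elif lag < 0:  # pre before post: potentiation, exp(-(t_post - t_pre)/tau)
        dw = W_PLUS * (1.0 - w / W_MAX) ** mu_plus * math.exp(lag / TAU_PLUS)
    else:  # t_pre == t_post is not covered by eq. (3); assumed no change
        return 0.0
    # Keep w within [0, W_MAX]: required for additive STDP, applied here
    # for every mu as a safeguard (an assumption for non-additive cases).
    if w + dw > W_MAX:
        dw = W_MAX - w
    elif w + dw < 0.0:
        dw = -w
    return dw
```

For example, stdp_dw(0.5, t_pre=0.0, t_post=10.0, mu_plus=0.06, mu_minus=0.01) gives the potentiation produced by a pre-before-post pair 10 ms apart with the non-additive parameters of Section 2.2. Passing mu_plus = mu_minus = 0 recovers additive STDP.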
The proposed technique opens the way to practical tasks such as gender identification, which is a question of further research.

Acknowledgements

This work was supported by RSF, project 16-18-10050 “Identifying the Gender and Age of Online Chatters Using Formal Parameters of their Texts”. Simulations were carried out using the high-performance computing resources of the federal center for collective usage at NRC “Kurchatov Institute”, http://computing.kiae.ru.

References

[1] Chris Eliasmith. How to build a brain: A neural architecture for biological cognition. Oxford University Press, 2013.
[2] Peter U. Diehl, Daniel Neil, Jonathan Binas, Matthew Cook, Shih-Chii Liu, and Michael Pfeiffer. Fast-classifying, high-accuracy spiking deep networks through weight and threshold balancing. In IEEE International Joint Conference on Neural Networks (IJCNN), 2015.
[3] R. Gütig and H. Sompolinsky. The tempotron: a neuron that learns spike timing-based decisions. Nat. Neurosci., 9(3):420–428, 2006.
[4] Joseph M. Brader, Walter Senn, and Stefano Fusi. Learning real-world stimuli in a neural network with spike-driven synaptic dynamics. Neural Computation, 19(11):2881–2912, 2007.
[5] Jan-Moritz P. Franosch, Sebastian Urban, and J. Leo van Hemmen. Supervised spike-timing-dependent plasticity: A spatiotemporal neuronal learning rule for function approximation and decisions. Neural Computation, 25(12):3113–3130, 2013.
[6] A. Morrison, M. Diesmann, and W. Gerstner. Phenomenological models of synaptic plasticity based on spike timing. Biol. Cybern., 98:459–478, 2008.
[7] Peter U. Diehl and Matthew Cook. Unsupervised learning of digit recognition using spike-timing-dependent plasticity. Frontiers in Computational Neuroscience, 2015.
[8] O. V. Zagorovskaya, T. A. Litvinova, and O. A. Litvinova. Elektronnyy korpus studencheskikh esse na russkom yazyke i ego vozmozhnosti dlya sovremennykh gumanitarnykh issledovaniy [Electronic corpus of student essays and its applications in modern humanities studies]. Mir nauki, kultury i obrazovaniya
[World of Science, Culture and Education], 3(34):387–9, 2012.
[9] A. Sboev, T. Litvinova, D. Gudovskikh, R. Rybka, and I. Moloshnikov. Machine learning models of text categorization by author gender using topic-independent features (in review).
[10] R. Legenstein, C. Naeger, and W. Maass. What can a neuron learn with spike-timing-dependent plasticity? Neural Computation, 17:2337–2382, 2005.
