Establish the Level-Tree

Part of the document Collaborate computing networking, applications and worksharing (pages 615 - 675)

To make the approach easier to follow, we start our experiment with an example keyword. The keyword we chose is “Confucianism”; the user can of course choose any other word as needed. To establish the thesauri level-tree, we add related words such as “Confucius” and “Analects” (The Analects of Confucius). Figure 2 shows the level-tree, which is a part of the integrated Chinese thesauri, used in the example experiment in this paper.

Fig. 2. The thesauri level-tree of our experiment

After the thesauri tree is established, we use it to obtain the correlation value of each related word. Based on the structure of the level-tree, we apply the Dijkstra algorithm to traverse all the nodes and obtain the shortest path between every related word and the keyword. To keep the method easy to understand and practical, we use the following simple formula to calculate the semantic distance between a related word and the keyword:

$$\mathrm{Dist}(C_i) = \frac{1}{2^{L_i}} \qquad (1)$$

Here Ci represents a related word, and Li is the length of the shortest path between that related word and the keyword. Dist(Ci) expresses the degree of relation between word Ci and the keyword. We define the weight of the keyword “Confucianism” as 1 (its path length is 0); the weight of any other word is determined by the semantic distance between it and the keyword.
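As a minimal sketch of formula (1), the following Python code runs Dijkstra's algorithm with unit edge lengths over a small level-tree and converts each shortest-path length into a weight. The words are taken from the example; the parent of each second-level word is an assumption made for illustration, since the tree structure is only given in Fig. 2, and all function names are hypothetical.

```python
import heapq

# Edges of a small level-tree. The words come from the example; which
# second-level word hangs under which first-level word is an ASSUMPTION.
EDGES = [
    ("Confucianism", "Confucian"),
    ("Confucianism", "Confucius"),
    ("Confucianism", "Four Books"),
    ("Confucian", "Confucian Persuasion"),
    ("Confucian", "Confucianist"),
    ("Four Books", "Menci"),
    ("Four Books", "Analects"),
]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency mapping."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency


def shortest_path_lengths(adjacency, source):
    """Dijkstra's algorithm where every edge has length 1."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbour in adjacency.get(node, ()):
            nd = d + 1
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist


def correlation_weights(edges, keyword):
    """Apply Dist(Ci) = 1 / 2**Li to every word reachable from the keyword."""
    lengths = shortest_path_lengths(build_adjacency(edges), keyword)
    return {word: 1 / 2 ** length for word, length in lengths.items()}


weights = correlation_weights(EDGES, "Confucianism")
for word, weight in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{word}: {weight}")
```

Because every edge has the same length, a breadth-first search would give the same path lengths; Dijkstra is used here only because the text names it.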

Using this formula, we find that the correlation of “Confucian”, “Confucius” and “Four Books” is 0.5, and that of “Confucian Persuasion”, “Confucianist”, “Menci” and “Analects” is 0.25. The weights of these related words are shown in Fig. 3.

604 F. Chen et al.

Fig. 3. The weights of related words

4 Word-Segment Algorithm and Word-Frequency Statistics

Unlike English text, Chinese text has no blanks to separate words from each other. This once caused trouble for the development of text analysis in China. Not long after the problem emerged, however, the Jie-Ba algorithm was proposed, and we use it in our experiment to divide Chinese text into words. The Jie-Ba algorithm is based on the trie, a well-known prefix tree. Its dictionary includes more than 20,000 words, covering approximately all the common words used in everyday life, and all these words are collected in a txt file. It is similar to verb collocations in English, presenting the correlation between words as Fig. 4 shows.

With the trie, we can determine how to divide the sample text. Once a word is matched and no longer word can be matched, a blank is added, so the text is separated word by word. The Jie-Ba algorithm provides three modes: full mode, default mode and search mode. Full mode scans out every character sequence that can form a word; it is fast but cannot resolve ambiguity. Default mode tries to segment the text as accurately as possible, so it is suitable for text analysis, and we choose this mode for that reason. As its name suggests, search mode fits search engines: the text is segmented as in default mode, and long words are then cut a second time.
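The matching rule described above (emit a word once it matches and no longer word can be matched) can be sketched as forward longest-match segmentation over a trie. This is a simplified illustration, not Jie-Ba's actual implementation; the tiny dictionary, the sample sentence and all function names are assumptions.

```python
END = "$end"  # marker key; cannot collide with a single-character key


def build_trie(words):
    """Build a nested-dict prefix tree from a list of dictionary words."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def segment(text, trie):
    """Forward longest-match segmentation: at each position take the longest
    dictionary word; unknown characters are emitted one at a time."""
    tokens = []
    i = 0
    while i < len(text):
        node = trie
        longest = 0
        j = i
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                longest = j - i  # remember the longest complete word so far
        step = longest if longest else 1
        tokens.append(text[i:i + step])
        i += step
    return tokens


# Hypothetical mini-dictionary; a real dictionary holds many thousands of words.
DICTIONARY = ["孔子", "是", "儒家", "儒家思想", "思想", "的", "创始人"]
trie = build_trie(DICTIONARY)
tokens = segment("孔子是儒家思想的创始人", trie)
print(" ".join(tokens))  # words separated by blanks
```

Greedy longest matching cannot resolve every ambiguity, which is one reason a production segmenter needs more than this rule.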

A Method on Chinese Thesauri 605

Fig. 4. The verb collocations in English

After word segmentation, the text is divided word by word and separated by blanks. This makes word-frequency counting possible, and we do not need to worry about matching the wrong words. We use the same dictionary as the Jie-Ba algorithm to calculate the word frequency in each sample text. This is easy to implement, because it only needs a few loops to match the words, collect the final data and output the result.
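Counting word frequencies on blank-separated text can be sketched in a few lines. The function name, the sample text and the small dictionary below are illustrative assumptions, not the authors' code.

```python
from collections import Counter


def word_frequencies(segmented_text, dictionary):
    """Count dictionary words in a blank-separated text.

    Returns the per-word counts and the total number of tokens in the text.
    """
    tokens = segmented_text.split()
    counts = Counter(token for token in tokens if token in dictionary)
    return counts, len(tokens)


# Hypothetical segmented sample and dictionary subset.
sample = "孔子 是 儒家 思想 的 创始人 孔子"
counts, total = word_frequencies(sample, {"孔子", "儒家"})
print(counts, total)
```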

5 Calculation of Text Relevance

Given the weights of the related words and the word frequencies, the last step is to calculate the relevance of each text to the keyword. We use the following formula:

$$R(\mathrm{Assay}_k) = \frac{\sum_{i=1}^{n} \left[\mathrm{Dist}(C_i) \cdot N_{k,i}\right]}{M_k} \qquad (2)$$

In the formula, Assayk represents the k-th text, and n is the number of words considered (the keyword and its related words). Ci is the i-th word, Nk,i is the number of occurrences of word Ci in Assayk, and Mk is the total number of words in Assayk. The calculated results are shown in Figs. 5 and 6:
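Formula (2) can be sketched as a short function: the weighted occurrences of the keyword and its related words are summed and divided by the total word count of the text. The function name and the sample tokens are assumptions for illustration; the weights follow formula (1).

```python
from collections import Counter


def text_relevance(tokens, weights):
    """R = sum(Dist(Ci) * N_ki) / M_k for one segmented text.

    tokens  -- list of words of the text (M_k = len(tokens))
    weights -- mapping word -> Dist(Ci) for the keyword and related words
    """
    if not tokens:
        return 0.0  # an empty text has no relevance
    counts = Counter(tokens)
    weighted_sum = sum(weight * counts[word] for word, weight in weights.items())
    return weighted_sum / len(tokens)


# Hypothetical sample: 8 words, of which 4 are the keyword or related words.
sample_tokens = ["Confucianism", "Confucius", "Confucius", "Analects",
                 "is", "a", "the", "of"]
sample_weights = {"Confucianism": 1.0, "Confucius": 0.5, "Analects": 0.25}
relevance = text_relevance(sample_tokens, sample_weights)
print(relevance)  # (1*1.0 + 2*0.5 + 1*0.25) / 8 = 0.28125
```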


Fig. 5. Text relevance of samples

Fig. 6. Histogram of text relevance

As the figures show, every text is assigned its own relevance value with respect to the user's keyword. With these data, we can offer users a second selection step: setting a lower limit on the relevance of the texts obtained merely by entering the keyword in a search engine. This step further filters the samples to meet different users' demands and classifies the texts that contain the keyword by their relevance. It can be used in research with demanding requirements, because the texts are no longer just texts but carry different degrees of relevance to the keyword.
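The second selection step can be sketched as a threshold filter followed by a sort. The text names and relevance values below are invented placeholders, not results from the experiment, and the function name is hypothetical.

```python
def filter_by_relevance(relevance_by_text, min_relevance):
    """Keep texts whose relevance reaches the user's lower limit,
    ordered from most to least relevant."""
    kept = [(name, value) for name, value in relevance_by_text.items()
            if value >= min_relevance]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Placeholder relevance values for three hypothetical texts.
scores = {"text_a": 0.12, "text_b": 0.03, "text_c": 0.07}
selected = filter_by_relevance(scores, min_relevance=0.05)
print(selected)
```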

6 Conclusion and Future Work

This paper presents a novel model for a second filtering of texts collected by entering keywords on the Internet. It is based on the Chinese Thesauri and involves the establishment of the thesauri, a word-segmentation algorithm, word-frequency statistics and the calculation of text relevance. Through the experiment, we validate the effectiveness and accuracy of our method. Using the Chinese Thesauri, we set a limit on the relevance of all samples, which avoids wasting effort on analyzing useless texts and gives the samples higher quality after the second filter.


Future work will include a number of aspects. Firstly, the existing thesaurus needs a more accurate standard and larger coverage, which requires more professional knowledge of linguistics and graph theory. Secondly, language is always changing, and the Chinese Thesauri will change with it, so we need to continually update the structure of our thesauri toward a more scientific framework.

Finally, to make this model usable not only by technical scholars but also by laymen, we plan to package the entire algorithm as software. After this step, the user will only need to input the keywords and the minimum relevance between the texts and the keywords. Sample collection, thesauri establishment, word segmentation, word-frequency statistics and the calculation of text relevance will then all be done automatically, which will make the method considerably more user-friendly and efficient.

Acknowledgements. The research was supported in part by the National Science Foundation of China under No. 61672104, 61170209, 61502038, U1509214; Program for New Century Excellent Talents in University No. NCET-13-0676; Key Program of BFSU 2011 Collaborative Innovation Center No. BFSU2011-ZD04.



Formal Modelling and Analysis of TCP for Nodes Communication with ROS

Xiaojuan Li^1 (✉), Yanyan Huo^1, Yong Guan^1, Rui Wang^1, and Jie Zhang^2

1 Beijing Key Laboratory of Light Industrial Robot and Safety Verification, College of Information Engineering, Capital Normal University, Beijing 100048, China

Lixj66@gmail.com

2 College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China

Abstract. TCP (Transmission Control Protocol) is widely used to support communication between robotic nodes running ROS (Robot Operating System) in critical-task implementations. The probability of bit errors and lost packets is much higher for moving nodes in a WLAN, so it is essential to analyze the performance and reliability of the communication process for nodes with ROS. In this paper, the TCP communication between ROS nodes is modelled as an MDP (Markov Decision Process) and its reliability is analyzed. The specifications of TCP for node communication are formalized as properties in PCTL (Probabilistic Computation Tree Logic), and the satisfiability of the properties is verified by a probabilistic model checker. The results can help designers devise better strategies for the communication process over TCP between robotic nodes in ROS.

Keywords: Node network communication · Probabilistic model checking · Markov decision process

1 Introduction

With the rapid development of robotic technology, many new applications are deployed on distributed nodes with ROS for cooperative tasks. The correctness and reliability of communication among nodes are becoming more important in critical-task systems. The transport of commands and data among nodes with ROS is based on TCP (Transmission Control Protocol), which plays an important role in the communication reliability of the nodes. Some work has been done on the reliability analysis of TCP in WLAN. An adaptable TCP segment size scheme is proposed to improve TCP communication performance in wireless environments [1]. Data link layer and sub-section connection techniques are developed to improve the performance of TCP in wireless networks [3]. A reliability sorting algorithm for TCP data packets is put forward for the limited covert channel in [4]. These works basically use simulation, emulation or other traditional verification methods to analyze or improve the reliability of the TCP communication protocol.

©ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2017 S. Wang and A. Zhou (Eds.): CollaborateCom 2016, LNICST 201, pp. 609–614, 2017.

DOI: 10.1007/978-3-319-59288-6_61

Formal methods are based on strict mathematical reasoning to analyze or check the correctness of a design and its implementation, and the check can be automated for a finite-state system. Model checking automatically checks whether an abstract model of a system satisfies properties formalized from the design specifications. A colored Petri net model of the TCP connection is used to verify the correctness of the communication protocol [7]. The SpaceWire communication protocol [8] is verified and analyzed at the exchange level by model checking. The real-time properties of the session level of nodes with ROS [9] are verified with Uppaal. Probabilistic model checking combines probability analysis with general model checking techniques, and it is useful for describing non-deterministic stochastic systems. This paper focuses on the analysis and verification of the reliability of TCP transport for nodes with ROS by probabilistic model checking (Table 1).

2 Formal Modelling of Nodes Communication Based on TCP

Communication between ROS nodes is controlled by a main node running as roscore, which is responsible for monitoring and managing the communication of all functional nodes. Every node must register with the ROS Master node when it starts, and nodes can communicate with each other once authorized by the master node. XML-RPC is the calling mechanism for communication between ROS nodes. It is based on TCP, and by adding the port on the transport layer, the corresponding application-layer communication can be represented. To analyze and verify the performance of the nodes' communication protocol, this paper builds a formal model and gives a probabilistic analysis of connection setup and of sending and receiving messages between the nodes.

The Markov Decision Process model of the connection and communication between node1 and node2 is built for the verification. The node1 model is shown in Fig. 1. In the model, node1 moves from the initial state “idle” to the send-request state “request”. After sending a request, node1 waits for a confirmation from node2. If node1 receives the confirmation from node2, it sends a signal to node2 again; if node2 successfully receives this confirmation signal, the connection is established. Conversely, if node1 does not receive a confirmation signal from node2 within the prescribed time limit, node1 keeps sending the signal to node2 until it receives a confirmation or reaches the maximum retransmission limit. To model and analyze both connection establishment and message sending between node1 and node2, the paper extends the model with data sending after the connection has been established.

Table 1. Symbolic representation of the node1.

| Symbol | Function |
| --- | --- |
| idle | Initial state, no request for connection |
| request | Sends the request signal |
| req_num | The number of requests sent |
| ack1_num | The number of replies returned from node2 |
| ack2_num | The number of responses from node1 to node2 |
| RECEIVE_NUM | The upper limit of received messages |
| SEND_NUM | The upper limit of sent messages |
| P | The packet loss rate |

610 X. Li et al.

Similarly, the model of node2, the receiver side, describes the three-way-handshake connection and data transmission. When node1 requests a connection to node2, node2 moves from the initial state “idle” to the “wait-req” state. After receiving the request signal, node2 enters the “receive-req” state. After successfully sending the confirmation signal to node1, node2 is in the “send-ack1” state. If this succeeds, node2 moves to the “receive-ack2” state and waits for the acknowledgement signal from node1; if it successfully receives that signal, it moves to the “establish” state.

Fig. 1. Node1 communication process modelling

Formal Modelling and Analysis of TCP 611

Next, node1 and node2 send data to each other. When node2 successfully receives the request from node1, node2 sends the corresponding service [15] to node1; if node1 receives the response message from node2, the transmission has completed successfully. During this period, packet loss may cause the transmission between node1 and node2 to fail. This is taken into account in the formal models by adding the loss probability to the MDP models.
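To make the idea of an MDP with loss probabilities concrete, the following Python sketch builds a toy handshake MDP and computes the maximum probability of reaching the established state by value iteration, which is the kind of quantity a query of the form Pmax=? [ F target ] asks for. It is not the paper's PRISM model: the state encoding, the retry/give-up choice, the assumption that each message is lost independently with the same probability, and all function names are illustrative assumptions, so the numbers it prints do not reproduce the paper's results.

```python
def build_handshake_mdp(p_loss, max_tries):
    """Toy MDP for three handshake messages (req, ack1, ack2).

    Each message is lost with probability p_loss. After a loss the sender
    may choose to retry (while tries remain) or to give up.
    """
    mdp = {"established": {}, "failed": {}}  # absorbing states, no actions
    stages = ["req", "ack1", "ack2"]
    for k, stage in enumerate(stages):
        for tries in range(1, max_tries + 1):
            ok = (stages[k + 1], 1) if k + 1 < len(stages) else "established"
            lost = ("lost", stage, tries)
            mdp[(stage, tries)] = {"send": [(1 - p_loss, ok), (p_loss, lost)]}
            choices = {"give_up": [(1.0, "failed")]}
            if tries < max_tries:
                choices["retry"] = [(1.0, (stage, tries + 1))]
            mdp[lost] = choices
    return mdp


def pmax_reach(mdp, target, max_iterations=1000, eps=1e-12):
    """Value iteration for the maximum probability of reaching target."""
    value = {state: (1.0 if state == target else 0.0) for state in mdp}
    for _ in range(max_iterations):
        delta = 0.0
        for state, actions in mdp.items():
            if state == target or not actions:
                continue
            best = max(sum(p * value[nxt] for p, nxt in outcomes)
                       for outcomes in actions.values())
            delta = max(delta, abs(best - value[state]))
            value[state] = best
        if delta < eps:
            break
    return value


for tries in (1, 2):
    mdp = build_handshake_mdp(p_loss=0.05, max_tries=tries)
    print(tries, pmax_reach(mdp, "established")[("req", 1)])
```

In this toy model the maximizing strategy always retries, so the result equals (1 - p_loss ** max_tries) ** 3 and grows with the retransmission limit.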

3 Verification and Probability Analysis

Critical properties of node communication are extracted from the design specification and translated into PCTL formulas. Based on the formal models of the nodes set up above, the properties are automatically verified by the PRISM model checker and further analyzed.

In the model of node1, c = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 respectively represent that node1 is in the idle, request, send-req, wait-ack, receive-ack1, send-ack2, establish, send-message, send-success and receive-response state. For node2, s = 0, 1, 2, 3, 4, 5, 6, 7 respectively represent that node2 is in the idle, wait-req, receive-req, send-ack1, receive-ack2, establish, receive-message and response-success state.

Property 1. What is the maximum probability of successful communication under different packet loss rates?

Pmax=? [ F c=9 & req_num=1 & ack1_num=1 & ack2_num=1 & send_num=1 & receive_num=1 ]

In Fig. 2, the horizontal axis represents the packet loss rate in the wireless network, and the vertical axis represents the maximum probability that the connection is established and data is sent successfully between node1 and node2. As Fig. 2 shows, when the error probability is 0.05, the maximum probability of sending the correct data is less than 75%. This result is important for optimizing the network in the design phase.

Fig. 2. The maximum probability under different packet loss rates

Property 2. What is the maximum probability of sending data successfully under different numbers of retransmissions?

The maximum probabilities of a successful transfer on the first attempt and within two attempts are expressed, respectively, as follows:

Pmax=? [ F s=7 & req_num=1 & ack1_num=1 & ack2_num=1 & send_num=1 ]

Pmax=? [ F s=7 & req_num=1 & ack1_num=1 & ack2_num=1 & (send_num=2 | send_num=1) ]

The results show that the maximum probability is 0.6587 with the first transfer and 0.7246 with a second try, as shown in Fig. 3, which coincides with the experimental result. This verifies that the probability changes with the number of retransmissions and that the two are positively correlated. The results of the analysis lay a foundation for future research on and application of communication between nodes with ROS.

4 Conclusion

We build a formal model and analyze the communication protocol in the wireless network for nodes with ROS, and extract some critical properties for analysis and checking. The reliability of TCP between nodes with ROS is analyzed under different link error probabilities by probabilistic model checking, which provides useful strategies for designers and helps avoid bugs at the design phase.

Fig. 3. The maximum probability with two transmissions


Acknowledgement. The authors thank Beijing Key Laboratory of Electronic System Reliability Technology, Beijing Engineering Research Center of Highly Reliable Embedded System, Beijing Advanced Innovation Center for Imaging Technology, and Beijing Collaborative Innovation Center of Mathematics and Information Science for their support. This work was supported by the National Natural Science Foundation of China (61373034, 61303014, 61472468, 61572331) and the Project of Beijing Municipal Science & Technology Commission (Z141100002014001).

References

1. Huang, Z.: The real-time operating system development and implementation for industrial robot controller. School of Mechanical Engineering & Automation, Beijing University of Aeronautics, Beijing (2013)

2. Han, H.: Improvement for wireless TCP based on bit error rate monitoring. J. Comput. Appl. 31(10), 2657–2659 (2011)

3. Li, M., Zhang, Y., Xiang, D.: Improvement mechanism of transmission control protocol in wireless network. Comput. Eng. 42(1), 103–108 (2016)

4. Wei, S., Yang, W., Shen, Y.: A secret communication method based on reliable packet ordering. J. Chin. Comput. Syst. 37(1), 124–128 (2016)

5. Wang, Z.: Survey of model checking. Comput. Sci. 40(6A), 1–14 (2013)

6. PRISM-probabilistic symbolic model checker [EB/OL] (2011). http://www.prismmodelchecker.org/

7. Li, F.: Formal description and verification of TCP protocol based on colored Petri nets. Mod. Comput. 309(6), 49–52 (2009)

8. Li, Y., Li, X., Guan, Y.: Formal modeling and probabilistic analysis of SpaceWire protocol. J. Chin. Comput. Syst. 34(9), 25–29 (2013)

9. Wang, Y., Wang, R., Guan, Y.: Formal verification of node to node communication in RGMP-ROS hybrid operating system. J. Chin. Comput. Syst. 36(10), 2379–2383 (2015)

10. Parker, D.A.: Implementation of Symbolic Model Checking for Probabilistic Systems. University of Birmingham (2002)

11. Su, K., Luo, X., Lu, G.: Symbolic model checking for CTL. Chin. J. Comput. 28(11), 1798–1806 (2005)

12. TCP three-way-handshake and four recovery summary [EB/OL]. http://blog.csdn.net

13. Li, Z.: Research on design of intelligent gateway in substation. Jiangsu University of Science and Technology (2007)

14. Liu, Y.: Research on reliability design and test method of device driver. University of Electronic Science and Technology (2014)

15. Wu, Y.: Design of P2P instant messaging software based on XMPP protocol. Electronic information technology and instruments of Zhejiang University, Hangzhou (2007)
