4 Experimental Results and Analysis


This section compares the Huffman compression algorithm, the adaptive Huffman compression algorithm, and the multiple order context Huffman compression algorithm based on the Markov model, from two aspects:

Comparison of compression ratio: comparison across different amounts of data of the same type (the string type). (Note: compression ratio = amount of data after compression / amount of data before compression, so a lower ratio means better compression.)
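To make the metric concrete, here is a minimal Python sketch of the ratio as defined above (the helper name and the example figures are ours, not from the paper):

```python
def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Compression ratio as defined in the paper:
    compressed size / original size (lower is better)."""
    return len(compressed) / len(original)

# Example: 1000 KB compressed to 450 KB gives a ratio of 0.45 (45%).
print(compression_ratio(b"x" * 1000 * 1024, b"x" * 450 * 1024))  # 0.45
```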

Figures 3, 4, 5 and 6 show the comparison results of the one-order, two-order, three-order and four-order Markov chain compression algorithm against the other two compression algorithms, respectively. Figure 7 shows the change of compression across the different orders.

Fig. 3. Compression result of one-order context

Fig. 4. Compression result of two-order context

Fig. 5. Compression result of three-order context

Fig. 6. Compression result of four-order context

As shown in Fig. 3, the compression effect of the Huffman algorithm based on the Markov chain is better than that of the other two algorithms when one-order context compression is used. The compression effect of two-order context compression is shown in Fig. 4, where the advantage of the Huffman algorithm based on the Markov chain is more obvious. When the amount of data reaches 1000 KB, the compression ratio of the Huffman algorithm based on the Markov chain is 45%, while those of the other two algorithms are 72% and 62%, respectively.

The three-order case is shown in Fig. 5: the compression algorithm proposed in this paper again has a good compression effect, with a compression ratio reaching 42%. In the four-order case (Fig. 6), the advantage of the Huffman algorithm based on the Markov chain over the other two algorithms becomes even more obvious.

As shown above, the multiple order context compression algorithm based on the Markov chain outperforms the other two compression algorithms. This is because the Huffman compression algorithm and the adaptive Huffman compression algorithm treat each character as an independent coding unit, without taking the links between characters into account. The algorithm proposed in this paper considers the links between characters and merges and compresses them together. However, as the compression order increases, the compression ratio levels off. The comparison of the compression ratio of the proposed algorithm at different context orders is shown in Fig. 8.
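To illustrate the core idea — conditioning symbol statistics on the preceding characters so that strongly linked characters receive short codes — here is a minimal Python sketch of a k-order context frequency model. This is our simplified illustration, not the authors' implementation; it omits the Huffman tree construction and the TEMP threshold mentioned in Sect. 5:

```python
from collections import Counter, defaultdict

def context_frequencies(text: str, order: int) -> dict:
    """Count symbol frequencies conditioned on the preceding `order`
    characters (the Markov context). A separate Huffman code would
    then be built from each per-context frequency table."""
    freqs = defaultdict(Counter)
    for i in range(order, len(text)):
        context = text[i - order:i]
        freqs[context][text[i]] += 1
    return freqs

# After the context "ab" in "abracadabra", the next symbol is always 'r',
# so a per-context Huffman code can encode it very cheaply.
freqs = context_frequencies("abracadabra", order=2)
print(freqs["ab"])  # Counter({'r': 2})
```

Normalizing each context's counter row-wise yields exactly the transition probability matrix of the order-k Markov chain that the conclusion refers to.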

Fig. 7. Change of compression of each order

Fig. 8. Comparison of the compression ratio


As can be seen in Fig. 8, as the order of the context increases, the compression ratio gradually decreases and the compression effect becomes better. However, when the order exceeds three, the compression ratio becomes stable. This is because, as the context order increases, the relation between characters becomes weaker. The optimal context compression order is therefore three or four.

5 Conclusion

Through the analysis in Sect. 4, we can see that the Huffman compression algorithm based on the Markov chain is superior to the traditional Huffman compression algorithm and the adaptive Huffman compression algorithm. By constructing the Markov model and calculating the transition probability matrix, characters are merged and compressed, which achieves a good compression effect. The performance of the proposed compression algorithm is good, but it relies on thresholds (e.g. TEMP) whose values are also an important factor affecting the experimental results. These thresholds need to be tuned further to achieve better compression results.


Course Relatedness Based on Concept Graph Modeling

Pang Jingwen1, Cao Qinghua1, and Sun Qing1,2(B)

1 School of Computer Science and Engineering, Beihang University, Beijing 100191, China
{pangjingwen,caoqinghua,sunqing}@buaa.edu.cn
2 School of Economics and Management, Beihang University, Beijing 100191, China

Abstract. Analyzing the relatedness between courses can help students plan their own curricula more efficiently, especially when learning on MOOC platforms. However, few researchers have concentrated on mining the relationships between courses. In this paper, we propose a method to compare the relatedness between courses by representing courses as concept graphs. The concept graph comprises not only the semantic relationships between concepts but also the importance of concepts in the course. Moreover, we apply cluster analysis to find relevant concepts between two courses and take advantage of Similar Concept Groups to compute the degree of course relatedness. We experimented with a collection of English syllabi from Beihang University, and the experiments show better performance than the state of the art.

Keywords: Course relatedness · Concept graph · DBpedia · Clustering

1 Introduction

Understanding the relatedness among curricula is important for students when planning their curricula. As the quantity of online educational resources grows rapidly, it becomes necessary to obtain course relatedness automatically. If a student has already learnt Data Mining at school and wants to learn more about it on a MOOC platform, how does he choose an appropriate course from the ones with similar titles, such as Data Mining Capstone, Pattern Discovery in Data Mining, Cluster Analysis in Data Mining and so on? It is hard to solve these problems without an accurate representation of overlapping course contents. In addition, more and more students take part in international exchange programs between universities. There is no detailed criterion for comparing contents between courses in different universities and completing credit transfer, so many students have to waste time retaking similar courses. Additionally, curriculum design and evaluation require a deep insight into the differences and relatedness between courses, as well as abundant domain knowledge; performing this task manually takes ever more time as the quantity of courses grows. Therefore, it is significant to measure course relatedness accurately and automatically in order to help students and teachers improve the efficiency of their study or work.

Some methods have been proposed to automate the measurement of course relatedness. Since course data is usually text, most work involves methods for computing text similarity. Yang et al. [14] learn a directed universal concept graph and use it to explain course content overlap and detect prerequisite relations among courses. They use four different schemes to represent the course content: two of the schemes use human-readable words or Wikipedia categories as the concept space, and the others map course contents into latent features. Although this method performs well at inducing prerequisite relations, there is no single concept graph describing the contents of a course and no specific evaluation of course relatedness. Jean et al. [10] analyze conceptual overlap between courses with Latent Dirichlet Allocation (LDA) [1]. This method transforms every course into a topic vector and calculates the distance between vectors. However, latent topics are not explicit course concepts and cannot represent the course content directly. Sheng-syun et al. [12] compute the similarity between lectures in different online courses retrieved from a query and structure the related lectures into a learning map. They utilize words and grammatical features of lecture titles to evaluate the similarity. In terms of a course, concepts are its basic components; none of the methods described above combines the various semantic relationships and the importance of concepts to analyze course relatedness.

In this paper, we propose a new method to measure the course relatedness. We first link terms in syllabi to concepts from a knowledge base and regard these concepts as nodes to build a concept graph for each course. Then, we assign weights to the edges in the concept graph to measure the association between each pair of concepts. Since the relationship between terms in syllabi is usually implicit, we leverage the abundant semantic resources in a knowledge base, such as internal links in Wikipedia, to obtain explicit relations between concepts. Based on the degree of association between concepts in the graph, we can measure the node strength to represent a concept's importance in the course. Finally, after mapping each concept into a continuous vector, we cluster all concepts from any pair of courses to filter out irrelevant ones between the two courses, and compute course relatedness by leveraging the picked concepts and their weights in the concept graphs. In this way, we can reduce the impact of irrelevant concepts on the precision of the similarity computation.
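As a rough illustration of the final step, the sketch below combines pairwise concept similarity with concept importance (node strength). This is a hypothetical simplification of the pipeline described above — the function names, the cosine measure over concept vectors, and the symmetrized weighted matching are our assumptions, not the paper's exact formula:

```python
import numpy as np

def relatedness(strength_a: dict, strength_b: dict, vectors: dict) -> float:
    """Hypothetical sketch: match each concept in course A to its most
    similar concept in course B (cosine similarity of concept vectors),
    weight each match by the concept's node strength, and symmetrize."""
    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def directed(src, dst):
        score = sum(w * max(cosine(vectors[c], vectors[d]) for d in dst)
                    for c, w in src.items())
        return score / sum(src.values())

    return 0.5 * (directed(strength_a, strength_b) +
                  directed(strength_b, strength_a))
```

Taking the best match per concept is only a crude stand-in for the Similar Concept Groups idea; the paper instead clusters concepts from both courses first, so that concepts with no relevant counterpart are discarded before scoring.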

Our contributions are as follows:

- We propose a new method to assess course relatedness. The method represents the course content as a concept graph and compares the similarity between concept graphs. We combine two types of semantic relationship between concepts in the knowledge base to construct the concept graphs for courses.

- We integrate clustering with the similarity computation between concept graphs. By clustering, we group related concepts from a pair of courses and remove irrelevant concepts between the two courses, which reduces the impact of irrelevant concepts on the accuracy of the similarity computation.

- In the process of measuring course relatedness, we take the pairwise similarity of concepts into consideration, as well as the importance of concepts in each course, to achieve better performance.


2 Concept Graph Construction

Given a course syllabus, our aim is to build a graph in which the nodes are concepts from DBpedia detected by a mention detection tool. We connect any pair of concepts whose associative degree is non-zero. For the associative degree, both the co-occurrence relationship and the category relationship are taken into consideration. Finally, we use the associative degree between concepts as the edge weight and compute the node strength for every node in the graph.
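For concreteness, here is a small Python sketch of the last step. It assumes the common weighted-graph definition of node strength — the sum of the weights of a node's incident edges — which matches the construction above; the data layout is our assumption:

```python
from collections import defaultdict

def node_strength(edges: dict) -> dict:
    """Node strength = sum of the associative degrees (edge weights)
    of all edges incident to a concept. `edges` is assumed to map
    (concept_u, concept_v) pairs to their associative degree."""
    strength = defaultdict(float)
    for (u, v), w in edges.items():
        strength[u] += w
        strength[v] += w
    return dict(strength)

edges = {("Clustering", "K-means"): 0.8, ("Clustering", "DBpedia"): 0.1}
print(node_strength(edges))
# {'Clustering': 0.9, 'K-means': 0.8, 'DBpedia': 0.1}
```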
