Finding the influence of scientific papers in a citation network

MINISTRY OF EDUCATION AND TRAINING
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY (Trường Đại học Công nghệ TP.HCM)

PHAN HỒNG TRUNG

FINDING THE INFLUENCE OF SCIENTIFIC PAPERS IN A CITATION NETWORK

MASTER'S THESIS
Major: Information Technology
Major code: 60480201
Scientific advisor: Assoc. Prof. Dr. Đỗ Phúc
Ho Chi Minh City, August 2017

THESIS EVALUATION COMMITTEE

This work was completed at Ho Chi Minh City University of Technology.
Scientific advisor: Assoc. Prof. Dr. Đỗ Phúc
The Master's thesis was defended at Ho Chi Minh City University of Technology on 19 November 2017.
Members of the Master's thesis evaluation committee:

    No.  Full name                        Committee role
    1    Assoc. Prof. Dr. Vũ Đức Lung     Chairman
    2    Assoc. Prof. Dr. Võ Đình Bảy     Reviewer
    3    Dr. Vũ Thanh Hiền                Reviewer
    4    Dr. Cao Tùng Anh                 Member
    5    Dr. Văn Thiên Hoàng              Member, Secretary

Chairman of the thesis evaluation committee: Assoc. Prof. Dr. Vũ Đức Lung

HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY, INSTITUTE OF POSTGRADUATE TRAINING
SOCIALIST REPUBLIC OF VIETNAM
Independence – Freedom – Happiness
Ho Chi Minh City, 18 August 2017

MASTER'S THESIS ASSIGNMENT

Student's full name: PHAN HỒNG TRUNG
Gender: Male
Date of birth: 01/03/1968
Place of birth: Đồng Tháp
Major: Information Technology
Student ID: 1541860051

I. Thesis title: FINDING THE INFLUENCE OF SCIENTIFIC PAPERS IN A CITATION NETWORK
II. Tasks and content:
- Install OrientDB, Scala, Apache Spark and IntelliJ IDEA
- Collect, organize and store the graph data in OrientDB
- Convert the OrientDB graph into a GraphX graph for analysis
- Find the influence of a paper
- Find the most influential papers
- Determine the propagation of a paper over time
- Visualize the graph
- Extend the system to an Apache Spark cluster of several computers
III. Date of assignment: 15/03/2017
IV. Date of completion: 18/08/2017
V. Advisor: Assoc. Prof. Dr. Đỗ Phúc

ADVISOR (full name and signature): Assoc. Prof. Dr. Đỗ Phúc
FACULTY IN CHARGE OF THE MAJOR (full name and signature)

DECLARATION

I declare that this is my own research work. The data and results presented in this thesis are truthful and have not been published in any other work. I also declare that all help received in carrying out this thesis has been acknowledged and that the sources of all information cited in the thesis are clearly stated.

Student carrying out the thesis (signature and full name)
Phan Hồng Trung

ACKNOWLEDGMENTS

I sincerely thank my advisor, Assoc. Prof. Dr. Đỗ Phúc, for giving me an interesting topic and for his dedicated guidance, materials, knowledge, suggestions and helpful comments throughout the work on this thesis.
I sincerely thank the following teachers:
- Assoc. Prof. Dr. Võ Đình Bảy
- Assoc. Prof. Dr. Quản Thành Thơ
- Assoc. Prof. Dr. Lê Hoàng Thái
- Dr. Nguyễn Sinh Kế
- Dr. Nguyễn An Khương
- Dr. Đặng Trường Sơn
- Dr. Trần Đức Khánh
who gave me a great deal of valuable knowledge during the course, thanks to which I was able to complete this thesis.
I sincerely thank the university, its offices and faculties, and especially the Postgraduate Training Office, for their dedicated help and support throughout the course.
I wish all my teachers good health.
Ho Chi Minh City, 18 August 2017
Phan Hồng Trung

ABSTRACT

Big Data and graph databases are two new, attractive and promising research topics. However, judging from information available on the Internet, the application and exploitation of Big Data and graph databases in Vietnam is still quite new and does not yet match their potential and importance. Many Vietnamese enterprises hold Big Data sources but do not yet know how to exploit them properly. Studying and applying these two techniques is therefore an urgent need, and that is why I chose the topic "FINDING THE INFLUENCE OF SCIENTIFIC PAPERS IN A CITATION NETWORK". With this topic I can:
- Use graph database techniques to organize, store and query citation networks, because a citation network is by nature a graph.
- Use Big Data techniques to analyze and process citation networks, because the citation network used in the experiments of this thesis is quite large.

The objective of the thesis is to study and apply Big Data and graph databases, and to help the country catch up with the worldwide trend of exploiting and applying them in practice. Specifically, the thesis builds the Citation Network Explorer (CNE) system to find the influence of scientific papers in a large citation network. The CNE system provides the following functions:
- Load the graph database into the Big Data processing system
- Visualize graphs
- Find the influence of a paper
- Find the most influential papers
- Find the connected component that contains a paper
- Determine the propagation of a paper over time

The thesis applies the following new techniques:
- A graph database, namely OrientDB, to organize and store the citation network of scientific papers
- The Apache Spark platform to exploit Big Data, specifically GraphX to analyze the citation network
- The Scala programming language combined with Play Framework to build the application
- The VisJs library to visualize the citation network

In addition, I present experimental results for several major functions of the CNE system, measuring their processing time on graphs of different scales. The final part evaluates the results of the thesis against the original plan and summarizes the experience gained; in that part I also propose some directions for developing and extending the topic.

Note that implementing createEdges() in Scala is similar to implementing importNodes(): after adding a number of edges, commit() should be called to free memory and avoid running out of memory. At line 19 of Code 3, commit() is called after every 100 edges added to the graph.

4. Implementation of the main() function
Code 4 – Implementation of the main() function
The main() function calls the functions above to create the contents of the graph database.

    def main(args: Array[String]): Unit = {
      println("Begin creating the graph ")
      val uri: String = "remote:localhost/cn"
      val factory: OrientGraphFactory = new OrientGraphFactory(uri)
      val graph: OrientGraph = factory.getTx()
      try {
        prepareGraph(graph)
        importNodes(graph, s"D:\\dblp\\dblp.txt")
        createEdges(graph)
      } finally {
        // To release all the instances and free all the resources
        // (in case of pool usage), call the close()
        factory.close()
      }
      println("End creating the graph.")
    }

5. Implementation of the Pregel algorithm in Apache Spark
In GraphX, the Pregel algorithm is implemented by the apply method of the Pregel object, shown in Code 5 (Apache Spark Docs, 2017), (Apache Spark API, 2017).
Code 5 – Implementation of the Pregel algorithm in Apache Spark
This parallel algorithm can solve most graph problems.

    def apply[VD: ClassTag, ED: ClassTag, A: ClassTag]
        (graph: Graph[VD, ED],
         initialMsg: A,
         maxIterations: Int = Int.MaxValue,
         activeDirection: EdgeDirection = EdgeDirection.Either)
        (vprog: (VertexId, VD, A) => VD,
         sendMsg: EdgeTriplet[VD, ED] => Iterator[(VertexId, A)],
         mergeMsg: (A, A) => A)
      : Graph[VD, ED] = {
      require(maxIterations >= 0,
        s"Maximum number of iterations must be greater than or equal to 0, but got ${maxIterations}")
      var g = graph.mapVertices((vid, vdata) => vprog(vid, vdata, initialMsg))
        .cache()
      // compute the messages
      var messages = GraphXUtils.mapReduceTriplets(g, sendMsg, mergeMsg)
      var activeMessages = messages.count()
      // Loop
      var prevG: Graph[VD, ED] = null
      var i = 0
      while (activeMessages > 0 && i < maxIterations) {
        // Receive the messages and update the vertices
        prevG = g
        g = g.joinVertices(messages)(vprog).cache()
        val oldMessages = messages
        // Send new messages, skipping edges where neither side received a message
        // We must cache messages so it can be materialized on the next line,
        // allowing us to uncache the previous iteration
        messages = GraphXUtils.mapReduceTriplets(
          g, sendMsg, mergeMsg, Some((oldMessages, activeDirection))).cache()
        // The call to count() materializes `messages` and the vertices of `g`
        // This hides oldMessages (depended on by the vertices of g) and
        // the vertices of prevG (depended on by oldMessages and the vertices of g)
        activeMessages = messages.count()
        // Unpersist the RDDs hidden by newly-materialized RDDs
        oldMessages.unpersist(blocking = false)
        prevG.unpersistVertices(blocking = false)
        prevG.edges.unpersist(blocking = false)
        // count the iteration
        i += 1
        logInfo("Pregel finished iteration " + i)
      }
      messages.unpersist(blocking = false)
      g
    } // end of apply

Footnote 25 (to the main() listing): the path of the text file containing the citation network is assumed to be D:\dblp\dblp.txt.
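To make the roles of vprog, sendMsg and mergeMsg in the Pregel listing easier to follow, here is a minimal sketch of the same superstep loop in plain Scala, without Spark. It is an illustration only, not the GraphX implementation: the names MiniPregel, Edge, run and bfsDepths are hypothetical, the graph is an in-memory Map, every vertex id used by an edge is assumed to be present in the vertex map, and sendMsg is evaluated on every edge in each superstep (GraphX skips edges where neither endpoint received a message).

```scala
object MiniPregel {
  // Hypothetical, simplified stand-in for a GraphX edge (no edge attribute).
  final case class Edge(src: Long, dst: Long)

  def run[VD, A](vertices: Map[Long, VD], edges: Seq[Edge], initialMsg: A,
                 maxIterations: Int = Int.MaxValue)(
      vprog: (Long, VD, A) => VD,
      sendMsg: (Edge, VD, VD) => Iterator[(Long, A)],
      mergeMsg: (A, A) => A): Map[Long, VD] = {

    // Evaluate sendMsg on every edge, then merge the messages per target vertex.
    def collect(state: Map[Long, VD]): Map[Long, A] =
      edges
        .flatMap(e => sendMsg(e, state(e.src), state(e.dst)))
        .groupBy(_._1)
        .map { case (id, msgs) => id -> msgs.map(_._2).reduce(mergeMsg) }

    // Superstep 0: every vertex receives the initial message.
    var state = vertices.map { case (id, vd) => id -> vprog(id, vd, initialMsg) }
    var messages = collect(state)
    var i = 0
    // Repeat until no messages are in flight or the iteration limit is hit.
    while (messages.nonEmpty && i < maxIterations) {
      // Only vertices that received a message run vprog.
      state = state.map { case (id, vd) =>
        id -> messages.get(id).fold(vd)(m => vprog(id, vd, m))
      }
      messages = collect(state)
      i += 1
    }
    state
  }

  // Depth of every vertex from `start`, ignoring edge direction; -1 = not reached.
  // Uses the same vprog/sendMsg/mergeMsg logic as a connected-component search.
  def bfsDepths(vertexIds: Set[Long], edges: Seq[Edge], start: Long): Map[Long, Int] = {
    val unreached = -1
    val init: Map[Long, Int] =
      vertexIds.map(id => id -> (if (id == start) 0 else unreached)).toMap
    run[Int, Int](init, edges, unreached)(
      (_, depth, msg) => depth max msg,
      (e, srcDepth, dstDepth) =>
        if (srcDepth > unreached && dstDepth == unreached) Iterator((e.dst, srcDepth + 1))
        else if (dstDepth > unreached && srcDepth == unreached) Iterator((e.src, dstDepth + 1))
        else Iterator.empty,
      (a, b) => a max b)
  }

  def main(args: Array[String]): Unit = {
    val edges = Seq(Edge(1, 2), Edge(2, 3), Edge(4, 5))
    println(bfsDepths(Set(1L, 2L, 3L, 4L, 5L), edges, start = 1L))
  }
}
```

On this toy graph, vertices 1, 2 and 3 end with depths 0, 1 and 2, while vertices 4 and 5 keep the value -1 because no message ever reaches them.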
6. Implementation of the Find Connected Component algorithm
Code 6 – Implementation of the Find Connected Component algorithm
The algorithm finds the connected component that contains a given paper.

    def findConnectedComponent(id: Int): Graph[Int, Int] = { // Graph[depth, nothing]
      // The selected vertex starts at depth 0; -1 is an "unreal" depth
      // marking vertices that have not been reached yet
      val mapGraph = graph
        .mapVertices { case (vid, _) => if (vid == id) 0 else -1 }
        .cache
      val initialMessage = -1 // unreal depth
      val pregelGraph = pregel(mapGraph, initialMessage, Int.MaxValue, EdgeDirection.Either)(
        vprog = (vid, attr, msg) => {
          attr max msg
        },
        sendMsg = (edge) => {
          // Propagate the depth across an edge from the reached end to the unreached end
          if (edge.srcAttr > initialMessage && edge.dstAttr == initialMessage) {
            Iterator((edge.dstId, edge.srcAttr + 1))
          } else if (edge.dstAttr > initialMessage && edge.srcAttr == initialMessage) {
            Iterator((edge.srcId, edge.dstAttr + 1))
          } else {
            Iterator.empty
          }
        },
        mergeMsg = (msg1, msg2) => {
          msg1 max msg2
        })
      mapGraph.unpersist()
      // remove nodes not connected to the selected node
      pregelGraph.subgraph(vpred = {
        case (vid, depth) => depth >= 0
      })
    }

7. Implementation of the Find Propagation algorithm
Code 7 – Implementation of the Find Propagation algorithm
The algorithm finds the propagation of a paper over time along the In direction, with no limit on the number of years of propagation.

    def spreadInOf(id: Int): Graph[Int, Int] = { // Graph[id, nothing]
      // The selected vertex carries its own id; 0 is an "unreal" vid
      // marking vertices that have not been reached yet
      val mapGraph = graph
        .mapVertices { case (vid, _) => if (vid == id) id else 0 }
        .cache
      val initialMessage = 0 // unreal vid
      val pregelGraph = pregel(mapGraph, initialMessage, Int.MaxValue, EdgeDirection.In)(
        vprog = (vid, attr, msg) => {
          attr max msg
        },
        sendMsg = (edge) => {
          // Propagate the mark backwards along an edge, from dst to src
          if (edge.dstAttr == id && edge.srcAttr == initialMessage) {
            Iterator((edge.srcId, id))
          } else {
            Iterator.empty
          }
        },
        mergeMsg = (msg1, msg2) => {
          msg1 max msg2
        })

      // remove nodes not connected to the selected node
      mapGraph.unpersist()
      pregelGraph.subgraph(vpred = {
        case (vid, vd0) => vd0 == id
      })
    }
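On a small graph, the vertex set kept by spreadInOf can be cross-checked with an ordinary breadth-first search over reversed edges. The sketch below is a plain-Scala illustration, not part of the CNE system: the names CitingPapers and citing are hypothetical, and it assumes that an edge (src, dst) means "src cites dst", so walking from dst to src collects every paper that cites the target directly or indirectly.

```scala
import scala.collection.mutable

object CitingPapers {
  /** Papers that reach `target` by following reference edges, plus `target` itself.
    * `references` holds (src, dst) pairs, assumed to mean "src cites dst". */
  def citing(references: Seq[(Long, Long)], target: Long): Set[Long] = {
    // Index the edges by cited paper so they can be walked backwards.
    val citedBy: Map[Long, Seq[Long]] =
      references.groupBy(_._2).map { case (dst, es) => dst -> es.map(_._1) }

    val seen = mutable.Set(target)
    val queue = mutable.Queue(target)
    while (queue.nonEmpty) {
      val paper = queue.dequeue()
      for (src <- citedBy.getOrElse(paper, Seq.empty)) {
        // add() returns true only the first time a paper is seen
        if (seen.add(src)) queue.enqueue(src)
      }
    }
    seen.toSet
  }

  def main(args: Array[String]): Unit = {
    // 2 cites 1, 3 cites 2, 4 cites 3; paper 5 cites 6 and is unrelated to 1
    val refs = Seq((2L, 1L), (3L, 2L), (4L, 3L), (5L, 6L))
    println(citing(refs, 1L))
  }
}
```

For the sample references, the result is the set {1, 2, 3, 4}: the target paper plus everything that cites it through a chain of references, which are the vertices the Pregel version marks with the target id.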
8. Implementation of the Load Graph algorithm
Code 8 – Implementation of the Load Graph algorithm
The algorithm reads the graph database from the OrientDb Server and converts it into a GraphX graph in Apache Spark.

    def loadGraph(pageSize: Int): Graph[(Short, Float), Byte] = { // Graph[(year, tempRank), nothing]
      var graph: Graph[(Short, Float), Byte] = null
      val ograph = factory.getTx
      val db: ODatabaseDocumentTx = ograph.getRawGraph
      try {
        // Get the SparkContext object
        val sc: SparkContext = sparkContext.get
        // Create the vertexRDD
        var vertexRDD: RDD[(Long, (Short, Float))] = sc.emptyRDD
        // Read the vertices page by page
        var query: OSQLSynchQuery[ODocument] =
          new OSQLSynchQuery[ODocument](s"select from Paper where @rid > ? LIMIT ${pageSize}")
        var resultset: OConcurrentResultSet[ODocument] = db.query(query, new ORecordId())
        while (!resultset.isEmpty()) {
          // Remember the last @rid so the next page starts after it
          val last: ORID = resultset.get(resultset.size() - 1).getIdentity()
          val list = resultset.map(v => (v.field[Long]("id"), (v.field[Short]("year"), 1: Float))).toList
          vertexRDD = vertexRDD ++ sc.parallelize(list)
          resultset = db.query(query, last)
        }
        // Create the edgeRDD
        var edgeRDD: RDD[Edge[Byte]] = sc.emptyRDD
        // Read the edges page by page
        query = new OSQLSynchQuery[ODocument](s"select from Reference where @rid > ? LIMIT ${pageSize}")
        resultset = db.query(query, new ORecordId())
        while (!resultset.isEmpty()) {
          val last: ORID = resultset.get(resultset.size() - 1).getIdentity()
          val list = resultset.map(e => Edge(e.field[Long]("src"), e.field[Long]("dst"), 0.toByte)).toList
          edgeRDD = edgeRDD ++ sc.parallelize(list)
          resultset = db.query(query, last)
        }
        // Create the GraphX graph
        graph = Graph(vertexRDD, edgeRDD)
      } finally {
        ograph.shutdown
      }
      graph
    }

9. Implementation of the Traverse Graph algorithm
Code 9 – Implementation of the Traverse Graph algorithm
The Traverse Graph algorithm walks the graph on the OrientDb Server using a Breadth First Search (BFS) strategy to determine the connected component that contains a paper, and converts the component found into a GraphX graph in Apache Spark.

    def traverse(id: Long, maxDepth: Int, limit: Int): Graph[(Short, Float), Byte] = { // Graph[(year, tempRank), nothing]
      var graph: Graph[(Short, Float), Byte] = null
      // Get the OrientDb graph
      val ograph = factory.getTx
      try {
        // Get the Paper.id index
        val index = factory.getDatabase.getMetadata.getIndexManager.getIndex("Paper.id")
        // Get the @rid of the start vertex
        val rid = index.get(id)
        // If a paper with this id exists
        if (rid != null) {
          // vertices holds the visited vertices
          val vertices = mutable.Set[OrientVertex]()
          // edges holds the visited edges
          val edges = mutable.Set[Edge[Byte]]()
          // Get the start vertex
          var vertex = ograph.getVertex(rid)
          // Add the start vertex to vertices
          vertices.add(vertex)
          var count = 1 // number of vertices in vertices (the start vertex is already in it)
          var depth = 0 // current depth
          // Put (start vertex, current depth) into the queue q
          val q = Queue((vertex, depth))
          breakable {
            // While the queue q is not empty
            while (!q.isEmpty) {
              // Take a vertex from the queue q and visit it
              val element = q.dequeue
              vertex = element._1
              depth = element._2
              // Stop when the maximum depth is reached
              if (depth == maxDepth) break
              depth += 1
              // Read the vertices at the next depth
              vertex.getVertices(Direction.BOTH).foreach({ case (v: OrientVertex) =>
                {
                  // Add the visited edge to edges
                  edges.add(Edge(v.getProperty[Long]("id"), vertex.getProperty[Long]("id"), 0.toByte))
                  // If v has not been visited yet
                  if (!vertices.contains(v)) {
                    q.enqueue((v, depth))
                    vertices.add(v)
                    count += 1
                    // Stop when the maximum number of vertices is reached
                    if (count == limit) break
                  }
                }
              }) // foreach
            } // while
          } // breakable
          // Get the SparkContext object
          val sc: SparkContext = sparkContext.get
          // Create the vertexRDD and edgeRDD
          val vertexRDD = sc.parallelize(vertices.map(v =>
            (v.getProperty[Long]("id"), (v.getProperty[Short]("year"), 1: Float))).toList)
          val edgeRDD = sc.parallelize(edges.toList)
          // Create the graph
          graph = Graph(vertexRDD, edgeRDD)
        }
      } finally {
        ograph.shutdown
      }
      graph
    }

10. Implementation of the Visualize Graph algorithm in the TasksController class
Code 10 – Implementation of the Visualize Graph algorithm in the TasksController class
TasksController converts the GraphX graph into JSON format and passes the JSON graph to Index.

    def getGraph = Action { implicit request =>
      // Get the GraphX graph from the cache
      val graph = grapher.getGraph
      // Convert the vertices of the GraphX graph into a JsArray
      val nodes: JsArray = if (graph.vertices.isEmpty) {
        new JsArray()
      } else {
        graph.vertices.map { case (vid, (year, rank)) =>
          Json.arr(Json.obj("id" -> vid, "label" -> "%d\n%.2f".format(vid, rank),
            "group" -> year, "level" -> year, "value" -> "%.2f".format(rank)))
        }.reduce(_ ++ _)
      }
      // Convert the edges of the GraphX graph into a JsArray
      val edges: JsArray = if (graph.edges.isEmpty) {
        new JsArray()
      } else {
        graph.edges.map { e =>
          Json.arr(Json.obj("from" -> e.srcId, "to" -> e.dstId))
        }.reduce(_ ++ _)
      }
      // Get the legend vertices of the GraphX graph
      // and convert them into a JsArray
      val legends: JsArray = if (graph.vertices.isEmpty) {
        new JsArray()
      } else {
        // vid is the vertex that represents the year
        grapher.getLegends.map { case (year, vid) =>
          Json.arr(Json.obj("id" -> year, "label" -> year, "level" -> 1, "fixed" -> true, "vid" -> vid))
        }.reduce(_ ++ _)
      }
      // Send the vertices, edges and legends to Index in JSON format
      Ok(Json.obj("nodes" -> nodes, "edges" -> edges, "legends" -> legends))
    }

11. Implementation of the Visualize Graph algorithm in the Index class
Code 11 – Implementation of the Visualize Graph algorithm in the Index class
Index reads the graph in JSON format and visualizes it.

    ajax_request({
      url: "/tasks/getGraph",
      data: {},
      // Handle the result sent back by the Web Server
      callback: function(graph_data){
        // If there is no data, report an error
        if(graph_data==null){
          status_show("error","Please load the graph!");
          return;
        }
        // Create the dataset
        dataset = {
          nodes: new vis.DataSet(graph_data.nodes),
          edges: new vis.DataSet(graph_data.edges)
        };
        // Draw the graph
        var container_graph = document.querySelector("#panel-content")
        graph = new vis.Network(container_graph, dataset, options_simple);
        // Once the graph has been drawn, show a message
        graph.once("afterDrawing", function(eventData){
          status_show("success", "System has processed the request successfully.")
        });
        // Draw the legend
        add_vertical_legend(graph_data.legends);
      }
    })

APPENDIX 2: REQUIRED INSTALLATIONS

1. Installing Java
1.1 Download
Go to http://www.oracle.com/technetwork/java/javase/downloads/jdk8downloads-2133151.html and choose jdk-8u131-windows-x64.exe to download.
1.2 Installation
Run the file jdk-8u131-windows-x64.exe to install.
1.3 Basic operations
1.3.1 Check the Java version
    java -version
1.3.2 Compile a program
    javac filename.java
1.3.3 Run a program
    java filename

2. Installing Scala
2.1 Download
Go to https://www.scala-lang.org/download/ to download the Scala installer. The current version is Scala 2.12.2 and the corresponding installer is scala-2.12.2.msi.
2.2 Installation
Run the file scala-2.12.2.msi.
2.3 Basic operations
2.3.1 Check the Scala version
    scala -version
2.3.2 Compile a program
    scalac filename.scala
2.3.3 Run a program
    scala filename
2.3.4 Use the interactive environment
Scala provides an interactive environment in which programmers can execute Scala statements. It is used to try out statements or simple program fragments and is convenient for people who are starting to learn Scala.
To enter the interactive environment, run the command scala. Then write and try out a simple HelloWorld program. To leave the interactive environment, run the command :q or :quit. The following figure illustrates the use of the Scala interactive environment:
[Figure: the Scala interactive environment]

3. Installing IntelliJ IDEA
3.1 Download
Go to https://www.jetbrains.com/idea/download to download the IntelliJ IDEA installer. The installer is ideaIC-2017.1.5.exe.
3.2 Installation
Run the file ideaIC-2017.1.5.exe to install.
3.3 Basic operations
3.3.1 Create a project
Choose the menu File > New > Project.
3.3.2 Run a project
Choose the menu Run > Run.
The IntelliJ IDEA interface looks like the following figure:
[Figure: the IntelliJ IDEA interface]

4. Installing OrientDb
4.1 Download
Go to http://orientdb.com/download/ to download OrientDb. The current version is OrientDB 2.2.21 GA Community Edition (June 1st, 2017) for Windows. The downloaded file is orientdb-community-2.2.21.zip.
4.2 Installation
To install OrientDb, simply extract the file orientdb-community-2.2.21.zip.
4.3 Basic operations
4.3.1 Run OrientDb Server
Go to the folder orientdb-community-2.2.21\bin and double-click the file server.bat. The OrientDb Server interface appears as in the following figure:
[Figure: the OrientDb Server window]
When OrientDB Server runs for the first time, it asks for a password for root. The password is stored in the file orientdb-server-config.xml.
To shut down OrientDb Server:
- Switch to the OrientDb Server window.
- Press Ctrl-C and wait until OrientDb Server shows the question "Terminate batch job (Y/N)?"
- Answer Yes by pressing the Y key.
4.3.2 Run OrientDb Console
OrientDb Console is a client program that provides a command-line interface for interacting with OrientDb Server. To run OrientDb Console, go to the folder orientdb-community-2.2.21\bin and double-click the file console.bat. The OrientDb Console interface appears as follows:
[Figure: the OrientDb Console window]
To leave OrientDb Console, run the command exit.
4.3.3 Run OrientDb Studio
OrientDb Studio is a client program that provides a Web interface for interacting with OrientDb Server. To run OrientDb Studio, open a browser and enter the address http://localhost:2480. The OrientDb Studio interface appears as follows:
[Figure: the OrientDb Studio interface]

5. Installing Apache Spark
5.1 Download
Go to https://spark.apache.org/downloads.html to download Apache Spark. The current version is the archive spark-2.1.0-bin-hadoop2.7.tgz.
The file winutils.exe is also needed to emulate the Hadoop environment. It can be downloaded from https://github.com/steveloughran/winutils/tree/master/hadoop2.6.0/bin
5.2 Installation
Before installing Spark, make sure that Java and Scala are installed. Assume Java and Scala are installed in the following folders:
    C:\Program Files\Java\jdk1.8.0_31
    C:\Program Files (x86)\scala
Apache Spark is installed as follows:
- Extract the file spark-2.1.0-bin-hadoop2.7.tgz. Assume the result is the folder D:\proj\spark-2.1.0-bin-hadoop2.7.
- Emulate the Hadoop environment:
  - Create the folder D:\proj\hadoop\bin
  - Copy the file winutils.exe into the folder D:\proj\hadoop\bin
- Set the environment variables:
    JAVA_HOME=C:\Program Files\Java\jdk1.8.0_31
    SCALA_HOME=C:\Program Files (x86)\scala
    HADOOP_HOME=D:\proj\hadoop
    SPARK_HOME=D:\proj\spark-2.1.0-bin-hadoop2.7
    PATH=%PATH%;%JAVA_HOME%\bin;%SCALA_HOME%\bin;%HADOOP_HOME%\bin;%SPARK_HOME%\bin;
- Open a Command Prompt and run the following command (see footnote 26):
    winutils.exe chmod 777 C:\tmp\hive
Footnote 26: if the folder C:\tmp\hive is not found, run the spark-shell command.
5.3 Basic operations
5.3.1 Enter Spark-shell
    spark-shell
The result is as follows:
[Figure: the Spark-shell start-up screen]
5.3.2 Leave Spark-shell
    :q or :quit
5.3.3 Open the Spark Web interface
    http://localhost:4040
...
