Data Structures and Algorithms

For students majoring in computer science, the term "data structure" is hardly unfamiliar. ... A data structure is a way of storing and organizing data in an ordered, systematic manner so that the data can be used efficiently.

Data structures and Algorithms: Introduction. Pham Quang Dung, Hanoi
Data structures and Algorithms
Basic definitions and notations
Pham Quang Dung
Hanoi, 2012

Outline
  - First example
  - Algorithms and Complexity
  - Big-Oh notation
  - Pseudo code
  - Analysis of algorithms

First example
  Find the largest-weight subsequence of a given sequence of numbers.
  - Given a sequence $s = \langle a_1, \dots, a_n \rangle$
  - A subsequence is $s(i, j) = \langle a_i, \dots, a_j \rangle$, $1 \le i \le j \le n$
  - Its weight is $w(s(i, j)) = \sum_{k=i}^{j} a_k$
  - Problem: find the subsequence having the largest weight
  Example
  - Sequence: -2, 11, -4, 13, -5, ...
  - The largest-weight subsequence is 11, -4, 13, having weight 20

Direct algorithm
  - Scan all possible subsequences; there are $\binom{n}{2} + n = \frac{n^2 + n}{2}$ of them
  - Compute the weight of each one and keep the largest-weight subsequence
  - C++ code: (listing missing from the source)
  - Complexity, counting the number of basic operations: $\frac{n^3}{6} + \frac{n^2}{2} + \frac{n}{3}$

Direct algorithm: faster version
  - Observation: $\sum_{k=i}^{j} a[k] = a[j] + \sum_{k=i}^{j-1} a[k]$
  - C++ code: (listing missing from the source)
  - Complexity: $\frac{n^2}{2} + \frac{n}{2}$

Recursive algorithm
  - Divide the sequence into 2 subsequences at the middle: s = s1 :: s2
  - The largest subsequence might lie in s1, or lie in s2, or start at some position of s1 and end at some position of s2
  - C++ code: (listing missing from the source)

Recursive algorithm
  - Count the number of addition ("+") operations $T(n)$:
    $T(n) = \begin{cases} 0 & \text{if } n = 1 \\ T(n/2) + T(n/2) + n & \text{if } n > 1 \end{cases}$
  - By induction: $T(n) = n \log_2 n$ (for n a power of 2)

Dynamic programming
  General principle
  - Division: divide the initial problem into smaller similar problems (subproblems)
  - Storing solutions to subproblems: store the solutions to subproblems in memory
  - Aggregation: establish the solution to the initial problem by aggregating the solutions to subproblems stored in memory

Dynamic programming
  Largest subsequence
  - Division: let $s_i$ be the weight of the largest subsequence of $a_1, \dots, a_i$ ending at $a_i$
  - Aggregation:
      $s_1 = a_1$
      $s_i = \max\{s_{i-1} + a_i,\; a_i\}$, $\forall i = 2, \dots, n$
  - The solution to the original problem is $\max\{s_1, \dots, s_n\}$
  - The number of basic operations is n (best algorithm)

Comparison between algorithms
  Number of operations and running time. The times correspond to a machine executing about 10^8 operations per second.

  operations   n = 10                       n = 100
  log n        3.32       3.3×10^-8 sec     6.64       6.6×10^-8 sec
  n log n      33.2       3.3×10^-7 sec     664        6.6×10^-6 sec
  n^2          100        10^-6 sec         10000      10^-4 sec
  n^3          10^3       10^-5 sec         10^6       10^-2 sec

  operations   n = 10^4                     n = 10^6
  log n        13.3       < 10^-6 sec       19.9       < 10^-5 sec
  n log n      1.33×10^5  1.33×10^-3 sec    1.99×10^7  2×10^-1 sec
  n^2          10^8       1 sec             10^12      2.77 hours
  n^3          10^12      2.77 hours        10^18      about 115,740 days (roughly 317 years)

Pseudo code
  Case instruction
    Case
      condition 1 : statement 1 ;
      condition 2 : statement 2 ;
      ...
      condition n : statement n ;
    endcase

Pseudo code
  Functions and Procedures
    Function name(parameters)
    begin
      instructions ;
      return value ;
    end

    Procedure name(parameters)
    begin
      instructions ;
    end

Pseudo code
  Example: find the maximal value of an array A(1 : n)
    Function max(A(1 : n))
    begin
      datatype x ;
      integer i ;
      x = A[1] ;
      for i = 2 to n
        if x < A[i] then
          x = A[i] ;
        endif
      endfor
      return x ;
    end

Pseudo code
  Algorithm 1: max(A)
    n ← A.length;
    x ← A[1];
    foreach i ∈ 2..n
      if x < A[i] then x ← A[i];
    return x;

Analysis of algorithms
  Experimental studies
  - Write a program implementing the algorithm
  - Execute the program on a machine with different input sizes
  - Measure the actual execution times
  - Plot the results

Analysis of algorithms
  Shortcomings of experimental studies
  - The algorithm must be implemented, which is sometimes difficult
  - The results may not indicate the running time on inputs that were not tested
  - To compare two algorithms, the same hardware and software environments must be used

Analysis of algorithms
  Asymptotic algorithm analysis
  - Use a high-level description of the algorithm (pseudo code)
  - Determine the running time of the algorithm as a function of the input size
  - Express this function with Big-Oh notation

Analysis of algorithms
  - Sequential structure: P and Q are two segments of the algorithm (the sequence P; Q)
      Time(P; Q) = Time(P) + Time(Q), or
      Time(P; Q) = Θ(max(Time(P), Time(Q)))
  - for loop: for i = 1 to m P(i)
      t(i) is the time complexity of P(i)
      the time complexity of the for loop is $\sum_{i=1}^{m} t(i)$

Analysis of algorithms
  while (repeat) loop
  - Specify a function of the variables of the loop such that the value of this function decreases during the loop
  - To evaluate the running time, analyze how fast the function decreases during the loop
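The for-loop rule can be checked on the first example. The sketch below is not the original slide code (those listings did not survive); it is one possible C++ rendering of the direct, faster and dynamic programming algorithms, instrumented to count additions on array elements. The names `Result`, `direct`, `faster` and `dynamic` are assumptions made for illustration, and a non-empty input is assumed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Best weight found and number of additions performed on array elements.
struct Result {
    long long best;
    std::uint64_t additions;
};

// Direct algorithm: recompute every sum a[i..j] from scratch (three nested loops).
// Assumes a is non-empty.
Result direct(const std::vector<long long>& a) {
    Result r{a[0], 0};
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i; j < n; ++j) {
            long long sum = 0;
            for (std::size_t k = i; k <= j; ++k) {
                sum += a[k];
                ++r.additions;
            }
            r.best = std::max(r.best, sum);
        }
    }
    return r;
}

// Faster algorithm: sum(i, j) = a[j] + sum(i, j - 1), so one addition per pair (i, j).
Result faster(const std::vector<long long>& a) {
    Result r{a[0], 0};
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) {
        long long sum = 0;
        for (std::size_t j = i; j < n; ++j) {
            sum += a[j];
            ++r.additions;
            r.best = std::max(r.best, sum);
        }
    }
    return r;
}

// Dynamic programming: s_i = max(s_{i-1} + a_i, a_i); the answer is the largest s_i.
Result dynamic(const std::vector<long long>& a) {
    Result r{a[0], 0};
    long long s = a[0];  // weight of the best subsequence ending at the current element
    for (std::size_t i = 1; i < a.size(); ++i) {
        s = std::max(s + a[i], a[i]);
        ++r.additions;
        r.best = std::max(r.best, s);
    }
    return r;
}
```

On the five visible elements of the example sequence (-2, 11, -4, 13, -5), all three functions return weight 20. The addition counts are 35 = n(n+1)(n+2)/6 for `direct`, 15 = n(n+1)/2 for `faster`, and 4 = n - 1 for `dynamic`, which matches summing t(i) over the loops.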
Analysis of algorithms
  Example: binary search
    Function BinarySearch(T[1..n], x)
    begin
      i ← 1; j ← n;
      while i < j
        k ← ⌊(i + j)/2⌋ ;
        case
          x < T[k] : j ← k − 1 ;
          x = T[k] : i ← k ; j ← k ; exit ;
          x > T[k] : i ← k + 1 ;
        endcase
      endwhile
    end

Analysis of algorithms
  Example: binary search
  - Denote $d = j - i + 1$ (the number of elements of the array still to be investigated)
  - Let $i^*, j^*, d^*$ be the values of $i, j, d$ after one iteration of the loop
  - We have:
      If $x < T[k]$ then $i^* = i$, $j^* = \lfloor (i + j)/2 \rfloor - 1$, $d^* = j^* - i^* + 1 \le d/2$
      If $x > T[k]$ then $j^* = j$, $i^* = \lfloor (i + j)/2 \rfloor + 1$, $d^* = j^* - i^* + 1 \le d/2$
      If $x = T[k]$ then $d^* = 1$
  - Since d at least halves in every iteration, the number of iterations of the loop is $O(\log n)$

Analysis of algorithms
  Primitive operations
    Function Fib(n)
    begin
      i ← 0; j ← 1;
      for k = 1 to n
      begin
        j ← j + i ;
        i ← j − i ;
      end
      return j ;
    end
  The primitive operation is j ← j + i; hence, the running time is O(n).

Analysis of algorithms
  Primitive operations (be careful!)
    Procedure PigeonholeSorting(T[1..n])
    begin
      for i = 1 to n
        inc(U[T[i]]) ;
      i ← 0;
      for k = 1 to s
        while U[k] > 0
          i ← i + 1;
          T[i] ← k ;
          U[k] ← U[k] − 1 ;
        endwhile
      endfor
    end
  Taking T[i] ← k as the primitive operation, the number of primitive operations is $\sum_{k=1}^{s} U[k] = n$.
  Hence the running time would be Θ(n) (but this is not correct!)
  - Consider the case $T[i] = i^2$, $\forall i = 1, \dots, n$:
      $U[k] = \begin{cases} 1 & \text{if } k = q^2 \text{ for some } q \in \{1, \dots, n\} \\ 0 & \text{otherwise} \end{cases}$
  - Here $s = n^2$, and the running time is $\Theta(n^2)$, not $\Theta(n)$
  - Reason: the primitive operation is not well chosen; there are many empty iterations where $U[k] = 0$
  - If the primitive operation is the checking instruction $U[k] > 0$, then the running time is $\Theta(n + s) = \Theta(n^2)$
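The bound Θ(n + s) can be spelled out as a short derivation. This is a sketch under one assumption: for each k, the test U[k] > 0 is counted once per successful iteration of the while loop, plus once for the final failing check.

```latex
\[
\sum_{k=1}^{s} \bigl(U[k] + 1\bigr)
  \;=\; \sum_{k=1}^{s} U[k] + s
  \;=\; n + s,
\qquad
\text{and for } T[i] = i^2:\quad
s = n^2 \;\Rightarrow\; n + s = n + n^2 = \Theta(n^2).
\]
```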