A SOFTWARE APPROACH FOR LOWER POWER CONSUMPTION

Bui Ngoc Hai
Faculty of Information Technology
University of Engineering and Technology
Vietnam National University, Hanoi

Supervised by Assoc. Prof. Dr. Nguyen Ngoc Binh

A thesis submitted in fulfillment of the requirements for the degree of Master of Science in Computer Science

April 2014

ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

Hanoi, April 25th, 2014
Signed

ABSTRACT

Optimizing power consumption is an important topic in embedded system engineering, especially for embedded systems that run on battery power. Power optimization can be achieved by software techniques, and instruction scheduling is an effective software approach for reducing the power cost of the processor(s). In this thesis, we propose using a genetic algorithm for low power instruction scheduling. Our algorithm is applied to each basic block of assembly code to generate a lower power program. In the experiment section, we use two open source simulation tools, the SimpleScalar Tool Set and SimplePower. The algorithm is applied to assembly programs of the SimpleScalar Instruction Set; these programs are then compiled and their power consumption is measured by SimplePower. The
experimental results showed the effectiveness of our proposed method. This scheduling method will be combined with the idea of reducing memory access for low power design in our future work.

ACKNOWLEDGEMENTS

First and foremost, I would like to express my deepest gratitude to my supervisor, Assoc. Prof. Dr. Nguyen Ngoc Binh, for giving me the opportunity to work with him and for his patient guidance and continuous support throughout the years. I would like to give my honest appreciation to my colleagues at the Laboratory of Embedded Systems for their great support. I also would like to thank all my friends who gave me moral support during this work. Finally, this thesis would not have been possible without the moral support and love of my parents and my brother. Thank you!

Table of Contents

Chapter 1. Introduction
    1.1 Software power optimization
    1.2 Power optimization by instruction scheduling
    1.3 Our work
    1.4 Thesis organization
Chapter 2. Related Work
    2.1 Software power estimation
    2.2 Energy code driven generation for low power
    2.3 Reducing memory access
    2.4 Software power optimization using symbolic algebra
    2.5 List scheduling for low power
    2.6 Instruction scheduling to reduce switching activity
    2.7 Low power instruction scheduling as traveling salesman problem
    2.8 Force-directed scheduling for low power
    2.9 Instruction scheduling to reduce the off-chip power
    2.10 Energy-oriented and performance-oriented combination scheduling
    2.11 Criticality-directed and Uncriticality-directed instruction scheduling for low power
    2.12 Low power instruction scheduling using Particle Swarm Optimization algorithm
Chapter 3. Instruction Scheduling for Low Power
    3.1 Problem description
    3.2 Partitioning Basic Blocks of assembly code
    3.3 Data Flow Graph construction
    3.4 Generating Power Dissipation Table
Chapter 4. Genetic Algorithm for low power Instruction scheduling
    4.1 Genetic Algorithm
    4.2 Topological sorting
    4.3 Representation of chromosome
    4.4 Cross Over operator
    4.5 Mutation operator
    4.6 Fitness function
    4.7 Genetic Algorithm for low power scheduling
Chapter 5. Experiments
    5.1 SimpleScalar tool set
    5.2 SimplePower simulator
    5.3 Experimental benchmarks set
    5.4 Experimental results
    5.5 Analysis and evaluation
Chapter 6. Conclusion and Future Work
References
Appendix A Some important source code
Appendix B Source code of benchmark programs
Appendix C Power Dissipation Table
Appendix D An example of scheduling a basic block

List of Figures

Figure 2.1 List scheduling for low power
Figure 3.1 Flow of low power instruction scheduling
Figure 3.2 An example of a Basic Block and its Data Flow Graph
Figure 3.3 Examples of Basic Blocks
Figure 3.4 Algorithm to construct a DFG
Figure 3.5 PDT generation example
Figure 4.1 Topological sorting with random priorities assignment
Figure 4.2 Chromosome representation
Figure 4.3 Cross over operator
Figure 4.4 Cross over operator example
Figure 4.5 Mutation operator
Figure 4.6 Genetic algorithm for low power scheduling
Figure 5.1 Experimental framework
Figure 5.2 SimpleScalar simulator software architecture
Figure 5.3 SimplePower result example

List of Tables

Table 3.1 Instruction set architecture
Table 5.1 Experimental benchmark set
Table 5.2 Experimental results of GA scheduling
Table 5.3 Experimental results of list scheduling
Table 5.4 Results comparison of two algorithms

List of Abbreviations

BB      Basic Block
DC      Direct Current
DFG     Data Flow Graph
GA      Genetic Algorithm
ISA     Instruction Set Architecture
PDT     Power Dissipation Table
PISA    Portable Instruction Set Architecture
PSO     Particle Swarm Optimization
RAW     Read after Write
RTL     Register Transfer Level
TSP     Travelling Salesman Problem
WAR     Write after Read
WAW     Write after Write

List of Notations

∑       Sum
        Energy Power of a program
        Power Base Cost of instruction i
        Overhead cost between two instructions i and j
n       Number of instructions in a basic block
        Number of times i gets executed
        Number of times the pair (i,j) gets executed
        Energy cost of other effects of the program
P(t)    Population at the loop t
x_i^t   Solution i at the loop t
v_i     Vertex i

Appendix A Some Important Source Code

A.1 Topological sorting with priorities

    void TopoSort(struct DFG* dfg, struct Chromosome* chro)
    {
        int d[chro->num_of_gene];   // d[v] == 1 once node v has been scheduled
        int i,j,tmp,count;
        int index = 0;
        for(i=0;i<chro->num_of_gene;i++)
        {
            chro->topo[i] = -1;
            d[i] = -1;
        }
        struct List tmpList;        // ready list, kept sorted by priority
        init(&tmpList);
        // start with the nodes that have no predecessors
        for(i=0;i<dfg->numOfNode;i++)
            if(dfg->node[i].pred.numOfElem == 0)
            {
                add(&tmpList,i);
            }
        // bubble sort the ready list by ascending priority
        for(i=0;i<tmpList.numOfElem;i++)
            for(j=tmpList.numOfElem-1;j>i;j--)
                if(chro->prio[tmpList.Elem[j-1]]>chro->prio[tmpList.Elem[j]])
                {
                    tmp = tmpList.Elem[j-1];
                    tmpList.Elem[j-1] = tmpList.Elem[j];
                    tmpList.Elem[j] = tmp;
                }
        // main loop: emit the head of the ready list, then release its successors
        while(tmpList.numOfElem>0)
        {
            chro->topo[index] = tmpList.Elem[0];
            index++;
            d[tmpList.Elem[0]] = 1;
            del(&tmpList,tmpList.Elem[0]);
            for(i=0;i<dfg->numOfNode;i++)
            {
                if(d[i]!=1)
                {
                    // count the predecessors of node i that are not scheduled yet
                    count = 0;
                    for(j=0;j<dfg->node[i].pred.numOfElem;j++)
                        if(d[dfg->node[i].pred.Elem[j]]!=1) count++;
                    if(count == 0 && contain(&tmpList,i)==0)
                    {
                        // node i is ready: insert it at its priority position
                        add(&tmpList,i);
                        for(j=tmpList.numOfElem-1;j>0;j--)
                            if(chro->prio[tmpList.Elem[j-1]]>
                               chro->prio[tmpList.Elem[j]])
                            {
                                tmp = tmpList.Elem[j-1];
                                tmpList.Elem[j-1]= tmpList.Elem[j];
                                tmpList.Elem[j] = tmp;
                            }
                    }
                }
            }
        }
    }

A.2 Mutation operator

    void mutation(struct Chromosome* chro1, struct Chromosome* chro2)
    {
        int i,j,tmp;
        // copy chro1 into chro2
        chro2->num_of_gene = chro1->num_of_gene;
        for(i=0;i<chro1->num_of_gene;i++)
            chro2->prio[i] = chro1->prio[i];
        //srand(randSeed);
        // pick two distinct gene positions (requires num_of_gene >= 2)
        i = rand()%(chro2->num_of_gene);
        j = rand()%(chro2->num_of_gene);
        while(i==j)
        {
            j = rand()%(chro2->num_of_gene);
        }
        // swap the two priorities
        tmp = chro2->prio[i];
        chro2->prio[i] = chro2->prio[j];
        chro2->prio[j] = tmp;
    }

A.3 Cross over operator

    void crossOver(struct Chromosome* chro1,struct Chromosome* chro2,struct Chromosome* C)
    {
        int i,j,tmp;
        int x,y;    // randomly chosen gene positions, x < y after the swap below
        struct List tmpList1,tmpList2;
        init(&tmpList1);
        init(&tmpList2);
        C->num_of_gene = chro1->num_of_gene;
        //srand(randSeed);
        x = rand()%(chro1->num_of_gene);
        y = rand()%(chro1->num_of_gene);
        while(x==y)
        {
            y = rand()%(chro1->num_of_gene);
        }
        if(x>y)
        {
            tmp = x;
            x = y;
            y = tmp;
        }
        // copy the segment chro1->prio[x..y] into tmpList1, from y down to x
        for(i=y;i>=x;i--)
            add(&tmpList1,chro1->prio[i]);
        // tmpList2 holds the genes of chro2 that are not in the segment
        for(i=0;i<chro2->num_of_gene;i++)
            if(contain(&tmpList1,chro2->prio[i])==0)
                add(&tmpList2,chro2->prio[i]);
        i=x;
        j=-1;
        // fill the remaining genes from chro1 and tmpList2
        while(tmpList1.numOfElem < chro1->num_of_gene)
        {
            if(i==0)
            {
                // wrap around to the last gene of chro1
                i = chro1->num_of_gene;
                i--;
                j++;
                if(chro1->prio[i] != tmpList2.Elem[j])
                {
                    if(contain(&tmpList1,chro1->prio[i])==0)
                        add(&tmpList1,chro1->prio[i]);
                    if(contain(&tmpList1,tmpList2.Elem[j])==0)
                        add(&tmpList1,tmpList2.Elem[j]);
                }
                else add(&tmpList1,chro1->prio[i]);
            }
            else if(y==chro1->num_of_gene-1)
            {
                i--;
                j++;
                if(chro1->prio[i] != tmpList2.Elem[j])
                {
                    if(contain(&tmpList1,chro1->prio[i])==0)
                        insert(&tmpList1,chro1->prio[i],0);
                    if(contain(&tmpList1,tmpList2.Elem[j])==0)
                        insert(&tmpList1,tmpList2.Elem[j],0);
                }
                else insert(&tmpList1,chro1->prio[i],0);
            }
            else
            {
                i--;
                j++;
                if(chro1->prio[i] != tmpList2.Elem[j])
                {
                    if(contain(&tmpList1,chro1->prio[i])==0)
                        insert(&tmpList1,chro1->prio[i],0);
                    if(contain(&tmpList1,tmpList2.Elem[j])==0)
                        add(&tmpList1,tmpList2.Elem[j]);
                }
                else insert(&tmpList1,chro1->prio[i],0);
            }
        }
        // copy the result into the child chromosome
        C->num_of_gene = chro1->num_of_gene;
        for(i=0;i<C->num_of_gene;i++)
            C->prio[i] = tmpList1.Elem[i];
    }

A.4 Fitness function

    void fitness(struct DFG* dfg, struct Chromosome* chro )
    {
        double power_sum = 0;
        int i,index1,index2;
        // add the PDT entry of every pair of consecutive instructions
        for(i=0;i<chro->num_of_gene-1;i++)
        {
            index1 = indexOf(dfg->program->ASMCode[
                         dfg->node[chro->topo[i]].ins_index]);
            index2 = indexOf(dfg->program->ASMCode[
                         dfg->node[chro->topo[i+1]].ins_index]);
            power_sum += PDT[index1][index2];
        }
        chro->power = power_sum;
    }

A.5 Genetic Scheduling Algorithm

    void Genetic_Schedule(struct DFG* dfg)
    {
        int i,j,k,tmp;
        int pop_size = 100;
        int max_gen = 200;
        double pc = 1.0;
        double pm = 0.5;
        struct Chromosome chro[pop_size*2];
        int a[pop_size*2];
        for(i=0;i ... numOfNode;
            chro[i].power = 0;
            a[i] = i;
        }
        for(i=0;i ... schedule[i]].assigned_value = i;
        }
    }

Appendix B C Source Code of Benchmark Programs

B.1 Quick Sort
    #include "array"

    swap(a, i, j)
    int a[], i, j;
    {
        int tmp;
        tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    int partition(a,l,r)
    int a[], l, r;
    {
        int x, i, t;
        swap(a,l,(l+r)>>1);
        t=l;
        for(i=l+1;i ...

    ... 1);
            if ( key == a[mid] ) {
                found = 1;
                pos = mid;
                first = last + 1;
            } else if ( key < a[mid] ) {
                last = mid - 1;
            } else {
                first = mid + 1;
            }
        }
    }

B.6 Tower of Hanoi

    #define other(i,j) (6-(i+j))

    int num[4];
    long count;

    void _start()
    {
        register disk, Loops = 0;
        disk = 1;
        while ( 1 ) {
            disk++;
            num[0] = 0;
            num[1] = disk;
            num[2] = 0;
            num[3] = 0;
            count = 0;
            mov(disk,1,3);
            Loops = Loops + 1;
            if ( disk == 10 ) break;
        }
    }

    /* move n disks from peg f to peg t */
    mov(n,f,t)
    {
        register o;
        if(n == 1) {
            num[f]--;
            num[t]++;
            count++;
            return;
        }
        o = other(f,t);
        mov(n-1,f,o);
        mov(1,f,t);
        mov(n-1,o,t);
        return;
    }

B.7 Heap sort

    #include "array"

    int parent(i)
    int i;
    {
        return(i>>2);
    }

    int left(i)
    int i;
    {
        return(2*i);
    }

    int right(i)
    int i;
    {
        return(2*i+1);
    }

    /* sift element i (1-based) down in a heap of n elements */
    heapify(a,i,n)
    int a[], i, n;
    {
        int l, r, largest, tmp;
        l = left(i);
        r = right(i);
        if((l <= n) && a[l-1] > a[i-1]) largest = l;
        else largest = i;
        if((r <= n) && a[r-1] > a[largest-1]) largest = r;
        if(largest != i) {
            tmp = a[largest-1];
            a[largest-1] = a[i-1];
            a[i-1] = tmp;
            heapify(a,largest,n);
        }
    }

    build_heap(a,n)
    int a[],n;
    {
        int i;
        for(i=n>>1;i>=1;i--)
            heapify(a,i,n);
    }

    heapsort(a, n)
    int a[], n;
    {
        int i, tmp;
        build_heap(a,n);
        for(i=n;i>=2;i--){
            tmp = a[0];
            a[0] = a[i-1];
            a[i-1] = tmp;
            heapify(a,1,i-1);
        }
    }

    _start()
    {
        heapsort(a, 100);
    }

Appendix C Power Dissipation Table

The table has 12 x 12 entries, listed in row order. The row and column labels are not reproduced.

    1018247.9980  1208625.2025  2501362.8255  1780666.2163  1024593.9949  1209484.1232  1025023.4609  1032311.7467  1018278.2975  1034402.9311  1053240.8310  1195961.5537
    1208647.6599  1016355.6168  3683332.4446  2962061.8036  1023234.0645  1018503.7290  1024093.2610  1030951.8163  1208677.9594  1224372.8625  1243210.7624  1195030.7441
    2236111.6801  3417892.8915  1811427.8887  1559452.2072  1899087.6201  3418322.0817  1899087.3556  1893818.1920  2229741.2865  2021139.3568  2028422.5313  1823844.0326
    1839479.9088  3020831.3897  1736289.3551  1243052.1418  1583958.0076  3021690.3104  1584817.2041  1579548.0404  1833109.5153  1820116.8955  1827400.0700  1616818.8695
    1024022.7574  1022662.8270  2183093.5512  1624078.8642   933479.056   1023261.7320   934338.253    951681.876   1009580.0000  1016063.2912  1034901.1911  1130490.7959
    1209506.5676  1018503.7160  3683757.3922  2962920.7113  1023832.9565  1016953.9685  1023832.6920  1031980.4388  1209536.8671  1225231.7702  1244069.6701  1196311.8966
    1024452.2234  1023522.0235  2183239.6907  1624938.0607   934338.253   1023261.4674   933908.258    952111.342   1010009.4660  1016492.7572  1035330.6571  1131348.2292
    1031791.4896  1030431.5592  2171587.7370  1619100.8345   951772.668   1031460.1947   952202.134    921242.087   1017348.7323  1014419.0903  1026249.8968  1137791.6089
    1017817.4403  1208194.6448  2494343.7135  1773614.0810  1009690.3804  1209053.5655  1010119.8464  1017408.1321  1003374.6830  1019499.3166  1038337.2165  1195544.1645
    1035049.1096  1224996.5836  1926589.3179  1728695.8820  1016778.8115  1225855.5043  1017208.2775  1014847.8475  1020274.4123  1007936.9256   998314.526   1212453.3724
    1061406.1972  1251353.6712  1932810.6143  1734917.1785  1043467.8391  1252212.5919  1043897.3051  1026471.3164  1046963.4399   997901.318   1022301.7126  1241293.7860
    1196288.0688  1195356.1059  1791801.4726  1614661.5217  1131070.2837  1196641.2310  1131929.4802  1138256.2061  1196117.6235  1187321.6082  1216742.1806  1275030.8389

Appendix D An example of Scheduling a Basic Block

A basic block of assembly code of the Bubble Sort benchmark

Unscheduled basic block:

    lw      $2,4($fp)
    move    $3,$2
    sll     $2,$3,2
    lw      $4,24($fp)
    addu    $3,$2,$4
    move    $2,$3
    lw      $3,4($fp)
    move    $4,$3
    sll     $3,$4,2
    lw      $4,24($fp)
    addu    $3,$3,$4
    addu    $4,$3,4
    move    $3,$4
    lw      $2,0($2)
    lw      $3,0($3)
    slt     $2,$3,$2

Scheduled basic block:

    lw      $4,24($fp)
    lw      $2,4($fp)
    move    $3,$2
    sll     $2,$3,2
    addu    $3,$2,$4
    move    $2,$3
    lw      $2,0($2)
    lw      $3,4($fp)
    move    $4,$3
    sll     $3,$4,2
    lw      $4,24($fp)
    addu    $3,$3,$4
    addu    $4,$3,4
    move    $3,$4
    lw      $3,0($3)
    slt     $2,$3,$2
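
The fitness function in A.4 scores a schedule by adding one table entry for every pair of consecutive instructions. As a rough illustration of how the two orderings of the Bubble Sort block could be compared, the sketch below applies the same summation to their opcode sequences. The names `opcode`, `demo_pdt` and `sequence_cost` are hypothetical, and the 5 x 5 table is a made-up stand-in (1.0 for a repeated opcode, 2.0 otherwise, arbitrary units), not the Power Dissipation Table of Appendix C.

```c
#include <assert.h>
#include <stddef.h>

/* Opcode classes that occur in the Appendix D listings (hypothetical enum). */
enum opcode { OP_LW, OP_MOVE, OP_SLL, OP_ADDU, OP_SLT, OP_COUNT };

/* HYPOTHETICAL overhead table, NOT the thesis PDT: 1.0 when two adjacent
 * instructions share an opcode, 2.0 otherwise (arbitrary units). */
static const double demo_pdt[OP_COUNT][OP_COUNT] = {
    /*            lw   move  sll  addu  slt */
    /* lw   */ { 1.0, 2.0, 2.0, 2.0, 2.0 },
    /* move */ { 2.0, 1.0, 2.0, 2.0, 2.0 },
    /* sll  */ { 2.0, 2.0, 1.0, 2.0, 2.0 },
    /* addu */ { 2.0, 2.0, 2.0, 1.0, 2.0 },
    /* slt  */ { 2.0, 2.0, 2.0, 2.0, 1.0 },
};

/* Sum the table entry of every adjacent opcode pair, in program order. */
double sequence_cost(const enum opcode *seq, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(seq != NULL || n == 0);
    for (i = 0; i + 1 < n; i++)
        sum += demo_pdt[seq[i]][seq[i + 1]];
    return sum;
}

/* Opcode order of the unscheduled block in Appendix D. */
const enum opcode unscheduled[16] = {
    OP_LW, OP_MOVE, OP_SLL, OP_LW, OP_ADDU, OP_MOVE, OP_LW, OP_MOVE,
    OP_SLL, OP_LW, OP_ADDU, OP_ADDU, OP_MOVE, OP_LW, OP_LW, OP_SLT
};

/* Opcode order of the scheduled block in Appendix D. */
const enum opcode scheduled[16] = {
    OP_LW, OP_LW, OP_MOVE, OP_SLL, OP_ADDU, OP_MOVE, OP_LW, OP_LW,
    OP_MOVE, OP_SLL, OP_LW, OP_ADDU, OP_ADDU, OP_MOVE, OP_LW, OP_SLT
};
```

With this stand-in table, `sequence_cost(unscheduled, 16)` evaluates to 28.0 and `sequence_cost(scheduled, 16)` to 27.0, because the scheduled order has one more pair of identical adjacent opcodes. These figures only show the mechanics of the summation; they say nothing about the measured power of either block.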