IBM Global Services
Reversing C++

Paul Vincent Sabanal, X-Force R&D
Mark Vincent Yason, X-Force R&D
IBM Internet Security Systems
Ahead of the threat.™
© Copyright IBM Corporation 2007

Part I: Introduction

Introduction > Purpose
- Understand how C++ concepts are represented in disassembly
- Get a big-picture view of the major pieces (classes) of a C++ target and of how these pieces relate to each other (class relationships)

Introduction > Focus On…
(1) Identifying classes
(2) Identifying class relationships
(3) Identifying class members

Introduction > Motivation
- Increasing use of C++ code in malware
  - Virtual function calls are difficult to follow in static analysis
  - Examples: Agobot, Mytob, new malcode from our honeypot
- Most modern applications use C++
  - For binary auditing, reversers can expect the target to be a compiled C++ binary
- General lack of publicly available information on C++ reversing
  - The only good information comes from Igor Skochinsky
  - https://www.openrce.org/articles/full_view/23

Part II: Manual Approach
Identifying C++ Binaries & Constructs

Manual Approach > Identifying C++ Binaries & Constructs

Heavy use of ECX (the this pointer):

    text:004019E4    mov     ecx, esi
    text:004019E6    push    0BBh
    text:004019EB    call    sub_401120

ECX used without being initialized (the caller supplies the this pointer in ECX):

    text:004010D0 sub_4010D0 proc near
    text:004010D0    push    esi
    text:004010D1    mov     esi, ecx
    text:004010DD    mov     dword ptr [esi], offset off_40C0D0
    text:00401101    mov     dword ptr [esi+4], 0BBh
    text:00401108    call    sub_401EB0
    text:0040110D    add     esp, 18h
    text:00401110    pop     esi
    text:00401111    retn
    text:00401111 sub_4010D0 endp

Parameters on the stack, ECX = this pointer:

    text:00401994    push    0Ch
    text:00401996    call    ??2@YAPAXI@Z         ; operator new(uint)
    :::
    text:004019AB    mov     ecx, eax
    text:004019AD    call    ClassA_ctor

Virtual function calls (indirect calls):

    text:00401996    call    ??2@YAPAXI@Z         ; operator new(uint)
    :::
    text:004019B2    mov     esi, eax
    :::
    text:004019FF    mov     eax, [esi]           ; EAX = vftable
    text:00401A01    add     esp, …
    text:00401A04    mov     ecx, esi
    text:00401A06    push    0CCh
    text:00401A0B    call    dword ptr [eax]

STL code and imported DLLs:

    text:00401201    mov     ecx, eax
    text:00401203    call    ds:?sputc@?$basic_streambuf@DU?$char_traits@D@std@@@std@@QAEHD@Z ; std::basic_streambuf::sputc(char)

Automation > Strategies > Class Identification via Constructor / Destructor Search

Constructor Identification
- For dynamically allocated objects
  - Look for calls to new()
  - Track the value returned in EAX
  - When tracking is done, look for the earliest call where the tracked register/variable is ECX
  - Mark this function as the constructor
- For local objects
  - We do the same thing, but instead of initially tracking the values returned by new(), we first locate instructions where the address of a stack variable is written to ECX, then start tracking ECX

Automation > Strategies > Class Relationship Inferencing

Inheritance Identification
- Track the this pointer (ECX)
- Check blocks with ECX as the tracked variable
- See if there is a call to a constructor
- To handle multiple inheritance, also track pointers to offsets relative to the object address

Automation > Strategies > Class Member Identification

Member Variables
- Track the this pointer from the point the object is initialized
- Note accesses to offsets relative to the this pointer

Non-virtual Functions
- Track the this pointer from the point the object is initialized
- Note all blocks where ECX is the tracked variable, then mark the call in that block, if there is one, as a member of the current class

Virtual Functions
- To identify virtual functions, we simply have to locate the vftables first, through constructor analysis

After all of this is done, we reconstruct the class using the results of this analysis.

Part III: Automation
Enhancing Disassembly

Automation > Disassembly Enhancement

RTTI structures reconstruction, naming, commenting

Original listing:

    rdata:004165A0                dd offset unk_4189E0
    rdata:004165A4 off_4165A4     dd offset sub_401170   ; DATA XREF:
    rdata:004165A8                dd offset sub_4011C0
    rdata:004165AC                dd offset sub_401230
    rdata:004165B0                dd offset unk_418A28

After reconstruction and naming:

    rdata:004165A0                dd offset oop_re$ClassA$RTTICompleteObjectLocator@00
    rdata:004165A4 oop_re$ClassA$vftable@00 dd offset sub_401170   ; DATA XREF:
    rdata:004165A8                dd offset sub_4011C0
    rdata:004165AC                dd offset sub_401230

RTTI structures (another example)

Original listing:

    rdata:004189E0 dword_4189E0   dd 0                   ; DATA XREF:
    rdata:004189E4                dd 0
    rdata:004189E8                dd 0
    rdata:004189EC                dd offset off_41B004
    rdata:004189F0                dd offset unk_4189F4

After reconstruction and commenting:

    rdata:004189E0 oop_re$ClassA$RTTICompleteObjectLocator@00 dd 0 ; RTTICompleteObjectLocator.signature
    rdata:004189E4                dd 0                   ; RTTICompleteObjectLocator.offset
    rdata:004189E8                dd 0                   ; RTTICompleteObjectLocator.cdOffset
    rdata:004189EC                dd offset oop_re$ClassA$TypeDescriptor
    rdata:004189F0                dd offset oop_re$ClassA$RTTIClassHierarchyDescriptor

Improving the call graph
- Add cross-references on virtual function calls
- This results in a more accurate call graph
- It also improves binary diffing results

Part III: Automation
Visualization

Automation > Visualization

UML Diagram Generation
- Using pydot
- Create a node for each class
- Create an edge from each base class
- Pretty simple (once you have the data :) and cool too…
- Very effective if RTTI exists (class names)
- EXE2UML?
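The UML Diagram Generation slide uses pydot from Python. As an illustration of the same idea, here is a minimal C++ sketch that writes Graphviz DOT text directly instead of going through pydot. It assumes the analysis has already produced each class name and its direct base classes; the `RecoveredClass` record and the `toDot` function are hypothetical names, not part of the original tooling. Edges run from the derived class to each base with a hollow arrowhead, which is the UML notation for generalization.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical output of the class-recovery phase: one record per class with
// the names of its direct base classes in declaration order. Names come from
// RTTI when it exists, otherwise from generated labels.
struct RecoveredClass {
    std::string name;
    std::vector<std::string> bases;
};

// Emit a Graphviz DOT digraph with one node per class and one edge per base
// class. Edges point from the derived class to its base and use a hollow
// arrowhead, following the UML notation for generalization.
std::string toDot(const std::vector<RecoveredClass>& classes) {
    std::ostringstream out;
    out << "digraph classes {\n";
    out << "  node [shape=box];\n";
    for (const auto& c : classes) {
        // Names are written inside double quotes, so they must not contain one.
        assert(c.name.find('"') == std::string::npos);
        out << "  \"" << c.name << "\";\n";
    }
    for (const auto& c : classes) {
        for (const auto& base : c.bases) {
            out << "  \"" << c.name << "\" -> \"" << base
                << "\" [arrowhead=empty];\n";
        }
    }
    out << "}\n";
    return out.str();
}
```

Fed the ClassA to ClassD hierarchy from the UML example slides (for instance `{"ClassD", {"ClassB", "ClassC"}}`), this produces four nodes and three edges, and the text can be rendered with the Graphviz `dot` tool (for example `dot -Tpng`). Emitting DOT as plain text keeps the sketch free of third-party dependencies; a binding such as pydot mainly adds convenience on top of the same format.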
Automation > Visualization

UML Diagram Example (w/o RTTI)

    class ClassA { };
    class ClassB : public ClassA { };
    class ClassC { };
    class ClassD : public ClassB, public ClassC { };

UML Diagram Example (w/ RTTI)

    class ClassA { };
    class ClassB : public ClassA { };
    class ClassC { };
    class ClassD : public ClassB, public ClassC { };

Reversing C++
Demo…

Thank you!
Questions?
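The classes in the UML example slides are empty as written. The sketch below adds hypothetical virtual functions and data members (`a_func`, `c_func`, `a_member` and the rest are illustrative, not from the deck) so that the compiler emits vftables and RTTI. It then shows two things the automated analysis relies on: inside ClassD, the ClassC subobject lives at a nonzero offset from the object address, and RTTI identifies the most-derived class at run time. The helper names `classCOffset` and `dynamicTypeName` are also assumptions made for this example.

```cpp
#include <cstddef>
#include <typeinfo>

// Hierarchy from the UML example slides. The virtual functions and data
// members are hypothetical additions: they make the classes polymorphic (so
// the compiler emits vftables and RTTI) and non-empty (so the ClassC
// subobject cannot share ClassD's start address).
class ClassA {
public:
    virtual ~ClassA() = default;
    virtual int a_func() { return 1; }
    int a_member = 0;
};

class ClassB : public ClassA {
public:
    int b_member = 0;
};

class ClassC {
public:
    virtual ~ClassC() = default;
    virtual int c_func() { return 3; }
    int c_member = 0;
};

class ClassD : public ClassB, public ClassC {
public:
    int d_member = 0;
};

// Byte offset of the ClassC subobject inside a ClassD object. When a ClassC
// member function is called on a ClassD object, the this pointer passed in
// (ECX under 32-bit MSVC) is the object address plus this offset, which is
// why the analysis tracks offsets relative to the object address.
std::ptrdiff_t classCOffset(ClassD& d) {
    ClassC* c = &d;  // derived-to-base conversion adjusts the pointer
    return reinterpret_cast<const char*>(c) -
           reinterpret_cast<const char*>(&d);
}

// Name of the most-derived class, looked up through RTTI at run time.
// The string format is implementation-defined.
const char* dynamicTypeName(const ClassA& obj) {
    return typeid(obj).name();
}
```

The exact offset and the string returned by `type_info::name()` depend on the compiler and target; MSVC, for example, returns names such as `class ClassD`. A `dynamic_cast` from a `ClassA*` to a `ClassC*` on a ClassD object also succeeds, because the cast consults the same RTTI data.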