Binary exploitation learning path

This article presents a learning path from a total noob to a mid-level binary exploitation specialist capable of understanding cutting-edge security research and writing exploits.

Why should you learn that anyhow?

Some of the greatest achievements in offensive security would be hacking a smartphone, web browser, OS kernel, IoT device or remote server. There’s only one category of offensive skills that can hit all of those targets: binary exploitation. If you want to at least understand how those great hacks actually work, you need to acquire a certain body of knowledge.

This subject is huge. It will take 3-5 years to get through it, depending on how much you already know and how much time you are willing to invest. So to make this more approachable, we need to make some decisions.

Choose the CPU architecture

The default choice would be x86. It’s as mainstream as it gets, which means you’ll have plenty of materials to learn from. This architecture dominates the desktop and laptop market and is also prevalent on servers. ARM would be a good choice, if you’re into smartphones and IoT. 

There are quite a lot of other architectures to choose from, but unless you have good reasons to specialize in any of them, I would recommend starting your adventure with either of those two. It’s best to build strong foundations before going into any of the niches. 

Choose the OS 

The default choice would probably be Linux. You should know it anyhow. It’s widespread, open source, and well documented. If you later decide to pivot to Android (or even macOS), your investment in Linux will pay dividends. pwn.college has some introductory materials. Linux Basics for Hackers also comes highly recommended. If you’re a bookworm like me, you will enjoy reading books by people like Eric S. Raymond, Richard Stallman, and Linus Torvalds. If you aren’t, then at least skim through The Art of Unix Programming.

Windows is very different. It’s quite hard to learn its low-level mechanics. On the other hand, it’s used a lot in businesses and on desktops. A steeper learning curve means more hardship but also less competition. Reading Windows Internals by Andrea Allievi, Mark E. Russinovich, Alex Ionescu, and David A. Solomon would be a good start. 

Finally, there are macOS and iOS. Likely the hardest and the most expensive to learn. The Art of Mac Malware by Patrick Wardle is highly recommended as a starting point. It’s available online for free. 

Choose the assembly flavor

Default: Intel (assuming x86). 

I know very few people who know both flavors well but prefer AT&T. It is, however, important to be able to read it anyhow, as some books and articles use it. For starters, you can read this short explainer of key differences. 
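To make the difference concrete, here is a minimal sketch: a tiny C function with a comment showing how the same load instruction might be written in each flavor. The function name load_two_after is made up for illustration, the registers assume the x86-64 System V calling convention, and the offsets assume a 4-byte int. Actual compiler output varies with the compiler, flags, and target.

```c
#include <stddef.h>

/*
 * Returns base[idx + 2]. On x86-64 System V (an assumption), an
 * optimizing compiler may reduce the body to a single load:
 *
 *   Intel:  mov  eax, DWORD PTR [rdi + rsi*4 + 8]
 *   AT&T:   movl 8(%rdi,%rsi,4), %eax
 *
 * Intel puts the destination first and writes the address as an
 * expression. AT&T puts the source first, prefixes registers with %,
 * encodes the operand size in the mnemonic suffix (l = 32 bits), and
 * writes the address as disp(base,index,scale).
 */
int load_two_after(const int *base, size_t idx)
{
    return base[idx + 2];
}
```

With GCC you can compare both yourself: gcc -O2 -S file.c emits AT&T syntax by default, and adding -masm=intel switches to Intel. For existing binaries, objdump -d -M intel does the same for disassembly.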

All right, we’ve made some key decisions, so we can get started.

Learn assembly

First and foremost, learn the assembly of your chosen architecture. 

You’re unlikely to ever need to write assembly longer or more complex than typical shellcode. You do, however, need to be able to understand the kind of assembly the compiler generates.

By learning to use the language, you’ll also understand how the CPU works from a programmer’s point of view: what the key data structures are, how calling conventions work, how memory segmentation works, and so on. The big bonus here is that these things are very similar across architectures and operating systems, so the time invested is really well spent.

I learned assembly years ago from Polish books and articles. Today it’s way simpler, as there are lots of materials on the web. I can recommend pwn.college’s brand-new Assembly Crash Course.

Learn C

I used the classic “The C Programming Language” by Brian Kernighan and Dennis Ritchie. The next logical step would be reviewing the standards. This language is simple, so I’d suggest learning it in full. It’s a rock-solid investment. Nearly all low-level software is written in C. Reverse engineering tools use C-like pseudocode. Probably all later imperative languages borrow plenty of C concepts.

Decent knowledge of assembly and C is an absolute must-have for any low-level security work. 

Learn C++

The necessity of learning C++ today could be debated. It is, however, still a dominant language of a lot of security-critical software (like web browsers). 

If you are interested only in security aspects, you don’t really have to master the language. Also, “mastering C++” is a lifelong commitment, probably not worth it. C++ is by far the most complex programming language ever invented, and that’s not a compliment. It’s hard to overstate the amount of damage caused by the fact that memory safety was never a priority of its design. 

I personally started with the classical “The C++ Programming Language” by Bjarne Stroustrup. The most recent edition explains C++11. 

For more recent updates and further development, I can recommend the CppCon YouTube channel and the current revision of the C++ standard.

Learn your chosen OS programmatically

Well, write some software. Read the relevant books (I suggested some starting points earlier). Get familiar with compilers, debuggers, and IDEs. 

Some of the stuff you should figure out: 

  • What security boundaries are there (here’s a list for MS Windows)? 
  • Which code integrity controls are in place? 
  • Which security measures are in place to protect the stack and the heap? 
  • What are typical local privilege escalation vectors? 
  • …and much, much more. 
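As one example of a stack protection, here is a rough sketch of the idea behind a stack canary. It is a simulation, not how a compiler actually implements it: the struct guarded_frame, the CANARY constant, and copy_and_check are all made up for illustration, and the "frame" is an ordinary struct so that the code stays well-defined C. Real canaries are inserted by the compiler (for instance with -fstack-protector in GCC and Clang, or /GS in MSVC) and use a random value rather than a fixed constant.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 8
#define CANARY   0xC0FFEE11u  /* illustrative constant; real canaries are random */

/* Simulated stack frame: the guard value sits between the local
 * buffer and the saved return address. */
struct guarded_frame {
    char     buf[BUF_SIZE];
    uint32_t canary;
    uint32_t saved_ret;
};

/*
 * Sets up the frame, then copies n bytes into it starting at buf
 * with no check against BUF_SIZE (n is clamped to the struct size
 * only to keep this simulation inside the object).
 * Returns 1 if the canary is intact, 0 if an overflow was detected.
 */
int copy_and_check(struct guarded_frame *f, const void *src, size_t n)
{
    f->canary = CANARY;
    f->saved_ret = 0x1000;            /* made-up return address */
    if (n > sizeof *f)
        n = sizeof *f;
    memcpy(f, src, n);                /* the "vulnerable" copy */
    return f->canary == CANARY;       /* the check done before returning */
}
```

Copying 8 bytes leaves the canary alone, while copying 16 bytes tramples both the canary and the saved return address, and the check catches it before the corrupted address would be used. A real program would abort at that point (GCC and Clang call __stack_chk_fail).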

Playing CTFs could be a great way of getting practically familiar with the security aspects of your chosen OS and platform. Read write-ups of CTF challenges you weren’t able to solve. Then try again. Next time you should be able to solve it on your own. Repeat the process. 

Learn debugging and reverse engineering

In other words, dynamic and static analysis of a binary. A smart strategy would be to learn cross-platform open source tools for software reversing, such as Ghidra.

Choosing lldb over gdb may be a good idea if you’re working with the LLVM ecosystem across multiple OSes (even more so if one is macOS). Commands differ a bit, which is annoying when you have to switch. 

On the other hand, gdb has great extensions such as GEF or pwndbg. As always, it’s best to know all of it. 

For Windows, knowing WinDbg, x64dbg, and MS Visual Studio would be a nice combo. 

Learn relevant vulnerabilities and attacks

Now that you know the very basics, you can read the classics with full understanding. You can start with the original Smashing The Stack For Fun And Profit (or its newer rewrite). Today we’d call it a stack-based buffer overflow (to distinguish it from two other subclasses, the global and the heap-based). Reading old stuff like that is fun, but a lot has changed since then. For a much more up-to-date and comprehensive introduction, I suggest attending pwn.college.
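The core idea of a stack-based buffer overflow can be sketched without actually corrupting a real stack. The code below is my simplified simulation, not code from any of the materials mentioned: struct sim_frame, LEGIT_RET, and the three functions are hypothetical names, and the "frame" is a plain struct with a buffer followed by a pretend saved return address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE  8
#define LEGIT_RET 0x00401000u  /* made-up "return address" */

/* Simulated frame: a local buffer with the saved return address
 * stored right after it. */
struct sim_frame {
    char     buf[BUF_SIZE];
    uint32_t saved_ret;
};

/* Models a copy that trusts the caller's length. n is clamped to the
 * struct size only so the simulation never leaves the object. */
void unchecked_copy(struct sim_frame *f, const void *src, size_t n)
{
    if (n > sizeof *f)
        n = sizeof *f;
    memcpy(f, src, n);
}

/* The fix: never write more than the buffer can hold. */
void checked_copy(struct sim_frame *f, const void *src, size_t n)
{
    if (n > BUF_SIZE)
        n = BUF_SIZE;
    memcpy(f->buf, src, n);
}

/* Builds the classic payload: BUF_SIZE filler bytes followed by the
 * address the attacker wants. out must hold BUF_SIZE + 4 bytes.
 * Returns the payload length. */
size_t build_payload(unsigned char *out, uint32_t target)
{
    memset(out, 'A', BUF_SIZE);
    memcpy(out + BUF_SIZE, &target, sizeof target);
    return BUF_SIZE + sizeof target;
}
```

Feeding the 12-byte payload to checked_copy leaves saved_ret untouched. Feeding it to unchecked_copy replaces saved_ret with the attacker's value, which in a real program is where execution would continue when the function returns.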

Looking at the basic classes of vulnerabilities for programs written in C++ and those written in C should give you an idea of what you already know and what’s still out there. 

You can check out LiveOverflow’s binary exploitation playlist.

Learn fuzzing

The number one technique for finding bugs to exploit is fuzzing. I can recommend the courses by my friend Hardik, some of which you can also find on YouTube. This one, for instance, covers both Linux and Windows.
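If you have never seen a fuzzer, the following toy may help. It is a deliberately tiny sketch under my own assumptions, not something taken from the courses above: target_would_crash stands in for a parser that copies a length-prefixed record into an 8-byte buffer without checking the length, and it only reports that a crash would happen instead of really corrupting memory. fuzz_sweep tries every single-byte mutation of a seed input. Real fuzzers such as AFL++ or libFuzzer mutate randomly, run the real target, and use coverage feedback to decide what to mutate next.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_CAP   8   /* size of the toy target's internal buffer */
#define MAX_INPUT 64  /* largest seed this sketch accepts */

/* Toy target: byte 0 is a length field. Models a parser that would
 * copy that many bytes into a BUF_CAP-byte buffer unchecked.
 * Returns 1 where a real build would corrupt memory, else 0. */
int target_would_crash(const uint8_t *in, size_t len)
{
    if (len == 0)
        return 0;
    return in[0] > BUF_CAP;
}

/* Position and value of the mutation that produced a crash. */
struct crash {
    size_t  pos;
    uint8_t value;
};

/* Tries every single-byte mutation of the seed. Returns the number
 * of crashing inputs and, if first is not NULL, records the first. */
size_t fuzz_sweep(const uint8_t *seed, size_t len, struct crash *first)
{
    uint8_t input[MAX_INPUT];
    size_t crashes = 0;

    if (len > MAX_INPUT)
        return 0;
    for (size_t pos = 0; pos < len; pos++) {
        for (unsigned v = 0; v < 256; v++) {
            memcpy(input, seed, len);      /* start from the seed */
            input[pos] = (uint8_t)v;       /* mutate one byte */
            if (target_would_crash(input, len)) {
                if (crashes == 0 && first != NULL) {
                    first->pos = pos;
                    first->value = (uint8_t)v;
                }
                crashes++;
            }
        }
    }
    return crashes;
}
```

With the seed {4, 'd', 'a', 't', 'a'}, the only crashing mutations are those that set byte 0 to a value above 8, so the first crash points straight at the length field. That is the whole game in miniature: generate inputs, watch for crashes, and keep the input that triggered one for later analysis.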

Once you know the basics, I suggest trying to rediscover some of the known CVEs. This will give you a realistic bug hunter experience. Antonio made a nice set of challenges like this. 

Learn advanced exploitation techniques

It’s worth knowing Return-Oriented Programming, Data-Oriented Programming, and Jump-Oriented Programming.

Read the article Painless intro to the Linux userland heap by Javier Jimenez. Then watch Max Kamper’s Introduction To GLIBC Heap Exploitation, and read his HEAPLAB. GLIBC Heap Exploitation Bible.

You’d enjoy watching Evan Walls’s How to Weaponize a Vulnerability and How to Write Shellcode. I found the second part of the latter particularly beautiful in terms of how to deliver a great live programming talk. 

If you’re into Windows, you can read the works of j00ru and Gynvael, such as this one.

Understand new mitigations such as Memory Tagging and Pointer Authentication on ARM, and read the analysis of its resistance against real-world attacks. For a nice example of what hacking a smartphone looks like, read the epic MMS Exploit series by j00ru.

Finally, reach for the cutting edge

Identify and follow experts in your chosen field of specialty. Watch their talks, read their books and articles. Follow key conferences and watch current talks from your area.

Read the Google Project Zero blog. The more you understand from those posts, the closer you are to your goal.

Well, that would be it. If I missed something, feel free to reach out to me. I’d be happy to amend the article. 

Happy hacking. 🙂

I’d like to thank AtomicNicos and Gynvael Coldwind for reviewing the text and suggesting improvements.

By Cezary Cerekwicki

AppSec program manager. A former nerd who reluctantly learned he's not allergic to shirts and talking to people. Currently a people person.