AFL 初探(一)

AFL 初探(一)

Intro: Basic Analysis of AFL

AFL 初探

image

源码分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
static void add_instrumentation(void) {


static u8 line[MAX_LINE];

FILE* inf;
FILE* outf;
s32 outfd;
u32 ins_lines = 0;

u8 instr_ok = 0, skip_csect = 0, skip_next_label = 0,
skip_intel = 0, skip_app = 0, instrument_next = 0;

#ifdef __APPLE__

u8* colon_pos;

#endif /* __APPLE__ */

//输入文件,gcc生成
if (input_file) {

inf = fopen(input_file, "r");
if (!inf) PFATAL("Unable to read '%s'", input_file);

} else inf = stdin;

//输出文件
outfd = open(modified_file, O_WRONLY | O_EXCL | O_CREAT, 0600);

if (outfd < 0) PFATAL("Unable to write to '%s'", modified_file);

outf = fdopen(outfd, "w");

if (!outf) PFATAL("fdopen() failed");

//对于inf进行每行的遍历插桩
while (fgets(line, MAX_LINE, inf)) {

/* In some cases, we want to defer writing the instrumentation trampoline
until after all the labels, macros, comments, etc. If we're in this
mode, and if the line starts with a tab followed by a character, dump
the trampoline now. */

//满足这些条件就插桩,条件由下面的扫描提供
if (!pass_thru && !skip_intel && !skip_app && !skip_csect && instr_ok &&
instrument_next && line[0] == '\t' && isalpha(line[1])) {

fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
R(MAP_SIZE));

instrument_next = 0;
ins_lines++;

}

/* Output the actual line, call it a day in pass-thru mode. */
//把原始的代码写入outf
fputs(line, outf);

if (pass_thru) continue;

/* All right, this is where the actual fun begins. For one, we only want to
instrument the .text section. So, let's keep track of that in processed
files - and let's set instr_ok accordingly. */

//判断代码段,只插桩.text段
if (line[0] == '\t' && line[1] == '.') {

/* OpenBSD puts jump tables directly inline with the code, which is
a bit annoying. They use a specific format of p2align directives
around them, so we use that as a signal. */

if (!clang_mode && instr_ok && !strncmp(line + 2, "p2align ", 8) &&
isdigit(line[10]) && line[11] == '\n') skip_next_label = 1;

if (!strncmp(line + 2, "text\n", 5) ||
!strncmp(line + 2, "section\t.text", 13) ||
!strncmp(line + 2, "section\t__TEXT,__text", 21) ||
!strncmp(line + 2, "section __TEXT,__text", 21)) {
instr_ok = 1;
continue;
}

if (!strncmp(line + 2, "section\t", 8) ||
!strncmp(line + 2, "section ", 8) ||
!strncmp(line + 2, "bss\n", 4) ||
!strncmp(line + 2, "data\n", 5)) {
instr_ok = 0;
continue;
}

}

/* Detect off-flavor assembly (rare, happens in gdb). When this is
encountered, we set skip_csect until the opposite directive is
seen, and we do not instrument. */

//判断位数
if (strstr(line, ".code")) {

if (strstr(line, ".code32")) skip_csect = use_64bit;
if (strstr(line, ".code64")) skip_csect = !use_64bit;

}

/* Detect syntax changes, as could happen with hand-written assembly.
Skip Intel blocks, resume instrumentation when back to AT&T. */

if (strstr(line, ".intel_syntax")) skip_intel = 1;
if (strstr(line, ".att_syntax")) skip_intel = 0;

/* Detect and skip ad-hoc __asm__ blocks, likewise skipping them. */

if (line[0] == '#' || line[1] == '#') {

if (strstr(line, "#APP")) skip_app = 1;
if (strstr(line, "#NO_APP")) skip_app = 0;

}

/* If we're in the right mood for instrumenting, check for function
names or conditional labels. This is a bit messy, but in essence,
we want to catch:

^main: - function entry point (always instrumented)
^.L0: - GCC branch label
^.LBB0_0: - clang branch label (but only in clang mode)
^\tjnz foo - conditional branches

...but not:

^# BB#0: - clang comments
^ # BB#0: - ditto
^.Ltmp0: - clang non-branch labels
^.LC0 - GCC non-branch labels
^.LBB0_0: - ditto (when in GCC mode)
^\tjmp foo - non-conditional jumps

Additionally, clang and GCC on MacOS X follow a different convention
with no leading dots on labels, hence the weird maze of #ifdefs
later on.

*/

if (skip_intel || skip_app || skip_csect || !instr_ok ||
line[0] == '#' || line[0] == ' ') continue;

/* Conditional branch instruction (jnz, etc). We append the instrumentation
right after the branch (to instrument the not-taken path) and at the
branch destination label (handled later on). */

//碰到jcc进行无条件插桩
if (line[0] == '\t') {

if (line[1] == 'j' && line[2] != 'm' && R(100) < inst_ratio) {

fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
R(MAP_SIZE));

ins_lines++;

}

continue;

}

//label中均包含":",下面的工作就是匹配各种label
/* Label of some sort. This may be a branch destination, but we need to
tread carefully and account for several different formatting
conventions. */

#ifdef __APPLE__

/* Apple: L<whatever><digit>: */


if ((colon_pos = strstr(line, ":"))) {

if (line[0] == 'L' && isdigit(*(colon_pos - 1))) {

#else

/* Everybody else: .L<whatever>: */

//匹配[.L<wtever>:]
if (strstr(line, ":")) {

if (line[0] == '.') {

#endif /* __APPLE__ */

/* .L0: or LBB0_0: style jump destination */

#ifdef __APPLE__

/* Apple: L<num> / LBB<num> */

if ((isdigit(line[1]) || (clang_mode && !strncmp(line, "LBB", 3)))
&& R(100) < inst_ratio) {

#else

/* Apple: .L<num> / .LBB<num> */

if ((isdigit(line[2]) || (clang_mode && !strncmp(line + 1, "LBB", 3)))
&& R(100) < inst_ratio) {

#endif /* __APPLE__ */

/* An optimization is possible here by adding the code only if the
label is mentioned in the code in contexts other than call / jmp.
That said, this complicates the code by requiring two-pass
processing (messy with stdin), and results in a speed gain
typically under 10%, because compilers are generally pretty good
about not generating spurious intra-function jumps.

We use deferred output chiefly to avoid disrupting
.Lfunc_begin0-style exception handling calculations (a problem on
MacOS X). */


if (!skip_next_label) instrument_next = 1; else skip_next_label = 0;

}

} else {

/* Function label (always instrumented, deferred mode). */
//匹配到一个label,instrument_next=1
instrument_next = 1;

}

}

}

//最后插入一个 main_payload
if (ins_lines)
fputs(use_64bit ? main_payload_64 : main_payload_32, outf);

if (input_file) fclose(inf);
fclose(outf);

if (!be_quiet) {

if (!ins_lines) WARNF("No instrumentation targets found%s.",
pass_thru ? " (pass-thru mode)" : "");
else OKF("Instrumented %u locations (%s-bit, %s mode, ratio %u%%).",
ins_lines, use_64bit ? "64" : "32",
getenv("AFL_HARDEN") ? "hardened" :
(sanitizer ? "ASAN/MSAN" : "non-hardened"),
inst_ratio);

}

}

afl-as.c文件中插桩的条件:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
/* In some cases, we want to defer writing the instrumentation trampoline
until after all the labels, macros, comments, etc. If we're in this
mode, and if the line starts with a tab followed by a character, dump
the trampoline now. */
if (!pass_thru && !skip_intel && !skip_app && !skip_csect && instr_ok &&
instrument_next && line[0] == '\t' && isalpha(line[1])) {

fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32,
R(MAP_SIZE));

instrument_next = 0;
ins_lines++;

}

具体看一下插进去的代码是什么:

通过fprintf()将格式化字符串添加到汇编文件的相应位置,只分析32位的情况,trampoline_fmt_32的具体内容如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
static const u8* trampoline_fmt_32 =

"\n"
"/* --- AFL TRAMPOLINE (32-BIT) --- */\n"
"\n"
".align 4\n"
"\n"
"leal -16(%%esp), %%esp\n"
"movl %%edi, 0(%%esp)\n"
"movl %%edx, 4(%%esp)\n"
"movl %%ecx, 8(%%esp)\n"
"movl %%eax, 12(%%esp)\n"
"movl $0x%08x, %%ecx\n"
"call __afl_maybe_log\n"
"movl 12(%%esp), %%eax\n"
"movl 8(%%esp), %%ecx\n"
"movl 4(%%esp), %%edx\n"
"movl 0(%%esp), %%edi\n"
"leal 16(%%esp), %%esp\n"
"\n"
"/* --- END --- */\n"
"\n";

其中,movl $0x%08x, %%ecx\n 为将R(x)生成的随机数给ecx作为标识代码段的key。然后调用__afl_maybe_log,调用完之后,把栈上保存的值恢复回去,再把栈恢复。

main_payload_32:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
static const u8* main_payload_32 = 

"\n"
"/* --- AFL MAIN PAYLOAD (32-BIT) --- */\n"
"\n"
".text\n"
".att_syntax\n"
".code32\n"
".align 8\n"
"\n"
---
//__afl_maybe_log:
"__afl_maybe_log:\n"
"\n"
" lahf\n"
" seto %al\n"
"\n"
" /* Check if SHM region is already mapped. */\n"
---
//判断共享内存是否加载,edx中存储SHM(__afl_area_ptr)
"\n"
" movl __afl_area_ptr, %edx\n"
" testl %edx, %edx\n"
" je __afl_setup\n"
"\n"
---
//__afl_store:
"__afl_store:\n"
"\n"
" /* Calculate and store hit for the code location specified in ecx. There\n"
" is a double-XOR way of doing this without tainting another register,\n"
" and we use it on 64-bit systems; but it's slower for 32-bit ones. */\n"
"\n"
---
//将pre和当前代码块的key进行异或,结果存储到edi
#ifndef COVERAGE_ONLY
" movl __afl_prev_loc, %edi\n"
" xorl %ecx, %edi\n"
" shrl $1, %ecx\n"
" movl %ecx, __afl_prev_loc\n"
#else
" movl %ecx, %edi\n"
#endif /* ^!COVERAGE_ONLY */
"\n"
---
//内存映射:edi为索引,edx为map,体现edge命中逻辑
#ifdef SKIP_COUNTS
" orb $1, (%edx, %edi, 1)\n"
#else
" incb (%edx, %edi, 1)\n"
#endif /* ^SKIP_COUNTS */
"\n"
---
//__afl_return
"__afl_return:\n"
"\n"
" addb $127, %al\n"
" sahf\n"
" ret\n"
"\n"
".align 8\n"
"\n"
---
//__afl_setup
"__afl_setup:\n"
"\n"
" /* Do not retry setup if we had previous failures. */\n"
"\n"
" cmpb $0, __afl_setup_failure\n"
" jne __afl_return\n"
"\n"
" /* Map SHM, jumping to __afl_setup_abort if something goes wrong.\n"
" We do not save FPU/MMX/SSE registers here, but hopefully, nobody\n"
" will notice this early in the game. */\n"
"\n"
---
//寻找 SHM 共享内存
" pushl %eax\n"
" pushl %ecx\n"
"\n"
" pushl $.AFL_SHM_ENV\n"
" call getenv\n"
" addl $4, %esp\n"
"\n"
" testl %eax, %eax\n"
" je __afl_setup_abort\n"
"\n"
" pushl %eax\n"
" call atoi\n"
" addl $4, %esp\n"
"\n"
---
//shmat参数:SHM ID = getenv(.AFL_SHM_ENV)
" pushl $0 /* shmat flags */\n"
" pushl $0 /* requested addr */\n"
" pushl %eax /* SHM ID */\n"
" call shmat\n"
" addl $12, %esp\n"
"\n"
" cmpl $-1, %eax\n"
" je __afl_setup_abort\n"
"\n"
---
//储存 SHM 地址到 __afl_area_ptr
" /* Store the address of the SHM region. */\n"
"\n"
" movl %eax, __afl_area_ptr\n"
" movl %eax, %edx\n"
"\n"
" popl %ecx\n"
" popl %eax\n"
"\n"
---
//__afl_forkserver
"__afl_forkserver:\n"
"\n"
" /* Enter the fork server mode to avoid the overhead of execve() calls. */\n"
"\n"
" pushl %eax\n"
" pushl %ecx\n"
" pushl %edx\n"
"\n"
" /* Phone home and tell the parent that we're OK. (Note that signals with\n"
" no SA_RESTART will mess it up). If this fails, assume that the fd is\n"
" closed because we were execve()d from an instrumented binary, or because\n"
" the parent doesn't want to use the fork server. */\n"
"\n"
//将__afl_temp中的4个字节写到提前开好的管道中
" pushl $4 /* length */\n"
" pushl $__afl_temp /* data */\n"
" pushl $" STRINGIFY((FORKSRV_FD + 1)) " /* file desc */\n"
" call write\n"
" addl $12, %esp\n"
"\n"
" cmpl $4, %eax\n"
//jne:不相等跳转,即fail跳转->__afl_fork_resume
" jne __afl_fork_resume\n"
"\n"
---
//__afl_fork_wait_loop:
"__afl_fork_wait_loop:\n"
"\n"
" /* Wait for parent by reading from the pipe. Abort if read fails. */\n"
"\n"
//不断地从管道中读取内容,假如读取到的字节数不为4就会跳到__afl_die
" pushl $4 /* length */\n"
" pushl $__afl_temp /* data */\n"
" pushl $" STRINGIFY(FORKSRV_FD) " /* file desc */\n"
" call read\n"
" addl $12, %esp\n"
"\n"
" cmpl $4, %eax\n"
" jne __afl_die\n"
"\n"
" /* Once woken up, create a clone of our process. This is an excellent use\n"
" case for syscall(__NR_clone, 0, CLONE_PARENT), but glibc boneheadedly\n"
" caches getpid() results and offers no way to update the value, breaking\n"
" abort(), raise(), and a bunch of other things :-( */\n"
"\n"
//fork,判断fork是否成功,如果成功,子进程跳到__afl_fork_resume
" call fork\n"
"\n"
" cmpl $0, %eax\n"
" jl __afl_die\n"
" je __afl_fork_resume\n"
"\n"
//父进程 write PID to pipe,传给 fuzzer
" /* In parent process: write PID to pipe, then wait for child. */\n"
"\n"
" movl %eax, __afl_fork_pid\n"
"\n"
" pushl $4 /* length */\n"
" pushl $__afl_fork_pid /* data */\n"
" pushl $" STRINGIFY((FORKSRV_FD + 1)) " /* file desc */\n"
" call write\n"
" addl $12, %esp\n"
"\n"
//如果waitpid返回的结果小于等于0,就会跳到afl_die,同时状态传入__afl_temp
" pushl $0 /* no flags */\n"
" pushl $__afl_temp /* status */\n"
" pushl __afl_fork_pid /* PID */\n"
" call waitpid\n"
" addl $12, %esp\n"
"\n"
" cmpl $0, %eax\n"
" jle __afl_die\n"
"\n"
" /* Relay wait status to pipe, then loop back. */\n"
//状态通过管道传回fuzzer,同时启动新一轮等待
"\n"
" pushl $4 /* length */\n"
" pushl $__afl_temp /* data */\n"
" pushl $" STRINGIFY((FORKSRV_FD + 1)) " /* file desc */\n"
" call write\n"
" addl $12, %esp\n"
"\n"
" jmp __afl_fork_wait_loop\n"
"\n"
---
//__afl_fork_resume:
"__afl_fork_resume:\n"
"\n"
//管道关闭,寄存器恢复
" /* In child process: close fds, resume execution. */\n"
"\n"
" pushl $" STRINGIFY(FORKSRV_FD) "\n"
" call close\n"
"\n"
" pushl $" STRINGIFY((FORKSRV_FD + 1)) "\n"
" call close\n"
"\n"
" addl $8, %esp\n"
"\n"
" popl %edx\n"
" popl %ecx\n"
" popl %eax\n"
//跳转到__afl_store:
" jmp __afl_store\n"
"\n"
"__afl_die:\n"
"\n"
" xorl %eax, %eax\n"
" call _exit\n"
"\n"
"__afl_setup_abort:\n"
"\n"
" /* Record setup failure so that we don't keep calling\n"
" shmget() / shmat() over and over again. */\n"
"\n"
" incb __afl_setup_failure\n"
" popl %ecx\n"
" popl %eax\n"
" jmp __afl_return\n"
"\n"
".AFL_VARS:\n"
"\n"
" .comm __afl_area_ptr, 4, 32\n"
" .comm __afl_setup_failure, 1, 32\n"
#ifndef COVERAGE_ONLY
" .comm __afl_prev_loc, 4, 32\n"
#endif /* !COVERAGE_ONLY */
" .comm __afl_fork_pid, 4, 32\n"
" .comm __afl_temp, 4, 32\n"
"\n"
".AFL_SHM_ENV:\n"
" .asciz \"" SHM_ENV_VAR "\"\n"
"\n"
"/* --- END --- */\n"
"\n";

/* The OpenBSD hack is due to lahf and sahf not being recognized by some
versions of binutils: http://marc.info/?l=openbsd-cvs&m=141636589924400

The Apple code is a bit different when calling libc functions because
they are doing relocations differently from everybody else. We also need
to work around the crash issue with .lcomm and the fact that they don't
recognize .string. */

#ifdef __APPLE__
# define CALL_L64(str) "call _" str "\n"
#else
# define CALL_L64(str) "call " str "@PLT\n"
#endif /* ^__APPLE__ */

特别的,对于llvm模式,代码插桩仅需一个modulepass,对每个 BB 进行 IRB 的辅助插桩即可

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
for (auto &F : M)
for (auto &BB : F) {

BasicBlock::iterator IP = BB.getFirstInsertionPt();
IRBuilder<> IRB(&(*IP));

if (AFL_R(100) >= inst_ratio) continue;

/* Make up cur_loc */

unsigned int cur_loc = AFL_R(MAP_SIZE);

ConstantInt *CurLoc = ConstantInt::get(Int32Ty, cur_loc);

/* Load prev_loc */

LoadInst *PrevLoc = IRB.CreateLoad(AFLPrevLoc);
PrevLoc->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));
Value *PrevLocCasted = IRB.CreateZExt(PrevLoc, IRB.getInt32Ty());

/* Load SHM pointer */
//shared memory table

LoadInst *MapPtr = IRB.CreateLoad(AFLMapPtr);
MapPtr->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));
Value *MapPtrIdx =
IRB.CreateGEP(MapPtr, IRB.CreateXor(PrevLocCasted, CurLoc));

/* Update bitmap */

LoadInst *Counter = IRB.CreateLoad(MapPtrIdx);
Counter->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));
Value *Incr = IRB.CreateAdd(Counter, ConstantInt::get(Int8Ty, 1));
IRB.CreateStore(Incr, MapPtrIdx)
->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));

/* Set prev_loc to cur_loc >> 1 */

StoreInst *Store =
IRB.CreateStore(ConstantInt::get(Int32Ty, cur_loc >> 1), AFLPrevLoc);
Store->setMetadata(M.getMDKindID("nosanitize"), MDNode::get(C, None));

inst_blocks++;

}

现在在回到fuzz侧,还记得forksrv的wait状态吗?fuzzer对于wait状态的解除是通过进行用例测试,在fork server启动完成后,一旦需要执行某个测试用例,则fuzzer会调用run_target()方法,在此方法中,便是通过命令管道,通知fork_server准备fork;并通过状态管道,获取子进程pid:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
if ((res = write(fsrv_ctl_fd, &prev_timed_out, 4)) != 4) {

if (stop_soon) return 0;
RPFATAL(res, "Unable to request new process from fork server (OOM?)");

}

if ((res = read(fsrv_st_fd, &child_pid, 4)) != 4) {

if (stop_soon) return 0;
RPFATAL(res, "Unable to request new process from fork server (OOM?)");

}

if (child_pid <= 0) FATAL("Fork server is misbehaving (OOM?)");

随后,fuzzer再次读取状态管道,获取子进程退出状态,并由此来判断子进程结束的原因,例如正常退出、超时、崩溃等,并进行相应的记录。

1
2
3
4
5
6
7
8
9
10
11
12
13
    if ((res = read(fsrv_st_fd, &status, 4)) != 4) {
...
/* Report outcome to caller. */

if (WIFSIGNALED(status) && !stop_soon) {

kill_signal = WTERMSIG(status);

if (child_timed_out && kill_signal == SIGKILL) return FAULT_TMOUT;

return FAULT_CRASH;

}

Fork Server(结合上述源码)

  • 【afl-fuzz.cc:!forksrv_pid】fuzzer进程执行fork()得到fork server进程,然后重定向两个管道作为通信接口,并关闭不必要的管道。其中设置了 SAN。然后执行 target。此为forksrv_init

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
       if (!forksrv_pid) {
    ...
    if (dup2(ctl_pipe[0], FORKSRV_FD) < 0) PFATAL("dup2() failed");
    if (dup2(st_pipe[1], FORKSRV_FD + 1) < 0) PFATAL("dup2() failed");
    ...
    close(ctl_pipe[0]);
    close(ctl_pipe[1]);
    close(st_pipe[0]);
    close(st_pipe[1]);
    ...
    setenv("ASAN_OPTIONS", "abort_on_error=1:"
    "detect_leaks=0:"
    "symbolize=0:"
    "allocator_may_return_null=1", 0);
    ...
    execv(target_path, argv);
  • 对于父进程(fuzzer),则会读取状态管道的信息,如果一切正常,则说明fork server创建完成。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
      fsrv_st_fd  = st_pipe[0]
    ...
    rlen = read(fsrv_st_fd, &status, 4);
    ...
    /* If we have a four-byte "hello" message from the server, we're all set. Otherwise, try to figure out what went wrong. */

    if (rlen == 4) {
    OKF("All right - fork server is up.");
    return;
    }

共享内存

作为fuzzer,AFL并不是像无头苍蝇那样对输入文件无脑地随机变化(其实也支持这种方式,即dumb模式),其最大特点就是会对target进行插桩,以辅助mutated input的生成。具体地,插桩后的target,会记录执行过程中的分支信息;随后,fuzzer便可以根据这些信息,判断这次执行的整体流程和代码覆盖情况。

AFL使用共享内存,来完成以上信息在fuzzer和target之间的传递。具体地,fuzzer在启动时,会执行setup_shm()方法进行配置。其首先调用shemget()分配一块共享内存,大小MAP_SIZE为64K:

1
shm_id = shmget(IPC_PRIVATE, MAP_SIZE, IPC_CREAT | IPC_EXCL | 0600);

分配成功后,该共享内存的标志符会被设置到环境变量中,从而之后fork()得到的子进程可以通过该环境变量,得到这块共享内存的标志符:

1
2
shm_str = alloc_printf("%d", shm_id);
if (!dumb_mode) setenv(SHM_ENV_VAR, shm_str, 1);

并且,fuzzer本身,会使用变量trace_bits来保存共享内存的地址:

1
trace_bits = shmat(shm_id, NULL, 0);

在每次target执行之前,fuzzer首先将该共享内容清零:

1
memset(trace_bits, 0, MAP_SIZE);

接下来,我们再来看看target是如何获取并使用这块共享内存的。相关代码同样也在上面提到的方法__afl_maybe_log()中。首先,会检查是否已经将共享内存映射完成:

1
2


分支信息的记录

[warning] AFL 保存的是 edges 执行次数而不是 blocks 执行次数,AFL是根据二元tuple(跳转的源地址和目标地址)来记录分支信息,从而获取target的执行流程和代码覆盖情况,其伪代码如下:

1
2
3
cur_location = <COMPILE_TIME_RANDOM>;
shared_mem[cur_location ^ prev_location]++;
prev_location = cur_location >> 1;

其中的代码在上述源码分析中可以找到(包含llvm_pass)

AFL文件变异

这一部分先挖坑,因为没有具体阅读源码,只是收集到的资料,源码部分之后会补上

bitflip

  • [自动检测token]: 在进行bitflip 1/1变异时,对于每个byte的最低位(least significant bit)翻转还进行了额外的处理:如果连续多个bytes的最低位被翻转后,程序的执行路径都未变化,而且与原始执行路径不一致,那么就把这一段连续的bytes判断是一条token。
  • [生成effector map]: 在对每个byte进行翻转时,如果其造成执行路径与原始路径不一致,就将该byte在effector map中标记为1,即“有效”的,否则标记为0,即“无效”的。

arithmetic

加减

interest

特殊语料库的替换

dictionary

havoc

splice

cycle

一个AFL优化策略

引用:

1
2
3
4
5
为此,我对ELF文件变异和objdump执行路径变异进行了简单的试验,发现许多”数据“bytes被翻
转后,确实能够引起执行路径的变化。但是,这些”数据“bytes往往是一块块分布在文件中的,
而每一块”数据“中的每个bytes被翻转后,执行路径往往是相同的。所以,我们就有了一个朴素
的想法:如果翻转一个byte引起执行路径变化,而且翻转该byte与翻转其前一个byte的执行路
径不同,此时才将其视为“有效”的。

# Fuzz

Comments

Your browser is out-of-date!

Update your browser to view this website correctly. Update my browser now

×