GLIBC 堆概述

1. 背景

对于不同的应用来说，由于内存的需求各不相同等特性，因此目前堆的实现有很多种，具体如下

dlmalloc – General purpose allocator
ptmalloc2 – glibc
jemalloc – FreeBSD and Firefox
tcmalloc – Google
libumem – Solaris

ptmalloc2

目前 Linux 标准发行版中使用的堆分配器是 glibc 中的堆分配器：ptmalloc2
ptmalloc2 是从 dlmalloc 中 fork 出来的
在 fork 之后，添加了线程支持并于 2006 年发布，此后被集成到了 glibc 源码
主要是通过 malloc/free 函数来分配和释放内存块

2. malloc

第一次调用malloc

if size >= 128kb, malloc -> mmap -> sys_mmap
if size < 128kb, malloc -> brk -> sys_brk

malloc 与 free 函数背后的系统调用主要是 (s)brk 函数以及 mmap, munmap 函数。

系统调用

brk
- 函数原型：int brk(void *addr)
- 功能和作用：用于设置program_break指向的位置。
- 我们可以通过增加 brk 的大小来向操作系统申请内存。
sbrk()
- 函数原型：void *sbrk(intptr_t increment)
- 功能和作用：同 brk()，参数可以是负数。执行成功返回上一次 program_break 的值，可以设置参数为0返回当前的 program_break.
mmap()
- 函数原型：void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
- 功能和作用：当用户申请空间大于等于128kb，也就是0x20000字节时，不再使用 brk() 进行分配，改为使用 mmap
unmmap()
- 函数原型：int munmap(void *addr, size_t length)
- 功能和作用：对 mmap() 申请的空间进行回收。
主线程的 arena 又称为 main_arena，包含 start_brk 和 brk 中间的连续内存，当 main_arena 不够分配时，会使用 brk() 进行扩展。子线程 arena 可以有多片连续内存，但是大小是固定的，不可以扩展，如果不够用的话需要再次调用 mmap() 来分配。
子线程arena可以有多片连续内存，但是大小是固定的，不可以扩展，如果不够用的话需要再次调用mmap()来分配

初始时，堆的起始地址 start_brk 以及堆的当前末尾 brk 指向同一地址。根据是否开启ASLR，两者的具体位置会有所不同

不开启 ASLR 保护时，start_brk 以及 brk 会指向 data/bss 段的结尾。
开启 ASLR 保护时，start_brk 以及 brk 也会指向同一位置，只是这个位置是在 data/bss 段结尾后的随机偏移处。

具体效果如下图所示

当申请内存过大或者非主线程申请堆内存时， malloc 会使用 mmap 来创建独立的匿名映射段。匿名映射的目的主要是可以申请以0填充的内存，并且这块内存仅被调用进程所使用。回收这块内存时则实际使用的是 unmmap。

malloc 背后系统调用的详细介绍可以参考 Syscalls used by malloc

第二次调用malloc

只要分配的空间不超过128kb，则不会再次向内核申请空间，剩余堆空间不足以满足分配时才会调用 brk() 进行扩展。
即使将 main_arena 全部free，也不会立即把内存还给操作系统，此时内存由 glibc 进行管理。

3. Chunk

chunk 是 glibc 管理内存的基本单元。主要分为以下几类：

alloced chunk：已分配正在使用中的chunk。
free chunk：已经free的chunk。
top chunk：可以理解为地址的最高处，还没有分配过的chunk。
last remainder chunk：最后一次切分遗留的堆块，为了提高内存分配的局部连续性。

在程序的执行过程中，我们称由 malloc 申请的内存为 chunk 。这块内存在 ptmalloc 内部用 malloc_chunk 结构体来表示。当程序申请的 chunk 被 free 后，会被加入到相应的空闲管理列表bin中。

无论一个 chunk 的大小如何，处于分配状态还是释放状态，它们都使用一个统一的结构。虽然它们使用了同一个数据结构，但是根据是否被释放，它们的表现形式会有所不同。

/*
  This struct declaration is misleading (but accurate and necessary).
  It declares a "view" into memory allowing access to necessary
  fields at known offsets from a given base. See explanation below.
*/
struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

一般来说，size_t 在 64 位中是 64 位无符号整数，32 位中是 32 位无符号整数。
我们称 prev_size 和 size 字段为 chunk header chunk 头

每个字段的具体的解释如下

prev_size
- 当前 chunk 的前一个 chunk 是指， 处于较低地址，物理相邻的前一地址 chunk，两个指针的地址差值即为前一chunk的实际大小(包含chunk 头)
- 前一个 chunk 处于被分配状态时，prev_size 存放前一个chunk 的用户数据。
- 前一个 chunk 处于空闲状态时，prev_size 存放前一个 chunk 的实际大小(包含chunk 头)。
- 提高内存空间的利用率
size
当前 chunk 的大小，大小必须是 2 * SIZE_SZ 的整数倍。如果申请的内存大小不是 2 * SIZE_SZ 的整数倍，会被转换为满足大小的最小的 2 * SIZE_SZ 的倍数。32 位系统中，SIZE_SZ 是 4；64 位系统中，SIZE_SZ 是 8。该字段的低三个比特位对 chunk 的大小没有影响，它们从高到低分别表示：
- NON_MAIN_ARENA(A)，记录当前 chunk 是否不属于主线程，1表示不属于，0表示属于。
- IS_MAPPED(M)，记录当前 chunk 是否是由 mmap 分配的。
- PREV_INUSE(P)，记录前一个 chunk 块是否被分配。一般来说，堆中第一个被分配的内存块的 size 字段的P位都会被设置为1，以便于防止访问前面的非法内存。当一个 chunk 的 size 的 P 位为 0 时，我们能通过 prev_size 字段来获取上一个 chunk 的大小以及地址。方便进行空闲chunk之间的合并。
- 标记bit位的宏定义
fd，bk
chunk 处于分配状态时，从 fd 字段开始是用户的数据。chunk 空闲时，会被添加到对应的空闲管理链表(bin)中，其字段的含义如下
- fd 指向下一个（非物理相邻）空闲的 chunk
- bk 指向上一个（非物理相邻）空闲的 chunk
- 通过 fd 和 bk 可以将空闲的 chunk 块加入到空闲的 chunk 块链表进行统一管理
fd_nextsize， bk_nextsize
与 fd, bk 类似，也是只有 chunk 空闲的时候才使用，不过其用于较大的 chunk（large chunk）。
- fd_nextsize 指向前一个与当前 chunk 大小不同的第一个空闲块，不包含 bin 的头指针。
- bk_nextsize 指向后一个与当前 chunk 大小不同的第一个空闲块，不包含 bin 的头指针。
- 一般空闲的 large chunk 在 fd 的遍历顺序中，按照由大到小的顺序排列。这样做可以避免在寻找合适chunk 时挨个遍历。

Allocated Chunk

一个已经分配的 chunk 的样子如下。我们称前两个字段称为 chunk header，后面的部分称为 user data。每次 malloc 申请得到的内存指针，其实指向 user data 的起始处。

当一个 chunk 处于使用状态时，它的下一个 chunk 的 prev_size 域无效，所以下一个 chunk 的该部分也可以被当前chunk使用。这就是chunk中的空间复用。

chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of previous chunk, if unallocated (P clear)  |   -------------
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+      chunk 头
        |             Size of chunk, in bytes                     |A|M|P|   -------------
  mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             User data starts here...                          .
        .                                                               .
        .             (malloc_usable_size() bytes)                      .
next    .                                                               |
chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             (size of chunk, but used for application data)    |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of next chunk, in bytes                |A|0|1|
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Free Chunk

被释放的 chunk 被记录在链表中（可能是循环双向链表，也可能是单向链表，由 chunk 的大小决定）。具体结构如下

chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of previous chunk, if unallocated (P clear)  |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`head:' |             Size of chunk, in bytes                     |A|0|P|
  mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Forward pointer to next chunk in list             |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Back pointer to previous chunk in list            |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Unused space (may be 0 bytes long)                .
        .                                                               .
 next   .                                                               |
chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
`foot:' |             Size of chunk, in bytes                           |
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        |             Size of next chunk, in bytes                |A|0|0|
        +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

可以发现，如果一个 chunk 处于 free 状态，那么会有两个位置记录其相应的大小

本身的 size 字段会记录
它物理相邻的后一个 chunk 的 prev_size 字段会记录

一般情况下，物理相邻的两个空闲 chunk 会被合并为一个大的 chunk 。堆管理器会通过 prev_size 字段以及 size 字段合并两个物理相邻的空闲 chunk 块。

Top Chunk

程序第一次进行 malloc 的时候，heap 会被分为两块，一块给用户，剩下的那块就是 top chunk。其实，所谓的top chunk 就是处于当前堆的物理地址最高的 chunk。这个 chunk 不属于任何一个 bin，它的作用在于当所有的bin 都无法满足用户请求的大小时，如果其大小不小于指定的大小，就进行分配，并将剩下的部分作为新的 top chunk。否则，就对heap进行扩展后再进行分配。在main arena中通过sbrk扩展heap，而在thread arena中通过mmap分配新的heap。

/*
   Top

    The top-most available chunk (i.e., the one bordering the end of
    available memory) is treated specially. It is never included in
    any bin, is used only if no other chunk is available, and is
    released back to the system if it is very large (see
    M_TRIM_THRESHOLD).  Because top initially
    points to its own bin with initial zero size, thus forcing
    extension on the first malloc request, we avoid having any special
    code in malloc to check whether it even exists yet. But we still
    need to do so when getting memory from system, so we make
    initial_top treat the bin as a legal but unusable chunk during the
    interval between initialization and the first call to
    sysmalloc. (This is somewhat delicate, since it relies on
    the 2 preceding words to be zero during this interval as well.)
 */

/* Conveniently, the unsorted bin can be used as dummy top on first call */
#define initial_top(M) (unsorted_chunks(M))

需要注意的是，top chunk 的 prev_inuse 比特位始终为1，否则其前面的chunk就会被合并到top chunk中。

初始情况下，我们可以将 unsorted chunk 作为 top chunk。

Last Remainder

在用户使用 malloc 请求分配内存时，ptmalloc2 找到的 chunk 可能并不和申请的内存大小一致，这时候就将分割之后的剩余部分称之为 last remainder chunk，unsort bin 也会存这一块。top chunk 分割剩下的部分不会作为last remainder.

4. bin

概述

用户释放掉的 chunk 并不会马上归还给系统，ptmalloc 会统一管理 heap 和 mmap 映射区域中的空闲的chunk。当用户再一次请求分配内存时，ptmalloc 分配器会试图在空闲的chunk中挑选一块合适的给用户。这样可以避免频繁的系统调用，降低内存分配的开销。

在具体的实现中，ptmalloc 会根据空闲的 chunk 的大小以及使用状态将 chunk 初步分为4类：

fast bins
small bins
large bins
unsorted bin

每类中仍然有更细的划分，相似大小的 chunk 会用双向链表链接(fd, bk 指针)起来。也就是说，在每类 bin 的内部仍然会有多个互不相关的链表来保存不同大小的 chunk。

fastbinsY：用于存放 fastbin 的数组，里面有10(NFASTBINS)个大小不同的 fast bin
bins：也是一个 bin 数组，一共有126个 bin，按顺序分别是：

Bin 1 – Unsorted bin。字如其面，这里面的 chunk 没有进行排序，存储的 chunk 比较杂。
Bin 2 to Bin 63 – Small bin。同一个 small bin 链表中的 chunk 的大小相同。两个相邻索引的 small bin 链表中的 chunk 大小相差的字节数为2个机器字长，即32位相差8字节，64位相差16字节。
Bin 64 to Bin 126 – Large bin。large bins 中的每一个 bin 都包含一定范围内的 chunk，其中的 chunk 按 fd 指针的顺序从大到小排列。相同大小的 chunk 同样按照最近使用顺序排列。

此外，对于 bins 中的空闲 chunk 都会遵循一个原则：任意两个物理相邻的空闲chunk不能在一起, 会进行合并。

需要注意的是，并不是所有的 chunk 被释放后就立即被放到 bin 中。ptmalloc 为了提高分配的速度，会把一些小的 chunk 优先放到 fast bins 的容器内。而且，fastbin 容器中的 chunk 的使用标记总是被置位的，所以不满足上面的原则，换言之，对于 fastbin 中的 chunk 则不进行合并。

分配顺序: fast bin chunk -> unsorted bin chunk -> small bin chunk -> large bin chunk -> top chunk

glibc 2.26 版本引入了 tcache 机制后，优先考虑 tcache bin chunk

Fast Bin

大多数程序经常会申请以及释放一些比较小的内存块。如果将一些较小的 chunk 释放之后发现存在与之相邻的空闲的 chunk 并将它们进行合并，那么当下一次再次申请相应大小的 chunk 时，就需要对 chunk 进行分割，这样就大大降低了堆的利用效率。因为我们把大部分时间花在了合并、分割以及中间检查的过程中。因此，ptmalloc 中专门设计了 fast bin，对应的变量就是 malloc state 中的 fastbinsY

/*
   Fastbins

    An array of lists holding recently freed small chunks.  Fastbins
    are not doubly linked.  It is faster to single-link them, and
    since chunks are never removed from the middles of these lists,
    double linking is not necessary. Also, unlike regular bins, they
    are not even processed in FIFO order (they use faster LIFO) since
    ordering doesn't much matter in the transient contexts in which
    fastbins are normally used.

    Chunks in fastbins keep their inuse bit set, so they cannot
    be consolidated with other free chunks. malloc_consolidate
    releases all chunks in fastbins and consolidates them with
    other free chunks.
 */
typedef struct malloc_chunk *mfastbinptr;

/*
    This is in malloc_state.
    /* Fastbins */
    mfastbinptr fastbinsY[ NFASTBINS ];
*/

这类bin通常申请和释放的堆块都比较小，所以使用单链表结构，LIFO（后进先出）分配策略。
为了速度，fast bin不会进行合并，下一个chunk始终处于使用状态。
在fastbinsY数组里按照从小到大的顺序排列。
fast bin 链表的个数为10个，但是默认状态下只用前7个fast bin
64位系统中，默认 fast bin chunk 大小在 0x20 ~ 0x80 字节(包括 chunk 头)
当 MAX_FAST_SIZE 被设置为 0 时，系统就不会支持 fastbin
global_max_fast 可以设置 fastbin 最多支持的大小

需要特别注意的是，fastbin 范围的 chunk 的 inuse 始终被置为 1。因此它们不会和其它被释放的 chunk 合并。 malloc_consolidate 函数可以将 fastbin 中所有能和其它 chunk 合并的 chunk 合并在一起

/*
	Chunks in fastbins keep their inuse bit set, so they cannot
    be consolidated with other free chunks. malloc_consolidate
    releases all chunks in fastbins and consolidates them with
    other free chunks.
 */

Unsorted Bin

unsorted bin 可以视为空闲 chunk 回归其所属 bin 之前的缓冲区。

其在 glibc 中具体的说明如下

/*
   Unsorted chunks

    All remainders from chunk splits, as well as all returned chunks,
    are first placed in the "unsorted" bin. They are then placed
    in regular bins after malloc gives them ONE chance to be used before
    binning. So, basically, the unsorted_chunks list acts as a queue,
    with chunks being placed on it in free (and malloc_consolidate),
    and taken off (to be either used or placed in bins) in malloc.

    The NON_MAIN_ARENA flag is never set for unsorted chunks, so it
    does not have to be taken into account in size comparisons.
 */

unsorted bin 可以视为空闲 chunk 回归其所属 bin 之前的缓冲区，这主要是为了让 ptmalloc 能够有第二次机会重新利用最近释放的chunk(第一次机会就是 fast bin 机制)。

利用unsorted bin 也可以加快内存的分配和释放操作，因为整个操作都不再需要花费额外的时间去查找合适的bin了。

unsorted bin 的个数： 1个，处于 bins 数组下标 1 处。
unsorted bin 是一个循环双向链表，遍历顺序是FIFO。
在 unsorted bin 中，对chunk的大小并没有限制，任何大小的chunk都可以归属到 unsorted bin 中，在 fastbin 之后。

unsorted bin 中的空闲 chunk 处于乱序状态，主要有两个来源

当一个较大的 chunk 被分割成两半后，如果剩下的部分大于 MINSIZE，就会被放到 unsorted bin 中。
释放一个不属于 fast bin 的 chunk，并且该 chunk 不和 top chunk 紧邻时，该 chunk 会被首先放到 unsorted bin 中。

Small Bin

small bin chunk 大小为 0x20 ~ 0x400
small bin 链表的个数为62个，循环双向链表
单个 smallbin 链表中的 chunk 大小都是相同的，不同 smallbin 链表中的 chunk 大小是不同的
small bin 采用 FIFO 算法：释放时将新释放的 chunk 添加到链表的 front end(前端)，分配时从链表的rear end(尾端)中获取chunk。
就内存的分配和释放速度而言，small bin 比 larger bin快，但比 fast bin 和 unsorted bin 慢
物理地址相邻的free chunk需要进行合并操作，即合并成一个大的 free chunk

small bins 中每个 chunk 的大小与其所在的 bin 的 index 的关系为：chunk_size = 2 * SIZE_SZ * index，具体如下

下标	SIZE_SZ=4（32位）	SIZE_SZ=8（64位）
2	16	32
3	24	48
4	32	64
5	40	80
x	24x	28x
63	504	1008

比如对于 64 位系统来说，下标 2 对应的 small bin 中存储的 chunk 大小为均为 32 字节。

Large Bin

large bin chunk 的大小 >= 1024(0x400) 字节的
large bin链表的个数为63个，循环双向链表，被分为6组。
largechunk使用fd_nextsize、bk_nextsize连接起来的。
合并操作：类似于small bin。

large bins 中一共包括 63 个 bin，每个 bin 中的 chunk 的大小不一致，而是处于一定区间范围内。此外，这 63 个 bin 被分成了 6 组，每组 bin 中的 chunk 大小之间的公差一致，具体如下：

组	数量	公差
1	32	64B
2	16	512B
3	8	4096B
4	4	32768B
5	2	262144B
6	1	不限制

以 32 位平台的 large bin 为例，第一个 large bin 的起始 chunk 大小为 512(0x200) 字节，位于第一组，所以该bin 可以存储的 chunk 的大小范围为 [512,512+64)。

5. malloc_state

该结构用于管理堆，记录每个 arena 当前申请的内存的具体状态，比如说是否有空闲chunk，有什么大小的空闲chunk 等等。无论是 thread arena 还是 main arena，它们都只有一个 malloc state 结构。由于 thread 的 arena 可能有多个，malloc state结构会在最新申请的arena中。

注意，main arena 的 malloc_state 并不是 heap segment 的一部分，而是一个全局变量，存储在 libc.so 的数据段。泄漏libc基础地址的常见利用点。

其结构如下

struct malloc_state {
    /* Serialize access.  */
    __libc_lock_define(, mutex);

    /* Flags (formerly in max_fast).  */
    int flags;

    /* Fastbins */
    mfastbinptr fastbinsY[ NFASTBINS ];

    /* Base of the topmost chunk -- not otherwise kept in a bin */
    mchunkptr top;

    /* The remainder from the most recent split of a small request */
    mchunkptr last_remainder;

    /* Normal bins packed as described above */
    mchunkptr bins[ NBINS * 2 - 2 ];

    /* Bitmap of bins, help to speed up the process of determinating if a given bin is definitely empty.*/
    unsigned int binmap[ BINMAPSIZE ];

    /* Linked list, points to the next arena */
    struct malloc_state *next;

    /* Linked list for free arenas.  Access to this field is serialized
       by free_list_lock in arena.c.  */
    struct malloc_state *next_free;

    /* Number of threads attached to this arena.  0 if the arena is on
       the free list.  Access to this field is serialized by
       free_list_lock in arena.c.  */
    INTERNAL_SIZE_T attached_threads;

    /* Memory allocated from the system in this arena.  */
    INTERNAL_SIZE_T system_mem;
    INTERNAL_SIZE_T max_system_mem;
};

__libc_lock_define(, mutex);
- 该变量用于控制程序串行访问同一个分配区，当一个线程获取了分配区之后，其它线程要想访问该分配区，就必须等待该线程分配完成后才能够使用。
flags
- flags记录了分配区的一些标志，比如 bit0 记录了分配区是否有 fast bin chunk ，bit1 标识分配区是否能返回连续的虚拟地址空间。具体如下
fastbinsY[NFASTBINS]
- 存放每个 fast chunk 链表头部的指针
top
- 指向分配区的 top chunk
last_reminder
- 最新的 chunk 分割之后剩下的那部分
bins
- 用于存储 unstored bin，small bins 和 large bins 的 chunk 链表，总计126个。
binmap
- ptmalloc 用一个 bit 来标识某一个 bin 中是否包含空闲 chunk 。

6. Arena

一般情况下，无论是主线程还是新创建的线程，在第一次申请内存时，都会请求得到 arena。

arena 数量

对于不同系统，arena数量的约束如下

For 32 bit systems:
     Number of arena = 2 * number of cores.
For 64 bit systems:
     Number of arena = 8 * number of cores.

并不是每一个线程都会有对应的 arena。因为每个系统的核数是有限的，当线程数大于核数的二倍（超线程技术）时，就必然有线程处于等待状态，没有必要为每个线程分配一个 arena。主线程所对应的 arena 又被称为 main arena，与其他线程不同，main_arena 并不在申请的 heap 中，而是一个全局变量，在 libc.so 的数据段。

7. chunk 相关宏定义

chunk 与 mem 指针头部的转换

mem指向用户得到的内存的起始位置。

/* conversion from malloc headers to user pointers, and back */
#define chunk2mem(p) ((void *) ((char *) (p) + 2 * SIZE_SZ))
#define mem2chunk(mem) ((mchunkptr)((char *) (mem) -2 * SIZE_SZ))

最小的 chunk 大小

/* The smallest possible chunk */
#define MIN_CHUNK_SIZE (offsetof(struct malloc_chunk, fd_nextsize))

这里，offsetof 函数计算出 fd_nextsize 在 malloc_chunk 中的偏移，说明最小的 chunk 至少要包含 bk 指针。

最小申请的堆内存大小

用户最小申请的内存大小必须是 2 * SIZE_SZ 的最小整数倍。

注：就目前而看 MIN_CHUNK_SIZE 和 MINSIZE 大小是一致的，个人认为之所以要添加两个宏是为了方便以后修改 malloc_chunk 时方便一些。

/* The smallest size we can malloc is an aligned minimal chunk */
//MALLOC_ALIGN_MASK = 2 * SIZE_SZ -1
#define MINSIZE                                                                \
    (unsigned long) (((MIN_CHUNK_SIZE + MALLOC_ALIGN_MASK) &                   \
                      ~MALLOC_ALIGN_MASK))

检查分配给用户的内存是否对齐

2 * SIZE_SZ 大小对齐。

/* Check if m has acceptable alignment */
// MALLOC_ALIGN_MASK = 2 * SIZE_SZ -1
#define aligned_OK(m) (((unsigned long) (m) & MALLOC_ALIGN_MASK) == 0)

#define misaligned_chunk(p)                                                    \
    ((uintptr_t)(MALLOC_ALIGNMENT == 2 * SIZE_SZ ? (p) : chunk2mem(p)) &       \
     MALLOC_ALIGN_MASK)

请求字节数判断

/*
   Check if a request is so large that it would wrap around zero when
   padded and aligned. To simplify some other code, the bound is made
   low enough so that adding MINSIZE will also not wrap around zero.
 */

#define REQUEST_OUT_OF_RANGE(req)                                              \
    ((unsigned long) (req) >= (unsigned long) (INTERNAL_SIZE_T)(-2 * MINSIZE))

将用户请求内存大小转为实际分配内存大小

/* pad request bytes into a usable size -- internal version */
//MALLOC_ALIGN_MASK = 2 * SIZE_SZ -1
#define request2size(req)                                                      \
    (((req) + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)                           \
         ? MINSIZE                                                             \
         : ((req) + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK)

/*  Same, except also perform argument check */

#define checked_request2size(req, sz)                                          \
    if (REQUEST_OUT_OF_RANGE(req)) {                                           \
        __set_errno(ENOMEM);                                                   \
        return 0;                                                              \
    }                                                                          \
    (sz) = request2size(req);

当一个 chunk 处于已分配状态时，它的物理相邻的下一个 chunk 的 prev_size 字段必然是无效的，故而这个字段就可以被当前这个 chunk 使用。这就是 ptmalloc 中 chunk 间的复用。具体流程如下

首先，利用 REQUEST_OUT_OF_RANGE 判断是否可以分配用户请求的字节大小的 chunk。
其次，需要注意的是用户请求的字节是用来存储数据的，即 chunk header 后面的部分。与此同时，由于chunk 间复用，所以可以使用下一个 chunk 的 prev_size 字段。因此，这里只需要再添加 SIZE_SZ 大小即可以完全存储内容。
由于系统中所允许的申请的 chunk 最小是 MINSIZE，所以与其进行比较。如果不满足最低要求，那么就需要直接分配 MINSIZE 字节。
如果大于的话，因为系统中申请的 chunk 需要 2 * SIZE_SZ 对齐，所以这里需要加上 MALLOC_ALIGN_MASK 以便于内存对齐。

8. heap_info

程序刚开始执行时，每个线程是没有 heap 区域的。
当其申请内存时，就需要一个结构来记录对应的信息，而 heap_info 的作用就是这个。
而且当该heap的资源被使用完后，就必须得再次申请内存了。
此外，一般申请的heap 是不连续的，因此需要记录不同heap之间的链接结构。

heap_info 是专门为从 Memory Mapping Segment 处申请的内存准备的，即为非主线程准备的。

主线程可以通过 sbrk() 函数扩展 program break location 获得（直到触及Memory Mapping Segment，只有一个heap，并且没有 heap_info 数据结构。

heap_info 在 glibc 中总体出现频率不高，了解即可。

/* A heap is a single contiguous memory region holding (coalesceable)
   malloc_chunks.  It is allocated with mmap() and always starts at an
   address aligned to HEAP_MAX_SIZE. */

typedef struct _heap_info
{
  mstate ar_ptr; /* Arena for this heap. */
  struct _heap_info *prev; /* Previous heap. */
  size_t size;   /* Current size in bytes. */
  size_t mprotect_size; /* Size in bytes that has been mprotected
                           PROT_READ|PROT_WRITE.  */
  /* Make sure the following data is properly aligned, particularly
     that sizeof (heap_info) + 2 * SIZE_SZ is a multiple of
     MALLOC_ALIGNMENT. */
  char pad[-6 * SIZE_SZ & MALLOC_ALIGN_MASK];
} heap_info;

9. TCache

待补充

tcache 是 glibc 2.26 (ubuntu 17.10) 之后引入的一种技术，目的是提升堆管理的性能。但提升性能的同时舍弃了很多安全检查，也因此有了很多新的利用方式。

这其实和 fastbin 很像，但又不一样。

……