Starting with Linux 4.12, the API of Android's ION driver changed, and some of the original ioctl commands no longer exist.
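In my opinion Google's ION is fairly large. Besides the system heap, there are other allocation methods, such as allocation from CMA, and each method calls a different Linux interface. This post only records my own understanding of the system heap. The ION code lives under kernel\msm-4.14\drivers\staging\android\ion.
Whichever heap ION uses, the allocated buffer is wrapped in a Linux dma-buf. dma-buf is a Linux kernel framework. I have not studied its code in detail, but judging from how ION uses it:
- Every buffer ION allocates is stored in a dma-buf.
- The set of operations (ops) Google wrote for that buffer is stored in the dma-buf as well.
- Using the buffer therefore goes indirectly through the dma-buf ops, which in turn call the ops bound to the heap.

For example, when the system heap is created it is bound to its allocate, map_user (mmap), free and shrink functions, and the dma-buf ops eventually call these.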
ion.c contains Google's implementation of the dma-buf ops:
static const struct dma_buf_ops dma_buf_ops = {
.map_dma_buf = ion_map_dma_buf,
.unmap_dma_buf = ion_unmap_dma_buf,
.mmap = ion_mmap,
.release = ion_dma_buf_release,
.attach = ion_dma_buf_attach,
.detach = ion_dma_buf_detatch,
.begin_cpu_access = ion_dma_buf_begin_cpu_access,
.end_cpu_access = ion_dma_buf_end_cpu_access,
.begin_cpu_access_umapped = ion_dma_buf_begin_cpu_access_umapped,
.end_cpu_access_umapped = ion_dma_buf_end_cpu_access_umapped,
.begin_cpu_access_partial = ion_dma_buf_begin_cpu_access_partial,
.end_cpu_access_partial = ion_dma_buf_end_cpu_access_partial,
.map_atomic = ion_dma_buf_kmap,
.unmap_atomic = ion_dma_buf_kunmap,
.map = ion_dma_buf_kmap,
.unmap = ion_dma_buf_kunmap,
.vmap = ion_dma_buf_vmap,
.vunmap = ion_dma_buf_vunmap,
.get_flags = ion_dma_buf_get_flags,
};
ion.h defines the operations a heap implements:
/**
* struct ion_heap_ops - ops to operate on a given heap
* @allocate: allocate memory
* @free: free memory
* @map_kernel map memory to the kernel
* @unmap_kernel unmap memory to the kernel
* @map_user map memory to userspace
*
* allocate, phys, and map_user return 0 on success, -errno on error.
* map_dma and map_kernel return pointer on success, ERR_PTR on
* error. @free will be called with ION_PRIV_FLAG_SHRINKER_FREE set in
* the buffer's private_flags when called from a shrinker. In that
* case, the pages being free'd must be truly free'd back to the
* system, not put in a page pool or otherwise cached.
*/
struct ion_heap_ops {
int (*allocate)(struct ion_heap *heap,
struct ion_buffer *buffer, unsigned long len,
unsigned long flags);
void (*free)(struct ion_buffer *buffer);
void * (*map_kernel)(struct ion_heap *heap, struct ion_buffer *buffer);
void (*unmap_kernel)(struct ion_heap *heap, struct ion_buffer *buffer);
int (*map_user)(struct ion_heap *mapper, struct ion_buffer *buffer,
struct vm_area_struct *vma);
int (*shrink)(struct ion_heap *heap, gfp_t gfp_mask, int nr_to_scan);
};
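The following userspace sketch makes the two levels of indirection concrete. A generic dma-buf-level op knows nothing about the heap and simply forwards to the owning heap's ops table, the way ion_mmap forwards to heap->ops->map_user. All sketch_* types and functions are hypothetical simplifications, not the kernel structures.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-ins for ion_heap, ion_heap_ops
 * and dma_buf_ops: only the function-pointer indirection is modelled. */
struct sketch_buffer;

struct sketch_heap_ops {
	int (*allocate)(struct sketch_buffer *buf, unsigned long len);
	int (*map_user)(struct sketch_buffer *buf);
};

struct sketch_heap {
	const char *name;
	const struct sketch_heap_ops *ops;
};

struct sketch_buffer {
	struct sketch_heap *heap;
	unsigned long size;
	int user_maps;
};

/* Generic "dma-buf level" op: it does not know the heap type and just
 * forwards to the heap that owns the buffer. */
static int sketch_dmabuf_mmap(struct sketch_buffer *buf)
{
	if (!buf->heap->ops->map_user)
		return -1;
	return buf->heap->ops->map_user(buf);
}

/* "System heap" implementations bound when the heap is created. */
static int sys_allocate(struct sketch_buffer *buf, unsigned long len)
{
	buf->size = len;
	return 0;
}

static int sys_map_user(struct sketch_buffer *buf)
{
	buf->user_maps++;
	return 0;
}

static const struct sketch_heap_ops sys_ops = {
	.allocate = sys_allocate,
	.map_user = sys_map_user,
};
```

To add another heap, you only need to supply another ops table. The dma-buf-level code stays the same.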
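Before looking at how memory is allocated for ION, a few concepts are worth knowing.
struct sg_table is the Linux structure that holds a scatter list of physical pages. For details, I recommend the wowotech (蜗窝科技) article introducing the Linux kernel scatterlist API. In short, this structure stores a list of physical page runs.
The system heap does not have to allocate physically contiguous memory. The pages can be scattered, as long as the virtual addresses they are mapped to are contiguous. For example, when the camera requests a 12 MB buffer, the buddy allocator may hand out many 64 KB blocks. Each 64 KB block is contiguous internally, but the blocks are not contiguous with one another.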
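The following is a minimal userspace model of what an sg_table describes, assuming 4 KB pages. It is a list of physically contiguous runs that together form one buffer. struct chunk, build_chunks and the invented page frame numbers are hypothetical; the real structure is struct scatterlist, walked with for_each_sg. For the 12 MB camera example built from 64 KB runs, the model yields 192 entries that are not physically adjacent.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for one scatterlist entry: a physically
 * contiguous run starting at page frame number `pfn`. */
struct chunk {
	unsigned long pfn;
	unsigned long length; /* bytes */
};

/* Describe a buffer of `total` bytes made of `chunk_size`-byte runs.
 * The pfns are invented and spaced apart so runs are not physically
 * adjacent. Returns the entry count; *out must be freed by the caller. */
static size_t build_chunks(unsigned long total, unsigned long chunk_size,
			   struct chunk **out)
{
	size_t n, i;
	struct chunk *c;

	if (!total || !chunk_size)
		return 0;
	n = (total + chunk_size - 1) / chunk_size;
	c = calloc(n, sizeof(*c));
	if (!c)
		return 0;
	for (i = 0; i < n; i++) {
		/* leave a gap of one run between consecutive runs */
		c[i].pfn = 0x10000 + i * 2 * (chunk_size / SKETCH_PAGE_SIZE);
		c[i].length = (i == n - 1 && total % chunk_size) ?
			      total % chunk_size : chunk_size;
	}
	*out = c;
	return n;
}

/* Sum of all entry lengths: the size seen as one buffer. */
static unsigned long chunks_total(const struct chunk *c, size_t n)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += c[i].length;
	return sum;
}

/* True if run `a` ends exactly where run `b` begins physically. */
static int physically_adjacent(const struct chunk *a, const struct chunk *b)
{
	return a->pfn + a->length / SKETCH_PAGE_SIZE == b->pfn;
}
```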
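Buddy allocator: there is plenty of material about it online, and the concept is fairly simple. The buddy allocator keeps free physical memory in per-order free lists (each zone has a free_area array with one entry per order). It hands out blocks of 2^order contiguous pages, where order is the power of two requested.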
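The order arithmetic can be sketched as follows, assuming 4 KB pages. sketch_get_order mirrors what the kernel's get_order() computes. buddy_pfn shows the standard XOR trick the buddy allocator uses to find the block it can merge with. The function names are illustrative.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12 /* assume 4 KB pages */

/* Smallest order such that (PAGE_SIZE << order) >= size, the same
 * result the kernel's get_order() gives. size must be non-zero. */
static unsigned int sketch_get_order(unsigned long size)
{
	unsigned int order = 0;

	size = (size - 1) >> SKETCH_PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

/* Number of pages in a block of the given order. */
static unsigned long pages_in_order(unsigned int order)
{
	return 1UL << order;
}

/* A block and its buddy differ only in bit `order` of the pfn. */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}
```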
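File descriptor (fd): after ION allocates memory, it ultimately returns an fd. The fd is passed to other processes via binder and then mapped into each process's virtual address space. An fd number is only meaningful inside one process, so Android uses binder to pass it to another process. Binder first allocates a new fd in the target process. It then installs into that fd the same kernel struct file that the sender's fd refers to.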
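Binder itself cannot be shown in a few lines. The same idea, one process handing another a new fd that refers to the same struct file, can be seen with plain Linux fd passing over a UNIX socket (SCM_RIGHTS). This is only an analogy, not binder's mechanism. send_fd and recv_fd are hypothetical helpers, and for brevity both socket ends live in one process.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a UNIX socket; the kernel installs a new fd in the
 * receiver that refers to the same struct file. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);
	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the new fd number or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} ctrl;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);
	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The received number differs from the original, yet writing through it reaches the same pipe, because both fds point at one struct file.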
ION allocates memory through an ioctl call after the device (/dev/ion) has been opened:
case ION_IOC_ALLOC:
{
int fd;
fd = ion_alloc_fd(data.allocation.len,
data.allocation.heap_id_mask,
data.allocation.flags);
if (fd < 0)
return fd;
data.allocation.fd = fd;
break;
}
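From user space, this ioctl path is reached roughly as shown below. This is a sketch under several assumptions:
- The struct layout and the ION_IOC_ALLOC definition follow the post-4.12 upstream uapi header (drivers/staging/android/uapi/ion.h) and may differ on vendor kernels.
- Heap IDs are device-specific.
- ion_alloc_user is a hypothetical helper.

In real code, include the uapi header that matches your kernel instead of redefining it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed layout of the post-4.12 upstream ION uapi; check your
 * kernel's header before relying on it. */
struct ion_allocation_data {
	uint64_t len;
	uint32_t heap_id_mask;
	uint32_t flags;
	uint32_t fd;
	uint32_t unused;
};

#define ION_IOC_MAGIC 'I'
#define ION_IOC_ALLOC _IOWR(ION_IOC_MAGIC, 0, struct ion_allocation_data)

/* Hypothetical helper: allocate `len` bytes from the heaps selected by
 * heap_id_mask and return the dma-buf fd, or -1 on failure. */
static int ion_alloc_user(const char *dev, uint64_t len,
			  uint32_t heap_id_mask, uint32_t flags)
{
	struct ion_allocation_data data = {
		.len = len,
		.heap_id_mask = heap_id_mask,
		.flags = flags,
	};
	int ion_fd = open(dev, O_RDONLY | O_CLOEXEC);
	int ret;

	if (ion_fd < 0)
		return -1;
	ret = ioctl(ion_fd, ION_IOC_ALLOC, &data);
	close(ion_fd); /* the dma-buf fd keeps the buffer alive */
	if (ret < 0)
		return -1;
	return (int)data.fd;
}
```

Since 4.12 there are no ION client handles, and the returned fd is a dma-buf fd that owns the buffer. That is why the sketch closes the /dev/ion fd right after the ioctl.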
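Here ion_alloc_fd is called to produce an fd. It takes three parameters:
- len: the buffer length.
- heap_id_mask: a bitmask selecting which heaps may be used. ION has many heap types; this post only covers the system heap, because the code of the other heaps is harder to follow.
- flags: many buffer attributes are specified through these bits, for example whether the memory is for the camera or whether a secure allocation is needed.

ion_alloc_fd is implemented as follows: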
int ion_alloc_fd(size_t len, unsigned int heap_id_mask, unsigned int flags)
{
int fd;
struct dma_buf *dmabuf;
dmabuf = ion_alloc_dmabuf(len, heap_id_mask, flags);
if (IS_ERR(dmabuf)) {
return PTR_ERR(dmabuf);
}
fd = dma_buf_fd(dmabuf, O_CLOEXEC);
if (fd < 0)
dma_buf_put(dmabuf);
return fd;
}
It first creates a dma_buf and then converts that dma-buf into an fd. struct dma_buf is defined in kernel\msm-4.14\include\linux\dma-buf.h, and the kernel documents each field:
/**
* struct dma_buf - shared buffer object
* @size: size of the buffer
* @file: file pointer used for sharing buffers across, and for refcounting.
* @attachments: list of dma_buf_attachment that denotes all devices attached.
* @ops: dma_buf_ops associated with this buffer object.
* @lock: used internally to serialize list manipulation, attach/detach and vmap/unmap
* @vmapping_counter: used internally to refcnt the vmaps
* @vmap_ptr: the current vmap ptr if vmapping_counter > 0
* @exp_name: name of the exporter; useful for debugging.
* @name: unique name for the buffer
* @ktime: time (in jiffies) at which the buffer was born
* @owner: pointer to exporter module; used for refcounting when exporter is a
* kernel module.
* @list_node: node for dma_buf accounting and debugging.
* @priv: exporter specific private data for this buffer object.
* @resv: reservation object linked to this dma-buf
* @poll: for userspace poll support
* @cb_excl: for userspace poll support
* @cb_shared: for userspace poll support
*
* This represents a shared buffer, created by calling dma_buf_export(). The
* userspace representation is a normal file descriptor, which can be created by
* calling dma_buf_fd().
*
* Shared dma buffers are reference counted using dma_buf_put() and
* get_dma_buf().
*
* Device DMA access is handled by the separate &struct dma_buf_attachment.
*/
struct dma_buf {
size_t size;
struct file *file;
struct list_head attachments;
const struct dma_buf_ops *ops;
struct mutex lock;
unsigned vmapping_counter;
void *vmap_ptr;
const char *exp_name;
char *name;
ktime_t ktime;
struct module *owner;
struct list_head list_node;
void *priv;
struct reservation_object *resv;
/* poll support */
wait_queue_head_t poll;
struct dma_buf_poll_cb_t {
struct dma_fence_cb cb;
wait_queue_head_t *poll;
unsigned long active;
} cb_excl, cb_shared;
struct list_head refs;
};
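struct file is the important member here, because it is what the fd is tied to: an fd is really a reference to a struct file. Several fds, in one process or in several, can refer to the same struct file. The same file can also be mmap'ed more than once, which is how one buffer can end up mapped at several virtual addresses.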
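You can check from user space that one struct file can back several mappings. In this sketch a temporary file stands in for the dma-buf fd, an assumption made only so the code runs anywhere. map_twice is a hypothetical helper that mmaps the same fd twice. Each mmap() creates its own vm_area_struct, so the two addresses differ while the data is shared.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first `len` bytes of `fd` twice with MAP_SHARED. */
static int map_twice(int fd, size_t len, char **a, char **b)
{
	*a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*a == MAP_FAILED)
		return -1;
	*b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (*b == MAP_FAILED) {
		munmap(*a, len);
		return -1;
	}
	return 0;
}
```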
ion_alloc_dmabuf is in kernel\msm-4.14\drivers\staging\android\ion\ion.c:
struct dma_buf *ion_alloc_dmabuf(size_t len, unsigned int heap_id_mask,
unsigned int flags)
{
struct ion_device *dev = internal_dev;
struct ion_buffer *buffer = NULL;
struct ion_heap *heap;
DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
struct dma_buf *dmabuf;
char task_comm[TASK_COMM_LEN];
pr_debug("%s: len %zu heap_id_mask %u flags %x\n", __func__,
len, heap_id_mask, flags);
/*
* traverse the list of heaps available in this system in priority
* order. If the heap type is supported by the client, and matches the
* request of the caller allocate from it. Repeat until allocate has
* succeeded or all heaps have been tried
*/
len = PAGE_ALIGN(len);
if (!len)
return ERR_PTR(-EINVAL);
down_read(&dev->lock);
plist_for_each_entry(heap, &dev->heaps, node) {
/* if the caller didn't specify this heap id */
if (!((1 << heap->id) & heap_id_mask))
continue;
buffer = ion_buffer_create(heap, dev, len, flags);
if (!IS_ERR(buffer) || PTR_ERR(buffer) == -EINTR)
break;
}
up_read(&dev->lock);
if (!buffer)
return ERR_PTR(-ENODEV);
if (IS_ERR(buffer))
return ERR_CAST(buffer);
get_task_comm(task_comm, current->group_leader);
exp_info.ops = &dma_buf_ops;
exp_info.size = buffer->size;
exp_info.flags = O_RDWR;
exp_info.priv = buffer;
exp_info.exp_name = kasprintf(GFP_KERNEL, "%s-%s-%d-%s", KBUILD_MODNAME,
heap->name, current->tgid, task_comm);
dmabuf = dma_buf_export(&exp_info);
if (IS_ERR(dmabuf)) {
_ion_buffer_destroy(buffer);
kfree(exp_info.exp_name);
}
return dmabuf;
}
The PAGE_ALIGN macro rounds the length up to a page boundary. Because pages are 4 KB, a 5 KB request becomes 8 KB here. Rounding down instead would turn 5 KB into 4 KB.
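When the page size is a power of two, this rounding is plain bit arithmetic. The macros below are renamed copies for illustration (SKETCH_ prefix, 4 KB pages assumed), not the kernel headers.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Round up to the next multiple of the page size, as PAGE_ALIGN does. */
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Rounding down just clears the low bits. */
#define SKETCH_PAGE_ALIGN_DOWN(x) ((x) & ~(SKETCH_PAGE_SIZE - 1))
```

PAGE_ALIGN(0) is 0. A length within one page of the maximum value also aligns to 0, because the addition wraps around. That is why ion_alloc_dmabuf checks !len after aligning.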
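plist_for_each_entry walks all registered heaps in priority order. It skips every heap whose id bit is not set in heap_id_mask and calls ion_buffer_create on each matching heap. The walk stops when an allocation succeeds or is interrupted with -EINTR. Here we assume the matching heap is the system heap.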
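The selection logic can be modelled in user space as shown below. The heap array is assumed to be already sorted by priority, which the kernel gets from the plist. sketch_heap, pick_heap and the always_* helpers are hypothetical. The model mirrors ion_alloc_dmabuf in three ways:
- A mask that matches no heap yields -ENODEV.
- Otherwise, the error of the last heap tried is returned.
- -EINTR stops the search early.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_heap {
	unsigned int id;
	int (*try_alloc)(size_t len); /* 0 or a negative errno */
};

/* Returns the index of the heap that satisfied the request, or -1,
 * with the resulting error code in *err. */
static int pick_heap(const struct sketch_heap *heaps, size_t n,
		     unsigned int heap_id_mask, size_t len, int *err)
{
	size_t i;

	*err = -ENODEV; /* stays so if no heap matches the mask */
	for (i = 0; i < n; i++) {
		if (!((1u << heaps[i].id) & heap_id_mask))
			continue; /* caller did not ask for this heap */
		*err = heaps[i].try_alloc(len);
		if (*err == 0)
			return (int)i;
		if (*err == -EINTR)
			break;
	}
	return -1;
}

static int always_fail(size_t len) { (void)len; return -ENOMEM; }
static int always_ok(size_t len) { (void)len; return 0; }
```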
To inspect the system heap on a phone, open an adb shell, go to /sys/kernel/debug/ion/heaps and run cat system:
uncached pool = 349003776 cached pool = 1063071744 secure pool = 0
pool total (uncached + cached + secure) = 1412075520
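The system heap has three pools (uncached, cached and secure) that hold physical pages for reuse, and you can add pools of your own. The values are in bytes: in this sample, about 333 MiB uncached plus about 1014 MiB cached.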
Once a matching heap is found, ion_buffer_create is called to create an ion_buffer. struct ion_buffer is defined in kernel\msm-4.14\drivers\staging\android\ion\ion.h:
/**
* struct ion_buffer - metadata for a particular buffer
* @ref: reference count
* @node: node in the ion_device buffers tree
* @dev: back pointer to the ion_device
* @heap: back pointer to the heap the buffer came from
* @flags: buffer specific flags
* @private_flags: internal buffer specific flags
* @size: size of the buffer
* @priv_virt: private data to the buffer representable as
* a void *
* @lock: protects the buffers cnt fields
* @kmap_cnt: number of times the buffer is mapped to the kernel
* @vaddr: the kernel mapping if kmap_cnt is not zero
* @sg_table: the sg table for the buffer if dmap_cnt is not zero
* @vmas: list of vma's mapping this buffer
*/
struct ion_buffer {
union {
struct rb_node node;
struct list_head list;
};
struct ion_device *dev;
struct ion_heap *heap;
unsigned long flags;
unsigned long private_flags;
size_t size;
void *priv_virt;
/* Protect ion buffer */
struct mutex lock;
int kmap_cnt;
void *vaddr;
struct sg_table *sg_table;
struct list_head attachments;
struct list_head vmas;
};
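The struct sg_table introduced earlier lives in ion_buffer and holds the buffer's scatter list of physical pages. ion_buffer_create is implemented as follows: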
/* this function should only be called while dev->lock is held */
static struct ion_buffer *ion_buffer_create(struct ion_heap *heap,
struct ion_device *dev,
unsigned long len,
unsigned long flags)
{
struct ion_buffer *buffer;
struct sg_table *table;
int ret;
buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
if (!buffer)
return ERR_PTR(-ENOMEM);
buffer->heap = heap;
buffer->flags = flags;
ret = heap->ops->allocate(heap, buffer, len, flags);
if (ret) {
if (!(heap->flags & ION_HEAP_FLAG_DEFER_FREE))
goto err2;
if (ret == -EINTR)
goto err2;
ion_heap_freelist_drain(heap, 0);
ret = heap->ops->allocate(heap, buffer, len, flags);
if (ret)
goto err2;
}
if (buffer->sg_table == NULL) {
WARN_ONCE(1, "This heap needs to set the sgtable");
ret = -EINVAL;
goto err1;
}
spin_lock(&heap->stat_lock);
heap->num_of_buffers++;
heap->num_of_alloc_bytes += len;
if (heap->num_of_alloc_bytes > heap->alloc_bytes_wm)
heap->alloc_bytes_wm = heap->num_of_alloc_bytes;
spin_unlock(&heap->stat_lock);
table = buffer->sg_table;
buffer->dev = dev;
buffer->size = len;
buffer->dev = dev;
buffer->size = len;
INIT_LIST_HEAD(&buffer->attachments);
INIT_LIST_HEAD(&buffer->vmas);
mutex_init(&buffer->lock);
if (IS_ENABLED(CONFIG_ION_FORCE_DMA_SYNC)) {
int i;
struct scatterlist *sg;
/*
* this will set up dma addresses for the sglist -- it is not
* technically correct as per the dma api -- a specific
* device isn't really taking ownership here. However, in
* practice on our systems the only dma_address space is
* physical addresses.
*/
for_each_sg(table->sgl, sg, table->nents, i) {
sg_dma_address(sg) = sg_phys(sg);
sg_dma_len(sg) = sg->length;
}
}
mutex_lock(&dev->buffer_lock);
ion_buffer_add(dev, buffer);
mutex_unlock(&dev->buffer_lock);
atomic_long_add(len, &heap->total_allocated);
return buffer;
err1:
heap->ops->free(buffer);
err2:
kfree(buffer);
return ERR_PTR(ret);
}
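The key line in this function is ret = heap->ops->allocate(heap, buffer, len, flags);, which calls the heap's own allocation function. The allocation can be retried once. If the first attempt fails and the heap uses deferred freeing (ION_HEAP_FLAG_DEFER_FREE), ion_heap_freelist_drain first releases the buffers still waiting to be freed, then allocate is called again. The rest of the code is bookkeeping: heap statistics, list initialization, recording the sg_table, and adding the buffer to the device's buffer tree.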
The system heap's ops, including its allocate function, are in kernel\msm-4.14\drivers\staging\android\ion\ion_system_heap.c:
static struct ion_heap_ops system_heap_ops = {
.allocate = ion_system_heap_allocate,
.free = ion_system_heap_free,
.map_kernel = ion_heap_map_kernel,
.unmap_kernel = ion_heap_unmap_kernel,
.map_user = ion_heap_map_user,
.shrink = ion_system_heap_shrink,
};
allocate is implemented by ion_system_heap_allocate:
static int ion_system_heap_allocate(struct ion_heap *heap,
struct ion_buffer *buffer,
unsigned long size,
unsigned long flags)
{
struct ion_system_heap *sys_heap = container_of(heap,
struct ion_system_heap,
heap);
struct sg_table *table;
struct sg_table table_sync = {0};
struct scatterlist *sg;
struct scatterlist *sg_sync;
int ret = -ENOMEM;
struct list_head pages;
struct list_head pages_from_pool;
struct page_info *info, *tmp_info;
int i = 0;
unsigned int nents_sync = 0;
unsigned long size_remaining = PAGE_ALIGN(size);
unsigned int max_order = orders[0];
struct pages_mem data;
unsigned int sz;
int vmid = get_secure_vmid(buffer->flags);
if (size / PAGE_SIZE > totalram_pages / 2)
return -ENOMEM;
if (ion_heap_is_system_heap_type(buffer->heap->type) &&
is_secure_vmid_valid(vmid)) {
pr_info("%s: System heap doesn't support secure allocations\n",
__func__);
return -EINVAL;
}
data.size = 0;
INIT_LIST_HEAD(&pages);
INIT_LIST_HEAD(&pages_from_pool);
while (size_remaining > 0) {
if (is_secure_vmid_valid(vmid))
info = alloc_from_pool_preferred(
sys_heap, buffer, size_remaining,
max_order);
else
info = alloc_largest_available(
sys_heap, buffer, size_remaining,
max_order);
if (IS_ERR(info)) {
ret = PTR_ERR(info);
goto err;
}
sz = (1 << info->order) * PAGE_SIZE;
if (info->from_pool) {
list_add_tail(&info->list, &pages_from_pool);
} else {
list_add_tail(&info->list, &pages);
data.size += sz;
++nents_sync;
}
size_remaining -= sz;
max_order = info->order;
i++;
}
ret = ion_heap_alloc_pages_mem(&data);
if (ret)
goto err;
table = kzalloc(sizeof(*table), GFP_KERNEL);
if (!table) {
ret = -ENOMEM;
goto err_free_data_pages;
}
ret = sg_alloc_table(table, i, GFP_KERNEL);
if (ret)
goto err1;
if (nents_sync) {
ret = sg_alloc_table(&table_sync, nents_sync, GFP_KERNEL);
if (ret)
goto err_free_sg;
}
i = 0;
sg = table->sgl;
sg_sync = table_sync.sgl;
/*
* We now have two separate lists. One list contains pages from the
* pool and the other pages from buddy. We want to merge these
* together while preserving the ordering of the pages (higher order
* first).
*/
do {
info = list_first_entry_or_null(&pages, struct page_info, list);
tmp_info = list_first_entry_or_null(&pages_from_pool,
struct page_info, list);
if (info && tmp_info) {
if (info->order >= tmp_info->order) {
i = process_info(info, sg, sg_sync, &data, i);
sg_sync = sg_next(sg_sync);
} else {
i = process_info(tmp_info, sg, 0, 0, i);
}
} else if (info) {
i = process_info(info, sg, sg_sync, &data, i);
sg_sync = sg_next(sg_sync);
} else if (tmp_info) {
i = process_info(tmp_info, sg, 0, 0, i);
}
sg = sg_next(sg);
} while (sg);
if (nents_sync) {
if (vmid > 0) {
ret = ion_hyp_assign_sg(&table_sync, &vmid, 1, true);
if (ret)
goto err_free_sg2;
}
}
buffer->sg_table = table;
if (nents_sync)
sg_free_table(&table_sync);
ion_heap_free_pages_mem(&data);
return 0;
err_free_sg2:
/* We failed to zero buffers. Bypass pool */
buffer->private_flags |= ION_PRIV_FLAG_SHRINKER_FREE;
if (vmid > 0)
ion_hyp_unassign_sg(table, &vmid, 1, true, false);
for_each_sg(table->sgl, sg, table->nents, i)
free_buffer_page(sys_heap, buffer, sg_page(sg),
get_order(sg->length));
if (nents_sync)
sg_free_table(&table_sync);
err_free_sg:
sg_free_table(table);
err1:
kfree(table);
err_free_data_pages:
ion_heap_free_pages_mem(&data);
err:
list_for_each_entry_safe(info, tmp_info, &pages, list) {
free_buffer_page(sys_heap, buffer, info->page, info->order);
kfree(info);
}
list_for_each_entry_safe(info, tmp_info, &pages_from_pool, list) {
free_buffer_page(sys_heap, buffer, info->page, info->order);
kfree(info);
}
return ret;
}
ion_system_heap_allocate is fairly long. In my view, its key part is this while loop:
while (size_remaining > 0) {
if (is_secure_vmid_valid(vmid))
info = alloc_from_pool_preferred(
sys_heap, buffer, size_remaining,
max_order);
else
info = alloc_largest_available(
sys_heap, buffer, size_remaining,
max_order);
if (IS_ERR(info)) {
ret = PTR_ERR(info);
goto err;
}
sz = (1 << info->order) * PAGE_SIZE;
if (info->from_pool) {
list_add_tail(&info->list, &pages_from_pool);
} else {
list_add_tail(&info->list, &pages);
data.size += sz;
++nents_sync;
}
size_remaining -= sz;
max_order = info->order;
i++;
}
ret = ion_heap_alloc_pages_mem(&data);
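size_remaining starts as the page-aligned size: unsigned long size_remaining = PAGE_ALIGN(size);
Each iteration of the loop works like this:
- It takes a physical block from a pool or from the buddy allocator.
- size_remaining is reduced by the size just obtained.
- max_order is lowered to the order just used, so block sizes never increase from one iteration to the next.

The loop repeats until size_remaining reaches 0, which means the whole buffer has been obtained. When buffers are first allocated the pools are still empty, so the pages come from the buddy allocator through the Linux page allocation interface.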
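The loop's chunking policy can be simulated on its own. This sketch assumes orders[] = {4, 0}, 4 KB pages, and that every order is always available. In the kernel, a pool or the buddy allocator may fail to supply an order, and the loop then drops to a smaller one. plan_chunks is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Assumed to match the common configuration discussed in this post. */
static const unsigned int orders[] = { 4, 0 };
#define NUM_ORDERS (sizeof(orders) / sizeof(orders[0]))

static unsigned long order_to_size(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Take the largest order that fits in what is left and is not larger
 * than the previous block, until the page-aligned size is used up.
 * Writes the chosen orders to out[] and returns the count (0 if out[]
 * is too small). */
static size_t plan_chunks(unsigned long size, unsigned int *out, size_t max)
{
	unsigned long remaining =
		(size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
	unsigned int max_order = orders[0];
	size_t n = 0, i;

	while (remaining > 0) {
		for (i = 0; i < NUM_ORDERS; i++) {
			if (remaining < order_to_size(orders[i]))
				continue; /* block would be too big */
			if (max_order < orders[i])
				continue; /* never grow again */
			break;
		}
		if (i == NUM_ORDERS || n == max)
			return 0;
		out[n++] = orders[i];
		remaining -= order_to_size(orders[i]);
		max_order = orders[i];
	}
	return n;
}
```

For example, 68 KB becomes one 64 KB block followed by one 4 KB block, and a 12 MB request needs 192 blocks of 64 KB.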
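Inside the loop, is_secure_vmid_valid decides which allocation function is called. alloc_from_pool_preferred mainly allocates from the secure pools. This branch is only reached by heaps of other types that reuse this allocate function. Earlier in the function, a buffer whose heap is the plain system heap type is rejected with -EINVAL if it requests a secure VMID.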
static struct page_info *alloc_from_pool_preferred(
struct ion_system_heap *heap, struct ion_buffer *buffer,
unsigned long size, unsigned int max_order)
{
struct page *page;
struct page_info *info;
int i;
if (buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)
goto force_alloc;
info = kmalloc(sizeof(*info), GFP_KERNEL);
if (!info)
return ERR_PTR(-ENOMEM);
for (i = 0; i < NUM_ORDERS; i++) {
if (size < order_to_size(orders[i]))
continue;
if (max_order < orders[i])
continue;
page = alloc_from_secure_pool_order(heap, buffer, orders[i]);
if (IS_ERR(page))
continue;
info->page = page;
info->order = orders[i];
info->from_pool = true;
INIT_LIST_HEAD(&info->list);
return info;
}
page = split_page_from_secure_pool(heap, buffer);
if (!IS_ERR(page)) {
info->page = page;
info->order = 0;
info->from_pool = true;
INIT_LIST_HEAD(&info->list);
return info;
}
kfree(info);
force_alloc:
return alloc_largest_available(heap, buffer, size, max_order);
}
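ION_FLAG_POOL_FORCE_ALLOC decides whether allocation is forced. If it is, alloc_largest_available is called, which ends up allocating physical pages directly from the buddy allocator through the Linux page allocator. For an introduction to struct page, see the article "Linux 物理内存描述" (Linux physical memory description).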
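The core of alloc_from_pool_preferred is the for loop, which looks for a suitable block size to allocate. The buddy allocator keeps free blocks of 2^order pages on per-order free lists, and the pools follow the same idea. The difference is that the pools are kept in arrays indexed by order, usually only for order 0 (2^0 pages) and order 4 (2^4 pages). The definition is in kernel\msm-4.14\drivers\staging\android\ion\ion_system_heap.h: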
#ifndef CONFIG_ALLOC_BUFFERS_IN_4K_CHUNKS
#if defined(CONFIG_IOMMU_IO_PGTABLE_ARMV7S)
static const unsigned int orders[] = {8, 4, 0};
#else
static const unsigned int orders[] = {4, 0};
#endif
#else
static const unsigned int orders[] = {0};
#endif
#define NUM_ORDERS ARRAY_SIZE(orders)
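From my testing, the phone currently uses orders[] = {4, 0}. That means each physical block requested is either 4 KB (order 0) or 64 KB (order 4).
Back to the for loop in alloc_from_pool_preferred, assuming orders[] = {4, 0}: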
static inline unsigned int order_to_size(int order)
{
return PAGE_SIZE << order;
}
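PAGE_SIZE is the page size, 4 KB by default in most configurations (ARMv8 supports 4 KB, 16 KB and 64 KB pages). With 4 KB pages, the first entry, order 4, is 2^4 = 16 pages, i.e. 64 KB.
The check if (size < order_to_size(orders[i])) asks whether the remaining size is smaller than 64 KB. If it is, this order is skipped, because the order-4 pool only holds physically contiguous 64 KB blocks. Serving a smaller request from it would require splitting, and allocation always looks for the best-fitting size instead, so the loop continues with the next order. After 64 KB comes 4 KB. Because the size was rounded up to whole pages, nothing smaller than one 4 KB page can remain.
The second check, if (max_order < orders[i]), skips orders larger than the block obtained in the previous iteration.
The for loop can end without finding a block in two ways:
- orders[] holds other values whose smallest order is not 0, for example {8, 4, 1}. The loop still walks all of them, but a remainder of a single page can never be matched by any order.
- More commonly, the secure pool has no free block of any suitable order. alloc_from_secure_pool_order only takes blocks already in the pool and never falls back to the buddy allocator.

In either case, the code leaves the for loop and calls split_page_from_secure_pool to split a larger block from the pool into pieces that fit.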
struct page *split_page_from_secure_pool(struct ion_system_heap *heap,
struct ion_buffer *buffer)
{
int i, j;
struct page *page;
unsigned int order;
mutex_lock(&heap->split_page_mutex);
/*
* Someone may have just split a page and returned the unused portion
* back to the pool, so try allocating from the pool one more time
* before splitting. We want to maintain large pages sizes when
* possible.
*/
page = alloc_from_secure_pool_order(heap, buffer, 0);
if (!IS_ERR(page))
goto got_page;
for (i = NUM_ORDERS - 2; i >= 0; i--) {
order = orders[i];
page = alloc_from_secure_pool_order(heap, buffer, order);
if (IS_ERR(page))
continue;
split_page(page, order);
break;
}
/*
* Return the remaining order-0 pages to the pool.
* SetPagePrivate flag to mark memory as secure.
*/
if (!IS_ERR(page)) {
for (j = 1; j < (1 << order); j++) {
SetPagePrivate(page + j);
free_buffer_page(heap, buffer, page + j, 0);
}
}
got_page:
mutex_unlock(&heap->split_page_mutex);
return page;
}
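page = alloc_from_secure_pool_order(heap, buffer, 0); tries the order-0 pool once more, i.e. a single 4 KB page. As the comment explains, another thread may have just split a large block and returned the unused order-0 pages to the pool. Retrying first avoids splitting another block, which keeps large blocks intact.
Only if that fails does the for loop take a larger block to split. It walks i from NUM_ORDERS - 2 down to 0, i.e. from the smallest non-zero order up to the largest, so smaller blocks are split first.
split_page is in kernel\msm-4.14\mm\page_alloc.c. That file contains the core interfaces of the buddy allocator, and its allocation functions come up again later. I did not fully follow the kernel implementation of split_page. According to its kernel-doc, it takes a non-compound higher-order page and splits it into 1 << order independent order-0 pages, each of which must be freed individually.
split_page_from_secure_pool then returns pages 1 to (1 << order) - 1 to the order-0 pool, marking them as secure with SetPagePrivate. It hands back the first page, which the caller stores in info: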
page = split_page_from_secure_pool(heap, buffer);
if (!IS_ERR(page)) {
info->page = page;
info->order = 0;
info->from_pool = true;
INIT_LIST_HEAD(&info->list);
return info;
}
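What happens to the split block can be modelled with page frame numbers alone. In this hypothetical sketch, sketch_pool0 is a fixed-size stand-in for the order-0 secure pool, and split_and_return is an illustrative name. Splitting an order-4 block hands out its first page and puts the other 15 back in the pool.

```c
#include <assert.h>
#include <stddef.h>

#define POOL0_CAP 64

/* Hypothetical order-0 pool: a stack of page frame numbers. */
struct sketch_pool0 {
	unsigned long pfns[POOL0_CAP];
	size_t count;
};

/* The order-`order` block at `pfn` becomes 1 << order order-0 pages:
 * page 0 goes to the caller, pages 1..n-1 go back to the pool.
 * Returns the caller's pfn, or -1 (pool untouched) if it lacks room. */
static long split_and_return(unsigned long pfn, unsigned int order,
			     struct sketch_pool0 *pool)
{
	unsigned long n = 1UL << order;
	unsigned long j;

	if (pool->count + (n - 1) > POOL0_CAP)
		return -1;
	for (j = 1; j < n; j++)
		pool->pfns[pool->count++] = pfn + j;
	return (long)pfn;
}
```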
Back in alloc_from_pool_preferred, let's look at alloc_from_secure_pool_order:
struct page *alloc_from_secure_pool_order(struct ion_system_heap *heap,
struct ion_buffer *buffer,
unsigned long order)
{
int vmid = get_secure_vmid(buffer->flags);
struct ion_page_pool *pool;
if (!is_secure_vmid_valid(vmid))
return ERR_PTR(-EINVAL);
pool = heap->secure_pools[vmid][order_to_index(order)];
return ion_page_pool_alloc_pool_only(pool);
}
This function is simple: it looks up the pool for the given order and then calls:
/*
* Tries to allocate from only the specified Pool and returns NULL otherwise
*/
struct page *ion_page_pool_alloc_pool_only(struct ion_page_pool *pool)
{
struct page *page = NULL;
if (!pool)
return ERR_PTR(-EINVAL);
if (mutex_trylock(&pool->mutex)) {
if (pool->high_count)
page = ion_page_pool_remove(pool, true);
else if (pool->low_count)
page = ion_page_pool_remove(pool, false);
mutex_unlock(&pool->mutex);
}
if (!page)
return ERR_PTR(-ENOMEM);
return page;
}
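This takes a page out of the pool. Each pool keeps two lists, high and low:
- When a page is added to the pool, it goes onto the high list if it is a highmem page (PageHighMem) and onto the low list otherwise.
- Allocation prefers the high list.

Highmem is physical memory that is not permanently mapped into the kernel's address space. It exists on 32-bit systems; with the usual 3G/1G split, that is RAM beyond roughly 896 MB. arm64 has no highmem, so all pages end up on the low list there.
Note that the lock is taken with mutex_trylock. If the pool lock is contended, the function returns -ENOMEM instead of waiting.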
Back in the while loop of ion_system_heap_allocate: when the buffer is not allocated from a secure pool, alloc_largest_available is called:
static struct page_info *alloc_largest_available(struct ion_system_heap *heap,
struct ion_buffer *buffer,
unsigned long size,
unsigned int max_order)
{
struct page *page;
struct page_info *info;
int i;
bool from_pool;
info = kmalloc(sizeof(*info), GFP_KERNEL);
if (!info)
return ERR_PTR(-ENOMEM);
for (i = 0; i < NUM_ORDERS; i++) {
if (size < order_to_size(orders[i]))
continue;
if (max_order < orders[i])
continue;
from_pool = !(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC);
page = alloc_buffer_page(heap, buffer, orders[i], &from_pool);
if (IS_ERR(page))
continue;
info->page = page;
info->order = orders[i];
info->from_pool = from_pool;
INIT_LIST_HEAD(&info->list);
return info;
}
kfree(info);
return ERR_PTR(-ENOMEM);
}
Here, too, ION_FLAG_POOL_FORCE_ALLOC decides whether allocation is forced. If it is, from_pool is false and the pools are bypassed. alloc_buffer_page is then called:
static struct page *alloc_buffer_page(struct ion_system_heap *heap,
struct ion_buffer *buffer,
unsigned long order,
bool *from_pool)
{
bool cached = ion_buffer_cached(buffer);
struct page *page;
struct ion_page_pool *pool;
int vmid = get_secure_vmid(buffer->flags);
struct device *dev = heap->heap.priv;
if (vmid > 0)
pool = heap->secure_pools[vmid][order_to_index(order)];
else if (!cached)
pool = heap->uncached_pools[order_to_index(order)];
else
pool = heap->cached_pools[order_to_index(order)];
page = ion_page_pool_alloc(pool, from_pool);
if (IS_ERR(page))
return page;
if ((MAKE_ION_ALLOC_DMA_READY && vmid <= 0) || !(*from_pool))
ion_pages_sync_for_device(dev, page, PAGE_SIZE << order,
DMA_BIDIRECTIONAL);
return page;
}
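This picks the pool (secure, uncached or cached) according to the buffer's flags. It then calls ion_page_pool_alloc, passing along the pool and whether the pool may be used. Pages that did not come from a pool are also synced for the device with ion_pages_sync_for_device.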
struct page *ion_page_pool_alloc(struct ion_page_pool *pool, bool *from_pool)
{
struct page *page = NULL;
BUG_ON(!pool);
if (fatal_signal_pending(current))
return ERR_PTR(-EINTR);
if (*from_pool && mutex_trylock(&pool->mutex)) {
if (pool->high_count)
page = ion_page_pool_remove(pool, true);
else if (pool->low_count)
page = ion_page_pool_remove(pool, false);
mutex_unlock(&pool->mutex);
}
if (!page) {
page = ion_page_pool_alloc_pages(pool);
*from_pool = false;
}
if (!page)
return ERR_PTR(-ENOMEM);
return page;
}
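If the task has a fatal signal pending, ion_page_pool_alloc returns -EINTR immediately; this is the -EINTR that ion_alloc_dmabuf checks for. Otherwise, if taking a page from the pool fails or the pool must not be used, it calls ion_page_pool_alloc_pages and sets *from_pool to false. ion_page_pool_alloc_pages is just a call into the Linux buddy allocator: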
static void *ion_page_pool_alloc_pages(struct ion_page_pool *pool)
{
struct page *page = alloc_pages(pool->gfp_mask, pool->order);
return page;
}
Back to the while loop in ion_system_heap_allocate:
sz = (1 << info->order) * PAGE_SIZE;
if (info->from_pool) {
list_add_tail(&info->list, &pages_from_pool);
} else {
list_add_tail(&info->list, &pages);
data.size += sz;
++nents_sync;
}
size_remaining -= sz;
max_order = info->order;
i++;
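Each allocated block is recorded in a page_info. Depending on whether it came from a pool, it is put on one of two lists. info->order holds the exponent, so (1 << info->order) * PAGE_SIZE is the size of this block. That size is subtracted from the total (size_remaining -= sz;). After the loop, the two lists are merged into the sg_table, larger orders first.
The first time around, the pools contain no pages, so everything comes from the Linux buddy allocator. Pages only enter a pool when a buffer is freed.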
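The pool acts as a cache in front of the buddy allocator, which the following userspace sketch shows. malloc and free stand in for alloc_pages and __free_pages. The fixed capacity POOL_CAP is an assumption, added so that pool_free has a failure path like the one ion_page_pool_free handles. sketch_pool, pool_alloc and pool_free are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define POOL_CAP 16

/* Hypothetical pool for one block size. */
struct sketch_pool {
	void *items[POOL_CAP];
	size_t count;
	size_t block_size;
};

/* Try the pool first if allowed; if it is empty (or not allowed), fall
 * back to the "buddy allocator" and clear *from_pool. */
static void *pool_alloc(struct sketch_pool *pool, bool *from_pool)
{
	void *p = NULL;

	if (*from_pool && pool->count > 0)
		p = pool->items[--pool->count];
	if (!p) {
		p = malloc(pool->block_size);
		*from_pool = false;
	}
	return p;
}

/* Keep the block for reuse; if the pool cannot take it, release it. */
static void pool_free(struct sketch_pool *pool, void *p)
{
	if (pool->count < POOL_CAP)
		pool->items[pool->count++] = p;
	else
		free(p);
}
```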
Back in ion_alloc_fd: once the dma-buf exists, dma_buf_fd creates an fd for it:
int dma_buf_fd(struct dma_buf *dmabuf, int flags)
{
int fd;
if (!dmabuf || !dmabuf->file)
return -EINVAL;
fd = get_unused_fd_flags(flags);
if (fd < 0)
return fd;
fd_install(fd, dmabuf->file);
return fd;
}
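This calls the Linux function get_unused_fd_flags to obtain a free fd number. fd_install then binds the dma-buf's struct file to that fd.
The struct file itself was created earlier, in ion_alloc_dmabuf. After the buffer is obtained, dma_buf_export is called, and that function contains: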
file = anon_inode_getfile(bufname, &dma_buf_fops, dmabuf,
exp_info->flags);
if (IS_ERR(file)) {
ret = PTR_ERR(file);
goto err_dmabuf;
}
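This creates an anonymous-inode file whose file operations are dma_buf_fops and whose private data is the dma_buf. dma_buf_export also stores exp_info->ops, i.e. the dma_buf_ops shown earlier, in dmabuf->ops. Operations on the fd, such as mmap, go through dma_buf_fops, which forwards them to dma_buf_ops. So in effect the fd gives access to dma_buf_ops.
Freeing is handled by the system heap's free callback, ion_system_heap_free: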
void ion_system_heap_free(struct ion_buffer *buffer)
{
struct ion_heap *heap = buffer->heap;
struct ion_system_heap *sys_heap = container_of(heap,
struct ion_system_heap,
heap);
struct sg_table *table = buffer->sg_table;
struct scatterlist *sg;
int i;
int vmid = get_secure_vmid(buffer->flags);
if (!(buffer->private_flags & ION_PRIV_FLAG_SHRINKER_FREE) &&
!(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)) {
if (vmid < 0)
ion_heap_buffer_zero(buffer);
} else if (vmid > 0) {
if (ion_hyp_unassign_sg(table, &vmid, 1, true, false))
return;
}
for_each_sg(table->sgl, sg, table->nents, i)
free_buffer_page(sys_heap, buffer, sg_page(sg),
get_order(sg->length));
sg_free_table(table);
kfree(table);
}
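The first part of this function checks some conditions: it zeroes the buffer before the pages go back to a pool, or unassigns secure memory. The key part is for_each_sg, which releases each physical block in the scatter list through free_buffer_page.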
/*
* For secure pages that need to be freed and not added back to the pool; the
* hyp_unassign should be called before calling this function
*/
void free_buffer_page(struct ion_system_heap *heap,
struct ion_buffer *buffer, struct page *page,
unsigned int order)
{
bool cached = ion_buffer_cached(buffer);
int vmid = get_secure_vmid(buffer->flags);
if (!(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)) {
struct ion_page_pool *pool;
if (vmid > 0)
pool = heap->secure_pools[vmid][order_to_index(order)];
else if (cached)
pool = heap->cached_pools[order_to_index(order)];
else
pool = heap->uncached_pools[order_to_index(order)];
if (buffer->private_flags & ION_PRIV_FLAG_SHRINKER_FREE)
ion_page_pool_free_immediate(pool, page);
else
ion_page_pool_free(pool, page);
} else {
__free_pages(page, order);
}
}
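It selects the matching pool and, in the normal case, calls ion_page_pool_free. With ION_FLAG_POOL_FORCE_ALLOC the pages go straight back to the buddy allocator through __free_pages instead.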
void ion_page_pool_free(struct ion_page_pool *pool, struct page *page)
{
int ret;
ret = ion_page_pool_add(pool, page);
if (ret)
ion_page_pool_free_pages(pool, page);
}
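This puts the page into the pool; if adding it fails, the page is freed right away. When the system runs low on memory, however, the heap must give the pages held in its pools back to the buddy allocator. The shrink function does this reclaim: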
static int ion_system_heap_shrink(struct ion_heap *heap, gfp_t gfp_mask,
int nr_to_scan)
{
struct ion_system_heap *sys_heap;
int nr_total = 0;
int i, j, nr_freed = 0;
int only_scan = 0;
struct ion_page_pool *pool;
sys_heap = container_of(heap, struct ion_system_heap, heap);
if (!nr_to_scan)
only_scan = 1;
for (i = 0; i < NUM_ORDERS; i++) {
nr_freed = 0;
for (j = 0; j < VMID_LAST; j++) {
if (is_secure_vmid_valid(j))
nr_freed += ion_secure_page_pool_shrink(
sys_heap, j, i, nr_to_scan);
}
pool = sys_heap->uncached_pools[i];
nr_freed += ion_page_pool_shrink(pool, gfp_mask, nr_to_scan);
pool = sys_heap->cached_pools[i];
nr_freed += ion_page_pool_shrink(pool, gfp_mask, nr_to_scan);
nr_total += nr_freed;
if (!only_scan) {
nr_to_scan -= nr_freed;
/* shrink completed */
if (nr_to_scan <= 0)
break;
}
}
return nr_total;
}
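This function is also fairly simple. Apart from some bookkeeping, its main work is calling ion_page_pool_shrink on each pool. When nr_to_scan is 0, it only counts how many pages could be freed (only_scan). Otherwise, ion_page_pool_shrink takes pages out of the pool and calls: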
static void ion_page_pool_free_pages(struct ion_page_pool *pool,
struct page *page)
{
__free_pages(page, pool->order);
}
__free_pages is another Linux buddy allocator interface, in kernel\msm-4.14\mm\page_alloc.c.
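The budget handling in ion_system_heap_shrink can be modelled like this. The sketch flattens the secure, uncached and cached pools into one array and only tracks how many blocks each pool holds. sketch_pool, pool_shrink and heap_shrink are hypothetical. The model follows the kernel in three ways:
- Counts are in order-0 pages.
- A whole block is freed at a time, so the total can overshoot the request.
- nr_to_scan == 0 only reports.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_pool {
	unsigned int order; /* each cached block is 1 << order pages */
	size_t count;       /* blocks currently cached */
};

/* Free up to nr_to_scan order-0 pages from one pool; returns pages freed
 * (or, when nr_to_scan == 0, pages that could be freed). */
static size_t pool_shrink(struct sketch_pool *pool, size_t nr_to_scan)
{
	size_t per_block = (size_t)1 << pool->order;
	size_t freed = 0;

	if (nr_to_scan == 0)
		return pool->count * per_block;
	while (freed < nr_to_scan && pool->count > 0) {
		pool->count--; /* __free_pages() in the kernel */
		freed += per_block;
	}
	return freed;
}

/* Walk the pools, subtract what each freed from the budget and stop
 * once it is used up. */
static size_t heap_shrink(struct sketch_pool *pools, size_t n,
			  size_t nr_to_scan)
{
	size_t total = 0, i;
	int only_scan = nr_to_scan == 0;

	for (i = 0; i < n; i++) {
		size_t freed = pool_shrink(&pools[i], nr_to_scan);

		total += freed;
		if (!only_scan) {
			if (freed >= nr_to_scan)
				break; /* shrink completed */
			nr_to_scan -= freed;
		}
	}
	return total;
}
```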
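Memory mapping for the system heap is done by the dma-buf mmap op (ion_mmap), which calls ion_heap_map_user. That function takes a very important parameter, struct vm_area_struct, which the kernel uses to manage a process's virtual memory. Once you know what a few of its fields mean, the code below is easy to follow. The structure is defined in kernel\msm-4.14\include\linux\mm_types.h: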
/*
* This struct defines a memory VMM memory area. There is one of these
* per VM-area/task. A VM area is any part of the process virtual memory
* space that has a special rule for the page-fault handlers (ie a shared
* library, the executable area etc).
*/
struct vm_area_struct {
/* The first cache line has the info for VMA tree walking. */
unsigned long vm_start; /* Our start address within vm_mm. */
unsigned long vm_end; /* The first byte after our end address
within vm_mm. */
/* linked list of VM areas per task, sorted by address */
struct vm_area_struct *vm_next, *vm_prev;
struct rb_node vm_rb;
/*
* Largest free memory gap in bytes to the left of this VMA.
* Either between this VMA and vma->vm_prev, or between one of the
* VMAs below us in the VMA rbtree and its ->vm_prev. This helps
* get_unmapped_area find a free area of the right size.
*/
unsigned long rb_subtree_gap;
/* Second cache line starts here. */
struct mm_struct *vm_mm; /* The address space we belong to. */
pgprot_t vm_page_prot; /* Access permissions of this VMA. */
unsigned long vm_flags; /* Flags, see mm.h. */
/*
* For areas with an address space and backing store,
* linkage into the address_space->i_mmap interval tree.
*
* For private anonymous mappings, a pointer to a null terminated string
* in the user process containing the name given to the vma, or NULL
* if unnamed.
*/
union {
struct {
struct rb_node rb;
unsigned long rb_subtree_last;
} shared;
const char __user *anon_name;
};
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
* list, after a COW of one of the file pages. A MAP_SHARED vma
* can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
* or brk vma (with NULL file) can only be in an anon_vma list.
*/
struct list_head anon_vma_chain; /* Serialized by mmap_sem &
* page_table_lock */
struct anon_vma *anon_vma; /* Serialized by page_table_lock */
/* Function pointers to deal with this struct. */
const struct vm_operations_struct *vm_ops;
/* Information about our backing store: */
unsigned long vm_pgoff; /* Offset (within vm_file) in PAGE_SIZE
units */
struct file * vm_file; /* File we map to (can be NULL). */
void * vm_private_data; /* was vm_pte (shared mem) */
atomic_long_t swap_readahead_info;
#ifndef CONFIG_MMU
struct vm_region *vm_region; /* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
struct mempolicy *vm_policy; /* NUMA policy for the VMA */
#endif
struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
seqcount_t vm_sequence;
atomic_t vm_ref_count; /* see vma_get(), vma_put() */
#endif
} __randomize_layout;
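For what this structure does, see https://linux-kernel-labs.github.io/master/labs/memory_mapping.html. The structure is created when a user process calls mmap. It describes a region of the process's virtual address space: a contiguous range of virtual memory with the same access attributes, whose size is a whole number of pages. For the meaning of each member, see https://blog.csdn.net/ganggexiongqi/article/details/6746248
vm_start is the start of this virtual address range in the process.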
int ion_heap_map_user(struct ion_heap *heap, struct ion_buffer *buffer,
struct vm_area_struct *vma)
{
struct sg_table *table = buffer->sg_table;
unsigned long addr = vma->vm_start;
unsigned long offset = vma->vm_pgoff * PAGE_SIZE;
struct scatterlist *sg;
int i;
int ret;
for_each_sg(table->sgl, sg, table->nents, i) {
struct page *page = sg_page(sg);
unsigned long remainder = vma->vm_end - addr;
unsigned long len = sg->length;
if (offset >= sg->length) {
offset -= sg->length;
continue;
} else if (offset) {
page += offset / PAGE_SIZE;
len = sg->length - offset;
offset = 0;
}
len = min(len, remainder);
ret = remap_pfn_range(vma, addr, page_to_pfn(page), len,
vma->vm_page_prot);
if (ret)
return ret;
addr += len;
if (addr >= vma->vm_end)
return 0;
}
return 0;
}
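Back in the code, addr = vma->vm_start holds the start of the virtual range. vm_pgoff is the offset of this mapping within vm_file, in units of pages (offset itself is in bytes: vm_pgoff * PAGE_SIZE). For example, if a buffer has 64 pages and the user maps 10 pages starting at page 5, vm_pgoff is 5.
for_each_sg takes the physical blocks out of the scatter list one by one and maps them. Why check offset >= sg->length? Suppose the offset is 6 pages but the first sg entry only holds 5 pages. That whole entry lies before the start of the mapping, so the mapping must begin inside the next sg entry.
The following code handles this: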
if (offset >= sg->length) {
offset -= sg->length;
continue;
} else if (offset) {
page += offset / PAGE_SIZE;
len = sg->length - offset;
offset = 0;
}
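Suppose the next sg entry holds 3 pages. Walking through the loop:
1. On the first entry, offset -= sg->length gives 6 - 5 = 1 page.
2. On the next entry, page is advanced by 1 and len becomes 3 - 1 = 2 pages.
3. offset is set to 0, because the following entries do not need it.

len is also clamped with min(len, remainder), so the mapping never goes past vm_end. These 2 pages are then mapped with the kernel's remap_pfn_range function, which is well documented online. That completes the mapping into user space.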
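The offset arithmetic can be checked without a kernel. This sketch replays the loop of ion_heap_map_user on a list of sg entry lengths, assuming 4 KB pages. Instead of mapping anything, it records each remap_pfn_range call it would make. map_step and plan_map are hypothetical names. The example above has entries of 5 and 3 pages, vm_pgoff = 6 and a 2-page mapping. For that input the sketch produces a single step: entry 1, starting one page in, 2 pages long.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/* One remap_pfn_range() call: map `len` bytes starting `page_off`
 * pages into sg entry `entry`. */
struct map_step {
	size_t entry;
	unsigned long page_off;
	unsigned long len;
};

/* lengths[] are sg entry lengths in bytes, pgoff is vm_pgoff and
 * vma_len is vm_end - vm_start. Returns the number of steps. */
static size_t plan_map(const unsigned long *lengths, size_t nents,
		       unsigned long pgoff, unsigned long vma_len,
		       struct map_step *out, size_t max)
{
	unsigned long offset = pgoff * SKETCH_PAGE_SIZE;
	unsigned long mapped = 0;
	size_t i, n = 0;

	for (i = 0; i < nents && n < max; i++) {
		unsigned long page_off = 0;
		unsigned long len = lengths[i];
		unsigned long remainder = vma_len - mapped;

		if (offset >= lengths[i]) { /* entry lies before the offset */
			offset -= lengths[i];
			continue;
		} else if (offset) {        /* offset lands inside this entry */
			page_off = offset / SKETCH_PAGE_SIZE;
			len = lengths[i] - offset;
			offset = 0;
		}
		if (len > remainder)
			len = remainder;    /* never map past vm_end */
		out[n].entry = i;
		out[n].page_off = page_off;
		out[n].len = len;
		n++;
		mapped += len;
		if (mapped >= vma_len)
			break;
	}
	return n;
}
```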