Android ION: the system heap (personal understanding, not guaranteed to be fully correct)

陶博涉
2023-12-01

In the Linux 4.12 kernel the API of the ION driver was reworked, and some of the original ioctl commands no longer exist.

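Google's ION is, in my opinion, fairly large. Allocating from the system heap is only one of its allocation methods; others include allocating through CMA, and each method ends up calling a different Linux interface. This article only records my personal understanding of the system heap. The ION code lives under kernel\msm-4.14\drivers\staging\android\ion.

No matter which heap Android ION ends up using, the allocated buffer is wrapped in the Linux dma-buf structure. dma-buf is a kernel framework whose code I have not studied in detail. Judging from how ION uses it, every buffer that ION allocates is stored in a dma-buf, together with a set of operation functions (ops) that Google implemented for the buffer. Using the buffer therefore means calling the dma-buf ops indirectly, and those ops in turn call the ops bound to the heap. The system heap, for example, binds its allocate, map_user, free and shrink functions when the heap is created, and the dma-buf ops eventually call them.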

Google's implementation of the dma-buf ops can be found in ion.c:

static const struct dma_buf_ops dma_buf_ops = {
	.map_dma_buf = ion_map_dma_buf,
	.unmap_dma_buf = ion_unmap_dma_buf,
	.mmap = ion_mmap,
	.release = ion_dma_buf_release,
	.attach = ion_dma_buf_attach,
	.detach = ion_dma_buf_detatch,
	.begin_cpu_access = ion_dma_buf_begin_cpu_access,
	.end_cpu_access = ion_dma_buf_end_cpu_access,
	.begin_cpu_access_umapped = ion_dma_buf_begin_cpu_access_umapped,
	.end_cpu_access_umapped = ion_dma_buf_end_cpu_access_umapped,
	.begin_cpu_access_partial = ion_dma_buf_begin_cpu_access_partial,
	.end_cpu_access_partial = ion_dma_buf_end_cpu_access_partial,
	.map_atomic = ion_dma_buf_kmap,
	.unmap_atomic = ion_dma_buf_kunmap,
	.map = ion_dma_buf_kmap,
	.unmap = ion_dma_buf_kunmap,
	.vmap = ion_dma_buf_vmap,
	.vunmap = ion_dma_buf_vunmap,
	.get_flags = ion_dma_buf_get_flags,
};

ion.h defines the functions that a heap has to implement:

/**
 * struct ion_heap_ops - ops to operate on a given heap
 * @allocate:		allocate memory
 * @free:		free memory
 * @map_kernel		map memory to the kernel
 * @unmap_kernel	unmap memory to the kernel
 * @map_user		map memory to userspace
 *
 * allocate, phys, and map_user return 0 on success, -errno on error.
 * map_dma and map_kernel return pointer on success, ERR_PTR on
 * error. @free will be called with ION_PRIV_FLAG_SHRINKER_FREE set in
 * the buffer's private_flags when called from a shrinker. In that
 * case, the pages being free'd must be truly free'd back to the
 * system, not put in a page pool or otherwise cached.
 */
struct ion_heap_ops {
	int (*allocate)(struct ion_heap *heap,
			struct ion_buffer *buffer, unsigned long len,
			unsigned long flags);
	void (*free)(struct ion_buffer *buffer);
	void * (*map_kernel)(struct ion_heap *heap, struct ion_buffer *buffer);
	void (*unmap_kernel)(struct ion_heap *heap, struct ion_buffer *buffer);
	int (*map_user)(struct ion_heap *mapper, struct ion_buffer *buffer,
			struct vm_area_struct *vma);
	int (*shrink)(struct ion_heap *heap, gfp_t gfp_mask, int nr_to_scan);
};
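The two-level dispatch described above (dma-buf ops forwarding to the ops bound to the heap) can be sketched in plain userspace C. This is only a model: every `demo_` type and function below is a hypothetical stand-in, not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct demo_heap;

struct demo_buffer {
	struct demo_heap *heap;
	int mapped;
};

struct demo_heap_ops {
	int (*map_user)(struct demo_heap *heap, struct demo_buffer *buf);
};

struct demo_heap {
	const char *name;
	const struct demo_heap_ops *ops;
};

/* Heap-level implementation (plays the role of ion_heap_map_user). */
static int demo_system_map_user(struct demo_heap *heap, struct demo_buffer *buf)
{
	(void)heap;
	buf->mapped = 1;
	return 0;
}

static const struct demo_heap_ops demo_system_ops = {
	.map_user = demo_system_map_user,
};

/* dma-buf level entry point (plays the role of ion_mmap): forwards to the heap ops. */
static int demo_dmabuf_mmap(struct demo_buffer *buf)
{
	assert(buf && buf->heap && buf->heap->ops);
	if (!buf->heap->ops->map_user)
		return -1; /* this heap does not support user mappings */
	return buf->heap->ops->map_user(buf->heap, buf);
}

/* Wires a buffer to the demo system heap and maps it; returns 1 on success. */
static int demo_map_on_system_heap(void)
{
	struct demo_heap heap = { "demo_system", &demo_system_ops };
	struct demo_buffer buf = { &heap, 0 };
	int ret = demo_dmabuf_mmap(&buf);

	return ret == 0 && buf.mapped == 1;
}
```

The caller only ever sees the outer entry point; which heap function actually runs is decided by the ops table attached to the heap that owns the buffer.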

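Before going into how ION allocates memory, a few concepts are worth knowing. struct sg_table is the Linux structure that holds a scatter list of physical pages. For a detailed explanation I recommend the wowotech (蜗窝科技) article "Linux kernel scatterlist API介绍". Put simply, the structure records a list of physical memory chunks. The system heap does not hand out one physically contiguous region; the chunks may be scattered, as long as the virtual addresses are contiguous once mapped. For example, when the camera requests a 12 MB buffer, what comes out of the buddy system may be many 64 KB chunks. Each 64 KB chunk is physically contiguous internally, but the chunks are not contiguous with one another.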

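The buddy system: plenty of material is available online and the concept is fairly simple. The buddy system manages free physical memory with one free list per order, and an allocation of a given order returns 2^order physically contiguous pages.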
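To make the order arithmetic concrete, here is a minimal userspace sketch, assuming 4 KB pages. `demo_get_order` and `demo_order_to_size` are illustrative helpers written for this article, not the kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL /* assumption: 4 KB pages */

/* Smallest order such that (1 << order) pages cover `size` bytes. */
static unsigned int demo_get_order(size_t size)
{
	size_t pages;
	unsigned int order = 0;

	assert(size > 0);
	pages = (size + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Size in bytes of one block of the given order. */
static size_t demo_order_to_size(unsigned int order)
{
	return DEMO_PAGE_SIZE << order;
}
```

For instance, a 20 KB request needs 5 pages, so the smallest block that covers it is order 3 (8 pages, 32 KB), while a 64 KB request fits an order-4 block exactly.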

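File descriptor (fd): after ION allocates memory, what it finally returns is an fd. The fd is passed to other processes over binder and then mapped into each process's virtual address space. An fd is only valid inside one process, so handing it to another process relies on Android's binder mechanism. In short, binder first allocates a new fd in the target process and then binds it to the same kernel struct file that the sender's fd refers to.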

1. Memory allocation

ION allocates memory through an ioctl call issued after the device has been opened:

case ION_IOC_ALLOC:
	{
		int fd;

		fd = ion_alloc_fd(data.allocation.len,
				  data.allocation.heap_id_mask,
				  data.allocation.flags);
		if (fd < 0)
			return fd;

		data.allocation.fd = fd;

		break;
	}
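From user space, the request that reaches this case could look roughly like the sketch below. The structure layout and ioctl number are written from my reading of the upstream 4.12+ staging uapi header and are an assumption here; vendor trees such as msm-4.14 may differ, so check the header in your own tree. All `demo_`/`DEMO_` names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumed layout, modelled on the upstream staging uapi header; verify before use. */
struct demo_ion_allocation_data {
	uint64_t len;
	uint32_t heap_id_mask;
	uint32_t flags;
	uint32_t fd;
	uint32_t unused;
};

#define DEMO_ION_IOC_MAGIC 'I'
#define DEMO_ION_IOC_ALLOC \
	_IOWR(DEMO_ION_IOC_MAGIC, 0, struct demo_ion_allocation_data)

/* Returns a dma-buf fd on success, -1 on failure. */
static int demo_ion_alloc(const char *dev_path, uint64_t len,
			  uint32_t heap_id_mask, uint32_t flags)
{
	struct demo_ion_allocation_data data = {
		.len = len,
		.heap_id_mask = heap_id_mask,
		.flags = flags,
	};
	int ion_fd, ret;

	assert(dev_path);
	ion_fd = open(dev_path, O_RDONLY);
	if (ion_fd < 0)
		return -1;

	ret = ioctl(ion_fd, DEMO_ION_IOC_ALLOC, &data);
	close(ion_fd); /* the buffer fd is independent of the device fd */
	if (ret < 0)
		return -1;
	return (int)data.fd;
}
```

A caller would pass the device node (typically /dev/ion), the buffer length, a bit mask selecting the heap, and the flags, then mmap the returned fd. The heap id to use is device specific and is not shown here.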

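This calls ion_alloc_fd, which produces an fd. ion_alloc_fd takes three parameters. The first is the length of the buffer to allocate. The second selects the heap: ION has many heap types, and this article only covers the system heap (the code of the other heaps is harder to read). The third is a set of flags; many buffer attributes are decided from these flags, for example whether the memory is for the camera or whether a secure allocation is needed. ion_alloc_fd is implemented as follows: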

int ion_alloc_fd(size_t len, unsigned int heap_id_mask, unsigned int flags)
{
	int fd;
	struct dma_buf *dmabuf;

	dmabuf = ion_alloc_dmabuf(len, heap_id_mask, flags);
	if (IS_ERR(dmabuf)) {
		return PTR_ERR(dmabuf);
	}

	fd = dma_buf_fd(dmabuf, O_CLOEXEC);
	if (fd < 0)
		dma_buf_put(dmabuf);

	return fd;
}

It first creates a dma_buf and then converts that dma-buf into an fd. struct dma_buf is defined in kernel\msm-4.14\include\linux\dma-buf.h, and the kernel documents the meaning of each field:

/**
 * struct dma_buf - shared buffer object
 * @size: size of the buffer
 * @file: file pointer used for sharing buffers across, and for refcounting.
 * @attachments: list of dma_buf_attachment that denotes all devices attached.
 * @ops: dma_buf_ops associated with this buffer object.
 * @lock: used internally to serialize list manipulation, attach/detach and vmap/unmap
 * @vmapping_counter: used internally to refcnt the vmaps
 * @vmap_ptr: the current vmap ptr if vmapping_counter > 0
 * @exp_name: name of the exporter; useful for debugging.
 * @name: unique name for the buffer
 * @ktime: time (in jiffies) at which the buffer was born
 * @owner: pointer to exporter module; used for refcounting when exporter is a
 *         kernel module.
 * @list_node: node for dma_buf accounting and debugging.
 * @priv: exporter specific private data for this buffer object.
 * @resv: reservation object linked to this dma-buf
 * @poll: for userspace poll support
 * @cb_excl: for userspace poll support
 * @cb_shared: for userspace poll support
 *
 * This represents a shared buffer, created by calling dma_buf_export(). The
 * userspace representation is a normal file descriptor, which can be created by
 * calling dma_buf_fd().
 *
 * Shared dma buffers are reference counted using dma_buf_put() and
 * get_dma_buf().
 *
 * Device DMA access is handled by the separate &struct dma_buf_attachment.
 */
struct dma_buf {
	size_t size;
	struct file *file;
	struct list_head attachments;
	const struct dma_buf_ops *ops;
	struct mutex lock;
	unsigned vmapping_counter;
	void *vmap_ptr;
	const char *exp_name;
	char *name;
	ktime_t ktime;
	struct module *owner;
	struct list_head list_node;
	void *priv;
	struct reservation_object *resv;

	/* poll support */
	wait_queue_head_t poll;

	struct dma_buf_poll_cb_t {
		struct dma_fence_cb cb;
		wait_queue_head_t *poll;

		unsigned long active;
	} cb_excl, cb_shared;

	struct list_head refs;
};

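struct file is the important member, because it is what the fd will refer to: an fd is really a per-process reference to a struct file. Several fds can refer to the same struct file, which is also why the same buffer can be mmap'ed at several virtual addresses.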

ion_alloc_dmabuf is located in kernel\msm-4.14\drivers\staging\android\ion\ion.c:

struct dma_buf *ion_alloc_dmabuf(size_t len, unsigned int heap_id_mask,
				 unsigned int flags)
{
	struct ion_device *dev = internal_dev;
	struct ion_buffer *buffer = NULL;
	struct ion_heap *heap;
	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
	struct dma_buf *dmabuf;
	char task_comm[TASK_COMM_LEN];

	pr_debug("%s: len %zu heap_id_mask %u flags %x\n", __func__,
		 len, heap_id_mask, flags);
	/*
	 * traverse the list of heaps available in this system in priority
	 * order.  If the heap type is supported by the client, and matches the
	 * request of the caller allocate from it.  Repeat until allocate has
	 * succeeded or all heaps have been tried
	 */
	len = PAGE_ALIGN(len);

	if (!len)
		return ERR_PTR(-EINVAL);

	down_read(&dev->lock);
	plist_for_each_entry(heap, &dev->heaps, node) {
		/* if the caller didn't specify this heap id */
		if (!((1 << heap->id) & heap_id_mask))
			continue;
		buffer = ion_buffer_create(heap, dev, len, flags);
		if (!IS_ERR(buffer) || PTR_ERR(buffer) == -EINTR)
			break;
	}
	up_read(&dev->lock);

	if (!buffer)
		return ERR_PTR(-ENODEV);

	if (IS_ERR(buffer))
		return ERR_CAST(buffer);

	get_task_comm(task_comm, current->group_leader);

	exp_info.ops = &dma_buf_ops;
	exp_info.size = buffer->size;
	exp_info.flags = O_RDWR;
	exp_info.priv = buffer;
	exp_info.exp_name = kasprintf(GFP_KERNEL, "%s-%s-%d-%s", KBUILD_MODNAME,
				      heap->name, current->tgid, task_comm);

	dmabuf = dma_buf_export(&exp_info);
	if (IS_ERR(dmabuf)) {
		_ion_buffer_destroy(buffer);
		kfree(exp_info.exp_name);
	}

	return dmabuf;
}

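The PAGE_ALIGN macro rounds the length up to a page boundary. If the requested buffer size is 5 KB it becomes 8 KB here, because pages are 4 KB in size. The counterpart is rounding down, which would turn 5 KB into 4 KB.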
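The rounding can be reproduced with two small macros. This is a userspace sketch assuming 4 KB pages; the `DEMO_` macros are written for illustration and are not the kernel definitions.

```c
#include <assert.h>

#define DEMO_PAGE_SIZE 4096UL /* assumption: 4 KB pages */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* Round up / down to a multiple of the page size. */
#define DEMO_PAGE_ALIGN(x)      (((x) + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK)
#define DEMO_PAGE_ALIGN_DOWN(x) ((x) & DEMO_PAGE_MASK)

static unsigned long demo_align_up(unsigned long len)
{
	return DEMO_PAGE_ALIGN(len);
}

static unsigned long demo_align_down(unsigned long len)
{
	return DEMO_PAGE_ALIGN_DOWN(len);
}
```

Note that a length of 0 stays 0 after rounding up, which is why ion_alloc_dmabuf checks `if (!len)` after the alignment.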

plist_for_each_entry walks all registered heaps, looks for a heap whose id is set in heap_id_mask, and runs that heap's buffer allocation function. Here we assume the heap is the system heap.
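The selection logic can be modelled as a walk over a small table. The heap ids, names and the `can_allocate` field below are made up for illustration; in the kernel, the "can it allocate" part is the result of ion_buffer_create.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap table; ids and names are invented for this sketch. */
struct demo_heap {
	unsigned int id;
	const char *name;
	int can_allocate; /* models whether buffer creation would succeed */
};

static const struct demo_heap demo_heaps[] = {
	{ 2, "demo_cma", 0 },
	{ 1, "demo_system", 1 },
	{ 0, "demo_carveout", 1 },
};

/* Returns the id of the first selected heap that can allocate, or -1. */
static int demo_pick_heap(unsigned int heap_id_mask)
{
	size_t i;

	for (i = 0; i < sizeof(demo_heaps) / sizeof(demo_heaps[0]); i++) {
		const struct demo_heap *heap = &demo_heaps[i];

		assert(heap->id < 32); /* the mask has one bit per heap id */

		/* the caller didn't select this heap id */
		if (!((1u << heap->id) & heap_id_mask))
			continue;
		if (heap->can_allocate)
			return (int)heap->id;
	}
	return -1;
}
```

With the mask `(1u << 2) | (1u << 0)`, the first selected heap fails, so the walk moves on and the request is served by heap 0. This mirrors the "repeat until allocate has succeeded or all heaps have been tried" comment in the kernel code.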

To inspect the system heap on a phone, open adb shell and go to /sys/kernel/debug/ion/heaps,

then run cat system:

uncached pool = 349003776 cached pool = 1063071744 secure pool = 0
pool total (uncached + cached + secure) = 1412075520

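The output shows that the system heap has three kinds of pool (uncached, cached and secure). These are the pools Google set up for holding physical pages, and you can also add pools of your own.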

Once the matching heap is found, ion_buffer_create is called to create the ION buffer. struct ion_buffer is defined in kernel\msm-4.14\drivers\staging\android\ion\ion.h:

/**
 * struct ion_buffer - metadata for a particular buffer
 * @ref:		reference count
 * @node:		node in the ion_device buffers tree
 * @dev:		back pointer to the ion_device
 * @heap:		back pointer to the heap the buffer came from
 * @flags:		buffer specific flags
 * @private_flags:	internal buffer specific flags
 * @size:		size of the buffer
 * @priv_virt:		private data to the buffer representable as
 *			a void *
 * @lock:		protects the buffers cnt fields
 * @kmap_cnt:		number of times the buffer is mapped to the kernel
 * @vaddr:		the kernel mapping if kmap_cnt is not zero
 * @sg_table:		the sg table for the buffer if dmap_cnt is not zero
 * @vmas:		list of vma's mapping this buffer
 */
struct ion_buffer {
	union {
		struct rb_node node;
		struct list_head list;
	};
	struct ion_device *dev;
	struct ion_heap *heap;
	unsigned long flags;
	unsigned long private_flags;
	size_t size;
	void *priv_virt;
	/* Protect ion buffer */
	struct mutex lock;
	int kmap_cnt;
	void *vaddr;
	struct sg_table *sg_table;
	struct list_head attachments;
	struct list_head vmas;
};

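The struct sg_table introduced earlier is stored in the ION buffer and holds the scatter list of physical pages. ion_buffer_create itself looks like this: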

/* this function should only be called while dev->lock is held */
static struct ion_buffer *ion_buffer_create(struct ion_heap *heap,
					    struct ion_device *dev,
					    unsigned long len,
					    unsigned long flags)
{
	struct ion_buffer *buffer;
	struct sg_table *table;
	int ret;

	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
	if (!buffer)
		return ERR_PTR(-ENOMEM);

	buffer->heap = heap;
	buffer->flags = flags;

	ret = heap->ops->allocate(heap, buffer, len, flags);

	if (ret) {
		if (!(heap->flags & ION_HEAP_FLAG_DEFER_FREE))
			goto err2;

		if (ret == -EINTR)
			goto err2;

		ion_heap_freelist_drain(heap, 0);
		ret = heap->ops->allocate(heap, buffer, len, flags);
		if (ret)
			goto err2;
	}

	if (buffer->sg_table == NULL) {
		WARN_ONCE(1, "This heap needs to set the sgtable");
		ret = -EINVAL;
		goto err1;
	}

	spin_lock(&heap->stat_lock);
	heap->num_of_buffers++;
	heap->num_of_alloc_bytes += len;
	if (heap->num_of_alloc_bytes > heap->alloc_bytes_wm)
		heap->alloc_bytes_wm = heap->num_of_alloc_bytes;
	spin_unlock(&heap->stat_lock);

	table = buffer->sg_table;
	buffer->dev = dev;
	buffer->size = len;

	buffer->dev = dev;
	buffer->size = len;
	INIT_LIST_HEAD(&buffer->attachments);
	INIT_LIST_HEAD(&buffer->vmas);
	mutex_init(&buffer->lock);

	if (IS_ENABLED(CONFIG_ION_FORCE_DMA_SYNC)) {
		int i;
		struct scatterlist *sg;

		/*
		 * this will set up dma addresses for the sglist -- it is not
		 * technically correct as per the dma api -- a specific
		 * device isn't really taking ownership here.  However, in
		 * practice on our systems the only dma_address space is
		 * physical addresses.
		 */
		for_each_sg(table->sgl, sg, table->nents, i) {
			sg_dma_address(sg) = sg_phys(sg);
			sg_dma_len(sg) = sg->length;
		}
	}

	mutex_lock(&dev->buffer_lock);
	ion_buffer_add(dev, buffer);
	mutex_unlock(&dev->buffer_lock);
	atomic_long_add(len, &heap->total_allocated);
	return buffer;

err1:
	heap->ops->free(buffer);
err2:
	kfree(buffer);
	return ERR_PTR(ret);
}

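The key step in this function is ret = heap->ops->allocate(heap, buffer, len, flags);, which calls the allocation function of the selected heap. The rest of the code updates statistics, initializes lists and fills in the sg_table.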

The system heap's allocate function is in kernel\msm-4.14\drivers\staging\android\ion\ion_system_heap.c:

static struct ion_heap_ops system_heap_ops = {
	.allocate = ion_system_heap_allocate,
	.free = ion_system_heap_free,
	.map_kernel = ion_heap_map_kernel,
	.unmap_kernel = ion_heap_unmap_kernel,
	.map_user = ion_heap_map_user,
	.shrink = ion_system_heap_shrink,
};

allocate is implemented by ion_system_heap_allocate, whose source is:

static int ion_system_heap_allocate(struct ion_heap *heap,
				    struct ion_buffer *buffer,
				    unsigned long size,
				    unsigned long flags)
{
	struct ion_system_heap *sys_heap = container_of(heap,
							struct ion_system_heap,
							heap);
	struct sg_table *table;
	struct sg_table table_sync = {0};
	struct scatterlist *sg;
	struct scatterlist *sg_sync;
	int ret = -ENOMEM;
	struct list_head pages;
	struct list_head pages_from_pool;
	struct page_info *info, *tmp_info;
	int i = 0;
	unsigned int nents_sync = 0;
	unsigned long size_remaining = PAGE_ALIGN(size);
	unsigned int max_order = orders[0];
	struct pages_mem data;
	unsigned int sz;
	int vmid = get_secure_vmid(buffer->flags);

	if (size / PAGE_SIZE > totalram_pages / 2)
		return -ENOMEM;

	if (ion_heap_is_system_heap_type(buffer->heap->type) &&
	    is_secure_vmid_valid(vmid)) {
		pr_info("%s: System heap doesn't support secure allocations\n",
			__func__);
		return -EINVAL;
	}

	data.size = 0;
	INIT_LIST_HEAD(&pages);
	INIT_LIST_HEAD(&pages_from_pool);

	while (size_remaining > 0) {
		if (is_secure_vmid_valid(vmid))
			info = alloc_from_pool_preferred(
					sys_heap, buffer, size_remaining,
					max_order);
		else
			info = alloc_largest_available(
					sys_heap, buffer, size_remaining,
					max_order);

		if (IS_ERR(info)) {
			ret = PTR_ERR(info);
			goto err;
		}

		sz = (1 << info->order) * PAGE_SIZE;

		if (info->from_pool) {
			list_add_tail(&info->list, &pages_from_pool);
		} else {
			list_add_tail(&info->list, &pages);
			data.size += sz;
			++nents_sync;
		}
		size_remaining -= sz;
		max_order = info->order;
		i++;
	}

	ret = ion_heap_alloc_pages_mem(&data);

	if (ret)
		goto err;

	table = kzalloc(sizeof(*table), GFP_KERNEL);
	if (!table) {
		ret = -ENOMEM;
		goto err_free_data_pages;
	}

	ret = sg_alloc_table(table, i, GFP_KERNEL);
	if (ret)
		goto err1;

	if (nents_sync) {
		ret = sg_alloc_table(&table_sync, nents_sync, GFP_KERNEL);
		if (ret)
			goto err_free_sg;
	}

	i = 0;
	sg = table->sgl;
	sg_sync = table_sync.sgl;

	/*
	 * We now have two separate lists. One list contains pages from the
	 * pool and the other pages from buddy. We want to merge these
	 * together while preserving the ordering of the pages (higher order
	 * first).
	 */
	do {
		info = list_first_entry_or_null(&pages, struct page_info, list);
		tmp_info = list_first_entry_or_null(&pages_from_pool,
						    struct page_info, list);
		if (info && tmp_info) {
			if (info->order >= tmp_info->order) {
				i = process_info(info, sg, sg_sync, &data, i);
				sg_sync = sg_next(sg_sync);
			} else {
				i = process_info(tmp_info, sg, 0, 0, i);
			}
		} else if (info) {
			i = process_info(info, sg, sg_sync, &data, i);
			sg_sync = sg_next(sg_sync);
		} else if (tmp_info) {
			i = process_info(tmp_info, sg, 0, 0, i);
		}
		sg = sg_next(sg);

	} while (sg);

	if (nents_sync) {
		if (vmid > 0) {
			ret = ion_hyp_assign_sg(&table_sync, &vmid, 1, true);
			if (ret)
				goto err_free_sg2;
		}
	}

	buffer->sg_table = table;
	if (nents_sync)
		sg_free_table(&table_sync);
	ion_heap_free_pages_mem(&data);
	return 0;

err_free_sg2:
	/* We failed to zero buffers. Bypass pool */
	buffer->private_flags |= ION_PRIV_FLAG_SHRINKER_FREE;

	if (vmid > 0)
		ion_hyp_unassign_sg(table, &vmid, 1, true, false);

	for_each_sg(table->sgl, sg, table->nents, i)
		free_buffer_page(sys_heap, buffer, sg_page(sg),
				 get_order(sg->length));
	if (nents_sync)
		sg_free_table(&table_sync);
err_free_sg:
	sg_free_table(table);
err1:
	kfree(table);
err_free_data_pages:
	ion_heap_free_pages_mem(&data);
err:
	list_for_each_entry_safe(info, tmp_info, &pages, list) {
		free_buffer_page(sys_heap, buffer, info->page, info->order);
		kfree(info);
	}
	list_for_each_entry_safe(info, tmp_info, &pages_from_pool, list) {
		free_buffer_page(sys_heap, buffer, info->page, info->order);
		kfree(info);
	}
	return ret;
}

ion_system_heap_allocate is fairly long. I think the key part is the while loop:

while (size_remaining > 0) {
		if (is_secure_vmid_valid(vmid))
			info = alloc_from_pool_preferred(
					sys_heap, buffer, size_remaining,
					max_order);
		else
			info = alloc_largest_available(
					sys_heap, buffer, size_remaining,
					max_order);

		if (IS_ERR(info)) {
			ret = PTR_ERR(info);
			goto err;
		}

		sz = (1 << info->order) * PAGE_SIZE;

		if (info->from_pool) {
			list_add_tail(&info->list, &pages_from_pool);
		} else {
			list_add_tail(&info->list, &pages);
			data.size += sz;
			++nents_sync;
		}
		size_remaining -= sz;
		max_order = info->order;
		i++;
	}

	ret = ion_heap_alloc_pages_mem(&data);

size_remaining is still page aligned: unsigned long size_remaining = PAGE_ALIGN(size);

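The whole while loop keeps taking physical pages from a pool or from the buddy system. After each chunk is taken, size_remaining is reduced by the chunk's size, and this repeats until size_remaining reaches 0, which means the whole buffer has been obtained. When buffers are first allocated the pools are empty, so the pages come from the buddy system through the Linux allocation interface.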
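The effect of the loop can be simulated without any real memory. The sketch below assumes 4 KB pages, orders {4, 0}, and that every allocation attempt succeeds (no fallback to a smaller order under memory pressure); it only counts how many chunks of each order a request would be split into. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL /* assumption: 4 KB pages */

static const unsigned int demo_orders[] = {4, 0}; /* assumed configuration */
#define DEMO_NUM_ORDERS (sizeof(demo_orders) / sizeof(demo_orders[0]))

struct demo_chunk_count {
	unsigned long per_order[DEMO_NUM_ORDERS]; /* same indexing as demo_orders */
	unsigned long total;
};

/* Counts the chunks a request of `size` bytes is split into. */
static struct demo_chunk_count demo_split_request(unsigned long size)
{
	struct demo_chunk_count out = { {0}, 0 };
	unsigned long remaining =
		(size + DEMO_PAGE_SIZE - 1) & ~(DEMO_PAGE_SIZE - 1);
	unsigned int max_order = demo_orders[0];

	while (remaining > 0) {
		size_t i;

		for (i = 0; i < DEMO_NUM_ORDERS; i++) {
			/* chunk larger than what is still needed: skip */
			if (remaining < (DEMO_PAGE_SIZE << demo_orders[i]))
				continue;
			/* never go back to a larger order than last time */
			if (max_order < demo_orders[i])
				continue;
			break;
		}
		/* order 0 always fits a page-aligned, non-zero remainder */
		assert(i < DEMO_NUM_ORDERS);

		out.per_order[i]++;
		out.total++;
		remaining -= DEMO_PAGE_SIZE << demo_orders[i];
		max_order = demo_orders[i];
	}
	return out;
}
```

Under these assumptions a 12 MB request becomes 192 chunks of 64 KB, and a 100 KB request (25 pages) becomes one 64 KB chunk followed by nine 4 KB chunks.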

Inside the while loop, is_secure_vmid_valid decides which allocation function is called. alloc_from_pool_preferred mainly allocates from the secure pools:

static struct page_info *alloc_from_pool_preferred(
		struct ion_system_heap *heap, struct ion_buffer *buffer,
		unsigned long size, unsigned int max_order)
{
	struct page *page;
	struct page_info *info;
	int i;

	if (buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)
		goto force_alloc;

	info = kmalloc(sizeof(*info), GFP_KERNEL);
	if (!info)
		return ERR_PTR(-ENOMEM);

	for (i = 0; i < NUM_ORDERS; i++) {
		if (size < order_to_size(orders[i]))
			continue;
		if (max_order < orders[i])
			continue;

		page = alloc_from_secure_pool_order(heap, buffer, orders[i]);
		if (IS_ERR(page))
			continue;

		info->page = page;
		info->order = orders[i];
		info->from_pool = true;
		INIT_LIST_HEAD(&info->list);
		return info;
	}

	page = split_page_from_secure_pool(heap, buffer);
	if (!IS_ERR(page)) {
		info->page = page;
		info->order = 0;
		info->from_pool = true;
		INIT_LIST_HEAD(&info->list);
		return info;
	}

	kfree(info);
force_alloc:
	return alloc_largest_available(heap, buffer, size, max_order);
}

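ION_FLAG_POOL_FORCE_ALLOC decides whether a forced allocation is requested. If so, alloc_largest_available is called, which ends up calling the Linux interface directly to allocate physical pages from the buddy system. For an introduction to struct page, see the article 《Linux 物理内存描述》.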

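The core of alloc_from_pool_preferred is the for loop, which looks for a suitable chunk size to allocate. The buddy system keeps free lists of 2^order-page blocks, and the pools follow the same idea, except that they are kept in an array indexed by order, usually with only 2^0 and 2^4 pages. The exact definition is in kernel\msm-4.14\drivers\staging\android\ion\ion_system_heap.h: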

#ifndef CONFIG_ALLOC_BUFFERS_IN_4K_CHUNKS
#if defined(CONFIG_IOMMU_IO_PGTABLE_ARMV7S)
static const unsigned int orders[] = {8, 4, 0};
#else
static const unsigned int orders[] = {4, 0};
#endif
#else
static const unsigned int orders[] = {0};
#endif

#define NUM_ORDERS ARRAY_SIZE(orders)

According to my tests, current phones take the orders[] = {4, 0} branch, which means physical memory is requested in chunks of 4 KB or 64 KB.

Back to the for loop in alloc_from_pool_preferred, assuming orders[] = {4, 0}:

static inline unsigned int order_to_size(int order)
{
    return PAGE_SIZE << order;
}

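PAGE_SIZE is the physical page size, 4 KB by default in most configurations; ARMv8 supports 4 KB, 16 KB and 64 KB pages. Assuming 4 KB pages, the first order tried is 4, that is 2^4 = 16 pages times 4 KB, which is 64 KB. The check if (size < order_to_size(orders[i])) first tests whether the size still to be allocated is smaller than 64 KB. If it is, this order is skipped, because its pool only holds physically contiguous 64 KB chunks, and serving a smaller request from it would require splitting. Allocation always looks for the best-fitting size, so when size is smaller than order_to_size the loop continues with the next order. After 64 KB comes 4 KB, and since the size was rounded up to a page, in theory nothing can be smaller than that. If orders[] held more entries than {4, 0}, the for loop would walk them in the same way. The loop can still finish without a page, for example when the secure pool of every fitting order is empty, so that alloc_from_secure_pool_order fails each time. In that case the code leaves the for loop and splits a page: split_page_from_secure_pool carves a suitable page out of a larger chunk.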

struct page *split_page_from_secure_pool(struct ion_system_heap *heap,
					 struct ion_buffer *buffer)
{
	int i, j;
	struct page *page;
	unsigned int order;

	mutex_lock(&heap->split_page_mutex);

	/*
	 * Someone may have just split a page and returned the unused portion
	 * back to the pool, so try allocating from the pool one more time
	 * before splitting. We want to maintain large pages sizes when
	 * possible.
	 */
	page = alloc_from_secure_pool_order(heap, buffer, 0);
	if (!IS_ERR(page))
		goto got_page;

	for (i = NUM_ORDERS - 2; i >= 0; i--) {
		order = orders[i];
		page = alloc_from_secure_pool_order(heap, buffer, order);
		if (IS_ERR(page))
			continue;

		split_page(page, order);
		break;
	}
	/*
	 * Return the remaining order-0 pages to the pool.
	 * SetPagePrivate flag to mark memory as secure.
	 */
	if (!IS_ERR(page)) {
		for (j = 1; j < (1 << order); j++) {
			SetPagePrivate(page + j);
			free_buffer_page(heap, buffer, page + j, 0);
		}
	}
got_page:
	mutex_unlock(&heap->split_page_mutex);

	return page;
}

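page = alloc_from_secure_pool_order(heap, buffer, 0); first tries once more to take a single page from the order-0 pool, because, as the comment says, another caller may have just split a page and returned the unused part. If that fails it is not an error yet: the for loop walks the larger orders, starting from the smallest of them, takes one chunk from the first pool that has any, and splits it with split_page. The first page is returned to the caller and the remaining order-0 pages go back to the pool. split_page is in kernel\msm-4.14\mm\page_alloc.c; page_alloc.c holds the core interface functions of the buddy system, and its allocation functions are used again later. I did not fully follow the kernel implementation of split_page; as far as I can tell it turns one 2^order-page block into individual order-0 pages. The page that split_page_from_secure_pool carves out is finally stored in info: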

page = split_page_from_secure_pool(heap, buffer);
	if (!IS_ERR(page)) {
		info->page = page;
		info->order = 0;
		info->from_pool = true;
		INIT_LIST_HEAD(&info->list);
		return info;
	}
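The bookkeeping of the split can be modelled with plain counters. The sketch below assumes orders {4, 0} and represents each pool only by the number of chunks it holds; it is a simplified model of my reading of split_page_from_secure_pool, and all names are hypothetical.

```c
#include <assert.h>

static const unsigned int demo_orders[] = {4, 0}; /* assumed configuration */
#define DEMO_NUM_ORDERS 2

/* count[i] is the number of chunks cached in the pool for demo_orders[i]. */
struct demo_pools {
	unsigned long count[DEMO_NUM_ORDERS];
};

/* Takes one order-0 page; returns 0 on success, -1 if all pools are empty. */
static int demo_split_page_from_pool(struct demo_pools *pools)
{
	int i;

	assert(pools);

	/* Try the order-0 pool first: someone may have just refilled it. */
	if (pools->count[DEMO_NUM_ORDERS - 1] > 0) {
		pools->count[DEMO_NUM_ORDERS - 1]--;
		return 0;
	}

	/* Walk the larger orders, smallest first, and split one chunk. */
	for (i = DEMO_NUM_ORDERS - 2; i >= 0; i--) {
		if (pools->count[i] == 0)
			continue;
		pools->count[i]--;
		/* keep one page, return the other (2^order - 1) to order 0 */
		pools->count[DEMO_NUM_ORDERS - 1] +=
			(1UL << demo_orders[i]) - 1;
		return 0;
	}
	return -1;
}
```

Starting with one 64 KB chunk and an empty order-0 pool, the first call consumes the chunk and leaves 15 single pages in the order-0 pool; the next call is then served from those 15 pages without another split.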

 

Back in alloc_from_pool_preferred, let's look at how alloc_from_secure_pool_order runs:

struct page *alloc_from_secure_pool_order(struct ion_system_heap *heap,
					  struct ion_buffer *buffer,
					  unsigned long order)
{
	int vmid = get_secure_vmid(buffer->flags);
	struct ion_page_pool *pool;

	if (!is_secure_vmid_valid(vmid))
		return ERR_PTR(-EINVAL);

	pool = heap->secure_pools[vmid][order_to_index(order)];
	return ion_page_pool_alloc_pool_only(pool);
}

The function is simple: it finds the pool that matches the order and then calls

/*
 * Tries to allocate from only the specified Pool and returns NULL otherwise
 */
struct page *ion_page_pool_alloc_pool_only(struct ion_page_pool *pool)
{
	struct page *page = NULL;

	if (!pool)
		return ERR_PTR(-EINVAL);

	if (mutex_trylock(&pool->mutex)) {
		if (pool->high_count)
			page = ion_page_pool_remove(pool, true);
		else if (pool->low_count)
			page = ion_page_pool_remove(pool, false);
		mutex_unlock(&pool->mutex);
	}

	if (!page)
		return ERR_PTR(-ENOMEM);
	return page;
}

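This function takes a page from the pool. The pool distinguishes high memory from low memory. On a 32-bit system with a 4 GB address space, the kernel typically owns only the 3 GB to 4 GB range, and physical memory that it cannot keep permanently mapped there is high memory. As far as I understand, whether a page counts as high or low is decided when the page is put into the pool.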

 

Back in the while loop of ion_system_heap_allocate: if the buffer is not allocated from the secure pools, alloc_largest_available is called:

static struct page_info *alloc_largest_available(struct ion_system_heap *heap,
						 struct ion_buffer *buffer,
						 unsigned long size,
						 unsigned int max_order)
{
	struct page *page;
	struct page_info *info;
	int i;
	bool from_pool;

	info = kmalloc(sizeof(*info), GFP_KERNEL);
	if (!info)
		return ERR_PTR(-ENOMEM);

	for (i = 0; i < NUM_ORDERS; i++) {
		if (size < order_to_size(orders[i]))
			continue;
		if (max_order < orders[i])
			continue;
		from_pool = !(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC);
		page = alloc_buffer_page(heap, buffer, orders[i], &from_pool);
		if (IS_ERR(page))
			continue;

		info->page = page;
		info->order = orders[i];
		info->from_pool = from_pool;
		INIT_LIST_HEAD(&info->list);
		return info;
	}
	kfree(info);

	return ERR_PTR(-ENOMEM);
}

Here too, ION_FLAG_POOL_FORCE_ALLOC decides whether a forced allocation is needed; if so, the pool is bypassed. alloc_buffer_page is called next:

static struct page *alloc_buffer_page(struct ion_system_heap *heap,
				      struct ion_buffer *buffer,
				      unsigned long order,
				      bool *from_pool)
{
	bool cached = ion_buffer_cached(buffer);
	struct page *page;
	struct ion_page_pool *pool;
	int vmid = get_secure_vmid(buffer->flags);
	struct device *dev = heap->heap.priv;

	if (vmid > 0)
		pool = heap->secure_pools[vmid][order_to_index(order)];
	else if (!cached)
		pool = heap->uncached_pools[order_to_index(order)];
	else
		pool = heap->cached_pools[order_to_index(order)];

	page = ion_page_pool_alloc(pool, from_pool);

	if (IS_ERR(page))
		return page;

	if ((MAKE_ION_ALLOC_DMA_READY && vmid <= 0) || !(*from_pool))
		ion_pages_sync_for_device(dev, page, PAGE_SIZE << order,
					  DMA_BIDIRECTIONAL);

	return page;
}

Based on the buffer type, this picks the pool to allocate from and then calls ion_page_pool_alloc, passing down both the pool and whether the pool may be used:

struct page *ion_page_pool_alloc(struct ion_page_pool *pool, bool *from_pool)
{
	struct page *page = NULL;

	BUG_ON(!pool);

	if (fatal_signal_pending(current))
		return ERR_PTR(-EINTR);

	if (*from_pool && mutex_trylock(&pool->mutex)) {
		if (pool->high_count)
			page = ion_page_pool_remove(pool, true);
		else if (pool->low_count)
			page = ion_page_pool_remove(pool, false);
		mutex_unlock(&pool->mutex);
	}
	if (!page) {
		page = ion_page_pool_alloc_pages(pool);
		*from_pool = false;
	}

	if (!page)
		return ERR_PTR(-ENOMEM);
	return page;
}

If taking a page from the pool fails, or the pool must not be used, ion_page_pool_alloc_pages is called. It simply calls the Linux buddy system allocation interface:

static void *ion_page_pool_alloc_pages(struct ion_page_pool *pool)
{
	struct page *page = alloc_pages(pool->gfp_mask, pool->order);

	return page;
}
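The "pool first, buddy system as fallback" behaviour, including how from_pool is flipped to report where the page came from, can be modelled with counters. This is a userspace sketch with hypothetical names; the two structs merely count available chunks.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_pool {
	int cached; /* chunks currently held in the pool */
};

struct demo_buddy {
	int free_chunks; /* chunks the "buddy system" can still hand out */
};

/*
 * Returns 0 on success, -1 when neither source has memory.
 * On entry *from_pool says whether the pool may be used; on return it
 * says whether the pool was actually used.
 */
static int demo_pool_alloc(struct demo_pool *pool, struct demo_buddy *buddy,
			   bool *from_pool)
{
	bool got = false;

	assert(pool && buddy && from_pool);

	if (*from_pool && pool->cached > 0) {
		pool->cached--;
		got = true;
	}
	if (!got) {
		*from_pool = false; /* fell back to the buddy system */
		if (buddy->free_chunks == 0)
			return -1;
		buddy->free_chunks--;
	}
	return 0;
}
```

The caller uses the updated flag afterwards, which is how ion_system_heap_allocate decides whether the chunk goes on the pages_from_pool list or on the pages list.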

Back to the while loop in ion_system_heap_allocate:

sz = (1 << info->order) * PAGE_SIZE;

		if (info->from_pool) {
			list_add_tail(&info->list, &pages_from_pool);
		} else {
			list_add_tail(&info->list, &pages);
			data.size += sz;
			++nents_sync;
		}
		size_remaining -= sz;
		max_order = info->order;
		i++;

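Every allocated page is stored in an info structure, and depending on whether it came from a pool it is added to one of two lists. info->order holds the power of two; multiplied by the page size it gives the size of this chunk, which is then subtracted from the total (size_remaining -= sz;). After the while loop, the pages are added to the sg_table.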

On first use the pools hold no pages, so everything comes from the Linux buddy system. Pages only end up in a pool when they are released.

Back in ion_alloc_fd: once the dma-buf exists, an fd is created for it by calling

int dma_buf_fd(struct dma_buf *dmabuf, int flags)
{
	int fd;

	if (!dmabuf || !dmabuf->file)
		return -EINVAL;

	fd = get_unused_fd_flags(flags);
	if (fd < 0)
		return fd;

	fd_install(fd, dmabuf->file);

	return fd;
}

This calls the Linux function get_unused_fd_flags to obtain an fd number and then binds the dma-buf's file to that fd.

The struct file itself is obtained earlier, in ion_alloc_dmabuf: after the buffer has been allocated, dma_buf_export is called, and that function contains

	file = anon_inode_getfile(bufname, &dma_buf_fops, dmabuf,
					exp_info->flags);
	if (IS_ERR(file)) {
		ret = PTR_ERR(file);
		goto err_dmabuf;
	}

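So a file is created here and bound to dma_buf_fops, with the dmabuf as its private data. The file operations in dma_buf_fops forward to the dmabuf's ops, which in this case is the dma_buf_ops shown earlier, so in effect the dma_buf_ops can be reached through the fd.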

2. Memory release

void ion_system_heap_free(struct ion_buffer *buffer)
{
	struct ion_heap *heap = buffer->heap;
	struct ion_system_heap *sys_heap = container_of(heap,
							struct ion_system_heap,
							heap);
	struct sg_table *table = buffer->sg_table;
	struct scatterlist *sg;
	int i;
	int vmid = get_secure_vmid(buffer->flags);

	if (!(buffer->private_flags & ION_PRIV_FLAG_SHRINKER_FREE) &&
	    !(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)) {
		if (vmid < 0)
			ion_heap_buffer_zero(buffer);
	} else if (vmid > 0) {
		if (ion_hyp_unassign_sg(table, &vmid, 1, true, false))
			return;
	}

	for_each_sg(table->sgl, sg, table->nents, i)
		free_buffer_page(sys_heap, buffer, sg_page(sg),
				 get_order(sg->length));
	sg_free_table(table);
	kfree(table);
}

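The first part of the function checks a few flags. The key part is for_each_sg, which calls free_buffer_page for each chunk of physical pages in the scatter list.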

/*
 * For secure pages that need to be freed and not added back to the pool; the
 *  hyp_unassign should be called before calling this function
 */
void free_buffer_page(struct ion_system_heap *heap,
		      struct ion_buffer *buffer, struct page *page,
		      unsigned int order)
{
	bool cached = ion_buffer_cached(buffer);
	int vmid = get_secure_vmid(buffer->flags);

	if (!(buffer->flags & ION_FLAG_POOL_FORCE_ALLOC)) {
		struct ion_page_pool *pool;

		if (vmid > 0)
			pool = heap->secure_pools[vmid][order_to_index(order)];
		else if (cached)
			pool = heap->cached_pools[order_to_index(order)];
		else
			pool = heap->uncached_pools[order_to_index(order)];

		if (buffer->private_flags & ION_PRIV_FLAG_SHRINKER_FREE)
			ion_page_pool_free_immediate(pool, page);
		else
			ion_page_pool_free(pool, page);
	} else {
		__free_pages(page, order);
	}
}

It finds the matching pool and then calls


void ion_page_pool_free(struct ion_page_pool *pool, struct page *page)
{
	int ret;

	ret = ion_page_pool_add(pool, page);
	if (ret)
		ion_page_pool_free_pages(pool, page);
}

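This stores the page in the pool. But when the system runs low on memory, the ION heap has to give the pages held in its pools back to the buddy system. That reclaim is carried out by the shrink function: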

static int ion_system_heap_shrink(struct ion_heap *heap, gfp_t gfp_mask,
				 int nr_to_scan)
{
	struct ion_system_heap *sys_heap;
	int nr_total = 0;
	int i, j, nr_freed = 0;
	int only_scan = 0;
	struct ion_page_pool *pool;

	sys_heap = container_of(heap, struct ion_system_heap, heap);

	if (!nr_to_scan)
		only_scan = 1;

	for (i = 0; i < NUM_ORDERS; i++) {
		nr_freed = 0;

		for (j = 0; j < VMID_LAST; j++) {
			if (is_secure_vmid_valid(j))
				nr_freed += ion_secure_page_pool_shrink(
						sys_heap, j, i, nr_to_scan);
		}

		pool = sys_heap->uncached_pools[i];
		nr_freed += ion_page_pool_shrink(pool, gfp_mask, nr_to_scan);

		pool = sys_heap->cached_pools[i];
		nr_freed += ion_page_pool_shrink(pool, gfp_mask, nr_to_scan);
		nr_total += nr_freed;

		if (!only_scan) {
			nr_to_scan -= nr_freed;
			/* shrink completed */
			if (nr_to_scan <= 0)
				break;
		}
	}

	return nr_total;
}

This function is also fairly simple. Apart from some bookkeeping, the important part is the call to ion_page_pool_shrink, which takes pages out of the pool and then calls

static void ion_page_pool_free_pages(struct ion_page_pool *pool,
				     struct page *page)
{
	__free_pages(page, pool->order);
}

__free_pages is again a Linux buddy system interface, located in kernel\msm-4.14\mm\page_alloc.c
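The shrink path can be modelled in the same counter style. The sketch below reflects my understanding that a scan count of 0 only reports how many pages are cached, while a non-zero count frees chunks until at least that many pages have been returned; treat that behaviour as an assumption. All names are hypothetical.

```c
#include <assert.h>

struct demo_pool {
	int count;          /* chunks currently cached in the pool */
	unsigned int order; /* each chunk is 2^order pages */
};

/* Returns the number of pages freed (or cached, when nr_to_scan is 0). */
static int demo_pool_shrink(struct demo_pool *pool, int nr_to_scan)
{
	int freed = 0;

	assert(pool && nr_to_scan >= 0);

	/* Scan-only mode: report what could be reclaimed, free nothing. */
	if (nr_to_scan == 0)
		return pool->count << pool->order;

	while (freed < nr_to_scan && pool->count > 0) {
		pool->count--; /* chunk goes back to the buddy system */
		freed += 1 << pool->order;
	}
	return freed;
}
```

Because whole chunks are released, a pool of 64 KB chunks asked for 20 pages gives back 32: the loop stops only after the running total reaches the requested count.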

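3. Memory mapping

The system heap's memory mapping is done by ion_heap_map_user, which is called from the dma-buf ops. It takes a very important parameter, struct vm_area_struct, which manages a region of a process's virtual memory. Some of its members matter a lot, and once their meaning is clear the code below is easy to follow. First, the definition of the structure, in kernel\msm-4.14\include\linux\mm_types.h: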

/*
 * This struct defines a memory VMM memory area. There is one of these
 * per VM-area/task.  A VM area is any part of the process virtual memory
 * space that has a special rule for the page-fault handlers (ie a shared
 * library, the executable area etc).
 */
struct vm_area_struct {
	/* The first cache line has the info for VMA tree walking. */

	unsigned long vm_start;		/* Our start address within vm_mm. */
	unsigned long vm_end;		/* The first byte after our end address
					   within vm_mm. */

	/* linked list of VM areas per task, sorted by address */
	struct vm_area_struct *vm_next, *vm_prev;

	struct rb_node vm_rb;

	/*
	 * Largest free memory gap in bytes to the left of this VMA.
	 * Either between this VMA and vma->vm_prev, or between one of the
	 * VMAs below us in the VMA rbtree and its ->vm_prev. This helps
	 * get_unmapped_area find a free area of the right size.
	 */
	unsigned long rb_subtree_gap;

	/* Second cache line starts here. */

	struct mm_struct *vm_mm;	/* The address space we belong to. */
	pgprot_t vm_page_prot;		/* Access permissions of this VMA. */
	unsigned long vm_flags;		/* Flags, see mm.h. */

	/*
	 * For areas with an address space and backing store,
	 * linkage into the address_space->i_mmap interval tree.
	 *
	 * For private anonymous mappings, a pointer to a null terminated string
	 * in the user process containing the name given to the vma, or NULL
	 * if unnamed.
	 */
	union {
		struct {
			struct rb_node rb;
			unsigned long rb_subtree_last;
		} shared;
		const char __user *anon_name;
	};

	/*
	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
	 * list, after a COW of one of the file pages.	A MAP_SHARED vma
	 * can only be in the i_mmap tree.  An anonymous MAP_PRIVATE, stack
	 * or brk vma (with NULL file) can only be in an anon_vma list.
	 */
	struct list_head anon_vma_chain; /* Serialized by mmap_sem &
					  * page_table_lock */
	struct anon_vma *anon_vma;	/* Serialized by page_table_lock */

	/* Function pointers to deal with this struct. */
	const struct vm_operations_struct *vm_ops;

	/* Information about our backing store: */
	unsigned long vm_pgoff;		/* Offset (within vm_file) in PAGE_SIZE
					   units */
	struct file * vm_file;		/* File we map to (can be NULL). */
	void * vm_private_data;		/* was vm_pte (shared mem) */

	atomic_long_t swap_readahead_info;
#ifndef CONFIG_MMU
	struct vm_region *vm_region;	/* NOMMU mapping region */
#endif
#ifdef CONFIG_NUMA
	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
#endif
	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
#ifdef CONFIG_SPECULATIVE_PAGE_FAULT
	seqcount_t vm_sequence;
	atomic_t vm_ref_count;		/* see vma_get(), vma_put() */
#endif
} __randomize_layout;

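For the role of this structure, see https://linux-kernel-labs.github.io/master/labs/memory_mapping.html. The structure is created when a user process calls mmap. It describes the virtual memory that corresponds to the physical pages: a contiguous range of virtual addresses with the same access attributes, whose size is a whole multiple of the page size. For the meaning of each member, see https://blog.csdn.net/ganggexiongqi/article/details/6746248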

vm_start is the starting virtual address of the region within the process.

int ion_heap_map_user(struct ion_heap *heap, struct ion_buffer *buffer,
              struct vm_area_struct *vma)
{
    struct sg_table *table = buffer->sg_table;
    unsigned long addr = vma->vm_start;
    unsigned long offset = vma->vm_pgoff * PAGE_SIZE;
    struct scatterlist *sg;
    int i;
    int ret;

    for_each_sg(table->sgl, sg, table->nents, i) {
        struct page *page = sg_page(sg);
        unsigned long remainder = vma->vm_end - addr;
        unsigned long len = sg->length;

        if (offset >= sg->length) {
            offset -= sg->length;
            continue;
        } else if (offset) {
            page += offset / PAGE_SIZE;
            len = sg->length - offset;
            offset = 0;
        }
        len = min(len, remainder);
        ret = remap_pfn_range(vma, addr, page_to_pfn(page), len,
                      vma->vm_page_prot);
        if (ret)
            return ret;
        addr += len;
        if (addr >= vma->vm_end)
            return 0;
    }
    return 0;
}

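In the code, addr = vma->vm_start holds the starting virtual address. vm_pgoff is the offset, within vm_file, of the start of this virtual memory region, in units of pages; the code converts it to bytes in offset. For example, if there are 64 physical pages and the user maps 10 pages starting at page offset 5, vm_pgoff is 5. The for_each_sg loop takes the physical pages stored in the scatter list and maps them. Look first at the check offset >= sg->length. Why is it needed? Suppose the offset is 6 pages but this sg only holds 5 pages: the whole sg has to be skipped, and the mapping starts one page into the next sg. The following code does exactly this: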

        if (offset >= sg->length) {
            offset -= sg->length;
            continue;
        } else if (offset) {
            page += offset / PAGE_SIZE;
            len = sg->length - offset;
            offset = 0;
        }

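Suppose the next sg holds three physical pages; then we only need page + 1 on that sg. offset is 1 page at this point, because the if branch already executed offset -= sg->length, which is 6 - 5. len becomes 3 - 1 = 2 pages. offset is not needed any more, so it is set to 0. Those two pages have to be mapped, so the code calls the kernel function remap_pfn_range, on which plenty of material is available online. That completes the mapping to user space.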
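The walk-through above can be replayed in user space. The sketch below follows the same control flow as ion_heap_map_user but, instead of calling remap_pfn_range, it records which page frame would be mapped at which address. It assumes 4 KB pages, and the `demo_` types and page frame numbers are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL /* assumption: 4 KB pages */

struct demo_sg {
	unsigned long first_pfn; /* first page frame of the chunk */
	unsigned long length;    /* chunk length in bytes */
};

struct demo_mapping {
	unsigned long vaddr; /* where the chunk would be mapped */
	unsigned long pfn;   /* first page frame that would be mapped */
	unsigned long len;   /* number of bytes that would be mapped */
};

/* Returns the number of mappings written to out (at most max_out). */
static size_t demo_map_user(const struct demo_sg *sgl, size_t nents,
			    unsigned long vm_start, unsigned long vm_end,
			    unsigned long vm_pgoff,
			    struct demo_mapping *out, size_t max_out)
{
	unsigned long addr = vm_start;
	unsigned long offset = vm_pgoff * DEMO_PAGE_SIZE;
	size_t i, n = 0;

	assert(sgl && out && vm_end >= vm_start);

	for (i = 0; i < nents && n < max_out; i++) {
		unsigned long pfn = sgl[i].first_pfn;
		unsigned long len = sgl[i].length;
		unsigned long remainder = vm_end - addr;

		if (offset >= sgl[i].length) {
			/* the offset lies beyond this chunk: skip it */
			offset -= sgl[i].length;
			continue;
		} else if (offset) {
			/* the offset lands inside this chunk */
			pfn += offset / DEMO_PAGE_SIZE;
			len = sgl[i].length - offset;
			offset = 0;
		}
		if (len > remainder)
			len = remainder;

		out[n].vaddr = addr; /* stands in for remap_pfn_range() */
		out[n].pfn = pfn;
		out[n].len = len;
		n++;

		addr += len;
		if (addr >= vm_end)
			break;
	}
	return n;
}
```

With a 5-page chunk followed by a 3-page chunk and a page offset of 6, the first chunk is skipped entirely and a single mapping is produced: it starts one page into the second chunk and covers the remaining two pages.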
