Image-Text-Embedding

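License: MIT License
Programming language:
Category: Application tools, scientific computing tools
Software type: Open-source software
Region: Unknown
Submitted by: 卜阳
Operating system: Cross-platform
Open-source organization:
Target audience: Unknown
 Software Overview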

Dual-Path Convolutional Image-Text Embedding

[Paper] [Slide]

This repository contains the code for our paper Dual-Path Convolutional Image-Text Embedding. Thank you for your kind attention.

Some News

5 Sep 2021 I love the sentence 'Define yourself by telling how you differ from others' (exemplar SVM), which is also the spirit of the instance loss.

11 June 2020 People live in the 3D world. We released new person re-ID code, Person Re-identification in the 3D Space, which conducts representation learning in 3D space. You are welcome to check it out.

30 April 2020 We won the AICity Challenge 2020 at CVPR 2020 with the 1st Place Submission in the retrieval track. Check it out here.

01 March 2020 We released a new image retrieval dataset, University-1652, for drone-view target localization and drone navigation. Its setting is similar to person re-ID. You are welcome to check it out.

What's New: We updated the paper to its second version, adding more explanation of how the proposed instance loss works.
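The instance loss treats every training image, together with its captions, as a class of its own. This is the "define yourself by how you differ from others" idea from the news above. What follows is a minimal NumPy sketch of that idea, not the repo's MATLAB implementation. It assumes that both the image and text branches feed one shared classifier W, and that instance_ids gives each pair the index of its training image. Both names are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits: (batch, classes), labels: (batch,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def instance_loss(img_feat, txt_feat, W, instance_ids):
    """Classify each image and each caption into its own instance class.

    img_feat, txt_feat: (batch, dim) outputs of the image and text branches,
        where row i of both belongs to the same training image.
    W: (dim, num_instances) classifier weights shared by both branches (assumption).
    instance_ids: (batch,) class index of each pair, one class per training image.
    """
    return (softmax_cross_entropy(img_feat @ W, instance_ids)
            + softmax_cross_entropy(txt_feat @ W, instance_ids))
```

In this sketch, sharing W means an image and its captions can only both score well if they land near the same class column. This sharing is what pulls the two modalities together.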

Install Matconvnet

I have included my MatConvNet in this repo, so you do not need to download it again. You just need to uncomment and modify some lines in gpu_compile.m and run it in MATLAB. Try it~ (The code does not support cuDNN 6.0. You may simply turn off Enablecudnn or try cuDNN 5.1.)

If compilation fails, you may refer to http://www.vlfeat.org/matconvnet/install/

Preprocess Datasets

  1. Extract the word2vec weights. Follow the instructions in ./word2vector_matlab.

  2. Preprocess the dataset. Follow the instructions in ./dataset. You can choose one dataset to run. The three datasets need different preprocessing; I wrote instructions for Flickr30k, MSCOCO and CUHK-PEDES.

  3. Download the model pre-trained on ImageNet and put it into './data'.

(bash) wget http://www.vlfeat.org/matconvnet/models/imagenet-resnet-50-dag.mat

Alternatively, you may try VGG16 or VGG19.

Your split may differ from mine. (Sorry, this is my fault: I used a random split.) As a backup, this is the dictionary archive used in the paper.
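Steps 1 and 2 turn every caption into a sequence of word indices over a fixed dictionary. The word2vec weights from step 1 presumably initialize the embedding for those indices. Because the dictionary is built from the data, a different split can plausibly produce a different dictionary. This is why the archive above is worth keeping.

The following is a minimal Python sketch of the encoding step, not the repo's MATLAB preprocessing in ./dataset. The tokenizer, the padding index 0 and the default length of 32 are all assumptions, and the helper names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(captions, min_count=1):
    """Map each word seen at least min_count times to an index; 0 is kept for padding."""
    counts = Counter(w for c in captions for w in tokenize(c))
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return {w: i + 1 for i, w in enumerate(words)}


def encode_caption(caption, vocab, max_len=32, pad_index=0):
    """Turn a caption into exactly max_len dictionary indices.

    Out-of-vocabulary words are dropped, long captions are truncated,
    and short ones are padded with pad_index.
    """
    ids = [vocab[w] for w in tokenize(caption) if w in vocab][:max_len]
    return ids + [pad_index] * (max_len - len(ids))
```

Sorting the words makes the index assignment deterministic for a given set of training captions. Changing the training captions, as a new random split does, can therefore shift every index.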

Trained Model

You may download the three trained models from Google Drive.

Train

  • For Flickr30k, run train_flickr_word2_1_pool.m for Stage I training.

Run train_flickr_word_Rankloss_shift_hard for Stage II training.

  • For MSCOCO, run train_coco_word2_1_pool.m for Stage I training.

Run train_coco_Rankloss_shift_hard.m for Stage II training.

  • For CUHK-PEDES, run train_cuhk_word2_1_pool.m for Stage I training.

Run train_cuhk_word_Rankloss_shift for Stage II training.
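The Stage II script names refer to a rank loss, and two of them add "hard". The sketch below reads this as a bidirectional hinge ranking loss using the hardest negative in each batch. That reading is an assumption about the scripts, not a description of them. The margin of 0.2 is hypothetical, and the "shift" part of the names is not modeled. It is written in NumPy for illustration only.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def ranking_loss_hard(img_feat, txt_feat, margin=0.2):
    """Bidirectional hinge ranking loss with the hardest negative in the batch.

    Row i of img_feat and row i of txt_feat are a matching image/caption pair;
    every other row of the batch is treated as a negative.
    """
    sim = l2_normalize(img_feat) @ l2_normalize(txt_feat).T  # (batch, batch)
    pos = np.diag(sim)
    # Hide the matching pairs so they cannot be picked as negatives.
    neg = np.where(np.eye(len(sim), dtype=bool), -np.inf, sim)
    hardest_txt = neg.max(axis=1)  # most similar wrong caption for each image
    hardest_img = neg.max(axis=0)  # most similar wrong image for each caption
    loss = (np.maximum(0.0, margin - pos + hardest_txt)
            + np.maximum(0.0, margin - pos + hardest_img))
    return float(loss.mean())
```

Using only the hardest negative focuses the gradient on the most confusable pair in each batch. Averaging the hinge over all negatives is the gentler, common alternative.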

Test

Select one model and have fun!

  • For Flickr30k, run test/extract_pic_feature_word2_plus_52.m to extract the image and text features. Note that you need to change the model path in the code.

  • For MSCOCO, run test_coco/extract_pic_feature_word2_plus.m to extract the image and text features. Note that you need to change the model path in the code.

  • For CUHK-PEDES, run test_cuhk/extract_pic_feature_word2_plus_52.m to extract the image and text features. Note that you need to change the model path in the code.
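Once the image and text features are extracted, retrieval reduces to ranking by similarity. Below is a minimal sketch of a Recall@K evaluation on such features. It is not the repo's evaluation code and makes two assumptions:

- Similarity is cosine similarity.
- Each query has exactly one correct gallery item, as in text-to-image search where each caption belongs to one image.

The function name and arguments are hypothetical.

```python
import numpy as np


def recall_at_k(query_feat, gallery_feat, gt_index, ks=(1, 5, 10)):
    """Recall@K for cross-modal retrieval with cosine similarity.

    query_feat: (num_queries, dim); gallery_feat: (num_gallery, dim).
    gt_index: (num_queries,) index of the correct gallery item for each query.
    Returns {k: fraction of queries whose correct item ranks in the top k}.
    """
    q = query_feat / np.linalg.norm(query_feat, axis=1, keepdims=True)
    g = gallery_feat / np.linalg.norm(gallery_feat, axis=1, keepdims=True)
    sim = q @ g.T
    # Rank of the correct item = number of gallery items scored strictly higher.
    gt_scores = sim[np.arange(len(q)), gt_index]
    ranks = (sim > gt_scores[:, None]).sum(axis=1)
    return {k: float((ranks < k).mean()) for k in ks}
```

Image-to-text search works differently, because an image can have several correct captions. There you would take the best rank among those captions rather than a single gt_index.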

CheckList

  • Get word2vec weight

  • Data Preparation (Flickr30k)

  • Train on Flickr30k

  • Test on Flickr30k

  • Data Preparation (MSCOCO)

  • Train on MSCOCO

  • Test on MSCOCO

  • Data Preparation (CUHK-PEDES)

  • Train on CUHK-PEDES

  • Test on CUHK-PEDES

  • Run the code on another machine

Citation

@article{zheng2017dual,
  title={Dual-Path Convolutional Image-Text Embeddings with Instance Loss},
  author={Zheng, Zhedong and Zheng, Liang and Garrett, Michael and Yang, Yi and Xu, Mingliang and Shen, Yi-Dong},
  journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
  doi={10.1145/3383184},
  note={\mbox{doi}:\url{10.1145/3383184}},
  volume={16},
  number={2},
  pages={1--23},
  year={2020},
  publisher={ACM New York, NY, USA}
}
