TensorFlow Similarity Study Notes 6

樊熠彤
2023-12-01

2021SC@SDUSC

Code location: similarity/distances.py at master · tensorflow/similarity · GitHub

# Copyright 2021 The TensorFlow Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Vectorized embedding pairwise distances computation functions"""
from abc import ABC, abstractmethod
from typing import Union, List

import tensorflow as tf

from .types import FloatTensor


class Distance(ABC):
    """
    Note: don't forget to add your distance to the DISTANCES list
    and add alias names in it.
    """
    def __init__(self, name: str, aliases: List[str] = []):
        self.name = name
        self.aliases = aliases

    @abstractmethod
    def call(self, embeddings: FloatTensor) -> FloatTensor:
        """Compute pairwise distances for a given batch.
        Args:
            embeddings: Embeddings to compute the pairwise one.
        Returns:
            FloatTensor: Pairwise distance tensor.
        """

    def __call__(self, embeddings: FloatTensor):
        return self.call(embeddings)

    def __str__(self) -> str:
        return self.name

    def get_config(self):
        return {}


@tf.keras.utils.register_keras_serializable(package="Similarity")
class InnerProductSimilarity(Distance):
    """Compute the pairwise inner product between embeddings.
    The [Inner product](https://en.wikipedia.org/wiki/Inner_product_space) is
    a measure of similarity where the more similar vectors have the largest
    values.
    NOTE! This is not a distance and is likely not what you want to use with
    the built in losses. At the very least this will flip the sign on the
    margin in many of the losses. This is likely meant to be used with custom
    loss functions that expect a similarity instead of a distance.
    """
    def __init__(self):
        "Init Inner product similarity"
        super().__init__('inner_product', ['ip'])

    @tf.function
    def call(self, embeddings: FloatTensor) -> FloatTensor:
        """Compute pairwise similarities for a given batch of embeddings.
        Args:
            embeddings: Embeddings to compute the pairwise one.
        Returns:
            FloatTensor: Pairwise distance tensor.
        """

        tensor = tf.linalg.matmul(embeddings, embeddings, transpose_b=True)
        sims: FloatTensor = tf.reduce_sum(tensor, axis=1, keepdims=True)
        return sims
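
To see what the `call` method quoted above returns, the following sketch reruns its two operations on a small hand-made batch. The embedding values are made up for illustration and are not from the library. The `matmul` step yields the full pairwise inner-product matrix of shape `[batch, batch]`; the `reduce_sum` step then collapses each row to one value, so the result has shape `[batch, 1]` and holds row sums rather than a pairwise matrix.

```python
import tensorflow as tf

# Hypothetical batch of three 2-D embeddings (values chosen for illustration).
embeddings = tf.constant([[1.0, 0.0],
                          [0.0, 1.0],
                          [1.0, 1.0]])

# Step 1: pairwise inner products, shape [3, 3]; entry (i, j) is <e_i, e_j>.
pairwise = tf.linalg.matmul(embeddings, embeddings, transpose_b=True)

# Step 2: the row-wise sum applied in the quoted call(), shape [3, 1].
row_sums = tf.reduce_sum(pairwise, axis=1, keepdims=True)

print(pairwise.shape, row_sums.shape)
print(pairwise.numpy())
print(row_sums.numpy())
```

If a custom loss needs the full similarity matrix, `pairwise` from step 1 is probably the tensor to work with; that reading is an assumption based on the class docstring, not something this file states.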

Keras losses and metrics
When compiling a model in Keras, we pass the desired loss and metrics to the compile function. For example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['acc'])
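For readability, I will focus on loss functions from here on. Most of what follows applies to metrics as well.
From the Keras documentation on losses:
You can either pass the name of an existing loss function, or pass a TensorFlow/Theano symbolic function that returns a scalar for each data point and takes the following two arguments:
y_true: True labels. TensorFlow/Theano tensor.
y_pred: Predictions. TensorFlow/Theano tensor of the same shape as y_true.
So if we want to use a common loss function such as MSE or categorical cross-entropy, we can do so simply by passing the appropriate name.
The Keras documentation provides a list of the available losses and metrics.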

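Custom loss functions
When we need a loss function (or metric) other than the available ones, we can build our own custom function and pass it to model.compile.
For example, building a custom metric (from the Keras documentation):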

import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])
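
The same pattern applies to a custom loss. As a minimal sketch, assuming TensorFlow 2.x with its bundled `tf.keras`, the function below takes `y_true` and `y_pred`, returns one value per data point, and is passed to `compile` as a function object instead of a name. The name `per_sample_mse` and the tiny one-layer model are made up for illustration.

```python
import tensorflow as tf


def per_sample_mse(y_true, y_pred):
    # Hypothetical custom loss: mean squared error over the last axis,
    # which gives one scalar per data point.
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)


def mean_pred(y_true, y_pred):
    # Custom metric in the spirit of the example above, written with tf ops.
    return tf.reduce_mean(y_pred)


# Toy model used only to show the compile call.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(2),
])
model.compile(optimizer='rmsprop',
              loss=per_sample_mse,
              metrics=[mean_pred])

# Two sample rows to check the per-data-point behaviour of the loss.
y_true = tf.constant([[0.0, 0.0], [1.0, 1.0]])
y_pred = tf.constant([[1.0, 1.0], [1.0, 3.0]])
print(per_sample_mse(y_true, y_pred).numpy())
```

For the two sample rows, the loss evaluates to 1.0 and 2.0, one value per row, which matches the contract described in the Keras documentation quoted earlier.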
