fastai study notes: 08_collab Questionnaire

施令秋
2023-12-01

1.What problem does collaborative filtering solve?
One very common problem to solve is when you have a number of users and a number of products, and you want to recommend which products are most likely to be useful for which users. There are many variations of this: for example, recommending movies (such as on Netflix), figuring out what to highlight for a user on a home page, deciding what stories to show in a social media feed, and so forth.
2. How does it solve it?
There is a general solution to this problem, called collaborative filtering, which works like this: look at what products the current user has used or liked, find other users that have used or liked similar products, and then recommend other products that those users have used or liked.
3.Why might a collaborative filtering predictive model fail to be a very useful recommendation system?
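There may not be enough data: with too few ratings per user or per item, the model has little to learn from and its recommendations become unreliable.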
4.What does a crosstab representation of collaborative filtering data look like?
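A large matrix with users along one axis and items (for example, movies) along the other. Each cell holds the rating that user gave that item.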
5.Write the code to create a crosstab representation of the MovieLens data (you might need to do some web searching!).
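First build an m×n matrix of zeros, where m is the number of users and n is the number of movies, then set each entry from the ratings read from the data.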
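A minimal plain-Python sketch of that zero-matrix approach. The `ratings` rows and the `build_crosstab` helper are made up for illustration and are not real MovieLens data or a library API:

```python
# Hypothetical sample of (user, movie, rating) rows, not real MovieLens data.
ratings = [
    (1, 10, 4.0),
    (1, 20, 3.0),
    (2, 10, 5.0),
    (3, 30, 2.0),
]

def build_crosstab(rows):
    """Return (user_ids, movie_ids, matrix) with the ratings filled in."""
    user_ids = sorted({u for u, _, _ in rows})
    movie_ids = sorted({m for _, m, _ in rows})
    user_pos = {u: i for i, u in enumerate(user_ids)}
    movie_pos = {m: j for j, m in enumerate(movie_ids)}
    # m x n matrix of zeros: one row per user, one column per movie.
    matrix = [[0.0] * len(movie_ids) for _ in user_ids]
    for u, m, r in rows:
        matrix[user_pos[u]][movie_pos[m]] = r
    return user_ids, movie_ids, matrix

user_ids, movie_ids, matrix = build_crosstab(ratings)
```

With the ratings loaded into a pandas DataFrame, `pd.crosstab` with its `values` and `aggfunc` arguments builds the same kind of table directly, leaving unrated cells as NaN rather than 0.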
6.What is a latent factor? Why is it “latent”?
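A key factor that is never given to the model explicitly but is learned by it during training. In movie rating prediction, for example, the trained model can reveal traits that movies have in common beyond the ratings themselves. It is "latent" (hidden) because it is never stated in the data, only inferred.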
7.What is a dot product? Calculate a dot product manually using pure Python with lists.
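The dot product: multiply the corresponding elements of two vectors, then sum the results.
a = range(0, 5)
b = range(5, 10)
dot_a_b = sum([i[0]*i[1] for i in zip(a,b)])  # 0*5 + 1*6 + 2*7 + 3*8 + 4*9 = 80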
8.What does pandas.DataFrame.merge do?
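It merges two DataFrames into one by joining their rows on shared columns or indexes, much like a SQL join.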
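To make the join idea concrete, here is a plain-Python sketch of an inner join on a shared key, which is the default behavior of `merge`. The sample rows and the `inner_join` helper are hypothetical stand-ins for two DataFrames; this is not how pandas implements it:

```python
# Hypothetical rows standing in for two DataFrames that share a "movie" key.
ratings = [
    {"user": 1, "movie": 10, "rating": 4},
    {"user": 2, "movie": 20, "rating": 5},
    {"user": 3, "movie": 99, "rating": 1},  # no matching title, so dropped
]
movies = [
    {"movie": 10, "title": "Toy Story"},
    {"movie": 20, "title": "GoldenEye"},
]

def inner_join(left, right, key):
    """Combine rows from left and right whose `key` values match."""
    lookup = {}
    for row in right:
        lookup.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in lookup.get(l[key], [])]

merged = inner_join(ratings, movies, "movie")
```

In the chapter this is what attaches movie titles to the ratings table.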
9.What is an embedding matrix?
Embedding: Multiplying by a one-hot-encoded matrix, using the computational shortcut that it can be implemented by simply indexing directly. This is quite a fancy word for a very simple concept. The thing that you multiply the one-hot-encoded matrix by (or, using the computational shortcut, index into directly) is called the embedding matrix.
10.What is the relationship between an embedding and a matrix of one-hot-encoded vectors?
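Looking up an embedding gives the same result as multiplying the one-hot-encoded vectors by the embedding matrix. The embedding is the computational shortcut for that product.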
11.Why do we need Embedding if we could use one-hot-encoded vectors for the same thing?
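Embeddings are more computationally efficient: indexing avoids storing the one-hot vectors and multiplying by values that are almost all zero.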
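The equivalence between the two can be checked with a small plain-Python sketch. The 3×2 `embedding` values and the helper names are made up for illustration:

```python
# Hypothetical embedding matrix: 3 items, 2 factors per item.
embedding = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]

def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def matvec(vec, matrix):
    """Multiply a row vector by a matrix."""
    n_cols = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(matrix)))
            for j in range(n_cols)]

via_one_hot = matvec(one_hot(1, 3), embedding)  # full multiplication
via_index = embedding[1]                        # direct lookup
```

Both routes return the same row, but the lookup does no arithmetic at all.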
12.What does an embedding contain before we start training (assuming we’re not using a pretrained model)?
Before training, it is randomly initialized.
13.Create a class (without peeking, if possible!) and use it.

class Example:
    def __init__(self, a): self.a = a
    def say(self, x): return f'Hello {self.a}, {x}.'

ex = Example('Sylvain')
ex.say('nice to meet you')  # 'Hello Sylvain, nice to meet you.'

14.What does x[:,0] return?
The first column of x: every row, column index 0.
15.Rewrite the DotProduct class (without peeking, if possible!) and train a model with it.

class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors):
        self.user_factors = Embedding(n_users, n_factors)
        self.movie_factors = Embedding(n_movies, n_factors)
        
    def forward(self, x):
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
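        return (users * movies).sum(dim=1)

# Train it (assumes dls, n_users and n_movies are defined as in the chapter).
model = DotProduct(n_users, n_movies, 50)
learn = Learner(dls, model, loss_func=MSELossFlat())
learn.fit_one_cycle(5, 5e-3)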

16.What is a good loss function to use for MovieLens? Why?
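MSE (mean squared error). Ratings are numeric values, so predicting them is a regression problem, and MSE penalizes each prediction by its squared distance from the true rating.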
17.What would happen if we used cross-entropy loss with MovieLens? How would we need to change the model?
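The task would become classification, so the predictions would have to be the discrete scores 1 to 5: the model would need to output five activations, one per rating class, instead of a single number.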
18.What is the use of bias in a dot product model?
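It removes the skew caused by individual tendencies: some users rate everything higher or lower than others, and some movies are rated better or worse overall. The bias terms absorb this so the factors do not have to.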
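A minimal sketch of how bias terms enter a dot product prediction. The `predict` helper and the numbers are made up for illustration, and the sketch leaves out any squashing of the output into the rating range:

```python
def predict(user_factors, movie_factors, user_bias, movie_bias):
    """Dot product of the factors plus one bias per user and per movie."""
    dot = sum(u * m for u, m in zip(user_factors, movie_factors))
    return dot + user_bias + movie_bias

# Hypothetical values: a generous user (+0.5) and a weakly rated movie (-0.25).
rating = predict([1.0, 2.0], [0.5, 0.25], user_bias=0.5, movie_bias=-0.25)
```

Here the factors alone give 1.0, and the two biases shift the prediction to 1.25.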
19.What is another name for weight decay?
L2 regularization
20.Write the equation for weight decay (without peeking!).

loss_wd = loss+wd*(parameters**2).sum()

21.Write the equation for the gradient of weight decay. Why does it help reduce weights?

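parameters.grad += 2*wd*parameters
Each update step therefore subtracts an extra amount proportional to the weight itself, which pulls every weight toward zero. Smaller weights help the model generalize better and prevent overfitting.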
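The shrinking effect can be seen in a small plain-Python sketch of an SGD step. The `weight_decay_step` helper and the values are hypothetical, and the loss gradient is set to zero so that only the weight decay term acts:

```python
def weight_decay_step(parameters, grads, lr, wd):
    """One SGD step where weight decay adds 2*wd*p to each gradient."""
    return [p - lr * (g + 2 * wd * p) for p, g in zip(parameters, grads)]

params = [4.0, -2.0]
zero_grads = [0.0, 0.0]  # no loss gradient, to isolate the decay term
for _ in range(3):
    params = weight_decay_step(params, zero_grads, lr=0.5, wd=0.5)
```

With these settings each step halves every weight, so after three steps the weights are [0.5, -0.25].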

22.Why does reducing weights lead to better generalization?
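It makes the loss surface smoother: large weights produce sharp, narrow changes that let the model fit noise in the training set, while smaller weights give a gentler function that fits the noise less.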
23. What does argsort do in PyTorch?
It returns the indices that would sort the tensor, not the sorted values themselves.
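A plain-Python equivalent of what argsort returns for a 1-D input in ascending order. The `argsort` function here is a stand-in written for illustration, not the PyTorch implementation, and the bias values are made up:

```python
def argsort(values):
    """Indices that would sort `values` in ascending order."""
    return sorted(range(len(values)), key=lambda i: values[i])

biases = [0.3, -0.1, 0.7, 0.0]   # hypothetical movie biases
idxs = argsort(biases)           # [1, 3, 0, 2]
lowest_first = [biases[i] for i in idxs]
```

Indexing a list of movie titles with the first few entries of `idxs` would give the movies with the lowest biases.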
24.Does sorting the movie biases give the same result as averaging overall movie ratings by movie? Why/why not?
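No. The learned bias is not a plain average: it is fitted together with latent factors such as genre or cast, so it reflects how a movie is rated after those factors are accounted for.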
25.How do you print the names and details of the layers in a model?
learn.model
26.What is the “bootstrapping problem” in collaborative filtering?
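The question of what to do when not enough initial information has been collected, for example for a new user or a new item with no ratings yet.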
27.How could you deal with the bootstrapping problem for new users? For new movies?
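You can ask new users a few questions about their tastes, or start a new user or movie from the average embedding or from a specific chosen value.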
28.How can feedback loops impact collaborative filtering systems?
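The preferences of biased users, or of users who rate a lot, can have an outsized influence on the system, and the recommendations they shape then reinforce that skew.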
29.When using a neural network in collaborative filtering, why can we have different numbers of factors for movies and users?
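Because the neural network uses two separate embedding matrices and concatenates their outputs instead of taking a dot product, the two sizes do not have to match.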
30.Why is there an nn.Sequential in the CollabNN model?
It chains multiple layers together so they run in sequence as a single module.
31.What kind of model should we use if we want to add metadata about users and items, or information such as date and time, to a collaborative filtering model?
A tabular model.
