fastai study notes: 08_collab Questionnaire


1.What problem does collaborative filtering solve?
One very common problem to solve is when you have a number of users and a number of products, and you want to recommend which products are most likely to be useful for which users. There are many variations of this: for example, recommending movies (such as on Netflix), figuring out what to highlight for a user on a home page, deciding what stories to show in a social media feed, and so forth.
2. How does it solve it?
There is a general solution to this problem, called collaborative filtering, which works like this: look at what products the current user has used or liked, find other users that have used or liked similar products, and then recommend other products that those users have used or liked.
3.Why might a collaborative filtering predictive model fail to be a very useful recommendation system?
4.What does a crosstab representation of collaborative filtering data look like?
5.Write the code to create a crosstab representation of the MovieLens data (you might need to do some web searching!).
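One possible way to do this uses `pandas.crosstab`. The sketch below runs on a tiny made-up table rather than the real MovieLens file; the column names `user`, `movie` and `rating` are assumed to match how the ratings were loaded.

```python
import pandas as pd

# Tiny made-up sample with the same kind of columns as a MovieLens ratings table.
ratings = pd.DataFrame({
    "user":   [1, 1, 2, 2, 3],
    "movie":  [10, 20, 10, 30, 20],
    "rating": [5, 3, 4, 2, 1],
})

# Rows are users, columns are movies, cells hold the rating (NaN if unrated).
ct = pd.crosstab(ratings["user"], ratings["movie"],
                 values=ratings["rating"], aggfunc="mean")
print(ct)
```

Because each user rates a movie at most once here, `aggfunc="mean"` simply passes the single rating through.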
6.What is a latent factor? Why is it “latent”?
7.What is a dot product? Calculate a dot product manually using pure Python with lists.
a = range(0, 5)
b = range(5, 10)
# Multiply matching elements and add up the products.
dot_a_b = sum(x * y for x, y in zip(a, b))  # 0*5 + 1*6 + 2*7 + 3*8 + 4*9 = 80
8.What does pandas.DataFrame.merge do?
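In short, `merge` combines two DataFrames by joining them on shared key columns, much like a SQL join. A minimal sketch, using two small made-up tables (the column names and titles are illustrative only):

```python
import pandas as pd

ratings = pd.DataFrame({"user": [1, 2, 3], "movie": [10, 20, 10], "rating": [5, 3, 4]})
movies = pd.DataFrame({"movie": [10, 20], "title": ["Alien", "Heat"]})

# With no `on=` argument, merge joins on the columns both frames share ("movie"),
# and the default how="inner" keeps only keys present in both frames.
merged = ratings.merge(movies)
print(merged)
```

Each rating row picks up the `title` of its movie, which is how the chapter attaches movie titles to the ratings table.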
9.What is an embedding matrix?
Embedding: Multiplying by a one-hot-encoded matrix, using the computational shortcut that it can be implemented by simply indexing directly. This is quite a fancy word for a very simple concept. The thing that you multiply the one-hot-encoded matrix by (or, using the computational shortcut, index into directly) is called the embedding matrix.
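The equivalence between the one-hot multiplication and direct indexing can be checked by hand. The sketch below uses plain Python lists and a made-up 3x2 embedding matrix; `one_hot` and `vec_mat` are helper names invented for this example.

```python
# Hypothetical 3x2 embedding matrix: 3 items, 2 latent factors each.
emb = [[0.1, 0.2],
       [0.3, 0.4],
       [0.5, 0.6]]

def one_hot(idx, n):
    """Return a length-n list with 1 at position idx and 0 elsewhere."""
    return [1 if i == idx else 0 for i in range(n)]

def vec_mat(v, m):
    """Multiply row vector v by matrix m (pure-Python matrix product)."""
    return [sum(v[i] * m[i][j] for i in range(len(m))) for j in range(len(m[0]))]

idx = 2
by_matmul = vec_mat(one_hot(idx, len(emb)), emb)  # the "slow" definition
by_index = emb[idx]                               # the indexing shortcut
print(by_matmul, by_index)
```

Both routes return the same row, but indexing avoids building the one-hot vector and multiplying by all the zeros.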
10.What is the relationship between an embedding and a matrix of one-hot-encoded vectors?
11.Why do we need Embedding if we could use one-hot-encoded vectors for the same thing?
12.What does an embedding contain before we start training (assuming we’re not using a pretrained model)?
13.Create a class (without peeking, if possible!) and use it.

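class Example:
    def __init__(self, a): self.a = a
    def say(self, x): return f'Hello {self.a}, {x}.'

ex = Example('Sylvain')      # the name is just an illustrative argument
ex.say('nice to meet you')   # 'Hello Sylvain, nice to meet you.'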

14.What does x[:,0] return?
15.Rewrite the DotProduct class (without peeking, if possible!) and train a model with it.

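from fastai.collab import *
from fastai.tabular.all import *

class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors):
        # One learnable vector of n_factors latent factors per user and per movie.
        self.user_factors = Embedding(n_users, n_factors)
        self.movie_factors = Embedding(n_movies, n_factors)
    def forward(self, x):
        # Column 0 of the batch holds user ids, column 1 holds movie ids.
        users = self.user_factors(x[:,0])
        movies = self.movie_factors(x[:,1])
        # Dot product of each user vector with its movie vector.
        return (users * movies).sum(dim=1)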

16.What is a good loss function to use for MovieLens? Why?
17.What would happen if we used cross-entropy loss with MovieLens? How would we need to change the model?
18.What is the use of bias in a dot product model?
19.What is another name for weight decay?
L2 regularization
20.Write the equation for weight decay (without peeking!).

loss_wd = loss+wd*(parameters**2).sum()

21.Write the equation for the gradient of weight decay. Why does it help reduce weights?

parameters.grad += wd * 2 * parameters
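A small numeric check can make the gradient term concrete. The sketch below uses plain Python, made-up parameter values and arbitrary `wd` and `lr` settings. It compares the analytic gradient `2 * wd * p` of the penalty with a finite-difference estimate, then takes one SGD step on the penalty alone to show that every weight moves toward zero.

```python
wd = 0.1                      # weight decay coefficient (arbitrary for this demo)
params = [3.0, -2.0, 0.5]     # made-up parameter values

def penalty(ps):
    """The weight decay term added to the loss: wd * sum(p**2)."""
    return wd * sum(p ** 2 for p in ps)

# Analytic gradient of the penalty: 2 * wd * p for each parameter.
analytic = [2 * wd * p for p in params]

# Central finite differences as an independent check.
eps = 1e-6
numeric = []
for i in range(len(params)):
    up = params.copy()
    up[i] += eps
    down = params.copy()
    down[i] -= eps
    numeric.append((penalty(up) - penalty(down)) / (2 * eps))

# One plain SGD step using only the penalty gradient shrinks every weight toward 0.
lr = 0.5
stepped = [p - lr * g for p, g in zip(params, analytic)]
print(analytic, numeric, stepped)
```

Since the gradient is proportional to the weight itself, larger weights are pulled back harder, which is why the penalty keeps weights small.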

22.Why does reducing weights lead to better generalization?
23. What does argsort do in PyTorch?
24.Does sorting the movie biases give the same result as averaging overall movie ratings by movie? Why/why not?
25.How do you print the names and details of the layers in a model?
26.What is the “bootstrapping problem” in collaborative filtering?
27.How could you deal with the bootstrapping problem for new users? For new movies?
28.How can feedback loops impact collaborative filtering systems?
29.When using a neural network in collaborative filtering, why can we have different numbers of factors for movies and users?
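Because the neural network version concatenates the user and movie embeddings instead of taking their dot product, the two embedding matrices do not need to have the same number of factors.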
30.Why is there an nn.Sequential in the CollabNN model?
31.What kind of model should we use if we want to add metadata about users and items, or information such as date and time, to a collaborative filtering model?
A tabular model.
