2.1.1. recsys.ctr.ObiCTR
- class recsys.ctr.ObiCTR(n_latent_dims=10, n_iters=1, n_samples=4, n_sweeps=1, hypa_alpha=0.5, hypa_beta=0.45, hypa_rat=16.0, hypa_eps=16.0, hypa_user=16.0, evaluate_every=25000, seed=None)
Online Bayesian inference algorithm for the collaborative topic regression model (obi-CTR).
- Parameters
n_latent_dims (int) – Number of dimensions of the latent vectors, which is equal to the number of topics. In [1], this is called \(K\).
n_iters (int) – Number of passes over the whole set of rating records.
n_samples (int) – Total number of Gibbs sampling rounds.
n_sweeps (int) – Number of burn-in sweeps in the Gibbs sampling process; should be less than n_samples.
hypa_alpha (int or float) – Prior of the “topic proportions” Dirichlet distribution. In [1], this is called \(\alpha\).
hypa_beta (int or float) – Prior of the “topic” Dirichlet distribution. In [1], this is called \(\beta\).
hypa_rat (int or float) – Prior of the “rating” normal distribution. In [1], this is called \(\sigma_{r}^2\).
hypa_eps (int or float) – Prior of the “item latent offset” normal distribution. In [1], this is called \(\sigma_{\epsilon}^2\).
hypa_user (int or float) – Prior of the “user latent vector” normal distribution. In [1], this is called \(\sigma_{u}^2\).
evaluate_every (int) – Interval at which the metrics (RMSE and average per-word log likelihood) are evaluated.
seed (int or None) – Random seed for reproducible results across multiple function calls.
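The relationship between n_samples and n_sweeps can be pictured with a small sketch. It assumes the usual Gibbs sampling convention that the first n_sweeps rounds are discarded as burn-in and the remaining rounds are kept; the helper name is hypothetical and this page does not confirm how the library schedules its rounds.

```python
def kept_sample_rounds(n_samples, n_sweeps):
    """Indices of Gibbs rounds retained after discarding the first n_sweeps as burn-in.

    Assumed convention for illustration only, not the library's code.
    """
    if not 0 <= n_sweeps < n_samples:
        raise ValueError("n_sweeps should be non-negative and less than n_samples")
    return list(range(n_sweeps, n_samples))


# With the documented defaults (n_samples=4, n_sweeps=1), one round is burn-in.
kept = kept_sample_rounds(n_samples=4, n_sweeps=1)
```

Under this reading, setting n_sweeps equal to n_samples would leave no rounds to average over, which is one reason the constraint matters.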
- Variables
n_users_ (int) – Number of users. In [1], this is called \(I\).
user_means_ (ndarray, shape = (n_users_, n_latent_dims)) – Mean vector of each user. In [1], this is called \(\{\mathbf{m}_{ui}\}_{i=1}^I\).
user_covs_ (ndarray, shape = (n_users_, n_latent_dims)) – Diagonal of the covariance matrix of each user. In [1], this is called \(\{\Sigma_{ui}\}_{i=1}^I\).
n_items_ (int) – Number of items. In [1], this is called \(J\).
item_means_ (ndarray, shape = (n_items_, n_latent_dims)) – Mean vector of each item. In [1], this is called \(\{\mathbf{m}_{vj}\}_{j=1}^J\).
item_covs_ (ndarray, shape = (n_items_, n_latent_dims)) – Diagonal of the covariance matrix of each item. In [1], this is called \(\{\Sigma_{vj}\}_{j=1}^J\).
vocab_size_ (int) – Number of words in the vocabulary. In [1], this is called \(W\) or \(D\).
word_id_dict_ (dict mapping item_id to word_ids) – Word ids of the words in each item. In [1], this is called \(\mathbf{W}=\{\mathbf{w}_j\}_{j=1}^J\).
topic_assignment_dict_ (dict mapping item_id to topic_assignments) – Topic assignment of each word in each item. In [1], this is called \(\mathbf{Z}=\{\mathbf{z}_j\}_{j=1}^J\).
item_topic_cnt_mat_ (ndarray, shape = (n_items_, n_latent_dims)) – Number of words in each item that are assigned to each topic \(k\). In [1], this is called \(\{\mathbf{C}_j\}_{j=1}^J\).
topic_proportions_ (ndarray, shape = (n_items_, n_latent_dims)) – Topic proportions of each item. In [1], this is called \(\Theta=\{\boldsymbol{\theta}_j\}_{j=1}^J\).
topic_word_mat_ (ndarray, shape = (n_latent_dims, vocab_size_)) – Variational parameters of each “topic” Dirichlet distribution. In [1], this is called \(\{\Delta_k\}_{k=1}^K\).
topics_ (ndarray, shape = (n_latent_dims, vocab_size_)) – The “topic” distributions, one per topic. In [1], this is called \(\Phi=\{\boldsymbol{\phi}_{k}\}_{k=1}^K\).
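To make the link between the count-style and normalized attributes more concrete, here is a minimal sketch of how one row of item_topic_cnt_mat_ could be turned into a row of topic_proportions_, and one row of topic_word_mat_ into a row of topics_. It assumes the standard smoothed LDA estimate \(\theta_{jk} = (C_{jk} + \alpha) / (\sum_{k'} C_{jk'} + K\alpha)\) and the Dirichlet mean \(\phi_{kw} = \Delta_{kw} / \sum_{w'} \Delta_{kw'}\); the library may compute these quantities differently, and both helper names are hypothetical.

```python
def topic_proportions_from_counts(topic_counts, alpha):
    """Smoothed LDA-style estimate of one item's topic proportions (assumed formula)."""
    n_topics = len(topic_counts)
    total = sum(topic_counts) + n_topics * alpha
    return [(count + alpha) / total for count in topic_counts]


def topic_from_dirichlet_params(delta_row):
    """Mean of a Dirichlet with parameters delta_row, read as one topic's word distribution."""
    total = sum(delta_row)
    return [delta / total for delta in delta_row]


# Toy data: one item with K = 3 topics, and one topic over a 4-word vocabulary.
theta = topic_proportions_from_counts([5, 0, 1], alpha=0.5)
phi = topic_from_dirichlet_params([2.0, 1.0, 0.5, 0.5])
```

Both results are probability vectors, so each sums to one regardless of the raw counts or parameters.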
References
[1] “Collaborative topic regression for online recommender systems: an online and Bayesian approach”, Liu, Chenghao, et al., 2017
Methods
__init__([n_latent_dims, n_iters, …]) – Initialize self.
eval_avg_ll(rating_mat) – Evaluate average per-word log likelihood.
eval_rmse(rating_mat) – Evaluate root mean squared error.
fit(word_cnt_mat, rating_mat) – Fit the obi-CTR model.
partial_fit(rating_mat) – Fit the obi-CTR model. This method can only be used after set_vars or fit has been called once.
predict_rating(user_id, item_id) – Predict the rating given by user \(i\) to item \(j\).
set_vars(user_means, user_covs, item_means, …) – Set all the latent variables in the obi-CTR model manually.
- __init__(n_latent_dims=10, n_iters=1, n_samples=4, n_sweeps=1, hypa_alpha=0.5, hypa_beta=0.45, hypa_rat=16.0, hypa_eps=16.0, hypa_user=16.0, evaluate_every=25000, seed=None)
Initialize self. See help(type(self)) for accurate signature.
- eval_avg_ll(rating_mat)
Evaluate average per-word log likelihood.
- Parameters
rating_mat (sparse matrix, shape = (n_users_, n_items_)) – Rating matrix.
- Returns
avg_llikelihood – Average per-word log likelihood.
- Return type
float
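As an illustration of the quantity being reported, the sketch below averages \(\log p(w \mid j)\) over every word token, assuming the LDA-style mixture \(p(w \mid j) = \sum_k \theta_{jk}\,\phi_{kw}\). This is a guess at the metric's definition, not the library's code: the function name and the plain-list inputs are hypothetical, and this page does not say how rating_mat enters the computation.

```python
import math


def avg_per_word_log_likelihood(word_id_dict, topic_proportions, topics):
    """Average of log p(w | item) over every word token (assumed LDA-style mixture)."""
    total_ll = 0.0
    n_tokens = 0
    for item_id, word_ids in word_id_dict.items():
        theta = topic_proportions[item_id]
        for w in word_ids:
            # p(w | item) = sum_k theta[k] * topics[k][w]
            p_w = sum(theta[k] * topics[k][w] for k in range(len(theta)))
            total_ll += math.log(p_w)
            n_tokens += 1
    return total_ll / n_tokens


# Toy example: 2 items, 2 topics, vocabulary of 2 words.
word_id_dict = {0: [0, 0, 1], 1: [1]}
topic_proportions = [[1.0, 0.0], [0.5, 0.5]]
topics = [[0.5, 0.5], [0.5, 0.5]]
avg_ll = avg_per_word_log_likelihood(word_id_dict, topic_proportions, topics)
```

Because every word has probability 0.5 in this toy setup, the average equals log(0.5); values closer to zero indicate a better fit to the text.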
- eval_rmse(rating_mat)
Evaluate root mean squared error.
- Parameters
rating_mat (sparse matrix, shape = (n_users_, n_items_)) – Rating matrix.
- Returns
rmse – Root mean squared error.
- Return type
float
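For reference, root mean squared error over a set of observed ratings can be sketched as follows. The sketch assumes the error is taken only over observed (non-zero) entries of the rating matrix, which is the common convention for sparse rating data but is not stated on this page; the triple-based input and the function name are hypothetical.

```python
import math


def rmse(observed_ratings, predict):
    """Root mean squared error over observed (user_id, item_id, rating) triples."""
    squared_errors = [
        (rating - predict(user_id, item_id)) ** 2
        for user_id, item_id, rating in observed_ratings
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy data: the non-zero entries of a small rating matrix, and a constant predictor.
observed = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0)]
error = rmse(observed, lambda user_id, item_id: 4.0)
```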
- fit(word_cnt_mat, rating_mat)
Fit the obi-CTR model.
- Parameters
word_cnt_mat (sparse matrix, shape = (n_items_, vocab_size_)) – Word count matrix.
rating_mat (sparse matrix, shape = (n_users_, n_items_)) – Rating matrix.
- Returns
self – The fitted model instance.
- Return type
self
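fit takes a word count matrix, while the word_id_dict_ attribute holds the word ids of each item. One plausible way the two representations relate is to repeat each word id as many times as it is counted. The sketch below shows that expansion with a nested dict standing in for the sparse matrix; it is an assumption about the data layout, and the helper name is hypothetical.

```python
def word_counts_to_word_ids(word_counts):
    """Expand {item_id: {word_id: count}} into {item_id: [word_id, ...]}."""
    word_id_dict = {}
    for item_id, counts in word_counts.items():
        ids = []
        for word_id in sorted(counts):
            # Repeat the word id once per occurrence in the item.
            ids.extend([word_id] * counts[word_id])
        word_id_dict[item_id] = ids
    return word_id_dict


# Toy sparse word counts: item 0 contains word 2 twice and word 5 once.
word_id_dict = word_counts_to_word_ids({0: {2: 2, 5: 1}, 1: {0: 3}})
```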
- partial_fit(rating_mat)
Fit the obi-CTR model. This method can only be used after set_vars or fit has been called once.
- Parameters
rating_mat (sparse matrix, shape = (n_users_, n_items_)) – Rating matrix.
- Returns
self – The updated model instance.
- Return type
self
- predict_rating(user_id, item_id)
Predict the rating given by user \(i\) to item \(j\).
- Parameters
user_id (int) – User id.
item_id (int) – Item id.
- Returns
pred_r – Predicted rating.
- Return type
float
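In collaborative topic regression style models, a rating is typically predicted as the inner product of the user and item latent vectors. The sketch below assumes that predict_rating follows this rule using the posterior means (rows of user_means_ and item_means_); this page does not confirm that, and the function name is hypothetical.

```python
def predict_rating_sketch(user_means, item_means, user_id, item_id):
    """Inner product of the user's and the item's mean latent vectors (assumed rule)."""
    user_vec = user_means[user_id]
    item_vec = item_means[item_id]
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Toy latent means with n_latent_dims = 2.
user_means = [[1.0, 2.0], [0.5, 0.0]]
item_means = [[3.0, 0.5], [2.0, 2.0]]
pred_r = predict_rating_sketch(user_means, item_means, user_id=0, item_id=1)
```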
- set_vars(user_means, user_covs, item_means, item_covs, word_id_dict, topic_assignment_dict, topic_word_mat)
Set all the latent variables in the obi-CTR model manually.
- Parameters
user_means (ndarray, shape = (n_users_, n_latent_dims)) – Mean vector of each user. In [1], this is called \(\{\mathbf{m}_{ui}\}_{i=1}^I\).
user_covs (ndarray, shape = (n_users_, n_latent_dims)) – Diagonal of the covariance matrix of each user. In [1], this is called \(\{\Sigma_{ui}\}_{i=1}^I\).
item_means (ndarray, shape = (n_items_, n_latent_dims)) – Mean vector of each item. In [1], this is called \(\{\mathbf{m}_{vj}\}_{j=1}^J\).
item_covs (ndarray, shape = (n_items_, n_latent_dims)) – Diagonal of the covariance matrix of each item. In [1], this is called \(\{\Sigma_{vj}\}_{j=1}^J\).
word_id_dict (dict mapping item_id to word_ids) – Word ids of the words in each item. In [1], this is called \(\mathbf{W}=\{\mathbf{w}_j\}_{j=1}^J\).
topic_assignment_dict (dict mapping item_id to topic_assignments) – Topic assignment of each word in each item. In [1], this is called \(\mathbf{Z}=\{\mathbf{z}_j\}_{j=1}^J\).
topic_word_mat (ndarray, shape = (n_latent_dims, vocab_size_)) – Variational parameters of each “topic” Dirichlet distribution. In [1], this is called \(\{\Delta_k\}_{k=1}^K\).
- Returns
self – The model instance with the given variables set.
- Return type
self