2.1.1. recsys.ctr.ObiCTR

class recsys.ctr.ObiCTR(n_latent_dims=10, n_iters=1, n_samples=4, n_sweeps=1, hypa_alpha=0.5, hypa_beta=0.45, hypa_rat=16.0, hypa_eps=16.0, hypa_user=16.0, evaluate_every=25000, seed=None)

Online Bayesian inference algorithm for the collaborative topic regression model (obi-CTR).

Parameters
  • n_latent_dims (int) – Number of dimensions of the latent vectors, which equals the number of topics. In [1], this is called \(K\).

  • n_iters (int) – Number of passes over the full set of rating records.

  • n_samples (int) – Total number of Gibbs sampling rounds.

  • n_sweeps (int) – Number of burn-in sweeps in the Gibbs sampling process. Should be less than n_samples.

  • hypa_alpha (int or float) – Prior of the “topic proportions” Dirichlet distribution. In [1], this is called \(\alpha\).

  • hypa_beta (int or float) – Prior of the “topic” Dirichlet distribution. In [1], this is called \(\beta\).

  • hypa_rat (int or float) – Prior of the “rating” normal distribution. In [1], this is called \(\sigma_{r}^2\).

  • hypa_eps (int or float) – Prior of the “item latent offset” normal distribution. In [1], this is called \(\sigma_{\epsilon}^2\).

  • hypa_user (int or float) – Prior of the “user latent vector” normal distribution. In [1], this is called \(\sigma_{u}^2\).

  • evaluate_every (int) – Interval at which the metrics (RMSE and average per-word log likelihood) are evaluated.

  • seed (int or None) – Random seed for reproducible results across multiple function calls.
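The constraint between n_samples and n_sweeps can be illustrated with a small sketch. It assumes the usual burn-in convention, in which the first n_sweeps of the n_samples Gibbs rounds are discarded and only the remaining rounds contribute to the estimate. Whether ObiCTR applies exactly this schedule is an assumption, and retained_rounds is a hypothetical helper that is not part of recsys.

```python
def retained_rounds(n_samples, n_sweeps):
    """Return the indices of the Gibbs rounds kept after burn-in.

    Assumption: rounds 0 .. n_sweeps - 1 are burn-in and are discarded.
    """
    if n_sweeps >= n_samples:
        raise ValueError("n_sweeps should be less than n_samples")
    return list(range(n_sweeps, n_samples))


# With the default values n_samples=4 and n_sweeps=1, three rounds are kept.
kept = retained_rounds(n_samples=4, n_sweeps=1)
print(kept)  # [1, 2, 3]
```

Setting n_sweeps equal to or larger than n_samples would leave no rounds to average over, which is why the sketch rejects it.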

Variables
  • n_users_ (int) – Number of users. In [1], this is called \(I\).

  • user_means_ (ndarray, shape = (n_users_, n_latent_dims)) – Mean vector of each user, one row per user. In [1], this is called \(\{\mathbf{m}_{ui}\}_{i=1}^I\).

  • user_covs_ (ndarray, shape = (n_users_, n_latent_dims)) – Diagonal of the covariance matrix of each user, one row per user. In [1], this is called \(\{\Sigma_{ui}\}_{i=1}^I\).

  • n_items_ (int) – Number of items. In [1], this is called \(J\).

  • item_means_ (ndarray, shape = (n_items_, n_latent_dims)) – Mean vector of each item, one row per item. In [1], this is called \(\{\mathbf{m}_{vj}\}_{j=1}^J\).

  • item_covs_ (ndarray, shape = (n_items_, n_latent_dims)) – Diagonal of the covariance matrix of each item, one row per item. In [1], this is called \(\{\Sigma_{vj}\}_{j=1}^J\).

  • vocab_size_ (int) – Number of words in the vocabulary. In [1], this is called \(W\) or \(D\).

  • word_id_dict_ (dict of key-value pair = (item_id: word_ids)) – Word id of each word in each item. In [1], this is called \(\mathbf{W}=\{\mathbf{w}_j\}_{j=1}^J\).

  • topic_assignment_dict_ (dict of key-value pair = (item_id: topic_assignments)) – Topic assignment of each word in each item. In [1], this is called \(\mathbf{Z}=\{\mathbf{z}_j\}_{j=1}^J\).

  • item_topic_cnt_mat_ (ndarray, shape = (n_items_, n_latent_dims)) – Number of words in each item that are assigned to each topic \(k\). In [1], this is called \(\{\mathbf{C}_j\}_{j=1}^J\).

  • topic_proportions_ (ndarray, shape = (n_items_, n_latent_dims)) – Topic proportions of each item, one row per item. In [1], this is called \(\Theta=\{\boldsymbol{\theta}_j\}_{j=1}^J\).

  • topic_word_mat_ (ndarray, shape = (n_latent_dims, vocab_size_)) – Variational parameters of each “topic” Dirichlet distribution. In [1], this is called \(\{\Delta_k\}_{k=1}^K\).

  • topics_ (ndarray, shape = (n_latent_dims, vocab_size_)) – The collection of “topic” Dirichlet distributions, one row per topic. In [1], this is called \(\Phi=\{\boldsymbol{\phi}_{k}\}_{k=1}^K\).
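Two of the variables above can be related to others with a short sketch. The first function counts, for each item, how many words are assigned to each topic, which matches the description of item_topic_cnt_mat_. The second uses the standard fact that the mean of a Dirichlet distribution is its parameter vector normalized to sum to one; whether topics_ is derived from topic_word_mat_ in exactly this way is an assumption. Plain Python lists stand in for ndarrays, item ids are assumed to be 0-based integers, and both helper names and the toy data are hypothetical.

```python
def count_topics(topic_assignment_dict, n_items, n_latent_dims):
    """Build an (n_items, n_latent_dims) count table from per-word topic assignments."""
    counts = [[0] * n_latent_dims for _ in range(n_items)]
    for item_id, assignments in topic_assignment_dict.items():
        for k in assignments:
            counts[item_id][k] += 1
    return counts


def dirichlet_mean(params):
    """Mean of a Dirichlet distribution: the parameters normalized to sum to one."""
    total = sum(params)
    return [p / total for p in params]


# Toy data: 2 items, 3 topics, a vocabulary of 4 words.
topic_assignment_dict = {0: [0, 0, 2], 1: [1, 2, 2, 2]}
item_topic_cnt_mat = count_topics(topic_assignment_dict, n_items=2, n_latent_dims=3)
print(item_topic_cnt_mat)  # [[2, 0, 1], [0, 1, 3]]

topic_word_mat = [[1.0, 1.0, 1.0, 1.0], [2.0, 1.0, 0.5, 0.5], [3.0, 3.0, 1.0, 1.0]]
topics = [dirichlet_mean(row) for row in topic_word_mat]
print(topics[1])  # [0.5, 0.25, 0.125, 0.125]
```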

References

[1] “Collaborative topic regression for online recommender systems: an online and Bayesian approach”, Liu, Chenghao, et al., 2017

Methods

__init__([n_latent_dims, n_iters, …])

Initialize self.

eval_avg_ll(rating_mat)

Evaluate average per-word log likelihood.

eval_rmse(rating_mat)

Evaluate root mean squared error.

fit(word_cnt_mat, rating_mat)

Fit the obi-CTR model.

partial_fit(rating_mat)

Fit the obi-CTR model. This method can only be used after set_vars or fit has been called once.

predict_rating(user_id, item_id)

Predict the rating that user \(i\) gives to item \(j\).

set_vars(user_means, user_covs, item_means, …)

Set all the latent variables in the obi-CTR model manually.

__init__(n_latent_dims=10, n_iters=1, n_samples=4, n_sweeps=1, hypa_alpha=0.5, hypa_beta=0.45, hypa_rat=16.0, hypa_eps=16.0, hypa_user=16.0, evaluate_every=25000, seed=None)

Initialize self. See help(type(self)) for accurate signature.

eval_avg_ll(rating_mat)

Evaluate average per-word log likelihood.

Parameters

rating_mat (sparse matrix, shape = (n_users_, n_items_)) – Rating matrix.

Returns

avg_llikelihood – Average per-word log likelihood.

Return type

float
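As an illustration of the metric, the sketch below uses one common definition of per-word log likelihood for topic models: the probability of word \(w\) in item \(j\) is \(\sum_k \theta_{jk}\phi_{kw}\), and the logs are averaged over all words. That ObiCTR uses exactly this formula is an assumption, the sketch does not cover how rating_mat enters the computation, and avg_per_word_ll is a hypothetical helper.

```python
import math


def avg_per_word_ll(word_id_dict, topic_proportions, topics):
    """Average log probability per word under a topic mixture (assumed definition)."""
    total_ll = 0.0
    n_words = 0
    for item_id, word_ids in word_id_dict.items():
        theta = topic_proportions[item_id]
        for w in word_ids:
            # Mixture probability of word w in this item.
            p_w = sum(theta[k] * topics[k][w] for k in range(len(theta)))
            total_ll += math.log(p_w)
            n_words += 1
    return total_ll / n_words


# Toy data: 1 item, 2 topics, 2 words; each word has probability 0.5.
word_id_dict = {0: [0, 1]}
topic_proportions = [[0.5, 0.5]]
topics = [[1.0, 0.0], [0.0, 1.0]]
ll = avg_per_word_ll(word_id_dict, topic_proportions, topics)
print(round(ll, 4))  # -0.6931
```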

eval_rmse(rating_mat)

Evaluate root mean squared error.

Parameters

rating_mat (sparse matrix, shape = (n_users_, n_items_)) – Rating matrix.

Returns

rmse – Root mean squared error.

Return type

float
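For reference, root mean squared error over a set of paired ratings can be sketched as follows. The sketch assumes the error is computed only over the observed entries of the rating matrix, which are passed here as two flat lists; rmse is a hypothetical helper, not the library's implementation.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error over paired observed and predicted ratings."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Every prediction is off by exactly 2, so the RMSE is 2.
print(rmse([3.0, 1.0, 4.0, 2.0], [1.0, 3.0, 2.0, 4.0]))  # 2.0
```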

fit(word_cnt_mat, rating_mat)

Fit the obi-CTR model.

Parameters
  • word_cnt_mat (sparse matrix, shape = (n_items_, vocab_size_)) – Word count matrix.

  • rating_mat (sparse matrix, shape = (n_users_, n_items_)) – Rating matrix.

Returns

self – self

Return type

self
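The link between word_cnt_mat and the word_id_dict_ variable can be sketched as an expansion of counts into word ids: a word that occurs three times in an item contributes its id three times. That fit builds word_id_dict_ this way is an assumption; the sketch uses nested lists as a dense stand-in for the sparse matrix, and expand_word_counts is a hypothetical helper.

```python
def expand_word_counts(word_cnt_rows):
    """Turn per-item word counts into per-item lists of word ids.

    word_cnt_rows[j][w] is the number of times word w occurs in item j.
    """
    word_id_dict = {}
    for item_id, row in enumerate(word_cnt_rows):
        word_ids = []
        for word_id, count in enumerate(row):
            word_ids.extend([word_id] * count)
        word_id_dict[item_id] = word_ids
    return word_id_dict


# Toy data: 2 items, vocabulary of 3 words.
word_cnt_rows = [[2, 0, 1], [0, 3, 0]]
print(expand_word_counts(word_cnt_rows))  # {0: [0, 0, 2], 1: [1, 1, 1]}
```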

partial_fit(rating_mat)

Fit the obi-CTR model. This method can only be used after set_vars or fit has been called once.

Parameters

rating_mat (sparse matrix, shape = (n_users_, n_items_)) – Rating matrix.

Returns

self – self

Return type

self

predict_rating(user_id, item_id)

Predict the rating that user \(i\) gives to item \(j\).

Parameters
  • user_id (int) – User id.

  • item_id (int) – Item id.

Returns

pred_r – Predicted rating.

Return type

float
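In collaborative topic regression, a rating is modeled around the inner product of the user and item latent vectors. A minimal sketch of a point prediction, assuming the posterior means user_means_ and item_means_ are used as the latent vectors (this is an assumption about predict_rating, and predict_from_means is a hypothetical helper):

```python
def predict_from_means(user_mean, item_mean):
    """Inner product of a user mean vector and an item mean vector."""
    return sum(u * v for u, v in zip(user_mean, item_mean))


# Toy data: 2 users and 2 items with 2 latent dimensions.
user_means = [[0.5, 1.0], [2.0, 0.0]]
item_means = [[2.0, 3.0], [1.0, 1.0]]
print(predict_from_means(user_means[0], item_means[0]))  # 4.0
```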

set_vars(user_means, user_covs, item_means, item_covs, word_id_dict, topic_assignment_dict, topic_word_mat)

Set all the latent variables in the obi-CTR model manually.

Parameters
  • user_means (ndarray, shape = (n_users_, n_latent_dims)) – Mean vector of each user, one row per user. In [1], this is called \(\{\mathbf{m}_{ui}\}_{i=1}^I\).

  • user_covs (ndarray, shape = (n_users_, n_latent_dims)) – Diagonal of the covariance matrix of each user, one row per user. In [1], this is called \(\{\Sigma_{ui}\}_{i=1}^I\).

  • item_means (ndarray, shape = (n_items_, n_latent_dims)) – Mean vector of each item, one row per item. In [1], this is called \(\{\mathbf{m}_{vj}\}_{j=1}^J\).

  • item_covs (ndarray, shape = (n_items_, n_latent_dims)) – Diagonal of the covariance matrix of each item, one row per item. In [1], this is called \(\{\Sigma_{vj}\}_{j=1}^J\).

  • word_id_dict (dict of key-value pair = (item_id: word_ids)) – Word id of each word in each item. In [1], this is called \(\mathbf{W}=\{\mathbf{w}_j\}_{j=1}^J\).

  • topic_assignment_dict (dict of key-value pair = (item_id: topic_assignments)) – Topic assignment of each word in each item. In [1], this is called \(\mathbf{Z}=\{\mathbf{z}_j\}_{j=1}^J\).

  • topic_word_mat (ndarray, shape = (n_latent_dims, vocab_size_)) – Variational parameters of each “topic” Dirichlet distribution. In [1], this is called \(\{\Delta_k\}_{k=1}^K\).

Returns

self – self

Return type

self
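Because set_vars accepts the latent state as separate arguments, it can help to check that they are mutually consistent before passing them in. The sketch below checks the two dictionaries under the assumption, taken from the parameter descriptions, that every word in an item has exactly one topic assignment. check_topic_state is a hypothetical helper; it is not known whether set_vars performs similar validation itself.

```python
def check_topic_state(word_id_dict, topic_assignment_dict, n_latent_dims, vocab_size):
    """Raise ValueError if the word ids and topic assignments are inconsistent."""
    if word_id_dict.keys() != topic_assignment_dict.keys():
        raise ValueError("both dicts must cover the same item ids")
    for item_id, word_ids in word_id_dict.items():
        assignments = topic_assignment_dict[item_id]
        if len(word_ids) != len(assignments):
            raise ValueError(f"item {item_id}: one topic assignment is needed per word")
        if any(not 0 <= w < vocab_size for w in word_ids):
            raise ValueError(f"item {item_id}: word id outside the vocabulary")
        if any(not 0 <= k < n_latent_dims for k in assignments):
            raise ValueError(f"item {item_id}: topic id outside 0..n_latent_dims-1")
    return True


# Toy data: 2 items, 3 topics, vocabulary of 4 words.
word_id_dict = {0: [0, 0, 2], 1: [1, 3]}
topic_assignment_dict = {0: [0, 1, 2], 1: [2, 2]}
print(check_topic_state(word_id_dict, topic_assignment_dict, n_latent_dims=3, vocab_size=4))  # True
```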