Data Science / Machine learning / R News / TensorFlow

Simple Matrix Factorization with TensorFlow

by Sara · December 21, 2016

This article is originally published at https://hameddaily.blogspot.com/

This post aims to illustrate use of TensorFlow framework for implementing a simple Matrix Factorization (MF). MF is one of the widely used recommender systems that is especially exploited when we have access to tons of user explicit or implicit feedbacks. For details about matrix factorization and collaborative system refer to this paper.
For the purpose of this post we explore a simple movie recommendation by using the data from MovieLens. The dataset is downloaded from MovieLens 100K Dataset. It contains of 100,000 ratings (1-5) from 943 users on 1682 movies. The u.date represents the rating in sparse format of (userid, movieid, rating, timestamp).

Load the data


import pandas as pd
df = pd.read_csv('u.data', sep='\t', names=['user','item','rate', 'time'])
msk = numpy.random.rand(len(df)) < 0.7
df_train = df[msk]
user_indecies = [x-1 for x in df_train.user.values]
item_indecies = [x-1 for x in df_train.item.values]
rates = df_train.rate.values

We use 70% of the data for training. The rest will be used later for testing and accuracy calculation. The last two lines is for shifting the Ids to 0 (from 1).

Variables

In matrix factorization user rating $r$ is formulated as an inner product of two latent vectors $\vec{u}$ and $\vec{p}$ which are two latent vectors in same space to represent the user interest and movie feature respectively.
\[\hat{r} = \vec{u}^t\vec{p}\] We define these two latent variables in TensorFlow as follows: U that shows latent representation of user interest and P that represents the latent values for items.


feature_len = 10
U = tf.Variable(initial_value=tf.truncated_normal([943,feature_len]), name='users')
P = tf.Variable(initial_value=tf.truncated_normal([feature_len,1682]), name='items')

As mentioned before, all the ratings can be calculated as the product of these two variables


result = tf.matmul(U, P)

result is a 943x1682 matrix where its (i,j) entry shows the rating of user i for item j. We first make this matrix a flat vector and then we lookup (gather) the indices that we have the training data for.


result_flatten = tf.reshape(result, [-1])
R = tf.gather(result_flatten, user_indecies * tf.shape(result)[1] + 
              item_indecies, name='extracting_user_rate')

Cost function

The goal of the matrix factorization is to predict rating of user u for item p. Let call that prediction $\hat{r}$. Then a naive cost function for a single prediction is the absolute difference between user actual rating and our predicted rating: \[\parallel r-\hat{r} \parallel_1 \] As a result, the recommender system cost function is sum of all predication errors for all users and all items: We can add this cost function to TensorFlow computation graph as follows:


diff_op = tf.sub(R, rates, name='trainig_diff')
diff_op_squared = tf.abs(diff_op, name="squared_difference")
base_cost = tf.reduce_sum(diff_op_squared, name="sum_squared_error")

Regularization

Similar to any machine learning modeling we need to add regularization to control model overfitting. The regularization here is based on the latent vectors for items and users: \[ \sum_{i,j\in \Delta} \lambda (\parallel u_i \parallel + \parallel p_j\parallel) \] where $\Delta$ is the set of all (user, item) ratings that is available in training set. In TF we have:


lda = tf.constant(.001, name='lambda')
norm_sums = tf.add(tf.reduce_sum(tf.abs(U, name='user_abs'), name='user_norm'), 
   tf.reduce_sum(tf.abs(T, name='item_abs'), name='item_norm'))
regularizer = tf.mul(norm_sums, lda, 'regularizer')

With regularization, the final cost function is as follows:


cost = tf.add(base_cost, regularizer)

In summary, the above cost function is the representative of the following mathematical formulation: \[ \sum_{i,j\in \Delta} \parallel r_{i,j}-\hat{r_{i,j}} \parallel_1 + \lambda (||u_i|| + ||p_j||) \]

Optimizer

Our goal is to minimize the above cost function. One of the popular ways is by using stochastic gradient decent (SGD). SGD has a hyper parameter called learning rate or step size that determines the scale of walking in negative direction of gradient. We use an exponentially decaying learning rate:


lr = tf.constant(.001, name='learning_rate')
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(lr, global_step, 10000, 0.96, staircase=True)
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
training_step = optimizer.minimize(cost, global_step=global_step)

Execution

Now its time to execute the above computation graph. First let's define a session that will run the graph:


sess = tf.Session()

Then we need to initialize all the variables.


init = tf.initialize_all_variables()
sess.run(init)

Finally, we will feed the data to the graph and run it in multiple epochs
for i in xrange(500):
sess.run(training_step)

Testing

Sanity Check

First lets look at some of the rating predictions from the training data. For example let's look at the first entry of the training set and compare the actual rating with rating from our MF prediction:


u, p, r = df[['user', 'item', 'rate']].values[0]
rhat = tf.gather(tf.gather(result, u-1), p-1)
print "rating for user " + str(u) + " for item " + str(p) + 
      " is " + str(r) + " and our prediction is: " + str(sess.run(rhat))

Accuracy

To calculate the accuracy we apply the MF model on the test set and calculate the average prediction error for all pairs of (user, item) in the test set.


df_test = df[~msk]
user_indecies_test = [x-1 for x in df_test.user.values]
item_indecies_test = [x-1 for x in df_test.item.values]
rates_test = df_test.rate.values
R_test = tf.gather(result_flatten, user_indecies_test * tf.shape(result)[1] + item_indecies_test, name='extracting_user_rate_test')
diff_op_test = tf.sub(R_test, rates_test, name='test_diff')
diff_op_squared_test = tf.abs(diff_op, name="squared_difference_test")

cost_test = tf.div(tf.reduce_sum(tf.square(diff_op_squared_test, name="squared_difference_test"), name="sum_squared_error_test"), df_test.shape[0] * 2, name="average_error")
print sess.run(cost_test)

Note 1: The full code can be found here: https://github.com/mamhamed/misc/tree/master/mf_with_tf
Note 2: Please also refer to this ohAI post as some of the above code is inspired from there.

Thanks for visiting r-craft.org
This article is originally published at https://hameddaily.blogspot.com/
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Simple Matrix Factorization with TensorFlow

You may also like...

Categories

Simple Matrix Factorization with TensorFlow

Load the data

Variables

Cost function

Regularization

Optimizer

Execution

Testing

Sanity Check

Accuracy

You may also like...

New interface to validation splits

IEEE Language Rankings 2018

Turn Scroll-Fest into No-Mess with Dashboards & Presentations

Categories