To implement an algorithm successfully, one needs to understand its structure, mathematical model, and also its strengths and weaknesses. The main types of recommendation systems we’ve overviewed in the previous article, using the same principles as python machine learning algorithms, helping it adapt according to the specific user’s preference easily.
The whole AI and machine learning concept represents an artificial intelligence algorithm class.
It differs that it emphasizes the learning process where previous similar tasks are used to analyze the existing one, not direct problem-solving. To implement these algorithms, various methods and approaches from such branches of mathematics as the probability theory, mathematical statistics, and linear algebra can be used. It helps to solve the tasks more efficiently than common deterministic algorithms, and sometimes even surpass a human.
Every mathematical model underlying the machine learning algorithm has got a difficult function that has enough previous experience to solve some tasks that have set. The point is that the more experience a mathematical model accumulates, the better it can be trained, and the less error value it has.
A mathematical model error in ML is some numerical value obtained by calculating the difference between the known results of the task and that one obtained as a result of the model. In practice, the mean square deviation or its root is usually used to calculate a model error value.
As it’s been mentioned in the previous article, the basics of any recommender system are a matrix made up of the system users and item ratings set for a specific item. The ultimate goal of the system is to fill in all the empty values of the matrix with the rating values based on users’ opinions. Let’s have a look at such a matrix:
Though there’s a wide range of algorithms based on simultaneous filtering users or items, matrix factorization addresses this challenge much more efficiently. To accomplish this, matrix factorization relies on object latent qualities that users liked the most. And on top of that, this algorithm uses the resources of the original machine effectively, thus it can be used as a tool for searching hidden peculiarities in any data sets.
Let’s shape the main task of the matrix factorization. The vector u describes all available users in the system, while the vector p describes items. Then, the matrix R with the dimensions u ? p is the matrix of estimates of all users for all items in the system. What is more, an additional task is to find k of the hidden properties of the items. So, the task is to find two matrices Pu ? k ? Qp ? k, their composition will be approximately the matrix R, as it’s shown in the following example:
Upon completion of the calculation, the raw values in the matrix P will show the hidden properties for a considering user, while the raw values in the matrix Qwill show the hidden properties of a considering item. To forecast the rating for the specific machine learning projects, one needs to calculate the product of the calculated matrix values using the following mathematical expression:
To calculate two matrices for users and items, it’s vital to factor the original matrix, but the mathematical operation of the singular decomposition doesn’t recognize the gaps in the matrix, while the original matrix is sparse. Thus, the task that had been set is about selecting the matrix missing values, that can be easily solved using machine learning technologies.
The most popular machine learning algorithm for calculating the efficient function values is a gradient descent algorithm. This method uses a function gradient to achieve the absolute minimum by moving against the gradient direction at a given speed. It can be possible, since the function gradient shows the fastest growth, and that can be used for minimization.
Applying the gradient descent algorithm to the described task is too random vector initialization for users and objects and minimization of the difference between its product and the known rating values. This difference shows the error or the model cost value, and is calculated using the following formula:
Upon the calculation of the error values, it’s necessary to select the appropriate optimizer direction using the partial derivative vector of the function. To accomplish this, the vectors for users and items described above need to be differentiated using the following mathematical example:
From derivative calculations for user vectors and items, we move to update the coefficients of these vectors. Upon its completion, the algorithm performs the next iteration. The stopping algorithm criterion is a very small determined error number or the epoch number. Updating of the weight numbers in vectors happens using the following mathematical examples:
It’s easy to notice that the model described above isn’t that complicated, which leads to fast overfitting using a big volume of data. It results in complete loss of ability to generalization, which is useless at practice. To avoid it in Machine Learning, there should be some additional limitations to the problem situation in order to prevent retraining and solve the task incorrectly.
After changing the mathematical model function of calculating error value, the formulas for updating the model vectors weight should be amended:
After the changes are made to the function for calculating the error value of a mathematical model, some amendments need to be brought about into the formulas of the updated weight values of the model vectors:
While forecasting user rating or searching for the same items in the described model, properties native to this or that user or an object are neglected. The one that counts is the indexes native to all the objects or users. It’s not always correct, and for a better result, it should be taken into account, too. To address this challenge in machine learning for business, one uses the coefficient of displacement that is set for each vector separately.
These coefficients as the vectors of users and items described earlier can be also trained in the learning process, that’s why it’s necessary to calculate its derivatives.
After the description of all the formulas stated above is made, it becomes easy to implement this algorithm using a programming language Python with a package NumPy. Below you can have a look at the way the optimization function and result calculations for the matrix Mhave been performed:
# Calculating the mean value in the given dataset.
mean = np.mean(M[np.where(M > 0)])
n_users, n_items = M.shape
for i in range(n_users):
for ii in range(n_items):
if data[i, ii] > 0:
yield i, ii, data[i, ii]
# Number of hidden properties.
K = 2
# Vector initialization for users and items.
p = np.random.normal(0.0, 0.1, (n_users, K))
q = np.random.normal(0.0, 0.1, (n_items, K))
# Displacement initialization for users and items.
bu = np.zeros(n_users)
bi = np.zeros(n_items)
for _ in range(20):
for u, i, r in load_dataset(M):
# Calculation error value.
err = r - (mean + bi[i] + bu[u] + p[u] @ q[i].T)
# Updating displacement coefficients.
bi[i] += 0.1 * (err - 0.01 * bi[i])
bu[u] += 0.1 * (err - 0.01 * bu[u])
# Updating the main coefficients.
p[u] += 0.1 * (err * q[i] - 0.01 * p[u])
q[i] += 0.1 * (err * p[u] - 0.01 * q[i])
# Calculating the target matrix after training.
mean + bi + bu[:, np.newaxis] + p @ q.T