Hypothesis-Based Collaborative Filtering

Retrieving Like-Minded Individuals Based on the Comparison of Hypothesized Preferences

About the Book

The vast product variety and product variation offered by online retailers give individuals an enormous number of options, which makes it challenging to find and choose the products that provide the most utility. Consequently, consumers often have to settle for a product that provides sufficient utility. Beyond that, individuals may even defer the choice altogether, which is known as the overchoice phenomenon.

Recommender systems have emerged in recent years as an effective way to help individuals find interesting products. Better access to variety pays off: the increased product variety of online bookstores enhanced consumer welfare by $731 million to $1.03 billion in the year 2000. Consumer welfare refers to consumers’ total satisfaction. This enhancement is 7 to 10 times larger than the consumer welfare gain from increased competition and lower prices in the book market. In other words, recommender systems are essential for increasing consumer welfare, which ultimately leads to an increase in economic and social welfare.

Typically, recommender systems use the collective wisdom of individuals to expose each individual to the products that best fit their preferences, thus maximizing their utility. More precisely, the recommender system considers the product ratings of like-minded individuals when generating recommendations. Commonly, like-minded individuals are retrieved by comparing their ratings for commonly rated products. This filtering technique is referred to as collaborative filtering.
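To make the baseline concrete, here is a minimal sketch of retrieving like-minded individuals from ratings on commonly rated products. It assumes Pearson correlation as the similarity measure, which is a common choice but not necessarily the one used in the book; the user names, product identifiers, ratings, and function names are all hypothetical.

```python
from math import sqrt

# Hypothetical ratings (product id -> rating on a 1-5 scale); not data from the book.
RATINGS = {
    "alice": {"m1": 5, "m2": 3, "m3": 1},
    "bob": {"m1": 4, "m2": 3, "m3": 2, "m4": 5},
    "carol": {"m1": 1, "m2": 3, "m3": 5},
}


def pearson_on_corated(ratings_a, ratings_b):
    """Pearson correlation computed only over products both users rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # too little overlap to say anything
    xs = [ratings_a[p] for p in common]
    ys = [ratings_b[p] for p in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rating vector has no defined correlation
    return cov / sqrt(var_x * var_y)


def most_like_minded(target, all_ratings, k=1):
    """Return the k users whose co-rated ratings correlate best with target's."""
    scores = [
        (pearson_on_corated(all_ratings[target], ratings), user)
        for user, ratings in all_ratings.items()
        if user != target
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [user for _, user in scores[:k]]


neighbours = most_like_minded("alice", RATINGS)
```

Note that the similarity depends entirely on the overlap: two users with fewer than two products in common get a score of zero here, regardless of how similar their tastes may be.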

However, retrieving like-minded individuals based on their ratings for commonly rated products may be inappropriate, because the commonly rated products are not necessarily a representative sample of the preferences of the two individuals being compared. We show why and when this is the case.

In this dissertation, we present hypothesis-based collaborative filtering (HCF) to expose individuals to the products that best fit their preferences. HCF uses machine learning algorithms to hypothesize individuals’ preferences and retrieves like-minded individuals based on the similarity of these hypothesized preferences. Machine learning extracts patterns from observations in order to generalize from them, which makes it well suited to hypothesizing individuals’ preferences from their product ratings. We present two frameworks for retrieving like-minded individuals: one compares the composition of hypothesized preferences, and the other compares the predicted utilities individuals receive from products. Furthermore, we provide empirical evidence for the superiority of HCF over baseline collaborative filtering methods.
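The utility-based idea can be sketched as follows: learn a preference model per individual, predict utilities for a shared product set, and compare the predictions instead of the raw ratings. This is only an illustration under strong assumptions, not the book's algorithm. The learner is a deliberately trivial stand-in (mean rating per genre), whereas the table of contents mentions decision tree and naïve Bayesian classifiers; the genre data, users, and function names are hypothetical.

```python
from math import sqrt

# Hypothetical product features (one genre per movie); not data from the book.
GENRES = {
    "m1": "comedy", "m2": "drama", "m3": "comedy",
    "m4": "drama", "m5": "horror", "m6": "horror",
}

# Two hypothetical users who have rated no product in common.
DAVE = {"m1": 5, "m2": 2, "m5": 1}
ERIN = {"m3": 4, "m4": 2, "m6": 1}


def learn_preferences(ratings):
    """Stand-in learner: hypothesize preferences as the mean rating per genre."""
    by_genre = {}
    for product, rating in ratings.items():
        by_genre.setdefault(GENRES[product], []).append(rating)
    model = {genre: sum(rs) / len(rs) for genre, rs in by_genre.items()}
    # Fallback for genres the user never rated: the user's overall mean.
    model["__default__"] = sum(ratings.values()) / len(ratings)
    return model


def predict_utility(model, product):
    """Predicted utility of a product under a hypothesized preference model."""
    return model.get(GENRES[product], model["__default__"])


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def hypothesized_similarity(ratings_a, ratings_b, product_set):
    """Compare two users via predicted utilities over a shared product set."""
    model_a = learn_preferences(ratings_a)
    model_b = learn_preferences(ratings_b)
    utilities_a = [predict_utility(model_a, p) for p in product_set]
    utilities_b = [predict_utility(model_b, p) for p in product_set]
    return pearson(utilities_a, utilities_b)


similarity = hypothesized_similarity(DAVE, ERIN, sorted(GENRES))
```

In this toy setup the two users share no rated product, so a comparison restricted to commonly rated products has nothing to work with, while the comparison of predicted utilities still yields a high positive similarity because both hypothesized models rank comedy above drama above horror.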

About the Author

Amancio Bouza

Amancio received his PhD for his thesis on recommender systems, machine learning, and the Semantic Web. He has several years of experience in tech startups, IT companies, and companies across different industries as an Entrepreneur, Product Manager, Product Owner, Technical Lead, and Software Engineer.

Table of Contents



I Setting the Scene

1 Introduction

1.1 Motivation and Thesis

1.2 Hypothesis-Based Collaborative Filtering in a Nutshell

1.3 Thesis Statement

1.3.1 Research Hypotheses

1.3.2 Research Goals

1.4 Contributions

1.5 Organization

2 Related Work

2.1 Recommender Systems

2.1.1 Formal Framework

2.1.2 Ratings

2.2 Collaborative Filtering

2.2.1 General Framework for Collaborative Filtering

2.2.2 Cold-Start Problem

2.3 Machine Learning

II Preference Modeling

3 Conceptualization and Specification of Preferences

3.1 Formalization of Preferences

3.1.1 Partial Preferences

3.2 Partial Preference Extraction from Machine Learning Models

3.2.1 Partial Preference Extraction from Decision Tree Classifier

3.2.2 Partial Preference Extraction from Naïve Bayesian Classifier

3.3 Ontological Specification of Hypothesized Preferences

3.4 Acceptance of Hypotheses

3.5 Summary

4 Domain Ontology-Boosted Decision Tree Induction

4.1 Decision Tree Induction

4.1.1 Feature Selection

4.2 SEMTREE Extension to the Decision Tree Model

4.2.1 Basic Idea

4.2.2 Injecting Concept Features to Generalize from Features

4.2.3 Classification

4.2.4 Implementation

4.3 Acceptance of Hypotheses

4.4 Summary

III Preference Similarity

5 Hypothesized Preference Similarity

5.1 Theoretical Foundation of Hypothesized Preference Similarity

5.1.1 Hypothesized Partial Preference Similarity

5.1.2 Hypothesized Semi-Partial Preference Similarity

5.2 Hypothesized Utility-Based Preference Similarity

5.2.1 Product Set for Utility Prediction

5.2.2 Correlative Predicted Utility-Based Similarity

5.2.3 Probabilistic Predicted Utility-Based Similarity

5.2.4 Probabilistic Predicted Utility-Based Semi-Partial Similarity

5.3 Hypothesis Composition-Based Preference Similarity

5.3.1 Similarity of Hypothesized Partial Preferences

5.3.2 Similarity Computation Based on Partial Preference Similarity Matrix

5.4 Summary

IV Evaluation

6 Evaluation

6.1 Experimental Setting

6.1.1 Performance Metrics

6.2 Candidates for Comparison

6.2.1 Hypothesis-Based Collaborative Filtering Candidates

6.2.2 Baseline Collaborative Filtering Candidates

6.2.3 Baseline Content Filtering Candidates

6.3 Dataset

6.4 Results and Discussion

6.4.1 Rating Prediction Accuracy

6.4.2 Relevance Filtering Quality

6.5 Information Theoretic Reflection of Hypothesized Preferences versus Product Ratings

6.6 Acceptance of Hypotheses

6.7 Summary

7 Analysis

7.1 Method

7.1.1 Grounded Theory

7.1.2 Data Collection

7.1.3 Data Analysis

7.2 Theory Development

7.2.1 Theory Concepts

7.2.2 Comparison of Recommendation Performance

7.3 Theory Consolidation

7.4 Theory Validation

7.4.1 Experimental Setting

7.4.2 Results and Discussion

7.5 Acceptance of Hypotheses

V Closing

8 Limitations

8.1 Conceptual Limitations

8.2 Technical Limitations

9 Conclusions

9.1 Acceptance of Hypotheses

9.2 Achievements of Research Goals and Thesis

9.3 Opportunities for Future Research

VI Appendix

A Tools



A.2.1 Architecture

A.3 MOLookup

A.4 LiMo Database

A.4.1 Interlinking Movies across Web Pages

B Movie Ontology MO

C MovieLens Dataset

C.1 Genres of MovieLens

C.2 Sparse MovieLens Dataset

D Distribution of Recommendation Performance

E Comparison Between Properties and Recommendation Performance

F Comparison Between Recomm. Perform. regarding Cold-Start Behavior

G Publications


Curriculum Vitae
