The term "data sparsity" in different applications


Posted on 16 Aug 2022 under General Tech, Learning Aids/Tools.


Answers (1)

manpreet (Best Answer, 2 years ago)

I have recently been surveying techniques and algorithms that handle data sparsity problems in various fields.

I find that the very similar names "data sparsity" and "sparse data" are used in recommender systems, text mining, information retrieval, statistical language modeling, and high-dimensional data analysis. However, each carries a quite different meaning in its specific application. For instance, a large proportion of missing values in a user-item matrix is regarded as sparsity. A large proportion of zero values (rather than missing values) in an instance-feature matrix is also called sparsity. In addition, increasing the dimension of the data also leads to sparser data.
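To make the distinction between missing and zero entries concrete, here is a minimal Python sketch. The helper names `missing_fraction` and `zero_fraction`, the use of `None` to mark an unobserved rating, and the toy matrices are all my own assumptions for illustration, not taken from the cited works.

```python
def missing_fraction(ratings):
    """Share of unobserved entries (marked None) in a user-item rating matrix."""
    cells = [value for row in ratings for value in row]
    return sum(value is None for value in cells) / len(cells)


def zero_fraction(features):
    """Share of entries that are observed and exactly zero in a feature matrix."""
    cells = [value for row in features for value in row]
    return sum(value == 0 for value in cells) / len(cells)


# Hypothetical user-item matrix: None means "user never rated this item".
ratings = [
    [5, None, None, 1],
    [None, None, 4, None],
    [None, 2, None, None],
]

# Hypothetical instance-feature matrix (e.g. word counts): 0 is a real value.
features = [
    [0, 3, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
]

print(missing_fraction(ratings))  # 8 of 12 cells are unobserved
print(zero_fraction(features))    # 9 of 12 cells are zero
```

Both numbers are "sparsity" in everyday usage, but the first describes unknown values that a model has to predict, while the second describes known values that happen to be zero.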

Some informal definitions are given in previous work:

  1. In recommender systems, sparsity is defined as the inability to find a sufficient quantity of good-quality neighbors to aid in the prediction process, due to insufficient overlap of ratings between the active user and his neighbors [1].
  2. In high-dimensional data, the sampling density, which is proportional to $N^{1/p}$ where $N$ is the sample size and $p$ is the data dimension, can also serve as a measure of the sparsity problem [2].
  3. A fairly formal definition, based on a large proportion of zeros in the feature matrix, can be found in [3]. I regard this as sparse representation rather than data sparsity.
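Definitions 1 and 2 above can be sketched numerically as follows. This is only an illustration under my own assumptions: `co_rated_count` is a hypothetical helper that counts rating overlap and is not the measure proposed in [1], and `per_axis_density` simply evaluates the $N^{1/p}$ proportionality from [2] with the constant set to 1.

```python
def co_rated_count(user_a, user_b):
    """Number of items rated by both users (ratings given as item -> score dicts)."""
    return len(set(user_a) & set(user_b))


def per_axis_density(n_samples, dimension):
    """Sampling density along one axis, taken as N ** (1 / p)."""
    return n_samples ** (1.0 / dimension)


def samples_for_same_density(n_one_dim, dimension):
    """Sample size needed in p dimensions to match a 1-D density of n_one_dim."""
    return n_one_dim ** dimension


# Definition 1: two hypothetical users with very little rating overlap.
active_user = {"item_a": 5, "item_b": 3, "item_c": 4}
neighbor = {"item_c": 2, "item_d": 5}
print(co_rated_count(active_user, neighbor))  # only item_c is shared

# Definition 2: a fixed sample of 100 points thins out as the dimension grows.
for p in (1, 2, 10):
    print(p, per_axis_density(100, p))

# Matching the 1-D density of 100 points in 10 dimensions needs 100 ** 10 points.
print(samples_for_same_density(100, 10))
```

The two quantities are measured on different objects (pairs of users versus the whole sample), which is part of why a single universal sparsity measure is hard to state.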

In short, I understand quite clearly what sparsity means in each application. However, I am unsure whether the term has a universal explanation or definition, particularly a mathematical one. So far, to that end, I have been trying to come up with a sparsity measure that covers the cases above. (In my own view, though, the sparse representation widely used in text mining and similar areas is a different problem.)

[1]: Anand, D., & Bharadwaj, K. K. (2011). Utilizing various sparsity measures for enhancing accuracy of collaborative recommender systems based on local and global similarities. Expert Systems with Applications, 38(5), 5101–5109.

[2]: Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). New York: Springer. Page 23.

[3]: Duchi, J., Jordan, M., & McMahan, B. (2013). Estimation, optimization, and parallelism when data is sparse. In Advances in Neural Information Processing Systems (pp. 2832–2840).
