Posted on 16 Aug 2022, this text provides information on Bugs & Fixes related to General Tech. Please note that while accuracy is prioritized, the data presented might not be entirely correct or up-to-date. This information is offered for general knowledge and informational purposes only, and should not be considered as a substitute for professional advice.
I have a couple of suggestions for your approach.
According to the DBSCAN documentation, the input X must be an array or sparse (CSR) matrix of shape (n_samples, n_features), or an array of shape (n_samples, n_samples).
get_distance() has to return a single value, not an array. I would therefore suggest using a separate distance measure for the non-text features; the example below uses Euclidean distance for them.
Example:
import numpy as np
from scipy.sparse import hstack
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances, euclidean_distances

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = TfidfVectorizer()
text_list = vectorizer.fit_transform(corpus)  # sparse TF-IDF matrix

hashes_list = np.array([[12, 12, 12],
                        [12, 13, 11],
                        [12, 1, 16],
                        [4, 8, 11]])

# Put the text columns first so that the first n1 columns are the TF-IDF features.
combined_list = hstack((text_list, hashes_list))

n1 = text_list.shape[1]  # number of text features

def get_distance(vec1, vec2):
    # Cosine distance (1 - cosine similarity) on the text part,
    # Euclidean distance on the remaining features.
    text_distance = cosine_distances([vec1[:n1]], [vec2[:n1]])[0, 0]
    other_distance = euclidean_distances([vec1[n1:]], [vec2[n1:]])[0, 0]
    return (text_distance + other_distance) / 2  # a single float, not an array

db = DBSCAN(eps=1, min_samples=3, metric=get_distance).fit(combined_list.toarray())
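The (n_samples, n_samples) shape mentioned in the documentation excerpt corresponds to passing a precomputed distance matrix. As a rough sketch of that alternative, and not necessarily what the original poster needs, the two distance matrices can be computed separately and averaged before clustering. This assumes cosine distance for the text part and Euclidean distance for the other features; the eps and min_samples values are only illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances, euclidean_distances

corpus = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]
hashes = np.array([[12, 12, 12], [12, 13, 11], [12, 1, 16], [4, 8, 11]])

# Pairwise distances per feature group, each of shape (n_samples, n_samples).
text_matrix = TfidfVectorizer().fit_transform(corpus)  # stays sparse
text_dist = cosine_distances(text_matrix)
other_dist = euclidean_distances(hashes)

# Assumption: equal weighting of the two parts, applied to whole matrices.
combined_dist = (text_dist + other_dist) / 2

# eps and min_samples are illustrative values, not tuned.
db = DBSCAN(eps=1, min_samples=3, metric="precomputed").fit(combined_dist)
print(db.labels_)
```

With this variant the text features never need to be converted to a dense array, and no Python callable is invoked per pair of samples. The two parts are on different scales, so the weighting may need adjusting for real data.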
manpreet
Best Answer
2 years ago
I'm trying to combine two types of parameters before clustering.
My parameters are text, represented as a sparse matrix, and another array representing other features of my data points.
I've tried combining the two types of parameters into one array and passing it as input to the algorithm:
I've also built a custom distance metric, which I'm going to use.
But I'm getting an error when trying to pass my input array. The combined array was constructed as follows:
Full Error Traceback:
Is this the correct approach for combining a text vector with other parameters?