



Introduction

The idea of creating machines that learn by themselves has been driving humans for decades. Unsupervised learning and clustering are fundamental to fulfilling that dream. Unsupervised learning provides more flexibility, but it is also more challenging.

Clustering plays an important part in drawing insights from unlabeled data. It groups similar data points together, which improves various business decisions by providing a meta-understanding.

In this skill test, we tested our community on clustering techniques. A total of 1,566 people registered for this skill test. If you missed taking the test, here is your opportunity to find out how many questions you could have answered correctly.

If you are just getting started with unsupervised learning, here are some comprehensive resources to assist you in your journey:

  • Machine Learning Certification Course for Beginners
  • The Most Comprehensive Guide to K-Means Clustering You'll Ever Need

  • Certified AI & ML Blackbelt+ Program

Overall Results

Below is the distribution of scores; this will help you evaluate your performance:

You can access your performance here. More than 390 people participated in the skill test and the highest score was 33. Here are a few statistics about the distribution.

Overall distribution

Mean Score: 15.11

Median Score: 15

Mode Score: 16

Helpful Resources

An Introduction to Clustering and different methods of clustering

Getting your clustering right (Part I)

Getting your clustering right (Part II)

Questions & Answers

Q1. Movie Recommendation systems are an example of:

  1. Classification
  2. Clustering
  3. Reinforcement Learning
  4. Regression

Options:

A. 2 only

B. 1 and 2

C. 1 and 3

D. 2 and 3

E. 1, 2 and 3

F. 1, 2, 3 and 4

Q2. Sentiment Analysis is an example of:

  1. Regression
  2. Classification
  3. Clustering
  4. Reinforcement Learning

Options:

A. 1 only

B. 1 and 2

C. 1 and 3

D. 1, 2 and 3

E. 1, 2 and 4

F. 1, 2, 3 and 4

Q3. Can decision trees be used for performing clustering?

A. True

B. False

Q4. Which of the following is the most appropriate strategy for data cleaning before performing clustering analysis, given a less than desirable number of data points:

  1. Capping and flooring of variables
  2. Removal of outliers

Options:

A. 1 only

B. 2 only

C. 1 and 2

D. None of the above

Q5. What is the minimum no. of variables/features required to perform clustering?

A. 0

B. 1

C. 2

D. 3

Q6. For two runs of K-Means clustering, is it expected to get the same clustering results?

A. Yes

B. No

Solution: (B)

The K-Means clustering algorithm converges on local minima, which might also correspond to the global minima in some cases, but not always. Therefore, it's advised to run the K-Means algorithm multiple times before drawing inferences about the clusters.

However, note that it's possible to receive the same clustering results from K-Means by setting the same seed value for each run, because that simply makes the algorithm choose the same set of random numbers for each run.
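To see why seeding makes a run reproducible, here is a minimal sketch (the data points and helper name are made up for illustration): with the same seed, the random choice of initial centroids — and hence the rest of the run — comes out identical.

```python
import random

points = [(2, 2), (4, 4), (6, 6), (0, 4), (4, 0), (5, 5), (9, 9)]

def init_centroids(points, k, seed):
    """Pick k data points at random as initial centroids; a fixed seed fixes the choice."""
    return random.Random(seed).sample(points, k)

run1 = init_centroids(points, k=3, seed=42)
run2 = init_centroids(points, k=3, seed=42)
print(run1 == run2)  # True: same seed, same initial centroids, hence the same final clustering
```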

Q7. Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?

A. Yes

B. No

C. Can't say

D. None of these

Solution: (A)

When the K-Means algorithm has reached the local or global minima, it will not change the assignment of data points to clusters for two successive iterations.

Q8. Which of the following can act as possible termination conditions in K-Means?

  1. For a fixed number of iterations.
  2. Assignment of observations to clusters does not change between iterations, except for cases with a bad local minimum.
  3. Centroids do not change between successive iterations.
  4. Terminate when RSS falls below a threshold.

Options:

A. 1, 3 and 4

B. 1, 2 and 3

C. 1, 2 and 4

D. All of the above

Solution: (D)

All four conditions can be used as possible termination conditions in K-Means clustering:

  1. This condition limits the runtime of the clustering algorithm, but in some cases the quality of the clustering will be poor because of an insufficient number of iterations.
  2. Except for cases with a bad local minimum, this produces a good clustering, but runtimes may be unacceptably long.
  3. This also ensures that the algorithm has converged at the minima.
  4. Terminate when RSS falls below a threshold. This criterion ensures that the clustering is of a desired quality after termination. Practically, it's a good practice to combine it with a bound on the number of iterations to guarantee termination.
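As a sketch of how the four termination conditions fit into a single loop, here is a toy 1-D K-Means (the data, thresholds and function name are made up for illustration, not taken from the test):

```python
import random

def kmeans_1d(points, k, max_iter=100, rss_threshold=1e-6, seed=0):
    """Toy 1-D K-Means illustrating the four termination conditions."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # Forgy-style random initialization
    prev_assign = None
    for _ in range(max_iter):                  # condition 1: fixed iteration cap
        # assign each point to its nearest centroid
        assign = [min(range(k), key=lambda j: (p - centroids[j]) ** 2)
                  for p in points]
        if assign == prev_assign:              # condition 2: assignments unchanged
            break
        prev_assign = assign
        # recompute each centroid as the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:         # condition 3: centroids unchanged
            break
        centroids = new_centroids
        # condition 4: residual sum of squares below a threshold
        rss = sum((p - centroids[a]) ** 2 for p, a in zip(points, assign))
        if rss < rss_threshold:
            break
    return sorted(centroids)

print(kmeans_1d([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], k=2))  # two centroids near 1 and 10
```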

Q9. Which of the following clustering algorithms suffers from the problem of convergence at local optima?

  1. K-Means clustering algorithm
  2. Agglomerative clustering algorithm
  3. Expectation-Maximization clustering algorithm
  4. Diverse clustering algorithm

Options:

A. 1 only

B. 2 and 3

C. 2 and 4

D. 1 and 3

E. 1, 2 and 4

F. All of the above

Solution: (D)

Out of the options given, only the K-Means clustering algorithm and the EM clustering algorithm have the drawback of converging at local minima.

Q10. Which of the following algorithms is most sensitive to outliers?

A. K-means clustering algorithm

B. K-medians clustering algorithm

C. K-modes clustering algorithm

D. K-medoids clustering algorithm

Solution: (A)

Out of all the options, the K-Means clustering algorithm is the most sensitive to outliers, as it uses the mean of the cluster data points to find the cluster center.
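A quick way to see why the mean-based update is fragile (toy numbers for illustration): a single outlier drags the mean far more than the median.

```python
from statistics import mean, median

cluster = [10, 11, 12, 13, 14]
with_outlier = cluster + [100]  # one outlier joins the cluster

# The mean (used by K-Means) is dragged toward the outlier...
print(mean(cluster), mean(with_outlier))      # 12 -> ~26.67
# ...while the median (used by K-Medians) barely moves.
print(median(cluster), median(with_outlier))  # 12 -> 12.5
```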

Q11. After performing K-Means clustering analysis on a dataset, you observed the following dendrogram. Which of the following conclusions can be drawn from the dendrogram?

A. There were 28 data points in the clustering analysis

B. The best no. of clusters for the analyzed data points is 4

C. The proximity function used is average-link clustering

D. The above dendrogram interpretation is not possible for K-Means clustering analysis

Solution: (D)

A dendrogram is not possible for K-Means clustering analysis. However, one can create a clustergram based on K-Means clustering analysis.

Q12. How can clustering (unsupervised learning) be used to improve the accuracy of a linear regression model (supervised learning):

  1. Creating different models for different cluster groups.
  2. Creating an input feature for cluster ids as an ordinal variable.
  3. Creating an input feature for cluster centroids as a continuous variable.
  4. Creating an input feature for cluster size as a continuous variable.

Options:

A. 1 only

B. 1 and 2

C. 1 and 4

D. 3 only

E. 2 and 4

F. All of the above

Solution: (F)

Creating an input feature for cluster ids as an ordinal variable or creating an input feature for cluster centroids as a continuous variable might not convey any relevant information to the regression model for multidimensional data. But for clustering in a single dimension, all of the given methods are expected to convey meaningful information to the regression model. For example, to cluster people into two groups based on their hair length, storing the cluster ID as an ordinal variable and the cluster centroids as continuous variables will convey meaningful information.

Q13. What could be the possible reason(s) for producing two different dendrograms using an agglomerative clustering algorithm for the same dataset?

A. Proximity function used

B. No. of data points used

C. No. of variables used

D. B and C only

E. All of the above

Solution: (E)

A change in any of the proximity function, the no. of data points or the no. of variables will lead to different clustering results and hence different dendrograms.

Q14. In the figure below, if you draw a horizontal line on the y-axis at y=2, what will be the number of clusters formed?

A. 1

B. 2

C. 3

D. 4

Solution: (B)

Since the number of vertical lines intersecting the red horizontal line at y=2 in the dendrogram is 2, two clusters will be formed.

Q15. What is the most appropriate no. of clusters for the data points represented by the following dendrogram:

A. 2

B. 4

C. 6

D. 8

Solution: (B)

The no. of clusters that can best depict the different groups can be chosen by observing the dendrogram. The best choice is the no. of vertical lines in the dendrogram cut by a horizontal line that can traverse the maximum distance vertically without intersecting a cluster.

In the above example, the best choice of no. of clusters will be 4, as the red horizontal line in the dendrogram below covers the maximum vertical distance AB.

Q16. In which of the following cases will K-Means clustering fail to give good results?

  1. Data points with outliers
  2. Data points with different densities
  3. Data points with round shapes
  4. Data points with non-convex shapes

Options:

A. 1 and 2

B. 2 and 3

C. 2 and 4

D. 1, 2 and 4

E. 1, 2, 3 and 4

Solution: (D)

The K-Means clustering algorithm fails to give good results when the data contains outliers, when the density spread of data points across the data space is uneven, and when the data points follow non-convex shapes.

Q17. Which of the following metrics do we have for finding dissimilarity between two clusters in hierarchical clustering?

  1. Single-link
  2. Complete-link
  3. Average-link

Options:

A. 1 and 2

B. 1 and 3

C. 2 and 3

D. 1, 2 and 3

Solution: (D)

All three methods, i.e. single link, complete link and average link, can be used for finding dissimilarity between two clusters in hierarchical clustering.

Q18. Which of the following are true?

  1. Clustering analysis is negatively affected by multicollinearity of features
  2. Clustering analysis is negatively affected by heteroscedasticity

Options:

A. 1 only

B. 2 only

C. 1 and 2

D. None of them

Solution: (A)

Clustering analysis is not negatively affected by heteroscedasticity, but the results are negatively impacted by multicollinearity of the features/variables used in clustering, as a correlated feature/variable will carry extra weight in the distance calculation.

Q19. Given six points with the following attributes:

Which of the following clustering representations and dendrograms depicts the use of the MIN or single-link proximity function in hierarchical clustering:

A.

B.

C.

D.

Solution: (A)

For the single link or MIN version of hierarchical clustering, the proximity of two clusters is defined to be the minimum of the distance between any two points in the different clusters. For instance, from the table, we see that the distance between points 3 and 6 is 0.11, and that is the height at which they are joined into one cluster in the dendrogram. As another example, the distance between clusters {3, 6} and {2, 5} is given by dist({3, 6}, {2, 5}) = min(dist(3, 2), dist(6, 2), dist(3, 5), dist(6, 5)) = min(0.1483, 0.2540, 0.2843, 0.3921) = 0.1483.
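The single-link computation in the example can be checked directly from the quoted pairwise distances:

```python
# Pairwise distances between members of {3, 6} and {2, 5}, as quoted above
d = {(3, 2): 0.1483, (6, 2): 0.2540, (3, 5): 0.2843, (6, 5): 0.3921}

# MIN / single link: the cluster distance is the smallest pairwise distance
single_link = min(d.values())
print(single_link)  # 0.1483
```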

Q20. Given six points with the following attributes:

Which of the following clustering representations and dendrograms depicts the use of the MAX or complete-link proximity function in hierarchical clustering:

A.

B.

C.

D.

Solution: (B)

For the complete link or MAX version of hierarchical clustering, the proximity of two clusters is defined to be the maximum of the distance between any two points in the different clusters. Similarly, here points 3 and 6 are merged first. However, {3, 6} is merged with {4} instead of {2, 5}. This is because dist({3, 6}, {4}) = max(dist(3, 4), dist(6, 4)) = max(0.1513, 0.2216) = 0.2216, which is smaller than dist({3, 6}, {2, 5}) = max(dist(3, 2), dist(6, 2), dist(3, 5), dist(6, 5)) = max(0.1483, 0.2540, 0.2843, 0.3921) = 0.3921 and dist({3, 6}, {1}) = max(dist(3, 1), dist(6, 1)) = max(0.2218, 0.2347) = 0.2347.
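The complete-link comparison can likewise be verified from the quoted distances: under MAX, {3, 6} merges with whichever candidate cluster has the smallest maximum pairwise distance.

```python
# Maximum pairwise distances from {3, 6} to each candidate cluster, as quoted above
to_4  = max(0.1513, 0.2216)                  # dist({3,6}, {4})
to_25 = max(0.1483, 0.2540, 0.2843, 0.3921)  # dist({3,6}, {2,5})
to_1  = max(0.2218, 0.2347)                  # dist({3,6}, {1})

# {3, 6} merges with the candidate at the smallest complete-link distance
nearest = min([("{4}", to_4), ("{2,5}", to_25), ("{1}", to_1)], key=lambda t: t[1])
print(nearest)  # ('{4}', 0.2216)
```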

Q21. Given six points with the following attributes:

Which of the following clustering representations and dendrograms depicts the use of the group average proximity function in hierarchical clustering:

A.

B.

C.

D.

Solution: (C)

For the group average version of hierarchical clustering, the proximity of two clusters is defined to be the average of the pairwise proximities between all pairs of points in the different clusters. This is an intermediate approach between MIN and MAX, expressed by the following equation:

Here are the distances between some of the clusters: dist({3, 6, 4}, {1}) = (0.2218 + 0.3688 + 0.2347)/(3 ∗ 1) = 0.2751; dist({2, 5}, {1}) = (0.2357 + 0.3421)/(2 ∗ 1) = 0.2889; dist({3, 6, 4}, {2, 5}) = (0.1483 + 0.2843 + 0.2540 + 0.3921 + 0.2042 + 0.2932)/(3 ∗ 2) ≈ 0.2627. Because dist({3, 6, 4}, {2, 5}) is smaller than dist({3, 6, 4}, {1}) and dist({2, 5}, {1}), these two clusters are merged at the fourth stage.

Q22. Given six points with the following attributes:

Which of the following clustering representations and dendrograms depicts the use of Ward's method proximity function in hierarchical clustering:

A.

B.

C.

D.

Solution: (D)

Ward's method is a centroid method. The centroid method calculates the proximity between two clusters by computing the distance between the centroids of the clusters. For Ward's method, the proximity between two clusters is defined as the increase in the squared error that results when the two clusters are merged. The figure shows the results of applying Ward's method to the sample data set of six points. The resulting clustering is somewhat different from those produced by MIN, MAX, and group average.

Q23. What should be the best choice of no. of clusters based on the following results:

A. 1

B. 2

C. 3

D. 4

Solution: (C)

The silhouette coefficient is a measure of how similar an object is to its own cluster compared to other clusters. The number of clusters for which the silhouette coefficient is highest represents the best choice of the number of clusters.

Q24. Which of the following is/are valid iterative strategies for treating missing values before clustering analysis?

A. Imputation with mean

B. Nearest neighbor assignment

C. Imputation with Expectation Maximization algorithm

D. All of the above

Solution: (C)

All of the mentioned techniques are valid for treating missing values before clustering analysis, but only imputation with the EM algorithm is iterative in its functioning.

Q25. The K-Means algorithm has some limitations. One of its limitations is that it makes hard assignments of points to clusters (a point either completely belongs to a cluster or does not belong at all).

Note: Soft assignment can be considered as the probability of being assigned to each cluster: say K = 3 and for some point xn, p1 = 0.7, p2 = 0.2, p3 = 0.1

Which of the following algorithm(s) allows soft assignments?

  1. Gaussian mixture models
  2. Fuzzy K-means

Options:

A. 1 only

B. 2 only

C. 1 and 2

D. None of these

Solution: (C)

Both Gaussian mixture models and Fuzzy K-means allow soft assignments.

Q26. Assume you want to cluster 7 observations into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2 and C3 have the following observations:

C1: {(2,2), (4,4), (6,6)}

C2: {(0,4), (4,0)}

C3: {(5,5), (9,9)}

What will be the cluster centroids if you want to proceed to the second iteration?

A. C1: (4,4), C2: (2,2), C3: (7,7)

B. C1: (6,6), C2: (4,4), C3: (9,9)

C. C1: (2,2), C2: (0,0), C3: (5,5)

D. None of these

Solution: (A)

Finding the centroid for the data points in cluster C1 = ((2+4+6)/3, (2+4+6)/3) = (4, 4)

Finding the centroid for the data points in cluster C2 = ((0+4)/2, (4+0)/2) = (2, 2)

Finding the centroid for the data points in cluster C3 = ((5+9)/2, (5+9)/2) = (7, 7)

Hence, C1: (4,4), C2: (2,2), C3: (7,7)
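The centroid computation above can be sketched in a few lines of code:

```python
clusters = {
    "C1": [(2, 2), (4, 4), (6, 6)],
    "C2": [(0, 4), (4, 0)],
    "C3": [(5, 5), (9, 9)],
}

def centroid(points):
    """Coordinate-wise mean of the member points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

centroids = {name: centroid(pts) for name, pts in clusters.items()}
print(centroids)  # {'C1': (4.0, 4.0), 'C2': (2.0, 2.0), 'C3': (7.0, 7.0)}
```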

Q27. Assume you want to cluster 7 observations into 3 clusters using the K-Means clustering algorithm. After the first iteration, clusters C1, C2 and C3 have the following observations:

C1: {(2,2), (4,4), (6,6)}

C2: {(0,4), (4,0)}

C3: {(5,5), (9,9)}

What will be the Manhattan distance of observation (9, 9) from cluster centroid C1 in the second iteration?

A. 10

B. 5*sqrt(2)

C. 13*sqrt(2)

D. None of these

Solution: (A)

Manhattan distance between centroid C1, i.e. (4, 4), and (9, 9) = |9-4| + |9-4| = 10
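The same arithmetic as a small helper:

```python
def manhattan(p, q):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

c1 = (4, 4)  # centroid of C1 after the first iteration
print(manhattan((9, 9), c1))  # 10
```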

Q28. If two variables V1 and V2 are used for clustering, which of the following are true for K-Means clustering with k = 3?

  1. If V1 and V2 have a correlation of 1, the cluster centroids will be in a straight line
  2. If V1 and V2 have a correlation of 0, the cluster centroids will be in a straight line

Options:

A. 1 only

B. 2 only

C. 1 and 2

D. None of the above

Solution: (A)

If the correlation between the variables V1 and V2 is 1, then all the data points will lie on a straight line. Hence, all three cluster centroids will form a straight line as well.

Q29. Feature scaling is an important step before applying the K-Means algorithm. What is the reason behind this?

A. In distance calculation, it will give the same weight to all features

B. You always get the same clusters whether or not you use feature scaling

C. In Manhattan distance it is an important step, but in Euclidean distance it is not

D. None of these

Solution: (A)

Feature scaling ensures that all the features get the same weight in the clustering analysis. Consider a scenario of clustering people based on their weights (in kg) with a range of 55-110 and heights (in inches) with a range of 5.6 to 6.4. In this case, the clusters produced without scaling can be very misleading, as the range of weight is much higher than that of height. Therefore, it's necessary to bring them to the same scale so that they have equal weightage on the clustering result.
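The weight/height scenario can be sketched with simple min-max scaling (the sample values are made up for illustration):

```python
def minmax_scale(values):
    """Rescale a feature to [0, 1] so it gets equal weight in distance calculations."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

weights_kg = [55, 70, 90, 110]     # raw range: 55
heights_in = [5.6, 5.9, 6.1, 6.4]  # raw range: 0.8 -- would be drowned out unscaled

print(minmax_scale(weights_kg))
print(minmax_scale(heights_in))  # both features now span the same [0, 1] range
```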

Q30. Which of the following methods is used for finding the optimal number of clusters in the K-Means algorithm?

A. Elbow method

B. Manhattan method

C. Euclidean method

D. All of the above

E. None of these

Solution: (A)

Out of the given options, only the elbow method is used for finding the optimal number of clusters. The elbow method looks at the percentage of variance explained as a function of the number of clusters: one should choose a number of clusters such that adding another cluster doesn't give much better modeling of the data.

Q31. What is true about K-Means clustering?

  1. K-means is extremely sensitive to cluster center initialization
  2. Bad initialization can lead to poor convergence speed
  3. Bad initialization can lead to bad overall clustering

Options:

A. 1 and 3

B. 1 and 2

C. 2 and 3

D. 1, 2 and 3

Solution: (D)

All three of the given statements are true. K-means is extremely sensitive to cluster center initialization. Also, bad initialization can lead to poor convergence speed as well as bad overall clustering.

Q32. Which of the following can be applied to get good results for the K-means algorithm corresponding to global minima?

  1. Try to run the algorithm for different centroid initializations
  2. Adjust the number of iterations
  3. Find out the optimal number of clusters

Options:

A. 2 and 3

B. 1 and 3

C. 1 and 2

D. All of the above

Solution: (D)

All of these are standard practices that are used in order to obtain good clustering results.

Q33. What should be the best choice for the number of clusters based on the following results:

A. 5

B. 6

C. 14

D. Greater than 14

Solution: (B)

Based on the above results, the best choice of the number of clusters using the elbow method is 6.

Q34. What should be the best choice for the number of clusters based on the following results:

A. 2

B. 4

C. 6

D. 8

Solution: (C)

Generally, a higher average silhouette coefficient indicates better clustering quality. In this plot, the optimal number of clusters for the grid cells in the study area should be 2, at which the value of the average silhouette coefficient is highest. However, the SSE of this clustering solution (k = 2) is too large. At k = 6, the SSE is much lower. In addition, the value of the average silhouette coefficient at k = 6 is also very high, only lower than that at k = 2. Thus, the best choice is k = 6.

Q35. Which of the following sequences is correct for the K-Means algorithm using the Forgy method of initialization?

  1. Specify the number of clusters
  2. Assign cluster centroids randomly
  3. Assign each data point to the nearest cluster centroid
  4. Re-assign each point to the nearest cluster centroid
  5. Re-compute cluster centroids

Options:

A. 1, 2, 3, 5, 4

B. 1, 3, 2, 4, 5

C. 2, 1, 3, 4, 5

D. None of these

Solution: (A)

The methods used for initialization in K-means are Forgy and Random Partition. The Forgy method randomly chooses k observations from the data set and uses these as the initial means. The Random Partition method first randomly assigns a cluster to each observation and then proceeds to the update step, thus computing the initial mean to be the centroid of the cluster's randomly assigned points.
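The two initialization schemes can be sketched side by side (the sample points are made up for illustration):

```python
import random

def forgy_init(points, k, rng):
    """Forgy: pick k observations at random and use them as the initial means."""
    return rng.sample(points, k)

def random_partition_init(points, k, rng):
    """Random Partition: randomly assign each observation to a cluster,
    then take each cluster's centroid as its initial mean."""
    assign = [rng.randrange(k) for _ in points]
    centroids = []
    for j in range(k):
        members = [p for p, a in zip(points, assign) if a == j]
        if members:  # a cluster can come up empty by chance
            centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return centroids

points = [(2, 2), (4, 4), (6, 6), (0, 4), (4, 0), (5, 5), (9, 9)]
print(forgy_init(points, 3, random.Random(0)))
print(random_partition_init(points, 3, random.Random(0)))
```

Note the practical difference: Forgy's initial means are actual data points and tend to be spread out, while Random Partition's initial means all sit near the center of the data.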

Q36. If you are using multinomial mixture models with the expectation-maximization algorithm for clustering a set of data points into two clusters, which of the assumptions are important:

A. All the data points follow two Gaussian distributions

B. All the data points follow n Gaussian distributions (n > 2)

C. All the data points follow two multinomial distributions

D. All the data points follow n multinomial distributions (n > 2)

Solution: (C)

In the EM algorithm for clustering, it's essential to choose the same no. of clusters to classify the data points into as the no. of different distributions they are expected to be generated from, and the distributions must also be of the same type.

Q37. Which of the following is/are not true about the centroid-based K-Means clustering algorithm and the distribution-based expectation-maximization clustering algorithm:

  1. Both start with random initializations
  2. Both are iterative algorithms
  3. Both have strong assumptions that the data points must fulfill
  4. Both are sensitive to outliers
  5. The expectation-maximization algorithm is a special case of K-Means
  6. Both require prior knowledge of the no. of desired clusters
  7. The results produced by both are non-reproducible

Options:

A. 1 only

B. 5 only

C. 1 and 3

D. 6 and 7

E. 4, 6 and 7

F. None of the above

Solution: (B)

All of the above statements are true except the 5th: instead, K-Means is a special case of the EM algorithm in which only the centroids of the cluster distributions are calculated at each iteration.

Q38. Which of the following is/are not true about the DBSCAN clustering algorithm:

  1. For data points to be in a cluster, they must be within a distance threshold of a core point
  2. It has strong assumptions for the distribution of data points in the data space
  3. It has a substantially high time complexity of order O(n³)
  4. It does not require prior knowledge of the no. of desired clusters
  5. It is robust to outliers

Options:

A. 1 only

B. 2 only

C. 4 only

D. 2 and 3

E. 1 and 5

F. 1, 3 and 5

Solution: (D)

  • DBSCAN can form a cluster of any arbitrary shape and does not have strong assumptions for the distribution of data points in the data space.
  • DBSCAN has a low time complexity of order O(n log n) only.
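A minimal 1-D DBSCAN sketch (toy data; eps and min_pts chosen for illustration) shows the properties in statements 4 and 5: the number of clusters emerges from the data, and a sparse point is labeled noise rather than dragged into a cluster.

```python
def dbscan_1d(points, eps, min_pts):
    """Minimal 1-D DBSCAN: clusters grow outward from core points;
    points reachable from no core point are labeled noise (-1)."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise (may be reclaimed later as a border point)
            continue
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is itself a core point: expand further
                seeds.extend(j_nbrs)
        cluster += 1
    return labels

pts = [1.0, 1.1, 1.2, 5.0, 5.1, 5.2, 20.0]
print(dbscan_1d(pts, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

Two clusters are found without specifying their number, and the isolated point 20.0 is labeled -1 (noise).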

Q39. Which of the following are the upper and lower bounds for the F-Score?

A. [0,1]

B. (0,1)

C. [-1,1]

D. None of the above

Solution: (A)

The lowest and highest possible values of the F score are 0 and 1, with 1 representing that every data point is assigned to the correct cluster and 0 representing that the precision and/or recall of the clustering analysis are 0. In clustering analysis, a high value of the F score is desired.

Q40. Following are the results observed for clustering 6000 data points into 3 clusters: A, B and C:

What is the F1-Score with respect to cluster B?

A. 3

B. 4

C. 5

D. 6

Solution: (D)

Here,

True Positive, TP = 1200

True Negative, TN = 600 + 1600 = 2200

False Positive, FP = 1000 + 200 = 1200

False Negative, FN = 400 + 400 = 800

Therefore,

Precision = TP / (TP + FP) = 0.5

Recall = TP / (TP + FN) = 0.6

Hence,

F1 = 2 * (Precision * Recall) / (Precision + Recall) = 0.545 ≈ 0.5
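The computation above, spelled out in code:

```python
# Counts for cluster B, as read off the confusion matrix above
tp, fp, fn = 1200, 1200, 800

precision = tp / (tp + fp)  # 1200 / 2400 = 0.5
recall    = tp / (tp + fn)  # 1200 / 2000 = 0.6
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.545
```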

End Notes

I hope you enjoyed taking the test and found the solutions helpful. The test focused on conceptual as well as practical knowledge of clustering fundamentals and its various techniques.

I tried to clear all your doubts through this article, but if we have missed out on something, then let us know in the comments below. Also, if you have any suggestions or improvements you think we should make in the next skill test, you can let us know by dropping your feedback in the comments section.

Learn, compete, hack and get hired!

Source: https://www.analyticsvidhya.com/blog/2017/02/test-data-scientist-clustering/
