
A Trivial Implementation of LSA using Scikit Learn (2/5)

26.2K views
May 16, 2019
5:46

This video introduces the steps in a full LSA Pipeline and shows how they can be implemented in Databricks Runtime for Machine Learning using the open-source libraries Scikit-Learn and Pandas. These steps are:

- Import Raw Data
- Build a Document-Term Matrix
- Perform a Singular Value Decomposition on the Document-Term Matrix
- Examine the generated Topic-Encoded Data

This video uses a trivial list of strings as the body of documents so that you can compare your own intuition to the results of the LSA. After completing the process, we examine two byproducts of the LSA (the dictionary and the encoding matrix) in order to gain an understanding of how the documents are being encoded in topic space.

This video introduces the core concepts in Natural Language Processing and the Unsupervised Learning technique, Latent Semantic Analysis. The purposes and benefits of the technique are discussed. In particular, the video highlights how the technique can aid understanding of latent, or hidden, aspects of a body of documents, in addition to reducing the dimensionality of the original dataset.

Download the notebook here: https://files.training.databricks.com/classes/lsa-videos/LatentSemanticAnalysisTwoPoems.dbc

Don't have a Databricks Account? Sign up for Community Edition: https://databricks.com/try-databricks

This is Part 2 of our Introduction to Latent Semantic Analysis Series: https://www.youtube.com/playlist?list=PLroeQp1c-t3qwyrsq66tBxfR6iX6kSslt

Learn more at Databricks Academy! https://databricksacademy.com
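
The four pipeline steps named in the description (import raw data, build a document-term matrix, perform a singular value decomposition, examine the topic-encoded data) can be sketched in Python roughly as follows. This is a minimal illustration and not the code from the linked notebook: the four-sentence corpus, all variable names, and the choice of `CountVectorizer` and `TruncatedSVD` with two components are assumptions made for the example.

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer

# Step 1: import raw data.
# Hypothetical toy corpus; the video uses its own trivial list of strings.
body = [
    "the quick brown fox",
    "the slow brown dog",
    "the quick red dog",
    "the lazy yellow fox",
]

# Step 2: build a document-term matrix (rows = documents, columns = terms).
vectorizer = CountVectorizer()
bag_of_words = vectorizer.fit_transform(body)

# Step 3: perform a truncated SVD on the document-term matrix.
# Two components (topics) is an arbitrary choice for this sketch.
svd = TruncatedSVD(n_components=2, random_state=0)
lsa = svd.fit_transform(bag_of_words)

# Step 4: examine the topic-encoded data, one row per document.
topic_encoded_df = pd.DataFrame(lsa, columns=["topic_1", "topic_2"])
topic_encoded_df["body"] = body

# Byproduct 1: the dictionary, i.e. the terms ordered by column index.
dictionary = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Byproduct 2: the encoding matrix, showing how strongly each term
# contributes to each topic (rows = terms, columns = topics).
encoding_matrix = pd.DataFrame(
    svd.components_,
    index=["topic_1", "topic_2"],
    columns=dictionary,
).T

print(topic_encoded_df)
print(encoding_matrix)
```

Reading `encoding_matrix` row by row shows which terms carry the most weight in each topic, and `topic_encoded_df` shows where each document lands in the two-dimensional topic space.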

