
Latent Semantic Analysis with Apache Spark (5/5)

5.6K views
May 16, 2019
6:51

In this video, we begin working with a new, larger dataset: the 20 Newsgroups dataset. To handle the larger dataset, we move the analysis pipeline to Apache Spark, using the Scala programming language. The video introduces lemmatization, an NLP-specific preprocessing step, and discusses key differences between performing NLP in scikit-learn and in Apache Spark.

This video introduces the core concepts in Natural Language Processing and the unsupervised learning technique Latent Semantic Analysis. It discusses the purposes and benefits of the technique and, in particular, highlights how it can aid understanding of latent, or hidden, aspects of a body of documents, in addition to reducing the dimensionality of the original dataset.

Download the notebook here: https://files.training.databricks.com/classes/lsa-videos/LatentSemanticAnalysisTwoPoems.dbc

Don't have a Databricks account? Sign up for Community Edition: https://databricks.com/try-databricks

Install the Stanford CoreNLP package with the Maven coordinate: databricks:spark-corenlp:0.4.0-spark2.4-scala2.11

This is Part 5 of our Introduction to Latent Semantic Analysis series: https://www.youtube.com/playlist?list=PLroeQp1c-t3qwyrsq66tBxfR6iX6kSslt

Learn more at Databricks Academy: https://databricksacademy.com
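As a rough illustration of the dimensionality reduction that Latent Semantic Analysis performs, here is a minimal, library-free Python sketch on a toy corpus. It is not the Spark/Scala pipeline from the notebook. The four documents, the function names (`term_document_matrix`, `gram`, `top_eigenpair`, `lsa`), and the use of raw term counts rather than TF-IDF weights are all assumptions made for illustration, and the tokens are assumed to be already lemmatized. Instead of calling an SVD routine, the sketch finds the leading singular directions by power iteration on the document-by-document Gram matrix.

```python
import math
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, A) where A[i][j] is the count of term i in document j."""
    vocab = sorted({tok for doc in docs for tok in doc})
    counts = [Counter(doc) for doc in docs]
    return vocab, [[c[term] for c in counts] for term in vocab]


def gram(A):
    """Document-by-document Gram matrix G = A^T A."""
    n_docs = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n_docs)]
            for i in range(n_docs)]


def top_eigenpair(G, iters=200):
    """Power iteration: dominant eigenvalue and unit eigenvector of symmetric G."""
    n = len(G)
    v = [1.0 / math.sqrt(n)] * n  # deterministic starting vector
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue estimate for the converged vector.
    lam = sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v


def lsa(docs, k=2):
    """Return (singular_values, coords); coords[j] is document j in k latent dims."""
    _, A = term_document_matrix(docs)
    G = gram(A)
    n = len(docs)
    sigmas, vectors = [], []
    for _ in range(k):
        lam, v = top_eigenpair(G)
        sigmas.append(math.sqrt(max(lam, 0.0)))
        vectors.append(v)
        # Deflate: remove the component just found so the next pass finds the next one.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    # Document coordinates are the rows of V * Sigma.
    coords = [[sigmas[d] * vectors[d][j] for d in range(k)] for j in range(n)]
    return sigmas, coords


# Hypothetical toy corpus: two "Spark" documents and two "poetry" documents.
docs = [
    "spark scala cluster spark".split(),
    "scala cluster spark".split(),
    "poem verse".split(),
    "verse rhyme poem".split(),
]

sigmas, coords = lsa(docs, k=2)
for doc, xy in zip(docs, coords):
    print(" ".join(doc), [round(c, 3) for c in xy])
```

Because the two Spark-related documents share no vocabulary with the two poetry documents, each group loads on its own latent dimension: six term counts per document are reduced to two coordinates that separate the topics. On a real corpus such as 20 Newsgroups the topics overlap, and a distributed SVD routine would replace the hand-rolled power iteration.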

