First, download the tool. Navigate to https://github.com/arunbg/Topic-Modeling-Tool and click the “clone or download” button. Select “download zip” and save the zipped file to your desktop. Unzip the file (on most operating systems, double-clicking it will do this), then open the unzipped folder and double-click the file labeled TopicModelingTool.jar. If your computer tells you that it cannot open the file, try right-clicking the file and selecting “open.”

Next, drop the Orange County History folder into the Topic Modeling Tool-master folder.

In the application, click “select input file or directory” and navigate to the Orange County History folder. Select the folder itself rather than a single file, since we want to import all of the text files. You can leave “select output dir” alone for the time being. You can also adjust the number of topics the tool looks for (the default is 10).
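If you want to confirm what the tool is about to import, a short script can list the text files in the folder first. This is only a minimal sketch: the function names `list_text_files` and `summarize_corpus` are made up for illustration, and the folder path at the bottom assumes you placed the Orange County History folder inside the Topic Modeling Tool-master folder as described above.

```python
from pathlib import Path


def list_text_files(folder):
    """Return the .txt files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".txt"
    )


def summarize_corpus(folder):
    """Return (number of text files, total whitespace-separated word count)."""
    files = list_text_files(folder)
    words = sum(
        len(p.read_text(encoding="utf-8", errors="replace").split())
        for p in files
    )
    return len(files), words


if __name__ == "__main__":
    # Assumed location; adjust to wherever you put the folder.
    corpus = Path("Topic Modeling Tool-master") / "Orange County History"
    if corpus.is_dir():
        count, words = summarize_corpus(corpus)
        print(f"{count} text files, about {words} words")
    else:
        print(f"Folder not found: {corpus}")
```

If the count is lower than you expect, check that the files end in .txt and sit directly inside the folder rather than in a subfolder, since this sketch does not look into subfolders.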

Click “learn topics.”

Once the process has stopped running, open the Topic Modeling Tool-master folder and then the output_csv folder that should now be inside it. Open Topics_Words to see the topics that the tool extracted. How would you characterize each of the topics? What do they tell you about the source?
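If you would rather skim the topics in a terminal than in a spreadsheet, the file can also be read with a few lines of Python. This is a rough sketch under stated assumptions: it assumes the file is saved as Topics_Words.csv inside output_csv and is comma-separated, and it makes no assumption about what the columns mean, so it simply prints each row as it finds it. The function names `read_topics` and `format_topics` are hypothetical helpers, not part of the tool.

```python
import csv
from pathlib import Path


def read_topics(csv_path):
    """Return the rows of the topics file as lists of strings, skipping blank rows."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [
            row for row in csv.reader(handle)
            if any(cell.strip() for cell in row)
        ]


def format_topics(rows):
    """Join the cells of each row into one printable line."""
    return [" | ".join(cell.strip() for cell in row) for row in rows]


if __name__ == "__main__":
    # Assumed file name and location; adjust if your output differs.
    path = Path("Topic Modeling Tool-master") / "output_csv" / "Topics_Words.csv"
    if path.is_file():
        for line in format_topics(read_topics(path)):
            print(line)
    else:
        print(f"File not found: {path}")
```

Reading the words of each topic side by side this way can make it easier to give each topic a working label before you answer the questions above.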

Now navigate to the output_html folder and click on the file called all_topics.html. The file should open in your web browser. You can click on each of the topics to see which documents it appears in. What new insights do you glean from the distribution of these documents?
