My primary goal for this assignment was to learn Voyant’s functionality, but I also wanted to understand how I might be able to use it in my current work. Over the past year I have supported faculty members at Hostos Community College who are building textbooks with Open Educational Resources (OER). Most of the faculty I have supported need assistance finding appropriate content or a satisfactory amount of content for their texts. They often get through 50-75% of the book on their own, and then need help identifying openly licensed content for gap areas or help identifying chapters that are sparse. I wondered if it would be helpful to faculty to create a corpus for the complete book and for each chapter of their book, and then compare these to their syllabus or course objectives.
For the sake of this assignment I chose to use a textbook for EDU 105: Social Studies for Young Children, one on which I am currently working. I decided to look at how the text of the syllabus, which included course objectives and weekly topics, compared to the content of the course material the professor wrote. To do so, I created one corpus for the textbook, with each section as a separate document, and one corpus for the syllabus. For each, I copied and pasted content from its source (the books are built in LibGuides and the syllabus is in Google Drive) into the “Add Texts” window on the Voyant home page. I found the process very usable and straightforward.
Here is a view of the textbook corpus with the default dashboard:
Here is a view of the syllabus corpus with the default dashboard:
I primarily looked at the “cirrus” word cloud, term count, and links tools for my review, and asked the following questions:
- What are the most visually prominent words (based on term frequency in the content), and do they give me an idea of what the class covered?
- Textbook: Overall I would say that they did, but I needed to use a combination of tools for the complete picture. While the “cirrus” word cloud gave me a quick visual of the most common words, I could not create two-word phrases, like social studies or early childhood. “Social” and “Studies” were separate, so it wasn’t obvious that the class was about social studies. However, when I looked at the Terms tool, I could see “social” and “studies” stacked together as the second and third most common words. I could also see the link between “social” and “studies” with the Links tool. For early childhood, I did need to scroll down through the list to understand that the course was for teachers of younger children, but I understood enough of what the course was about. I would say that the Links tool was the best reflection of the class content (see image below).
- Syllabus: Regardless of the tool, the results were generic enough to fit most courses. However, if this syllabus were compared to a syllabus for a course outside of the Education department, it would likely be evident that it was for an education course.
The Summary tool was actually a good complement to the visual tools, because I was able to see the most frequent words (social, studies, students) and distinctive words per document (chapter).
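The two-word phrase problem described above could also be explored outside Voyant with a short script. The following is only a minimal sketch in Python, not a Voyant feature: the sample text, the tiny stop list, and the function names `tokenize` and `bigram_counts` are all made up for illustration. It counts adjacent word pairs so that phrases like "social studies" surface as a single unit.

```python
import re
from collections import Counter

# Hypothetical sample text; in practice this would be a chapter pasted from the textbook.
SAMPLE = (
    "Social studies helps young children understand community. "
    "Early childhood teachers plan social studies lessons. "
    "Social studies in early childhood builds civic habits."
)

# A tiny illustrative stop list (an assumption, not Voyant's own list).
STOP_WORDS = {"in", "the", "a", "of", "and", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(text, stop_words=STOP_WORDS):
    """Count adjacent word pairs, skipping any pair that contains a stop word.

    Note: pairs are formed across sentence boundaries, which is a
    simplification that a fuller script might want to avoid.
    """
    tokens = tokenize(text)
    pairs = zip(tokens, tokens[1:])
    return Counter(
        f"{first} {second}"
        for first, second in pairs
        if first not in stop_words and second not in stop_words
    )


if __name__ == "__main__":
    # Print the three most frequent two-word phrases with their counts.
    for phrase, count in bigram_counts(SAMPLE).most_common(3):
        print(phrase, count)
```

With real chapter text in place of `SAMPLE`, the top of the list should give a rough sense of which two-word phrases dominate, which is the information the word cloud alone could not show.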
- Did the scope of terms in the textbook content match the syllabus?
- The textbook terms covered all of the identified topics and goals in the syllabus, and were more specific. I found the Terms tool to be most helpful for this comparison, but once I realized that Voyant wasn’t well suited to the syllabus, I compared the visualization for the textbook to the syllabus Word document outside of Voyant.
- I hoped that comparing these two would show gaps in the content, but none of the tools I used helped with this evaluation.
- Are any chapters more content-rich than others?
- The Reader and Summary tools were superficially helpful for this. For example, the Reader tool showed a colored bar per document, so I could see which documents (in my text these are chapters) had more words. The Summary tool counted the number of words.
- Word counts would not work for pages with a number of links to websites with additional reading, nor would they work for other media, like videos and images, which this textbook included.
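The gap check and the chapter-length check discussed above could be approximated with a small script. This is a rough sketch under stated assumptions, not something Voyant does: the sample syllabus, the chapter texts, the stop list, and the function names are hypothetical. It treats a "gap" as a syllabus term that never appears in any chapter, and it measures chapter size by a plain word count, so it shares the limitation already noted for links, videos, and images.

```python
import re

# Illustrative stop list (an assumption); "week" is included because
# it appears in syllabus scaffolding rather than in course topics.
STOP_WORDS = {"and", "for", "the", "week", "a", "of", "to"}

# Hypothetical stand-ins for the pasted syllabus and textbook chapters.
SYLLABUS = (
    "Week 1: geography and maps. Week 2: community helpers. "
    "Week 3: economics for children."
)
CHAPTERS = {
    "Chapter 1": "Maps help children learn geography.",
    "Chapter 2": "Community helpers include firefighters and teachers.",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vocabulary(text, stop_words=STOP_WORDS):
    """Return the set of distinct non-stop-word terms in the text."""
    return {token for token in tokenize(text) if token not in stop_words}


def syllabus_gaps(syllabus_text, chapter_texts, stop_words=STOP_WORDS):
    """Return syllabus terms that appear in no chapter, sorted alphabetically.

    Matching is exact, so "help" and "helpers" count as different terms.
    """
    textbook_vocab = set()
    for text in chapter_texts.values():
        textbook_vocab |= vocabulary(text, stop_words)
    return sorted(vocabulary(syllabus_text, stop_words) - textbook_vocab)


def word_counts(chapter_texts):
    """Return the total number of word tokens in each chapter."""
    return {name: len(tokenize(text)) for name, text in chapter_texts.items()}


if __name__ == "__main__":
    print("Possible gaps:", syllabus_gaps(SYLLABUS, CHAPTERS))
    print("Words per chapter:", word_counts(CHAPTERS))
```

Because the comparison is exact-match only, any term it flags is a prompt for a closer look rather than proof of a missing topic; a chapter may cover the idea in different words.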
I intend to continue exploring the following:
- How can I exclude what I consider to be false results? By this I mean terms that are common to all documents but distract from what I am reviewing. This goes beyond stop words.
- Can I hide and reveal documents to compare subsets, or do I need to create a different corpus for each grouping?
- Would the visualizations change if I flattened multiple documents into one, as opposed to keeping them as separate documents?
- Is it possible to see the syllabus and the textbook documents in parallel on the same screen?
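For the first question in the list above, one possible starting point is to find the terms that occur in every document, since those are the likeliest candidates for "false results". The sketch below is an assumption-laden illustration rather than a Voyant feature: the three sample chapters and the function names are hypothetical, and "common" is defined strictly as appearing in all documents.

```python
import re
from collections import Counter

# Hypothetical chapter texts standing in for the textbook documents.
DOCUMENTS = {
    "Chapter 1": "Students explore maps. Students draw.",
    "Chapter 2": "Students visit community helpers.",
    "Chapter 3": "Students compare holidays and maps.",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def terms_in_every_document(documents):
    """Return the terms that appear at least once in every document, sorted."""
    document_frequency = Counter()
    for text in documents.values():
        # Use a set so each term is counted once per document.
        document_frequency.update(set(tokenize(text)))
    return sorted(
        term
        for term, count in document_frequency.items()
        if count == len(documents)
    )


if __name__ == "__main__":
    print(terms_in_every_document(DOCUMENTS))
```

The resulting list could then be reviewed by hand and, where appropriate, added to whatever custom stop-word list the analysis tool supports. Loosening the threshold (for example, terms in most rather than all documents) is a judgment call that depends on the corpus.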