Reading optophones helped blind readers to translate type into audible tones. Operators learned over time to distinguish patterns of tones as words or phrases. Here’s a video, by Tiffany Chan, demonstrating the scanning and sonification side of the process, a precursor to optical character recognition (OCR), which converts page images into machine-readable text.
Most media histories stress the invention of optophones in the 1910s without attending to their operation, maintenance, and development throughout the first half of the 20th century. As a result, they tend to ignore research by blind operators and demonstrators such as Mary Jameson, who made significant contributions to the ongoing work of character recognition and output decoding. Prototyping the optophone highlights the historical significance of Jameson’s labor in ways that archival materials and existing scholarship do not.
The MLab remade the reading optophone to foreground Mary Jameson’s contributions to media and computing history. The lab’s prototype includes a laser-cut frame and runs Python, OpenCV, and Tesseract on a Raspberry Pi (RPi) with a camera attached. The Reading Optophone is the third volume in the lab’s “Kits for Cultural History” series.
Below are an official description of the Optophone Kit, acknowledgments, and a README that Tiffany wrote, with instructions and Python scripts for running the prototype on an RPi. I also link to a downloadable repository for the Kit and some essays and talks Tiffany and/or I wrote about optophonics. The video below demonstrates stages of the prototyping process, from an artifact in storage to a fabricated object in the MLab.
The Reading Optophone
MLab 2015-18 | with Tiffany Chan (project lead), Katherine Goertz, Evan Locke, Danielle Morgan, and Victoria Murawski | Volume 3 in the Kits for Cultural History series | open access
Links: repository (ZIP); MLab research (HTML); essay (HTML) with Amodern; essay (HTML) with the Digital Rhetoric Collaborative; talks at MLA (HTML), HASTAC (HTML), Digital Humanities (HTML), and Washington State University (video)
Supported by the Canada Foundation for Innovation, Social Sciences and Humanities Research Council of Canada, British Columbia Knowledge Development Fund, and UVic’s Faculty of Humanities, Faculty of Fine Arts, Department of English, and Department of Visual Arts
README, by Tiffany Chan
This is a repository for the Optophone Kit, part of the Maker Lab’s Kits for Cultural History series. The optophone was an aid for blind readers that converted text into sound in the 20th century, beginning in the 1910s and extending until at least the 1960s. To read more about the process of remaking the optophone, see our posts at maker.uvic.ca.
These files are part of research conducted by Tiffany Chan (project lead), Katherine Goertz, Danielle Morgan, Victoria Murawski, and Jentery Sayers. Thanks to Robert Baker (Blind Veterans UK), Mara Mills (New York University), and Matthew Rubery (Queen Mary University of London) for their support and feedback.
These instructions detail the workflow and steps for converting an image into plaintext and then into a stream of optophonic sounds. Currently, the repo contains three scripts, written in the Python programming language. As the Kit develops, some of the scripts may change or be combined (see the change log).
- ocrScript.py: takes an image from the PiCamera, runs it through OCR (optical character recognition), and saves the recognized text as results.txt
- toneGen.py: creates and saves optophonic sounds for later playback
- optoScript.py: takes a string of characters as its input and plays the corresponding sounds
To run these scripts, you will need to download and install Python. See the Python website for instructions on how to do this. Note that the optophone project uses Python version 2.7. The scripts and dependencies should work with Python 3, but there may be slight differences.
There are also several dependencies (Python modules or packages) that must be installed before the scripts can work. These include:
- PiCamera: for taking a picture with the Raspberry Pi camera
- OpenCV: a computer vision and image processing library (the optophone project uses version 3.0.0)
- Pillow/PIL: Python Imaging Library, also for working with images
- Tesseract: a free OCR program
- pyTesser: a Python wrapper for Tesseract (basically, how Python talks to Tesseract)
- pyGame: for playing sounds with Python
For the optophone project, these were all installed on a Raspberry Pi. Except for ocrScript.py, all scripts should work on a laptop or personal computer. You can also modify ocrScript.py to take an arbitrary image as its input instead of an image from a Raspberry Pi camera.
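As a rough illustration of running OCR on an arbitrary image file instead of a PiCamera capture, here is a minimal sketch. It is not the Kit's ocrScript.py: it assumes the `tesseract` command-line program is installed and on your path, and it calls that program directly rather than going through pyTesser. The function names and the `page.png` file name are hypothetical.

```python
import subprocess


def build_tesseract_command(image_path, output_base="results"):
    """Return the argument list for a Tesseract command-line call.

    Tesseract appends ".txt" to the output base, so the default
    "results" produces a file named results.txt.
    """
    return ["tesseract", image_path, output_base]


def image_to_text(image_path, output_base="results"):
    """Run Tesseract on an image file and return the recognized text."""
    # Assumption: the tesseract executable is available on the system path.
    subprocess.check_call(build_tesseract_command(image_path, output_base))
    with open(output_base + ".txt") as handle:
        return handle.read()
```

Calling `image_to_text("page.png")` with your own image would leave a results.txt file in the current directory, which matches the file name the workflow expects.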
Here’s a basic workflow for the optophone.
Set up everything you need. Set up the hardware (the Raspberry Pi and PiCamera) and any other peripherals (e.g., monitor, mouse, keyboard) you might need. Download and install any dependencies. Download all the Python scripts.
Create and save tones for the optophone to play. Currently, optoScript.py only plays the tones for lowercase a, b, and c. You can download the sound files in the tones folder to use them for playback or modify and run toneGen.py to generate different tones (note you will probably have to change the dictionary in optoScript.py to match). To make things easier, you can keep the tones in the same directory (folder) as your scripts. Otherwise, make sure the script can find the correct file path to your sound files.
Take a picture of the print material and turn it into plaintext. Position the camera to take a picture of the text. Ideally, you will want a brightly lit image in which the text takes up as much of the frame as possible (i.e., little to no background). This will make the image easier for the computer to read. ocrScript.py and Tesseract (the OCR program) will optimize the image as best as they can. Modify ocrScript.py as necessary (see notes below) and run it. ocrScript.py will convert the image into a plaintext file named results.txt.
Run optoScript.py to read the results.txt file (make sure results.txt is in the same folder/directory as optoScript.py) and play the associated sounds.
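For the tone-creation step in the workflow above, the following sketch shows one way to generate and save a tone using only Python's standard library. It is not the Kit's toneGen.py, whose approach may differ; the function name, default duration, and the example frequencies are assumptions chosen for illustration and are not the optophone's historical pitches.

```python
import math
import struct
import wave


def write_tone(path, frequencies, duration=0.25, sample_rate=44100):
    """Save a mono 16-bit WAV file mixing one sine wave per frequency (Hz).

    Returns the number of audio frames written.
    """
    n_frames = int(duration * sample_rate)
    # Scale so the summed sine waves stay within the 16-bit sample range.
    amplitude = 32767.0 / max(len(frequencies), 1)
    frames = bytearray()
    for i in range(n_frames):
        t = float(i) / sample_rate
        sample = sum(math.sin(2 * math.pi * f * t) for f in frequencies)
        frames += struct.pack("<h", int(amplitude * sample))
    out = wave.open(path, "wb")
    try:
        out.setnchannels(1)   # mono
        out.setsampwidth(2)   # 2 bytes = 16-bit samples
        out.setframerate(sample_rate)
        out.writeframes(bytes(frames))
    finally:
        out.close()
    return n_frames
```

For example, `write_tone("a.wav", [440.0, 660.0])` would save a quarter-second chord to a.wav. If you generate tones this way, the dictionary in optoScript.py would need to point at the resulting files.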
Notes on Using the Scripts
The current scripts were made for testing small samples of code. To use them to express your own text as tones, you may have to modify them. For example, the dictionary of tones, as it is recorded in optoScript.py, only contains 3 entries for lowercase a, b, and c. You would have to change the dictionary to include the tones for other characters before you could play them.
Other places where code may be modified are noted in the scripts themselves.
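To make the idea of a tone dictionary concrete, here is a minimal sketch of how characters might be mapped to sound files and played in sequence. It is an illustration rather than the contents of optoScript.py: the `TONES` dictionary, the file names, and both function names are hypothetical, and skipping characters that have no entry is a design choice made here for simplicity. Playback assumes pyGame is installed.

```python
# Hypothetical mapping from characters to sound files. The file names are
# placeholders; match them to whatever is in your tones folder.
TONES = {
    "a": "a.wav",
    "b": "b.wav",
    "c": "c.wav",
}


def tones_for(text, tones=TONES):
    """Return the sound file for each character in text, in order.

    Characters without an entry in the dictionary are skipped.
    """
    return [tones[ch] for ch in text if ch in tones]


def play(text, tones=TONES):
    """Play the tone for each known character, one after another."""
    import pygame  # imported here so tones_for works without pyGame

    pygame.mixer.init()
    for path in tones_for(text, tones):
        sound = pygame.mixer.Sound(path)
        sound.play()
        # Wait for this sound to finish before starting the next one.
        pygame.time.wait(int(sound.get_length() * 1000))
```

Adding support for another character then means adding one more entry to the dictionary, for example `"d": "d.wav"`, and making sure the matching sound file exists.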
Featured image by Tiffany Chan, Katherine Goertz, and Victoria Murawski. Both videos by Tiffany Chan. All used with permission. This page was created on 27 June 2019 and last updated on 14 July 2021.