Making Reading More Immersive

He ain't got no distractions, can't hear those buzzers and bells.

Prepared by Alison Chaiken and offered under a Creative Commons license

[Photo: immersive study]

Photo credit: Marco Grob for the New York Times

Video of a presentation by mind-gamer Joshua Foer: "Step Outside Your Comfort Zone and Study Yourself Failing." H/T Jay Liew for the link.


If you're like me, you read a lot of documentation and other detailed technical material. Sometimes the documentation is poorly written or confusing, and I have trouble concentrating. The hardest material of all to concentrate on is patent applications: I can barely bring myself to read more than half a page of claims at a time, even when I have a deadline.

As a side note, I've been thinking a lot about assistive technologies because of my professional work on "in-vehicle infotainment" systems for drivers. A moment's consideration shows that a driver, whose eyes belong on the road, is essentially a blind computer user, so many assistive technologies may be adaptable for in-car use.

Inspiration: Audio Files for Language Study


Recently I've been studying Finnish at night with the help of audio and video files intended to improve pronunciation. The videos did not include written text to compare with the audio, and I found them hard to understand and therefore useless. The audio files featured native speakers reading sentences that corresponded to the printed text in a textbook. Reading the book while listening to the audio and repeating after the speakers not only improved my pronunciation but also greatly helped my retention of vocabulary: I remembered words that I'd heard on the recording much better than those I had not.

It occurred to me that listening to audio derived from documentation while reading that same documentation might likewise help me focus and retain the material, and lately I've been trying it out. Somehow the brain hack of listening to a stream that I must deliberately pause greatly reduces my tendency to get distracted. Put on headphones, open your PDF file, crank up the volume and give it a try!

Why Not Orca?

Gnome comes with the Orca assistive technology package, which will read just about any window aloud. Orca does work with PDFs, but in my experience it works poorly. For every slide, the audio includes the metadata, and I had to keep moving the mouse around to get the page content to play. There may well be a way to configure Orca to speak PDFs better, but I haven't found one. Please email me if you know of one.

Orca also tends to hang the Gnome desktop on my fully patched Fedora 14 system as of today (March 3, 2011). My solution, while clunky and minimally featureful, may SEGV on your system for all I know, but it will not hang your desktop and force a reboot! If you use Orca, you may want to study up on the Magic SysRq keys: ALT-SYSRQ-R-E-I-S-U-B (hold Alt and SysRq, then press R, E, I, S, U and B in turn) is your friend. To enable these keys in Fedora, set "kernel.sysrq = 1" in /etc/sysctl.conf.
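
Editing the file by hand is easy enough, but a small idempotent helper avoids piling up duplicate lines if you run it more than once. This is a sketch, not part of sayPDF: the function name ensure_sysrq_conf is hypothetical, and writing to /etc/sysctl.conf requires root.

```shell
#!/bin/bash
# Hypothetical helper (not from sayPDF): append "kernel.sysrq = 1" to a
# sysctl config file unless an equivalent line is already present.
#   ensure_sysrq_conf [CONFIG_FILE]   (defaults to /etc/sysctl.conf)
ensure_sysrq_conf() {
    local conf=${1:-/etc/sysctl.conf}
    # Accept "kernel.sysrq=1" with optional whitespace around the "=".
    if ! grep -Eq '^[[:space:]]*kernel\.sysrq[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$conf" 2>/dev/null; then
        echo 'kernel.sysrq = 1' >> "$conf"
    fi
}
```

As root, running ensure_sysrq_conf followed by sysctl -p updates /etc/sysctl.conf and applies it without a reboot; cat /proc/sys/kernel/sysrq should then print 1.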

An Alternative Approach Based on totem, pdftotext and text2wave

Inspired by a talk at SCALE 9x by Dallas Legan and by a subsequent conversation with my buddy Sarah Newman, I've hacked up a solution that works for me. The idea is to play the audio stream in the very basic totem media player that comes with Gnome, so that the stream can be backed up and skipped ahead. As far as I can make out, Orca doesn't allow this kind of fine-grained control.

The gist of the implementation is to pipe output from pdftotext to text2wave to totem. While gst-launch, the command-line tool for the GStreamer framework that underlies totem, will play from stdin, I've had trouble getting totem itself to do so reproducibly. For shorter PDF files, totem streams from stdin, but for longer files it complains that it can't determine the type.

This works:

cat /tmp/dbus.wav | gst-launch-0.10 playbin uri=fd://0

With totem, not so much:

[alison@bonnet UPnP]$ cat /tmp/dbus.wav | totem file://fd://0
** Message: Error: Could not open resource for reading.
gstgiosrc.c(324): gst_gio_src_get_stream (): /GstPlayBin2:play/GstURIDecodeBin:uridecodebin0/GstGioSrc:source:
Could not open location file://fd://0 for reading: Operation not supported
[alison@bonnet UPnP]$ cat /tmp/dbus.wav | totem uri:fd://0
** Message: Error: No URI handler implemented for "uri".
[alison@bonnet UPnP]$ cat /tmp/dbus.wav | totem uri=fd://0
gstfilesrc.c(1034): gst_file_src_start (): /GstPlayBin2:play/GstURIDecodeBin:uridecodebin0/GstFileSrc:source:
No such file "/home/alison/UPnP/uri=fd:/0"
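
Given totem's trouble with stdin, one workaround is to let text2wave write a temporary WAV file and hand that file to the player, which also keeps rewinding and skipping ahead reliable. The sketch below is a hypothetical re-creation, not the sayPDF code from the tarball: the function names say_pdf and wav_name, the /tmp naming scheme and the -g switch (gst-launch instead of totem) are assumptions, while -f, -l, -i and -v mirror the flags described on this page.

```shell
#!/bin/bash
# Hypothetical sketch of a sayPDF-style script (not the tarball's code).
# Assumes pdftotext (poppler-utils), text2wave (festival) and totem or
# gst-launch-0.10 are installed.

# Build a WAV filename in /tmp from the PDF name and page range,
# e.g. /tmp/dbus.pdf, 2, 5 -> /tmp/dbus_p2-5.wav
wav_name() {
    printf '/tmp/%s_p%s-%s.wav\n' "$(basename "$1" .pdf)" "$2" "$3"
}

# say_pdf -f FIRST -l LAST -i FILE.pdf [-g] [-v]
#   -g  play with gst-launch-0.10 instead of totem (hypothetical flag)
#   -v  print the WAV filename so the file can be copied out of /tmp
say_pdf() {
    local first=1 last="" pdf="" use_gst=0 verbose=0 opt wav
    local OPTIND=1
    local usage="usage: say_pdf -f FIRST -l LAST -i FILE.pdf [-g] [-v]"
    while getopts "f:l:i:gv" opt; do
        case $opt in
            f) first=$OPTARG ;;
            l) last=$OPTARG ;;
            i) pdf=$OPTARG ;;
            g) use_gst=1 ;;
            v) verbose=1 ;;
            *) echo "$usage" >&2; return 2 ;;
        esac
    done
    if [ -z "$pdf" ] || [ -z "$last" ]; then
        echo "$usage" >&2
        return 2
    fi
    wav=$(wav_name "$pdf" "$first" "$last")
    if [ "$verbose" -eq 1 ]; then
        echo "$wav"
    fi
    # pdftotext writes to stdout when the output name is "-", and text2wave
    # reads stdin when given no input file. Saving to a real file instead of
    # piping into the player sidesteps totem's stdin problems and allows seeking.
    pdftotext -f "$first" -l "$last" "$pdf" - | text2wave -o "$wav" || return 1
    if [ "$use_gst" -eq 1 ]; then
        gst-launch-0.10 playbin uri="file://$wav"
    else
        totem "$wav"
    fi
}
```

After sourcing the file, say_pdf -f 2 -l 5 -i /tmp/dbus.pdf would speak pages 2 through 5 in totem, and adding -g would use gst-launch-0.10 instead.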

Installing the sayPDF totem-based solution

  1. Get the files.
  2. Unpack them with "tar xjf sayPDF.tar.bz2".
  3. If you have already installed festival and its voices, you're all set. Otherwise, on Fedora 14 or a similar distribution, try running "./CommandLineAudioInstallFedora" to get the required packages. Note that by default Fedora installs the voices in the wrong directory.
  4. Move the festivalrc file to .festivalrc in your home directory. You can change the playback speed or try different voices by editing ~/.festivalrc.
  5. If you like, move the bash and perl executables to a bin directory on your $PATH.
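
If you would rather write ~/.festivalrc from scratch than use the file in the tarball, a minimal one might look like the following. This is a sketch: write_festivalrc is a hypothetical helper, voice_kal_diphone is only an example voice (substitute one you actually have installed), and Duration_Stretch is the festival parameter that scales speaking speed.

```shell
#!/bin/bash
# Hypothetical helper: write a sample festival init file.
#   write_festivalrc [PATH]   (defaults to ~/.festivalrc)
# An existing file is preserved as PATH.bak before being overwritten.
write_festivalrc() {
    local rc=${1:-$HOME/.festivalrc}
    if [ -f "$rc" ]; then
        cp "$rc" "$rc.bak"
    fi
    cat > "$rc" <<'EOF'
;; Default voice: an example; substitute any voice that is installed.
(set! voice_default 'voice_kal_diphone)
;; Duration_Stretch above 1.0 slows speech down, below 1.0 speeds it up.
(Parameter.set 'Duration_Stretch 1.2)
EOF
}
```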

Running the sayPDF totem-based solution

Play pages 2 through 5 of /tmp/dbus.pdf using totem:

sayPDF_totem -f 2 -l 5 -i /tmp/dbus.pdf 

Play the same pages using GStreamer's gst-launch directly:

sayPDF_gst-launch -f 2 -l 5 -i /tmp/dbus.pdf 

Note that text2wave is CPU-intensive. If you want to listen to a PDF on (for example) a netbook, you may want to ssh into a beefier machine and process the file there. Running the program with the -v flag prints the WAV filename so that you can copy it out of /tmp.
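
One way to offload the work is to run the cheap pdftotext step locally and only text2wave on the remote machine, letting the WAV data stream back over ssh. This is a sketch, assuming passwordless ssh to the faster host and festival installed there; remote_pdf_to_wav is an illustrative name, and it relies on text2wave writing to stdout when no -o option is given.

```shell
#!/bin/bash
# Hypothetical helper: extract text locally, synthesize speech on a faster
# host over ssh, and save the returned WAV data to a local file.
#   remote_pdf_to_wav HOST FILE.pdf FIRST LAST OUT.wav
remote_pdf_to_wav() {
    local host=$1 pdf=$2 first=$3 last=$4 wav=$5
    # Remote text2wave reads the text from stdin and writes WAV to stdout.
    pdftotext -f "$first" -l "$last" "$pdf" - | ssh "$host" text2wave > "$wav"
}
```

For example, with a hypothetical host named bigbox: remote_pdf_to_wav bigbox /tmp/dbus.pdf 2 5 /tmp/dbus.wav && totem /tmp/dbus.wav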

Miscellaneous Notes

An Alternative Approach from Kyle Rankin

The ever-inspiring Kyle Rankin offers an alternative approach in his book Linux Multimedia Hacks. Kyle generously allowed me to include his perl script in the tarball above. For reasons I don't understand, on my Fedora system I must invoke it as "perl <filename>" rather than as an executable called "speak" that I invoke as "speak <filename>." Kyle's solution works only with text files, which it displays as it speaks them. His script also inserts end-of-sentence and end-of-paragraph pauses better than festival does by default.
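
Because Kyle's script expects plain text, pdftotext can bridge the gap for PDFs. This is a sketch, assuming the script is saved as ./speak (override with the hypothetical SPEAK_SCRIPT variable) and takes the text file as its only argument; speak_pdf_pages is an illustrative name.

```shell
#!/bin/bash
# Hypothetical helper: convert pages FIRST..LAST of a PDF to a temporary
# text file, then run Kyle Rankin's text-only "speak" script on it via perl.
#   speak_pdf_pages FILE.pdf FIRST LAST
speak_pdf_pages() {
    local pdf=$1 first=$2 last=$3 txt rc
    txt=$(mktemp) || return 1
    if pdftotext -f "$first" -l "$last" "$pdf" "$txt"; then
        # Invoke through perl, as described above, rather than as ./speak.
        perl "${SPEAK_SCRIPT:-./speak}" "$txt"
        rc=$?
    else
        rc=1
    fi
    rm -f "$txt"
    return "$rc"
}
```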



Why would you take the advice of some random person you don't know?
