
Analyzing Orphaned Transcripts in Transana

Nicolas Sheon, PhD, UCSF Center for AIDS Prevention Studies

An orphaned transcript is one that has lost its source media file. The main idea of Transana is to work with the original source media and create your own transcripts that are synchronized to the source media. However, in certain situations this is not possible. Selecting and coding transcripts in Transana assumes that the source media are available and/or that you have the time to insert time codes to synchronize the episode with the transcript. For users who already have transcripts but no source media available, or who do not have the time to add time codes while listening to the source media, the following steps describe a method to automatically insert time codes between each word in the transcript, thereby creating an artificial time line. Step 6 also describes how to create an artificial version of the media file using a text-to-speech engine.

Step 1: Remove Paragraph and Tab Formatting

Open a copy of the original transcript in Word. Delete any title text that is not part of the interview talk at the top of the first page, and search and replace (Ctrl-H) the following things:

To remove Paragraphs: Replace ^p with %

If the transcript has manual line breaks instead of paragraphs, then replace ^l with %

To remove tabs: Replace ^t with #

Step 2: Convert from Text to Table

Highlight the entire text. In the menus, select Table->Convert->Text to Table. Under the separator options, select the button next to Other at the bottom and type a space in the box next to it. Change the number of columns to 1. The number of rows should be roughly equal to the number of words in the interview. Click OK. You should now have a very long table.

Step 3: Use Excel to Add the Time Codes

Highlight the entire table and copy and paste it into an Excel worksheet in Cell A1.

In Cell B1, type ¤< (To make the ¤ symbol, hold down Alt and on the number pad, type 0164)

In Cell D1, type >

Your table should look like this.

Select all of columns B, C, and D and in the Menus go to Edit->Fill->Down

In Cell C1, type 440 or whatever the length of your time code interval is in milliseconds. Select all of column C and in the Menus go to Edit->Fill->Series. Type 440 (or whatever is in C1) in the box marked Step Value. Click OK. The optimum value for the interval depends on whether you want to hear the audio file in synch with the transcript or not. See Step 6.

Find the last row in Column A; its row number equals the number of words. Select the range of cells from A1 through D(last row in Column A). In other words, don’t select all of columns A-D, just down to the bottom of Column A and over to D. Copy this range and paste it into a new Word document.

Step 4: From Table Back to Text

Select the entire table in Word. Go to Menus and select Table->Convert-> Table to Text. Separate text with Other: (Space).

Step 5: Reformat the Interview

Use search and replace dialogue to remove the spaces between ¤< and the number and between the number and >.

Replace “¤<space” with “¤<”

Replace “space>” with “>”

Replace % with ^p

Replace # with ^t

So if your interview originally looked like this:

I: It’s January 26th, I'm doing an interview with, uh – interview #2A04D1.

P: {chuckle}

I: {chuckle} And this is being recorded and you understand it’s being recorded and you’ve agreed to the recording.

P: Yes, I do and I have agreed.

It should now look like this:

I: It’s ¤<440>January ¤<880>26th, ¤<1320>I'm ¤<1760>doing ¤<2200>an ¤<2640>interview ¤<3080>with, ¤<3520>uh ¤<3960>– ¤<4400>interview ¤<4840> 2A04D1. ¤<5280> ¤<5720>

P: {chuckle} ¤<6160>

I: {chuckle} ¤<6600>And ¤<7040>this ¤<7480>is ¤<7920>being ¤<8360>recorded ¤<8800>and ¤<9240>you ¤<9680>understand ¤<10120>it’s ¤<10560>being ¤<11000>recorded ¤<11440>and ¤<11880>you’ve ¤<12320>agreed ¤<12760>to ¤<13200>the ¤<13640>recording.

P: Yes, ¤<14080>I ¤<14520>do ¤<14960>and ¤<15400>I ¤<15840>have ¤<16280>agreed.

Note that this method should retain all the original formatting. Save the time-coded Word document as a rich text file.
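For readers who would rather script Steps 1 through 5 than go through Word and Excel, the same idea can be sketched in Python. This is only an illustration and not part of the method described above: the function name `add_time_codes` and its parameters are hypothetical, it works on plain text so any rich-text formatting is lost, and it places a time code after every space-separated word, which is what the steps produce before any hand tidying. Whether Transana imports the result cleanly has not been checked here.

```python
MARK = "\u00a4"  # the ¤ character that precedes each time code


def add_time_codes(text, interval_ms=440):
    """Append a ¤<n> time code after every space-separated word.

    The counter runs across the whole transcript, so the codes rise in
    equal steps of interval_ms, like the Excel fill series in Step 3.
    """
    count = 0
    out_lines = []
    for line in text.split("\n"):          # keep paragraph breaks as they are
        pieces = []
        for word in line.split(" "):       # split on spaces only, so tabs stay inside words
            if not word:
                continue                   # skip empty strings left by repeated spaces
            count += 1
            pieces.append(f"{word} {MARK}<{count * interval_ms}>")
        out_lines.append(" ".join(pieces))
    return "\n".join(out_lines)


if __name__ == "__main__":
    sample = "P: Yes, I do and I have agreed."
    print(add_time_codes(sample))
```

Because the paragraphs and tabs are never removed, this sketch does not need the % and # placeholders from Step 1.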

Step 6: Create a Media File

Even if you do not want to hear the audio, you will still need an audio file to provide a timeline for the creation of clips. The audio file must be at least as long as the number of words multiplied by the time code interval, which in the above example is 440 milliseconds. For example, an interview with 10,000 words needs an audio file of 4,400,000 ms, which is equivalent to 4,400 seconds, or 73 minutes 20 seconds. The free audio editing software Audacity (available for download at http://audacity.sourceforge.net) can be used to create an audio file of a given length. In the menus choose File->New. Then select Generate->Silence and enter the number of seconds you need.
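The length calculation above is simple enough to do by hand, but a small Python sketch can save time when there are many transcripts. The function name `required_duration` is hypothetical; the 440 ms default is just the interval used in this example.

```python
def required_duration(word_count, interval_ms=440):
    """Return the audio length needed for a transcript.

    The result is a tuple: (total milliseconds, whole minutes, remaining seconds).
    """
    total_ms = word_count * interval_ms
    minutes, remainder_ms = divmod(total_ms, 60_000)  # 60,000 ms per minute
    return total_ms, minutes, remainder_ms / 1000


if __name__ == "__main__":
    total_ms, minutes, seconds = required_duration(10_000)
    print(f"{total_ms} ms = {total_ms / 1000:.0f} s = {minutes} min {seconds:.0f} s")
```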

Screen shot of the audacity silence generator.

To save the file to your computer, choose File->Export and save it as a wav or mp3 file.
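If Audacity is not at hand, a silent wav file of the required length can also be written with Python's standard `wave` module. This is a minimal sketch, not part of the original procedure: the function name `write_silence`, the mono 16-bit format, and the 8000 Hz sample rate are all assumptions chosen to keep the file small, and it has not been verified here that Transana accepts a file with these settings.

```python
import wave


def write_silence(path, duration_ms, sample_rate=8000):
    """Write a mono, 16-bit wav file containing only silence.

    Returns the number of audio frames written.
    """
    n_frames = duration_ms * sample_rate // 1000
    one_second = b"\x00\x00" * sample_rate          # one second of 16-bit zero samples
    full_seconds, leftover = divmod(n_frames, sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)                         # mono
        wav.setsampwidth(2)                         # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        for _ in range(full_seconds):               # write in chunks to limit memory use
            wav.writeframes(one_second)
        wav.writeframes(b"\x00\x00" * leftover)
    return n_frames
```

For the 10,000-word example, a call such as `write_silence("interview.wav", 4_400_000)` would produce a file of 73 minutes 20 seconds, roughly 70 MB at these settings. The file name is only an illustration.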

If you want to hear the interview synchronized with your text, you can use NaturalReader to generate an audio file based on the text of your interview. NaturalReader is a text-to-speech engine that produces a computer-generated voice reading the text of your interview. There is only one voice, unlike the original interview with two or more voices. This is nothing like the original recording, but if no audio recording exists or you don’t want to take the time to synch the original with the transcript, this is a quick and dirty way to make an audio file you can use to navigate your transcript.

Screenshot of NaturalReader Professional Version

As in the screenshot above, you may want to remove the speaker IDs from the file first so that the reader does not say "I:" or "P:" before each utterance. Certain acronyms and annotations may also give the software trouble; for instance, it helps to replace “HIV” with “aech ivy” and "{laughs}" with “heh heh heh.” The software has a built-in search and replace feature.
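The same cleanup can be done on a copy of the transcript before it is loaded into the text-to-speech software. The following Python sketch is an illustration only: the function name `prepare_for_tts` is hypothetical, the pattern assumes the speaker IDs are exactly "I:" and "P:" at the start of a line, and the replacement table holds just the two substitutions mentioned above.

```python
import re

# Assumption: speaker IDs are "I:" or "P:" at the start of a line.
SPEAKER_ID = re.compile(r"^[ \t]*[IP]:[ \t]*", re.MULTILINE)

# Written form -> spelling that the synthetic voice pronounces more naturally.
SPOKEN_FORMS = {
    "HIV": "aech ivy",
    "{laughs}": "heh heh heh",
}


def prepare_for_tts(text, replacements=SPOKEN_FORMS):
    """Strip speaker IDs and swap in pronounceable spellings."""
    text = SPEAKER_ID.sub("", text)
    for written, spoken in replacements.items():
        text = text.replace(written, spoken)
    return text


if __name__ == "__main__":
    print(prepare_for_tts("I: Have you had an HIV test?\nP: {laughs} Yes."))
```

Run this on a separate copy, so that the time-coded transcript imported into Transana keeps its speaker IDs and original spellings.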

NaturalReader software is free, but in the example below, I am using one of their “natural” voices (Charles) which comes with the $40 version of the software. This version also allows you to convert an entire interview to an mp3 file in a matter of seconds. See http://www.naturalreaders.com.

To see a Flash video example of an orphan transcript being read by Charles in Transana, see http://www.palmpal.org/PIdemo/PIdemo.html.

last updated 5/25/07