Quick start guide¶
This project relies on fluteline, an easy-to-use library for thread-based pipelines (its docs are highly recommended reading). It supports the creation of speech-to-text pipelines from simple, modular components.
First, instantiate the nodes that you want to use. Assuming source and destination are two such instantiated nodes, connect them with source.connect(destination). Then, start your nodes with the .start method. When processing is done, turn off the nodes with the .stop method.
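To make the connect/start/stop lifecycle concrete, here is a runnable sketch using toy stand-ins. ToyNode and Doubler are hypothetical classes that only mimic the node interface described above with plain threads and queues; they are not part of fluteline or watson_streaming.

```python
import queue
import threading


class ToyNode:
    """Hypothetical stand-in that mimics a thread-based pipeline node."""

    def __init__(self):
        self.input = queue.Queue()
        self.output = None
        self._thread = None
        self._running = False

    def connect(self, destination):
        # Route this node's results into the next node's input queue.
        self.output = destination.input

    def start(self):
        self._running = True
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def stop(self):
        self._running = False
        self.input.put(None)  # wake the worker thread so it can exit
        self._thread.join()

    def _run(self):
        while self._running:
            item = self.input.get()
            if item is None:
                continue  # sentinel from stop(); the loop re-checks _running
            result = self.consume(item)
            if self.output is not None:
                self.output.put(result)

    def consume(self, item):  # override in subclasses
        return item


class Doubler(ToyNode):
    def consume(self, item):
        return item * 2


source = Doubler()
destination = Doubler()
source.connect(destination)   # source feeds destination
sink = queue.Queue()
destination.output = sink     # collect the final results

source.start()
destination.start()
source.input.put(21)
result = sink.get(timeout=2)  # 21, doubled once per node
source.stop()
destination.stop()
```

Real pipelines follow the same pattern, except the nodes are fluteline producers and consumers such as the Transcriber described below.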
The transcriber¶
class watson_streaming.Transcriber(settings, credentials_file=None, apikey=None, hostname=None)¶

A fluteline consumer-producer. Send audio samples to it (1 channel, 44,100 Hz, 16-bit, little-endian) and it will spit out the results from Watson.

Parameters:
- settings (dict) – IBM Watson settings. Consult the official IBM Watson docs for more information.
- credentials_file (string) – Path to your IBM Watson credentials. Alternatively, provide an apikey and hostname.
- apikey (string) – API key for the IBM Watson service.
- hostname (string) – IBM Watson hostname.
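For illustration, a settings dict might look like the one below. The keys shown are the ones used in the examples later on this page; consult the official IBM Watson docs for the full list. The credential values in the comments are placeholders, not real arguments.

```python
# Example Watson settings (see the official IBM Watson speech-to-text
# docs for all recognized keys).
settings = {
    'inactivity_timeout': -1,  # don't disconnect after 30s of silence
    'interim_results': True,   # emit partial transcripts as they arrive
}

# Either point the Transcriber at a credentials file...
#   transcriber = watson_streaming.Transcriber(settings, 'credentials.json')
# ...or pass an API key and hostname directly (placeholder values):
#   transcriber = watson_streaming.Transcriber(
#       settings, apikey='YOUR_API_KEY', hostname='YOUR_HOSTNAME')
```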
Utilities¶
Convenient fluteline producers and consumers to use with the main watson_streaming.Transcriber.
class watson_streaming.utilities.FileAudioGen(audio_file)¶

Producer that spits out audio samples from a file.

Parameters: audio_file (string) – Path to a .wav file.
class watson_streaming.utilities.MicAudioGen(*args, **kwargs)¶

Producer that spits out audio samples from your microphone.
class watson_streaming.utilities.Printer(*args, **kwargs)¶

End-of-chain consumer to print the transcript received from IBM Watson.
Examples¶
The two examples below can help you understand how to use the library for your needs. The first one transcribes audio from the microphone using watson_streaming.utilities.MicAudioGen. The second is similar, but transcribes audio from a file instead, using watson_streaming.utilities.FileAudioGen.
'''
Speech to text transcription, from your mike, in real-time, using IBM Watson.
'''
import argparse
import time

import fluteline

import watson_streaming
import watson_streaming.utilities


def parse_arguments():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('credentials', help='path to credentials.json')
    return parser.parse_args()


def main():
    args = parse_arguments()

    settings = {
        'inactivity_timeout': -1,  # Don't kill me after 30 seconds
        'interim_results': True,
    }

    nodes = [
        watson_streaming.utilities.MicAudioGen(),
        watson_streaming.Transcriber(settings, args.credentials),
        watson_streaming.utilities.Printer(),
    ]

    fluteline.connect(nodes)
    fluteline.start(nodes)

    try:
        while True:
            time.sleep(10)
    except KeyboardInterrupt:
        pass
    finally:
        fluteline.stop(nodes)


if __name__ == '__main__':
    main()
'''
Speech to text transcription, from an audio file, in real-time, using
IBM Watson.
'''
import argparse
import contextlib
import time
import wave

import fluteline

import watson_streaming
import watson_streaming.utilities


def parse_arguments():
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('credentials', help='path to credentials.json')
    parser.add_argument('audio_file', help='path to .wav audio file')
    return parser.parse_args()


def main():
    args = parse_arguments()

    settings = {
        'interim_results': True,
    }

    nodes = [
        watson_streaming.utilities.FileAudioGen(args.audio_file),
        watson_streaming.Transcriber(settings, args.credentials),
        watson_streaming.utilities.Printer(),
    ]

    fluteline.connect(nodes)
    fluteline.start(nodes)

    try:
        with contextlib.closing(wave.open(args.audio_file)) as f:
            # A wave frame spans all channels, so the duration in seconds
            # is simply frames / framerate.
            wav_length = f.getnframes() / f.getframerate()
            # Sleep till the end of the file + some seconds slack
            time.sleep(wav_length + 5)
    finally:
        fluteline.stop(nodes)


if __name__ == '__main__':
    main()