Piping live sound from raspberry pi to macOS

I want to pipe and live-play the sound recorded on my Raspberry Pi to my MacBook. I’ve tried the following:

On my Raspberry Pi:

I tried to establish a data stream on port 3333:

arecord -D plughw:3,0 -f S16_LE -r 44100 -t raw | nc -l -p 3333

On my MacBook:

nc 10.10.1.1 3333 | play -t raw -b 16 -e signed-integer -r 44100 -c 1 -V1 -

With this I can’t hear anything on my Mac, but I get this output in the terminal:

-: (raw)

 File Size: 0
  Encoding: Signed PCM
  Channels: 1 @ 16-bit
Samplerate: 44100Hz
Replaygain: off
  Duration: unknown

In:0.00% 00:00:00.00 [00:00:00.00] Out:0     [      |      ]        Clip:0
Done.
Asked By: zahntheo


I can’t really tell you what goes wrong with your setup in your specific case; you’d want to check whether your nc actually receives data (e.g. by writing to a file, or by piping through pv), and whether your arecord actually captures sound (by writing to a file instead of piping to nc).
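
For instance, a couple of quick sanity checks along those lines (a sketch; the device name is taken from your question, and pv may need installing first):

# on the Pi: does capture work at all? Record 5 seconds to a file, then play it back
arecord -D plughw:3,0 -f S16_LE -r 44100 -d 5 /tmp/capture-test.wav

# on the Mac: do any bytes arrive? Dump to a file and check its size,
# or watch the live data rate with pv
nc 10.10.1.1 3333 > /tmp/stream.raw
nc 10.10.1.1 3333 | pv > /dev/null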

Also, I’m not sure 44100 Hz is a great choice of sample rate; most hardware these days natively does 48000 Hz, and you’re just letting ALSA convert that down to 44100 Hz.
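
If you do stick with your pipeline for testing, the 48 kHz variant would look like this (a sketch, assuming your device natively supports that rate); on the Pi:

arecord -D plughw:3,0 -f S16_LE -r 48000 -t raw | nc -l -p 3333

and on the Mac, with the matching rate:

nc 10.10.1.1 3333 | play -t raw -b 16 -e signed-integer -r 48000 -c 1 -V1 -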

What’s more important: you should probably use a sensible transport framing, so that the receiving end can properly align timing, compensate for dropped packets, drop late or reordered packets, and learn about the stream format itself. I use MPEG Transport Stream as the stream format; it’s what digital video broadcasting and other multimedia streaming platforms use.

Related to that, using TCP for low-latency transport is probably not a great idea, either. You’ll also want to limit the transmit buffer size that your network sink uses (in this case, the nc on the RPi). All in all, arecord | nc might simply not be the best streaming approach. Also, with your approach you get at least as much latency as the time it takes to start the receiving end after your sender has been started and is ready to accept a connection.

What I do when I quickly capture some audio and need to get it into a different machine, mostly for web conferencing reasons, is do the following on the "microphone machine":

ffmpeg \
  -f alsa -channels 1 -sample_rate 24000 -i pipewire \
  -f mpegts -max_packet_size 1024 \
  -c:a libopus -b:a 64k -vbr off -packet_loss 10 -fec on \
  udp://127.0.0.1:1234

Let’s take that apart:

  • ffmpeg: the program we’re running, FFmpeg. Pretty much the standard transcoding/streaming/decoding solution on most platforms.

input options:

  • -f alsa: the input type ("format") is alsa, i.e. we’re grabbing sound from the ALSA sound system
  • -channels 1: optional. I only want mono sound. Omit to use whatever your sound device gives you (probably stereo?), or set it to a different value if you specifically want to capture a different number of channels.
  • -sample_rate 24000: optional. I’m mostly concerned with speech, for which a 24 kHz sampling rate is far more than enough for excellent audio quality (I’m not a bat, my voice doesn’t go much above 1.2 kHz…).
  • -i pipewire: capture from the pipewire ALSA device. In your case that’s plughw:3,0, it seems (check the available capture devices with arecord -L); see the sketch right after this list.
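
Adapted to your setup, the input half would presumably become something like this (a sketch; do verify the device name with arecord -L first; the rest of the command stays unchanged):

-f alsa -channels 1 -sample_rate 24000 -i plughw:3,0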

output format options:

  • -f mpegts: after -i we’re done with describing the input, so this -f describes the output format. We’re streaming MPEG transport stream.
  • -max_packet_size 1024: optional. We force the streamer to emit a packet every 1024 bytes, which limits transmit-side latency.

audio codec options (all optional):

  • -c:a libopus: optional. c is short for codec, a for audio, and here we use the libopus encoder. Opus is a mature, web-standard audio codec with both low-complexity and high-quality settings.
  • -b:a 64k: optional. We’re setting the encoded bitrate to 64 kb/s. That’s fairly high quality, yet modest in compute cost (think 5% to 20% of one core, at most).
  • -vbr off: optional. Force constant (as opposed to variable) bitrate. This makes sense if you plan to stream over a limited-bandwidth link and can’t have the encoder producing short spikes of high bitrate. Omit on LAN.
  • -packet_loss 10: optional. Set up the redundancy in the packets such that it’s OK if 1 in 10 packets gets lost. That makes the stream robust on connections with occasional packet drops, like some internet or wireless links. Omit on LAN.
  • -fec on: optional. After having set the redundancy amount, we also want to actually enable sending this redundancy for forward error correction purposes. Only makes sense with -packet_loss > 0.

output options:

  • udp://127.0.0.1:1234: the IP address and port to stream to. Attention! Here the receiver is the listening party; the transmitter is just a UDP socket that pushes out the audio stream with no care for whether someone picks it up. The advantage is that you can attach and detach the receiver as often as you like.
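
Putting the "omit on LAN" remarks together and pointing the stream at your Mac rather than at localhost, the whole sender command on the Pi could shrink to something like this sketch (10.10.1.2 is a made-up address for your MacBook, and plughw:3,0 the capture device from your question):

ffmpeg \
  -f alsa -channels 1 -sample_rate 24000 -i plughw:3,0 \
  -f mpegts -max_packet_size 1024 \
  -c:a libopus -b:a 64k \
  udp://10.10.1.2:1234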

On the receiving end, we’ll open a listening socket:

ffplay -f mpegts -nodisp -fflags nobuffer udp://127.0.0.1:1234

-nodisp turns off the display (we’re not streaming video); -fflags nobuffer makes the input stream parser not try to catch up on old data (i.e., it avoids the "late listener" delay you’d otherwise get).
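
If you need to shave off more latency, ffplay has a few more generic input knobs worth experimenting with; a sketch (these are standard FFmpeg input options, not specific to this stream, and they trade away some robustness of stream detection):

ffplay -f mpegts -nodisp -fflags nobuffer -flags low_delay -probesize 32 udp://127.0.0.1:1234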

Note that in this scenario the playing end is the server: it opens the listening socket (as your nc -l did). You can also turn this around, if you want, by using zmq:tcp:// on both sides where you used udp:// before; that also gives you the ability to connect multiple listeners to your serving "recorder" Pi.
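
A sketch of that zmq variant (assumption: both FFmpeg builds include the libzmq protocol, which not every distribution package does; 10.10.1.1 is your Pi’s address from the question). The sending side binds, and any number of receivers connect; on the Pi:

ffmpeg -f alsa -i plughw:3,0 -f mpegts -c:a libopus zmq:tcp://10.10.1.1:1234

and on each listening machine:

ffplay -f mpegts -nodisp -fflags nobuffer zmq:tcp://10.10.1.1:1234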

You are of course also absolutely free to use nc -u -l -p 1234 | … instead, if … is a program that understands MPEG TS.
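
For example (a sketch; macOS’s BSD nc takes the port without -p when listening, and the trailing - tells ffplay to read from standard input):

nc -u -l 1234 | ffplay -f mpegts -nodisp -fflags nobuffer -

with the ffmpeg sender then pointed at your Mac’s address, as above.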

Answered By: Marcus Müller