How to Create Video Chat on Android? WebRTC Guide For Beginners

How to Create Video Chat on Android? WebRTC Guide For Beginners

·

8 min read

Briefly about WebRTC

WebRTC is a video chat and conferencing development technology. It allows you to create a peer-to-peer connection between mobile devices and browsers to transmit media streams. You can find more details on how it works and its general principles in our article about WebRTC in plain language.

2 ways to implement video communication with WebRTC on Android

  • The easiest and fastest option is to use one of the many commercial projects, such as Twilio or LiveSwitch. They provide their own SDKs for various platforms and implement functionality out of the box, but they have drawbacks. They are paid and the functionality is limited: you can only do the features that they have, not any that you can think of.

  • Another option is to use one of the existing libraries. This approach requires more code but will save you money and give you more flexibility in functionality implementation. In this article, we will look at the second option and use https://webrtc.github.io/webrtc-org/native-code/android/ as our library.

Creating a connection

Creating a WebRTC connection consists of two steps:

  1. Establishing a logical connection - devices must agree on the data format, codecs, etc.

  2. Establishing a physical connection - devices must know each other's addresses

To begin with, note that at the initiation of a connection, to exchange data between devices, a signaling mechanism is used. The signaling mechanism can be any channel for transmitting data, such as sockets.

Suppose we want to establish a video connection between two devices. To do this we need to establish a logical connection between them.

A logical connection

A logical connection is established using Session Description Protocol (SDP), for this one peer:

Creates a PeerConnection object.

Forms an object on the SDP offer, which contains data about the upcoming session, and sends it to the interlocutor using a signaling mechanism.

val peerConnectionFactory: PeerConnectionFactory
lateinit var peerConnection: PeerConnection

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {
  val rtcConfig = PeerConnection.RTCConfiguration(iceServers)
  peerConnection = peerConnectionFactory.createPeerConnection(
      rtcConfig,
      object : PeerConnection.Observer {
          ...
      }
  )!!
}

fun sendSdpOffer() {
  peerConnection.createOffer(
      object : SdpObserver {
          override fun onCreateSuccess(sdpOffer: SessionDescription) {
              peerConnection.setLocalDescription(sdpObserver, sdpOffer)
              signaling.sendSdpOffer(sdpOffer)
          }

          ...

      }, MediaConstraints()
  )
}

In turn, the other peer:

  1. Also creates a PeerConnection object.

  2. Using the signal mechanism, receives the SDP-offer poisoned by the first peer and stores it in itself

  3. Forms an SDP-answer and sends it back, also using the signal mechanism

fun onSdpOfferReceive(sdpOffer: SessionDescription) {// Saving the received SDP-offer
  peerConnection.setRemoteDescription(sdpObserver, sdpOffer)
  sendSdpAnswer()
}

// FOrming and sending SDP-answer
fun sendSdpAnswer() {
  peerConnection.createAnswer(
      object : SdpObserver {
          override fun onCreateSuccess(sdpOffer: SessionDescription) {
              peerConnection.setLocalDescription(sdpObserver, sdpOffer)
              signaling.sendSdpAnswer(sdpOffer)
          }
           …
      }, MediaConstraints()
  )
}

The first peer, having received the SDP answer, keeps it

fun onSdpAnswerReceive(sdpAnswer: SessionDescription) {
  peerConnection.setRemoteDescription(sdpObserver, sdpAnswer)
  sendSdpAnswer()
}

After successful exchange of SessionDescription objects, the logical connection is considered established.

Physical connection

We now need to establish the physical connection between the devices, which is most often a non-trivial task. Typically, devices on the Internet do not have public addresses, since they are located behind routers and firewalls. To solve this problem WebRTC uses ICE (Interactive Connectivity Establishment) technology.

Stun and Turn servers are an important part of ICE. They serve one purpose - to establish connections between devices that do not have public addresses.

Stun server

A device makes a request to a Stun-server and receives its public address in response. Then, using a signaling mechanism, it sends it to the interlocutor. After the interlocutor does the same, the devices recognize each other's network location and are ready to transmit data to each other.

Turn-server

In some cases, the router may have a "Symmetric NAT" limitation. This restriction won’t allow a direct connection between the devices. In this case, the Turn server is used. It serves as an intermediary and all data goes through it. Read more in Mozilla's WebRTC documentation.

As we have seen, STUN and TURN servers play an important role in establishing a physical connection between devices. It is for this purpose that we when creating the PeerConnection object, pass a list with available ICE servers.

To establish a physical connection, one peer generates ICE candidates - objects containing information about how a device can be found on the network and sends them via a signaling mechanism to the peer

lateinit var peerConnection: PeerConnection

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {

  val rtcConfig = PeerConnection.RTCConfiguration(iceServers)

  peerConnection = peerConnectionFactory.createPeerConnection(
      rtcConfig,
      object : PeerConnection.Observer {
          override fun onIceCandidate(iceCandidate: IceCandidate) {
              signaling.sendIceCandidate(iceCandidate)
          }           …
      }
  )!!
}

Then the second peer receives the ICE candidates of the first peer via a signaling mechanism and keeps them for itself. It also generates its own ICE-candidates and sends them back

fun onIceCandidateReceive(iceCandidate: IceCandidate) {
  peerConnection.addIceCandidate(iceCandidate)
}

Now that the peers have exchanged their addresses, you can start transmitting and receiving data.

Receiving data

The library, after establishing logical and physical connections with the interlocutor, calls the onAddTrack header and passes into it the MediaStream object containing VideoTrack and AudioTrack of the interlocutor

fun createPeerConnection(iceServers: List<PeerConnection.IceServer>) {

   val rtcConfig = PeerConnection.RTCConfiguration(iceServers)

   peerConnection = peerConnectionFactory.createPeerConnection(
       rtcConfig,
       object : PeerConnection.Observer {

           override fun onIceCandidate(iceCandidate: IceCandidate) { … }

           override fun onAddTrack(
               rtpReceiver: RtpReceiver?,
               mediaStreams: Array<out MediaStream>
           ) {
               onTrackAdded(mediaStreams)
           }
           … 
       }
   )!!
}

Next, we must retrieve the VideoTrack from the MediaStream and display it on the screen.

private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
   val videoTrack: VideoTrack? = mediaStreams.mapNotNull {                                                            
       it.videoTracks.firstOrNull() 
   }.firstOrNull()

   displayVideoTrack(videoTrack)

   … 
}

To display VideoTrack, you need to pass it an object that implements the VideoSink interface. For this purpose, the library provides SurfaceViewRenderer class.

fun displayVideoTrack(videoTrack: VideoTrack?) {
   videoTrack?.addSink(binding.surfaceViewRenderer)
}

To get the sound of the interlocutor we don't need to do anything extra - the library does everything for us. But still, if we want to fine-tune the sound, we can get an AudioTrack object and use it to change the audio settings

var audioTrack: AudioTrack? = null
private fun onTrackAdded(mediaStreams: Array<out MediaStream>) {
   … 

   audioTrack = mediaStreams.mapNotNull { 
       it.audioTracks.firstOrNull() 
   }.firstOrNull()
}

For example, we could mute the interlocutor, like this:

fun muteAudioTrack() {
   audioTrack.setEnabled(false)
}

Sending data

Sending video and audio from your device also begins by creating a PeerConnection object and sending ICE candidates. But unlike creating an SDPOffer when receiving a video stream from the interlocutor, in this case, we must first create a MediaStream object, which includes AudioTrack and VideoTrack.

To send our audio and video streams, we need to create a PeerConnection object, and then use a signaling mechanism to exchange IceCandidate and SDP packets. But instead of getting the media stream from the library, we must get the media stream from our device and pass it to the library so that it will pass it to our interlocutor.

fun createLocalConnection() {

   localPeerConnection = peerConnectionFactory.createPeerConnection(
       rtcConfig,
       object : PeerConnection.Observer {
            ...
       }
   )!!

   val localMediaStream = getLocalMediaStream()
   localPeerConnection.addStream(localMediaStream)

   localPeerConnection.createOffer(
       object : SdpObserver {
            ...
       }, MediaConstraints()
   )
}

Now we need to create a MediaStream object and pass the AudioTrack and VideoTrack objects into it

val context: Context
private fun getLocalMediaStream(): MediaStream? {
   val stream = peerConnectionFactory.createLocalMediaStream("user")

   val audioTrack = getLocalAudioTrack()
   stream.addTrack(audioTrack)

   val videoTrack = getLocalVideoTrack(context)
   stream.addTrack(videoTrack)

   return stream
}

Receive audio track:

private fun getLocalAudioTrack(): AudioTrack {
   val audioConstraints = MediaConstraints()
   val audioSource = peerConnectionFactory.createAudioSource(audioConstraints)
   return peerConnectionFactory.createAudioTrack("user_audio", audioSource)
}

Receiving VideoTrack is tiny bit more difficult. First, get a list of all cameras of the device.

lateinit var capturer: CameraVideoCapturer

private fun getLocalVideoTrack(context: Context): VideoTrack {
   val cameraEnumerator = Camera2Enumerator(context)
   val camera = cameraEnumerator.deviceNames.firstOrNull {
       cameraEnumerator.isFrontFacing(it)
   } ?: cameraEnumerator.deviceNames.first()

   ...

}

Next, create a CameraVideoCapturer object, which will capture the image

private fun getLocalVideoTrack(context: Context): VideoTrack {

   ...


   capturer = cameraEnumerator.createCapturer(camera, null)
   val surfaceTextureHelper = SurfaceTextureHelper.create(
       "CaptureThread",
       EglBase.create().eglBaseContext
   )
   val videoSource =
       peerConnectionFactory.createVideoSource(capturer.isScreencast ?: false)
   capturer.initialize(surfaceTextureHelper, context, videoSource.capturerObserver)

   ...

}

Now, after getting CameraVideoCapturer, start capturing the image and add it to the MediaStream

private fun getLocalMediaStream(): MediaStream? {
  ...

  val videoTrack = getLocalVideoTrack(context)
  stream.addTrack(videoTrack)

  return stream
}

private fun getLocalVideoTrack(context: Context): VideoTrack {
    ...

  capturer.startCapture(1024, 720, 30)

  return peerConnectionFactory.createVideoTrack("user0_video", videoSource)

}

After creating a MediaStream and adding it to the PeerConnection, the library forms an SDP offer, and the SDP packet exchange described above takes place through the signaling mechanism. When this process is complete, the interlocutor will begin to receive our video stream. Congratulations, at this point the connection is established.

Many to Many

We have considered a one-to-one connection. WebRTC also allows you to create many-to-many connections. In its simplest form, this is done in exactly the same way as a one-to-one connection. The difference is that the PeerConnection object, as well as the SDP packet and ICE-candidate exchange, is not done once but for each participant. This approach has disadvantages:

  • The device is heavily loaded because it needs to send the same data stream to each interlocutor

  • The implementation of additional features such as video recording, transcoding, etc. is difficult or even impossible

In this case, WebRTC can be used in conjunction with a media server that takes care of the above tasks. For the client-side the process is exactly the same as for direct connection to the interlocutors' devices, but the media stream is not sent to all participants, but only to the media server. The media server retransmits it to the other participants.

Conclusion

We have considered the simplest way to create a WebRTC connection on Android. If after reading this you still don't understand it, just go through all the steps again and try to implement them yourself - once you have grasped the key points, using this technology in practice will not be a problem. If you're eager to know more about this technology, check out our guide on Webrtc security.