Speech is the main channel of face-to-face communication, but understanding it involves a lot more than just listening to the words that people say. Reading someone’s lips can also be crucial, since it helps you parse the meaning of their words when you can’t hear them clearly, and Meta seems to be taking that into account in its AI work.
Many studies have shown that it is much harder to understand what someone is saying if you can’t see how their mouth is moving. Meta has developed a new framework called AV-HuBERT that takes both what a speaker says and how their lips move into account, which could vastly improve its speech recognition capabilities, although this is only a test at this point.