Picking up on statistical regularities over time is an important prerequisite for language acquisition. For example, learning the transitional probabilities between syllables provides important scaffolding for segmenting the ongoing speech stream into component words – something that is not possible based on acoustic cues alone, since fluent speech contains no reliable pauses between words. A recent study by Kikuchi and colleagues examined electrophysiological responses to confirmations and violations of an artificial grammar’s rules, but did so in an especially ambitious way – by comparing invasive recordings from human and monkey auditory cortex.
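(To make the transitional-probability idea concrete, here is a minimal sketch – the syllable stream and the three-syllable “words” below are invented for illustration and have nothing to do with the study’s stimuli. The point is simply that within-word transitions come out with high probabilities, between-word transitions with low ones, and the dips mark candidate word boundaries.)

```python
from collections import defaultdict

def transitional_probabilities(syllables):
    """Estimate P(next syllable | current syllable) from a syllable stream."""
    pair_counts = defaultdict(int)
    first_counts = defaultdict(int)
    for a, b in zip(syllables, syllables[1:]):
        pair_counts[(a, b)] += 1
        first_counts[a] += 1
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Toy stream built from three made-up "words": tu-pi-ro, go-la-bu, bi-da-ku.
stream = "tu pi ro go la bu bi da ku tu pi ro bi da ku go la bu tu pi ro".split()
tp = transitional_probabilities(stream)
print(tp[("tu", "pi")])  # within-word transition: 1.0
print(tp[("ro", "go")])  # between-word transition: 0.5 in this tiny sample
```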
Both species were exposed to an artificial grammar (sequences of CVC nonsense words concatenated in rule-based ways) for 30 minutes, and neural recordings were then made while they listened to sequences in which the preceding context made a specific nonsense word either consistent with the grammatical structure or a violation of it. In response to all nonsense words, both species showed phase consistency in the theta frequency band (~4–8 Hz) as well as power modulations in the gamma band (>~50 Hz). In addition, significant phase–amplitude coupling was found between the theta and gamma bands in response to nonsense words. The more interesting question, then, is what happens in response to confirmations versus violations of the artificial grammar rules?
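(For readers less familiar with the measure, here is a rough sketch of how theta–gamma phase–amplitude coupling is commonly quantified – a mean-vector-length style index computed on a synthetic signal. This illustrates the general technique only; it is not the pipeline used in the paper, and the frequency bands and filter settings are my own placeholder choices.)

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def bandpass(x, lo, hi, fs, order=4):
    sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def theta_gamma_pac(x, fs, phase_band=(4, 8), amp_band=(50, 150)):
    """Mean-vector-length estimate of phase-amplitude coupling."""
    theta_phase = np.angle(hilbert(bandpass(x, *phase_band, fs)))
    gamma_amp = np.abs(hilbert(bandpass(x, *amp_band, fs)))
    # Treat gamma amplitude as a weight on the theta phase vector: strong coupling
    # means the amplitude clusters at a preferred phase, giving a long mean vector.
    return np.abs(np.mean(gamma_amp * np.exp(1j * theta_phase))) / np.mean(gamma_amp)

# Synthetic test: gamma bursts locked to theta peaks yield a clearly non-zero index.
fs = 1000
t = np.arange(0, 5, 1 / fs)
theta = np.sin(2 * np.pi * 6 * t)
gamma = (1 + theta) * np.sin(2 * np.pi * 80 * t)   # 80 Hz amplitude follows theta
sig = theta + 0.3 * gamma + 0.1 * np.random.randn(t.size)
print(theta_gamma_pac(sig, fs))
```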
In both species, phase–amplitude coupling was modulated by both confirmations and violations of the artificial grammar rules. Some neurons liked confirmations, some liked violations, and some liked both. In a classical statistical testing world, averaging over recording sites, this would very much be a null effect. Of course, that’s not how neural population coding works, so looking at the pattern of activity across the population of neurons might well have revealed whether the grammar was being respected – but this type of analysis was not performed (see the sketch below for the kind of thing I have in mind). Instead, the paper presents an analysis suggesting that, in monkeys at least, the different neural effects had different latencies: phase–amplitude coupling effects and changes in single-unit activity emerged earlier in time than gamma-power modulations. Keep in mind that these are the latencies of the statistical effects, and not necessarily when the real action starts happening (just when the action crosses a significance threshold). There was no attempt to relate the effects to each other in a more fine-grained way – for example, to ask whether single-trial phase–amplitude-coupling modulations predict subsequent power modulations on the same trial.
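(Here is a toy version of the population-level analysis I mean – entirely simulated data, nothing to do with the actual recordings. If half the units fire more for violations and half fire more for confirmations, the average across sites is flat, but a cross-validated decoder trained on the population pattern still separates the two conditions easily.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_units = 200, 40
labels = rng.integers(0, 2, n_trials)              # 0 = grammatical, 1 = violation

# Half the simulated units prefer violations, half prefer confirmations,
# so averaging over units erases the effect entirely.
prefs = np.r_[np.ones(n_units // 2), -np.ones(n_units // 2)]
rates = rng.normal(size=(n_trials, n_units)) + np.outer(2 * labels - 1, prefs)

print("site-averaged difference:",
      rates[labels == 1].mean() - rates[labels == 0].mean())              # ~0
print("population decoding accuracy:",
      cross_val_score(LogisticRegression(), rates, labels, cv=5).mean())  # ~1.0
```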
So, I’ll ask, as the authors ask, what does it all mean? There were no species differences whatsoever, at least as far as the current techniques and measures could tell. What does that imply for the relationship between neural “oscillations” (here, theta–gamma coupling specifically) and speech segmentation / perception? That is, can a neural response that is conserved across species do something special for humans that it doesn’t do for other species that don’t use language in the way we typically think of language being used? I’d say, “sure”. For one, the study tested responses to learned statistical regularities in the transitions between complex sounds, something some species of non-human animals seem quite able to do (see also a recent demonstration that monkey auditory cortex neural activity synchronizes with the slow rhythms of speech). On top of that, to cite something Anne-Lise Giraud said at a “Neural Oscillations in Speech and Language Processing” workshop I just attended, one of the really appealing things about neural oscillations is exactly that they are evolutionarily conserved, but still DO seem to have been co-opted to do something special for humans.
To sum up, despite my superficial grumpiness about the paper’s shortcomings, I do think the approach is 100% commendable, and one way forward for learning about speech and language processing. Species comparisons are hard, especially when invasive recordings are required – even in humans(!). But having the opportunity to directly compare humans with other species, using carefully matched stimuli, analysis pipelines, and maybe even tasks, has the potential to tell us a lot about the human capacity to learn and communicate via spoken language.