When Will It Just Be Machines Talking to Machines?

Do you ever get the feeling that the machines are trying to influence the way you think, or take charge of your communicative acts?

One of the things I noticed for the first time today was that my WordPress editor seems to have started converting some pasted-in URLs to actual links, using the pasted URL as the href attribute value and with the link text pulled from the referenced page (WordPress Editor Generates Page Title Links From Pasted URLs). Thinking about it, this is an example of an autocompletion behaviour in which the machine has detected some pattern and “completed” it on the assumption that I intend to turn the plain URL into a web hyperlink.

That is, I paste in X but actually want to represent it as [Y](X) (a link represented in markdown, where Y is the link text and X the target URL) or <a href="X">Y</a> (an HTML link).
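By way of illustration, here’s a minimal sketch of the kind of transformation involved; the fetch-the-title logic is my guess at the behaviour, not WordPress’s actual implementation:

```python
# Minimal sketch of the paste-a-URL-get-a-titled-link behaviour.
# This is a guess at the logic, not WordPress's actual implementation.
import re
import urllib.request

def url_to_markdown_link(url):
    """Fetch the page at `url` and return a markdown link [title](url)."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="ignore")
    # Use the page's <title> element as the link text, falling back to the URL
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = match.group(1).strip() if match else url
    return f"[{title}]({url})"

# e.g. url_to_markdown_link("https://example.com") -> "[Example Domain](https://example.com)"
```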

I imagine most people are familiar with the notion that Google offers a range of autocompletion and autosuggestion terms when you start to type in a Google web search (I don’t think the voice search (yet?) starts to interrupt you when you ‘ok Google’ it (I don’t knowingly have any voice interfaces activated…))…

What I’ve also noticed over the last few days is that a Gmail update seems to have come along with a new, positively set default that opts me in to an autocomplete service there, at least when I’m replying to an email.

This service has been available since at least May 2018: Write emails faster with Smart Compose in Gmail.

In look and feel, it’s very reminiscent of the code autocompletion support found in programming code editors. If you aren’t a programmer, know that computer programs are essentially composed of terms from a fixed vocabulary (whether imposed by the language or defined within the program itself), so code completion makes absolute sense to the people who built the Gmail software application and user interface. Why on earth wouldn’t you want it everywhere…
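For a toy illustration of what that sort of completion amounts to (a simple prefix match over a fixed vocabulary; real editors use much richer context than this):

```python
# Toy prefix-completion over a fixed vocabulary, in the spirit of
# code editor autocomplete; real editors use much richer context.
import bisect

VOCAB = sorted(["print", "printf", "private", "property", "public", "return"])

def complete(prefix):
    """Return all vocabulary terms starting with `prefix`."""
    matches = []
    for term in VOCAB[bisect.bisect_left(VOCAB, prefix):]:
        if not term.startswith(prefix):
            break  # the list is sorted, so no later term can match
        matches.append(term)
    return matches

print(complete("pri"))  # ['print', 'printf', 'private']
```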

A couple of things concern me without even having to think about it:

  1. What could possibly go wrong…
  2. Does autocomplete change what people intend to write?

In the paper Responsible epistemic technologies: A social-epistemological analysis of autocompleted web search, Miller & Record write:

[U]sers’ exposure to autosuggestions is involuntary. Users cannot type a search without encountering autosuggestions. Once seen, they cannot “unsee” the results. …

Psychology suggests two likely consequences of involuntary exposure. First, initially disregarded associations sometimes transform into beliefs because humans are prone to source-monitoring errors: subjects mistake the original information source and may put more or less credence in the information than they would have given the correct source (e.g. Johnson et al., 1993). Someone might read [something] and initially disregard it, but later, having forgotten the source, recall having read it. This is supported by the sleeper effect, according to which when people receive a message with a discounting cue, they are less persuaded by it immediately than later in time (Kumkale and Albarracín, 2004). Second, involuntary exposure to certain autosuggestions may reinforce unwanted beliefs. Humans are bad at identifying and rooting out their implicit biases (Kenyon, 2014). Because exposure is involuntary, even subjects hygienic in their epistemic practices may be negatively affected.

[A]utosuggestions interactively affect a user’s inquiry, leading to paths she might not have pursued otherwise. Effectively, if a user looks at the screen, she can’t help but see the autosuggestions, and these impressions can affect her inquiry. Autosuggestions may seem to a user to delimit the possible options or represent what most people find relevant, either of which may change her search behavior. She may change her search terms for one of the suggestions, add or subtract additional terms to rule out or in suggested results. She may abandon her search altogether because the autosuggestions seem to provide the answer, or indicate that there is no answer to be found; that is, she may assume that because nothing is being suggested, no results for the query exist. Furthermore, because the displayed information may be incomplete or out of context, she might reach a different conclusion on the basis of autosuggestions than if she actually visited the linked page.

Altering a user’s path of inquiry can have positive effects, as when he is exposed to relevant information he might not have encountered given his chosen search terms. But the effects may also be negative. … Such derails in inquiry may be deleterious…

Finally, autosuggestions affect users’ belief formation process in a real-time interactive and responsive manner. “It helps to complete a thought,” as one user put it (Ward et al., 2012: 12). They may thus generate beliefs the user might not have had. Based on autosuggestions, I might erroneously believe [X]. Alternatively, I might come to believe that these things are possible, where before I held no beliefs about them, or I might give these propositions more credence than I would otherwise. Autocomplete is like talking with someone who is constantly cutting you off, trying to finish your sentences. This can be annoying when the person is way off base, or pleasant when he seems like your mind-reading soulmate. Either way, it has a distracting, attention-shifting effect that other interactive interface technologies lack.

As an aside, I also note that as well as offering autosuggestion possibilities that intrude on our personal communicative acts, the machine is also acting as a proxy that can buffer us from having to engage in those acts at all. Spam filtering is one example (I tend not to review my spam folders, so I’m not sure how many legitimate emails get diverted into them. Hmm, thinks: does a contemporary version of the OSS Simple Sabotage Field Manual include suggestions to train corporate spam filters on legitimate internal administrative emails?)

A good example of creeping intermediation comes in the form of Google Duplex, a voice agent / assistant demoed earlier this year that can engage in certain phone-based, voice interactions on your behalf. It’s about to start appearing in the wild on Pixel phones (Pixel 3 and on-device AI: Putting superpowers in your pocket).

One of the on-device features that will be supported is a new Call Screen service:

You can see who’s calling and why before you answer a call with the help of your Google Assistant. …

  1. When someone calls, tap Screen call.
  2. The Google Assistant will … ask who’s calling and why. Then you’ll see a real-time transcript of how the caller responds.
  3. Once the caller responds, choose a suggested response or an action. Here are some responses and what the caller will hear:
    • Is it urgent? – “Do you need to get a hold of them urgently?”
    • Report as spam – “Please remove this number from your mailing and contact list. Thanks, and goodbye.”
    • I’ll call you back – “They can’t talk right now, but they’ll give you a call later. Thanks, and goodbye.”
    • I can’t understand – “It’s difficult to understand you at the moment. Could you repeat what you just said?”
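Purely as a sketch, that suggested-response flow reads like a simple lookup from the tapped option to a canned utterance. The mapping below just restates the documented responses; it is not Google’s code:

```python
# Sketch only: Call Screen's documented suggested responses restated as a
# lookup from the user's tapped option to the phrase the caller hears.
CANNED_RESPONSES = {
    "Is it urgent?": "Do you need to get a hold of them urgently?",
    "Report as spam": ("Please remove this number from your mailing and "
                       "contact list. Thanks, and goodbye."),
    "I'll call you back": ("They can't talk right now, but they'll give you "
                           "a call later. Thanks, and goodbye."),
    "I can't understand": ("It's difficult to understand you at the moment. "
                           "Could you repeat what you just said?"),
}

def screen_reply(tapped_option):
    """Return what the caller hears for a given on-screen choice."""
    return CANNED_RESPONSES[tapped_option]
```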

But not actually “transfer” the call to the user so they can answer it?!

According to Buzzfeed (The Pixel 3: Everything You Need To Know About Google’s New Phone), the Call Screen bot will answer the phone for you and challenge the caller: “The person you’re calling is using a screening service and will get a copy of this conversation. Go ahead and say your name and why you’re calling.” This raises the interesting question of how another (Google) bot on the calling side might respond…

(By the by, thinks: phone receptionists – the automated voice assistants will be after your job…)

It’s probably also worth remembering that:

[s]ometimes Call Screen may not understand the caller. To ask the caller to repeat themselves, tap I can’t understand. The caller will hear, “It’s difficult to understand you at the moment. Could you repeat what you just said?”

So now, rather than spending a couple of seconds answering the phone, realising it’s a spam caller, and hanging up, you have to take even more time out waiting on Call Screen, reading the Call Screen messages, and training it a bit further when it gets stuck? But I guess that’s how you pay for its freeness.

Anyway, as part of your #resistance defense toolkit, maybe add that phrase to your growing list of robot tells. (Is there a full list anywhere?)

As well as autocomplete and autosuggest, I note the ever-engaging Pete Warden blogging recently on the question of Will Compression Be Machine Learning’s Killer App?:

One of the other reasons I think ML is such a good fit for compression is how many interesting results we’ve had recently with natural language. If you squint, you can see captioning as a way of radically compressing an image. One of the projects I’ve long wanted to create is a camera that runs captioning at one frame per second, and then writes each one out as a series of lines in a log file. That would create a very simplistic story of what the camera sees over time; I think of it as a narrative sensor.

The reason I think of this as compression is that you can then apply a generative neural network to each caption to recreate images. The images won’t be literal matches to the inputs, but they should carry the same meaning. If you want results that are closer to the originals, you can also look at stylization, for example to create a line drawing of each scene. What these techniques have in common is that they identify parts of the input that are most important to us as people, and ignore the rest.

Which is to say: compress the image by creating a description of it, then regenerate an image from the description at the other end. A picture may save a thousand words, but if the thousand words compress smaller than the picture in terms of bits and bytes, that makes sense to the data storage and transmission folk, albeit at the trade-off of increased compute requirements on either side.
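In outline, with entirely hypothetical `caption_model` and `image_generator` functions standing in for real captioning and text-to-image networks, the round trip looks something like this:

```python
# Sketch of "semantic compression": image -> caption -> regenerated image.
# `caption_model` and `image_generator` are hypothetical stand-ins for a
# real captioning network and a real text-to-image generative network.

def compress(image_bytes, caption_model):
    """'Compress' an image down to a short natural language description."""
    description = caption_model(image_bytes)  # e.g. "a dog running on a beach"
    return description.encode("utf-8")        # tens of bytes, not megabytes

def decompress(description_bytes, image_generator):
    """Regenerate an image carrying the same meaning, not the same pixels."""
    return image_generator(description_bytes.decode("utf-8"))
```

The regenerated image matches the original in meaning rather than pixel for pixel, which is the whole trade: a tiny payload and lossy semantics, paid for with heavy compute at both ends.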

Hmm, this reminds me of a thinkses from over a decade ago on The Future of Music:

My expectation over the last 5 years or so was that CD singles/albums would start to include remix applications/software studios on that medium – but I’ve been tracking it as a download reality on and off for the last 6 months or so (though it’s been happening for longer).

That said – my expectation of getting the ‘src’ on the CD was predicated on the supply of the remix application on the CD too, rather than it being pre-installed on the user’s computer.

The next thing I’m looking out for is a ‘live by machine’ gig, where a club franchise has real hardware/synths being played at a distance by the band, who are maybe in another venue owned by that club chain?

For this, you have to imagine banks of synths receiving (MIDI) control signals over the net from the real musicians playing live elsewhere.

This is not so much online jamming (or here: eJamming) – where you mix realtime audio feeds from other musicians on the web with your own efforts – as real time creation of the music from audio generators…
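To get a feel for just how lightweight those control signals would be compared with streaming audio, here’s a hypothetical sketch that fires a raw three-byte MIDI message over UDP (a real deployment would use something like RTP-MIDI, and the endpoint here is invented):

```python
# Hypothetical sketch: sending a MIDI note-on message to a remote synth rack.
# A real system would use a protocol such as RTP-MIDI; the host and port
# below are invented. The point is the payload: three bytes per note event.
import socket

SYNTH_HOST, SYNTH_PORT = "synth.example.com", 5004  # assumed endpoint

def send_note_on(channel, note, velocity):
    """Send a 3-byte MIDI note-on message to the remote synths."""
    status = 0x90 | (channel & 0x0F)  # note-on status byte for this channel
    message = bytes([status, note & 0x7F, velocity & 0x7F])
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (SYNTH_HOST, SYNTH_PORT))

# e.g. send_note_on(0, 60, 100)  # middle C, forte: three bytes on the wire
```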

It’s also interesting to note that the “reproducibility” requirement of shipping the software tooling needed to make use of the data (“predicated on the supply of the remix application on the CD too”), as well as the data itself, was in my thinking even then…

Pete Warden goes on:

It’s not just images

There’s a similar trend in the speech world. Voice recognition is improving rapidly, and so is the ability to synthesize speech. Recognition can be seen as the process of compressing audio into natural language text, and synthesis as the reverse. You could imagine being able to highly compress conversations down to transmitting written representations rather than audio. I can’t imagine a need to go that far, but it does seem likely that we’ll be able to achieve much better quality and lower bandwidth by exploiting our new understanding of the patterns in speech.

I even see interesting possibilities for applying ML compression to text itself. Andrej Karpathy’s char-rnn shows how well neural networks can mimic styles given some examples, and that prediction is a similar problem to compression. If you think about how much redundancy is in a typical HTML page, it seems likely that there would be some decent opportunities for ML to improve on gzip. This is getting into speculation though, since I don’t have any ML text compression papers handy.

Ah ha…
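As a quick and crude illustration of how much redundancy a general-purpose compressor already finds in repetitive HTML (never mind what a predictive model might add on top):

```python
# Crude illustration: gzip already squeezes a lot of redundancy out of
# repetitive HTML; a predictive (ML) model could in principle do better
# by exploiting patterns that gzip's fixed scheme can't see.
import gzip

html = "<div class='item'><span>hello</span></div>\n" * 200

compressed = gzip.compress(html.encode("utf-8"))
print(len(html), "bytes raw ->", len(compressed), "bytes gzipped")
# Highly repetitive markup compresses to a small fraction of its raw size.
```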

Tangentially related, ramblings on Google languaging: Translate to Google Statistical (“Google Standard”?!) English? and Google Translate Equilibrium Finder. (FWIW, these aren’t machine-generated “related” items: they’re old thoughts I remembered blogging about before…)
