Categories
3D modeling artificial intelligence deception fake news imagery information technology internet media forgery media technology persuasion privacy social media speech synthesis surveillance video

AI enables creation of 3D face model from single image

If you’re not already concerned enough about people using AI tools to create convincing fake audio and video, consider this: the Computer Vision Lab at the University of Nottingham has developed an AI system capable of creating fairly accurate 3D face models from a single photograph. I uploaded one of my own photos to the demo tool and, a few seconds later, it produced the following model (a GIF of captured screen video of me rotating the 3D model):

Imagine what AI can do with multiple images and videos of you (from your social media posts, your phone’s photo and video library, surveillance footage, etc.). One takeaway, among others, is the need for vigilance and cynicism. If you see or hear something in digital media (online or sent to you via email, IM, etc.) that is too terrible, wonderful, or just shocking to be true, it probably isn’t. For now, at least, it’s still possible to detect forged media (and fake news, though you probably don’t want to). Soon, however, it will take AI tools to spot the work of other AI tools, and we’ll then have to decide which AIs to believe. The arms race between making and detecting forgeries is accelerating.

Okay, still smarting from my suggestion that you may not want to detect when the news you enjoy and agree with is fake? Check out the following video and exercise your media literacy by researching cognitive biases.

Related links (interesting examples of cognitive bias and trolling in many of the comments)
  • https://www.ipscommons.sg/fake-news-mind-traps/
  • https://www.youtube.com/watch?v=4XGTTKJJsEw
  • https://www.youtube.com/watch?v=rrkqZfHOvbE
  • https://en.wikipedia.org/wiki/List_of_cognitive_biases
Categories
artificial intelligence brain cognition communication complexity computing engineering interaction design interface metaphors semantics speech speech synthesis

Should AI agents’ voice interactions be more like our own? What effects should we anticipate?

An article at Wired.com considers the pros and cons of making the voice interactions of AI assistants more humanlike.

The assumption that more human-like speech from AIs is naturally better may prove as incorrect as the belief that the desktop metaphor was the best way to make humans more proficient in using computers. When designing the interfaces between humans and machines, should we minimize the demands placed on users to learn more about the system they’re interacting with? That seems to have been Alan Kay’s assumption when he designed the first desktop interface back in 1970.

Problems arise when the interaction metaphor diverges too far from the reality of how the underlying system is organized and works. In a personal example, someone dear to me grew up helping her mother, an office manager for several businesses. Dear one was thoroughly familiar with physical desktops, paper documents and forms, file folders, and filing cabinets. As I explained how to create, save, and retrieve information on a 1990 Mac, she quickly overcame her initial fear. “Oh, it’s just like in the real world!” (Chalk one up for Alan Kay? Not so fast.) I knew better than to tell her the truth at that point. Dear one’s Mac honeymoon crashed a few days later when, to her horror and confusion, she discovered a file cabinet inside a folder. To make matters worse, she clicked on a string of underlined text in a document and was forcibly and instantly transported to a strange destination. Cries for help punctuated my hours. Having come to terms with computers through the command-line interface, I found the desktop metaphor annoying and unnecessary. Hyperlinking, however, is another matter altogether: an innovation that multiplied the value I found in computing.

At the other end of the complexity spectrum is machine-level code. There would be no general computing today if we all had to speak to computers in their own fundamental language of ones and zeros. That hasn’t stopped some hard-core computer geeks from advocating extreme positions on appropriate interaction modes, as reflected in this quote from a 1984 issue of InfoWorld:

“There isn’t any software! Only different internal states of hardware. It’s all hardware! It’s a shame programmers don’t grok that better.”

Interaction designers operate on the metaphor end of the spectrum by necessity. The human brain organizes concepts by semantic association. But sometimes a different metaphor makes all the difference. And sometimes, to be truly proficient when interacting with automation systems, we have to invest the effort to understand less simplistic metaphors.

The article referenced at the beginning of this post mentions that humans are manually coding “speech synthesis markup tags” to make the synthesized voices of AI systems sound more natural. (Note that this creates an appearance that the AI understands the user’s intent and emotional state, though this seemingly more natural intelligence is illusory.) Intuitively, this sounds appropriate. The downside, as the article points out, is that colloquial AI speech limits human-machine interactions to the sort of vagueness inherent in informal speech. It also trains humans to be less articulate. The result may be interactions that fail to clearly communicate what either party actually means.
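
For readers who have never seen such tags, here is a rough sketch of what hand-coded speech markup might look like. It uses a few elements from the W3C's Speech Synthesis Markup Language (speak, prosody, break, emphasis). The helper name build_ssml, the sample phrases, and the specific pacing values are illustrative assumptions, not anything taken from the Wired article or from a particular assistant.

```python
import xml.etree.ElementTree as ET


def build_ssml(greeting: str, detail: str) -> str:
    """Return an SSML fragment with hand-picked pacing and emphasis.

    Hypothetical helper for illustration; the attribute values are guesses
    a designer might tune by ear, not recommended settings.
    """
    speak = ET.Element("speak")

    # Deliver the greeting a little slower and two semitones higher.
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow", "pitch": "+2st"})
    prosody.text = greeting

    # Insert a short pause, the way a person might hesitate between thoughts.
    ET.SubElement(speak, "break", {"time": "300ms"})

    # Stress the part of the message the designer wants the listener to notice.
    emphasis = ET.SubElement(speak, "emphasis", {"level": "moderate"})
    emphasis.text = detail

    # ElementTree escapes any special characters in the text for us.
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Hi there!", "Your package arrives tomorrow.")
print(ssml)
```

Notice that every pause and inflection here was chosen by a person in advance. Nothing in the markup reflects what the listener actually said or felt, which is why the resulting naturalness is an appearance rather than understanding.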

I suspect a colloquial mode could be more effective in certain kinds of interactions: when attempting to deceive a human into thinking she’s speaking with another human; virtual talk therapy; when translating from one language to another in situations where idioms, inflections, pauses, tonality, and other linguistic nuances affect meaning and emotion; etc.

In conclusion, operating systems, applications, and AIs are not humans. To improve our effectiveness in using more complex automation systems, we will have to meet them farther along the complexity continuum: still far from machine code, but at points of complexity that require much more of us as users.