Alibaba’s speech recognition algorithm can isolate voices in noisy crowds


Source: VentureBeat

Chinese conglomerate Alibaba is one of the world’s largest ecommerce companies, but it’s increasingly turning its attention to artificial intelligence (AI). In March 2017, it launched an AI services division for health care and manufacturing, and in September its public cloud division — Alibaba Cloud — unveiled plans to set up a dedicated subsidiary and produce a self-developed AI inference chip that could be used for logistics and autonomous driving.

Alibaba has its fingers in plenty of AI pies, needless to say. And during a presentation at NeurIPS 2018 in Montreal this morning, it delivered an update on those cross-company efforts.

“We’re solving … scenarios [with] unseen difficulties,” said Rong Jin, dean of the Alibaba Institute of Data Science. “AI together with innovation [is helping] to solve some interesting challenges.”

One of those challenges is speech recognition in noisy environments, like a crowded subway system or congested convention center. Alibaba’s solution is part hardware, part software: a far-field microphone array and sophisticated deep learning algorithms that isolate voices in a crowd, drastically reducing error rate.

Compared to the 84 percent accuracy the “best” speech recognition technologies are able to achieve with a mic array alone, Alibaba claims its model is between 94 and 95 percent accurate, even with heavily accented speakers. It has already been deployed as part of a voice-based subway ticketing system in Shanghai, and Alibaba is in talks to bring it to “a number of [additional] cities.”
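To put those figures in terms of error rate, here is a minimal sketch of the arithmetic. It assumes that "accuracy" here is simply one minus the error rate, which the presentation did not spell out, and the function name is hypothetical. Under that assumption, moving from 84 percent to 94 or 95 percent accuracy would cut errors by somewhat more than 60 percent to nearly 70 percent.

```python
def relative_error_reduction(baseline_accuracy, improved_accuracy):
    """Fraction of baseline errors eliminated by the improved system.

    Assumes accuracy = 1 - error rate (an assumption, not a detail
    confirmed by Alibaba).
    """
    baseline_error = 1.0 - baseline_accuracy
    improved_error = 1.0 - improved_accuracy
    return (baseline_error - improved_error) / baseline_error


# Figures quoted in the article: 84 percent baseline, 94 to 95 percent claimed.
for claimed in (0.94, 0.95):
    reduction = relative_error_reduction(0.84, claimed)
    print(f"{claimed:.0%} accuracy: about {reduction:.0%} fewer errors")
```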

“Nothing can save you if you don’t get enough signal to be recognized in the first place,” Jin said.

The spoken word isn’t the only domain Alibaba is tackling with AI. Using natural language processing, it’s performing automatic translation in real time, in the cloud, so that Alibaba retail customers in countries such as Russia and the Malay region can converse with human agents in their native tongues. And it’s tapping algorithms to field a portion of the tens of thousands of calls its support centers receive each day with AliMe, Alibaba’s intelligent customer service engine.

AliMe, much like Google’s Duplex, can carry on a phone conversation and answer basic questions without involving a human agent. Perhaps more impressively, in a chatbot context, it’s able to automatically extract text and images from a supplied document with “better than human” performance.

In an onstage demo, a customer asked Dian Xiaomi — Alibaba’s answering bot — about sales promotions for a particular Bluetooth speaker, like what sort of free gifts they’d receive with their purchase and how the gifts would be delivered to their residence. (A version rolling out later this year will add sentiment analysis and automated alerts for priority cases.) Another demo showed a humanoid embodiment of the chatbot — a prototype, Jin told the audience — with coordinated eye, lip, and head movements.

It’s a boon for bustling Alibaba divisions like AliExpress, which has over 150 million users and millions of merchants, and Cainiao, whose human workers and robots fulfill more than a billion orders each year. On Singles’ Day — the November 11 Chinese shopping holiday that this year generated $30.8 billion — Alibaba’s agents receive 5 times the typical number of calls in a 24-hour period, which would have been nearly impossible to juggle without a helping hand from AI.

Dian Xiaomi currently serves almost 3.5 million users a day, Alibaba says.

But natural language processing is just the tip of Alibaba’s AI iceberg. On Xian Yu, the retailer’s used goods marketplace, the company deployed a negotiation bot that talks to buyers to settle on a price.

The bot’s development wasn’t a cakewalk — it needed to learn negotiating strategies and efficient ways to generate text that’d incentivize back-and-forth negotiation — but the end result is impressive. When published to 10 million users on the same platform, the bot had a 20 percent higher chance of making a deal than a typical human being.

“Most of the [users] are not professional sellers,” Jin said. “They don’t know how to set a price or talk to buyers.”

On the inventory management and image search front, Alibaba is leveraging a scalable computer vision architecture to sift through hundreds of millions of entities. Its Cloud Image Search algorithm can recognize objects and find images containing similar or identical ones, and one of its store management apps — which picks out multiple items on a shelf to generate a summary that includes a distribution of different brands — can detect more than 100,000 SKUs with “high accuracy.” (Alibaba’s working toward a goal of 10 million SKUs.)

Both complement Alibaba’s Ali Smart Supply Chain (ASSC), a suite of AI tools that help Alibaba merchants forecast product demand, allocate inventory, and select pricing strategies.

Alibaba’s machine vision work extends to satellite images. Using data gathered from AutoNavi, the largest map and navigation provider in China, with over 70 million users, its systems are able to identify recently constructed buildings, for example, and gather information related to road work and points of interest.

Alibaba is also using computer vision to prevent shoplifting. At its more than 66 Hema brick-and-mortar stores, offline algorithms at its self-checkout kiosks prevent ne’er-do-well customers from scanning only the first item in a basket, or concealing items from the overhead camera’s view.

“The goal is to … have a computer vision system figure out if a customer is intentionally or unintentionally scanning items,” Jin said. “The machine sees that things aren’t scanned.”

It’s powered by a deep learning algorithm, AliFPGA-X100, that runs on a field-programmable gate array, a reconfigurable integrated circuit within the kiosks. Alibaba says it’s able to process images up to 170 times faster than a comparable GPU-based system.

Alibaba is also applying AI to Youku, its video hosting service. Machine learning algorithms automatically generate thumbnails for the roughly 200,000 videos its tens of millions of active users upload each day. The system can also tailor those thumbnails to specific audience segments. Female users might see a different preview image for a given video than male users, for example. This has led to a 15 percent improvement in click-through rates and a 12 percent uptick in dwell time.

Today’s survey comes just over a year after the debut of Alibaba’s new research organization, the Academy for Discovery, Adventure, Momentum, and Outlook (or DAMO), aimed at tackling emerging technologies like machine learning and network security, and the opening of labs in San Mateo, Seattle, Moscow, Tel Aviv, and Singapore. It also closely follows the launch of Alibaba’s Tmall Genie, an AI-powered voice assistant that’s sold over 5 million units since it hit store shelves in July 2017.

And the company is arguably just getting started. Alibaba plans to spend more than $15 billion on research and development by 2020, it told Quartz in October 2017.
