Research Directions
  • Speech Recognition and Voice Wake-up

Targeting complex scenarios such as home, in-vehicle, office, public spaces, strong noise, and near/far field, we research multilingual, multimodal, device-cloud integrated speech recognition and wake-up technology. Delivered as a platform, it gives developers rich model self-learning capabilities, so that businesses can customize their own speech models.
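
As a toy illustration of the wake-up side of this work, the sketch below shows a posterior-smoothing decision rule often used in small-footprint keyword spotting. The per-frame keyword posteriors, window size, and threshold are assumptions made up for the example, not the Lab's production system.

    import numpy as np

    def detect_wake_word(posteriors, smooth_win=10, fire_threshold=0.85):
        """Slide a smoothing window over per-frame wake-word posteriors
        and fire whenever the smoothed confidence crosses a threshold."""
        fired_at = []
        for t in range(len(posteriors)):
            lo = max(0, t - smooth_win + 1)
            score = posteriors[lo:t + 1].mean()   # moving-average smoothing
            if score > fire_threshold:
                fired_at.append(t)
        return fired_at

    # Toy posteriors: silence, then a keyword region, then silence.
    post = np.concatenate([np.full(50, 0.05), np.full(30, 0.95), np.full(50, 0.05)])
    print(detect_wake_word(post))  # frame indices inside the keyword region

In a real device the posteriors would come from a compact acoustic model running on the device, with the cloud handling full recognition after the wake word fires.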

  • Speech Synthesis

We research high-fidelity, highly expressive speech synthesis, personalized speech synthesis, and voice conversion, applied mainly to voice interaction, information broadcasting, and long-form reading scenarios.

  • Acoustics and Signal Processing

We research the design of acoustic components, structures, and hardware solutions; sound source localization, speech enhancement, and speech separation based on physical modeling and machine learning; and multimodal and distributed signal processing.
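
For illustration only, here is a minimal sketch of one classic array-processing building block, a frequency-domain delay-and-sum beamformer for a uniform linear array. The geometry, sampling rate, and steering angle are assumptions for the example, not the Lab's algorithms.

    import numpy as np

    def delay_and_sum(stft_frame, mic_x, angle_deg, fs=16000, nfft=512, c=343.0):
        """Steer a uniform linear array toward angle_deg and average channels.
        stft_frame: (n_mics, n_bins), one STFT frame per microphone."""
        freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)            # (n_bins,)
        delays = mic_x * np.cos(np.deg2rad(angle_deg)) / c   # per-mic delay, seconds
        # Phase-align each channel to the look direction, then average.
        steering = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        return (stft_frame * steering).mean(axis=0)

    # Toy usage: 4 mics spaced 5 cm apart, one random STFT frame per mic.
    mic_x = np.arange(4) * 0.05
    frame = np.random.randn(4, 257) + 1j * np.random.randn(4, 257)
    enhanced = delay_and_sum(frame, mic_x, angle_deg=60)
    print(enhanced.shape)  # (257,)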

  • Speaker Verification and Audio Event Detection

We research text-dependent/text-independent speaker verification, dynamic passphrases, near-field/far-field speaker verification, gender and age profiling, large-scale voiceprint retrieval, language and dialect identification, audio fingerprint retrieval, and audio event analysis.
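
Speaker verification systems of this kind typically score a test utterance against a speaker's enrollment by comparing fixed-length voiceprint embeddings. The sketch below assumes embeddings already extracted by some neural front end and uses plain cosine scoring; the dimensions and noise levels are illustrative.

    import numpy as np

    def cosine_score(enroll_embs, test_emb):
        """Average a speaker's enrollment embeddings into one voiceprint
        model, then return its cosine similarity with the test embedding."""
        model = np.mean(enroll_embs, axis=0)
        model /= np.linalg.norm(model)
        test = test_emb / np.linalg.norm(test_emb)
        return float(model @ test)

    rng = np.random.default_rng(0)
    speaker = rng.normal(size=256)                       # hypothetical "true" voiceprint
    enroll = speaker + 0.1 * rng.normal(size=(3, 256))   # three enrollment utterances
    same = speaker + 0.1 * rng.normal(size=256)
    other = rng.normal(size=256)
    print(cosine_score(enroll, same) > cosine_score(enroll, other))  # True for this seed

Accepting or rejecting a trial then reduces to thresholding this score, with the threshold tuned to the desired false-accept/false-reject trade-off.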

  • Spoken Language Understanding and Dialog Systems

Building on natural language understanding, we construct spoken language understanding and dialog systems for voice interaction scenarios, offering developers self-correction and dialog customization capabilities.
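
As a toy sketch of the intent-plus-slots output such a system produces, the rule-based matcher below stands in for the neural SLU models actually used; the intents, patterns, and slot names are hypothetical.

    import re

    # Hypothetical intent patterns for a ticketing voice assistant.
    INTENTS = {
        "query_route": re.compile(r"how do i get to (?P<destination>.+)"),
        "buy_ticket":  re.compile(r"(?P<count>\d+) tickets? to (?P<destination>.+)"),
    }

    def understand(utterance):
        """Map one ASR transcript to (intent, slots); (None, {}) if no match."""
        text = utterance.lower().strip()
        for intent, pattern in INTENTS.items():
            m = pattern.search(text)
            if m:
                return intent, m.groupdict()
        return None, {}

    print(understand("How do I get to West Lake"))  # ('query_route', {...})
    print(understand("2 tickets to West Lake"))     # ('buy_ticket', {'count': '2', ...})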

  • Device-Cloud Integrated Voice Interaction Platform

Combining atomic capabilities such as acoustics, signal processing, wake-up, recognition, understanding, dialog, and synthesis, we build a full-pipeline, cross-platform, low-cost, highly replicable, device-cloud integrated distributed voice interaction platform that equips third parties with extensible, customizable scenario capabilities.
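
A minimal sketch of how such atomic capabilities might be chained into one full pipeline; every stage name and signature here is illustrative, not the platform's API.

    from typing import Callable, List

    # Toy stages standing in for the atomic capabilities.
    def wakeup(audio):        return audio if audio.get("has_keyword") else None
    def recognize(audio):     return {"text": audio["speech"]}
    def understand(asr_out):  return {"intent": "play_music", "text": asr_out["text"]}
    def decide(nlu_out):      return {"reply": f"OK: {nlu_out['intent']}"}
    def synthesize(action):   return f"<tts:{action['reply']}>"

    def run_pipeline(audio, stages: List[Callable]):
        """Feed each stage the previous stage's output; stop if one declines."""
        out = audio
        for stage in stages:
            out = stage(out)
            if out is None:   # e.g. wake word not detected
                return None
        return out

    audio = {"has_keyword": True, "speech": "play some jazz"}
    print(run_pipeline(audio, [wakeup, recognize, understand, decide, synthesize]))

In a device-cloud split, the early stages would typically run on the device and the heavier ones in the cloud, with the same stage interfaces on both sides.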

  • Multimodal Human-Machine Interaction

An industry first: wake-word-free far-field voice interaction in strongly noisy public places. Combined with streaming multi-turn, multi-intent spoken language understanding and adaptive business knowledge graphs, it delivers a natural voice interaction experience for real, complex public-space scenarios.


Products and Applications
  • Multimodal Human-Machine Interaction

    Dedicated to building intelligent service machines for real public-space scenarios through the most natural human-machine voice communication. It features industry-first capabilities such as wake-word-free voice interaction in strong-noise environments, speech recognition, and streaming multi-turn, multi-intent spoken language understanding, and has been deployed in the transportation and new retail industries.

    1) Metro voice ticketing machine: the world's first voice-operated metro ticket machine. Users can query stations by voice, run fuzzy location queries, and complete route planning; ticket purchase time dropped from 30 seconds to 10 seconds.

    2) Fast-food voice ordering kiosk: users place customized orders quickly through conversational voice interaction.

  • Intelligent Voice Customer Service

    Applied in intelligent voice navigation (telephone customer service bots, courier inquiries, etc.), intelligent outbound calling (collections, follow-up visits, pre-shipment confirmation, etc.), gold-standard call scripts, intelligent quality inspection, in-app service shortcuts, and more. It has been deployed in the Alipay 95188 hotline, Cainiao phone bots, the Ping An training assistant, and China Mobile's intelligent customer service.

  • Device-Cloud Integrated Voice Interaction

    Provides the full pipeline of voice interaction capabilities and connects devices of all kinds across platforms, offering scenario-specific, customizable interaction systems as well as proactive interaction.

    1) In-vehicle intelligent voice assistant: in cooperation with automobile brands such as SAIC Roewe and Ford.

    2) Far-field voice TV: the fifth-generation Alibaba-Haier AI television, which users operate through far-field voice interaction.

  • Portable Intelligent Voice All-in-One Device

    Built by DAMO Academy around existing pain points and real user needs, the portable all-in-one device combines intelligent speech recognition, intelligent microphone-array capture hardware, and advanced audio processing algorithms. It replaces traditional meeting-transcription setups, addressing slow and incomplete note-taking and the high cost of stenography. Transcripts are drafted in real time and ready as soon as the meeting ends, participants need no special setup, and no wiring is required, making recording easier and more efficient.

  • Judicial and Government Affairs Voice Assistant

    Combines speech recognition, crosstalk suppression, natural language understanding, and big data analytics for courtroom speech transcription, case analysis, and related scenarios. It has been adopted by customers including the Zhejiang and Fujian Higher People's Courts, covering more than 10,000 courtrooms across 28 provinces and municipalities.


Research Team
Zhijie Yan, Head of the Speech Lab, DAMO Academy

Ph.D. from the University of Science and Technology of China; IEEE Senior Member. He has long served as an expert reviewer for top conferences and journals in the speech field. His research covers speech recognition, speech synthesis, speaker verification, and voice interaction. Formerly Lead Researcher of the speech team at Microsoft Research Asia, he has seen his research productized in numerous speech-related products at Alibaba Group, Ant Financial, and Microsoft. He was named one of the China Association for Science and Technology's 100 outstanding grassroots science and technology workers.

Qiang Fu, Researcher, Speech Lab, DAMO Academy

Ph.D. from Xidian University, with postdoctoral research at OGI in the United States. He has published nearly 100 papers in journals and conferences such as IEEE Transactions. His awards include the Chinese Academy of Sciences Outstanding Science and Technology Achievement Award (2014) and Advanced Individual of the China Speech Industry Alliance (2016). 先声互联, the company he founded, received special funding from Beijing's program for technology-based SMEs in 2017.

Bin Ma, Researcher, Speech Lab, DAMO Academy

Ph.D. from the University of Hong Kong. Before joining Alibaba, he was head of the language technology department and Senior Scientist at the Institute for Infocomm Research (I2R) in Singapore. He served on the editorial boards of IEEE/ACM Transactions on Audio, Speech, and Language Processing and Elsevier's Speech Communication, was Technical Program Co-Chair of INTERSPEECH 2014, and received the Singapore President's Technology Award.

Jinwei Feng, Researcher, Speech Lab, DAMO Academy

Ph.D. from Virginia Tech. He studied under the audio acoustics authority Jiazheng Sha and, together with his advisor, built the world's first automatic test system for loudspeaker cone resonance frequency. He also led the development of a microphone-array-based video tracking system.

Wei Li, Senior Algorithm Expert, Speech Lab, DAMO Academy

Ph.D. in Computer Science from the University of Hong Kong. Formerly a senior engineer in Baidu's speech technology department, he was responsible for research and development of Baidu's acoustic models for speech recognition as well as its core speech synthesis algorithms and training pipelines. He now leads research and productization of large-scale acoustic and language models.

Jie Gao, Senior Algorithm Expert, Speech Lab, DAMO Academy

Ph.D. from the Chinese Academy of Sciences. Formerly a speech scientist at Microsoft STC, he was responsible for research and development of ultra-large-scale speech recognition model training systems on distributed computing platforms. He now leads work on the core large-scale decoder engine for speech recognition.

Ming Lei, Senior Algorithm Expert, Speech Lab, DAMO Academy

Ph.D. from the University of Science and Technology of China. Formerly a speech scientist at Microsoft STC responsible for developing core speech synthesis algorithms, he now leads algorithm development and productization for speech recognition and speech synthesis.

Yun Lei, Senior Algorithm Expert, Speech Lab, DAMO Academy

Ph.D. from the University of Texas at Dallas, with 50 conference and journal papers. His research covers speaker verification, language identification, audio detection, speech recognition, machine translation, natural language understanding, and recommender systems. He was formerly a research scientist at Facebook and SRI.

Wen Wang, Senior Technical Expert, Speech Lab, DAMO Academy

Ph.D. in Computer Engineering from Purdue University, with more than 100 papers in IEEE/ACL conferences and journals. Her research covers natural language understanding, natural language processing, machine translation, deep learning, language modeling, and speech recognition. She was formerly a senior research scientist at SRI.


Academic Achievements
Papers
  • Zhifu Gao, Shiliang Zhang, Ming Lei, Ian McLoughlin. SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition. INTERSPEECH 2020.
  • Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie. Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition. INTERSPEECH 2020.
  • Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma. Cross Attention with Monotonic Alignment for Speech Transformer. INTERSPEECH 2020.
  • Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma. Speech Transformer with Speaker Aware Persistent Memory. INTERSPEECH 2020.
  • Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma. Universal Speech Transformer. INTERSPEECH 2020.
  • Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma. Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion. INTERSPEECH 2020.
  • Siqi Zheng, Yun Lei, Hongbin Suo. Phonetically-Aware Coupled Network for Short Duration Text-Independent Speaker Verification. INTERSPEECH 2020.
  • Kai Fan, Bo Li, Jiayi Wang, Boxing Chen, Niyu Ge, Shiliang Zhang. Neural Zero-Inflated Quality Estimation Model for Automatic Speech Recognition System. INTERSPEECH 2020.
  • Weilong Huang, Jinwei Feng. Differential Beamforming for Uniform Circular Array with Directional Microphones. INTERSPEECH 2020.
  • Ziteng Wang, Yueyue Na, Zhang Liu, Yun Li, Biao Tian, Qiang Fu. A Semi-Blind Source Separation Approach for Speech Dereverberation. INTERSPEECH 2020.
  • Van Tung Pham, Haihua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li. Independent Language Modeling Architecture for End-to-End ASR. ICASSP 2020.
  • Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang. PAN: Phoneme Aware Network for Monaural Speech Enhancement. ICASSP 2020.
  • Yun Li, Zhang Liu, Yueyue Na, Ziteng Wang, Biao Tian, Qiang Fu. A Visual-Pilot Deep Fusion for Target Speech Separation in Multi-Talker Noisy Environment. ICASSP 2020.
  • Qian Chen, Mengzhe Chen, Bo Li, Wen Wang. Controllable Time-Delay Transformer for Real-Time Punctuation Prediction and Disfluency Detection. ICASSP 2020.
  • Qian Chen, Zhu Zhuo, Wen Wang, Qiuyun Xu. Transfer Learning for Context-Aware Spoken Language Understanding. ASRU 2019.
  • Qian Chen, Wen Wang. Sequential Neural Networks for Noetic End-to-End Response Selection. Computer Speech & Language, 2019.
  • Shengkui Zhao, Chongjia Ni, Rong Tong, Bin Ma. Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition. INTERSPEECH 2019.
  • Shengkui Zhao, Trung Hieu Nguyen, Hao Wang, Bin Ma. Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks. INTERSPEECH 2019.
  • Yerbolat Khassanov, Haihua Xu, Van Tung Pham, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma. Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data. INTERSPEECH 2019.
  • Siqi Zheng, Gang Liu, Hongbin Suo, Yun Lei. Autoencoder-Based Semi-Supervised Curriculum Learning for Out-of-Domain Speaker Verification. INTERSPEECH 2019.
  • Siqi Zheng, Gang Liu, Hongbin Suo, Yun Lei. Towards a Fault-Tolerant Speaker Verification System: A Regularization Approach to Reduce the Condition Number. INTERSPEECH 2019.
  • Zhiying Huang, Shiliang Zhang, Ming Lei. Audio Tagging with Compact Feedforward Sequential Memory Network and Audio-to-Audio Ratio Based Data Augmentation. INTERSPEECH 2019.
  • Zhiying Huang, Heng Lu, Ming Lei, Zhijie Yan. Linear Networks Based Speaker Adaptation for Speech Synthesis. ICASSP 2018.
  • Shiliang Zhang, Yuan Liu, Ming Lei, Bin Ma, Lei Xie. Towards Language-Universal Mandarin-English Speech Recognition. INTERSPEECH 2019.
  • Shiliang Zhang, Ming Lei, Zhijie Yan. Investigation of Transformer Based Spelling Correction Model for CTC-Based End-to-End Mandarin Speech Recognition. INTERSPEECH 2019.
  • Qian Chen, Wen Wang. Sequential Matching Model for End-to-End Multi-Turn Response Selection. ICASSP 2019.
  • Wei Li, Sicheng Wang, Ming Lei, Sabato Siniscalchi, Chin-Hui Lee. Improving Audio-Visual Speech Recognition Performance with Cross-Modal Student-Teacher Training. ICASSP 2019.
  • Shiliang Zhang, Ming Lei, Bin Ma, Lei Xie. Robust Audio-Visual Speech Recognition Using Bimodal DFSMN with Multi-Condition Training and Dropout Regularization. ICASSP 2019.
  • Shiliang Zhang, Ming Lei, Yuan Liu, Wei Li. Investigation of Modeling Units for Mandarin Speech Recognition Using DFSMN-CTC-SMBR. ICASSP 2019.
  • Qian Chen, Wen Wang. Sequential Attention-Based Network for Noetic End-to-End Response Selection. AAAI DSTC7 Workshop 2019.
  • Mengzhe Chen, Shiliang Zhang, Ming Lei, Yong Liu, Haitao Yao, Jie Gao. Compact Feedforward Sequential Memory Networks for Small-Footprint Keyword Spotting. INTERSPEECH 2018.
  • Shiliang Zhang, Ming Lei. Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning. INTERSPEECH 2018.
  • Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei. Deep Feed-Forward Sequential Memory Networks for Speech Synthesis. ICASSP 2018.
  • Fei Tao, Gang Liu. Advanced LSTM: A Study About Better Time Dependency Modeling in Emotion Recognition. ICASSP 2018.
  • Fei Tao, Gang Liu, Qingen Zhao. An Ensemble Framework of Voice-Based Emotion Recognition System for Films and TV Programs. ICASSP 2018.
  • Shiliang Zhang, Ming Lei, Zhijie Yan, Lirong Dai. Deep-FSMN for Large Vocabulary Continuous Speech Recognition. ICASSP 2018.
  • Fei Tao, Gang Liu, Qingen Zhao. An Ensemble Framework of Voice-Based Emotion Recognition System. ACII Asia 2018.
  • Shaofei Xue, Zhijie Yan. Improving Latency-Controlled BLSTM Acoustic Models for Online Speech Recognition. ICASSP 2017.
  • Gang Liu, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao. The Opensesame NIST 2016 Speaker Recognition Evaluation System. INTERSPEECH 2017.
  • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Anthony Larcher, Chunlei Zhang, Andreas Nautsch, Themos Stafylakis, Gang Liu, Mickael Rouvier, Wei Rao, Federico Alegre, Jianbo Ma, Manwai Mak, Achintya Kumar Sarkar, Héctor Delgado, Rahim Saeidi, Hagai Aronowitz, Aleksandr Sizov, Hanwu Sun, Guangsen Wang, Trung Hieu Nguyen, Bin Ma, Ville Vestman, Md Sahidullah, Miikka Halonen, Anssi Kanervisto, Gael Le Lan, Fahimeh Bahmaninezhad, Sergey Isadskiy, Christian Rathgeb, Christoph Busch, Georgios Tzimiropoulos, Qi Qian, Zhibin Wang, Qingen Zhao, Tianzhou Wang, Hao Li, Jian Xue, Shenghuo Zhu, Rong Jin, Tuo Zhao, Pierre-Michel Bousquet, Moez Ajili, Waad Ben Kheder, Driss Matrouf, Zhi Hao Lim, Chenglin Xu, Haihua Xu, Xiong Xiao, Eng Siong Chng, Benoit Fauve, Vidhyasaharan Sethu, Kaavya Sriskandaraja, W. W. Lin, Zheng-Hua Tan, Dennis Alexander Lehmann Thomsen, Massimiliano Todisco, Nicholas Evans, Haizhou Li, John H.L. Hansen, Jean-Francois Bonastre, Eliathamby Ambikairajah. The I4U Mega Fusion and Collaboration for NIST Speaker Recognition Evaluation 2016. INTERSPEECH 2017.
  • Heng Lu, Ming Lei, Zeyu Meng, Yuping Wang, Miaomiao Wang. The Alibaba-iDST Entry to Blizzard Challenge 2017. Blizzard Challenge 2017 Workshop, 2017.

Contact Us
E-mail: nls_support@service.aliyun.com

Scan the QR code to follow the Alibaba Tech WeChat official account.