Alibaba Innovative Research (AIR)
[CCF-AIR Youth Fund] Service mesh performance tuning and optimization based on software-hardware integration

Research Themes

Low Carbon & Efficient Serverless Infrastructure

Background

Service mesh technology is developing rapidly, but in practice it runs into several problems that are common across products: performance impact and resource consumption, decoupled extension support for custom protocols, network latency, and others. Solving these problems would help users increase the production scale, improve the stability, and reduce the latency of their service meshes. That in turn strengthens the governance of business applications running in the mesh and gives customers high-performance, high-throughput, and flexible service governance and traffic control.

 

eBPF is a revolutionary technology that originated in the Linux kernel and can run sandboxed programs inside the operating system kernel. It is used to extend kernel functionality safely and efficiently without changing kernel source code or loading kernel modules. A major advantage is that eBPF code can be inserted into a running Linux kernel, much like a kernel module, but unlike a kernel module this is done in a safe and portable way. eBPF is therefore a key technology for letting the Linux kernel keep pace with the rapidly evolving cloud-native technology stack.

 

eBPF will be able to offload a growing number of functions currently performed by service mesh proxies, further reducing overhead and complexity. One example is sockmap, an eBPF-based mechanism that allows data forwarding between TCP connections to be offloaded to the kernel. This reduces context switches and data copies between user space and kernel space, and substantially improves the performance of socket-to-socket forwarding between TCP connections. With this technique, local communication can bypass the TCP/IP protocol stack and deliver messages directly to the peer socket.
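The lookup-and-redirect idea behind sockmap can be illustrated with a small user-space model. This is only a sketch and not eBPF code: in a real deployment the map would typically be a `BPF_MAP_TYPE_SOCKHASH` populated from a sock_ops program, with an sk_msg program calling `bpf_msg_redirect_hash`. The class `SockMapModel`, the flow-key layout, and the addresses and ports below are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Dict, Tuple

# Flow key: (local_ip, local_port, remote_ip, remote_port). Hypothetical layout.
FlowKey = Tuple[str, int, str, int]


class SockMapModel:
    """User-space stand-in for a kernel socket map: flow key -> receive queue."""

    def __init__(self) -> None:
        self._rx: Dict[FlowKey, Deque[bytes]] = {}

    def register(self, key: FlowKey) -> Deque[bytes]:
        """Record an established local socket and return its receive queue."""
        return self._rx.setdefault(key, deque())

    def redirect(self, sender: FlowKey, payload: bytes) -> bool:
        """Deliver payload straight to the peer's queue if the peer is local."""
        # The peer sees the same connection with the two endpoints swapped.
        peer = (sender[2], sender[3], sender[0], sender[1])
        queue = self._rx.get(peer)
        if queue is None:
            return False  # peer is not on this host: use the normal TCP/IP path
        queue.append(payload)
        return True


# Illustrative flow between an application and its sidecar on the same host.
app = ("127.0.0.1", 40000, "127.0.0.1", 15001)
sidecar = ("127.0.0.1", 15001, "127.0.0.1", 40000)

smap = SockMapModel()
smap.register(app)
sidecar_rx = smap.register(sidecar)

delivered = smap.redirect(app, b"GET / HTTP/1.1\r\n\r\n")
remote = smap.redirect(("10.0.0.5", 40001, "10.0.0.9", 80), b"ping")
```

In this model `delivered` is true because both ends of the flow are registered locally, while `remote` is false and the traffic would fall back to the regular network path.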

 

For observability, bpftrace compiles scripts into BPF bytecode and interacts with eBPF through BCC. It can trace the fine-grained life cycle of TCP connections, thread blocking, OOM events, and more.

Target

Technically, the service mesh data plane will be integrated more deeply into the entire cloud-native infrastructure layer. It will intersect with basic networking, WASM VMs, software and hardware acceleration, and other technologies, and combine with scheduling control, observability, zero-trust security, and other capabilities, making service mesh technology an essential part of the cloud-native base system.

 

The data plane proxy logic can be moved into the kernel, so that part of the data parsing and traffic management happens in the kernel while network requests are being intercepted and processed. This reduces the number of kernel entries and exits, and allows the computing resources used by the data plane proxy to be guaranteed at high priority at the kernel level.

 

In addition, the QPS, protocols, logs, and other data of services in the mesh can be used to analyze each service's application type and quality-of-service requirements. Mesh rules applied along the entire call chain can then be coordinated and recommended, balancing the quality and performance requirements of individual services against the overall performance requirements of the mesh.
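As a rough illustration of this kind of analysis, the sketch below maps observed traffic statistics to candidate mesh rules. Everything in it is hypothetical: the `ServiceStats` fields, the thresholds, and the rule names are placeholders chosen for the example and do not come from any particular mesh product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceStats:
    name: str
    qps: float         # observed requests per second
    protocol: str      # for example "http", "grpc", or "tcp"
    error_rate: float  # fraction of failed requests, derived from logs


def recommend_rules(stats: ServiceStats) -> List[str]:
    """Map observed traffic to candidate mesh rules (illustrative thresholds)."""
    rules: List[str] = []
    if stats.protocol == "tcp":
        # Opaque TCP traffic gains little from layer-7 processing in the proxy.
        rules.append("l4-passthrough")
    if stats.qps >= 1000:
        rules.append("connection-pool-limit")
    if stats.error_rate >= 0.05:
        rules.append("retry-with-circuit-breaker")
    return rules or ["default"]


busy = recommend_rules(ServiceStats("checkout", 2500.0, "http", 0.08))
batch = recommend_rules(ServiceStats("batch-sync", 3.0, "tcp", 0.0))
idle = recommend_rules(ServiceStats("idle", 1.0, "http", 0.0))
```

A real system would also have to weigh these per-service recommendations against mesh-wide limits, which this sketch does not attempt.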

Related Research Topics

  • Using eBPF instead of iptables for traffic interception, together with sockmap to accelerate network communication between the sidecar proxy and the application, which reduces request latency and resource overhead to some extent.
  • Using eBPF to enhance observability in a service mesh: a method for augmenting Envoy with kernel data collected through eBPF, so that application-level and infrastructure-level problems can be distinguished quickly.
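The second topic can be pictured as combining a proxy-side measurement with kernel-side measurements for the same request. The following is a minimal sketch under assumed inputs: the `RequestSample` fields, the `classify` function, and the 50% threshold are hypothetical and only show the shape of such a decision.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    proxy_latency_ms: float  # end-to-end latency as reported by the proxy
    kernel_net_ms: float     # time attributed to the network by kernel probes
    tcp_retransmits: int     # retransmissions seen on the connection


def classify(sample: RequestSample, net_share: float = 0.5) -> str:
    """Attribute a slow request to the infrastructure or to the application."""
    if sample.tcp_retransmits > 0:
        return "infrastructure"
    if sample.proxy_latency_ms > 0:
        # Share of the total latency that the kernel attributes to the network.
        share = sample.kernel_net_ms / sample.proxy_latency_ms
        if share >= net_share:
            return "infrastructure"
    return "application"


slow_app = classify(RequestSample(200.0, 5.0, 0))
slow_net = classify(RequestSample(200.0, 150.0, 0))
lossy = classify(RequestSample(50.0, 1.0, 3))
```

Here a request that spends most of its time outside the network path is attributed to the application, while retransmissions or a dominant network share point to the infrastructure.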
