前言

一些概念

慢查询: 最常见在 mysql 数据库查询预警中,通过配置可以记录耗时超过一定阈值的 查询 SQL;

慢服务:分布式服务中,网关层耗时超过一定阈值的,分布式服务拓扑链路;

特点:

  1. 慢服务可能不会影响平均 RT,甚至 95-p/99-p 都不一定能发现;
  2. 慢服务一定会影响部分用户的体验;
  3. 慢服务会导致端到端可用性降低,可能会触发服务器报警(大部分报警阈值是 99.95);

慢服务定位的困难点

  1. 分布式服务往往上下游较多,如果是偶发的,很难确定是哪个服务的问题;
  2. 存在所有服务都未超时,但是网关总体耗时超过客户端阈值,客户端主动断开的情况;(如:Nginx499 错误)
  3. 即使抓取到原始请求,事后往往也难以复现,与时间点有关联性;

定位慢服务的一些解决方案

全链路埋点

实现: 针对 RPC 框架、ORM 框架、HttpClient、Redis 客户端等各种组件做统一封装,内置埋点逻辑,并提供统一的存储、查询服务,

优点:

  1. 对业务方接入透明,可复用性高,业务方可以做一定程度的自定义埋点;
  2. 接入后,业务方可以针对各种查询条件(如:特定的用户、特定的接口、特定的耗时),读取调用拓扑图和关键路径分析;
  3. 方便迭代和优化;

缺点:

  1. 有一定的技术复杂度,需要推动公司层面统一技术框架;
  2. 存储成本非常大,全量存储会产生 T 级别的存储数据;
  3. 由于存储量的问题,查询速度往往比较慢,甚至由于采样率问题导致无法抓取关键调用;

基于规则的实时采样

实现: 基于一定的规则(按业务定义的慢服务阈值)和采样率,将实时的慢服务上下游链路打印出来;

优点:

  1. 实现比较轻量,容易维护,自定义化程度高;
  2. 资源点扩展性强,不必局限于 RPC 调用、DB 查询等通用服务;

缺点:

  1. 有代码侵入性,需要硬编码;
  2. 通用性差,较难推广;

国内商店的实现

采用的方案为 "基于规则的实时采样" , 针对商店的资源调用进行切面拦截,采样后打印日志到磁盘文件.

整体设计

流程设计

拦截器的实现


public class TraceInfoInterceptor extends ControllerInterceptorAdapter {
    private static Logger LOGGER = LoggerFactory.getLogger(TraceInfoInterceptor.class);

    public TraceInfoInterceptor() {
        setPriority(700);
    }

    /**
     * 覆盖这个方法,表示只有标注@TraceInfo 的控制器或方法才会被此拦截器拦截
     *
     * @return
     */
    @Override
    public Class<? extends Annotation> getRequiredAnnotationClass() {
        return TraceInfo.class;
    }


    @Override
    protected Object round(Invocation inv, InvocationChain chain) throws Exception {
        long startTime = System.currentTimeMillis();
        Object result = null;
        String rosePath = RoseUtils.getRosePath(inv);
        SlowLogConfig slowLogConfig = getSlowLogConfig(rosePath);
        try {
            SlowLogUtils.enableSlowLog(slowLogConfig);
            result = super.round(inv, chain);
            SlowLogUtils.log(rosePath, startTime, (int) (System.currentTimeMillis() - startTime), true);
        } catch (Exception e) {
            SlowLogUtils.log(rosePath, startTime, (int) (System.currentTimeMillis() - startTime), false);
            throw e;
        } finally {
            SlowLogUtils.releaseSlowLogContext();
        }
        return result;
    }

    private SlowLogConfig getSlowLogConfig(String token) {
        Map<String, SlowLogConfig> slowLogConfigFromCache = ConfigManager.getSlowLogConfigFromCache();
        if (slowLogConfigFromCache.containsKey(token)) {
            return slowLogConfigFromCache.get(token);
        }
        return slowLogConfigFromCache.get("default");
    }
}

线程池的包装

/**
 * servlet 容器在应用启动时初始化配置
 */
public class GrantServiceContextListener implements ServletContextListener {
    private static final Logger LOGGER = LoggerFactory.getLogger(GrantServiceContextListener.class);

    @Override
    public void contextInitialized(ServletContextEvent sce) {
        //注入各种线程上下文处理器
        Task.registerThreadContextHandler(() -> IdentifierInfoHolder.getIndentifierValue(), x -> IdentifierInfoHolder.setIndentifierValue(x), () -> IdentifierInfoHolder.release());
        // slowLog上下文透传
        Task.registerThreadContextHandler(() -> SlowLogUtils.getSlowLogContext(), x -> SlowLogUtils.setSlowLogContext(x), () -> SlowLogUtils.releaseSlowLogContext());
        Task.registerThreadContextHandler(() -> InvocationHolder.getInvocation(), x -> InvocationHolder.setInvocation(x), () -> InvocationHolder.release());
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {

    }
}
//提交线程池任务方式一
 tasks.forEach(task -> threadPoolExecutor.submit(Task.wrap(task)));
 //提交线程池任务方式二
 CompletableFuture.runAsync(Task.wrap(() -> {
                    AdResultInfo adResultInfo = handler.requestAds(adsProcessContext);
                    adsProcessContext.setAdResultInfo(adResultInfo);
                    handler.handleAdsInfo(adsProcessContext);
                }), EXECUTOR_SERVICE);

资源调用切面拦截

 public static <T> T getProxyThriftClient(Class iface, Object target1, Object target2) {
        Object newProxyInstance = Proxy.newProxyInstance(getAppstoreClassLoader(), new Class[]{iface}, new AppStoreRpcInvocationHandler(target1, target2));
        return (T) newProxyInstance;
    }

    private static ClassLoader getAppstoreClassLoader() {
        ClassLoader contextClassLoader = Thread.currentThread().getContextClassLoader();
        if (contextClassLoader != null) {
            return contextClassLoader;
        }
        return ThriftClientManager.class.getClassLoader();
    }

    private static class AppStoreRpcInvocationHandler implements InvocationHandler {
        // 旧服务
        private Object target1;
        // 新服务
        private Object target2;

        public AppStoreRpcInvocationHandler(Object target1, Object target2) {
            this.target1 = target1;
            this.target2 = target2;
        }

        public AppStoreRpcInvocationHandler(Object target1) {
            this.target1 = target1;
            this.target2 = null;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            long startTime = System.currentTimeMillis();
            Throwable throwable = null;
            Object result = null;
            //最终需要invoke的目标方法和类
            Object invokeTarget = target1;
            Method invokeMethod = method;
            try {
                //尝试替换 invokeTarget
                if (switchService() && target2 != null) {
                    invokeMethod = target2.getClass().getMethod(method.getName(), method.getParameterTypes());
                    invokeTarget = target2;
                }
            } catch (Exception e) {
                //替换失败会再次尝试使用target1
                LOGGER.error("", e);
            }
            try {
                result = invokeMethod.invoke(invokeTarget, args);
            } catch (Exception e) {
                throwable = e;
            }

            long endTime = System.currentTimeMillis();
            SlowLogUtils.RpcInvokeInfo rpcInvokeInfo = new SlowLogUtils.RpcInvokeInfo();
            String rpcToken = generateRpcToken(invokeMethod);
            rpcInvokeInfo.setMethod(rpcToken);
            rpcInvokeInfo.setStartTime(startTime);
            rpcInvokeInfo.setEndTime(endTime);
            rpcInvokeInfo.setCostTime(endTime - startTime);
            rpcInvokeInfo.setSuccess(throwable == null);
            //统计slowLog 参见wiki:https://wiki.n.miui.com/pages/viewpage.action?pageId=403240360
            SlowLogUtils.addRpcInvokeInfo(rpcInvokeInfo);
            if (throwable != null) {
                throw throwable;
            }
            return result;
        }

        private String generateRpcToken(Method invokedMethod) {
            String className = invokedMethod.getDeclaringClass().getName();
            String methodName = invokedMethod.getName();
            if (className.startsWith("com.sun.proxy")) {
                try {
                    Class<?> anInterface = invokedMethod.getDeclaringClass().getInterfaces()[0];
                    Method realMethod = anInterface.getMethod(invokedMethod.getName(), invokedMethod.getParameterTypes());
                    if (realMethod != null) {
                        className = realMethod.getDeclaringClass().getName();
                    }
                } catch (Exception e) {
                    LOGGER.error("", e);
                }
            }
            return className + "." + methodName;
        }


        /***
         * 服务切换控制
         * @return
         */
        private boolean switchService() {
            return true;
        }
    }

线上效果

api 慢服务配置:

[
  {
    "costTimeThreshold": 300,
    "rpcCostTimeThreshold": 10,
    "logRate": 1,
    "token": "/apm/featuredV2"
  }
]
2020-11-30 19:42:43 slow method path: /apm/featuredV2 , costTime: 569ms , success: true  slowLogContext:
{
    "rpcInvokeInfos":[
        {
            "costTime":32,
            "endTime":39,
            "method":"com.xiaomi.misearch.appstore.recommend.thrift.RecommendService$Iface.categoryRec",
            "startTime":7,
            "success":true
        },
        {
            "costTime":207,
            "endTime":276,
            "method":"com.xiaomi.misearch.appstore.recommend.thrift.RecommendService$Iface.categoryRec",
            "startTime":69,
            "success":true
        },
        {
            "costTime":80,
            "endTime":365,
            "method":"com.xiaomi.appstore.thrift.service.AppStoreContentService$Iface.getAppsetListByAppsetIds",
            "startTime":285,
            "success":true
        },
        {
            "costTime":141,
            "endTime":507,
            "method":"com.xiaomi.miui.ad.thrift.service.MiuiAdQueryService$Iface.getAdsV3",
            "startTime":366,
            "success":true
        }
    ],
    "slowLogConfig":{
        "costTimeThreshold":300,
        "logRate":1,
        "rpcCostTimeThreshold":10,
        "token":"/apm/featuredV2"
    }
# 统计慢方法排行
cat slow-8085.log  | grep "method\"" | sort -b | uniq -c | sort -nk1 -r
  14646             "method":"com.xiaomi.misearch.appstore.recommend.thrift.RecommendService$Iface.categoryRec",
  12108             "method":"com.xiaomi.miui.ad.thrift.service.MiuiAdQueryService$Iface.getAdsV3",
   7328             "method":"com.xiaomi.appstore.thrift.service.AppStoreContentService$Iface.getAppsetListByAppsetIds",
   2714             "method":"com.xiaomi.appstore.thrift.service.AppStoreContentService$Iface.listAppByIds",
   2401             "method":"com.xiaomi.appstore.thrift.service.AppStoreContentService$Iface.getFeedPageResult",
   1627             "method":"game-personal",
    159             "method":"com.xiaomi.appstore.thrift.service.AppStoreContentService$Iface.getCategoryById",
     35             "method":"com.xiaomi.market.usercomment.thrift.service.UserCommentService$Iface.getCommentScore",
     12             "method":"com.xiaomi.appstore.thrift.service.AppStoreContentService$Iface.getCmsListContent",

总结

发现问题:排查线上问题时候遇到的各种偶发问题、RT 毛刺问题,抽象总结问题;

分析问题:调研分析业界的解决方案,因地适宜,最终采用了"基于规则的实时采样"轻量级方案;

解决问题:"大胆开发,小心上线",验证效果,实践与理论结合;

results matching ""

    No results matching ""