Repository: huangfox/dpkb
Branch: main
Commit: e62b935e9354
Files: 19
Total size: 35.5 KB
Directory structure:
gitextract_e8gsxs7x/
├── README.md
└── columns/
    ├── doris/
    │   ├── Doris全面解析.md
    │   └── Doris最佳实践.md
    ├── flink/
    │   ├── Apache Flink 漫谈系列.md
    │   ├── Flink 相关论文.md
    │   ├── Flink实战系列.md
    │   ├── Flink开源项目汇总.md
    │   ├── Flink架构、源码分析专栏.md
    │   ├── Flink进阶教程.md
    │   └── Flink零基础入门.md
    ├── hive/
    │   └── hive教程.md
    ├── kudu/
    │   ├── Kudu原理论文.md
    │   └── 网易云Kudu技术文章.md
    ├── opensource/
    │   └── 数仓相关开源项目汇总.md
    ├── presto/
    │   ├── Presto最佳实践、调优、踩坑专栏.md
    │   ├── Presto架构、源码分析专栏.md
    │   └── Presto资料汇总、会议资讯专栏.md
    ├── spark/
    │   └── Apache Spark的设计与实现.md
    └── starrocks/
        └── StarRocks技术内幕.md
================================================
FILE CONTENTS
================================================
================================================
FILE: README.md
================================================
# DPKB
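A knowledge base for big data, covering:
* Data storage layer and databases (HDFS, Hive, HBase, Kudu, Doris, StarRocks, ClickHouse, TiDB, etc.)
* Data processing layer and OLAP engines (Spark, Flink, Presto, Trino, etc.)
* Data lakes (Iceberg, Hudi, Delta, etc.)
* Big data development and applications (mainly ETL, scheduling, data warehousing, and data applications, e.g. SeaTunnel and DolphinScheduler)
* Data governance (metadata management, data models, data standards, data quality, data security, etc.)
Continuously updated (2024-12)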
## 1. Data Storage Layer and Databases (HDFS, Hive, HBase, Kudu, Doris, StarRocks, ClickHouse, TiDB, etc.)
### ▶ HDFS
### ▶ Yarn
#### 1) Principles
- [Hadoop Yarn 一文搞懂 Yarn架构原理和工作机制](https://www.cnblogs.com/liangzilx/p/14837562.html)
### ▶ Hive
#### 1) Official Site, Community, Blogs
- [Hive official site](https://hive.apache.org/)
#### 2) Columns
- [Hive Tutorial](columns/hive/hive教程.md)
#### 3) Industry Practice
- [HiveCube 在有赞的实践](https://tech.youzan.com/cube/) 2019-11
- [Hive Metastore Federation 在滴滴的实践](https://blog.didiyun.com/index.php/2019/03/25/hive-metastore-federation/) 2019-03
### ▶ HBase
#### 1) Official Site, Community, Blogs
- [HBase official site](https://hbase.apache.org/)
- [hbasefly](http://hbasefly.com/)
#### 2) Columns
#### 3) Industry Practice
#### 4) Other
- [HBase Bulkload 实践探讨](https://tech.youzan.com/hbase-bulkloadshi-practice/) 2019-12
### ▶ Kudu
#### 1) Official Site, Community, Blogs
- [Kudu official site](https://kudu.apache.org/)
#### 2) Columns
- [Kudu Principles and Papers](columns/kudu/Kudu原理论文.md)
- [NetEase Cloud Kudu Technical Articles](columns/kudu/网易云Kudu技术文章.md)
#### 3) Industry Practice
- [Apache Kudu 在网易的实践](https://www.infoq.cn/article/kgwyqb5wer5wl8cquweq) 2021-08
- [Apache Kudu 在网易实时数仓的实践](https://www.infoq.cn/article/QETxjyIu5tAJTZ9ksMdu) 2020-02
- [Kudu架构介绍及其在小米的应用实践](https://www.modb.pro/db/119708) 2017-06
#### 4) Other
- [我是如何成为Apache Kudu committer & PMC 的?](https://cloud.tencent.com/developer/article/1450749) 2019-06
### ▶ Doris
#### 1) Official Site, Community, Blogs
- [Doris official site](https://doris.apache.org/)
- [Doris GitHub](https://github.com/apache/doris)
- [Doris forum](https://github.com/apache/incubator-doris/discussions)
#### 2) Columns
- [Doris in Depth](columns/doris/Doris全面解析.md)
- [Doris Best Practices](columns/doris/Doris最佳实践.md)
#### 3) Case Studies
- [Apache Doris在美团外卖数仓中的应用实践](https://tech.meituan.com/2020/04/09/doris-in-meituan-waimai.html) 2020-04
- [Apache Doris 在韵达物流领域的应用实践](https://mp.weixin.qq.com/s/Z_PhWk92ctZ7slz4SrVZ9Q) 2021-07
- [Apache Doris 在蜀海供应链的实践](https://mp.weixin.qq.com/s/SHuE-KCsIyh6jfo0DqLD6w) 2021-07
- [京东物流基于 Doris 的亿级数据自助探索应用](https://mp.weixin.qq.com/s/qVFa40yMg0_N9Lsb10ACQA) 2021-07
- [Doris on ES在快手商业化的最佳实践](https://mp.weixin.qq.com/s/5Pc5ewVFWPgauG4hNLH9xw) 2021-08
- [基于Doris的有道精品课数据中台建设实践](https://mp.weixin.qq.com/s/Gz-au9CHJ4lHrs5MkzeAJg) 2020-12
- [美团外卖实时数仓建设实践](https://mp.weixin.qq.com/s/-JPWqa_-at7F5hZ0zekVSQ) 2020-10
- [Doris在作业帮实时数仓中的应用&实践](https://mp.weixin.qq.com/s/hjbMM8CbElO04VLN5cfJtQ) 2020-09
- [基于Apache Doris的小米增长分析平台实践](https://mp.weixin.qq.com/s/WeNAItPJ4b7fsqW4kf0dSA) 2020-08
- [Apache Doris在京东双十一大促中的实践](https://mp.weixin.qq.com/s/8XnwJXm4kzq56SvElwL6kA) 2020-03
- [Apache Doris 在百度商业大规模微服务全链路监控的实践](https://mp.weixin.qq.com/s/k7CcCdHPTK1ZTDs_qKgh5w) 2020-02
### ▶ StarRocks
#### 1) Official Site, Community, Blogs
- [StarRocks](https://www.starrocks.com/zh-CN/index)
- [StarRocks documentation](https://docs.starrocks.com/zh-cn/main/introduction/StarRocks_intro)
- [编程小梦 康凯森](https://blog.bcmeng.com/)
#### 2) Columns
- [StarRocks Internals](columns/starrocks/StarRocks技术内幕.md)
### ▶ ClickHouse
#### 1) Official Site, Community, Blogs
- [ClickHouse official site](https://clickhouse.com/)
#### 2) Columns
#### 3) Industry Practice
- [ClickHouse 在有赞的实践之路](https://tech.youzan.com/clickhouse-zai-you-zan-de-shi-jian-zhi-lu/) 2021-01
#### 4) Other
## 2. Data Processing Layer and OLAP Engines (Spark, Flink, Presto, Trino, etc.)
### ▶ Spark
#### 1) Official Site, Community, Blogs
- [Spark official site](https://spark.apache.org/)
#### 2) Columns
- [The Design and Implementation of Apache Spark](columns/spark/Apache%20Spark的设计与实现.md)
#### 3) Industry Practice
- [SparkSQL 在有赞的实践](https://tech.youzan.com/sparksql-in-youzan/) 2019-01
- [SparkSQL 在有赞大数据的实践(二)](https://tech.youzan.com/sparksql-in-youzan-2/) 2020-01
### ▶ Flink
#### 1) Official Site, Community, Blogs
- [Flink official site](https://flink.apache.org/)
- [Flink Confluence](https://cwiki.apache.org/confluence/display/FLINK/)
- [Flink Blog](https://flink.apache.org/blog/)
- [Ververica Blog](https://www.ververica.com/blog?hsLang=en)
- [Ververica (Chinese)](https://ververica.cn/developers-resources/)
- [Flink knowledge map](https://ververica.cn/wp-content/uploads/2020/03/Apache-Flink-Stateful-Computations-over-Data-Streams.pdf)
- [Jark's Blog - WuChong - 云邪](http://wuchong.me/)
#### 2) Columns
- [Flink Architecture and Source Code Analysis](columns/flink/Flink架构、源码分析专栏.md)
- [Flink in Practice Series](columns/flink/Flink实战系列.md)
- [Flink Open Source Projects](columns/flink/Flink开源项目汇总.md)
##### Tutorials
- [Flink SQL Cookbook - Ververica](https://github.com/ververica/flink-sql-cookbook/)
- [Flink for Beginners](columns/flink/Flink零基础入门.md)
- [Flink Advanced Tutorial](columns/flink/Flink进阶教程.md)
- [Apache Flink Discussion Series](columns/flink/Apache%20Flink%20漫谈系列.md)
- [Flink-Related Papers](columns/flink/Flink%20相关论文.md)
#### 3) Industry Practice
- [flink-forward-asia-hackathon-2021](https://github.com/flink-china/flink-forward-asia-hackathon-2021/issues)
### ▶ Presto, Trino
#### 1) Official Site, Community, Blogs
- [PrestoDB official site](https://prestodb.io/)
- [Trino official site](https://trino.io/) (formerly PrestoSQL)
- [Google Presto Group](https://groups.google.com/g/presto-users)
- [Presto Zhihu column](https://www.zhihu.com/column/presto-cn)
- [若飞 - Tech Blog](http://armsword.com/archives/)
#### 2) Columns
- [Presto Architecture and Source Code Analysis](columns/presto/Presto架构、源码分析专栏.md)
- [Presto Best Practices, Tuning, and Pitfalls](columns/presto/Presto最佳实践、调优、踩坑专栏.md)
- [Presto Resources and Conference News](columns/presto/Presto资料汇总、会议资讯专栏.md)
#### 3) Industry Practice
- [Presto 在车好多的实践](https://mp.weixin.qq.com/s/Bmqv54sVZgTqQ82I_RfmsA) 2020-12
- [Presto 在滴滴的探索与实践](https://zhuanlan.zhihu.com/p/266162270) 2020-10
- [Presto 在有赞的实践之路](https://tech.youzan.com/presto-zai-you-zan-de-shi-jian-zhi-lu/) 2020-04
- [PrestoCon 2020:云原生数据湖分析DLA的Presto实践](https://zhuanlan.zhihu.com/p/260784762) 2020-03
- [携程 Presto 技术演进之路](https://zhuanlan.zhihu.com/p/41538472) 2018-08
- [Presto 实现原理和美团的使用实践](https://tech.meituan.com/2014/06/16/presto.html) 2014-06
- [Presto 高性能引擎在美图的实践](https://zhuanlan.zhihu.com/p/408957032) 2021-09
## 3. Data Lakes (Iceberg, Hudi, Delta, etc.)
- [一文看懂:什么是数据库、数据湖、数据仓库、湖仓一体、智能湖仓?](https://www.smartcity.team/consultingskills/experience/shujukuyushujuhu/#comments) 2021-08
### ▶ Iceberg
#### 1) Official Site, Community, Blogs
- [Iceberg official site](https://iceberg.apache.org/)
#### 2) Applications
- [数据湖 Iceberg | 实时数据仓库的发展、架构和趋势](https://mp.weixin.qq.com/s?__biz=MzIwNTUxNTI1Ng==&mid=2247485623&idx=1&sn=9f03a36dbfc06c712b6132faabaa1dfd&chksm=972ef820a05971360311fd69c686e4b420222cfa639a1bcb5648bece4c3d886ae8f981712d8c&scene=21#wechat_redirect) 2021-03
- [数据湖 Iceberg | Apache Iceberg 快速入门](https://mp.weixin.qq.com/s?__biz=MzIwNTUxNTI1Ng==&mid=2247485637&idx=1&sn=0489f233e3bda2bcef221c9532bb001e&chksm=972ef852a0597144538b7807948443a27e58f99ba33d17a7bcb12ccb8b382fd1d712d6e80cbc&cur_album_id=1746684202856579076&scene=190#rd) 2021-03
- [数据湖 Iceberg | 如何正确使用 Iceberg](https://mp.weixin.qq.com/s?__biz=MzIwNTUxNTI1Ng==&mid=2247485644&idx=1&sn=b2194d8f3c1e7cf7e8e8d9296b9025e2&chksm=972ef85ba059714dc69472e3860497389f2ca4503d2cddeedd348695b5c314da49aad0278978&cur_album_id=1746684202856579076&scene=190#rd) 2021-04
- [数据湖 Iceberg | 在网易云音乐的实践](https://mp.weixin.qq.com/s?__biz=MzIwNTUxNTI1Ng==&mid=2247485718&idx=1&sn=34347ac54e97877e4401ad37f1d15577&chksm=972ef981a059709724b7abab56786ef047a68f31fd829031d2214fa4994b9ec0f1b04e25318c&cur_album_id=1746684202856579076&scene=190#rd) 2021-04
### ▶ Hudi
#### 1) Official Site, Community, Blogs
- [Hudi official site](https://hudi.apache.org/)
#### 2) Applications
- [Flink CDC + Hudi + Hive + Presto 构建实时数据湖最佳实践](https://mp.weixin.qq.com/s/079VeDeIM_MQPyiiDX2l_w)
### ▶ Delta
## 4. Big Data Development and Applications (mainly ETL, scheduling, data warehousing, and data applications, e.g. SeaTunnel and DolphinScheduler)
### ▶ SeaTunnel
### ▶ DolphinScheduler
### ▶ Big Data Architecture
- [SQL on Hadoop 在快手大数据平台的实践与优化](https://www.infoq.cn/article/BN9cJjg1t-QSWE6fqkoR) 2019-06
- [携程机票大数据架构最佳实践](https://dbaplus.cn/news-73-1420-1.html) 2017-08
- [火山引擎DataLeap一站式数据治理解决方案及平台架构](https://www.cnblogs.com/bytedata/p/17745908.html) 2023-10
### ▶ Data Warehousing
- [有赞数据仓库实践之路](https://tech.youzan.com/dw-in-youzan/) 2020-03
- [OneData 建设探索之路:SaaS 收银运营数仓建设](https://tech.meituan.com/2019/10/17/meituan-saas-data-warehouse.html) 2019-10
- [面向AI技术的工程架构实践 | 贝壳一站式大数据开发平台实践](https://www.infoq.cn/article/mmnwzdlcyjg83qm0tgqm) 2020-11
### ▶ Reporting Platforms
- [有赞 BI 平台实现原理](https://tech.youzan.com/principle-on-bi-platform/) 2021-01
## 5. Data Governance (metadata management, data metrics, data standards, data quality, data security, etc.)
### ▶ Data Governance
- [美团配送数据治理实践](https://tech.meituan.com/2020/03/12/delivery-data-governance.html) 2020-03
- [全链路数据治理在网易严选的实践](https://www.infoq.cn/article/FOV6aEWRGNOfhD91YVcr) 2020-10
- [数据资产、数据治理 - 有赞](https://tech.youzan.com/shu-ju-zi-chan-zan-zhi-zhi-li/) 2019-11
- [美团酒旅起源数据治理平台的建设与实践](https://tech.meituan.com/2018/12/27/onedata-origin.html) 2018-12
- [滴滴数据仓库指标体系建设实践](https://mp.weixin.qq.com/s/-pLpLD_HMiasyyRxo5oTRQ) 2020-08
- [有赞指标库实践](https://tech.youzan.com/you-zan-zhi-biao-ku-shi-jian/) 2020-03
- [浅谈有赞大数据安全体系](https://tech.youzan.com/you-zan-da-shu-ju-an-quan-ti-xi-jian-she-shi-jian/) 2021-01
### ▶ Metadata Management
- [字节跳动构建Data Catalog数据目录系统的实践](https://www.cnblogs.com/bytedata/p/16189474.html) 2022-04
- [有赞数据仓库元数据系统实践](https://tech.youzan.com/youzan-metadata/) 2018-08
- [饿了么元数据管理实践之路](https://dbaplus.cn/news-73-2143-1.html) 2018-07
- [数据治理方案技术调研 Atlas VS Datahub VS Amundsen](https://cloud.tencent.com/developer/article/1746714) 2020-11
- [数据资产治理-元数据采集那点事 - 有赞](https://tech.youzan.com/zi-chan-zhi-li-yuan-shu-ju-cai-ji-na-dian-shi/) 2020-12
- [来看看字节跳动内部的数据血缘用例与设计](https://segmentfault.com/a/1190000041452770) 2022-02
- [携程数据血缘构建及应用](https://mp.weixin.qq.com/s/LGK3YPZCe6oPTf48QaAIqA) 2021-09
- [Datahub](https://datahubproject.io/) A Metadata Platform for the Modern Data Stack
## 6. Machine Learning and AI
### ▶ Machine Learning Platforms
- [机器学习平台建设指南](https://mp.weixin.qq.com/s/HEg_6Gly2WMrcPD5Ao2n6g) 2021-04
- [一站式机器学习平台建设实践](https://mp.weixin.qq.com/s/ZDRD0vAxkSqe4UeXi9avKQ) 2020-02
- [汽车之家机器学习平台的架构与实践](https://blog.csdn.net/hellozhxy/article/details/107210015) 2020-07
- [微博推荐算法实践与机器学习平台演进](https://blog.csdn.net/m0_37586850/article/details/116465255) 2021-05
- [爱奇艺机器学习平台的建设实践](https://mp.weixin.qq.com/s/Np4w7RC2JFlB7ZGIduu71w) 2020-11
- [爱奇艺一站式机器学习平台Deepthought的建设与初探](https://mp.weixin.qq.com/s?__biz=MzI0MjczMjM2NA==&mid=2247487206&idx=1&sn=c8db1e12378376722a1521f409149d44&chksm=e97692c5de011bd3f1b42a8112cd04c24907cb101ac5474b0054c95941ff5c4769a42d496f3a&scene=21#wechat_redirect) 2020-06
- [一站式机器学习平台在 vivo AI 的实践](https://www.infoq.cn/article/THlkStomYLRgXL2hzm8w) 2020-02
- [再见,Yarn!滴滴机器学习平台架构演进](https://mp.weixin.qq.com/s/iTfHv8EFx4O4G1sNxsuMkg) 2019-03
- [网易严选机器学习平台建设实践](https://www.6aiq.com/article/1661745581086) 2022
- [Sunfish-有赞智能平台实践](https://tech.youzan.com/sunfish/) 2020-06
- [同程-利用已有的大数据技术,如何构建机器学习平台](https://www.infoq.cn/news/build-machine-learning-platform-bigdata) 2017-11
## 7. LLM Applications
### ▶ Text2SQL
- [NL2SQL基础系列(1):业界顶尖排行榜、权威测评数据集及LLM大模型(Spider vs BIRD)全面对比优劣分析](https://blog.csdn.net/sinat_39620217/article/details/137603846)
- [NL2SQL基础系列(2):主流大模型与微调方法精选集,Text2SQL经典算法技术回顾七年发展脉络梳理](https://blog.csdn.net/sinat_39620217/article/details/137603958)
- [NL2SQL进阶系列(1):DB-GPT-Hub、SQLcoder、Text2SQL开源应用实践详解](https://blog.csdn.net/sinat_39620217/article/details/137674671)
## 8. Resources
### ▶ Tech Blogs from Major Companies
- [Meituan Tech Team](https://tech.meituan.com/)
- [Youzan Tech Team](https://tech.youzan.com/)
- [DiDi Cloud Blog](https://blog.didiyun.com/)
### ▶ Big Data Websites
- [dbaplus](https://dbaplus.cn/)
### ▶ Related Open Source Projects
- [Data Warehouse Open Source Projects](columns/opensource/数仓相关开源项目汇总.md)
### ▶ Related Papers
- [Raft paper (Chinese translation)](https://github.com/maemual/raft-zh_cn/blob/master/raft-zh_cn.md)
================================================
FILE: columns/doris/Doris全面解析.md
================================================
# Doris in Depth
## Principles
- [Apache Doris : 一个开源 MPP 数据库的架构与实践](https://www.jianshu.com/p/d3742af8ecce)
## Storage
- [存储层设计介绍1——存储结构设计解析](https://mp.weixin.qq.com/s/aJ3FwDI6KprYYUwXzhl_-A) 2020-07
- [存储层设计介绍2——写入流程、删除流程分析](https://mp.weixin.qq.com/s/xl4ePcsSVPPNQDGBw-KoKA) 2020-07
- [存储层设计介绍3——读取流程、Compaction流程分析](https://mp.weixin.qq.com/s/U9w3VxCKhTk_3Sglo9J-aA) 2020-08
- [Doris Compaction机制解析](https://mp.weixin.qq.com/s/5D1gAOEiFWM7N6KPwqHHdw) 2021-02
- [Apache Doris Parquet文件读取的设计与实现](https://mp.weixin.qq.com/s/5D6G_kvl9TzYCMIgynhERA) 2019-08
- [Doris核心功能介绍——数据模型和物化视图](https://mp.weixin.qq.com/s/eRUg1du8AQxLvqYjJ621fA) 2020-07
## Compute
- [Apache Doris 查询原理](https://blog.bcmeng.com/post/apache-doris-query.html) 2020-03
- [Doris SQL 原理解析](https://mp.weixin.qq.com/s/v1jI1MxEHPT5czCWd0kRxw) 2021-01
- [Doris Stream Load原理解析](https://mp.weixin.qq.com/s/NUSHwAUsFskSXG5R0mw8kg) 2021-06
- [Apache Doris 索引机制解析](https://mp.weixin.qq.com/s/KdCdXb9Z3MdUZ5S0RV726Q) 2021-09
- [Spark Doris Sink的设计和实现](https://mp.weixin.qq.com/s/uoPLfFBv9Vt2gg9HEriR0Q) 2019-08
## Other
- [Doris基于Hive表的全局字典设计与实现](https://mp.weixin.qq.com/s/YlZnlMTTI8xhULmk1y-N6w) 2020-08
================================================
FILE: columns/doris/Doris最佳实践.md
================================================
# Doris Best Practices
## Tuning
- [Compaction调优(1)](https://mp.weixin.qq.com/s/Kv71HomwNioHQDz8NUec1A) 2021-06
- [Compaction调优(2)](https://mp.weixin.qq.com/s/mJrxpvYIoE9rgP9Hvo1Dnw) 2021-06
- [Compaction调优(3)](https://mp.weixin.qq.com/s/cZmXEsNPeRMLHp379kc2aA) 2021-06
- [Apache Doris Join 实现与调优实践](https://mp.weixin.qq.com/s/pukjERSOW-D-BM4z1G9JlA) 2021-09
## Business Use Cases
- [Apache Doris 基于 Bitmap的精确去重和用户行为分析](https://mp.weixin.qq.com/s/e0IrXgkinpeEDKi0etfGKA) 2020-01
- [Doris在用户画像人群业务的应用](https://mp.weixin.qq.com/s/HGyIgqCIIXfeJtNdKbj-fQ) 2020-10
## Integration with Other Components
- [基于 Iceberg 拓展 Doris 数据湖能力的实践](https://mp.weixin.qq.com/s/Vgo2kWED8oxg45x6zumEYQ) 2021-07
- [Flink 消费 Kafka 实时写入 Apache Doris(KFD)](https://mp.weixin.qq.com/s/nUeHwFBQs50EvPukqnrinQ) 2021-09
- [Spark Doris Connector的最佳实践](https://mp.weixin.qq.com/s/c8zE7ymv6jC1WTlV44dldQ) 2020-04
- [ProxySQL实现Doris FE高可用](https://mp.weixin.qq.com/s/XHgtIzekxkiGCjqcRbqndw) 2020-08
## Other
- [Apache Doris和ClickHouse的深度分析](https://mp.weixin.qq.com/s/fyVSRB3wxmsZUx4kY1eQRQ) 2021-10
================================================
FILE: columns/flink/Apache Flink 漫谈系列.md
================================================
# Apache Flink Discussion Series (Alibaba Cloud Realtime Compute for Flink)
## Tutorials
- [Apache Flink 漫谈系列(01) - 序](https://developer.aliyun.com/article/666043?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink 漫谈系列(02) - 概述](https://developer.aliyun.com/article/666052?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink 漫谈系列(03) - Watermark](https://developer.aliyun.com/article/666056?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink 漫谈系列(04) - State](https://developer.aliyun.com/article/667562?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink 漫谈系列(05) - Fault Tolerance](https://developer.aliyun.com/article/667564?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink 漫谈系列(06) - 流表对偶(duality)性](https://developer.aliyun.com/article/667566?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink 漫谈系列(07) - 持续查询(Continuous Queries)](https://developer.aliyun.com/article/667700?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink 漫谈系列(08) - SQL概览](https://developer.aliyun.com/article/670202?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink 漫谈系列(09) - JOIN 算子](https://developer.aliyun.com/article/672760?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink 漫谈系列(10) - JOIN LATERAL](https://developer.aliyun.com/article/674345?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink 漫谈系列(11) - Temporal Table JOIN](https://developer.aliyun.com/article/679659?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink 漫谈系列(12) - Time Interval(Time-windowed) JOIN](https://developer.aliyun.com/article/683681?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
- [Apache Flink 漫谈系列(13) - Table API 概述](https://developer.aliyun.com/article/685085?spm=a2c6h.14164896.0.0.59817cb20Sk3GI)
- [Apache Flink 漫谈系列(14) - DataStream Connectors之Kafka](https://developer.aliyun.com/article/686809?spm=a2c6h.14164896.0.0.541b7cb2dQp6jL)
## Resources
- [Alibaba Cloud Realtime Compute for Flink](https://developer.aliyun.com/group/sc?spm=a2c6h.12873639.0.0.e12d59b2IvG4B2#/?_k=9flh5j)
================================================
FILE: columns/flink/Flink 相关论文.md
================================================
# Flink-Related Papers
- [Distributed Snapshots: Determining Global States of Distributed Systems ](https://www.microsoft.com/en-us/research/uploads/prod/2016/12/Determining-Global-States-of-a-Distributed-System.pdf?ranMID=24542&ranEAID=J84DHJLQkR4&ranSiteID=J84DHJLQkR4-mVoVymFnAblBx3zwyf98Pw&epi=J84DHJLQkR4-mVoVymFnAblBx3zwyf98Pw&irgwc=1&OCID=AID2000142_aff_7593_1243925&tduid=%28ir__1hs2uuow6wkfq3oxkk0sohzzwm2xpc33lxd0o6g200%29%287593%29%281243925%29%28J84DHJLQkR4-mVoVymFnAblBx3zwyf98Pw%29%28%29&irclickid=_1hs2uuow6wkfq3oxkk0sohzzwm2xpc33lxd0o6g200)
================================================
FILE: columns/flink/Flink实战系列.md
================================================
# Flink in Practice Series
- [从零构建Flink SQL计算平台 - 1平台搭建概述](https://www.cnblogs.com/pyx0/p/12348114.html)
- [从零构建Flink SQL计算平台 - 2实现作业提交](https://www.cnblogs.com/pyx0/p/12387509.html)
- [从零构建Flink SQL计算平台 - 3实现校验和调试](https://www.cnblogs.com/pyx0/p/12441367.html)
- [网易游戏基于 Flink 的流式 ETL 建设](http://www.whitewood.me/2020/12/20/%E7%BD%91%E6%98%93%E6%B8%B8%E6%88%8F%E5%9F%BA%E4%BA%8E-Flink-%E7%9A%84%E6%B5%81%E5%BC%8F-ETL-%E5%BB%BA%E8%AE%BE/) 2020-12
================================================
FILE: columns/flink/Flink开源项目汇总.md
================================================
# Flink Open Source Projects
- [flink-sql-gateway](https://github.com/ververica/flink-sql-gateway#readme)
- [flink-jdbc-driver](https://github.com/ververica/flink-jdbc-driver)
- [flinkStreamSQL](https://github.com/DTStack/flinkStreamSQL)
- [flinkx](https://github.com/DTStack/flinkx)
- [waterdrop](https://github.com/InterestingLab/waterdrop)
- [streamx](https://github.com/streamxhub/streamx)
- [flink-streaming-platform-web](https://github.com/zhp8341/flink-streaming-platform-web)
- [dlink](https://github.com/DataLinkDC/dlink)
- [plink](https://github.com/hairless/plink)
================================================
FILE: columns/flink/Flink架构、源码分析专栏.md
================================================
# Flink Architecture and Source Code Analysis
## Stream Processing Principles
- [Streaming 101: The world beyond batch](https://www.oreilly.com/radar/the-world-beyond-batch-streaming-101/)
- [Streaming 102: The world beyond batch](https://www.oreilly.com/radar/the-world-beyond-batch-streaming-102/)
## DataSet, DataStream
## Table, SQL
## Time, Watermark
- [Flink Watermark 机制浅析](http://www.whitewood.me/2018/06/01/Flink-Watermark-%E6%9C%BA%E5%88%B6%E6%B5%85%E6%9E%90/) 2018-06
## State
- [Flink State 最佳实践](https://ververica.cn/developers/flink-state-best-practices/) 2020-04
## Checkpoint, Savepoint
- Keywords: unaligned barriers
- [分布式快照算法: Chandy-Lamport 算法](https://zhuanlan.zhihu.com/p/53482103) 2020-11
- [Flink Checkpoint 原理流程以及常见失败原因分析](https://tech.youzan.com/flink_checkpoint_mechanism/) 2019-12
- [Flink 轻量级异步快照 ABS 实现原理](http://www.whitewood.me/2018/05/13/Flink-%E8%BD%BB%E9%87%8F%E7%BA%A7%E5%BC%82%E6%AD%A5%E5%BF%AB%E7%85%A7-ABS-%E5%AE%9E%E7%8E%B0%E5%8E%9F%E7%90%86/) 2018-05
- [Flink Checkpoint/Savepoint 差异](http://www.whitewood.me/2018/09/06/Flink-Checkpoint-Savepoint-%E5%B7%AE%E5%BC%82/) 2018-09
## Operators
### Windows
### Joining
### ProcessFunction
## Connector
- [漫谈 Flink Source 接口重构](http://www.whitewood.me/2020/02/11/%E6%BC%AB%E8%B0%88-Flink-Source-%E6%8E%A5%E5%8F%A3%E9%87%8D%E6%9E%84/) 2020-02
- [Flink JDBC Connector:Flink 与数据库集成最佳实践](https://developer.aliyun.com/article/776069)
## Flink On YARN
- [Flink on YARN(上):一张图轻松掌握基础架构与启动流程](https://developer.aliyun.com/article/719262)
- [Flink on YARN(下):常见问题与排查思路](https://developer.aliyun.com/article/719703)
================================================
FILE: columns/flink/Flink进阶教程.md
================================================
# Flink Advanced Tutorial
Date: 2019
Source: Ververica Chinese community
- [Apache Flink 进阶教程(一):Runtime 核心机制剖析](https://ververica.cn/developers/advanced-tutorial-1-analysis-of-the-core-mechanism-of-runtime/)
- [Apache Flink 进阶教程(二):Time 深度解析](https://ververica.cn/developers/advanced-tutorial-2-time-depth-analysis/)
- [Apache Flink 进阶教程(三):Checkpoint 的应用实践](https://ververica.cn/developers/advanced-tutorial-2-checkpoint-application-practice/)
- [Apache Flink 进阶教程(四):Flink on Yarn/K8s 原理剖析及实践](https://ververica.cn/developers/advanced-tutorial-2-flink-on-yarn-k8s/)
- [Apache Flink 进阶教程(五):数据类型和序列化](https://ververica.cn/developers/advanced-tutorial-2-serialize/)
- [Apache Flink 进阶教程(六):Flink 作业执行深度解析](https://ververica.cn/developers/advanced-tutorial-2-flink-job-execution-depth-analysis/)
- [Apache Flink 进阶教程(七):网络流控及反压剖析](https://ververica.cn/developers/advanced-tutorial-2-analysis-of-network-flow-control-and-back-pressure/)
- [Apache Flink 进阶教程(八):详解 Metrics 原理与实战](https://ververica.cn/developers/advanced-tutorial-2-principles-and-practice-of-metrics/)
================================================
FILE: columns/flink/Flink零基础入门.md
================================================
# Flink for Beginners
Date: 2019
Source: Ververica Chinese community
- [Apache Flink 零基础入门(一&二):基础概念解析](https://ververica.cn/developers/flink-basic-tutorial-1-basic-concept/)
- [Apache Flink 零基础入门(三):开发环境搭建和应用的配置、部署及运行](https://ververica.cn/developers/flink-basic-tutorial-1-environmental-construction/)
- [Apache Flink 零基础入门(四):DataStream API 编程](https://ververica.cn/developers/apache-flink-basic-zero-iii-datastream-api-programming/)
- [Apache Flink 零基础入门(五):客户端操作](https://ververica.cn/developers/apache-flink-zero-basic-introduction-iv-client-operation/)
- [Apache Flink 零基础入门(六):Flink Time & Window 解析](https://ververica.cn/developers/time-window/)
- [Apache Flink 零基础入门(七):状态管理及容错机制](https://ververica.cn/developers/state-management/)
- [Apache Flink 零基础入门(八):Table API 编程](https://ververica.cn/developers/table-api-programming/)
- [Apache Flink 零基础入门(九):Flink SQL 编程实践](https://ververica.cn/developers/flink-sql-programming-practice/)
================================================
FILE: columns/hive/hive教程.md
================================================
# Hive Tutorial
## The Road to Learning Hive (2018)
- [Hive学习之路 (一)Hive初识](https://www.cnblogs.com/qingyunzong/p/8707885.html)
- [Hive学习之路 (二)Hive安装](https://www.cnblogs.com/qingyunzong/p/8708057.html)
- [Hive学习之路 (三)Hive元数据信息对应MySQL数据库表](https://www.cnblogs.com/qingyunzong/p/8710356.html)
- [Hive学习之路 (四)Hive的连接3种连接方式](https://www.cnblogs.com/qingyunzong/p/8715925.html)
- [Hive学习之路 (五)DbVisualizer配置连接hive](https://www.cnblogs.com/qingyunzong/p/8715250.html)
- [Hive学习之路 (六)Hive SQL之数据类型和存储格式](https://www.cnblogs.com/qingyunzong/p/8733924.html)
- [Hive学习之路 (七)Hive的DDL操作](https://www.cnblogs.com/qingyunzong/p/8723271.html)
- [Hive学习之路 (八)Hive中文乱码](https://www.cnblogs.com/qingyunzong/p/8724155.html)
- [Hive学习之路 (九)Hive的内置函数](https://www.cnblogs.com/qingyunzong/p/8744593.html)
- [Hive学习之路 (十)Hive的高级操作](https://www.cnblogs.com/qingyunzong/p/8746159.html)
- [Hive学习之路 (十一)Hive的5个面试题](https://www.cnblogs.com/qingyunzong/p/8747656.html)
- [Hive学习之路 (十二)Hive SQL练习之影评案例](https://www.cnblogs.com/qingyunzong/p/8727264.html)
- [Hive学习之路 (十三)Hive分析窗口函数(一) SUM,AVG,MIN,MAX](https://www.cnblogs.com/qingyunzong/p/8782794.html)
- [Hive学习之路 (十四)Hive分析窗口函数(二) NTILE,ROW_NUMBER,RANK,DENSE_RANK](https://www.cnblogs.com/qingyunzong/p/8798102.html)
- [Hive学习之路 (十五)Hive分析窗口函数(三) CUME_DIST和PERCENT_RANK](https://www.cnblogs.com/qingyunzong/p/8798382.html)
- [Hive学习之路 (十六)Hive分析窗口函数(四) LAG、LEAD、FIRST_VALUE和LAST_VALUE](https://www.cnblogs.com/qingyunzong/p/8798606.html)
- [Hive学习之路 (十七)Hive分析窗口函数(五) GROUPING SETS、GROUPING__ID、CUBE和ROLLUP](https://www.cnblogs.com/qingyunzong/p/8798987.html)
- [Hive学习之路 (十八)Hive的Shell操作](https://www.cnblogs.com/qingyunzong/p/8847532.html)
- [Hive学习之路 (十九)Hive的数据倾斜](https://www.cnblogs.com/qingyunzong/p/8847597.html)
- [Hive学习之路 (二十)Hive 执行过程实例分析](https://www.cnblogs.com/qingyunzong/p/8847651.html)
- [Hive学习之路 (二十一)Hive 优化策略](https://www.cnblogs.com/qingyunzong/p/8847775.html)
================================================
FILE: columns/kudu/Kudu原理论文.md
================================================
# Kudu Principles
- [Apache Kudu Read & Write Paths](https://blog.cloudera.com/apache-kudu-read-write-paths/) 2017-04
- [Kudu存储原理](https://github.com/collabH/repository/blob/master/bigdata/olap/kudu/Kudu%E5%8E%9F%E7%90%86%E5%88%86%E6%9E%90.md)
# Kudu-Related Papers
- [LSM Tree](https://www.cs.umb.edu/~poneil/lsmtree.pdf)
- [Kudu论文解读: Fast Analytics on Fast Data (上)](https://zhuanlan.zhihu.com/p/137238298) 2020-04
- [Kudu论文解读: Fast Analytics on Fast Data (下)](https://zhuanlan.zhihu.com/p/137243163) 2020-04
================================================
FILE: columns/kudu/网易云Kudu技术文章.md
================================================
# NetEase Cloud Kudu Technical Articles
- [【大数据之数据仓库】选型流水记](https://sq.sf.163.com/blog/article/174995941069086720) 2018-07
- [【大数据之数据仓库】kudu客户端java驱动缺陷](https://sq.sf.163.com/blog/article/169595475122905088) 2018-06
- [【大数据之数据仓库】kudu性能测试报告分析](https://sq.sf.163.com/blog/article/174995336187535360) 2018-07
- [分布式存储系统 Kudu 与 HBase 的简要分析与对比](https://sq.163yun.com/blog/article/198870236065431552) 2018-11
- [【kudu pk parquet】runtime filter实践](https://sq.sf.163.com/blog/article/174993565549518848) 2018-07
- [【kudu pk parquet】TPC-H Query2对比解析](https://sq.sf.163.com/blog/article/175000124925075456) 2018-07
================================================
FILE: columns/opensource/数仓相关开源项目汇总.md
================================================
# Data Warehouse Open Source Projects
## Metadata and Data Governance
- [atlas](https://github.com/apache/atlas)
- [datahub](https://github.com/linkedin/datahub)
## Data Integration
- [DataX](https://github.com/alibaba/DataX)
- [datax-web](https://github.com/WeiYe-Jing/datax-web)
## Data Computation
- [streamx](https://github.com/streamxhub/streamx)
- [plink](https://github.com/hairless/plink) Platform for Flink
- [FlinkSQL](https://github.com/ambition119/FlinkSQL)
- [flinkStreamSQL](https://github.com/DTStack/flinkStreamSQL)
- [waterdrop](https://github.com/InterestingLab/waterdrop)
## Scheduling
- [dolphinscheduler](https://github.com/apache/dolphinscheduler)
## Development Platforms and Other
- [davinci](https://github.com/edp963/davinci)
- [DataSphereStudio](https://github.com/WeBankFinTech/DataSphereStudio) WeBank
- [wormhole](https://github.com/edp963/wormhole) CreditEase
- [big-whale](https://github.com/MeetYouDevs/big-whale)
- [lark](https://github.com/wxgzgl/lark)
================================================
FILE: columns/presto/Presto最佳实践、调优、踩坑专栏.md
================================================
# Presto Best Practices, Tuning, and Pitfalls
## 1. Best Practices
- [Presto的ETL之路](https://zhuanlan.zhihu.com/p/53996153) 2019-01
- [Presto的应用场景与企业案例](https://zhuanlan.zhihu.com/p/260653669) 2020-10
### 1.1 Technology Selection
- [PrestoDB VS PrestoSQL发展比较](https://zhuanlan.zhihu.com/p/87621360) 2019-10
- [PrestoDB和PrestoSQL比较及选择](http://armsword.com/2020/05/02/the-difference-between-prestodb-and-prestosql/) 2020-05
### 1.2 Industry Practice
- [Presto在B站的实践](https://www.bilibili.com/read/cv16043517) 2022-04
- [Presto 在字节跳动的内部实践与优化(优化篇)](https://xie.infoq.cn/article/061bb0935a8575e01ea243852) 2021-12
- [Presto at Tencent at Scale - pdf](https://static.sched.com/hosted_files/prestocon2021/ed/Presto%20at%20Tencent%20at%20Scale%20%281%29.pdf) 2021-12
- [Presto在车好多的实践](https://mp.weixin.qq.com/s/Bmqv54sVZgTqQ82I_RfmsA) 2020-12
- [Presto在滴滴的探索与实践](https://zhuanlan.zhihu.com/p/266162270) 2020-10
- [Presto 在有赞的实践之路](https://tech.youzan.com/presto-zai-you-zan-de-shi-jian-zhi-lu/) 2020-04
- [PrestoCon 2020:云原生数据湖分析DLA的Presto实践](https://zhuanlan.zhihu.com/p/260784762) 2020-03
- [携程 Presto 技术演进之路](https://zhuanlan.zhihu.com/p/41538472) 2018-08
- [Presto实现原理和美团的使用实践](https://tech.meituan.com/2014/06/16/presto.html) 2014-06
- [阿里数据湖 Presto分析算力隔离技术剖析](https://mp.weixin.qq.com/s/lV_nzLI6_Ott7Abyaik_bw)
## 2. Performance Tuning
- [Presto性能调优的五大技巧](https://zhuanlan.zhihu.com/p/162809568) 2020-07
- [Presto内存管理原理和调优](http://armsword.com/2018/05/22/the-memory-management-and-tuning-experience-of-presto/) 2018-05
- [Presto内存管理相关参数设置](http://armsword.com/2019/11/13/the-configuration-settings-of-presto-memory-management/) 2019-11
- [Presto集群内存不足时保护机制](http://armsword.com/2020/02/18/presto-memory-kill-policy/) 2020-02
- [火焰图在Presto YGC优化中的应用](https://mp.weixin.qq.com/s/BZG7Av5f9HH9gueVF8ABvQ) 2020-03
- [使用火焰图定位 OLAP 引擎瓶颈](https://mp.weixin.qq.com/s/pIYdeF0TtbGgV0Va35ejQg) 2021-03
- [How to Make The Presto Query Engine Run Fastest](https://ahana.io/learn/presto/making-the-presto-query-engine-run-faster/)
## 3. Troubleshooting (Pitfalls)
- [说下那些导致Presto查询变慢的JVM Bug和解决方法](http://armsword.com/2021/02/07/jvm-bug-causes-Presto-queries-to-slow-down/) 2021-02
- [Presto Master JVM Core问题调研](http://armsword.com/2020/12/10/solve-presto-jvm-coredump/) 2020-12
- [Jetty导致Presto堆外内存泄露的排查过程](http://armsword.com/2020/06/23/jetty-cause-presto-memory-leak/) 2020-06
- [记一次Presto Worker OOM的查找过程](http://armsword.com/2020/06/03/the-solution-of-presto-oom-caused-by-orc-statistics/) 2020-06
- [Presto System load过高问题调研](http://armsword.com/2019/09/18/solve-presto-system-load-too-high/) 2019-09
- [一次 Presto 的连接数超限的问题定位](https://zhuanlan.zhihu.com/p/57956341) 2019-03
- [Presto Codegen问题排查案例](https://zhuanlan.zhihu.com/p/66243773) 2019-05
- [Presto coordinator的CPU持续上涨,原因竟然是这样](https://mayunlei.github.io/2019/05/20/Presto-coordinator%E7%9A%84CPU%E6%8C%81%E7%BB%AD%E4%B8%8A%E6%B6%A8%EF%BC%8C%E5%8E%9F%E5%9B%A0%E7%AB%9F%E7%84%B6%E6%98%AF%E8%BF%99%E6%A0%B7/) 2019-05
- [Presto内存泄露问题调查](https://mayunlei.github.io/2019/09/02/Presto%E5%86%85%E5%AD%98%E6%B3%84%E9%9C%B2%E9%97%AE%E9%A2%98%E8%B0%83%E6%9F%A5/) 2019-09
================================================
FILE: columns/presto/Presto架构、源码分析专栏.md
================================================
# Presto Architecture and Source Code Analysis
## 1. Principles and Architecture
- [Presto概述:特性、原理、架构](https://zhuanlan.zhihu.com/p/260399749) 2020-10
- [分布式SQL查询引擎Presto原理介绍](http://armsword.com/2017/12/05/presto/) 2017-12
- [深入理解Presto](https://zhuanlan.zhihu.com/p/101366898) 2020-01
- [分布式SQL查询引擎原理(以Presto SQL为例)](https://zhuanlan.zhihu.com/p/293775390) 2020-11
- [深入理解Presto,Presto的内部架构](https://mayunlei.github.io/2020/08/16/%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3Presto-Presto%E7%9A%84%E5%86%85%E9%83%A8%E6%9E%B6%E6%9E%84/) 2020-08
- [Presto 分布式SQL查询引擎及原理分析](https://mp.weixin.qq.com/s?__biz=MzI5MDEzMzg5Nw==&mid=2660400264&idx=1&sn=ebff65980ef45f7dffea1e5ec7d51fdc&chksm=f7425e6ec035d778dcc5704babe5241d8c80f3d21059434b00d8d4c46d9ce0bd232467ec92a6&scene=21#wechat_redirect) 2020-05
## 2. Source Code Analysis
### 2.1 Preparation
- [如何快速掌握Presto源码:思路和经验](https://zhuanlan.zhihu.com/p/262236892) 2020-10
- [Presto 源码阅读: Overview](https://zhuanlan.zhihu.com/p/51393518) 2018-12
- [Presto的一些基本概念](http://armsword.com/2018/08/11/the-basic-concepts-of-presto/) 2018-08
- [Presto/Trino权威指南及官方设计文档解读](https://www.jianshu.com/p/d3600d2a115d) 2021-05
### 2.2 Data Types, Query Execution Model
- [Presto类型系统初探](https://zhuanlan.zhihu.com/p/55299409) 2019-01
- [Presto源码分析之数据类型](https://zhuanlan.zhihu.com/p/52713533) 2018-12
- [Presto Core Data Structures: Slice, Block & Page](https://zhuanlan.zhihu.com/p/60813087) 2019-03
- [Presto源码分析之Slice](https://zhuanlan.zhihu.com/p/52735465) 2018-12
- [Presto Driver,Split and Pipeline](https://www.lewuathe.com/presto-driver,split-and-pipeline.html) 2017-05
### 2.3 SQL Parsing, Execution Plan Generation and Optimization
- [Presto 源码分析:Coordinator 篇](https://www.infoq.cn/article/VNe0A9yKszPCmp32akCa) 2019-12
- [Presto SQL Parser源码分析](https://zhuanlan.zhihu.com/p/57438825) 2019-02
- [Presto 源码阅读:Optimizers](https://zhuanlan.zhihu.com/p/52154130) 2019-01
- [Presto逻辑执行计划生成](https://zhuanlan.zhihu.com/p/57395047) 2019-02
- [Presto源码分析之IterativeOptimizer](https://zhuanlan.zhihu.com/p/52879375) 2018-12
- [Presto源码分析之模式匹配](https://zhuanlan.zhihu.com/p/52916774) 2018-12
- [Presto技术源码解析总结-一个SQL的奇幻之旅 上](https://www.jianshu.com/p/3fccfa82e1ec) 2019-04
- [Presto技术源码解析总结-一个SQL的奇幻之旅 下](https://www.jianshu.com/p/d8a3d7488358) 2019-04
- [Presto查询执行过程和索引条件下推分析](https://mp.weixin.qq.com/s?src=11×tamp=1616394200&ver=2961&signature=E7fzfl-wO5wGpohLLkE8v9hRKn5GR1TbVwU-N6Hl11T0Xl6TtlgCbhJmisPs*Z-hYiprO0yYK91O5GR0m-V-s5kvv6NudfeWMGW4iPXdAdetAfDAo4EITB9l*yZajiJS&new=1) 2020-05
### 2.4 Distributed Task Scheduling, Split Generation and Scheduling Strategy, Worker Selection Strategy
- [Presto运行时浅析](https://zhuanlan.zhihu.com/p/345733460) 2021-01
- [Presto源码阅读——如何获取Hive中的Metadata(HMS+HDFS)](https://blog.csdn.net/huang_quanlong/article/details/80380474) 2018-07
- [Presto如何构建和使用海量Hive Splits](https://zhuanlan.zhihu.com/p/344559757) 2021-01
- [Presto之Task执行框架](https://zhuanlan.zhihu.com/p/54172313) 2019-01
- [Presto 是如何 schedule task 的?](https://zhuanlan.zhihu.com/p/58959725) 2019-03
- [Presto 由Stage到Task的旅程](https://zhuanlan.zhihu.com/p/55785284) 2019-01
- [Presto调度task选择Worker方法](http://armsword.com/2020/04/08/presto-scheduling-task/) 2020-04
- [presto中的AllAtOnce与Phased](https://zhuanlan.zhihu.com/p/61656233) 2019-05
- [Presto 任务调度: 任务分配到哪里](https://mayunlei.github.io/2020/05/30/Presto-%E4%BB%BB%E5%8A%A1%E8%B0%83%E5%BA%A6%EF%BC%9A-%E4%BB%BB%E5%8A%A1%E5%88%86%E9%85%8D%E5%88%B0%E5%93%AA%E9%87%8C/) 2020-05
- [Presto Split 详解](https://blog.csdn.net/zhanyuanlin/article/details/109215177)
### 2.5 Common Operators and the Underlying Implementation of Common SQL
- [Window函数与WindowOperator源码解析](https://zhuanlan.zhihu.com/p/59550902) 2019-03
- [Presto中coalesce函数的实现与Expression Codegen](https://zhuanlan.zhihu.com/p/64131496) 2019-04
- [Presto Limit 类算子分析](https://zhuanlan.zhihu.com/p/62448395) 2019-04
- [Presto分页功能概述](https://zhuanlan.zhihu.com/p/57030465) 2019-02
#### Join and Shuffle
- [Presto 数据如何进行shuffle](https://zhuanlan.zhihu.com/p/61565957) 2019-04
- [Presto中的Hash Join](https://zhuanlan.zhihu.com/p/54731892) 2019-03
#### Group-By Aggregation
- [Presto中的分组聚合查询流程](https://zhuanlan.zhihu.com/p/54385845) 2019-01
- [深入理解Presto中的Group By查询](https://zhuanlan.zhihu.com/p/67742519) 2019-09
### 2.6 Functions and UDFs
### 2.7 Connector Mechanism and Common Connectors
- [ORC & Presto](https://zhuanlan.zhihu.com/p/110013789) 2020-02
- [Presto ORC及其性能优化](http://armsword.com/2019/09/30/presto-orc-and-performance-optimization/) 2019-09
- [Presto Hive MetaStore相关代码分析](https://zhuanlan.zhihu.com/p/109033118) 2020-02
- [Presto Connector之SystemTable](https://zhuanlan.zhihu.com/p/60934739) 2019-03
- [如何让Presto可以连接Hbase?文中含Hbase-Connect开发详解](https://www.analysys.cn/article/detail/20019023) 2018-11
### 2.8 Miscellaneous
- [Presto源码分析之TupleDomain](https://zhuanlan.zhihu.com/p/53113638) 2018-12
- [Presto的缓存机制](https://zhuanlan.zhihu.com/p/196398077) 2020-08
- [Presto Caching](https://zhuanlan.zhihu.com/p/147769024) 2020-06
- [Presto Codegen简介与优化尝试](https://zhuanlan.zhihu.com/p/53469238) 2018-12
- [Presto Procedure](https://zhuanlan.zhihu.com/p/59159147) 2019-03
- [How is data inserted into Presto?](https://zhuanlan.zhihu.com/p/59846328) 2019-03
- [Presto兼容Hive SQL的一些改造工作](http://armsword.com/2019/03/31/presto-compatible-hive-syntax/) 2019-03
- [Presto Coordinator分布式改造](https://mayunlei.github.io/2019/11/26/Presto-Coordinator%E5%88%86%E5%B8%83%E5%BC%8F%E6%94%B9%E9%80%A0/) 2019-11
- [Visualize Execution Plan in Presto](https://www.lewuathe.com/visualize-execution-plan-in-presto.html) 2019-09
- [Presto兼容Hive隐式类型转换](https://mp.weixin.qq.com/s/1hn3nVBdBtBeiPl3wxvHfQ) 2021-02
- [Presto 标量函数注册和调用过程简述](https://mp.weixin.qq.com/s/vd65OVeIOH7YFQ0QOAmsUg) 2020-09
- [Presto 函数实现简述](https://mp.weixin.qq.com/s/1Z_qik61N3hKwWqG8QR69w) 2020-07
- [Improved Hive Bucketing](https://trino.io/blog/2019/05/29/improved-hive-bucketing.html)
## 3. Related Papers
- [Official paper: "Presto: SQL on Everything"](https://trino.io/Presto_SQL_on_Everything.pdf) [Chinese translation](https://www.jianshu.com/p/de0a1de9f26e)
- [《F1 Query: Declarative Querying at Scale》读后感](https://zhuanlan.zhihu.com/p/53299556) 2018-12
- [《Column-Stores vs. Row-Stores》读后感](https://zhuanlan.zhihu.com/p/54433448) 2019-01 abei (Zhihu)
- [读后感之《Column-Stores vs. Row-Stores》](https://zhuanlan.zhihu.com/p/54484592) 2019-01 萌豆 (Zhihu)
- [Wander Join:Online Aggregation via Random Walks读后感](https://zhuanlan.zhihu.com/p/55050773) 2020-03
- [《The Snowflake Elastic Data Warehouse》读后感](https://zhuanlan.zhihu.com/p/55577067) 2019-01
================================================
FILE: columns/presto/Presto资料汇总、会议资讯专栏.md
================================================
# Presto Resources, Conferences and News
## 1. Official Sites and Technical Blogs
### 1.1 Official Sites
- [PrestoDB official site](https://prestodb.io/)
- [Trino official site](https://trino.io/) (formerly PrestoSQL)
- [PrestoDB Blog](https://prestodb.io/blog/index.html)
- [Trino Blog](https://trino.io/blog/)
- [PrestoDB github](https://github.com/prestodb/presto)
- [Trino github](https://github.com/trinodb/trino)
### 1.2 Discussion (Groups, WeChat Official Accounts, etc.)
- [Google Presto Group](https://groups.google.com/g/presto-users)
- [PrestoDB Slack](https://prestodb.slack.com)
- [Trino Slack](https://trinodb.slack.com)
- WeChat official account: Presto News
- WeChat official account: FFCompute
### 1.3 Technical Blogs
- [Presto Zhihu column](https://www.zhihu.com/column/presto-cn)
- [若飞's technical blog](http://armsword.com/archives/)
## 2. Books
- [《Presto: The Definitive Guide》](https://trino.io/blog/2020/04/11/the-definitive-guide.html)
- [《Presto技术内幕》](https://book.douban.com/subject/26855863/) by the JD.com Presto team
## 3. Conferences and News
### 3.1 Conferences
- [Presto Meetup Oct 2019](https://zhuanlan.zhihu.com/p/88350254) 2019-10
- [PrestoCon 2020](https://prestocon2020.sched.com/)
- [PrestoCon 2021](https://prestocon2021.sched.com/)
- [PrestoCon 2022](https://prestocon2022.sched.com/)
### 3.2 News
- [惊闻Facebook开源大数据引擎Presto团队正在分裂](https://zhuanlan.zhihu.com/p/55628236) 2019-01
- [与 Facebook 分手后 ,PrestoSQL 再度因商标侵权被迫更名](https://www.infoq.cn/article/WmH0WXhqsWqpHDm6PpjC) 2021-01
================================================
FILE: columns/spark/Apache Spark的设计与实现.md
================================================
# The Design and Implementation of Apache Spark
> Spark Version: 1.0.2 Doc Version: 1.0.2.0
- [Introduction](https://spark-internals.books.yourtion.com/index.html)
- [Overview](https://spark-internals.books.yourtion.com/markdown/1-Overview.html)
- [Job Logical Plan](https://spark-internals.books.yourtion.com/markdown/2-JobLogicalPlan.html)
- [Job Physical Plan](https://spark-internals.books.yourtion.com/markdown/3-JobPhysicalPlan.html)
- [Shuffle Details](https://spark-internals.books.yourtion.com/markdown/4-shuffleDetails.html)
- [Architecture](https://spark-internals.books.yourtion.com/markdown/5-Architecture.html)
- [Cache and Checkpoint](https://spark-internals.books.yourtion.com/markdown/6-CacheAndCheckpoint.html)
- [Broadcast](https://spark-internals.books.yourtion.com/markdown/7-Broadcast.html)
- [SparkInternals - github](https://github.com/JerryLead/SparkInternals)
================================================
FILE: columns/starrocks/StarRocks技术内幕.md
================================================
# StarRocks Internals
- [多表物化视图的设计与实现](https://blog.csdn.net/StarRocks/article/details/127863764) 2022-11