Repository: heibaiying/BigData-Notes
Branch: master
Commit: 3898939aca38
Files: 225
Total size: 964.0 KB
Directory structure:
gitextract_zbjfquu6/
├── .gitignore
├── README.md
├── code/
│ ├── Flink/
│ │ ├── flink-basis-java/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ └── main/
│ │ │ ├── java/
│ │ │ │ └── com/
│ │ │ │ └── heibaiying/
│ │ │ │ └── StreamingJob.java
│ │ │ └── resources/
│ │ │ └── log4j.properties
│ │ ├── flink-basis-scala/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ └── main/
│ │ │ ├── resources/
│ │ │ │ ├── log4j.properties
│ │ │ │ └── wordcount.txt
│ │ │ └── scala/
│ │ │ └── com/
│ │ │ └── heibaiying/
│ │ │ ├── WordCountBatch.scala
│ │ │ └── WordCountStreaming.scala
│ │ ├── flink-kafka-integration/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ └── main/
│ │ │ ├── java/
│ │ │ │ └── com/
│ │ │ │ └── heibaiying/
│ │ │ │ ├── CustomSinkJob.java
│ │ │ │ ├── KafkaStreamingJob.java
│ │ │ │ ├── bean/
│ │ │ │ │ └── Employee.java
│ │ │ │ └── sink/
│ │ │ │ └── FlinkToMySQLSink.java
│ │ │ └── resources/
│ │ │ └── log4j.properties
│ │ └── flink-state-management/
│ │ ├── pom.xml
│ │ └── src/
│ │ └── main/
│ │ ├── java/
│ │ │ └── com/
│ │ │ └── heibaiying/
│ │ │ ├── keyedstate/
│ │ │ │ ├── KeyedStateJob.java
│ │ │ │ ├── ThresholdWarning.java
│ │ │ │ └── ThresholdWarningWithTTL.java
│ │ │ └── operatorstate/
│ │ │ ├── OperatorStateJob.java
│ │ │ └── ThresholdWarning.java
│ │ └── resources/
│ │ └── log4j.properties
│ ├── Hadoop/
│ │ ├── hadoop-word-count/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ └── main/
│ │ │ ├── java/
│ │ │ │ └── com/
│ │ │ │ └── heibaiying/
│ │ │ │ ├── WordCountApp.java
│ │ │ │ ├── WordCountCombinerApp.java
│ │ │ │ ├── WordCountCombinerPartitionerApp.java
│ │ │ │ ├── component/
│ │ │ │ │ ├── CustomPartitioner.java
│ │ │ │ │ ├── WordCountMapper.java
│ │ │ │ │ └── WordCountReducer.java
│ │ │ │ └── utils/
│ │ │ │ └── WordCountDataUtils.java
│ │ │ └── resources/
│ │ │ └── log4j.properties
│ │ └── hdfs-java-api/
│ │ ├── pom.xml
│ │ └── src/
│ │ ├── main/
│ │ │ └── java/
│ │ │ └── com/
│ │ │ └── heibaiying/
│ │ │ └── utils/
│ │ │ └── HdfsUtils.java
│ │ └── test/
│ │ └── java/
│ │ └── HdfsTest.java
│ ├── Hbase/
│ │ ├── hbase-java-api-1.x/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ ├── main/
│ │ │ │ └── java/
│ │ │ │ └── com/
│ │ │ │ └── heibaiying/
│ │ │ │ └── HBaseUtils.java
│ │ │ └── test/
│ │ │ └── java/
│ │ │ └── com/
│ │ │ └── heibaiying/
│ │ │ └── HbaseUtilsTest.java
│ │ ├── hbase-java-api-2.x/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ ├── main/
│ │ │ │ └── java/
│ │ │ │ └── com/
│ │ │ │ └── heibaiying/
│ │ │ │ └── HBaseUtils.java
│ │ │ └── test/
│ │ │ └── java/
│ │ │ └── heibaiying/
│ │ │ └── HBaseUtilsTest.java
│ │ └── hbase-observer-coprocessor/
│ │ ├── pom.xml
│ │ └── src/
│ │ └── main/
│ │ └── java/
│ │ └── com/
│ │ └── heibaiying/
│ │ └── AppendRegionObserver.java
│ ├── Kafka/
│ │ └── kafka-basis/
│ │ ├── pom.xml
│ │ └── src/
│ │ └── main/
│ │ └── java/
│ │ └── com/
│ │ └── heibaiying/
│ │ ├── consumers/
│ │ │ ├── ConsumerASyn.java
│ │ │ ├── ConsumerASynAndSyn.java
│ │ │ ├── ConsumerASynWithOffsets.java
│ │ │ ├── ConsumerExit.java
│ │ │ ├── ConsumerGroup.java
│ │ │ ├── ConsumerSyn.java
│ │ │ ├── RebalanceListener.java
│ │ │ └── StandaloneConsumer.java
│ │ └── producers/
│ │ ├── ProducerASyn.java
│ │ ├── ProducerSyn.java
│ │ ├── ProducerWithPartitioner.java
│ │ ├── SimpleProducer.java
│ │ └── partitioners/
│ │ └── CustomPartitioner.java
│ ├── Phoenix/
│ │ ├── spring-boot-mybatis-phoenix/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ ├── main/
│ │ │ │ ├── java/
│ │ │ │ │ └── com/
│ │ │ │ │ └── heibaiying/
│ │ │ │ │ └── springboot/
│ │ │ │ │ ├── SpringBootMybatisApplication.java
│ │ │ │ │ ├── bean/
│ │ │ │ │ │ └── USPopulation.java
│ │ │ │ │ └── dao/
│ │ │ │ │ └── PopulationDao.java
│ │ │ │ └── resources/
│ │ │ │ └── application.yml
│ │ │ └── test/
│ │ │ └── java/
│ │ │ └── com/
│ │ │ └── heibaiying/
│ │ │ └── springboot/
│ │ │ └── PopulationTest.java
│ │ └── spring-mybatis-phoenix/
│ │ ├── pom.xml
│ │ └── src/
│ │ ├── main/
│ │ │ ├── java/
│ │ │ │ └── com/
│ │ │ │ └── heibaiying/
│ │ │ │ ├── bean/
│ │ │ │ │ └── USPopulation.java
│ │ │ │ └── dao/
│ │ │ │ └── PopulationDao.java
│ │ │ └── resources/
│ │ │ ├── jdbc.properties
│ │ │ ├── mappers/
│ │ │ │ └── Population.xml
│ │ │ ├── mybatisConfig.xml
│ │ │ └── springApplication.xml
│ │ └── test/
│ │ └── java/
│ │ └── com/
│ │ └── heibaiying/
│ │ └── dao/
│ │ └── PopulationDaoTest.java
│ ├── Storm/
│ │ ├── storm-hbase-integration/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ └── main/
│ │ │ └── java/
│ │ │ └── com/
│ │ │ └── heibaiying/
│ │ │ ├── WordCountToHBaseApp.java
│ │ │ └── component/
│ │ │ ├── CountBolt.java
│ │ │ ├── DataSourceSpout.java
│ │ │ └── SplitBolt.java
│ │ ├── storm-hdfs-integration/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ └── main/
│ │ │ └── java/
│ │ │ └── com.heibaiying/
│ │ │ ├── DataToHdfsApp.java
│ │ │ └── component/
│ │ │ └── DataSourceSpout.java
│ │ ├── storm-kafka-integration/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ └── main/
│ │ │ └── java/
│ │ │ └── com/
│ │ │ └── heibaiying/
│ │ │ └── kafka/
│ │ │ ├── read/
│ │ │ │ ├── LogConsoleBolt.java
│ │ │ │ └── ReadingFromKafkaApp.java
│ │ │ └── write/
│ │ │ ├── DataSourceSpout.java
│ │ │ └── WritingToKafkaApp.java
│ │ ├── storm-redis-integration/
│ │ │ ├── pom.xml
│ │ │ └── src/
│ │ │ └── main/
│ │ │ └── java/
│ │ │ └── com/
│ │ │ └── heibaiying/
│ │ │ ├── CustomRedisCountApp.java
│ │ │ ├── WordCountToRedisApp.java
│ │ │ └── component/
│ │ │ ├── CountBolt.java
│ │ │ ├── DataSourceSpout.java
│ │ │ ├── RedisCountStoreBolt.java
│ │ │ ├── SplitBolt.java
│ │ │ └── WordCountStoreMapper.java
│ │ └── storm-word-count/
│ │ ├── pom.xml
│ │ └── src/
│ │ └── main/
│ │ ├── java/
│ │ │ └── com/
│ │ │ └── heibaiying/
│ │ │ └── wordcount/
│ │ │ ├── ClusterWordCountApp.java
│ │ │ ├── LocalWordCountApp.java
│ │ │ └── component/
│ │ │ ├── CountBolt.java
│ │ │ ├── DataSourceSpout.java
│ │ │ └── SplitBolt.java
│ │ └── resources/
│ │ └── assembly.xml
│ ├── Zookeeper/
│ │ └── curator/
│ │ ├── pom.xml
│ │ └── src/
│ │ └── main/
│ │ └── java/
│ │ └── com/
│ │ └── heibaiying/
│ │ ├── AclOperation.java
│ │ └── BasicOperation.java
│ └── spark/
│ ├── spark-streaming-basis/
│ │ ├── pom.xml
│ │ └── src/
│ │ └── main/
│ │ └── java/
│ │ └── com/
│ │ └── heibaiying/
│ │ ├── NetworkWordCount.scala
│ │ ├── NetworkWordCountToRedis.scala
│ │ ├── NetworkWordCountV2.scala
│ │ └── utils/
│ │ └── JedisPoolUtil.java
│ ├── spark-streaming-flume/
│ │ ├── pom.xml
│ │ └── src/
│ │ └── main/
│ │ └── scala/
│ │ └── com/
│ │ └── heibaiying/
│ │ └── flume/
│ │ ├── PullBasedWordCount.scala
│ │ └── PushBasedWordCount.scala
│ └── spark-streaming-kafka/
│ ├── pom.xml
│ └── src/
│ └── main/
│ └── scala/
│ └── com/
│ └── heibaiying/
│ └── kafka/
│ └── KafkaDirectStream.scala
├── notes/
│ ├── Azkaban_Flow_1.0_的使用.md
│ ├── Azkaban_Flow_2.0_的使用.md
│ ├── Azkaban简介.md
│ ├── Flink_Data_Sink.md
│ ├── Flink_Data_Source.md
│ ├── Flink_Data_Transformation.md
│ ├── Flink_Windows.md
│ ├── Flink开发环境搭建.md
│ ├── Flink核心概念综述.md
│ ├── Flink状态管理与检查点机制.md
│ ├── Flume整合Kafka.md
│ ├── Flume简介及基本使用.md
│ ├── HDFS-Java-API.md
│ ├── HDFS常用Shell命令.md
│ ├── Hadoop-HDFS.md
│ ├── Hadoop-MapReduce.md
│ ├── Hadoop-YARN.md
│ ├── Hbase_Java_API.md
│ ├── Hbase_Shell.md
│ ├── Hbase协处理器详解.md
│ ├── Hbase容灾与备份.md
│ ├── Hbase的SQL中间层_Phoenix.md
│ ├── Hbase简介.md
│ ├── Hbase系统架构及数据结构.md
│ ├── Hbase过滤器详解.md
│ ├── HiveCLI和Beeline命令行的基本使用.md
│ ├── Hive分区表和分桶表.md
│ ├── Hive常用DDL操作.md
│ ├── Hive常用DML操作.md
│ ├── Hive数据查询详解.md
│ ├── Hive简介及核心概念.md
│ ├── Hive视图和索引.md
│ ├── Kafka消费者详解.md
│ ├── Kafka深入理解分区副本机制.md
│ ├── Kafka生产者详解.md
│ ├── Kafka简介.md
│ ├── Scala函数和闭包.md
│ ├── Scala列表和集.md
│ ├── Scala基本数据类型和运算符.md
│ ├── Scala数组.md
│ ├── Scala映射和元组.md
│ ├── Scala模式匹配.md
│ ├── Scala流程控制语句.md
│ ├── Scala简介及开发环境配置.md
│ ├── Scala类和对象.md
│ ├── Scala类型参数.md
│ ├── Scala继承和特质.md
│ ├── Scala隐式转换和隐式参数.md
│ ├── Scala集合类型.md
│ ├── SparkSQL_Dataset和DataFrame简介.md
│ ├── SparkSQL外部数据源.md
│ ├── SparkSQL常用聚合函数.md
│ ├── SparkSQL联结操作.md
│ ├── Spark_RDD.md
│ ├── Spark_Streaming与流处理.md
│ ├── Spark_Streaming基本操作.md
│ ├── Spark_Streaming整合Flume.md
│ ├── Spark_Streaming整合Kafka.md
│ ├── Spark_Structured_API的基本使用.md
│ ├── Spark_Transformation和Action算子.md
│ ├── Spark简介.md
│ ├── Spark累加器与广播变量.md
│ ├── Spark部署模式与作业提交.md
│ ├── Spring+Mybtais+Phoenix整合.md
│ ├── Sqoop基本使用.md
│ ├── Sqoop简介与安装.md
│ ├── Storm三种打包方式对比分析.md
│ ├── Storm和流处理简介.md
│ ├── Storm核心概念详解.md
│ ├── Storm编程模型详解.md
│ ├── Storm集成HBase和HDFS.md
│ ├── Storm集成Kakfa.md
│ ├── Storm集成Redis详解.md
│ ├── Zookeeper_ACL权限控制.md
│ ├── Zookeeper_Java客户端Curator.md
│ ├── Zookeeper常用Shell命令.md
│ ├── Zookeeper简介及核心概念.md
│ ├── installation/
│ │ ├── Azkaban_3.x_编译及部署.md
│ │ ├── Flink_Standalone_Cluster.md
│ │ ├── HBase单机环境搭建.md
│ │ ├── HBase集群环境搭建.md
│ │ ├── Hadoop单机环境搭建.md
│ │ ├── Hadoop集群环境搭建.md
│ │ ├── Linux下Flume的安装.md
│ │ ├── Linux下JDK安装.md
│ │ ├── Linux下Python安装.md
│ │ ├── Linux环境下Hive的安装部署.md
│ │ ├── Spark开发环境搭建.md
│ │ ├── Spark集群环境搭建.md
│ │ ├── Storm单机环境搭建.md
│ │ ├── Storm集群环境搭建.md
│ │ ├── Zookeeper单机环境和集群环境搭建.md
│ │ ├── 基于Zookeeper搭建Hadoop高可用集群.md
│ │ ├── 基于Zookeeper搭建Kafka高可用集群.md
│ │ └── 虚拟机静态IP及多IP配置.md
│ ├── 大数据学习路线.md
│ ├── 大数据常用软件安装指南.md
│ ├── 大数据应用常用打包方式.md
│ ├── 大数据技术栈思维导图.md
│ └── 资料分享与工具推荐.md
├── pictures/
│ ├── bigdata-notes-icon.psd
│ └── 大数据技术栈思维导图.xmind
└── resources/
├── csv/
│ └── dept.csv
├── json/
│ ├── dept.json
│ └── emp.json
├── mysql-connector-java-5.1.47.jar
├── orc/
│ └── dept.orc
├── parquet/
│ ├── dept.parquet
│ └── emp.parquet
├── tsv/
│ ├── dept.tsv
│ └── emp.tsv
└── txt/
├── dept.txt
└── emp.txt
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
*#
*.iml
*.ipr
*.iws
*.sw?
*~
.#*
.*.md.html
.DS_Store
.classpath
.factorypath
.gradle
.idea
.metadata
.project
.recommenders
.settings
.springBeans
/build
MANIFEST.MF
_site/
activemq-data
bin
build
build.log
dependency-reduced-pom.xml
dump.rdb
interpolated*.xml
lib/
manifest.yml
overridedb.*
settings.xml
target
classes
out
logs
transaction-logs
.flattened-pom.xml
secrets.yml
.gradletasknamecache
.sts4-cache
================================================
FILE: README.md
================================================
# BigData-Notes
<div align="center"> <img width="444px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/bigdata-notes-icon.png"/> </div>
<br/>
**A Beginner's Guide to Big Data**
<table>
<tr>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hadoop.jpg"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hive.jpg"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/spark.jpg"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/storm.png"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/flink.png"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/hbase.png"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/kafka.png"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/zookeeper.jpg"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/flume.png"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/sqoop.png"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/azkaban.png"></th>
<th><img width="50px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/scala.jpg"></th>
</tr>
<tr>
<td align="center"><a href="#1-hadoop">Hadoop</a></td>
<td align="center"><a href="#2-hive">Hive</a></td>
<td align="center"><a href="#3-spark">Spark</a></td>
<td align="center"><a href="#4-storm">Storm</a></td>
<td align="center"><a href="#5-flink">Flink</a></td>
<td align="center"><a href="#6-hbase">HBase</a></td>
<td align="center"><a href="#7-kafka">Kafka</a></td>
<td align="center"><a href="#8-zookeeper">Zookeeper</a></td>
<td align="center"><a href="#9-flume">Flume</a></td>
<td align="center"><a href="#10-sqoop">Sqoop</a></td>
<td align="center"><a href="#11-azkaban">Azkaban</a></td>
<td align="center"><a href="#12-scala">Scala</a></td>
</tr>
</table>
<br/>
<div align="center">
<a href = "https://github.com/heibaiying/Full-Stack-Notes">
<img width="150px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/weixin.jpg"/>
</a>
</div>
<div align="center"> <strong> For offline reading, send "bigdata" to the WeChat official account to get the offline edition of "A Beginner's Guide to Big Data". </strong> </div>
<br/>
## :black_nib: Preface
1. [Big data learning roadmap](notes/大数据学习路线.md)
2. [Big data technology stack mind map](notes/大数据技术栈思维导图.md)
3. [Installation guide for common big data software](notes/大数据常用软件安装指南.md)
## 1. Hadoop
1. [Distributed file storage system: HDFS](notes/Hadoop-HDFS.md)
2. [Distributed computing framework: MapReduce](notes/Hadoop-MapReduce.md)
3. [Cluster resource manager: YARN](notes/Hadoop-YARN.md)
4. [Setting up a single-node pseudo-distributed Hadoop environment](notes/installation/Hadoop单机环境搭建.md)
5. [Setting up a Hadoop cluster](notes/installation/Hadoop集群环境搭建.md)
6. [Common HDFS shell commands](notes/HDFS常用Shell命令.md)
7. [Using the HDFS Java API](notes/HDFS-Java-API.md)
8. [Building a highly available Hadoop cluster with Zookeeper](notes/installation/基于Zookeeper搭建Hadoop高可用集群.md)
## 2. Hive
1. [Introduction to Hive and its core concepts](notes/Hive简介及核心概念.md)
2. [Installing and deploying Hive on Linux](notes/installation/Linux环境下Hive的安装部署.md)
3. [Basic usage of the Hive CLI and Beeline command lines](notes/HiveCLI和Beeline命令行的基本使用.md)
4. [Common Hive DDL operations](notes/Hive常用DDL操作.md)
5. [Hive partitioned tables and bucketed tables](notes/Hive分区表和分桶表.md)
6. [Hive views and indexes](notes/Hive视图和索引.md)
7. [Common Hive DML operations](notes/Hive常用DML操作.md)
8. [Hive data queries in detail](notes/Hive数据查询详解.md)
## 3. Spark
**Spark Core:**
1. [Introduction to Spark](notes/Spark简介.md)
2. [Setting up a Spark development environment](notes/installation/Spark开发环境搭建.md)
3. [Resilient Distributed Datasets (RDD)](notes/Spark_RDD.md)
4. [Common RDD operators in detail](notes/Spark_Transformation和Action算子.md)
5. [Spark deployment modes and job submission](notes/Spark部署模式与作业提交.md)
6. [Spark accumulators and broadcast variables](notes/Spark累加器与广播变量.md)
7. [Building a highly available Spark cluster with Zookeeper](notes/installation/Spark集群环境搭建.md)
**Spark SQL:**
1. [DataFrame and Dataset](notes/SparkSQL_Dataset和DataFrame简介.md)
2. [Basic usage of the Structured API](notes/Spark_Structured_API的基本使用.md)
3. [Spark SQL external data sources](notes/SparkSQL外部数据源.md)
4. [Common Spark SQL aggregate functions](notes/SparkSQL常用聚合函数.md)
5. [Spark SQL JOIN operations](notes/SparkSQL联结操作.md)
**Spark Streaming:**
1. [Introduction to Spark Streaming](notes/Spark_Streaming与流处理.md)
2. [Basic Spark Streaming operations](notes/Spark_Streaming基本操作.md)
3. [Integrating Spark Streaming with Flume](notes/Spark_Streaming整合Flume.md)
4. [Integrating Spark Streaming with Kafka](notes/Spark_Streaming整合Kafka.md)
## 4. Storm
1. [Introduction to Storm and stream processing](notes/Storm和流处理简介.md)
2. [Storm core concepts in detail](notes/Storm核心概念详解.md)
3. [Setting up a single-node Storm environment](notes/installation/Storm单机环境搭建.md)
4. [Setting up a Storm cluster](notes/installation/Storm集群环境搭建.md)
5. [The Storm programming model in detail](notes/Storm编程模型详解.md)
6. [Comparison of three packaging approaches for Storm projects](notes/Storm三种打包方式对比分析.md)
7. [Integrating Storm with Redis in detail](notes/Storm集成Redis详解.md)
8. [Integrating Storm with HDFS/HBase](notes/Storm集成HBase和HDFS.md)
9. [Integrating Storm with Kafka](notes/Storm集成Kakfa.md)
## 5. Flink
1. [Overview of Flink core concepts](notes/Flink核心概念综述.md)
2. [Setting up a Flink development environment](notes/Flink开发环境搭建.md)
3. [Flink Data Source](notes/Flink_Data_Source.md)
4. [Flink Data Transformation](notes/Flink_Data_Transformation.md)
5. [Flink Data Sink](notes/Flink_Data_Sink.md)
6. [The Flink window model](notes/Flink_Windows.md)
7. [Flink state management and checkpointing](notes/Flink状态管理与检查点机制.md)
8. [Deploying a Flink Standalone cluster](notes/installation/Flink_Standalone_Cluster.md)
## 6. HBase
1. [Introduction to HBase](notes/Hbase简介.md)
2. [HBase system architecture and data structures](notes/Hbase系统架构及数据结构.md)
3. [Setting up a basic HBase environment (standalone / pseudo-distributed mode)](notes/installation/HBase单机环境搭建.md)
4. [Setting up an HBase cluster](notes/installation/HBase集群环境搭建.md)
5. [Common HBase shell commands](notes/Hbase_Shell.md)
6. [HBase Java API](notes/Hbase_Java_API.md)
7. [HBase filters in detail](notes/Hbase过滤器详解.md)
8. [HBase coprocessors in detail](notes/Hbase协处理器详解.md)
9. [HBase disaster recovery and backup](notes/Hbase容灾与备份.md)
10. [Phoenix: the SQL layer for HBase](notes/Hbase的SQL中间层_Phoenix.md)
11. [Integrating Spring/Spring Boot with Mybatis + Phoenix](notes/Spring+Mybtais+Phoenix整合.md)
## 7. Kafka
1. [Introduction to Kafka](notes/Kafka简介.md)
2. [Building a highly available Kafka cluster with Zookeeper](notes/installation/基于Zookeeper搭建Kafka高可用集群.md)
3. [Kafka producers in detail](notes/Kafka生产者详解.md)
4. [Kafka consumers in detail](notes/Kafka消费者详解.md)
5. [Understanding Kafka's replication mechanism in depth](notes/Kafka深入理解分区副本机制.md)
## 8. Zookeeper
1. [Introduction to Zookeeper and its core concepts](notes/Zookeeper简介及核心概念.md)
2. [Setting up single-node and cluster Zookeeper environments](notes/installation/Zookeeper单机环境和集群环境搭建.md)
3. [Common Zookeeper shell commands](notes/Zookeeper常用Shell命令.md)
4. [Zookeeper Java client: Apache Curator](notes/Zookeeper_Java客户端Curator.md)
5. [Zookeeper ACL permission control](notes/Zookeeper_ACL权限控制.md)
## 9. Flume
1. [Introduction to Flume and basic usage](notes/Flume简介及基本使用.md)
2. [Installing and deploying Flume on Linux](notes/installation/Linux下Flume的安装.md)
3. [Integrating Flume with Kafka](notes/Flume整合Kafka.md)
## 10. Sqoop
1. [Introduction to Sqoop and installation](notes/Sqoop简介与安装.md)
2. [Basic usage of Sqoop](notes/Sqoop基本使用.md)
## 11. Azkaban
1. [Introduction to Azkaban](notes/Azkaban简介.md)
2. [Compiling and deploying Azkaban 3.x](notes/installation/Azkaban_3.x_编译及部署.md)
3. [Using Azkaban Flow 1.0](notes/Azkaban_Flow_1.0_的使用.md)
4. [Using Azkaban Flow 2.0](notes/Azkaban_Flow_2.0_的使用.md)
## 12. Scala
1. [Introduction to Scala and development environment setup](notes/Scala简介及开发环境配置.md)
2. [Basic data types and operators](notes/Scala基本数据类型和运算符.md)
3. [Control flow statements](notes/Scala流程控制语句.md)
4. [Arrays: Array](notes/Scala数组.md)
5. [Overview of collection types](notes/Scala集合类型.md)
6. [Common collection types: List & Set](notes/Scala列表和集.md)
7. [Common collection types: Map & Tuple](notes/Scala映射和元组.md)
8. [Classes and objects](notes/Scala类和对象.md)
9. [Inheritance and traits](notes/Scala继承和特质.md)
10. [Functions & closures & currying](notes/Scala函数和闭包.md)
11. [Pattern matching](notes/Scala模式匹配.md)
12. [Type parameters](notes/Scala类型参数.md)
13. [Implicit conversions and implicit parameters](notes/Scala隐式转换和隐式参数.md)
## 13. Common Content
1. [Common packaging approaches for big data applications](notes/大数据应用常用打包方式.md)
<br>
## :bookmark_tabs: Afterword
[Shared resources and recommended development tools](notes/资料分享与工具推荐.md)
<br>
<div align="center">
<a href = "https://blog.csdn.net/m0_37809146">
<img width="200px" src="https://gitee.com/heibaiying/BigData-Notes/raw/master/pictures/blog-logo.png"/>
</a>
</div>
<div align="center"> <a href = "https://blog.csdn.net/m0_37809146"> Follow my blog: https://blog.csdn.net/m0_37809146</a> </div>
================================================
FILE: code/Flink/flink-basis-java/pom.xml
================================================
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>flink-basis-java</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<name>Flink Quickstart Job</name>
<url>http://www.myorganization.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>1.9.0</flink.version>
<java.version>1.8</java.version>
<scala.binary.version>2.11</scala.binary.version>
<maven.compiler.source>${java.version}</maven.compiler.source>
<maven.compiler.target>${java.version}</maven.compiler.target>
</properties>
<repositories>
<repository>
<id>apache.snapshots</id>
<name>Apache Development Snapshot Repository</name>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
<dependencies>
<!-- Apache Flink dependencies -->
<!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Add connector dependencies here. They must be in the default scope (compile). -->
<!-- Example:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
-->
<!-- Add logging framework, to produce console output when running in the IDE. -->
<!-- These dependencies are excluded from the application JAR by default. -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.7</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>runtime</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.projectlombok/lombok -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.10</version>
<scope>provided</scope>
</dependency>
</dependencies>
<build>
<plugins>
<!-- Java Compiler -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugin>
<!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
<!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<exclude>org.apache.flink:force-shading</exclude>
<exclude>com.google.code.findbugs:jsr305</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>log4j:*</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<!-- Do not copy the signatures in the META-INF folder.
Otherwise, this might cause SecurityExceptions when using the JAR. -->
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.heibaiying.StreamingJob</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. -->
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<versionRange>[3.0.0,)</versionRange>
<goals>
<goal>shade</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<versionRange>[3.1,)</versionRange>
<goals>
<goal>testCompile</goal>
<goal>compile</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
<!-- This profile helps to make things run out of the box in IntelliJ -->
<!-- It adds Flink's core classes to the runtime class path. -->
<!-- Otherwise they are missing in IntelliJ, because the dependency is 'provided' -->
<profiles>
<profile>
<id>add-dependencies-for-IDEA</id>
<activation>
<property>
<name>idea.version</name>
</property>
</activation>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>compile</scope>
</dependency>
</dependencies>
</profile>
</profiles>
</project>
================================================
FILE: code/Flink/flink-basis-java/src/main/java/com/heibaiying/StreamingJob.java
================================================
package com.heibaiying;

import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StreamingJob {

    // Absolute path to this module's resources directory; adjust it to match your local checkout.
    private static final String ROOT_PATH = "D:\\BigData-Notes\\code\\Flink\\flink-basis-java\\src\\main\\resources\\";

    public static void main(String[] args) throws Exception {
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Read the text file line by line as a stream of strings.
        DataStreamSource<String> streamSource = env.readTextFile(ROOT_PATH + "log4j.properties");
        // With parallelism 1 the sink writes a single output file named "out".
        streamSource.writeAsText(ROOT_PATH + "out").setParallelism(1);
        // Nothing runs until execute() is called.
        env.execute();
    }
}
================================================
FILE: code/Flink/flink-basis-java/src/main/resources/log4j.properties
================================================
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
================================================
FILE: code/Flink/flink-basis-scala/pom.xml
================================================
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>flink-basis-scala</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<name>Flink Quickstart Job</name>
<url>http://www.myorganization.org</url>
<repositories>
<repository>
<id>apache.snapshots</id>
<name>Apache Development Snapshot Repository</name>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>1.9.0</flink.version>
<scala.binary.version>2.11</scala.binary.version>
<scala.version>2.11.12</scala.version>
</properties>
<dependencies>
<!-- Apache Flink dependencies -->
<!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Scala Library, provided by Flink as well. -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
<scope>provided</scope>
</dependency>
<!-- Add connector dependencies here. They must be in the default scope (compile). -->
<!-- Example:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
-->
<!-- Add logging framework, to produce console output when running in the IDE. -->
<!-- These dependencies are excluded from the application JAR by default. -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.7</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>runtime</scope>
</dependency>
</dependencies>
<build>
<plugins>
<!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
<!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<exclude>org.apache.flink:force-shading</exclude>
<exclude>com.google.code.findbugs:jsr305</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>log4j:*</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<!-- Do not copy the signatures in the META-INF folder.
Otherwise, this might cause SecurityExceptions when using the JAR. -->
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.heibaiying.StreamingJob</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
<!-- Java Compiler -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<!-- Scala Compiler -->
<plugin>
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.2</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
</execution>
</executions>
</plugin>
<!-- Eclipse Scala Integration -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-eclipse-plugin</artifactId>
<version>2.8</version>
<configuration>
<downloadSources>true</downloadSources>
<projectnatures>
<projectnature>org.scala-ide.sdt.core.scalanature</projectnature>
<projectnature>org.eclipse.jdt.core.javanature</projectnature>
</projectnatures>
<buildcommands>
<buildcommand>org.scala-ide.sdt.core.scalabuilder</buildcommand>
</buildcommands>
<classpathContainers>
<classpathContainer>org.scala-ide.sdt.launching.SCALA_CONTAINER</classpathContainer>
<classpathContainer>org.eclipse.jdt.launching.JRE_CONTAINER</classpathContainer>
</classpathContainers>
<excludes>
<exclude>org.scala-lang:scala-library</exclude>
<exclude>org.scala-lang:scala-compiler</exclude>
</excludes>
<sourceIncludes>
<sourceInclude>**/*.scala</sourceInclude>
<sourceInclude>**/*.java</sourceInclude>
</sourceIncludes>
</configuration>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<version>1.7</version>
<executions>
<!-- Add src/main/scala to eclipse build path -->
<execution>
<id>add-source</id>
<phase>generate-sources</phase>
<goals>
<goal>add-source</goal>
</goals>
<configuration>
<sources>
<source>src/main/scala</source>
</sources>
</configuration>
</execution>
<!-- Add src/test/scala to eclipse build path -->
<execution>
<id>add-test-source</id>
<phase>generate-test-sources</phase>
<goals>
<goal>add-test-source</goal>
</goals>
<configuration>
<sources>
<source>src/test/scala</source>
</sources>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
<!-- This profile helps to make things run out of the box in IntelliJ -->
<!-- It adds Flink's core classes to the runtime class path. -->
<!-- Otherwise they are missing in IntelliJ, because the dependency is 'provided' -->
<profiles>
<profile>
<id>add-dependencies-for-IDEA</id>
<activation>
<property>
<name>idea.version</name>
</property>
</activation>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
<scope>compile</scope>
</dependency>
</dependencies>
</profile>
</profiles>
</project>
================================================
FILE: code/Flink/flink-basis-scala/src/main/resources/log4j.properties
================================================
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
================================================
FILE: code/Flink/flink-basis-scala/src/main/resources/wordcount.txt
================================================
a,a,a,a,a
b,b,b
c,c
d,d
================================================
FILE: code/Flink/flink-basis-scala/src/main/scala/com/heibaiying/WordCountBatch.scala
================================================
package com.heibaiying
import org.apache.flink.api.scala._
object WordCountBatch {
def main(args: Array[String]): Unit = {
val benv = ExecutionEnvironment.getExecutionEnvironment
val dataSet = benv.readTextFile("D:\\BigData-Notes\\code\\Flink\\flink-basis-scala\\src\\main\\resources\\wordcount.txt")
dataSet.flatMap { _.toLowerCase.split(",")}
.filter (_.nonEmpty)
.map { (_, 1) }
.groupBy(0)
.sum(1)
.print()
}
}
================================================
FILE: code/Flink/flink-basis-scala/src/main/scala/com/heibaiying/WordCountStreaming.scala
================================================
package com.heibaiying
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
object WordCountStreaming {
def main(args: Array[String]): Unit = {
val senv = StreamExecutionEnvironment.getExecutionEnvironment
val dataStream: DataStream[String] = senv.socketTextStream("192.168.0.229", 9999, '\n')
dataStream.flatMap { line => line.toLowerCase.split(",") }
.filter(_.nonEmpty)
.map { word => (word, 1) }
.keyBy(0)
.timeWindow(Time.seconds(3))
.sum(1)
.print()
senv.execute("Streaming WordCount")
}
}
================================================
FILE: code/Flink/flink-kafka-integration/pom.xml
================================================
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>flink-kafka-integration</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<name>Flink Quickstart Job</name>
<url>http://www.myorganization.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>1.9.0</flink.version>
<java.version>1.8</java.version>
<scala.binary.version>2.11</scala.binary.version>
<maven.compiler.source>${java.version}</maven.compiler.source>
<maven.compiler.target>${java.version}</maven.compiler.target>
</properties>
<repositories>
<repository>
<id>apache.snapshots</id>
<name>Apache Development Snapshot Repository</name>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
<dependencies>
<!-- Apache Flink dependencies -->
<!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Add connector dependencies here. They must be in the default scope (compile). -->
<!-- Example:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
-->
<!-- Add logging framework, to produce console output when running in the IDE. -->
<!-- These dependencies are excluded from the application JAR by default. -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.7</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.16</version>
</dependency>
</dependencies>
<build>
<plugins>
<!-- Java Compiler -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugin>
<!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
<!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<exclude>org.apache.flink:force-shading</exclude>
<exclude>com.google.code.findbugs:jsr305</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>log4j:*</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<!-- Do not copy the signatures in the META-INF folder.
Otherwise, this might cause SecurityExceptions when using the JAR. -->
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.heibaiying.KafkaStreamingJob</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. -->
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<versionRange>[3.0.0,)</versionRange>
<goals>
<goal>shade</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<versionRange>[3.1,)</versionRange>
<goals>
<goal>testCompile</goal>
<goal>compile</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
<!-- This profile helps to make things run out of the box in IntelliJ -->
<!-- It adds Flink's core classes to the runtime class path. -->
<!-- Otherwise they are missing in IntelliJ, because the dependency is 'provided' -->
<profiles>
<profile>
<id>add-dependencies-for-IDEA</id>
<activation>
<property>
<name>idea.version</name>
</property>
</activation>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>compile</scope>
</dependency>
</dependencies>
</profile>
</profiles>
</project>
================================================
FILE: code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/CustomSinkJob.java
================================================
package com.heibaiying;
import com.heibaiying.bean.Employee;
import com.heibaiying.sink.FlinkToMySQLSink;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import java.sql.Date;
public class CustomSinkJob {
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Date date = new Date(System.currentTimeMillis());
DataStreamSource<Employee> streamSource = env.fromElements(
new Employee("hei", 10, date),
new Employee("bai", 20, date),
new Employee("ying", 30, date));
streamSource.addSink(new FlinkToMySQLSink());
env.execute();
}
}
================================================
FILE: code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/KafkaStreamingJob.java
================================================
package com.heibaiying;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;
import javax.annotation.Nullable;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
public class KafkaStreamingJob {
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// 1. Specify the Kafka configuration properties
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "192.168.0.229:9092");
// 2. Consume data from Kafka
DataStream<String> stream = env
.addSource(new FlinkKafkaConsumer<>("flink-stream-in-topic", new SimpleStringSchema(), properties));
// 3. Define how each result is converted into a Kafka ProducerRecord
KafkaSerializationSchema<String> kafkaSerializationSchema = new KafkaSerializationSchema<String>() {
@Override
public ProducerRecord<byte[], byte[]> serialize(String element, @Nullable Long timestamp) {
// Encode explicitly as UTF-8 instead of relying on the platform default charset
return new ProducerRecord<>("flink-stream-out-topic", element.getBytes(StandardCharsets.UTF_8));
}
};
// 4. Define the Flink Kafka producer
FlinkKafkaProducer<String> kafkaProducer = new FlinkKafkaProducer<>("flink-stream-out-topic",
kafkaSerializationSchema,
properties,
FlinkKafkaProducer.Semantic.AT_LEAST_ONCE, 5);
// 5. Concatenate each input element with itself (value + value) and write the result to Kafka
stream.map((MapFunction<String, String>) value -> value + value).addSink(kafkaProducer);
env.execute("Flink Streaming");
}
}
================================================
FILE: code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/bean/Employee.java
================================================
package com.heibaiying.bean;
import java.sql.Date;
public class Employee {
private String name;
private int age;
private Date birthday;
// Public no-argument constructor, so that Flink can treat Employee as a POJO type
public Employee() {}
public Employee(String name, int age, Date birthday) {
this.name = name;
this.age = age;
this.birthday = birthday;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public int getAge() {
return age;
}
public void setAge(int age) {
this.age = age;
}
public Date getBirthday() {
return birthday;
}
public void setBirthday(Date birthday) {
this.birthday = birthday;
}
}
================================================
FILE: code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/sink/FlinkToMySQLSink.java
================================================
package com.heibaiying.sink;
import com.heibaiying.bean.Employee;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
public class FlinkToMySQLSink extends RichSinkFunction<Employee> {
private PreparedStatement stmt;
private Connection conn;
@Override
public void open(Configuration parameters) throws Exception {
Class.forName("com.mysql.cj.jdbc.Driver");
conn = DriverManager.getConnection("jdbc:mysql://192.168.0.229:3306/employees?characterEncoding=UTF-8&serverTimezone=UTC&useSSL=false", "root", "123456");
String sql = "insert into emp(name, age, birthday) values(?, ?, ?)";
stmt = conn.prepareStatement(sql);
}
@Override
public void invoke(Employee value, Context context) throws Exception {
stmt.setString(1, value.getName());
stmt.setInt(2, value.getAge());
stmt.setDate(3, value.getBirthday());
stmt.executeUpdate();
}
@Override
public void close() throws Exception {
super.close();
if (stmt != null) {
stmt.close();
}
if (conn != null) {
conn.close();
}
}
}
================================================
FILE: code/Flink/flink-kafka-integration/src/main/resources/log4j.properties
================================================
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
================================================
FILE: code/Flink/flink-state-management/pom.xml
================================================
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>flink-state-management</artifactId>
<version>1.0</version>
<packaging>jar</packaging>
<name>Flink Quickstart Job</name>
<url>http://www.myorganization.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<flink.version>1.9.0</flink.version>
<java.version>1.8</java.version>
<scala.binary.version>2.11</scala.binary.version>
<maven.compiler.source>${java.version}</maven.compiler.source>
<maven.compiler.target>${java.version}</maven.compiler.target>
</properties>
<repositories>
<repository>
<id>apache.snapshots</id>
<name>Apache Development Snapshot Repository</name>
<url>https://repository.apache.org/content/repositories/snapshots/</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
<dependencies>
<!-- Apache Flink dependencies -->
<!-- These dependencies are provided, because they should not be packaged into the JAR file. -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>provided</scope>
</dependency>
<!-- Add connector dependencies here. They must be in the default scope (compile). -->
<!-- Example:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
-->
<!-- Add logging framework, to produce console output when running in the IDE. -->
<!-- These dependencies are excluded from the application JAR by default. -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.7</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-statebackend-rocksdb_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<!-- Java Compiler -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>${java.version}</source>
<target>${java.version}</target>
</configuration>
</plugin>
<!-- We use the maven-shade plugin to create a fat jar that contains all necessary dependencies. -->
<!-- Change the value of <mainClass>...</mainClass> if your program entry point changes. -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.0.0</version>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<exclude>org.apache.flink:force-shading</exclude>
<exclude>com.google.code.findbugs:jsr305</exclude>
<exclude>org.slf4j:*</exclude>
<exclude>log4j:*</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<!-- Do not copy the signatures in the META-INF folder.
Otherwise, this might cause SecurityExceptions when using the JAR. -->
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.heibaiying.keyedstate.KeyedStateJob</mainClass>
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<!-- This improves the out-of-the-box experience in Eclipse by resolving some warnings. -->
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<versionRange>[3.0.0,)</versionRange>
<goals>
<goal>shade</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<versionRange>[3.1,)</versionRange>
<goals>
<goal>testCompile</goal>
<goal>compile</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
<!-- This profile helps to make things run out of the box in IntelliJ -->
<!-- It adds Flink's core classes to the runtime class path. -->
<!-- Otherwise they are missing in IntelliJ, because the dependency is 'provided' -->
<profiles>
<profile>
<id>add-dependencies-for-IDEA</id>
<activation>
<property>
<name>idea.version</name>
</property>
</activation>
<dependencies>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
<scope>compile</scope>
</dependency>
</dependencies>
</profile>
</profiles>
</project>
================================================
FILE: code/Flink/flink-state-management/src/main/java/com/heibaiying/keyedstate/KeyedStateJob.java
================================================
package com.heibaiying.keyedstate;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class KeyedStateJob {
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStreamSource<Tuple2<String, Long>> tuple2DataStreamSource = env.fromElements(
Tuple2.of("a", 50L), Tuple2.of("a", 80L), Tuple2.of("a", 400L),
Tuple2.of("a", 100L), Tuple2.of("a", 200L), Tuple2.of("a", 200L),
Tuple2.of("b", 100L), Tuple2.of("b", 200L), Tuple2.of("b", 200L),
Tuple2.of("b", 500L), Tuple2.of("b", 600L), Tuple2.of("b", 700L));
tuple2DataStreamSource
.keyBy(0)
.flatMap(new ThresholdWarning(100L, 3))
.printToErr();
env.execute("Managed Keyed State");
}
}
================================================
FILE: code/Flink/flink-state-management/src/main/java/com/heibaiying/keyedstate/ThresholdWarning.java
================================================
package com.heibaiying.keyedstate;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.shaded.guava18.com.google.common.collect.Lists;
import org.apache.flink.util.Collector;
import java.util.ArrayList;
import java.util.List;
public class ThresholdWarning extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, List<Long>>> {
// ListState that stores the abnormal values seen for the current key
private transient ListState<Long> abnormalData;
// Threshold to monitor
private Long threshold;
// Number of abnormal values that triggers a warning
private Integer numberOfTimes;
ThresholdWarning(Long threshold, Integer numberOfTimes) {
this.threshold = threshold;
this.numberOfTimes = numberOfTimes;
}
@Override
public void open(Configuration parameters) {
// Look up the state instance by its name (handle); it is created automatically if it does not exist yet
abnormalData = getRuntimeContext().getListState(new ListStateDescriptor<>("abnormalData", Long.class));
}
@Override
public void flatMap(Tuple2<String, Long> value, Collector<Tuple2<String, List<Long>>> out) throws Exception {
Long inputValue = value.f1;
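// If the input value reaches or exceeds the threshold, record it as abnormal
if (inputValue >= threshold) {
abnormalData.add(inputValue);
}
ArrayList<Long> list = Lists.newArrayList(abnormalData.get().iterator());
// Once the number of abnormal values reaches the configured count, emit a warning
if (list.size() >= numberOfTimes) {
out.collect(Tuple2.of(value.f0 + " exceeded the threshold ", list));
// Clear the buffered state after the warning has been emitted
abnormalData.clear();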
}
}
}
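// Illustrative sketch, not part of the original job: a plain-Java simulation of the
// per-key logic in ThresholdWarning above. A HashMap stands in for Flink's keyed
// ListState, and the names ThresholdWarningSketch and simulate are hypothetical.
// It assumes the elements of each key arrive in order, as they do in KeyedStateJob.
class ThresholdWarningSketch {

    // Replays (key, value) pairs and returns one "key=[values]" string per warning.
    static java.util.List<String> simulate(String[] keys, long[] values,
                                           long threshold, int numberOfTimes) {
        java.util.Map<String, java.util.List<Long>> statePerKey = new java.util.HashMap<>();
        java.util.List<String> alerts = new java.util.ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            // Each key gets its own list, which mirrors how keyed state is scoped
            java.util.List<Long> abnormal =
                    statePerKey.computeIfAbsent(keys[i], k -> new java.util.ArrayList<>());
            if (values[i] >= threshold) {
                abnormal.add(values[i]);
            }
            if (abnormal.size() >= numberOfTimes) {
                alerts.add(keys[i] + "=" + abnormal);
                abnormal.clear();
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        // Same input as KeyedStateJob, with threshold 100 and numberOfTimes 3
        String[] keys = {"a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"};
        long[] values = {50, 80, 400, 100, 200, 200, 100, 200, 200, 500, 600, 700};
        // Prints:
        // a=[400, 100, 200]
        // b=[100, 200, 200]
        // b=[500, 600, 700]
        simulate(keys, values, 100L, 3).forEach(System.out::println);
    }
}
// In the real job the relative order of warnings for different keys may differ,
// since keys can be processed by different parallel subtasks.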
================================================
FILE: code/Flink/flink-state-management/src/main/java/com/heibaiying/keyedstate/ThresholdWarningWithTTL.java
================================================
package com.heibaiying.keyedstate;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.shaded.guava18.com.google.common.collect.Lists;
import org.apache.flink.util.Collector;
import java.util.ArrayList;
import java.util.List;
public class ThresholdWarningWithTTL extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, List<Long>>> {
private transient ListState<Long> abnormalData;
private Long threshold;
private Integer numberOfTimes;
ThresholdWarningWithTTL(Long threshold, Integer numberOfTimes) {
this.threshold = threshold;
this.numberOfTimes = numberOfTimes;
}
@Override
public void open(Configuration parameters) {
StateTtlConfig ttlConfig = StateTtlConfig
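// Set the time-to-live to 10 seconds
.newBuilder(Time.seconds(10))
// Update rule for the TTL: it is reset to the full 10 seconds on creation and on every write
.setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
// Expired values are never visible. The other option, ReturnExpiredIfNotCleanedUp, still returns an expired value as long as it has not been removed yet
.setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)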
.build();
ListStateDescriptor<Long> descriptor = new ListStateDescriptor<>("abnormalData", Long.class);
descriptor.enableTimeToLive(ttlConfig);
this.abnormalData = getRuntimeContext().getListState(descriptor);
}
@Override
public void flatMap(Tuple2<String, Long> value, Collector<Tuple2<String, List<Long>>> out) throws Exception {
Long inputValue = value.f1;
if (inputValue >= threshold) {
abnormalData.add(inputValue);
}
ArrayList<Long> list = Lists.newArrayList(abnormalData.get().iterator());
if (list.size() >= numberOfTimes) {
out.collect(Tuple2.of(value.f0 + " exceeded the threshold ", list));
abnormalData.clear();
}
}
}
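// Illustrative sketch, not Flink's implementation: a plain-Java model of what the TTL
// settings above mean for a list state. It assumes each list element carries its own
// write timestamp and becomes invisible once the TTL has elapsed (NeverReturnExpired).
// The class TtlListSketch and the explicit nowMillis parameter are hypothetical and
// exist only so that the behaviour can be followed without a running job.
class TtlListSketch {
    private final long ttlMillis;
    // Each entry is {value, writtenAtMillis}
    private final java.util.List<long[]> entries = new java.util.ArrayList<>();

    TtlListSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Writing an element stamps it with the current time (OnCreateAndWrite)
    void add(long value, long nowMillis) {
        entries.add(new long[]{value, nowMillis});
    }

    // Reading returns only the elements whose TTL has not elapsed yet
    java.util.List<Long> get(long nowMillis) {
        java.util.List<Long> visible = new java.util.ArrayList<>();
        for (long[] entry : entries) {
            if (nowMillis - entry[1] < ttlMillis) {
                visible.add(entry[0]);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        TtlListSketch state = new TtlListSketch(10_000L);
        state.add(400L, 0L);
        state.add(100L, 6_000L);
        System.out.println(state.get(9_000L));  // [400, 100]
        System.out.println(state.get(12_000L)); // [100]
    }
}
// Under this model a warning only fires when enough abnormal values are still
// visible at the same time, that is, when they were written within the TTL.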
================================================
FILE: code/Flink/flink-state-management/src/main/java/com/heibaiying/operatorstate/OperatorStateJob.java
================================================
package com.heibaiying.operatorstate;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class OperatorStateJob {
public static void main(String[] args) throws Exception {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Enable checkpointing (one checkpoint every 1000 ms)
env.enableCheckpointing(1000);
// Set the parallelism to 1
DataStreamSource<Tuple2<String, Long>> tuple2DataStreamSource = env.setParallelism(1).fromElements(
Tuple2.of("a", 50L), Tuple2.of("a", 80L), Tuple2.of("a", 400L),
Tuple2.of("a", 100L), Tuple2.of("a", 200L), Tuple2.of("a", 200L),
Tuple2.of("b", 100L), Tuple2.of("b", 200L), Tuple2.of("b", 200L),
Tuple2.of("b", 500L), Tuple2.of("b", 600L), Tuple2.of("b", 700L));
tuple2DataStreamSource
.flatMap(new ThresholdWarning(100L, 3))
.printToErr();
env.execute("Managed Operator State");
}
}
================================================
FILE: code/Flink/flink-state-management/src/main/java/com/heibaiying/operatorstate/ThresholdWarning.java
================================================
package com.heibaiying.operatorstate;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.util.Collector;
import java.util.ArrayList;
import java.util.List;
public class ThresholdWarning extends RichFlatMapFunction<Tuple2<String, Long>, Tuple2<String, List<Tuple2<String, Long>>>> implements CheckpointedFunction {
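// Buffer of abnormal records (values at or above the threshold)
private List<Tuple2<String, Long>> bufferedData;
// Operator state used to checkpoint the buffer
private transient ListState<Tuple2<String, Long>> checkPointedState;
// Threshold to monitor
private Long threshold;
// Number of abnormal records that triggers a warning
private Integer numberOfTimes;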
ThresholdWarning(Long threshold, Integer numberOfTimes) {
this.threshold = threshold;
this.numberOfTimes = numberOfTimes;
this.bufferedData = new ArrayList<>();
}
@Override
public void initializeState(FunctionInitializationContext context) throws Exception {
// Note: the state is obtained from the OperatorStateStore here
checkPointedState = context.getOperatorStateStore().getListState(new ListStateDescriptor<>("abnormalData",
TypeInformation.of(new TypeHint<Tuple2<String, Long>>() {
})));
// After a restart, restore the buffer from the snapshot
if (context.isRestored()) {
for (Tuple2<String, Long> element : checkPointedState.get()) {
bufferedData.add(element);
}
}
}
@Override
public void flatMap(Tuple2<String, Long> value, Collector<Tuple2<String, List<Tuple2<String, Long>>>> out) {
Long inputValue = value.f1;
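// Record values that reach or exceed the threshold
if (inputValue >= threshold) {
bufferedData.add(value);
}
// Emit a warning once the number of buffered records reaches the configured count
if (bufferedData.size() >= numberOfTimes) {
// Also print the hashCode of the state instance; emit a copy because the buffer is cleared right after
out.collect(Tuple2.of(checkPointedState.hashCode() + " threshold warning!", new ArrayList<>(bufferedData)));
bufferedData.clear();
}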
}
@Override
public void snapshotState(FunctionSnapshotContext context) throws Exception {
// When a snapshot is taken, write the buffered data into checkPointedState
checkPointedState.clear();
for (Tuple2<String, Long> element : bufferedData) {
checkPointedState.add(element);
}
}
}
================================================
FILE: code/Flink/flink-state-management/src/main/resources/log4j.properties
================================================
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
================================================
FILE: code/Hadoop/hadoop-word-count/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>hadoop-word-count</artifactId>
<version>1.0</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
</plugins>
</build>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<hadoop.version>2.6.0-cdh5.15.2</hadoop.version>
</properties>
<!-- Configure the CDH repository -->
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<!--Hadoop-client-->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.8.1</version>
</dependency>
</dependencies>
</project>
================================================
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountApp.java
================================================
package com.heibaiying;
import com.heibaiying.component.WordCountMapper;
import com.heibaiying.component.WordCountReducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.net.URI;
/**
* Assemble the job and submit it to the cluster
*/
public class WordCountApp {
// The parameters are hard-coded here for clarity; in real projects they can be passed in externally
private static final String HDFS_URL = "hdfs://192.168.0.107:8020";
private static final String HADOOP_USER_NAME = "root";
public static void main(String[] args) throws Exception {
// The input and output paths are passed in as program arguments
if (args.length < 2) {
System.out.println("Input and output paths are necessary!");
return;
}
// The Hadoop user name must be set; otherwise creating directories on HDFS may fail with a permission error
System.setProperty("HADOOP_USER_NAME", HADOOP_USER_NAME);
Configuration configuration = new Configuration();
// Specify the HDFS address
configuration.set("fs.defaultFS", HDFS_URL);
// Create a Job
Job job = Job.getInstance(configuration);
// Set the main class of the job
job.setJarByClass(WordCountApp.class);
// Set the Mapper and Reducer
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
// Set the key and value types of the Mapper output
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// Set the key and value types of the Reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// If the output directory already exists it must be deleted first; otherwise rerunning the program throws an exception
FileSystem fileSystem = FileSystem.get(new URI(HDFS_URL), configuration, HADOOP_USER_NAME);
Path outputPath = new Path(args[1]);
if (fileSystem.exists(outputPath)) {
fileSystem.delete(outputPath, true);
}
// Set the input and output paths of the job
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, outputPath);
// Submit the job to the cluster and wait for it to finish; passing true prints the progress
boolean result = job.waitForCompletion(true);
// Close the fileSystem created earlier
fileSystem.close();
// Exit the JVM with a status code that reflects the job result
System.exit(result ? 0 : -1);
}
}
================================================
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountCombinerApp.java
================================================
package com.heibaiying;
import com.heibaiying.component.WordCountMapper;
import com.heibaiying.component.WordCountReducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.net.URI;
/**
* Assemble the job (with a Combiner) and submit it to the cluster
*/
public class WordCountCombinerApp {
// The parameters are hard-coded here for clarity; in real projects they can be passed in externally
private static final String HDFS_URL = "hdfs://192.168.0.107:8020";
private static final String HADOOP_USER_NAME = "root";
public static void main(String[] args) throws Exception {
// The input and output paths are passed in as program arguments
if (args.length < 2) {
System.out.println("Input and output paths are necessary!");
return;
}
// The Hadoop user name must be set; otherwise creating directories on HDFS may fail with a permission error
System.setProperty("HADOOP_USER_NAME", HADOOP_USER_NAME);
Configuration configuration = new Configuration();
// Specify the HDFS address
configuration.set("fs.defaultFS", HDFS_URL);
// Create a Job
Job job = Job.getInstance(configuration);
// Set the main class of the job
job.setJarByClass(WordCountCombinerApp.class);
// Set the Mapper and Reducer
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
// Set the Combiner (the Reducer is reused because summing counts is associative)
job.setCombinerClass(WordCountReducer.class);
// Set the key and value types of the Mapper output
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// Set the key and value types of the Reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// If the output directory already exists it must be deleted first; otherwise rerunning the program throws an exception
FileSystem fileSystem = FileSystem.get(new URI(HDFS_URL), configuration, HADOOP_USER_NAME);
Path outputPath = new Path(args[1]);
if (fileSystem.exists(outputPath)) {
fileSystem.delete(outputPath, true);
}
// Set the input and output paths of the job
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, outputPath);
// Submit the job to the cluster and wait for it to finish; passing true prints the progress
boolean result = job.waitForCompletion(true);
// Close the fileSystem created earlier
fileSystem.close();
// Exit the JVM with a status code that reflects the job result
System.exit(result ? 0 : -1);
}
}
================================================
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountCombinerPartitionerApp.java
================================================
package com.heibaiying;
import com.heibaiying.component.CustomPartitioner;
import com.heibaiying.component.WordCountMapper;
import com.heibaiying.component.WordCountReducer;
import com.heibaiying.utils.WordCountDataUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.net.URI;
/**
* Assemble the job (with a Combiner and a custom Partitioner) and submit it to the cluster
*/
public class WordCountCombinerPartitionerApp {
// The parameters are hard-coded here for clarity; in real projects they can be passed in externally
private static final String HDFS_URL = "hdfs://192.168.0.107:8020";
private static final String HADOOP_USER_NAME = "root";
public static void main(String[] args) throws Exception {
// The input and output paths are passed in as program arguments
if (args.length < 2) {
System.out.println("Input and output paths are necessary!");
return;
}
// The Hadoop user name must be set; otherwise creating directories on HDFS may fail with a permission error
System.setProperty("HADOOP_USER_NAME", HADOOP_USER_NAME);
Configuration configuration = new Configuration();
// Specify the HDFS address
configuration.set("fs.defaultFS", HDFS_URL);
// Create a Job
Job job = Job.getInstance(configuration);
// Set the main class of the job
job.setJarByClass(WordCountCombinerPartitionerApp.class);
// Set the Mapper and Reducer
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
// Set the Combiner
job.setCombinerClass(WordCountReducer.class);
// Set the custom partitioner
job.setPartitionerClass(CustomPartitioner.class);
// Set the number of reduce tasks (one per word in WORD_LIST)
job.setNumReduceTasks(WordCountDataUtils.WORD_LIST.size());
// Set the key and value types of the Mapper output
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
// Set the key and value types of the Reducer output
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// If the output directory already exists it must be deleted first; otherwise rerunning the program throws an exception
FileSystem fileSystem = FileSystem.get(new URI(HDFS_URL), configuration, HADOOP_USER_NAME);
Path outputPath = new Path(args[1]);
if (fileSystem.exists(outputPath)) {
fileSystem.delete(outputPath, true);
}
// Set the input and output paths of the job
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, outputPath);
// Submit the job to the cluster and wait for it to finish; passing true prints the progress
boolean result = job.waitForCompletion(true);
// Close the fileSystem created earlier
fileSystem.close();
// Exit the JVM with a status code that reflects the job result
System.exit(result ? 0 : -1);
}
}
================================================
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/CustomPartitioner.java
================================================
package com.heibaiying.component;
import com.heibaiying.utils.WordCountDataUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
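/**
* Custom partitioner that assigns each word to its own partition
*/
public class CustomPartitioner extends Partitioner<Text, IntWritable> {
@Override
public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
// Assumes every key is in WORD_LIST; indexOf returns -1 for an unknown word, which is not a valid partition
return WordCountDataUtils.WORD_LIST.indexOf(text.toString());
}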
}
================================================
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/WordCountMapper.java
================================================
package com.heibaiying.component;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
/**
* Split each line on the tab delimiter and emit (word, 1) for every word
*/
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String[] words = value.toString().split("\t");
for (String word : words) {
context.write(new Text(word), new IntWritable(1));
}
}
}
================================================
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/WordCountReducer.java
================================================
package com.heibaiying.component;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
/**
* Sum the counts for each word
*/
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int count = 0;
for (IntWritable value : values) {
count += value.get();
}
context.write(key, new IntWritable(count));
}
}
================================================
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/utils/WordCountDataUtils.java
================================================
package com.heibaiying.utils;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;
/**
* Generate mock data for the word count examples
*/
public class WordCountDataUtils {
public static final List<String> WORD_LIST = Arrays.asList("Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive");
/**
* Generate mock word-frequency data
*
* @return the generated data
*/
private static String generateData() {
StringBuilder builder = new StringBuilder();
// Shuffle a copy so the shared WORD_LIST keeps its order (CustomPartitioner relies on its indexes)
List<String> words = new ArrayList<>(WORD_LIST);
Random random = new Random();
for (int i = 0; i < 1000; i++) {
Collections.shuffle(words);
// Each line holds between 1 and WORD_LIST.size() words
int endIndex = random.nextInt(words.size()) + 1;
String line = StringUtils.join(words.toArray(), "\t", 0, endIndex);
builder.append(line).append("\n");
}
return builder.toString();
}
/**
* Generate mock word-frequency data and write it to a local file
*
* @param outputPath path of the output file
*/
private static void generateDataToLocal(String outputPath) {
try {
java.nio.file.Path path = Paths.get(outputPath);
if (Files.exists(path)) {
Files.delete(path);
}
Files.write(path, generateData().getBytes(), StandardOpenOption.CREATE);
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* Generate mock word-frequency data and write it to HDFS
*
* @param hdfsUrl HDFS address
* @param user Hadoop user name
* @param outputPathString destination path on HDFS
*/
private static void generateDataToHDFS(String hdfsUrl, String user, String outputPathString) {
// try-with-resources closes the file system and the stream even if a write fails
try (FileSystem fileSystem = FileSystem.get(new URI(hdfsUrl), new Configuration(), user)) {
Path outputPath = new Path(outputPathString);
if (fileSystem.exists(outputPath)) {
fileSystem.delete(outputPath, true);
}
try (FSDataOutputStream out = fileSystem.create(outputPath)) {
out.write(generateData().getBytes());
out.flush();
}
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
//generateDataToLocal("input.txt");
generateDataToHDFS("hdfs://192.168.0.107:8020", "root", "/wordcount/input.txt");
}
}
================================================
FILE: code/Hadoop/hadoop-word-count/src/main/resources/log4j.properties
================================================
log4j.rootLogger=INFO,CONSOLE
log4j.additivity.org.apache=false
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout.ConversionPattern=%d{yyyy-MM-dd HH\:mm\:ss} -%-4r [%t] %-5p %x - %m%n
log4j.appender.CONSOLE.Target=System.out
log4j.appender.CONSOLE.Encoding=UTF-8
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
================================================
FILE: code/Hadoop/hdfs-java-api/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>hdfs-java-api</artifactId>
<version>1.0</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<hadoop.version>2.6.0-cdh5.15.2</hadoop.version>
</properties>
<!-- Configure the CDH repository -->
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<!--Hadoop-client-->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
================================================
FILE: code/Hadoop/hdfs-java-api/src/main/java/com/heibaiying/utils/HdfsUtils.java
================================================
package com.heibaiying.utils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URISyntaxException;
/**
* HDFS utility class
*/
public class HdfsUtils {
private static final String HDFS_PATH = "hdfs://192.168.0.107:8020";
private static final String HDFS_USER = "root";
private static FileSystem fileSystem;
static {
try {
Configuration configuration = new Configuration();
configuration.set("dfs.replication", "1");
fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, HDFS_USER);
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (URISyntaxException e) {
e.printStackTrace();
}
}
public static FileSystem getFileSystem() {
return fileSystem;
}
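/**
* Create a directory; missing parent directories are created as well
*
* @param path directory path
* @return whether the directory was created successfully
*/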
public static boolean mkdir(String path) throws Exception {
return fileSystem.mkdirs(new Path(path));
}
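/**
* Read the content of a file
*
* @param path file path
* @param encode character encoding used to decode the file
* @return the file content as a string
*/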
public static String text(String path, String encode) throws Exception {
FSDataInputStream inputStream = fileSystem.open(new Path(path));
return inputStreamToString(inputStream, encode);
}
/**
* Create a file and write content to it
*
* @param path file path
* @param context file content
*/
public void createAndWrite(String path, String context) throws Exception {
// try-with-resources closes the stream even if the write fails
try (FSDataOutputStream out = fileSystem.create(new Path(path))) {
out.write(context.getBytes());
out.flush();
}
}
/**
* Rename a file
*
* @param oldPath current file path
* @param newPath new file path
* @return whether the rename succeeded
*/
public boolean rename(String oldPath, String newPath) throws Exception {
return fileSystem.rename(new Path(oldPath), new Path(newPath));
}
/**
* Copy a local file to HDFS
*
* @param localPath local file path
* @param hdfsPath destination path on HDFS
*/
public void copyFromLocalFile(String localPath, String hdfsPath) throws Exception {
fileSystem.copyFromLocalFile(new Path(localPath), new Path(hdfsPath));
}
/**
* Download a file from HDFS
*
* @param hdfsPath path of the file on HDFS
* @param localPath local destination path
*/
public void copyToLocalFile(String hdfsPath, String localPath) throws Exception {
fileSystem.copyToLocalFile(new Path(hdfsPath), new Path(localPath));
}
/**
* List the status of the files and directories under the given path
*
* @param path directory path
* @return array of file status objects
*/
public FileStatus[] listFiles(String path) throws Exception {
return fileSystem.listStatus(new Path(path));
}
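/**
* List the status and block locations of the files under the given path
*
* @param path a directory path or a file path
* @param recursive whether to descend into subdirectories
* @return an iterator over the file status objects
*/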
public RemoteIterator<LocatedFileStatus> listFilesRecursive(String path, boolean recursive) throws Exception {
return fileSystem.listFiles(new Path(path), recursive);
}
/**
* Get the block locations of a file
*
* @param path file path
* @return array of block locations
*/
public BlockLocation[] getFileBlockLocations(String path) throws Exception {
FileStatus fileStatus = fileSystem.getFileStatus(new Path(path));
return fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
}
/**
* Delete a file or directory (directories are deleted recursively)
*
* @param path file path
* @return whether the deletion succeeded
*/
public boolean delete(String path) throws Exception {
return fileSystem.delete(new Path(path), true);
}
/**
* Convert an input stream to a string using the given encoding
*
* @param inputStream input stream
* @param encode character encoding; defaults to UTF-8 when null or empty
*/
private static String inputStreamToString(InputStream inputStream, String encode) {
if (encode == null || ("".equals(encode))) {
encode = "utf-8";
}
// try-with-resources closes the reader and the underlying stream when done
try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, encode))) {
StringBuilder builder = new StringBuilder();
String str;
while ((str = reader.readLine()) != null) {
builder.append(str).append("\n");
}
return builder.toString();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
}
================================================
FILE: code/Hadoop/hdfs-java-api/src/test/java/HdfsTest.java
================================================
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;
/**
* Common HDFS API operations
*/
public class HdfsTest {
private static final String HDFS_PATH = "hdfs://192.168.0.106:8020";
private static final String HDFS_USER = "root";
private static FileSystem fileSystem;
/**
* Obtain the FileSystem
*/
@Before
public void prepare() {
try {
Configuration configuration = new Configuration();
// A single-node Hadoop is used here, so the replication factor can be set to 1; if it is not set, the default is 3
configuration.set("dfs.replication", "1");
fileSystem = FileSystem.get(new URI(HDFS_PATH), configuration, HDFS_USER);
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (URISyntaxException e) {
e.printStackTrace();
}
}
/**
* Create a directory; missing parent directories are created as well
*/
@Test
public void mkDir() throws Exception {
fileSystem.mkdirs(new Path("/hdfs-api/test0/"));
}
/**
* Create a directory with the specified permissions
*/
@Test
public void mkDirWithPermission() throws Exception {
fileSystem.mkdirs(new Path("/hdfs-api/test1/"),
new FsPermission(FsAction.READ_WRITE, FsAction.READ, FsAction.READ));
}
/**
* Create a file and write content to it
*/
@Test
public void create() throws Exception {
// An existing file is overwritten by default, which the second argument controls; the third argument sets the buffer size
FSDataOutputStream out = fileSystem.create(new Path("/hdfs-api/test/a.txt"),
true, 4096);
out.write("hello hadoop!".getBytes());
out.write("hello spark!".getBytes());
out.write("hello flink!".getBytes());
// Force the buffered content to be flushed
out.flush();
out.close();
}
/**
* Check whether a file exists
*/
@Test
public void exist() throws Exception {
boolean exists = fileSystem.exists(new Path("/hdfs-api/test/a.txt"));
System.out.println(exists);
}
/**
* Read the content of a file
*/
@Test
public void readToString() throws Exception {
FSDataInputStream inputStream = fileSystem.open(new Path("/hdfs-api/test/a.txt"));
String context = inputStreamToString(inputStream, "utf-8");
System.out.println(context);
}
/**
* Rename a file
*/
@Test
public void rename() throws Exception {
Path oldPath = new Path("/hdfs-api/test/a.txt");
Path newPath = new Path("/hdfs-api/test/b.txt");
boolean result = fileSystem.rename(oldPath, newPath);
System.out.println(result);
}
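/**
* Delete a file
*/
@Test
public void delete() throws Exception {
/*
* The second argument indicates whether to delete recursively:
* + if path is a directory and recursive is true, the directory and everything in it are deleted;
* + if path is a non-empty directory and recursive is false, an exception is thrown.
*/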
boolean result = fileSystem.delete(new Path("/hdfs-api/test/b.txt"), true);
System.out.println(result);
}
/**
* Upload a file to HDFS
*/
@Test
public void copyFromLocalFile() throws Exception {
// If the source is a directory, the directory and the files in it are copied to the destination directory
Path src = new Path("D:\\BigData-Notes\\notes\\installation");
Path dst = new Path("/hdfs-api/test/");
fileSystem.copyFromLocalFile(src, dst);
}
/**
* Upload a large file to HDFS and print the upload progress
*/
@Test
public void copyFromLocalBigFile() throws Exception {
File file = new File("D:\\kafka.tgz");
final float fileSize = file.length();
InputStream in = new BufferedInputStream(new FileInputStream(file));
FSDataOutputStream out = fileSystem.create(new Path("/hdfs-api/test/kafka5.tgz"),
new Progressable() {
long fileCount = 0;
public void progress() {
fileCount++;
// progress() is called roughly once for every 64 KB uploaded
System.out.println("Upload progress: " + (fileCount * 64 * 1024 / fileSize) * 100 + " %");
}
});
// Passing true closes both streams once the copy finishes
IOUtils.copyBytes(in, out, 4096, true);
}
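/**
* Download a file from HDFS
*/
@Test
public void copyToLocalFile() throws Exception {
Path src = new Path("/hdfs-api/test/kafka.tgz");
Path dst = new Path("D:\\app\\");
/*
* The first argument controls whether the source file is deleted after the download; the default is false (the source is kept).
* The last argument controls whether RawLocalFileSystem is used as the local file system;
* it defaults to false and usually does not need to be set,
* but if a NullPointerException is thrown at run time, the local file system may be incompatible with the program (common on Windows),
* in which case it can be set to true.
*/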
fileSystem.copyToLocalFile(false, src, dst, true);
}
/**
* List the status of all files under the given directory
*/
@Test
public void listFiles() throws Exception {
FileStatus[] statuses = fileSystem.listStatus(new Path("/hdfs-api"));
for (FileStatus fileStatus : statuses) {
// FileStatus overrides toString, so printing it directly shows all of its fields
System.out.println(fileStatus.toString());
}
}
/**
* Recursively list the status of all files under the given directory
*/
@Test
public void listFilesRecursive() throws Exception {
RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(new Path("/hbase"), true);
while (files.hasNext()) {
System.out.println(files.next());
}
}
/**
* Get the block locations of a file
*/
@Test
public void getFileBlockLocations() throws Exception {
FileStatus fileStatus = fileSystem.getFileStatus(new Path("/hdfs-api/test/kafka.tgz"));
BlockLocation[] blocks = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
for (BlockLocation block : blocks) {
System.out.println(block);
}
}
/**
* Release the fileSystem reference after each test
*/
@After
public void destroy() {
fileSystem = null;
}
/**
* Convert an input stream to a string using the given encoding
*
* @param inputStream input stream
* @param encode character encoding
*/
private static String inputStreamToString(InputStream inputStream, String encode) {
try {
if (encode == null || ("".equals(encode))) {
encode = "utf-8";
}
BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, encode));
StringBuilder builder = new StringBuilder();
String str = "";
while ((str = reader.readLine()) != null) {
builder.append(str).append("\n");
}
return builder.toString();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
}
================================================
FILE: code/Hbase/hbase-java-api-1.x/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>hbase-java-api-1.x</artifactId>
<version>1.0-SNAPSHOT</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
================================================
FILE: code/Hbase/hbase-java-api-1.x/src/main/java/com/heibaiying/HBaseUtils.java
================================================
package com.heibaiying;
import javafx.util.Pair;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;
public class HBaseUtils {
private static Connection connection;
static {
Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.property.clientPort", "2181");
// For a cluster, separate the host names with commas
configuration.set("hbase.zookeeper.quorum", "hadoop001");
try {
connection = ConnectionFactory.createConnection(configuration);
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* Create an HBase table
*
* @param tableName table name
* @param columnFamilies list of column family names
* @return true if the table was created; false if it already exists or creation failed
*/
public static boolean createTable(String tableName, List<String> columnFamilies) {
try {
HBaseAdmin admin = (HBaseAdmin) connection.getAdmin();
if (admin.tableExists(tableName)) {
return false;
}
HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf(tableName));
columnFamilies.forEach(columnFamily -> {
HColumnDescriptor columnDescriptor = new HColumnDescriptor(columnFamily);
columnDescriptor.setMaxVersions(1);
tableDescriptor.addFamily(columnDescriptor);
});
admin.createTable(tableDescriptor);
} catch (IOException e) {
e.printStackTrace();
return false;
}
return true;
}
/**
* Delete an HBase table
*
* @param tableName table name
* @return true if the table was deleted; false if the deletion failed
*/
public static boolean deleteTable(String tableName) {
try {
HBaseAdmin admin = (HBaseAdmin) connection.getAdmin();
// A table must be disabled before it can be deleted
admin.disableTable(tableName);
admin.deleteTable(tableName);
} catch (Exception e) {
e.printStackTrace();
return false;
}
return true;
}
/**
* Insert a single cell
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @param columnFamilyName column family name
* @param qualifier column qualifier
* @param value value to store
* @return true if the write succeeded; false otherwise
*/
public static boolean putRow(String tableName, String rowKey, String columnFamilyName, String qualifier,
String value) {
// try-with-resources closes the table even if the put fails
try (Table table = connection.getTable(TableName.valueOf(tableName))) {
Put put = new Put(Bytes.toBytes(rowKey));
put.addColumn(Bytes.toBytes(columnFamilyName), Bytes.toBytes(qualifier), Bytes.toBytes(value));
table.put(put);
} catch (IOException e) {
e.printStackTrace();
return false;
}
return true;
}
/**
* Insert several cells into one row
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @param columnFamilyName column family name
* @param pairList list of (column qualifier, value) pairs
* @return true if the write succeeded; false otherwise
*/
public static boolean putRow(String tableName, String rowKey, String columnFamilyName, List<Pair<String, String>> pairList) {
// try-with-resources closes the table even if the put fails
try (Table table = connection.getTable(TableName.valueOf(tableName))) {
Put put = new Put(Bytes.toBytes(rowKey));
pairList.forEach(pair -> put.addColumn(Bytes.toBytes(columnFamilyName), Bytes.toBytes(pair.getKey()), Bytes.toBytes(pair.getValue())));
table.put(put);
} catch (IOException e) {
e.printStackTrace();
return false;
}
return true;
}
/**
* Get the row with the given row key
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
*/
public static Result getRow(String tableName, String rowKey) {
try {
Table table = connection.getTable(TableName.valueOf(tableName));
Get get = new Get(Bytes.toBytes(rowKey));
return table.get(get);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
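/**
* Get the latest version of the given cell (row and column)
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @param columnFamily column family
* @param qualifier column qualifier
*/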
public static String getCell(String tableName, String rowKey, String columnFamily, String qualifier) {
try {
Table table = connection.getTable(TableName.valueOf(tableName));
Get get = new Get(Bytes.toBytes(rowKey));
if (!get.isCheckExistenceOnly()) {
get.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier));
Result result = table.get(get);
byte[] resultValue = result.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier));
return Bytes.toString(resultValue);
} else {
return null;
}
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
/**
* Scan the whole table.
*
* @param tableName table name
*/
public static ResultScanner getScanner(String tableName) {
try {
Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
return table.getScanner(scan);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
/**
* Scan the table for rows that match the given filters.
*
* @param tableName table name
* @param filterList filters to apply
*/
public static ResultScanner getScanner(String tableName, FilterList filterList) {
try {
Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
scan.setFilter(filterList);
return table.getScanner(scan);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
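/**
* Scan a row-key range of the table for rows that match the given filters.
*
* @param tableName table name
* @param startRowKey start row key (inclusive)
* @param endRowKey stop row key (exclusive)
* @param filterList filters to apply
*/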
public static ResultScanner getScanner(String tableName, String startRowKey, String endRowKey,
FilterList filterList) {
try {
Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
scan.setStartRow(Bytes.toBytes(startRowKey));
scan.setStopRow(Bytes.toBytes(endRowKey));
scan.setFilter(filterList);
return table.getScanner(scan);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
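/**
* Delete an entire row.
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @return true on success, false if an exception occurred
*/
public static boolean deleteRow(String tableName, String rowKey) {
// try-with-resources closes the Table, which the original code leaked
try (Table table = connection.getTable(TableName.valueOf(tableName))) {
Delete delete = new Delete(Bytes.toBytes(rowKey));
table.delete(delete);
} catch (IOException e) {
e.printStackTrace();
return false;
}
return true;
}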
/**
* Delete a single column of a given row.
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @param familyName column family
* @param qualifier column qualifier
* @return true on success, false if an exception occurred
*/
public static boolean deleteColumn(String tableName, String rowKey, String familyName,
String qualifier) {
try (Table table = connection.getTable(TableName.valueOf(tableName))) {
Delete delete = new Delete(Bytes.toBytes(rowKey));
// addColumn removes only the latest version of the cell
delete.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(qualifier));
table.delete(delete);
} catch (IOException e) {
e.printStackTrace();
return false;
}
return true;
}
}
================================================
FILE: code/Hbase/hbase-java-api-1.x/src/test/java/com/heibaiying/HbaseUtilsTest.java
================================================
package com.heibaiying;
import javafx.util.Pair;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;
import java.util.Arrays;
import java.util.List;
public class HBaseUtilsTest {
private static final String TABLE_NAME = "WordCount";
private static final String TEACHER = "teacher";
private static final String STUDENT = "student";
@Test
public void createTable() {
// Create the table
List<String> columnFamilies = Arrays.asList(TEACHER, STUDENT);
boolean table = HBaseUtils.createTable(TABLE_NAME, columnFamilies);
System.out.println("Table creation result: " + table);
}
@Test
public void insertData() {
List<Pair<String, String>> pairs1 = Arrays.asList(new Pair<>("name", "Tom"),
new Pair<>("age", "22"),
new Pair<>("gender", "1"));
HBaseUtils.putRow(TABLE_NAME, "rowKey1", STUDENT, pairs1);
List<Pair<String, String>> pairs2 = Arrays.asList(new Pair<>("name", "Jack"),
new Pair<>("age", "33"),
new Pair<>("gender", "2"));
HBaseUtils.putRow(TABLE_NAME, "rowKey2", STUDENT, pairs2);
List<Pair<String, String>> pairs3 = Arrays.asList(new Pair<>("name", "Mike"),
new Pair<>("age", "44"),
new Pair<>("gender", "1"));
HBaseUtils.putRow(TABLE_NAME, "rowKey3", STUDENT, pairs3);
}
@Test
public void getRow() {
Result result = HBaseUtils.getRow(TABLE_NAME, "rowKey1");
if (result != null) {
System.out.println(Bytes
.toString(result.getValue(Bytes.toBytes(STUDENT), Bytes.toBytes("name"))));
}
}
@Test
public void getCell() {
String cell = HBaseUtils.getCell(TABLE_NAME, "rowKey2", STUDENT, "age");
System.out.println("cell age :" + cell);
}
@Test
public void getScanner() {
ResultScanner scanner = HBaseUtils.getScanner(TABLE_NAME);
if (scanner != null) {
scanner.forEach(result -> System.out.println(Bytes.toString(result.getRow()) + "->" + Bytes
.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("count")))));
scanner.close();
}
}
@Test
public void getScannerWithFilter() {
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(Bytes.toBytes(STUDENT),
Bytes.toBytes("name"), CompareFilter.CompareOp.EQUAL, Bytes.toBytes("Jack"));
filterList.addFilter(nameFilter);
ResultScanner scanner = HBaseUtils.getScanner(TABLE_NAME, filterList);
if (scanner != null) {
scanner.forEach(result -> System.out.println(Bytes.toString(result.getRow()) + "->" + Bytes
.toString(result.getValue(Bytes.toBytes(STUDENT), Bytes.toBytes("name")))));
scanner.close();
}
}
@Test
public void deleteColumn() {
boolean b = HBaseUtils.deleteColumn(TABLE_NAME, "rowKey2", STUDENT, "age");
System.out.println("Deletion result: " + b);
}
@Test
public void deleteRow() {
boolean b = HBaseUtils.deleteRow(TABLE_NAME, "rowKey2");
System.out.println("Deletion result: " + b);
}
@Test
public void deleteTable() {
boolean b = HBaseUtils.deleteTable(TABLE_NAME);
System.out.println("Deletion result: " + b);
}
}
================================================
FILE: code/Hbase/hbase-java-api-2.x/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>hbase-java-api-2.x</artifactId>
<version>1.0-SNAPSHOT</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>2.1.4</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
================================================
FILE: code/Hbase/hbase-java-api-2.x/src/main/java/com/heibaiying/HBaseUtils.java
================================================
package com.heibaiying;
import javafx.util.Pair;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;
public class HBaseUtils {
private static Connection connection;
static {
Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.property.clientPort", "2181");
// For a cluster, separate the host names with commas
configuration.set("hbase.zookeeper.quorum", "hadoop001");
try {
connection = ConnectionFactory.createConnection(configuration);
} catch (IOException e) {
e.printStackTrace();
}
}
/**
* Create an HBase table.
*
* @param tableName table name
* @param columnFamilies list of column family names
* @return true if the table was created, false if it already exists or an exception occurred
*/
public static boolean createTable(String tableName, List<String> columnFamilies) {
// Use the public Admin interface and close it when done
try (Admin admin = connection.getAdmin()) {
if (admin.tableExists(TableName.valueOf(tableName))) {
return false;
}
TableDescriptorBuilder tableDescriptor = TableDescriptorBuilder.newBuilder(TableName.valueOf(tableName));
columnFamilies.forEach(columnFamily -> {
ColumnFamilyDescriptorBuilder cfDescriptorBuilder = ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes(columnFamily));
cfDescriptorBuilder.setMaxVersions(1);
ColumnFamilyDescriptor familyDescriptor = cfDescriptorBuilder.build();
tableDescriptor.setColumnFamily(familyDescriptor);
});
admin.createTable(tableDescriptor.build());
} catch (IOException e) {
e.printStackTrace();
return false;
}
return true;
}
/**
* Delete an HBase table.
*
* @param tableName table name
* @return true if the table was deleted, false if an exception occurred
*/
public static boolean deleteTable(String tableName) {
try (Admin admin = connection.getAdmin()) {
// A table must be disabled before it can be deleted
admin.disableTable(TableName.valueOf(tableName));
admin.deleteTable(TableName.valueOf(tableName));
} catch (Exception e) {
e.printStackTrace();
return false;
}
return true;
}
/**
* Insert a single cell.
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @param columnFamilyName column family name
* @param qualifier column qualifier
* @param value value to store
* @return true on success, false if an exception occurred
*/
public static boolean putRow(String tableName, String rowKey, String columnFamilyName, String qualifier,
String value) {
// try-with-resources closes the Table even when put() throws
try (Table table = connection.getTable(TableName.valueOf(tableName))) {
Put put = new Put(Bytes.toBytes(rowKey));
put.addColumn(Bytes.toBytes(columnFamilyName), Bytes.toBytes(qualifier), Bytes.toBytes(value));
table.put(put);
} catch (IOException e) {
e.printStackTrace();
return false;
}
return true;
}
/**
* Insert several cells into one row.
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @param columnFamilyName column family name
* @param pairList list of (column qualifier, value) pairs
* @return true on success, false if an exception occurred
*/
public static boolean putRow(String tableName, String rowKey, String columnFamilyName, List<Pair<String, String>> pairList) {
// try-with-resources closes the Table even when put() throws
try (Table table = connection.getTable(TableName.valueOf(tableName))) {
Put put = new Put(Bytes.toBytes(rowKey));
pairList.forEach(pair -> put.addColumn(Bytes.toBytes(columnFamilyName), Bytes.toBytes(pair.getKey()), Bytes.toBytes(pair.getValue())));
table.put(put);
} catch (IOException e) {
e.printStackTrace();
return false;
}
return true;
}
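/**
* Get the data of a row by its row key.
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @return the row, or null if an exception occurred
*/
public static Result getRow(String tableName, String rowKey) {
// try-with-resources closes the Table, which the original code leaked
try (Table table = connection.getTable(TableName.valueOf(tableName))) {
Get get = new Get(Bytes.toBytes(rowKey));
return table.get(get);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}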
/**
* Get the latest version of a single cell (a given column of a given row).
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @param columnFamily column family
* @param qualifier column qualifier
* @return the cell value as a string, or null if the cell is absent or an exception occurred
*/
public static String getCell(String tableName, String rowKey, String columnFamily, String qualifier) {
try (Table table = connection.getTable(TableName.valueOf(tableName))) {
// A freshly created Get is never "check existence only", so no guard is needed
Get get = new Get(Bytes.toBytes(rowKey));
get.addColumn(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier));
Result result = table.get(get);
byte[] resultValue = result.getValue(Bytes.toBytes(columnFamily), Bytes.toBytes(qualifier));
return Bytes.toString(resultValue);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
/**
* Scan the whole table.
*
* @param tableName table name
*/
public static ResultScanner getScanner(String tableName) {
try {
Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
return table.getScanner(scan);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
/**
* Scan the table for rows that match the given filters.
*
* @param tableName table name
* @param filterList filters to apply
*/
public static ResultScanner getScanner(String tableName, FilterList filterList) {
try {
Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
scan.setFilter(filterList);
return table.getScanner(scan);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
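/**
* Scan a row-key range of the table for rows that match the given filters.
*
* @param tableName table name
* @param startRowKey start row key (inclusive)
* @param endRowKey stop row key (exclusive)
* @param filterList filters to apply
*/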
public static ResultScanner getScanner(String tableName, String startRowKey, String endRowKey,
FilterList filterList) {
try {
Table table = connection.getTable(TableName.valueOf(tableName));
Scan scan = new Scan();
scan.withStartRow(Bytes.toBytes(startRowKey));
scan.withStopRow(Bytes.toBytes(endRowKey));
scan.setFilter(filterList);
return table.getScanner(scan);
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
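/**
* Delete an entire row.
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @return true on success, false if an exception occurred
*/
public static boolean deleteRow(String tableName, String rowKey) {
// try-with-resources closes the Table, which the original code leaked
try (Table table = connection.getTable(TableName.valueOf(tableName))) {
Delete delete = new Delete(Bytes.toBytes(rowKey));
table.delete(delete);
} catch (IOException e) {
e.printStackTrace();
return false;
}
return true;
}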
/**
* Delete a single column of a given row.
*
* @param tableName table name
* @param rowKey row key (unique identifier of the row)
* @param familyName column family
* @param qualifier column qualifier
* @return true on success, false if an exception occurred
*/
public static boolean deleteColumn(String tableName, String rowKey, String familyName,
String qualifier) {
try (Table table = connection.getTable(TableName.valueOf(tableName))) {
Delete delete = new Delete(Bytes.toBytes(rowKey));
// addColumn removes only the latest version of the cell
delete.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(qualifier));
table.delete(delete);
} catch (IOException e) {
e.printStackTrace();
return false;
}
return true;
}
}
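/*
The startRowKey/endRowKey overload of getScanner relies on HBase's default range semantics: the start row is inclusive and the stop row is exclusive. The sketch below only illustrates that behaviour, assuming ASCII row keys. ScanRangeSketch and scanKeys are hypothetical names, and a TreeMap stands in for the sorted table, so nothing here calls the HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a sorted HBase table; not part of the HBase API.
class ScanRangeSketch {

    // Returns the row keys in [startRowKey, endRowKey): start inclusive, stop exclusive.
    static List<String> scanKeys(TreeMap<String, String> rows, String startRowKey, String endRowKey) {
        return new ArrayList<>(rows.subMap(startRowKey, true, endRowKey, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("rowKey1", "Tom");
        rows.put("rowKey2", "Jack");
        rows.put("rowKey3", "Mike");
        // The stop key itself is left out of the result.
        System.out.println(scanKeys(rows, "rowKey1", "rowKey3"));
    }
}
```

With the three rows used in the test class, a range from "rowKey1" to "rowKey3" yields rowKey1 and rowKey2 only. To include the last wanted row, pass a stop key that sorts just after it.
*/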
================================================
FILE: code/Hbase/hbase-java-api-2.x/src/test/java/heibaiying/HBaseUtilsTest.java
================================================
package heibaiying;
import com.heibaiying.HBaseUtils;
import javafx.util.Pair;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;
import java.util.Arrays;
import java.util.List;
public class HBaseUtilsTest {
private static final String TABLE_NAME = "class";
private static final String TEACHER = "teacher";
private static final String STUDENT = "student";
@Test
public void createTable() {
// Create the table
List<String> columnFamilies = Arrays.asList(TEACHER, STUDENT);
boolean table = HBaseUtils.createTable(TABLE_NAME, columnFamilies);
System.out.println("Table creation result: " + table);
}
@Test
public void insertData() {
List<Pair<String, String>> pairs1 = Arrays.asList(new Pair<>("name", "Tom"),
new Pair<>("age", "22"),
new Pair<>("gender", "1"));
HBaseUtils.putRow(TABLE_NAME, "rowKey1", STUDENT, pairs1);
List<Pair<String, String>> pairs2 = Arrays.asList(new Pair<>("name", "Jack"),
new Pair<>("age", "33"),
new Pair<>("gender", "2"));
HBaseUtils.putRow(TABLE_NAME, "rowKey2", STUDENT, pairs2);
List<Pair<String, String>> pairs3 = Arrays.asList(new Pair<>("name", "Mike"),
new Pair<>("age", "44"),
new Pair<>("gender", "1"));
HBaseUtils.putRow(TABLE_NAME, "rowKey3", STUDENT, pairs3);
}
@Test
public void getRow() {
Result result = HBaseUtils.getRow(TABLE_NAME, "rowKey1");
if (result != null) {
System.out.println(Bytes
.toString(result.getValue(Bytes.toBytes(STUDENT), Bytes.toBytes("name"))));
}
}
@Test
public void getCell() {
String cell = HBaseUtils.getCell(TABLE_NAME, "rowKey2", STUDENT, "age");
System.out.println("cell age :" + cell);
}
@Test
public void getScanner() {
ResultScanner scanner = HBaseUtils.getScanner(TABLE_NAME);
if (scanner != null) {
scanner.forEach(result -> System.out.println(Bytes.toString(result.getRow()) + "->" + Bytes
.toString(result.getValue(Bytes.toBytes(STUDENT), Bytes.toBytes("name")))));
scanner.close();
}
}
@Test
public void getScannerWithFilter() {
FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL);
SingleColumnValueFilter nameFilter = new SingleColumnValueFilter(Bytes.toBytes(STUDENT),
Bytes.toBytes("name"), CompareOperator.EQUAL, Bytes.toBytes("Jack"));
filterList.addFilter(nameFilter);
ResultScanner scanner = HBaseUtils.getScanner(TABLE_NAME, filterList);
if (scanner != null) {
scanner.forEach(result -> System.out.println(Bytes.toString(result.getRow()) + "->" + Bytes
.toString(result.getValue(Bytes.toBytes(STUDENT), Bytes.toBytes("name")))));
scanner.close();
}
}
@Test
public void deleteColumn() {
boolean b = HBaseUtils.deleteColumn(TABLE_NAME, "rowKey2", STUDENT, "age");
System.out.println("Deletion result: " + b);
}
@Test
public void deleteRow() {
boolean b = HBaseUtils.deleteRow(TABLE_NAME, "rowKey2");
System.out.println("Deletion result: " + b);
}
@Test
public void deleteTable() {
boolean b = HBaseUtils.deleteTable(TABLE_NAME);
System.out.println("Deletion result: " + b);
}
}
================================================
FILE: code/Hbase/hbase-observer-coprocessor/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>hbase-observer-coprocessor</artifactId>
<version>1.0-SNAPSHOT</version>
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-common</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.2.0</version>
</dependency>
</dependencies>
</project>
================================================
FILE: code/Hbase/hbase-observer-coprocessor/src/main/java/com/heibaiying/AppendRegionObserver.java
================================================
package com.heibaiying;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
import org.apache.hadoop.hbase.util.Bytes;
import java.io.IOException;
import java.util.List;
/**
* When a put is issued against an existing article:content cell,
* append the newly inserted content to the end of the existing content.
*/
public class AppendRegionObserver extends BaseRegionObserver {
private byte[] columnFamily = Bytes.toBytes("article");
private byte[] qualifier = Bytes.toBytes("content");
@Override
public void prePut(ObserverContext<RegionCoprocessorEnvironment> e, Put put, WALEdit edit,
Durability durability) throws IOException {
if (put.has(columnFamily, qualifier)) {
// Read the current row and pick out the existing value of the target column
Result rs = e.getEnvironment().getRegion().get(new Get(put.getRow()));
String oldValue = "";
for (Cell cell : rs.rawCells()) {
if (CellUtil.matchingColumn(cell, columnFamily, qualifier)) {
oldValue = Bytes.toString(CellUtil.cloneValue(cell));
}
}
// Get the value being newly inserted into the target column
List<Cell> cells = put.get(columnFamily, qualifier);
String newValue = "";
for (Cell cell : cells) {
if (CellUtil.matchingColumn(cell, columnFamily, qualifier)) {
newValue = Bytes.toString(CellUtil.cloneValue(cell));
}
}
// Append: write the old value followed by the new value
put.addColumn(columnFamily, qualifier, Bytes.toBytes(oldValue + newValue));
}
}
}
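/*
The class comment describes the intended effect of this observer: a put on article:content ends up storing the old value followed by the new one. The sketch below models only that intended outcome in memory. AppendOnPutSketch and its methods are hypothetical names, a HashMap stands in for the region, and no HBase API is involved, so it says nothing about how a real region server resolves the cells.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of the observer's intended effect; it does not use the HBase API.
class AppendOnPutSketch {
    // Maps a row key to the stored value of article:content.
    private final Map<String, String> cells = new HashMap<>();

    // Mimics a put on article:content with the observer loaded:
    // the stored value becomes oldValue + newValue ("" when the row is new).
    void put(String rowKey, String newValue) {
        String oldValue = cells.getOrDefault(rowKey, "");
        cells.put(rowKey, oldValue + newValue);
    }

    // Returns the stored value, or null if the row was never written.
    String get(String rowKey) {
        return cells.get(rowKey);
    }

    public static void main(String[] args) {
        AppendOnPutSketch table = new AppendOnPutSketch();
        table.put("row1", "Hello");
        table.put("row1", ", HBase");
        System.out.println(table.get("row1")); // prints "Hello, HBase"
    }
}
```

The first put on a row behaves like a plain write because the old value defaults to the empty string, which mirrors the oldValue initialisation in prePut.
*/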
================================================
FILE: code/Kafka/kafka-basis/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>kafka-basis</artifactId>
<version>1.0</version>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
</plugins>
</build>
<dependencies>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.12</artifactId>
<version>2.2.0</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-nop</artifactId>
<version>1.7.25</version>
</dependency>
</dependencies>
</project>
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerASyn.java
================================================
package com.heibaiying.consumers;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
/**
* Kafka consumer: asynchronous commit
*/
public class ConsumerASyn {
public static void main(String[] args) {
String topic = "Hello-Kafka";
String group = "group1";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("group.id", group);
props.put("enable.auto.commit", false);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(topic));
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.of(100, ChronoUnit.MILLIS));
for (ConsumerRecord<String, String> record : records) {
System.out.println(record);
}
/* Commit asynchronously and define a callback */
consumer.commitAsync(new OffsetCommitCallback() {
@Override
public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
if (exception != null) {
System.out.println("Error handling");
offsets.forEach((x, y) -> System.out.printf("topic = %s,partition = %d, offset = %s \n",
x.topic(), x.partition(), y.offset()));
}
}
});
}
} finally {
consumer.close();
}
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerASynAndSyn.java
================================================
package com.heibaiying.consumers;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.Collections;
import java.util.Properties;
/**
* Kafka consumer: asynchronous commit combined with a final synchronous commit
*/
public class ConsumerASynAndSyn {
public static void main(String[] args) {
String topic = "Hello-Kafka";
String group = "group1";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("group.id", group);
props.put("enable.auto.commit", false);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(topic));
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.of(100, ChronoUnit.MILLIS));
for (ConsumerRecord<String, String> record : records) {
System.out.println(record);
}
// Asynchronous commit
consumer.commitAsync();
}
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
// The consumer is about to close, so commit synchronously to make sure the final commit succeeds
consumer.commitSync();
} finally {
consumer.close();
}
}
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerASynWithOffsets.java
================================================
package com.heibaiying.consumers;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
/**
* Kafka consumer: asynchronous commit of specific offsets
*/
public class ConsumerASynWithOffsets {
public static void main(String[] args) {
String topic = "Hello-Kafka";
String group = "group1";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("group.id", group);
props.put("enable.auto.commit", false);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(topic));
Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.of(100, ChronoUnit.MILLIS));
for (ConsumerRecord<String, String> record : records) {
System.out.println(record);
/* Record the offset for each partition of each topic */
TopicPartition topicPartition = new TopicPartition(record.topic(), record.partition());
OffsetAndMetadata offsetAndMetadata = new OffsetAndMetadata(record.offset() + 1, "no metaData");
/* TopicPartition overrides hashCode and equals, so the map keeps a single entry per topic partition */
offsets.put(topicPartition, offsetAndMetadata);
}
/* Commit the specific offsets */
consumer.commitAsync(offsets, null);
}
} finally {
consumer.close();
}
}
}
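/*
The offsets map above works because its key type defines value-based equals and hashCode, and because the committed value is the last processed offset plus one (the next record to read). The sketch below illustrates both points without a Kafka dependency. PartitionKey is a hypothetical stand-in for TopicPartition, OffsetTrackingSketch is a hypothetical name, and a plain Long replaces OffsetAndMetadata.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for Kafka's TopicPartition, defined only to keep the sketch dependency-free.
final class PartitionKey {
    final String topic;
    final int partition;

    PartitionKey(String topic, int partition) {
        this.topic = topic;
        this.partition = partition;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PartitionKey)) return false;
        PartitionKey other = (PartitionKey) o;
        return partition == other.partition && topic.equals(other.topic);
    }

    @Override
    public int hashCode() {
        return Objects.hash(topic, partition);
    }
}

class OffsetTrackingSketch {
    // Maps each partition to the next offset to read (last processed offset + 1).
    private final Map<PartitionKey, Long> offsets = new HashMap<>();

    // Overwrites the entry for the partition, so only the latest position is kept.
    void recordProcessed(String topic, int partition, long offset) {
        offsets.put(new PartitionKey(topic, partition), offset + 1);
    }

    // Returns a copy of the positions that would be handed to a commit call.
    Map<PartitionKey, Long> snapshot() {
        return new HashMap<>(offsets);
    }

    public static void main(String[] args) {
        OffsetTrackingSketch tracker = new OffsetTrackingSketch();
        tracker.recordProcessed("Hello-Kafka", 0, 5);
        tracker.recordProcessed("Hello-Kafka", 0, 6);
        tracker.recordProcessed("Hello-Kafka", 1, 2);
        // Two entries remain: partition 0 at 7 and partition 1 at 3.
        System.out.println(tracker.snapshot().size());
    }
}
```

Without value-based equality on the key, every processed record would add a new map entry instead of replacing the previous position for its partition.
*/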
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerExit.java
================================================
package com.heibaiying.consumers;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.Collections;
import java.util.Properties;
import java.util.Scanner;
/**
* Kafka consumer: graceful exit via wakeup()
*/
public class ConsumerExit {
public static void main(String[] args) {
String topic = "Hello-Kafka";
String group = "group1";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("group.id", group);
props.put("enable.auto.commit", false);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(topic));
/* Call wakeup() to exit gracefully */
final Thread mainThread = Thread.currentThread();
new Thread(() -> {
Scanner sc = new Scanner(System.in);
while (sc.hasNext()) {
if ("exit".equals(sc.next())) {
consumer.wakeup();
try {
/* Wait for the main thread to finish committing offsets, closing the consumer, and so on */
mainThread.join();
break;
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}).start();
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.of(100, ChronoUnit.MILLIS));
for (ConsumerRecord<String, String> record : records) {
System.out.printf("topic = %s,partition = %d, key = %s, value = %s, offset = %d,\n",
record.topic(), record.partition(), record.key(), record.value(), record.offset());
}
}
} catch (WakeupException e) {
// The WakeupException triggered by wakeup() needs no handling
} finally {
consumer.close();
System.out.println("Consumer closed");
}
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerGroup.java
================================================
package com.heibaiying.consumers;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.Collections;
import java.util.Properties;
/**
* Kafka consumer and consumer group
*/
public class ConsumerGroup {
public static void main(String[] args) {
String topic = "Hello-Kafka";
String group = "group1";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
/* Set the consumer group ID */
props.put("group.id", group);
props.put("enable.auto.commit", true);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
/* Subscribe to one or more topics */
consumer.subscribe(Collections.singletonList(topic));
try {
while (true) {
/* Poll for records */
ConsumerRecords<String, String> records = consumer.poll(Duration.of(100, ChronoUnit.MILLIS));
for (ConsumerRecord<String, String> record : records) {
System.out.printf("topic = %s,partition = %d, key = %s, value = %s, offset = %d,\n",
record.topic(), record.partition(), record.key(), record.value(), record.offset());
}
}
} finally {
consumer.close();
}
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerSyn.java
================================================
package com.heibaiying.consumers;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.Collections;
import java.util.Properties;
/**
* Kafka consumer: synchronous commit
*/
public class ConsumerSyn {
public static void main(String[] args) {
String topic = "Hello-Kafka";
String group = "group1";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("group.id", group);
props.put("enable.auto.commit", false);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList(topic));
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.of(100, ChronoUnit.MILLIS));
for (ConsumerRecord<String, String> record : records) {
System.out.println(record);
}
/* Synchronous commit */
consumer.commitSync();
}
} finally {
consumer.close();
}
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/RebalanceListener.java
================================================
package com.heibaiying.consumers;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.*;
public class RebalanceListener {
public static void main(String[] args) {
String topic = "Hello-Kafka";
String group = "group1";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("group.id", group);
props.put("enable.auto.commit", false);
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
consumer.subscribe(Collections.singletonList(topic), new ConsumerRebalanceListener() {
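/* Called after the consumer stops reading messages and before the rebalance starts */
@Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
System.out.println("Rebalance is about to be triggered");
// Commit the offsets that have already been processed
consumer.commitSync(offsets);
}
/* Called after partitions have been reassigned and before the consumer starts reading messages */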
@Override
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
}
});
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.of(100, ChronoUnit.MILLIS));
for (ConsumerRecord<String, String> record : records) {
System.out.println(record);
TopicPartition topicPartition = new TopicPartition(record.topic(), record.partition());
OffsetAndMetadata offsetAndMetadata = new OffsetAndMetadata(record.offset() + 1, "no metaData");
/* TopicPartition overrides hashCode and equals, so an entry for the same topic and partition is overwritten rather than added twice */
offsets.put(topicPartition, offsetAndMetadata);
}
consumer.commitAsync(offsets, null);
}
} finally {
consumer.close();
}
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/StandaloneConsumer.java
================================================
package com.heibaiying.consumers;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
/**
* Standalone consumer
*/
public class StandaloneConsumer {
public static void main(String[] args) {
String topic = "Kafka-Partitioner-Test";
String group = "group1";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("group.id", group);
props.put("enable.auto.commit", false);
props.put("key.deserializer", "org.apache.kafka.common.serialization.IntegerDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<Integer, String> consumer = new KafkaConsumer<>(props);
List<TopicPartition> partitions = new ArrayList<>();
List<PartitionInfo> partitionInfos = consumer.partitionsFor(topic);
/* You can choose which partitions to read; here we assume only partition 0 of the topic is read */
for (PartitionInfo partition : partitionInfos) {
if (partition.partition()==0){
partitions.add(new TopicPartition(partition.topic(), partition.partition()));
}
}
// Assign the partitions to the consumer
consumer.assign(partitions);
try {
while (true) {
ConsumerRecords<Integer, String> records = consumer.poll(Duration.of(100, ChronoUnit.MILLIS));
for (ConsumerRecord<Integer, String> record : records) {
System.out.printf("partition = %s, key = %d, value = %s\n",
record.partition(), record.key(), record.value());
}
consumer.commitSync();
}
} finally {
// Release the consumer's network resources if the loop exits with an exception
consumer.close();
}
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/ProducerASyn.java
================================================
package com.heibaiying.producers;
import org.apache.kafka.clients.producer.*;
import java.util.Properties;
/*
* Kafka producer example: sending messages asynchronously
*/
public class ProducerASyn {
public static void main(String[] args) {
String topicName = "Hello-Kafka";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
/* Create the producer */
Producer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 10; i++) {
ProducerRecord<String, String> record = new ProducerRecord<>(topicName, "k" + i, "world" + i);
/* Send the message asynchronously and handle the result in a callback */
producer.send(record, new Callback() {
@Override
public void onCompletion(RecordMetadata metadata, Exception exception) {
if (exception != null) {
System.out.println("Handle the exception here");
} else {
System.out.printf("topic=%s, partition=%d, offset=%s \n",
metadata.topic(), metadata.partition(), metadata.offset());
}
}
});
}
/* Close the producer */
producer.close();
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/ProducerSyn.java
================================================
package com.heibaiying.producers;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
/*
* Kafka producer example: sending messages synchronously
*/
public class ProducerSyn {
public static void main(String[] args) {
String topicName = "Hello-Kafka";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
/* Create the producer */
Producer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 10; i++) {
try {
ProducerRecord<String, String> record = new ProducerRecord<>(topicName, "k" + i, "world" + i);
/* Send the message synchronously: get() blocks until the broker responds */
RecordMetadata metadata = producer.send(record).get();
System.out.printf("topic=%s, partition=%d, offset=%s \n",
metadata.topic(), metadata.partition(), metadata.offset());
} catch (InterruptedException | ExecutionException e) {
e.printStackTrace();
}
}
/* Close the producer */
producer.close();
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/ProducerWithPartitioner.java
================================================
package com.heibaiying.producers;
import org.apache.kafka.clients.producer.*;
import java.util.Properties;
/*
* Kafka producer example: sending messages with a custom partitioner
*/
public class ProducerWithPartitioner {
public static void main(String[] args) {
String topicName = "Kafka-Partitioner-Test";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.IntegerSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
/* Specify the custom partitioner */
props.put("partitioner.class", "com.heibaiying.producers.partitioners.CustomPartitioner");
/* Pass the parameter required by the partitioner */
props.put("pass.line", 6);
Producer<Integer, String> producer = new KafkaProducer<>(props);
for (int i = 0; i <= 10; i++) {
String score = "score:" + i;
ProducerRecord<Integer, String> record = new ProducerRecord<>(topicName, i, score);
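/* Send the message asynchronously */
producer.send(record, (metadata, exception) -> {
if (exception != null) {
// The send failed, so do not rely on the metadata here
exception.printStackTrace();
} else {
System.out.printf("%s, partition=%d\n", score, metadata.partition());
}
});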
}
producer.close();
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/SimpleProducer.java
================================================
package com.heibaiying.producers;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;
/*
* Kafka producer example
*/
public class SimpleProducer {
public static void main(String[] args) {
String topicName = "Hello-Kafka";
Properties props = new Properties();
props.put("bootstrap.servers", "hadoop001:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
/* Create the producer */
Producer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 10; i++) {
ProducerRecord<String, String> record = new ProducerRecord<>(topicName, "hello" + i, "world" + i);
/* Send the message */
producer.send(record);
}
/* Close the producer */
producer.close();
}
}
================================================
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/partitioners/CustomPartitioner.java
================================================
package com.heibaiying.producers.partitioners;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import java.util.Map;
/**
* Custom partitioner
*/
public class CustomPartitioner implements Partitioner {
private int passLine;
@Override
public void configure(Map<String, ?> configs) {
passLine = (Integer) configs.get("pass.line");
}
@Override
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
return (Integer) key >= passLine ? 1 : 0;
}
@Override
public void close() {
System.out.println("Partitioner closed");
}
}
================================================
FILE: code/Phoenix/spring-boot-mybatis-phoenix/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.1.4.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.heibaiying</groupId>
<artifactId>spring-boot-mybatis-phoenix</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>spring-boot-mybatis-phoenix</name>
<description>mybatis project for Spring Boot</description>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<!-- Spring Boot 1.5.x and later correspond to MyBatis starter 1.3.x (1.3.1).
For more on Spring Boot and MyBatis version compatibility, see http://www.mybatis.org/spring-boot-starter/mybatis-spring-boot-autoconfigure/ -->
<dependency>
<groupId>org.mybatis.spring.boot</groupId>
<artifactId>mybatis-spring-boot-starter</artifactId>
<version>1.3.2</version>
</dependency>
<!--phoenix core-->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.14.0-cdh5.14.2</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
================================================
FILE: code/Phoenix/spring-boot-mybatis-phoenix/src/main/java/com/heibaiying/springboot/SpringBootMybatisApplication.java
================================================
package com.heibaiying.springboot;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class SpringBootMybatisApplication {
public static void main(String[] args) {
SpringApplication.run(SpringBootMybatisApplication.class, args);
}
}
================================================
FILE: code/Phoenix/spring-boot-mybatis-phoenix/src/main/java/com/heibaiying/springboot/bean/USPopulation.java
================================================
package com.heibaiying.springboot.bean;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.ToString;
@Data
@AllArgsConstructor
@NoArgsConstructor
@ToString
public class USPopulation {
private String state;
private String city;
private long population;
}
================================================
FILE: code/Phoenix/spring-boot-mybatis-phoenix/src/main/java/com/heibaiying/springboot/dao/PopulationDao.java
================================================
package com.heibaiying.springboot.dao;
import com.heibaiying.springboot.bean.USPopulation;
import org.apache.ibatis.annotations.*;
import java.util.List;
@Mapper
public interface PopulationDao {
@Select("SELECT * FROM us_population")
List<USPopulation> queryAll();
@Insert("UPSERT INTO us_population VALUES( #{state}, #{city}, #{population} )")
void save(USPopulation USPopulation);
@Select("SELECT * FROM us_population WHERE state=#{state} AND city = #{city}")
USPopulation queryByStateAndCity(@Param("state") String state, @Param("city") String city);
@Delete("DELETE FROM us_population WHERE state=#{state} AND city = #{city}")
void deleteByStateAndCity(@Param("state") String state, @Param("city") String city);
}
================================================
FILE: code/Phoenix/spring-boot-mybatis-phoenix/src/main/resources/application.yml
================================================
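spring:
  datasource:
    # ZooKeeper address
    url: jdbc:phoenix:192.168.0.105:2181
    driver-class-name: org.apache.phoenix.jdbc.PhoenixDriver
    # The connection pool settings below are optional unless you need to customize the pool
    # Spring Boot 2.x uses the high-performance HikariCP as its default connection pool; for more options see https://github.com/brettwooldridge/HikariCP#configuration-knobs-baby
    type: com.zaxxer.hikari.HikariDataSource
    hikari:
      # Minimum number of idle connections maintained in the pool
      minimum-idle: 10
      # Maximum pool size, including both idle and in-use connections
      maximum-pool-size: 20
      # Default auto-commit behavior of connections returned from the pool. Default: true
      auto-commit: true
      # Maximum time a connection may sit idle in the pool, in milliseconds
      idle-timeout: 30000
      # User-defined name for the pool; it appears mainly in logging and JMX management consoles to identify the pool and its configuration. Default: auto-generated
      pool-name: custom-hikari
      # Maximum lifetime of a connection in the pool; 0 means infinite lifetime. Default: 1800000 (30 minutes)
      max-lifetime: 1800000
      # Connection timeout. Default: 30000 (30 seconds)
      connection-timeout: 30000
      # Connection test query; adjust it to the SQL dialect, e.g. Oracle requires "select 1 from dual"
      connection-test-query: SELECT 1
# MyBatis configuration
mybatis:
  configuration:
    # Whether to print SQL statements; useful to enable while debugging
    log-impl: org.apache.ibatis.logging.stdout.StdOutImpl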
================================================
FILE: code/Phoenix/spring-boot-mybatis-phoenix/src/test/java/com/heibaiying/springboot/PopulationTest.java
================================================
package com.heibaiying.springboot;
import com.heibaiying.springboot.bean.USPopulation;
import com.heibaiying.springboot.dao.PopulationDao;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import java.util.List;
@RunWith(SpringRunner.class)
@SpringBootTest
public class PopulationTest {
@Autowired
private PopulationDao populationDao;
@Test
public void queryAll() {
List<USPopulation> USPopulationList = populationDao.queryAll();
if (USPopulationList != null) {
for (USPopulation USPopulation : USPopulationList) {
System.out.println(USPopulation.getCity() + " " + USPopulation.getPopulation());
}
}
}
@Test
public void save() {
populationDao.save(new USPopulation("TX", "Dallas", 66666));
USPopulation usPopulation = populationDao.queryByStateAndCity("TX", "Dallas");
System.out.println(usPopulation);
}
@Test
public void update() {
populationDao.save(new USPopulation("TX", "Dallas", 99999));
USPopulation usPopulation = populationDao.queryByStateAndCity("TX", "Dallas");
System.out.println(usPopulation);
}
@Test
public void delete() {
populationDao.deleteByStateAndCity("TX", "Dallas");
USPopulation usPopulation = populationDao.queryByStateAndCity("TX", "Dallas");
System.out.println(usPopulation);
}
}
================================================
FILE: code/Phoenix/spring-mybatis-phoenix/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>spring-mybatis-phoenix</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<spring-base-version>5.1.6.RELEASE</spring-base-version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context</artifactId>
<version>${spring-base-version}</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-beans</artifactId>
<version>${spring-base-version}</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>${spring-base-version}</version>
</dependency>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<version>1.18.4</version>
<scope>provided</scope>
</dependency>
<!--spring jdbc-->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-jdbc</artifactId>
<version>${spring-base-version}</version>
</dependency>
<!-- Unit-test dependencies -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-test</artifactId>
<version>${spring-base-version}</version>
<scope>test</scope>
</dependency>
<!-- MyBatis dependencies -->
<dependency>
<groupId>org.mybatis</groupId>
<artifactId>mybatis-spring</artifactId>
<version>1.3.2</version>
</dependency>
<dependency>
<groupId>org.mybatis</groupId>
<artifactId>mybatis</artifactId>
<version>3.4.6</version>
</dependency>
<!--phoenix core-->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.14.0-cdh5.14.2</version>
</dependency>
</dependencies>
</project>
================================================
FILE: code/Phoenix/spring-mybatis-phoenix/src/main/java/com/heibaiying/bean/USPopulation.java
================================================
package com.heibaiying.bean;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
@Data
@AllArgsConstructor
@NoArgsConstructor
public class USPopulation {
private String state;
private String city;
private long population;
}
================================================
FILE: code/Phoenix/spring-mybatis-phoenix/src/main/java/com/heibaiying/dao/PopulationDao.java
================================================
package com.heibaiying.dao;
import com.heibaiying.bean.USPopulation;
import org.apache.ibatis.annotations.Param;
import java.util.List;
public interface PopulationDao {
List<USPopulation> queryAll();
void save(USPopulation USPopulation);
USPopulation queryByStateAndCity(@Param("state") String state, @Param("city") String city);
void deleteByStateAndCity(@Param("state") String state, @Param("city") String city);
}
================================================
FILE: code/Phoenix/spring-mybatis-phoenix/src/main/resources/jdbc.properties
================================================
# Database driver
phoenix.driverClassName=org.apache.phoenix.jdbc.PhoenixDriver
# ZooKeeper address
phoenix.url=jdbc:phoenix:192.168.0.105:2181
================================================
FILE: code/Phoenix/spring-mybatis-phoenix/src/main/resources/mappers/Population.xml
================================================
<!DOCTYPE mapper
PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN"
"http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.heibaiying.dao.PopulationDao">
<select id="queryAll" resultType="com.heibaiying.bean.USPopulation">
SELECT * FROM us_population
</select>
<insert id="save">
UPSERT INTO us_population VALUES( #{state}, #{city}, #{population} )
</insert>
<select id="queryByStateAndCity" resultType="com.heibaiying.bean.USPopulation">
SELECT * FROM us_population WHERE state=#{state} AND city = #{city}
</select>
<delete id="deleteByStateAndCity">
DELETE FROM us_population WHERE state=#{state} AND city = #{city}
</delete>
</mapper>
================================================
FILE: code/Phoenix/spring-mybatis-phoenix/src/main/resources/mybatisConfig.xml
================================================
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE configuration
PUBLIC "-//mybatis.org//DTD Config 3.0//EN"
"http://mybatis.org/dtd/mybatis-3-config.dtd">
<!-- MyBatis configuration file -->
<configuration>
<settings>
<!-- Map underscore column names to camelCase properties -->
<setting name="mapUnderscoreToCamelCase" value="true"/>
<!-- Print executed SQL statements -->
<setting name="logImpl" value="STDOUT_LOGGING"/>
</settings>
</configuration>
<!-- For more settings, see the official documentation: http://www.mybatis.org/mybatis-3/zh/configuration.html -->
================================================
FILE: code/Phoenix/spring-mybatis-phoenix/src/main/resources/springApplication.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:context="http://www.springframework.org/schema/context" xmlns:tx="http://www.springframework.org/schema/tx"
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-4.1.xsd http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx.xsd">
<!-- Enable annotation-based component scanning -->
<context:component-scan base-package="com.heibaiying.*"/>
<!-- Location of the properties file -->
<context:property-placeholder location="classpath:jdbc.properties"/>
<!-- Configure the data source -->
<bean id="dataSource" class="org.springframework.jdbc.datasource.DriverManagerDataSource">
<!-- Phoenix settings -->
<property name="driverClassName" value="${phoenix.driverClassName}"/>
<property name="url" value="${phoenix.url}"/>
</bean>
<!-- Configure the MyBatis session factory -->
<bean id="sqlSessionFactory" class="org.mybatis.spring.SqlSessionFactoryBean">
<property name="dataSource" ref="dataSource"/>
<!-- Location of the mapper XML files -->
<property name="mapperLocations" value="classpath*:/mappers/**/*.xml"/>
<property name="configLocation" value="classpath:mybatisConfig.xml"/>
</bean>
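<!-- Scan for and register mapper interfaces -->
<!-- Purpose: searches recursively from the base package and registers each interface found as a MapperFactoryBean (only interfaces with at least one method are registered; concrete classes are ignored) -->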
<bean class="org.mybatis.spring.mapper.MapperScannerConfigurer">
<!-- Session factory to use -->
<property name="sqlSessionFactoryBeanName" value="sqlSessionFactory"/>
<!-- Package that contains the MyBatis mapper interfaces -->
<property name="basePackage" value="com.heibaiying.dao"/>
</bean>
</beans>
================================================
FILE: code/Phoenix/spring-mybatis-phoenix/src/test/java/com/heibaiying/dao/PopulationDaoTest.java
================================================
package com.heibaiying.dao;
import com.heibaiying.bean.USPopulation;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringRunner;
import java.util.List;
@RunWith(SpringRunner.class)
@ContextConfiguration({"classpath:springApplication.xml"})
public class PopulationDaoTest {
@Autowired
private PopulationDao populationDao;
@Test
public void queryAll() {
List<USPopulation> USPopulationList = populationDao.queryAll();
if (USPopulationList != null) {
for (USPopulation USPopulation : USPopulationList) {
System.out.println(USPopulation.getCity() + " " + USPopulation.getPopulation());
}
}
}
@Test
public void save() {
populationDao.save(new USPopulation("TX", "Dallas", 66666));
USPopulation usPopulation = populationDao.queryByStateAndCity("TX", "Dallas");
System.out.println(usPopulation);
}
@Test
public void update() {
populationDao.save(new USPopulation("TX", "Dallas", 99999));
USPopulation usPopulation = populationDao.queryByStateAndCity("TX", "Dallas");
System.out.println(usPopulation);
}
@Test
public void delete() {
populationDao.deleteByStateAndCity("TX", "Dallas");
USPopulation usPopulation = populationDao.queryByStateAndCity("TX", "Dallas");
System.out.println(usPopulation);
}
}
================================================
FILE: code/Storm/storm-hbase-integration/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>storm-hbase-integration</artifactId>
<version>1.0</version>
<properties>
<storm.version>1.2.2</storm.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>${storm.version}</version>
</dependency>
<!-- Storm-HBase integration dependency -->
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hbase</artifactId>
<version>${storm.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
<!-- Package with the shade plugin -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<createDependencyReducedPom>true</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.sf</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.dsa</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>META-INF/*.rsa</exclude>
<exclude>META-INF/*.EC</exclude>
<exclude>META-INF/*.ec</exclude>
<exclude>META-INF/MSFTSIG.SF</exclude>
<exclude>META-INF/MSFTSIG.RSA</exclude>
</excludes>
</filter>
</filters>
<artifactSet>
<excludes>
<exclude>org.apache.storm:storm-core</exclude>
</excludes>
</artifactSet>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
================================================
FILE: code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/WordCountToHBaseApp.java
================================================
package com.heibaiying;
import com.heibaiying.component.CountBolt;
import com.heibaiying.component.DataSourceSpout;
import com.heibaiying.component.SplitBolt;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.hbase.bolt.HBaseBolt;
import org.apache.storm.hbase.bolt.mapper.SimpleHBaseMapper;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import java.util.HashMap;
import java.util.Map;
/**
* Counts word frequencies and stores the results in HBase
*/
public class WordCountToHBaseApp {
private static final String DATA_SOURCE_SPOUT = "dataSourceSpout";
private static final String SPLIT_BOLT = "splitBolt";
private static final String COUNT_BOLT = "countBolt";
private static final String HBASE_BOLT = "hbaseBolt";
public static void main(String[] args) {
// Storm configuration
Config config = new Config();
// HBase configuration
Map<String, Object> hbConf = new HashMap<>();
hbConf.put("hbase.rootdir", "hdfs://hadoop001:8020/hbase");
hbConf.put("hbase.zookeeper.quorum", "hadoop001:2181");
// Put the HBase configuration into the Storm configuration
config.put("hbase.conf", hbConf);
// Define how tuple fields map to HBase row keys and columns
SimpleHBaseMapper mapper = new SimpleHBaseMapper()
.withRowKeyField("word")
.withColumnFields(new Fields("word","count"))
.withColumnFamily("info");
/*
* Pass the table name, the field mapping, and the HBase configuration key to HBaseBolt
* The table must be created beforehand: create 'WordCount','info'
*/
HBaseBolt hbase = new HBaseBolt("WordCount", mapper)
.withConfigKey("hbase.conf");
// Build the topology
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout(DATA_SOURCE_SPOUT, new DataSourceSpout(),1);
// split
builder.setBolt(SPLIT_BOLT, new SplitBolt(), 1).shuffleGrouping(DATA_SOURCE_SPOUT);
// count
builder.setBolt(COUNT_BOLT, new CountBolt(),1).shuffleGrouping(SPLIT_BOLT);
// save to HBase
builder.setBolt(HBASE_BOLT, hbase, 1).shuffleGrouping(COUNT_BOLT);
// If "cluster" is passed as an argument, submit to the cluster; otherwise run locally
if (args.length > 0 && args[0].equals("cluster")) {
try {
StormSubmitter.submitTopology("ClusterWordCountToHBaseApp", config, builder.createTopology());
} catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
e.printStackTrace();
}
} else {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("LocalWordCountToHBaseApp",
config, builder.createTopology());
}
}
}
================================================
FILE: code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/component/CountBolt.java
================================================
package com.heibaiying.component;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.HashMap;
import java.util.Map;
/**
* Counts word frequencies
*/
public class CountBolt extends BaseRichBolt {
private Map<String, Integer> counts = new HashMap<>();
private OutputCollector collector;
@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.collector=collector;
}
@Override
public void execute(Tuple input) {
String word = input.getStringByField("word");
Integer count = counts.get(word);
if (count == null) {
count = 0;
}
count++;
counts.put(word, count);
// Emit the word and its running count
collector.emit(new Values(word, String.valueOf(count)));
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word", "count"));
}
}
================================================
FILE: code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/component/DataSourceSpout.java
================================================
package com.heibaiying.component;
import org.apache.storm.shade.org.apache.commons.lang.StringUtils;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;
import java.util.*;
/**
* Data source that generates sample data for word counting
*/
public class DataSourceSpout extends BaseRichSpout {
private List<String> list = Arrays.asList("Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive");
private SpoutOutputCollector spoutOutputCollector;
@Override
public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
this.spoutOutputCollector = spoutOutputCollector;
}
@Override
public void nextTuple() {
// Generate mock data
String lineData = productData();
spoutOutputCollector.emit(new Values(lineData));
Utils.sleep(1000);
}
@Override
public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
outputFieldsDeclarer.declare(new Fields("line"));
}
/**
* Generates a mock line of tab-separated words
*/
private String productData() {
Collections.shuffle(list);
Random random = new Random();
int endIndex = random.nextInt(list.size()) + 1;
return StringUtils.join(list.toArray(), "\t", 0, endIndex);
}
}
================================================
FILE: code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/component/SplitBolt.java
================================================
package com.heibaiying.component;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import java.util.Map;
import static org.apache.storm.utils.Utils.tuple;
/**
* Splits each line of data on the specified delimiter
*/
public class SplitBolt extends BaseRichBolt {
private OutputCollector collector;
@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.collector = collector;
}
@Override
public void execute(Tuple input) {
String line = input.getStringByField("line");
String[] words = line.split("\t");
for (String word : words) {
collector.emit(tuple(word, 1));
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word", "count"));
}
}
================================================
FILE: code/Storm/storm-hdfs-integration/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>storm-hdfs-integration</artifactId>
<version>1.0</version>
<properties>
<storm.version>1.2.2</storm.version>
</properties>
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>${storm.version}</version>
</dependency>
<!-- Storm-HDFS integration dependency -->
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-hdfs</artifactId>
<version>${storm.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0-cdh5.15.2</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>2.6.0-cdh5.15.2</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.6.0-cdh5.15.2</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
<build>
<plugins>
<!-- Compile with Java 8 -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
<!-- Package with the shade plugin -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<createDependencyReducedPom>true</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.sf</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.dsa</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>META-INF/*.rsa</exclude>
<exclude>META-INF/*.EC</exclude>
<exclude>META-INF/*.ec</exclude>
<exclude>META-INF/MSFTSIG.SF</exclude>
<exclude>META-INF/MSFTSIG.RSA</exclude>
</excludes>
</filter>
</filters>
<artifactSet>
<excludes>
<exclude>org.apache.storm:storm-core</exclude>
</excludes>
</artifactSet>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
================================================
FILE: code/Storm/storm-hdfs-integration/src/main/java/com.heibaiying/DataToHdfsApp.java
================================================
package com.heibaiying;
import com.heibaiying.component.DataSourceSpout;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.hdfs.bolt.HdfsBolt;
import org.apache.storm.hdfs.bolt.format.DefaultFileNameFormat;
import org.apache.storm.hdfs.bolt.format.DelimitedRecordFormat;
import org.apache.storm.hdfs.bolt.format.FileNameFormat;
import org.apache.storm.hdfs.bolt.format.RecordFormat;
import org.apache.storm.hdfs.bolt.rotation.FileRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy;
import org.apache.storm.hdfs.bolt.rotation.FileSizeRotationPolicy.Units;
import org.apache.storm.hdfs.bolt.sync.CountSyncPolicy;
import org.apache.storm.hdfs.bolt.sync.SyncPolicy;
import org.apache.storm.topology.TopologyBuilder;
/**
* Stores sample data in HDFS
*/
public class DataToHdfsApp {
private static final String DATA_SOURCE_SPOUT = "dataSourceSpout";
private static final String HDFS_BOLT = "hdfsBolt";
public static void main(String[] args) {
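// Set the Hadoop user name. If it is not set, creating directories on HDFS may fail with a permission error (RemoteException: Permission denied)
System.setProperty("HADOOP_USER_NAME", "root");
// Delimiter between output fields
RecordFormat format = new DelimitedRecordFormat()
.withFieldDelimiter("|");
// Sync policy: flush buffered data to HDFS after every 100 tuples
SyncPolicy syncPolicy = new CountSyncPolicy(100);
// Rotation policy: cap each file at 1 MB; once the limit is exceeded, create a new file and continue writing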
FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(1.0f, Units.MB);
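// Hedged sketch, not part of the original example: storm-hdfs also ships a time-based policy,
// TimedRotationPolicy, which could replace the size-based policy above when files should roll
// on a schedule instead (the one-minute interval is an arbitrary assumption):
// FileRotationPolicy rotationPolicy = new TimedRotationPolicy(1.0f, TimedRotationPolicy.TimeUnit.MINUTES);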
// Output path on HDFS
FileNameFormat fileNameFormat = new DefaultFileNameFormat()
.withPath("/storm-hdfs/");
// Define the HdfsBolt
HdfsBolt hdfsBolt = new HdfsBolt()
.withFsUrl("hdfs://hadoop001:8020")
.withFileNameFormat(fileNameFormat)
.withRecordFormat(format)
.withRotationPolicy(rotationPolicy)
.withSyncPolicy(syncPolicy);
// Build the topology
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout(DATA_SOURCE_SPOUT, new DataSourceSpout());
// Save to HDFS
builder.setBolt(HDFS_BOLT, hdfsBolt, 1).shuffleGrouping(DATA_SOURCE_SPOUT);
// If "cluster" is passed as the first argument, submit to the cluster; otherwise run in local mode
if (args.length > 0 && args[0].equals("cluster")) {
try {
StormSubmitter.submitTopology("ClusterDataToHdfsApp", new Config(), builder.createTopology());
} catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
e.printStackTrace();
}
} else {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("LocalDataToHdfsApp",
new Config(), builder.createTopology());
}
}
}
================================================
FILE: code/Storm/storm-hdfs-integration/src/main/java/com.heibaiying/component/DataSourceSpout.java
================================================
package com.heibaiying.component;
import org.apache.storm.shade.org.apache.commons.lang.StringUtils;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;
import java.util.*;
/**
* Data source that generates sample data for word counting
*/
public class DataSourceSpout extends BaseRichSpout {
private List<String> list = Arrays.asList("Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive");
private SpoutOutputCollector spoutOutputCollector;
@Override
public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
this.spoutOutputCollector = spoutOutputCollector;
}
@Override
public void nextTuple() {
// Generate simulated data
String lineData = productData();
spoutOutputCollector.emit(new Values(lineData));
Utils.sleep(1000);
}
@Override
public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
outputFieldsDeclarer.declare(new Fields("line"));
}
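/**
* Builds a sample line from a random-length prefix of the shuffled word list
*/
private String productData() {
Collections.shuffle(list);
Random random = new Random();
// nextInt(n) already returns a value in [0, n), so endIndex falls in [1, n]
int endIndex = random.nextInt(list.size()) + 1;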
return StringUtils.join(list.toArray(), "\t", 0, endIndex);
}
}
================================================
FILE: code/Storm/storm-kafka-integration/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>storm-kafka-integration</artifactId>
<version>1.0</version>
<properties>
<storm.version>1.2.2</storm.version>
<kafka.version>2.2.0</kafka.version>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>${storm.version}</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka-client</artifactId>
<version>${storm.version}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka-clients</artifactId>
<version>${kafka.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>8</source>
<target>8</target>
</configuration>
</plugin>
<!-- Package with the shade plugin -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<configuration>
<createDependencyReducedPom>true</createDependencyReducedPom>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.sf</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.dsa</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>META-INF/*.rsa</exclude>
<exclude>META-INF/*.EC</exclude>
<exclude>META-INF/*.ec</exclude>
<exclude>META-INF/MSFTSIG.SF</exclude>
<exclude>META-INF/MSFTSIG.RSA</exclude>
</excludes>
</filter>
</filters>
<artifactSet>
<excludes>
<exclude>org.apache.storm:storm-core</exclude>
</excludes>
</artifactSet>
</configuration>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<transformers>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
<transformer
implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
</transformer>
</transformers>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
================================================
FILE: code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/read/LogConsoleBolt.java
================================================
package com.heibaiying.kafka.read;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;
import java.util.Map;
/**
* Prints the data read from Kafka
*/
public class LogConsoleBolt extends BaseRichBolt {
private OutputCollector collector;
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.collector=collector;
}
public void execute(Tuple input) {
try {
String value = input.getStringByField("value");
System.out.println("received from kafka : "+ value);
// The tuple must be acked; otherwise the message from Kafka will be consumed again
collector.ack(input);
}catch (Exception e){
e.printStackTrace();
collector.fail(input);
}
}
public void declareOutputFields(OutputFieldsDeclarer declarer) {
}
}
================================================
FILE: code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/read/ReadingFromKafkaApp.java
================================================
package com.heibaiying.kafka.read;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff;
import org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff.TimeInterval;
import org.apache.storm.kafka.spout.KafkaSpoutRetryService;
import org.apache.storm.topology.TopologyBuilder;
/**
* Reads data from Kafka
*/
public class ReadingFromKafkaApp {
private static final String BOOTSTRAP_SERVERS = "hadoop001:9092";
private static final String TOPIC_NAME = "storm-topic";
public static void main(String[] args) {
final TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("kafka_spout", new KafkaSpout<>(getKafkaSpoutConfig(BOOTSTRAP_SERVERS, TOPIC_NAME)), 1);
builder.setBolt("bolt", new LogConsoleBolt()).shuffleGrouping("kafka_spout");
// If "cluster" is passed as the first argument, submit to the cluster; otherwise run in local mode
if (args.length > 0 && args[0].equals("cluster")) {
try {
StormSubmitter.submitTopology("ClusterReadingFromKafkaApp", new Config(), builder.createTopology());
} catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
e.printStackTrace();
}
} else {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("LocalReadingFromKafkaApp",
new Config(), builder.createTopology());
}
}
private static KafkaSpoutConfig<String, String> getKafkaSpoutConfig(String bootstrapServers, String topic) {
return KafkaSpoutConfig.builder(bootstrapServers, topic)
// All settings below are optional except the group ID, which is required; without it an InvalidGroupIdException is thrown
.setProp(ConsumerConfig.GROUP_ID_CONFIG, "kafkaSpoutTestGroup")
// Retry policy
.setRetry(getRetryService())
// Interval between periodic offset commits; the default is 30 s
.setOffsetCommitPeriodMs(10_000)
.build();
}
// Retry policy: initial delay 500 µs, delay period 2 ms, unlimited retries, maximum delay 10 s
private static KafkaSpoutRetryService getRetryService() {
return new KafkaSpoutRetryExponentialBackoff(TimeInterval.microSeconds(500),
TimeInterval.milliSeconds(2), Integer.MAX_VALUE, TimeInterval.seconds(10));
}
}
================================================
FILE: code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/write/DataSourceSpout.java
================================================
package com.heibaiying.kafka.write;
import org.apache.storm.shade.org.apache.commons.lang.StringUtils;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;
import java.util.*;
/**
* Data source that generates sample data for word counting
*/
public class DataSourceSpout extends BaseRichSpout {
private List<String> list = Arrays.asList("Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive");
private SpoutOutputCollector spoutOutputCollector;
@Override
public void open(Map map, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
this.spoutOutputCollector = spoutOutputCollector;
}
@Override
public void nextTuple() {
// Generate simulated data
String lineData = productData();
spoutOutputCollector.emit(new Values("key",lineData));
Utils.sleep(1000);
}
@Override
public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
outputFieldsDeclarer.declare( new Fields("key", "message"));
}
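/**
* Builds a sample line from a random-length prefix of the shuffled word list
*/
private String productData() {
Collections.shuffle(list);
Random random = new Random();
// nextInt(n) already returns a value in [0, n), so endIndex falls in [1, n]
int endIndex = random.nextInt(list.size()) + 1;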
return StringUtils.join(list.toArray(), "\t", 0, endIndex);
}
}
================================================
FILE: code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/write/WritingToKafkaApp.java
================================================
package com.heibaiying.kafka.write;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.generated.AlreadyAliveException;
import org.apache.storm.generated.AuthorizationException;
import org.apache.storm.generated.InvalidTopologyException;
import org.apache.storm.kafka.bolt.KafkaBolt;
import org.apache.storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper;
import org.apache.storm.kafka.bolt.selector.DefaultTopicSelector;
import org.apache.storm.topology.TopologyBuilder;
import java.util.Properties;
/**
* Writes data to Kafka
*/
public class WritingToKafkaApp {
private static final String BOOTSTRAP_SERVERS = "hadoop001:9092";
private static final String TOPIC_NAME = "storm-topic";
public static void main(String[] args) {
TopologyBuilder builder = new TopologyBuilder();
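// Kafka producer properties
Properties props = new Properties();
/*
* List of broker addresses. It need not contain every broker, because the producer discovers the others from the ones given.
* Listing at least two brokers is still recommended for fault tolerance.
*/
props.put("bootstrap.servers", BOOTSTRAP_SERVERS);
/*
* acks specifies how many partition replicas must receive a message before the producer considers the write successful.
* acks=0 : the producer does not wait for any response from the server before treating the write as successful.
* acks=1 : the producer receives a success response as soon as the partition leader has received the message.
* acks=all : the producer receives a success response only after all in-sync replicas have received the message.
*/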
props.put("acks", "1");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
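// Hedged sketch, not part of the original example: when durability matters more than latency,
// the acks setting above could be tightened so that all in-sync replicas must confirm each write:
// props.put("acks", "all");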
KafkaBolt<String, String> bolt = new KafkaBolt<String, String>()
.withProducerProperties(props)
.withTopicSelector(new DefaultTopicSelector(TOPIC_NAME))
.withTupleToKafkaMapper(new FieldNameBasedTupleToKafkaMapper<>());
builder.setSpout("sourceSpout", new DataSourceSpout(), 1);
builder.setBolt("kafkaBolt", bolt, 1).shuffleGrouping("sourceSpout");
if (args.length > 0 && args[0].equals("cluster")) {
try {
StormSubmitter.submitTopology("ClusterWritingToKafkaApp", new Config(), builder.createTopology());
} catch (AlreadyAliveException | InvalidTopologyException | AuthorizationException e) {
e.printStackTrace();
}
} else {
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("LocalWritingToKafkaApp",
new Config(), builder.createTopology());
}
}
}
================================================
FILE: code/Storm/storm-redis-integration/pom.xml
================================================
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.heibaiying</groupId>
<artifactId>sto
SYMBOL INDEX (282 symbols across 69 files)
FILE: code/Flink/flink-basis-java/src/main/java/com/heibaiying/StreamingJob.java
class StreamingJob (line 7) | public class StreamingJob {
method main (line 11) | public static void main(String[] args) throws Exception {
FILE: code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/CustomSinkJob.java
class CustomSinkJob (line 10) | public class CustomSinkJob {
method main (line 12) | public static void main(String[] args) throws Exception {
FILE: code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/KafkaStreamingJob.java
class KafkaStreamingJob (line 15) | public class KafkaStreamingJob {
method main (line 17) | public static void main(String[] args) throws Exception {
FILE: code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/bean/Employee.java
class Employee (line 5) | public class Employee {
method Employee (line 11) | Employee(){}
method Employee (line 13) | public Employee(String name, int age, Date birthday) {
method getName (line 19) | public String getName() {
method setName (line 23) | public void setName(String name) {
method getAge (line 27) | public int getAge() {
method setAge (line 31) | public void setAge(int age) {
method getBirthday (line 35) | public Date getBirthday() {
method setBirthday (line 39) | public void setBirthday(Date birthday) {
FILE: code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/sink/FlinkToMySQLSink.java
class FlinkToMySQLSink (line 11) | public class FlinkToMySQLSink extends RichSinkFunction<Employee> {
method open (line 16) | @Override
method invoke (line 24) | @Override
method close (line 32) | @Override
FILE: code/Flink/flink-state-management/src/main/java/com/heibaiying/keyedstate/KeyedStateJob.java
class KeyedStateJob (line 8) | public class KeyedStateJob {
method main (line 10) | public static void main(String[] args) throws Exception {
FILE: code/Flink/flink-state-management/src/main/java/com/heibaiying/keyedstate/ThresholdWarning.java
class ThresholdWarning (line 14) | public class ThresholdWarning extends RichFlatMapFunction<Tuple2<String,...
method ThresholdWarning (line 23) | ThresholdWarning(Long threshold, Integer numberOfTimes) {
method open (line 28) | @Override
method flatMap (line 34) | @Override
FILE: code/Flink/flink-state-management/src/main/java/com/heibaiying/keyedstate/ThresholdWarningWithTTL.java
class ThresholdWarningWithTTL (line 16) | public class ThresholdWarningWithTTL extends RichFlatMapFunction<Tuple2<...
method ThresholdWarningWithTTL (line 22) | ThresholdWarningWithTTL(Long threshold, Integer numberOfTimes) {
method open (line 27) | @Override
method flatMap (line 42) | @Override
FILE: code/Flink/flink-state-management/src/main/java/com/heibaiying/operatorstate/OperatorStateJob.java
class OperatorStateJob (line 8) | public class OperatorStateJob {
method main (line 10) | public static void main(String[] args) throws Exception {
FILE: code/Flink/flink-state-management/src/main/java/com/heibaiying/operatorstate/ThresholdWarning.java
class ThresholdWarning (line 17) | public class ThresholdWarning extends RichFlatMapFunction<Tuple2<String,...
method ThresholdWarning (line 28) | ThresholdWarning(Long threshold, Integer numberOfTimes) {
method initializeState (line 34) | @Override
method flatMap (line 48) | @Override
method snapshotState (line 63) | @Override
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountApp.java
class WordCountApp (line 19) | public class WordCountApp {
method main (line 26) | public static void main(String[] args) throws Exception {
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountCombinerApp.java
class WordCountCombinerApp (line 19) | public class WordCountCombinerApp {
method main (line 26) | public static void main(String[] args) throws Exception {
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountCombinerPartitionerApp.java
class WordCountCombinerPartitionerApp (line 21) | public class WordCountCombinerPartitionerApp {
method main (line 28) | public static void main(String[] args) throws Exception {
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/CustomPartitioner.java
class CustomPartitioner (line 11) | public class CustomPartitioner extends Partitioner<Text, IntWritable> {
method getPartition (line 13) | public int getPartition(Text text, IntWritable intWritable, int numPar...
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/WordCountMapper.java
class WordCountMapper (line 13) | public class WordCountMapper extends Mapper<LongWritable, Text, Text, In...
method map (line 15) | @Override
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/WordCountReducer.java
class WordCountReducer (line 12) | public class WordCountReducer extends Reducer<Text, IntWritable, Text, I...
method reduce (line 14) | @Override
FILE: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/utils/WordCountDataUtils.java
class WordCountDataUtils (line 22) | public class WordCountDataUtils {
method generateData (line 32) | private static String generateData() {
method generateDataToLocal (line 50) | private static void generateDataToLocal(String outputPath) {
method generateDataToHDFS (line 69) | private static void generateDataToHDFS(String hdfsUrl, String user, St...
method main (line 87) | public static void main(String[] args) {
FILE: code/Hadoop/hdfs-java-api/src/main/java/com/heibaiying/utils/HdfsUtils.java
class HdfsUtils (line 16) | public class HdfsUtils {
method getFileSystem (line 39) | public static FileSystem getFileSystem() {
method mkdir (line 49) | public static boolean mkdir(String path) throws Exception {
method text (line 59) | public static String text(String path, String encode) throws Exception {
method createAndWrite (line 71) | public void createAndWrite(String path, String context) throws Excepti...
method rename (line 85) | public boolean rename(String oldPath, String newPath) throws Exception {
method copyFromLocalFile (line 97) | public void copyFromLocalFile(String localPath, String hdfsPath) throw...
method copyToLocalFile (line 108) | public void copyToLocalFile(String hdfsPath, String localPath) throws ...
method listFiles (line 119) | public FileStatus[] listFiles(String path) throws Exception {
method listFilesRecursive (line 130) | public RemoteIterator<LocatedFileStatus> listFilesRecursive(String pat...
method getFileBlockLocations (line 141) | public BlockLocation[] getFileBlockLocations(String path) throws Excep...
method delete (line 152) | public boolean delete(String path) throws Exception {
method inputStreamToString (line 163) | private static String inputStreamToString(InputStream inputStream, Str...
FILE: code/Hadoop/hdfs-java-api/src/test/java/HdfsTest.java
class HdfsTest (line 18) | public class HdfsTest {
method prepare (line 28) | @Before
method mkDir (line 48) | @Test
method mkDirWithPermission (line 57) | @Test
method create (line 66) | @Test
method exist (line 83) | @Test
method readToString (line 93) | @Test
method rename (line 104) | @Test
method delete (line 116) | @Test
method copyFromLocalFile (line 131) | @Test
method copyFromLocalBigFile (line 142) | @Test
method copyToLocalFile (line 167) | @Test
method listFiles (line 185) | @Test
method listFilesRecursive (line 198) | @Test
method getFileBlockLocations (line 210) | @Test
method destroy (line 224) | @After
method inputStreamToString (line 236) | private static String inputStreamToString(InputStream inputStream, Str...
FILE: code/Hbase/hbase-java-api-1.x/src/main/java/com/heibaiying/HBaseUtils.java
class HBaseUtils (line 16) | public class HBaseUtils {
method createTable (line 38) | public static boolean createTable(String tableName, List<String> colum...
method deleteTable (line 63) | public static boolean deleteTable(String tableName) {
method putRow (line 84) | public static boolean putRow(String tableName, String rowKey, String c...
method putRow (line 107) | public static boolean putRow(String tableName, String rowKey, String c...
method getRow (line 127) | public static Result getRow(String tableName, String rowKey) {
method getCell (line 147) | public static String getCell(String tableName, String rowKey, String c...
method getScanner (line 172) | public static ResultScanner getScanner(String tableName) {
method getScanner (line 191) | public static ResultScanner getScanner(String tableName, FilterList fi...
method getScanner (line 212) | public static ResultScanner getScanner(String tableName, String startR...
method deleteRow (line 233) | public static boolean deleteRow(String tableName, String rowKey) {
method deleteColumn (line 253) | public static boolean deleteColumn(String tableName, String rowKey, St...
FILE: code/Hbase/hbase-java-api-1.x/src/test/java/com/heibaiying/HbaseUtilsTest.java
class HBaseUtilsTest (line 15) | public class HBaseUtilsTest {
method createTable (line 21) | @Test
method insertData (line 29) | @Test
method getRow (line 48) | @Test
method getCell (line 58) | @Test
method getScanner (line 65) | @Test
method getScannerWithFilter (line 76) | @Test
method deleteColumn (line 90) | @Test
method deleteRow (line 96) | @Test
method deleteTable (line 102) | @Test
FILE: code/Hbase/hbase-java-api-2.x/src/main/java/com/heibaiying/HBaseUtils.java
class HBaseUtils (line 14) | public class HBaseUtils {
method createTable (line 36) | public static boolean createTable(String tableName, List<String> colum...
method deleteTable (line 62) | public static boolean deleteTable(String tableName) {
method putRow (line 83) | public static boolean putRow(String tableName, String rowKey, String c...
method putRow (line 106) | public static boolean putRow(String tableName, String rowKey, String c...
method getRow (line 126) | public static Result getRow(String tableName, String rowKey) {
method getCell (line 146) | public static String getCell(String tableName, String rowKey, String c...
method getScanner (line 171) | public static ResultScanner getScanner(String tableName) {
method getScanner (line 190) | public static ResultScanner getScanner(String tableName, FilterList fi...
method getScanner (line 211) | public static ResultScanner getScanner(String tableName, String startR...
method deleteRow (line 232) | public static boolean deleteRow(String tableName, String rowKey) {
method deleteColumn (line 252) | public static boolean deleteColumn(String tableName, String rowKey, St...
FILE: code/Hbase/hbase-java-api-2.x/src/test/java/heibaiying/HBaseUtilsTest.java
class HBaseUtilsTest (line 17) | public class HBaseUtilsTest {
method createTable (line 23) | @Test
method insertData (line 31) | @Test
method getRow (line 50) | @Test
method getCell (line 60) | @Test
method getScanner (line 67) | @Test
method getScannerWithFilter (line 78) | @Test
method deleteColumn (line 92) | @Test
method deleteRow (line 98) | @Test
method deleteTable (line 104) | @Test
FILE: code/Hbase/hbase-observer-coprocessor/src/main/java/com/heibaiying/AppendRegionObserver.java
class AppendRegionObserver (line 21) | public class AppendRegionObserver extends BaseRegionObserver {
method prePut (line 26) | @Override
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerASyn.java
class ConsumerASyn (line 15) | public class ConsumerASyn {
method main (line 17) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerASynAndSyn.java
class ConsumerASynAndSyn (line 15) | public class ConsumerASynAndSyn {
method main (line 17) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerASynWithOffsets.java
class ConsumerASynWithOffsets (line 19) | public class ConsumerASynWithOffsets {
method main (line 21) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerExit.java
class ConsumerExit (line 18) | public class ConsumerExit {
method main (line 20) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerGroup.java
class ConsumerGroup (line 16) | public class ConsumerGroup {
method main (line 18) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerSyn.java
class ConsumerSyn (line 15) | public class ConsumerSyn {
method main (line 17) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/RebalanceListener.java
class RebalanceListener (line 10) | public class RebalanceListener {
method main (line 12) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/StandaloneConsumer.java
class StandaloneConsumer (line 19) | public class StandaloneConsumer {
method main (line 21) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/ProducerASyn.java
class ProducerASyn (line 10) | public class ProducerASyn {
method main (line 12) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/ProducerSyn.java
class ProducerSyn (line 14) | public class ProducerSyn {
method main (line 16) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/ProducerWithPartitioner.java
class ProducerWithPartitioner (line 10) | public class ProducerWithPartitioner {
method main (line 12) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/SimpleProducer.java
class SimpleProducer (line 13) | public class SimpleProducer {
method main (line 15) | public static void main(String[] args) {
FILE: code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/partitioners/CustomPartitioner.java
class CustomPartitioner (line 11) | public class CustomPartitioner implements Partitioner {
method configure (line 15) | @Override
method partition (line 20) | @Override
method close (line 25) | @Override
FILE: code/Phoenix/spring-boot-mybatis-phoenix/src/main/java/com/heibaiying/springboot/SpringBootMybatisApplication.java
class SpringBootMybatisApplication (line 6) | @SpringBootApplication
method main (line 9) | public static void main(String[] args) {
FILE: code/Phoenix/spring-boot-mybatis-phoenix/src/main/java/com/heibaiying/springboot/bean/USPopulation.java
class USPopulation (line 8) | @Data
FILE: code/Phoenix/spring-boot-mybatis-phoenix/src/main/java/com/heibaiying/springboot/dao/PopulationDao.java
type PopulationDao (line 8) | @Mapper
method queryAll (line 11) | @Select("SELECT * from us_population")
method save (line 14) | @Insert("UPSERT INTO us_population VALUES( #{state}, #{city}, #{popula...
method queryByStateAndCity (line 17) | @Select("SELECT * FROM us_population WHERE state=#{state} AND city = #...
method deleteByStateAndCity (line 21) | @Delete("DELETE FROM us_population WHERE state=#{state} AND city = #{c...
FILE: code/Phoenix/spring-boot-mybatis-phoenix/src/test/java/com/heibaiying/springboot/PopulationTest.java
class PopulationTest (line 13) | @RunWith(SpringRunner.class)
method queryAll (line 20) | @Test
method save (line 30) | @Test
method update (line 37) | @Test
method delete (line 45) | @Test
FILE: code/Phoenix/spring-mybatis-phoenix/src/main/java/com/heibaiying/bean/USPopulation.java
class USPopulation (line 7) | @Data
FILE: code/Phoenix/spring-mybatis-phoenix/src/main/java/com/heibaiying/dao/PopulationDao.java
type PopulationDao (line 8) | public interface PopulationDao {
method queryAll (line 10) | List<USPopulation> queryAll();
method save (line 12) | void save(USPopulation USPopulation);
method queryByStateAndCity (line 14) | USPopulation queryByStateAndCity(@Param("state") String state, @Param(...
method deleteByStateAndCity (line 16) | void deleteByStateAndCity(@Param("state") String state, @Param("city")...
FILE: code/Phoenix/spring-mybatis-phoenix/src/test/java/com/heibaiying/dao/PopulationDaoTest.java
class PopulationDaoTest (line 12) | @RunWith(SpringRunner.class)
method queryAll (line 19) | @Test
method save (line 29) | @Test
method update (line 36) | @Test
method delete (line 44) | @Test
FILE: code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/WordCountToHBaseApp.java
class WordCountToHBaseApp (line 23) | public class WordCountToHBaseApp {
method main (line 30) | public static void main(String[] args) {
FILE: code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/component/CountBolt.java
class CountBolt (line 17) | public class CountBolt extends BaseRichBolt {
method prepare (line 24) | @Override
method execute (line 29) | @Override
method declareOutputFields (line 43) | @Override
FILE: code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/component/DataSourceSpout.java
class DataSourceSpout (line 18) | public class DataSourceSpout extends BaseRichSpout {
method open (line 24) | @Override
method nextTuple (line 29) | @Override
method declareOutputFields (line 37) | @Override
method productData (line 46) | private String productData() {
FILE: code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/component/SplitBolt.java
class SplitBolt (line 17) | public class SplitBolt extends BaseRichBolt {
method prepare (line 21) | @Override
method execute (line 26) | @Override
method declareOutputFields (line 35) | @Override
FILE: code/Storm/storm-hdfs-integration/src/main/java/com.heibaiying/DataToHdfsApp.java
class DataToHdfsApp (line 25) | public class DataToHdfsApp {
method main (line 30) | public static void main(String[] args) {
FILE: code/Storm/storm-hdfs-integration/src/main/java/com.heibaiying/component/DataSourceSpout.java
class DataSourceSpout (line 18) | public class DataSourceSpout extends BaseRichSpout {
method open (line 24) | @Override
method nextTuple (line 29) | @Override
method declareOutputFields (line 37) | @Override
method productData (line 46) | private String productData() {
FILE: code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/read/LogConsoleBolt.java
class LogConsoleBolt (line 14) | public class LogConsoleBolt extends BaseRichBolt {
method prepare (line 19) | public void prepare(Map stormConf, TopologyContext context, OutputColl...
method execute (line 23) | public void execute(Tuple input) {
method declareOutputFields (line 37) | public void declareOutputFields(OutputFieldsDeclarer declarer) {
FILE: code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/read/ReadingFromKafkaApp.java
class ReadingFromKafkaApp (line 20) | public class ReadingFromKafkaApp {
method main (line 25) | public static void main(String[] args) {
method getKafkaSpoutConfig (line 45) | private static KafkaSpoutConfig<String, String> getKafkaSpoutConfig(St...
method getRetryService (line 57) | private static KafkaSpoutRetryService getRetryService() {
FILE: code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/write/DataSourceSpout.java
class DataSourceSpout (line 17) | public class DataSourceSpout extends BaseRichSpout {
method open (line 23) | @Override
method nextTuple (line 28) | @Override
method declareOutputFields (line 36) | @Override
method productData (line 45) | private String productData() {
FILE: code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/write/WritingToKafkaApp.java
class WritingToKafkaApp (line 19) | public class WritingToKafkaApp {
method main (line 24) | public static void main(String[] args) {
FILE: code/Storm/storm-redis-integration/src/main/java/com/heibaiying/CustomRedisCountApp.java
class CustomRedisCountApp (line 19) | public class CustomRedisCountApp {
method main (line 28) | public static void main(String[] args) {
FILE: code/Storm/storm-redis-integration/src/main/java/com/heibaiying/WordCountToRedisApp.java
class WordCountToRedisApp (line 21) | public class WordCountToRedisApp {
method main (line 32) | public static void main(String[] args) {
FILE: code/Storm/storm-redis-integration/src/main/java/com/heibaiying/component/CountBolt.java
class CountBolt (line 17) | public class CountBolt extends BaseRichBolt {
method prepare (line 24) | @Override
method execute (line 29) | @Override
method declareOutputFields (line 43) | @Override
FILE: code/Storm/storm-redis-integration/src/main/java/com/heibaiying/component/DataSourceSpout.java
class DataSourceSpout (line 18) | public class DataSourceSpout extends BaseRichSpout {
method open (line 24) | @Override
method nextTuple (line 29) | @Override
method declareOutputFields (line 37) | @Override
method productData (line 46) | private String productData() {
FILE: code/Storm/storm-redis-integration/src/main/java/com/heibaiying/component/RedisCountStoreBolt.java
class RedisCountStoreBolt (line 15) | public class RedisCountStoreBolt extends AbstractRedisBolt {
method RedisCountStoreBolt (line 21) | public RedisCountStoreBolt(JedisPoolConfig config, RedisStoreMapper st...
method process (line 29) | @Override
method declareOutputFields (line 52) | @Override
FILE: code/Storm/storm-redis-integration/src/main/java/com/heibaiying/component/SplitBolt.java
class SplitBolt (line 16) | public class SplitBolt extends BaseRichBolt {
method prepare (line 20) | @Override
method execute (line 25) | @Override
method declareOutputFields (line 34) | @Override
FILE: code/Storm/storm-redis-integration/src/main/java/com/heibaiying/component/WordCountStoreMapper.java
class WordCountStoreMapper (line 10) | public class WordCountStoreMapper implements RedisStoreMapper {
method WordCountStoreMapper (line 14) | public WordCountStoreMapper() {
method getDataTypeDescription (line 19) | @Override
method getKeyFromTuple (line 24) | @Override
method getValueFromTuple (line 29) | @Override
FILE: code/Storm/storm-word-count/src/main/java/com/heibaiying/wordcount/ClusterWordCountApp.java
class ClusterWordCountApp (line 14) | public class ClusterWordCountApp {
method main (line 16) | public static void main(String[] args) {
FILE: code/Storm/storm-word-count/src/main/java/com/heibaiying/wordcount/LocalWordCountApp.java
class LocalWordCountApp (line 10) | public class LocalWordCountApp {
method main (line 12) | public static void main(String[] args) {
FILE: code/Storm/storm-word-count/src/main/java/com/heibaiying/wordcount/component/CountBolt.java
class CountBolt (line 12) | public class CountBolt extends BaseRichBolt {
method prepare (line 16) | @Override
method execute (line 21) | @Override
method declareOutputFields (line 36) | @Override
FILE: code/Storm/storm-word-count/src/main/java/com/heibaiying/wordcount/component/DataSourceSpout.java
class DataSourceSpout (line 14) | public class DataSourceSpout extends BaseRichSpout {
method open (line 20) | @Override
method nextTuple (line 25) | @Override
method declareOutputFields (line 33) | @Override
method productData (line 42) | private String productData() {
FILE: code/Storm/storm-word-count/src/main/java/com/heibaiying/wordcount/component/SplitBolt.java
class SplitBolt (line 13) | public class SplitBolt extends BaseRichBolt {
method prepare (line 17) | @Override
method execute (line 22) | @Override
method declareOutputFields (line 31) | @Override
FILE: code/Zookeeper/curator/src/main/java/com/heibaiying/AclOperation.java
class AclOperation (line 24) | public class AclOperation {
method prepare (line 32) | @Before
method createNodesWithAcl (line 47) | @Test
method SetAcl (line 72) | @Test
method getAcl (line 85) | @Test
method destroy (line 93) | @After
FILE: code/Zookeeper/curator/src/main/java/com/heibaiying/BasicOperation.java
class BasicOperation (line 25) | public class BasicOperation {
method prepare (line 33) | @Before
method getStatus (line 48) | @Test
method createNodes (line 58) | @Test
method getNode (line 71) | @Test
method getChildrenNodes (line 82) | @Test
method updateNode (line 94) | @Test
method deleteNodes (line 105) | @Test
method existNode (line 118) | @Test
method DisposableWatch (line 129) | @Test
method permanentWatch (line 143) | @Test
method permanentChildrenNodesWatch (line 165) | @Test
method destroy (line 208) | @After
FILE: code/spark/spark-streaming-basis/src/main/java/com/heibaiying/utils/JedisPoolUtil.java
class JedisPoolUtil (line 7) | public class JedisPoolUtil {
method getConnection (line 17) | public static Jedis getConnection() {
Condensed preview: 225 files, each showing path, character count, and a content snippet (1,052K chars of full structured content).
[
{
"path": ".gitignore",
"chars": 411,
"preview": "*#\n*.iml\n*.ipr\n*.iws\n*.sw?\n*~\n.#*\n.*.md.html\n.DS_Store\n.classpath\n.factorypath\n.gradle\n.idea\n.metadata\n.project\n.recomme"
},
{
"path": "README.md",
"chars": 7929,
"preview": "# BigData-Notes\n\n\n\n<div align=\"center\"> <img width=\"444px\" src=\"https://gitee.com/heibaiying/BigData-Notes/raw/master/pi"
},
{
"path": "code/Flink/flink-basis-java/pom.xml",
"chars": 7529,
"preview": "<!--\nLicensed to the Apache Software Foundation (ASF) under one\nor more contributor license agreements. See the NOTICE "
},
{
"path": "code/Flink/flink-basis-java/src/main/java/com/heibaiying/StreamingJob.java",
"chars": 741,
"preview": "package com.heibaiying;\n\nimport org.apache.flink.api.java.operators.DataSource;\nimport org.apache.flink.streaming.api.da"
},
{
"path": "code/Flink/flink-basis-java/src/main/resources/log4j.properties",
"chars": 1194,
"preview": "################################################################################\n# Licensed to the Apache Software Foun"
},
{
"path": "code/Flink/flink-basis-scala/pom.xml",
"chars": 8605,
"preview": "<!--\nLicensed to the Apache Software Foundation (ASF) under one\nor more contributor license agreements. See the NOTICE "
},
{
"path": "code/Flink/flink-basis-scala/src/main/resources/log4j.properties",
"chars": 1194,
"preview": "################################################################################\n# Licensed to the Apache Software Foun"
},
{
"path": "code/Flink/flink-basis-scala/src/main/resources/wordcount.txt",
"chars": 24,
"preview": "a,a,a,a,a\nb,b,b\nc,c\nd,d\n"
},
{
"path": "code/Flink/flink-basis-scala/src/main/scala/com/heibaiying/WordCountBatch.scala",
"chars": 494,
"preview": "package com.heibaiying\n\nimport org.apache.flink.api.scala._\n\nobject WordCountBatch {\n\n def main(args: Array[String]): U"
},
{
"path": "code/Flink/flink-basis-scala/src/main/scala/com/heibaiying/WordCountStreaming.scala",
"chars": 657,
"preview": "package com.heibaiying\n\nimport org.apache.flink.streaming.api.scala._\nimport org.apache.flink.streaming.api.windowing.ti"
},
{
"path": "code/Flink/flink-kafka-integration/pom.xml",
"chars": 7597,
"preview": "<!--\nLicensed to the Apache Software Foundation (ASF) under one\nor more contributor license agreements. See the NOTICE "
},
{
"path": "code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/CustomSinkJob.java",
"chars": 824,
"preview": "package com.heibaiying;\n\nimport com.heibaiying.bean.Employee;\nimport com.heibaiying.sink.FlinkToMySQLSink;\nimport org.ap"
},
{
"path": "code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/KafkaStreamingJob.java",
"chars": 2010,
"preview": "package com.heibaiying;\n\nimport org.apache.flink.api.common.functions.MapFunction;\nimport org.apache.flink.api.common.se"
},
{
"path": "code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/bean/Employee.java",
"chars": 719,
"preview": "package com.heibaiying.bean;\n\nimport java.sql.Date;\n\npublic class Employee {\n\n private String name;\n private int a"
},
{
"path": "code/Flink/flink-kafka-integration/src/main/java/com/heibaiying/sink/FlinkToMySQLSink.java",
"chars": 1315,
"preview": "package com.heibaiying.sink;\n\nimport com.heibaiying.bean.Employee;\nimport org.apache.flink.configuration.Configuration;\n"
},
{
"path": "code/Flink/flink-kafka-integration/src/main/resources/log4j.properties",
"chars": 1194,
"preview": "################################################################################\n# Licensed to the Apache Software Foun"
},
{
"path": "code/Flink/flink-state-management/pom.xml",
"chars": 7471,
"preview": "<!--\nLicensed to the Apache Software Foundation (ASF) under one\nor more contributor license agreements. See the NOTICE "
},
{
"path": "code/Flink/flink-state-management/src/main/java/com/heibaiying/keyedstate/KeyedStateJob.java",
"chars": 1037,
"preview": "package com.heibaiying.keyedstate;\n\nimport org.apache.flink.api.java.tuple.Tuple2;\nimport org.apache.flink.streaming.api"
},
{
"path": "code/Flink/flink-state-management/src/main/java/com/heibaiying/keyedstate/ThresholdWarning.java",
"chars": 1743,
"preview": "package com.heibaiying.keyedstate;\n\nimport org.apache.flink.api.common.functions.RichFlatMapFunction;\nimport org.apache."
},
{
"path": "code/Flink/flink-state-management/src/main/java/com/heibaiying/keyedstate/ThresholdWarningWithTTL.java",
"chars": 2231,
"preview": "package com.heibaiying.keyedstate;\n\nimport org.apache.flink.api.common.functions.RichFlatMapFunction;\nimport org.apache."
},
{
"path": "code/Flink/flink-state-management/src/main/java/com/heibaiying/operatorstate/OperatorStateJob.java",
"chars": 1112,
"preview": "package com.heibaiying.operatorstate;\n\nimport org.apache.flink.api.java.tuple.Tuple2;\nimport org.apache.flink.streaming."
},
{
"path": "code/Flink/flink-state-management/src/main/java/com/heibaiying/operatorstate/ThresholdWarning.java",
"chars": 2673,
"preview": "package com.heibaiying.operatorstate;\n\nimport org.apache.flink.api.common.functions.RichFlatMapFunction;\nimport org.apac"
},
{
"path": "code/Flink/flink-state-management/src/main/resources/log4j.properties",
"chars": 1194,
"preview": "################################################################################\n# Licensed to the Apache Software Foun"
},
{
"path": "code/Hadoop/hadoop-word-count/pom.xml",
"chars": 1629,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountApp.java",
"chars": 2443,
"preview": "package com.heibaiying;\n\nimport com.heibaiying.component.WordCountMapper;\nimport com.heibaiying.component.WordCountReduc"
},
{
"path": "code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountCombinerApp.java",
"chars": 2539,
"preview": "package com.heibaiying;\n\nimport com.heibaiying.component.WordCountMapper;\nimport com.heibaiying.component.WordCountReduc"
},
{
"path": "code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountCombinerPartitionerApp.java",
"chars": 2831,
"preview": "package com.heibaiying;\n\nimport com.heibaiying.component.CustomPartitioner;\nimport com.heibaiying.component.WordCountMap"
},
{
"path": "code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/CustomPartitioner.java",
"chars": 476,
"preview": "package com.heibaiying.component;\n\nimport com.heibaiying.utils.WordCountDataUtils;\nimport org.apache.hadoop.io.IntWritab"
},
{
"path": "code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/WordCountMapper.java",
"chars": 639,
"preview": "package com.heibaiying.component;\n\nimport org.apache.hadoop.io.IntWritable;\nimport org.apache.hadoop.io.LongWritable;\nim"
},
{
"path": "code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/WordCountReducer.java",
"chars": 599,
"preview": "package com.heibaiying.component;\n\nimport org.apache.hadoop.io.IntWritable;\nimport org.apache.hadoop.io.Text;\nimport org"
},
{
"path": "code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/utils/WordCountDataUtils.java",
"chars": 2743,
"preview": "package com.heibaiying.utils;\n\nimport org.apache.commons.lang3.StringUtils;\nimport org.apache.hadoop.conf.Configuration;"
},
{
"path": "code/Hadoop/hadoop-word-count/src/main/resources/log4j.properties",
"chars": 398,
"preview": "log4j.rootLogger=INFO,CONSOLE\nlog4j.addivity.org.apache=false\n\nlog4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender\nl"
},
{
"path": "code/Hadoop/hdfs-java-api/pom.xml",
"chars": 1262,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Hadoop/hdfs-java-api/src/main/java/com/heibaiying/utils/HdfsUtils.java",
"chars": 4536,
"preview": "package com.heibaiying.utils;\n\nimport org.apache.hadoop.conf.Configuration;\nimport org.apache.hadoop.fs.*;\n\nimport java."
},
{
"path": "code/Hadoop/hdfs-java-api/src/test/java/HdfsTest.java",
"chars": 6829,
"preview": "import org.apache.hadoop.conf.Configuration;\nimport org.apache.hadoop.fs.*;\nimport org.apache.hadoop.fs.permission.FsAct"
},
{
"path": "code/Hbase/hbase-java-api-1.x/pom.xml",
"chars": 1216,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Hbase/hbase-java-api-1.x/src/main/java/com/heibaiying/HBaseUtils.java",
"chars": 8052,
"preview": "package com.heibaiying;\n\nimport javafx.util.Pair;\nimport org.apache.hadoop.conf.Configuration;\nimport org.apache.hadoop."
},
{
"path": "code/Hbase/hbase-java-api-1.x/src/test/java/com/heibaiying/HbaseUtilsTest.java",
"chars": 3694,
"preview": "package com.heibaiying;\n\nimport javafx.util.Pair;\nimport org.apache.hadoop.hbase.client.Result;\nimport org.apache.hadoop"
},
{
"path": "code/Hbase/hbase-java-api-2.x/pom.xml",
"chars": 1215,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Hbase/hbase-java-api-2.x/src/main/java/com/heibaiying/HBaseUtils.java",
"chars": 8183,
"preview": "package com.heibaiying;\n\nimport javafx.util.Pair;\nimport org.apache.hadoop.conf.Configuration;\nimport org.apache.hadoop."
},
{
"path": "code/Hbase/hbase-java-api-2.x/src/test/java/heibaiying/HBaseUtilsTest.java",
"chars": 3762,
"preview": "package heibaiying;\n\nimport com.heibaiying.HBaseUtils;\nimport javafx.util.Pair;\nimport org.apache.hadoop.hbase.CompareOp"
},
{
"path": "code/Hbase/hbase-observer-coprocessor/pom.xml",
"chars": 839,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Hbase/hbase-observer-coprocessor/src/main/java/com/heibaiying/AppendRegionObserver.java",
"chars": 1939,
"preview": "package com.heibaiying;\n\nimport org.apache.hadoop.hbase.Cell;\nimport org.apache.hadoop.hbase.CellUtil;\nimport org.apache"
},
{
"path": "code/Kafka/kafka-basis/pom.xml",
"chars": 1356,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerASyn.java",
"chars": 1953,
"preview": "package com.heibaiying.consumers;\n\nimport org.apache.kafka.clients.consumer.*;\nimport org.apache.kafka.common.TopicParti"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerASynAndSyn.java",
"chars": 1686,
"preview": "package com.heibaiying.consumers;\n\nimport org.apache.kafka.clients.consumer.ConsumerRecord;\nimport org.apache.kafka.clie"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerASynWithOffsets.java",
"chars": 2146,
"preview": "package com.heibaiying.consumers;\n\nimport org.apache.kafka.clients.consumer.ConsumerRecord;\nimport org.apache.kafka.clie"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerExit.java",
"chars": 2400,
"preview": "package com.heibaiying.consumers;\n\nimport org.apache.kafka.clients.consumer.ConsumerRecord;\nimport org.apache.kafka.clie"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerGroup.java",
"chars": 1641,
"preview": "package com.heibaiying.consumers;\n\nimport org.apache.kafka.clients.consumer.ConsumerRecord;\nimport org.apache.kafka.clie"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/ConsumerSyn.java",
"chars": 1469,
"preview": "package com.heibaiying.consumers;\n\nimport org.apache.kafka.clients.consumer.ConsumerRecord;\nimport org.apache.kafka.clie"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/RebalanceListener.java",
"chars": 2272,
"preview": "package com.heibaiying.consumers;\n\nimport org.apache.kafka.clients.consumer.*;\nimport org.apache.kafka.common.TopicParti"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/consumers/StandaloneConsumer.java",
"chars": 1969,
"preview": "package com.heibaiying.consumers;\n\nimport org.apache.kafka.clients.consumer.ConsumerRecord;\nimport org.apache.kafka.clie"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/ProducerASyn.java",
"chars": 1404,
"preview": "package com.heibaiying.producers;\n\nimport org.apache.kafka.clients.producer.*;\n\nimport java.util.Properties;\n\n/*\n * Kafk"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/ProducerSyn.java",
"chars": 1472,
"preview": "package com.heibaiying.producers;\n\nimport org.apache.kafka.clients.producer.KafkaProducer;\nimport org.apache.kafka.clien"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/ProducerWithPartitioner.java",
"chars": 1225,
"preview": "package com.heibaiying.producers;\n\nimport org.apache.kafka.clients.producer.*;\n\nimport java.util.Properties;\n\n/*\n * Kafk"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/SimpleProducer.java",
"chars": 1027,
"preview": "package com.heibaiying.producers;\n\nimport org.apache.kafka.clients.producer.KafkaProducer;\nimport org.apache.kafka.clien"
},
{
"path": "code/Kafka/kafka-basis/src/main/java/com/heibaiying/producers/partitioners/CustomPartitioner.java",
"chars": 674,
"preview": "package com.heibaiying.producers.partitioners;\n\nimport org.apache.kafka.clients.producer.Partitioner;\nimport org.apache."
},
{
"path": "code/Phoenix/spring-boot-mybatis-phoenix/pom.xml",
"chars": 2113,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\" xmlns:xsi=\"http://www.w3.org/2"
},
{
"path": "code/Phoenix/spring-boot-mybatis-phoenix/src/main/java/com/heibaiying/springboot/SpringBootMybatisApplication.java",
"chars": 353,
"preview": "package com.heibaiying.springboot;\n\nimport org.springframework.boot.SpringApplication;\nimport org.springframework.boot.a"
},
{
"path": "code/Phoenix/spring-boot-mybatis-phoenix/src/main/java/com/heibaiying/springboot/bean/USPopulation.java",
"chars": 320,
"preview": "package com.heibaiying.springboot.bean;\n\nimport lombok.AllArgsConstructor;\nimport lombok.Data;\nimport lombok.NoArgsConst"
},
{
"path": "code/Phoenix/spring-boot-mybatis-phoenix/src/main/java/com/heibaiying/springboot/dao/PopulationDao.java",
"chars": 697,
"preview": "package com.heibaiying.springboot.dao;\n\nimport com.heibaiying.springboot.bean.USPopulation;\nimport org.apache.ibatis.ann"
},
{
"path": "code/Phoenix/spring-boot-mybatis-phoenix/src/main/resources/application.yml",
"chars": 1039,
"preview": "spring:\n datasource:\n #zookeeper地址\n url: jdbc:phoenix:192.168.0.105:2181\n driver-class-name: org.apache.phoeni"
},
{
"path": "code/Phoenix/spring-boot-mybatis-phoenix/src/test/java/com/heibaiying/springboot/PopulationTest.java",
"chars": 1602,
"preview": "package com.heibaiying.springboot;\n\nimport com.heibaiying.springboot.bean.USPopulation;\nimport com.heibaiying.springboot"
},
{
"path": "code/Phoenix/spring-mybatis-phoenix/pom.xml",
"chars": 2642,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Phoenix/spring-mybatis-phoenix/src/main/java/com/heibaiying/bean/USPopulation.java",
"chars": 274,
"preview": "package com.heibaiying.bean;\n\nimport lombok.AllArgsConstructor;\nimport lombok.Data;\nimport lombok.NoArgsConstructor;\n\n@D"
},
{
"path": "code/Phoenix/spring-mybatis-phoenix/src/main/java/com/heibaiying/dao/PopulationDao.java",
"chars": 440,
"preview": "package com.heibaiying.dao;\n\nimport com.heibaiying.bean.USPopulation;\nimport org.apache.ibatis.annotations.Param;\n\nimpor"
},
{
"path": "code/Phoenix/spring-mybatis-phoenix/src/main/resources/jdbc.properties",
"chars": 123,
"preview": "# database\nphoenix.driverClassName=org.apache.phoenix.jdbc.PhoenixDriver\n# zookeeper address\nphoenix.url=jdbc:phoenix:192.168.0.105:21"
},
{
"path": "code/Phoenix/spring-mybatis-phoenix/src/main/resources/mappers/Population.xml",
"chars": 732,
"preview": "<!DOCTYPE mapper\n PUBLIC \"-//mybatis.org//DTD Mapper 3.0//EN\"\n \"http://mybatis.org/dtd/mybatis-3-mapper.dt"
},
{
"path": "code/Phoenix/spring-mybatis-phoenix/src/main/resources/mybatisConfig.xml",
"chars": 527,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n<!DOCTYPE configuration\n PUBLIC \"-//mybatis.org//DTD Config 3.0//EN\"\n "
},
{
"path": "code/Phoenix/spring-mybatis-phoenix/src/main/resources/springApplication.xml",
"chars": 1854,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<beans xmlns=\"http://www.springframework.org/schema/beans\"\n xmlns:xsi=\"http"
},
{
"path": "code/Phoenix/spring-mybatis-phoenix/src/test/java/com/heibaiying/dao/PopulationDaoTest.java",
"chars": 1577,
"preview": "package com.heibaiying.dao;\n\nimport com.heibaiying.bean.USPopulation;\nimport org.junit.Test;\nimport org.junit.runner.Run"
},
{
"path": "code/Storm/storm-hbase-integration/pom.xml",
"chars": 3655,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/WordCountToHBaseApp.java",
"chars": 2901,
"preview": "package com.heibaiying;\n\nimport com.heibaiying.component.CountBolt;\nimport com.heibaiying.component.DataSourceSpout;\nimp"
},
{
"path": "code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/component/CountBolt.java",
"chars": 1209,
"preview": "package com.heibaiying.component;\n\nimport org.apache.storm.task.OutputCollector;\nimport org.apache.storm.task.TopologyCo"
},
{
"path": "code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/component/DataSourceSpout.java",
"chars": 1505,
"preview": "package com.heibaiying.component;\n\nimport org.apache.storm.shade.org.apache.commons.lang.StringUtils;\nimport org.apache."
},
{
"path": "code/Storm/storm-hbase-integration/src/main/java/com/heibaiying/component/SplitBolt.java",
"chars": 1042,
"preview": "package com.heibaiying.component;\n\nimport org.apache.storm.task.OutputCollector;\nimport org.apache.storm.task.TopologyCo"
},
{
"path": "code/Storm/storm-hdfs-integration/pom.xml",
"chars": 5093,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Storm/storm-hdfs-integration/src/main/java/com.heibaiying/DataToHdfsApp.java",
"chars": 3040,
"preview": "package com.heibaiying;\n\nimport com.heibaiying.component.DataSourceSpout;\nimport org.apache.storm.Config;\nimport org.apa"
},
{
"path": "code/Storm/storm-hdfs-integration/src/main/java/com.heibaiying/component/DataSourceSpout.java",
"chars": 1505,
"preview": "package com.heibaiying.component;\n\nimport org.apache.storm.shade.org.apache.commons.lang.StringUtils;\nimport org.apache."
},
{
"path": "code/Storm/storm-kafka-integration/pom.xml",
"chars": 3865,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/read/LogConsoleBolt.java",
"chars": 994,
"preview": "package com.heibaiying.kafka.read;\n\nimport org.apache.storm.task.OutputCollector;\nimport org.apache.storm.task.TopologyC"
},
{
"path": "code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/read/ReadingFromKafkaApp.java",
"chars": 2635,
"preview": "package com.heibaiying.kafka.read;\n\nimport org.apache.kafka.clients.consumer.ConsumerConfig;\nimport org.apache.storm.Con"
},
{
"path": "code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/write/DataSourceSpout.java",
"chars": 1522,
"preview": "package com.heibaiying.kafka.write;\n\nimport org.apache.storm.shade.org.apache.commons.lang.StringUtils;\nimport org.apach"
},
{
"path": "code/Storm/storm-kafka-integration/src/main/java/com/heibaiying/kafka/write/WritingToKafkaApp.java",
"chars": 2578,
"preview": "package com.heibaiying.kafka.write;\n\nimport org.apache.storm.Config;\nimport org.apache.storm.LocalCluster;\nimport org.ap"
},
{
"path": "code/Storm/storm-redis-integration/pom.xml",
"chars": 3624,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Storm/storm-redis-integration/src/main/java/com/heibaiying/CustomRedisCountApp.java",
"chars": 2232,
"preview": "package com.heibaiying;\n\nimport com.heibaiying.component.*;\nimport org.apache.storm.Config;\nimport org.apache.storm.Loca"
},
{
"path": "code/Storm/storm-redis-integration/src/main/java/com/heibaiying/WordCountToRedisApp.java",
"chars": 2547,
"preview": "package com.heibaiying;\n\nimport com.heibaiying.component.CountBolt;\nimport com.heibaiying.component.DataSourceSpout;\nimp"
},
{
"path": "code/Storm/storm-redis-integration/src/main/java/com/heibaiying/component/CountBolt.java",
"chars": 1209,
"preview": "package com.heibaiying.component;\n\nimport org.apache.storm.task.OutputCollector;\nimport org.apache.storm.task.TopologyCo"
},
{
"path": "code/Storm/storm-redis-integration/src/main/java/com/heibaiying/component/DataSourceSpout.java",
"chars": 1505,
"preview": "package com.heibaiying.component;\n\nimport org.apache.storm.shade.org.apache.commons.lang.StringUtils;\nimport org.apache."
},
{
"path": "code/Storm/storm-redis-integration/src/main/java/com/heibaiying/component/RedisCountStoreBolt.java",
"chars": 1970,
"preview": "package com.heibaiying.component;\n\nimport org.apache.storm.redis.bolt.AbstractRedisBolt;\nimport org.apache.storm.redis.c"
},
{
"path": "code/Storm/storm-redis-integration/src/main/java/com/heibaiying/component/SplitBolt.java",
"chars": 1050,
"preview": "package com.heibaiying.component;\n\nimport org.apache.storm.task.OutputCollector;\nimport org.apache.storm.task.TopologyCo"
},
{
"path": "code/Storm/storm-redis-integration/src/main/java/com/heibaiying/component/WordCountStoreMapper.java",
"chars": 921,
"preview": "package com.heibaiying.component;\n\nimport org.apache.storm.redis.common.mapper.RedisDataTypeDescription;\nimport org.apac"
},
{
"path": "code/Storm/storm-word-count/pom.xml",
"chars": 1743,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Storm/storm-word-count/src/main/java/com/heibaiying/wordcount/ClusterWordCountApp.java",
"chars": 1330,
"preview": "package com.heibaiying.wordcount;\n\nimport com.heibaiying.wordcount.component.CountBolt;\nimport com.heibaiying.wordcount."
},
{
"path": "code/Storm/storm-word-count/src/main/java/com/heibaiying/wordcount/LocalWordCountApp.java",
"chars": 1022,
"preview": "package com.heibaiying.wordcount;\n\nimport com.heibaiying.wordcount.component.CountBolt;\nimport com.heibaiying.wordcount."
},
{
"path": "code/Storm/storm-word-count/src/main/java/com/heibaiying/wordcount/component/CountBolt.java",
"chars": 1100,
"preview": "package com.heibaiying.wordcount.component;\n\nimport org.apache.storm.task.OutputCollector;\nimport org.apache.storm.task."
},
{
"path": "code/Storm/storm-word-count/src/main/java/com/heibaiying/wordcount/component/DataSourceSpout.java",
"chars": 1470,
"preview": "package com.heibaiying.wordcount.component;\n\nimport org.apache.commons.lang3.StringUtils;\nimport org.apache.storm.spout."
},
{
"path": "code/Storm/storm-word-count/src/main/java/com/heibaiying/wordcount/component/SplitBolt.java",
"chars": 1002,
"preview": "package com.heibaiying.wordcount.component;\n\nimport org.apache.storm.task.OutputCollector;\nimport org.apache.storm.task."
},
{
"path": "code/Storm/storm-word-count/src/main/resources/assembly.xml",
"chars": 842,
"preview": "<assembly xmlns=\"http://maven.apache.org/ASSEMBLY/2.0.0\"\n xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
},
{
"path": "code/Zookeeper/curator/pom.xml",
"chars": 1558,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/Zookeeper/curator/src/main/java/com/heibaiying/AclOperation.java",
"chars": 2988,
"preview": "package com.heibaiying;\n\nimport org.apache.curator.RetryPolicy;\nimport org.apache.curator.framework.CuratorFramework;\nim"
},
{
"path": "code/Zookeeper/curator/src/main/java/com/heibaiying/BasicOperation.java",
"chars": 6516,
"preview": "package com.heibaiying;\n\nimport org.apache.curator.RetryPolicy;\nimport org.apache.curator.framework.CuratorFramework;\nim"
},
{
"path": "code/spark/spark-streaming-basis/pom.xml",
"chars": 823,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/spark/spark-streaming-basis/src/main/java/com/heibaiying/NetworkWordCount.scala",
"chars": 603,
"preview": "package com.heibaiying\n\nimport org.apache.spark.SparkConf\nimport org.apache.spark.streaming.{Seconds, StreamingContext}\n"
},
{
"path": "code/spark/spark-streaming-basis/src/main/java/com/heibaiying/NetworkWordCountToRedis.scala",
"chars": 1210,
"preview": "package com.heibaiying\n\nimport com.heibaiying.utils.JedisPoolUtil\nimport org.apache.spark.SparkConf\nimport org.apache.sp"
},
{
"path": "code/spark/spark-streaming-basis/src/main/java/com/heibaiying/NetworkWordCountV2.scala",
"chars": 1173,
"preview": "package com.heibaiying\n\nimport org.apache.spark.SparkConf\nimport org.apache.spark.streaming.{Seconds, StreamingContext}\n"
},
{
"path": "code/spark/spark-streaming-basis/src/main/java/com/heibaiying/utils/JedisPoolUtil.java",
"chars": 860,
"preview": "package com.heibaiying.utils;\n\nimport redis.clients.jedis.Jedis;\nimport redis.clients.jedis.JedisPool;\nimport redis.clie"
},
{
"path": "code/spark/spark-streaming-flume/pom.xml",
"chars": 5288,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/spark/spark-streaming-flume/src/main/scala/com/heibaiying/flume/PullBasedWordCount.scala",
"chars": 647,
"preview": "package com.heibaiying.flume\n\nimport org.apache.spark.SparkConf\nimport org.apache.spark.streaming.{Seconds, StreamingCon"
},
{
"path": "code/spark/spark-streaming-flume/src/main/scala/com/heibaiying/flume/PushBasedWordCount.scala",
"chars": 630,
"preview": "package com.heibaiying.flume\n\nimport org.apache.spark.SparkConf\nimport org.apache.spark.streaming.{Seconds, StreamingCon"
},
{
"path": "code/spark/spark-streaming-kafka/pom.xml",
"chars": 1402,
"preview": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<project xmlns=\"http://maven.apache.org/POM/4.0.0\"\n xmlns:xsi=\"http://www"
},
{
"path": "code/spark/spark-streaming-kafka/src/main/scala/com/heibaiying/kafka/KafkaDirectStream.scala",
"chars": 1758,
"preview": "package com.heibaiying.kafka\n\nimport org.apache.kafka.common.serialization.StringDeserializer\nimport org.apache.spark.Sp"
},
{
"path": "notes/Azkaban_Flow_1.0_的使用.md",
"chars": 5347,
"preview": "# Azkaban Flow 1.0 的使用\n\n<nav>\n<a href=\"#一简介\">一、简介</a><br/>\n<a href=\"#二基本任务调度\">二、基本任务调度</a><br/>\n<a href=\"#三多任务调度\">三、多任务调"
},
{
"path": "notes/Azkaban_Flow_2.0_的使用.md",
"chars": 6089,
"preview": "# Azkaban Flow 2.0的使用\n\n<nav>\n<a href=\"#一Flow-20-简介\">一、Flow 2.0 简介</a><br/>\n<a href=\"#二YAML语法\">二、YAML语法</a><br/>\n<a href="
},
{
"path": "notes/Azkaban简介.md",
"chars": 1921,
"preview": "# Azkaban简介\n\n\n## 一、Azkaban 介绍\n\n#### 1.1 背景\n\n一个完整的大数据分析系统,必然由很多任务单元 (如数据收集、数据清洗、数据存储、数据分析等) 组成,所有的任务单元及其之间的依赖关系组成了复杂的工作流。"
},
{
"path": "notes/Flink_Data_Sink.md",
"chars": 8754,
"preview": "# Flink Sink\n<nav>\n<a href=\"#一Data-Sinks\">一、Data Sinks</a><br/>\n <a href="
},
{
"path": "notes/Flink_Data_Source.md",
"chars": 9141,
"preview": "# Flink Data Source\n<nav>\n<a href=\"#一内置-Data-Source\">一、内置 Data Source</a><br/>\n  "
},
{
"path": "notes/Flink_Data_Transformation.md",
"chars": 11520,
"preview": "# Flink Transformation\n<nav>\n<a href=\"#一Transformations-分类\">一、Transformations 分类</a><br/>\n<a href=\"#二DataStream-Transfor"
},
{
"path": "notes/Flink_Windows.md",
"chars": 4247,
"preview": "# Flink Windows\n<nav>\n<a href=\"#一窗口概念\">一、窗口概念</a><br/>\n<a href=\"#二Time-Windows\">二、Time Windows</a><br/>\n &nbs"
},
{
"path": "notes/Flink开发环境搭建.md",
"chars": 9443,
"preview": "# Flink 开发环境搭建\n\n<nav>\n<a href=\"#一安装-Scala-插件\">一、安装 Scala 插件</a><br/>\n<a href=\"#二Flink-项目初始化\">二、Flink 项目初始化</a><br/>\n&nbs"
},
{
"path": "notes/Flink核心概念综述.md",
"chars": 7799,
"preview": "# Flink 核心概念综述\n<nav>\n<a href=\"#一Flink-简介\">一、Flink 简介</a><br/>\n<a href=\"#二Flink-核心架构\">二、Flink 核心架构</a><br/>\n &"
},
{
"path": "notes/Flink状态管理与检查点机制.md",
"chars": 12716,
"preview": "# Flink 状态管理\n<nav>\n<a href=\"#一状态分类\">一、状态分类</a><br/>\n <a href=\"#21-算子状态\">2"
},
{
"path": "notes/Flume整合Kafka.md",
"chars": 3043,
"preview": "# Flume 整合 Kafka\n\n<nav>\n<a href=\"#一背景\">一、背景</a><br/>\n<a href=\"#二整合流程\">二、整合流程</a><br/>\n &nbs"
},
{
"path": "notes/Flume简介及基本使用.md",
"chars": 8882,
"preview": "# Flume 简介及基本使用\n\n<nav>\n<a href=\"#一Flume简介\">一、Flume简介</a><br/>\n<a href=\"#二Flume架构和基本概念\">二、Flume架构和基本概念</a><br/>\n &nb"
},
{
"path": "notes/HDFS-Java-API.md",
"chars": 9903,
"preview": "# HDFS Java API\n\n<nav>\n<a href=\"#一-简介\">一、 简介</a><br/>\n<a href=\"#二API的使用\">二、API的使用</a><br/>\n  "
},
{
"path": "notes/HDFS常用Shell命令.md",
"chars": 2236,
"preview": "# HDFS 常用 shell 命令\n\n**1. 显示当前目录结构**\n\n```shell\n# 显示当前目录结构\nhadoop fs -ls <path>\n# 递归显示当前目录结构\nhadoop fs -ls -R <path>\n# "
},
{
"path": "notes/Hadoop-HDFS.md",
"chars": 5975,
"preview": "# Hadoop分布式文件系统——HDFS\n\n<nav>\n<a href=\"#一介绍\">一、介绍</a><br/>\n<a href=\"#二HDFS-设计原理\">二、HDFS 设计原理</a><br/>\n &"
},
{
"path": "notes/Hadoop-MapReduce.md",
"chars": 11413,
"preview": "# 分布式计算框架——MapReduce\n\n<nav>\n<a href=\"#一MapReduce概述\">一、MapReduce概述</a><br/>\n<a href=\"#二MapReduce编程模型简述\">二、MapReduce编程模型简述"
},
{
"path": "notes/Hadoop-YARN.md",
"chars": 4793,
"preview": "# 集群资源管理器——YARN\n\n<nav>\n<a href=\"#一hadoop-yarn-简介\">一、hadoop yarn 简介</a><br/>\n<a href=\"#二YARN架构\">二、YARN架构</a><br/>\n &"
},
{
"path": "notes/Hbase_Java_API.md",
"chars": 24111,
"preview": "# HBase Java API 的基本使用\n\n<nav>\n<a href=\"#一简述\">一、简述</a><br/>\n<a href=\"#二Java-API-1x-基本使用\">二、Java API 1.x 基本使用</a><br/>\n<a "
},
{
"path": "notes/Hbase_Shell.md",
"chars": 6098,
"preview": "# Hbase 常用 Shell 命令\n<nav>\n<a href=\"#一基本命令\">一、基本命令</a><br/>\n <a href=\"#11-"
},
{
"path": "notes/Hbase协处理器详解.md",
"chars": 13627,
"preview": "# Hbase 协处理器\n\n<nav>\n<a href=\"#一简述\">一、简述</a><br/>\n<a href=\"#二协处理器类型\">二、协处理器类型</a><br/>\n <a href=\"#"
},
{
"path": "notes/Hbase容灾与备份.md",
"chars": 4446,
"preview": "# Hbase容灾与备份\n\n<nav>\n<a href=\"#一前言\">一、前言</a><br/>\n<a href=\"#二CopyTable\">二、CopyTable</a><br/>\n <a h"
},
{
"path": "notes/Hbase的SQL中间层_Phoenix.md",
"chars": 8036,
"preview": "# Hbase的SQL中间层——Phoenix\n\n<nav>\n<a href=\"#一Phoenix简介\">一、Phoenix简介</a><br/>\n<a href=\"#二Phoenix安装\">二、Phoenix安装</a><br/>\n&nb"
},
{
"path": "notes/Hbase简介.md",
"chars": 2796,
"preview": "# HBase简介\n\n<nav>\n<a href=\"#一Hadoop的局限\">一、Hadoop的局限</a><br/>\n<a href=\"#二HBase简介\">二、HBase简介</a><br/>\n<a href=\"#三HBase-Tabl"
},
{
"path": "notes/Hbase系统架构及数据结构.md",
"chars": 6845,
"preview": "# Hbase系统架构及数据结构\n\n<nav>\n<a href=\"#一基本概念\">一、基本概念</a><br/>\n <a href=\"#11-Row-Key-行键\">1.1 Row Key (行"
},
{
"path": "notes/Hbase过滤器详解.md",
"chars": 13718,
"preview": "# Hbase 过滤器详解\n\n<nav>\n<a href=\"#一HBase过滤器简介\">一、HBase过滤器简介</a><br/>\n<a href=\"#二过滤器基础\">二、过滤器基础</a><br/>\n &"
},
{
"path": "notes/HiveCLI和Beeline命令行的基本使用.md",
"chars": 10570,
"preview": "# Hive CLI和Beeline命令行的基本使用\n\n<nav>\n<a href=\"#一Hive-CLI\">一、Hive CLI</a><br/>\n &nb"
},
{
"path": "notes/Hive分区表和分桶表.md",
"chars": 4749,
"preview": "# Hive分区表和分桶表\n\n<nav>\n<a href=\"#一分区表\">一、分区表</a><br/>\n<a href=\"#二分桶表\">二、分桶表</a><br/>\n<a href=\"#三分区表和分桶表结合使用\">三、分区表和分桶表结合使用"
},
{
"path": "notes/Hive常用DDL操作.md",
"chars": 10240,
"preview": "# Hive常用DDL操作\n\n<nav>\n<a href=\"#一Database\">一、Database</a><br/>\n <a href=\"#"
},
{
"path": "notes/Hive常用DML操作.md",
"chars": 8716,
"preview": "# Hive 常用DML操作\n\n<nav>\n<a href=\"#一加载文件数据到表\">一、加载文件数据到表</a><br/>\n<a href=\"#二查询结果插入到表\">二、查询结果插入到表</a><br/>\n<a href=\"#三使用SQL"
},
{
"path": "notes/Hive数据查询详解.md",
"chars": 9384,
"preview": "# Hive数据查询详解\n\n<nav>\n<a href=\"#一数据准备\">一、数据准备</a><br/>\n<a href=\"#二单表查询\">二、单表查询</a><br/>\n &nbs"
},
{
"path": "notes/Hive简介及核心概念.md",
"chars": 8201,
"preview": "# Hive简介及核心概念\n\n<nav>\n<a href=\"#一简介\">一、简介</a><br/>\n<a href=\"#二Hive的体系架构\">二、Hive的体系架构</a><br/>\n<a href=\"#三数据类型\">三、数据类型</a>"
},
{
"path": "notes/Hive视图和索引.md",
"chars": 5660,
"preview": "# Hive 视图和索引\n\n<nav>\n<a href=\"#一视图\">一、视图</a><br/>\n<a href=\"#二索引\">二、索引</a><br/>\n<a href=\"#三索引案例\">三、索引案例</a><br/>\n<a href=\""
},
{
"path": "notes/Kafka消费者详解.md",
"chars": 12557,
"preview": "# Kafka消费者详解\n\n<nav>\n<a href=\"#一消费者和消费者群组\">一、消费者和消费者群组</a><br/>\n<a href=\"#二分区再均衡\">二、分区再均衡</a><br/>\n<a href=\"#三创建Kafka消费者\""
},
{
"path": "notes/Kafka深入理解分区副本机制.md",
"chars": 7399,
"preview": "# 深入理解Kafka副本机制\n\n<nav>\n<a href=\"#一Kafka集群\">一、Kafka集群</a><br/>\n<a href=\"#二副本机制\">二、副本机制</a><br/>\n &"
},
{
"path": "notes/Kafka生产者详解.md",
"chars": 9834,
"preview": "# Kafka生产者详解\n\n<nav>\n<a href=\"#一生产者发送消息的过程\">一、生产者发送消息的过程</a><br/>\n<a href=\"#二创建生产者\">二、创建生产者</a><br/>\n<a href=\"#二发送消息\">二、发"
},
{
"path": "notes/Kafka简介.md",
"chars": 2647,
"preview": "# Kafka简介\n\n<nav>\n<a href=\"#一Kafka简介\">一、Kafka简介</a><br/>\n<a href=\"#二Kafka核心概念\">二、Kafka核心概念</a><br/>\n &nb"
},
{
"path": "notes/Scala函数和闭包.md",
"chars": 6311,
"preview": "# 函数和闭包\n\n<nav>\n<a href=\"#一函数\">一、函数</a><br/>\n <a href=\"#11-函数与方法\">1.1 函数与方"
},
{
"path": "notes/Scala列表和集.md",
"chars": 10552,
"preview": "# List & Set\n\n<nav>\n<a href=\"#一List字面量\">一、List字面量</a><br/>\n<a href=\"#二List类型\">二、List类型</a><br/>\n<a href=\"#三构建List\">三、构建L"
},
{
"path": "notes/Scala基本数据类型和运算符.md",
"chars": 5395,
"preview": "# Scala基本数据类型和运算符\n\n<nav>\n<a href=\"#一数据类型\">一、数据类型</a><br/>\n<a href=\"#二字面量\">二、字面量</a><br/>\n<a href=\"#三运算符\">三、运算符</a><br/>\n"
},
{
"path": "notes/Scala数组.md",
"chars": 4050,
"preview": "# Scala 数组相关操作\n\n<nav>\n<a href=\"#一定长数组\">一、定长数组</a><br/>\n<a href=\"#二变长数组\">二、变长数组</a><br/>\n<a href=\"#三数组遍历\">三、数组遍历</a><br/>"
},
{
"path": "notes/Scala映射和元组.md",
"chars": 5696,
"preview": "# Map & Tuple\n\n<nav>\n<a href=\"#一映射Map\">一、映射(Map)</a><br/>\n <a href=\"#11-构"
},
{
"path": "notes/Scala模式匹配.md",
"chars": 4148,
"preview": "# Scala模式匹配\n\n<nav>\n<a href=\"#一模式匹配\">一、模式匹配</a><br/>\n <a href=\"#11-更好的swit"
},
{
"path": "notes/Scala流程控制语句.md",
"chars": 3748,
"preview": "# 流程控制语句\n\n<nav>\n<a href=\"#一条件表达式if\">一、条件表达式if</a><br/>\n<a href=\"#二块表达式\">二、块表达式</a><br/>\n<a href=\"#三循环表达式while\">三、循环表达式wh"
},
{
"path": "notes/Scala简介及开发环境配置.md",
"chars": 3456,
"preview": "# Scala简介及开发环境配置\n\n<nav>\n<a href=\"#一Scala简介\">一、Scala简介</a><br/>\n<a href=\"#二配置IDEA开发环境\">二、配置IDEA开发环境</a><br/>\n</nav>\n\n\n## "
},
{
"path": "notes/Scala类和对象.md",
"chars": 7707,
"preview": "# 类和对象\n\n<nav>\n<a href=\"#一初识类和对象\">一、初识类和对象</a><br/>\n<a href=\"#二类\">二、类</a><br/>\n "
},
{
"path": "notes/Scala类型参数.md",
"chars": 12377,
"preview": "# 类型参数\n\n<nav>\n<a href=\"#一泛型\">一、泛型</a><br/>\n <a href=\"#11-泛型类\">1.1 泛型类</a>"
},
{
"path": "notes/Scala继承和特质.md",
"chars": 8354,
"preview": "# 继承和特质\n\n<nav>\n<a href=\"#一继承\">一、继承</a><br/>\n <a href=\"#11-Scala中的继承结构\">1."
},
{
"path": "notes/Scala隐式转换和隐式参数.md",
"chars": 7703,
"preview": "# 隐式转换和隐式参数\n\n<nav>\n<a href=\"#一隐式转换\">一、隐式转换</a><br/>\n <a href=\"#11-使用隐式转换\""
},
{
"path": "notes/Scala集合类型.md",
"chars": 14142,
"preview": "# 集合\n\n<nav>\n<a href=\"#一集合简介\">一、集合简介</a><br/>\n<a href=\"#二集合结构\">二、集合结构</a><br/>\n "
},
{
"path": "notes/SparkSQL_Dataset和DataFrame简介.md",
"chars": 6600,
"preview": "# DataFrame和Dataset简介\n\n<nav>\n<a href=\"#一Spark-SQL简介\">一、Spark SQL简介</a><br/>\n<a href=\"#二DataFrame--DataSet\">二、DataFrame &"
},
{
"path": "notes/SparkSQL外部数据源.md",
"chars": 20508,
"preview": "# Spark SQL 外部数据源\n\n<nav>\n<a href=\"#一简介\">一、简介</a><br/>\n <a href=\"#11-多数据源支"
},
{
"path": "notes/SparkSQL常用聚合函数.md",
"chars": 9127,
"preview": "# 聚合函数Aggregations\n\n<nav>\n<a href=\"#一简单聚合\">一、简单聚合</a><br/>\n <a href=\"#11-"
},
{
"path": "notes/SparkSQL联结操作.md",
"chars": 5325,
"preview": "# Spark SQL JOIN\n\n<nav>\n<a href=\"#一-数据准备\">一、 数据准备</a><br/>\n<a href=\"#二连接类型\">二、连接类型</a><br/>\n &nbs"
},
{
"path": "notes/Spark_RDD.md",
"chars": 7972,
"preview": "\n\n# 弹性式数据集RDDs\n\n<nav>\n<a href=\"#一RDD简介\">一、RDD简介</a><br/>\n<a href=\"#二创建RDD\">二、创建RDD</a><br/>\n &nbs"
},
{
"path": "notes/Spark_Streaming与流处理.md",
"chars": 2840,
"preview": "# Spark Streaming与流处理\n\n<nav>\n<a href=\"#一流处理\">一、流处理</a><br/>\n <a href=\"#11"
},
{
"path": "notes/Spark_Streaming基本操作.md",
"chars": 11192,
"preview": "# Spark Streaming 基本操作\n\n<nav>\n<a href=\"#一案例引入\">一、案例引入</a><br/>\n <a href=\""
},
{
"path": "notes/Spark_Streaming整合Flume.md",
"chars": 11451,
"preview": "# Spark Streaming 整合 Flume\n\n<nav>\n<a href=\"#一简介\">一、简介</a><br/>\n<a href=\"#二推送式方法\">二、推送式方法</a><br/>\n &nbs"
},
{
"path": "notes/Spark_Streaming整合Kafka.md",
"chars": 10020,
"preview": "# Spark Streaming 整合 Kafka\n\n<nav>\n<a href=\"#一版本说明\">一、版本说明</a><br/>\n<a href=\"#二项目依赖\">二、项目依赖</a><br/>\n<a href=\"#三整合Kafka\">"
},
{
"path": "notes/Spark_Structured_API的基本使用.md",
"chars": 5585,
"preview": "# Structured API基本使用\n\n<nav>\n<a href=\"#一创建DataFrame和Dataset\">一、创建DataFrame和Dataset</a><br/>\n<a href=\"#二Columns列操作\">二、Colu"
},
{
"path": "notes/Spark_Transformation和Action算子.md",
"chars": 14814,
"preview": "# Transformation 和 Action 常用算子\n\n<nav>\n<a href=\"#一Transformation\">一、Transformation</a><br/>\n  "
},
{
"path": "notes/Spark简介.md",
"chars": 3641,
"preview": "# Spark简介\n\n<nav>\n<a href=\"#一简介\">一、简介</a><br/>\n<a href=\"#二特点\">二、特点</a><br/>\n<a href=\"#三集群架构\">三、集群架构</a><br/>\n<a href=\"#四核"
},
{
"path": "notes/Spark累加器与广播变量.md",
"chars": 2857,
"preview": "# Spark 累加器与广播变量\n\n<nav>\n<a href=\"#一简介\">一、简介</a><br/>\n<a href=\"#二累加器\">二、累加器</a><br/>\n "
},
{
"path": "notes/Spark部署模式与作业提交.md",
"chars": 8015,
"preview": "# Spark部署模式与作业提交\n\n<nav>\n<a href=\"#一作业提交\">一、作业提交</a><br/>\n<a href=\"#二Local模式\">二、Local模式</a><br/>\n<a href=\"#三Standalone模式\""
},
{
"path": "notes/Spring+Mybtais+Phoenix整合.md",
"chars": 12362,
"preview": "# Spring/Spring Boot 整合 Mybatis + Phoenix\n\n<nav>\n<a href=\"#一前言\">一、前言</a><br/>\n<a href=\"#二Spring-+-Mybatis-+-Phoenix\">二、S"
},
{
"path": "notes/Sqoop基本使用.md",
"chars": 9699,
"preview": "# Sqoop基本使用\n\n<nav>\n<a href=\"#一Sqoop-基本命令\">一、Sqoop 基本命令</a><br/>\n<a href=\"#二Sqoop-与-MySQL\">二、Sqoop 与 MySQL</a><br/>\n<a hr"
},
{
"path": "notes/Sqoop简介与安装.md",
"chars": 4280,
"preview": "# Sqoop 简介与安装\n\n<nav>\n<a href=\"#一Sqoop-简介\">一、Sqoop 简介</a><br/>\n<a href=\"#二安装\">二、安装</a><br/>\n <a hr"
},
{
"path": "notes/Storm三种打包方式对比分析.md",
"chars": 11084,
"preview": "# Storm三种打包方式对比分析\n\n<nav>\n<a href=\"#一简介\">一、简介</a><br/>\n<a href=\"#二mvn-package\">二、mvn package</a><br/>\n<a href=\"#三maven-as"
},
{
"path": "notes/Storm和流处理简介.md",
"chars": 3707,
"preview": "# Storm和流处理简介\n\n<nav>\n<a href=\"#一Storm\">一、Storm</a><br/>\n <a href=\"#11-简介\""
},
{
"path": "notes/Storm核心概念详解.md",
"chars": 5796,
"preview": "# Storm 核心概念详解\n\n<nav>\n<a href=\"#一storm核心概念\">一、Storm核心概念</a><br/>\n <a href=\"#11--Topologies拓扑\">1.1"
},
{
"path": "notes/Storm编程模型详解.md",
"chars": 13472,
"preview": "# Storm 编程模型\n\n<nav>\n<a href=\"#一简介\">一、简介</a><br/>\n<a href=\"#二IComponent接口\">二、IComponent接口</a><br/>\n<a href=\"#三Spout\">三、Sp"
},
{
"path": "notes/Storm集成HBase和HDFS.md",
"chars": 13372,
"preview": "# Storm集成HDFS和HBase\n\n<nav>\n<a href=\"#一Storm集成HDFS\">一、Storm集成HDFS</a><br/>\n<a href=\"#二Storm集成HBase\">二、Storm集成HBase</a><br"
},
{
"path": "notes/Storm集成Kakfa.md",
"chars": 10354,
"preview": "# Storm集成Kafka\n\n<nav>\n<a href=\"#一整合说明\">一、整合说明</a><br/>\n<a href=\"#二写入数据到Kafka\">二、写入数据到Kafka</a><br/>\n<a href=\"#三从Kafka中读取"
},
{
"path": "notes/Storm集成Redis详解.md",
"chars": 19510,
"preview": "# Storm 集成 Redis 详解\n\n<nav>\n<a href=\"#一简介\">一、简介</a><br/>\n<a href=\"#二集成案例\">二、集成案例</a><br/>\n<a href=\"#三storm-redis-实现原理\">三、"
},
{
"path": "notes/Zookeeper_ACL权限控制.md",
"chars": 8527,
"preview": "# Zookeeper ACL\n\n<nav>\n<a href=\"#一前言\">一、前言</a><br/>\n<a href=\"#二使用Shell进行权限管理\">二、使用Shell进行权限管理</a><br/>\n  "
},
{
"path": "notes/Zookeeper_Java客户端Curator.md",
"chars": 10094,
"preview": "# Zookeeper Java 客户端 ——Apache Curator\n\n<nav>\n<a href=\"#一基本依赖\">一、基本依赖</a><br/>\n<a href=\"#二客户端相关操作\">二、客户端相关操作</a><br/>\n&nb"
},
{
"path": "notes/Zookeeper常用Shell命令.md",
"chars": 7338,
"preview": "# Zookeeper常用Shell命令\n\n<nav>\n<a href=\"#一节点增删改查\">一、节点增删改查</a><br/>\n <a href"
},
{
"path": "notes/Zookeeper简介及核心概念.md",
"chars": 7552,
"preview": "# Zookeeper简介及核心概念\n\n<nav>\n<a href=\"#一Zookeeper简介\">一、Zookeeper简介</a><br/>\n<a href=\"#二Zookeeper设计目标\">二、Zookeeper设计目标</a><b"
},
{
"path": "notes/installation/Azkaban_3.x_编译及部署.md",
"chars": 3562,
"preview": "# Azkaban 3.x 编译及部署\n\n<nav>\n<a href=\"#一Azkaban-源码编译\">一、Azkaban 源码编译</a><br/>\n<a href=\"#二Azkaban-部署模式\">二、Azkaban 部署模式</a><"
},
{
"path": "notes/installation/Flink_Standalone_Cluster.md",
"chars": 8178,
"preview": "# Flink Standalone Cluster\n<nav>\n<a href=\"#一部署模式\">一、部署模式</a><br/>\n<a href=\"#二单机模式\">二、单机模式</a><br/>\n &nb"
},
{
"path": "notes/installation/HBase单机环境搭建.md",
"chars": 4497,
"preview": "# HBase基本环境搭建\n\n<nav>\n<a href=\"#一安装前置条件说明\">一、安装前置条件说明</a><br/>\n<a href=\"#二Standalone-模式\">二、Standalone 模式</a><br/>\n<a href"
},
{
"path": "notes/installation/HBase集群环境搭建.md",
"chars": 5416,
"preview": "# HBase集群环境配置\n\n<nav>\n<a href=\"#一集群规划\">一、集群规划</a><br/>\n<a href=\"#二前置条件\">二、前置条件</a><br/>\n<a href=\"#三集群搭建\">三、集群搭建</a><br/>\n"
},
{
"path": "notes/installation/Hadoop单机环境搭建.md",
"chars": 4163,
"preview": "# Hadoop单机版环境搭建\n\n<nav>\n<a href=\"#一前置条件\">一、前置条件</a><br/>\n<a href=\"#二配置-SSH-免密登录\">二、配置 SSH 免密登录</a><br/>\n<a href=\"#三Hadoop"
},
{
"path": "notes/installation/Hadoop集群环境搭建.md",
"chars": 5302,
"preview": "# Hadoop集群环境搭建\n\n<nav>\n<a href=\"#一集群规划\">一、集群规划</a><br/>\n<a href=\"#二前置条件\">二、前置条件</a><br/>\n<a href=\"#三配置免密登录\">三、配置免密登录</a><"
},
{
"path": "notes/installation/Linux下Flume的安装.md",
"chars": 1168,
"preview": "# Linux下Flume的安装\n\n\n## 一、前置条件\n\nFlume 需要依赖 JDK 1.8+,JDK 安装方式见本仓库:\n\n> [Linux 环境下 JDK 安装](https://github.com/heibaiying/BigD"
},
{
"path": "notes/installation/Linux下JDK安装.md",
"chars": 1034,
"preview": "# Linux下JDK的安装\n\n>**系统环境**:centos 7.6\n>\n>**JDK 版本**:jdk 1.8.0_20\n\n\n\n### 1. 下载并解压\n\n在[官网](https://www.oracle.com/technetwor"
},
{
"path": "notes/installation/Linux下Python安装.md",
"chars": 1238,
"preview": "## Linux下Python安装\n\n>**系统环境**:centos 7.6\n>\n>**Python 版本**:Python-3.6.8\n\n### 1. 环境依赖\n\nPython3.x 的安装需要依赖这四个组件:gcc, zlib,zli"
},
{
"path": "notes/installation/Linux环境下Hive的安装部署.md",
"chars": 5000,
"preview": "# Linux环境下Hive的安装\n\n<nav>\n<a href=\"#一安装Hive\">一、安装Hive</a><br/>\n <a href=\"#"
},
{
"path": "notes/installation/Spark开发环境搭建.md",
"chars": 4434,
"preview": "# Spark开发环境搭建\n\n<nav>\n<a href=\"#一安装Spark\">一、安装Spark</a><br/>\n<a href=\"#二词频统计案例\">二、词频统计案例</a><br/>\n<a href=\"#三Scala开发环境配置\""
}
]
// ... and 25 more files